The Source for Java Technology Collaboration


The Mural Matching Engine supports joining records that relate to the same entity across two or more disparate data sets. In the absence of a shared unique key, record matching requires comparing groups of partially identifying, non-unique data items between pairs of records. Data items with variable formats, such as names and addresses, must be transformed and normalized before these comparisons can be carried out validly; this process is commonly referred to as "standardization". This mini-talk presents Mural's approach to standardization, which combines lexicon-based tokenization with finite-state-machine-driven parsing in which the statistical distribution of input symbol sequences is learned from training data sets.
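The abstract does not show Mural's actual classes or file formats, so the following is only a minimal sketch of the general technique, assuming a hidden-Markov-style formulation. A lexicon maps each normalized word to a token class (the machine's input symbol). The machine's states are output fields, and its transition and emission probabilities are estimated, with add-one smoothing, from a labeled training set. Parsing then picks the most probable field sequence with the Viterbi algorithm. The class name `NameStandardizerSketch`, the lexicon entries, the token classes, and the field labels are all hypothetical.

```java
import java.util.*;

// Hypothetical sketch, not Mural's API: lexicon-based tokenization followed by
// an HMM-style finite state machine whose states are output fields. Transition
// and emission probabilities are estimated (add-one smoothed) from labeled
// training sequences; parsing picks the most probable field sequence (Viterbi).
public class NameStandardizerSketch {

    // Illustrative lexicon: normalized word -> token class (input symbol).
    static final Map<String, String> LEXICON = Map.of(
        "DR", "TITLE", "MR", "TITLE", "MRS", "TITLE",
        "JOHN", "GIVEN", "MARY", "GIVEN",
        "SMITH", "SURNAME", "JONES", "SURNAME");

    static final String START = "<start>";

    final Map<String, Map<String, Integer>> trans = new HashMap<>(); // field -> next-field counts
    final Map<String, Map<String, Integer>> emit = new HashMap<>();  // field -> token-class counts
    final Set<String> fields = new TreeSet<>();
    final Set<String> symbols = new TreeSet<>();

    /** Upper-cases, strips punctuation, and maps each word to a token class. */
    static List<String> tokenize(String raw) {
        List<String> classes = new ArrayList<>();
        for (String w : raw.toUpperCase().replaceAll("[^A-Z ]", " ").trim().split("\\s+")) {
            if (w.isEmpty()) continue; // blank input splits into a single empty string
            classes.add(LEXICON.getOrDefault(w, w.length() == 1 ? "INITIAL" : "UNKNOWN"));
        }
        return classes;
    }

    /** Counts field-to-field transitions and field-to-symbol emissions in one labeled example. */
    void train(List<String> tokenClasses, List<String> labels) {
        if (tokenClasses.size() != labels.size()) throw new IllegalArgumentException("length mismatch");
        String prev = START;
        for (int i = 0; i < labels.size(); i++) {
            String field = labels.get(i), sym = tokenClasses.get(i);
            fields.add(field);
            symbols.add(sym);
            trans.computeIfAbsent(prev, k -> new HashMap<>()).merge(field, 1, Integer::sum);
            emit.computeIfAbsent(field, k -> new HashMap<>()).merge(sym, 1, Integer::sum);
            prev = field;
        }
    }

    /** Add-one smoothed log probability of `to` given `from`, over a vocabulary of size `vocab`. */
    static double logP(Map<String, Map<String, Integer>> table, String from, String to, int vocab) {
        Map<String, Integer> row = table.getOrDefault(from, Map.of());
        int total = row.values().stream().mapToInt(Integer::intValue).sum();
        return Math.log((row.getOrDefault(to, 0) + 1.0) / (total + vocab));
    }

    /** Viterbi decoding: the most probable field label for each token class. */
    List<String> parse(List<String> obs) {
        List<String> states = new ArrayList<>(fields);
        int n = obs.size(), s = states.size();
        if (n == 0) return List.of();
        if (s == 0) throw new IllegalStateException("train() must be called first");
        double[][] score = new double[n][s];
        int[][] back = new int[n][s];
        for (int t = 0; t < n; t++) {
            for (int j = 0; j < s; j++) {
                // symbols.size() + 1 leaves probability mass for unseen token classes
                double e = logP(emit, states.get(j), obs.get(t), symbols.size() + 1);
                if (t == 0) {
                    score[t][j] = logP(trans, START, states.get(j), s) + e;
                    continue;
                }
                score[t][j] = Double.NEGATIVE_INFINITY;
                for (int i = 0; i < s; i++) {
                    double c = score[t - 1][i] + logP(trans, states.get(i), states.get(j), s) + e;
                    if (c > score[t][j]) { score[t][j] = c; back[t][j] = i; }
                }
            }
        }
        int best = 0;
        for (int j = 1; j < s; j++) if (score[n - 1][j] > score[n - 1][best]) best = j;
        String[] out = new String[n];
        for (int t = n - 1; t >= 0; t--) { out[t] = states.get(best); best = back[t][best]; }
        return Arrays.asList(out);
    }

    public static void main(String[] args) {
        NameStandardizerSketch p = new NameStandardizerSketch();
        // Tiny hand-labeled training set (illustrative only).
        p.train(tokenize("Dr. John Smith"), List.of("Title", "FirstName", "LastName"));
        p.train(tokenize("Mary J. Jones"), List.of("FirstName", "MiddleName", "LastName"));
        p.train(tokenize("Smith, John"), List.of("LastName", "FirstName"));
        System.out.println(p.parse(tokenize("Mrs Mary K Smith"))); // prints [Title, FirstName, MiddleName, LastName]
    }
}
```

Working in log probabilities keeps long inputs from underflowing, and the add-one smoothing gives transitions and token classes never seen in training a small nonzero probability, so inputs whose symbol order differs from every training example can still be parsed rather than rejected.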

Topic: CommunityCorner2008-TheMuralStandardizationEngine

Revision r1 - 12 Mar 2008 - 19:37:22 - Main.xrrocha