Business Optimist • Technology Realist • Old School Values

Company entity matching (and deduplication)

  • data management

Scoped entity matching / record linkage as-a-service, contributed outside-in knowledge through partnerships and OEM vendors, and developed the monetization strategy.  Migrated from rule-based record linkage to ML-based models (random forest and logistic regression) running as Hadoop and Spark applications, with a 20% increase in F1 and a 40% increase in recall while maintaining 90% precision.  Core features include:

  • Tokenization: Significant prediction, segmentation, run-together parsing, ordering, stop words
  • Transformation: entity name, address, phone, and URL canonicalization
  • Business Taxonomies: Abbreviations, acronyms, alternate names, word expansions, misspellings, internationalization
  • Feature blocking: tokenization, fuzzy matching (ratio, token set ratio, token partial ratio)
  • Context: Family Tree (subsidiaries, corporate linkages)
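
To make the feature list above more concrete, here is a minimal sketch of how name and phone canonicalization, a blocking key, and fuzzy similarity features might fit together for one candidate pair. It is an illustration only, not the production pipeline: the lookup tables, function names, and record fields are all hypothetical, and the token-set score is a simplified stand-in built on Python's standard `difflib` rather than a dedicated fuzzy-matching library.

```python
import re
from difflib import SequenceMatcher

# Hypothetical lookup tables; a real business taxonomy would be far larger.
STOP_WORDS = {"inc", "llc", "ltd", "corp", "co", "the"}
EXPANSIONS = {"intl": "international", "mfg": "manufacturing", "svcs": "services"}


def canonicalize_name(name: str) -> list[str]:
    """Lowercase, strip punctuation, expand abbreviations, drop stop words."""
    tokens = re.sub(r"[^a-z0-9 ]+", " ", name.lower()).split()
    tokens = [EXPANSIONS.get(t, t) for t in tokens]
    return [t for t in tokens if t not in STOP_WORDS]


def canonicalize_phone(phone: str) -> str:
    """Keep digits only; drop a leading country code 1 (assumes US-style numbers)."""
    digits = re.sub(r"\D", "", phone)
    return digits[1:] if len(digits) == 11 and digits.startswith("1") else digits


def blocking_key(tokens: list[str]) -> str:
    """Cheap key so only records sharing a key are compared pairwise."""
    return min(tokens)[:4] if tokens else ""


def ratio(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def token_set_ratio(a_tokens: list[str], b_tokens: list[str]) -> float:
    """Simplified order-insensitive score: compare sorted unique tokens."""
    return ratio(" ".join(sorted(set(a_tokens))), " ".join(sorted(set(b_tokens))))


def pair_features(a: dict, b: dict) -> dict:
    """Feature vector for one candidate pair, ready for a classifier."""
    a_tok, b_tok = canonicalize_name(a["name"]), canonicalize_name(b["name"])
    same_phone = canonicalize_phone(a["phone"]) == canonicalize_phone(b["phone"])
    return {
        "name_ratio": ratio(" ".join(a_tok), " ".join(b_tok)),
        "name_token_set_ratio": token_set_ratio(a_tok, b_tok),
        "phone_match": 1.0 if same_phone else 0.0,
    }


rec_a = {"name": "Acme Intl., Inc.", "phone": "+1 (555) 010-2000"}
rec_b = {"name": "The ACME International Corp", "phone": "555.010.2000"}
features = pair_features(rec_a, rec_b)
```

In a setup like this, the blocking key keeps the number of pairwise comparisons manageable, and the resulting feature dictionaries would be the input to a model such as the random forest or logistic regression mentioned earlier.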

Increased content extraction and yield by ~100%; enabled data cleansing applications (workflow and API) with monetization of $1.1M in ACV in the first 18 months (2016).

