Led the refactoring of a B2B contact data harvester (using ML and text analytics) that extracts contacts from publicly available corporate domains and URLs. The goal was to increase output for both net-new data integration and validation of existing records. Results: recall increased by 22% (to 75%) (~4M B2B contacts against 800k domains), while precision increased by 5% (to 80%). Key harvester modules include:
- Crawler / Parser (leadership directories, press releases, quotes)
- Page Classifier and Biography Predictor (BOW)
- Name / Role Identifier (NER / NLP)
- Name segmentation taxonomies (ML)
- Title and persona taxonomies (ML)
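
For readers less familiar with the metrics quoted above, the following is a minimal sketch of how recall and precision are conventionally computed for an extraction task. The function name `precision_recall` and the toy identifiers are assumptions made purely for illustration; they are not the harvester's actual code or data.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for two collections of contact identifiers.

    precision = correct harvested contacts / all harvested contacts
    recall    = correct harvested contacts / all contacts that exist
    """
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Toy, made-up identifiers (hypothetical, not real harvest output).
harvested = {"a", "b", "c", "d", "e"}
ground_truth = {"a", "b", "c", "d", "w", "x", "y", "z"}

p, r = precision_recall(harvested, ground_truth)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this toy case four of the five harvested contacts are correct (precision 0.80), and four of the eight known contacts were found (recall 0.50).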