Quite often, an enterprise faces an event where it needs to load massive customer files into its enterprise information systems.
Examples include integrating a new subsidiary’s customer master with the parent’s CRM or ERP system, migrating to a brand new ERP, consolidating customers from various silos within the enterprise, importing partner files and their customers, etc.
Sometimes, attempts are made to programmatically improve data quality within a customer record, but because of tight deadlines, data quality across the file is usually not given serious attention.
IT’s thinking is usually that “We received 50,000 customer records; we uploaded 50,000 records – job well done!” But wait a minute, is that really true?
It is highly likely that duplicates exist within the file and the same customer is being loaded more than once. There’s also a possibility that the same customer already exists in your target system.
Multiple instances of a single customer can lead to end-user confusion, serious reporting errors, reduced operational efficiency, and degraded customer service.
A good approach is to be proactive about data quality and to plan for spending extra cycles correcting these types of problems in the customer files before doing the migration.
A simple tactic is to extract the existing customer records from the target system and run this file, along with the legacy/source-system data, through an address-validation and matching process. A number of vendors can do this task for you at a reasonable cost, ranging from 15 cents to 55 cents per record.
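In-house, the core of such a matching process can be approximated with a crude normalize-and-compare pass. The sketch below is a minimal illustration using only the Python standard library; the field names (`customer_id`, `name`, `address`) and the similarity threshold are assumptions, and a commercial service would standardize addresses far more rigorously before comparing them.

```python
# Minimal sketch of record matching between a source file and a target
# extract. Field names and the 0.9 threshold are illustrative assumptions.
from difflib import SequenceMatcher

def normalize(record):
    """Build a crude match key: lowercase name + address, alphanumerics only."""
    key = (record["name"] + record["address"]).lower()
    return "".join(ch for ch in key if ch.isalnum())

def is_duplicate(a, b, threshold=0.9):
    """Treat two records as the same customer if their keys are near-identical."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold

source = {"customer_id": "S-001", "name": "ACME Corp.",
          "address": "100 Main St, Springfield"}
target = {"customer_id": "T-042", "name": "Acme Corp",
          "address": "100 Main Street, Springfield"}

print(is_duplicate(source, target))  # True: same customer, minor variations
```

In practice the vendor's matching engine replaces `is_duplicate`; the point is that matching happens on standardized keys, not on raw text as entered.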
The next step is to separate the non-duplicates and load only these records in the target system. The duplicates are either managed outside the target system (by building cross-references in your data warehouse, for example) or, if your target system has a way to maintain cross-references, by uploading the cross-references only into the target system (typically an MDM hub or ERP application).
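The split-and-load step above can be sketched as follows. This is an illustrative outline, not a production loader: the `match_key` function and record layout are assumptions, and in a real migration the keys would come from the vendor's match results rather than a simple field comparison.

```python
# Sketch of separating genuinely new records from duplicates, producing a
# load file plus a source-to-target cross-reference list. Record fields
# and the match_key callable are illustrative assumptions.
def split_for_load(source_records, target_records, match_key):
    """Return (records to load, cross-references for known duplicates)."""
    existing = {match_key(r): r["customer_id"] for r in target_records}
    to_load, cross_refs = [], []
    for rec in source_records:
        target_id = existing.get(match_key(rec))
        if target_id is None:
            to_load.append(rec)  # genuinely new -> load into target system
        else:
            # duplicate -> record a cross-reference instead of loading it
            cross_refs.append({"source_id": rec["customer_id"],
                               "target_id": target_id})
    return to_load, cross_refs

source = [{"customer_id": "S-1", "name": "Acme Corp"},
          {"customer_id": "S-2", "name": "Globex Inc"}]
target = [{"customer_id": "T-9", "name": "ACME CORP"}]

to_load, xrefs = split_for_load(source, target, lambda r: r["name"].lower())
print(len(to_load), xrefs)  # 1 new record; S-1 cross-referenced to T-9
```

The cross-reference list is what gets loaded into the MDM hub or warehouse, so every source identifier remains traceable even though the duplicate record itself never enters the target system.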
A major benefit of this approach is that the new records are genuinely new and have validated addresses for deliverability. This significantly enhances corporate data quality. Then, IT can say “We received 50,000 customer records; we uploaded only 40,000 records, the other 10,000 were duplicates – job well done!”