Hidden Costs of Duplicate Customer Data

A client asked me last week about what rate of duplicate data was “normal” in customer master data.

My initial answer was that, among companies that don’t have any formal master data management, data governance or data quality initiatives in place, duplication rates of 10%-30% (or more) are not uncommon.

When I was at D&B, we used to routinely see that level of duplication in clients’ customer files.

In a study in the healthcare field, Children’s Medical Center Dallas engaged an outside firm to help clean up their duplicate data:

“Solving both the current and future problems around duplicate records helped Children’s improve the quality of patient care and increase physician acceptance of the new EHR. The duplicate record rate was initially reduced from 22.0% to 0.2% and five years later it remains an exceptionally low 0.14%. The 5 FTEs initially tasked with resolving duplicate records have been reduced to less than 1 FTE.”

“For the Children’s Medical Center, the results were heartening, not only from a care delivery standpoint but also because of the significant cost-savings that can be realized. A study conducted on Children’s data showed that on average, a duplicate medical record costs the organization more than $96.”

So it is possible to get the duplication rate down to really low levels through careful analysis and the application of the right tools, as part of an ongoing data governance program. Even the hospital above (and hospitals are usually not mentioned as practitioners of best practices) was able to maintain a duplication rate of only 0.14% after 5 years.
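
To make the idea of a "duplication rate" concrete, here is a minimal sketch of how one might measure it on a customer file. It is not the method used in any of the studies quoted here. The field names (`name`, `postal_code`), the sample records and the crude match key are all assumptions made up for illustration; real matching tools use far more sophisticated fuzzy logic.

```python
import re
from collections import defaultdict

# Hypothetical list of legal suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"inc", "corp", "co", "ltd", "llc"}

def match_key(record):
    """Build a crude match key from name and postal code (illustrative only)."""
    name = re.sub(r"[^a-z0-9 ]", "", record["name"].lower())
    # Drop legal suffixes so "Acme Inc." and "ACME" collapse to the same key.
    words = [w for w in name.split() if w not in LEGAL_SUFFIXES]
    postal = record["postal_code"].replace(" ", "").upper()
    return (" ".join(words), postal)

def duplication_rate(records):
    """Share of records that are surplus copies of another record."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)
    # In each group, one record is the "survivor"; the rest are duplicates.
    surplus = sum(len(group) - 1 for group in groups.values())
    return surplus / len(records) if records else 0.0

# Made-up sample data: three spellings of one customer plus two others.
customers = [
    {"name": "Acme Inc.", "postal_code": "02110"},
    {"name": "ACME", "postal_code": "02110"},
    {"name": "Acme, Inc", "postal_code": "02110"},
    {"name": "Globex Corp", "postal_code": "60601"},
    {"name": "Initech LLC", "postal_code": "73301"},
]

rate = duplication_rate(customers)
print(f"{rate:.0%}")  # prints 40%
```

Two of the five records are surplus copies, so the rate is 40%. An exact-key approach like this will usually understate the true rate, since it misses misspellings and address variations.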

And there are very real costs to not de-duplicating your customer data.  Depending on the functional area (marketing, sales, finance, customer service, etc.) and the business activities you undertake, high levels of duplicate customer data can:

  • annoy customers or undermine their confidence in your company,
  • increase mailing costs,
  • cause hundreds of hours of manual reconciliation of data,
  • increase resistance to implementation of new systems,
  • result in multiple sales people, sales teams or collectors calling on the same customer,
  • etc.

The best studies I’ve seen of the cost of duplicate data have been in the healthcare industry. One study I saw said:

“According to Just Associates, the direct cost of leaving duplicates in a Master Patient Index database is anywhere from $20 per duplicate to several hundred dollars. The lower cost reflects the organization’s labor and supply costs to identify and fix the record while the higher expense reflects the costs of repeated diagnostic tests done on a patient whose previous medical records could not be located.

The American Health Information Management Association (AHIMA) estimates that it costs between $10 and $20 per pair of duplicates to reconcile the records. If the records aren’t reconciled, however, the costs are even higher.”
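
As a rough back-of-the-envelope sketch, the per-duplicate figures quoted in this post can be turned into an estimate for a whole file. The 100,000-record file size is purely hypothetical, and the function name is my own. The $20 figure is the low end cited by Just Associates and $96 is the per-record cost from the Children's study; both come from healthcare and may not carry over to other industries.

```python
def estimate_duplicate_cost(total_records, duplication_rate, cost_per_duplicate):
    """Estimate the cost of duplicates: record count x rate x cost per duplicate."""
    duplicates = round(total_records * duplication_rate)
    return duplicates * cost_per_duplicate

TOTAL_RECORDS = 100_000  # hypothetical file size, not from any study

# Rates of 10% and 30% bracket the range discussed in this post.
for rate in (0.10, 0.30):
    for cost in (20, 96):
        estimated = estimate_duplicate_cost(TOTAL_RECORDS, rate, cost)
        print(f"{rate:.0%} duplicates at ${cost} each: ${estimated:,}")
```

Under these assumptions the estimate runs from $200,000 (10% at $20) to $2,880,000 (30% at $96), which shows how sensitive the result is to both inputs.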

Here are three more case studies backing up the range I quoted of 10%-30%:

  • Once the analysis was complete, Sentara discovered they had a significant duplication rate, over 18%. They had attempted to address the duplication rate in the past through a remediation process, but due to either technology issues or because the cost of merging and cleaning up the duplicates across their many different systems was too high, they had not yet successfully reduced their duplication rate. Source: Initiate Systems success story
  • Emerson Process Management faced a tremendous challenge four years ago in getting its CRM data in order: There were potentially 400 different master records for each customer, based on different locations or different functions associated with the client. “You have to begin to think about a customer as an organization you do business with that has a set of addresses tied to it,” says Nancy Rybeck, the data warehouse architect at Emerson who took charge of the cleanup. Working with Group 1, Rybeck analyzed the customer records for similarities and connections using everything from postal standards to D&B data, and managed to eliminate the 75 percent site-duplication rate the company suffered in its data. “That’s going to ripple through everything,” she says. Source: DestinationCRM.com
  • Problem: Number of duplicate records: 20.9% of Utah Statewide Immunization Information System records. Impact of Problem: Difficult to find patients in system—key barrier to provider participation, risk of over-immunization—unable to find reliable patient record, cost of unnecessary immunizations, risk of adverse effects on patients. Source: health.utah.gov.

And here’s a good quote from a white paper titled “Data Quality and the Bottom Line” by The Data Warehousing Institute:

“Peter Harvey, CEO of Intellidyn, a marketing analytics firm, says that when his firm audits recently ‘cleaned’ customer files from clients, it finds that 5 percent of the file contains duplicate records. The duplication rate for untouched customer files can be 20 percent or more.”

Every organization will need its own metrics, but left unchecked, the duplication problem is a hidden cost that drags at your company, slowing down your processes and making your analyses less reliable.

If your sales analysis reports can’t be sure that there’s one and only one record for each of your largest customers, then the sales figures for those customers are probably not right. So the entire report becomes suspect at that point.
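
Here is a small sketch of that effect, using made-up numbers. The record IDs, the `master_id` cross-reference and the sales amounts are all hypothetical; the point is only that totals grouped by raw customer record can rank customers differently than totals grouped by the real-world customer.

```python
from collections import defaultdict

def totals_by(sales, key):
    """Sum sales amounts grouped by the given field."""
    totals = defaultdict(int)
    for row in sales:
        totals[row[key]] += row["amount"]
    return dict(totals)

# Hypothetical data: C001 and C002 are duplicate records for one customer (M1).
sales = [
    {"record_id": "C001", "master_id": "M1", "amount": 400_000},
    {"record_id": "C002", "master_id": "M1", "amount": 350_000},
    {"record_id": "C003", "master_id": "M2", "amount": 600_000},
]

by_record = totals_by(sales, "record_id")  # what a report on raw records sees
by_master = totals_by(sales, "master_id")  # what it sees after de-duplication

top_by_record = max(by_record, key=by_record.get)
top_by_master = max(by_master, key=by_master.get)

print("Largest customer by raw record:", top_by_record, by_record[top_by_record])
print("Largest customer after matching:", top_by_master, by_master[top_by_master])
```

On the raw records, C003 looks like the largest customer at 600,000. Once the two duplicate records are rolled up, M1 is actually the largest at 750,000, so the ranking in the original report was wrong.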

I’d like to end with a great quote on data quality by Ken Orr from the Cutter Consortium in “The Good, The Bad, and The Data Quality”:

“Ultimately, poor data quality is like dirt on the windshield. You may be able to drive for a long time with slowly degrading vision, but at some point, you either have to stop and clear the windshield or risk everything.”

Please let us know what you think by commenting here.  We’re interested in hearing your thoughts on data quality and the issue of customer data duplication.

13 Comments on “Hidden Costs of Duplicate Customer Data”

  1. Thomas Dye, CCP 12/17/2009 at 11:30 am #

    A recent client had lots of duplicate entities in source data, but a lot of that was by design. Their source systems allowed multiple duplicate customer records tied to a single account. The ongoing challenge was to change their thinking from the customer/account model to the party organization and individual model for master data. Common discussion was on the ‘real’ percent of duplicates and whether or not duplicate customers tied to a single account counted as a duplicate or not. Our thoughts: for cleansing for MDM purposes, these would always be counted as dups.

  2. Kenneth Hansen 01/06/2010 at 3:16 pm #

    The percentage of customer data duplicated in source systems varies much more widely and largely reflects the method by which products are sold/provided/managed and how the organisation has grown. For example, insurance companies tended to create new silo systems for every new product and rarely migrated records from old systems to new when they acquired other companies, so it is not unusual to see records of the “same customer” many, many times. Another reason for “duplicates” is that business customers/suppliers merge or acquire each other.

    Success in de-duplication depends in large part on the extent to which customers are involved – that’s easier in healthcare, where both parties are anxious to ensure all the correct data is collated, than for vendors who do not want to lose face and prefer to err on the safe side. Consequently, most banks have matched large numbers of records of personal customers but still have a significant percentage that they dare not match because of the cost (financial and embarrassment) of an erroneous match. That applies particularly to banks that opened customer accounts in countries and times when prospective customers did not have to show an identity card.

    MDM does not reduce duplicated data as it often simply cross-references all the different records and that is valuable for reporting with aggregations per customer. The real cost of duplicate records is only reduced where processes on legacy systems are rewritten for shared use of customer data from a central server.

  3. Dan Power 01/06/2010 at 3:22 pm #

    Thanks for the thoughtful comment, Kenneth. You make several good points from a number of different industries. I think you’re right that the real cost of duplication is only addressed when business processes are re-engineered to use shared data on a centralized repository like an MDM hub.

    Thanks for dropping by the blog – I hope you’ll visit again soon, and if you’d ever like to contribute an article as a guest author, we’d be interested in that too.

  4. Phil Simon 01/10/2010 at 5:08 pm #

    Dan –

    You could write a book about this line:

    I think you’re right that the real cost of duplication is only addressed when business processes are re-engineered to use shared data on a centralized repository like an MDM hub.

    Great discussion all around.

    • Dan Power 01/10/2010 at 6:49 pm #

      Thanks for the kind words, Phil. This article sparked some great discussion – hope to see it continue!

  5. William Sharp 01/20/2010 at 10:14 am #

    Great information! More of these types of studies need to be conducted. In fact, you’ve inspired me to track the cost savings that my efforts deliver! On one project alone, I was able to deliver $450,000 in marketing cost reductions in the first year. When expanded to the lifecycle of marketing campaigns at this particular client, the total cost reduction was in the neighborhood of $1,350,000.
    Thanks for the data in this post. I’ve tried to blast it out there on LinkedIn and Twitter!

  6. Dan Power 01/20/2010 at 12:04 pm #

    Thanks for the kind words, William. And thanks for the mentions on LinkedIn and Twitter – every little bit helps!

    Perhaps you’d like to write a guest article on the Hub Designs Blog someday?

    Best regards — Dan

  7. Satesh 01/21/2010 at 12:56 pm #

    Excellent Post Dan!!! Written from a business perspective rather than DQ terms alone.

    Especially liked the bit “And there are very real costs to not de-duplicating your customer data” – the stats provided truly depict the financial/brand damage that could occur due to bad data. The post reminds me of an article by Henrik Liliendahl (http://bit.ly/4oD1wb) on how Copenhagen could be duplicated due to multi-lingual/human processing challenges, leading to multiple contact/customer records for a single customer.


  8. Secondary sales management 12/22/2010 at 5:26 am #

    Duplication can be checked with some good tools. It is OK if it is up to 15%, but you should check carefully that it does not exceed that. The whole post is so good, with a lot of information.


