Seven Steps to Data Quality, by Kuppusamy Bhaskar

A great article on data quality by our friend Kuppusamy Bhaskar.

Many organizations start a Data Quality Program solely as an IT initiative without business involvement, and then try to persuade business teams about the value of data quality. For business stakeholders to fully understand the importance and value of data quality, it’s essential to get them involved from Day One as champions of the effort.

Follow these seven simple, practical steps to jump-start your Data Quality Program and make it quick, effective and able to meet your business needs. These ideas come from my practical experience implementing such a program at a large financial services company.

Step 1: Understand the business need for data quality

Take a “top down” approach: start with the business and understand their current pain points and the value that improved data quality would bring them. They’re the key stakeholders, so it’s best that they drive the program and take part in the data quality process.

Case Study Lesson: we identified a small team of data stewards who engaged with business stakeholders to understand the issues they were facing due to poor data quality. Those discussions led to detailed requirements gathering sessions.

Step 2: Define the scope

Take a “start small, finish big” approach. Identify the first set of systems and attributes that have major data quality issues where fixing them will give the best value for the effort.

Case Study Lesson: we identified the source systems and data attributes that contributed to the key performance indicators (KPIs) of the business. We ranked the data elements as low, medium or high priority to decide which to include in the scope for measuring, monitoring and improving data quality.

Keep the scope narrow and yet effective so you can get back to the business with results sooner (days rather than weeks). Address disconnects or refinements early on.

Step 3: Collect detailed business requirements

This sounds obvious, yet many data quality programs don’t have requirements that are documented, reviewed, and approved by the business stakeholders.

Case Study Lesson: we prepared an inventory of about 100 critical business rules to identify exception conditions in business operations: 

  • An “A-Active” loan cannot have an unpaid principal balance (UPB) equal to zero. A violation implies that the loan status, the loan amount, or both are incorrect.
  • Loan to Value (LTV) ratio cannot exceed a predefined limit (loans with high LTV indicate higher risk).
  • A customer with occupation as “Lawyer” or “Professor” cannot qualify for a “Student” discount.
  • Identify any marketing spend on “Do Not Contact” customers or leads.
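
As a minimal sketch, rules like the four above could be written as small predicates that flag exception records. The field names, the sample records and the `MAX_LTV` value are assumptions made for illustration; they are not taken from the case study.

```python
from dataclasses import dataclass
from typing import Callable

MAX_LTV = 0.95  # assumed limit; the actual predefined limit is not stated


@dataclass(frozen=True)
class Rule:
    rule_id: str
    description: str
    is_exception: Callable[[dict], bool]  # True when a record violates the rule


RULES = [
    Rule("R1", "Active loan with zero UPB",
         lambda r: r["loan_status"] == "A-Active" and r["upb"] == 0),
    Rule("R2", "LTV exceeds the predefined limit",
         lambda r: r["loan_amount"] / r["property_value"] > MAX_LTV),
    Rule("R3", "Student discount given to a Lawyer or Professor",
         lambda r: r["occupation"] in {"Lawyer", "Professor"}
         and r["discount"] == "Student"),
    Rule("R4", "Marketing spend on a Do Not Contact customer",
         lambda r: r["do_not_contact"] and r["marketing_spend"] > 0),
]


def find_exceptions(records, rules=RULES):
    """Return (record id, rule id) pairs for every violated rule."""
    return [(r["id"], rule.rule_id)
            for r in records
            for rule in rules
            if rule.is_exception(r)]


# Hypothetical sample records.
records = [
    {"id": 1, "loan_status": "A-Active", "upb": 0,
     "loan_amount": 80_000, "property_value": 100_000,
     "occupation": "Engineer", "discount": None,
     "do_not_contact": False, "marketing_spend": 0},
    {"id": 2, "loan_status": "A-Active", "upb": 150_000,
     "loan_amount": 196_000, "property_value": 200_000,
     "occupation": "Lawyer", "discount": "Student",
     "do_not_contact": True, "marketing_spend": 25},
]

exceptions = find_exceptions(records)
for record_id, rule_id in exceptions:
    print(f"record {record_id} violates {rule_id}")
```

Keeping each rule as a named, described entry makes the inventory easy for business stakeholders to review and approve rule by rule.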

Step 4: Automate and profile business requirements using a standard tool

Many data quality tools perform out-of-the-box analysis on rows and columns that could mean little to the business if the data is not validated against THEIR business rules.

Case Study Lesson: we used a combination of home-grown and Commercial Off The Shelf (COTS) tools. The rules were stored in the data quality database and run periodically. The data stewards analyzed the results and presented the findings to technology and business stakeholders.
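
One way to picture rules that live in a data quality database is sketched below with Python's built-in `sqlite3` module. The table names (`loans`, `dq_rules`, `dq_results`), their columns and the run date are hypothetical; the actual tools and schema used in the case study are not described here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE loans (loan_id INTEGER PRIMARY KEY, status TEXT, upb REAL);
    INSERT INTO loans VALUES
        (1, 'A-Active', 0), (2, 'A-Active', 125000), (3, 'C-Closed', 0);

    -- Rules are stored as data, so stewards can add or refine them
    -- without changing application code.
    CREATE TABLE dq_rules (
        rule_id TEXT PRIMARY KEY, description TEXT,
        target_table TEXT, exception_predicate TEXT);

    CREATE TABLE dq_results (
        run_date TEXT, rule_id TEXT, rows_checked INTEGER, exceptions INTEGER);
""")
conn.execute(
    "INSERT INTO dq_rules VALUES (?, ?, ?, ?)",
    ("R1", "Active loan with zero UPB", "loans",
     "status = 'A-Active' AND upb = 0"),
)


def run_rules(conn, run_date):
    """Evaluate every stored rule and record its exception count."""
    rules = conn.execute(
        "SELECT rule_id, target_table, exception_predicate FROM dq_rules"
    ).fetchall()
    for rule_id, table, predicate in rules:
        # Rule text is interpolated into SQL, so rules must come from a
        # trusted, reviewed source (here, the approved rule inventory).
        total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        bad = conn.execute(
            f"SELECT COUNT(*) FROM {table} WHERE {predicate}"
        ).fetchone()[0]
        conn.execute("INSERT INTO dq_results VALUES (?, ?, ?, ?)",
                     (run_date, rule_id, total, bad))
    conn.commit()


run_rules(conn, "2024-01-31")  # in practice this would run on a schedule
results = conn.execute("SELECT * FROM dq_results").fetchall()
print(results)
```

Storing each run's counts alongside the run date gives the data stewards a history to analyze and present.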

Step 5: Perform root cause analysis on the identified issues

Review findings with both technology and business stakeholders. The findings typically include the need for business process change, application change, user training, and data quality requirements change.

Case Study Lesson: our data stewards, in combination with technology and business stakeholders, performed the root cause analysis of issues.

The findings included:

  • Technology issues: an application allowed null values in a certain field
  • Process issues: compliance documents had not been submitted for several years, so we added a process to check for compliance
  • Training issues: users did not know where or how to check the document repository
  • Refinement of quality rules: results initially flagged as issues were found to be valid under certain conditions
  • Or a combination of the above

Step 6: Remediate the root cause of issues

Case Study Lesson: remediation involved technology “fixes”, which were typically scheduled for the next available release. Process changes were communicated to the appropriate business stakeholders for remedy and follow-up. In one example, a very important field storing percentages had different numbers of decimal places in different systems (e.g. 5.5, 5.555, 5.55, or 5.55555). When these values were used in calculations over huge loan amounts, the differences were material.
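
To illustrate why the precision mismatch matters, here is a small sketch that applies the four example percentages to a single balance. The balance and the simple "percentage of principal" calculation are assumptions chosen for illustration, not figures from the case study.

```python
from decimal import Decimal

principal = Decimal("250000000")  # assumed balance, for illustration only

# The example percentages as stored by different systems.
stored_rates = [Decimal("5.5"), Decimal("5.555"),
                Decimal("5.55"), Decimal("5.55555")]

# Apply each stored percentage to the same principal.
amounts = {str(rate): principal * rate / Decimal("100")
           for rate in stored_rates}

for rate, amount in amounts.items():
    print(f"{rate:>8}% -> {amount:,.2f}")

# Gap between the largest and smallest result.
spread = max(amounts.values()) - min(amounts.values())
print(f"spread: {spread:,.2f}")
```

Under these assumptions the gap between the lowest and highest stored precision is 138,875 on one calculation, which is why standardizing the precision at the source is worth a scheduled fix.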

Step 7: Measure and monitor data quality metrics on an ongoing basis

Case Study Lesson: we performed Steps 1 through 6 on a monthly basis. Data quality metrics were reported in a simple spreadsheet, with charts and graphs showing trends over time.

As a result, business stakeholders could easily see how their data was getting better over time. If the program is done right, you should start seeing data quality improvements within the first 90 days.

Additionally, I recommend baselining your data set before the program starts, so you can compare against it and demonstrate the value once the program is in place. Communicate your “aha” moments from data profiling and analysis to business stakeholders. Demonstrate the added value, then move on to the next increment of the data set.
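
A baseline comparison can be as simple as the sketch below, which tracks the share of rows passing all rules each month against the pre-program baseline. The row and exception counts are invented for illustration.

```python
# Hypothetical monthly results: (label, rows checked, exceptions found).
monthly_runs = [
    ("baseline", 10_000, 800),
    ("month 1", 10_000, 600),
    ("month 2", 10_500, 420),
    ("month 3", 11_000, 220),
]


def pass_rate(rows_checked, exceptions):
    """Percentage of checked rows that raised no exception."""
    return 100.0 * (rows_checked - exceptions) / rows_checked


baseline_rate = pass_rate(*monthly_runs[0][1:])

# (label, pass rate, change versus baseline in percentage points)
trend = [
    (label,
     round(pass_rate(rows, exc), 1),
     round(pass_rate(rows, exc) - baseline_rate, 1))
    for label, rows, exc in monthly_runs
]

for label, rate, delta in trend:
    print(f"{label:<9} {rate:5.1f}%  ({delta:+.1f} pts vs baseline)")
```

Reporting a rate rather than a raw exception count keeps months comparable even as the volume of data grows.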

Can you share your experiences in improving data quality? How effective have they been? What would you add to the above list to make it simpler and more effective?

Kuppusamy Bhaskar,

3 Comments on “Seven Steps to Data Quality, by Kuppusamy Bhaskar”

  1. Dhaval Brahmbhatt 03/19/2012 at 11:42 am

    Great article! Congrats to both Bhaskar and Dan for having published this. This is so true in our engagement – we started off with an executive asking for data metrics and started collecting them without realizing whether the metrics that we were collecting made any sense. The approach, as prescribed in here, should have been to understand pain points and measure those pain points via metrics on the data set that we have.

    Kudos on presenting this!

    Dhaval Brahmbhatt
    Director, CDM, Dell Inc.

  2. Dan Power 03/19/2012 at 12:19 pm

    Glad you liked the article, Dhaval. I think Bhaskar hits on some great points and we’re looking forward to some other new articles that he’s working on.

  3. datadrivenservices 03/19/2012 at 12:31 pm

    Thank you for your comment Dhaval!

    In my experience top down approach is more effective as business stakeholders get involved early and the data quality process can evolve sooner without going too far down the wrong direction.

    Thanks again!