
January 27, 2008


Critical Data Quality Questions

by Gaurav Arora


In past issues of this blog, we’ve discussed aspects of the People, Process and Technology issues related to a Master Data Management program. In this article, we’ll touch on an equally important aspect – the underlying information.

Having decided on the domain of the data (customer, product, employee or something else), it’s imperative for the company to assign one or more project resources (preferably from the business, not IT) to gather an in depth understanding of that domain’s data. This can be done well before the software evaluation stage of the project cycle.

Some of the questions to be answered by those resources would be:

(1) As part of the MDM initiative, around which data elements will the organization build its data quality program?

For example, a customer record can be composed of: customer legal name, common or “street” name, brand name, detailed physical address (including county, landmark, driving directions, ZIP/Postal Code), e-mail address, phone number(s), parent company relationships, and additional attributes like customer type, memberships, etc. It’s important to scope these data elements up front, and to communicate to the stakeholders that, as part of MDM, data quality will be injected into these selected data elements on an organization-wide basis. A data profiling tool can help enormously in identifying the data elements that need better data quality.

Only the identified and scoped data elements need to be sent to the MDM Hub; for example, only customer name, address lines, city, country, and ZIP from the various source systems may need to be sent to a hybrid-style hub.
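To make the profiling idea concrete, here is a minimal sketch (in Python, with entirely illustrative records and field names) of the kind of checks a profiling tool runs: fill rate (how often a field is populated) and pattern conformance (how often a populated value matches an expected format). A real profiling tool does far more, of course; this only shows how such metrics surface the elements that need attention.

```python
import re

# Hypothetical sample of customer records pulled from a source system.
records = [
    {"name": "Acme Corp",        "zip": "94105",      "email": "ap@acme.com"},
    {"name": "ACME CORPORATION", "zip": "94105-1234", "email": None},
    {"name": None,               "zip": "9410",       "email": "billing@acme"},
]

# Expected formats for selected elements (US ZIP and a loose e-mail check).
ZIP_US = re.compile(r"^\d{5}(-\d{4})?$")
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def profile(records, field, pattern=None):
    """Return fill rate and, optionally, pattern-conformance rate for a field."""
    values = [r.get(field) for r in records]
    filled = [v for v in values if v]
    stats = {"fill_rate": len(filled) / len(values)}
    if pattern:
        stats["conforms"] = sum(bool(pattern.match(v)) for v in filled) / len(filled)
    return stats

for field, pat in [("name", None), ("zip", ZIP_US), ("email", EMAIL)]:
    print(field, profile(records, field, pat))
```

Running this over real extracts quickly shows, element by element, where the quality program should focus — e.g. a ZIP column that is always filled but frequently malformed tells a different remediation story than a name column that is simply empty.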

(2) In how many source systems does that domain’s data reside?

For example, a customer address can reside in the ERP or CRM systems, or a self-service kiosk, a sales force automation system, a partner network or even an external data provider like D&B. One needs to identify these sources and the structure of the data in these systems. Later, a mapping exercise will be done to map the data elements in the various systems into the Hub.
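The mapping exercise mentioned above can be captured in a simple table that translates each source system’s field names into the hub’s canonical names. A rough sketch, with system and field names that are purely illustrative:

```python
# Hypothetical source-to-hub mapping table; the system names (ERP, CRM)
# and field names are illustrative, not from any particular product.
MAPPINGS = {
    "ERP": {"CUST_NAME": "customer_name", "ADDR1": "address_line_1", "PSTL_CD": "zip"},
    "CRM": {"account_name": "customer_name", "street": "address_line_1", "postal": "zip"},
}

def to_hub(system, record):
    """Translate a source-system record into the hub's canonical field names."""
    mapping = MAPPINGS[system]
    return {mapping[k]: v for k, v in record.items() if k in mapping}

print(to_hub("CRM", {"account_name": "Acme Corp", "postal": "94105"}))
# → {'customer_name': 'Acme Corp', 'zip': '94105'}
```

In practice this mapping table belongs in the metadata repository, where it doubles as documentation of which source fields feed which hub attributes.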

(3) Who are the people that create that data? Who are the people that consume the data? Who are the people that are impacted by the data?

It’s very common that the customer legal name and address are created by the ERP system’s order management end user, with a later batch-feed to the CRM. The Customer Support Representative may then modify those data elements. Even later, the customer themselves may log into a self-service portal and modify the same data elements. The CRM may batch-feed the data to the Contracts or Installed Base system. It’s critical that these data flows, and the impact of changes to them, be understood, documented and stored in a metadata repository.
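One way to document these flows in a metadata repository is as simple create/update entries per data element, which can then be queried for impact analysis. A sketch, assuming illustrative system names, actors, and flow order:

```python
# Sketch of data-flow entries as they might be recorded in a metadata
# repository; systems, actors, and elements are illustrative assumptions.
flows = [
    {"element": "legal_name", "system": "ERP",    "actor": "order mgmt user", "op": "create"},
    {"element": "legal_name", "system": "CRM",    "actor": "batch feed",      "op": "update"},
    {"element": "legal_name", "system": "CRM",    "actor": "support rep",     "op": "update"},
    {"element": "legal_name", "system": "Portal", "actor": "customer",        "op": "update"},
]

def impacted_systems(element):
    """List the systems that touch an element, in documented flow order."""
    seen = []
    for f in flows:
        if f["element"] == element and f["system"] not in seen:
            seen.append(f["system"])
    return seen

print(impacted_systems("legal_name"))  # → ['ERP', 'CRM', 'Portal']
```

Even a flat table like this answers the three questions above — who creates, who consumes, who is impacted — without waiting for a full lineage tool.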

(4) Which system can be deemed as the authoritative source for each data element?

Typically, the system that creates the record is treated as the authoritative source, as opposed to another system which consumes it. However, certain elements of the record are added or modified later by downstream systems. It’s important to build business rules for each data element to identify which system can be deemed to own that data element.

A typical hub provides tools to cull from various source systems and build the single source of truth from multiple sources. But to implement that, business rules need to be written, communicated, signed off and stored in the metadata repository.
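Those survivorship rules often boil down to a per-element precedence of sources: for each data element, an ordered list of systems, with the first system that supplies a non-empty value winning. A minimal sketch, where the precedence lists and system names are illustrative assumptions rather than recommendations:

```python
# Hedged sketch of survivorship by per-element source precedence.
# Precedence order and system names are illustrative assumptions.
PRECEDENCE = {
    "legal_name": ["ERP", "CRM", "Portal"],  # system of record wins
    "email":      ["Portal", "CRM", "ERP"],  # self-service entry wins
}

def survive(element, candidates):
    """Pick the surviving value for one element from {system: value} candidates."""
    for system in PRECEDENCE[element]:
        value = candidates.get(system)
        if value:
            return system, value
    return None, None

candidates = {"ERP": "Acme Corporation", "CRM": "Acme Corp", "Portal": None}
print(survive("legal_name", candidates))  # → ('ERP', 'Acme Corporation')
```

Real hubs layer more on top (recency, trust scores, manual stewardship), but even this simple form forces the business to state, sign off, and record which system owns each element — which is exactly the exercise described above.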

These are hard questions, and estimating the resources and effort required to answer them can be a make-or-break issue for an MDM project. If an organization starts answering these questions and documenting the results in the metadata repository early in the MDM initiative, the team has a great head start on the design stage.

Your comments (as always) are welcome.
