Most data quality initiatives consist of processes, procedures and technology, with the goal of achieving a desired level of quality in corporate data.
While companies sometimes did data quality projects prior to the emergence of MDM, I cannot stress enough how important it is to take a fresh look at data quality when embarking on an MDM initiative.
Companies all too often focus exclusively on the MDM hub, either hoping that the hub will fix their data quality problems or failing to recognize that MDM's data quality requirements differ fundamentally from those of previous information management approaches.
We consider data to be of high quality if it is accurate (correct), complete (no missing data), valid (conforms to formats and rules), available (as needed) and stored securely.
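As a rough illustration, the completeness and validity dimensions in particular lend themselves to simple automated checks. The sketch below uses entirely hypothetical field names and rules to show what such checks might look like for a single customer record:

```python
import re

# Hypothetical customer record; field names and formats are illustrative only.
record = {"customer_id": "C-1042", "email": "jane.doe@example.com", "postal_code": "30309"}

REQUIRED_FIELDS = ["customer_id", "email", "postal_code"]

def is_complete(rec):
    """Completeness: every required field is present and non-empty."""
    return all(rec.get(f) not in (None, "") for f in REQUIRED_FIELDS)

def is_valid(rec):
    """Validity: values conform to expected formats (sample rules only)."""
    return (re.fullmatch(r"C-\d+", rec["customer_id"]) is not None
            and re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", rec["email"]) is not None
            and re.fullmatch(r"\d{5}", rec["postal_code"]) is not None)

print(is_complete(record), is_valid(record))  # True True
```

Accuracy, by contrast, usually cannot be checked in isolation; it requires comparison against a trusted reference source.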
The need for high-quality data was first fueled by projects such as CRM, ERP and data warehousing beginning more than a decade ago. However, many of those projects produced silos of corporate data, along with a silo approach to data quality.
For example, a data warehousing project cleansed the data loaded into the warehouse through ETL (Extract-Transform-Load) processes, with further cleansing after the data was loaded, while leaving the source systems untouched. Other examples include solutions for compliance and business intelligence that can query data from numerous sources and, once again, treat data quality at the target only, not at the sources.
In contrast, an MDM initiative requires data quality to be addressed at both the source systems and the hub, which is a key difference between previous approaches and MDM.
Major source systems (and there can be as many as 30-50 at large companies) typically collect and maintain their data locally and do not share the data. An MDM hub, whether implemented in the transactional, registry or hybrid style, becomes the system of record for enterprise data and facilitates the sharing of that data throughout the enterprise as needed.
Therefore, for MDM to share and synchronize data properly between the sources and the hub, it is critical that the data remain consistent between them, and hence data quality must be addressed at the source as well.
It’s simply not enough to cleanse the data only at the hub if the source system does not recognize the data when it reads it back from the hub.
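To make the round-trip problem concrete, here is a hypothetical sketch: the hub's cleansing rule expands a street abbreviation, and a naive exact-match comparison back at the source then fails to recognize the record as its own.

```python
# Hypothetical illustration: the hub standardizes an address, but the
# source system still stores (and matches on) the raw form.
source_address = "123 Main St."

def hub_standardize(addr):
    """Sample hub cleansing rule: expand the 'St.' abbreviation."""
    return addr.replace("St.", "Street")

hub_address = hub_standardize(source_address)

# When the source reads the record back, an exact-match lookup fails:
print(hub_address == source_address)  # False
```

Avoiding this mismatch is exactly why the same cleansing and standardization rules need to reach the source systems, not just the hub.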
So where does one begin? We recommend that, prior to implementing an MDM hub, you begin with data profiling of all the source systems involved.
Profile the data that will feed the hub, analyze the results and carefully catalog the metadata for the key data elements. Once the characteristics and quality of the data are understood, the next step is to create business rules to correct data quality issues at each source.
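As a minimal sketch of what profiling might surface, assuming a toy extract and illustrative column names, the function below reports a column's fill rate, distinct-value count and value patterns; real profiling tools compute these same statistics against the actual tables feeding the hub:

```python
from collections import Counter
import re

# Toy rows standing in for one source system's extract (illustrative only).
rows = [
    {"name": "Acme Corp",  "phone": "404-555-0101"},
    {"name": "Acme Corp.", "phone": "4045550102"},
    {"name": None,         "phone": "404-555-0103"},
]

def profile(rows, column):
    """Summarize a column: fill rate, distinct count and value patterns."""
    values = [r[column] for r in rows if r[column] is not None]
    # Reduce each value to a shape: digits -> 9, letters -> A, rest kept.
    pattern = lambda v: re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", v))
    return {
        "fill_rate": len(values) / len(rows),
        "distinct": len(set(values)),
        "patterns": Counter(pattern(v) for v in values),
    }

print(profile(rows, "phone"))
# {'fill_rate': 1.0, 'distinct': 3,
#  'patterns': Counter({'999-999-9999': 2, '9999999999': 1})}
```

Even this toy output suggests business rules: the phone column mixes two formats, and the name column has missing values and near-duplicates, all of which should be corrected at the source before the data feeds the hub.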
There are many data quality products on the market and choosing one will have to be the subject of another blog.