As part of a Master Data Management solution, one of the key decisions to be made is the choice of the MDM Hub architecture.
Gartner and others describe two or three different Hub styles. In this article, we paraphrase those styles and explain them based on our own experience.
Registry Style: In a Registry Style hub, the various source systems publish their customer data and the subscribing hub stores only the source system IDs, the Foreign Keys (record IDs) and the key data values (needed for matching). The hub runs the cleansing & matching algorithms and assigns unique global identifiers to the matched records, but it does not send any data back to the source systems.
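To make the registry idea concrete, here is a minimal Python sketch of a hub that keeps only cross-references and match keys. The class name `RegistryHub`, the `publish` method and the name-plus-postcode match key are illustrative assumptions, not any vendor's API, and the exact-key lookup stands in for a real match engine.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class RegistryHub:
    """Toy registry: stores cross-references and match keys, never full records."""

    # Match key -> global ID (hypothetical exact-match index).
    _by_match_key: dict = field(default_factory=dict)
    # Global ID -> list of (source system ID, record ID) pairs.
    xref: dict = field(default_factory=dict)

    @staticmethod
    def match_key(name: str, postcode: str) -> tuple:
        # Assumed cleansing rule: normalise case and whitespace before matching.
        clean_name = " ".join(name.lower().split())
        clean_postcode = postcode.replace(" ", "").upper()
        return (clean_name, clean_postcode)

    def publish(self, source_system: str, record_id: str, name: str, postcode: str) -> str:
        """Register a source record and return its global identifier."""
        key = self.match_key(name, postcode)
        global_id = self._by_match_key.get(key)
        if global_id is None:
            # No match found: assign a new global identifier.
            global_id = str(uuid.uuid4())
            self._by_match_key[key] = global_id
            self.xref[global_id] = []
        self.xref[global_id].append((source_system, record_id))
        return global_id


hub = RegistryHub()
gid_crm = hub.publish("crm", "C-1001", "Jane  Doe", "ab1 2cd")
gid_billing = hub.publish("billing", "B-77", "jane doe", "AB12CD")
gid_other = hub.publish("crm", "C-1002", "John Roe", "ZZ9 9ZZ")
```

The two "Jane Doe" records end up under one global ID, while nothing is written back to the CRM or billing systems.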
The Registry Style hub has an attribute locator service that serves as a reference for finding the Single Source of Truth value for a particular data element of a record. And if the hub vendor provides a metadata service, one can key in the business rules in the registry hub as well.
When a request to get a composite view of a customer is received by the hub, it has to build the view using the cross-reference data and the global customer ID. It does this by invoking each of the reference systems and fetching the authoritative values of each data element that makes up the data record. It then builds the 360° view in real-time.
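The composite-view lookup described above can be sketched roughly as follows. The `SOURCES`, `XREF` and `ATTRIBUTE_LOCATOR` tables and the `fetch` function are hypothetical stand-ins. In a real deployment each `fetch` would be a remote call to a source system, which is where the latency comes from.

```python
# Hypothetical source systems, modelled as in-memory dicts keyed by local record ID.
SOURCES = {
    "crm": {"C-1001": {"name": "Jane Doe", "email": "jane@old.example"}},
    "billing": {"B-77": {"email": "jane@example.com", "address": "1 High St"}},
}

# Cross-reference data: global ID -> local record ID in each source system.
XREF = {"G-1": {"crm": "C-1001", "billing": "B-77"}}

# Attribute locator: which source system is authoritative for each data element.
ATTRIBUTE_LOCATOR = {"name": "crm", "email": "billing", "address": "billing"}


def fetch(source_system, record_id):
    """Stand-in for a remote call to one source system."""
    return SOURCES[source_system][record_id]


def composite_view(global_id):
    """Assemble a composite record from the authoritative source of each attribute."""
    local_ids = XREF[global_id]
    fetched = {}  # one fetch per source system per request
    view = {"global_id": global_id}
    for attribute, system in ATTRIBUTE_LOCATOR.items():
        if system not in local_ids:
            continue  # this customer has no record in the authoritative system
        if system not in fetched:
            fetched[system] = fetch(system, local_ids[system])
        if attribute in fetched[system]:
            view[attribute] = fetched[system][attribute]
    return view


view = composite_view("G-1")
```

Note that the email comes from billing rather than CRM, because the locator names billing as the authoritative source for that element.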
A Registry Style hub is preferred when you have a large number of source systems spread across the globe and each data source is the “system of record” for a particular record type. In such cases, it’s difficult to find an authoritative source of a data element outside of the source system, and each geography may have its own political turf and data peculiarities.
The Registry Style architecture avoids the political issues about over-writing data in the source systems. However, performance can be an issue in this architecture, since calls are made to each source system and latency factors can quickly come into play.
Other benefits include low-cost data integration within a short time frame, and very little data reconciliation is required between the source systems and the Hub.
The strength of the match engine is critical, since false positives and false negatives in matching can both be a serious problem. Because matching is a critical topic and could take quite a bit of space to discuss, we’ll cover it in more depth in a future blog article.
Transaction Style: In this architecture, the Hub stores, enhances and maintains all the relevant data attributes. It becomes the authoritative source of truth and it publishes this information back to the respective source systems.
The Hub publishes and writes back the various data elements to the source systems after the linking, cleansing, matching and enriching algorithms have done their work. The Hub needs to support merging as well as splitting of master records. This style of hub is appropriate when the organization has decided not to invest additional resources to maintain data in the source systems or when the data quality in the source systems is poor or information is not current.
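As a rough illustration of merging, splitting and write-back, consider the Python sketch below. `MasterRecord`, `TransactionHub`, the `outbox` queue and the "survivor wins, gaps filled from the duplicate" rule are all assumptions made for the example. Real hubs apply configurable survivorship rules and would re-derive attributes after a split.

```python
from dataclasses import dataclass, field


@dataclass
class MasterRecord:
    master_id: str
    attributes: dict
    # (source system, local record ID) pairs linked to this master record.
    links: set = field(default_factory=set)


class TransactionHub:
    def __init__(self):
        self.masters = {}
        self.outbox = []  # updates queued for write-back to source systems

    def add(self, record):
        self.masters[record.master_id] = record

    def _publish(self, record):
        # Queue the master values for every linked source record.
        for system, local_id in sorted(record.links):
            self.outbox.append((system, local_id, dict(record.attributes)))

    def merge(self, survivor_id, duplicate_id):
        survivor = self.masters[survivor_id]
        duplicate = self.masters.pop(duplicate_id)
        # Assumed survivorship rule: keep survivor values, fill gaps from the duplicate.
        for key, value in duplicate.attributes.items():
            survivor.attributes.setdefault(key, value)
        survivor.links |= duplicate.links
        self._publish(survivor)
        return survivor

    def split(self, master_id, link, new_master_id, attributes):
        # Detach one wrongly linked source record into its own master record.
        self.masters[master_id].links.remove(link)
        new_record = MasterRecord(new_master_id, dict(attributes), {link})
        self.masters[new_master_id] = new_record
        self._publish(new_record)
        return new_record


hub = TransactionHub()
hub.add(MasterRecord("M-1", {"name": "Jane Doe"}, {("crm", "C-1001")}))
hub.add(MasterRecord("M-2", {"name": "J. Doe", "email": "jane@example.com"},
                     {("billing", "B-77")}))
merged = hub.merge("M-1", "M-2")
```

After the merge, both the CRM and billing records are queued to receive the same master values. A later `split` call would undo the link if the match turned out to be a false positive.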
The Transaction Style hub typically feeds data warehouses and data marts as well.
Technically, this architecture requires more complex ETL rules and more intensive data reconciliation and synchronization, and thus takes longer to implement. However, the Hub serves as the authoritative source of truth, and data quality is monitored centrally in terms of accuracy, timeliness, standardization and integrity.
Security and visibility policies at the data attribute level need to be supported by the Transaction Style hub, as well.
Hybrid Style: This style resembles the Registry Style but stores more data elements / attributes than just those required for matching. Sometimes, to support a larger set of data elements, ETL tools may need to be integrated with the Hybrid Hub solution. Because frequently requested attributes are held in the hub itself, the Hybrid Style addresses the performance issues that Registry Style hubs often face.
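A minimal sketch of that trade-off, assuming a hypothetical `HybridHub` class that keeps a chosen subset of attributes locally and calls the source system only for the rest:

```python
class HybridHub:
    def __init__(self, stored_attributes, fetch_from_source):
        self.stored_attributes = set(stored_attributes)
        self.fetch_from_source = fetch_from_source  # callable(global_id, attribute)
        self.store = {}  # global ID -> attributes kept in the hub
        self.source_calls = 0  # counts round trips to source systems

    def load(self, global_id, record):
        # Keep only the attributes the hub is configured to store.
        self.store[global_id] = {
            key: value for key, value in record.items()
            if key in self.stored_attributes
        }

    def get(self, global_id, attribute):
        local = self.store.get(global_id, {})
        if attribute in local:
            return local[attribute]  # served from the hub, no remote call
        self.source_calls += 1
        return self.fetch_from_source(global_id, attribute)


# Hypothetical source system holding the full record.
SOURCE = {"G-1": {"name": "Jane Doe", "email": "jane@example.com", "credit_limit": 5000}}


def fetch_from_source(global_id, attribute):
    return SOURCE[global_id][attribute]


hub = HybridHub({"name", "email"}, fetch_from_source)
hub.load("G-1", SOURCE["G-1"])
name = hub.get("G-1", "name")           # answered by the hub
limit = hub.get("G-1", "credit_limit")  # falls through to the source system
```

Only the `credit_limit` lookup reaches the source system. The more attributes the hub stores, the fewer remote calls are needed, at the cost of more data to load and keep current.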
Collaboration tools such as workflows or Business Process Execution Language (BPEL) processes can be used to review and accept the Hub’s recommended changes into the source systems. This approach has proved useful at some companies.
Hybrid Style hubs can be implemented relatively quickly, because the data in the source systems usually still does not get changed, and the integration requirements are lighter than for the full Transaction Style.
Your comments, as always, are welcome.