“Where Does Data Come From, Mummy?” by John Owens
Another strong article by John Owens, a thought leader and consultant on data quality, master data management, as well as business process and data modeling.
Gooseberry Bushes And Storks
Could it be that the current sorry state of data management around the globe is due to data professionals having repressed parents?
Perhaps when many data professionals were young and inquisitive and asked their parents, “Mummy, Daddy, where did I come from?” the horrified look that they saw in their parents’ eyes was enough to put them off asking such questions ever again.
Some parents, recovering from their initial shock, may have tried to be helpful (if still repressed) and told them that babies just appeared under gooseberry bushes or were delivered by invisible storks.
Whatever the response, the effect on the fledgling data analysts was profound and lasting, as most have carried the trauma into their adult professional lives.
If you observe them at work, you can’t help asking yourself, “Do they ever really ask themselves where data and data structures actually come from?” Sadly, it appears that few ever do. To most of them, data is just data and data structures are just data structures. It does not really come from anywhere special – it just is.
When they do act on the instinct that it must come from somewhere, they look in all of the wrong places, such as data in existing systems or data remembered from a past assignment. Most of the time, though, they just plain guess.
I’ve even observed data analysts gathered around whiteboards and “brainstorming” the entities that they imagine are in a particular “data domain”, for example, the production data domain, the finance data domain, the logistics data domain, etc.
Having created a list of imagined entities, they then try to imagine what the attributes of those entities might be and what relationships (if any) the imagined entities might have with each other.
A Decree Is Made
Having created this imagined list, they then publish it as the “Corporate Data Model” and decree that these are the entities and attributes that must be created and kept updated throughout the enterprise.
They see this as “Data Governance” and “Data Standards”, the implementation of which will bring about “Data Quality”.
All departments and employees across the enterprise are mandated to adhere to the standards. This would be all very laudable, were it not entirely insane!
A Black Economy Is Born
These are the mandates that bring the practices of Data Governance, Data Standards and Data Quality into disrepute. Worse still, they quite often bring an enterprise to its knees.
The reason is that data definitions derived in this way have nothing to do with the needs of the enterprise. They are completely bogus, a fiction.
Because employees are mandated to update systems with this data, they have to do so. However, because they are also measured against Key Performance Indicators (KPIs), they are forced to set up and maintain completely different systems and sets of data in order to do their day-to-day work and meet those KPIs.
This is why a data black economy is born in so many enterprises. The data establishment in these enterprises berates it and tries to eradicate it, spouting terms like ‘governance’, ‘standards’, ‘best practice’, etc. The irony is that it’s the data black economy that keeps such enterprises alive. Were they to implement the completely misconceived data structures of the data establishment, the enterprise would, more often than not, collapse.
The Birds And The Bees
If you are a squeamish data professional, you’re going to want to run away now or maybe close your eyes, put your fingers in your ears and hum ‘nah, nah, nah, nah, nah’.
Alternatively, you could open your eyes to read something that could take your work and the fortunes of your enterprise to a whole new level.
The truth that you have been avoiding all these years is only shocking in that it is so powerfully simple – all data comes from business functions!
When I use the term business functions, I’m referring to the core activities of the enterprise, not to organisational units, which are often incorrectly referred to as ‘functions’.
All data in every enterprise is created, used and transformed by the business functions of that enterprise.
All of the elements of data—entities, attributes and relationships—are also defined by the business functions.
How Does That Work?
Functions That Create Data
Business functions that create data can tell you quite a lot about data entities and relationships; those that use and transform data can tell you even more.
For example, the business function called “Sell Product to Customer” tells you that the enterprise needs data entities called Sale, Product and Customer*.
It also tells us that there are relationships between these data entities.
The business function called “Accept Delivery of Product from Supplier” tells you that the enterprise needs data entities called Delivery, Product and Supplier* and that there are relationships between these data entities.
*Note: Customer and Supplier are not actually data entities, they are roles played by the data entity known as Party. The rationalisation of these roles is an essential part of master data management (MDM). Please see my article “How SCV Causes Fragmented MDM” to find out more about this.
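The idea in the note above, that Customer and Supplier are roles played by a single Party entity, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Party`, `Role` and `plays`, and the example party "Acme Ltd", are hypothetical and are not taken from the article or from any particular MDM tool.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Set


class Role(Enum):
    """Roles a party can play for the enterprise (hypothetical list)."""
    CUSTOMER = "customer"
    SUPPLIER = "supplier"


@dataclass
class Party:
    """A person or organisation the enterprise deals with (hypothetical model)."""
    name: str
    roles: Set[Role] = field(default_factory=set)

    def plays(self, role: Role) -> bool:
        """Return True if this party currently plays the given role."""
        return role in self.roles


# One Party record can play both roles, so the same organisation is
# held once rather than once as a customer and again as a supplier.
acme = Party(name="Acme Ltd")
acme.roles.add(Role.CUSTOMER)
acme.roles.add(Role.SUPPLIER)
```

Holding roles on the party, rather than creating separate Customer and Supplier records, is one simple way to avoid storing the same organisation twice.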
Functions That Use Data
Business functions that use and transform data will tell you a lot more about data and data structures.
For example, the business function “Analyse Sales Volume and Value by Product and Product Type for a Specific Calendar Period” tells us all of the following:
- Sales have a relationship to Products.
- Sales need to have attributes of Date, Value and Volume.
- Products need to have a means of being classified by Product Type. This might be an attribute of Product or a relationship to a reference or domain entity called Product Type.
However, this business function does not tell us that Sales have a relationship to Party, which the business function “Sell Product to Customer” did. This shows us that we need to look at all business functions in order to get the full range of data entities, their attributes and their relationships.
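The point that no single business function gives the whole picture can be made concrete with a small Python sketch. The data requirements recorded for each function below are an assumed reading of the two examples in the text, with Customer treated as a role of Party and Product Type treated as a separate reference entity; the structure `function_requirements` and the helper `merge_requirements` are hypothetical names introduced for illustration.

```python
from collections import defaultdict

# Assumed data requirements, one entry per business function.
function_requirements = {
    "Sell Product to Customer": {
        "entities": {"Sale", "Product", "Party"},
        "relationships": {("Sale", "Product"), ("Sale", "Party")},
        "attributes": {},
    },
    "Analyse Sales Volume and Value by Product and Product Type "
    "for a Specific Calendar Period": {
        "entities": {"Sale", "Product", "Product Type"},
        "relationships": {("Sale", "Product"), ("Product", "Product Type")},
        "attributes": {"Sale": {"Date", "Value", "Volume"}},
    },
}


def merge_requirements(requirements):
    """Union the entities, relationships and attributes of all functions."""
    entities, relationships = set(), set()
    attributes = defaultdict(set)
    for spec in requirements.values():
        entities |= spec["entities"]
        relationships |= spec["relationships"]
        for entity, attrs in spec["attributes"].items():
            attributes[entity] |= attrs
    return entities, relationships, dict(attributes)


entities, relationships, attributes = merge_requirements(function_requirements)
```

Only the merged result contains both the Sale-to-Party relationship (from the first function) and the Date, Value and Volume attributes of Sale (from the second).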
There is one element of data structure that is not given by business functions and that is the Unique Identifiers (UIDs) of data entities.
A very powerful way of deciding on the UID of any data entity is to ask the question, “With regard to this enterprise, what are the elements of this data entity that make one occurrence of it uniquely different from every other occurrence of it?”
There are two major things to know and apply regarding UIDs. The first is that it is the business and not IT that must define what they are. The second is that UIDs are never codes. I explain why this is so in my articles “The Five Pillars of Preventative Quality” and “Unique Keys are the Primary Cause of Duplication in Databases”.
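One possible reading of this advice can be sketched in Python: uniqueness is tested on attributes the business has named as identifying, not on a system-assigned code. Everything specific here is an assumption made for illustration, including the product records, the choice of manufacturer and model as the identifying attributes, and the helper `find_duplicates`.

```python
from collections import defaultdict


def find_duplicates(records, uid_fields):
    """Group records by business-defined identifying attributes.

    Returns the groups containing more than one record, i.e. occurrences
    that the business would regard as the same thing.
    """
    groups = defaultdict(list)
    for record in records:
        # Normalise case and surrounding whitespace before comparing.
        key = tuple(str(record[name]).strip().lower() for name in uid_fields)
        groups[key].append(record)
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical product records. Every code is unique, yet two records
# describe the same product.
products = [
    {"code": "P001", "manufacturer": "Acme", "model": "Widget 5"},
    {"code": "P047", "manufacturer": "acme", "model": "Widget 5"},
    {"code": "P002", "manufacturer": "Acme", "model": "Widget 6"},
]

duplicates = find_duplicates(products, uid_fields=("manufacturer", "model"))
```

A uniqueness check on `code` alone would pass all three records, which is how a code used as the identifier can let duplicates in unnoticed.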
Remember, every business function creates, uses or transforms data. If it does not, then it is not a business function.
A simple, yet very powerful, tool to help you identify what data elements are needed to support the execution of the business functions across the enterprise is called the CRUD Matrix.
This is a matrix with business functions on one axis and data entities on the other axis. The intersection displays C for Create, R for Read, U for Update and D for Delete.
A matrix can also be drawn for the attributes of Data Entities as shown below.
These matrices are a very powerful means of checking for and removing anomalies and duplication in the Function / Data Matrix.
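A CRUD Matrix at the entity level can be sketched in Python as a simple nested dictionary. The cell values below are assumptions based on the example functions in this article, and the two checks ("never created" and "never read") are just two illustrative anomalies; the names `crud_matrix`, `usage_by_entity` and `find_anomalies` are hypothetical.

```python
# Hypothetical CRUD matrix: business function -> {data entity: CRUD letters}.
crud_matrix = {
    "Sell Product to Customer": {
        "Sale": "C", "Product": "R", "Party": "R",
    },
    "Accept Delivery of Product from Supplier": {
        "Delivery": "C", "Product": "R", "Party": "R",
    },
    "Analyse Sales Volume and Value by Product and Product Type": {
        "Sale": "R", "Product": "R", "Product Type": "R",
    },
}


def usage_by_entity(matrix):
    """Collect, for each entity, every CRUD letter any function applies to it."""
    usage = {}
    for cells in matrix.values():
        for entity, letters in cells.items():
            usage.setdefault(entity, set()).update(letters)
    return usage


def find_anomalies(matrix):
    """Flag entities that no function creates or that no function reads."""
    anomalies = {}
    for entity, letters in usage_by_entity(matrix).items():
        problems = []
        if "C" not in letters:
            problems.append("never created")
        if "R" not in letters:
            problems.append("never read")
        if problems:
            anomalies[entity] = problems
    return anomalies


anomalies = find_anomalies(crud_matrix)
```

With only these three functions, Product, Party and Product Type are flagged as never created and Delivery as never read. Each flag points either to a business function that has not yet been identified or to data that nothing actually needs.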
It’s Simple, Simon
Defining the data needs of an enterprise is not black magic. Nor is it an abstract or abstruse exercise.
It is in fact a relatively simple exercise based on the fundamental understanding that the only data that needs to be known about and held in any enterprise is that required to support the execution of the business functions of that enterprise – nothing more, nothing less.
All of the data needs of any enterprise of any size and in any sector can be unambiguously defined by knowing its business functions.
About the Author
John Owens is a thought leader, consultant, mentor, practitioner and writer in the worlds of enterprise identity, strategy, business process and data modelling, data quality and master entity management.
He is the creator of IMM, the Integrated Modeling Method.
As a coach and mentor John typically works with directors, senior executives and managers who are dealing with the pain and losses in their enterprise caused by unclear identity and strategy, flawed processes, bad data and fragmented organisation and systems.
He helps eliminate the losses and add to the bottom line.
He has built an international reputation as a highly innovative specialist in resolving seemingly intractable issues in these areas and has worked in and led multi-million-dollar projects in a wide range of industries across the UK, Ireland, Europe and New Zealand.
View John’s LinkedIn profile at http://www.linkedin.com/profile/view?id=29090224.