Here’s the final segment of Rob DuMoulin’s series on the dangers of implicit knowledge and the importance of context.
In the first and second segments of this series, we discussed situations where implicit values are woven into business processes or IT systems unknowingly or on purpose. The examples provided may seem trivial on the surface if one considers that everybody in the business had the implicit knowledge needed to make sense of the information. It is not so trivial when time passes, employees retire or leave, and there are implicit values scattered around the enterprise that nobody remembers anymore. Add to that the fact that implicit values, by nature, are not self-documenting and in my experience rarely externally documented. By now you should be able to identify the existence of implicit values in overloaded, NULL, or implied default values within your organization. More importantly, you should understand the potential consequences of each situation and be able to intelligently challenge those who perpetuate these practices. This last section provides alternatives to implicit designs and provides methods for mitigating risks within legacy systems.
Governance Policies are where many data quality issues get resolved or avoided. Without defined Governance, the status quo becomes the standard and data quality does not improve. Adding new policies may do little for directly combatting legacy system shortcomings, but it does set the stage for preventing the proliferation of implicit values by mandating that new interfaces to legacy systems not accept them.
An involved and informed Data Governance team is the best defense to limiting proliferation of implicit values. Introducing an enterprise-wide standard to restrict the use of optional relationships between code parent and referring child tables is a start. As stated earlier, optional relationships are ones that allow NULL child table values to exist. For example, a sales table allows a discount type code of NULL if no discount applies. To satisfy such a mandate in this situation, discount code table entries of ‘Not Provided’ and ‘No Discount’ would be added. If a NULL is received, the relationship maps to the ‘Not Provided’ entry. The Point of Sale system could now default a discount to the ‘No Discount’ code value if none was given. This solution explicitly provides a discount value in all situations and provides an easy means to identify when data transformation issues occur if a ‘Not Provided’ value sneaks in. The approach leaves open possibilities for additional entries in code tables to represent a ‘Not Applicable’ business condition or an ‘Invalid’ situation which could be set if a code value is detected during loading of the sales table that does not match a value within the domain of valid discount codes.
A Data Governance body worth its salt creates standards concerning definition, collection, and dissemination of metadata. In the policy planning phase, the Governance body can define the required column-level metadata collected to include values like: Default Value, Min/Max values, Domain, and Nullability. If business users define that NULL values are allowed, a follow-up discussion is required to determine the business meaning of a NULL value. Such a discussion is the opportune time to explore options to NULL values. Forcing the business data owners to define data elements to this degree provides the rules to eliminate several forms of undesired implicit values and it makes for better business system designs.
A strong Data Architecture group plays a role in Data Governance policy enforcement. Data Architecture groups influence database and application designers to use NOT NULL and default value constraints and to define relationships as ’Required’ instead of ‘Optional.’ Mandatory peer code reviews using checklists ensure designs meet enforced policies.
Overloading, as defined in the first segment of this series, is a costly practice that is easily avoided through proper data design techniques. Overloaded code values are avoided by creating separate child fields and separate parent code tables. Overloading of transaction metrics is handled similarly by splitting transaction values into different attributes if at the same summary level or different tables if transactions are at different summary levels. Overloaded records, like copybooks, just have no place in relational database design. If somebody tries to justify them, you have my permission to mock them in public but I am not responsible for any repercussions if the mock-ee is your boss.
The goal of this article was to demonstrate that using implicit values comes with long term consequences and this practice should be scrutinized. There are compelling arguments to use implicit design methods, but doing so should be done knowing all of the options, consequences, risks, and potential long-term costs. Resist temptations for quick-fixes when it comes to data design and focus instead on best practices.