“I shot a cat wearing green pajamas.”
Before you report me for animal cruelty, did you paint a mental image of me or the cat in pajamas? How do you know I wasn’t referring to the phrase used in the 1920’s to describe a person at the top of their game, Cat’s Pajamas? Would your opinion of me change if you had implicit knowledge that I was a professional photographer specializing in novelty pet shoots? How about if you knew I photographed vintage sports while wearing bed clothes or made analogies as if cars were dressed? Maybe you implicitly know I am a caretaker that lives at a zoo who carries a tranquilizer gun for safety or that I am a guide for a big-game safari hunting club? Do I need to go on? A simple factual statement like this is common in tabloid press because without proper context or additional implicit information, it is easy to reach invalid conclusions.
Computers are good at many things, but unless they are programmed to be learning machines, computers are not good at inferring context. Moreover, computers don’t need to infer context if system designers and data owners understand how ambiguity affects their MDM and Data Governance effectiveness.
Do your business users have all the information they need to absolutely reach valid conclusions in every situation? If you said ‘Yes’ and truly understand what the term ‘implicit values’ means, you don’t need to continue reading.
Looks like all of you are still here. For the record, I was not the one that was wearing pajamas.
In Data Architecture terms, an implicit value is one that is conveyed indirectly by applying information found elsewhere. The antonym of an implicit value is an explicit one, which is a value with a self-contained purpose that exists in a single domain (which means that all values are of the same type and have the same business rules applied to them). These topics will be further discussed in this article. In practice, explicit definitions are desired over implicit ones, but there are exceptions.
This article will present the various scenarios, reasons, risks, and alternatives to having certain types of implicit values in your design and why there is literally no reason to use certain types of implicit values in MDM or BI architectures. Three classes of implicit values will be presented: Coded Values, Overloading, and Null/Default values.
Using coded values is a widely accepted and valuable relational modeling practice used to classify a value within a single domain as defined by a separate code value. For example, a GL dollar amount can be assigned to one of many GL Line Types. Two columns consisting of the GL dollar amount and the GL Line Type Code are implicitly linked together because each has no context without the other. This is an acceptable design practice because each column exists in one and only one domain: GL dollar amounts or GL Line Type Codes.
Overloaded columns or records, on the other hand, exist when one of the columns can be of a different domain based on the value of another. If the GL amount from the previous example represented a dollar amount for one GL line type and an item count for a different GL line type, the GL amount column would be considered overloaded.
This practice was done quite often in the past using COBOL Copybooks. Copybooks exist under the premise that a variable record type was an efficient way to model information within a 3GL coding environment. With the advent and advances of relational databases (and normalization standards), the copybook approach does not translate well into database tables (thank goodness).
One disadvantage of using overloaded columns is that the characteristics of a column are dictated by the superset of all the domains the column represents. This removes one of the benefits of defining a domain to explicitly enforce data quality, integrity, and definition. For example, if an overloaded column implicitly stores either the nine characters of a Social Security Number or an account holder’s alpha-numeric 14 character Member ID, the column must be at least 14 characters in length and allow for numbers and characters. This makes it more difficult to explicitly enforce SSN formatting rules and display options because it needs some outside code or test to put the value into context.
Overloading inherently restricts a record into referencing only one type of value, which on the surface may seem like a noble goal because it guarantees mutual exclusivity. In some business cases this may never change, but in others, the intent may be short-sighted when business rules change. For instance, an overloaded column of division code and line of business code created under the premise that a record would align to only one of these classifications breaks if products realign to span multiple LOBs.
Whether the reason for using this approach was to upgrade older systems cheaply, or because a designer believed at the time that it was an innovative solution to a problem, the results are the same. Overloading makes data management more difficult, data quality harder to enforce, adds risk of incorrect use of data, and longer-term data changes become more expensive. Despite the brow-beating (or possibly because of the lack of it), designers and developers still overload columns in ERP, CRM, and custom-developed systems. Their existence in legacy systems may not be undoable without costly rework, but they have no place in good MDM or BI design (more on this later).
In the next installment of this series, we’ll discuss the use of NULLs and default values.