
Posts tagged ‘Data Quality’

30
May
Arctic Ocean

The System of Record in MDM, by Dalton Cervo

Today’s guest article is by Dalton Cervo, the co-author of a great new book titled Master Data Management in Practice – Achieving True Customer MDM. Read more »

23
May
Elephant at Pilanesberg National Park, South Africa

Africom’s PROTEA Program

Our 300th article. After this year’s Gartner MDM Summit conference (May 4-6 in Los Angeles), Hub Designs sent a small team to a new client in South Africa called Africom.  Read more »

22
May
Information Management

Hub Designs Clients Named To “25 Top Information Managers”

Each year beginning in 2010, Information Management magazine recognizes the best information managers and “people to watch”. Read more »

19
May
Mountains of Data

Information, Intelligence and Process (Part 4) by Julie Hunt

Here’s the final article in this great series by Julie Hunt, an accomplished software industry analyst.  Read more »

18
May
Intelligence

Information, Intelligence and Process (Part 3) by Julie Hunt

Here’s the next article in the series by Julie Hunt, an accomplished software industry analyst.  Read more »

17
May
Breaking Down The Silos

Information, Intelligence and Process (Part 2) by Julie Hunt

Here’s the next article in the series by Julie Hunt, an accomplished software industry analyst.  Read more »

16
May
Environment

Information, Intelligence and Process (Part 1) by Julie Hunt

We’ve been on site at a new client in South Africa since the Gartner MDM Summit. Here’s a great series of new articles by Julie Hunt, an accomplished software industry analyst. Read more »

28
Apr
Launch by thejcgerm

Launching Hub Designs Magazine

The Hub Designs Blog is becoming the Hub Designs Magazine! Read more »

24
Apr
Think Tank

Announcing the Hub Designs MDM Think Tank

Hub Designs gets requests pretty frequently from various MDM vendors to brief us on their latest products, “go to market” strategies, acquisitions, future directions, positioning, etc. Read more »

21
Apr
Zakim Bridge by stripermjg

MDM Is Not Only About Aligning “Business” and “IT” (Part 1)

Business and IT alignment is a topic repeated ad nauseam. There seems to be a belief that the Holy Grail of IT is achieved once that alignment is in place. This belief applies strongly to Master Data Management (MDM) as well. Read more »

20
Apr
Managing Complexity by Michael Heiss

Getting Started with Data Governance, Part 2

This is the third article in an ongoing series on Data Governance sponsored by SAP. Here are Part One and Part Two of the series. Read more »

19
Apr
Data Governance

Getting Started with Data Governance, Part 1

This is the second article in an ongoing series on Data Governance sponsored by SAP. You can find the first article in the series here. Read more »

18
Apr
Platinum and Gold

Golden Relations and Platinum Relations, by Henrik Liliendahl Sørensen

“Golden copy” is a term widely used in master data management (MDM), as we often see the master data hub as a golden copy of the data in the company’s operational databases. Read more »

16
Apr
Oracle MDM

Oracle 2011 MDM Strategy and Roadmap

This session at COLLABORATE 2011 was presented by Manoj Tahiliani, Senior Director of MDM Product Management & Strategy at Oracle. Read more »

13
Apr

The Strategic Nature of MDM According to Oracle

Oracle Logo

This week, I attended David Butler’s presentation at the Oracle Applications Users Group COLLABORATE 11 conference in Orlando, FL.  Read more »

31
Mar

Talend MDM Celebrates Its One Year Birthday

Talend Logo

Jim Walker, who handles MDM Product Marketing at Talend, sat down with us recently for an analyst briefing to fill us in on how Talend is doing with its Talend Master Data Management product. Read more »

20
Mar

Why Govern Master Data?

This is the first article in an ongoing series on Data Governance sponsored by SAP.

Data Governance

The most important thing about data governance is to “start from where you are”. Most companies are just getting started on their data governance journeys. It can be hard to admit that your company is at data governance maturity level 0 or 1. But the most critical step is the first one – getting started. Read more »

1
Mar

“Data Governance In The Cloud” Seminar, March 24th in Atlanta

User Adoption is a Critical Component of Your Success.

March 24, 2011 • 7:30-11:30AM
JW Marriott Buckhead
3300 Lenox Road,
Atlanta, GA 30326

Webinar Registration

Master Data Management (MDM) is key to driving revenue and achieving greater productivity with Salesforce.com.  A clear data strategy and processes for collecting, aggregating, consolidating, and distributing data throughout an organization impacts your bottom line.

  • Is bad data undermining your sales performance?
  • Do you have data in disparate systems, fragmented — some in the cloud, some on-premise?
  • Are your people, processes, and technologies not aligned properly to ensure data efficiency, access and consistency across your organization?
  • Is your sales team spending more time searching for data than they are with customers?

If you’re experiencing any of these issues, attend Data Governance in the Cloud to get control of your data and an action plan for success.

Agenda Overview

  • Introduction to the Fundamentals of Master Data Management
  • How to Establish Data Governance within your Organization
  • Introduction to CRM Process Modeling
  • How to Develop a Sales & CRM Process Model
  • Cloud Data Integration – Informatica

Seminar Sponsors

Key Insight from Proven Leaders in CRM… Attend this information packed morning and leave with an action plan to gain control of your data, empower your people, initiate processes, and learn which technologies can help you accomplish your goals.

Meet the Experts

Ernie Megazzini
VP, Cloud Technology
CoreMatrix

Dan Power
President
Hub Designs

Darren Cunningham
VP, Marketing
Informatica Cloud

Who Should Attend

Business and IT executives and management responsible for consistent and proper handling of data across an organization.

Benefits of Attending

Attendees will leave with a clear understanding of how a company with a well defined, integrated MDM strategy can achieve greater revenues and increased productivity. 

Attend this information-packed seminar to get control of your data strategy and achieve success with your CRM solution.

29
Dec

MDM Solution Specialist / Senior Manager at Oracle

Oracle Logo

From time to time, Hub Designs highlights an MDM-related open position as a courtesy to a friend of the firm.

Oracle’s Applications Solution Group (ASG) is part of the North American Sales organization, and is responsible for, among other things, fueling the growth strategies of the Public Sector’s applications sales teams by providing innovative Edge solutions, creating compelling applications upgrade programs, and driving successful early adopters of new applications. In addition, the ASG is responsible for delivering clear, relevant content and enablement to assist in demand creation and customer roadmap activities.

Oracle’s Master Data Management (MDM) solution is a Commercial-Off-The-Shelf (COTS) set of applications (MDM Hubs) designed to consolidate, cleanse, enrich, and synchronize key business data objects across the enterprise and across time. It includes pre-defined comprehensive data models with powerful applications to load, cleanse, govern and share the master data with all business processes, operational applications and business intelligence systems. The MDM solution provides Public Sector customers with the ability to overcome problems associated with poor quality and fragmented data that are typically inherent with major programs such as citizen services, social services, child welfare, revenue collections, fraud detection, vendor file management, regulatory compliance, financial controls and site consolidations.

Overlay sales personnel provide specialist product expertise to the sales force. Manages and directs a staff of solution specialists and/or managers in providing specific industry or product expertise to facilitate the closing of deals within sales territory. Establishes and communicates departmental objectives and implements plans to ensure attainment of business objectives. Works closely with sales management to ensure proper utilization of resources and provides justification for additional resource requests. Oversees the interaction with the sales team to architect the solution, and develop and execute solution strategies for market. Manages teams in the sales process for establishing market visibility and deal visibility. Develops forecasts. Participates in industry/product functions, seminars and round tables to remain up to date on industry or product knowledge. May deliver presentations/solutions to high level clients and industry conference attendees. May provide training to field sales on industry/solutions.

Manages and controls activities in multi-functional areas or sections. Ensures appropriate operational planning is effectively executed to meet business needs.

The person in this position will serve as a Solutions Specialist supporting Oracle’s Master Data Management sales activities as part of a team that comprises other ASG members and personnel from various North American Public Sector organizations, including business development, program management, field sales and Industry Business Unit Solutions Specialists.

The Oracle MDM solution includes the following:

  • MDM Hubs such as Oracle Customer Hub, Oracle Product Hub, Oracle Supplier Hub and Oracle Site Hub.
  • Oracle Data Quality Servers, providing end-to-end data quality for structured data (and Oracle Product Data Quality accommodates unstructured data).
  • In addition to the Data Hubs, Oracle MDM includes Data Relationship Management (DRM), a powerful financial reference data master, with best in class hierarchy management capabilities and a business enterprise dimension-mastering tool with deep integration into Enterprise Performance Management (EPM).

MDM represents one of the fastest growing areas for Oracle’s solutions in the government marketplace.

Responsibilities include:

  • Develop strategic & tactical plans for Oracle’s MDM solution for the Public Sector (US Federal, State & Local & Canadian governments), including the Data Hub components identified above
  • Build clear and differentiated messaging and value propositions for the appropriate target audiences.
  • Work with the field sales force to create forecasts and provide status reports regarding revenue quotas
  • Work closely with product development & management, business development, program management and field sales to develop pipelines and meet revenue quotas
  • Assess competitive products and analyze trends to effectively position MDM for Public Sector solutions
  • Meet with Public Sector customers to make MDM presentations to convey value propositions for productivity improvements and cost savings

Qualifications:

  • Knowledge of and demonstrated experience with MDM software solutions
    • Hands-on experience (configuration, implementation, architecture) of MDM products.
    • Experience with the data quality components such as Informatica or Product Data Quality
  • Knowledge of and demonstrated experience with Public Sector enterprise business processes
  • 10+ years of experience in supporting enterprise software sales
  • Demonstrated success in devising strategies and supporting end-to-end sales plans and revenue quota achievement
  • Proven leadership and “take charge” skills
  • Ability to effectively and efficiently work in cross-functional teams and matrix environments.

As part of Oracle’s U.S. employment process, candidates will be required to complete a background check, prior to an offer being extended. These background checks include:

  • Prior Employment Verification
  • Education Verification
  • Social Security Trace
  • Criminal Background Check
  • Motor Vehicles Records (where required for position)

For a listing of all opportunities at Oracle, please go to www.irecruitment.oracle.com

Oracle Supports Workforce Diversity.

To apply for this position, please contact:

Julie Boyer, Oracle Recruiting at 434-973-0898 or julie.boyer@oracle.com

15
Dec
Informatica Logo

Informatica Progress

Misti Lusher and Ravi Shankar from Informatica were kind enough to do an analyst briefing for Hub Designs recently, to bring us up to date on what’s been happening with Informatica in the past few months.

The combination of Siperian with Informatica has exceeded their expectations so far, with MDM revenue running significantly ahead of quota and Informatica landing customers in a number of new vertical industries such as retail, healthcare, aerospace/defense, agriculture, education, and hospitality. Informatica continues to penetrate EMEA and has had its first successes in Asia Pacific and Latin America as well.

There’s also a healthy sales pipeline being built for future quarters, with the top three verticals being healthcare and life sciences, financial services and insurance, and high tech and retail. Growth is being seen all over the world, with a large percentage of the bigger sales opportunities for Informatica involving MDM, regardless of the region.

Ravi highlighted how the Informatica Master Data Management (MDM) solution is solving multidomain business problems like physician spend compliance, product mastering, high volume reference data mastering, clinical trial management, customer and channel management, and Salesforce.com enablement. He also discussed how Informatica’s other products usually fit into an MDM solution.

As the Informatica MDM product has evolved, it has remained true to its roots, and continues to offer complex hierarchy management, to be business user focused, and to allow for fast time to value. What Informatica has done, building on what Siperian had created before its acquisition, is to provide for true multidomain master data management, which allows for a much wider range of problems to be solved.

Informatica continues to increase its market share beyond the pharmaceutical vertical, and shows a strong track record of expanding its footprint within existing customers as well.

Informatica MDM Data Director has been widely used as well, with every new customer since its release in March 2009 buying it along with the MDM hub.

Informatica just finished up an 18-city MDM road show in the U.S. and Canada, and featured its MDM product prominently at Informatica World in early November. It has both a horizontal and a vertical industry marketing strategy.

Ravi previewed for us the materials for their “Customer and Channel Management Solution”, which manages hierarchies and relationships between customers, channel partners, products, and resources, in order to maximize account penetration, optimize coverage, and enable business agility and speed.

Ravi also gave us a demo of the latest version of the Informatica MDM product, with built-in dashboards using Data Director measuring data quality for individual customers and organizational customers. He also demonstrated the integration of MDM with the rest of the Informatica Platform – Power Center Business Glossary and Metadata Manager, and Informatica Data Quality.

Another impressive feature is enabling business applications, such as Salesforce.com, to be MDM aware. New records can be entered in the Salesforce.com application and instantly be bounced up directly against the Informatica MDM hub, and customer hierarchies can be viewed in a Salesforce.com tab, rather than requiring the user to jump back and forth between a Salesforce window and an Informatica MDM window. And the Salesforce user can see a timeline of a record “as of” a particular date, including all the hierarchy data.
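
To make the “as of” idea concrete, here is a minimal Python sketch of an effective-dated lookup. It is not how Informatica MDM or Salesforce.com implement the feature; the Version class, the parent_account attribute, the company names and the dates are all invented for illustration. The only assumption is that each change to a record is stored as a new version with the date it took effect.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Version:
    effective_from: date  # first day this version of the record is valid
    parent_account: str   # hypothetical hierarchy attribute


def as_of(history: list[Version], when: date) -> Optional[Version]:
    """Return the version in effect on `when`, or None if the record did not exist yet.

    `history` must be sorted by effective_from, oldest first.
    """
    starts = [v.effective_from for v in history]
    idx = bisect_right(starts, when)  # count of versions starting on or before `when`
    return history[idx - 1] if idx else None


# Invented sample data: the account moved to a new parent on 1 June 2010.
history = [
    Version(date(2009, 1, 1), "Acme Holdings"),
    Version(date(2010, 6, 1), "Globex Group"),
]

print(as_of(history, date(2010, 1, 15)).parent_account)  # Acme Holdings
```

Keeping every version, rather than overwriting the record, is what lets a user ask what the hierarchy looked like on any past date.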

At the end of the briefing, I came away feeling (again) that Informatica had made a great move in purchasing Siperian, and that Informatica’s MDM business has clearly gained momentum since the acquisition. This is clearly one of those cases where one plus one equals three. Informatica has done a great job integrating Siperian into the company, in taking advantage of the synergies between the two companies, and in promoting the product. Opportunities exist to take it even further, but the Informatica team is to be congratulated, since almost 60% of all mergers and acquisitions fail to create shareholder value according to the Boston Consulting Group.

6
Dec
Kalido Data Governance Framework

First Look at Kalido Data Governance Director

I attended an analyst briefing today with Kalido on their new product, Kalido Data Governance Director.

The Kalido presenters included Bill Hewitt, President and CEO, Winston Chen, VP of Strategy and Business Development, Lovan Chetty, Senior Manager of Product Management, Mike Wheeler, Director of Data Governance Solutions and Lorita Vannah, Director of Marketing Communications. Lorita is the person who first turned me on to Kalido, about two years ago now. We first met at the 2008 Gartner MDM Summit in Chicago, and she impressed me then with her passion for MDM, data governance and her company.

Bill started off by talking about how the data governance market has been growing rapidly as the volume of corporate data explodes, which is certainly true, and observed that Kalido had noticed a disconnect between data and business processes. To address this issue, Kalido developed a new product from the ground up, because the company felt that data was better managed through policies. For example, it may be okay to store customer data in multiple places, as long as the relevant policy allows that.

As part of its research into data governance, Kalido developed its own data governance maturity assessment. Winston described the evolution of data governance, from application-centric to today’s “enterprise repository centric” approach. The next phase, according to Kalido, is policy centric, followed by fully governed. Winston also discussed the need to manage data policies in context: you’ve got data, but you’ve also got business processes, systems and organizational scope.

That allows you to fully describe the context in which a particular policy is being defined.

The way to operationalize governance processes is: to define the policy, to implement the policy, and then to enforce the policy, which Kalido modeled on how laws are created by the legislative branch, implemented by the executive branch, and then enforced by law enforcement and the judicial branch of government.

Kalido has been working with data quality vendors such as DataFlux and Trillium to build integration with their products into Kalido Data Governance Director, so that metrics from those data quality tools can be gathered automatically.
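
As a rough illustration of the define / implement / enforce cycle with metrics fed in from a data quality tool, here is a minimal Python sketch. It does not reflect Kalido's data model or any vendor's API; the Policy class, the metric names, the thresholds and the numbers are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    name: str
    metric: str       # name of the data quality metric the policy is measured by
    threshold: float  # minimum acceptable value, as a fraction between 0 and 1


def enforce(policies: list[Policy], metrics: dict[str, float]) -> dict[str, str]:
    """Compare each policy's threshold with the latest reported metric value."""
    status = {}
    for policy in policies:
        value = metrics.get(policy.metric)
        if value is None:
            status[policy.name] = "not measured"
        elif value >= policy.threshold:
            status[policy.name] = "compliant"
        else:
            status[policy.name] = "non-compliant"
    return status


# Define: hypothetical policies written by the governance council.
policies = [
    Policy("Customer records carry a valid postal code", "postal_code_valid", 0.98),
    Policy("Customer records carry a tax ID", "tax_id_present", 0.95),
    Policy("Supplier records are de-duplicated", "supplier_unique", 0.99),
]

# Implement: invented numbers standing in for results gathered from a data quality tool.
metrics = {"postal_code_valid": 0.991, "tax_id_present": 0.90}

# Enforce: report the compliance status of every policy.
report = enforce(policies, metrics)
for name, state in report.items():
    print(f"{state:>13}  {name}")
```

The “not measured” state is worth keeping separate from “non-compliant”, since a policy with no metric behind it is a gap in the governance program rather than a data quality failure.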

If a data quality problem goes beyond the single or small number “issue” state, then it could be remediated as an “initiative”, where it would be put into Data Governance Director and tracked as a separate initiative, with all of the visibility and accountability that goes with that, and the full life cycle of governance – definition, implementation, and metrics / enforcement – could be used to make sure the data quality issue was resolved.

Lovan Chetty did a brief demonstration of the product, showing a web-based user interface to author new initiatives and policies, manage scope and organizational parameters, and create a unified business model, including a data model, process model and systems model.

Mike Wheeler talked about Kalido’s lighthouse customer program for Data Governance Director, which consisted of cultivating about 16 companies and 3 consulting firms, including some large financial services providers and manufacturing companies, at different levels of data governance maturity, to provide input and feedback on their policies and data governance programs and practices.

A number of them will be speaking at tomorrow’s Kalido Connect virtual user conference.

One very large company had a “light going on” moment when using the product, when they realized that pulling the knowledge out of everyone’s head is the hardest part, and that lots of “tribal knowledge” is often never incorporated in the actual policies.

One of the largest banks in Mexico, Scotiabank, has already bought the product prior to its general availability, in order to streamline its data governance operations. And a Top-5 pharmaceutical company has also signed up as a customer.

After a short Q&A session, Kalido promised to let everyone get a closer look at the new product in their virtual user conference tomorrow. For more information, or to register, please go to http://bit.ly/kalido-register.

The screen shot below shows the product measuring and reporting data policy compliance status based on results captured from 3rd party monitoring tools.

Kalido Data Governance Director Screen Shot

12
Nov
Kalido Logo

Kalido Data Governance Maturity Survey Results

This morning, Kalido, a Hub Designs partner, released an initial analysis based on the almost 100 responses it received to its Data Governance Maturity Assessment Survey.

The results were not surprising, but I found them very interesting nonetheless. Keep in mind that this was a self-selecting group; that is, people who were interested enough in data governance to have taken the survey. That suggests that the general population would be even less mature.

The biggest finding was that only 10% of organizations have been able to move their data governance programs beyond the first two levels of data governance maturity. That matches well with our experience at Hub Designs – most companies are just getting started with data governance.

Despite the commonly expressed belief that data should be owned by the business, traditional IT organizations are accountable for data in nearly two-thirds (63 percent) of organizations. At Hub Designs, we believe that the business should be accountable for the data – but sometimes, that’s a “bridge too far”. You’ve got to start where you are, and evolve over time to higher levels of maturity. If the center of gravity right now is in the IT organization, then that’s where you start. But over time, have a strategy for moving data governance into the business.

Nearly half (45 percent) of organizations taking the survey said they have a formal data governance council in place, but only 27 percent have established a data governance council with business representation and formal data stewardship. That tells me that even in places where they’re doing some type of (immature) data governance, there are still lots of opportunities for improvement, by increasing the level of business involvement, stewardship and data quality.

This finding I found stunning: more than half (57 percent) of organizations do not measure the performance of data management activities at all. That leads me to believe that those organizations won’t be doing data management for much longer, because lack of measurement tends to lead to lack of funding, because of a perceived lack of documented results.

Clearly, we have a long way to go in the corporate world in becoming more mature from a data governance perspective. I really liked Kalido’s survey, and you can find Winston Chen, Kalido’s VP of strategy and business development, discussing it on his blog at http://bit.ly/cbckxD.

Speaking of Kalido, Hub Designs is sponsoring their upcoming Virtual User Conference on December 7, 2010. The Kalido Connect virtual conference provides attendees with a cutting-edge platform for networking, exhibition, collaboration and learning. Attendees can watch a presentation in a packed auditorium, network with peers in the Kalido Connect Lounge, or visit fully interactive sponsor booths on the exhibit floor. From group chats to one-on-one discussions, the virtual platform allows for a live conference and exhibition floor with real-time user interaction. To register, just click http://bit.ly/kalido-register.

Kalido Connect offers:

  • Real-world examples of Kalido’s business value as told by their customers
  • Keynote sessions on BI, MDM and data governance trends and how to keep ahead of the curve
  • Technical breakout sessions to maximize your investment in the Kalido Information Engine™ and expand your skill set
  • Exhibit hall showcasing complementary products and services from Kalido partners and sponsors
  • Opportunities to network with colleagues, industry leaders and executives

Last year, more than 300 people attended the Kalido Connect Virtual User Conference, and Kalido expects to double that this year.

11
Nov
Informatica Logo

Informatica MDM Tweet Jam

This is a transcript (lightly edited for brevity) of today’s Informatica MDM Tweet Jam. We hope you enjoyed the actual Tweet Jam and this transcript. If there were questions you didn’t get a chance to ask, please feel free to ask them via our web site’s Contact Us page.

Dan Power: Informatica MDM Tweet Jam like playing “stump Dan” – see if you can perplex, mystify and amaze me!

Dan Power: Actually, just kidding – want to have a good dialogue with everyone – would love to have a good MDM discussion.

Informatica Corp.: Right now! Join the #MDM TweetJam with @dan_power. 9am PT.

Dan Power: OK, the Tweet Jam is officially open!

Jakki Geiger: Dan, what are the most common concerns you hear about MDM?

Dan Power: IT people still seem concerned about how to involve the business and sell it to senior management.

Jakki Geiger: what advice do you give them?

Dan Power: IT seems to know that MDM is needed but sometimes can’t seem to get the business on board, and it can be hard to pitch to the C-Suite.

Dan Power: We advise building a compelling business case – getting outside help if needed – and recruiting internal business champions.

Jakki Geiger: What strategies to get the business on board have you seen work?

Dan Power: I wrote an article about that in a recent Information Management magazine and a blog article on Hub Designs Blog that accompanied it.

Jakki Geiger: We’ve seen IT successfully tie MDM to key strategic imperatives like improving cross-sell and up-sell=getting sales on board.

Ravi Shankar: One thing we have done to help IT is to quantify how much DQ issues can cut costs or increase revenue.

Dan Power: Getting the business on board means STARTING in the business – find out their pain points and recruit them to drive from Day 1.

Jakki Geiger: Others include onboarding channel partners faster, which appeals to sales and channel operations.

Jakki Geiger: A huge driver has been regulatory compliance = appealing to those who gather data across the enterprise and create reports.

Ravi Shankar: I like what Charles Bloodworth of J&J said at Informatica World 2010 – “MDM is not just a project; it’s a discipline – a way of doing bus for us”.

Dan Power: Good points Jakki & Ravi – those are the pain points I’m talking about: increasing revenue / onboarding channel partners faster.

Jakki Geiger: One area I think is really going to take off is improving business processes = improve data to improve the process.

Jakki Geiger: One exec got buy in from exec team with “we need to manage our product supply chain and info supply chain equally efficiently”.

Ravi Shankar: Agreed – bus needs to be involved in MDM. Charles of J&J said bus involvement drove their MDM and data governance success.

Dan Power: That’s right – becomes a way of life – new discipline for the business – to have a golden copy of the data that they can trust.

Jakki Geiger: I agree with u. IT needs to understand what the business pains and strategic imperatives are, then evaluate “can MDM help?”

Dan Power: Product management and supply chain are just as fertile for most companies as customer data – so MDM is just getting started.

Dan Power: I’ve been talking to a lot of companies lately that have already done customer MDM and are now looking at doing product MDM.

Ravi Shankar: Product MDM: I see lot of demand for this from manufacturing companies. Just came from S. Korea – product MDM is hot.

Dan Power: Or even supplier MDM – in order to get global strategic sourcing initiatives off the ground, which can save millions of $.

Ravi Shankar: Customer MDM to product MDM – we’ve seen that with our own early customers – They leveraged the same Informatica platform.

Julie Hunt: How do you see MDM implementations evolving to take advantage of newer tech such as ‘cloud’?

Julie Hunt: And what advantages does the cloud offer to MDM solutions?

Dan Power: Good question, Julie – definitely see a movement towards the cloud – people don’t want to create tomorrow’s “legacy systems”.

Dan Power: So they increasingly are asking their vendors about cloud deployment options, even if they don’t rush to take advantage of them.

Dan Power: They want to know they’re available

Dan Power: To Julie’s Q about cloud, I think eventually we’ll see cloud deployments at lower cost than on-premise (particularly hardware).

Ravi Shankar: Let me outline 2 use cases we’ve seen @ InformaticaCorp.

Ravi Shankar: Use case 1: During peak times like holiday seasons, retailers can burst into cloud for additional capacity.

Ravi Shankar: Use case 2: Mktg mgrs can use self service tools to upload attendee list from event w/o having to bother IT.

Dan Power: The promise of cloud for me, is more flexibility as my business grows and if we have seasonal peaks and valleys of demand.

ocdqblog (Jim Harris): What do you say to companies that expected that from their data warehouse? How is MDM different from conformed dims?

Ravi Shankar: ocdqblog – welcome. Looking forward to a lively MDM discussion.

Dan Power: Good question, Jim. Most companies had unrealistic expectations from data warehouses, which ended up being expensive, read-only,

Dan Power: and updated infrequently. MDM gives them the capability to modify the data, publish to a DW, and manage complex hierarchies.

Dan Power: So to finish answering your question Jim, I think MDM offers more flexibility than the typical DW.

Dan Power: That’s why BI on top of MDM (or more likely, BI on top of a DW that draws data from an MDM) is so popular.

Ravi Shankar: MDM for DW – 90% of Informatica MDM customers use it for analytical use (in addition to operational).

ocdqblog (Jim Harris): Thanks Dan – Follow-up is do you see MDM as complement or replacement for DW?

Dan Power: Definitely a complement – fills void in the middle between trx systems and the DW – does things that neither can do to data.

Jakki Geiger: are you seeing this trend? Evolving beyond single customer view= visibility into 360 customer view w/products and channels, etc.

Dan Power: Yes, Jakki – people want more than a single view – they want multiple views on top of the single view.

Ravi Shankar: Siperian customers – We’re having a lively chat on MDM and data governance. Join in!

Ravi Shankar: Dan, what do you tell DW admins who say the DW provides their single view for the enterprise?

Dan Power: I tell DW admins that most people in the enterprise aren’t completely happy with DW – that’s why there’s pain leading to MDM.

Jakki Geiger: Since the driver of MDM is the business, how are we getting master data into the hands of the business?

Dan Power: Good Q, Jakki – getting MDM data back into hands of the business should be built into the project – and the software platform.

Ravi Shankar: Compliance is driven out of DW – you need MDM for accurate compliance reports – Do you agree?

Dan Power: Yes, Ravi – Garbage in, Garbage out – you need quality data from the MDM system to feed into the data warehouse.

Julie Hunt: So we must advocate value of data governance as well as value of MDM with business, senior management?

Dan Power: I tell people to think of their initiative as a data governance project that happens to involve #MDM technology.

Dan Power: Not an #MDM technology project that requires data governance.

Dan Power: And to start the data governance piece about 6 months before the technology piece, if possible.

Julie Hunt: The importance of data quality = another layer to be advocated to the business and to management – show them the impact on outcomes.

Jakki Geiger: MDM is like a Ferrari. If you don’t use DQ with MDM, it’s like putting regular gas in Ferrari=sub optimal performance.

Dan Power: I’ve seen people try to do MDM without data quality – and it’s a disaster, like trying to run a submarine on dry land!

Dan Power: The fact is that #MDM and data quality are linked, just as #MDM and data governance are linked.

Ravi Shankar: Should data quality be integrated within #MDM?

Dan Power: Good question, Ravi – I’ve seen it both ways – a data quality engine integrated with the MDM platform or separate, both can work as long as the data quality tool is robust and the integration is solid, shouldn’t matter.

Dan Power: Most MDM platform vendors are not equally good at developing data quality tools – Informatica is one of the few that is.

Julie Hunt: How much does corporate culture impact success/failure of projects for #MDM, data governance etc.?

Dan Power: Great Q – corporate culture is a huge impact on success because data governance drives MDM and requires a lot of change mgt. So spend a lot of time on org. change in the data governance side of the #MDM initiative in order to be successful.

Ravi Shankar: Heard a customer say – “Don’t overdo data governance – do just what’s necessary” Do you agree?

Dan Power: I’d agree not to go overboard on data governance – balanced approach that’s right for your co. just enough to get the job done. Too much data governance can be worse than not enough – can be bureaucratic – the “data governance police”.

Ravi Shankar: Data governance applies to all data, but I hear that in MDM context a lot. Do you hear “master data governance” for MDM?

Jakki Geiger: Some argue shouldn’t call it data governance because the -ve connotation of “governance” thoughts?

Dan Power: I actually like that phrase – master data governance – makes it more clear and precise what we’re talking about

Dan Power: Because otherwise, data governance organization can get drawn into all kinds of weird things not related to master data

Dan Power: We need to recognize that data governance is (a) political, (b) controversial, (c) going to have an enforcement side.

Ravi Shankar: Now, do orgs do data governance first before implementing MDM or after they select an MDM product?

Dan Power: So in some ways, I actually like the term “data government” better – makes it more explicit what we’re talking about.

Dan Power: And it reminds people that we’re talking about governing the enterprise’s core master data – just like we govern other key assets.

Jakki Geiger: I think the challenge is that we’re still in the process of understanding that data is a strategic asset.

Dan Power: It’s ideal if they can start data governance before even selecting a product – so that the data governance org. can help w/ the selection process.

Ravi Shankar: Dan wrote an excellent whitepaper – “When Data Governance Turns Bureaucratic” – you can download it from http://bit.ly/ck2Gw8.

Dan Power: Truly competitive 21st century companies not only understand that data is a strategic asset, it’s how they run their business.

Dan Power: Forward looking businesses like Google, Amazon, Century 21, eBay, etc. realize that the data IS their business!

Jakki Geiger: “Data as strategic asset” is a fairly new concept. Visionaries recognize need 4 scale and intelligence=harnessing & analyzing data.

Dan Power: That was a fun white paper to write – looking forward to doing another one with the great folks at Informatica again soon!

Jakki Geiger: What I liked about Dan’s WP was the discussion around stopping the problem of data quality at the source.

Seth Grimes: Is data governance also (d) useful on balance and (e) capable of delivering ROI?

Dan Power: Yes, of course – or people wouldn’t be doing it. You can’t bring together massive amounts of data in an MDM hub and not have some type of governance framework in place. And if there was no ROI, it wouldn’t be happening.

Dan Power: I’m pretty familiar with Oracle’s data governance program, and for a huge company, it’s not real expensive.

Ravi Shankar: Welcome to #INFATJ – good data governance question.

Ravi Shankar: Successful Informatica MDM customers like J&J, Merrill, and numerous others have had strong global data governance orgs.

Ravi Shankar: Data is a key asset that many firms make a lot of money out of it – Bloomberg for e.g.

Ray Wang: RT @Ravi_Shankar_: Data is a key asset that many firms make a lot of money out of it – Bloomberg for e.g.

Dan Power: Good example with Bloomberg – welcome Ray!

Ravi Shankar: @rwang0 thx for the RT

Jakki Geiger: Can you create a career out of MDM? Many of our customers have extended MDM to address more and more issues in their orgs.

Dan Power: Good Q, Jakki – u can create a career out of it, I have for the last 6 years, but you’ve got to really have this in your blood

Ravi Shankar: Within Informatica customers, we’ve seen careers of several people take off b/c of successful #MDM data governance.

Julie Hunt: Thanks for great tweet jam!

Jakki Geiger: Thank you for participating! Looking forward to next time. Good luck to you all!

Dan Power: Thanks for joining us today – hope you enjoyed it! Check out the Hub Designs Blog at http://blog.hubdesigns.com.

Ravi Shankar: Thx for your insightful discussion and advice on #MDM data governance. Hope you all enjoyed it. Until next time!

Dan Power: This is Dan Power, signing off – have a great day everyone!

25
Oct
Guy Kawasaki as Evangelist

The Need for MDM Evangelism

For a long time now, I’ve admired Guy Kawasaki, one of the early Apple employees responsible for marketing the Macintosh computer in 1984. He’s credited with being one of the people to bring the concept of evangelism, in his case focused on creating passionate users and developers to become advocates for Apple, to the high tech business.

I’ve tried to emulate him by being an evangelist for customer and product MDM. From 2001 to 2004, I was a consultant working with the precursor to Oracle’s Customer Data Hub platform. At D&B from 2004 to 2007, I managed its strategic alliance with Oracle while Oracle launched and refined Customer Data Hub. I left D&B to start Hub Designs in 2007 because I wanted to work more directly in developing and executing MDM strategy at corporate clients. All this time, I’ve tried to get people excited about using the evolving technology to solve business problems.

In the past nine years, in all of the different industries and companies I’ve worked with, most have quickly “gotten” MDM:

  • They understand the value of the Single View of the Customer (or Product, as the case may be).
  • They see the revenue increases from being able to up-sell and cross-sell customers by knowing more about them, and by knowing their own products better.
  • They understand the dollar value of having a streamlined, coordinated New Product Introduction process.
  • They see the short payback period and millions in savings from a strategic sourcing program that consolidates vendors and products, and renegotiates agreements.
  • They understand the contribution MDM makes to credit risk management (know your customer, and whether they can pay their bills on time).
  • And they see how MDM (done properly, which includes data quality improvement and a data governance program) makes it much easier and more efficient to have accurate, complete, timely and consistent information available for compliance with governance regulations.

But all of those organizations, where I’ve been the “external champion” or evangelist, have needed a corresponding “internal champion” or evangelist.

Someone to lead the charge internally, to have the hallway conversations, to fight the good fight politically, to scrap for every budget dollar, to convince the powers that be, the type of person who digs in and doesn’t let go. Someone who’s convinced that master data management and data governance are important to his or her company. That it’s so important that it’s worth going out on a bit of a career limb. Or who perhaps was brought in specifically to head up an initiative like this.

My friend Tom Carlock wrote a great article called “So You Want to be a Data Champion?”, where he discusses how to be prepared to be your organization’s “data champion”. Tom knows whereof he speaks, because he’s been in roles like that at The CIT Group and AIG, and is now a leader of product strategy at D&B. He mentions attributes like being able to have a consistent vision that you can “sell” to others, the ability to develop and maintain relationships, being able to listen, ask for input and deal with objections, and being optimistic, hopeful and patient.

To that I would add, being persistent. My father introduced me to a quote by Calvin Coolidge, the 30th U.S. President:

“Nothing in this world can take the place of persistence. Talent will not; nothing is more common than unsuccessful people with talent. Genius will not; unrewarded genius is almost a proverb. Education will not; the world is full of educated derelicts. Persistence and determination alone are omnipotent.”

If you decide to become an MDM evangelist at your company, and you’re persistent in that role, you can help your company manage master data as an enterprise-wide asset – and transform itself in the process. I think our corporations today – ten years into the twenty-first century – desperately need that type of innovation and change.

21
Oct
Complexity

Master Data Management Best Practice #9 – Don’t Underestimate the Complexity

One of my favorite quotes is from Albert Einstein, who said “Everything should be made as simple as possible, but not simpler.”

This is very true in master data management (MDM) – where you’ll inevitably come under pressure to oversimplify. It’s not uncommon to have 20-30 source systems (or more) that have to be integrated with the MDM hub. And tackling other initiatives in the enterprise at the same time (like service-oriented architecture or major ERP or CRM upgrades) can increase the pressure. MDM can help with those other initiatives but doing several things at once may increase the overall degree of difficulty.

Remember, if you oversimplify or underestimate, you’ll be under pressure to cut functionality later. Satisfying important requirements will be postponed to later phases, and the business will be disappointed.

So watch out for the temptation to oversimplify. I had a client once who was setting up a customer hub with about five very complex mainframe-based source systems. They were oversimplifying by making the integration from the source systems to the hub one-way only. So new customer records would flow to the hub, but any updates or data quality improvements made in the hub would not flow back to the source systems.

I asked them what the plan was for those updates, and their answer was “manual integration” (which, of course, is no integration at all – just data stewards manually entering the changes a second time back into the source systems). We all know how that turns out – a great opportunity to synchronize updates and data quality improvements from the hub back to the source systems goes untapped.

Another thing I’ve noticed is that data governance can be disruptive to the business unless the business itself is driving the data governance program and it has been well-planned. Then, any disruption seems to be overlooked, much as you’d be willing to overlook a bit of mess from a home renovation when you were living in the house, as long as you got your dream house at the end of the process. But if someone else (IT, for example) tries to impose governance on the business, that’s a different story. Then, any disruption tends to be bitterly resented, since it’s being imposed from the outside.

Please let us know – in the comments here or in the forums on the MDM Community – what you think of this tendency to underestimate the complexity of MDM projects. And I mean it this time – let’s have your comments and “war stories”!

The next article in the series is: MDM Best Practice #10 – Use a Balanced, Holistic Approach

9
Sep
SAP

Speaking at SAP Virtual Trade Show

Hub Designs is an associate member of SAP’s alliance program, and on September 23rd, Dan Power from Hub Designs will be speaking at an SAP virtual trade show being put on by SearchSAP.com and TechTarget.

This free virtual seminar is focused on best practices for maximizing SAP performance. The day long virtual event features expert presentations, live panels and expert networking opportunities to help you make the most of your SAP environment, and will cover the hottest topics in SAP right now – including business intelligence, virtualization, master data management and mobile technologies. You’ll learn tips that you can put into practice immediately and you’ll get unbiased advice for long-term strategy development. At this unique online event, go beyond the hype and get insight into the latest technologies and best practices you can use to improve operational efficiency in SAP environments.

Dan Power’s session will be at 1:30 pm EDT, and will cover topics such as:

  • Definitions of master data management, data governance and data quality
  • The five essential elements of MDM
  • Why companies are doing MDM and what this means to you
  • Getting started on an MDM roadmap
  • Is your organization ready?
  • Creating the MDM business case
  • MDM software selection
  • Some important best practices

For more information, please visit http://searchsap.techtarget.com/feature/Getting-the-most-out-of-your-SAP-environment,  and to register, please click here.

8
Sep

Call for Papers for MDM Track at OAUG COLLABORATE 2011

Oracle Applications Users Group

I’ve been involved in the Oracle Applications Users Group (OAUG) since 1995, and have been a member of the OAUG Education Committee for several years now. The Education Committee is starting to plan next April’s COLLABORATE 11 Conference, and I’m managing the “Master Data Management” track.

Together with the Special Interest Group (SIG) coordinators for the Customer Data Management SIG and the Oracle Enterprise Product Lifecycle Management SIG, we invite YOU to submit a paper for the 2011 conference’s MDM track.

Our vision for the MDM track at COLLABORATE 11 is to have:

Here are the important facts from the OAUG Call for Papers:

You’ll have the opportunity to connect with more than 5,000 users, technology leaders, Oracle executives and solution innovators gathering for the user-driven education and networking event April 10 – 14, 2011 at the Orange County Convention Center West in Orlando, Florida. Proposals are now being accepted. The deadline is Friday, October 1, 2010 at 11:59 p.m. EDT. To submit a paper, go to http://collaborate.oaug.org/submit/.  For more information, you can go to http://collaborate.oaug.org/presenterinfo/.

Note to Oracle Employees: All Oracle employees interested in speaking at COLLABORATE 11 are to submit your papers through the Call for Papers submission form. Please contact speakerprograms@oaug.com for assistance with technical difficulties. For all other inquiries, please contact Lisa Stuart at lisa.stuart@oracle.com.

30
Aug
photo by Wonderlane

Our MDM Strategy Offerings

Recently, I put together an overview of Hub Designs’ MDM strategy offerings for a potential client. Here’s a recap.

Education

  • Based on our popular “Best Practices in MDM and Data Governance” speaking engagements, presented at Oracle OpenWorld and the Oracle Applications Users Group COLLABORATE conference.
  • Our workshops get business & IT professionals up to speed quickly
  • You get access to the best MDM experts, and can bring your business people into the process early

Roadmap

  • Based on Hub Designs’ MDM framework
  • Defines where you are now, where you want to be, and over what time period
  • Looks at master data management, data integration, data quality, and data governance over time

Readiness Assessment

  • Looks at issues relating to politics & culture
  • Performs skills assessment on people who may need training
  • Examines process issues, outlining where business processes need improvement or redesign
  • Investigates technology issues, detailing where essential components are not present or not able to support your upcoming MDM initiative
  • Performs data profiling to discover data quality issues

Business Case

  • Captures business requirements
  • Identifies stakeholders and selects metrics
  • Baselines current performance
  • Negotiates expected benefits
  • Converts to financial results
  • Develops total cost of ownership
  • Calculates hard-dollar ROI

Software Selection

  • Develops selection criteria
  • Creates a weighted vendor scoring model
  • Includes functionality, technology, viability, costs, services and vision
  • Develops demo scripts for vendors to follow and sample data sets to give them
  • Manages proof of concept (POC) process
  • Assists in evaluating POC performance and scoring vendors
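
As a rough illustration of a weighted vendor scoring model, here is a minimal Python sketch. The six criteria names come from the list above; the weights, the 1-5 scoring scale, and the function names are purely illustrative assumptions, not Hub Designs’ actual model.

```python
# Hypothetical weights for the six criteria named above; they must sum to 1.0.
WEIGHTS = {
    "functionality": 0.30,
    "technology": 0.20,
    "viability": 0.15,
    "costs": 0.15,
    "services": 0.10,
    "vision": 0.10,
}


def weighted_score(scores, weights=WEIGHTS):
    """Return the weighted sum of per-criterion scores (assumed 1-5 scale)."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(weights[c] * scores[c] for c in weights)


def rank_vendors(vendor_scores, weights=WEIGHTS):
    """Return (vendor, score) pairs sorted from highest to lowest score."""
    ranked = [(name, weighted_score(s, weights)) for name, s in vendor_scores.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

In practice the weights would be agreed with the stakeholders before any vendor demo, so that the scoring cannot be tuned after the fact to favor a preferred vendor.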

These engagements range in length from one to twelve months, with teams varying from two to ten people, depending on the size of the company, the number of domains of master data  involved, and the complexity of the politics and legacy systems in the enterprise.

If you’re interested in discussing an MDM strategy engagement like this, please contact Hub Designs at http://www.hubdesigns.com/contact_us.html. Or if you have comments on the above approaches, please let us know by commenting here.

30
Jul

Data Profiling For All The Right Reasons, Part 5

The Hub Designs Blog welcomes the final installment of this great series by Rob DuMoulin, an information architect with more than 26 years of IT experience, specializing in master data management, database administration and design, and business intelligence.

Part 5: The Profiling Payoff

This is the final part of a five-part series, describing how data profiling benefits both IT projects and business operations.  In Part One, we discussed profiling perspectives.  In Parts Two, Three and Four, we introduced the value of system, entity, and attribute-level metrics.  This part discusses the archival and beneficial uses of profile results.

If you have defined your corporate data profiling strategy similar to the methods discussed in the preceding parts of this series, you’ll have amassed a robust collection of metadata spanning relevant systems across your business.  Although systems may be of different types and locations, the structured approach and common metrics you collected create a centralized repository of information that can be examined holistically. Ideally, this information will exist in an open-source database repository with reports made available across the enterprise. System and Entity information help planners and developers organize information strategies. Attribute-level domains, constraints, and business rules help data architects understand existing systems. Relationships and value patterns are readily available to support validation of information-related hypotheses as needed.

If you plan to design your own repository, consider adding timestamps and indicators to help you manage and present the information. To keep your repository relevant to business needs, design collection rules to be configurable. This allows you to easily ignore superfluous information or enable tests only at certain critical times. Allow initial system profiling efforts to gather a large set of metrics and store them as your baseline. As you learn about the information, you will see which tests or which data objects add no value. Those of us geeky DBA-types who understand system-level catalogs have our own scripts to do much of what was described in Parts Two, Three and Four. Those less-inclined may prefer to use a third-party tool for profiling. Either way works as long as the business needs are satisfied and the entire enterprise standardizes on one approach (and thus one integrated repository).
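
To make the idea of timestamps, indicators, and configurable collection rules concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The table names, column names, and helper functions are all hypothetical; a real repository would carry far more metadata.

```python
import sqlite3
from datetime import datetime, timezone


def create_repository(conn):
    """Create two hypothetical tables: configurable rules and timestamped results."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS profile_rule (
            rule_name TEXT PRIMARY KEY,
            enabled   INTEGER NOT NULL DEFAULT 1     -- 0 = skip this test
        );
        CREATE TABLE IF NOT EXISTS profile_result (
            entity_name  TEXT NOT NULL,
            rule_name    TEXT NOT NULL,
            metric_value REAL,
            is_baseline  INTEGER NOT NULL DEFAULT 0, -- 1 = initial baseline run
            collected_at TEXT NOT NULL               -- ISO-8601 UTC timestamp
        );
    """)


def set_rule(conn, rule_name, enabled=True):
    """Switch a collection rule on or off without changing any code."""
    conn.execute(
        "INSERT OR REPLACE INTO profile_rule(rule_name, enabled) VALUES (?, ?)",
        (rule_name, int(enabled)),
    )


def record_metric(conn, entity_name, rule_name, value, is_baseline=False):
    """Store a metric only if its rule is enabled; return True when stored."""
    row = conn.execute(
        "SELECT enabled FROM profile_rule WHERE rule_name = ?", (rule_name,)
    ).fetchone()
    if row is None or row[0] == 0:
        return False
    conn.execute(
        "INSERT INTO profile_result VALUES (?, ?, ?, ?, ?)",
        (entity_name, rule_name, value, int(is_baseline),
         datetime.now(timezone.utc).isoformat()),
    )
    return True
```

Disabling a rule with `set_rule(conn, "pattern_mask", enabled=False)` leaves earlier results in place but stops new ones from being recorded, which is one simple way to ignore tests that turn out to add no value.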

You will find that collecting and maintaining this level of detail has a definite cost. Even if the collection is automated, interrogating large data sets places an overhead on production systems that may not be practical. Record and monitor profile execution metrics to identify bottlenecks or tuning opportunities. Realize that the extent of data profiling is contingent on the project phase, specific data elements, and most of all, business value. Review profiling goals on a regular basis and eliminate unnecessary and redundant checks.

How much profile history to maintain is another consideration.  Even though disk is “relatively” cheap, maintaining all historical entries in a live repository may not be necessary. Consider business needs and value for historical profile information. Even consider archiving at a summarized (or less frequent) level and keep only a limited time window of statistics online.
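
One possible way to keep a limited window online while archiving older statistics at a summarized level is sketched below in Python. The 90-day window, the monthly averaging, and the function name are assumptions chosen only for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta


def split_history(entries, today, keep_days=90):
    """Split (date, value) entries into recent detail and monthly averages.

    Entries newer than the cutoff stay at full detail; older ones are
    collapsed into one average per (year, month).
    """
    cutoff = today - timedelta(days=keep_days)
    recent = []
    buckets = defaultdict(list)
    for collected_on, value in entries:
        if collected_on >= cutoff:
            recent.append((collected_on, value))
        else:
            buckets[(collected_on.year, collected_on.month)].append(value)
    summary = {month: sum(vals) / len(vals) for month, vals in sorted(buckets.items())}
    return recent, summary
```

The summarized part could then be written to cheaper archive storage, while only the recent entries remain in the live repository.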

This discussion on data profiling was intended to broaden perceptions of what it means to a business and the value it can bring if done in a sustainable way. The blog format is not conducive to in-depth discussions, but hopefully the topics covered here spur some thoughts into how you can add value to your business by implementing some of these concepts.  Use your imagination, but remember that no matter how cool it might be to collect and store some profile output, if it does not add business value to somebody, it might not be worth the overhead to continue recording it.

29
Jul

Data Profiling For All The Right Reasons, Part 4

The Hub Designs Blog welcomes Part 4 of this series by Rob DuMoulin, an information architect with more than 26 years of IT experience, specializing in master data management, database administration and design, and business intelligence.

Part 4: Profiling Relationships and Patterns

This is part four of a five-part series describing how data profiling assists in all aspects of system development, from design through deployment.

Part One introduced different perspectives on data profiling. Part Two identified valuable system and entity metrics to track. Part Three discussed attributes. In this segment, we dive deeper into attribute relationships and pattern recognition. We also expand on the primary key identification discussion and look at hidden relationships.

Pattern grouping provides a mask of distinct format patterns within an attribute data set and a count of the number of occurrences. Patterns give insight into the type of values found in an attribute. For example, a numeric pattern analysis may show values such as 999.99999, 99, or -.9999.
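
A minimal Python sketch of pattern grouping follows. It uses the “9” mask for digits shown in the example above; masking letters as “A” and leaving punctuation unchanged are assumptions, as are the function names.

```python
from collections import Counter


def pattern_mask(value):
    """Map each digit to '9' and each letter to 'A'; keep other characters."""
    out = []
    for ch in str(value):
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)
    return "".join(out)


def pattern_counts(values):
    """Count how often each distinct mask occurs, ignoring NULLs (None)."""
    return Counter(pattern_mask(v) for v in values if v is not None)
```

Running `pattern_counts` over a character column that holds dates would, for example, separate values shaped like `9999-99-99` from values shaped like `99/99/9999`, which is exactly the kind of formatting variation worth investigating.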

Observing distinct patterns gives insight into the maximum digits and precision, and also domains such as integer or real. Patterning a native database date or date-time type yields nearly identical patterns for all dates. Because the database management system typically enforces the domain, pattern analysis of such dates provides no value and can be ignored. If dates are stored in character format, however, patterns quickly show variations in date formatting. Character patterns are only significant for a limited number of positions. It makes no sense to pattern a description field of 200 or 2000 characters. Smaller code attributes of less than 10 characters, though, do provide value. Ignore pattern profiling for character strings over 20 characters at first, then refine to shorter character strings if the results do not add value.

In pure database theory, referential integrity (RI) is your friend. In practice, designers and software vendors often forgo RI to improve system performance on data inserts. These designers place the data quality burden on the application and do not endorse external data manipulation outside the application interfaces. In the real world, though, data corruption occurs and without RI or routine data quality checks, corruptions may not be found for a long time or not at all. Personally, I have identified over $50,000 of recent orphaned sales in a retail client resulting from deliberately disabled RI. These unreported sales were not added to the ledger and were allowed to occur for performance reasons until I found them through simple profiling. Enforcement of RI is a topic for another discussion but is mentioned here because it does identify a valid reason for data profiling.

Even in presumably good relational designs, some parent-child relationships are not enforced, for various reasons. First, interrogate the RI listed in the system catalogs to identify all enforced relationships. Reverse-engineering a system with a good modeling tool is probably the best way to do this. A harder and more valuable analysis is to identify unenforced relationships and determine the probability of a relationship when not all values are an exact match. Do this by counting all the candidate child attribute values that exist within a known parent attribute table. If all match and there is a non-trivial number of matches, there is a good probability of a non-identified relationship. A small number of mismatches could identify data quality issues.
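
The counting approach described above can be sketched in a few lines of Python. This version works on in-memory lists of values rather than live tables, and the function names are hypothetical.

```python
def relationship_match(child_values, parent_values):
    """Return (matched, total, ratio) for non-NULL child values found in the parent."""
    parents = set(parent_values)
    candidates = [v for v in child_values if v is not None]
    matched = sum(1 for v in candidates if v in parents)
    total = len(candidates)
    ratio = matched / total if total else 0.0
    return matched, total, ratio


def orphans(child_values, parent_values):
    """Return the distinct child values that have no matching parent value."""
    parents = set(parent_values)
    return sorted({v for v in child_values if v is not None and v not in parents})
```

A ratio of 1.0 over a non-trivial number of rows suggests an unenforced relationship; a ratio just below 1.0 suggests the relationship exists but some rows are orphaned, and the `orphans` list shows where to look.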

In Part 5, we tie all the techniques discussed in the first four parts together to show the value of a repeatable data profiling process.

28
Jul

Data Profiling For All The Right Reasons, Part 3

The Hub Designs Blog welcomes Part 3 of this series by Rob DuMoulin, an information architect with more than 26 years of IT experience, specializing in master data management, database administration and design, and business intelligence.

Part 3: Attribute-Level Analyses

This is part three of a five-part series on data profiling.

In Part One, we took a light-hearted view of where profiling benefits an organization and in Part Two, we discussed the fundamentals of a profiling strategy.  The remaining three parts discuss attributes, relationships, patterns, and how to use the combined data profiling information you collect.  In this section, we introduce attributes, the lowest-level components of a profiling effort.

An attribute is simply an individual data element. Alone, an attribute has no context. The simple descriptor “Cost” tells us very little about the attribute’s true purpose and immediately drives a need for additional information, such as units (hours, Dollars, Euros…), type (weighted, unit, gross…), and use (invoice, sum, average…). Attributes therefore must be analyzed within the context of their business purpose to have meaning.

Some characteristics require business knowledge to define and others can be determined through interrogation of existing values and underlying rules of the storage medium. It takes both analyses to get a complete picture of information within a system. While assembling this puzzle, though, keep in mind that until you validate the enforcement of business rules, only assumptions can result from physical profiling or business context.

Analyses of values, domains, and constraints allow insight into use (or abuse) of an attribute. The larger the sample size, the better confidence you gain in the results. Without explicit proof of business rule enforcement, though, you must assume that just because a value does not presently exist does not mean it cannot exist. Business rules are defined by business experts and enforced through database constraints, data type/precision, and application code. Knowing the methods of enforcement allows you to narrow a domain but not totally understand it. Profiling of actual values provides additional refinement in terms of percentage of NULL values, percentage of distinct values, minimum, maximum, and average values, top x and bottom x recurring values along with their counts, and minimum, maximum, and average data lengths.
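
Here is a minimal Python sketch of the value-level metrics just listed, computed over a list of sampled values. The function name and the dictionary keys are hypothetical, and the choice to express distinct values as a percentage of all rows (rather than of non-NULL rows) is an assumption.

```python
from collections import Counter


def profile_attribute(values, top_n=3):
    """Compute basic profile metrics for one attribute's sampled values."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    lengths = [len(str(v)) for v in non_null]
    numeric = [v for v in non_null
               if isinstance(v, (int, float)) and not isinstance(v, bool)]
    profile = {
        "row_count": total,
        "pct_null": 100.0 * (total - len(non_null)) / total if total else 0.0,
        # Assumption: distinct values as a share of all rows, NULLs included.
        "pct_distinct": 100.0 * len(counts) / total if total else 0.0,
        "top_values": counts.most_common(top_n),
        "bottom_values": counts.most_common()[:-top_n - 1:-1],
        "min_length": min(lengths) if lengths else None,
        "max_length": max(lengths) if lengths else None,
        "avg_length": sum(lengths) / len(lengths) if lengths else None,
    }
    # Minimum, maximum, and average only make sense if every value is numeric.
    if numeric and len(numeric) == len(non_null):
        profile.update(min=min(numeric), max=max(numeric),
                       avg=sum(numeric) / len(numeric))
    return profile
```

Keep in mind the caveat above: these numbers describe the values that happen to exist in the sample, not the values that the business rules allow.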

Some attributes within a data set serve valuable purposes that are important to identify. Attributes that individually or in conjunction with others define uniqueness of the data set also may support relationships between entities.  Uniqueness can be further classified as being either members of a system-enforced primary key or of a business key (outside of the defined primary key).  System-enforced primary keys are relatively easy to define within a database system through interrogation of the system catalog.  Business keys that exist in tables in addition to a primary key may be more difficult to identify, especially if more than one attribute is needed to define uniqueness.
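
One way to hunt for business keys, including those made up of more than one attribute, is to test column combinations for uniqueness. The Python sketch below assumes rows are available as dictionaries and only tries combinations up to a configurable width; the function names are hypothetical.

```python
from itertools import combinations


def is_unique(rows, columns):
    """True if the column combination is never NULL and never repeats."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in columns)
        if any(part is None for part in key) or key in seen:
            return False
        seen.add(key)
    return True


def candidate_keys(rows, columns, max_width=2):
    """Return minimal column combinations (up to max_width) that are unique."""
    found = []
    for width in range(1, max_width + 1):
        for combo in combinations(columns, width):
            if any(set(key) <= set(combo) for key in found):
                continue  # a smaller key is already contained in this combo
            if is_unique(rows, combo):
                found.append(combo)
    return found
```

Uniqueness in a sample is only evidence, not proof: a combination that is unique today may simply not have met its first duplicate yet, so candidates should be confirmed with someone who knows the business meaning of the columns.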

Attribute-level information of interest includes: data type (size and precision), the number and percent of NULL values, column descriptions, number and percent of distinct values, and the minimum-maximum-average values and lengths. The system catalog provides some of this information, but the rest must be collected by sampling the data.
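
As a small example of reading attribute metadata from a system catalog, the Python sketch below uses SQLite, whose `PRAGMA table_info` statement returns one row per column (position, name, declared type, NOT NULL flag, default value, and primary key position). SQLite is used here only because it ships with Python; other database systems expose their catalogs differently, and the function name is hypothetical.

```python
import sqlite3


def catalog_columns(conn, table):
    """Read declared column metadata for one table from SQLite's catalog."""
    # PRAGMA arguments cannot be bound as parameters, so accept only
    # simple identifiers to avoid building unsafe SQL.
    if not table.isidentifier():
        raise ValueError("unexpected table name")
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [
        {
            "name": name,
            "declared_type": col_type,
            "not_null": bool(notnull),
            "pk_position": pk,  # 0 means not part of the primary key
        }
        for _cid, name, col_type, notnull, _default, pk in rows
    ]
```

The catalog gives the declared type and constraints; NULL percentages, distinct counts, and value ranges still have to come from the data itself.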

Other types of attributes that may help in identifying relevancy are those that provide system-level auditing or change control. Knowing which attributes fill these roles may either allow you to (a) ignore them for profiling purposes or (b) use them to help explain versions or data anomalies.

Part 4 expands on attribute profiling with the introduction of relationships and patterns.

27
Jul

Data Profiling For All The Right Reasons, Part 2

The Hub Designs Blog welcomes Part 2 of this series by Rob DuMoulin, an information architect with more than 26 years of IT experience, specializing in master data management, database administration and design, and business intelligence.

Part 2: Profiling the Basics

This discussion is the second of a five-part series on data profiling. In Part 1, we discussed the project roles that benefit from data profiling and how better understanding information results in more reliable information systems. Important goals of any profiling strategy include automation of metric collection and socializing results to support the differing objectives of a data-centric project.

Early in a system development life cycle, profiling helps define sources, data storage requirements, and data transformations. As a system goes into production (or if profiling is added to an existing system for quality control purposes), routine profiling is useful to audit system quality and business rule enforcement. The frequency of collection and amount of effort you expend to automate your profiling methods should be based on the ability of the organization to benefit from the profile results.

This section discusses the beginnings of a profiling effort. Information assembled here forms the foundation of other profiling activities. For this discussion, consider a Profile Group as a set of information sharing a common purpose and data management methods. Examples of profile groups include tables within a single database schema or a group of spreadsheets with the same format but each spreadsheet representing a different time slice of data.

The underlying System managing a set of information within the profile group may be a named relational database, a file system directory, or even a web site being accessed through web services. The reason we abstract information into Systems is to group the information into distinct governance methods common to the underlying information. Relevant metadata and governance methods we track at the system-level include: technical contacts, backup schedules, system descriptors, connection strings, business unit owners, and host operating systems. System-level metadata common to a profile group helps us understand and troubleshoot future analyses. This level of information also provides developers with an understanding of inherent restrictions (or freedoms) they may encounter when trying to use or integrate the information.

Entities within a profile group belong to the same system, may have a common unique identifier, and, for database entities, have the same schema owner. Typically, entities are database tables, but may also be similar files or spreadsheet tabs containing like attribute lists. For entities, we track characteristics common to all the attributes they contain. These include: row counts, entity-level descriptors, growth characteristics (size and frequency), last analyzed date, and various customized indicators such as active/inactive, existence of change data management attributes such as insert/update timestamps, and existence of audit traceability indicators such as insert/update username.
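
To make the entity-level characteristics concrete, here is a minimal sketch that collects a few of them from a SQLite database using Python's standard library. The audit column names (insert_ts, update_ts, insert_user, update_user) and the sample tables are assumptions made for the example; real systems will follow their own naming conventions, and growth characteristics would require comparing snapshots over time.

```python
import sqlite3
from datetime import datetime, timezone

# Assumed naming conventions for change-tracking and audit columns.
TIMESTAMP_COLUMNS = {"insert_ts", "update_ts"}
USERNAME_COLUMNS = {"insert_user", "update_user"}


def profile_entities(conn):
    """Return one entity-level profile dict per user table in a SQLite database."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    profiles = []
    for table in names:
        if table.startswith("sqlite_"):
            continue  # skip SQLite's internal bookkeeping tables
        quoted = '"' + table.replace('"', '""') + '"'
        row_count = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        # PRAGMA table_info returns one row per column; index 1 is the column name.
        columns = {row[1].lower() for row in conn.execute(f"PRAGMA table_info({quoted})")}
        profiles.append({
            "entity": table,
            "row_count": row_count,
            "has_change_timestamps": TIMESTAMP_COLUMNS <= columns,
            "has_audit_usernames": USERNAME_COLUMNS <= columns,
            "last_analyzed": datetime.now(timezone.utc).isoformat(),
        })
    return profiles


# Hypothetical sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, insert_ts TEXT, update_ts TEXT);
    CREATE TABLE region (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO customer (name) VALUES ('Acme'), ('Globex');
""")
profiles = profile_entities(conn)
```

Running a collector like this on a schedule and storing each result is one way to build up the growth history mentioned above.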

The combination of system- and entity-level profiling supplies the foundation for attribute-level profiling, which is where physical information in a system resides. It also provides valuable metadata to classify information and allows for future correlation of like information across systems. Assembling and publishing entity- and system-level information benefits the various consumers of the information by providing a centralized “master” source of contact and context information.

In Part 3, we will dive into the attribute level analyses around data profiling.

26
Jul

Data Profiling For All The Right Reasons, Part 1

The Hub Designs Blog welcomes a guest post by Rob DuMoulin, an information architect with more than 26 years of IT experience, specializing in master data management, database administration and design, and business intelligence.

Part 1: The Psychology of Data Profiling

Swiss psychologist Carl Gustav Jung founded the Analytical School of Psychology. His word association theories form the basis of the Myers-Briggs Type Indicator Assessment test to identify career aptitude in today’s high school students. Dr. Jung’s approach assigned personality profiles based on how an individual’s thoughts associated to various phrases. By analyzing responses, he could understand how an individual viewed the world around them and perceived themselves. Typically, subjects are asked to speak the first thought entering their minds after hearing a trigger phrase. For the following example, remember, there are no wrong answers. If I say the words “Data Profiling”, what is the first thing you think of?

If you thought of food, cats, country music, CSI NY, or residential plumbing, you are either not in IT or are an IT Manager.

If your first thought was “Quality Assurance”, you align yourself with data quality professionals having anti-social thoughts of failing test cases and sadistically reporting lazy developers for buggy code. You gleefully scour test cases looking for any evidence of truncation, missing values, non-matching codes, numeric precision errors, and inconsistent abbreviation, text, and date formatting.

If “Integration” comes first in your mind, past legacy integration projects have scarred you with a disdain for source system data quality levels. You view production apps with contempt and loathe the time it takes to track down data issues caused by system integrations. You investigate upstream sources to create detailed mappings and transformation rules. Typical debugging sessions consist of validating relationships to identify orphaned data, identifying attributes that contain overloaded columns (attributes containing more than one distinct data element), or fixing format errors from implied decimals.
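
For the integration-minded, two of the checks mentioned above can be sketched in a few lines of Python. The table names, column names, and the two-digit implied decimal are assumptions for the example, not a description of any specific source system.

```python
import sqlite3
from decimal import Decimal


def find_orphans(conn):
    """Return ids of orders whose customer_id has no matching customer row."""
    rows = conn.execute(
        "SELECT o.id FROM orders o "
        "LEFT JOIN customer c ON c.id = o.customer_id "
        "WHERE c.id IS NULL ORDER BY o.id")
    return [row[0] for row in rows]


def apply_implied_decimal(raw, places=2):
    """Convert a digit string with an implied decimal point into a Decimal."""
    # scaleb shifts the decimal point without any floating-point rounding.
    return Decimal(int(raw)).scaleb(-places)


# Hypothetical sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount TEXT);
    INSERT INTO customer (id, name) VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO orders (id, customer_id, amount) VALUES
        (1, 1, '0012345'), (2, 2, '0000500'), (3, 99, '0000001');
""")
orphans = find_orphans(conn)
amount = apply_implied_decimal("0012345")
```

Here order 3 points at a customer that does not exist, so it is the only orphan reported, and the raw string '0012345' is read as 123.45.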

Some of you responded with “Value Domains” or “Data Types”, indicating you are obsessive-compulsive data architects compelled to organize the world in a strict and orderly fashion with some degree of normalization, though you are not considered “normal” by your peers. Your concerns lie in understanding and regulating naming conventions, relationships, existence of NULL or default values, and understanding the meaning of each data element to accurately identify business rules and when two or more objects are related or redundant.

Lastly, if “Debugging” is the first item in your thought queue, you are a coder justifying why presumably good code is not working. Extreme paranoia has taught you to assume nothing about data quality, so you add tests to identify duplicates, validate relationships, enforce business rules, track change data capture, and provide substitute values. Your phobia of early morning phone calls causes you to add auditing to your code to inform a DBA of data issues rather than waking you up in the middle of the night.

It is truly amazing how much we can conclude from the response to one simple phrase.

As stated before, there are no wrong answers. Aside from the innocent jab at Managers and non-IT resources, we all realize the benefits of information quality and absolutely need business involvement to understand context and domains of business information. The meaning and actions of Data Profiling change both by role and by project phase. Through profiling, we are able to identify best sources of information, learn proper ways to categorize and store it, reactively identify quality issues, and proactively define business rules to prevent future issues.

Identifying what is important to profile, when and how profiling is done, and how to share our findings across business and project resources is key. Done properly, profile results are integrated into a master metadata repository and periodically refreshed through an automated process.

This five-part series provides a tool-agnostic approach to comprehensive data profiling, focusing on information meaning and use. The next part of the series discusses system and table-level profiling: in particular, what information is important to collect at the system and table level, and how the Enterprise can leverage that information to help assure quality. The third part dives into attribute-level profiling and the fourth discusses attribute patterns and relationships. The final part discusses the benefits and utility of gathering profiled information into a single repository.
