Briefing by Informatica on PowerCenter and Data Quality v10 and Big Data Management launches
Recently, I received a briefing from Carter Lusher, in Analyst Relations for Information Quality Solutions at Informatica, on new capabilities in the company’s flagship data integration product, PowerCenter, as well as Informatica Data Quality and Data Integration Hub. Version 10 of all three products was launched last month under the banner of “traditional use cases”.
And a new product – Informatica Big Data Management – was launched this month under the heading of “next gen use cases”.
Informatica customers have been investing separately in both worlds: in traditional areas such as data warehousing, business intelligence, master data management, data governance and data quality, and in newer areas such as big data, advanced analytics, data lakes and Hadoop.
Informatica sees this as a chance to bring the two worlds together and to simplify things for customers by offering a unified platform for their data management requirements.
Traditional Use Cases
Informatica has been very successful with its flagship products – PowerCenter, Informatica Data Quality and Data Integration Hub.
The v10.0 release of PowerCenter adds new capabilities that are free to existing customers. According to Informatica, performance in some areas is up to 50 times faster, and the release adds more connectivity to cloud-based data sources. The end result should be enhanced agility for integrating enterprise data.
Data Quality v10.0 has new approval workflow and voting functionality for business terms in the glossary. Data profiling has new visualization capabilities, and the business rule builder has been enhanced. The matching functionality has more flexibility, including enhancements to Universal IDs. These new capabilities boost scalability and improve enterprise data stewardship.
Data Integration Hub now handles publish and subscribe to the Informatica Cloud, and allows management of both on-premises and cloud data pipelines. It provides Secure FTP for data exchange with remote offices, and automatically stores all data in Hadoop, with indexing for big data analytics and other big data applications. This significantly simplifies the data integration process in today’s faster, hybrid information world.
By adding these new functions, Informatica has improved its data governance capabilities, offering an end-to-end data stewardship process, employing workflow, approvals and voting, and more proactive involvement of business analysts and data stewards through the enhanced Business Rule Builder.
Next Gen Use Cases
In working with its customers, Informatica has been finding that big data projects often start out as empowering, but end up needing help. A “skunk works” project may get to the prototype stage, but then struggle to reach production. Ingesting and mapping a lot of new data sources can turn into a bottleneck. Overall, there’s a big unmet need to transform data management from a labor-intensive, qualitative approach to a systematic, quantitative approach. A lot of Hadoop implementations stall out due to the heavy manual workload.
Informatica sees a lack of end-to-end management for big data, with challenges like:
- Quickly integrating big data: big data and Hadoop don’t eliminate the need for data integration
- Certifying and governing big data: the likelihood of data quality issues is even higher than before
- Securing big data: big data can mean more proliferation of private and sensitive information
Big data can mean more volume of data, more variety of data and more velocity of data, with even more separate data platforms, more data consumers and more data silos. As a result, a lot of valuable data goes unused or is delivered too late to be useful.
Big data also complicates governance. The meaning of quality can change when the same data is used for multiple purposes. Hidden relationships embedded in the mass quantities of data may not always surface. Trust issues are magnified by the lack of control over external data sources.
Security issues are magnified as well, as the increase in sensitive data can lead to more breaches and exposure to risk.
Informatica is introducing its “Big Data Management” product, a single, comprehensive and integrated platform for end-to-end big data processing. It stresses what Informatica calls the three pillars of big data management:
- Big data integration
- Big data governance
- Big data security
Big data integration provides a simple visual environment, with highly optimized execution and a flexible deployment model.
The product contains hundreds of pre-built transforms, connectors and parsers, and provides broker-based data ingestion.
Big data governance provides a business glossary with improved collaboration capabilities, and data quality capabilities including analysis (profiling).
The product also provides end-to-end data lineage and 360-degree relationship views.
The security pillar means that sensitive data can be discovered and classified.
Users can do proliferation analyses to see exactly where that sensitive data has been replicated, and assess the enterprise-wide risks. Information can be persistently and dynamically masked.
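To make the difference between persistent and dynamic masking concrete, here is a minimal Python sketch of the general idea. It is not Informatica's implementation or API; the function names, the salt, the "steward" role and the sample value are all hypothetical and chosen only for illustration.

```python
import hashlib


def persistent_mask(value, salt="demo-salt"):
    """Persistent masking (illustrative): replace the value with an
    irreversible, deterministic token before it is stored or copied."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return "tok_" + digest[:12]


def dynamic_mask(value, role):
    """Dynamic masking (illustrative): the stored value is untouched;
    what a user sees at read time depends on a hypothetical role."""
    if role == "steward":
        return value
    # Show at most the last four characters; never reveal short values.
    visible = value[-4:] if len(value) > 4 else ""
    return "*" * (len(value) - len(visible)) + visible


ssn = "123-45-6789"  # made-up sample value
stored_copy = persistent_mask(ssn)                 # safe to place in a test copy
analyst_view = dynamic_mask(ssn, role="analyst")   # redacted at read time
steward_view = dynamic_mask(ssn, role="steward")   # privileged role sees the original
```

In this sketch, persistent masking suits copies of data that leave the controlled environment, while dynamic masking leaves the source intact and varies the view by user.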
Other Important Capabilities
Functionality includes dynamic mappings, which self-adjust in response to changes in external schemas and can process flat files whose columns arrive in a changing order or in a different number. Informatica says this leads to a 10x increase in integration developer productivity.
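The idea behind a mapping that tolerates reordered, added or missing columns can be sketched in a few lines of Python. This is only an illustration of the concept, not how Informatica implements dynamic mappings; the target fields, defaults and sample files are hypothetical.

```python
import csv
import io

# Hypothetical target schema: field name -> default used when the
# source file does not supply a value for that column.
TARGET_FIELDS = {"customer_id": "", "name": "", "country": "UNKNOWN"}


def load_rows(text):
    """Map each source row to the target fields by header name rather than
    by column position, so column order and extra columns do not matter."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for record in reader:
        # Missing or empty source values fall back to the default.
        rows.append({field: record.get(field) or default
                     for field, default in TARGET_FIELDS.items()})
    return rows


# Two made-up files: the second reorders columns, adds one, and drops one.
file_a = "customer_id,name,country\n1,Ada,UK\n"
file_b = "name,signup_date,customer_id\nGrace,2015-11-01,2\n"

rows_a = load_rows(file_a)
rows_b = load_rows(file_b)
```

Both files load into the same target shape without any change to the mapping code, which is the kind of rework a self-adjusting mapping is meant to remove.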
Informatica also added smart optimizers to boost performance, with automatic tuning and partitioning of mappings.
Big Data Management includes a component called “Blaze”, which is a Hadoop YARN-based high performance execution engine for complex batch processes. It leverages all of the compute nodes on a Hadoop cluster to provide the highest possible throughput. Blaze improves performance by eliminating intermediate data staging with efficient data exchange between nodes.
Live Data Map is an exciting capability as well: it’s a universal metadata catalog used to build and maintain a massive knowledge graph of enterprise data assets.
Informatica has a lot of great new capabilities for both the traditional and the next gen (big data) use cases. The company – now privately owned and able to take a longer view than just this quarter’s earnings – is listening hard to its customers, and helping them bridge the two worlds through reuse of infrastructure, reuse of skill sets, and reuse of work products. Return on investment is increasingly realized not only for the first project a customer executes, but through extending the new platform to second, third and fourth projects. The capabilities that Informatica provides are well-suited to that model.
About the Author
Dan Power is the Founder & President of Hub Designs, a global consulting firm focused on strategy development, solution delivery and thought leadership for master data management (MDM) and data governance. He’s the publisher of Hub Designs Magazine, one of the first online publications for MDM and data governance, with over 3,000 regular readers. Dan has written more than 35 white papers and articles, and is a frequent speaker at industry events, conferences, and webinars. To contact him, please visit http://hubdesigns.com/contact.