28msec: Taxonomy-Driven Data Management for Complex Regulatory Reporting

The Hub Designs MDM Think Tank was first introduced to 28msec when we met with Eric Kish, who served several years as its CEO. 28msec provides solutions that target complex data management use cases in particular, as well as data governance and data analytics. It was founded in 2008 in Zurich, Switzerland, and is now headquartered in Menlo Park, California. In 2013, the company came out of “stealth mode” to openly market its data solutions.

28msec grew out of a research project started by Ph.D. students at ETH Zurich and several outstanding data management experts. Its founders include Roger Bamford, an original team member and principal architect of the Oracle database; Donald Kossmann, who continues as a professor at ETH Zurich and serves as a principal researcher for Microsoft; and Daniela Florescu, whose accomplishments include the development of the XQuery and JSONiq languages and of Zorba, a virtual query engine that can handle multiple query languages and data processing capabilities.

The 28msec team spent several years developing technologies that are meant to revolutionize methods for processing and managing data. The centerpiece technology is a massively scalable query engine for JSONiq, designed to process any kind of data and text, from structured to unstructured.

The company name can now be seen as a historical reference, evoking a time not too long ago when it took 28 milliseconds for a database to access data stored on a hard disk. It still stands out as an interesting company name that should pique the interest of those looking at data management vendors.

Why Taxonomy-Driven?

Before organizations faced the difficulties of processing large volumes of disparate data, taxonomies were principally utilized for cataloguing unstructured digital content and documents. Taxonomies are now important to help provide structure and definition to the more complex data and information with which organizations want to work. Many useful data sources are unstructured or multi-structured, and are challenging to process, validate and connect to relevant context.

Many master data management solutions now process unstructured information, where master data provides context and relevance. So it seems a natural step for data management solutions to work more extensively with taxonomies to attain documented understanding about any data source and the relationships between data components. Taxonomies can provide a wide array of attributes including semantic structure, context, agreed-upon meanings for data terms, and relationships to other terms.

For the 28msec approach to processing data, taxonomies take the place of modeling and schemas. Taxonomies that are flexible and dynamic semantic structures drive and accelerate processing to meet ever-changing business situations and regulatory requirements. The 28msec solution acts as middleware that enforces all aspects of taxonomies including business rules and validation routines.


The Right Problems to Solve

After many years of developing their technologies, the company researched industry problems that would benefit from the 28msec approach of leveraging taxonomies for data processing and analytics.

They saw the complex requirements for global regulatory compliance and reporting for the financial services industry as a fertile area for their solution. Financial institutions need solutions for regulatory reporting as well as a data management platform that can help them better manage and understand their data as it relates to their compliance risks. The U.S. was the first global market that 28msec targeted for this area, but a growing number of customers are now in Europe and Japan.

Regulatory reporting for the financial industry is based on XBRL (eXtensible Business Reporting Language), which depends on sophisticated taxonomies and validation rules. Taxonomies for regulatory reporting can change quite often; each regulatory filing by an organization frequently has a different taxonomy. Traditional data warehousing approaches can’t handle disparate taxonomies; five years of filings will likely overwhelm traditional modeling. Organizations that try to generate XBRL regulatory reports based on traditional data warehousing approaches and database schemas will find it very costly in terms of time, resources and money, and may still end up with inadequate and unreliable results.

Since data warehousing can be too unwieldy to be effective for regulatory reporting needs, organizations often try manual approaches based on Excel spreadsheets. Again, using spreadsheets as the data source is often error-prone, with no guarantee of accuracy and no means to prove it. Yet organizations continue to derive XBRL from Excel worksheets for regulatory reporting.

28msec provides another financial services solution related to BCBS239, which comprises guidance from the Basel Committee on Banking Supervision for managing and processing data in the banking system. Regulators require more extensive reporting based on data that is timely and accurate, no matter the source systems. Eric mentioned that the U.S. Federal Reserve System has failed two banks for lacking rigorous data management capabilities that comply with the rules.

Financial institutions are still early in adopting solutions for these problems. Often they don’t fully grasp the issues or even understand how to figure out what the solution should be. They also lack an understanding of the overall costs: time, money, resources, commitment to a new way of doing things, and an overhaul of data management in the organization. Many financial firms are overwhelmed just thinking about the work that will have to be done.

Aggregating Volumes of Information: Leveraging Data Lakes and Cell Stores

Large volumes of disparate data are swamping traditional processing methods. It’s a major challenge to bring all the data together in meaningful and accurate ways that can be safely used for business requirements. As Eric noted, a great deal of the data that organizations want to use is multi-dimensional, which relational models don’t handle very well. Data also often comes in complicated forms: hierarchical relationships, networks of influence, multi-structured formats, and information lacking context.

28msec uses regulatory taxonomies to analyze data across an entire financial organization to help derive a “single source of truth” for reporting purposes. Since data lakes store data in native formats, they have a natural role in the 28msec approach. Structural definition only comes into play when the data in the data lake is needed for processing. Then 28msec uses cell stores to transform data, utilizing taxonomies for processing purposes. Cell stores have an affinity with data lakes since cell arrays can contain data of variable types and sizes, and can handle data multi-dimensionality. Cell stores are able to query across all archives to bring together large amounts of disparate data, and can scale up to handle tens of thousands of dimensions, with no schema needed in advance.
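
To make the cell idea concrete, here is a minimal sketch in Python. It is not 28msec's implementation or API; the `cells` list, the dimension names, and the `query_cells` helper are all hypothetical and only illustrate facts stored as individual cells that are filtered by dimension without a schema defined in advance.

```python
from typing import Any

# Each cell is one fact: a value plus the dimensions that give it context.
# Cells do not have to share the same dimensions, so no schema is fixed up front.
# All names and figures here are made up for illustration.
cells: list[dict[str, Any]] = [
    {"concept": "Assets", "entity": "BankA", "period": "2014", "value": 1200},
    {"concept": "Assets", "entity": "BankA", "period": "2015", "value": 1350},
    {"concept": "Liabilities", "entity": "BankA", "period": "2015", "value": 900},
    # This cell carries an extra "region" dimension that the others lack.
    {"concept": "Assets", "entity": "BankB", "period": "2015", "region": "EU", "value": 700},
]


def query_cells(store: list[dict[str, Any]], **criteria: Any) -> list[dict[str, Any]]:
    """Return every cell whose dimensions match all of the given criteria."""
    return [
        cell
        for cell in store
        if all(cell.get(dim) == wanted for dim, wanted in criteria.items())
    ]


assets_2015 = query_cells(cells, concept="Assets", period="2015")
print([(cell["entity"], cell["value"]) for cell in assets_2015])
```

Because a missing dimension simply fails to match, a cell with an extra dimension such as `region` sits alongside the others without any structural change to the store.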


Cell Stores for XBRL Reporting

In a cell store architecture, the data is stored at the cell level as part of a single collection where the data can be clustered, replicated, and retrieved efficiently. While the cell store natively supports the XBRL format, the data is “decoupled” from the XBRL. Cell stores can therefore avoid exposing complex XBRL syntax to business users. Instead, business users are able to work with data structures derived from familiar artifacts like spreadsheets.

Through the alignment with the XBRL structures related to regulatory reporting, the 28msec cell store functions as middleware that can sit on top of different data stores (graph, columnar, etc.) to enable real-time processing and reporting. The cell store runs report-building in parallel with the processing of validation rules.

Cell stores also scale up seamlessly with queries that can contain more than 10 dimensions. Cells can be assembled into virtual data cubes using hypercube queries, or assembled into spreadsheet views that are friendly to business users for read and write activities. Business users can define their own taxonomies, schemas, maps, and rules, without any involvement of the IT department. These end users can take advantage of their existing BI and analytics tools for processing and reporting.
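
As a rough illustration of assembling cells into a cube and then into a spreadsheet-style view, consider the following Python sketch. The `hypercube` and `spreadsheet_view` functions and the sample data are assumptions made for this example; they are not 28msec's query language or API.

```python
from collections import defaultdict
from typing import Any

# Hypothetical cells: one value per combination of dimensions.
cells: list[dict[str, Any]] = [
    {"concept": "Assets", "entity": "BankA", "period": "2014", "value": 1200},
    {"concept": "Assets", "entity": "BankA", "period": "2015", "value": 1350},
    {"concept": "Liabilities", "entity": "BankA", "period": "2015", "value": 900},
    {"concept": "Assets", "entity": "BankB", "period": "2015", "value": 700},
]


def hypercube(store: list[dict[str, Any]], restrictions: dict[str, set]) -> list[dict[str, Any]]:
    """Keep cells whose value for each restricted dimension is in the allowed set."""
    return [
        cell
        for cell in store
        if all(cell.get(dim) in allowed for dim, allowed in restrictions.items())
    ]


def spreadsheet_view(selected: list[dict[str, Any]], row_dim: str, col_dim: str) -> dict:
    """Pivot the selected cells into {row label: {column label: value}}."""
    table: dict = defaultdict(dict)
    for cell in selected:
        table[cell[row_dim]][cell[col_dim]] = cell["value"]
    return {row: dict(cols) for row, cols in table.items()}


cube = hypercube(cells, {"entity": {"BankA"}, "period": {"2014", "2015"}})
view = spreadsheet_view(cube, row_dim="concept", col_dim="period")
print(view)
```

The cube is only a selection over the same cells, so a different set of restrictions yields a different "virtual" cube without reloading or remodeling the data.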

When processing data, 28msec utilizes a data Extract and Load (EL) method, and then validates data against the current taxonomy. 28msec imports the taxonomy and sets up the EL processes. Data is mapped into a conceptual model (the most current taxonomy) for regulatory processing and reporting.
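
The load-first, validate-afterwards flow described above might look like the following Python sketch. The `taxonomy` dictionary, its rules, and the `load_and_validate` function are hypothetical stand-ins; real XBRL taxonomies and validation rules are far richer than this.

```python
from typing import Any

# Hypothetical "current taxonomy": each known concept has an expected value
# type and one simple business rule.
taxonomy = {
    "Assets": {"type": (int, float), "rule": lambda v: v >= 0},
    "Liabilities": {"type": (int, float), "rule": lambda v: v >= 0},
}

# Made-up extracted records, loaded as-is with no transformation step.
raw_records: list[dict[str, Any]] = [
    {"concept": "Assets", "value": 1350},
    {"concept": "Liabilities", "value": -5},
    {"concept": "Goodwill", "value": 40},
    {"concept": "Assets", "value": "n/a"},
]


def load_and_validate(records, taxonomy):
    """Load every record unchanged, then check each one against the taxonomy."""
    store = list(records)  # "load": keep the data in its original shape
    errors = []
    for index, record in enumerate(store):
        spec = taxonomy.get(record["concept"])
        if spec is None:
            errors.append((index, "unknown concept"))
        elif not isinstance(record["value"], spec["type"]):
            errors.append((index, "wrong type"))
        elif not spec["rule"](record["value"]):
            errors.append((index, "rule failed"))
    return store, errors


store, errors = load_and_validate(raw_records, taxonomy)
print(errors)
```

Keeping validation separate from loading means a revised taxonomy can be applied to data that is already loaded, rather than forcing a re-import.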


Solution Technology

28msec provides a unified way to access information by abstracting the storage layer and simplifying the requisite architecture. The in-memory query engine sits at the core and is available as a cloud service. The 28msec query engine can process a combination of the JSONiq, SPARQL, XQuery and SQL query languages. A parallel computing framework allows users to query and join multiple data sources in real time, whether structured, semi- or multi-structured, or unstructured.

28msec supports a flexible Risk Data Aggregation and Risk Reporting (RDARR) architecture. RDARR is the methodology recommended to banking organizations for implementing the right information technology, data architecture and data management for compliance. RDARR requirements are formalized in BCBS239 guidelines. The RDARR methodology is very much intertwined with familiar master data management and data governance functions: data lineage, metadata management, data quality, data stewardship, accuracy, and accountability.

For regulatory reporting purposes, data lineage tracks from the original data sources, through integration and aggregation processes, to directly mapping into the reports that are generated.
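
One simple way to picture this kind of lineage is to have every aggregated report value carry references to the source records behind it. The Python sketch below assumes made-up source systems, record ids, and a hypothetical `aggregate_with_lineage` helper; it does not describe how 28msec records lineage internally.

```python
# Made-up source rows from two hypothetical systems.
source_rows = [
    {"id": "ledger-1", "system": "core_banking", "amount": 100},
    {"id": "ledger-2", "system": "core_banking", "amount": 250},
    {"id": "trade-7", "system": "trading", "amount": 50},
]


def aggregate_with_lineage(rows, report_line):
    """Sum the amounts and record which source rows contributed to the total."""
    return {
        "report_line": report_line,
        "value": sum(row["amount"] for row in rows),
        "lineage": [(row["system"], row["id"]) for row in rows],
    }


total_exposure = aggregate_with_lineage(source_rows, "Total exposure")
print(total_exposure["value"])
print(total_exposure["lineage"])
```

With the references attached to the reported figure, a reviewer can trace any number in a report back to the records it was built from.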


Implementations and Performance

Because of the virtual approach of the 28msec solution, implementations take 60 days on average. However, this timeframe is predicated on taxonomies already being in place. The 28msec implementation does not come into play until customer organizations have created, tested and finalized their taxonomies. 28msec doesn’t provide services for taxonomy creation, but can refer customers to partner consulting firms. These partner consultants also handle implementations of the 28msec solution.

As for query engine performance, Eric cited this example: 28msec can query a two terabyte repository in less than 4 seconds.

Building Success in the World of Regulatory Technologies

Eric stated that 28msec has had good success with the financial sector because financial organizations have had poor technology alternatives for efficiently handling regulatory reporting as well as the management of disparate data scattered across numerous systems. 28msec gained advantage for this solution by working with regional regulators through partners and earning their approval for the 28msec approach. This gives 28msec a solid basis for talking with prospects, and has led to organizations contacting them due to regulator awareness of the 28msec solution.

About the Author

Julie Hunt is the editor of Hub Designs Magazine and co-founder of the Hub Designs MDM Think Tank. Her “day job” is as an independent B2B software industry solution strategist and analyst. She provides consulting services for vendors to help develop successful strategies for buyers, customer and user experiences, solutions, go-to-market, and future direction.
