DBMS, July 1997
DBMS Online: Enterprise C/S By Judith Hurwitz

The Evolution of Metadata

You must be aware of the capabilities and limitations of current metadata offerings.

Organizations often decide to begin implementing a data warehouse because management is dealing with so much data that it can't make sense of it all. The need to implement a data warehouse is often accelerated when companies merge or when they are looking for ways to market existing products to additional customers because of competitive threats. Too often departmental or IT management rushes into data warehousing projects without understanding some of the most important underlying systems issues. The most important of these is the creation of a metadata model. Metadata is a set of definitions about data elements stored in traditional data sources. Because data warehouses often take information from multiple data sources, it is critical that common definitions exist across these information sources.

Disaster can strike if metadata issues are not taken into account and managed properly. This is especially true when organizations deploy increasingly complex, interdependent data warehouse and transaction systems. Metadata is a key issue because the only way to change data into meaningful knowledge within a warehouse is to be able to compare apples to apples. Because few organizations have the opportunity to establish corporatewide definitions of terms, creating metadata becomes a difficult task. The fact that metadata has no clear, emerging standard will increasingly become a problem as large-scale enterprises deploy complex data warehouses or move to integrate heterogeneous data marts. To develop a viable strategy for managing metadata, IT must be aware of the capabilities and limitations of current metadata offerings and how vendors are positioning themselves in the marketplace.

The data warehousing industry is plagued by a muddle of metadata offerings. Given the crucial role metadata plays in the design and management of a warehouse solution, the current mixture of offerings is crying out for an integrated, open, standards-based solution.

Metadata Usage

What makes the whole issue complicated is that there is no single type of metadata. This has led to considerable marketplace confusion as vendors try to enhance their position by touting their products' metadata capabilities. In my view, organizations should be aware of three forms of metadata: navigational metadata, operational metadata, and RDBMS metadata. Each type serves a different purpose.

Navigational metadata, usually described as "data about data," helps end users who are browsing or querying a warehouse. This type of metadata (also called semantic metadata) provides end users with a business-oriented view of data, including meaningful data names, data descriptions, and data relationships.
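As a rough illustration of the idea, navigational metadata can be pictured as a lookup from cryptic physical column names to business names, descriptions, and relationships. The sketch below is only that: the catalog entries, the column names, and the describe helper are all hypothetical and do not come from any product discussed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NavigationalEntry:
    """Business-oriented description of one physical warehouse column."""
    business_name: str
    description: str
    related_to: tuple = ()  # physical names of related columns


# Hypothetical catalog keyed by physical "table.column" name.
CATALOG = {
    "cust_mst.cst_nm": NavigationalEntry(
        business_name="Customer Name",
        description="Legal name of the customer as billed.",
        related_to=("ord_hdr.cst_id",),
    ),
    "ord_hdr.ord_amt": NavigationalEntry(
        business_name="Order Amount",
        description="Total order value in U.S. dollars, before tax.",
    ),
}


def describe(physical_name: str) -> str:
    """Return a label an end user can read; fall back to the raw name."""
    entry = CATALOG.get(physical_name)
    if entry is None:
        return physical_name
    return f"{entry.business_name}: {entry.description}"
```

A query tool built this way would show the end user "Order Amount" rather than ord_hdr.ord_amt, and would fall back to the physical name for columns nobody has described yet.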

Operational metadata facilitates the crucial extract, transform, move, load (ETML) process. This layer is used by warehouse designers and developers who must map data from various source systems to the warehouse database. This type of metadata identifies the source data location and format, and it contains the logic required to transform the source data into a warehouse-ready format. This transformation logic can be quite complex, requiring specialized data cleaning products or significant amounts of customized code; it can also be as straightforward as changing every instance of "1" in a gender field to "M."
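The simple end of that range, the gender-code case, might look something like the following sketch. The rule layout, the field names, the "2" to "F" mapping, and the "U" default are assumptions made for illustration; only the "1" to "M" mapping comes from the example above.

```python
# Hypothetical operational metadata for one warehouse column: where the
# source value lives and how its codes translate to warehouse values.
GENDER_RULE = {
    "source_system": "billing",          # assumed source system name
    "source_field": "gender_cd",         # assumed source field name
    "target_field": "gender",
    "value_map": {"1": "M", "2": "F"},   # "2" -> "F" is an assumption
    "default": "U",                      # used for unmapped or missing codes
}


def transform(record: dict, rule: dict) -> dict:
    """Apply one mapping rule to a source record and return the target row."""
    raw = record.get(rule["source_field"])
    # Normalize to a stripped string so 1, "1", and " 1 " all match.
    key = "" if raw is None else str(raw).strip()
    value = rule["value_map"].get(key, rule["default"])
    return {rule["target_field"]: value}
```

Keeping the rule as data rather than burying it in code is the point: the same transform function can serve every coded field, and the rule itself can be stored, inspected, and shared as metadata.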

RDBMS metadata is the traditional metadata used by database administrators to manage and maintain internal tables and other structures in the database.
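Most relational databases expose this kind of metadata through a system catalog that can itself be queried. The sketch below uses SQLite, through Python's standard sqlite3 module, purely as a convenient stand-in; the customer table and the list_columns helper are hypothetical, and other database products name their catalogs differently.

```python
import sqlite3


def list_columns(conn: sqlite3.Connection, table: str) -> list:
    """Return (column name, declared type) pairs from SQLite's catalog."""
    # PRAGMA table_info yields rows of (cid, name, type, notnull, dflt_value, pk).
    # The table name is interpolated directly, so pass only trusted names.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [(row[1], row[2]) for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# The catalog table sqlite_master lists every table the database knows about.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
columns = list_columns(conn, "customer")
```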

Various Approaches

The confusion in the marketplace stems from the fact that each vendor supports its own proprietary version of metadata. Little, if any, commonality or interoperability exists among the various tools that can be used to build, populate, manage, and access a large data warehouse. Because metadata is relatively new as a discipline, IT organizations have no clear guidelines. However, some organizations and approaches offer starting points.

The Metadata Coalition was created to develop a common metadata interchange format, but the vendor community has greeted the coalition with varying levels of support and interest. Several vendors, principally Evolutionary Technologies International (ETI) and Intellidex Systems, have championed the coalition and have announced support for version 1.0 of the Metadata Interchange Specification (MDIS), which was released on June 7, 1996.

MDIS, designed by ETML vendors, is the lowest common denominator for sharing metadata among tools. It is a file-format definition intended to be loaded by a warehousing tool in batch mode using a public API. MDIS describes multiple object types, including databases, schemas, files, and relationships, and it supports extensions for exchanging tool-specific or proprietary metadata. ETI, Intellidex, IBM Corp., Carleton Corp., and R&O Inc. have announced their intent to support MDIS V1.0 in the near future.

Prism Solutions Inc., an industry leader, has adopted the CASE Data Interchange Format (CDIF) from the Electronics Industries Association (www.cdif.org). CDIF is an extremely comprehensive standards effort designed to provide a semantically complete model for the development life cycle, including object-oriented modeling and business process modeling. Given this broadly ambitious goal, CDIF is probably not appropriate for the more limited, tactical goals of metadata management.

Another approach is for a vendor to open up its metadata in an effort to gain rapid acceptance by a critical mass of vendors. Informatica Corp. has taken this approach with the recent publication of APIs that enable access to its metadata repository. Business Objects Inc., Brio Technology Inc., Cognos Corp., and Andyne Computing Ltd. -- all desktop client vendors -- have declared their support for the Informatica MX architecture. However, Informatica is only one vendor, and its solution is intended to increase its own market share. The underlying issue of metadata incompatibility among vendors therefore remains unresolved.

Logic Works Inc. has taken a stab at this problem with the recent release of its Universal Directory. It is designed to import metadata descriptions from a variety of sources into a single Logic Works database, where they can be manipulated by developers, end users, and operations staff. However, the Universal Directory solution (despite the lofty-sounding name) is not universally accepted by vendors and has limited capabilities to manipulate metadata and export it back to its original source systems. Still, given Logic Works' successful track record with its ERwin data modeling tool, it's too early to judge whether the marketplace will accept Logic Works' initial efforts.

There is a compelling need for a common, workable, standards-based metadata interchange framework. What the market requires is a common metadata sharing format based on standard relational database technology and public APIs. The batch transfer/ASCII file approach of the Metadata Coalition is a tactical solution that could be leveraged into a more strategic offering.

Ideally, each vendor would define a name space, or common data format, for the metadata it creates within a standard, SQL-compatible schema, and this name space would be submitted to the Metadata Coalition. When an end user deploys a warehouse solution, the site manager could designate a relational database as the metadata repository, and each tool would write its metadata into that repository.
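A minimal sketch of such a shared repository follows, assuming a single generic table in which every row is tagged with the name space of the tool that wrote it. The table layout, the function names, and the use of SQLite (through Python's standard sqlite3 module) are illustrative assumptions, not a format proposed by the Metadata Coalition or by any vendor.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS metadata (
    namespace   TEXT NOT NULL,   -- one name space per vendor tool
    object_name TEXT NOT NULL,   -- e.g. a table, column, or mapping
    attribute   TEXT NOT NULL,
    value       TEXT,
    PRIMARY KEY (namespace, object_name, attribute)
)
"""


def open_repository(path: str = ":memory:") -> sqlite3.Connection:
    """Open (and if necessary create) the designated metadata repository."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def write_metadata(conn, namespace, object_name, attribute, value):
    """Insert or overwrite one attribute within a tool's own name space."""
    conn.execute(
        "INSERT OR REPLACE INTO metadata VALUES (?, ?, ?, ?)",
        (namespace, object_name, attribute, value),
    )
    conn.commit()


def read_object(conn, object_name):
    """Collect what every tool has recorded about one object."""
    return conn.execute(
        "SELECT namespace, attribute, value FROM metadata "
        "WHERE object_name = ? ORDER BY namespace, attribute",
        (object_name,),
    ).fetchall()
```

In this arrangement an extraction tool and a query tool could both describe the same warehouse column without overwriting each other, because the name space is part of the key, and any tool could read what the others have written with ordinary SQL.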

Metadata: Not Just for Warehousing

A final thought. Although this article has focused on the needs of the data warehousing market, readers should also keep in mind the metadata needs of evolving decision-support and transaction-processing systems. The artificial distinction between DSS and OLTP systems is going to become increasingly blurred during the next five years, as organizations move into what I call the Hyper-Tier environment.

A natural consequence of this migration will be the evolution of the corporate data repository (CDR), which will become the mother lode for corporate data. I believe that enterprises are already beginning to build systems that use the CDR (also referred to as the operational data store) as a resource that both feeds data to DSS and OLTP systems and captures results from them. Additionally, the establishment of Extranets linking enterprisewide operations with external business partners (such as suppliers, distributors, and customers) has profound implications for data naming, navigation, and access. CDR is an evolving concept. It will take organizations several years before they can begin to effectively apply this approach. Organizations will have to plan so that its metadata can be accessed by all databases, not just the warehouse. In the long run, the corporate data repository will become part of a company's overall information architecture.

During the next two to five years, we will see an explosion of interest in how to capture and reuse metadata in these extended systems. As a result, a single metadata resource (possibly a virtual, distributed resource) that spans DSS, OLTP, internal, and external systems will become an increasingly crucial requirement for the design, deployment, and management of large-scale systems.

Organizations that are counting on a data warehouse to change the bottom line must be prepared to address the metadata issues early and often. Management must first understand the purpose of the warehouse and the structure and nature of the data being warehoused. Understand which type of metadata you need -- navigational, operational, or RDBMS. At the same time, don't begin by defining an entire enterprisewide corporate data repository. Currently there are no defined standards, and it is simply too large a task for most organizations. Get your feet wet with a smaller, manageable project. Define metadata for a single department or even for two or three applications. The experience you will gain from this approach will be of great value as you expand the scope of your efforts.


Judith Hurwitz is president and CEO of Hurwitz Group Inc., a technology and management consulting company based in Newton, Massachusetts. Hurwitz Group focuses on the business impact, use, and deployment of distributed technology. You can email Judith at jhurwitz@hurwitz.com or visit her company Web site at www.hurwitz.com.


DBMS and Internet Systems (http://www.dbmsmag.com)
Copyright © 1997 Miller Freeman, Inc. ALL RIGHTS RESERVED
Redistribution without permission is prohibited.
Please send questions or comments to dbms@mfi.com
Updated Wednesday, June 18, 1997