If your enterprise dreads the sprawl of a data warehouse, consider the virtues of the data mart.
Data mart technology is changing -- growing beyond its roots of providing easy and fast access to one subject area of information for a small group of users. Organizations are realizing that enterprisewide data warehouses are complex to both build and use. To implement, data warehouse development requires significant staff, hardware, software, time, and funding. Conflicting organizational needs must be addressed. Data warehouses have been called data jailhouses, data basements, and data cemeteries because users find them difficult to understand and tough to navigate. Consequently, organizations are building data marts in place of or to complement enterprise data warehouses.
As the use of data marts has exploded, so too has the capability required of them. Rather than simply being small databases with easy end-user access, some data marts must now be scalable to consist of hundreds of gigabytes of data upon which sophisticated analyses such as data mining can be performed. Hundreds of users must be able to remotely access data marts -- a capability that Internet and Intranet technology can provide at low cost. Users are questioning why data mart installations require a large amount of custom setup. Finally, organizations need to be able to centrally administer and manage many data marts to prevent the proliferation of data marts that contain inconsistent and conflicting data -- just the problem that data warehousing was supposed to solve!
Data marts are growing to sizes that approach that of small-scale, enterprise-wide data warehouses. Traditionally, data marts were defined as small databases focusing on one subject or functional area while an enterprisewide data warehouse contained information from many different subject areas across the corporation. However, 100GB Unix and Windows NT decision-support databases are being built today. In some situations, this size of database includes all of a corporation's decision-support needs and is a data warehouse. In other situations, the same system would be a data mart extracted from a terabyte-sized data warehouse.
Although it is now difficult to differentiate data marts and data warehouses simply based on size, some distinctions are still important:
New data mart technology is still being developed, although not as prolifically as several years ago when relational OLAP was new and numerous vendors came onto the data warehousing scene. Both Information Builders Inc. (IBI) and SAS Institute Inc. announced new databases in the last year: Focus Fusion and SAS MDDB, respectively.
The recent growth in data marts, however, has created several problems in providing users with access to an organization's information:
Red Brick Warehouse 5.0 from Red Brick Systems Inc. provides an example of significant performance improvements being made to an underlying data mart database engine. Features include Red Brick's Continually Adaptive Indexing TARGETindex capability, which provides indexes that automatically and continually adapt to the data being processed. A new hybrid hash join more efficiently handles situations such as subjoin recursion in very large joins, the joining of tables of significantly different sizes, and improved performance optimization of hash joins. To avoid having to run several SQL statements in a row to retrieve results for a complex query, SQL queries can be embedded in the FROM clause of another query. Like some other tools, initial rows of a query response can now be retrieved for end-user analysis while other result rows are being created.
Multidimensional databases (MDDBs) such as Arbor Software Corp.'s Essbase support incremental database updating so that the entire MDDB structure doesn't need to be changed for each update -- only data cells that are affected will get updated. This is an advantage because, unlike a relational table consisting of rows that can be individually updated, MDDB cubes traditionally have required the entire cube to be updated if any data in the cube changed -- a time-consuming process.
One promising yet underused area in data mart performance improvements involves decreasing the size of the data mart itself. Several vendors have created innovations in this area. Pilot Software Inc.'s Pilot Decision Support Suite now supports dynamic dimensions and hierarchies to significantly reduce server size and consolidation time. Dynamic dimensions allow aggregations to be calculated on demand rather than preaggregated and stored in the MDDB cube. This can significantly reduce the size of an MDDB cube as well as the consolidation time required to calculate MDDB aggregations. One MDDB reportedly decreased in size from 4GB to 200M using this technique.
CrossZ Software's data mart solution, QueryObject, can be viewed as either an MDDB or relational database using ODBC-compliant query tools. The company claims that fractal algorithms compress source data up to a 10,000-to-one ratio with 100-percent accuracy. End-user response time is still a primary consideration in all data mart designs, and no method guarantees satisfactory data mart performance. Although the approaches tried by various vendors improve data mart performance, there will always be tradeoffs in balancing end-user performance, precalculated vs. on-the-fly summarization, data load performance, and the size of the data mart.
MicroStrategy Inc.'s DSS Administrator is designed to manage multiple decision-support system projects, user groups, and reports. (See Figure 2.) One powerful capability is the management of virtual data marts that enable users to access appropriate information throughout an enterprise from many different physical data marts. Users can be placed in groups for easy security administration. Application development can be managed by moving applications from a prototype to development to deployment data mart. The administrator can perform an overall system analysis as well as examine report generation response time and resource usage by time of day. System resource, report, and data usage is available by user. Bills and chargebacks can be generated for users of data marts based on usage, activity, and time of day. Because system metrics are stored in a database catalog, custom reports can also be generated.
One aspect of data mart administration is the ability to optimally tune the data mart. IBI's Site Analyzer enables an administrator to analyze user and resource statistics such as columns and tables accessed, performance, and resources consumed about each query. An administrator can use this information to set query thresholds. The IBI pre-emptive governor uses rules in a knowledgebase built from the system statistics and thresholds to estimate before a query is run whether it will exceed resource limits; then the governor accepts or rejects the request based on its estimate.
I'm relieved to see products that address the need for enterprise data mart administration. This is an area with growing requirements, such as the coordination of data mart and enterprisewide data extraction, load, and replication procedures, backup and recovery procedures, metadata management, security, and performance tuning.
IBI's SmartMart employs its EDA middleware through source data acquisition tools to extract and process data from over 60 relational database or file sources into a data mart based on IBI's Fusion MDDB or major relational databases. IBI's Focus Six for Windows is the data access tool. In addition, WebFocus is the Internet end-user data access product. Warehouse administration tools and a metadata information catalog are included.
IBM's Visual Warehouse runs on an OS/2 or Windows NT server. The Windows NT version includes the Visual Warehouse server, ODBC drivers, a DDCS connectivity product, DataGuide metadata repository, and Lotus Approach for end-user analysis. All versions of DB2 and popular relational and nonrelational databases can be accessed.
Informatica Corp.'s PowerMart suite 3.5 contains Informatica PowerMart Designer, Repository, Server Manager, PowerMart Server, and Change/Capture components. Popular relational databases are supported. The Star Schema Design Wizard uses a visual interface to step a user through the database design. Once completed, the SQL data definition language is generated for the target database. The Change/Capture function accesses operational system log records to move incremental change records into the data mart. Changes can be captured as they occur or on a periodic basis.
NCR Corp.'s RightStart program is designed to deliver within 90 days a departmental data mart that can then grow into an enterprise data warehouse. The data mart includes a WorldMark 5100S server, the NCR Unix MP-RAS operating system, a database (Teradata, Oracle, or Informix), data access and transformation tools, and professional services. Separately, NCR has announced that it will port its Teradata database to Windows NT, provide high-speed asynchronous and synchronous replication, and improve interoperability between Teradata and Microsoft SQL Server databases.
Data marts in a box can provide a convenient and relatively low-cost source of data mart tools. Although these products can kick-start a data mart development project, issues such as operational data extraction logic, consistent data definitions, and designing the data mart for optimal performance must still be considered.
Data mart Internet/Intranet access products generally consist of code that sits between a Web server and the vendor's OLAP product. Netscape Navigator and Microsoft Explorer browsers are supported. Leading data mart companies are beginning to include Java and/or ActiveX capabilities in their offerings, as compared to initial products that were constrained by HTML functionality.
The Windows NT-based Essbase Web Gateway provides OLAP functionality to users of Essbase. In addition to slicing and dicing, multiple users can write back to the database.
Pilot Software's Pilot Internet Publisher lets Pilot Desktop users access Pilot Decision Support Suite functionality via standard Web browsers. Pilot Software and Blue Isle Software Inc. have agreed to a partnership that will use Blue Isle's InTouch/2000 software to distribute Pilot personal cubes and reports to users via the Internet or major email systems. Data is compressed to five to 25 percent of its original size.
MicroStrategy's Windows NT-based DSS Web 4.1 features include a Java-based AutoPrompt facility to aid in building queries, international language support at the desktop level, and DBA-level diagnostics. The program supports Windows 3.1, Windows 95, Windows NT, OS/2, Macintosh, and Unix client operating systems.
Over time, data mart vendors should expand their Internet server support beyond Windows NT to Unix. Internet browser functionality must catch up to the functionality found in traditional client/server OLAP applications, or else there will be two classes of data mart access: basic slice-and-dice capability for Internet-based users, and more sophisticated capabilities for those with client/server access. Vendors may use Java and ActiveX to provide sophisticated functionality to Internet clients, although this could require getting away from the thin-client paradigm.
Although Internet browsers don't contain all of the access capabilities available in client/server applications, Internet access is a good solution for providing data mart access to users who are large in number or geographically dispersed -- especially users who don't need sophisticated interactive capabilities. These differences will disappear as Java and ActiveX technologies are exploited and users migrate to Windows 95.
If you are investigating or building a data warehouse or data marts, then the benefits and costs of taking the time and resources to build an enterprisewide data warehouse must be balanced against the benefits and costs of getting a data mart up and running quickly to address specific business opportunities.
An enterprise data warehouse approach has several advantages: The requirements of all business functions can be included; data definitions and business rules are consistent; and data redundancy is minimized. Advantages to a data mart strategy include lower initial implementation costs and effort, as well as a shorter development timeframe; end users experience improved performance; and business functions can control their own data marts.
Although data marts can provide success in solving business problems, the proliferation of unplanned data marts throughout an enterprise can lead to data inconsistency, data duplication, and an inability of users to access all of the data they need. Vendors are addressing some of these ongoing challenges:

