DBMS
 

 

Sagent Data Mart Solution

By Joseph Williams
DBMS, February 1997 The Sagent Data Mart Solution is a family of integrated products for building, managing, and accessing data marts.

The effective use of data marts in the data warehouse environment is an important factor in warehouse effectiveness and can also be a determinant to successful project deployment. Data marts are designed to meet the specific needs of common user groups such as geographical divisions or organizational divisions. Data marts are usually subsets of a data warehouse repository, but they can also blend data from a number of heterogeneous sources and even be larger (in scope, if not in the volume of data) than the central warehouse itself.

Because data marts are critical to the success of wide-scaled data warehouse projects, the successful creation and maintenance of these marts is also a critical factor. Sagent Technology Inc.'s Sagent Data Mart Solution provides an integrated data mart solution based on Windows NT that allows for central management of distributed data marts. The Sagent Data Mart Solution consists of four companion applications:

First Things First

Installation of the Sagent Data Mart Solution is a straightforward task; however, it does require some planning to ensure optimal performance and to ease the effort involved in setting up new data marts. The three major decisions to make are: where to create the repository database, where to create the data mart database, and where to install the DC Agent.

The repository is the database where the Sagent specific data and the metadata reside. The data mart database holds all of the business-related data and returns much larger result sets than the repository. Sagent supports Oracle 7.x, Sybase System 10 and System 11, and Microsoft SQL Server 6.x for the data mart and repository servers. Additionally, Sagent provides built-in support for performing batch data loads to these platforms. IBM's DB2 and Red Brick Warehouse can also be used as a repository or data mart; however, batch loader support is not yet available for them.

The DC Agent is a fully configurable service that runs on an NT server and monitors and controls all database connections; it also brokers all requests for information. Although the DC Agent can run on the same machine as the repository, the number of connected users and the requests they generate will have a great effect on memory requirements and consumption and should be considered carefully.

I set up the Sagent Data Mart Server on an NT 3.51 server running Microsoft SQL Server 6.5 with 32MB of RAM. I ran the repository database, data mart database, and DC Agent all on the same machine. I installed the client software on an NT 3.51 workstation with 32MB of RAM. Sagent includes a setup utility for loading sample data. This utility does a nice job of building and populating the necessary tables after the destination databases are manually created.

A Table With a View, Please

Developers use the Design Studio module as a graphical toolkit to populate data marts and create various views for end users. BaseViews are the building blocks of the Sagent paradigm. A BaseView is a graphical representation of the database including tables, columns, and joins. (See Figure 1.) When creating a new BaseView, a dialog box prompts you for server name, database name, user name, and password. The BaseView editor then displays all tables in that database, and joins are established by clicking on the desired field of one table and dragging it to the join field. Access to BaseViews is through the BaseView editor on the tool menu in Design Studio. BaseViews cannot be seen through the Information Studio module.

Once a BaseView is created, multiple MetaViews can be derived from it. A MetaView is a logical representation of the Parts from one or more BaseViews. The Parts Bin is the Sagent method for showing all of the elements available in the MetaView. (See Figure 2.) Initially, tables are displayed as categories and fields as parts. You can rename categories and parts or remove them from the MetaView altogether. By dragging the desired fields from the BaseView editor, you can also add new Parts to the MetaView. These changes affect only the MetaView and not the underlying BaseView.

MetaViews let you use a single BaseView to present different pieces of information to different sets of users. For example, if the BaseView was the Gourmet Food schema shown in Figure 2, one MetaView might be designed for the marketing department with only promotions information, and another MetaView might be designed for the accounting department with sales information by region. This setup lets you present only the information you desire to any specific set of users.

Everyone Needs a Plan

The right side of Figure 3 shows the Data Flow Editor on top and the corresponding result set on the bottom. The Data Flow Editor uses the MetaViews and Transform objects available to it to create reusable output called Plans. These Plans can do anything from return the results of a query to perform data cleaning and data mart population. The Data Flow Editor uses a visual programming paradigm, which makes performing complex SQL operations and transforming and loading data a relatively easy task. Once a Plan is saved, it is stored in the repository. Plans can be shared with other users by publishing or autosubscribing the Plan to them. Plans can also be exported to a file and imported to another repository or sent to other users using Design Studio's email integration.

The purpose of the Plan in Figure 3 is to return the store name, store manager, and product name of the of the 10 items with the highest sales. The first step was to select the appropriate MetaView; the one I chose shows all of the data elements in the Gourmet Food star schema shown in Figure 2. By dragging the desired fields onto the workspace, the basic SQL statement is generated. Next, click on the toolbox icon at the bottom right side of the screen to display all of the available Transform objects.

There are five broad categories of Transform objects: Data Flow, analysis, broadcast, population, and VBScript. (I will look at these in more detail below.) I selected an analysis Transform that performs ranking operations. I dragged it into the Data Flow Editor workspace, connected the input side to my SQL query and the output side to my display grid, and then double-clicked on the Rank object to edit its properties.

Transformation

The Transform objects give the Sagent Data Mart Solution flexibility and power. End users will primarily use Data Flow and Analysis Transforms in the normal course of business. The Data Flow objects include the ability to perform visual and manual SQL queries, create output grids, and join columns from two Data Flows. The Analysis Transforms include ranking, sorting, averages, and total. These tools greatly empower the user to answer many of the questions that previously required the assistance of special report writers.

Although the Sagent Data Mart Solution is a good ad hoc query tool, it is designed as a data mart management tool, and this is where it really shows value. The issue of migrating OLTP database systems into a data mart that is optimized for decision support is handled well. The documentation provides a good primer on the differences between a typical OLTP database and a database built on a star schema, optimized for warehouse reporting.

The sample data includes an OLTP database that tracks credit card transactions. The documentation walks you through converting the database to a star schema and populating a data mart. It provides good guidelines for creating the fact and dimension tables in the destination database.

Sagent provides Population Transforms to extract data from an external source, transform the structure, and, if necessary, load the data into your data mart. These transforms include Key Generation, Time Generation, Key Lookup, Time Lookup, and Batch Loaders. The Time Generation and Key Generation populate the dimension tables in a star schema and create integer keys if none exist. The Time Generation Transform populates the time dimension table, enabling the user to analyze time data without complex SQL calculations. The Time Generation Transform does not accept input; it is a data source that creates the data needed for the time dimension table and then populates it through the Batch Load Transform.

In contrast, the Key Generation Transform accepts input from the SQL Query Transform or any other valid input, generates a key (if needed), and sends the information to the Batch Loader Transform to populate the various dimension tables.

The fact table is loaded via Key Lookup Transforms. The dimension tables must be populated before the fact table can be populated. The fact table represents the business measurements in your OLTP database. This data is passed from the SQL Query Transform into the Time Lookup Transform and then successively through a series of Key Lookup Transforms corresponding to the dimension tables previously populated. This sequence defines the granularity and content of the data in the fact table. Next, the data is passed into a Batch Load Transform, where the fact table is finally populated.

Sagent provides Batch Load Transforms for Microsoft, Oracle, and Sybase to assist in populating the destination database. The loader accepts input from a SQL Query Transform, Key Generation Transform, or other valid input source, and it uses a configuration file to map the destination database and table. The Sagent Data Mart Solution does not have robust internal data translation capabilities for performing calculations or value substitutions to source data. However, you can add custom VBScript and C++ Transforms to the data flow plans you use.

Information Studio

I have concentrated on the Design Studio module of the Sagent Data Mart Solution because that is where the construction capabilities lie. However, Design Studio is a superset of the Information Studio module, and I would be remiss if I didn't point out some of its features.

I mentioned earlier the ability to publish and subscribe to Plans. This ability makes information sharing and report dissemination an easy task. Additionally, Sagent has an additional piece called Sagent WebLink that allows access to the information in the Sagent Data Mart Server via your favorite Internet browser.

Sagent also provides the ability to save the results of a Plan with its associated data. This ability is known as a Snap; Snaps can be distributed just like Plans, and they provide an excellent mechanism for viewing historical data. Alerts are available to inform Sagent users when a Snap or Plan is published.

The Sagent Data Mart Solution has a drag-and-drop scheduler that eases the administrative burden of the data mart environment. This feature is a great way to automate the extract and load processes.

Where on the Shelf?

Analysis and Reporting modules are due out in the near future. Cofounders Ken Gardner and John Zicker were also founders of ReportSmith. With the reporting experience these men bring to the table, these components will make Sagent competitive as an integrated solution.

All in all, the Sagent Data Mart Solution is a strong product -- but several annoyances did occur, mostly as a result of being a 1.0 release. Sagent attempts to "think outside the box" and free itself of traditional methodologies. Overall, this provides benefit and added value, but the interface did not always flow as nicely as I would have liked. For example, the BaseView editor provides a "virtual view" of the desktop to aid in locating and relocating tables and positioning them for manipulation. I found the view cumbersome to work with, although it was certainly functional. As I was dragging this desktop around, I also found many times that adding simple scroll bars would make it a lot more maneuverable.

The installation of the test data proved interesting as well. I ran into several rights issues where I could not see predefined BaseViews and MetaViews because the password for the "SA" user on the test database did not match the SA password on my Microsoft SQL Server database. I was able to get around that obstacle by recreating the BaseViews and MetaViews. This process was cumbersome but not critical; I experienced several critical errors until I recreated these views, and then they went away.

These peripheral issues can be annoying, but they do not take away from the quality of the product itself. For a 1.0 release, the Sagent Data Mart Solution makes a strong statement to quality and functionality. This product will make a nice addition to any data mart administrator's shelf. But watch out -- when the Reporting and Analysis modules come out, you'll need a bigger shelf.


Figure 1.


This is a BaseView of the Gourmet Food database that ships with the Sagent Data Mart Solution. A BaseView is a graphical representation of a database including tables, columns, and joins; it is the foundation that other views and Plans are based upon.


Figure 2.


The window on the left shows the current MetaView. In this view, tables are known as categories and columns are known as parts. The window on the right is the Data Flow Editor. The top portion shows the current Plan and the bottom portion shows the result grid from the current Plan. The grid is created by dragging parts from the left window onto the Data Flow Editor workspace.


Joseph Williams is a senior consultant with Baily & Quinn and the president of ZMS Consulting in Atlanta, Georgia. He specializes in client/server application development and deployment. You can email Joseph at jewill@mindspring.com.
Subscribe to DBMS and Internet Systems -- It's free for qualified readers in the United States
February 1997 Table of Contents | Other Contents | Article Index | Search | Site Index | Home

DBMS and Internet Systems (http://www.dbmsmag.com)
Copyright © 1997 Miller Freeman, Inc. ALL RIGHTS RESERVED
Redistribution without permission is prohibited.
Please send questions or comments to dbms@mfi.com
Updated Wednesday, January 22, 1997.