DBMS

DBMS ODBC Special Report Using ODBC to Access Nontabular Data

By Paul Reed
DBMS ODBC Special Report, April 1996

How several vendors provide ODBC access to objects, documents, hierarchical, and network data sources.


Open Database Connectivity (ODBC) has become a household word, and for good reasons. Since Microsoft Corp. introduced ODBC in 1991 as an answer to the SQL Access Group's long search for a common database access interface, it has matured into the de facto common SQL interface standard for heterogeneous database access to tabular data. DBMS servers such as Oracle7 and Microsoft SQL Server, as well many desktop products such as Xbase data managers (dBASE, FoxPro, and others), Microsoft Access, and Borland' s Paradox, to name a few, all present data in a tabular form organized into rows and columns.

Unfortunately, not all data sources found within organizations support the notion of tabular data. What about data that lives in legacy hierarchical databases on the mainframe? What about data that resides within objects stored in an object-oriented data base? What about data that is stored in network model databases held together by intertwined pointers? Although these data sources fall outside the typical view of a SQL-driven application accessing and manipulating rows and columns of data, they are all data assets that can benefit from one common access mechanism.

ODBC's Architecture

A brief primer on ODBC is necessary to reemphasize its mission and structure. There have been many excellent reviews done on ODBC to explain both its benefits and shortcomings (such as performance). One of the seminal reviews done to date is in the March 1994 issue of DBMS magazine. I still carry this issue around and recommend it to many of my clients. Another excellent resource is Kyle Geiger's Inside ODBC (Microsoft Press, 1995) and the SDK Reference Guides, most notably the ODBC 2.0 Programmers Reference and SDK Guide (Microsoft Press, 1994).

Remember that ODBC's mission is to provide a common access interface using SQL. Figure 1 shows the general architecture of ODBC. The application in this case can be built from scratch with any development tool that can call a DLL f unction, or it can be a product that provides data access functionality right out of the box. Two examples of the latter approach are Microsoft Access and Business Objects, an end user query tool from Business Objects Inc.

The ODBC API is a collection of function calls such as SQLGetInfo, SQLFetch, and others that interact with the ODBC Driver Manager (a DLL usually found in the user's windows/system directory), and a driver supplied by the DBMS vendor or a third party. Th e Driver Manager loads drivers on behalf of the application. Applications access databases represented as ODBC data sources, and each ODBC data source must specify a driver. A data source in this case might be an Oracle database or a Microsoft Access MDB file. The job of the driver is to take the ODBC calls passed from the application and translate them into equivalent calls or protocol in the format expected by the target database. These are called two-tier drivers, and two examples are drivers for Ora cle and Sybase SQL Server. In some cases, the driver itself actually processes the SQL and retrieves the data. Borland's Paradox and Microsoft Access are examples of these single-tier drivers.

It is apparent that the driver is key to unlocking the mysteries of the target DBMS. All drivers are not created equal. For instance, there are at least six different Oracle ODBC drivers on the market today. Vendors of products that don't support tabular data must focus their efforts on mapping their data format into the tabular format accepted by ODBC.

Vendor Implementations

This article describes six different examples of ODBC drivers that provide SQL access to nontabular data sources. The products I discuss represent a broad cross-section of different types of nontabular sources, ranging from legacy hierarchical databases to network model databases. This is just a small sample; many other products provide similar support for other nontabular data sources.

Hierarchical Databases

According to the Gartner Group's March 22, 1995 research report titled, "IBM's IMS Product Strategy," IBM's hierarchical database Information Management System (IMS) is still the cornerstone of many OLTP legacy applications in production today. EDA/SQL f rom Information Builders Inc. (IBI) is one example of a middleware product that provides access to IMS and a host of other data sources.

EDA/SQL provides an ODBC driver to access IMS data. Figure 2 shows an example of a simple IMS data structure. Notice that the Customer segment, IMS's name for the entity, has two child segments, Order and Communication. There can b e many Order segments and many Communication segments for a single Customer. This hierarchical structure doesn't conform to a relational model of tabular data. Sure, all the columns within Customer are the same, but how can you retrieve customer name and address, as well as each order number and order amount? Or, how about a query to return the customer number followed by the average value of all that customer's orders sorted by zip code?

This is where EDA/SQL's ODBC driver and API/SQL API step in. The EDA/SQL ODBC driver translates ODBC API calls into IBI's API/SQL calls. In Figure 2, the application could view the three segments as three tables. The application generates SQL to join the Customer table and Order table to return the columns desired. Any ORDER BY or GROUP BY processing as well as aggregate and scalar functions are resolved on the mainframe, taking advantage of its horsepower.

Internally, IBI uses a mainframe-based dictionary called "EDA catalog" to determine what specific IMS access mechanisms should be used to resolve the query. The query is translated into IMS calls referencing the appropriate IMS segments. For you ex-IMSer s like me, for a specific Customer this may actually be a Get Unique (GU) call to the Customer segment, followed by Get Next Within Parent (GNP) calls for the related Order child segments. Additionally, to speed access the IMS database administrator may opt to provide a logical view of the two segments as one relational table, although this is highly denormalized. To ensure acceptable performance, the IMS database administrator must know how applications will access the IMS data in order to tune the EDA catalog entries.

IBI recently enhanced EDA/SQL to support INSERT, UPDATE, and DELETE statements against IMS databases accessed through ODBC, thus making it more attractive for more robust client/server applications. However, due to the nature of IMS, some core SQL DDL st atements are not supported by the EDA/SQL ODBC driver. These include table level statements such as CREATE TABLE, DROP TABLE, and ALTER TABLE. EDA/SQL cannot execute these statements dynamically because the internal architecture of IMS requires the use o f specialized administration tools on the mainframe.

Lotus Notes

Lotus Notes has become a darling of the messaging and workflow marketplace. Notes has become a mainstay at many companies that use it for applications such as workgroup collaborations and complex sales tracking and monitoring. Because Notes organizes dat a into documents, Notes' internal database structure is far from tabular. The compound document data structure is designed for maximum flexibility in order to support the rich object collections that must be included in Notes-rich text documents. In summ ary, Notes is tuned to facilitate unstructured communication and information flow. However, Notes users still have traditional analysis and reporting requirements and need access to Notes information from a variety of tools that rely on SQL data access. Enter NotesSQL, Lotus' ODBC driver for Notes.

Figure 3 shows an application that utilizes NotesSQL. A budgeting spreadsheet in Microsoft Excel needs access to sales forecasts from a Notes database. When invoked by Microsoft Query from within Excel, the NotesSQL ODBC driver (wh ich is delivered as two DLLs) takes the ODBC API calls from Microsoft Query and interacts with the Lotus Notes workstation software. Lotus Notes workstation and server components handle all network traffic and security. The NotesSQL ODBC driver can treat both Notes forms and views as tables. In Notes, forms define the fields on a document, and users create and update documents using forms. Views define and display a selection of documents, as well as calculated columns and sort specifications. Notes vie ws are indexed, and SQL queries on views can use these indexes to improve performance. If a SQL query needs a unique ordering or grouping not supported by an existing view, then a new Notes view should be created to utilize the Notes server's horsepower. Some Notes features are not accessible through the ODBC driver. For instance, Notes view columns that reference certain functions such as @DocLevel and @DocNumber do not return a value when you use the ODBC driver.

The NotesSQL ODBC driver version 1.2, which became available in October 1995, has gone a long way to improve performance over the initial release of the driver. Early performance problems were traced to the initial ODBC SQLConnect function, which dynamic ally cached the system catalog tables. Currently, NotesSQL does not support access to multivalued composite fields. According to Lotus, this was a conscious decision to avoid some rather arcane work arounds using ODBC. They have opted to wait for the wor k coming out of the SQL3 and X3H2 standards groups, which is addressing many of these issues and should be supported in ODBC 3.0. (See Ken North's article, "Understanding ODBC 3.0 Standards and OLE DB," in this issue.) By the t ime this article goes to press, Lotus will have released Notes 4.0 and version 2.0 of their NotesSQL ODBC driver. The new driver is now 32-bit as well as cross platform. Lotus also focused on tuning SQL optimization and index selection on the server side .

Object and Object-Relational DBMSs

Illustra Information Technologies Inc. markets the Illustra Server, its flagship product, as an object-relational DBMS. (Informix announced plans to acquire Illustra, but the acquisition had not been completed by press time.) The Illustra Server supports object-oriented management of rich data types, and it also provides a SQL based on extensions to industry-standard SQL. Illustra uses extensions called DataBlades to enable its server to store and manage new data types or domains. DataBlades come in man y flavors, ranging from 2D/3D spatial modeling to time series analysis for data such as stock prices. Illustra has heard loud and clear from its customers who need to access data stored in the Illustra Server using front-end tools such as Microsoft Excel .

Like other products, the Illustra ODBC driver maps ODBC function calls to LIBMI (Illustra's API) and communicates these requests to the server. The standard ODBC interface defines a fixed set of data types, while the Illustra Server supports an extensibl e set of data types. However, you can map Illustra data types to ODBC SQL data types. The Illustra ODBC driver can bind an Illustra data type to a different Illustra data type before the Illustra server returns the data to a client. There are several way s to accomplish this, but the default method is to utilize a lookup table residing on the Illustra server. The server then executes functions that change the Illustra data type to an ODBC data type. If desired, the application can bypass the table lookup approach to bind user-defined data types dynamically by utilizing Illustra extensions to ODBC Option function calls such as "SQLSetConnectOption" and "SQLSetStmtOption."

It is important to realize that depending on what data is returned, such as spatial X,Y coordinates, the client will require logic that can deal with the result set. Illustra is currently working on intelligent objects packaged as OCXs that can be embedd ed within the client application to handle certain types of returned data. If you're a crafty C/C++ programmer and understand the layout of the returned object, then there is no problem.

In some cases, off-the-shelf tools such as Microsoft Excel already know how to deal with the data returned from an Illustra server. A good example involves using Illustra's Time Series DataBlade and ODBC through Excel. Using Excel's ODBC add-in (xlodbc.x la, not MS Query), you can request the 200-day moving average of a stock, and the data returned will be populated into a range of spreadsheet cells. (See Figure 4.) For example, the request could specify Informix's stock price from January 1, 1995 through July 19, 1995 for a 200-day moving average, as well as the high/low information for each day. This can be formulated as a SQL query and passed to the Illustra ODBC driver, which maps it to LIBMI calls on the server. Interacting w ith the Time Series DataBlade, the data would be cast from Illustra's Advanced Data Type (ADT), such as time series data, to an ODBC SQL data type. Once in Excel, the stock analyst can chart it and even overlay it on to another time series such as the NA SDAQ high/low over the same period of time. Again, this scenario uses tools already in-house, without needing additional specialized viewers or reporter generators using ODBC.

Object-oriented database management systems (OODBMS) store any type of complex object, while also providing acceptable levels of performance and a rich programming interface. Object Design Inc. has been shipping its ObjectStore OODBMS server since 1988. While ObjectStore supports both C++ and Smalltalk as native interface languages, at press time, Object Design plans to ship an ODBC driver in March of this year.

ObjectStore uses a schema mapper, which maps C header files to relational views. You can write methods to materialize a relational view from a collection of many different objects. In Figure 5, an ObjectStore Customer Object may al so encapsulate more than one Address Object. Both the Customer and Address Objects would have a C header file, and these header files would map either to one denormalized relational table representing rows of both Customer and Address attributes, or to t wo tables, one for Customer the other for Address. For example, you can use Microsoft Word and ObjectStore's ODBC driver to retrieve the data and generate customer invoices addressed to each customer's shipping address. Word sends the ODBC calls to Objec tStore, which then materializes the result set after consulting the schema mapper, as well as any unique mapping methods. ObjectStore has opted to stay with native ODBC support and does not provide any of its own extensions.

ObjectStore incorporates SQL access technology developed by Dharma Systems Inc. Dharma provides the Dharma/SQL ODBC driver and Dharma/SQL Access (a SQL engine that can be layered on top of any database environment) to companies such as Object Design that want to open their nontabular data stores to SQL access. Other companies using Dharma's technology include Intergraph, a CAD/ CAM vendor, and Objectivity Inc., another OODBMS vendor. Simba Technologies Inc., formerly PageAhead Software, also offers similar layering products.

Network Database Vendors

Do you ever wonder how to handle relationships such as a parts list? For example, an airplane wing is a part when the entire plane is considered. But the wing may be composed of multiple parts, which in turn are made up of other parts. Relational tables are not the most elegant way to handle this relationship. Network database technology handles these types of complex relationships more naturally. A series of pointers relating parts enables applications to efficiently "walk" or access the result set at high speeds. Titanium from Micro Data Base Systems Inc. (mdbs) is one example of a DBMS that implements the network database model. mdbs provides an ODBC driver to support their clients' traditional data-processing needs ranging from reporting to ad hoc analysis of pointer-based data structures.

Titanium uses Dynamic Pointer Arrays to manage complex relationships. Because Titanium is founded on the ANSI/SPARC three-schema architecture, mdbs strives to insulate the external view from the internal and conceptual views of the data. For relational s upport, Titanium uses a table definition file (TDF) to flatten the complex physical data structures. These tabular views are not actually materialized when the TDF is defined, but when the query is submitted either through mdbs' own API or through ODBC i n conjunction with the Titanium API.

In Figure 6, you can notice how Titanium's ODBC driver handles these complex many-to-many and many-to-one recursive relationships. The goal is to produce a report showing the managers' names, their employees, and the assigned tasks , using the popular Crystal Reports (from Crystal, a Seagate Software Co.) reporting tool, which supports ODBC. A SQL query references the relationship defined in the Titanium TDF script. Titanium's ODBC driver on the client maps ODBC calls made by Cryst al Reports to the Titanium API. These calls are then passed to the Titanium server where the relationship, defined in the TDF script, is mapped to the network schema's base records, so that the query set can be resolved. Titanium's ODBC driver currently supports select statements only.

Another network database vendor, Raima Corp. (Issaquah, Wash.) should have released an ODBC driver for their Raima Database Manager (RDM) by the time you read this. Raima took the SQL engine from their Velocis product and used it to build a product calle d the SQLizer, which creates relational views of the RDM data. RDM will also only support SELECT statements initially, but Raima promises full update support in a future release.

No Longer a Pipe Dream

A common method of accessing heterogeneous databases, once a pipe dream, is now more real than ever. More and more vendors are providing products that support SQL queries using ODBC as a common interface, regardless of the target data source. ODBC driver providers such as Intersolv's Q+E, Visigenic, and OpenLink are providing the majority of driver packs on the market today. Microsoft also recently licensed the rights to distribute ODBC SDKs on non-Windows platforms to Visigenic. In addition, with speci alty ODBC driver players such as Dharma and Simba Technologies serving some of the nontabular marketplace vendors, the future seems brighter than ever. The SQL Access Group's goal of providing a common access mechanism to tabular data using SQL has grown into a specification that all vendors have taken seriously. When tempted to stray from the vision that ODBC is intended to support, remember that homog eneity, or using one data access interface, means that more time can be spent on building applications and crafting solutions and less time on interconnectivity issues.


Paul Reed is president of Jackson-Reed Inc., a Seattle-based international training and consulting company focused on migrating organizations to client/server technology. You can reach Paul via email at prreed@jacksonreed.com or via the Internet at http: //www.halcyon. com/prreed/jackreed.html.

* Dharma Systems Inc., 15 Trafalgar Sq., Nashua, NH 03063; 603-886-1400 or fax 603-883-6904.
* Illustra Information Technologies Inc., 1111 Broadway, Ste. 2000, Oakland, CA 94607; 510-652-8000 or fax 510-869-6388.
* Information Builders Inc., 1250 Broadway, New York, NY 10001-3782; 212-736-4433 or fax 212-268-7470.
* Lotus Development Corp., 55 Cambridge Pkwy., Cambridge, MA 02142; 617-577-8500 or fax 617-693-3512.
* Micro Data Base Systems Inc. (mdbs), 1305 Cumberland Ave., West Lafayette, IN 47906-0438; 800-455-6327, 317-463-7200, or fax 317-463-1234.
* Object Design Inc., 25 Burlington Mall Rd., Burlington, MA 01803; 800-962-9620, 617-674-5000, or fax 617-674-5010.


Figure 1


--In ODBC's layered architecture, an application invokes an ODBC driver for a specific data source through the driver manager. The driver interacts with a data source that may be tabular data or nontabular data.

Figure 2


--IBI's EDA/SQL driver accesses IBM IMS hierarchical databases through ODBC calls mapped to EDA API/SQL calls. The EDA API/SQL calls are then converted to IMS commands that return Customer and Order information as a tabular result set.

Figure 3


--The Lotus NotesSQL ODBC driver can access, via Microsoft Excel, budgeting information stored in a Notes database. Data retrieved from a Notes View or Form is returned as tabular data that populates a range of cells.

Figure 4


--Illustra's Time Series DataBlade can transfer stock data to a client application. The Illustra Server changes its Advanced Data Types to SQL data types before returning data to the client.

Figure 5


--The ObjectStore server accepts ODBC calls from Microsoft Word. It maps C header files to relational views and returns invoice data for use in Word documents.

Figure 6


--Titanium converts ODBC calls to Titanium API calls passed to a Titanium server. The server utilizes a table definition file (TDF) to materialize a relational view of the underlying network data.
Subscribe to DBMS and Internet Systems -- It's free for qualified readers in the United States
April 1996 Table of Contents | Other Contents | Article Index | Search | Site Index | Home

DBMS and Internet Systems (http://www.dbmsmag.com)
Copyright © 1996 Miller Freeman, Inc. ALL RIGHTS RESERVED
Redistribution without permission is prohibited.
Please send questions or comments to dbms@mfi.com
Updated Monday, October 14, 1996