DBMS, May 1997
DBMS Online: Server Side By Martin Rennhackkamp

Desegrating DBMSs

The Art of Implementing Integrated, Interactive Web-to-Database Applications

Taking simple data from a database and placing it on the Web is a rather simple task, because this process is usually implemented as block-mode operations using single SQL statements. I have illustrated this process in a separate sidebar titled, "Basic Web Architectures". However, implementing interactive Web-to-database applications is more complex, because difficult concepts such as database connections, session switching, multistatement transactions, and resource and security controls must be implemented. In most cases, the data you want to publish is maintained in a variety of sources, including legacy, relational, and object databases, as well as in flat files, text files, word-processing documents, and Web pages. The problem becomes much more complex when these diverse data sources must be queried or updated.

In this article I present the techniques and tools that you can use to implement integrated and interactive Web-to-database applications. In particular, I focus on how you can integrate the data from multiple databases for publication on an Intranet or the Internet. I also note the facilities offered by the commercially available DBMSs, Web servers, Web browsers, and other related tools.

Enabling Technologies

Database query pages become much more interesting, as well as complex, when you want to integrate the data from multiple data sources on the Web. An example is when you want to join the data from two relational tables in a single table field on a Web page. Of particular interest is when the tables are stored in two different databases on two different servers, managed by two different DBMSs.

If you consider the architectures of the basic CGI-based, Web server API-based, and Java-based Web-to-database applications, there are various points where you can integrate the data from multiple data sources. You can integrate the data at the Web browser, at the Web server, between the Web server and the DBMS server, and "at" the DBMS server.

In the scenario where we want to integrate data from multiple data sources, the query submitted from the Web browser refers to tables stored in different databases. In most of these cases, the query must be partitioned and the results from the different data sources must be integrated. Either you must code the query-partitioning and the data-integration facilities, or you can use some form of middleware to perform the process. You can use various enabling technologies to integrate data on Web-to-database applications, such as database gateways, multidatabases, TP monitors, and various advanced Web-development tools. Although some of these tools are not part of the Web-specific technology wave (they originated in the client/server era), they are very useful in the Web world.

Database Gateways

Gateways, from a database perspective, permit database interoperability, enabling client applications to connect to multiple back-end data sources. The database gateway is a special type of data-delivery middleware that gives client applications an API to enable various data sources to appear equivalent.

A database gateway consists of the following components:

Client API Library: The client-side API, which determines the format and meaning of the requests that the client applications may issue.

Server API Library: The API on the database server-side, which determines the database services available to the clients. These database services are available in terms specified by the DBMS vendor.

Glue: Translation and mapping mechanisms that transform the client API to the DBMS server's API, and vice versa for the data returned to the applications.

Multidatabases

A multidatabase system (MDBS) is a special type of distributed database system that is encountered more often in industry than the truly distributed database systems described in academic literature. It is a coherent, integrated collection of data that logically looks like a single database but is physically implemented in a number of databases. This is illustrated in Figure 1.

An MDBS has several characteristics that distinguish it from the "model" distributed database system that is usually described and studied in theoretical contexts. Each participating database is autonomous. The local users of the participating database continue to use their local applications on the database, unaffected by its participation in the MDBS. The data in the participating database is also usually owned by its local users.

The databases participating in the MDBS can be heterogeneous, meaning that they can differ in platform, type, data model, and contents. From a control, locality, fragmentation, and implementation point of view, the databases participating in the MDBS should be independent. Users of the MDBS should not be aware of how or where the data is accessed, whether the data is spread over multiple participating databases, or in which way.

An example of an MDBS is EDA/SQL, which provides access to any number of Informix, Sybase, Oracle, and various other databases as if the application were accessing a single SQL database. OmniConnect is Sybase's multidatabase offering. Through OmniConnect, a single SQL-based transaction can span multiple, heterogeneous data sources transparently. These data sources are accessed through DirectConnect, the main building block for OmniConnect. Through OmniConnect you can get common SQL access to data stored in Software AG Adabas, AS/400, Computer Associates International Inc.'s CA-OpenIngres, IBM DB2, IDMS, IMS, Informix, ISAM, Microsoft SQL Server, Oracle, Rdb, RMS, sequential files, SQL/DS, Sybase SQL Server, NCR Teradata, and VSAM databases and file systems. All of the various data sources are registered in the global data catalog, and from then on, the data can be used in full distributed read-write operations as if the data sources were local. You can access multiple disparate SQL data sources, with full distributed query optimization, and integrate the results without even running the Sybase SQL Server. Another example of an MDBS is CA-OpenIngres/Star, which uses the CA-Enterprise Access database gateways to access data in other sources, such as Oracle, DB2, CA-Datacom, and CA-IDMS.

TP Monitors

A transaction-processing (TP) monitor is an extensive framework into which you can embed your application logic. The framework consists of the following:

Client Applications: In this context, these consist only of the user-interface functions, such as screen logic, screen handling, input handling, and some validation functions.

Application Servers: All of the details of the application's logic are coded in the application servers as so-called application services.

Resource Managers: All of the lower-level services, such as interfacing between the database and the application services, are provided by so-called resource managers.

In the context of distributed systems, a TP monitor provides a number of useful functions, including automatic multithreading, function shipping, security, and global naming and name-resolution services. A TP monitor also provides asynchronous or synchronous access to data sources, support for a large number of users, and easy system integration over numerous heterogeneous environments and components.

Tuxedo and Encina are examples of TP monitors; the latter is built on OSF's distributed computing environment (DCE). Most DBMSs have full X/A support for TP monitors, as illustrated in Figure 2.

Multitier Architectures

Using the enabling technologies I just described, you can implement what is commonly known as three-tier, four-tier, and n-tier architectures, to integrate the data from multiple data sources into a single Web application.

In a typical three-tier architecture, the client Web browser application requests data from the Web server. The Web server requests the data from a database server through a standard client/server connection, as if it were a standalone client requesting a database service from a remote database server. The database server can, in turn, retrieve the data from a remote database (for example, through a database gateway). In the case of Java-enabled applications, the application may make a direct connection to the DBMS server if a JDBC driver is available for the DBMS.

There are various four- and n-tier architectures, where the client Web-browser application requests data from the Web server. The Web server requests the data from either an application server (in a TP monitor system) or a multidatabase server (in an MDBS), which in turn requests the data from the DBMS, possibly through a database gateway.

A typical four-tier architecture with a TP monitor is as follows: The application server requests the required data from various database servers, and the database servers then retrieve the data, as I described previously. The application server then integrates the data it receives from the various database servers. The application server can perform any integration functions that the application-server developers can manage to code. Note, however, that those who develop and implement such a system must be aware of the distribution of the various data components, their interrelationships, and how they must be integrated.

In the MDBS scenario, the Web server requests the MDBS to retrieve the required data. It does this by issuing a global-level SQL query to the MDBS. The MDBS then decomposes the global query and issues the resulting local queries to the various participating database servers. The database servers may even be managed by different DBMS servers, but they can be accessed through database gateways to appear similar. The MDBS then integrates the local results it receives from the various database servers, and it presents the integrated results as a global result to the Web server. The MDBS handles all of the data localities, interrelationships, and integration techniques. The integration of the data is automatically handled by the MDBS -- it does not have to be coded by the application developers. The Web server merely issues a global request to the MDBS and receives a global result from it. In this scenario, the combination of Web server and Web browser is not unlike the role played by the client application in a client/server model. We've just substituted the client/server application with the Web browser and server.

The scenario is similar in the Java world. For example, in such an architecture, a JDBC-enabled Web browser issues a JDBC request to the Web server. The Web server translates the request to an ODBC request, which the ODBC client on the Web server passes to a remote ODBC server (OpenLink's ODBC, for example, has such a client/server architecture). The ODBC server then issues the request to a multidatabase server such as CA-OpenIngres/Star, EDA/SQL, or OmniConnect. The multidatabase server then retrieves the data from the remote databases (possibly from heterogeneous databases through database gateways), integrates it, and returns it to the application along the same path.

It Has Never Been Easy

There are also hundreds of Web application-development tools on the market. Some of them, such as HAHT Software Inc.'s HAHTSite and Symantec Corp.'s Café Pro, provide extensive functionality and access to multiple databases from a single Web application. However, accessing multiple and different databases from a single Web page -- or even better, from a single "table field" on a single Web page -- is a different kettle of fish. I have only seen one demonstration of Symantec Café Pro, using its underlying dbAnywhere technology, where the data from two tables, stored in Microsoft SQL Server and Sybase databases respectively, was integrated on a single Web page. In this configuration, dbAnywhere is the enabling middleware technology, that performs the integration of the data retrieved from the two databases.

Integrating heterogeneous data, let alone publishing it on a Web page in real time, has never been easy. To integrate the data from various sources on dynamic, interactive, database-driven Web pages, you must implement a three-tier, four-tier, or n-tier application architecture. You must use some kind of enabling technology, such as a multidatabase or TP monitor, to help you integrate the data from multiple databases --especially if you want to do that under full transaction control. You may also have to use some kind of gateway technology to integrate the data from different types of databases, because the gateway technology will hide the differences between the various DBMSs for you. Ultimately, your choice depends on which tools fit into the global application architecture, the functionality and explicit control required, and the skill set of the application developers. You must also consider the implementation schedules, as well as the desired throughput of the resulting system.


Martin Rennhackkamp is the owner and principal consultant of The Data Base Approach, a corporation specializing in relational and distributed databases, based in Cape Town, South Africa. You can reach Martin via the Internet at mr@dba.co.za.


Figure 1


-- Multidatabase architecture.

Figure 2


-- TP monitor architecture.

Subscribe to DBMS and Internet Systems -- It's free for qualified readers in the United States
May 1997 Table of Contents | Other Contents | Article Index | Search | Site Index | Home

DBMS and Internet Systems (http://www.dbmsmag.com)
Copyright © 1997 Miller Freeman, Inc. ALL RIGHTS RESERVED
Redistribution without permission is prohibited.
Please send questions or comments to dbms@mfi.com
Updated Monday, April 14, 1997