
There is an increasing need to store and manipulate complex data in relational databases. Complex data is imminent not only in multimedia applications for the Web, but also in specialized application domains such as medical care (including X-rays, MRI imaging, and EKG traces); geographical, space, and exploration systems (such as maps, seismic data, and satellite images); and even financial systems (such as time series data).
Most relational DBMS vendors have added the capabilities to manipulate binary large objects (BLOBs) in the database. However, these implementations are not good enough. The data is stored inside the BLOB as a noninterpreted byte stream. Because of this overly simple generalization, the DBMS does not have any knowledge concerning the content of the BLOB or its internal structure. Consequently, you cannot perform queries and operations on inherently rich and structured datatypes, such as images, video, Web pages, hypertext, and word-processing documents. The operations and algorithms to manipulate these datatypes are not available to the query processing and indexing facilities of the DBMS. This means that the users' application programs must perform the necessary processing on the contents of the BLOBs. This, in turn, suggests that the entire BLOB must be shipped across the network to the client's workstation (or in a three-tier client/server architecture at least to the application server) before any operations can be performed on it. The BLOB may have to be shipped all this way, consuming a lot of unnecessary bandwidth, only for the application to determine that it has no interest in its content.
It should be possible to search, access, and manipulate complex datatypes in the database with standard SQL, without breaking the rules of the relational data model. It is, however, not possible for each DBMS vendor to implement all the datatypes and access methods for all the rich datatypes required by the yet-to-be-developed applications of the near future. There will always be applications requiring more datatypes than those supported by the DBMSs. Developers often cannot wait for the DBMS vendor to implement their required datatypes in a subsequent release of the DBMS. In many cases it will not be economically worthwhile for the DBMS vendor to implement a specialized datatype for a small part of the user base. Some DBMS vendors have already realized that the solution is to make the DBMS extendible so that the developers themselves can add the datatypes they require. Moreover, new datatypes can be developed separately from new releases of the database server, and even third parties can develop and market datatypes. It goes without saying that these new structures must then fit in the relational model. You must be able to assign these new datatypes to the columns of tables. However, in this context extendible means more than just adding new datatypes. You also need ways to manipulate and efficiently search through the data stored in these new datatypes using standard SQL.
Advocates of object orientation argue that this extensibility implies making the relational DBMS more object oriented. Apart from being fashionable, this term is quite correct. There is more to it than just adding new simple datatypes to the database server. It means adding user-defined datatypes (UDTs), user-defined functions (UDFs), user-defined access methods, and usually changing an extensible optimizer. UDTs include complex datatypes that may encapsulate complex internal structures and attributes. UDFs define the methods by which applications can create, manipulate, and access the data stored in these new datatypes. Users and applications only call the UDFs and don't need to understand their internal structure. UDFs also support the notion of overloading, which refers to the concept of using the same name for different routines (actually called methods or member functions). You can define different UDFs with the same name, but each UDF is applicable to a specific UDT. For example, a function called sum(...) applied to integer datatypes would perform conventional addition, while sum(...) applied to spatial datatypes would perform vector addition. User-defined access methods define the techniques by which the UDTs are accessed and indexed. For instance, through user-defined access methods, you can implement R-trees for spatial datatypes. An extensible optimizer provides ways to assign costs to UDFs and user-defined access methods, so the DBMS can determine the best way to access the data stored in the UDTs.
Three of the leading DBMS vendors -- IBM, Informix, and Oracle -- have extended their conventional relational DBMSs to become object/relational "universal servers." These servers provide the mechanisms to extend the data storage capabilities of the database and the functionality of the DBMS itself. However, each vendor implemented the universal server concept using different mechanisms. In DB2 the mechanisms are called Relational Extenders, in Informix they are called DataBlades, and in Oracle they are called Cartridges. In this article I will review these technologies in an attempt to highlight their similarities and differences.
IBM DB2 V5's Relational Extenders are built on the object/relational facilities first introduced in DB2 version 2. (For more detail on DB2 V5, see my Server Side column in this issue.) The object/relational facilities form the first part of IBM's implementation of the emerging SQL3 standard. It includes UDTs, UDFs, large objects (LOBs), triggers, stored procedures, and checks. As a next step, the DB2 Relational Extenders are used to define and implement new complex datatypes. The Relational Extenders encapsulate the attributes, structure, and behavior of these new datatypes, storing them in table columns of a DB2 database. The new datatypes can be accessed through SQL statements in the same manner as the standard DB2 datatypes. The DBMS treats these datatypes in a strongly typed manner, ensuring that they are only used where data items or columns of that particular datatype are anticipated. A DB2 Relational Extender is therefore a package consisting of a number of UDTs, UDFs, triggers, stored procedures, and constraints.
A UDT is a new logical datatype. Users can define table columns, triggers, and procedure parameters based on these new datatypes. Internally, the UDT may be mapped to a standard datatype, or it may be built up from a collection of other standard DB2 datatypes or UDTs.
A UDF is a new logical function. It may be associated with a particular UDT or with a standard datatype. It may be used anywhere in a SQL expression where a standard DB2 built-in function can be applied, forming a seamless extension to the SQL language. Internally, a UDF is written in C, Fortran, or a 4GL (VisualAge for Java or VisualAge for Basic), or it is built up from the standard DB2 functions or from existing function libraries. The UDFs support overloading. The context of the UDF (that is, the UDT or standard datatype to which it is applied) determines the list of parameters in its interface, as well as the exact operations it performs.
Users can write triggers that fire automatically when the rows in a relational table are inserted, updated, or deleted. A trigger may be activated by the operations performed on a UDT, even through a UDF. The trigger can activate a stored procedure and it can pass the UDT's data as a parameter to the procedure. Internally, a Relational Extender may employ triggers to maintain the structures used to implement the UDT. For example, if a UDT was internally implemented through a number of related internal tables, the Relational Extender can use triggers to maintain the consistency of these internal tables. The internal triggers may also use stored procedures.
Users can specify SQL-style integrity constraints on columns defined over UDTs. The constraints are automatically applied to the values assigned to the UDT. This includes unique, primary-key, referential, and check constraints. The Relational Extender may also use similar constraints internally to ensure the correctness of the underlying data.
Although they work on completely different levels, a Relational Extender developer uses the same mechanisms internally as does a Relational Extender user (such as a typical DBA). A DBA uses tables, UDTs, UDFs, triggers, stored procedures, and constraints to store, manage, and maintain the integrity of an organization's data in relation to other data and the organization's business rules. The Relational Extender developer uses the same mechanisms, namely tables, other UDTs and UDFs, triggers, stored procedures, and constraints to implement the Relational Extender and represent its contents internally. No matter how complex the data structure that a Relational Extender represents, it is still represented internally in tables with triggers, stored procedures, and constraints. As a result, you don't need a separate SDK (or a degree in rocket science) to implement a Relational Extender -- a typically well-skilled DBA can implement a Relational Extender using database objects with which he or she is already familiar. You can also copy and adapt from the four Relational Extenders shipped with DB2.
A complex Relational Extender may define a UDT with an elaborate internal structure that has multiple internal attributes. In this case, the logical view of the data may not necessarily be the same as the internal physical storage format. The internal attributes and structure may be hidden behind a public functional interface. In such an implementation, the column in the user's table would not contain the actual data of the UDT, but it would contain a handle that refers to an instance of the datatype. The interface would consist of a set of UDFs that take this handle as an argument and perform the retrieving, storing, searching, and manipulating of the underlying attributes and structure. For example, an extender can be used to store rich text documents. The data itself may be stored in multiple tables, with keywords stored in separate tables and hidden indexes. Such an extender would typically include a UDF that encapsulates an advanced search engine, which searches through the keyword tables. A UDF may even be a powerful function accessing, retrieving, and manipulating data on another advanced server, such as the IBM OS/2 LAN Server Ultimedia. Applications use the extender transparently to process the columns in the tables that contain the particular UDTs corresponding to the extender. The UDFs appear the same to the applications as the standard DB2 SQL functions. The applications use the appropriate UDFs in their SQL operations, but internally these are mapped to operations on the data in the underlying tables. Figure 1 shows the relationship between the Relational Extenders, the database, and the application.
When installing a Relational Extender on a database, various files are copied into the server's directories, including the function library containing the UDFs. Then an application is run against the database to define the Relational Extender's database definitions to the server. This includes scripts to create the necessary internal tables, triggers, stored procedures, and constraints, and scripts to define the UDTs and UDFs making up the Relational Extender. For example, the UDFs are registered with the DBMS through a create function DDL statement. Examples of these applications and scripts are provided with the four Relational Extenders shipped with DB2.
Informix DataBlades are named after the special-purpose blades you can insert into a general-purpose utility knife. The DataBlades are standard software modules that plug into the database and extend its capabilities. A DataBlade is like an object-oriented package, similar to a C++ class library, that encapsulates a data object's class definition. The DataBlades not only let you add new and advanced datatypes to the DBMS, but they also let you specify new, efficient, and optimized access methods and processing options for these datatypes.
A DataBlade includes the datatype definition (or structure) as well as the methods (or operations) through which it can be processed. It also includes the rules (or integrity constraints) that should be enforced, similar to a standard built-in datatype. Figure 2 illustrates how DataBlades fit into the Informix Universal Server architecture.
A DataBlade is composed of a UDT, a number of UDFs, access methods, interfaces, tables, indexes, and client code. A DataBlade consists primarily of a new UDT. These datatypes are treated by the server in exactly the same way as the built-in datatypes. Their values may be stored, queried, indexed, returned to applications, and passed as function parameters. The UDTs can be based on other built-in or user-defined datatypes from which they can inherit methods and other properties. There are three basic types of UDTs in DataBlades: row types, distinct types, and opaque types. A row type is a SQL3 datatype incorporating a number of columns based on built-in or user-defined datatypes. Through the inheritance properties of the row types, relationships such as joins between related tables can be easily implemented. The distinct datatypes let you customize existing datatypes. For example, you can refine the definition of an existing datatype to create a new specialized datatype. The opaque datatypes are the most flexible. These are implemented using C, C++, or Java code, where the code defines how occurrences of the datatype have to be stored, indexed, and processed.
Each DataBlade can have a number of UDFs that operate on its datatype. A function can operate on the datatype of the DataBlade and on other datatypes (including other DataBlades). These functions are usually coded in Informix's stored procedure language (SPL), C, C++, or Java. Functions developed in C, C++, or Java are compiled and loaded in a shared object file or a dynamic link library (DLL). When the function is invoked, the shared object is linked into the database server and executes in its space. For the opaque datatypes, you must code a minimal set of required functions, and you can add a set of additional functions.
The access methods operate on the tables and indexes that are managed by the server. When defining new datatypes, you can use the existing access methods or implement new ones. An access method is defined to the server as a set of functions that it can call at various times when executing a query. These include functions, for example, to start scanning an index, get a next row, insert a new row, or delete an existing row. The access methods can be used to implement faster search methods for specialized datatypes such as R-trees, which are more efficient than B-trees for searching through 2D and 3D spatial datatypes.
An interface enables one DataBlade to share the services of another DataBlade. It is a collection of functions that conforms to a specified standard. Through this facility, various DataBlades can share common functions, such as search and retrieval. A DataBlade can store and manipulate its own definition data in tables and indexes in the database. This makes the whole DataBlade data-driven and easy to change, manage, and extend. Client code is application code that accesses the database through the DataBlade API library. It provides an interface through which users can query, display, and modify the new datatypes.
Informix provides a DataBlade Developers Kit (DBDK), a comprehensive graphical development environment for creating new datatypes and functions. It manages the packaging, installation, and registration of new DataBlades. Apart from documentation, sample code, and supporting software, the DBDK consists of three tools:
BladeSmith: A tool to organize DataBlade development projects. You can define projects, as well as various objects (such as datatypes and functions) that constitute the DataBlade module. BladeSmith generates all the necessary source files, header files, SQL scripts, setup files, and a make file. You only have to fill in the bodies of the various functions and scripts. When you are finished, you run the make file to compile the DataBlade into a shared object file.
BladePack: A tool to create an installable DataBlade module. It is used to package all the necessary SQL scripts, the shared object file, and other required files for installation. You can package a DataBlade into formats for diskettes, CD-ROMs, or Unix tar files. BladePack includes various installation scripts, all with the same look and feel.
BladeManager: A tool to manage DataBlades in customer databases. It ensures that all the necessary SQL scripts are run against the databases to register or unregister all the datatypes, functions, and other components of each DataBlade. The BladeManager is included with every copy of the Informix Universal Server, which means that customers do not have to acquire the DBDK to install custom-developed DataBlades.
Oracle Cartridges are a key component of the new Oracle Network Computing Architecture (NCA). The other components of the NCA are client applications, Oracle's Universal Application Server, and the Oracle Universal Server. These components all communicate through open, standardized protocols. From a database point of view, the Oracle Universal Server is the component of the NCA providing extensible and scalable data storage and data manipulation facilities. (For more details on Oracle8, see Robin Schumacher's feature)
A Cartridge is a pluggable, manageable object. Its structure is defined using SQL3, and its access interface is identified to the other objects in the system through the language-independent Interface Definition Language (IDL) of the Object Management Group (OMG). You can develop cartridges in C, C++, PL/SQL, or Java.
In the NCA, there are basically three types of Cartridges, classified according to where they expect to be hosted:
Cartridges communicate through an object bus called the Inter-Cartridge Exchange (ICX). ICX is implemented as a set of libraries and services. Through these libraries, the Cartridges have access to other Cartridges, clients, servers, and other services. ICX uses HTTP and the IIOP for communication, with automatic translations where necessary. It also provides interfaces to ActiveX/COM (through a bridge), Java (through a language mapping for CORBA IDL, over IIOP, or through JDBC), and various other legacy systems and Cartridge Services. Figure 3 illustrates the types of Cartridges and their interaction with one another within the NCA.
In the Oracle8 Universal Server, through the high-level IDL and SQL interfaces, you can extend the SQL query execution engine, the indexing mechanisms, the optimizer, and the administration components of the database server. This is done by implementing user-defined Data Cartridges. A Data Cartridge is a predefined package consisting of datatypes, methods, operators, aggregates, and index types that can be integrated with Oracle8 via the extensibility interfaces.
Objects are defined to represent new datatypes. A simple object can be built from existing datatypes, and complex objects can be built using object modeling techniques. Collections of objects can be represented in array structures or nested tables. An object type can be either a column type or a table type. A column type defines the type for a single column in a table. A table type defines an entire row, which is then used as the basis for creating a table, called an object table. An Oracle object type is semantically equivalent to the SQL3 named-row type. An object type has one or more attributes and each attribute has a name and a type. The type can be an existing Oracle scalar type (character or number), a collection of values (either an array or a nested table), a LOB, another object type (object types can be nested), or a reference to another object type.
Methods, operators, and aggregates are defined on these objects. Each object type can have methods associated with it, although methods are not required. The name and signature of each method can be included directly in the type definition. This is necessary because Oracle8 supports function overloading, which means the same function name can represent different functions and the DBMS decides which function to apply at runtime based on the arguments it receives. The database can also make calls to external programs written in C, C++, or Java. The calls can be made through protocols including HTTP or IIOP.
Oracle8 includes object views, which enable you to take existing relational data and synthesize it into Oracle8 object types. The object views can be updated via so-called "instead-of" triggers. You write triggers to define how inserts, updates, and deletes are executed through the view onto the underlying base tables. You can also use instead-of triggers for relational views.
In future releases of Oracle, Cartridges will be extended to Application Server Cartridges and Client Cartridges. This broad applicability of Cartridges will enable you to implement not only extensible database servers, but also extensible client programs and extensible application servers.
Cartridges can be developed in a language-centric approach, using tools focused on the development language. Any tool that generates Java, JavaScript, Visual Basic, C, C++, or SQL can be used through the CORBA and language-based mappings to the IDL. Oracle's Designer/2000 and Developer/2000, as well as various Java development environments, fall in this category. You can also take an object-centric approach to developing Cartridges using tools that focus on the specification of the objects (in this case the Cartridges) being created.
Adding object support to a relational DBMS is not leading-edge technology. Even by the late 1980s, through its Object Management Extension (OME), Ingres 6.3 (at that point still owned by ASK Inc.) provided you with the capabilities to define your own UDTs and UDFs, including additional functions for the standard datatypes. (It is interesting to note that the chief architect of Ingres, Dr. Michael Stonebraker, is also the chief architect of Illustra and the Informix Universal Server.) This functionality, although not broadly advertised as a Universal Server per se, is still shipped with CA-OpenIngres 2.0. However, the CA-OpenIngres OME implementation does not have the easy-to-use integration tools and software development kits that DB2, Informix, and Oracle provide. Nor is it as standards-based as, for example, Oracle's Cartridges and DB2's Relational Extenders.
There are three important differentiators when looking at DB2's Relational Extenders, Informix's DataBlades, and Oracle's Cartridges: standards, the development environment, and what I call the "layer of applicability." First, if standards and openness are important to your organization -- which they should be if you are using heterogeneous systems on heterogeneous computing platforms -- then you should look closely at the middleware and the standards used in the implementation. Oracle's Cartridges, for example, are based on the OMG's IDL and the CORBA 2.0 standard. DB2's Relational Extenders, on the other hand, are based on the SQL3 standard. Second, adding a new object class to a relational DBMS is not a trivial task. You must define the UDT, develop various UDFs, specify optimizer statistics, and implement some checks and constraints -- and then package this all in a neat little bundle into the DBMS server. Informix provides its DBDK tool set (BladeSmith, BladePack, and BladeManager) for developing, managing, and integrating DataBlades into the Informix Universal Server. Finally, you should look at the layer of applicability. It does not help much if it is simply adding new object classes to the DBMS server when you have to bend over backward to use them in applications. This is a problem with many of the BLOB implementations in some of the DBMSs. Extensive BLOB handlers must be developed in order to process the BLOBs in the application once they leave the safety of the database. This is due to the datatype mismatch between the database and the application -- a problem that has frustrated relational database application developers since the day they had to use cursors and null indicators. This makes the Oracle Cartridges interesting because they are not confined to the database -- they can be used in the application server or on the client.
Extending a relational database server with an object class is not everybody's cup of tea. A large percentage of applications will be able to function with the standard datatypes that DBMSs provide. Of the remaining applications, a large percentage will be satisfied with the object classes developed by third-party object class specialists (typically in specific application areas). As they become popular, some of the third-party object classes will also eventually find their way into the DBMS products. For example, many DBMS products are already offering spatial datatypes and BLOBs as standard datatypes. As DBMSs evolve in this manner and with the addition of third-party object classes as new standard datatypes, they should satisfy the requirements of most standard applications. However, the ability to extend a database server is an extremely powerful, useful concept. Some specialized application domains, such as those listed in the introduction, would profit greatly from specialized datatypes. There are also specialized third-party developers in these application domains that would make a good living developing object classes for the various DBMS products. (See the product chart for a complete listing of all the current Cartridge, DataBlade, and Relational Extenders on the market and in development.) At this stage, each object class still has to be ported to each of the DBMS products. It will take a considerable amount of time and negotiation before the different DBMS extensions, such as Cartridges, DataBlades, and Relational Extenders, adhere to a common implementation standard.

