DBMS, June 1997
DBMS Online: Server Side By Martin Rennhackkamp

Extendable Databases

The Widely Varying Facilities DBMS Vendors Offer for Implementing Your Own Datatypes.

As we're all aware, no single technology is a panacea for all of our information technology problems and issues. We've been through the relational revolution, the client/server coup, and the object-oriented offensive, and now we see the same thing happening with the Internet insurgency. However, each of these technology waves introduces some very useful concepts and techniques that can often be engineered back into the other technologies. We can see this phenomenon in action in the way the object-oriented approaches are influencing the more traditional relational systems.

All of the leading relational DBMS vendors are extending their products with some object-oriented capabilities. The result is the uprising of the so-called "Universal Server." One of the features of this new type of DBMS server is its extensibility. You can extend the functionality of the DBMS server by adding your own user-defined datatypes (UDTs) and your own user-defined functions (UDFs).

But why would you want to do this? The answer is quite simple (and quite object-oriented): for reuse and sharing. You want to reuse the functionality you have developed over and over again, and you don't want to recode it in every application. You want to code it once in the DBMS server and share it among all of the applications accessing the database. The spatial datatypes offered by each DBMS vendor as practically a give-away UDT are an excellent example. The distance between two points (in the 2D or 3D plane) is a single-dimension numeric figure. When you work with spatial datatypes, you want to apply the distance function to two coordinate points and obtain a distance as a result, and you don't want to code Phythagoras's theorems in every application. You simply want to apply the function. However, this implies that someone had to implement 2D and 3D spatial datatypes, as well as the distance function, in your DBMS server. The problem is that implementing a new datatype in an extendible database server is not trivial.

The facilities provided by the major DBMSs to implement your own datatypes differ quite substantially. They range from letting you refine the definitions of the existing standard datatypes (almost like defining a domain) to the facilities to letting you "invent" and code your own datatypes and link them into the DBMS server.

The Players

Although it is not called a Universal Server, the CA-OpenIngres DBMS (currently in release 1.2) has had support for UDTs and UDFs since Ingres 6.3, which was released in the late 1980s (at that stage still developed by Ingres Corp.). Through the CA-OpenIngres Object Management Extension (OME), you can define your own datatypes and your own functions. You can also define functions for your own datatypes and for the standard existing datatypes.

In the Informix Universal Server (release 9.00 currently in beta testing) you can use so-called DataBlades to extend the capabilities of the database server. A DataBlade (a software module that you plug into the database) lets you do anything you care to code with any data. Informix provides a DataBlade Developers Kit (DBDK), a GUI tool for automating the creation of DataBlades that provides useful utilities for maintaining any existing DataBlades as well. The DBDK generates most of the SQL and C code necessary to create a DataBlade.

In IBM's DB2 (release 2.1.1) you can define UDFs to extend the set of SQL built-in functions. You can use these UDFs wherever a built-in SQL function can be used. You can also define UDTs that are derived from the existing datatypes. The UDTs can therefore refine the existing datatypes, but they cannot be used to extend or combine the standard datatypes. The UDTs can only be used in the system-provided or built-in functions if you have created UDFs for the UDT.

You can define your own UDFs in Oracle7 release 7.3 using Oracle's PL/SQL language. You can use the UDFs wherever you can use a standard SQL function. The UDFs must be created as top-level PL/SQL functions, or they must be declared with a package specification before being used in a SQL statement. The only correspondence to a UDT in Oracle is a user-defined type equivalence used in embedded SQL programs with the Oracle7 precompilers.

Sybase SQL Server 11 is rather lean on UDTs and UDFs. Although it allows user-defined datatypes, they must always be a subset of one of the standard datatypes. It is not possible to create complex datatypes as a combination of other datatypes. Sybase's extensive and extendable functional processing is based solely on its very powerful stored procedures. However, the stored procedures cannot be called like standard SQL functions can -- you can only activate them through explicit procedure calls.

Microsoft SQL Server 6.5 lets you define stored procedures as dynamic links to C library routines. This means that you can develop an API as a set of C functions that can perform complex operations on the standard datatypes. Although this feature does let you extend the server's functionality, it does not cover the definition of new UDFs, because you cannot call these functions from SQL statements. You can define new UDTs using Transact-SQL, but these are only subsets of the existing scalar and character datatypes. You cannot, however, define any functions or operators on the new datatype that can be used in SQL SELECT or WHERE clauses. This functionality is therefore more suited for defining domains on existing base datatypes than for defining new complex UDTs.

User-Defined Types

To define a datatype, you must define its appearance (for example, how it must be displayed to the user), its internal representation (how it should be stored in the database), and its standard functions (how the DBMS must apply the standard SQL functions and the internal functions to the stored datatype). Note that a DBMS must apply a whole collection of internal functions and operations to all of the data items it manages. For example, if the DBMS supports hashed indexes, it must know how to calculate a hash index pointer based on the value of a data item; this hash index pointer should preferably give quite a unique value for each of the different data values that can be assigned to the datatype.

To implement a UDT in CA-OpenIngres, you define its internal representation, code all of the required functions, operators, and coercion routines, and then link them with the DBMS server. For a typical UDT, you must implement 16 functions. These routines are, for example, to compare two data items, to convert a data item to a representable datatype, to get an empty value, to calculate a hash key, to check that the data item satisfies its length specification, and many more -- that is, in addition to any special UDFs that you want to implement for your new datatype. Once you have coded the methods for storing and manipulating your new datatypes and have added them to your server, all of the end users and applications have SQL-based access to it. You can adapt the standard operators to have a special meaning in the context of your own datatypes. For example, where the "1" operator means concatenation for the string datatypes and addition for the numeric datatypes, you can implement it to mean vector addition for your spatial datatypes or component-by-component addition for complex datatypes such as complex numbers or matrices. In this way, it takes care of full operator overloading, dependent on the context of the datatype.

When you create a DataBlade in Informix, you must implement its UDTs, its user-defined routines (UDRs), and its user-defined access methods (UDAMs). There are three varieties of UDTs in Informix, namely:

  • DISTINCT datatypes, which inherit their physical characteristics from the existing datatypes (such as their internal representation), but which can have different behaviors--for example:

    CREATE DISTINCT TYPE yen AS money;

  • ROW datatypes, a group of columns organized together into one datatype--for example:

    CREATE ROW TYPE myCustomer ( cust_num INTEGER, company CHAR(40), phone CHAR(12));

  • OPAQUE datatypes, which are defined as C structures, but the C structures are transparent ("opaque") to the user. These are usually used for large objects--for example:

    CREATE OPAQUE TYPE reallyLongInteger(internalLength = 8)

    User-Defined Functions

    Creating a user-defined function entails defining its interface and its implementation. The interface specifies what the function is called (its name), how it is activated, and the parameters passed to and from the function. The implementation of the function specifies the actual "work" that the function must perform; that is, how it must process the given input parameters to give the desired output parameters.

    To create a UDF in CA-OpenIngres, you must specify its name, type, and internal ID in a predefined storage structure, implement the function's code, and link the function's definition to its implementation in another predefined storage structure. Finally, you must link the defined structures and the function's code with the DBMS server. After this, the function is automatically invoked at run time when the function call is encountered in a SQL statement.

    An Informix UDR is a function written in SQL, Stored Procedure Language, C, or C++. A UDR is called from any SQL statement, just like a standard SQL function--for example:

    SELECT myCfunction(18900, 12) FROM currencyTable.

    You use Informix UDAMs to create new types of indexes and other ways to access your data. These are coded in C or C++ and are added to the Informix server. Once the UDAM has been added to the server, you can create an index using the defined access method--for example:

    CREATE INDEX myIndex ON customers(cust_info.company) USING myBtree;

    In IBM's DB2 (release 2.1.1) you can implement two types of UDFs: external functions written in a standard programming language, and sourced functions whose implementations are inherited from existing functions. DB2 provides a rich library of UDFs, including mathematical and scientific functions.

    A DB2 UDF can be built using C++ commands, Visual Age, Java, or other language stored in a .bat or .exe file, but it is the easiest to use C++. The UDFs are compiled like stored procedures, but they do not require a connection to the database or a precompile phase. A UDF cannot contain any embedded SQL statements, but the application using the UDF does. The application may call the UDF just like it calls any standard SQL function. To define a UDF, you need to open a DB2 Command Line Processor (CLP) window and type: UDFname udf. After that, any application can use the function. UDFs can operate over all database types, including LOBs and distinct datatypes. A UDF can operate on data managed within or outside the DB2 database. For example, you can code a UDF to send an electronic message, update a flat file, or perform some other operation.

    You create an Oracle UDF, called a user function or a stored function, as a top-level PL/SQL function by using the CREATE FUNCTION statement. You must be running Oracle7 with PL/SQL to be able to create Oracle UDFs. As part of the function's definition, you specify its name, its input and output parameters, its return value, and the body of PL/SQL code that must be executed. Alternatively, you can create an Oracle UDF as part of a package using the CREATE PACKAGE statement. Before you can create any UDFs, the user SYS must run the SQL script DBMSSTDX.SQL. To create a function in your own schema, you must have the CREATE PROCEDURE system privilege. To create a function in another user's schema, you must have the CREATE ANY PROCEDURE system privilege.

    Latest Developments

    An interesting related development is Oracle's "cartridge" technology, as documented in the company's Network Computing Architecture white paper. The Network Computing Architecture is particularly aimed at providing extensibility for distributed environments. In addition to supporting de facto standards such as open COM/CORBA and IIOP, one of the cornerstones of the Network Computing Architecture is "pluggable" cartridges. A cartridge is just another name for a manageable object. Through a standardized Interface Definition Language (IDL), each cartridge identifies itself to the other objects in the distributed system. It can be coded in any language, such as Java, Visual Basic, C++, and even PL/SQL. Interface, application, and data cartridges, on various layers of the Network Computing Architecture, communicate with each other through the Inter-Cartridge Exchange (ICX). ICX uses open interfaces such as IIOP to make transactions cross between the different environments found on the different layers of the architecture. Developers can use cartridges to extend the functionality of the interface, the application, or the database. For example, a database developer can implement a data cartridge with its own interface specification to store a special type of data in the database. Through its IDL, client and application processes would know how to process the data. This will give developers incredible extensibility power -- if a database or application does not have a particular datatype (or even a function for that datatype), you can merely implement a cartridge to provide you with the required functionality.

    It is also interesting to note that CA-OpenIngres (and before that CA-Ingres) used the same mechanisms internally to implement the standard datatypes as are now available to its users to define their own user-defined datatypes in the CA-OpenIngres/Object Management Extension. By using a generic mechanism for all of the datatypes, whether they're standard or user-defined, all datatypes are defined and processed in the same way. This makes it just as easy for the CA-OpenIngres developers to implement new datatypes as for the users. This same philosophy of generic extendible datatypes probably led to the development of the DataBlades that were used in the Illustra database server and are now making their appearance in the Informix Universal Server release 9.00.

    Third-party vendors are making use of this extensibility to the database servers to code libraries of datatypes that are applicable to their particular businesses. For example, a partnership between ASK (former owners of Ingres) and ESL (Ingres development partners) in 1992 led to the development of the Spatial Object Libraries currently available with CA-OpenIngres. This library gives you all of the spatial datatypes, such as points, lines, segments, circles, boxes, and polygons, with all of their associated functions, such as inside, intersect, area, length, perimeter, and distance. I foresee that many more datatype libraries, or DataBlades in Informix terminology, will become available in the near future, for a wide variety of business areas. These types of libraries enable developers to implement intelligent datatypes -- call them business objects if you want -- that encapsulate knowledge of the business-specific features and functions required from such a datatype.


    Martin Rennhackkamp is the owner and principal consultant of The Data Base Approach, a corporation specializing in relational and distributed databases, based in Cape Town, South Africa. You can reach Martin via the Internet at mr@dba.co.za.
    Subscribe to DBMS and Internet Systems -- It's free for qualified readers in the United States
    June 1997 Table of Contents | Other Contents | Article Index | Search | Site Index | Home

    DBMS and Internet Systems (http://www.dbmsmag.com)
    Copyright © 1997 Miller Freeman, Inc. ALL RIGHTS RESERVED
    Redistribution without permission is prohibited.
    Please send questions or comments to dbms@mfi.com
    Updated Friday, May 16, 1997