DBMS, December 1997
DBMS Online: Server Side By Martin Rennhackkamp

A DBA's View of DB2

Taking IBM's New Turbocharged DB2 Version 5.0 Out For a Spin.


It was with great excitement that I opened the parcel, which arrived by courier from IBM. My first universal server had arrived -- in the form of IBM's DB2 Universal Server version 5 ("V5") for NT. The cover letter said "Check it out!" -- and that is what I plan to do in this month's column.

The package consisted of a Quick Beginnings Guide, various release notes, and a pile of CDs: DB2 Universal Server for NT, DB2 Universal Server Extenders, DB2 Universal Server Software Developers Kit, DB2 Universal Server Client Pack, Net.Data for NT, Visual Age for Basic, and Lotus Approach 97.

DB2 V5 combines what used to be Common Server (2.x) and the Parallel Edition (2.x). The Windows NT, Unix, and OS/2 code bases are the same, but other platforms, including the OS/390 and AS/400 versions of DB2, have their own separate code bases.

Getting Started

Installing DB2 for NT is trivially simple. An Installation wizard asks you to select the target directory and the system components you want to install. It checks whether you have sufficient space (85MB for a typical installation) and whether your user name is acceptable (a valid user from the administrator group, not called Administrator). After it copies all the files, installs the services and program groups, and reboots your machine, you are set up to use DB2 Universal Server. After installation, a First Steps wizard guides you through setting up and using the sample database. You can also view the online documentation, called the "product library," from this wizard.

Tools and Facilities

DB2 V5 has a collection of new administration tools. The DB2 Control Center is an extensive GUI-based, Explorer-like tool with which you can manage everything from systems and databases to views, triggers, and users. You can perform most administrative tasks, such as configuring the system, managing directories, backing up and recovering databases, scheduling jobs, and managing storage media. The Control Center also contains a Replication Administration facility to set up graphically the replication of data among systems. The Control Center, illustrated in Figure 1, consumes about 4MB of memory while running.

The Command Center and Script Center provide interactive windows in which you can execute SQL statements, DB2 commands, and operating system commands. The results are displayed in a scrollable result window. From the Command Center, you can view the access plan and the statistics associated with a SQL statement before it executes. You can also schedule command scripts to run at a later date or at regular intervals. The results of scheduled jobs are written in a journal, which can be viewed from the script center.

The Performance Monitor provides information for performance analysis of the database manager and database applications. Performance characteristics are captured periodically by the Snapshot Monitor. The snapshot data can be viewed in realtime as graphs or text and in detailed or summary form. You define exception conditions by specifying threshold values. When a threshold value is reached, any of the following actions can be initiated: notification through a window or an audible alarm, logging of a record in the database, or execution of a script or program. The Event Analyzer is used to capture summary information on the completion of events such as completed statements, completed transactions, and applications disconnected from the database.

The Governor is used to monitor applications running against the database. It consists of a front-end utility and a daemon. When you start the Governor's front-end utility, it starts the Governor daemon. The daemon collects statistics about the database applications. The Governor checks the statistics against rules specified in a configuration file. It can take various actions, such as changing the application's priority or even disconnecting it from the database.

Instances and Spaces

DB2, similar to Oracle, manages databases in the context of instances. An instance is a copy of the database manager code in a specific directory. You can have more than one instance on a system. DB2 builds a directory structure around the instance, where the database files are stored. During installation, DB2 creates a default instance called DB2. You can create additional instances, change the location of the instance's directory structure, and switch among instances by setting the appropriate environment variables. Instances are started and stopped from the DB2 Control Center. An instance must be running before you can connect to a database, precompile an application, or bind a package to the database. Prior to using an instance, you must, in DB2 terminology, "attach" to the instance using a valid username and password.

Unlike Oracle, where each database requires its own instance, a DB2 instance can manage many databases. But each database exists in the context of an instance. You can create a database in various ways. The easiest way is to use the Create Database SmartGuide wizard provided by the Control Center. The SmartGuide prompts you for the required details and manages the database creation. It has optional steps for space allocation and basic performance tuning. You can also create a database using the create database command from the Command Center. This command provides all the options for aliases, code sets, collation sequences, extents, segment directories, and table spaces. Before gaining access to a database, you must connect using a valid username and password. Figure 2 illustrates the relationship between systems, instances, and databases.

The objects associated with a database need not always be stored in the database directory. DB2 uses various storage space structures: containers, nodegroups, partitions, table spaces, and extents. A container is a name for a directory, a device, or a file. A nodegroup is a named set of one or more nodes belonging to a database. A nodegroup can contain one or more database partitions. You use table spaces to assign the locations of the database and its tables to containers. A table space is a storage structure containing tables, indexes, large objects, and long datatypes. Table spaces reside in nodegroups. The table space selected to hold a table defines how the table is spread across the database partitions in a nodegroup. A single table space can span several containers. The extent size for a table space indicates the number of pages of data that will be written to a container before switching to the next container.

A database must contain at least three table spaces. It must contain one catalog table space, which contains all the system catalog tables for the database. It must contain one or more user table spaces, which contain all the user-defined tables. Each database has a default table space, so if you do not specify a table space name when creating a table, it is placed in the first table space (or in the default table space if you haven't created any). Finally, a database must contain one or more temporary table spaces for temporary tables. If there is more than one temporary table space, temporary objects are allocated in a round-robin fashion.

If you are planning database installations of any substantial size, you need to know about containers, nodegroups, partitions, table spaces, extents, and their interrelationships. The DB2 Administrator's Guide provides useful information.

Database Objects

DB2 supports a complete set of database objects. Most database objects, including systems, instances, databases, table spaces, tables, views, aliases, triggers, schemas, indexes, connections, replication details, buffer pools, user-defined datatypes, user-defined functions, packages, users, and groups can be created and managed from the Control Center or the Command Center. Some objects, such as event monitors and procedures, must be created and managed from the Command Center.

Schemas are new in DB2 V5. It supports a complete implementation of schemas. A schema is a named logical grouping of database objects. When an object such as a table, view, alias, distinct type, function, index, package, or trigger is created, it is assigned to a schema. If you do not explicitly specify a schema, the default schema (identified by your user ID) is assumed. Tables, views, indexes, and grants on these can also be created as part of the create schema command.

When you create a table, you have to assign it to a schema -- otherwise it is assigned to the default schema. Obviously, you also define the columns of the table, with their datatypes. DB2 supports integer, smallint, float, real, double, decimal, numeric, character (optionally for bit data), varchar (optionally for bit data), long varchar, blob, clob, dbclob, graphic, vargraphic, long vargraphic, date, time, timestamp, and user-defined distinct datatypes (UDDTS). A blob can be up to 2GB in size. You can have multiple blob columns, up to 4TB per table. clobs and dbclobs are special blobs for single-byte and double-byte character strings. There are various options you can specify for different columns, such as default values for nullable columns, compression for blob columns, and nonlogging for blob columns. You can specify different table spaces for the table, its blobs, and its indexes, optionally with partitioning -- otherwise everything is stored in the default table space.

DB2 also has an almost complete implementation of integrity constraints. A table can have named or unnamed primary key, unique, referential, and user-defined check constraints. A referential constraint can have no action, restrict, cascade, or set null options for deletes, but it can only have the restrict options for updates.

DB2 supports a complete implementation of triggers. You can define any number of triggers per table. The trigger specifies a set of actions that must be executed when an insert, update, or delete operation is performed on the base table. Triggers can fire before or after the operations on the base table. However, an operation executed before the operation on the base table will not cause other triggers to fire. A before-trigger may perform a select statement, set variables, or signal a SQL state. An after-trigger may perform a select statement, an insert statement, a qualified update statement (with a SQL where clause), a qualified delete statement, or signal a SQL state. Triggers can fire per row or per statement. The old and new values of the row, for example, in an update statement, can be assigned different role names. A trigger can also include a when clause to specify under which conditions it should fire. However, when you use triggers and constraints together, you should consider their execution sequences carefully. The interaction between triggers and constraints is explained very well in the SQL Reference Manual.

You can code DB2 stored procedures in Java (using IBM's Visual Age for Java, available separately), Basic (through IBM's Visual Age for Basic), or in 3GLs such as Cobol, Fortran, C, and C++. A stored procedure can process multiple rows and return multiple result sets. The stored procedure is typically not stored in the database's system catalogs but in a designated directory. This open and extensible server architecture makes your stored procedures independent of DB2 releases.

UDDTs and UDFs

A user-defined distinct type (UDDT) is a new datatype derived from an existing datatype. It is defined using the create distinct type statement, which names the new datatype, specifies its base datatype, and specifies whether comparisons between instances are allowed. Once you have created a UDDT you can use it to define columns in a create table statement.

DB2's UDDTs support strong typing, meaning that although a UDDT shares the same internal representation as the source type, the two types are incompatible. Instances of UDDTs cannot be used as arguments of functions or operations defined on the source type, not even for system-provided or built-in functions. Instances of the same UDDT can only be compared with each other if the with comparisons clause was specified during creation. You must provide equivalent user-defined functions for the built-in functions. However, when you create a new UDDT based on an existing datatype, DB2 generates the necessary typecasting functions between the UDDT and the existing datatype.

A user-defined function (UDF) is defined to extend the built-in SQL functions. It is created using the create function statement and can be called wherever a built-in function can. There are two types of UDFs: An external function is written in a programming language, while a sourced function's implementation is inherited from an existing function. External UDFs are classified as table or column functions. A column UDF can be used anywhere an expression is allowed, for example, in a select statement, a create view statement, a check constraint, or in a trigger definition. A column function returns a single scalar value. A table UDF can only be used in a from clause -- it returns an entire table as a result. UDFs are stored in DB2 in the same way as stored procedures.

I described UDDTs, user-defined datatypes (UDTs), and UDFs in the context of DB2's Relational Extenders, comparing them to Informix's DataBlades and Oracle's Cartridges, in my feature article this month, "Extending Relational DBMSs". Briefly, Relational Extenders are collections of UDDTs and UDFs, used to extend the functionality of the DBMS to cater to nontraditional data. DB2 Universal Server V5 is released with four built-in Relational Extenders: text, image, video, and audio.

Replication

DB2 V5's replication facilities are the result of bundling in what used to be a separate IBM product called Data Propagator. DB2's replication facilities copy modified data from one source database to one or more target databases. Replication is managed from the Control Center, which includes facilities for defining source and target tables, scheduling, data enhancement, and conflict detection. To designate a table for replication, it must use the data capture changes option, which indicates that extra information for replication must be logged. This is specified as part of the create table or alter table statements. Replication is performed by two programs: Capture and Apply. The Capture program takes changes to the source tables, as recorded in the database log, and stores them until the target is ready to receive them. The Apply program replicates the changes to the target databases. The data is replicated asynchronously, either pushed from the source to one or more targets or pulled off the source from the target. The Apply server can transform the data it is replicating.

DB2 can manage update-anywhere replication, allowing updates to the source or target tables in a replication configuration. You can specify transaction conflicts through declarative constraints or triggers, and you can resolve conflicts through compensating transactions.

In V5, various performance improvements have also been made to the replication mechanisms, including the use of stored procedures and static SQL, reduced connect processing, and uncommitted reads during the recording process.

Concurrency Control

Like most other relational DBMSs, DB2 uses locking as its concurrency control method. DB2 can lock rows, tables, and table spaces, with automatic escalation from row- to table-level locks, based on the availability of lock resources.

DB2 V5 employs four isolation levels: repeatable read, read stability, cursor stability, and uncommitted read. The isolation level determines how data is locked from other processes while the data is being accessed. The isolation level is in effect for the duration of a transaction. DB2 uses various lock modes to implement these isolation levels. The default lock modes can be overridden per transaction with the lock table statement.

OLAP Support

DB2 Universal Server V5 has various new features for OLAP, including star joins, cubes, roll-up, parallel querying, a cost-based optimizer, query rewriting, and bitmap indexes. DB2 uses bitmap indexes when processing warehouse-type queries. It builds the bitmap indexes dynamically during query processing and uses them during query execution. It applies a similar technique when processing star joins from large fact tables.

There are new extensions to the group by clause of the SQL select statement for roll-up processing and to form cubes. You can add so-called super groups to the group by clause, which in turn can consist of rollup and cube clauses, with additional grouping columns. The effect of a rollup clause is similar to a control break in report writers. The effect of a cube is equivalent to a cross tabulation.

Performance

DB2 V5 incorporates a number of performance enhancement features; many of these aim to exploit multiprocessor architectures. It employs a technique called intraquery parallelism, where complex queries are decomposed into subqueries that are executed in parallel on multiple processors. This suits symmetrical multiprocessor (SMP) architectures, where a number of agents can process the query on shared disks and in shared memory. Massively parallel processor (MPP) and cluster architectures are exploited through hash-based table partitioning or through databases partitioned across clusters. Other improvements aimed at multiprocessor architectures include parallel I/O, parallel index creation, parallel load, and parallel backup and restore utilities.

Other performance improvements to DB2 V5 include memory usage through multiple and tunable buffer pools that can be assigned to different table spaces, large memory support with 64-bit addressing, asynchronous writes, prefetch reading, dynamic space allocation, big block reads, and the Global SQL Cache.

Extensible and Extensive

After this evaluation of DB2 V5 for NT, my initial enthusiasm only waned for one reason: the vastness of its functionality, which can be tiring to evaluate. DB2 V5 is an extensive product. In this month's column I could only review a certain number of features important to a typical DBA. However, I didn't mention its rich security features, such as directory services, DCE security services, security roles, single login, and the use of Kerberos technology. Nor did I mention its backup and recovery facilities, such as table space-based and partition-based recovery, online backup and online restore, point-in-time recovery per table space, and faster restart.

DB2 V5 also includes many new application development features such as recursive queries for bill-of-material type structures; left, right, and full outer joins; UDFs; the Visual Explain utility to investigate the internal query access plans; and a call-level interface. This obviously also includes DB2 V5's extensive support for the Web, offering full support for Java. Applications developed in Java and Java applets can access a DB2 database directly through JDBC calls. You can also develop stored procedures, triggers, and functions in Java. Table-level UDFs, developed in Java, can be used to provide access to nonrelational data in the from clauses of SQL select statements.

From a DBA's point of view, DB2 V5 is extremely easy to install and administer. Database objects can be created and managed either through the Control Center, with its convenient GUI interface, or by executing scripts through the Command Center or the Script Center. In an environment with CASE tools or where there are multiple databases, you may prefer to use scripts to ensure that the databases are consistent with one another. Scripts are also useful for development, test, and deployment life cycles. DB2 is highly tunable, especially for multiple data locations and multiprocessor architectures. The only additions I would like to see are permanent bitmap indexes and a choice of table and index storage structures. At this stage you can only store data in tables of 4K pages and indexes with B-trieve structures.

I found DB2's most powerful feature by far to be its extensible architecture. Having added the Relational Extenders and the component UDDTs and UDFs, IBM is letting its users extend the functionality of their DB2 servers to cater to any application domain. The extended-relational DBMS competition just got tougher.

Figure 1.


DB2's Control Center facility.

Figure 2.


The relationship among systems, instances, and databases in DB2.
Martin Rennhackkamp is the owner and principal consultant of The Data Base Approach, a corporation specializing in relational and distributed databases, based in Cape Town, South Africa. You can reach Martin via the Internet at
mr@dba.co.za.
What did you think of this article? Send a letter to the editor.


Subscribe to DBMS and Internet Systems -- It's free for qualified readers in the United States
December 1997 Table of Contents | Other Contents | Article Index | Search | Site Index | Home

DBMS and Internet Systems (http://www.dbmsmag.com)
Copyright © 1997 Miller Freeman, Inc. ALL RIGHTS RESERVED
Redistribution without permission is prohibited.
Please send questions or comments to dbms@mfi.com
Updated October 31, 1997