DBMS
DBMS, August 1996
Server Side By Martin Rennhackkamp

Integrity Control

A Comparative Look At How Five Leading DBMS Vendors Support Data Integrity Controls

In this month's column I complete my miniseries on the data integrity controls provided by popular DBMSs. I investigate key, domain, and column integrity constraints. Remember that integrity constraints ensure that the data in a database is correct and accurate (I covered referential-integrity constraints in last month's column). First I explain the integrity constraints in generic terms. Then I explain the ANSI SQL syntax that supports these concepts. Finally, I review the integrity controls provided by some of the popular DBMSs, also noting the options for "do-it-yourself" integrity control.

Integrity Controls

In this section I investigate key, domain, and column integrity constraints in generic terms. First, a discussion on key integrity. Every table (relation) must have a primary key. The primary key of a relation is a set of columns (attributes) that uniquely identify the rows (tuples) of the relation. If the primary key consists of more than one column, it is called a composite primary key. The following are characteristics of the primary key:

  • Uniqueness: The primary key values must identify the rows in the table uniquely.
  • Minimality: If the primary key is a composite, no column can be eliminated from the primary key without losing the uniqueness property.

    Key Integrity Constraints. Apart from uniqueness, the key integrity constraint, which must be enforced by the DBMS, must ensure that no component of the primary key is allowed to take the undefined (null) value. For example, consider a table that represents courses taken by students: ENROL(StudentNo, CourseNo, CourseYear, Semester, FullTime, TestMark, ExamMark).

    Each row in the table must have values for the StudentNo, CourseNo, CourseYear, and Semester columns. These all make up the composite key, representing the fact that a student may take the same course in the same semester in different years, or in different semesters in the same year (for example, if he or she failed the course the first time).

    Domain Integrity Constraints. Domains are used in the relational model to define the characteristics of the columns of a table. The domain specifies its own name, data type, and logical size. The logical size represents the size as perceived by the user, not how it is implemented internally. For example, for an integer, the logical size represents the number of digits used to display the integer, not the number of bytes used to store it. The domain integrity constraints are used to specify the valid values that a column defined over the domain can take. You can define the valid values by listing them as a set of values (such as an enumerated data type in a strongly typed programming language), a range of values, or an expression that accepts the valid values. Strictly speaking, only values from the same domain should ever be compared or be integrated through a union operator.

    Note that a formal treatment of the domain concept would require the following for all of the domains:

  • the ability to specify the complete set of domains that apply to a given database (the result of any operation on any column defined over any domain must then yield a result in one of the specified domains)
  • the ability to specify - for every domain, pair of domains, triplet of domains, and so on - which operators can be applied to the values taken from the domains, as well as what the domain of the result must be
  • the ability to specify an ordering of the values in the domain

    In the previous example, FullTime would be defined over a domain called Flag, of data type and size char(1), which may take the values "y" and "n." TestMark and ExamMark would be defined over a domain called Mark, of data type decimal(5,2), which may take the real values between 0 and 100 inclusively.

    Column Integrity Constraints. A column integrity constraint is used to specify additional restrictions on a column defined over a domain. You use a column integrity constraint to further refine or restrict the definition of a generic domain for the specific context of the table in which the domain is used. This can be used to eliminate the proliferation of a number of similar domains. Another well-known column constraint is the not null, which is used to specify that a particular column may not take the undefined null value. My previous example had CourseYear defined over a domain called Year, of data type integer (or date), which, for example, may take the integer values between 1900 and 2500. The Year domain is used for columns to represent dates, for example BirthYear (such as for people), PublicationYear (such as for books and journals), and, of course, CourseYear, as in our example. However, we want to restrict the Year domain even further for our CourseYear column in the ENROL table. These courses are only being offered from 1957 up to the present year. The restriction that CourseYear may only take values between 1957 and the year of the system date is a column constraint.

    The material in this section is based on C.J. Date's text, An Introduction to Database Systems, Volume 1, Addison-Wesley, 5th edition, 1990, and Dr. E.F. Codd's text, The Relational Model for Database Management, Version 2, Addison-Wesley, 1990.

    ANSI SQL Syntax

    You can implement key, domain, and column integrity constraints using the ANSI SQL-92 standard syntax. You implement a key integrity constraint using the primary key clause of the create table statement. You create a domain using the create domain statement and define the domain integrity constraints using the check clause of the create domain statement. Once you have defined the domains, you can use them in the create table statement for the columns' data types, instead of the standard system data types. You define column constraints using the check clause and the optional constraint clause of the column definition in the create table statement.

    Listing 1 shows how the ENROL table would be specified using the ANSI SQL-92 standard syntax. The "date_trunc" function in the listing truncates the date to the year. Date extraction and data type conversion functions are dependent on implementation - they are not specified in the ANSI SQL-92 standard. Note that I would like (read: "require") a DBA-specified message for every constraint and for every domain definition, similar to the messages I described in last month's column on referential-integrity constraints (See DBMS, July 1996, page 74). Alas, such a message is not possible using the ANSI SQL-92 standard syntax.

    The material in this section is based on J. Melton and A.R. Simon's text, Understanding the New SQL: A Complete Guide, Morgan Kaufmann, 1993, and S. Cannan and G. Otten's text, SQL - The Standard Handbook, McGraw-Hill, 1993.

    Implementations

    Most relational DBMSs support integrity constraints to some extent, and each implementation has its own features and shortcomings. The following sections describe the implementations of the key, domain, and column integrity constraints in Computer Associates CA-OpenIngres 1.2, Borland InterBase Server 4.0, Microsoft SQL Server 6.0, Oracle Server 7.2, and Centura SQLBase 6.0 (also licensed as CA-OpenIngres Desktop 1.1). For each DBMS, I describe the functionality provided and the options available for "do-it-yourself" integrity control. Note that other DBMSs such as Informix-OnLine, Oracle Enterprise Server 7.2, Progress Database Server, and Sybase SQL Server also support integrity constraints.

    CA-OpenIngres

    CA-OpenIngres does not support domains, so you must implement each domain constraint as column constraints for every column in which the domain is used. Well, that's not really the full story. If you are brave, you can buy the CA-OpenIngres Object Management Extension and code your own domains as user-defined data types. With this approach you must code 16 procedures in C for each user-defined data type and link these into the CA-OpenIngres server.

    CA-OpenIngres supports key constraints such as those in the ANSI SQL-92 syntax shown in the previous section. To implement the key integrity constraint, CA-OpenIngres created a unique B-tree index, called "$enrolkey," behind the scenes, owned by the owner of the table. The index has the column structure of a normal user-defined index, namely the key columns and ID of the row in the table in which the actual data is stored. CA-OpenIngres also lets you create your own unique B-tree index on the same columns. You cannot change the system-generated message that is returned when the key integrity constraint is violated.

    CA-OpenIngres supports column constraints such as those in the ANSI SQL-92 syntax shown previously. To implement the column constraints and the domain constraints specified as column constraints, CA-OpenIngres created a rule (the Ingres implementation of triggers) for each constrained column. The rules had names such as "$enrol_c000000f900000000," and they were stored in the iirules catalog in which user-defined rules are normally kept. The column called "text_segment" is conveniently blanked out, however, so you can't investigate the rule's text. Rules always call procedures to perform the necessary actions, as I described in my column on triggers (see DBMS, May 1996), but there were no procedures created in the iiprocedures catalog. You cannot change the messages returned by these constraints.

    Alternatively, you can code your own integrity constraints using CA-OpenIngres rules and procedures. Remember that the rules fire after the operations and the rules are row-based; in other words, the rules fire for each row affected by the triggering operation. Interestingly enough, internally CA-OpenIngres 1.2 uses operation-based rules and stored procedures for integrity checking. With these internally used rules and procedures, the rule fires only once for all of the rows affected by an operation, and the rule passes a set of values to the procedure as parameters. However, this functionality is not (yet) generally available for you to create your own integrity constraint rules and procedures.

    InterBase

    InterBase has one of the best implementations of domains I have come across. InterBase has been supporting domains since as long ago as 1989, when I first worked with it (see my July column for an in-depth look at InterBase 4.0). InterBase 4.0 supports a create domain statement, with proper domain-based integrity constraints. However, you cannot name the domain constraints in the domain's check clause, and you cannot change the messages returned by these constraints. To implement the domain constraint that was defined for the Mark domain in Listing 2, InterBase created a constraint named INTEG_35. There were no triggers referring to this constraint, only field descriptions that point to it. The ENROL table in my example would be specified as shown in Listing 2 using the InterBase syntax.

    InterBase supports the primary-key constraint using the unnamed primary key constraint clause of the ANSI SQL-92 syntax shown in Listing 1. You cannot change the message returned by the primary-key constraint. To implement the key integrity constraint, InterBase created an index called "rdb$primary2." It also allows you to create your own unique index over the same columns as those in the primary key. Interestingly enough, the query plan for a select query, which specified a single key value in the where clause, indicated that it used both indexes!

    InterBase supports column constraints using a syntax similar to the ANSI SQL-92 syntax in Listing 1. The column constraints must be specified either as unnamed check constraints in the create table statement or as named constraints using the alter table ! add constraint ! statement. You cannot change the messages returned by these constraints, either. To implement a column constraint on the TestMark column in Listing 2, InterBase created two triggers called check_5 and check_6, which fired respectively before insert and before update operations on the ENROL table.

    You can also code your own integrity constraints using triggers. InterBase's preoperation triggers are efficient for checking whether the constraints are about to be violated, before they are actually violated (I also explained this in my May column).

    Microsoft SQL Server

    Microsoft SQL Server 6.0 does not support domains per se, so you may be tempted to implement each domain constraint as column constraints for every column where the domain is used. However, you can create your own user-defined data types (domains) and bind rules (domain constraints) to them. The constraints are thus always enforced where the domain is used. This gives you the full functionality of domains, albeit using a syntax different from the proposed ANSI SQL-92 syntax. If the data type is changed, you have the option of making that change in all occurrences of the data type or only when the data type is used in the future. Unfortunately, if incorrectly used, this flexibility can cause your domain usages to become inconsistent.

    The ENROL table of my example would be specified as shown in Listing 3, using Microsoft SQL Server's sp_addtype and sp_bindrule system stored procedures. Microsoft SQL Server supports key constraints, as does the ANSI SQL-92 syntax shown in Listing 1. In addition, the clustered keyword can be used to indicate that the physical table must be sorted around the key values, instead of creating a separate secondary index. For each primary-key constraint and each unique constraint in the database definition, Microsoft SQL Server will create an index in the database. If no name is given to the constraint, a constraint name is automatically generated. This name is also used for the index name, such as "pk_enrol_411B38CB" in my example. You cannot change the messages returned by these constraints. However, if you are brave, you can change the error message template that is stored in the sysmessages table in the master database, or you can use the system stored procedure sp_altermessage to change the template message text.

    Microsoft SQL Server supports column constraints using the ANSI SQL-92 syntax shown in Listing 1. You cannot change the system-generated messages that are returned when these constraints are violated. Alternatively, you can code your own integrity constraints using Microsoft SQL Server's triggers. The integrity checking code must be integrated with all of the other trigger code in the three triggers allowed per table. The triggers fire after the triggering operations have been performed; thus the trigger must check the effects of the operation after it has occurred and then remove it if it caused an integrity violation (as I described in my May column).

    Oracle

    Oracle does not support domains, so you must implement each domain constraint as column constraints for every column in which the domain is used. Oracle supports the key constraints as in the ANSI SQL-92 syntax shown in the previous section. To implement the key integrity constraint, Oracle created a unique index, called "ENROLLKEY," behind the scenes, owned by "SYS." The index has the column structure of a normal user-defined index, consisting of the key columns. Oracle does not let you create your own unique index on the same columns. You cannot change the system-generated message that is returned when the key integrity constraint is violated.

    Oracle supports column constraints as in the ANSI SQL-92 syntax shown previously. To implement the column constraints and the domain constraints specified as column constraints, Oracle created only constraints; it did not create any triggers, as some other products do. The constraints were named correctly, according to the names of the constraints in the definition. You cannot change the messages returned by these constraints. Alternatively, you can code your own integrity constraints using Oracle triggers (again, as I described in my May column).

    SQLBase

    SQLBase does not support domains, domain constraints, or column constraints. This is quite strange if you consider the fact that SQLBase has one of the better implementations of declarative referential-integrity constraints (see DBMS, June 1996, page 88). SQLBase supports the key integrity constraint through the primary key clause, based on the standard ANSI SQL-92 syntax. However, the definition seems to be logical only because SQLBase requires you to define a unique index over the columns making up the primary key. It renders the table "incomplete," and you cannot use it in any way until you have defined this particular unique index. In addition, you cannot name the constraint.

    In SQLBase you have no choice but to define your domain and column integrity constraints using triggers. Fortunately, SQLBase has very powerful triggers. SQLBase's preoperation triggers are useful and efficient for checking whether an operation will violate an integrity constraint before the operation is actually performed. The problem with SQLBase triggers and procedures is that it is difficult to return meaningful error messages to the application. However, you can add your own messages to the SQLBase configuration file, as described in detail in my June column.

    A Far Cry...

    Although most of the major RDBMSs provide some support for the domain, key, and column integrity constraints (mostly based on some limited form of the proposed ANSI SQL-92 syntax), their implementations are still far from ideal. First, most of these RDBMSs do not yet support the full ANSI SQL-92 syntax. Second, few of the implementations can be tailored to return meaningful and informative messages to the user or the application (in this respect, the ANSI SQL-92 standard also falls short). Finally, the integrity constraints are components of the logical data model. In theory, they should not affect the physical implementation. However, in most of the implementations, the way that the constraints are actually implemented is so physically explicit that it can affect the way that DBAs manage their databases. For example, DBAs cannot blindly drop and re-create all of the indexes listed in the system catalogs. The transition from not supporting declarative constraints to supporting them has not been very well implemented in the popular RDBMSs. Most of them exploit existing physical database objects rather crudely to support these logical concepts.

    The biggest problem, however, is that none of the RDBMSs, with the notable exception of InterBase, actually support domains at all - let alone support them properly. If I may quote from Dr. Codd's text: "When I introduced the domain concept into database management 20 years ago as part of the relational data model, it was regarded by almost all of my IBM colleagues as a purely academic exercise. It is now time for the implementors of DBMS products to recognize that the domain concept must be implemented as part of the DBMS if these products are to provide adequate support for database integrity. Without adequate support for database integrity, the DBMS vendors are asking the DBMS consumers to put their businesses at unnecessary risk." I couldn't have said it better myself!


    Martin Rennhackkamp is the owner and principal consultant of The Data Base Approach, a corporation specializing in relational and distributed databases, based in Cape Town, South Africa. You can reach Martin via the Internet at mr@dba.co.za.
    * Borland International Inc., 100 Borland Way, Scotts Valley, CA 95066; 800-245-7367, 408-431-1000, or fax 408-431-4122; http://www.borland.com.
    * Centura Software Corp., 1060 Marsh Rd., Menlo Park, CA 94025; 800-444-8782, 415-617-8500, or fax 415-321-5471; http://www.centurasoft.com.
    * Computer Associates International Inc., One Computer Associates Plaza, Islandia, NY 11788; 800-225-5224, 516-342-5224, or fax 516-342-5734; http://www.cai.com.
    * Microsoft Corp., One Microsoft Way, Redmond, WA 98052; 800-426-9400, 206-882-8080, or fax 206-936-7329; http://www.microsoft.com.
    * Oracle Corp., 500 Oracle Pkwy., Redwood Shores, CA 94065; 800-672-2537, 415-506-7000, or fax 415-506-7200; http://www.oracle.com.

    Table of Contents - August 1996 | Home Page
    Copyright © 1996 Miller Freeman, Inc. ALL RIGHTS RESERVED
    Redistribution without permission is prohibited.
    Please send questions or comments to mfrank@mfi.com
    Updated Monday, August 12, 1996