
In this month's column, I investigate the merits of using the declarative referential-integrity controls provided by popular DBMSs, as opposed to implementing these integrity controls "manually" using triggers. (For a more detailed description of triggers, see my May 1996 DBMS column, page 89.) I will first discuss referential integrity in generic terms, including what I see as the requirements of a declarative referential-integrity control mechanism. Then I'll review the declarative referential-integrity controls provided by some of the popular DBMSs, noting the options you have for do-it-yourself referential-integrity control.
Referential-integrity constraints are based on the concept of a foreign key. A foreign key is a column or set of columns in one table that refer to the primary key of another table. The table containing the foreign key is called the referencing, foreign, or child table, while the table containing the primary key is called the referenced, primary, look-up, or parent table. The reference from the foreign key to the primary key is the mechanism whereby the relationships between the occurrences of rows in the two tables are enforced.
As an example, consider the following tables used to store data about members borrowing books from a library:
BOOKS (BookNo, Title, Author).
MEMBERS (MemberID, MemberName, MemberType).
LOANS (BookNo, MemberID, LoanDate, ReturnDate).
MEMBERTYPES(MemberType, Privileges).
BookNo is a foreign key in the loans table because it is the primary key of the books table. loans is the foreign table and books the primary table. MemberType is a foreign key in the members table because it is the primary key of the membertypes table.
The relational referential-integrity constraint specifies that either of the following must be enforced for a foreign key (C.J. Date, An Introduction to Database Systems -- Volume I, 5th edition, Addison-Wesley, 1990):
* The value of the foreign key must be a key value in the corresponding primary table.
* The value of the foreign key must be undefined (that is, null). This only applies if the foreign key does not participate in the primary key of the foreign table and if the foreign-key columns in the foreign table are allowed to have null values.
In the previous example, each BookNo in the loans table must be a BookNo that is recorded in the books table. In this case the BookNo in the loans table may not take an undefined null value as it forms part of the key of the table. If MemberType is an optional column in the members table, it must either take a value that exists in the membertypes table, or it must have a null value.
When I define a referential-integrity constraint, I should be able to specify what action the DBMS must take when the constraint is violated in each of the following scenarios:
* when a row containing an invalid foreign-key value is inserted in the foreign table
* when a foreign key in the foreign table is updated to an invalid value
* when a row with a referenced primary key is deleted from the primary table
* when a referenced primary key is updated in the primary table
I should be able to specify the following actions when the constraint is violated:
* Restrict the operation, or roll back the operation that violated the referential integrity-constraint.
* Allow the operation, but warn the user of the violation. Relational purists may cringe at this requirement, but it is useful for archive tables and in data warehouse databases.
* In the case of operations performed on the primary table, set the foreign columns to null.
* In the case of operations performed on the primary table, set the foreign columns to a predefined default value.
* In the case of operations performed on the primary table, cascade the operations on the primary table down to the foreign tables. This includes deleting all the referencing rows (a cascading delete) and updating all the referencing foreign-key values (a cascading or collateral update).
ANSI has specified a standard SQL syntax for referential-integrity constraints as part of the SQL-92 specification. (See J. Melton & A.R. Simon, Understanding The New SQL: A Complete Guide, Morgan Kaufmann, 1993.)
constraint CONSTRAINTNAME
foreign key (ColumnList) references PRIMARYTABLE (ColumnList) [match { full | partial }]
on delete { no action | cascade | set default | set null | warning }
on update { no action | cascade | set default | set null | warning }
[initially { deferred | immediate }]
[[not] deferrable]
The match clause specifies how partial and null-valued foreign keys should be handled. The on clauses specify the referential actions that should be performed when the referenced rows in the primary table are deleted or updated. If the on clauses are not specified, the default no action is assumed. The initially and deferrable clauses are used to specify the time when the referential-integrity constraint should be enforced. Most of these clauses are part of the intermediate-level ANSI-92 standard. The entry-level standard only specifies the foreign key ... references ... clause. The intermediate-level ANSI specification addresses most of the requirements I described previously, except the option to allow a constraint violation and only warn the user about it.
In addition to these requirements, I want to specify a different error or warning message for each of the four scenarios when a constraint is violated. For example, I want to have different error messages when the constraint is violated because of the insertion of a row in the foreign table and when it is violated because of the deletion of a row from the primary table. Sometimes I want to specify conditional referential-integrity checking for some of the four scenarios. An example is when users want to use predefined "undefined" values rather than null values. When the predefined "undefined" value is inserted in a foreign-key column, the referential-integrity constraints should not be checked.
I would like to propose and use the syntax in Listing 1 (page 90) for a standard declarative referential integrity constraint, which encompasses the required functionality described previously.
The two table names in the constraint definition let you specify the constraints independently from the corresponding tables. The alias clauses let you use abbreviated names in the where clauses. The where clauses let you define conditional constraint checking. The on clauses let you specify the different actions you want activated when the constraint is violated. The message clauses let you specify the messages you want returned when the corresponding action is taken.
Let's look at another example. The referential-integrity constraint from the BookNo foreign-key column in the loans table to the books table would then be defined in the following way, using the ANSI SQL-92 standard, as part of the definition of the loans table:
constraint LOANStoBOOKS
BookNo references BOOKS ( BookNo )
on delete no action
on update cascade
I would like to define it as shown in Listing 2 (page 90) using my proposed syntax.
Although it is not part of the ANSI SQL-92 syntax, some DBMSs let you tailor the error messages returned to users. With these DBMSs you could implement the messages illustrated here. I will point out these DBMSs in the examples in the remainder of this column. Few DBMSs, however, provide you with the functionality to implement the conditional referential checking illustrated for the "referenced delete" case.
Most relational DBMSs support declarative integrity constraints, each with its own features and shortcomings. Let's look at the implementations of declarative referential-integrity constraints in CA-OpenIngres 1.2, Borland InterBase Server 4.0, Microsoft SQL Server 6.0, Oracle Enterprise Server 7.2, SQLBase 6.0 (also licensed as CA-OpenIngres/Desktop 1.1), and Sybase SQL Server 10. For each DBMS I describe the functionality provided, the user-friendliness of the implementation, and the available options for "do-it-yourself" referential-integrity control. Note that other DBMSs such as Informix-OnLine and Progress Database Server also support declarative referential-integrity constraints.
In CA-OpenIngres 1.2, you can define basic declarative referential-integrity constraints as part of the create table and alter table statements, using entry-level ANSI SQL-92 syntax, as shown in the previous example. CA-OpenIngres does not yet support the on clause; therefore, the only action available when a constraint is violated is the default to restrict the operation.
CA-OpenIngres returns a default, system-generated message to the application when a constraint is violated. For our loanstobooks example, it returned the message "E_US1906 Cannot INSERT into table '"loans"' because the values do not match those in table '"books"' (violation of REFERENTIAL constraint '"$loans_r000000ed00000000"')." The constraints may have names. However, if you do not give names to your constraints, CA-OpenIngres generates default (read: illegible) names for them, as shown in the message. If you want to manage the constraints individually (for example, drop or replace them), you must refer to them by name. In such a situation, you can use the "help table constraint" utility to obtain the constraints' names.
Alternatively, you can code your own referential-integrity constraints using rules and procedures (the CA-OpenIngres implementation of triggers). The important points to remember are: the rules fire after the operations, and they are row-based. In other words, the rules fire for each row affected by the triggering operation. Interestingly enough, CA-OpenIngres 1.2 uses operation-based rules and stored procedures internally for declarative integrity checking, where the rules and procedures can pass sets of values as parameters. This means that an operation fires the rule once, which calls the procedure once to perform the integrity check once (using a set-based query), even if a thousand rows were updated.
In Borland InterBase 4.0, you can define basic declarative referential-integrity constraints as part of the create table and alter table statements by using entry-level ANSI SQL-92 syntax. The only action available when a constraint is violated is the default to restrict the operation. InterBase returned the following system-generated message for our loanstobooks example: "Statement Failed, SQLCODE = -530. Violation of FOREIGN KEY constraint "INTEG_8" on table "LOANS"." Similar to CA-OpenIngres, InterBase generates unique names for unnamed constraints, as shown in the message.
You can also code your own referential-integrity constraints using InterBase's powerful triggers. InterBase's pre-operation triggers are useful and efficient for checking if an operation is going to violate a referential-integrity constraint before the operation is actually performed. As I described in last month's column, you can define InterBase triggers on views, or you can perform your integrity check triggers against a view that may, for example, be the union of two or more tables. This is very useful for specialization and generalization hierarchies. It is a pity that InterBase doesn't allow declarative referential-integrity checks from or to views. That would have matched the power and orthagonality provided by its triggers on views.
In Microsoft SQL Server 6.0, you can define basic declarative referential-integrity constraints as part of the create table and alter table statements, using entry-level ANSI SQL-92 syntax. The only action available when a constraint is violated is the default to restrict the operation.
For our loanstobooks example, Microsoft SQL Server returned the system-generated message: "Msg 547, Level 16, State 2 INSERT statement conflicted with COLUMN FOREIGN KEY constraint 'FK_LOANS_BookNo_18E19391' The conflict occurred in database 'library,' table 'BOOKS,' column 'BookNo'." Microsoft SQL Server also assigns system-generated names to unnamed constraints, as shown in the message. You can use the Microsoft SQL Enterprise Manager to obtain the constraints' names.
You can also code your own referential-integrity constraints using Microsoft SQL Server's triggers. The referential-integrity checking code must be integrated with all of the other trigger code in the three triggers allowed per table. Remember also that the triggers are fired after the triggering operations have been performed; thus, the trigger must check the effects of the operation after it has occurred and then remove it if it caused an integrity violation.
In Oracle 7.2, you can define standard declarative referential-integrity constraints as part of the create table and alter table statements, using ANSI SQL-92-like syntax. Oracle 7.2 supports the restrict and cascade options for delete operations on the primary table, but only the restrict option for updates on the primary table. Note that the ANSI SQL-92 standard syntax is actually no action, and not restrict, as implemented in Oracle. Oracle supports the match none, match full, and match partial options for composite and nullable foreign keys, which few of the other DBMSs do at this time.
Oracle returns a default, system-generated message to the application when an operation is restricted, and it generates default names for unnamed constraints. You must query the dictionary tables for the constraints' names if you want to manage them individually. You can also code your own referential-integrity constraints using Oracle's powerful triggers. Oracle's pre-operation and pre-row triggers are useful and efficient for checking whether an operation will violate a referential-integrity constraint before the operation is actually performed.
In SQLBase 6.0, you can define standard declarative referential-integrity constraints as part of the create table and alter table statements, using ANSI SQL-92-like syntax. SQLBase supports the restrict, set null, and cascade options for delete operations on the primary table, but it only supports the default restrict action for update operations. Note again that restrict is not standard ANSI SQL-92 syntax.
SQLBase returned the following system-generated message for our loanstobooks example: "Error: Unmatched foreign key values." This doesn't really help much for a large table with 10 or more referential-integrity constraints. However, in SQLBase you can redefine the messages coupled to a referential-integrity constraint by following a clearly outlined procedure described in the manual. What is more, you can define four different messages, one for each of the four possible actions that may violate the constraint. First add your own error messages to the SQLBase configuration file, then alter the constraints using a variation of the alter table statement.
For our example, you would add the following four messages to the SQLBase configuration file:
13001 'Cannot insert the loan as the book does not exist'
13002 'Cannot change the loan as the book does not exist'
13003 'Cannot delete the book as it is out on loan'
13004 'Cannot update the book as it is out on loan'
Then you would execute the following four SQL statements through the SQLTalk interactive SQL interface:
ALTER TABLE LOANS ADD USERERROR 13001 FOR INSERT_DEPENDENT OF FOREIGN KEY BookNo; ALTER TABLE LOANS ADD USERERROR 13002 FOR UPDATE_DEPENDENT OF FOREIGN KEY BookNo; ALTER TABLE LOANS ADD USERERROR 13003 FOR DELETE_PARENT OF FOREIGN KEY BookNo; ALTER TABLE LOANS ADD USERERROR 13004 FOR UPDATE_PARENT OF FOREIGN KEY BookNo;
This gives you full, user-defined error messages for the declarative referential-integrity constraint. For unnamed constraints, SQLBase generates unique names based on the foreign-key column name, such as BookNo for our loanstobooks example. The constraints are always named in the context of the table on which they are defined; they are never referred to by database-wide unique names.
You can also code your own referential-integrity constraints using SQLBase's powerful triggers. SQLBase's pre-operation triggers are good tools for checking if an operation will violate a referential-integrity constraint before the operation is actually performed. The problem with SQLBase triggers and procedures is that it is difficult to return meaningful error messages to the application. However, you can return the default referential-integrity violation message from the SQLBase configuration file, you can tailor it, or you can add your own messages to the file as described previously.
In Sybase SQL Server 10, you can define basic declarative referential-integrity constraints as part of the create table and alter table statements, using entry-level ANSI SQL-92 syntax. The only action available when a constraint is violated is the default to restrict the operation.
Sybase SQL Server returned the following system-generated message for our loanstobooks example: "Foreign key constraint violation occurred, dbname = library, table name = LOANS, constraint name = LOANS_BookN_143." You can redefine the message returned by a referential-integrity constraint by following a clearly outlined procedure described in the SQL Server manual. First, add your own messages to the SysUserMessages table using the "sp_addmessage" procedure, then bind the message to the constraint using the "sp_bindmsg" procedure. However, you can only assign a single message to the constraint. This means that you must use a generic message that is applicable to all four of the actions that may violate the constraint. Similar to the other DBMSs, Sybase SQL Server assigns system-generated names to the constraints, according to a well-documented algorithm. If you want to manage the constraints' names individually, you can obtain the names by querying the system catalogs using the "sp_helpconstraint" procedure.
You can also code your own referential-integrity constraints using Sybase SQL Server's triggers. The referential-integrity checking code must be integrated with all of the other trigger code in the three triggers allowed per table. Remember also that the triggers are fired after the triggering operations have been performed; thus, the trigger must check the effects of the operation after it has occurred and then remove the operation if it caused an integrity violation.
Of the set of DBMSs I investigated, all support basic referential-integrity conrols, but:
* Only Oracle and SQLBase support more than the no action option for delete operations. Oracle supports cascade and restrict, as they call it, while SQLBase supports cascade, set null, and restrict. For update operations, all of the DBMSs support the default no action option only.
* Only SQLBase and Sybase SQL Server provide the facilities to change the default system-generated messages returned by the referential-integrity constraints. Only SQLBase provides the facility to define four different messages for the four different actions that can violate a referential-integrity constraint.
* Only Oracle supports the match clauses specified in the ANSI SQL-92 syntax for nullable and partial foreign keys.
Although most popular DBMSs, generally speaking, support a richer functionality than specified in the ANSI SQL-92 standard, none supports the level of integrity control proposed in the first part of this column. Most DBMSs actually lag behind the standard regarding declarative referential-integrity constraints. This is quite curious to me, because many users are implementing referential-integrity constraints extensively in their database server using triggers. It seems strange that these users haven't applied more pressure on the DBMS vendors and the standards groups to improve declarative referential-integrity constraints.
constraint CONSTRAINTNAME
FOREIGNTABLE [alias] (ColumnList) references PRIMARYTABLE [alias] (ColumnList) [match { full | partial }]
on referencing insert { no action | warning }
[ where clause ]
[ message text ]
on referencing update { no action | warning }
[ where clause ]
[ message text ]
on referenced delete { no action | cascade | set default | set null | warning }
[ where clause ]
[ message text ]
on referenced update { no action | cascade | set default | set null | warning }
[ where clause ]
[ message text ]
[initially { deferred | immediate }]
[[not] deferrable]
constraint LOANStoBOOKS LOANS L ( BookNo ) references BOOKS B ( BookNo ) on referencing insert no action message 'Cannot insert the loan as the book does not exist' on referencing update no action message 'Cannot change the loan as the book does not exist' on referenced delete no action where L.ReturnDate is null message 'Cannot delete the book as it is out on loan' on referenced delete cascade where L.ReturnDate is not null message '<<RowCount>> loans were deleted' on referenced update cascade message '<<RowCount>> loans were updated'