DBMS

Building a DBA Repository System, Part 2

By Martin Rennhackkamp
DBMS, February 1996

How a Repository System can help your organization manage its physical database.


In this second part of a two-part series, I explain how you can use repository technology for physical database management. Last month, I illustrated how repository technology can be applied to physical database management, and I highlighted the shortcomings of existing repository-based tools with respect to physical database management. This month, I describe a repository-based system that we have developed at The Data Base Approach for physical database management. I give an overview of the DBA Repository System, explain how it is used throughout the database system life cycle, evaluate the merits of this system, and conclude by indicating the future extensions we plan for the system.

The DBA Repository System is a repository-based physical database management tool that is used primarily by database administrators (DBAs). It is used to document, report, plan, implement, and manage various databases on various platforms through a consistent interface (see Figure 1). It provides online access to all of the managed databases for all database administration tasks. It is used for database management, database object management, configuration management, and detailed performance tuning.

The DBA Repository System also documents all the installation details, such as nodes, operating system parameters, DBMS servers, and their parameters and disk storage locations. It is used to document and manage all of the database objects, including domains, tables with different implementation details, extensive integrity constraints, indexes, triggers, rules, procedures, events, and views of arbitrary complexity. The database objects are grouped into schemas and logical databases, which you can deploy on various physical databases. The Repository System is also used to document and manage all the rights of the security-related entities (that is, the users, groups, and roles, with respect to all the database objects and installations). It generates all of the SQL scripts necessary to manage all the database objects and security rights, and it runs the scripts against the target physical databases.

The DBA Repository System has several unique features. First, it is highly configurable. Each user can set up a profile, customizing the system's behavior for his/her personal preferences. This facilitates multiple language support for the SQL scripts that run against the various target physical databases. These personal profiles, together with versioning and object definition "freezing," are very useful in managing databases shared by multiple application projects and deployed in various phases.

Database Life Cycle

Figure 2 illustrates the part of the database life cycle that affects physical database management, as we use it at The Data Base Approach. It starts with requirements analysis and logical data modeling. Physical database management per se starts in the next phase with physical data modeling, then progresses through transaction analysis, implementation, deployment, and finally performance tuning. In multi-phase projects, this sequence is repeated. Projects that employ extensive prototyping have similar cycles, but with smaller loop-back cycles contained within the larger cycle. Note that I focus on physical database management in this article. In terms of John Zachman's Information System Architecture (ISA) framework, I concentrate only on the data dimension.

During requirements analysis, the users' requirements are documented in objective hierarchies and events, operations, data, and constraints glossaries. During this pre-database management phase, you can use some of the more extensive and extendible CASE tools, specifically those with good documentation and cross-referencing capabilities.

Logical Data Modeling

During the logical data modeling phase, the users' data and constraints requirements are represented as a logical data model. In most cases, a straightforward, well-documented, and normalized entity-relationship model is enough to represent the users' requirements. In some situations, however, we must use extensions to the basic entity-relationship model, specifically where complex structures and interrelationships must be modeled.

The data and operations requirements are also transformed into entity-life histories. An entity-life history is a logical, implementation-independent model that represents the interactions between entities and operations. Entity-life histories are very useful for ensuring that sufficient attention has been paid to the life cycle of each entity, including data archiving, data warehousing, and data burial (the removal of the data from the system).

For this pre-physical database management phase, most CASE tools provide extensive entity-relationship modeling capabilities, some with dictionaries to document the models completely and consistently, with cross-references between the various objects and diagrams. Only a few of the more advanced CASE tools provide for specialization and generalization hierarchies. Some of the more advanced CASE tools provide proper entity-life history models, and these models are cross-checked to the entity-relationship diagrams.

Physical Data Modeling

During the physical data modeling phase, the logical model is transformed to a physical model. The physical model is then checked for accuracy. After that, the implementation-specific details are added. During this phase, we also add the objects beyond the definition of the relational data model (as well as the security specifications) to the physical data model.

The physical model of a database is a representation of the data model that can be implemented using a DBMS such as CA-OpenIngres, Oracle, Gupta SQLBase, and so on. The data model is represented in terms of domains, relations (tables), integrity constraints (including domain and referential constraints), and indexes. All the DBMS-specific aspects are also specified. For example, for CA-OpenIngres, the uniqueness, table structure (b-tree, ISAM, hash, or heap), minpages, maxpages, fill factors, compression, journaling, and optimization details are added. The physical data model must adhere to the normalization rules. It must be at least in the third-normal form, or preferably in the Boyce-Codd or fourth-normal form.

Some of the more advanced CASE tools provide logical to physical transformation utilities, some of which can apply normalization algorithms. Few CASE tools, however, enforce the use of domains, and even fewer provide for the correct implementation of specialization and generalization hierarchies. We have not yet found a CASE tool that addresses all the DBMS-specific implementation details for CA-OpenIngres.

During this phase, the DBA Repository System's repository is populated with the physical data model, where the domain, table, referential integrity constraint, and index definitions are captured. We can perform this manually or by importing a SQL script generated by a CASE tool.

The DBA Repository System supports a more complete implementation of the relational data model than is currently explicitly supported by relational DBMSs such as CA-OpenIngres, Oracle, SQL Server, or any of the others. It contains most features of the relational data model as described by Dr. Codd. For example, it makes extensive use of domains, with integrity constraints specified per domain, which can be overridden per column where the domain is utilized. It also supports the SQL-92 standard -- it already makes provisions for the features described in the standard that are expected to be incorporated in future releases of most relational DBMSs. Figure 3 shows a table version screen of the system.

Unlike most DBMS utilities, such as CA-OpenIngres' facilities, SchemaDef, and VisualDBA tools, or SQLBase's Database Object Manager, all of which function directly on the database's system catalogs, the DBA Repository System functions proactively. All of the physical database definitions are first documented in detail in the repository database before they are implemented in an actual database. This enables the data administrator to perform detailed planning, impact analysis, and change control before any physical databases are changed. Because of the proactive implementation, you can obtain detailed documentation about the physical data model without first physically implementing it. Only during the physical implementation phase does the DBA run the actual scripts on the target development database or databases to create the tables, integrity checks, references, indexes, and so on.

During this phase, the objects extending beyond the basic relational data model are also defined. These include views, triggers, rules, stored procedures, and events. For example, views of arbitrary complexities can be defined over the defined tables and previously defined views, to integrate or relate the data in different tables or to restrict the data visible to an application.

Very few CASE tools provide for these objects. Using the Repository System, however, a DBA can define and implement all the extended features of a state-of-the-art relational DBMS.

Security Aspects

For a production environment, you must also consider all the security-related aspects of the data model. At this stage, we analyze and define various rights to the various entities on various objects in the data model. Examples include select rights to public, update rights to the maintenance groups, and application roles and additional rights to a separate user.

Very few CASE tools provide for these aspects. Using the Repository System, a DBA can document, implement, and manage all the access policies and rights of the security entities (the users, groups, and roles) that access the security objects (the tables, views, procedures, and events). For example, the DBA can specify which entities may access which tables, which entities may execute which stored procedures, which entities may raise which events, which entities may register for which events, and so on.

Transaction Modeling

At this stage, the developers should have finalized the design of their applications, modules, and transactions. The database administrator must analyze the transactions to check their impact on the physical data model and check the performance-related details of the physical data model. By checking how the transactions access the structures of the physical data model, the DBA can adjust the physical implementation details to ensure efficient transaction performance. During this phase, the DBA also considers other physical performance-related aspects, such as data volumes, disk locations for tables, storage structures, minpages, maxpages, fill factors, and so on. These are all documented in the Repository System, where they are used during the subsequent implementation phase.

Implementation and Grouping

During the implementation phase, we group the objects of the physical data model into various logical and physical units, and we deploy them on the various development, test, and production databases.

Before we implement the physical data model on any database, however, the various database objects are grouped into logical groupings, called schemas. For example, a schema can consist of a number of tables, with their referential checks, indexes, rules, and procedures. The entire schema can then be included as a unit in various databases. (See Figure 4.)

The DBA Repository System provides two database levels: logical and physical. There can be several types of logical databases, such as a branch database, a head-office database, and a warehouse database. Each logical database can be implemented as a number of physical databases (for example, a branch database can be implemented on two CA-OpenIngres installations in New York, one Oracle installation in San Francisco, and one SQL Server installation in Miami).

Deployment

When we implement the various databases, we use the Repository System to generate the scripts according to the definitions we have entered, and to run the scripts against the identified target databases. Using the Repository System, a DBA simply defines the tables, referential integrity constraints, indexes, views, and so on, declaratively. The Repository System generates the scripts for the DBA according to the definitions he/she has entered, and then runs the scripts against the identified target databases. The DBA can generate a single script to create an entire schema on a target database, or create an entire logical database on a target physical database.

It is important to note that the DBA Repository System never works directly on the system catalogs of a target database. Rather, it generates scripts for the physical database definitions, which are deployed on a target database using the standard SQL interface.

Permanence

When physical data model objects are used in production databases, they must be protected against change or corruption. To prevent users from changing a particular object version, the DBA can mark that object version as owned by him or her. When an object version is owned, only its owner can change its definition or ownership.

The DBA can also freeze the definitions of the database objects. When an object is frozen, nobody, not even its owner, can change it. For tables and indexes, however, the database administrator can freeze the definition (that is, the columns making up the table or the index) without freezing the tunable aspects (that is, the storage structure, the columns used in the storage structure, minpages, maxpages, and fill factors).

Performance

After deployment, you must monitor the test and production databases to ensure and maintain the efficient and acceptable performance of the implemented physical data model. The aim of performance analysis is to ensure fast throughput of user transactions, fast response times to the various users, and efficient utilization of the machine resources, including CPU, memory, and disk space.

During performance analysis, we monitor the transactions executing on the databases for locking and resource utilization. We also monitor table growth in order to determine the long-term effects of the transactions.

In addition to the DBMS-specific utilities, such as CA-OpenIngres' Interactive Performance Monitor (IPM) and SQLBase's SQL Console activity monitor, trace points, query execution plans (QEPs), and operating system utilities, such as sar, iostat, glance, and so on, there are many third-party utilities you can use to monitor performance.

During performance tuning, a number of issues unrelated to the data model can be optimized, including the operating system, the DBMS installation, and the actual transaction queries. Inefficient queries are often sources of bad performance. By inspecting the interaction of the queries with the data model, the DBA can usually point out bad queries and suggest improvements.

There are also several data model-related aspects that can influence performance, such as:

  1. Table and index columns: the columns used in table and index structures, as well as their placement in the structures.
  2. Table and index structures: the storage structures used for tables and indexes.
  3. Disk locations: the placement of tables and indexes on physical disks, as well as the placement of work areas, temporary storage areas and the various DBMS utilities, such as the log and journal files.
  4. Query statistics: bad query plans sometimes indicate that the query optimizer uses wrong or old statistics in estimating the QEPs.

The details of the columns, their placement, storage structures, tuning details (such as minpages, maxpages, and fill factors), disk locations, and optimization parameters can be managed from the Repository System. All of these aspects can be changed in the repository, and the modify or optimize scripts can be regenerated and re-run from the Repository System. The DBA Repository System caters to all the physical performance-tuning facilities provided by DBMSs such as CA-OpenIngres.

Next Phase Development

Many development projects iterate through various development cycles. During each cycle, the physical database model is usually altered. To facilitate this, new versions of the objects can be defined in the repository. You can even make a copy of a previous object version and change it.

The DBA Repository System makes extensive use of versions. Each database object can have a number of implementation versions, and the database administrator can specify different implementation details for each version. This means that, while the users work on version two of a table, the developers can be developing code for version three of the table, while the database administrator is experimenting with version four's structures. The versions also provide a change history.

Often in these cycles, or even during development, the developers want to change the tables, or it dawns on the database administrator that a table would be more efficient if it were implemented differently. The users and the developers obviously want to keep their test data through the change (the classic "alter table add column" requirement).

The DBA Repository System provides the functionality of a very powerful "alter table add column" statement. If performed manually, you would typically take the following steps:

  1. Create a new version of the table.
  2. Copy the columns, indexes, and references from the old to the new version.
  3. Make the necessary changes to the new version.
  4. Define a mapping from the old version to the new version. This can be a very complex mapping from an old table version to a new table version.
  5. Run an upgrade script to create the new version from the old version, which transforms and preserves the data through the defined mapping. In addition, when the new version of the table is created, the script can include everything defined for the table, including its integrity constraints, indexes, rules, and views.

Advantages

Using a repository-based system for physical database management has the following advantages:
  1. It is a single system used for all the database administration tasks, with a single, consistent, user-friendly interface.
  2. It is a central point of control for database administration -- the database administrator can manage all of the corporation's databases from a single system.
  3. It provides a central, controlled documentation source for database administration, data administration, and the developers.
  4. Because the Repository System generates and runs all the scripts, it eliminates all the syntax errors that occur through the subtle differences encountered in the various utilities and command statements.
  5. It's a complete inventory of all metadata.
  6. It is used in all the implementation phases of a database system.
  7. It provides cross-reference, usage, and impact analysis information, which is otherwise available only by querying the system catalogs or by using a catalog browser.
  8. It supports multiple database configurations on multiple installation sites.
  9. It documents all the different configurations of the organization's different DBMS installations in a central repository.

Releases 1.0, 2.01, 2.02, and 2.03 of the Repository System were designed and implemented on CA-Ingres (releases 6.3 and 6.4). Release 2.04 actually runs on CA-OpenIngres 1.1. The front end was developed in CA-Ingres/Vision, Ingres's 4GL code generator, with the help of a post-processor we developed to enhance the functionality provided by CA-Ingres/Vision.

All the scripts produced by the DBA Repository System are generated through template files. Other DBMSs are supported by tailoring the template files. We are currently generalizing and extending the DBA Repository System for the physical database management of a wide variety of relational database systems, such as CA-OpenIngres, SQL Server, Oracle, Sybase, SQLBase, and Borland Interbase, as well as for the running of the repository database on these systems.

Many of the features already provided by the Repository System are not yet directly supported by DBMSs such as CA-OpenIngres, Oracle, or SQL Server. Some of these features, such as tailorable integrity violation messages and unconventional referential integrity constraints, are not even outlined in the SQL-92 standard yet, and it will be some time before they appear in the popular DBMSs. But by generating rules, procedures, and triggers, the Repository System can already provide this functionality.

By applying repository technology to database management, we have developed a very useful tool for all the physical database management tasks throughout the database life cycle. We have used it to manage multiple and different CA-OpenIngres databases successfully on various machines in three of the biggest CA-OpenIngres projects in South Africa and in various in-house and pilot projects.

Future extensions will integrate the DBA Repository System more closely with third-party database monitoring tools, address repository standards, and, in the process, integrate closer with application design and development repositories.

Acknowledgments

I am grateful to my colleagues at The Data Base Approach for their dedicated and hard work on the DBA Repository System. I also wish to thank our clients at Portnet (a development partner of the DBA Repository System), the UNIBase project of Telkom SA, and Shoprite/Checkers for their support.


Martin Rennhackkamp is the owner and principal consultant of The Data Base Approach, a corporation specializing in relational and distributed databases, based in Cape Town, South Africa. You can reach Martin via the Internet at mr@dba.co.za.


FIGURE 1


--Managing multiple databases using a repository-based system.


FIGURE 2


--This figure shows the part of the database system life cycle that affects physical database management.


FIGURE 3


--This figure contains a table version screen in the DBA Repository System.


FIGURE 4


--A schema version screen containing database objects in the DBA Repository System.



Subscribe to DBMS and Internet Systems -- It's free for qualified readers in the United States
February 1996 Table of Contents | Other Contents | Article Index | Search | Site Index | Home

DBMS and Internet Systems (http://www.dbmsmag.com)
Copyright © 1996 Miller Freeman, Inc. ALL RIGHTS RESERVED
Redistribution without permission is prohibited.
Please send questions or comments to dbms@mfi.com
Updated Monday, November 11, 1996.