DBMS

Building a DBA Repository System

By Martin Rennhackkamp
DBMS, January 1996

Applying Application-Centric Repository Principles to Physical Database Management.


In this two-part series, I explain how you can use repository technology for physical database management. In this first part, I provide an overview of repository technology and the challenges of physical database management. I also discuss the shortcomings of existing repository-based tools with respect to physical database management, and describe how repository technology is applicable to physical database management. Next month, I will describe a repository-based system that my company has developed for physical database management.

Repository Technology

A repository system's purpose is to record and manage the information about the data models and the designs and implementations of the enterprise databases, applications, and systems. In other words, the repository system records and manages the metadata of the enterprise. The repository is a knowledge base that contains metadata about the information an enterprise keeps and accesses, the processes it performs to be successful, and how it has implemented its application systems to support its information processing needs. A repository system is illustrated in Figure 1.

Thus, a repository system is a database-centric application that manages corporate metadata and therefore its data and application systems. The result is an integrated and manageable corporate information resource. Repository technology is a key component in the integration of the enterprise's information environments. It is the technology used to integrate different data models, independent databases, diverse and heterogeneous database technologies, user applications, CASE tools, reference and directory management systems, systems administration, and object management services. The repository system presents a single, consistent view of the metadata describing all of these pieces.

Content

The metadata stored in the repository system typically includes the following: The repository system must not only track metadata such as physical data elements, screens, forms, and tables, but it must also track business rules, in which some data elements might be related to other data elements. For example, it must track which data elements appear on which forms, screens, and tables. Therefore, the information about the relationships between the data elements and the screens should also be maintained. The metamodel must contain all of the valid associations, in addition to the objects and properties that make up the metamodel.

Facilities

The repository system's facilities are implemented as database applications operating on normalized sets of databases, object bases, knowledge bases, and files to manage the metadata at all levels of abstraction. These facilities, which provide standardized services for data access, data sharing, and data management, include:

Functions

A repository system must support the storage, retrieval, version management, and configuration control of all the metadata used for the many aspects of application development. It must provide the user applications with a consistent scheme for managing data in a distributed information processing environment. The application processes must be able to share data across heterogeneous computer networks, yet remain insulated from the details of interaction with multiple types of hardware and software. The repository system must provide a uniform services interface to application processes, and it must maintain knowledge of the physical location of the data, as well as control the distribution thereof.

A repository system must provide the following functions:

Some code, such as data definition statements, can be automatically generated from the information stored in the repository -- as long as the repository and the DBMS are tightly integrated. The input for physical design is in the repository (definitions, names, lengths, and formats). With these elements, you can create constructs for testing and later move them to production. Because database definitions are usually DBMS-specific, the DBA will have much less work to do if the repository can address the DBMS in its own terms (for example if the repository can generate the data definitions in a syntax supported by the DBMS, or, even better, if it can run the database definitions directly against the databases managed by the DBMS).

The repository's role is to identify the impact made when programs alter the way they use data or when the data structures change. An analyst must be able to determine the impact of these changes, even if the only resource is a list of potentially affected relationships. Cross-referencing is good for more than just impact analysis. Using even a partial cross-reference with the actual program descriptions results in "asset management"; that is, the ability to increase the reusability of subroutines or other program code. Impact analysis should also examine the nature of the following business rules affected by the change:

In multi-release impact analysis, you must perform the same impact analysis for each application version in order to determine which application components have been affected by a change. Another change-management challenge occurs when the release schedule is restructured. When this occurs, you must conduct careful impact analysis.

Physical Database Management

Database management, in the generic sense, covers the entire life cycle of all of an organization's databases, from requirements definition through logical database design to physical database design, implementation, and performance tuning on various target databases. Logical database design is performed by a data administrator. This administrator documents the logical data model, typically using a CASE tool. The goal is to represent the users' business needs as a consistent and correct logical data model. The data administrator conducts JAD sessions with users to establish a global logical data model, which is sometimes partitioned for particular application areas.

A DBA performs physical database management. Physical database management carries the logical data model forward to physical design. The goal of physical database management is to implement and manage the logical data model as an efficient, consistent, and correct physical data model.

A large organization typically has a number of databases, often in various stages of development (for example, regional production databases in its regional offices, a central production database in its head office, an evolving warehouse database for statistical processing, and various test and development databases). Developers must manage the implementation of the physical data model on the various databases of the enterprise.

Obviously, physical database management is complex. DBAs must manage multiple physical implementations of the same or different logical data models in different databases, and concentrate on tuning various aspects of these different databases to improve performance. In addition to the widely diverging performance tuning aspects, DBAs must also manage various other implementation-related objects, such as triggers, user-defined procedures, rules, events, and views.

As an example, see Tables 1 and 2. In the sales branch's database, the Employee table must be indexed on the EmployeeName column, preferably using a b-tree or ISAM structure that allows partial key searches. In the head-office database, the Employee table must be indexed on the unique EmployeeID key column and on the repeating Department foreign-key column, both preferably using some hash structure that allows direct key look-ups. In the warehouse database, however, the view (see Table 3) of these two tables would be used.

In addition, on the branch databases, the DBA must implement the business rule that employees in the sales department may not take more than 30 days leave, while employees in other branches may not take more than 25 days leave. On the head-office database, the DBA must also implement the business rule that there may only be a total of 25 employees in all the sales departments country-wide, and only eight employees in the R&D department country-wide. Meanwhile, in the IT section, the programmers are changing the human resources system to work with the Employee table (see Table 4), which would cater to new taxation laws.

Available Tools

The database management function must manage complex, interrelated objects such as table definitions, integrity constraint definitions, business rules, views, stored procedures, and performance improvement facilities. In an environment with multiple DBMSs, this is even more complex because each vendor's implementation of these database objects differs. For example, CA-OpenIngres implements business rules through stored rules and procedures, while Oracle and Sybase implement business rules through triggers. However, Oracle's and Sybase's SQL syntax triggers differ significantly.

Software developers work in a slightly different environment than that of a DBA. They work with different versions of program code, written in different programming languages (such as C, C++, Cobol, Visual Basic, and various 4GLs), addressing different target systems (such as Windows for DOS and Motif for Unix), packaged in different releases, and targeted for deployment on different development, test, and production systems, in ever-changing release cycles. Software developers can, however, use repository-based tools to develop and manage the complex environment of interrelated software components in various stages of development and deployment.

Most repository-based tools, including CASE tools, address only the subject area of the data administrator. Most CASE tools have the facilities to generate simple ANSI SQL database definition scripts, and most even have the facilities to generate SQL database definition scripts for specific target systems, such as CA-OpenIngres, Oracle, and Sybase. However, few of these CASE tools address the full complex environment managed by the DBA. Very few CASE tools provide for different logical data models on different databases, and even fewer provide for different physical models or implementations on different databases.

One of the CASE tools that has made the most progress in this area is Sybase's S-Designor. It lets you map a logical data model (defined in terms of entities, attributes, and relationships) to various physical data models (defined in terms of tables, columns, and constraints), as supported by different DBMSs (for example SQL table definitions in the SQL dialects supported by Sybase, Oracle, and CA-OpenIngres). For a discussion and a listing of CASE tools, see "The Evolution of Client/Server CASE," on page 67.

Most CASE tools will allow multiple versions of the Employee table. This approach is very useful for the evolution of the Employee table used in the branch and head-office databases. Few CASE tools, however, let you associate versions 1 and 3 with the branch and head-office databases, while version 2 is deployed on the warehouse database.

Very few CASE tools provide for physical tuning aspects, such as table structures, fill factors, and re-indexing scripts. For example, few CASE tools make the distinction that: tables can be structured as b-tree, hash, heap, or ISAM structures in CA-OpenIngres, and as b-tree or hash structures in Oracle; you can specify a table preallocation size in CA-OpenIngres; and you must specify a table preallocation size in Oracle. Very few CASE tools address the extended features provided by the more advanced RDBMSs such as CA-OpenIngres, Oracle, and Sybase. These include objects such as procedures, rules, triggers, events, and views. For example, few CASE tools let you construct a view as a select query about existing tables and other views, with full cross-referencing information on which columns are used in which views. Finally, few CASE tools let you capture stored procedure and trigger definitions, let alone address the subtle differences between implementation and syntax (for example, between CA-OpenIngres, Oracle, and Sybase).

A few repository-based CASE tools do address the interface between logical database design and physical database implementation through simple SQL script-generation facilities. However, the database management function must still use the tools and facilities of the operating systems and the DBMSs to manage physical database management in the complex database environment.

Applying Repository Technology

Despite the challenges, it makes perfect sense to utilize repository technology for physical database management. If you consider the contents of the repository system, the following applies to physical database management: When you consider the functions provided by a repository system, the following applies to physical database management: The most important analogies between the functions provided by conventional repository systems and a repository system adapted or extended for physical database management is shown in Table 5.

The proliferation of repositories is a big problem. The "theoretical" correct approach is to use one standard, integrated repository for all these functions -- which results in totally integrated metadata management. In practice, however, this does not happen (yet), because all these products have their own, fairly proprietary, repositories.

In the DBA Repository System (which I explain in detail next month), we took the viewpoint that most CASE tools can generate SQL-92 syntax (at least some adherence to standards!), with some variations for each particular DBMS. We therefore provide an import facility with which you can "synchronize" the DBA Repository from your CASE tool repository. It doesn't blindly bulk-load the metadata into the repository database; rather, it matches and resolves previous versions of the defined objects where it can. This is how we overcome the workload issue -- no one, not even different developers on the same project, wants to capture the metadata twice, because that would leave open the possibility for discrepancies between the logical and physical data models.

Until Next Month. . .

DBAs and application developers work in equally complex environments. They perform similarly complex tasks in order to manage multiple interrelated database objects for different target systems. DBAs inspect and manage the same managed objects, but from a different perspective -- they actually manage the database objects traditionally cross-referenced from applications, and they cross-reference the application objects themselves. However, they do not have highly integrated repository systems at their disposal.

Although some of the repository-based development systems cater to different logical and physical models, and although some of them can help the DBA deploy the physical model on development, testing, and production databases, they do not provide sufficient functionality for the DBA's day-to-day tasks. For example, they do not provide functions for re-indexing tables, re-gathering optimizer statistics, and re-allocating tables to different physical devices. On the other hand, the tools traditionally used by the DBA to perform these tasks, be they from a third party or part of the DBMS itself, are usually not repository based. They only contain the present state of the database objects, without facilities to document or manage previous or future versions of the database objects.

I have tried to show in this article that you can apply the functions performed by a conventional application-centric repository system equally well to physical database management. It's just the focus that changes. The repository objects that DBAs would use most are the actual database objects (that is, the tables and views to be accessed), with the relationships, integrity, and other constraints defined among them, as well as the where-used information (that is, which tables and views are used in which queries and applications).

Next month, I will describe a repository-based system that we at The Data Base Approach have developed for physical database management.


FIGURE 1


--A repository system. The repository is a knowledge base that contains metadata about the information an enterprise keeps and accesses, the processes it performs to be successful, and how it has implemented its application systems to support its information processing needs.


TABLE 1. The Employee Table

EMPLOYEE
EmployeeID EmployeeNameDepartment
1345John WalkerSales
4692Sue Brown Repairs
1467Pam JohnsonR&D


TABLE 2. The EmployeeLeave Table

EMPLOYEELEAVE
EmployeeIDFromDate ToDate
13451 Jan 199515 Jan 1995
13455 Apr 199510 Apr 1995
46921 Jan 19958 Jan 1995


TABLE 3. The Employee view of Tables 1 and 2

EMPLOYEE
EmployeeIDEmployeeNameYear DaysLeave
1345 John Walker199522
4692Sue Brown19956
1467Pam Johnson19950


TABLE 4. The Employee table with taxation laws included

EMPLOYEE
EmployeeID EmployeeName SocialSecurityNumberDepartmentAppointmentDate
1345 John Walker 34-89675-234 Sales 4 Jun 1980
4692 Sue Brown 67-11201-829 Repairs 1 Jan 1994
1467 Pam Johnson45-24011-114 R&D1 Apr 1992


TABLE 5.

ASPECTAPPLICATION-CENTRIC REPOSITORY SYSTEMSPHYSICAL DATABASE MANAGEMENT REPOSITORY SYSTEMS
Share repository metadata across business units Application developers share global definitions and application libraries DBAs share physical database objects and definitions
Multiple versions of models Versions of application design and implementation documents Versions of logical and physical database design documents
Multiple versions of object definitionsVersions of program code, shared libraries, test scripts, and so onVersions of table and view definitions, trigger code, database definitions, and so on
Adaptable/extendable meta-model Application development approaches changeObject types stored in databases change
Existing frameworks and standards John Zachman's Information Systems Architecture (ISA) framework; AD/Cycle framework John Zachman's Information Systems Architecture (ISA) framework; ANSI SQL standard
Descriptions of objects for analysis, design and implementationApplication models, application designs, program code librariesData models, database designs, database definition scripts
Change-management historiesVersions of application designs; versions of program code Versions of database designs; versions of database definition scripts
Cross-referencing documentationInterrelated application objects (such as procedures calling other procedures); database objects used by applicationsInterrelated database objects (such as tables used inviews and stored procedures); application components accessing database objects
Component engineering informationCode in functions in procedures in application systemsColumns in tables in views in stored procedures in schemas in logical databases in physical databases
Code generationCode libraries, programs, application systemsDatabase management scripts, database definition scripts
Change/impact analysis Application components affected by changes in other application components Application components and database objects affected by changes to other database objects
Comparison of conventional application-centric repository systems and physical database management repository items.


Martin Rennhackkamp is the owner and principal consultant of The Data Base Approach, a corporation specializing in relational and distributed databases, based in Cape Town, South Africa. You can reach Martin via the Internet at mr@dba.co.za.
Subscribe to DBMS and Internet Systems -- It's free for qualified readers in the United States
January 1996 Table of Contents | Other Contents | Article Index | Search | Site Index | Home

DBMS and Internet Systems (http://www.dbmsmag.com)
Copyright © 1996 Miller Freeman, Inc. ALL RIGHTS RESERVED
Redistribution without permission is prohibited.
Please send questions or comments to dbms@mfi.com
Updated Sunday, December 1, 1996