DBMS, February 1996
DBMS Letters

Editor's Note: The following is Joe Celko's reply to Chris Date's letter from the January 1996 issue of DBMS.

Celko Fights Back

I think that much (if not all) of our disagreements are in terminology. Chris Date's terms are based on dividing the problem in a model and its implementation. Here are my definitions:

A (abstract) data type is a basic set of abstract primitives with certain properties. These are very high-level abstractions. The sets of all integers, all reals, all character strings, and so forth, are data types. Peano's Postulates would be the rules for integers. Data types are what we are trying to build into our computer hardware but cannot because, by their nature, data types tend to be infinite sets.

Defining data types as an abstract "Platonic ideal" has advantages. I simply design my programming language as if I had a true infinite integer data type on an abstract computer. I then trust the hardware to have an approximation that is good enough for my work.

The next step is to move from the abstract data type to a physical level where you have a (physical or user-defined or hardware -- pick your qualifying term) data type. If I have a 16-bit computer and a 32-bit computer, the floating point numbers will be based on the same data type: reals. However, the laws of floating point math are not quite those of real numbers. But they are well defined and built into my hardware, so I will use them as primitives.

These primitive (physical) data types and their hardware operators are used to construct what I am calling domains and, yes, I mean this in the sense of SQL-92. A domain is a data type with constraints and associated operators. A domain is all syntax, with no semantics.

At the next level, I want to use domains to model the attributes of an entity in my database. Thus, I might model the attribute "weight" as a floating point number with the constraints that it is greater than zero and that I can add weights together. Notice that "height" can also be modeled as a floating point number with the constraints that it is greater than zero and has addition, but they are different domains. I attach domains to attributes, but this is purely syntax.

If a data type and a domain were the same, a constraint could have a self-reference. This leads to formal paradoxes and practical problems, which is why SQL-92 explicitly forbids a CREATE DOMAIN statement constraint to use its own domain directly or indirectly.

A scale is a semantic concept that tries to attach meaning to a domain. The purposes of a scale are: to attach meaning to the value of an attribute in a uniform manner, and to attach meaning to calculations performed with other attributes of this and other entities.

One of the basic properties of scales is that operations must be done on the same type of scale. Another property is that you can form only certain compound units. There is a joke about a teacher asking his class, "What is 12 times 78?"

The first kid raises his hand and says, "Green!"

The second kid shouts out, "No, it's Thursday!"

The third child announces, "936, teacher."

"That's very good, Billy. How did you get your answer?"

"I divided green by Thursday!"

This is funny because the wrong answers are values on scales that are non-numeric domains, and Billy's division operation is illegal for either of the two domains.

This leads to a second level of rules among entities, attributes, and scales. This is semantics at the entity level. It is why I can multiply feet and pounds to get foot pounds of work when I move an object, but cannot multiply a man's height by his weight to figure out how much work he can do.

I separate these three concepts -- data type, domain, and scale -- because you can change any one without necessarily changing the other two. For example, assume I want to store temperatures in my database. I probably start with the scale ("we keep records in Celsius"), because it is the most abstract level. I then move to the domain ("we need whole numbers between -100 and +200"), and finally pick an available data type ("let's declare that column as integer NOT NULL"). Now I can make changes to my data design:

  1. If I decide to use the Fahrenheit scale, I can still use whole numbers and an INTEGER data type.
  2. If I decide to change the domain to four digits, I can still use the INTEGER data type and the Celsius scale -- I just modify my CHECK( ) clause.
  3. If I decide to change the data type to DECIMAL (5,2), I can still use the Celsius scale, and all of the old domain values are still represented. I gain more precision in my measurement, however, and can store values that the old data type could not.
In fairness, there are often relationships among the data type, domain, and scale of an attribute that require a change in one to trigger a change in the others. For example, if I were change a DECIMAL (5,2) column to INTEGER, I would need to decide how to get rid of the decimal places (round or truncate).

If there were no difference, then this would be fine. Using Date's example of quantitative (numeric scaled) and fuzzy (ordinal scale) temperature, I can find two reasons why they cannot be compared, in spite of their being drawn from the same data type. First, fuzzy temperature is one domain and quantity is another. Second, quantity is an absolute scale and fuzzy temperature is an ordinal scale; no matter what domain or data type you pick for them, they would not be comparable.

Taking Date's point number one, I would say that the data type of the fuzzy temperature scale is a character string. The domain is limited to the set of strings ('hot,' 'warm,' 'cool,' 'cold') and it has a collation from 'hot' to 'cold'. Concatenation will not work simply because it is not in the same domain as a regular character string.

What seems to be lacking in Date's modeling system, in which everything is a "data type," is a concept of an entity made up of attributes. I see entities that have attributes, which have values, which are drawn from domains, which are constructed from data types, which are based on abstract data types.

Another reader pointed out to me that although adding two temperatures together by themselves makes no sense, we can and do add temperatures by averaging them over mass. The particular material and other conditions determine the time required. In short, the behavior of temperature is linked to the entity to which it belongs because that entity will have other attributes. Semantics need entities.

The weakness of SQL is that the language is all syntax and no semantics. A table can model an entity (Date's Parts table) or a relationship (Date's Parts-Supplier-Job table) or something else. You cannot tell what a table means from looking at it.

I will concede that an OO database can capture more of the semantics and behaviors of what we are trying to model than a relational model. But I don't mind this, because I don't think that computers are good at semantics. Artificial intelligence's slow progress is proof of this.

I would also like to point out that sets defined by enumeration are fundamentally different from those defined by a rule. An enumerated set can only be finite or of size Alph Null; a set defined by a rule can be of any size. In Date's model, a set defines one of his data types; it is complete in itself. This means that two different floating point representations are also different data types because the two underlying sets of real numbers do not match.

I would say that both are based on the abstract data type, real numbers, and have different domains. I have a rule for determining that they are compatible. I might have some trouble working with them in the same database, but it is a domain problem, not a data type problem.

Furthermore, there are sets that can be defined by a rule, and that cannot be enumerated and do not behave nicely -- the Cantor sets, Julia set, the (3n+1) problem set, and so on. Many of these sets have the property with which you can find out whether any single element from the universal set from which it is drawn is either in the set or not, but you cannot find all of the elements of the set.

Irrational numbers are the most common example. They are defined by the rule that you cannot represent them as (p/q), where p and q are integers. If Date will please enumerate all the irrational between 0 and 1, he will have done better than any mathematician in all history. This should be easy, because there is proof that there are more irrational than rational numbers in that segment.

I agree that I would like to have a single temperature domain that I could convert to display in a scale of my choice. I know this is possible because I have several different interval or ratio scales for temperature, and I know what the conversions will be because of the kind of scales I have. I do not care if it is explicit or under the covers, the calculation will be the same. I would hide the scale conversion in VIEWS. -- Joe Celko

Perplexed Products

Thanks for the Velocis review in the December 1995 issue of DBMS (page 30). There was one mistake in the article; it's in the first sentence. Velocis is not the database that was once called db_Vista and is today called Raima Database Manager. Raima Database Manager is a different product and is still called Raima Database Manager. While Velocis is our client/server database with SQL and ODBC support, Raima Database Manager is a high-performance, royalty-free file server database.

Ted Kenney
Marketing Associate, Raima Corp.
Issaquah, Wash.

Thanks for the clarification. My intent was to show how Velocis has evolved through a succession of products -- not to retire any Raima products before their time. -- Paul R. Reed, Jr.

Addendum

In December (page 42), DBMS reported that Progress Software Corp. (Bedford, Mass.) shipped version 7.3 of its Progress application development environment, when, in fact, Progress has just shipped a much more significant release, version 8.0. Key additions to version 8.0 are SmartObjects and an Application Component Environment (ACE).We apologize for any confusion this may have caused. For more information on Progress 8.0, call 617-380-4000.


Subscribe to DBMS and Internet Systems -- It's free for qualified readers in the United States
February 1996 Table of Contents | Other Contents | Article Index | Search | Site Index | Home

DBMS and Internet Systems (http://www.dbmsmag.com)
Copyright © 1996 Miller Freeman, Inc. ALL RIGHTS RESERVED
Redistribution without permission is prohibited.
Please send questions or comments to dbms@mfi.com
Updated Monday, November 11, 1996.