DBMS

Letters

DBMS, January 1996

Farmer Faux Pas

Obviously Mr. Labbauf, one of those who contributed an opinion about Windows 95 in your November issue (From the Editor, page 89), is a stranger to farms. His expression "bail and wire and bubble gum" should be "baling wire and bubble gum," a well-known rural engineering technique.

Patrick J. Brennan
Virginia Beach, Va.

The Debate Continues . . .

I refer to the October 1995 letter from Joe Celko. The principal technical disagreement between Celko and myself (at least, in the area under discussion) concerns the difference between types and representations. Let me lay a little groundwork:

Thus, types are a model consideration, while representations are an implementation consideration. As a trivial example, consider the type FLOAT in SQL. The representation of FLOAT numbers is not specified in the SQL language, and users don't need to know what it is; what they do need to know is the operations they can perform on FLOAT numbers -- "+", "*", ">", and so on. And that's all they need to know. To quote Luca Cardelli and Peter Wegner ("On Understanding Types, Data Abstraction, and Polymorphism," ACM Comp. Surv. 17, No. 4, December 1985): "A major purpose of type systems is to avoid embarrassing questions about representations, and to forbid situations in which these questions might come up."

Now I turn to Celko's letter. At one point, Celko seems to agree with the foregoing, when he says that "numbers . . . and numerals . . . are [not the same thing]" -- that is, numbers are the type, numerals are the representation. Very confusingly, though, he uses the term type to mean representation -- which is presumably why he also says "I do not see how domain and data type can [possibly] be the same." The source of this unorthodox terminology is probably the SQL language, which is thoroughly confused in this whole area (my FLOAT example notwithstanding).

However, Celko also gives an extensive example that suggests that he does not properly understand the type vs. representation distinction after all: "I want to represent [sic] temperatures. Degrees Kelvin is one possible scale. I could use Celsius or Fahrenheit instead; they all use numeric values. Or I could use the scale ('hot,' 'warm,' 'cool,' 'cold'); the data type is character string. What is the right choice? [Given a certain application area,] I will probably prefer Kelvin because it is expressed with a numeric data type. I can use arithmetic operators with numeric data types, and I will need to do calculations. [But] you cannot add temperatures . . ." (somewhat edited and condensed from Celko's letter).

I have two major observations on this example. The first has to do with the type vs. representation distinction and is fundamental. The second is not fundamental but has to do with a question of good design.

  1. The data type is not character string for "the scale ('hot,' 'warm,' 'cool,' 'cold')"; the representation is character string. We're not going to perform substring or concatenate operations on temperatures! Likewise, it's the representation, not the data type, that is numeric for the Kelvin and Celsius and Fahrenheit designs; as Celko himself points out, we're not going to add temperatures.
    No: The operations we're going to perform on temperatures are, precisely, the operations that are defined for the temperature type, whatever those might be.
  2. Celko also asks which of his possible designs is the right choice. Presumably he's saying we could have a domain called, say, K_TEMP, whose values are measurements on the Kelvin scale; or a domain called, say, C_TEMP, whose values are measurements on the Celsius scale; or several other analogous possibilities.

Personally, I think none of these proposals is the best choice. Rather, I would have a single "temperature" domain, with operators to expose temperature values as "number of degrees Kelvin," "number of degrees Celsius," "number of degrees Fahrenheit" -- even as "hot," "warm," and so on, if desired. One advantage of this design is that it avoids the need for explicit conversion functions from one scale to another (any such conversions would be completely under the covers). Another is that it affords a greater degree of data independence (we can change the representation from, say, Kelvin to Fahrenheit with logical impunity).
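As a rough illustration of the single "temperature" domain described above, here is a minimal Python sketch. It is not taken from the letter: the class name Temperature, the method names, the choice of Kelvin as the hidden representation, and the cutoffs used for "hot," "warm," and so on are all assumptions made for the example. The point it tries to show is that users see only the operations defined for the type (comparison, plus operators that expose a value on a given scale), and that addition is simply not defined.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Temperature:
    """A temperature value. The stored representation is an internal detail."""

    # Hypothetical representation: degrees Kelvin. It could be swapped for
    # Fahrenheit without changing any of the operators below from the
    # user's point of view.
    _kelvin: float

    # Selectors: build a temperature from a number on a given scale.
    @classmethod
    def from_kelvin(cls, k: float) -> "Temperature":
        return cls(float(k))

    @classmethod
    def from_celsius(cls, c: float) -> "Temperature":
        return cls(c + 273.15)

    @classmethod
    def from_fahrenheit(cls, f: float) -> "Temperature":
        return cls((f - 32.0) * 5.0 / 9.0 + 273.15)

    # Operators that expose the value as "number of degrees" on each scale.
    def kelvin(self) -> float:
        return self._kelvin

    def celsius(self) -> float:
        return self._kelvin - 273.15

    def fahrenheit(self) -> float:
        return (self._kelvin - 273.15) * 9.0 / 5.0 + 32.0

    def label(self) -> str:
        """Expose the value as 'hot', 'warm', 'cool', or 'cold'.

        The cutoffs are arbitrary, chosen only for illustration.
        """
        c = self.celsius()
        if c >= 30.0:
            return "hot"
        if c >= 20.0:
            return "warm"
        if c >= 10.0:
            return "cool"
        return "cold"


boiling = Temperature.from_celsius(100.0)
print(boiling.fahrenheit())   # the same value, exposed on another scale
print(boiling.label())
print(Temperature.from_celsius(0.0) < Temperature.from_fahrenheit(50.0))
```

Because no "+" operator is defined for Temperature, an expression such as boiling + boiling fails with a TypeError, even though the hidden representation is numeric. Comparison works because it was declared as part of the type (order=True), not because the representation happens to support it.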

Celko makes many further points in his letter with which I strongly disagree. If I went into detail on all of them, however, this would be a very long letter. Let me therefore just make a few apparently dogmatic assertions here, without trying to provide any evidence in support of them (I can provide such evidence on request):

To close on a positive note: Celko's letter does include one point on which (I think) we are partly in agreement! He suggests that it might be possible to express a given calculation in two distinct but equivalent ways, one involving an explicit multiplication of two weights and the other not. So if we've "disallowed the operation weight times weight at the data type level," one of these two distinct but equivalent expressions might be legal and the other not.

I agree this is a valid concern (at least in general, though I'm not sure it's valid in the case of multiplying two weights in particular). I've pointed out elsewhere that, for example, if we can't add two dates together, we obviously can't go on to divide that sum by two to obtain the date midway between the given ones -- yet this latter would be a useful and valid thing to do. (This criticism applies to SQL, incidentally.) But the moral isn't that types should simply "inherit" operations from their representations, as Celko suggests, but rather that data type definition needs to be done carefully. What's more, this moral applies at least as much to built-in types as it does to user-defined ones.
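The date example can be sketched with Python's standard datetime module, used here only as a convenient illustration (the letter's own criticism is aimed at SQL). Python's date type defines "date minus date" and "date plus interval" but not "date plus date," so the midpoint has to be written without ever forming a sum of two dates. The function name midpoint is an assumption made for the example.

```python
from datetime import date


def midpoint(d1: date, d2: date) -> date:
    """Return the date midway between d1 and d2.

    Written as d1 + (d2 - d1) / 2, so the only operations used are
    date - date (giving an interval), interval / number, and
    date + interval. Any fractional day in the halved interval is
    ignored when it is added back to a date.
    """
    return d1 + (d2 - d1) / 2


print(midpoint(date(1996, 1, 1), date(1996, 1, 31)))

# The "obvious" formulation, (d1 + d2) / 2, is rejected by the type:
try:
    date(1996, 1, 1) + date(1996, 1, 31)
except TypeError as exc:
    print("not defined:", exc)
```

Whether a type should offer a midpoint operator directly, rather than leaving users to find the equivalent legal expression, is exactly the kind of decision that careful type definition has to make.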

C.J. Date
Healdsburg, Calif.



DBMS and Internet Systems (http://www.dbmsmag.com)
Copyright © 1996 Miller Freeman, Inc. ALL RIGHTS RESERVED
Redistribution without permission is prohibited.
Please send questions or comments to dbms@mfi.com
Updated Sunday, December 1, 1996