DBMS, May 1997
DBMS Online: SQL for Smarties By Joe Celko

Parlez-vous SQL?

Joe Discusses New Names, Old Languages, the Future, and Graph Theorems.

Name change. No, not mine; the ANSI X3H2 Database Standards Committee. On January 23 the ANSI "Accredited Standards Committee X3 -- Information Technology" became the National Committee for Information Technology Standards (NCITS, pronounced "insights"). We will have done an ON UPDATE CASCADE on all of our forms by the time you read this. The new Web site is www.ncits.org, if you need more information.

The official press release said, "While the scope of standardization activities remains the same, the emphasis is different. NCITS's mission is to produce market-driven, voluntary consensus standards in the areas of multimedia (MPEG/JPEG), intercommunication among computing devices and information systems (including the Information Infrastructure, SCSI-2 interfaces, Geographic Information Systems), storage media (hard drives, removable cartridges) and programming languages (such as C++)," but it did not mention databases or just what that means in real terms.

I Am Not Making This Up

BUZZwords, the Georgia Tech Alumni Association email newsletter (alumni@mordred.gatech.edu), reported that two French language watchdog groups are suing Georgia Tech Lorraine because the Web site for the Tech campus in Metz, France is only available in English. The campus consists of approximately 60 graduate students in electrical and computer engineering degree programs. Courses are taught in English by Georgia Tech faculty assigned to Georgia Tech Lorraine on a rotating basis.

The Association for the Defense of the French Language and the Association for the Future of the French Language charge that the Georgia Tech Lorraine Web site violates a 1994 French law that forbids the sale of "goods and services" in France in any language other than French, unless accompanied by a French translation.

The two groups cite the 1994 law in the lawsuit, demanding that all Web site information accessible from France or related to a French organization be translated into French. The French courts ruling could force Georgia Tech Lorraine and other institutions with Internet sites to conform to the rarely enforced language law.

At a preliminary hearing on October 28, 1996, Maitre Schaefer, an attorney representing Georgia Tech Lorraine, argued for the dismissal of the lawsuit, noting that as Internet surfers use hypertext links to move from site to site, they also may move invisibly from a server located in France to one in another country. Georgia Tech Lorraine argued that it is not feasible to provide a French translation of every Web page accessible from France or related to a French organization.

Georgia Tech Lorraine also contends that the law specifically excludes the facility and contradicts the European Charter of Human Rights, and that the watchdog groups filed the suit without following proper procedures.

After hearing arguments from both sides on January 6, a judge took the case under consideration until February 24. "Due to scarce resources, the initial version of the Georgia Tech Lorraine Web site was primarily developed in English," said Dr. Hans Puttgen, director of Georgia Tech Lorraine. "As the scope of Georgia Tech Lorraine is widened, multilingual versions will be progressively developed."

Last fall Schaefer attempted to settle the matter amicably by offering to add a French summary to the Web pages, but opposing parties refused the offer. I would like to ask why anyone would want a Web site in a country with a shrinking minority language and a national policy that forbids private encryption of data, which would interfere with the government's ability to read any private or commercial email.

Do any of your customers live in France? Can a Frenchman get to any of your databases (prices, inventory, current status of parcels, and so on) via the Internet and be offended by the lack of an all-French version? Some automatic translation products for Web pages are now available, but they are not really meant to do on-the-fly, bulk translations of complete databases. As data becomes more global, we are going to have to build databases to ISO formats with SI (metric system) units and UTC timestamps, and the rest of the world is going to have to learn English. At least it is possible to read and speak English badly in a short time; other languages require months or years of study to be read and spoken badly.

Toffler and Negroponte

Speaking of Georgia Tech, my alumni and press connections got me into a presentation that Tech sponsored with The Leader Center (Atlanta, Georgia; 404-250-0092). The one-day show featured Alvin Toffler in the morning and Nicholas Negroponte after lunch.

You should know both names. Toffler and his wife are famous futurists who wrote Future Shock, The Third Wave, and most recently, Creating A New Civilization: The Politics of the Third Wave. Negroponte is the founder and director of the Media Lab at MIT and author of Being Digital.

Toffler hosted a panel that discussed the differences in Second and Third Wave societies by looking at their social, economic, and political systems. Negroponte spoke about the information society vs. the industrial society.

Both men offered a ton of applicable ideas, and I have no space to begin to cover them in this short column; if you have not read the work of these two, you need to, or you will not understand the world and what the computer is doing to it.

Let me tell one story. Negroponte began his part of the show by telling the audience -- around 400 people in a large banquet room in the Cobb Center -- to start clapping in unison while he turned his back and timed them. The crowd started off as an uncoordinated cacophony but picked a beat and fell into step in about four seconds. Doing this exercise all over the world, Negroponte found only one audience that took longer than five seconds: the Italian Senate, which took over 25 seconds (in fairness, I'll mention that their meeting hall has a marble dome that echoes). I cannot think of a better example of self-organizing order.

Graphs and GUIs I want to get back to the theme of graphs in SQL that I started in my April column. This is a classic called "the utility puzzle"; you have three utilities (x, y, and z) on one side of the street and three houses (a, b, and c) on the other side, and you have to connect them without crossing any utility lines.

The gimmick is that you cannot do the puzzle on a plane --it works on a torus (doughnut) or a sphere, but not on a plane. The Planar Graph theorem, which is due to Kasimir Kuratowski, says that a graph is planar if and only if it has no subgraph in it that can be mapped to the two graphs named K(5) and K(3,3). The K(3,3) graph (see Figure 1) is the utility puzzle graph; the K(5) graph looks like a pentagram. (See Figure 2.)

You might remember another puzzle called the Four Color Map theorem from high school. If I give you a map and tell you to color the countries so that no two countries that share a common border have the same color (corners don't count), what is the minimum number of colors you need to use? The answer is always four, if the map is flat (for example, drawn on a plane). You can convert your map into a graph by changing the countries into nodes and their common borders into edges.

Kenneth Appel and Wolfgang Haken proved the Four Color Map theorem in 1976 by trying all possible cases on a DEC minicomputer during off hours at a university; it took over 1,200 hours of computing time.

Now look at your favorite GUI query builder or CASE tool for whatever SQL you are using. They both draw graphs on a flat screen, showing tables as boxes (nodes) and the joins or primary- and foreign-key relationships as lines (edges). They also try to avoid crossing lines (say, to generate planar graphs). This means that some fairly simple database schemas and queries cannot ever be represented by GUI tools. The usual solution is to introduce some language constructs into the GUI (the query box in QBE), which destroys the graphical basis of the tool; to generate bad code; or simply to choke.

This is one reason I hate GUI tools and recommend that people never use them. You can write simple queries fast and get the feeling that you understand SQL. But you will never write anything but simple queries -- and you will do a bad job with a lot of them, too.

Just for kicks, from one of your databases take two totally unrelated tables that both have a date column in them but no direct link between the two. Try to write a query to pair a value in one table with a value in the other table based on a common date -- something like this:

SELECT D1.duty_date, guard_name, missing_cookies
FROM DutyRoster AS D1, Thefts AS T1
WHERE D1.duty_date = T1.theft_date
ORDER BY D1.duty_date;

What you will get instead is a long chain of tables joined together via primary- and foreign-key relationships -- what we would now call a path, having had graph theory lessons.

Puzzle

Frank Langel, an Information Technology student at the University of Bamberg, asked about a problem he was having with a report query. The situation involves three tables. Tables A, B, and C all have a field called amount, and we want to calculate (A.amount -- B.amount -- C.amount).

But the gimmick is that not all rows in A have a corresponding row in table B or in table C. When a row does match, there is a one-to-many relationship between A and B, and a one-to-many relationship between A and C.

CREATE TABLE A
(keycol INTEGER NOT NULL PRIMARY KEY,
amount DECIMAL (5,2));
INSERT INTO A VALUES (1, 2.50);
INSERT INTO A VALUES (2, 3.50);
INSERT INTO A VALUES (3, 4.50);
INSERT INTO A VALUES (4, 5.50);
INSERT INTO A VALUES (5, 6.50);

CREATE TABLE B
(keycol INTEGER NOT NULL,
keycolb INTEGER NOT NULL,
amount DECIMAL (5,2),
PRIMARY KEY (keycol, keycolb));
INSERT INTO B VALUES (1, 1, 2.00);
INSERT INTO B VALUES (1, 2, 2.00);

CREATE TABLE C
(keycol INTEGER NOT NULL,
keycolb INTEGER NOT NULL,
amount DECIMAL (5,2),
PRIMARY KEY (keycol, keycolb));
INSERT INTO C VALUES (5, 1, 2.00);
INSERT INTO C VALUES (5, 2, 2.00);
Langel's first step was to test left outer joins between A and B, using the isnull( ) operator to convert nulls into zeros. Both of these queries worked fine in his particular SQL.

SELECT A1.amount - ISNULL (SUM (B.amount), 0) AS
total
FROM A AS A1 
LEFT OUTER JOIN 
B ON A1.keycol = B.keycol
GROUP BY A1.amount;

Results
total
======
-1.50
3.50
4.50
5.50
6.50

SELECT A2.amount - ISNULL (SUM (C.amount), 0) AS 
total
FROM A AS A2 
LEFT OUTER JOIN 
C ON A2.keycol = C.keycol
GROUP BY A2.amount;

Results
total
======
2.50
3.50
4.50
5.50
2.50

His problem is that when he executes the two outer joins separately, he always gets five rows (because table A consists of five rows), but when he did the combined query:

SELECT A1.keycol, A1.amount 
- ISNULL (SUM (B.amount), 0.00) 
- ISNULL (SUM (C.amount), 0.00), .... 
FROM A AS A1 
LEFT OUTER JOIN B ON A1.keycol = B.keycol, 
A AS A2
LEFT OUTER JOIN C ON A2.keycol = C.keycol 
WHERE A1.keycol = A2.keycol 
GROUP BY A1.keycol;
the query did not work. Why? Can you come up with a query that does work? (See Puzzle Answer)


Joe Celko is an Atlanta-based guru with Northern Lights Software Ltd. and a member of the ANSI X3H2 Database Standards Committee. He is also the author of two books on SQL: SQL For Smarties (Morgan-Kaufmann, 1995) and Instant SQL Programming (Wrox Press, 1995). You can contact Joe via CompuServe at 71062,1056 or via email at 71062.1056@compuserve.com.


Figure 1.


--The K(3,3) graph for the utility puzzle. The puzzle works on a torus (doughnut) or a sphere, but not on a plane.

Figure 2.


--The Planar Graph theorem, which is due to Kasimir Kuratowski, says that a graph is planar if and only if it has no subgraph in it that can be mapped to the two graphs named K(5) and K(3,3). The K(3,3) graph is shown in Figure 1; the K(5) graph looks like a pentagram.


ANSWER

The first problem is that the group by syntax is wrong and will not work. A1.keycol is the only grouping column; A1.amount does not appear as either an aggregate or a grouping column.

We want to total the (A+B+C) amounts, even when B and C are missing. If we clean out the group by clause, we get:

SELECT A1.keycol, (A1.amount 
- COALESCE (B.amount, 0.00) 
- COALESCE (C.amount, 0.00)) AS total 
FROM A AS A1 
LEFT OUTER JOIN B ON A1.keycol = B.keycol, 
A AS A2
LEFT OUTER JOIN C ON A2.keycol = C.keycol 
WHERE A1.keycol = A2.keycol;
Results
keycoltotal
=================
1.50
1.50
2 3.50
34.50
45.50
54.50
54.50

which gives us results with all five rows using this data, as Langel wanted, but this not right yet because it cannot handle the situation where more than one row in B and more than one row in C match the A row. Notice the multiple rows with keycol equal to 1 and 5. The final query looks like this:

SELECT A1.keycol, (A1.amount
- SUM (ISNULL (B.amount, 0)) 
- SUM (ISNULL (C.amount, 0))) AS total 
FROM A AS A1 
LEFT OUTER JOIN B ON A1.keycol = B.keycol, 
A AS A2 
LEFT OUTER JOIN C ON A2.keycol = C.keycol 
WHERE A1.keycol = A2.keycol 
GROUP BY A1.keycol, A1.amount;
Results
keycol total
=================
1-1.50
2 3.50
34.50
45.50
52.50

Subscribe to DBMS and Internet Systems -- It's free for qualified readers in the United States
May 1997 Table of Contents | Other Contents | Article Index | Search | Site Index | Home

DBMS and Internet Systems (http://www.dbmsmag.com)
Copyright © 1997 Miller Freeman, Inc. ALL RIGHTS RESERVED
Redistribution without permission is prohibited.
Please send questions or comments to dbms@mfi.com
Updated Monday, April 14, 1997