DBMS
DBMS, September 1996
SQL for Smarties By Joe Celko

Databases on the Net

Joe explores SQL database resources on the World Wide Web.

Everybody but me has a Web page of their own. I am one of the pitiful home pageless, the people you see on the shoulders of the information superhighway. (Editor's note: the DBMS Web site has a Joe Celko page at http://www.dbmsmag.com/celko.html.) William (Bill) Kent of Hewlett-Packard Laboratories has put a collection of stuff he has written over the past few decades on the Web at http://www.hpl.hp.com/personal/Bill_Kent.

There is published and unpublished material on a variety of topics, most of it technical and some not. He was the founder of the X3H7 object standards group, but he also has a lot to say about data and databases. His book Data and Reality (Elsevier North-Holland, New York, 1978, ISBN 0-444-85187-9) has been one of my favorites for years, and he was on the X3H2 Committee, so I am giving this home page a shameless plug. Graeme Birchall set up shop at http://ourworld.compuserve. com/homepages/Graeme_Birchall. This is a place where you can download his DB2/V2 SQL Cookbook. He also provides a handy set of practical programming tips and tricks for anyone using DB2/V2 and others who might benefit from his SQL notes. If you need SQL help, you can go to http://www.inquiry.com/ techtips/thesqlpro. You can look up articles from a wide range of magazines - including DBMS - and even post your own questions. Over the years, I have given out a diskette with short SQL puzzles to the editors of local user group newsletters. These puzzles are now on the Web at http://www.inquiry.com/ techtips/thesqlpro/ (search the Tech Tips database for "Celko"), so I can save the postage I would have spent on a diskette.

Databases and Business on the Web

June was a busy month for databases and the Web. Open Environment Corp. was acquired by Borland for its Mambo product, a Web development kit that can connect you to Sybase, Informix, Oracle, and DB2 databases on a representative range of Unix platforms. Sybase made a partnership deal with Netscape for browsers and another deal with Verity for a text search engine. Centura Software Corp. announced plans for its Centura Web Developer, which will create Java applets and HTML from database queries. By the time you read this, there will be more database vendors and shrink-wrap shops with similar products.

But even if I get a home page and put a database on the Web, I will still be behind the curve because I am not running a business in Cyberspace. Providing access to a SQL database via the Internet is really not enough by itself to set up a virtual enterprise.

Back in May 1996, ExperTelligence Inc. announced its latest Web server, WebBase Pro, which added full text search from Verity, a Web data interchange agent, Agent3W, and a database-driven email merge engine, E-Merge. WebBase Pro has shopping baskets to track selections made on multiple Web pages, threaded forums, and remote management so you can set up a virtual shop with a day of training. At the time it was released, WebBase Pro was the only Web server with SQL database integration, full text search, and HTML document retrieval. You could go to the database, pull up a set of names, and then send customized mail-merge letters based on criteria taken from a database query.

Agent3W lets WebBase Pro conduct Web data interchange ("WDI") with multiple Web sites via the standard HTTP get and post commands. A company's Web site could show the order status to a customer and also embed the Federal Express shipping information in the message.

Empress Software came out with its DataWeb product in June 1996. This is a single CD-ROM with a subset of the Empress SQL database, an ODBC toolkit, an HTTP server, and an HTML toolkit. You must provide your own browser, such as Netscape or Mosaic. The product has several big advantages: It is cheap, small, complete, and fast. Empress is a Unix-based system, so you might need a copy of Linux to run it. What makes Empress the best choice for Web sites is that it is designed for BLOBs. Much as I like to read text, fancy graphics are what the Web is all about. Empress likes to quote a government benchmark in which they were 10 times faster than database "O" and over 20 times faster than database "S" in handling bulk data.

The Return of Old Friends

The first database I worked on was Image/3000 (later upgraded and renamed Turbo/Image) on an HP/3000 computer. This product was supposed to be a version of Total, another early mainframe database. Technically, the model was a networked database, but it really had only two layers. There were hashed Master records that linked to Detail Set record sets. The Detail Set records were chained together so you could navigate back and forth, sort the records, and generally make the Detail Set behave like a miniature sequential file. A Detail Set could have more than one Master and a Master could have several Detail Sets.

Until you have tried to convert one of these old database systems to SQL, you really cannot appreciate what a good idea the separation of logical and physical levels is.

Quantum Software sent me a pre-beta version of an internally developed tool that converts Image database schemas to one of several popular SQL product Data Declaration Languages (DDLs) and writes a program in HP Fortran to migrate the old data over. The conversion cannot ever be 100 percent complete, however, because the old Image code is just too weird. We used to concatenate string columns together to get one key column for the hashing algorithm that connects the Master and Detail Sets. Does anyone know if there are similar products for converting IDMS, IMS, and other hierarchical and network model databases? Any opinions as to how well they work?

Puzzle

Jack Wells (CompuServe 100442,3164) submitted this perplexing SQL problem in June 1996. His situation is pretty typical for SQL programmers who work with 3GL people. The programmers are writing a report on the employees, and they want information about each employee's current and previous salary status. The report will show the date of their promotion and the salary amount.

This is pretty easy if you can put each salary in one row in the result set and let the host program format it. In fact, that is the first programming problem for the reader.

Oh, I forgot to mention that the programmers are a bunch of lazy bums and want to have both the current and previous salary information on one row for each employee. This will let them write a very simple cursor statement and print out the report without any real work on their part.

Jack spoke with Fabian Pascal, author of Understanding Relational Databases (John Wiley, 1993, ISBN 0-471-58538-6) and SQL and Relational Basics (M&T Books, 1989, ISBN 1-55851-063-X), the week he was working on this problem, and Mr. Pascal replied that this query could not be done. He said, "In a truly relational language it could be done, but since SQL is not relational it isn't possible, not even with SQL-92." Sounds like a challenge to me!

Oh, I forgot to mention an additional constraint on the query: Jack is working in Oracle. This product is still not up to SQL-92 standards (that is, there are no proper outer joins, no general scalar subexpressions, and so on), so your query must run under the old SQL-86 or SQL-89 rules.

Assume that you have this test data:

CREATE TABLE Salaries
(emp CHAR(10) NOT NULL,
sal_date DATE NOT NULL,
sal_amt DECIMAL (8,2) NOT NULL,
PRIMARY KEY (emp, sal_date));

INSERT INTO Salaries VALUES ('Tom', '1996-06-20', 500.00);
INSERT INTO Salaries VALUES ('Tom', '1996-08-20', 700.00);
INSERT INTO Salaries VALUES ('Tom', '1996-10-20', 800.00);
INSERT INTO Salaries VALUES ('Tom', '1996-12-20', 900.00);
INSERT INTO Salaries VALUES ('Dick', '1996-06-20', 500.00);
INSERT INTO Salaries VALUES ('Harry', '1996-07-20', 500.00);
INSERT INTO Salaries VALUES ('Harry', '1996-09-20', 700.00);

Tom has had two promotions, Dick is a new hire, and Harry has had one promotion. (See Puzzle Answer.)


Joe Celko is a member of the ANSI X3H2 Database Standards Committee, a widely published author, and a consultant with OSoft Development Corp. in Atlanta. Joe is also a frequent contributor to the DBMS Forum on CompuServe. You can email him at 71062.1056@compuserve.com.
* Borland International Inc., 100 Borland Way, Scotts Valley, CA 95066; 800-245-7367, 408-431-1000, or fax 408-431-4122; http://www. borland.com.
* Empress Software, 6401 Golden Triangle Dr.; Greenbelt, MD 20770; 301-220-1919 or fax 301-220-1997; http://www.empress.com; email: sales@empress.com.
* ExperTelligence Inc., 203 Chapala St., Santa Barbara, CA 93101; 805-962-2558 or fax 805-962-5188; http://www.expertelligence.com
* Open Environment Corp., 25 Travis St., Boston, MA 02134; 800-562-5969, 617-562-0900, or fax 617-562-0038; http://www.oec.com; email: mambo@openenv.com
* Quantum Software, 10606 Mill Cir., Houston, TX 77070; 713-894-2145 or fax 713-894-5954; http://www.3k.com/quantum.html; email: quantum@nol.net.

PUZZLE ANSWER

First, let's do the easy problem. The answer is to use the query I call a generalized extrema or "top (n)" function and put it in a view, like this:

CREATE VIEW Salaries1 (emp, sal_date, sal_amt)
AS SELECT S0.emp, S0.sal_date, MAX(S0.sal_amt)
FROM Salaries AS S0, Salaries AS S1
WHERE S0.emp = S1.emp
AND S0.sal_date <= S1.sal_date
GROUP BY S0.emp, S0.sal_date
HAVING COUNT(*) <= 2;

The S1 copy of the Salaries table determines the boundary of the subset of two or fewer salary changes for each employee. The max( ) function is a trick to get the salary amount column in the results. This gives you one row for each of the first two salary changes for each employee. If the programmers were not so lazy, you could pass this table to them and let them format it for the report.

The real problem is harder. One way to do this, within the limits of SQL-89, is to break the problem into two cases: a) employees with only one salary action and b) employees with two or more salary actions. We know that every employee falls into one and only one of those cases. One solution is:

SELECT S0.emp, S0.sal_date, S0.sal_amt,
S1.sal_date, S1.sal_amt
FROM Salaries AS S0, Salaries AS S1
WHERE S0.emp = S1.emp
AND S0.sal_date =
(SELECT MAX(S2.sal_date)
FROM Salaries AS S2
WHERE S0.emp = S2.emp)
AND S1.sal_date =
(SELECT MAX(S3.sal_date)
FROM Salaries AS S3
WHERE S0.emp = S3.emp
AND S3.sal_date < S0.sal_date)
UNION ALL
SELECT S4.emp, MAX(S4.sal_date), MAX(S4.sal_amt),
NULL, NULL
FROM Salaries AS S4
GROUP BY S4.emp
HAVING COUNT(*) = 1;
If you run this, you should see these results:

emp sal_date sal_amt sal_date sal_amt
Tom 1996-12-20 900 1996-10-20 800
Harry 1996-09-20 700 1996-07-20 500
Dick 1996-06-20 500 (NULL) (NULL)

DB2 programmers will recognize this as a version of the outer join done without a SQL-92 standard outer join operator. The first select statement is the hardest. It is a self-join on the Salaries table, with copy S0 being the source for the most recent salary information and copy S1 the source for the next most recent information. The second select statement is simply a grouped query that locates the employees with one row. Because the two result sets are disjointed, you can use the union all instead of a union operator to save an extra sorting operation.

All of these features were in the SQL-86 standard, except the use of explicit nulls in the select list; if the nulls present a problem, they can be replaced by a blank string, zeros, or another missing data token. The only bad news is that many older products do not allow a union or union all in a view.

I invite readers to submit their own solutions to this problem, in SQL-92 or in SQL-89. The best answers will win a book of my choice as a prize.

- Joe Celko


Table of Contents - September 1996 | Home Page
Copyright © 1996 Miller Freeman, Inc. ALL RIGHTS RESERVED
Redistribution without permission is prohibited.
Please send questions or comments to mfrank@mfi.com
Updated Wednesday, September 18, 1996