
Everybody but me has a Web page of their own. I am one of the pitiful home pageless, the people you see on the shoulders of the information superhighway. (Editor's note: the DBMS Web site has a Joe Celko page at http://www.dbmsmag.com/celko.html.) William (Bill) Kent of Hewlett-Packard Laboratories has put a collection of stuff he has written over the past few decades on the Web at http://www.hpl.hp.com/personal/Bill_Kent.
There is published and unpublished material on a variety of topics, most of it technical and some not. He was the founder of the X3H7 object standards group, but he also has a lot to say about data and databases. His book Data and Reality (Elsevier North-Holland, New York, 1978, ISBN 0-444-85187-9) has been one of my favorites for years, and he was on the X3H2 Committee, so I am giving this home page a shameless plug. Graeme Birchall set up shop at http://ourworld.compuserve. com/homepages/Graeme_Birchall. This is a place where you can download his DB2/V2 SQL Cookbook. He also provides a handy set of practical programming tips and tricks for anyone using DB2/V2 and others who might benefit from his SQL notes. If you need SQL help, you can go to http://www.inquiry.com/ techtips/thesqlpro. You can look up articles from a wide range of magazines - including DBMS - and even post your own questions. Over the years, I have given out a diskette with short SQL puzzles to the editors of local user group newsletters. These puzzles are now on the Web at http://www.inquiry.com/ techtips/thesqlpro/ (search the Tech Tips database for "Celko"), so I can save the postage I would have spent on a diskette.
Databases and Business on the Web
June was a busy month for databases and the Web. Open Environment Corp. was acquired by Borland for its Mambo product, a Web development kit that can connect you to Sybase, Informix, Oracle, and DB2 databases on a representative range of Unix platforms. Sybase made a partnership deal with Netscape for browsers and another deal with Verity for a text search engine. Centura Software Corp. announced plans for its Centura Web Developer, which will create Java applets and HTML from database queries. By the time you read this, there will be more database vendors and shrink-wrap shops with similar products.
But even if I get a home page and put a database on the Web, I will still be behind the curve because I am not running a business in Cyberspace. Providing access to a SQL database via the Internet is really not enough by itself to set up a virtual enterprise.
Back in May 1996, ExperTelligence Inc. announced its latest Web server, WebBase Pro, which added full text search from Verity, a Web data interchange agent, Agent3W, and a database-driven email merge engine, E-Merge. WebBase Pro has shopping baskets to track selections made on multiple Web pages, threaded forums, and remote management so you can set up a virtual shop with a day of training. At the time it was released, WebBase Pro was the only Web server with SQL database integration, full text search, and HTML document retrieval. You could go to the database, pull up a set of names, and then send customized mail-merge letters based on criteria taken from a database query.
Agent3W lets WebBase Pro conduct Web data interchange ("WDI") with multiple Web sites via the standard HTTP get and post commands. A company's Web site could show the order status to a customer and also embed the Federal Express shipping information in the message.
Empress Software came out with its DataWeb product in June 1996. This is a single CD-ROM with a subset of the Empress SQL database, an ODBC toolkit, an HTTP server, and an HTML toolkit. You must provide your own browser, such as Netscape or Mosaic. The product has several big advantages: It is cheap, small, complete, and fast. Empress is a Unix-based system, so you might need a copy of Linux to run it. What makes Empress the best choice for Web sites is that it is designed for BLOBs. Much as I like to read text, fancy graphics are what the Web is all about. Empress likes to quote a government benchmark in which they were 10 times faster than database "O" and over 20 times faster than database "S" in handling bulk data.
The Return of Old Friends
The first database I worked on was Image/3000 (later upgraded and renamed Turbo/Image) on an HP/3000 computer. This product was supposed to be a version of Total, another early mainframe database. Technically, the model was a networked database, but it really had only two layers. There were hashed Master records that linked to Detail Set record sets. The Detail Set records were chained together so you could navigate back and forth, sort the records, and generally make the Detail Set behave like a miniature sequential file. A Detail Set could have more than one Master and a Master could have several Detail Sets.
Until you have tried to convert one of these old database systems to SQL, you really cannot appreciate what a good idea the separation of logical and physical levels is.
Quantum Software sent me a pre-beta version of an internally developed tool that converts Image database schemas to one of several popular SQL product Data Declaration Languages (DDLs) and writes a program in HP Fortran to migrate the old data over. The conversion cannot ever be 100 percent complete, however, because the old Image code is just too weird. We used to concatenate string columns together to get one key column for the hashing algorithm that connects the Master and Detail Sets. Does anyone know if there are similar products for converting IDMS, IMS, and other hierarchical and network model databases? Any opinions as to how well they work?
Puzzle
Jack Wells (CompuServe 100442,3164) submitted this perplexing SQL problem in June 1996. His situation is pretty typical for SQL programmers who work with 3GL people. The programmers are writing a report on the employees, and they want information about each employee's current and previous salary status. The report will show the date of their promotion and the salary amount.
This is pretty easy if you can put each salary in one row in the result set and let the host program format it. In fact, that is the first programming problem for the reader.
Oh, I forgot to mention that the programmers are a bunch of lazy bums and want to have both the current and previous salary information on one row for each employee. This will let them write a very simple cursor statement and print out the report without any real work on their part.
Jack spoke with Fabian Pascal, author of Understanding Relational Databases (John Wiley, 1993, ISBN 0-471-58538-6) and SQL and Relational Basics (M&T Books, 1989, ISBN 1-55851-063-X), the week he was working on this problem, and Mr. Pascal replied that this query could not be done. He said, "In a truly relational language it could be done, but since SQL is not relational it isn't possible, not even with SQL-92." Sounds like a challenge to me!
Oh, I forgot to mention an additional constraint on the query: Jack is working in Oracle. This product is still not up to SQL-92 standards (that is, there are no proper outer joins, no general scalar subexpressions, and so on), so your query must run under the old SQL-86 or SQL-89 rules.
Assume that you have this test data:
CREATE TABLE Salaries
(emp CHAR(10) NOT NULL,
sal_date DATE NOT NULL,
sal_amt DECIMAL (8,2) NOT NULL,
PRIMARY KEY (emp, sal_date));
INSERT INTO Salaries VALUES ('Tom', '1996-06-20',
500.00);
INSERT INTO Salaries VALUES ('Tom', '1996-08-20',
700.00);
INSERT INTO Salaries VALUES ('Tom', '1996-10-20',
800.00);
INSERT INTO Salaries VALUES ('Tom', '1996-12-20',
900.00);
INSERT INTO Salaries VALUES ('Dick', '1996-06-20',
500.00);
INSERT INTO Salaries VALUES ('Harry', '1996-07-20',
500.00);
INSERT INTO Salaries VALUES ('Harry', '1996-09-20',
700.00);
Tom has had two promotions, Dick is a new hire, and Harry has had one promotion. (See Puzzle Answer.)
| emp | sal_date | sal_amt | sal_date | sal_amt |
| Tom | 1996-12-20 | 900 | 1996-10-20 | 800 |
| Harry | 1996-09-20 | 700 | 1996-07-20 | 500 |
| Dick | 1996-06-20 | 500 | (NULL) | (NULL) |
DB2 programmers will recognize this as a version of the outer join done without a SQL-92 standard outer join operator. The first select statement is the hardest. It is a self-join on the Salaries table, with copy S0 being the source for the most recent salary information and copy S1 the source for the next most recent information. The second select statement is simply a grouped query that locates the employees with one row. Because the two result sets are disjointed, you can use the union all instead of a union operator to save an extra sorting operation.
All of these features were in the SQL-86 standard, except the use of explicit nulls in the select list; if the nulls present a problem, they can be replaced by a blank string, zeros, or another missing data token. The only bad news is that many older products do not allow a union or union all in a view.
I invite readers to submit their own solutions to this problem, in SQL-92 or in SQL-89. The best answers will win a book of my choice as a prize.
- Joe Celko