DBMS, December 1996
DBMS Online: SQL for Smarties By Joe Celko

Public and Private Data

Superbase survives, and Microsoft's SQL Server 6.5's SQL-92 conformance glows and woes.

A recent issue of World Patent Information (Vol. 18 No. 2, 1996) contains an article titled "The Information Dissemination Policy of the United States Patent and Trademark Office," written by Wesley Gewehr, Administrator for Information Dissemination at the Patents and Trademarks Office (PTO). The PTO makes money selling information about patents in the form of searches, CD-ROMs, and paper copies. According to the figures in the article, placing this information onto the Internet for free access will cost the PTO about $1,400,000 a year (the revenue from magnetic tape sales plus text searches).

But the PTO claims that fees from all of their information services merely recover costs, and that it makes no net profit. If so, then it should not matter to the PTO if its database is on the Internet. But instead this article claims that making patent information available on the Internet "is not a role that we believe a national patent office can play effectively." This statement is contradicted by the article itself, which does not even mention that the PTO already has an Internet patent abstract database that only lists magnetic tapes and CD-ROMs. Until a few years ago, buying the text files on magnetic tapes cost about $40,000, but the price is now $120,000. This makes no sense. Instead of copying the information from a database to magnetic tapes, then storing, shipping, and billing people thousands of times, the costs of running a text-only Web site would consist of buying one of several Web site database access packages that would let people search the existing data.

In fact, the PTO also keeps refusing to switch from the royalty-protected Dataware search engine used on their CASSIS CD-ROMs to a royalty-free search engine. By using a proprietary search engine, the PTO effectively prohibits low-cost CD-ROMs that could be sold to the public for approximately a dollar a copy.

As an aside, The New York Times (July 5, 1996, page D2) reported that the U.S. Postal Service canceled the 20231 zip code that belongs exclusively to the Patent and Trademarks Office. Loss of the zip code meant that about 50,000 pieces of mail were rejected by the Postal Service systems en route. The Postal Service is attributing the problem to a software glitch.

Data That Is Too Public

While at least one government agency is not giving out enough public information, Lexis/Nexis, a company in the private sector, began selling a database of personal information, the P-TRAK file, with Social Security numbers in the records. The practice received so much bad publicity that the company discontinued the display of Social Security numbers in the P-TRAK file as of June 11, 1996, 11 days after the product was introduced. Rumors take a little while to circulate, so CNN did not do a story on P-TRAK until September 18, 1996 -- weeks after people on the Web had mounted a campaign against it.

Lexis/Nexis markets the P-TRAK file to the legal community and displays the name of the individual, the individual's maiden and assumed names, current as well as up to two previous addresses, month and year of birth, and telephone number. That is the only information displayed in the P-TRAK file.

Contrary to messages posted to some Internet discussion and news groups, the P-TRAK file does not contain any credit histories, bank account information or other personal financial data, or medical histories. The information displayed in the P-TRAK file is an aggregation -- readily available from public information sources such as telephone directories (in print and CD-ROM format) and public records maintained by government agencies.

You can email comments to pr@prod.lexis-nexis.com or contact Lexis/Nexis at 800-543-6862 to have your data removed. The company's statement about the incident is posted on the Web at http://www.lexis-nexis.com/lncc/p-trak/p-trak.html.

Superbase Returns

Superbase is a veteran PC database product that has been resurrected from a near demise. Superbase Developers Inc. is a publicly traded company formed by a group of former Superbase users in the U.S. and the U.K. who bought the product from SPC (the previous owner) and are now marketing and maintaining it. Superbase was the first database system on Windows 2.x and was marketed by Microsoft under the name "Microsoft Database for Windows." Superbase's agent in the U.S. is InterApps Inc. (1414 Pacific Coast Hwy., Suite C, Hermosa Beach, CA 90254; 310-374-4124 or fax 310 374-7971; http://www.superbase.com).

Close to 1,000,000 copies of Superbase have been produced, and over 1000 packages have been developed with it today. The Superbase95 product still uses the original version of Basic but has added ODBC and other modern features so that it can get to foreign data.

The fact that developers wanted to save Superbase says a lot for the product -- most developers have fantasies about smashing the vendors of bad products with baseball bats.

Microsoft SQL Server

On September 4, 1996, Microsoft offered a presentation at its Atlanta office of version 6.5 of SQL Server. I decided to attend and called the office for directions. I immediately encountered a long-winded touch-tone voice maze, which told me to rent a car at the airport and drive up the toll road -- not great advice for a local Atlanta resident who wanted to use MARTA, our public transportation system. So I switched over to the human option and asked about taking a bus to visit the office. The young woman on the line did not know about bus routes, so she asked someone else. She then came back and repeated what she was told; this answer involved a rail station miles from the site and a $10 taxi ride.

I then called MARTA. I took a bus directly from the Medical Center MARTA station to Microsoft's front door. I saved about 30 to 45 minutes of driving time, a car rental, and a toll. Draw your own conclusions about Microsoft telephone support.

Once I arrived successfully at the Microsoft office, I was rewarded with the presentation of SQL Server 6.5. SQL Server has passed the FIPS-127 conformance test and has added many of the SQL-92 features. Frankly, Microsoft has done a good job of moving from the old mutant, diseased Sybase T-SQL language toward ANSI/ISO Standard SQL-92. I am really happy about the progress and the product. Hats off to the SQL Server team for a job well done.

Unfortunately, the Microsoft spokesperson claimed that the product has implemented the full SQL-92 standard. This is just not so. Richard Romley of Smith-Barney sent me test scripts (see the sidebar, "Crashing Microsoft SQL Server") that are valid SQL-92 but blow up in Microsoft SQL Server 6.5 SP (Service Pack) 1. I have the feeling that the political problems at NIST with the FIPS-127 program have resulted in poor compliance testing.

Microsoft's spokesperson also claimed that SQL Server is an "open system," even though it only runs on Windows NT and is becoming more closely attached to it. Even ODBC from Microsoft is not what I would call an open standard. Paradox 7 can connect with ODBC drivers using an ODBC socket, but Microsoft will not let Borland distribute an ODBC driver with Paradox. When you go to get the driver from Microsoft's Web site, it will not run unless you have bought Access, FoxPro, Visual Basic, or C++ from Microsoft.

The spokesperson reasoned that SQL Server is an open system because it can read from and write to other, non-Windows NT systems. However, not only is SQL Server getting tied closer to Windows NT, but the presenters talked about integrating Microsoft Back Office into Windows NT.

I view such a tightly coupled system as a disaster: It will be harder and harder to add firewalls between the database, operating, networking, and other systems. Take a look at three problem areas Microsoft has right now: Microsoft Word, ActiveX, and Nintendo programming. Let me explain. Right now, nobody in his right mind will accept Microsoft Word files from an unknown outside source over the Internet or email because of the 200+ versions of the Concept virus that can come in with the files. What will happen when the Back Office suite and email system with Word have access at the operating system level to SQL Server databases?

I get a lot press releases from sources that I do not recognize, and I now simply refuse anything but ASCII files. I have already had to do one full clean-up of my previous employer's network because of a Concept virus I brought in. Please do not tell me that you always virus-scan your outgoing email. We both know that is a lie, and even if it isn't, you cannot guarantee that your virus checker is up to date. Microsoft's own public relations department sent out a CD-ROM with two infected files on it this year; does your public relations firm have more equipment, software, and expertise than the production department at Microsoft headquarters? I don't think so.

As an aside, I wish that public relations flacks would think about what they are doing. A Word file is five times larger than the same information in ASCII, so they are making me pay five times the connection time to get their junk mail. Next, I have to move the file over to a machine on the other side of my firewall just to read it. Then, if I want to use any of the material, I have to translate it into ASCII to store and file it back on my original machine.

In hardcopy terms, this process is the equivalent of sending a postage-due press release labeled "Possible Bomb!" written in a language that requires translation into English by a second party. Why are you surprised that your client never gets any mention in my column?

Fred McLain posted a bit of ActiveX code known as the "Internet Exploder" on his Web site in June 1996 to demonstrate a point. The "Exploder" is an ActiveX control that took him less than a day to write, which will shut down Microsoft Explorer under Windows 95. It works by taking advantage of the ActiveX control's ability to access any part of your system (this level of system access is not permitted in Java applets, by the way). Shortly you may find that an employee browsing "alt.sex.barnyard_animals.furniture" can poison the entire system.

Microsoft's third problem is Nintendo programming, a term I use for programming done by kids who grew up on single-user video games and who now use those paradigms in their code. Fundamentally, Microsoft still does not understand enterprise-level systems. Microsoft is full of examples that can be explained by asking the question, "How would little Mario do it?" and seeing what Microsoft did.

The Windows NT group at Microsoft proposed that the graphics drivers be moved into the operating system kernel or ring zero, so that the screens would refresh faster. This positioning will really make little Mario fly, but it is a disaster in a multiuser system. One terminal lockup in a GUI or a game would then shut down everyone, possibly destroying work in progress. File locking and systems management code goes into the kernel because you cannot just put in a another quarter and reset for a replay of a transaction. But little Mario likes pretty pictures better than that nasty old data.

In September, Netscape Communications Corp., O'Reilly & Associates, and a story in PC Week Online (September 6, 1996) by Eamonn Sullivan reported that the Server and Workstation versions of Windows NT 3.51 are identical software. The only difference between them is two registry settings turned on by their respective install programs (for more details, read Nicholas Petreley's "Down to the Wire" column in InfoWorld, September 16, 1996). This report is interesting because Microsoft claims that the two versions of Windows NT are designed differently and are specifically tuned for different tasks.

PC Week Labs tested a utility from O'Reilly that changes the settings. According to PC Week magazine, the utility works. When PC Week Labs ran the utility on a Windows NT Workstation 4.0, NT displayed a dialog box warning that the action was a violation of the license agreement.

Tandem and DataBlade

In September, Tandem Computers and Informix began to integrate the Informix DataBlade application programmer interface (API) with Tandem's ServerWare SQL reliable, massively parallel database software on their Himalaya and Windows NT servers for clustered systems. The systems are available on Unix, Windows NT, and Tandem NonStop Kernel operating systems.

This agreement is a strong endorsement of Informix's extensible DataBlade technology, and it accelerates the growth of de facto "rich media" databases such as time-series analysis, spatial map rendering, multimedia Intranet and Internet applications, video-on-demand, and electronic publishing. The de jure ANSI X3H2 attempts at putting nontraditional data into the SQL3 document are becoming more convoluted and probably will not be implemented by anyone in the face of strong de facto standards and the horrible complexity of the ANSI model.

Puzzle

Yogesh Chacha ran into a problem and sent it to me on CompuServe on September 12, 1996. Users in his shop usually end up sending files to the wrong printer for printout; thus they decided to include a new table in the system that will derive the correct printer for each user at runtime. Their table looked like this:
CREATE TABLE PrinterControl
(usid CHAR(20),
printer_name CHAR(20) NOT NULL,
description CHAR(40) NOT NULL);
The rules of operation are: 1) If the user has an entry in the table, he or she will pick the corresponding printer_name; 2) If the user is not in the table, then he or she is supposed to use one of the printers whose usid is NULL.

Now, consider the following example:

PrinterControl
usid printer_name description
==== ============ ===========
'chacha' 'LPT1' 'First floor's printer'
'lee' 'LPT2' 'Second floor's printer'
'thomas' 'LPT3' 'Third floor's printer'
NULL 'LPT4' 'Common printer for new user'
NULL 'LPT5' 'Common printer for new user'

When "chacha" executes the report, he is entitled to use only LPT1, whereas a user named "celko" is expected to use either LPT4 or LPT5. In the first case, a simple query can pull out one row and it works fine; in the second case, you get two rows and cannot use that result.

Can you come up with a one-query solution? (See the answer)


Crashing Microsoft SQL Server 6.5

Using the Pubs database that comes with SQL Server 6.5 SP (Service Pack) 1 for the examples, the following query works fine:
SELECT X.au_id
FROM (SELECT au_id
FROM Authors
WHERE au_fname LIKE 'Albert') AS X
au_id
----------------
998-72-3567

This is the right answer. But add an outer join of derived table X with anything, and guess what happens?

SELECT X.au_id
FROM (SELECT au_id
FROM Authors
WHERE au_fname LIKE 'Albert') AS X
LEFT OUTER JOIN Authors AS A0
ON X.au_id = A0.au_id;
yields the following results:

au_id
----------------
172-32-1176
213-46-8915
238-95-7766
267-41-2394
274-80-9391
341-22-1782
409-56-7008
427-17-2319
472-27-2349
486-29-1786
527-72-3246
648-92-1872
672-71-3249
712-45-1867
722-51-5454
724-08-9931
724-80-9391
756-30-7391
807-91-6654
846-92-7186
893-72-1158
899-46-2035
998-72-3567

Suddenly "Albert" is all over the place! It gets better. Let's change that last JOIN to an INNER JOIN.

SELECT X.au_id
FROM (SELECT au_id
FROM Authors
WHERE au_fname LIKE 'Albert') AS X
INNER JOIN Authors AS A0
ON X.au_id = A0.au_id;
This statement will break the connection and give you the error message "DB-Library Process Dead -- Connection Broken," even though these three statements are logically the same because the JOINs do nothing. Microsoft claims it will fix all of these problems by SP 2. Because I have to write my column three months in advance, the readers will have to verify this promise for themselves. -- Joe Celko
* InterApps Inc., 1414 Pacific Coast Hwy., Suite C, Hermosa Beach, CA 90254; 310-374-4124 or fax 310 374-7971; http://www.superbase.com.
* Microsoft Corp., One Microsoft Way, Redmond, WA 98052; 800-426-9400, 206-882-8080, or fax 206-936-7329; http://www.microsoft.com.

Joe Celko is an Atlanta-based guru with Northern Lights Software Ltd. and a member of the ANSI X3H2 Database Standards Committee. He is also the author of two books on SQL: SQL For Smarties (Morgan-Kaufmann, 1995) and Instant SQL Programming (Wrox Press, 1995). You can contact Joe via CompuServe at 71062,1056 or via email at 71062.1056@compuserve.com.

Puzzle Answer

I would answer that the problem is in the data. Look at the usid column. The name tells you that it should be unique, but the column definition allows multiple NULLs. The real world poses another problem: You want to balance the printer loads between LPT4 and LPT5, so that one of them is not overused. Ergo, do not write a fancy query; instead, change the table:
CREATE TABLE PrinterControl
(usid_start CHAR(20) NOT NULL,
usid_finish CHAR(20) NOT NULL,
printer_name CHAR(20) NOT NULL,
description CHAR(40) NOT NULL,
PRIMARY KEY (usid_start, usid_finish));
Now, consider the following example:

PrinterControl
usid_start usid_finish printer_name description
========== =========== ============ ===========
'chacha' 'chacha' 'LPT1' 'First floor's printer'
'lee' 'lee' 'LPT2' 'Second floor's printer'
'thomas' 'thomas' 'LPT3' 'Third floor's printer'
'aaa . . . ''mzz . . . ''LPT4' 'Common printer #1 '
'naa . . . ''zzz . . . ''LPT5''Common printer #2'
The query then becomes:

SELECT MIN(printer_name)
FROM PrinterControl
WHERE :my_id BETWEEN usid_start AND usid_finish;

The trick is in the start and finish values, which partition the range of possible strings between "aaa . . . " and "zzz . . . " any way you wish. For example, the "celko" user id qualifies only for LPT4 because it falls alphabetically within that range of strings. A user "norman" is qualified only for LPT5. Careful choice of these ranges will let you distribute the printer loads evenly if you know what the user ids will be like.

I have also made an assumption that the common printers will always have higher LPT numbers. When "chacha" goes to this table, he will get a result set of (LPT1, LPT4) and then pick the minimum value, LPT1, from it. A smart optimizer should be able to use the PRIMARY KEY index to speed up the query. -- Joe Celko


Subscribe to DBMS and Internet Systems -- It's free for qualified readers in the United States
December 1996 Table of Contents | Other Contents | Article Index | Search | Site Index | Home

DBMS and Internet Systems (http://www.dbmsmag.com)
Copyright © 1996 Miller Freeman, Inc. ALL RIGHTS RESERVED
Redistribution without permission is prohibited.
Please send questions or comments to dbms@mfi.com
Updated Monday, November 18, 1996