DBMS, February 1996
DBMS Online: SQL for Smarties By Joe Celko

Old Friends and New Problems

Genexus revisited, the nucleus server, the year 2000, and a SQL puzzle.

November was the month that I was out of town only one day to consult with a firm that is developing a data warehousing analysis product. I did not go to Comdex this year because I just wanted to spend some time at home with my wife. Ha! She took a week off to go to a women-only, vegetarian commune in St. Augustine, Florida, and luxuriate on the beach. However, old friends came to visit me instead.

Genexus Again

Sergio Prusky of Genexus dropped by my office in November to show me the latest version of Genexus' product, which I mentioned in my March 1995 column (see "Database in the Windy City," page 20). This is a development tool from South America that worked only with Xbase products when I first saw it, but has been extended to generate code for AS/400, Oracle, DB2/2, DB2/6000, and Microsoft SQL Server, with more offerings on the way. You can use the internal problem definition language to drive a generator for any target language.

The system is divided into three major parts: a front-end screen builder, a prototype system, and a production system. The user builds screens and sends them to a Knowledge Manager for analysis. The Knowledge Manager prompts for any extra information it needs and then creates a prototype database in third-normal form (3NF). The Knowledge Manager is not just a database builder and checker; there is also a scripting language for business rules based on an event-action model that is tied to each screen.

Developers can then use the prototype database for more experiments or to create the production system. The system is able to create a lot of documentation automatically. The entire product has a nice, clean, natural feel to it. Sergio and I built a simple order-entry system, and then added a rule that would give a 10 percent discount (shown as a line item and taken from inventory) for orders totaling more than $100. Yes, I know this is a bad design, but I wanted to see a rule triggered by an aggregate, and see whether deleting an item (which would bring the total under $100) would cause the discount to be backed out. Genexus had no problems constructing the right procedures, and the problem took about five minutes of actual programming time.

In particular, I liked the way the company handled its report writer. I am a known "GUI hater" and have favored language-based report writers because they give better control over the data. Plus, language-based report writers generate documentation that a pure GUI report writer cannot provide. However, GUI-based report writers let you see what you are getting and make the layout work much, much easier. Genexus offers a cross between the two approaches, and combines graphics with a control language script.

Nucleus Again

I was going through my usual 42 pounds of junk mail recently and saw the name of one of my favorite products for the first time in years -- the Nucleus Server. This came from a small start-up company in Santa Monica, Calif., with which I consulted in the mid-1980s.

The original company had many of the usual start-up problems, but I think that it died with its founder Ted Glazer. Ted was one of the most interesting people I had ever met in the trade. Blind since age seven, he was one of the designers of the Burroughs 5000 computer family, invented a light pen, and helped start the Computer Science Department at Case Western University. Lockheed was one of the company's investors, and it sold the rights to the Nucleus Server to Sands Technology Systems in Canada as part of a downsizing move in late 1995.

The Nucleus Server provides proprietary bitmapped indexing technology in both a hardware and software version. This technology has several unique aspects. The indexes are built on domains rather than on tables; that is, if I want to index telephone numbers, Nucleus builds one bit index vector for all the columns in all the tables that hold telephone numbers. Each telephone number value will appear only once in the domain, but each occurrence will be represented as a single bit in a bit space dimensioned on table, row, and column. This approach directly implements the SQL-92 standard CREATE DOMAIN statement as a database structure. The bit vector space is stored in a compressed format, and the engine can manipulate the bit vectors in that compressed format, decompressing them only when returning the results.

The result is that there is an automatic index on every column. At the same time, the database shrinks in size. This does not happen with conventional indexing approaches. Plus, the Nucleus Server will actually perform better as the number of domains used to represent a fact increases. For example, if the telephone numbers are broken down into domains of area codes, exchanges, and local numbers, the total size of the database decreases and access improves.

You can also rotate the bit space, so there is little difference between a table, row, or column. Thus, you can ask domain-based questions such as, "What things in the database are red?" as easily as you could if you were doing a conventional SQL query.

Year 2000 Problems

In my October 1995 column ("SQL in Sleepytown," page 20), I mentioned a book by Jerome T. and Marilyn J. Murray on the programming problems about to be caused by the year 2000, and offered a reward to any reader who could get me a copy of the book. Well, forget it, you are too late. The authors emailed me a note that a new edition will be out about by the time you read this column. The new book will be titled The Year 2000 Computing Crisis: A Millennium Date Conversion Plan, and will include a disk with the code of the authors' solution to the problem. (The book is published by Particle Books and is available for $49.95 in softcover [ISBN: 0-07-912945-5].)

Quite a few companies are offering services and products to fix the Year 2000 problem. I have a list of more than 20 different companies involved in this area, and it is still growing. This is not just a Cobol mainframe shop problem for legacy databases. Take a look at what your PC BIOS chips do when the date rolls over to the next millennium: You will be surprised by how many of them reset to 1900, 1980, or 1985, instead of 2000.

Government Databases and Privacy

Because November 7 was election day, I also got to thinking about government databases. In 1995, privacy advocates had some victories and some setbacks in trying to keep Big Brother out of all of our lives. The IRS lost the ability to perform random audits, due to lack of funding from the Republican-based congress. These were the "monster-from-hell" audits in which people were picked at random and asked to produce every supporting document they ever had -- from marriage licenses to birth certificates -- for every tax return they had ever filed.

Officially, the goal of these audits was to establish baseline statistical data for creating algorithms to flag possible tax fraud and errors. The fact is that many innocent people suffered crushing financial problems because of the legal expenses involved with such audits. Furthermore, the information necessary for such algorithms can be obtained with sampling methods that do not involve individuals.

I suspect that the IRS is not actively opposing the flat-tax proposals because its systems cannot handle the year 2000 problem. Developing a completely new tax system would be easier and cheaper than rewriting the existing software. A flat-tax system might also help reduce the casualty rate among IRS agents, which is four times that of any other federal agency, including the FBI, DEA, and other law-enforcement units.

In November 1995, the FBI proposed establishing a national wire-tapping system that has the capacity to monitor one out of every 1000 telephone lines (one out of every 100 phone calls) simultaneously in certain areas of the country. The annual average number of court-authorized wiretaps has been under 850 per year -- or less than one tap per 174,000 telephone lines (information from the Federal Register, October 16, 1995, as reported in the November 2, 1995 issue of The New York Times and the November 3, 1995 issue of the San Francisco Chronicle). This is the first step in implementing the Digital Telephony Act signed by President Clinton in 1994.

San Diego County in California started using a fingerprint and photo database system from Unisys to catch welfare fraud last year. (See "Fingerprint Identification System Dusts Benefit Fraud," by Judith Silver, Government Systems News, October 16, 1995, page 12.) This issue has gone to court and is not considered a civil rights violation because the system is not integrated with the criminal justice system. Welfare fraud, apart from Food Stamp and Social Security fraud, is estimated at $2.5 billion per year. The system's first six months of operation in San Diego County saved $186,000 in discontinued or denied payments. A similar system in New York dropped welfare fraud by 15 percent in its first year of operation.

These systems are not just for government, however. Several banks in Britain are looking at using a combination of a smart card and a small fingerprint recognition system terminal for credit card transactions. Going one step further, an American firm has developed a thermal video camera that recognizes a human face at a distance by the pattern of blood vessels. The firm is trying to sell the product to banks for ATM machines. The British system has the disadvantage of working with stolen cards and severed fingers, while the American system would require taking a live hostage.

The Clipper chip/Skipjack controversy is back again. In August 1995, the Clinton administration made a proposal to allow the export of 64-bit key encryption products, instead of the current 40-bit key limit. The 40-bit key is pretty weak by today's cryptography standards, and the 64-bit key is not much better, so this is not a big deal. Most cryptographers do not think that you have real security until you have a 128-bit key. What the government is pushing is another version of the escrow key scheme that was proposed with the original Clipper chip. The original escrow system would have placed the law enforcement encryption keys with government agencies, whereas the newest proposal allows trusted, government-certified private agencies to hold the keys. The original escrow system did not provide access to the Skipjack algorithm used by the Clipper chip; the newest proposal would publish and certify the algorithms.

International firms are buying their cryptography products from foreigners who do not have these limitations. Remember, the U.S. isn't the only country with a post-Cold War defense industry that needs to find new civilian jobs. In November 1995, Sun Computers hired an entire team of Russian computer scientists who had worked on the Russian Supercomputer project to develop their cryptography technology.

Software Patents

While I'm on the subject of government actions, you might be interested in a software patent (#5461708) granted to Borland International on October 24, 1995 for "systems and methods for automated graphing of spreadsheet information." Philippe Kahn is listed as the inventor. The application claimed that the software provides:

1. In a computer system, a method for assisting a user with graphing information, the method comprising the steps of: (a) providing a plurality of different graph types for presenting data sets of varying layout and information content; (b) receiving as input one of said data sets, said data set including information desired by the user to be plotted as a graph; (c) examining said data set by said computer system, for determining layout and information content for said data set; (d) automatically selecting a graph type by said computer system, based on said determined layout and information content for said data set; (e) plotting said graph with the information of said data set, said graph having said selected graph type.

Gee, has anyone seen this technique used in spreadsheet software before 1995? I guess the patent office didn't know that this has been common in all PC-based spreadsheets since the early 1980s. The spreadsheets got the idea from mainframe-based decision-support systems and statistical packages such as SPSS and SAS. With all this prior art, the patent never should have been issued.

Puzzle

Leonard C. Medal posted this problem on CompuServe in November. Patients make legal claims against a medical institution, which are recorded in the following Claims table:

Claims
claimID patient
===============
10 'Smith'
20 'Jones'
30 'Brown'

Each claim has one or more defendants, usually physicians, recorded in the Claimdefendant table:

Claimdefendant
claimID defendant
====================
10 'Johnson'
10 'Meyer'
10 'Dow'
20 'Baker'
20 'Meyer'
30 'Johnson'

Each defendant associated with a claim has a history of events in which changes in the status of the defendant on a given claim are recorded:

Events
claimID defendant status changedate
=========================================
10 'Johnson' 'AP' '1994-01-01'
10 'Johnson' 'OR' '1994-02-01'
10 'Johnson' 'SF' '1994-03-01'
10 'Johnson' 'CL' '1994-04-01'
10 'Meyer' 'AP' '1994-01-01'
10 'Meyer' 'OR' '1994-02-01'
10 'Meyer' 'SF' '1994-03-01'
10 'Dow' 'AP' '1994-01-01'
10 'Dow' 'OR' '1994-02-01'
20 'Meyer' 'AP' '1994-01-01'
20 'Meyer' 'OR' '1994-02-01'
20 'Baker' 'AP' '1994-01-01'
30 'Johnson' 'AP' '1994-01-01'

Changes in status for each defendant occur in a known sequence, determined by law, as shown in the Status table:

Status
status statusdesc seq
========================================
'AP' 'Awaiting review panel' 1
'OR' 'Panel opinion rendered' 2
'SF' 'Suit filed' 3
'CL' 'Closed' 4

The problem is to find the status of each claim. (See Answer).


Joe Celko is an Atlanta-based guru with Northern Lights Software Ltd. and a member of the ANSI X3H2 Database Standards Committee. He is also the author of two books on SQL: SQL For Smarties (Morgan-Kaufmann, 1995) and Instant SQL Programming (Wrox Press, 1995). You can contact Joe via CompuServe at 71062,1056 or via email at 71062.1056@compuserve.com.


* Genexus, 534 South Kansas Ave., Ste. 710, Topeka, KS 66603; 800-440-6325 or fax 913-357-8446.
* Sands Technology Systems, 4141 Sherbrooke St. W., Ste. 410, Westmount, Quebec, Canada H3Z 1B8; 514-939-3477 or fax 514-939-2042.

Puzzle Answer

The status of a defendant (with regard to a given claim) is the defendant's latest status, shich is the status with the highest sequence number. For certain legal reasons, events ordered by date do not always correctly respond to events ordered by sequence number.

The status of a claim is the status of the defendant having the lowest status of all the defendants involved in the claim. Thus, the claim status is a minimum of the maximums. For this sample data, the answer would be:

claimIDpatientstatus
===========================
10'Smith''OR'
20'Jones''AP'
30'Brown''AP'

Mr. Medal's answer was a single SQL query that directly translated the description into code:

SELECT C1.claimid, C1.patient, T2. status
FROM Claims AS C1, Status AS T2
WHERE T2.seq
IN (SELECT MIN (S2.seq)
FROM Status AS S2
WHERE S2.seq
IN (SELECT MAX (S3.seq)
FROM Events AS E1, Status AS S3
WHERE E1. status = S3. status
AND E1.claimid = C1. claimid
GROUP BY E1. defendant));
Can you think of a better way? I will offer a prize for another solution.
--Joe Celko
Subscribe to DBMS and Internet Systems -- It's free for qualified readers in the United States
February 1996 Table of Contents | Other Contents | Article Index | Search | Site Index | Home

DBMS and Internet Systems (http://www.dbmsmag.com)
Copyright © 1996 Miller Freeman, Inc. ALL RIGHTS RESERVED
Redistribution without permission is prohibited.
Please send questions or comments to dbms@mfi.com
Updated Monday, November 11, 1996.