DBMS, July 1998
DBMS Online: SQL for Smarties By Joe Celko

Moving On

Migration paths, real and imagined.


I was in New York City for Tax Day in April. Not only did this trip give me a long mail delay for my tax return, but I also got to speak to a joint meeting of Professional Association of Database Developers (PADD) and Object Developerıs Group (ODG) at Morgan-Stanleyıs office at Time Square.

One of the stock questions I get in my email is, "Which database should I buy?" Itıs right up there with, "Which car should I buy?" It depends a great deal on what you are going to do with it. If you tell me you want to deliver pizzas with it, I can eliminate some decisions.

While "Pizza by Ferrarri" delivery service might be able to promise, "Your pie, piping hot, anywhere in the state in 15 minutes or less," the cost/performance ratio is going to be killer. And unless you are supplying the Italian army, a Peterbilt is not a good choice either.

However, the one thing I will always safely predict is that some day you will have to move your application to another platform because it got too big for the database you first picked.

Oracle and Pervasive

I am not the only person who remembers that you need a migration path. In March 1998, Pervasive Software Inc. (www.pervasive.com) announced a joint effort to let existing packaged-application providers migrate their products to Pervasive and Oracle databases.

Pervasive, formerly known as Btrieve Technologies, is embedded in more than 10,000 independent software vendor products, but its database was designed for smaller, embedded, client/server environments, and it does not scale up to the enterprise level.

I hope the deal with Oracle will give users a transparent migration path to the enterprise level. I am writing this column in early April, so nobody has had time to see if this works. As soon as someone tries it, please drop me an email to let me know what happened.

Big Brother Bill

But not everyone has the same idea about how to do a migration. The Federal Office Systems Exposition (FOSE) is the big trade show for the federal government, and it is held in March in Washington, D.C. Peter Barus posted a note on CompuServe about attending and watching a demo of Windows 98. Things were fine until the pitch got to the part where the presenter said, "And it logs onto the Web so Microsoft can scan your drives to find out what you need." At this moment, Mr. Barus decided that this was not the product he wanted. During that demo there was a crash, of course, and Windows 98 instantly restarted, with the flag and all, while the presenter was talking. She was cool, but she said "Uh, whatıs happening? Oh..." Many of us are already saying that with Windows 95.

Apparently, Bill Gates expects us to blithely turn over the most intimate parts of our computers to Microsoft to browse around and exchange drivers for us. Actually, Microsoft is already doing it. If you try to get the Draw 98 package from Microsoftıs Web site, it will try to scan your hard drive for components from Office suite so that "the installer knows what you need to make Draw 98 run correctly" and make some decisions for you. No thanks; if I am going to migrate from one product or version to another, I want know about it.

Supercomputers

My prediction that some day you will have to move your application to a bigger database also applies to hardware. Getting bigger iron is not always easy in all parts of the world, though. India is ignoring the U.S. attempts to limit supercomputer technology to the United States and Japan. While you are probably aware of the U.S. position on cryptography, you might not know about recent Clinton administration barriers on the export of high-performance computers whose power exceeds 2,000 mega theoretical operations per second (MTOPS).

Vijay Bhatkar heads Indiaıs Centre for Development of Advanced Computing (C-DAC), which was created a decade ago when the United States first banned the sale of supercomputers to India. Its current project is a teraflop (one trillion floating point operations per second) machine. The new machine is called the PARAM 10000 and took five years to develop.

MPEG-4 Standard

Migrating to new standards does not always mean that you will be using more hardware. Professor Barry Vercoe, head of the MIT Media Labıs Machine Listening group and leader of the Structured Audio research project, and his team developed some of the key technology for the forthcoming MPEG-4 International Standard. MPEG, the Moving Picture Experts Group, is the part of ISO that is chartered with the development of industry standards for the compression, processing, coding, and transmission of audio and video data. MPEG-4 is supposed to dramatically boost the performance levels of computer sound, letting you play CD-quality stereo music transmitted through the average userıs modem. The Final Committee Draft standard is supposed to be released in October 1998 and formally become an international standard in December 1998.

The Media Labıs new approach to sound processing is called "Structured Audio," and it does not represent sound as a stream of bits. Instead, it stores and delivers content as a computer program in a flexible language, then translates it into sound on the userıs computer. Additional information about MPEG-4 Structured Audio is available on the Web at sound.media.mit.edu/mpeg4.

Because transmitting data as a program is far more efficient than transmitting streams of bits, this method enables a radical increase in the quality and efficiency with which sound is delivered. The idea of constructing the actual data at the client instead of storing it in the database has been used in fractal data compression algorithms for still pictures. I wonder what other datatypes you could handle this way?

The Euro

Not all migrations to new standards are well planned. According to an article on WebTech, more than 36 percent of European businesses have no plans in place to deal with the unified European currency, scheduled to start at the beginning of 1999, according to research from the European Information Technology Observatory (EITO), a research venture funded by European industry associations and trade show organizers. But in terms of enthusiasm, 75 percent of companies polled welcomed the Euro. They highlighted the disappearance of foreign exchange risks and the simplification of administrative processes as the main benefits.

The average multinational bank ı including the largest banks in the United States ı will probably have to spend $50 million to $150 million on information technology related to the Euro according to Gartner Group Europe. Financial institutions must pay attention not just to back-office accounting but to payment systems, automated teller machines, and point-of-sale terminals, as well as introducing new products, training staff, and educating customers. "This will have a big impact, even greater than Year 2000, " said Robert J. Baldoni, a New York-based partner of Ernst & Young. "Banks are on a tight timetable."

If you have to worry about the changes in European money, you might want to put a bookmark in your Web browser at www.cix.co.uk/~parkside/emuweb.htm, which will get you to the Pocket Guide to the European Economic and Monetary Union (EMU). This page is beginning to provide links to EMU resources that are publicly available on the Internet. It tries to include primarily sites that are good gateways to a range of other resources, and not to be an exhaustive index of everything, but you should check the site often because things may change without warning.

E-Money

While we are talking about money, let me pass along an email I got from Doron Avital, who is part of a small group at Columbia University working on a project called MarketNet. MarketNet aims to introduce a monetary system in networks that exploits economic principles, first to ensure security and second, for purposes of optimization. The basic structure it considers is that of various domains, each controlled by one bank and one currency; each domain serves both customers and providers of services. In order to acquire a service, a wallet containing virtual money of the right currency should make its way to the service provider. You can achieve security through changes in exchange-rate of currencies, controlling prices, and various ways of detecting the trajectory of an electronic wallet.

Puzzle

This puzzle came in my email from Mike Gora. I changed the original problem a bit, but the idea still holds. You are given a table with the results of an insurance salespersonıs appraisal of the possible losses a customer might suffer. To make the code easier, letıs alphabetically name the dangers a through o. If a danger is not present for this customer, then we show that with a null. If a danger is present, then we give it a numeric rating. For example, a fireworks factory on a mountain top has no danger of a flood, but the "blown to bits" factor is very high. Typically, only five or six of these attributes will have any values. The table looks like this:

CREATE TABLE Losses
(cust_nbr INTEGER NOT NULL PRIMARY KEY, 
a INTEGER, b INTEGER, c INTEGER, d INTEGER, e INTEGER, 
f INTEGER, g INTEGER, h INTEGER, i INTEGER, j INTEGER, 
k INTEGER, l INTEGER, m INTEGER, n INTEGER, o INTEGER);

Letıs put one customer into the table so we will have someone to talk about:

INSERT INTO Losses 
VALUES (99, 5, 10, 15, NULL, NULL, NULL, 
        NULL, NULL, NULL, NULL, NULL, 
        NULL, NULL, NULL, NULL);

We have a second table that we use to determine the correct policy to sell the customer based on his or her possible losses. That table looks like this:

CREATE TABLE Policy_Criteria 
(criteria_id INTEGER NOT NULL, 
 criteria CHAR(1) NOT NULL, 
 crit_val INTEGER  NOT NULL, 
PRIMARY KEY (criteria_id, criteria, crit_val));

INSERT INTO Policy_Criteria VALUES (1, ıAı, 5); 
INSERT INTO Policy_Criteria VALUES (1, ıAı, 9); 
INSERT INTO Policy_Criteria VALUES (1, ıAı, 14); 
INSERT INTO Policy_Criteria VALUES (1, ıBı, 4); 
INSERT INTO Policy_Criteria VALUES (1, ıBı, 10); 
INSERT INTO Policy_Criteria VALUES (1, ıBı, 20); 
INSERT INTO Policy_Criteria VALUES (2, ıBı, 10); 
INSERT INTO Policy_Criteria VALUES (2, ıBı, 19); 
INSERT INTO Policy_Criteria VALUES (3, ıAı, 5); 
INSERT INTO Policy_Criteria VALUES (3, ıBı, 10); 
INSERT INTO Policy_Criteria VALUES (3, ıBı, 30); 
INSERT INTO Policy_Criteria VALUES (3, ıCı, 3); 
INSERT INTO Policy_Criteria VALUES (3, ıCı, 15); 
INSERT INTO Policy_Criteria VALUES (4, ıAı, 5); 
INSERT INTO Policy_Criteria VALUES (4, ıBı, 21); 
INSERT INTO Policy_Criteria VALUES (4, ıBı, 22);
In English, this means that: Therefore, the customer could be offered policies 1, 2, and 3, but not 4. Policy 3 should be ranked the highest, because it matches the most criteria and returned as the answer. Policy 1 should be second highest, and Policy 2 should be last, but letıs not worry about presenting alternatives yet.


Puzzle Answer:
The trick in this problem is that the losses are presented as attributes in the Losses table and as values in the Policy Criteria table. This messes up the data model and means that you have to convert one table to match the other. I will pick the Losses table and flatten it out like this. This might be done with a VIEW, but I am going to show it as a working table:

CREATE TABLE LossDoneRight 
(cust_nbr INTEGER NOT NULL, 
 criteria CHAR(1) NOT NULL, 
 crit_val INTEGER NOT NULL)

Here is how you transform values to and from attributes:

INSERT INTO LossDoneRight (cust_nbr, criteria, crit_val) 
SELECT cust_nbr, 'A', a FROM Losses WHERE a IS NOT NULL
UNION ALL
SELECT cust_nbr, 'B', b FROM Losses WHERE b IS NOT NULL 
UNION
SELECT cust_nbr, 'C', c FROM Losses WHERE c IS NOT NULL 
UNION
SELECT cust_nbr, 'D', d FROM Losses WHERE d IS NOT NULL 
UNION
SELECT cust_nbr, 'E', e FROM Losses WHERE e IS NOT NULL 
UNION
SELECT cust_nbr, 'F', f FROM Losses WHERE f IS NOT NULL 
UNION
SELECT cust_nbr, 'G', g FROM Losses WHERE g IS NOT NULL 
UNION
SELECT cust_nbr, 'H', h FROM Losses WHERE h IS NOT NULL 
UNION
SELECT cust_nbr, 'I', i FROM Losses WHERE i IS NOT NULL 
UNION
SELECT cust_nbr, 'J', j FROM Losses WHERE j IS NOT NULL 
UNION
SELECT cust_nbr, 'K', k FROM Losses WHERE k IS NOT NULL 
UNION
SELECT cust_nbr, 'L', l FROM Losses WHERE l IS NOT NULL 
UNION
SELECT cust_nbr, 'M', m FROM Losses WHERE m IS NOT NULL 
UNION
SELECT cust_nbr, 'N', n FROM Losses WHERE n IS NOT NULL 
UNION
SELECT cust_nbr, 'O', o FROM Losses WHERE o IS NOT NULL;

Now we have a relational division problem:

SELECT  L1.cust_nbr, ' could use policy ', C1.criteria_id, 
             COUNT(*) AS score
      FROM LossDoneRight AS L1, Policy_Criteria AS C1 
 WHERE L1.criteria = C1.criteria 
      AND L1.crit_val = C1.crit_val
 GROUP BY L1.cust_nbr, C1.criteria_id 
HAVING COUNT(*) = (SELECT COUNT(*)
                                            FROM LossDoneRight AS L2
                                         WHERE L1.cust_nbr = L2.cust_nbr);

In English, you join the losses and criteria together. If the loss was able to match all the criteria (in other words, has the same count) in the Policy Criteria description, we keep him or her. It is a one-to-one mapping of the two tables, but one of them can have leftovers and the other cannot.

You can tune this by making equality into a greater than or equal, a less than or equal, or a BETWEEN predicate. The longest execution is building the LossDoneRight table.


Joe Celko is an Atlanta-based independent consultant and a member of the ANSI X3H2 Database Standards Committee. He is also the author of two books on SQL: SQL For Smarties (Morgan-Kaufmann, 1995) and Instant SQL Programming (Wrox Press, 1995). You can contact Joe via email at 71062.1056@compuserve.com.
What did you think of this article? Send a letter to the editor.


Subscribe to DBMS -- It's free for qualified readers in the United States
July 1998 Table of Contents | Other Contents | Article Index | Search | Site Index | Home

DBMS (http://www.dbmsmag.com)
Copyright © 1998 Miller Freeman, Inc. ALL RIGHTS RESERVED
Redistribution without permission is prohibited.
Please send questions or comments to dbms@mfi.com
Updated June 4, 1998