I recently bought an old house that needs some work (read: money pit), and so most of my life and worldly processions are still in cardboard boxes as I write this month's column. One friend who "helped" us move labeled all of her boxes "Living Room" because that is where she packed them. Although I might be short a pile of office supplies, half of my wedding photographs, and all of my tax records, at least she gave me a topic for the first section of this month's column: random functions and physical storage.
Sometimes you want the data in your database spread randomly over physical storage because you will be searching it randomly. One player (database user) will be favored over another simply by luck, and any estimate of performance must be worded with a margin of error. If all of the items being sought are on the same page of physical disk storage, the database will lock that page and make everyone wait his or her turn, and performance will suffer. This usually occurs in a situation where the queries fetch a single row. But it also happens when you load a sorted sequential file into a database.
Other times you want the data in your database clustered in some sorted order in physical storage because you will be searching it in that order. In this case, if all of the related items being sought are on the same page of storage, the database will lock only that page and overall performance will improve. This usually occurs in a situation where the queries aggregate multiple rows or are looking for a range or something similar.
What is a good way to randomize the physical table storage if we are loading data from a sorted sequential file? If I gave you a standard deck of 52 playing cards and the same question, you would shuffle them. A "perfect shuffle" is a magician's term meaning that you cut the deck into two equal packs and then interlace a card from one pack with one from the other. If you start with the right-hand pack, it is called an "inshuffle"; if you start with the left-hand pack, it is called an "outshuffle."
The general formula for computing (n) the number of outshuffles required to restore a deck of (2*d) cards is:
(2^n) mod (2*d-1) = 1
It takes only eight outshuffles, but 52 inshuffles, to restore a standard 52-card deck to its starting position. Also, if d is not a multiple of two, then the formula fails and the only known procedure is to work out each case.
Persi Diaconis at Harvard University demonstrated that the usual riffle shuffle -- split the deck into two packs and shuffle, but without perfect interlacing -- is very good at producing randomness in the deck. It takes seven riffle shuffles to randomize a deck of playing cards. Diaconis and other researchers also found out that certain-size decks would have sudden changes in their randomness; after six riffle shuffles, a deck is still visibly ordered, but this order disappears with the seventh shuffle.
Implications for computer people: Parallel computers whose processors perform the same task simultaneously and then mix their outputs are an example of what are called "shuffle networks." Algorithms that distribute data over storage pages are in effect shuffling the rows of a table and can have similar "catastrophe points"; this accounts for sudden changes in performance in multiuser systems.
This standard looks a lot like ODBC, but it is more generic. It is supposed to be the basis for version 3.0 of ODBC from Microsoft. This subject is worth an article -- or a book -- in itself, but for now, the SQL/CLI will add SQL-92 support and depend more on handles as parameters to the library calls. (See "Understanding ODBC 3.0 Standards and OLE DB" by Ken North, DBMS, April 1996, page S15.)
The same treaty will also radically affect the way that stock prices, weather data, train schedules, and other facts are controlled. It just happens to occur at the same time that the NBA and other sports franchises are stepping up their efforts to control the realtime dissemination of sports statistics through the Internet or with wireless paging devices. (See, for example, "NBA Case No Slam Dunk," by Janet Kornblum, September 7, 1996, at www.news.com/News/Item/0,4,3208,00.html.)
If the treaty is approved and implemented, sports leagues will have far broader powers to dictate the terms and conditions under which sport statistics are reported and disseminated. According to the proposed treaty and legislation introduced in the 104th Congress to implement the treaty (HR 3531, a version of the database protection proposal), the NFL, NBA, NHL, and MLB will have the right to prevent anyone from publishing these and other statistics without express permission from the sports league. This will include the right to control access to the historical archives of sports statistics -- even the right to dictate who can publish, say, the box scores from a game.
The proposals for a new legal environment for publishing facts are outlined in a draft treaty on "databases" that was considered at a December 1996 meeting of the World Intellectual Property Organization (WIPO), in Geneva, Switzerland. The proposal would require the United States and other countries to create a new property right for public domain materials. "Texts, sounds, images, numbers, facts, or data representing any other matter or substance" will be protected.
This treaty represents "the end of the public domain," according to American University Law Professor Peter Jaszi. Copies of the proposed treaty, a federal register notice asking for public comment, and independent commentary can be found at the Union for the Public Domain Web site at www.public-domain.org/database/database.html.
Regular readers of this column will remember that in 1991, the U.S. Supreme Court ruled in what is now called "the Feist decision" that the facts from a telephone directory of names, addresses, and phone numbers were not protected under the copyright laws, and that in general "facts" could not be copyrighted by anyone. The Feist decision alarmed several large database vendors, who crafted this new sui generis property right that would protect facts (and just about everything else).
The database vendors have already succeeded in obtaining a directive on a database proposal from the European Union, although no European country has yet passed legislation to implement the treaty. The most active supporter of this new property right is West Publishing Co., the Canadian legal publisher. A West Publishing employee chairs a key American Bar Association subcommittee that wrote a favorable report on the treaty. A number of very large British and Dutch database vendors are also lobbying hard for the treaty.
West wants the new property right to protect the "page numbers" and "corrections" it adds to the judicial opinions it publishes in paper-bound books. Telephone companies want to protect the names, addresses, and telephone numbers they publish, and other database vendors want to protect scientific data or other non-copyrighted government information they publish. In seeking to protect these items, the treaty was written to stamp "owned by" labels on a vast sea of information now in the public domain. Copyright experts J.H. Reichman and Pamela Samuelson say the treaty advances the "least balanced and most potentially anti-competitive intellectual property rights ever created." (See the full document on the Web at ksgwww.harvard.edu/iip/reisamda.html.)
The administration has proposed a deal in which companies would be freed from export restrictions if they agreed to build such a "key recovery" mechanism into their products. If you were a foreign citizen, would you like the U.S. Government reading your email? Yet at the same time that television sets are getting hit under the munitions act, Tandem Computers Inc. (www.tandem.com) announced its first Internet Transaction Processing (ITP) products to create a secure, reliable environment for commerce on public networks such as the Internet.
Tandem's parallel hardware systems have already given the company a good presence in the OLTP (online transaction processing) market. Tandem hardware runs about 80 percent of the world's automatic teller machines, 66 percent of credit card transactions, and 90 percent of securities transactions. This recent move endorses Internet commerce.
You might want to email your representatives and senators to see what they are doing about NIST operating in contempt of Congress and Public Law.
There are two problems:
CREATE TABLE Appointments
(person CHAR(10) NOT NULL,
period INTEGER NOT NULL CHECK(period BETWEEN 1 AND 16),
status CHAR(4) NOT NULL CHECK(status IN ('Open','Busy')),
PRIMARY KEY (person, period));
INSERT INTO Appointments VALUES ('Mr. B', 8, 'Busy');
INSERT INTO Appointments VALUES ('Mr. B', 9, 'Open');
INSERT INTO Appointments VALUES (ýMr. B', 10, 'Open');
INSERT INTO Appointments VALUES (ýMr. B', 11, 'Busy');
INSERT INTO Appointments VALUES ('Mr. B', 12, 'Open');
INSERT INTO Appointments VALUES ('Mr. B', 13, 'Busy');
INSERT INTO Appointments VALUES ('Mr. B', 14, 'Busy');
INSERT INTO Appointments VALUES ('Mr. B', 15, 'Busy');
INSERT INTO Appointments VALUES ('Mr. B', 16, 'Busy');
INSERT INTO Appointments VALUES ('Mr. C', 8, 'Open');
INSERT INTO Appointments VALUES ('Mr. C', 9, 'Open');
INSERT INTO Appointments VALUES ('Mr. C', 10, 'Open');
INSERT INTO Appointments VALUES ('Mr. C', 11, 'Busy');
INSERT INTO Appointments VALUES ('Mr. C', 12, 'Busy');
INSERT INTO Appointments VALUES ('Mr. C', 13, 'Busy');
INSERT INTO Appointments VALUES ('Mr. C', 14, 'Busy');
INSERT INTO Appointments VALUES ('Mr. C', 15, 'Busy');
INSERT INTO Appointments VALUES ('Mr. C', 16, 'Busy');
INSERT INTO Appointments VALUES ('Mr. D', 8, 'Open');
INSERT INTO Appointments VALUES ('Mr. D', 9, 'Busy');
INSERT INTO Appointments VALUES ('Mr. D', 10, 'Open');
INSERT INTO Appointments VALUES ('Mr. D', 11, 'Busy');
INSERT INTO Appointments VALUES ('Mr. D', 12, 'Busy');
INSERT INTO Appointments VALUES ('Mr. D', 13, 'Busy');
INSERT INTO Appointments VALUES ('Mr. D', 14, 'Busy');
INSERT INTO Appointments VALUES ('Mr. D', 15, 'Busy');
INSERT INTO Appointments VALUES ('Mr. D', 16, 'Busy');
and you need to build a table of your people that you want to meet with:
CREATE TABLE MyPeople
(person CHAR(10) NOT NULL PRIMARY KEY);
INSERT INTO MyPeople VALUES ('Mr. B');
INSERT INTO MyPeople VALUES ('Mr. C');
INSERT INTO MyPeople VALUES ('Mr. D');
Can you do these two queries?
CREATE VIEW Available (person, period, status)
AS SELECT person, period, status
FROM Appointments
WHERE status = 'Open';
Hint: A view of open time slots might help, but it is not needed. You can find this month's puzzle answer in the online version of this column on the DBMS Web site at www.
dbmsmag.com.
SELECT period FROM Appointments AS S1 WHERE status = 'Open' AND person IN (SELECT person FROM MyPeople) GROUP BY period HAVING COUNT(*) = (SELECT COUNT(*) FROM MyPeople);Results
period
======
10
If this does not work, then you might try again for the next best meeting by looking for (n-1) persons with a common open slot:
SELECT period FROM Appointments AS S1 WHERE status = 'Open' AND person IN (SELECT person FROM MyPeople) GROUP BY period HAVING COUNT(*) = (SELECT COUNT(*) FROM MyPeople) - 1;Results
period
======
8
9
and so forth until you get a hit.
Obviously, the equality comparison can be replaced with ">=" to see if you can get a quorum for a meeting, instead of exactly (n-1) people. Notice that you are not told which (n-1) people can meet, only the period.
One way of doing the query for the second problem is a self-join, which has one copy of the view for each person involved.
SELECT A1.period AS B, A2.period AS C, A3.period AS D
FROM Available AS A1,
Available AS A2,
Available AS A3
WHERE (A1.person = 'Mr. B'
AND A1.period NOT IN (A2.period, A3.period))
AND (A2.person = 'Mr. C'
AND A2.period NOT IN (A1.period, A3.period))
AND (A3.person = 'Mr. D'
AND A3.period NOT IN (A1.period, A2.period));
Results
| B | C | D |
| =========== | ||
| 9 | 8 | 10 |
| 9 | 10 | 8 |
| 10 | 9 | 8 |
| 12 | 8 | 10 |
| 12 | 9 | 8 |
| 12 | 9 | 10 |
| 12 | 10 | 8 |
This pattern can easily be extended to any number of people. This strategy will return all possible combinations of valid meetings, if any exist. If your people are not very busy, then the number of possible combinations explodes. Using this sample data, compute the number of possible meetings if everyone is open.
This combinatory explosion might be a good thing, or it might not. I will offer a book as a prize for either a better complete solution or a SQL query that returns one and only one possible meeting schedule. No procedural code, or keep it to a minimum, please.
-- Joe Celko