Computer scientist David Gelernter commented that "children need Internet access the way they need subsidized bus service to the nearest mall" in reference to the government's goal of wiring the nation's schoolrooms to the Internet. I myself like to get out of the house and go to a local college library to do my research using a terminal on a T1 line along with all the reference materials they have in hardcopy. It's like a mall for geeks.
Most of the time, I download the pages as ASCII text to a diskette and bring the data home. Why the kids need to see everything in hardcopy is beyond me. The option to save to diskette is right there on the Netscape File menu, and it is a lot faster than printing a Serbian-Chinese dictionary on a shared laser printer.
My problem is that after a day or two of Web surfing, I have a ton of ASCII files on a box of diskettes and have to cross-reference them by hand. People like me now have a tool from askSam Systems (www.asksam.com), which you might know as the maker of a neat little textbase tool by that same name.
The company just released a public beta of SurfSaver, a new product that integrates seamlessly with whatever browser you are using and lets you store Web pages directly from your browser into searchable folders. When you find a page you want to save, right-click and choose the SurfSaver Save command. The page is immediately saved with text, graphics, and hypertext links. Once the pages are saved, you can quickly search and browse them (even when you're not connected to the Internet).
Just as important, SurfSaver gives you a permanent archive of the information you collect. If the original Web page changes or disappears, you still have the original information on your disk. SurfSaver has the usual full-text, Boolean, proximity, date, and other textbase search options that can be invoked directly from your browser.
The estimated release date was April 1998, and the downloadable version was priced at $29.95, or $39.95 plus shipping for a boxed copy. You can get information and the downloadable beta version at www.surfsaver.com. As of this writing, the beta version ran only on Internet Explorer 4.0; however, askSam plans versions for IE 3.0 and Netscape.
Speaking of improving things, you might want to look at Sylvain Faust International's SQL-Optimizer/DBA version 1.1 tool. (See Robin Schumacher's review in the January 1998 issue of DBMS.) This product will identify problems and propose optimized SQL statements. Maybe you can get better answers to some of my puzzles.
For example, one of the new transforms that significantly enhances SQL-Optimizer's ability to rewrite SQL code is the "Predicate Push-Up," which identifies possible programmer errors in queries that have nested subqueries. If a subquery's WHERE clause predicate doesn't contain references to any of the columns in the subquery's tables, the predicate is moved outside of the subquery to the appropriate outer query's WHERE clause. You can get more details and take a test drive of SQL-Optimizer 1.1 at www.sfi-software.com.
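To make the Predicate Push-Up concrete, here is a minimal sketch that runs a suspect query and a hand-rewritten version through Python's sqlite3 module. The Customers and Orders tables, their columns, and the sample rows are hypothetical, and the rewrite is my reading of the transform as described above, not actual SQL-Optimizer output.

```python
import sqlite3

# Hypothetical tables and rows, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (cust_id INTEGER NOT NULL);
    CREATE TABLE Orders (order_id INTEGER NOT NULL,
                         cust_id INTEGER NOT NULL,
                         status VARCHAR(10));
    INSERT INTO Customers VALUES (1);
    INSERT INTO Customers VALUES (2);
    INSERT INTO Orders VALUES (10, 1, 'open');
    INSERT INTO Orders VALUES (11, 2, 'closed');
    INSERT INTO Orders VALUES (12, 3, 'open');
    INSERT INTO Orders VALUES (13, 1, NULL);
""")

# Suspect form: the subquery predicate "O.status = 'open'" mentions
# no column of Customers, the only table in the subquery.
suspect_sql = """
    SELECT O.order_id
      FROM Orders AS O
     WHERE O.cust_id IN (SELECT C.cust_id
                           FROM Customers AS C
                          WHERE O.status = 'open')
     ORDER BY O.order_id
"""

# Rewritten form: the predicate is moved to the outer WHERE clause.
pushed_up_sql = """
    SELECT O.order_id
      FROM Orders AS O
     WHERE O.status = 'open'
       AND O.cust_id IN (SELECT C.cust_id
                           FROM Customers AS C)
     ORDER BY O.order_id
"""

suspect_rows = conn.execute(suspect_sql).fetchall()
pushed_up_rows = conn.execute(pushed_up_sql).fetchall()
print(suspect_rows, pushed_up_rows)
```

Both forms return the same rows on this data. The rewritten one makes it obvious that the status test belongs to Orders, which is presumably the kind of programmer slip the tool is meant to flag.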
Very large databases (VLDBs) are one area where optimization pays off, and the real world is fast becoming cluttered with the monsters. The VLDB Summit, which was held in Beverly Hills, California, in March of this year, is a trade show sponsored by Winter Corp., a consulting and research firm specializing in large database technology, and DBMS's sister publication Database Programming and Design. The awards are interesting because they tell you what is happening with real (commercial) databases and give you a measure of the size of the beasts.
United Parcel Service (UPS), based here in Atlanta, got its third win in as many years for its federated transaction processing system. UPS received one grand prize for its more than 11TB of data and another for having 324 billion rows in its DB2 database.
Telstra, a telecommunications company based in Melbourne, Australia, won two grand prizes for its centralized OLTP database: the most data and most rows in any environment. The Telstra installation contains 4.2TB of data and 51 billion rows. The main component of Telstra's DBMS is DB2 on a cluster of systems, including IBM S/390.
The award for the largest number of rows in a transaction processing system in a Unix environment went to Deere & Co. of Moline, Illinois. The grand prize was awarded for a total of more than 2.5 billion rows in its database. So much for rumors of the death of Big Iron.
Careful readers might remember my mention in March of the European Article Number (EAN) code, which parallels our UPC codes. The lead time on my column kept me from telling you that in March 1998, the Article Number Association (the U.K. standards authority for bar-coding and electronic data interchange, or EDI) and the Electronic Commerce Association (a nonprofit organization offering guidance and practical solutions to let businesses make the most effective use of EDI and other forms of emerging e-commerce technologies) agreed to work toward a full merger later in the year.
The merger would create a one-stop shop for e-commerce in the United Kingdom, including numbering, data carriers, EDI, and the Internet. The rest of Europe is also moving ahead of the United States on the cryptography front. On March 23, 1998, cnlab Software AG of Rapperswil, Switzerland, announced the formation of an alliance with Network Associates International B.V. to develop and compile full-strength, 128-bit international versions of the Network Associates PGP encryption product line. Network Associates will license the software from cnlab Software for markets outside the United States.
Network Associates' PGP encryption products for international markets will be fully developed and compiled in Europe by cnlab Software, based upon widely available published source code that was legally exported from the United States. No U.S. technical assistance has been, or will be, provided to cnlab Software or to international offices of Network Associates, ensuring full compliance with U.S. export laws. In short, the United States will not see any money or benefit directly from this project because of its export laws.
In April, the NCITS H2 database standards mailing list had a series of posts on what it means to add n months to a given date. The general consensus seemed to be that there is no standard way to do it; you have to rely on whatever conventions your application uses.
Mike Lefler, a Litton/PRC senior technical fellow, posted a note reporting how one person's software does this date arithmetic. Let Y1 be the original four-digit year, M1 the original month, and D1 the original day of the month (YYYY-MM-DD format). The software adds n months to this date by adding n months to Y1-M1-01 and then adding (D1 - 1) days to the result. The trouble is, for n = 14 and a starting date of 1997-12-30, this yields 1999-03-02, while a starting date of 1998-01-01 gives us 1999-03-01. December 30, 1997, is earlier than January 1, 1998, yet its result, March 2, comes after March 1 in the same year. The method does not preserve the order of the dates, which is unsatisfactory. Does anyone have a better method? Or just a different one?
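Here is a small sketch of that method in Python, next to one different convention, so you can watch the ordering problem happen. The function names are mine, and the clamping rule in the second function is an assumption for comparison, not something proposed on the list.

```python
from datetime import date, timedelta
import calendar

def shift_month(year, month, n):
    """Return the (year, month) that falls n months after the given one."""
    index = year * 12 + (month - 1) + n
    return index // 12, index % 12 + 1

def add_months_first_of_month(d, n):
    """The method described above: add n months to Y1-M1-01,
    then add (D1 - 1) days to the result."""
    year, month = shift_month(d.year, d.month, n)
    return date(year, month, 1) + timedelta(days=d.day - 1)

def add_months_clamped(d, n):
    """A different convention (an assumption, not from the list):
    keep the day of the month, but clamp it to the last day of the
    target month when that month is too short."""
    year, month = shift_month(d.year, d.month, n)
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))

for start in (date(1997, 12, 30), date(1998, 1, 1)):
    print(start,
          add_months_first_of_month(start, 14),
          add_months_clamped(start, 14))
```

With clamping, 1997-12-30 plus 14 months lands on 1999-02-28, which stays ahead of 1999-03-01. The price is that several starting dates (the 28th through the 31st) can collapse onto the same result, so it is a different trade-off rather than a clean fix.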
Mike also made a tongue-in-cheek suggestion that we scrap the current calendar and go to a calendar with 30 months of 12 days apiece, with a five-day Saturnalia (six days every fourth year unless the year is divisible by 100 and not by 400).
Actually, something called the Edwards calendar is the best reform proposal I have seen. It has 12 months divided into quarters of three months with 30, 30, and 31 days respectively. The first, second, and third months within each quarter always start on Monday, Wednesday, and Friday, respectively. New Year's Day is the first day of the year; it is not within any month, Mayan style, and it is numbered zero (2000-00-00). Leap year day is handled by a second intercalary day (2000-00-01) when needed.
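The weekday claim is easy to check with a few lines of Python. This sketch assumes that the first month starts on a Monday and that the intercalary days sit outside the seven-day week, which is what the fixed weekdays imply but is my assumption here; the names are made up for the illustration.

```python
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]
QUARTER_MONTH_LENGTHS = (30, 30, 31)  # days in the three months of a quarter

def month_start_weekdays():
    """Return the weekday on which each of the 12 months starts, plus the
    number of days that fall inside months. Assumes month 1 starts on a
    Monday and that intercalary days are not part of any week."""
    days_elapsed = 0
    starts = []
    for _quarter in range(4):
        for length in QUARTER_MONTH_LENGTHS:
            starts.append(WEEKDAYS[days_elapsed % 7])
            days_elapsed += length
    return starts, days_elapsed

starts, days_in_months = month_start_weekdays()
print(starts)
print(days_in_months, "days in months, plus New Year's Day =", days_in_months + 1)
```

Each quarter is 91 days, or exactly 13 weeks, so the Monday, Wednesday, Friday pattern repeats in every quarter, and the 364 days inside months plus New Year's Day make up a 365-day year.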
You can take an intensive two-week course in Geographic Information Systems (GIS) this summer at the University of California, Riverside Extension Center. This is the third annual GIS Intensive Institute and will be held August 3-13, 1998. The agenda calls for morning lectures followed by afternoon computer labs and interactive exercises. Topics covered include data conversion, data capture technologies, and project management. The institute's final sessions focus on team projects and practical exercises dealing with real-life work settings.
Christopher D. Thomas, a marketing manager at ESRI in Redlands and a national leader in GIS applications, is the institute coordinator. Pioneers in GIS lead the various sessions. The fee for the two-week course is $1,950 and includes a copy of ArcView 3.0. Registration is requested by July 31, and enrollment is limited. You can find information on the Web at www.unex.ucr.edu/gis/gis.html.
SQL is pretty good about letting you do cross joins to get all possible pairs (x, y) from two sets of elements with a simple query, for example:
SELECT x, y
FROM BigX
CROSS JOIN
BigY;
But sometimes you would like to do this sort of thing horizontally instead of vertically. A permutation is an ordered arrangement of elements of a set. For example, if I have the set {1, 2, 3}, the permutations of those elements are (1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), and (3, 2, 1). The rule is that a set of n elements has n! (n factorial) permutations.
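If you want to check the count outside of SQL, a few lines of Python using the standard itertools and math modules will do it. This is only a sanity check on the n! rule, not a solution to the puzzle.

```python
from itertools import permutations
from math import factorial

elements = [1, 2, 3]
perms = list(permutations(elements))

print(perms)                                 # the six orderings listed above
print(len(perms), factorial(len(elements)))  # both counts agree
print(factorial(7))                          # rows to expect for seven elements
```

Seven elements give 7! = 5,040 rows, which is the size of the result a correct query has to return.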
What I would like from you this month is a query that returns one permutation per row from a set of the first seven integers. Try to make the answer easy to generalize for more numbers.
CREATE TABLE Elements (i INTEGER NOT NULL);
INSERT INTO Elements VALUES (1);
INSERT INTO Elements VALUES (2);
INSERT INTO Elements VALUES (3);
INSERT INTO Elements VALUES (4);
INSERT INTO Elements VALUES (5);
INSERT INTO Elements VALUES (6);
INSERT INTO Elements VALUES (7);
Answer
The obvious and horrible answer is:
This monster predicate will guarantee that all column values in a row are unique. Execution time, however, is pretty bad. An improvement on this query can be made by adding one more predicate to the WHERE clause:
This improves things because most optimizers will see a predicate of the form <expression> = <constant> and will execute it before the and-ed chain of in() predicates. While not all rows that total 28 are a permutation, all permutations will total to 28 for this set of integers. When you have a factorial, you look for all the improvements you can get!
But let's carry the totals trick one step further. First, redefine the Elements table to have a weight for each element in the set:
The weights are powers of two, and we are about to write a bit vector in SQL with them. Now, the WHERE clause becomes:
This does the whole filtering job for you, and the in() predicates are all unnecessary. This answer also has another beneficial effect: The elements can now be of any data type and are not limited just to integers.
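Here is a sketch of the three queries the answer describes, generated for any n and run through Python's sqlite3 module. The SQL is a reconstruction from the description, so treat it as one plausible form rather than the original text: the aliases E1 through En, the wgt column, and the helper functions are all assumptions, and the chain is written with NOT IN because that is what uniqueness requires.

```python
import sqlite3

def load_elements(n):
    """Build an in-memory Elements table holding 1..n, each with a
    power-of-two weight (1, 2, 4, ...). The wgt column is an assumption."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Elements "
                 "(i INTEGER NOT NULL, wgt INTEGER NOT NULL)")
    conn.executemany("INSERT INTO Elements VALUES (?, ?)",
                     [(k, 2 ** (k - 1)) for k in range(1, n + 1)])
    return conn

def select_and_from(n):
    """SELECT list and n-way self-join shared by all three queries."""
    aliases = [f"E{k}" for k in range(1, n + 1)]
    cols = ", ".join(f"{a}.i" for a in aliases)
    tables = ", ".join(f"Elements AS {a}" for a in aliases)
    return aliases, f"SELECT {cols} FROM {tables}"

def not_in_chain_sql(n):
    """Variant 1: an AND-ed chain of NOT IN predicates (needs n >= 2)."""
    aliases, head = select_and_from(n)
    preds = []
    for k in range(n - 1):
        others = ", ".join(f"{b}.i" for b in aliases[k + 1:])
        preds.append(f"{aliases[k]}.i NOT IN ({others})")
    return head + " WHERE " + " AND ".join(preds)

def total_sql(n):
    """Variant 2: the same chain plus a sum-of-elements predicate."""
    aliases, _ = select_and_from(n)
    total = " + ".join(f"{a}.i" for a in aliases)
    return not_in_chain_sql(n) + f" AND ({total}) = {n * (n + 1) // 2}"

def bit_vector_sql(n):
    """Variant 3: the sum of weights must have all n bits set."""
    aliases, head = select_and_from(n)
    total = " + ".join(f"{a}.wgt" for a in aliases)
    return head + f" WHERE ({total}) = {2 ** n - 1}"

def run(sql, n):
    """Execute one generated query against a fresh Elements table."""
    conn = load_elements(n)
    return sorted(conn.execute(sql).fetchall())

print(bit_vector_sql(3))
print(run(bit_vector_sql(3), 3))
```

For seven elements the generated constants are 28 for the total and 127 for the bit vector. The weight trick works because a sum of n powers of two can only have all n low bits set when no weight is repeated; any duplicate causes a carry and loses a bit. Since only wgt takes part in the filtering, the i column could hold any data type.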