I attended Software Development East '96 in October 1996 to give a talk on representing tree structures in SQL. Software Development is a good conference. I define a good conference as one where I get so many new ideas that I know it will take me months to follow up on them. Unfortunately, I could only come up for a day and half at the end of the show and missed most of the talks I would like to have heard.
Karl Seiler's "Advanced Database Search and Retrieval Technology" talk was not a plug for his company and product (Level Five Research and Quest respectively), but rather a good, balanced presentation on tools for doing Web searches.
The first problem, as anyone who has tried a search knows, is getting back either too much information when your search is too loose ("9,543,234 pages match your criteria.") or no information when you try to tighten it up. What you really want is a research assistant who would look up the most relevant choices for you and give you a short summary of each one before you accessed the Web sites you needed. We are getting there.
The second problem is that search engines are not always smart about what they find. For example, a search on "domino" returns references to a pizza company, musicians, a pet Dalmatian, a Novell network product, some Latin references, some personal Web sites, and game-related sites.
The third problem is that Web site providers try to trick search engines into giving their sites higher scores by using keywords over and over in their text. In a huge result set, you have to be near the top of the list to be read. A page where "domino" appears 100 times will probably get a higher score than a page where the word appears only once, so the Webmaster simply repeats the same words over and over in the comments on a page.
Level\5's Quest Server version 2.0 lets the user input a query with weights to the criteria via a point-and-click interface. This dialog is translated into a search on the Web, and Quest returns the best matches it can find and sorts the results in order of weight. It always returns something, and it is not so easily fooled by Web site provider tricks. If you would like to look at the product, you can download a demo copy from the company's Web site at www.L5R.com.
Karl also mentioned a research project in England that reads documents and reduces the contents to summaries whose length -- a paragraph, a sentence --is determined by the user. The shortest summary is two words, but you will encounter a lot of semantic loss. It takes a long time to analyze a document, and a human expert with voice input might still be faster for one document, but this project demonstrates that computerized summarization is possible.
You might like to experiment with a "Vickerey auction" either in a computer simulation or with friends. This procedure takes sealed bids and sells the item to the highest bidder, as expected. The gimmick is that the winner pays the second highest price. This procedure, which lets the seller obtain as high a price as with any other auction, has been used to sell oil field rights and broadcast spectrum space.
SELECT ((EXP (SUM (LN (CASE WHEN nbr = 0.00 THEN NULL ELSE ABS(nbr) END)))) * (CASE WHEN MIN (ABS (nbr)) = 0.00 THEN 0.00 ELSE 1.00 END) * (CASE WHEN MOD (SUM (CASE WHEN SIGN(nbr) = -1 THEN 1 ELSE 0 END), 2) = 1 THEN -1.00 ELSE 1.00 END) AS Prod FROM NumberTable;The nice part of this is that you can also use the sum(distinct <expression>)option to get the equivalent of prod(distinct <expression>).
The natural logarithm, LN(x), is not defined for all numbers, so we don't have a true inverse over the domain of the problem. As x approaches zero, LN(x) goes toward negative infinity and it is not defined for negative numbers.
For that reason, I added the three CASE expressions in the PROD( ) query. They are a bit tricky, so follow me closely: The first CASE expression is to insure that negative numbers are converted to their absolute values so the logarithm will be defined. Zeros are also converted to NULLs in this expression so that SQL will not raise an exception; several SQLs tested did not report an error. Instead, they returned a NULL or even a zero, both of which are wrong. The NULLs will be dropped out in the SUM( ), and we will handle multiplication by zero later.
The second CASE expression will return zero as the answer if there was a zero in the nbr column of any selected row. The MIN(ABS(nbr)) is a handy trick for detecting the existence of a zero in a list of both positive and negative numbers with an aggregate function. Don't forget to file that one away in your notebooks!
The third CASE expression will return ý1 (minus one) if there was an odd number of negative numbers in the nbr column. The innermost CASE expression uses a SIGN( ) function that returns +1 for a positive number, ý1 for a negative number, and 0 for a zero. The SUM( ) counts the ý1 results, and then the MOD( ) function determines if the count was odd or even. I am not sure if changing the answers clauses of the CASE expression from 1 and 0 to 1.0 and 0.0 -- which would make them into floating point numbers -- would improve performance or not.
You still need to test your SQL to see if SIGN( ) and ABS( ) return NULL for a NULL argument. I did not find any exceptions, but you never know.
SELECT S1.emp, S1.sal_date, S1.sal_amt, S2.sal_date, S2.sal_amt FROM Salaries1 AS S1, Salaries2 AS S2 -- use the view WHERE S1.emp = S2.emp AND S1.sal_date > S2.sal_date UNION ALL SELECT emp, MAX(sal_date), MAX(sal_amt), NULL, NULL FROM Salaries1 GROUP BY emp HAVING COUNT(*) = 1;You might have to replace the last two columns with the expressions CAST(NULL AS DATE) and CAST(NULL AS DECIMAL(8,2)) to assure that they are of the right data types for a UNION. However, the best answers once more came from Richard Romley of Smith-Barney. His first solution takes advantage of the subquery table expression to avoid VIEWSs:
SELECT B.emp, B.maxdate, Y.sal_amt, B.maxdate2, Z.sal_amt
FROM (SELECT A.emp, A.maxdate, maxdate2 = MAX(X.sal_date)
FROM (SELECT W.emp, maxdate = MAX(W.sal_date)
FROM Salaries AS W
GROUP BY W.emp) AS A
LEFT OUTER JOIN Salaries AS X
ON A.emp = X.emp
AND A.maxdate > X.sal_date
GROUP BY A.emp, A.maxdate) AS B
LEFT OUTER JOIN Salaries AS Y
ON B.emp = Y.emp
AND B.maxdate = Y.sal_date
LEFT OUTER JOIN Salaries AS Z
ON B.emp = Z.emp
AND B.maxdate2 = Z.sal_date;
If your SQL product supports only a TABLE or VIEW name in their outer joins, then you can convert some of the subqueries into VIEWs for the table subqueries named A and B.
Richard and I went back and forth via email on other possible solutions until we both came up an unorthodox approach, which I wanted to reject on artistic grounds. In the end, though, performance is the ultimate judge.
The trick is Conjugacy and functions involved are concatenation and its inverse, SUBSTRING( ). The trick is to convert the dates and salaries into strings, concatenate them with the date string first, and then split the salary back out.
SELECT A.emp,
MAX(A.sal_date) AS date1,
SUBSTRING (MAX(CAST(A.sal_date AS CHAR(10))
|| CAST(A.sal_amt AS CHAR(8))), 11, 8) AS sal1,
MAX(B.sal_date) AS date2,
SUBSTRING (MAX(CAST(B.sal_date AS CHAR(10))
|| CAST(B.sal_amt AS CHAR(8))), 11, 8) AS sal2
FROM Salaries AS A
LEFT OUTER JOIN
Salaries AS B
ON A.emp = B.emp
AND A.sal_date > B.sal_date
GROUP BY A.emp;
If you wanted to, you could also CAST( ) the whole substring expression back to DECIMAL(8,2) so you would have the original data type. Another advantage is that you can extend the pattern of this query by adding more MAX( ) and SUBSTRING( ) pairs with matching LEFT OUTER JOINS and performance does not degrade significantly.
Look back at the list of conjugate function pairs at the start of this section; you will see that I have not used the trig functions or string reversal. Frankly, I cannot think of any way to use the trig functions, but I know that some reader has a great example of string reversal. When you send it in to me, you'll get your name in print and a new book as a prize.
| Table2 | |
| sortnbr | sortseq |
| ============================= | |
| 0 | 'T1.a, T1.b, T1.c;' |
| 1 | 'T1.a, T1.c, T1.b;' |
| 2 | 'T1.b, T1.a, T1.c;' |
| 3 | 'T1.b, T1.c, T1.a;' |
| 4 | 'T1.c, T1.a, T1.b;' |
| 5 | 'T1.c, T1.b, T1.a;' |
His first approach was to use a dynamic cursor that was built with code something like the following pseudocode. In the actual program, the sort order code came from another procedure, not from a dialog with the user.
BEGIN
DECLARE query1 CHAR(100)
DEFAULT('SELECT * FROM Table1 AS T1 ORDER BY ');
mysortcode INTEGER NOT NULL;
mysortseq CHAR(30) NOT NULL;
CONNECT TO MyDatabase;
/* get the id number for the sort */
WRITELN ('give me a sort order code: ');
READLN (mysortcode);
/* get the text of the Order By clause from Table2 */
SELECT sortseq INTO mysortseq
FROM Table2
WHERE sortnbr = mysortcode;
/* concatenate the strings to build the query */
SET query1 = query1 || mysortseq;
/* prepare and execute the dynamic SQL string */
PREPARE query1;
EXECUTE query1;
/* use the results .. */
...
END;
Can you think of a way to do this sorting problem that would avoid using dynamic SQL? That is, can you write a single SELECT statement that could be put into a static SQL statement of the form "EXECUTE MyAnswer USING . . ." instead?
EXEC SQL SELECT Table1.*,
(CASE WHEN :sortflag = 0
THEN Table1.a || Table1.b || Table1.c
WHEN :sortflag = 1
THEN Table1.a || Table1.c || Table1.b
...
WHEN :sortflag = 5
THEN Table1.c || Table1.b || Table1.a
ELSE ' ' END) AS sortfield
FROM Table1
WHERE ...
ORDER BY sortfield;
This answer makes the query self-contained and removes a redundant table from the schema. Obviously, you will have to CAST( ) the three columns strings if they are not already CHAR( ) data types. I left that part out because it would make the code too hard to read. This statement is static SQL and will run faster than a dynamic statement that must be re-compiled every time.