I am writing this column in October 1996, the month in which my wife and I bought our first house and I started a new job. I was on business trips while she was trying to locate washing machines, dryers, moving equipment, painters, and muscles, and I was living out of a suitcase on the road. Neither of us found any great joy in this adventure as we poured every bit of cash we had into our new money pit.
As editor, Jim was planning to ship camera-ready copy to the ISO Central Secretariat in Geneva by mid-November. That schedule should have ensured a 1996 date on ISO/IEC 9075-4, although it probably won't be available for purchase until some time in January or even February 1997. Shortly afterwards, you can expect SQL/PSM to be added to the SQL standard as an auxiliary.
The language syntax is in the block structured, or "Algol," family of programming languages. The exception handling and use of terminating keywords looks like Ada, the Department of Defense programming language. I will be writing about SQL/PSM programming, and I will try to use the language in my examples, when I need procedural code from now on. Another pledge of SQL religious purity from me, however, will not help other programmers.
The vendors do not yet support SQL/PSM in their products. Although some vendors' existing procedural languages for triggers and stored procedures are closer to the new standard than others, getting a full implementation of SQL/PSM will take considerable time and effort. Because they already have a procedural language, many vendors will either ignore SQL/PSM or mix it with their own existing languages.
The most horrible example of this bastardization is what happened to Watcom SQL when it became Sybase SQL Anywhere. Watcom was well on its way to becoming a good version of the early working draft of the SQL/PSM when its parent company, Powersoft, was bought out by Sybase. However, rather than change Sybase System 11 to conform to ANSI/ISO standards, Sybase's management decided to allow the use of both Sybase's mutant T-SQL procedural language and Watcom's original SQL-compliant dialect in a Sybase SQL Anywhere program.
Another problem is that many programmers no longer know how to write procedural code. At one point, most programmers wrote unstructured code. This practice was marked by undisciplined programming: goto statements jumping all over the place, hybrid data, redundant code, and all of the other evils. You can read about the structured programming revolution in old books by Ed Yourdon, Tom DeMarco, Larry Constantine, or Chris Gane and Trish Sarson.
The current crop of programmers, however, are object- oriented people who learned to program in Visual Basic, PowerBuilder, or other GUI tools. I realized the extent of this problem when I was at a client site looking at a frequently called Microsoft SQL Server stored procedure written by a Visual Basic programmer. I'll spare you all of the details, but the procedure had to join several tables into a result table to pass to a spreadsheet-style display screen on the client machine, where it would be used for editing documents. The specs for what was to go into the grid varied from situation to situation.
The programmer did not know SQL very well, either. I was able to reduce the total execution time by 40 to 50 percent by changing some of his SQL. In the few days I was at the client site, I only had time to make the SQL changes, but the whole program needed to be rewritten.
I am sure that this programmer thought he was writing structured code, because he only used a goto statement to get to the error routine code at the end of the procedure. However, he had a hybrid data parameter and "lasagna code" all over the program.
For you younger programmers, hybrid data is data that is used as both a control flag and as values, so you never have any idea what it is doing when you see it in a code module. For example, if the variable "version_id" is positive, you should read it as a version identification number, but if it is -1, then it is a flag that tells the program to take another execution path. Dual data is a related error where one variable is used to hold two different types of data. For example, a column named "hat_size_or_IQ" is used to record either the hat size or the IQ of an employee, depending which is bigger or available.
I coined the term "lasagna code" in my column in Information Systems News (now Information Week) in 1982 for a occurrence that was well-known among people teaching structured methods. In those days, we were worried about "spaghetti code," which referred to programs with indiscriminate GOTO statements that jumped forward and backward in the program. This tangled structure left you with a path of control through the program that looked like a plate of spaghetti.
The standard solution was to replace the GOTOs with nested IF-THEN-ELSE statements and switches that were layered so deep and had so much redundancy that you had a path of control that looked like a plate of lasagna instead. When you have a program with complicated logic, you really should make a decision table and code from it.
SQL/PSM programmers will not return completely to the old structured programming methods, because they now have access to non-procedural SQL statements and a very different flow control model than their ancestors. As one example, the SQL/PSM lets you declare a typical begin-end block or a fancier begin atomic-end block. The atomic option says that all of the statements in the block must execute successfully or they are all rolled back. There is probably a good book in this somewhere.
Many programming languages define the modulo function, MOD(n,m), only for positive values of both n and m, and leave the result implementation-dependent when either of the parameters is zero or negative. However, negative values for n do have a very nice mathematical interpretation. Let's start with the formal definition of MOD(n,m) as it stands in the SQL3 draft document:
If m is positive, then the result is the unique non-negative exact numeric quantity r with scale 0 such that r is less than m and n = (m X K) + r for some exact numeric quantity K with scale 0.
Len wanted to add that otherwise, the result is an implementation-defined exact numeric quantity r with scale 0, which satisfies the requirements that r is strictly between m and -m, and that n = (m X K) + r for some exact numeric quantity K with scale 0, and a completion condition is raised: warning -- implementation-defined result.
This definition guarantees that the mod function, for a given positive value of m, will be a homomorphism under addition from the mathematical group of all integers, under integer addition, to the modulus group of integers {0, 1, . . ., (m - 1)} under modulus addition. This definition preserves the following group properties:
1) The additive identity is preserved:
MOD(0,m) = 0
2) Additive inverse is preserved:
MOD(-n,m) = -MOD(n,m) -- in the modulus group = mod(-MOD(n,m),m) = m - MOD(n,m)
3) The addition property is preserved:
MOD(x+y,m) = MOD(x,m) + MOD(y,m) -- where "+" is modulus addition = MOD((MOD(x,m) + MOD(y,m)),m)
4) As a result of the above, subtraction is preserved:
MOD(x-y,m) = MOD(x,m) - MOD(y,m) where "-" is modulus subtraction = mod((mod(x,m) - mod(y,m)),m)
From the proposed new definition, we would get the following:
MOD(12,5) = 2 because (5 X K) + 2 = 12 (5 X K) = 10 K = 2
and when we make the number negative and keep the same modulus:
MOD(-12,5) = 3 because (5 X K) + 3 = -12 (5 X K) = -15 K = -3
In some applications, the "best" result to MOD(-12,5) might be "-2" or "-3" rather than "3"; that is probably why various implementations of the MOD function differ. You might want to check out what your favorite programming language does with positive and negative values in its MOD( ) function.
CREATE TABLE Names
(name CHAR(10) NOT NULL PRIMARY KEY);
INSERT INTO Names VALUES ('Al');
INSERT INTO Names VALUES ('Ben');
INSERT INTO Names VALUES ('Charlie');
INSERT INTO Names VALUES ('David');
INSERT INTO Names VALUES ('Ed');
INSERT INTO Names VALUES ('Frank');
INSERT INTO Names VALUES ('Greg');
INSERT INTO Names VALUES ('Howard');
INSERT INTO Names VALUES ('Ida');
INSERT INTO Names VALUES ('Joe');
INSERT INTO Names VALUES ('Ken');
INSERT INTO Names VALUES ('Larry');
INSERT INTO Names VALUES ('Mike');
A simple "SELECT name FROM Names ORDER BY name;" returns the original list in alphabetic order. Suppose, however, that you wanted to display the names three across, like this:
Results
| name1 | name2 | name3 |
| ======================== | ||
| Al | Ben | Charlie |
| David | Ed | Frank |
| Greg | Howard | Ida |
| Joe | Ken | Larry |
| Mike | NULL | NULL |
or four across:
Results
| name1 | name2 | name3 | name4 |
| ============================== | |||
| Al | Ben | Charlie | David |
| Ed | Frank | Greg | Howard |
| Ida | Joe | Ken | Larry |
| Mike | NULL | NULL | NULL |
or any other number across? Can you write single SQL statements that will generate each of these results? (See Puzzle Answer.)
| name1 | name2 |
| =============== | |
| Al | Ben |
| Charlie | David |
| Ed | Frank |
| Greg | Howard |
| Ida | Joe |
| Ken | Larry |
| Mike | NULL |
The self outer join will put the lower alphabetical ranked name in the first column. The MIN( ) aggregate function will then pick the nearest remaining me from the group of names after N1.name. This combination gives immediate successors in the pairs of names.
The WHERE clause is the real trick. We want to find the values of N1.name that will start each row in the desired result table and use that list of names to define the result set. In this case, that result set would be the first name ("Al"), third name ("Charlie"), and so on in the alphabetized list.
Start with an experimental table that looks like this:
SELECT A.name, B.name FROM Names AS A INNER JOIN Names AS B ON A.name <= B.name GROUP BY A.name;Using four names, the ungrouped table would look like:
| A.name | B.name |
| ==================== | |
| Al | Al |
| Al | Ben |
| Al | Charlie |
| Al | David |
| -------------------- | |
| Ben | Ben |
| Ben | Charlie |
| Ben | David |
| -------------------- | |
| Charlie | Charlie |
| Charlie | David |
| -------------------- | |
| David | David |
The predicate (MOD(COUNT(A.name),2) = 0 will find what we want. This predicate is fine for even numbers, but if we have an odd number of people (insert "Ed" into the example), we need to get that "orphaned" row into the result table. You can do this by knowing the total number of rows in the original table and using it to adjust the selection of the first column in the final result table. I am skipping some algebra, but you can work it out easily.
Instead of doing the cases for three and four across, let's jump directly to five across to show how the solution generalizes:
SELECT N1.name, MIN(N2.name) AS name2, MIN(N3.name) AS name3, MIN(N4.name) AS name4, MIN(N5.name) AS name5 FROM (Names AS N1 LEFT OUTER JOIN Names AS N2 ON N1.name < N2.name) LEFT OUTER JOIN Names AS N3 ON N1.name < N2.name AND N2.name < N3.name LEFT OUTER JOIN Names AS N4 ON N1.name < N2.name AND N2.name < N3.name AND N3.name < N4.name LEFT OUTER JOIN Names AS N5 ON N1.name < N2.name AND N2.name < N3.name AND N3.name < N4.name AND N4.name < N5.name WHERE N1.name IN (SELECT A.name FROM Names AS A INNER JOIN Names AS B ON A.name <= B.name GROUP BY A.name HAVING MOD(COUNT(B.name), 5) = (SELECT MOD(COUNT(*),5) FROM Names)) GROUP BY N1.name ORDER BY N1.name;Results
| name1 | name2 | name3 | name4 | name5 |
| ====================================== | ||||
| Al | Ben | Charlie | David | Ed |
| Frank | Greg | Howard | Ida | Joe |
| Ken | Larry | Mike | NULL | NULL |
-- Joe Celko