DBMS, July 1997
DBMS Online: SQL for Smarties By Joe Celko

Year 2000 -- Threat or Menace?

Joe breaks the Year 2000 problem down into four separate but related problems.

I have written a few articles and done presentations on the Year 2000 problem, but I've only mentioned it briefly in my column. When speaking about dates, the convention has been to describe formats with pictures, à la Cobol or some versions of Basic. "yyyy" or "ccyy" is a four-digit year. "yy" is a two-digit year or, more properly, year within century. "mm" is the month as a number from 01 to 12. "mmm" is the month as a three-letter abbreviation. "dd" is the day of the month as a number between 01 and 31.

I break the Year 2000 problem down into four separate but related problems:

The Odometer Problem

The odometer problem is the fact that not all hardware uses a four-digit year and will not be able to physically represent the year 2000 on the system clock.

PCs -- the worst offenders -- have rollover dates of 1980, 1900, 1984, and perhaps others I have not seen yet. Mainframes are safer because they use a longer word length and will not blow up until a few decades into the second millennium.

If you would like to check your PC, you can download a program named DOSCHK from several different Web sites: from the DOSCHK home page (ourworld.compuserve.com/homepages/saphena/year2000.htm), from the home page for Year2000.exe and the RighTime product range (www.rightime.com), from DOSCHK's author Bob Stammers (70630.2720@compuserve.com), from the Year2000 Forum on CompuServe, or from the Year 2000 home page at www.year2000.com. This little program sets the clock to just before midnight on New Year's Day 1999, lets 10 seconds pass so you get a rollover, shows the system time display and what the realtime clock is actually saying, and resets your machine back like it was. You can do the same thing manually or write a short program in many application development tools. You might also want to check out the Risk Forum on usenet at comp.risks.

SQL is not immune to the odometer problem. Some application programs access the realtime clock directly, so software patches do not work. Some programs accept both the realtime clock and the system time. When these programs mess up a date, they can pass their errors into a SQL database.

The Millennium Problem

This is the one that everyone reads about in the press. People used the "mm-dd-yy" format (or the "dd-mm-yy" format in the U.K. and parts of Europe) in legacy application programs, and now they cannot tell the 1900s from the 2000s.

In case you don't think this is a real problem anymore, on April 10, 1997, Edward L. King (76711.2533@compuserve.com), a systems operator on the CompuServe Year2000 Forum, ran into a month old POS (Point of Sale) credit card system installed in one of the offices where he works. A representative for Visa was talking about the system's Year 2000 compatibility and provided a test for it. The system bombed out. Beware, folks. Just because you got it recently doesn't mean it's right!

Standard ANSI/ISO SQL-92 has a full specification for dates and times that are based on ISO-8601 standards and should have no Year 2000 problems. This standard requires the "yyyy-mm-dd" format and 24-hour time formats based on the UTC (Universal Coordinated Time, formerly the GMT or Greenwich Mean Time) and has a lot of details in it.

Within SQL-92, local times are handled by a schema table of the time zones and their displacements, which can be changed when Daylight Saving Time (DST) comes into effect or another country decides to change its laws to get in line with the European Union conventions.

Too bad almost no products have bothered to implement the SQL standard for dates. Instead, most products only have a local clock and calendar, not a UTC. This is dangerous as we get more and more distributed databases in different time zones. Given two transactions that occur on an airplane moving over the International Date Line, you can have the later transaction given a timestamp that is one day earlier.

They also let you use display formats other than ISO 8601 for input and output. The selections always include several two-digit year formats.

I have a strong recommendation that you should replace all of your rubber date stamps with ones that have a four-digit year and the year-month-day order. If you can find U.S. Army rubber stamps, the three-letter month is not a problem. In fact, people sort, read, and write the Army's "yyyy-mmm-dd" format at least 15 percent more accurately than the "yy/mm/dd" civilian format, according to tests that were done in World War II when data processing was still a manual operation.

Then change all of your display screens, all of your reports, preprinted forms, and anything else you can find with a date on it.

The Weird Dates Problem

This is one problem to which SQL is immune internally. In the old days, the programmers wrote their own date routines in Cobol, Fortran, or whatever language they had. They did not always edit for invalid dates but instead would allow certain "special codes" to be put into the fields.

The most common example is "9999-99-99" or "99-99-99" for "eternity"; that is, some indefinite date in the future. This was very handy for expiration dates on warranties. If the program did an edit on the dates, then eternity became "9999-12-31" instead.

There was also "0000-00-00" or "00-00-00" for "negative eternity"; that is, some indefinite date in the past. Likewise, if the program did an edit on the dates, then negative eternity became "0001-01-01" instead. You would also see other codes used in date fields, but the idea was to pick something that would sort either high or lower for reporting purposes.

When you try to migrate legacy data with weird dates into SQL, there are problems. The first impulse is to make all obviously weird dates into nulls. A better way is to split them off as flags (0= valid date, 1= missing date, 2= eternity, etc.) in separate columns and build a table of flag codes for reports, avoiding nulls.

The use of real dates such as "9999-12-31" or "0001-01-01" for these same purposes is fine with SQL, and it will proceed to do arithmetic with them once they are in the database. But you do get weird statistics with them; think about computing the average age of a customer when a lot of them were born in the year one.

The Leap Year Problem

There are not 365.25 days per year, but actually there are 365.2422 days per year; every 400 years, you accumulate an extra day. The year 2000 is a leap year; here is the formula as a block of Pascal code: The correct rule for leap years in Pascal is:
FUNCTION leapyear (year: INTEGER): BOOLEAN; 
BEGIN 
IF ((year MOD 400) = 0) 
THEN leapyear := TRUE
ELSE IF ((year MOD 100) = 0)
THEN leapyear := FALSE
ELSE IF ((year MOD 4) = 0)
THEN leapyear := TRUE
ELSE leapyear := FALSE;
END;
Technically, carrying the days per year out to four decimal places is still rounding off a bit. If you want to go into painful details adjusting the clock to the exact rotation of the Earth, a minute can be between 58 and 62 seconds. In fact, there was a leap second on June 30, 1997. Please adjust your clocks accordingly, if you did not do so. If you like to check out this sort of thing, tune your Web browser to tycho.usno.navy.mil/leap.html.

You might also look at the time zone files maintained by NIST: ftp://elsie.nci.nih.gov/pub/tzcode1997e.tar.gz and ftp://elsie.nci.nih.gov/pub/tzdata1997f.tar.gz.

Although the SQL Standard says that you have to follow all this stuff, the real world is not always so accurate. I do not know of any SQL product that messes up the leap year in 2000, but you will have have problems communicating with legacy systems that got it wrong.

Overlapping Time Periods

There are three basic ways to represent a chronological period in a database. It can be specified as a pair of starting and ending dates, as a starting date and a duration in days from it, or as a set of discrete dates. The first two methods are really identical, because each one can be converted into the other; they are based on continuous time periods. The third is fundamentally different because it is not a continuous interval, but rather a set of discrete values.

When a Westerner gives someone's age, she uses the discrete unit model by rounding a duration to a single year and counting the number of years involved. When an Asian gives someone's age, he says that that person is in his or her such-and-such year, using a time period model. A 49-year-old man is in his fiftieth year.

This makes a difference in reporting. Assume that Charlotte was born on December 31, 1980 and that today is January 1, 1997. The day before yesterday, she was 17 years old. Next year, she will be 20 years old. She was 18 years old yesterday and will be 19 years old during all of 1997. If you did a report on the age of your employees a few days apart, you would have had a jump in the age groups, which would be noticeable because Charlotte went from being a minor (under 18) to being a legal adult (21 and over) in this period.

SQL-92 added the OVERLAPS predicate to determine whether or not two chronological periods overlap in time, using both of the interval specifications. The syntax looks like this (ignoring the fact the <end date>s could also be intervals):

(<start date 1>, <end date 1>)
OVERLAPS
(<start date 2>, <end date 2>)

If the start date is NULL or if the end date comes before the start date, then flip them around. The intervals used in the OVERLAPS predicate include their starting point but are open on the end.

The result of the OVERLAPS predicate is the following expression:

(startdate_1 > startdate_2 
AND NOT (startdate_1 >= enddate_2 
AND enddate_1 >= enddate_2)) 
OR (startdate_2 > startdate_1 
AND NOT (startdate_2 >= enddate_1 
AND enddate_2 >= enddate_1)) 
OR (startdate_1 = startdate_2 
AND (enddate_1 <> enddate_2 OR enddate_1 = 
enddate_2))
This formal definition looks tricky, but if you draw pictures of time lines, it will make sense. The definition is saying that a time period is a half-open interval, which includes its starting date but not its ending date. Chris Date regards this as a problem (see Database Programming & Design, March 1997, "Dates and Times in the SQL Standard -- Part 2," page 55), because if the first period is a single date equal to the value of the start of the second period, then the predicate is true, but if the first period is a single date equal to the value of the end of the second period, then the predicate is FALSE.

I regard the use of half-open intervals as one of the best features of the OVERLAPS predicate because:

  1. Two half-open intervals concatenated together yield a half-open interval, and they do not duplicate an overlapping date like closed intervals would.
  2. If you remove a half-open interval from the middle of a half-open interval, you get one or two half-open intervals as a result.

Puzzle

Rather than do a SQL problem this month, let me show you that people have trouble calculating with time with two old puzzles.
  1. If a hen-and-a-half can lay an egg-and-a-half in a day-and-a-half, then how many hens does it take to lay six eggs in six days?
  2. If 10 hens can hatch 10 eggs in 10 days, how long does it take 5 hens to hatch 5 eggs?
See answer.


Joe Celko is an Atlanta-based guru with Northern Lights Software Ltd. and a member of the ANSI X3H2 Database Standards Committee. He is also the author of two books on SQL: SQL For Smarties (Morgan-Kaufmann, 1995) and Instant SQL Programming (Wrox Press, 1995). You can contact Joe via CompuServe at 71062,1056 or via email at 71062.1056@compuserve.com.


ANSWER

1. The answer is a hen-and-a-half. People tend to get tripped up on the rate (eggs per hen per day) because they handle time wrong. For example, if a cookbook has a recipe that serves one and you want to serve 100 guests, you increase the amount of ingredients by 100, but you do not cook it 100 times longer. The algebra looks like this, where rate is in terms of "eggs per day," a rather fun unit of measurement:

1.5 hens * 1.5 days * rate = 1.5 eggs

The first urge is to multiple both sides by 2/3 and turn all of the 1.5s into 1s. But what you actually get is:

1.5 hens * 1.5 days * rate = 1.5 eggs
1 hens * 1.5 days * rate = 1 egg
1.5 days * rate = 1 egg per hen
rate = 2/3 egg per hen per day

2. Ten days, not five like you answered without thinking. If one pregnant woman can produce a baby in nine months, can nine pregnant women make a baby in one month?

-- Joe Celko


What did you think of this article? Send a letter to the editor.


Subscribe to DBMS and Internet Systems -- It's free for qualified readers in the United States
July 1997 Table of Contents | Other Contents | Article Index | Search | Site Index | Home

DBMS and Internet Systems (http://www.dbmsmag.com)
Copyright © 1997 Miller Freeman, Inc. ALL RIGHTS RESERVED
Redistribution without permission is prohibited.
Please send questions or comments to dbms@mfi.com
Updated Wednesday, June 18, 1997