I have written a few articles and done presentations on the Year 2000 problem, but I've only mentioned it briefly in my column. When speaking about dates, the convention has been to describe formats with pictures, à la Cobol or some versions of Basic. "yyyy" or "ccyy" is a four-digit year. "yy" is a two-digit year or, more properly, year within century. "mm" is the month as a number from 01 to 12. "mmm" is the month as a three-letter abbreviation. "dd" is the day of the month as a number between 01 and 31.
I break the Year 2000 problem down into four separate but related problems:
PCs -- the worst offenders -- have rollover dates of 1980, 1900, 1984, and perhaps others I have not seen yet. Mainframes are safer because they use a longer word length and will not blow up until a few decades into the second millennium.
If you would like to check your PC, you can download a program named DOSCHK from several different Web sites: from the DOSCHK home page (ourworld.compuserve.com/homepages/saphena/year2000.htm), from the home page for Year2000.exe and the RighTime product range (www.rightime.com), from DOSCHK's author Bob Stammers (70630.2720@compuserve.com), from the Year2000 Forum on CompuServe, or from the Year 2000 home page at www.year2000.com. This little program sets the clock to just before midnight on New Year's Day 1999, lets 10 seconds pass so you get a rollover, shows the system time display and what the realtime clock is actually saying, and resets your machine back like it was. You can do the same thing manually or write a short program in many application development tools. You might also want to check out the Risk Forum on usenet at comp.risks.
SQL is not immune to the odometer problem. Some application programs access the realtime clock directly, so software patches do not work. Some programs accept both the realtime clock and the system time. When these programs mess up a date, they can pass their errors into a SQL database.
In case you don't think this is a real problem anymore, on April 10, 1997, Edward L. King (76711.2533@compuserve.com), a systems operator on the CompuServe Year2000 Forum, ran into a month old POS (Point of Sale) credit card system installed in one of the offices where he works. A representative for Visa was talking about the system's Year 2000 compatibility and provided a test for it. The system bombed out. Beware, folks. Just because you got it recently doesn't mean it's right!
Standard ANSI/ISO SQL-92 has a full specification for dates and times that are based on ISO-8601 standards and should have no Year 2000 problems. This standard requires the "yyyy-mm-dd" format and 24-hour time formats based on the UTC (Universal Coordinated Time, formerly the GMT or Greenwich Mean Time) and has a lot of details in it.
Within SQL-92, local times are handled by a schema table of the time zones and their displacements, which can be changed when Daylight Saving Time (DST) comes into effect or another country decides to change its laws to get in line with the European Union conventions.
Too bad almost no products have bothered to implement the SQL standard for dates. Instead, most products only have a local clock and calendar, not a UTC. This is dangerous as we get more and more distributed databases in different time zones. Given two transactions that occur on an airplane moving over the International Date Line, you can have the later transaction given a timestamp that is one day earlier.
They also let you use display formats other than ISO 8601 for input and output. The selections always include several two-digit year formats.
I have a strong recommendation that you should replace all of your rubber date stamps with ones that have a four-digit year and the year-month-day order. If you can find U.S. Army rubber stamps, the three-letter month is not a problem. In fact, people sort, read, and write the Army's "yyyy-mmm-dd" format at least 15 percent more accurately than the "yy/mm/dd" civilian format, according to tests that were done in World War II when data processing was still a manual operation.
Then change all of your display screens, all of your reports, preprinted forms, and anything else you can find with a date on it.
The most common example is "9999-99-99" or "99-99-99" for "eternity"; that is, some indefinite date in the future. This was very handy for expiration dates on warranties. If the program did an edit on the dates, then eternity became "9999-12-31" instead.
There was also "0000-00-00" or "00-00-00" for "negative eternity"; that is, some indefinite date in the past. Likewise, if the program did an edit on the dates, then negative eternity became "0001-01-01" instead. You would also see other codes used in date fields, but the idea was to pick something that would sort either high or lower for reporting purposes.
When you try to migrate legacy data with weird dates into SQL, there are problems. The first impulse is to make all obviously weird dates into nulls. A better way is to split them off as flags (0= valid date, 1= missing date, 2= eternity, etc.) in separate columns and build a table of flag codes for reports, avoiding nulls.
The use of real dates such as "9999-12-31" or "0001-01-01" for these same purposes is fine with SQL, and it will proceed to do arithmetic with them once they are in the database. But you do get weird statistics with them; think about computing the average age of a customer when a lot of them were born in the year one.
FUNCTION leapyear (year: INTEGER): BOOLEAN; BEGIN IF ((year MOD 400) = 0) THEN leapyear := TRUE ELSE IF ((year MOD 100) = 0) THEN leapyear := FALSE ELSE IF ((year MOD 4) = 0) THEN leapyear := TRUE ELSE leapyear := FALSE; END;Technically, carrying the days per year out to four decimal places is still rounding off a bit. If you want to go into painful details adjusting the clock to the exact rotation of the Earth, a minute can be between 58 and 62 seconds. In fact, there was a leap second on June 30, 1997. Please adjust your clocks accordingly, if you did not do so. If you like to check out this sort of thing, tune your Web browser to tycho.usno.navy.mil/leap.html.
You might also look at the time zone files maintained by NIST: ftp://elsie.nci.nih.gov/pub/tzcode1997e.tar.gz and ftp://elsie.nci.nih.gov/pub/tzdata1997f.tar.gz.
Although the SQL Standard says that you have to follow all this stuff, the real world is not always so accurate. I do not know of any SQL product that messes up the leap year in 2000, but you will have have problems communicating with legacy systems that got it wrong.
When a Westerner gives someone's age, she uses the discrete unit model by rounding a duration to a single year and counting the number of years involved. When an Asian gives someone's age, he says that that person is in his or her such-and-such year, using a time period model. A 49-year-old man is in his fiftieth year.
This makes a difference in reporting. Assume that Charlotte was born on December 31, 1980 and that today is January 1, 1997. The day before yesterday, she was 17 years old. Next year, she will be 20 years old. She was 18 years old yesterday and will be 19 years old during all of 1997. If you did a report on the age of your employees a few days apart, you would have had a jump in the age groups, which would be noticeable because Charlotte went from being a minor (under 18) to being a legal adult (21 and over) in this period.
SQL-92 added the OVERLAPS predicate to determine whether or not two chronological periods overlap in time, using both of the interval specifications. The syntax looks like this (ignoring the fact the <end date>s could also be intervals):
(<start date 1>, <end date 1>)
OVERLAPS
(<start date 2>, <end date 2>)
If the start date is NULL or if the end date comes before the start date, then flip them around. The intervals used in the OVERLAPS predicate include their starting point but are open on the end.
The result of the OVERLAPS predicate is the following expression:
(startdate_1 > startdate_2 AND NOT (startdate_1 >= enddate_2 AND enddate_1 >= enddate_2)) OR (startdate_2 > startdate_1 AND NOT (startdate_2 >= enddate_1 AND enddate_2 >= enddate_1)) OR (startdate_1 = startdate_2 AND (enddate_1 <> enddate_2 OR enddate_1 = enddate_2))This formal definition looks tricky, but if you draw pictures of time lines, it will make sense. The definition is saying that a time period is a half-open interval, which includes its starting date but not its ending date. Chris Date regards this as a problem (see Database Programming & Design, March 1997, "Dates and Times in the SQL Standard -- Part 2," page 55), because if the first period is a single date equal to the value of the start of the second period, then the predicate is true, but if the first period is a single date equal to the value of the end of the second period, then the predicate is FALSE.
I regard the use of half-open intervals as one of the best features of the OVERLAPS predicate because:
1.5 hens * 1.5 days * rate = 1.5 eggs
The first urge is to multiple both sides by 2/3 and turn all of the 1.5s into 1s. But what you actually get is:
1.5 hens * 1.5 days * rate = 1.5 eggs
1 hens * 1.5 days * rate = 1 egg
1.5 days * rate = 1 egg per hen
rate = 2/3 egg per hen per day
2. Ten days, not five like you answered without thinking. If one pregnant woman can produce a baby in nine months, can nine pregnant women make a baby in one month?
-- Joe Celko
What did you think of this article? Send a letter to the editor.