"Olson, Arthur David (NIH/NCI)" <olsona@dc37a.nci.nih.gov> writes:
> 1. the transition times are 64 bits rather than 32 bits, doubling the size.
> 2. About 400 years of transitions are recorded rather than about 100,
>    quadrupling the size.
>
> The combination of the two considerations means that the new data takes
> about 8 times as much space as the old, and the total is about 9 times as
> much as the old.

Ah, thanks, that explains it.  I didn't know about (2).

How about if we document this?  Here's a proposed patch to the Theory
file that explains this, along with some other issues that I noticed
when I reread that file:

* Update references to POSIX, etc.

* The tz code does not yet support the quoted time zone abbreviation
  syntax required by POSIX starting in 2001.

* Add an example of a POSIX TZ setting.

--- Theory 2004/05/27 16:00:30 2004.1
+++ Theory 2005/04/25 18:55:19 2004.1.0.1
@@ -12,26 +12,31 @@
 
 ----- Time and date functions -----
 
-These time and date functions are upwards compatible with POSIX.1,
+These time and date functions are mostly upwards compatible with POSIX,
 an international standard for UNIX-like systems.
-As of this writing, the current edition of POSIX.1 is:
+As of this writing, the current edition of POSIX is:
 
-        Information technology --Portable Operating System Interface (POSIX (R))
-        -- Part 1: System Application Program Interface (API) [C Language]
-        ISO/IEC 9945-1:1996
-        ANSI/IEEE Std 1003.1, 1996 Edition
-        1996-07-12
+        Standard for Information technology
+        -- Portable Operating System Interface (POSIX (R))
+        -- System Interfaces
+        IEEE Std 1003.1, 2004 Edition
+        <http://www.opengroup.org/online-pubs?DOC=7999959899>
+        <http://www.opengroup.org/pubs/catalog/t041.htm>
+
+Currently the only POSIX feature not implemented is quoted time zone
+abbreviations, e.g., TZ='<UTC-10>10' for a time zone 10 hours behind
+UTC whose abbreviation is "UTC-10".
 
-POSIX.1 has the following properties and limitations.
+POSIX has the following properties and limitations.
 
-* In POSIX.1, time display in a process is controlled by the
-  environment variable TZ. Unfortunately, the POSIX.1 TZ string takes
+* In POSIX, time display in a process is controlled by the
+  environment variable TZ. Unfortunately, the POSIX TZ string takes
   a form that is hard to describe and is error-prone in practice.
-  Also, POSIX.1 TZ strings can't deal with other (for example, Israeli)
+  Also, POSIX TZ strings can't deal with other (for example, Israeli)
   daylight saving time rules, or situations where more than two
   time zone abbreviations are used in an area.
 
-  The POSIX.1 TZ string takes the following form:
+  The POSIX TZ string takes the following form:
 
         stdoffset[dst[offset],date[/time],date[/time]]
 
@@ -40,6 +45,9 @@ POSIX.1 has the following properties and
   std and dst
         are 3 or more characters specifying the standard
         and daylight saving time (DST) zone names.
+        Starting with POSIX.1-2001, std and dst may also be
+        in a quoted form like "<UTC+10>"; this allows
+        "+" and "-" in the names.
   offset
         is of the form `[-]hh:[mm[:ss]]' and specifies the
         offset west of UTC. The default DST offset is one hour
@@ -61,15 +69,25 @@ POSIX.1 has the following properties and
         where week 1 is the first week in which day d appears, and `5'
         stands for the last week in which day d appears (which may be
         either the 4th or 5th week).
+
+  Here is an example POSIX TZ string, for US Pacific time using rules
+  appropriate from 1987 through at least 2005:
 
-* In POSIX.1, when a TZ value like "EST5EDT" is parsed,
-  typically the current US DST rules are used,
+        TZ='PST8PDT,M4.1.0/02:00,M10.5.0/02:00'
+
+  This POSIX TZ string is hard to remember, and mishandles time stamps
+  before 1987. With this package you can use this instead:
+
+        TZ='America/Los_Angeles'
+
+* POSIX does not define the exact meaning of TZ values like "EST5EDT".
+  Typically the current US DST rules are used to interpret such values,
   but this means that the US DST rules are compiled into each program
   that does time conversion. This means that when US time conversion
   rules change (as in the United States in 1987), all programs that
   do time conversion must be recompiled to ensure proper results.
 
-* In POSIX.1, there's no tamper-proof way for a process to learn the
+* In POSIX, there's no tamper-proof way for a process to learn the
   system's best idea of local wall clock. (This is important for
   applications that an administrator wants used only at certain times--
   without regard to whether the user has fiddled the "TZ" environment
@@ -78,9 +96,9 @@ POSIX.1 has the following properties and
   daylight saving time shifts--as might be required to limit phone calls
   to off-peak hours.)
 
-* POSIX.1 requires that systems ignore leap seconds.
+* POSIX requires that systems ignore leap seconds.
 
-These are the extensions that have been made to the POSIX.1 functions:
+These are the extensions that have been made to the POSIX functions:
 
 * The "TZ" environment variable is used in generating the name of a file
   from which time zone information is read (or is interpreted a la
@@ -108,7 +126,7 @@ These are the extensions that have been
 * To handle places where more than two time zone abbreviations are used,
   the functions "localtime" and "gmtime" set tzname[tmp->tm_isdst]
   (where "tmp" is the value the function returns) to the time zone
-  abbreviation to be used. This differs from POSIX.1, where the elements
+  abbreviation to be used. This differs from POSIX, where the elements
   of tzname are only changed as a result of calls to tzset.
 
 * Since the "TZ" environment variable can now be used to control time
@@ -136,6 +154,18 @@ These are the extensions that have been
 
 Points of interest to folks with other systems:
 
+* In 2005 this package started generating time zone information files
+  containing two sets of data. The first set uses 32-bit time stamps
+  and covers times from 1901-12-13 20:45:52 through 2038-01-19
+  03:14:07 UTC; it is for backward compatibility with older versions of
+  this and other libraries. The second set uses 64-bit time stamps
+  and contains about 400 years of transition times, which are
+  extrapolated into the indefinite future; it is for newer libraries,
+  typically on hosts with 64-bit time stamps. New files are
+  approximately nine times the size of the old, because the added data
+  set contains about four times as many transitions, and its time
+  stamps are twice as wide.
+
 * This package is already part of many POSIX-compliant hosts,
   including BSD, HP, Linux, Network Appliance, SCO, SGI, and Sun.
   On such hosts, the primary use of this package
@@ -173,9 +203,9 @@ Hewlett Packard, offer a wider selection
 beyond those provided here. The absence of such functions from this package
 is not meant to discourage the development, standardization, or use of such
 functions. Rather, their absence reflects the decision to make this package
-contain valid extensions to POSIX.1, to ensure its broad
-acceptability. If more powerful time conversion functions can be standardized,
-so much the better.
+contain valid extensions to POSIX, to ensure its broad acceptability. If
+more powerful time conversion functions can be standardized, so much the
+better.
 
 
 ----- Names of time zone rule files -----
@@ -277,7 +307,7 @@ and `Factory' (see the file `factory').
 ----- Time zone abbreviations -----
 
 When this package is installed, it generates time zone abbreviations
-like `EST' to be compatible with human tradition and POSIX.1.
+like `EST' to be compatible with human tradition and POSIX.
 Here are the general rules used for choosing time zone abbreviations,
 in decreasing order of importance:
 
@@ -292,17 +322,16 @@ in decreasing order of importance:
                 preferred "ChST", so the rule has been relaxed.
 
                 This rule guarantees that all abbreviations could have
-                been specified by a POSIX.1 TZ string. POSIX.1
+                been specified by a POSIX TZ string. POSIX
                 requires at least three characters for an
-                abbreviation. POSIX.1-1996 says that an abbreviation
+                abbreviation. POSIX through 2000 says that an abbreviation
                 cannot start with ':', and cannot contain ',', '-',
-                '+', NUL, or a digit. Draft 7 of POSIX 1003.1-200x
-                changes this rule to say that an abbreviation can
-                contain only '-', '+', and alphanumeric characters in
-                the current locale. To be portable to both sets of
+                '+', NUL, or a digit. POSIX from 2001 on changes this
+                rule to say that an abbreviation can contain only '-', '+',
+                and alphanumeric characters from the portable character set
+                in the current locale. To be portable to both sets of
                 rules, an abbreviation must therefore use only ASCII
-                letters, as these are the only letters that are
-                alphabetic in all locales.
+                letters.
 
                 Use abbreviations that are in common use among English-speakers,
                 e.g. `EST' for Eastern Standard Time in North America.
@@ -343,10 +372,10 @@ abbreviations like `EST'; this avoids th
 Calendrical issues are a bit out of scope for a time zone database,
 but they indicate the sort of problems that we would run into if we
 extended the time zone database further into the past. An excellent
-resource in this area is Nachum Dershowitz and Edward M. Reingold,
-<a href="http://emr.cs.uiuc.edu/home/reingold/calendar-book/index.shtml">
-Calendrical Calculations
-</a>, Cambridge University Press (1997). Other information and
+resource in this area is Edward M. Reingold and Nachum Dershowitz,
+<a href="http://emr.cs.uiuc.edu/home/reingold/calendar-book/second-edition/">
+Calendrical Calculations: The Millennium Edition
+</a>, Cambridge University Press (2001). Other information and
 sources are given below. They sometimes disagree.
 
 
@@ -546,7 +575,7 @@ Sources:
 
 Michael Allison and Robert Schmunk,
 "Technical Notes on Mars Solar Time as Adopted by the Mars24 Sunclock"
-<http://www.giss.nasa.gov/tools/mars24/help/notes.html> (2004-03-15).
+<http://www.giss.nasa.gov/tools/mars24/help/notes.html> (2004-07-30).
 
 Jia-Rui Chong, "Workdays Fit for a Martian", Los Angeles Times
 (2004-01-14), pp A1, A20-A21.
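
To illustrate the Mm.w.d date form used in the example TZ string in the
patch, here is a minimal Python sketch that works out which calendar date
such a rule names in a given year.  It is not part of the tz code; the
helper name posix_m_rule_date is made up for this illustration, and it
ignores the /time suffix and does not handle the other POSIX date forms.

```python
import calendar
import datetime


def posix_m_rule_date(year, rule):
    """Return the date that a POSIX 'Mm.w.d[/time]' rule names in `year`.

    Hypothetical helper for illustration; the /time suffix is ignored.
    """
    date_part = rule.split("/", 1)[0]
    if not date_part.startswith("M"):
        raise ValueError("this sketch handles only the Mm.w.d form")
    month, week, weekday = (int(field) for field in date_part[1:].split("."))
    if not (1 <= month <= 12 and 1 <= week <= 5 and 0 <= weekday <= 6):
        raise ValueError("field out of range in %r" % rule)

    # POSIX counts days 0 (Sunday) through 6 (Saturday);
    # datetime.date.weekday() counts 0 (Monday) through 6 (Sunday).
    first_of_month = datetime.date(year, month, 1)
    first_dow = (first_of_month.weekday() + 1) % 7

    # Day of month of the first occurrence of `weekday`, then step by weeks.
    day = 1 + (weekday - first_dow) % 7 + 7 * (week - 1)

    # Week 5 means "last": fall back to the 4th occurrence if there is no 5th.
    if day > calendar.monthrange(year, month)[1]:
        day -= 7
    return datetime.date(year, month, day)


tz = "PST8PDT,M4.1.0/02:00,M10.5.0/02:00"
_, start_rule, end_rule = tz.split(",")
print(posix_m_rule_date(2005, start_rule))  # 2005-04-03
print(posix_m_rule_date(2005, end_rule))    # 2005-10-30
```

For 2005 this gives the first Sunday in April and the last Sunday in
October, which matches the US rules the example string is meant to encode.
Because the string hard-codes a single rule set, it gives wrong answers for
years in which different rules applied, which is the problem that a setting
like TZ='America/Los_Angeles' avoids.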