Time zone package: the next generation

Over the last year we've done as much as possible, within the existing time zone package framework, to cope with systems with 64-bit time_t values. But the zone information compiler (zic) still produces binary files with 32-bit transition time values. Something's gotta give.

As long as we're making changes, it's best to do as much as possible (to avoid the need for further change down the road).

I've listed problems, approaches, and questions below. Much of this material is related to general matters of time rather than specific matters of time zones; my apologies.

PROBLEMS

* Future transition times/past transition times
  The binary files produced by zic record transition times as 32-bit values; times after 2038 (or before 1901) cannot be represented. (The future limit can be extended to 2106 by treating the values as unsigned, but if that's done times before 1970 cannot be represented.)

* Transitions in Israel
  Israel now goes back to standard time in the fall on the Saturday before Yom Kippur; there's no convenient way to represent this in the input to zic.

* Julian-Gregorian transition
  Signed 32-bit time_t values can only represent years going back to 1901; this means that for most areas of the world the Gregorian calendar is in effect for all times representable by such a time_t. Signed 64-bit time_t values have a far greater range; that range always includes all instants when areas switched from Julian to Gregorian. There's no provision for handling the switch in the time zone package. (The transition happened at different times in different places; in addition to handling the jump over a number of days, there's also the matter of figuring out whether a year ending in 00 is a leap year in a particular place.)

* Year zero
  Some year numbering schemes skip over the year zero; others do not. There's no provision for specifying whether or not to skip in the time zone package.

* Early years of Julian calendar
  Leap years were inserted every three (rather than four) years early in the life of the Julian calendar; some leap years were skipped later to make up for this. Month lengths (and names) were also in flux for a while. Documentation of the glitches is shaky; there's no way to reflect what documentation we do have.

* Pre-Julian calendars
  The time zone package cannot handle information about the Roman Republic calendar or any of its predecessors.

* Non-Julian-Gregorian calendars
  The time zone package cannot handle information about non-Julian-Gregorian time schemes (Mayan, Martian, and so on).

* Big Bang/Big Crunch
  Signed 64-bit time_t values have enough range to go back to the theorized time of the Big Bang origin of the universe and thus back to the start of time itself. Some folks might want to fold all instants "before" the Big Bang into that instant. At the other end, advocates of the Big Crunch theory might want to treat time_t values greater than the predicted Crunch instant as if they were the Crunch instant. There's no way to do such pegging in the time zone package.

* Creation/Apocalypse
  Some folks might want to peg past times at a predicted time of Creation and peg future times at a predicted Apocalypse.

APPROACHES

* Do nothing
  Since things don't get sticky until at least 2037, it's possible to wait (at least for a while) before taking action.

* Tweak the binary file format
  At the least this would involve widening stored transition times beyond 32 bits. It might also be necessary to widen offsets as a way of coping with Julian/Gregorian shifts and year zero skips.

* Abandon binary files
  We're now operating on the "terminfo" model, in which human-readable descriptions are converted to binary form (with some precomputation done) for use by programs. We could shift to the earlier "termcap" model, simply copying files such as "asia" and "northamerica" to a public directory and interpreting them at run time. This eliminates the need to change binary file formats (since such files disappear); there might still be a need to change the source file format if we wanted to do things such as handle Julian/Gregorian transitions. Responsibly taking this approach would involve learning why the termcap-to-terminfo transition occurred, and whether the reasons still apply in today's computing environment.

* Preprocess
  We could change zic so that for each zone it outputs a file with only those Rule and Zone lines required for the zone; there could be some simplification of the output (such as expressing all times in UTC) to ease interpretation. Again there would be run-time interpretation, but the job would be scaled down (by pre-identification of relevant data) and simplified.

* Change zic's output to another format
  We could take the vzic route of reading the existing source files and producing VZONEINFO format output. We might want to extend the VZONEINFO format (for example, to handle leap seconds). We might want to produce output in some other existing format, or in a newly designed format.

QUESTIONS

* Do we handle Julian/Gregorian transitions?
* Do we allow control of skipping the year zero?
* Do we handle early-Julian leap year variations?
* Do we handle pegging of far-past and far-future times?
* For each of the above:
  * What default assumption should be used in zic and in run-time software?
  * Can the assumption be overridden in a time zone source file? If so, how?
  * Can the assumption be overridden using an environment variable? If so, how?
  * Can the assumption be overridden with function calls? If so, how?
* Do we simplify handling of events tied to non-Gregorian calendars (such as Yom Kippur)?
* Do we handle pre-Julian or non-Julian-Gregorian time schemes?
* When Sherman types...
      %horton TZ=Europe/Rome wayback March 15 -44
  ...will he and Mr. Peabody witness an assassination?

So... before discussing any of the above in detail... are the problems, approaches, and questions above correct and complete? If not, what should be changed?

--ado
On a practical note, there is no way I'd be willing to trade binary compatibility for any of the benefits listed, at least until we're approaching 2038. Old binaries never die until there's a compelling reason, as in the Y2K scenario. I suggest that the compiler needs to keep generating the current 32-bit output format for the foreseeable future. It might also produce an enhanced version for new applications, but that has to be in addition to, not instead of, the old format. I'm OK with the idea of parallel zoneinfo and newzoneinfo directories.

I suggest that the new format should use an XML-based structure, not binary. That should make the files portable between machines with different endianness and time_t sizes. Yes, text is more expensive to read, but that's the way the whole industry is going.

Stuart Taylor
Cisco Systems
-----Original Message----- From: Olson, Arthur David (NIH/NCI) [mailto:olsona@dc37a.nci.nih.gov] Sent: Thursday, March 03, 2005 2:49 PM To: Tz (tz@elsie.nci.nih.gov) Subject: Time zone: the next generation
On Thu, Mar 03, 2005 at 02:49:27PM -0500, Olson, Arthur David (NIH/NCI) wrote:
But the zone information compiler (zic) still produces binary files with 32-bit transition time values. Something's gotta give.
As long as we're making changes, it's best to do as much as possible (to avoid the need for further change down the road).
I've listed problems, approaches, and questions below. Much of this material is related to general matters of time rather than specific matters of time zones; my apologies.
My opinion is that opening up the scope of the TZ database to include all historic calendars used before the introduction of standardized time zones in the 19th century is too ambitious, as is including non-earth-referenced clocks (e.g., Martian time). While I agree that some flexibility about what calendar is used is in order (so that, for example, we might handle the new Israeli time zone rules by reference to the Hebrew calendar), I think the primary problem that this code needs to address is the simpler, but already complex, one of:

Given:
* a location on earth [probably specified broadly, as with the current tz zone-name based approach, but perhaps we can include polygon information allowing a latitude+longitude based selection?],
* a "Modified Julian Date" (Gregorian 1970-01-01 CE == MJD 40587) [specified relative to a convenient-to-the-program location on earth, not necessarily the location mentioned above],
* a reference clock (TAI/GPS, UTC/UTS/UT1, other),
* and a count-of-seconds-since-epoch on that reference clock [note that TAI- and GPS-based counters would count SI seconds, but UTC- and UTS-based ones would only count "non-leap" seconds, and I guess a UT1 clock would probably have "seconds" of irregular lengths?]

return:
* the local MJD
* the local (zone-adjusted) time

Beyond that, we will of course want to comply with the C and POSIX time APIs, and so will need to at least translate between MJD values and proleptic Gregorian calendar dates. Slightly more ambitious, but still within the bounds of reason, we may wish to add support for the Hebrew calendar (which we would need to do internally anyway, if we wish to support the new time zone rules in Israel without having to resort to the current solution of special-casing every year). We may also wish to add ephemeris calculations so that we can correctly adjust for local-sun-time, whether for Saudi Arabia in the late 1980s, or for pre-timezone locales (or not).
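[The "return the local MJD" step above can be sketched, for the simplest case of a UTC-style clock that counts only non-leap seconds, using the MJD 40587 correspondence given in the message; utc_offset here is a hypothetical zone adjustment in seconds, not an actual tz API parameter.]

```c
#include <stdint.h>

/* Local MJD from a UTC-style seconds-since-epoch count.
   40587 is the MJD of 1970-01-01, as noted above. */
static int64_t local_mjd(int64_t utc_seconds, int32_t utc_offset)
{
    int64_t local = utc_seconds + utc_offset;
    int64_t days = local / 86400;
    if (local % 86400 < 0)
        days--;                /* floor division for pre-1970 instants */
    return 40587 + days;
}
```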
We should certainly leave the derived MJD exposed for other APIs to translate into dates on other calendars, but once again I claim that, just as we are less complete about time zone definitions prior to 1970, this code base need not directly cater to other calendar expressions.

As to "zic file" format: I agree that we should revisit the choice of compiling the zone files to a binary format. Absent a good and still-valid reason for it, I'd much prefer that we go to a "termcap-like" model. I'd say that the primary human-maintained tables would include many (all?) of the features that we currently have (such as local-time based transition references, aliases/links, and shared zone-transition rules), but the installed run-time version, while still text that can be edited (so that local installations can make tweaks before official updates are available), should be pre-processed as much as practical (e.g., eliminating external cross-references and converting all times to UTC). To that end, I'll toss out a "pre-alpha" idea about what an entry might hold (omitting a fair amount of complexity that will eventually be required):

    tzversion:tzcode-zic/2005f
    name:Asia/Jerusalem
    clock:UTC
    valid_start:53795
    valid_end:open
    standard_abbr:IST
    standard_offset:+7200
    daylight_abbr:IDT
    daylight_offset:+10800
    daylight_start_day:dow=5 & ((mon=4g & mday=1g) | (mon=3g & 25g<mday))
    daylight_start_time:0
    daylight_end_day:dow=5 & mon=1H & 1H < mday & mday<9H
    daylight_end_time:82800

Note that while in Israel the daylight_end_day would be on the *Saturday* (dow=6) preceding 10 Tishri (2H<mday & mday<10H), this entry has been preprocessed to reference the date/time in UTC. Also, the use of "g" and "H" suffixes to reference the Gregorian and Hebrew calendars is almost certainly a bad notation, but I needed something to use for my example...

(NB: MJD 53795 = 2005-03-01, when the law passed in the Knesset; a different start date might be more appropriate, I'm not sure.)
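[For illustration only: the Gregorian part of that daylight_start_day predicate could be evaluated at run time roughly as below. dow follows tm_wday numbering (Sunday = 0, so Friday = 5); the entry notation itself is, as the message says, provisional, and this function is an invented sketch, not tz package code.]

```c
#include <stdbool.h>

/* Evaluate dow=5 & ((mon=4 & mday=1) | (mon=3 & 25<mday)):
   true for the Friday falling in the window March 26 .. April 1. */
static bool is_daylight_start_day(int dow, int mon, int mday)
{
    return dow == 5
        && ((mon == 4 && mday == 1) || (mon == 3 && mday > 25));
}
```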
Thinking about the broader problem a little more, perhaps it would make sense to use XML for the run-time format? One of the first problems I see with the notation I used above is that it gets quite verbose and redundant for zones which have had many changes only in the start-day/end-day rules. Two good things about XML are that there are some good and fast parsers out there, and it is a well-known standard, allowing other applications to easily leverage our data. The bad thing is that it would either add an external dependency to the code or require that we bundle a parser.

Anyway, there's my first 0.02 euros on the subject.

--Ken Pizzini
Ken Pizzini said:
* a "Modified Julian Date" (Gregorian 1970-01-01 CE == MJD 40587) [specified relative to a convenient-to-the-program location on earth, not necessarily the location mentioned above]
Why MJD and not true Julian Date?
Slightly more ambitious, but still within the bounds of reason, we may wish to add support for the Hebrew calendar (which we would need to do internally anyway, if we wish to support the new time zones in Israel without having to resort to the current solution of special-casing every year).
There are other purely algorithmic calendars that could be added relatively simply as well.
We may also wish to add ephemeris calculations so that we can correctly adjust for local-sun-time, whether for Saudi Arabia in the late 1980s, or for pre-timezone locales (or not).
I'm a little more nervous of this, because you start to run into issues like leap seconds that could affect the answers.
Thinking about the broader problem a little more, perhaps it would make sense to use XML for the run-time format?
Very definitely. Looking at your (elided) example I can see several places where a nested structure would be preferable, and once you've gone that way you might as well do XML.
The bad thing is that it would either add an external dependency to the code, or require that we bundle a parser.
If you assume that the incoming files are lexically correct, a parser is actually pretty simple.

-- 
Clive D.W. Feather | Work: <clive@demon.net>     | Tel: +44 20 8495 6138
Internet Expert    | Home: <clive@davros.org>    | Fax: +44 870 051 9937
Demon Internet     | WWW: http://www.davros.org  | Mobile: +44 7973 377646
Thus plc           |                             |
On Mon, Mar 07, 2005 at 11:36:19AM +0000, Clive D.W. Feather wrote:
Ken Pizzini said:
* a "Modified Julian Date" (Gregorian 1970-01-01 CE == MJD 40587) [specified relative to a convenient-to-the-program location on earth, not necessarily the location mentioned above]
Why MJD and not true Julian Date?
No good technical reason. I've just personally encountered MJD more often than true Julian Date, and MJD's use of midnight for the day transition aligns better with our uses than true Julian Date's use of noon. Either one would work well, though.

--Ken Pizzini
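[For reference, the two scales differ only by a constant, with the fractional part carrying the noon-versus-midnight difference just described; this helper is a trivial illustration, not part of any proposed API.]

```c
/* MJD = JD - 2400000.5: Julian Date days begin at noon,
   MJD days at midnight, hence the .5 in the constant. */
static double jd_from_mjd(double mjd)
{
    return mjd + 2400000.5;
}
```

So MJD 40587 (1970-01-01, midnight UT) corresponds to JD 2440587.5.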
On Mon, Mar 07, 2005 at 11:36:19AM +0000, Clive D.W. Feather wrote:
Thinking about the broader problem a little more, perhaps it would make sense to use XML for the run-time format?
Very definitely. Looking at your (elided) example I can see several places where a nested structure would be preferable, and once you've gone that way you might as well do XML.
The bad thing is that it would either add an external dependency to the code, or require that we bundle a parser.
If you assume that the incoming files are lexically correct, a parser is actually pretty simple.
On further reflection I'm less convinced that XML is directly useful (though I'm not opposed to using it for secondary reasons of interchange with other applications, if someone wants to argue that case), as I belatedly recall exactly what zic is currently doing and realize that most of the complexity I was contemplating just doesn't need to be there: for all dates in the past, zic knows the precise timestamp to use for each transition (to the best of the knowledge encoded in the tzdata files). It is only for specifying the last pair (or larger set?) of "until max" rules that an algorithmic representation in the run-time data makes any potential sense. And as Robert Elz is pointing out, the case can be made that precomputing estimated transition rules for N years into the future of a given zic run is probably good enough.

So, based on the discussion so far and further reflection, I see the following points for "TZ-ng":

* The tzfile format is basically sound. Suggested extensions:
  . widen timestamps to 64 bits, of course;
  . add one (or a few?) versioning field(s) --- while the tzh_magic field with a different TZ_MAGIC should be adequate for "version of tzfile", it'd be nice to record something of the character "compiled by tzcode-2004a/zic from tzdata-2005d/africa";
  . add a "time reference" field --- have the file document whether the transitions are on a TAI ("right") or a UTC ("posix") clock, for example (see my wish-list item below for another potential class of values);
  . add support for additional "optional" extension data --- the code written such that it will ignore unknown extensions. One idea for such a future extension is to include polygon data describing the geographic region covered by the zone. (I'm not sure that such data really belongs in tzfiles, but I'm also not completely convinced that it doesn't. The issue is that the name of the zone is mostly arbitrary; it is the spatial and temporal boundaries that really identify a zone.)
* The complexity of interpreting rules on different calendars is all pushed into the preprocessing done by zic; the run-time code need not know anything about them. (Current needs include Gregorian, Hebrew, and Persian. Future needs might include Islamic, Eastern Orthodox (like Gregorian, but with different "multiples of 100" rules), Chinese, and Japanese, but we should wait until such a need actually arises before worrying about them.) [Did any country which used the Julian calendar in the last 100 years or so (e.g., Tsarist Russia) ever observe daylight saving transitions based on that system of dates?] Such support can be added at any convenient time, before or after the switch to 64-bit timestamps in tzfile; in the interim we'll just continue to use the work-around currently employed for Iran and Israel: embed a bunch of special-case entries in the tzdata source, based on external conversion to Gregorian dates.

* The run-time APIs in this implementation should continue to be limited to the (proleptic) Gregorian calendar (the one mandated by the C and POSIX APIs), with no externally visible change. Though I still slightly favor the ability to expose a Julian day ("modified" or not), in light of the above I am also willing to say that applications which wish to work with dates in non-Gregorian calendars can just base their interconversions on the (tm_year, tm_yday) pair instead. Such applications as can handle things like Sweden's multiple transitions to the Gregorian calendar, or the calendrical chaos in Rome around the time of Julius Caesar's reign, or the Mayan calendar, or the World calendar, or any other manner of ways that the days have been marked (actual or proposed) in different places and times are quite welcome, but outside the scope of this project.

* An item that is still on my personal wish-list (but I'm now questioning whether the complexity is justified) is support for "zoneless" times based on the local sun (real/apparent, and/or mean). I mentioned Saudi Arabia in my earlier posting, but really my interest is for times in the pre-standard-time past, and perhaps as a sane "best guess" for dates between the "N years into the future" cut-off and such time as our projections of earth rotation become notably inaccurate (by which I mean that the usefulness of the guess goes down as the error bars on the projection expand; the code can probably be left to blithely calculate "local time" beyond the heat-death/big-crunch/whatever of the universe). Like the addition of support for non-Gregorian calendars in zic, this can mostly be deferred as something independent of the redefinition of the tzfile format. The only support that might be helpful is a means to annotate "use sun angle at meridian N" (and whether that is the real or mean sun) as an alternative to "UTC" or "TAI". (Or in addition to: have the code fall back to sun time when the date is outside the range of years covered by zone information?)

* An "it might be nice" item that is neither strongly required nor particularly hard to provide is a tzdata-to-XML translator. This probably should have options to either output what is essentially tzfile data in XML format, or to re-interpret the tzdata files in XML form. The main justification for this is that it would make it easier for other applications to import our hard-won data without having to build custom parsers or tzfile readers. I'm also curious as to whether an XML-based variant of the tzdata file would be any easier to use/edit/maintain, if someone else is motivated to do the experiment (my guess is that it would not be, which is why I'm not making the effort myself).

Cheers,
--Ken Pizzini
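[The tzfile extension fields listed above might, purely hypothetically, be recorded along the following lines. This is an invented illustration of what a "TZ-ng" header could carry — field names and sizes are made up here, and this is not the real tzfile layout.]

```c
#include <stdint.h>

/* Hypothetical extended header sketch.  On disk the integers would
   be stored big-endian, as in the existing tzfile format, and any
   unknown trailing extension records would be skipped by readers. */
struct tzhead_ng {
    char     tzh_magic[4];        /* a new TZ_MAGIC value */
    char     tzh_provenance[64];  /* e.g. "tzcode-2004a/zic from tzdata-2005d/africa" */
    uint8_t  tzh_clock;           /* 0 = UTC ("posix"), 1 = TAI ("right") */
    uint32_t tzh_timecnt;         /* count of 64-bit transition times to follow */
};
```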
Ken Pizzini <"tz."@explicate.org> writes:
[Did any country which used the Julian calendar in the last 100 years or so (e.g., Tsarist Russia) ever observe daylight saving transitions based on that system of dates?]
Yes. For example, according to our current data Moscow observed daylight-saving time in 1917, when the Julian calendar was still the de facto and de jure calendar. This was back when Moscow was normally 2 hours, 30 minutes, 48 seconds ahead of GMT. This wasn't "Tsarist Russia", though, as the Tsar was overthrown before daylight-saving time was introduced. Russia is a bit of a special case. It didn't even adopt the _Julian_ calendar until 1700! (It used the Byzantine calendar before that.)
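[As an aside, that odd pre-revolutionary Moscow offset of 2 hours, 30 minutes, 48 seconds works out as below when expressed as a single count of seconds east of GMT, the form in which zic stores UT offsets; the function is a trivial illustration, not tz code.]

```c
/* Moscow's old GMT offset, 2:30:48, as a plain seconds count. */
static int moscow_lmt_offset(void)
{
    return 2 * 3600 + 30 * 60 + 48;   /* = 9048 seconds */
}
```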
Ken Pizzini said:
On further reflection I'm less convinced that XML is directly useful (though I'm not opposed to using it for secondary reasons of interchange with other applications, if someone wants to argue that case),
Well, that's a good argument in itself.
* The tzfile format is basically sound. Suggested extensions:
  . widen timestamps to 64 bits, of course;
  . add one (or a few?) versioning field(s) --- while the tzh_magic field with a different TZ_MAGIC should be adequate for "version of tzfile", it'd be nice to record something of the character "compiled by tzcode-2004a/zic from tzdata-2005d/africa";
  . add support for additional "optional" extension data --- the code written such that it will ignore unknown extensions.
All three of these indicate that we ought to move to a system-independent representation of data. That means either a textual format or a self-describing data format such as ASN.1. I think that textual is far preferable. Once you decide on textual, the choice is between using a standard one or inventing your own. I don't see any significant benefits in not using XML.
* The complexity of interpreting rules on different calendars is all pushed into the preprocessing done by zic; the run-time code need not know anything about them.
No argument there. But I think that both input and output should be XML, preferably with compatible schemas. Actually, thinking further, the zic output could contain both the "compiled" form *and* the original data, so that someone possessing the output can see how it was derived.
* The run-time APIs in this implementation should continue to be limited to the (proleptic) Gregorian calendar, (the one which is mandated by the C and POSIX APIs) (no externally visible change).
Though I still slightly favor the ability to expose a Julian day ("modified" or not), in light of the above I am also willing to say that applications which wish to work with dates in non-Gregorian calendars can just base their interconversions on the (tm_year, tm_yday) pair instead.
Given how easy it is to convert between proleptic Gregorian and JD/MJD, I think we should provide these interfaces as a matter of convenience and to prevent endless reinvention of wheels.
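[One sketch of such a convenience interface: a proleptic-Gregorian-date-to-MJD conversion built on the standard era-based day-count algorithm. The function name is invented here; this is an illustration of the easy conversion being discussed, not a proposed tz API.]

```c
#include <stdint.h>

/* Proleptic Gregorian (y, m, d) -> Modified Julian Date. */
static int64_t mjd_from_civil(int64_t y, int m, int d)
{
    y -= m <= 2;                                   /* year starts in March */
    int64_t era = (y >= 0 ? y : y - 399) / 400;
    int yoe = (int)(y - era * 400);                /* [0, 399] */
    int doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;
    int doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    int64_t days_since_epoch = era * 146097 + doe - 719468;
    return days_since_epoch + 40587;               /* MJD of 1970-01-01 */
}
```

A quick check: 1858-11-17 (the MJD epoch) maps to 0, and 1970-01-01 to 40587.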
Such applications as can handle things like Sweden's multiple transitions to the Gregorian calendar or the calendrical chaos in Rome around the time of Julius Caesar's reign, or the Mayan calendar, or the World calendar, or any other manner of ways that the days have been marked (actual or proposed) in different places and times are quite welcome, but outside the scope of this project.
Fine. Again, though, an XML format would mean that the information could be bundled into the same files if desired. Ditto polygons, etc.
On Thu, 3 Mar 2005, Olson, Arthur David (NIH/NCI) wrote:
* Transitions in Israel Israel now goes back to standard time in the fall on the Saturday before Yom Kippur; there's no convenient way to represent this in the input to zic. [...]
On Sat, 5 Mar 2005, Ken Pizzini wrote:
    name:Asia/Jerusalem
    clock:UTC
    valid_start:53795
    valid_end:open
    standard_abbr:IST
    standard_offset:+7200
    daylight_abbr:IDT
    daylight_offset:+10800
    daylight_start_day:dow=5 & ((mon=4g & mday=1g) | (mon=3g & 25g<mday))
    daylight_start_time:0
    daylight_end_day:dow=5 & mon=1H & 1H < mday & mday<9H
    daylight_end_time:82800

Note that while in Israel the daylight_end_day would be on the *Saturday* (dow=6) preceding 10 Tishri (2H<mday & mday<10H),
Please note that it is the Saturday _night_ before Yom Kippur -- i.e. in reality (and this is what is stated in the final wording of the law), the last _Sunday_ before Yom Kippur at 02:00 a.m. Moreover, please note that Iran uses the Persian calendar for both the starting and ending dates of Daylight Saving Time, so that calendar too would have to be part of the software.

___________________________________________________________________________
Ephraim Silverberg, CSE System Group,      Phone number: 972-2-6585521
Hebrew University, Jerusalem, Israel.      Fax number:   972-2-5617723
WWW: http://www.cs.huji.ac.il/~ephraim     E-mail: ephraim@cse.huji.ac.il
"Olson, Arthur David (NIH/NCI)" <olsona@dc37a.nci.nih.gov> writes:
QUESTIONS
(1) Do we handle Julian/Gregorian transitions?
(2) Do we allow control of skipping the year zero?
(3) Do we handle early-Julian leap year variations?
My kneejerk reaction is that these are all low-priority issues. It's very hard to get reliable data for (1) -- even harder than getting reliable data for DST transitions. The few people who care about these things will most likely argue about them, and I'd hate to be the arbiter. I would suggest handling (1)-(3) as POSIX-locale LC_TIME issues, rather than as issues in the TZ database itself.
(4) Do we handle pegging of far-past and far-future times?
Again, I wouldn't bother, or I'd defer it to LC_TIME.
(5) Do we simplify handling of events tied to non-Gregorian-calendar-related events (such as Yom Kippur)?
This would help simplify the database for Iran (Persian calendar) and Israel (Hebrew calendar), and presumably for some other locales (Islamic calendar -- currently this isn't analyzed well). On the downside, though, is that there is not an algorithmic conversion between the Gregorian and the Persian calendar -- as I understand it, there will be some not-yet-exercised human judgment for dates after around 2050 or so. I wouldn't be surprised if there were similar issues for the other calendars. It might be better to let sleeping dogs lie. Or if we do attack the problem, generalize the database format well enough so that it can specify the conversion functions. (Ouch!)
(5) Do we handle pre-Julian or non-Julian-Gregorian time schemes?
Again, I'd defer this matter and use an LC_TIME-like approach (sorry if this is fuzzy).

I agree that it might make sense to go with a VZONEINFO-like approach, though maybe these days we'd be better off designing our own format atop XML.

Here are some more questions for your list:

(6) Do we add support to represent time zone abbreviations in other locales, e.g., HNE for Eastern Standard Time for French writers?

(7) Do we support sub-second time stamps (e.g., POSIX struct timespec with its ns resolution) and time zone offsets that are not an integer number of seconds (e.g., Amsterdam time, 1835-1937)?
On Sun, Mar 06, 2005 at 01:45:38AM -0800, Paul Eggert wrote:
Here are some more questions for your list:
(6) Do we add support to represent time zone abbreviations in other locales, e.g., HNE for Eastern Standard Time for French writers?
No opinion...
(7) Do we support sub-second time stamps (e.g., POSIX struct timespec with its ns resolution) and time zone offsets that are not an integer number of seconds (e.g., Amsterdam time, 1835-1937)?
I'm in favor. --Ken Pizzini
On Mar 6, 2005, at 1:45 AM, Paul Eggert wrote:
(6) Do we add support to represent time zone abbreviations in other locales, e.g., HNE for Eastern Standard Time for French writers?
That data is already in CLDR [1]. I don't think there is a need to add it to tz, and doing so would require adding a lot of infrastructure to handle localization.

Deborah Goldsmith
Internationalization, Unicode Liaison
Apple Computer, Inc.
goldsmit@apple.com

[1] http://www.unicode.org/cldr/
Deborah Goldsmith <goldsmit@apple.com> writes:
On Mar 6, 2005, at 1:45 AM, Paul Eggert wrote:
(6) Do we add support to represent time zone abbreviations in other locales, e.g., HNE for Eastern Standard Time for French writers?
That data is already in CLDR [1].
Some of the data is there, but as far as I can tell there's no programmatic way to generate the proper information from the union of CLDR and the tz database. To take an extreme example, the abbreviation "LMT" can mean either "Local Mean Time" or "Lisbon Mean Time", and as far as I can see the CLDR infrastructure provides no way to tell which is which for Europe/Lisbon time stamps. The tz data currently show that Portuguese time stamps before 1884 used local mean time, and that from 1884-1911 they used Lisbon Mean Time, but the only thing you'll find in the tz database proper, outside of comments, is "LMT" for both. This sort of thing is why I think it advisable to add better support for time zone abbreviations. Caveat: the CLDR database <http://www.unicode.org/cldr/data/diff/by_type/dates_timeZoneNames.html> currently doesn't have any entries either for Lisbon or for local mean time, so to some extent I'm guessing about how the CLDR would actually operate once it became complete enough to handle the Portuguese situation.
doing so would require adding a lot of infrastructure to handle localization.
Yes. It would be nice if we could simply point people at CLDR, and address the problem mentioned above. Ideally we could point them to a complete reference implementation (such as already exists for tz), one that would handle the combined tz+CLDR problem.
Mark

----- Original Message -----
From: "Paul Eggert" <eggert@CS.UCLA.EDU>
To: "Deborah Goldsmith" <goldsmit@apple.com>
Cc: "Tz (tz@elsie.nci.nih.gov)" <tz@lecserver.nci.nih.gov>
Sent: Sunday, March 06, 2005 23:04
Subject: Re: Time zone: the next generation
Deborah Goldsmith <goldsmit@apple.com> writes:
On Mar 6, 2005, at 1:45 AM, Paul Eggert wrote:
(6) Do we add support to represent time zone abbreviations in other locales, e.g., HNE for Eastern Standard Time for French writers?
That data is already in CLDR [1].
Some of the data is there, but as far as I can tell there's no programmatic way to generate the proper information from the union of CLDR and the tz database.
To take an extreme example, the abbreviation "LMT" can mean either "Local Mean Time" or "Lisbon Mean Time", and as far as I can see the CLDR infrastructure provides no way to tell which is which for Europe/Lisbon time stamps. The tz data currently show that Portuguese time stamps before 1884 used local mean time, and that from 1884-1911 they used Lisbon Mean Time, but the only thing you'll find in the tz database proper, outside of comments, is "LMT" for both. This sort of thing is why I think it advisable to add better support for time zone abbreviations.
While CLDR does provide for the option of having timezone abbreviations, what we have found is that they are seldom used, except in the multi-zone countries, like the US, Canada, Australia, etc., and in that case, typically just in the languages that are used in that country. Even there, there is a problem, because often the abbreviations used in one country will collide with those used in another. So while it is available, it doesn't appear worth encouraging.
Caveat: the CLDR database <http://www.unicode.org/cldr/data/diff/by_type/dates_timeZoneNames.html> currently doesn't have any entries either for Lisbon or for local mean time, so to some extent I'm guessing about how the CLDR would actually operate once it became complete enough to handle the Portuguese situation.
doing so would require adding a lot of infrastructure to handle localization.
Yes. It would be nice if we could simply point people at CLDR, and address the problem mentioned above. Ideally we could point them to a complete reference implementation (such as already exists for tz), one that would handle the combined tz+CLDR problem.
We have been collecting timezone information in this release, but the way it works is that if a country only has a single timezone, the default is to use the name of the country itself, which we already have in a large number of languages. So you would not see a specific timezone localization for Lisbon unless that was felt to be important in some particular language.
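Mark's point about colliding abbreviations is easy to demonstrate. The sketch below uses a small hand-picked table of zones and offsets (the figures are well known, but the table itself is illustrative, not drawn from CLDR or the tz data):

```python
# A tiny, hand-picked sample showing why bare abbreviations are ambiguous:
# the same string maps to different UTC offsets in different countries.
abbrev_offsets = {
    "CST": {
        "America/Chicago": -6 * 3600,  # US Central Standard Time
        "Asia/Shanghai": +8 * 3600,    # China Standard Time
        "America/Havana": -5 * 3600,   # Cuba Standard Time
    },
    "IST": {
        "Asia/Jerusalem": +2 * 3600,      # Israel Standard Time
        "Asia/Kolkata": 5 * 3600 + 1800,  # India Standard Time
        "Europe/Dublin": +1 * 3600,       # Irish Standard Time (summer)
    },
}

ambiguous = sorted(
    abbr for abbr, zones in abbrev_offsets.items()
    if len(set(zones.values())) > 1
)
print(ambiguous)  # ['CST', 'IST']
```

An abbreviation is only meaningful relative to a zone, which is why the zone name, not the abbreviation, is the key in both tz and CLDR.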
Date: Thu, 3 Mar 2005 14:49:27 -0500
From: "Olson, Arthur David (NIH/NCI)" <olsona@dc37a.nci.nih.gov>
Message-ID: <75DDD376F2B6B546B722398AC161106C74038F@nihexchange2.nih.gov>

| But the zone information compiler (zic) still produces binary files with
| 32-bit transition time values. Something's gotta give.

Yes, a revision makes sense. But

| As long as we're making changes, it's best to do as much as possible
| (to avoid the need for further change down the road).

No, please try and avoid typical 2nd system effects, with the grand temptation to add everything that anyone can imagine might possibly be of some interest to someone, somewhere, sometime. Change only what absolutely needs changing because of demonstrated need now. What exists is currently pretty good; the 64 bit issue is certainly going to bite sometime, so that one ought to be fixed, but there's nothing else (or very little) so seriously wrong with what exists now that makes it important to change, I suspect. If sometime in the future someone has a problem that is reasonably solved in this set of code, then we (or someone) can solve their concrete problem at the time it is presented. Until there's a real problem to solve, any solutions adopted are more likely to be problems than answers.

| * Future transition times/past transition times
| The binary files produced by zic record transition times as 32-bit
| values; times after 2038 (or before 1901) cannot be represented.
| (The future limit can be extended to 2106 by treating the values as
| unsigned, but if that's done times before 1970 cannot be
| represented.)

Yes, a wider range makes sense.

| * Transitions in Israel
| Israel now goes back to standard time in the fall on the Saturday
| before Yom Kippur; there's no convenient way to represent this in
| the input to zic.
In the input language, perhaps - but all that's needed there is a slightly more flexible version of the script processing that's needed now for US presidential election years (or once was), and for some of the wacky rules that have been used in parts of Australia. The actual (binary) zone file format for this is just fine.

| * Julian-Gregorian transition

Forget it. Those interested in archeology/anthropology/astronomy can use their own calendar methods. All that needs to be dealt with here is the current time, and reasonable timestamps for events that have occurred in the computer age and are reasonably likely to still exist. Going back to 1970 is plenty early enough (I'm not sure we need anything more than that). Don't attempt to solve everyone's problems. Pick ours, solve that one, and leave all the rest alone. (Note, this isn't meant to demean the importance of everyone else's issues, just to limit our effort to what we know how to handle properly - guessing what someone else might find useful is just plain dumb.)

| APPROACHES
|
| * Do nothing
| Since things don't get sticky until at least 2037,
| it's possible to wait (at least for a while) before taking action.

No, we need enough future time for planning purposes, so something will have to be done before (at about the latest) 2025. Then we need deployment time before that (time for everything to get upgraded, and all the old stuff that doesn't get upgraded to die away). That means the new stuff needs to be ready to be shipped by 2015 or so, I think. When in the next 10 years the decisions get made and the code written probably doesn't matter all that much.

| * Tweak the binary file format

Yes.

| * Abandon binary files

No. The text files require too much knowledge and processing. As long as the binary file model exists, it doesn't matter how long that processing takes (it is only really necessary to ever process ascii->binary once a year or so), or what resources are needed to perform the conversion.
So, it is entirely reasonable to run a program (for every year data is being generated) to calculate Israeli -> Gregorian conversions. It isn't reasonable to do that in every program that wants a time_t -> struct tm conversion.

| * Change zic's output to another format

The current format seems to work pretty well. The only thing (field widths excepted, that's a trivial change) beyond that that you might want to do is profile some programs that do a lot of date conversions (ls -l on a huge directory or something) and see if the file format is adding unnecessary overhead. If it is (and only if it is), consider whether some optimisation might be made that could improve access in the common cases (perhaps "this year" for the year the ascii->binary conversion is done could be made really fast to access, on the assumption that the current year is accessed more often than any other, but only (*only*) if profiling suggests a detectable win will be possible by adopting this approach).

| * Do we handle Julian/Gregorian transitions?

No.

| * Do we allow control of skipping the year zero?

No.

| * Do we handle early-Julian leap year variations?

No.

| * Do we handle pegging of far-past and far-future times?

If you're going to have a very wide allowable range, putting some limits on what gets converted makes sense. Pretending we know how times and calendars will be done for thousands of years into the future is absurd; just look at what has changed in the past few hundred years - and decades, for DST. Handle times forward based upon current assumptions for a couple of hundred years at most, and treat all the rest as someone else's problem. Backwards, 1970 is far enough to be accurate.

| * For each of the above:
| * What default assumption should be used in zic
| and in run-time software?

Back to 1970, forward to today + 100 years (maybe 200, no more).

| * Can the assumption be overridden in a time zone source file?

Not worth the effort.
Aside from anything else, it means that applications behave differently in different environments, and that's to be avoided if at all possible.

| * Can the assumption be overridden using an environment
| variable?

Definitely not.

| * Can the assumption be overridden with function calls?

No. Any overriding means the data has to exist. The data doesn't exist (neither for the distant future nor for the distant past). We're still guessing just what rules some parts of the world used for daylight saving in the past few years...

| * Do we simplify handling of events tied to
| non-Gregorian-calendar-related events (such as Yom Kippur)?

By allowing an external script (and hence program) to supply rule info when zic runs. That's all that's needed.

| * Do we handle pre-Julian or non-Julian-Gregorian time schemes?

No.

kre
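The binary-file model kre defends works because the run-time lookup is trivial: localtime() need only binary-search a sorted array of transition times. A minimal sketch of that lookup (simplified; a real zic output file also carries abbreviations, DST flags, and leap-second data):

```python
import bisect

def utc_offset_at(t, transitions, offsets):
    """Find the UTC offset in effect at time_t value t.

    transitions: sorted transition instants (time_t values).
    offsets: UTC offsets in seconds; offsets[0] applies before the
    first transition, offsets[i+1] from transitions[i] onward.
    """
    return offsets[bisect.bisect_right(transitions, t)]

# Toy data: standard time at +7200, one summer period at +10800.
transitions = [1000, 2000]
offsets = [7200, 10800, 7200]
print(utc_offset_at(1500, transitions, offsets))  # 10800
```

Every conversion is then O(log n) in the number of transitions, which is why interpreting the text files at run time (the "termcap" model) would be strictly slower for no functional gain.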
On Sun, Mar 06, 2005 at 10:25:56PM +0700, Robert Elz <kre@munnari.OZ.AU> wrote:
| * Abandon binary files
No. The text files require too much knowledge and processing. As long
Nowadays, an X11 terminal emulator (say, i18n-ized rxvt) takes up to 0.5s of startup time on my 2ghz opteron, which is a fast cpu. This is actually somewhat slower than the startup times on, say, my pentium in 1996, although one would have expected an order of magnitude faster startup.

I once tried to find out why that is happening. I found the reason for that slowdown is that libc/libX11/Xt etc. now parse a lot of files they didn't parse before, such as the Compose table (which also increased in size), lots of application-defaults files and more. (Of course, I get vastly more functionality, too.) Although each of these files can be parsed very quickly, all of them together eat considerable time and memory resources.

So while, personally, I prefer text over binary formats, I do so only for ease of development. However, there already is a binary format for timezone info, and before abandoning that I'd prefer good arguments. The argument "it doesn't seem to be a problem nowadays" doesn't sound good enough, as the parsing time and memory required are not trivial, and while it itself is not a problem, it adds to an already existing problem. "Vastly more functionality" would be a better argument, but this doesn't seem to be achievable. (Another case in point is gnu libc's locale management, which a) converts text to binary representation and b) caches frequently-used locales in a special database, as loading even the pre-parsed binary format was slow.)

Just my 0.02¢, of course.

-- 
Marc Lehmann <pcg@goof.com> http://schmorp.de/
On 2005-03-06, Robert Elz wrote:
Date: Thu, 3 Mar 2005 14:49:27 -0500 From: "Olson, Arthur David (NIH/NCI)" <olsona@dc37a.nci.nih.gov> Message-ID: <75DDD376F2B6B546B722398AC161106C74038F@nihexchange2.nih.gov>
| As long as we're making changes, it's best to do as much as possible | (to avoid the need for further change down the road).
No, please try and avoid typical 2nd system effects, with the grand temptation to add everything that anyone can imagine might possibly be of some interest to someone, somewhere, sometime. Change only what absolutely needs changing because of demonstrated need now. What exists is currently pretty good; the 64 bit issue is certainly going to bite sometime, so that one ought to be fixed, but there's nothing else (or very little) so seriously wrong with what exists now that makes it important to change, I suspect.
I support this argument, with a couple of small but important exceptions (see below).
Back to 1970, forward to today + 100 years (maybe 200, no more).
No, this is not enough for common uses of today's systems. We deal with dates in commerce and academia all the time to get our work done. One kind of date that we have to deal with regularly is birth dates. Everybody here was probably born before 1970. And we often have to deal with parents' birth dates. That means going back to 1900 at the very least, but probably to 1800 to be safe. Equally, we deal with contracts that extend into the future, not forever, but for periods sometimes in excess of 100 years, so we need to go 200 years forward as well. With this small window of 400 years, we avoid most of the hard stuff while also providing a date thing that is useful to common software, instead of continuing to force every database vendor in the world to invent yet another date mechanism. Greg
Robert Elz <kre@munnari.oz.au> writes:
it is entirely reasonable to run a program (for every year data is being generated) to calculate Israeli -> Gregorian conversions.
I agree with much of your analysis, but this point might require some further thought. It is entirely reasonable to run a program for 32-bit time_t, since we need to generate predictions for future conversions only out to the year 2038. It isn't necessarily reasonable to do it for 64-bit time_t, since (in theory, anyway) we'd need to generate predictions for future conversions out to the year 292,277,026,596 or so, assuming I've done the arithmetic right. This number is a bit silly since it exceeds the commonly predicted useful life of the universe, but the point is that we'll have to arbitrarily cut off our predictions somewhere (e.g., 100 years in the future), and any arbitrary cutoff will be a bit of a pain. For example, suppose we arbitrarily cut off 100 years into the future. Do we need to generate new tables every year, as the cutoff time advances? This may sound like a trivial issue but in practice trivial issues like these build up.
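Paul's figure can be checked with a few lines of integer arithmetic, along with the other limits mentioned in this thread (the 64-bit endpoint uses the mean Gregorian year of 365.2425 days, so treat it as a back-of-the-envelope estimate rather than an exact calendar date):

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# 32-bit signed time_t: the familiar 1901/2038 window.
print(EPOCH + timedelta(seconds=2**31 - 1))  # 2038-01-19 03:14:07+00:00
print(EPOCH + timedelta(seconds=-2**31))     # 1901-12-13 20:45:52+00:00

# Unsigned 32-bit buys time until 2106, at the cost of pre-1970 times.
print(EPOCH + timedelta(seconds=2**32 - 1))  # 2106-02-07 06:28:15+00:00

# 64-bit signed time_t: datetime can't hold it, so estimate the year
# with a mean Gregorian year (365.2425 days = 31,556,952 seconds).
MEAN_YEAR = 3652425 * 86400 // 10000         # 31,556,952
print(1970 + (2**63 - 1) // MEAN_YEAR)       # 292277026596
```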
If you're going to have a very wide allowable range, putting some limits on what gets converted makes sense. Pretending we know how times and calendars will be done for thousands of years into the future is absurd,
But unfortunately it is required for POSIX compliance, at least if tm_year is representable as an int (true for years up to about 2**31 on most hosts) and if TZ uses the POSIX format.
Backwards, 1970 is far enough to be accurate.
Here I think you're being a bit too modest in aim. The existing code already works for dates back to 1901 (in 32-bit time_t), and some people already rely on this, to handle time stamps for the elderly (medical records, horoscopes, etc.). I don't see any technical reason to remove support for that. I'd say we might as well go back at least to the introduction of standard time (circa 1850), for time_t wide enough to support that. I don't see any fundamental technical objection to going back that far.
Date: Sun, 06 Mar 2005 23:18:26 -0800
From: Paul Eggert <eggert@CS.UCLA.EDU>
Message-ID: <87mztfudod.fsf@penguin.cs.ucla.edu>

| It is entirely reasonable to run a program for 32-bit time_t, since we
| need to generate predictions for future conversions only out to the
| year 2038. It isn't necessarily reasonable to do it for 64-bit
| time_t, since (in theory, anyway) we'd need to generate predictions
| for future conversions out to the year 292,277,026,596 or so, assuming
| I've done the arithmetic right.

Since there's no possible rational reason for pretending to know what the DST rules will be like in the year 3000 (or even 2100), attempting to generate DST transitions that far off into the future is just absurd.

| For example, suppose we arbitrarily cut off 100 years into the future.
| Do we need to generate new tables every year, as the cutoff time
| advances? This may sound like a trivial issue but in practice trivial
| issues like these build up.

Every year, probably not, but from time to time, probably. We do that anyway - simply having a new set of data generated for each new OS version distribution is probably going to stay safe. I mean, how many people do you still expect to be running Windows XP (or NetBSD 2, or Solaris 10) 50 years from now? If the data remains stable (if the rules change, obviously it needs to be regenerated anyway) then people will be getting new code with updated data in it frequently enough for us not to worry about a 100 year cutoff, or the regeneration it means will be needed to make sure that the end point is far enough away not to bother anyone.

| But unfortunately it is required for POSIX compliance, at least if
| tm_year is representable as an int (true for years up to about 2**31
| on most hosts) and if TZ uses the POSIX format.

What exactly is required for POSIX conformance? Do they require that we get DST conversions correct (for everywhere on the planet) for all years that are representable as an int?
If they do, screw posix (they're asking for the impossible) - but frankly, I doubt it.

Note that the database we're dealing with is a list of DST conversions. DST rules are all that matters here. Everything else is just an algorithmic conversion - I'm not suggesting that we don't pretend to convert the time_t with the value (~0 - 100) into a struct tm (assuming it fits), but I also don't care if the result we get from that turns out (after we get to that time, and know what human representation it actually has) to be an hour or two (or even a day or two) different than what we guessed and converted the time_t into.

| > Backwards, 1970 is far enough to be accurate.
|
| Here I think you're being a bit too modest in aim. The existing code
| already works for dates back to 1901 (in 32-bit time_t),

Does it really? For all timezones, for all DST rules?

| I'd say we might as well go back at least to the introduction of
| standard time (circa 1850), for time_t wide enough to support that. I
| don't see any fundamental technical objection to going back that far.

I doubt that we get to decide what a time_t format should be - in fact, some of the recent changes to the code are there precisely because we don't get to make that decision (if we did, we wouldn't be bothering with that floating point nonsense). The range of a time_t is something that the OS dumps upon us. The job of this code is to convert that into a struct tm. By all means, generate a struct tm for every time_t supported by the implementation that can be represented in a struct tm (another data type that we don't get to define), just don't pretend that we're going to necessarily have the DST rules 100% correct for times before about 1970, or for times in the future further ahead of the current time than it is reasonable to expect the DST rules to remain stable.
I mean, worrying about future DST in Israel is just plain crazy - historical evidence would suggest that there will be a change of government there sometime in the next 10 years, and the new one will decide on a whole different set of rules.

About the only thing that I'd suggest is that we make it clear to people who deal with dates that they should choose an appropriate data type for the actual purpose they need to represent the data, and that time_t is most certainly not appropriate for everything. Personally (for example), I think it would be just plain crazy to express people's date of birth as a time_t (just which second, or even sub-second for some proposed time_t extensions, do you record as the time of birth anyway? When the head appears, when the big toe is finally extracted, when the umbilical cord is cut??? And who cares anyway?)

And which records that retain DoB information also bother to record the timezone that applied at the place of birth, so the correct DST conversions can be done on it?

One of the true evils of computing is the temptation to add meaningless precision to all kinds of data, just because there is space available to allow it to be added.

kre
What we and some other people do is to use the binary data for whatever it covers, but then use the rules for the last year going into the indefinite future. While it may be "absurd" to think that the timezone rules for 3000AD will be the same, we must return *some* conversion between local and UTC time for all dates covered by our datetime datatypes. So the last available year's rules are what we have chosen. See also http://icu.sourceforge.net/userguide/universalTimeScale.html

Mark
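Extending "the last year's rules" indefinitely, as Mark describes, is mechanical once a rule is in day-of-week-of-month form (the style POSIX TZ strings use). A sketch, using the US rules in force in 2005 (DST from the first Sunday in April to the last Sunday in October) as the example:

```python
from datetime import date, timedelta

def nth_weekday(year, month, week, weekday):
    """Date of the week-th <weekday> of the month; week=5 means 'last'.

    weekday follows Python's convention: Monday = 0 ... Sunday = 6.
    """
    d = date(year, month, 1)
    d += timedelta(days=(weekday - d.weekday()) % 7 + 7 * (week - 1))
    if d.month != month:            # week 5 overshot: step back to the last one
        d -= timedelta(days=7)
    return d

SUNDAY = 6
# US rules as of 2005: first Sunday in April, last Sunday in October.
# The same function applies the rule to any future year unchanged.
print(nth_weekday(2005, 4, 1, SUNDAY))   # 2005-04-03
print(nth_weekday(2005, 10, 5, SUNDAY))  # 2005-10-30
```

This is exactly why a fixed-size binary table plus one "continuing rule" can cover an unbounded future without regenerating data every year.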
Robert Elz <kre@munnari.oz.au> writes:
What exactly is required for POSIX conformance? Do they require that we get DST conversions correct (for everywhere on the planet) for all years that are representable as an int?
Yes, but not "for everywhere on the planet"; it's simply for every DST rule expressible as a POSIX TZ string. POSIX TZ strings can represent just one DST rule (e.g., the current US rules), and the set of rules is rather limited (e.g., they cannot represent the Israeli or Iranian rules). And yes, it is a bit of an absurd requirement, but it is there, and I wouldn't be surprised if some test suites check for it.
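On systems that accept POSIX TZ strings, the limited rule format Paul mentions can be exercised directly. This sketch assumes a POSIX platform (time.tzset() does not exist on Windows) and uses the single US rule as it stood in 2005:

```python
import os
import time

# A POSIX TZ string: standard time EST (UTC-5), DST named EDT,
# starting the first Sunday in April (M4.1.0) and ending the
# last Sunday in October (M10.5.0) -- the US rules in force in 2005.
os.environ["TZ"] = "EST5EDT,M4.1.0,M10.5.0"
time.tzset()  # POSIX-only; re-reads TZ

# 2005-07-01 12:00:00 UTC falls inside DST: local time is 08:00 EDT.
summer = time.localtime(1120219200)
print(summer.tm_hour, summer.tm_isdst)  # 8 1

# 2005-01-15 12:00:00 UTC is standard time: local time is 07:00 EST.
winter = time.localtime(1105790400)
print(winter.tm_hour, winter.tm_isdst)  # 7 0
```

Note that the string encodes exactly one recurring rule; historical transitions, and anything tied to a non-Gregorian calendar, are beyond what this format can say.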
| > Backwards, 1970 is far enough to be accurate. | | Here I think you're being a bit too modest in aim. The existing code | already works for dates back to 1901 (in 32-bit time_t),
Does it really? For all timezones, for all DST rules?
Yes. I use it regularly to test time zone data before 1970. It's supposed to work, anyway, and when I find bugs (which is pretty rare these days) they get fixed.
I doubt that we get to decide what a time_t format should be
True.
in fact, some of the recent changes to the code are there precisely because we don't get to make that decision (if we did, we wouldn't be bothering with that floating point nonsense).
Yes, but the main point is that the code should work well with signed 64-bit time_t, which is the #2 most-popular format in practice, and whose popularity is gaining with time.
I think it would be just plain crazy to express people's date of birth as a time_t... And who cares anyway?)
The most-important users are astrologers, and (I suppose) scientists who occasionally debunk astrologers' claims.
And which records that retain DoB information, also bother to record the timezone that applied at the place of birth, so the correct DST conversions can be done on it?
Users infer the time zone from the location and the recorded time on the records. They can be very serious about this sort of thing. For example, if you were born in 1950 in Japan, they care whether you were born in a US military hospital or a Japanese hospital, as different time zones were used. (At least, this is what I've been told -- I don't have hard evidence for it.) I do agree that to some extent this whole thing is overkill; but the tz database is already overkill to some extent, and we might as well find a sweet spot in how much overkill we're willing to do. It's not much extra work to support time stamps before 1970, and there is some utility to doing so, so I don't see why not.
On Thu, 3 Mar 2005, Olson, Arthur David (NIH/NCI) wrote:
QUESTIONS
* Do we handle Julian/Gregorian transitions?
Yes, but with an override feature in the API. One problem is that many history texts contain converted calendar dates: for the internal consistency of a particular history book, biography, etc., authors tend to convert all the calendar dates they contain into one system, even if a different calendar applied at the given date and location. If a related calendar/timezone database makes strict assumptions about which calendar applied on a given date, new problems are introduced.
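For reference, the discrepancy between the two calendars grows by three days every 400 years, so the jump to handle differs by place and era. A minimal sketch of the standard gap formula (my illustration, valid for dates after 28 February of the given year; the two calendars coincided from 1 March 200 to 28 February 300):

```python
def julian_gregorian_gap(year):
    """Days by which the Julian calendar trails the (proleptic)
    Gregorian calendar, for dates after 28 February of the given year.
    The Julian calendar makes every fourth year a leap year; the
    Gregorian calendar skips century years not divisible by 400."""
    return year // 100 - year // 400 - 2

assert julian_gregorian_gap(1582) == 10  # Gregorian reform dropped 10 days
assert julian_gregorian_gap(1752) == 11  # Great Britain dropped 11 days
assert julian_gregorian_gap(1918) == 13  # Russia dropped 13 days
```

This covers only the day-count jump; deciding *which* calendar applied where and when is exactly the per-location history the database would have to record.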
* Do we allow control of skipping the year zero?
Both styles of year counting have a name. The one with a year zero is called 'astronomical year counting' and indicates years BC with negative year numbers. The one without a year zero is called 'historical year counting' and indicates years BC with the suffix BC or BCE (or something similar), but never with a plus/minus sign. I think the documentation should be clear, but the API needs to support only one style.
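The two styles differ by exactly one for all years BC, so supporting one style in the API loses nothing; a trivial sketch of the conversion:

```python
def historical_to_astronomical(year_bce):
    """Map a 'historical' BC/BCE year number to astronomical style.
    Historical counting skips year zero, so 1 BCE is astronomical
    year 0, 2 BCE is -1, and so on.  Years AD/CE are unchanged."""
    return 1 - year_bce

assert historical_to_astronomical(1) == 0     # 1 BCE -> year 0
assert historical_to_astronomical(45) == -44  # 45 BCE, the Julian reform
```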
* Do we handle pegging of far-past and far-future times?
Aside from calendar system changes, no time zone data are known for dates before the 1800s. There is also the question of whether true solar time or mean solar time was used in recording a particular historical event, but I do not see how tzdata could answer it. E.g., when J. W. Goethe mentions in one of his works that he was born at noon on 28 August 1749 (Gregorian calendar) in Frankfurt: does he mean true solar time (as indicated by a sundial) or mean solar time (as measured by an astronomical clock in an observatory)? tzdata cannot know this, and should not claim that it does. There was no clear transition between true solar time and mean solar time reckoning before the formal introduction of time zone standards in the 19th century.

Alois Treindl
Regarding my project to extend the details of timezone history in tzdata, a first question: Is it desirable, in the eyes of the current tzdata maintainers, to add such extensions?

Example: For Germany, tzdata currently has only one zone, Europe/Berlin. In fact, we can differentiate several zones in Germany with different timezone histories. I name only one difference for each:

Germany/Baden
    LMT to 1891 March 15
    8e24 meridian to 1892 April 1
    CET ...
Germany/Bayern
    LMT to 1891 March 15
    11e34 meridian to 1892 April 1
    CET ...
Germany/Wurtemberg
    LMT to 1891 March 15
    9e11 meridian to 1892 April 1
    CET ...
Germany/Rheinlandpfalz
    LMT to 1891 March 15
    8e26 meridian to 1892 April 1
    CET to 1919 Jan 1
    GMT to 1927 Apr 9   # French occupation of the Rhineland
    CET ...

There are more differences for:
- East Germany in 1945 and 1947
- parts of Germany which now belong to Poland (after 1945)
- parts of Germany which now belong to Russia (after 1945)
- the Saarland province
etc.

All these differences concern periods before 1970.

--------------------

For France, tzdata has one zone. In fact, there are about 50 different areas with different timezone histories. Most of the differences concern the years 1940-1944, during the Second World War: the areas under German occupation changed with the progress of the German front in 1940 and of the Allied front in 1944, and the border between German-occupied France and the area under control of the Vichy government shifted in various steps between 1942 and 1944. The data exist and are well documented, but a large set of zone names will be required to express them in tzdata.

-------------------------

These are just a few examples of what needs to be done if tzdata is to get a more complete representation of timezone history.
participants (13)
- Alois Treindl
- Clive D.W. Feather
- Deborah Goldsmith
- Ephraim Silverberg
- Greg Black
- Ken Pizzini
- Ken Pizzini
- Marc Lehmann
- Mark Davis
- Olson, Arthur David (NIH/NCI)
- Paul Eggert
- Robert Elz
- Stuart Taylor (sttaylor)