On Mon, Mar 07, 2005 at 11:36:19AM +0000, Clive D.W. Feather wrote:
>> Thinking about the broader problem a little more, perhaps it would
>> make sense to use XML for the run-time format?
>
> Very definitely. Looking at your (elided) example I can see several
> places where a nested structure would be preferable, and once you've
> gone that way you might as well do XML.
>
>> The bad thing is that it would either add an external dependency to
>> the code, or require that we bundle a parser.
>
> If you assume that the incoming files are lexically correct, a parser
> is actually pretty simple.

On further reflection I'm less convinced that XML is directly useful
(though I'm not opposed to using it for secondary reasons of interchange
with other applications, if someone wants to argue that case). I
belatedly recall exactly what zic is currently doing, and realize that
most of the complexity I was contemplating just doesn't need to be
there: for all dates in the past zic knows the precise timestamp to use
for each transition (to the best of the knowledge encoded in the tzdata
files). It is only for specifying the last pair (or larger set?) of
"until max" rules that an algorithmic representation in the run-time
data makes any potential sense. And as Robert Elz is pointing out, the
case can be made that precomputing estimated transition rules for N
years into the future of a given zic run is probably good enough.

So, based on the discussion so far and further reflection, I see the
following points for "TZ-ng":

* The tzfile format is basically sound. Suggested extensions:
  . widen timestamps to 64 bits, of course;
  . add one (or a few?) versioning field(s): while the tzh_magic field
    with a different TZ_MAGIC should be adequate for "version of
    tzfile", it'd be nice to record something of the character
    "compiled by tzcode-2004a/zic from tzdata-2005d/africa";
  . add a "time reference" field: have the file document whether the
    transitions are on a TAI ("right") or a UTC ("posix") clock, for
    example (see my "wish-list" item below for another potential class
    of values);
  . add support for additional "optional" extension data, with the code
    written such that it will ignore unknown extensions. One idea for
    such a future extension is to include polygon data describing the
    geographic region covered by the zone. (I'm not sure that such data
    really belongs in tzfiles, but I'm also not completely convinced
    that it doesn't. The issue is that the name of the zone is mostly
    arbitrary; it is the spatial and temporal boundaries that really
    identify a zone.)
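
To make the "ignore unknown extensions" idea a bit more concrete, here
is a rough sketch of how a reader might walk such extension data. It is
only an illustration: the record layout (4-byte tag, 4-byte length,
payload), the tag values, and every name with an ng_ or NG_ prefix are
my own assumptions and exist in neither tzfile nor tzcode.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extension tags; none of these exist in tzfile today. */
enum {
    NG_EXT_PROVENANCE = 1, /* e.g. "tzcode-2004a/zic tzdata-2005d/africa" */
    NG_EXT_TIME_REF   = 2  /* one byte: NG_REF_UTC or NG_REF_TAI */
};
enum { NG_REF_UTC = 0, NG_REF_TAI = 1 };

struct ng_info {
    int time_ref;              /* NG_REF_UTC unless the file says otherwise */
    const unsigned char *prov; /* points into the buffer; not NUL-terminated */
    uint32_t prov_len;
};

/* Decode a big-endian 32-bit unsigned value. */
static uint32_t ng_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode a big-endian 64-bit signed timestamp (assumes two's complement). */
static int64_t ng_be64(const unsigned char *p)
{
    uint64_t v = 0;
    int i;

    for (i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return (int64_t)v;
}

/*
 * Walk records laid out as: 4-byte tag, 4-byte length, payload.
 * Known tags are decoded; unknown tags are skipped using their length.
 * Returns 0 on success, -1 if a record is truncated or malformed.
 */
int ng_parse_extensions(const unsigned char *buf, size_t len,
                        struct ng_info *out)
{
    size_t pos = 0;

    out->time_ref = NG_REF_UTC;
    out->prov = NULL;
    out->prov_len = 0;

    while (pos < len) {
        uint32_t tag, plen;

        if (len - pos < 8)
            return -1;          /* truncated record header */
        tag = ng_be32(buf + pos);
        plen = ng_be32(buf + pos + 4);
        pos += 8;
        if (plen > len - pos)
            return -1;          /* payload runs past the buffer */

        switch (tag) {
        case NG_EXT_PROVENANCE:
            out->prov = buf + pos;
            out->prov_len = plen;
            break;
        case NG_EXT_TIME_REF:
            if (plen != 1)
                return -1;
            out->time_ref = buf[pos];
            break;
        default:
            break;              /* unknown extension: ignore it */
        }
        pos += plen;
    }
    return 0;
}
```

Because every record carries its own length, an older reader can step
over a record it does not understand (polygon data, say) without
knowing anything about its contents.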

* The complexity of interpreting rules on different calendars is all
  pushed into the preprocessing done by zic; the run-time code need not
  know anything about them. (Current needs include Gregorian, Hebrew,
  and Persian. Future needs might include Islamic, Eastern Orthodox
  (like Gregorian, but with different "multiples of 100" rules),
  Chinese, and Japanese, but we should wait until such a need actually
  arises before worrying about them.) [Did any country which used the
  Julian calendar in the last 100 years or so (e.g., Tsarist Russia)
  ever observe daylight saving transitions based on that system of
  dates?] Such support can be added at any convenient time, before or
  after the switch to 64-bit timestamps in tzfile; in the interim we'll
  just continue to use the work-around currently employed for Iran and
  Israel: embed a bunch of special-case entries in the tzdata source,
  based on external conversion to Gregorian dates.

* The run-time APIs in this implementation should continue to be limited
  to the (proleptic) Gregorian calendar, the one mandated by the C and
  POSIX APIs (no externally visible change). Though I still slightly
  favor the ability to expose a Julian day ("modified" or not), in light
  of the above I am also willing to say that applications which wish to
  work with dates in non-Gregorian calendars can just base their
  interconversions on the (tm_year,tm_yday) pair instead. Applications
  that can handle things like Sweden's multiple transitions to the
  Gregorian calendar, the calendrical chaos in Rome around the time of
  Julius Caesar's reign, the Mayan calendar, the World calendar, or any
  of the other ways that days have been marked (actual or proposed) in
  different places and times are quite welcome, but outside the scope of
  this project.

An item that is still on my personal wish-list (but I'm now questioning
whether the complexity is justified) is support for "zoneless" times
based on local sun (real/apparent, and/or mean).
I mentioned Saudi Arabia in my earlier posting, but really my interest
is for times in the pre-standard-time past, and perhaps as a sane "best
guess" for dates between the "N years into the future" cut-off and such
time as our projections of earth rotation become notably inaccurate (by
which I mean, the usefulness of the guess goes down as the error bars on
the projection expand; the code can probably be left to blithely
calculate "local time" beyond the heat-death/big-crunch/whatever of the
universe). Like the addition of support for non-Gregorian calendars in
zic, this can mostly be deferred as something independent of the
redefinition of the tzfile format. The only support that might be
helpful is a means to annotate "use sun angle at meridian N" (and
whether that is real or mean sun) as an alternative to "UTC" or "TAI".
(Or in addition to: have the code fall back to sun time when the date is
outside of the range of years covered by zone information?)

An "it might be nice" item that is neither strongly required, nor
particularly hard to provide, is a tzdata-to-XML translator. This
probably should have options to either output what is essentially tzfile
data in XML format, or to re-interpret the tzdata files in XML form. The
main justification for this is that it would make it easier for other
applications to import our hard-won data without having to build custom
parsers or tzfile readers. I'm also curious as to whether an XML based
variant of the tzdata file would be any easier to use/edit/maintain, if
someone else is motivated to do the experiment (my guess is that it
would not be, which is why I'm not making the effort myself).

Cheers,
	--Ken Pizzini
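
P.S. For the "use sun angle at meridian N" annotation, the mean-sun case
is simple enough to sketch. The following is only an illustration under
my own assumptions: the function names are hypothetical, the meridian is
taken in degrees east of Greenwich, and the input is a count of seconds
on a UTC-like scale. Real (apparent) sun would additionally need the
equation of time, which is not attempted here.

```c
#include <stdint.h>

/* 360 degrees per 24 hours, so each degree of longitude is 240 seconds. */
#define SECONDS_PER_DEGREE 240.0

/*
 * Offset of local mean solar time from UTC, in seconds, for a meridian
 * given in degrees east of Greenwich (negative values are west).
 * The result is rounded to the nearest second.
 */
long mean_sun_offset(double meridian_deg_east)
{
    double s = meridian_deg_east * SECONDS_PER_DEGREE;

    return (long)(s >= 0 ? s + 0.5 : s - 0.5);
}

/* Shift a UTC second count to local mean solar time at the meridian. */
int64_t mean_sun_time(int64_t utc_seconds, double meridian_deg_east)
{
    return utc_seconds + mean_sun_offset(meridian_deg_east);
}
```

Since the offset depends only on the meridian, such a fallback needs no
transition table at all, which is what would make it usable outside the
range of years covered by zone information.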