On 2024-01-05 15:16, Stephen Colebourne via tz wrote:
On Fri, 5 Jan 2024 at 19:31, Paul Eggert via tz <tz@iana.org> wrote:
It might be possible to add a flag to zic to tell it to output larger TZif files that contain transitions and other information that do not affect localtime but might aid other applications. However, I don't see how such a flag could preserve all the relevant information, without a change to the TZif file format. So unless we change the TZif format, users who want all the info in the .zi input files would need to look at the .zi files anyway.
I suspect that most people write TZDB source parsers because they want access to more data than the binary format provides. The source files are a wealth of information, which cannot be obtained in any other way.
As such, it could be a useful direction for TZDB to provide an alternate output format: effectively a standardised version of the data in the source files. Logically this would be in JSON (or XML) format and well-documented. This would allow most external parsers to be refactored to use the new data format. For example, modern Java uses a list of historic transitions and encoded rules for future transitions, but some others prefer a list of transitions into the future (up to some future year). I suspect the new format would supply both the rules and the resolved transitions for future dates.
See https://github.com/jodastephen/tzdiff/blob/master/data/Europe-London.txt for the kind of data Java needs (transitions and rules).
Issues such as the negative daylight saving flag go away. The alternate format would simply supply both flags. For example, for Europe/Dublin, winter would have something like "dstLegal=true" and "dstSummer=false".
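To make the two-flag idea concrete, here is a minimal Python sketch of what one such record might look like. Only the names "dstLegal" and "dstSummer" come from the message above; the surrounding layout ("zone", "periods", "name", "utcOffsetSeconds") is purely hypothetical and not a proposed schema.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Period:
    # Hypothetical record layout; only dstLegal and dstSummer are
    # taken from the proposal, everything else is an assumption.
    name: str
    utcOffsetSeconds: int
    dstLegal: bool    # is this period "daylight saving" in the legal sense?
    dstSummer: bool   # is this the period with advanced (summer) clocks?


# Europe/Dublin: legally, standard time is the summer time (UTC+1),
# and winter (UTC+0) is the period treated as negative daylight saving.
dublin = [
    Period(name="winter", utcOffsetSeconds=0, dstLegal=True, dstSummer=False),
    Period(name="summer", utcOffsetSeconds=3600, dstLegal=False, dstSummer=True),
]

doc = {"zone": "Europe/Dublin", "periods": [asdict(p) for p in dublin]}
text = json.dumps(doc, indent=2)
print(text)
```

A consumer that only cares about "are the clocks advanced?" would read dstSummer, while one that follows the legal definition would read dstLegal, so neither has to reinterpret the other's flag.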
Note that it would basically need to expose all the data in the source files (otherwise people will keep on parsing the source files). This therefore includes pre-1970 data for all regions, but that could be explicitly in a separate section of the format (i.e., all pre-1970 data from all countries would be separate from all post-1970 data, allowing data consumers to pick and choose what they want).
Ideally, the final TZif binary format would be derived from the new alternate format. Thus the flow would be: TZ source files (intended for internal TZDB use only) -> TZ JSON -> TZif binary.
If there is interest, I could work on the JSON format needed.
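As a rough illustration of the first step of that flow (source files -> JSON), the following Python sketch turns zic-style Rule lines into JSON objects. The column order follows the documented Rule line layout (NAME FROM TO - IN ON AT SAVE LETTER/S); the JSON key names and the helper parse_rule_line are hypothetical, and the sample lines are only modeled on the EU rules. Comment stripping is simplified and ignores quoting.

```python
import json

# Hypothetical JSON key names, one per column of a zic Rule line:
#   Rule NAME FROM TO - IN ON AT SAVE LETTER/S
RULE_FIELDS = ["name", "from", "to", "reserved", "in", "on", "at", "save", "letters"]

SAMPLE = """\
# Sample input modeled on the EU rules; not a copy of any release.
Rule  EU  1981  max  -  Mar  lastSun  1:00u  1:00  S
Rule  EU  1996  max  -  Oct  lastSun  1:00u  0     -
"""


def parse_rule_line(line):
    """Return a dict for a Rule line, or None for any other line."""
    line = line.split("#", 1)[0]          # simplified: drop trailing comments
    fields = line.split()
    if not fields or fields[0] not in ("Rule", "R"):
        return None                       # blank, comment, Zone, Link, ...
    if len(fields) != 1 + len(RULE_FIELDS):
        raise ValueError(f"unexpected Rule line: {line!r}")
    return dict(zip(RULE_FIELDS, fields[1:]))


rules = [r for r in map(parse_rule_line, SAMPLE.splitlines()) if r is not None]
print(json.dumps(rules, indent=2))
```

This only restates the rules in another syntax. Resolving them into a list of transitions, which the proposal also asks for, would be a separate and larger step.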
In tzdata, `make ... tzdata.zi` already gets you that preprocessed alternate format, as well as {rearguard,main,vanguard}.zi in the same abbreviated format as tzdata.zi, which zic also understands. You should probably make one of these your input if your project wants to generate JSON or XML for further processing. I will be happy if the project sticks to plain text data formats and steers clear of barely readable and/or verbose hierarchical formats with arbitrary tags that require toolkits to make use of.

--
Take care. Thanks, Brian Inglis
Calgary, Alberta, Canada

La perfection est atteinte                   Perfection is achieved
non pas lorsqu'il n'y a plus rien à ajouter  not when there is no more to add
mais lorsqu'il n'y a plus rien à retirer     but when there is no more to cut
                                -- Antoine de Saint-Exupéry