I agree that the data is key, and by that I mean the distributed 'zic input data' (e.g., "southamerica"). However, I disagree strongly about:
> no-one should really be overly concerned with the format in which we write it
When a data file format is in very widespread use, changes to it are extremely painful for downstream clients. I've had plenty of experience with this with Unicode, BCP47, CLDR, and similar levels of internal changes at the companies I've worked at. Seemingly trivial changes have a way of screwing up lots of programs and millions of people.
If the TZDB were not important, arbitrary changes would not matter. But it is a crucial part of the world's software stack; its very importance cries out for stability. (As a trivial counterexample for "no-one should really be overly concerned with the format", try changing the character set of the files to EBCDIC and see how many squawks you get from users.)
Now, there are ways to both expand the format and retain stability. Here are a couple of ways to do that.
A. Bifurcate the data
- Core. Always make available a set of data files in the current format. No changes to support "advanced" features like SAVE<0, fractional digits, etc. No splitting IDs because of advanced features either.
- Advanced. The format of data files can change "with no concern", in order to support "advanced" features.
One way to make this practical is to always have a program that generates the core data by filtering the advanced data. It is important, however, that such a program strictly minimize the textual changes to the core, so that diffing produces changes on the order of what is seen now for updates to country rules.
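To make the filtering idea concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that the only "advanced" feature in play is a fractional part on a clock time (such as 0:00.0000001) and that the core form simply drops the fraction; the function name to_core and the sample lines are hypothetical, not part of any existing tool. The point is that every line that does not use the feature comes out byte-for-byte identical, so a diff of the core files stays small.

```python
import re

# Assumption: an "advanced" clock time carries a fractional part, e.g.
# 0:00.0000001, while the core format allows only h:mm or h:mm:ss.
_FRACTION = re.compile(r"(\b\d{1,2}:\d{2}(?::\d{2})?)\.\d+")

def to_core(advanced_lines):
    """Return core-format lines, rewriting only the lines that need it."""
    core = []
    for line in advanced_lines:
        if line.lstrip().startswith("#"):
            core.append(line)  # comment lines pass through untouched
        else:
            # Drop the fraction (truncation, not rounding); the rest of
            # the line is left exactly as it was.
            core.append(_FRACTION.sub(r"\1", line))
    return core

# Illustrative input only.
advanced = [
    "# Rule NAME FROM TO - IN ON AT SAVE LETTER\n",
    "Rule Arg 2007 only - Dec 30 0:00.0000001 1:00 S\n",
    "Rule Arg 2008 only - Mar 16 0:00 0 -\n",
]
core = to_core(advanced)
changed = [i for i, (a, c) in enumerate(zip(advanced, core)) if a != c]
print(changed)
```

Only the line that actually used the hypothetical feature shows up as changed; a real filter would need one such narrowly scoped rewrite per advanced feature.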
B. Add conditionals
Another way is to have just one set of files, but have well-defined "conditionals" to enable new features. Here is an example, just for illustration:
# @ IF FRACTIONAL
# @ Rule Arg 2007 only - Dec 30 0:00.0000001 1:00 S
# @ ELSE
Rule Arg 2007 only - Dec 30 0:00 1:00 S
# @ END
The key to having that work is that older implementations will just ignore the # @... lines, and newer implementations that want to support the features can use them.
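As a rough sketch of how a newer implementation might read such a file, the Python below resolves the illustrative "# @ IF / ELSE / END" syntax from the example above. The function names select and legacy_view, and the idea of passing a set of enabled feature names, are my own assumptions for illustration; nothing like this exists in zic today. The legacy_view function stands in for an older reader that just discards every comment line.

```python
PREFIX = "# @ "

def select(lines, features):
    """Resolve '# @ IF <FEATURE>' blocks for the given set of feature names."""
    out, state, enabled = [], None, False  # state: None, "if", or "else"
    for line in lines:
        body = line[len(PREFIX):] if line.startswith(PREFIX) else None
        words = body.split() if body is not None else []
        if state is None and len(words) == 2 and words[0] == "IF":
            state, enabled = "if", words[1] in features
        elif state == "if" and words == ["ELSE"]:
            state = "else"
        elif state is not None and words == ["END"]:
            state = None
        elif state == "if":
            # Inside the IF branch: uncomment the line if the feature is on.
            if enabled and body is not None:
                out.append(body)
        elif state == "else":
            # Inside the ELSE branch: keep it only if the feature is off.
            if not enabled:
                out.append(line)
        else:
            out.append(line)  # ordinary line outside any block
    return out

def legacy_view(lines):
    """What an older reader effectively sees: all comment lines dropped."""
    return [line for line in lines if not line.startswith("#")]

SAMPLE = [
    "# @ IF FRACTIONAL\n",
    "# @ Rule Arg 2007 only - Dec 30 0:00.0000001 1:00 S\n",
    "# @ ELSE\n",
    "Rule Arg 2007 only - Dec 30 0:00 1:00 S\n",
    "# @ END\n",
]
print(select(SAMPLE, {"FRACTIONAL"}))
print(legacy_view(SAMPLE))
```

With FRACTIONAL enabled, the reader gets the fractional rule; with no features enabled, it gets the same single line an older reader sees, which is the stability property the scheme depends on.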