Re: [tz] Proposal: validation text file with releases

July 13, 2015

Given that I've already found discrepancies (see "Discrepancies in time
zone data interpretation"), I'm going to go ahead and hack on this in purely
pragmatic (read: short-term) ways. I'll create a GitHub repo just for this
purpose and dump code in there - this is explicitly with the aim of
encouraging a more permanent solution by proving value.

I'll post another message here when there's something worth looking at.
Initially I'll be looking at zdump output, Joda Time, standard Java and
Noda Time. Contributions from others for other languages/platforms will be
very welcome.

Jon

On 13 July 2015 at 14:46, Stephen Colebourne <scolebourne@joda.org> wrote:
...
FWIW, I think such a format would be very useful. Effectively, it is a
unit test for others to confirm that they interpret the rules the same
way as intended.
It is similar to what I produced when trying to demonstrate the amount
of change being caused by apparently "minor" changes to the data:
https://github.com/jodastephen/tzdiff/commits/master
Any output of this type should indeed just consist of a simple text
file with ISO-8601 format timestamps.
Stephen
...
On 11 July 2015 at 11:35, Jon Skeet <skeet@pobox.com> wrote:
Background: I'm the primary developer for Noda Time, which consumes the tz
data. I'm currently refactoring the code to do this... and I've come
across some code (originally ported from Joda Time) which I now understand
in terms of what it's doing, but not exactly why.

For a little while now, the Noda Time source repo has included a text
file containing a dump of every transition (up to 2100, at the moment)
for every time zone. It looks like this, picking just one example:

Zone: Africa/Maseru
LMT: [StartOfTime, 1892-02-07T22:08:00Z) +01:52 (+00)
SAST: [1892-02-07T22:08:00Z, 1903-02-28T22:30:00Z) +01:30 (+00)
SAST: [1903-02-28T22:30:00Z, 1942-09-20T00:00:00Z) +02 (+00)
SAST: [1942-09-20T00:00:00Z, 1943-03-20T23:00:00Z) +03 (+01)
SAST: [1943-03-20T23:00:00Z, 1943-09-19T00:00:00Z) +02 (+00)
SAST: [1943-09-19T00:00:00Z, 1944-03-18T23:00:00Z) +03 (+01)
SAST: [1944-03-18T23:00:00Z, EndOfTime) +02 (+00)

I use this file for confidence when refactoring my time zone handling
code - if the new code comes up with the same set of transitions as the
old code, it's probably okay. (This is just one line of defence, of
course - there are unit tests, though not as many as I'd like.)

It strikes me that having a similar file (I'm not wedded to the format,
but it should have all the same information, one way or another) released
alongside the main data files would be really handy for all implementors -
it would be a good way of validating consistency across multiple
platforms, with the release data being canonical. For any platforms which
didn't want to actually consume the rules as rules, but just wanted a list
of transitions, it could even effectively replace their use of the data.
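
For a consumer that only wants transitions, reading such a file needs very little code. The Python sketch below assumes the exact line layout of the Maseru example (`Name: [start, end) offset (savings)`, with `StartOfTime`/`EndOfTime` as sentinels) and reads the bracketed value as the daylight-saving part of the offset, which is consistent with the +03 (+01) wartime entries; none of this is a settled format. `ZoneInterval`, `parse_dump` and `offset_at` are hypothetical names, not Noda Time or tz APIs.

```python
import bisect
import re
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional

# One interval line, e.g. "SAST: [1942-09-20T00:00:00Z, 1943-03-20T23:00:00Z) +03 (+01)".
# Offsets may carry minutes and seconds, e.g. "+01:52" or "+00:19:32".
OFFSET = r"[+-]\d{2}(?::\d{2}){0,2}"
LINE_RE = re.compile(
    r"^(?P<name>[^:]+): \[(?P<start>[^,]+), (?P<end>[^)]+)\) "
    r"(?P<offset>" + OFFSET + r") \((?P<savings>" + OFFSET + r")\)$"
)


@dataclass(frozen=True)
class ZoneInterval:
    name: str                  # abbreviation, e.g. "SAST"
    start: Optional[datetime]  # None stands for StartOfTime
    end: Optional[datetime]    # None stands for EndOfTime
    offset: timedelta          # total UTC offset in force
    savings: timedelta         # daylight-saving part of that offset (assumed meaning)


def parse_offset(text: str) -> timedelta:
    sign = -1 if text[0] == "-" else 1
    parts = [int(p) for p in text[1:].split(":")] + [0, 0]  # pad missing mm/ss
    return sign * timedelta(hours=parts[0], minutes=parts[1], seconds=parts[2])


def parse_instant(text: str) -> Optional[datetime]:
    if text in ("StartOfTime", "EndOfTime"):
        return None
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def parse_dump(text: str) -> Dict[str, List[ZoneInterval]]:
    """Map each zone ID to its intervals, in file order."""
    zones: Dict[str, List[ZoneInterval]] = {}
    current: Optional[List[ZoneInterval]] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("Zone: "):
            current = zones.setdefault(line[len("Zone: "):], [])
            continue
        match = LINE_RE.match(line)
        if match is None or current is None:
            raise ValueError(f"unrecognised line: {raw!r}")
        current.append(ZoneInterval(
            name=match["name"],
            start=parse_instant(match["start"]),
            end=parse_instant(match["end"]),
            offset=parse_offset(match["offset"]),
            savings=parse_offset(match["savings"]),
        ))
    return zones


def offset_at(intervals: List[ZoneInterval], instant: datetime) -> timedelta:
    """UTC offset at an aware instant, assuming contiguous, sorted intervals."""
    # intervals[0] starts at StartOfTime; the rest have concrete starts, so the
    # containing interval is the last one whose start is <= instant.
    starts = [interval.start for interval in intervals[1:]]
    return intervals[bisect.bisect_right(starts, instant)].offset
```

Because the intervals are half-open, an instant exactly on a transition resolves to the later interval, which is what `bisect_right` gives.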

One other benefit: diffing the dump between two releases would make it
clear what had changed in effect, rather than just in terms of rules.
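
A per-zone diff of two dumps is enough to show what changed in effect. This Python sketch assumes each dump uses the layout of the Maseru example (a `Zone: ` header followed by one line per interval); `split_zones` and `diff_releases` are illustrative names, and the comparison is purely textual, so a changed abbreviation is reported just like a changed offset.

```python
import difflib
from typing import Dict, List


def split_zones(text: str) -> Dict[str, List[str]]:
    """Group interval lines under their "Zone: " header (earlier lines are ignored)."""
    zones: Dict[str, List[str]] = {}
    current: List[str] = []  # collects (and discards) lines before the first header
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Zone: "):
            current = zones.setdefault(line[len("Zone: "):], [])
        elif line:
            current.append(line)
    return zones


def diff_releases(old_text: str, new_text: str) -> Dict[str, List[str]]:
    """Per-zone unified diff between two dumps; unchanged zones are omitted."""
    old, new = split_zones(old_text), split_zones(new_text)
    changes: Dict[str, List[str]] = {}
    for zone in sorted(old.keys() | new.keys()):
        before, after = old.get(zone, []), new.get(zone, [])
        if before != after:  # also covers zones added or removed between releases
            changes[zone] = list(difflib.unified_diff(
                before, after,
                fromfile="old/" + zone, tofile="new/" + zone, lineterm=""))
    return changes
```

Zones whose intervals are identical in both releases simply don't appear in the result, so the output is a list of the zones whose behaviour actually changed.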

One sticking point is size. The current file for Noda Time is about 4MB,
although it zips down to about 300K. Some thoughts around this:

We wouldn't need to distribute it in the same file as the data - just as
we have data and code files, there could be a "textdump" file or whatever
we'd want to call it. These could be generated retroactively for previous
releases, too.

As you can see, there's redundancy in the format above, in that it's a
list of "zone intervals" (as I call them in Noda Time) rather than a list
of transitions - the end of each interval is always the start of the next
interval.
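
Since each end instant repeats the next start, it can be dropped and rebuilt on demand. The Python round trip below is a hypothetical sketch, assuming the `Name: [start, end) rest` layout from the Maseru example and a transition line of the made-up form `start name rest`; `to_transitions` also checks that the intervals really are contiguous.

```python
import re
from typing import List

# "NAME: [START, END) REST", where REST is the offset text, e.g. "+02 (+00)".
INTERVAL_RE = re.compile(
    r"^(?P<name>[^:]+): \[(?P<start>[^,]+), (?P<end>[^)]+)\) (?P<rest>.+)$")


def to_transitions(lines: List[str]) -> List[str]:
    """Drop the redundant end instants, giving one 'START NAME REST' line per interval."""
    result = []
    previous_end = "StartOfTime"
    for line in lines:
        match = INTERVAL_RE.match(line)
        if match is None:
            raise ValueError(f"unrecognised interval line: {line!r}")
        if match["start"] != previous_end:
            raise ValueError(f"gap or overlap before {line!r}")
        result.append(f"{match['start']} {match['name']} {match['rest']}")
        previous_end = match["end"]
    if previous_end != "EndOfTime":
        raise ValueError("last interval does not extend to EndOfTime")
    return result


def to_intervals(transitions: List[str]) -> List[str]:
    """Rebuild the interval form: each end is the next transition's start."""
    parts = [t.split(" ", 2) for t in transitions]
    result = []
    for i, (start, name, rest) in enumerate(parts):
        end = parts[i + 1][0] if i + 1 < len(parts) else "EndOfTime"
        result.append(f"{name}: [{start}, {end}) {rest}")
    return result
```

The saving is one timestamp per line, at the cost of the file no longer being readable one line at a time.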

For zones which settle into an infinite daylight saving pattern, I
currently generate from the start of time to 2100 (and then a single zone
interval for the end of time as Noda Time understands it; we'd need to
work out what form that would take, if any). If we decided that "year of
release + 30 years" was enough, that would cut down the size considerably.
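
Trimming output to a "year of release + 30 years" horizon is simple to sketch. This Python example is illustrative only: `clip_to_horizon` is a hypothetical helper, it assumes the interval line layout of the Maseru example, and it just drops every interval that starts at or after 1 January of the cutoff year, leaving open how the end of the file should be marked.

```python
from typing import List


def clip_to_horizon(lines: List[str], release_year: int, horizon_years: int = 30) -> List[str]:
    """Keep the intervals that start before 1 January of release_year + horizon_years.

    Lines are assumed to be in "Name: [start, end) ..." form and in time order.
    The last kept interval keeps its real end instant; how (or whether) to say
    that the rules carry on beyond the horizon is not decided here.
    """
    cutoff = "%04d-01-01T00:00:00Z" % (release_year + horizon_years)
    kept = []
    for line in lines:
        start = line.split("[", 1)[1].split(",", 1)[0]
        # Same-width "YYYY-MM-DDTHH:MM:SSZ" strings compare in time order, so a
        # plain string comparison works (for years 1000-9999 only).
        if start == "StartOfTime" or start < cutoff:
            kept.append(line)
    return kept
```

As a rough estimate, a zone with a fixed twice-yearly daylight saving rule contributes about two intervals per year, so for a 2015 release, stopping at 2045 rather than 2100 leaves about 60 future intervals for that zone instead of about 170.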

Any thoughts? If the feeling is broadly positive, the next step would be
to nail down the text format, then find a willing victim/volunteer to
write the C code. (You really don't want me writing C...)

Jon