On 18 July 2015 at 23:01, Howard Hinnant <howard.hinnant@gmail.com> wrote:
> On Jul 18, 2015, at 3:40 PM, Jon Skeet <skeet@pobox.com> wrote:
>>
>> Next update: I've improved the zdump-based generation of the data, and put the data in the current format for all the tz data releases I can find (from 1996 onwards) at http://nodatime.org/tzvalidate/
>
> I’ve generated a version of tzdata2015e-tzvalidate.txt.zip from my code here:
>
> http://howardhinnant.github.io/tzdata2015e-tzvalidate.txt.zip

I saw your earlier message and hoped you were reading this thread too. Supporting code such as yours is precisely the motivation for this endeavour.
 
> There appear to be two kinds of differences:
>
> 1.  I appear to start earlier than you. For example, I have:
>
> Africa/Algiers
> 1891-03-14T23:48:48Z +00:09:21 standard PMT
>
> and you do not.

That much is simple to explain: the format I'm currently generating explicitly starts in 1905 and ends in 2035. The 1905 start is there because an earlier version of zdump I was using was limited to 1900.
As per Paul's messages earlier in the thread, we'll eventually want to expose more data, although it's not clear how far into the future it's worth going. (I doubt that it's worth extending beyond 2100, for example.)
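A minimal Python sketch of how the two dumps could be compared on common ground: parse each file into per-zone transition lists, keep only the window both sides cover, and report what differs. It assumes the layout seen in the excerpts in this thread (a zone name on its own line, one transition per line, blank lines between zones) and treats the window as the half-open year range [1905, 2035); the names parse_tzvalidate, restrict and diff_dumps are hypothetical, not part of tzvalidate or Noda Time.

```python
from datetime import datetime, timezone


def parse_tzvalidate(text):
    """Parse zone blocks into {zone: [(instant, offset, kind, abbrev), ...]}.

    Assumes well-formed input in the layout shown in this thread: a zone
    name on its own line, then one transition per line.
    """
    zones = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            current = None  # a blank line ends the current zone block
            continue
        parts = line.split()
        if len(parts) == 1:
            current = zones.setdefault(parts[0], [])
            continue
        instant_text, offset, kind, abbrev = parts
        instant = datetime.strptime(instant_text, "%Y-%m-%dT%H:%M:%SZ")
        current.append((instant.replace(tzinfo=timezone.utc), offset, kind, abbrev))
    return zones


def restrict(zones, start_year=1905, end_year=2035):
    """Keep only transitions whose UTC year lies in [start_year, end_year)."""
    return {zone: [t for t in entries if start_year <= t[0].year < end_year]
            for zone, entries in zones.items()}


def diff_dumps(a, b):
    """Yield (zone, only_in_a, only_in_b) for every zone whose transitions differ."""
    for zone in sorted(a.keys() | b.keys()):
        ea, eb = set(a.get(zone, [])), set(b.get(zone, []))
        if ea != eb:
            yield zone, sorted(ea - eb), sorted(eb - ea)
```

Run against the two America/Barbados excerpts quoted in this thread, this should report only the 1932 entry (AT on one side, AST on the other), while the 1891 Africa/Algiers entry falls outside the window and so doesn't count as a difference.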
 
> 2.  This one has me more concerned: when a zone specifies a rule/date combination and the date falls off the beginning of the rule table, I assume an empty (“”) variable part, whereas you appear to assume an “S” variable part. For example, I have:
>
> America/Barbados
> 1924-01-01T03:58:29Z -03:58:29 standard BMT
> 1932-01-01T03:58:29Z -04:00:00 standard AT
> 1977-06-12T06:00:00Z -03:00:00 daylight ADT
>
> And you have:
>
> America/Barbados
> 1924-01-01T03:58:29Z -03:58:29 standard BMT
> 1932-01-01T03:58:29Z -04:00:00 standard AST
> 1977-06-12T06:00:00Z -03:00:00 daylight ADT

Just to be clear, this isn't "me" so much as "zic and then zdump". It happens that Noda Time (which is more "my" code) does the same thing though :)
 
> The America/Barbados Zone switches to the Barb Rule on 1932-01-01T03:58:29Z, using the format A%sT. But the first Barb Rule is 1977-06-12 2:00. I looked for documentation of what is supposed to happen in a situation like this, but didn’t find any.

I think AST makes sense here (as it's standard time) but I agree that it's not clearly documented.
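To make the two readings concrete, here is a minimal Python sketch of the %s expansion in a FORMAT such as A%sT. The function name is hypothetical, and only the plain %s form is handled; other FORMAT forms (a slash-separated pair such as EST/EDT, or %z) are deliberately left out.

```python
def format_abbreviation(fmt: str, letter: str) -> str:
    """Expand the %s in a tz FORMAT with a rule's LETTER/S value.

    A LETTER/S of "-" denotes an empty variable part.
    """
    if letter == "-":
        letter = ""
    return fmt.replace("%s", letter, 1)
```

An empty variable part yields AT, as in Howard's output, while an "S" yields the AST that zic and zdump produce; the open question is only which letter applies before the first Barb rule.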

In Noda Time, if I don't find a rule leading "into" the transition period, I take the name of the first rule with no daylight saving.
See https://github.com/nodatime/nodatime/blob/20d57967e04f1b57a10c00910f337a1c3caf7522/src/NodaTime.TzdbCompiler/Tzdb/DateTimeZoneBuilder.cs#L127 for the code involved.
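The fallback described above can be sketched in Python as follows. This is an illustration of the behaviour as stated in this message, not a port of the linked Noda Time code: RuleTransition and initial_letter are hypothetical names, rule occurrences are assumed to be already expanded to concrete datetimes, and the wall/standard/UT distinction of a rule's AT field is ignored.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class RuleTransition:
    """One expanded rule occurrence (hypothetical type for this sketch)."""
    at: datetime       # when the rule takes effect
    save_seconds: int  # daylight saving amount (SAVE)
    letter: str        # LETTER/S value, "-" meaning empty


def initial_letter(transitions, zone_start):
    """Choose the LETTER/S in force when a zone line starts using a rule set.

    Use the latest rule transition at or before zone_start; if there is none
    (the zone start falls off the beginning of the rule table), fall back to
    the first rule with no daylight saving.
    """
    ordered = sorted(transitions, key=lambda t: t.at)
    earlier = [t for t in ordered if t.at <= zone_start]
    if earlier:
        return earlier[-1].letter
    for t in ordered:
        if t.save_seconds == 0:
            return t.letter
    return ""  # assumption: no standard-time rule at all, so an empty part
```

For America/Barbados, a 1932 zone start finds no earlier Barb transition, so the sketch falls back to the first SAVE-0 rule; assuming that rule carries the letter S, which is consistent with zic's output, A%sT expands to AST.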

zic appears to implement equivalent behaviour, although I couldn't point to exactly where in its source.

I'd be interested to see whether your natural-language understanding of the data ties in with the comments in DateTimeZoneBuilder at the link above, by the way.

Jon