Thanks for the message.

Yes, you were correct in thinking that

If the FORMAT field contains either a "%s" or a '/' then the RULES field must contain a named rule.

was a trial rule of my own invention for my own parser implementation. I saw a pattern in the historic data and was curious as to whether it applied everywhere. I have now modified my checks so that all files pass.

To be clear, I am not suggesting that the zic compiler mishandles any old data files. Neither am I suggesting that there are any errors in the zic documentation.

When I said that the data was at slight variance with the documentation, the documentation I meant was:

https://data.iana.org/time-zones/tz-link.html and https://data.iana.org/time-zones/tz-how-to.html

I now recognise that I would have been better off using the zic documentation as my primary source.

Nonetheless, here are a few things I have found:

1. tz-link.html states that:

Sources for the tz database are UTF-8 text files...

Some of the comments in some of the old files contain non-UTF-8 single-byte representations of accented letters. Since these occurrences are all in comments, they do not affect how the data is parsed.
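For illustration, a check of this kind could be sketched in Python as below. The function name non_utf8_lines is hypothetical, and the sketch assumes that the first '#' on a line starts a comment, which is a simplification.

```python
def non_utf8_lines(data: bytes):
    """Yield (line_number, in_comment) for each line that is not valid UTF-8."""
    for number, line in enumerate(data.splitlines(), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError as err:
            # err.start is the offset of the first undecodable byte.
            # Assumption: the first '#' on the line starts a comment.
            hash_pos = line.find(b"#")
            in_comment = hash_pos != -1 and hash_pos < err.start
            yield number, in_comment
```

Running this over the raw bytes of each source file lists the offending lines and says whether the first bad byte falls inside a comment.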

2. tz-how-to.html states that:

Prior to the 2020b release, it was called the TYPE field, though it was never used in the main data ...

However, some of the old data in https://data.iana.org/time-zones/releases/ contains "even" and "odd" in that field to account for the Adelaide festival. (I got round this by excluding the versions of Australia/Adelaide that use "even" and "odd".)
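A minimal sketch of how such lines might be spotted, assuming the older Rule line layout of "Rule NAME FROM TO TYPE IN ON AT SAVE LETTER/S" and assuming that '#' starts a comment. The function name rules_with_type is hypothetical.

```python
def rules_with_type(lines):
    """Yield (line_number, rule_name, type_value) for Rule lines whose
    fifth field is something other than "-"."""
    for number, line in enumerate(lines, start=1):
        # Assumption: everything from the first '#' onward is a comment.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 5 and fields[0] == "Rule" and fields[4] != "-":
            yield number, fields[1], fields[4]
```

A parser could use the result either to skip the affected release or to report the line, depending on how strict it wants to be.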

3. tz-how-to.html states that:

The FORMAT column specifies the usual abbreviation of the time zone name. It can have one of three forms:
a string of three or more characters that are either ASCII alphanumerics, “+”, or “-”, in which case that’s the abbreviation ...

I had to allow an underscore and a space for all the files to pass. In the case of St. Helena I also had to allow a '?' as the first character. Further, I had to allow an abbreviation in the '/'-separated form to be only two characters. (I recognise that this is not technically in violation of the statement above.)
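As a rough sketch, the difference between a strict reading and the relaxed check could look like this in Python. The function name format_ok is hypothetical. Applying the three-character minimum to each half of a '/'-separated pair in strict mode is an assumption of the sketch, not something the documentation says, and any FORMAT containing "%s" is simply accepted.

```python
import re

def format_ok(fmt: str, relaxed: bool = False) -> bool:
    """Check a FORMAT value against a strict or a relaxed character rule."""
    if "%s" in fmt:
        return True  # the "%s" form is not checked further in this sketch
    parts = fmt.split("/")
    if len(parts) > 2:
        return False
    # Relaxed mode also allows underscore and space, and a leading '?'.
    chars = r"[A-Za-z0-9+\-_ ]" if relaxed else r"[A-Za-z0-9+\-]"
    lead = r"\??" if relaxed else ""
    # Relaxed mode lets each half of a '/' pair be as short as two characters.
    min_len = 2 if (relaxed and len(parts) == 2) else 3
    pattern = re.compile(rf"{lead}{chars}{{{min_len},}}")
    return all(pattern.fullmatch(part) is not None for part in parts)
```

With relaxed=False this approximates the wording in tz-how-to.html; with relaxed=True it accepts the extra cases listed above.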

4. I can see that some of the older files use a '?' where the more modern files use '%s'. This is not mentioned in the tz-how-to.html documentation, though I recognise that putting such obscurities in the document may not be a good idea.

As you can see, these are all very minor things. I appreciate your quick responses.

Regards

Nick

On Fri, 13 May 2022 at 20:20, Paul Eggert <eggert@cs.ucla.edu> wrote:

On 5/13/22 09:35, Tim Parenti via tz wrote:

> I'm not sure where your "must contain a named rule" quote is coming from

I imagine this was a style rule of Nick's invention. Violating the rule
might issue a warning but it shouldn't be a fatal error, as the 'asia'
file was correct as-is.

> in this case the "rule" is not named, *per se*, but is rather a constant
> 1:00. The relevant thing is that the RULES field is not "-".

Or more precisely it's that the RULES column is neither "-", nor a
suffixless zero offset, nor an offset with an "s" suffix. We don't use
any of these more-obscure features in TZDB data but they're in the .zi
format.

Since TZDB consistently avoids '/' in the many other places where this
situation arises, it should avoid '/' here for stylistic consistency. So
I installed the attached proposed patches. The 1st patch omits the '/'
in question; the 2nd documents that STDOFF columns don't have suffixes
(this wasn't clear in the man page, and I discovered this while looking
into the 3rd patch), and the 3rd adds a style check for this.

None of these patches affect the TZif output files.