Thanks for the message.

Yes, you were correct in thinking that *If the FORMAT field contains either a "%s" or a '/' then the RULES field must contain a named rule.* was a trial rule of my own invention for my own parser implementation. I saw a pattern in the historic data and was curious as to whether it applied everywhere. I have now modified my checks so that all files pass.

To be clear, I am not suggesting that the zic compiler mishandles any old data files. Neither am I suggesting that there are any errors in the zic documentation. When I referred to the data being at slight variance with the documentation, the documentation I meant was:

https://data.iana.org/time-zones/tz-link.html
and
https://data.iana.org/time-zones/tz-how-to.html

I now recognise that I would have been better off using the zic documentation as my primary source.

Nonetheless, here are a few things I have found:

1. tz-link.html states that:
*Sources for the tz database are UTF-8 text files...*
Some of the comments in some of the old files contain non-UTF-8 single-byte representations of accented letters. Since these occurrences are all in comments, they do not affect anything.

2. tz-how-to.html states that:
*Prior to the 2020b release, it was called the TYPE field, though it was never used in the main data ...*
However, some of the old data in https://data.iana.org/time-zones/releases/ contains "even" and "odd" to account for the Adelaide festival. (I got round this by excluding the versions of Australia/Adelaide that exhibit "even" and "odd".)

3. tz-how-to.html states that:
*The FORMAT column specifies the usual abbreviation of the time zone name. It can have one of three forms: a string of three or more characters that are either ASCII alphanumerics, “+”, or “-”, in which case that’s the abbreviation ...*
I had to allow an underscore and a space for all the files to pass. In the case of St. Helena I also had to allow a '?' as the first character. Further, I had to allow an abbreviation in the '/' separated format to be only two characters long. (I recognise that this is not technically in violation of the statement above.)

4. I can see that some of the older files use a '?' where the more modern files use '%s'. This is not mentioned in tz-how-to.html, although I recognise that putting such obscurities in the document may not be a good idea.

As you can see, these are all very minor things.

I appreciate your quick responses.

Regards
Nick

On Fri, 13 May 2022 at 20:20, Paul Eggert <eggert@cs.ucla.edu> wrote:
On 5/13/22 09:35, Tim Parenti via tz wrote:
I'm not sure where your "must contain a named rule" quote is coming from
I imagine this was a style rule of Nick's invention. Violating the rule might issue a warning but it shouldn't be a fatal error, as the 'asia' file was correct as-is.
in this case the "rule" is not named, *per se*, but is rather a constant 1:00. The relevant thing is that the RULES field is not "-".
Or more precisely it's that the RULES column is neither "-", nor a suffixless zero offset, nor an offset with an "s" suffix. We don't use any of these more-obscure features in TZDB data but they're in the .zi format.
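For anyone writing their own checker, the distinction above might be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the function name and the simplified offset pattern (no fractional seconds) are hypothetical and not taken from zic, and the treatment of a missing suffix follows the zic man page's description of the SAVE column (a missing suffix defaults to "s" for a zero offset and to "d" otherwise).

```python
import re

# Simplified offset syntax (an assumption for this sketch):
# optional sign, h[:mm[:ss]], optional "s" or "d" suffix.
_OFFSET = re.compile(r"(-?)(\d+)(?::(\d+))?(?::(\d+))?([sd]?)")


def standard_time_only(rules: str) -> bool:
    """Return True if a Zone line's RULES field means standard time always applies.

    That is the case when the field is "-", a suffixless zero offset,
    or an offset with an "s" suffix. Anything else (a rule name, a
    nonzero suffixless offset, or a "d" suffix) returns False.
    """
    if rules == "-":
        return True
    m = _OFFSET.fullmatch(rules)
    if m is None:
        return False  # not an offset, so treated as a rule name
    _sign, hours, minutes, seconds, suffix = m.groups()
    if suffix == "s":
        return True
    if suffix == "d":
        return False
    total = int(hours) * 3600 + int(minutes or 0) * 60 + int(seconds or 0)
    return total == 0  # suffixless: standard only if the offset is zero
```

With this, a constant "1:00" in the RULES column is reported as not standard-only, in the same way as a named rule would be.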
Since TZDB consistently avoids '/' in the many other places where this situation arises, it should avoid '/' here for stylistic consistency. So I installed the attached proposed patches. The 1st patch omits the '/' in question; the 2nd documents that STDOFF columns don't have suffixes (this wasn't clear in the man page, and I discovered this while looking into the 3rd patch), and the 3rd adds a style check for this.
None of these patches affect the TZif output files.
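
Returning to point 3 in the reply at the top, the FORMAT forms described in tz-how-to.html could be checked with something like the sketch below. The function name and the category labels are hypothetical, the pattern encodes only the documented "three or more ASCII alphanumerics, '+', or '-'" wording, and none of the relaxations needed for older releases (underscore, space, a leading '?', two-character halves) are included.

```python
import re

# Documented literal form: three or more ASCII alphanumerics, "+", or "-".
_ABBR = re.compile(r"[A-Za-z0-9+-]{3,}")


def classify_format(fmt: str) -> str:
    """Classify a Zone line's FORMAT field (labels are illustrative only)."""
    if "%s" in fmt:
        return "substitution"   # "%s" is replaced by the rule's LETTER text
    if "/" in fmt:
        return "pair"           # standard/daylight abbreviations
    if _ABBR.fullmatch(fmt):
        return "literal"        # the abbreviation itself
    return "nonconforming"      # e.g. contains "_", " ", or "?"
```

A parser that needs to accept old releases could log the "nonconforming" cases as warnings rather than rejecting the file.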