Robert Elz wrote:
> The final column is "comments" - there's no stated restriction on the characters that can be used in comments
True, and your interpretation, with the comments beginning with a tab, is possible. (I was half expecting DateTime::TimeZone, another repackaging of the database for CPAN whose release went out promptly after 2012a, to have ended up with this interpretation, but it turns out it treated the two tabs as a single separator. I don't know how automated Dave Rolsky's release process is.) However, tab-separated-value tables conventionally don't allow tabs within the data: every tab is a separator.

Anyway, I was aware of some ambiguity here when I wrote my parser. Quite apart from the tab issue, nothing restricts the comments to ASCII, yet there's also no indication of which encoding non-ASCII characters would use. So I made the parser as strict as possible, based on the partial statement of the file format and on the (admirably regular) data actually seen. This includes requiring that comments contain only printable ASCII and neither begin nor end with whitespace. On its face this goes against the receiving half of the Postel principle, but the failure mode here isn't a total failure of operation: it's to kick the issue up for conscious human attention. (It emailed me.) The design is conservative in that I've told the parser not to guess the meaning of anything irregular.

Rather than argue about what the current syntax definition means, when it's plainly unclear on some of the details, I'd prefer to resolve this by making the definition more detailed. I suggest defining it to match the strict syntax that the data has adhered to so far, and that my parser expects.

For reference, these are the Perl regexps that I use to parse zone.tab (in Time::OlsonTZ::Download), the first for data lines and the second for comment lines:

    $line =~ /\A([A-Z]{2})
        \t([-+][0-9]{4}(?:[0-9]{2})?[-+][0-9]{5}(?:[0-9]{2})?)
        \t([!-~]+)
        (?:\t([!-~][ -~]*[!-~]))?
        \n\z/x;

    $line =~ /\A#[^\n]*\n\z/;

We should also have an automated test, as part of tzcode, that checks that the file matches whatever detailed syntax we decide on, and that its content is semantically sane (for example, that it refers only to defined zones). I'm happy to translate my regexps, or equivalents for whatever other syntax we agree on, into C for this purpose. The same applies to iso3166.tab.

-zefram
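The automated check proposed above could start out roughly like this. It is a minimal sketch, not tzcode's or Time::OlsonTZ::Download's actual code. It is kept in Perl so the data-line regexp can be reused verbatim, rather than in the C the message offers. The subroutine names `check_zone_tab_line` and `check_zone_tab` are hypothetical. The sketch also assumes the caller has already gathered the set of zone names defined in the tz source files; that step is not shown. It reports lines that fail the strict syntax, and data lines naming an undefined zone, instead of guessing what they mean.

```perl
use strict;
use warnings;

# Hypothetical helper: classify one zone.tab line, trailing newline included.
# Returns a hashref for a data line or a comment line, or undef for anything
# irregular, which is left for a human to look at rather than guessed at.
sub check_zone_tab_line {
    my ($line) = @_;
    # Data line: the strict regexp quoted above, unchanged. Under /x the
    # space inside the bracketed class [ -~] is still significant.
    if ($line =~ /\A([A-Z]{2})
                  \t([-+][0-9]{4}(?:[0-9]{2})?[-+][0-9]{5}(?:[0-9]{2})?)
                  \t([!-~]+)
                  (?:\t([!-~][ -~]*[!-~]))?
                  \n\z/x) {
        return { code => $1, coords => $2, zone => $3, comment => $4 };
    }
    # Comment line: '#' followed by anything up to the newline.
    return { comment_line => 1 } if $line =~ /\A#[^\n]*\n\z/;
    return undef;
}

# Hypothetical checker: syntax of every line, plus one semantic check (each
# data line names a zone present in %$known_zones). Returns a list of problem
# strings; an empty list means the input passed.
sub check_zone_tab {
    my ($lines, $known_zones) = @_;
    my @problems;
    my $n = 0;
    for my $line (@$lines) {
        $n++;
        my $r = check_zone_tab_line($line);
        if (!defined $r) {
            push @problems, "line $n: does not match the strict syntax";
        } elsif (defined $r->{zone} && !$known_zones->{ $r->{zone} }) {
            push @problems, "line $n: unknown zone $r->{zone}";
        }
    }
    return @problems;
}
```

For instance, a data line whose comment is preceded by two tabs yields one "does not match the strict syntax" problem, because `[!-~]` cannot match the second tab. That is the strict single-tab-separator reading rather than either of the interpretations discussed above.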