
Jon Skeet wrote:
> The use case I'm primarily interested in is validation: diffing a "golden" file with one generated by another tool
Yes, I should have mentioned that. I commonly compare two zdump output files with "diff", and zdump -i works well for this too. However, looking only at diff output does not suffice: sometimes we add new zones, for instance, and diff output won't help proofread those.
> I wouldn't expect them to be dealing with this format every day
True; even I don't do that. Still, there is no need for zdump -i format to be self-explanatory. For example, the format need not use strftime %c format merely because naive users are more likely to understand %c format than ISO 8601 format. As long as readers can follow the format without constantly referring to the documentation, we should be OK, and zdump -i format clears that relatively low bar.
> - I don't see why we need the quoted form for the time zone ID.
The API allows the TZ environment variable (the time zone ID) to be any finite sequence of non-null bytes. TZ need not be UTF-8 encoded, and it can contain bytes such as newlines. Regardless of how weird TZ's value is, zdump output should be unambiguous.
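To illustrate why some quoting is needed, here is a minimal Python sketch of one possible escaping scheme that turns any TZ byte string into a single unambiguous line. This is not zdump's actual quoting code. The function names `quote_tz` and `unquote_tz` are hypothetical. The escaping rules are also assumptions made for illustration: `"` and `\` get a backslash, and every other non-printable or non-ASCII byte becomes a three-digit octal escape.

```python
def quote_tz(tz: bytes) -> str:
    """Render any TZ byte string as one unambiguous line (hypothetical scheme)."""
    out = []
    for b in tz:
        if b == 0:
            raise ValueError("TZ cannot contain a null byte")
        if b in (ord('"'), ord("\\")):
            out.append("\\" + chr(b))  # escape the delimiter and the escape char
        elif 0x20 <= b < 0x7F:
            out.append(chr(b))  # printable ASCII passes through unchanged
        else:
            out.append("\\%03o" % b)  # everything else: three-digit octal
    return 'TZ="' + "".join(out) + '"'


def unquote_tz(line: str) -> bytes:
    """Invert quote_tz, recovering the original bytes."""
    if len(line) < 5 or not line.startswith('TZ="') or not line.endswith('"'):
        raise ValueError("not a TZ= line")
    body = line[4:-1]
    out = bytearray()
    i = 0
    while i < len(body):
        if body[i] != "\\":
            out.append(ord(body[i]))  # literal character
            i += 1
        elif body[i + 1] in '"\\':
            out.append(ord(body[i + 1]))  # backslash-escaped delimiter
            i += 2
        else:
            out.append(int(body[i + 1:i + 4], 8))  # octal escape
            i += 4
    return bytes(out)


weird = b'odd"zone\\name\nwith a newline \xff'
line = quote_tz(weird)
print(line)  # TZ="odd\"zone\\name\012with a newline \377"
assert unquote_tz(line) == weird
```

Because the quoted form never contains a raw newline and each byte maps to exactly one spelling, distinct TZ values always yield distinct lines, and the original bytes can be recovered.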
> Presumably the benefit of the proposed format is that you can copy/paste it into a Unix shell to use that time zone.
No; in general such a cut-and-paste would not work, because the quoting scheme is not designed to be shell-compatible. The main goal is an unambiguous format that supports any TZ value allowed by the API. A secondary goal is to leave some room for future extensions to zdump -i format.
> the quotes and TZ= part are an unnecessary distraction IMO.
Some decoration is needed to make a TZ= line easy to distinguish from an ordinary data line, because a TZ string can be almost anything; for example, it can look like a data line. Anyway, if this is the worst of zdump -i's problems, we should be OK.
> - Indicating daylight/standard with an arbitrary positive integer: if this is going to be a canonical format, we need to be more precise than that. Equivalent outputs should be equal. I'd also prefer it not to be an integer at all, given that it's indicating a Boolean value.
tm_isdst is defined by ISO C11 and by POSIX to be an int value, so if we want zdump to work with all standard-conforming implementations without losing information, it must be able to represent an arbitrary int somehow. The existing zdump -v format can do it, and it would be odd if zdump -i format were to lose that ability.
> - I'd *really* like colons in the UT offsets
That is mostly a matter of style. That said, in my experience most UT offsets that contain hours and minutes omit colons (including several in the RFC 5322 header of your email :-).
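For comparison, here is a minimal Python sketch of the two offset styles under discussion. The name `format_offset` and its `colons` flag are hypothetical. The sketch assumes that zero minutes and seconds are dropped, as offsets like +07 in the example output later in this message suggest. It is not taken from zdump's source.

```python
def format_offset(seconds: int, colons: bool = False) -> str:
    """Format a UT offset given in seconds as +hh, +hhmm or +hhmmss.

    Minutes appear only if minutes or seconds are nonzero, and seconds
    only if nonzero. With colons=True the fields are separated by ':'.
    """
    sign = "-" if seconds < 0 else "+"
    hh, rem = divmod(abs(seconds), 3600)
    mm, ss = divmod(rem, 60)
    fields = ["%02d" % hh]
    if mm or ss:
        fields.append("%02d" % mm)
    if ss:
        fields.append("%02d" % ss)
    return sign + (":" if colons else "").join(fields)


for secs in (7 * 3600, -(10 * 3600 + 30 * 60), 5 * 3600 + 30 * 60):
    print(format_offset(secs), format_offset(secs, colons=True))
# +07 +07
# -1030 -10:30
# +0530 +05:30
```

The colon-free form matches the numeric zones in RFC 5322 date headers (for example -0800), while the colon form is what ISO 8601 extended format uses.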
> - I think it's simpler to think about the transition times in UT, indicated with a Z in the output.
That's not my experience. Most of our sources do not base transitions on UT, and I typically think about local time when mulling over transitions and DST rules.
> choosing the local time *after* the transition isn't how most people think about transitions in day to day conversation.
True. But it's easy to get used to when reading zdump -i output. Plus, users most likely prefer local time to UT when thinking about transitions.
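To make the convention concrete, here is a small Python sketch using only the standard datetime module. It renders one transition instant three ways:

- in UT with a Z suffix,
- in local time just before the transition,
- in local time just after it, which is the zdump -i choice.

The helper name `views` is hypothetical. The UT instants are worked out from the 1981 lines in the example output later in this message (01 at +07, and 23 at +06), assuming each transition switches between those two offsets.

```python
from datetime import datetime, timedelta, timezone


def views(ut: datetime, old_offset_h: int, new_offset_h: int) -> dict:
    """Render a transition instant in UT and in local time before/after it."""
    def local(offset_h: int) -> str:
        tz = timezone(timedelta(hours=offset_h))
        return ut.astimezone(tz).strftime("%Y-%m-%d %H:%M")

    return {
        "ut": ut.strftime("%Y-%m-%dT%H:%MZ"),
        "before": local(old_offset_h),
        "after": local(new_offset_h),  # the zdump -i viewpoint
    }


# Spring, +06 -> +07; zdump -i shows "1981-04-01 01 +07 1".
spring = views(datetime(1981, 3, 31, 18, tzinfo=timezone.utc), 6, 7)
# Fall, +07 -> +06; zdump -i shows "1981-09-30 23 +06".
fall = views(datetime(1981, 9, 30, 17, tzinfo=timezone.utc), 7, 6)
print(spring)
print(fall)
```

In the fall case, the local time just before the transition is midnight at the start of 1981-10-01. The after-transition view that zdump -i uses shows 23:00 on 1981-09-30, so the two viewpoints even disagree about the date.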
> Just the fact that there's ambiguity
The format is documented, and if the documentation is understood correctly, zdump -i output has just one interpretation, so there is no ambiguity. A problem might arise if someone attempts to read zdump -i output without reading the documentation. Such a problem could occur with any format choice, but some formats are less confusing than others, and most likely that is what you're referring to. To some extent there is a tradeoff between formats that make typos easy to find and formats that match what users typically expect. Within reason I'd rather make typos easy to find, as typos are a real problem!
> - Omitting the abbreviation when it happens to be the same as the UT offset makes the file harder to parse for very little benefit in my view.
First, it's trivial to parse zdump -i lines even when the abbreviation is omitted. For example, here's an awk script that outputs only zdump -i lines that correspond to DST transitions, even when abbreviations are omitted:

    /^[0-9]/ && NF > 3 && /[0-9]$/ {print}

Compare this to an awk script that does the same thing with tzvalidate format:

    /^[0-9]/ && $(NF - 1) == "daylight" {print}

which is not significantly simpler.

Second, I realize the improvement is of little benefit to those who do not read zdump output. But any unambiguous format would do for that case; we could pick JSON format, or XML format, or whatever. Being somewhat old-fashioned, I'd like a text format that makes it easy for me to read zdump -i output using an ordinary text editor. And for me, it's quite useful that redundant abbreviations are omitted. Consider, for example, this output:

    1981-04-01 01 +07 1
    1981-09-30 23 +06
    1982-04-01 01 +07 1
    1982-09-30 23 +06
    1983-04-01 01 +07 +08 1
    1983-09-30 23 +06
    1984-04-01 01 +07 1
    1984-09-30 02 +06

where the (incorrect) 1983-04-01 transition sticks out like a sore thumb. In contrast, suppose the abbreviation were always output and the columns always lined up, so that the output looked like this:

    1981-04-01 01 +07 +07 1
    1981-09-30 23 +06 +06 0
    1982-04-01 01 +07 +07 1
    1982-09-30 23 +06 +06 0
    1983-04-01 01 +07 +08 1
    1983-09-30 23 +06 +06 0
    1984-04-01 01 +07 +07 1
    1984-09-30 02 +06 +06 0

Then the same typo is *much* harder to spot. So it is not "very little benefit". It's a big deal to someone like me who wants to catch typos and who has to deal with the consequences of typos.
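The same tasks can be written in Python. This is a rough sketch of how a program might read zdump -i data lines when both the abbreviation and the isdst field are optional. The `Transition` type and the `parse_line`, `dst_lines` and `suspicious` functions are hypothetical. The sketch assumes that an abbreviation is never all digits, so a trailing all-digit field can only be the isdst value.

Because of that rule, it will not treat a standard-time line with an explicit numeric abbreviation such as +08 as a DST line. The awk one-liner would, since that line also has more than three fields and ends in a digit.

```python
from typing import List, NamedTuple


class Transition(NamedTuple):
    date: str
    time: str
    offset: str
    abbr: str   # equals offset when the abbreviation was omitted
    isdst: int  # 0 when the isdst field was omitted


def parse_line(line: str) -> Transition:
    """Parse one zdump -i style data line with optional abbr and isdst."""
    date, time, offset, *rest = line.split()
    isdst = 0
    if rest and rest[-1].isdigit():
        isdst = int(rest.pop())  # trailing all-digit field is isdst
    abbr = rest[0] if rest else offset  # omitted abbr means "same as offset"
    return Transition(date, time, offset, abbr, isdst)


def dst_lines(lines: List[str]) -> List[str]:
    """Counterpart of the awk filter: keep only DST transitions."""
    return [line for line in lines if parse_line(line).isdst]


def suspicious(lines: List[str]) -> List[str]:
    """Flag numeric abbreviations that disagree with the UT offset."""
    flagged = []
    for line in lines:
        t = parse_line(line)
        if t.abbr != t.offset and t.abbr[:1] in ("+", "-"):
            flagged.append(line)
    return flagged


example = """\
1981-04-01 01 +07 1
1981-09-30 23 +06
1982-04-01 01 +07 1
1982-09-30 23 +06
1983-04-01 01 +07 +08 1
1983-09-30 23 +06
1984-04-01 01 +07 1
1984-09-30 02 +06""".splitlines()

print(dst_lines(example))
print(suspicious(example))  # only the 1983-04-01 line
```

The `suspicious` check is a programmatic version of what the eye does with the first listing: once redundant abbreviations are gone, any abbreviation that remains is worth a second look.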
> for times, I'd favour at least keeping the minutes
I was tempted by that too, on the grounds that it's what readers typically expect. However, it makes typos harder to catch, which is a significant disadvantage. I hope I've explained the main technical advantages of zdump -i format for my use case (reading zdump -i output by eye, and reading diffs of it). I am not surprised that its style is off-putting, which is why I'm thinking that we may need a way for people to specify output style more flexibly than choosing among zdump -i, zdump -v, and zdump -V.
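Here is a minimal Python sketch of the two time-of-day styles. It assumes that zdump -i drops trailing zero minutes and seconds, which is consistent with hour-only times such as 01 and 23 in the examples above. The function name `format_time` and its `keep_minutes` flag are hypothetical.

The relevance to typo-catching is that a time with nonzero minutes, such as 01:30, stands out among hour-only times. When minutes are always shown, it is just another hh:mm.

```python
def format_time(h: int, m: int = 0, s: int = 0, keep_minutes: bool = False) -> str:
    """Format a time of day, omitting trailing zero fields.

    Produces hh, hh:mm or hh:mm:ss; with keep_minutes=True the minutes
    are always shown, giving hh:mm or hh:mm:ss.
    """
    fields = ["%02d" % h]
    if m or s or keep_minutes:
        fields.append("%02d" % m)
    if s:
        fields.append("%02d" % s)
    return ":".join(fields)


times = [(1, 0, 0), (23, 0, 0), (1, 30, 0), (2, 0, 0)]
print(" ".join(format_time(*t) for t in times))                     # 01 23 01:30 02
print(" ".join(format_time(*t, keep_minutes=True) for t in times))  # 01:00 23:00 01:30 02:00
```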