I'm totally in favour of the pragmatic approach of assuming there isn't crazy data. Let's try to design the format to accomplish what we need it to - which to my mind doesn't include years earlier than 1800 or later than 3000 (and probably a narrower range than that - I currently only check up to 2035, but that makes me somewhat nervous).

I'd rather have something that does what it's designed for well and doesn't work for tasks it's not designed for than something that copes with everything, but does so in a mediocre way.

I quite like Tim's padding idea overall, although I'd still argue for colons in offsets (RFC 5322 uses a horrible format in general; I see no reason to copy mistakes of the past) and "d" instead of "a positive integer" to indicate daylight. (My goal is for this to capture all the relevant information from the original data text files, but the implementation details of is_dst fall outside that scope IMO. That's not part of the source data.)
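
To make the colon and "d" suggestion concrete, here is a minimal Python sketch. It is only an illustration, not what zdump does: the function names are hypothetical, and leaving the flag empty for standard time is my assumption, since nothing above settles what a non-daylight row should show.

```python
def format_offset(seconds: int) -> str:
    """Render a UT offset as +HH:MM, appending :SS only when it is non-zero."""
    sign = "-" if seconds < 0 else "+"
    hours, rem = divmod(abs(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    text = f"{sign}{hours:02d}:{minutes:02d}"
    if secs:
        text += f":{secs:02d}"
    return text


def format_daylight(is_daylight: bool) -> str:
    """Return "d" for daylight time; the empty string for standard time is an assumption."""
    return "d" if is_daylight else ""


print(format_offset(3600), format_daylight(True))
print(format_offset(-12600), format_daylight(False))
```

The point of the flag is that it carries one bit from the source data (daylight or not) without exposing whatever integer is_dst happens to hold.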

I'm still fine with the idea of transforming some non-ideal-to-me format from zdump into a more canonical format in a simple way though.

Jon

On 7 June 2016 at 09:44, Paul Eggert <eggert@cs.ucla.edu> wrote:

Tim Parenti wrote:

I realize the goal may be to have a single canonical format, but perhaps
this could be made conditional on a -z option?

Yes, or some such option like that. I was thinking more of a strftime-like format in which one could specify UT vs local time.

Just to throw in a potential middle-of-the-road option, would it make sense
to space-pad the datetime and offset values instead?

I thought of doing that, but found that it would be a pain, since the amount of padding would be system-dependent. Every field whose extrema depend on machine integer size (the year, the UT offset) would have a width that would depend on the current machine architecture, and this would mean zdump -i would generate different outputs on different machine architectures. Plus, there would be a lot of spaces before the year and the UT offsets.
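
A small Python sketch of why the padding width would vary: the widest printable value of a signed integer depends on how many bits it has. Which integer type actually bounds the year or the offset on a given system is not stated here, so the 32 and 64 below are just example sizes, and widest_signed is a hypothetical helper.

```python
def widest_signed(bits: int) -> int:
    """Number of characters needed to print the most negative two's-complement value."""
    return len(str(-(2 ** (bits - 1))))


for bits in (32, 64):
    print(bits, widest_signed(bits))
```

A column padded to the 64-bit width would carry nine more characters of padding than one padded to the 32-bit width, which is the architecture-dependent output described above.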

Alternatively, zdump could look at all output to be generated for this particular zdump run, compute the maximum width needed for the run, and use that width. But this would mean that 'zdump -i A; zdump -i B' would not necessarily output the same thing as 'zdump -i A B', which would not be good at all.
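
The problem with a per-run width can be shown with a short Python sketch. The rows are made-up data and render is a hypothetical stand-in for the output routine, not zdump's code.

```python
def render(rows):
    """Right-align the year column to the widest year seen in this run."""
    width = max(len(str(year)) for year, _ in rows)
    return [f"{year:>{width}} {offset}" for year, offset in rows]


# Made-up rows: zone B contains one outlandish year.
zone_a = [(1883, "-05:00"), (2007, "-04:00")]
zone_b = [(-32768, "+00:00"), (1970, "+01:00")]

separate = render(zone_a) + render(zone_b)  # like 'zdump -i A; zdump -i B'
combined = render(zone_a + zone_b)          # like 'zdump -i A B'

print(separate[0])
print(combined[0])
```

Zone A's lines gain leading spaces only in the combined run, so the two invocations disagree even though the underlying data is identical.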

Alternatively, zdump could not bother to align outlandish years outside the range -999 to 9999, or outlandish UT offsets that are more than 100 hours away from UT. Something like that might work, I suppose, though we'd probably still get bug reports from compulsive aligners wondering why the outlandish cases aren't aligned properly, or why there's all that white space in the columns.
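
A minimal Python sketch of that last option, assuming a fixed four-character year column (enough for -999 through 9999) that outlandish values simply overflow. YEAR_WIDTH and format_year are hypothetical names chosen for the illustration.

```python
YEAR_WIDTH = 4  # fits every year from -999 through 9999


def format_year(year: int) -> str:
    """Pad to a fixed minimum width; wider values overflow rather than being truncated."""
    return f"{year:>{YEAR_WIDTH}}"


for year in (800, 2016, -999, -32768):
    print(f"[{format_year(year)}]")
```

Because the width is a constant, the output is the same on every architecture and for every combination of zones; only the rare out-of-range rows break the column.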