I'm totally in favour of the pragmatic approach of assuming there isn't crazy data. Let's try to design the format to accomplish what we need it to - which to my mind doesn't include years earlier than 1800 or later than 3000 (and probably a narrower range than that - I currently only check up to 2035, but that makes me somewhat nervous).
I'd rather have something that does what it's designed for well and doesn't work for tasks it's not designed for than something that copes with everything, but does so in a mediocre way.
I quite like Tim's padding idea overall, although I'd still argue for colons in offsets (RFC5322 uses a horrible format in general; I see no reason to copy mistakes of the past) and "d" instead of "a positive integer" to indicate daylight. (My goal is for this to capture all the relevant information from the original data text files, but the implementation details of is_dst fall outside that scope IMO. That's not part of the source data.)
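To make the colon/"d" suggestion concrete, here's a rough sketch of how those two fields might be rendered. It's only an illustration: the function names are made up, and using "-" as the non-daylight marker is my own assumption, not something we've agreed on. Offsets get seconds appended only when they're non-zero, since some historical LMT offsets aren't whole minutes (RFC 5322's +HHMM can't express those at all).

```python
def format_offset(total_seconds: int) -> str:
    """Render a UTC offset with colons: +HH:MM, or +HH:MM:SS when seconds are non-zero."""
    sign = "-" if total_seconds < 0 else "+"
    hours, rem = divmod(abs(total_seconds), 3600)
    minutes, seconds = divmod(rem, 60)
    text = f"{sign}{hours:02d}:{minutes:02d}"
    if seconds:
        # Some historical (LMT) offsets are not a whole number of minutes.
        text += f":{seconds:02d}"
    return text


def dst_marker(is_dst: bool) -> str:
    """Hypothetical daylight flag: "d" for daylight, "-" (assumed) otherwise."""
    return "d" if is_dst else "-"


print(format_offset(19800), format_offset(-18000), dst_marker(True))
```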
I'm still fine with the idea of transforming some non-ideal-to-me format from zdump into a more canonical format in a simple way though.
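For what it's worth, the zdump side of that transformation looks simple enough. Below is a sketch that turns one `zdump -v` line into a single canonical-ish line. The output layout (zone, UTC instant, offset, flag, abbreviation) is purely hypothetical, as is the "-" non-daylight marker; the 1800-3000 cut-off is just the range mentioned at the top of this mail.

```python
import re
from typing import Optional

# One line of `zdump -v` output, e.g.
# "America/New_York  Sun Mar 10 06:59:59 2019 UT = Sun Mar 10 01:59:59 2019 EST isdst=0 gmtoff=-18000"
# (older zdump versions print "UTC" rather than "UT").
ZDUMP_LINE = re.compile(
    r"^(?P<zone>\S+)\s+\w{3} (?P<mon>\w{3})\s+(?P<day>\d+) "
    r"(?P<time>\d\d:\d\d:\d\d) (?P<year>-?\d+) UTC? = "
    r".* (?P<abbr>\S+) isdst=(?P<isdst>\d+) gmtoff=(?P<gmtoff>-?\d+)$"
)

MONTHS = {name: i for i, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}

# Assumed supported range, per the 1800-3000 bound discussed above.
MIN_YEAR, MAX_YEAR = 1800, 3000


def format_offset(total_seconds: int) -> str:
    """+HH:MM, or +HH:MM:SS when the offset has non-zero seconds."""
    sign = "-" if total_seconds < 0 else "+"
    hours, rem = divmod(abs(total_seconds), 3600)
    minutes, seconds = divmod(rem, 60)
    text = f"{sign}{hours:02d}:{minutes:02d}"
    return text + (f":{seconds:02d}" if seconds else "")


def convert_zdump_line(line: str) -> Optional[str]:
    """Hypothetical converter; returns None for unparseable or out-of-range lines."""
    m = ZDUMP_LINE.match(line.strip())
    if m is None:
        return None  # e.g. the "= NULL" lines zdump prints for extreme instants
    year = int(m["year"])
    if not MIN_YEAR <= year <= MAX_YEAR:
        return None
    utc = f"{year:04d}-{MONTHS[m['mon']]:02d}-{int(m['day']):02d}T{m['time']}Z"
    flag = "d" if int(m["isdst"]) else "-"  # "-" for standard time is an assumption
    return f"{m['zone']} {utc} {format_offset(int(m['gmtoff']))} {flag} {m['abbr']}"


print(convert_zdump_line(
    "America/New_York  Sun Mar 10 07:00:00 2019 UT = "
    "Sun Mar 10 03:00:00 2019 EDT isdst=1 gmtoff=-14400"))
```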
Jon