Thank you, Dan, Paul, and Ian, for your comments. DJB wrote (in his acerbic way):
Nathan Myers writes:
I believe that conversions should yield a struct containing a reference timestamp, followed by up to four offsets tagged with official wall-clock offset, interpretation, and confidence values.
How exactly do you expect programs to use this information?
I can imagine a warning such as ``The time is jumping from 2:00 CDT back to 1:00 CST; I assume you meant 1:30 CDT; if you actually wanted 1:30 CST, type 1:30 CST.'' But this doesn't use your vague ``confidence values''; it uses hard data about the time zone.
Current programs handle time and time zones badly in large part because programmers understand the problems poorly or not at all. _That won't change._ Exposing programmers to more complexity than they understand, such as handing them a time zone transition database to explore (to see if their apparently successful conversion was in fact dubious), will not help matters appreciably. At best, it will slow down conversions in correct programs by orders of magnitude.
The goal is to increase the number of programs that do something meaningful, and that is achieved by encapsulating complexity. Certainly any program that needs to delve into a transition database should do so, but most should not. Users presented with such a level of detail are ill-equipped to evaluate it anyway. We improve matters by capturing our understanding in the code, and presenting a summary of meaningful results.
Yes, the annotations I wrote were vague -- as noted in the posting, and as is appropriate in the sketch for a design. In a firm design things become a lot more precise. (Where I was too precise, e.g. "+/- 43200 seconds", it created a distraction.)
Different programs will use the information differently; that's the *point*. A program that just needs a low-precision timestamp can use t0 and ignore the quibbles. An interactive program can present a confirmation query to the user. Non-interactive programs might log parts of the conversion report along with the timestamp, or use the report as a clue that they need to spend the time digging into the transition database -- otherwise a waste of time, in the common case.
Some programs "know" that the time being entered wasn't just read off a wall clock, and can ignore the likelihood of the clock not having been reset. The conversion function doesn't know that, but the program can use the fact in its interpretation of the conversion result.
Paul Hill wrote:
Nathan has provided an interesting list of ambiguities, but I see a few problems with the list.
Nathan Myers wrote:
0. Unambiguous time, e.g. 04:30 morning of a time change
t0=123456789
  offset 0: 0 sec; wall-clock offset: -3600 sec; interpretation: unambiguous; confidence: certain
  offset 1: -3600 sec; wall-clock offset: -3600 sec; interpretation: suggested substitute; confidence: doubtful
1. Spring ambiguity, enter 02:30 when it doesn't exist because civil time proceeded 01:59:59 -> 03:00:00. (Or 02:00:00 to 03:00:01? I don't know.)
t0=123456789
  offset 0: 0 sec; wall-clock offset: -3600 sec; interpretation: suggested substitute; confidence: doubtful
2. Spring ambiguity, enter 02:30 when it doesn't exist because civil time proceeded 01:59:59 -> 03:00:00 (or whatever).
t0=123456789
  offset 0: 0 sec; wall-clock offset: 0 sec; interpretation: official choice; confidence: Nominally unambiguous
  offset 1: -3600 sec; wall-clock offset: 0 sec; interpretation: suggested substitute; confidence: doubtful
3. Autumn ambiguity, enter 01:30 on morning when civil time proceeds from 01:59:59 to 01:00; is it the first or second 01:30 event?
t0=123456789
  offset 0: 0 sec; wall-clock offset: 3600 sec; interpretation: ambiguous choice; confidence: equal alternative
  offset 1: 3600 sec; wall-clock offset: 0 sec; interpretation: ambiguous choice; confidence: equal alternative
4. Autumn, enter 02:30, same morning as above; did they mean the official 02:30, or did they mean the second 01:30 because they failed to reset their clock?
t0=123456789
  offset 0: 0 sec; wall-clock offset: 0 sec; interpretation: official choice; confidence: Nominally unambiguous
  offset 1: 3600 sec; wall-clock offset: 0 sec; interpretation: unofficial choice; confidence: Possible alternative
"Because they failed to reset their clock"! That possibility could apply to all times both DLS and non-DLS during all days of the year (or at least during some fuzzy set period in and around each time change), so the additional entry in #4 (caused by reading from an unofficial clock that wasn't changed) should be the same as the additional entry in #0 (caused by reading from an unofficial clock that wasn't changed).
We're dealing with _humans_, here. It's a reasonable goal to try to increase the reliability of time entries by noting likely errors. The reality is that it is extremely common for clocks to go unchanged for a few hours (or even a day or two) after the "official" time change. (I have shown up for work an hour early myself, as a result; I never look at a clock most Sundays.) A note that the time changed recently is *extremely* helpful when getting confirmation of a time entry, but most programmers are not equipped or inclined to root about in a transition database, especially when the conversion function has just done so and is far better-equipped to report what it found.
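To make the idea concrete, here is a minimal sketch of a conversion report and of how an interactive program might turn it into a confirmation query. Everything in it is hypothetical: the names `Alternative`, `ConversionReport`, and `confirmation_query` belong to no existing library, and treating each offset as "seconds to add to t0" is an assumption about a design that is still only a sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Alternative:
    offset: int             # assumed meaning: seconds to add to t0 for this reading
    wall_clock_offset: int  # official wall-clock offset for this reading, in seconds
    interpretation: str     # e.g. "official choice", "unofficial choice"
    confidence: str         # e.g. "nominally unambiguous", "possible alternative"


@dataclass
class ConversionReport:
    t0: int  # reference timestamp
    alternatives: List[Alternative] = field(default_factory=list)


def confirmation_query(report: ConversionReport) -> Optional[str]:
    """Return a question for the user, or None if nothing needs confirming."""
    if len(report.alternatives) < 2:
        return None
    lines = ["The time changed recently. Which did you mean?"]
    for n, alt in enumerate(report.alternatives, start=1):
        lines.append(
            f"  {n}. t={report.t0 + alt.offset} "
            f"({alt.interpretation}; {alt.confidence})"
        )
    return "\n".join(lines)


# Case #4 from the list: 02:30 entered on the autumn morning.
case4 = ConversionReport(
    t0=123456789,
    alternatives=[
        Alternative(0, 0, "official choice", "nominally unambiguous"),
        Alternative(3600, 0, "unofficial choice", "possible alternative"),
    ],
)
print(confirmation_query(case4))
```

A program that only needs a rough timestamp can ignore the alternatives entirely and read `t0`; the program never has to open the transition database to build the question.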
In #0 your second possibility is "doubtful ... substitution"; in #4 you have "possible ... unofficial". I don't see them as different. What are you trying to suggest by having so many categories? How can you really differentiate between them?
I agree that the two examples are too similar. Originally I had #0 as the canonical "certain" time, and then realized (as you noted) that that condition is rare. At some point the likelihood of an error due to a transition becomes smaller than that of a simple typo; where that point lies can only be determined empirically.
At a minimum this appears to suggest that there is a redundancy in your proposed categories. But maybe I don't completely understand the use of the two offsets, which also differ between the two possibilities; I can't see why they would.
Consider #0, then, to be the case the day after the time change. Two days after, we can say that a simple data entry "typo" is equally likely and drop the alternative.
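As a rough sketch of that rule: the conversion function could offer the "clock not reset" reading only while the most recent transition is still fresh. The function name, the reading labels, and the two-day default are all assumptions made for illustration; as noted, the right window can only be found empirically.

```python
DAY = 86400  # seconds in a day


def readings_to_offer(seconds_since_transition, window=2 * DAY):
    """List the readings worth reporting for an entered time.

    seconds_since_transition: how long after the most recent time change
    the entered time falls (negative if it falls before the change).
    window: assumed cutoff after which a plain typo is taken to be at
    least as likely as an unreset clock.
    """
    readings = ["official"]
    if 0 <= seconds_since_transition < window:
        readings.append("clock not reset")
    return readings


print(readings_to_offer(1 * DAY))  # the day after: both readings
print(readings_to_offer(2 * DAY))  # two days after: official reading only
```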
Speaking of ambiguity, could you explain what you are really trying to capture in #1 and #2, because your descriptions are of the same circumstance. Maybe you meant #1 is one half hour after the time change ("2:30" is a Standard Time) and #2 is one half-hour before the time change ("2:30" is a DLS Time). If so, these are also the result of reading a clock that wasn't reset correctly and putting that value in the tm struct, so why are your return results different by more than just a sign in the offset?
Er, that's a typo. (Note the hour of the posting. Ironic, isn't it? :-) My apologies for the confusion. In the original posting I had two lists and it was correct in the first list, but transcribed wrong. The intended entry for #2 was 03:30. This is a valid "official" time, unlike 02:30. However, it is very likely to be wrong.
This seems to leave us with only one other possibility, #3, the one hour of time that really does correctly exist twice on a wall clock running in both Standard and DLS time correctly.
Yes, this is a case that unambiguously needs attention, and it is frequently assumed to be the only such case.
As Dan mentioned, a program that needs precision must sometimes root around in the transition database. A conversion function can offer a starting point, and can tell us whether we might need to look. Anything it can do to make looking unnecessary is good if it doesn't add too much complexity.
Cases that are identical as far as the program is concerned (time is an hour off because of a clock not reset) are quite different in how the user experiences them. That distinction is worth preserving when presenting alternatives to the user who entered a datum.
Nathan Myers
ncm@cantrip.org