Apologies for following up my own posting, but I don't want to lose the thread here. I have edited the quoted text below for clarity.
Nathan Myers wrote:
The problem with the mktime model is that conversions between civil time and timestamps can result in 2, 1, or <1 valid conversions ...
... it's a rare wall clock, legal document, train schedule or airline ticket that indicates UTC offset, and when it does it's likely to be wrong, especially during the "interesting" times. ... The problem that normally arises is that someone enters a date/time and I want a numeric real-time value. I distinguish the following troublesome cases:
1. It's in spring and they enter 02:30, and that time doesn't exist, but there's a good guess (e.g. equiv. to 03:30) for what they probably meant. (There's a corresponding case for leap seconds.)
2. In spring they enter 03:30, which might be correct, but they might well have meant 04:30 because they failed to reset their clock.
3. In autumn, they enter 01:30; did they mean the first time it happened that day or the second? Either is equally valid.
4. (Also in autumn) If they enter 02:30, did they mean officially 02:30, or did they mean the second 01:30?
I believe that conversions should yield a struct containing a reference timestamp, followed by up to four offsets tagged with official wall-clock offset, interpretation, and confidence values. For example, in the cases above, and ignoring leap seconds for the moment, we would get

0. Unambiguous time, e.g. 04:30 morning of a time change
   t0=123456789
   offset 0: 0 sec; wall-clock offset: -3600 sec; interpretation: unambiguous; confidence: certain
   offset 0: -3600 sec; wall-clock offset: -3600 sec. interpretation: suggested substitute; confidence: doubtful

1. Spring ambiguity, enter 02:30 when it doesn't exist because civil time proceeded 01:59:59 -> 03:00:00. (Or 02:00:00 to 03:00:01? I don't know.)
   t0=123456789
   offset 0: 0 sec; wall-clock offset: -3600 sec. interpretation: suggested substitute; confidence: doubtful

2. Spring ambiguity, enter 02:30 when it doesn't exist because civil time proceeded 01:59:59 -> 03:00:00 (or whatever).
   t0=123456789
   offset 0: 0 sec; wall-clock offset: 0 sec. interpretation: official choice; confidence: Nominally unambiguous
   offset 1: -3600 sec; wall-clock offset: 0 sec. interpretation: suggested substitute; confidence: doubtful

3. Autumn ambiguity, enter 01:30 on morning when civil time proceeds from 01:59:59 to 01:00; is it the first or second 01:30 event?
   t0=123456789
   offset 0: 0 sec; wall-clock offset: 3600 sec. interpretation: ambiguous choice; confidence: equal alternative
   offset 1: 3600 sec; wall-clock offset: 0 sec. interpretation: ambiguous choice; confidence: equal alternative

4. Autumn, enter 02:30, same morning as above; did they mean the official 02:30, or did they mean the second 01:30 because they failed to reset their clock?
   t0=123456789
   offset 0: 0 sec; wall-clock offset: 0 sec. interpretation: official choice; confidence: Nominally unambiguous
   offset 1: 3600 sec; wall-clock offset: 0 sec. interpretation: unofficial choice; confidence: Possible alternative

For each case above one might have up to two more entries corresponding to leap-second ambiguities. The detailed semantics of all the possible "interpretation" and "confidence" tags must be nailed down.

This might end up looking sort of like the following C struct layout, using deliberately unsatisfactory names for exposition:

    typedef ... time_stamp; /* something more-or-less numeric */
    typedef enum { official, unofficial, ambiguous, suggested } time_meaning;
    typedef enum { nominal, alternative, possible, doubtful } time_confidence;

    struct time_interpretation {
        int offset; /* limited to +/- 3601 */
        long wall_offset; /* limited to +/- 43200 */
        time_meaning meaning;
        time_confidence confidence;
    };

    struct time_from_civil {
        time_stamp reference_time;
        int interpretation_count; /* range 0-4 */
        time_interpretation interpretation[4];
    };

(I've neglected leap-second ambiguity interpretations because they would reduce the clarity of the exposition.)

The conversion function might look like:

    int time_convert(const struct tm *, const time_zone *, time_from_civil *);

I think an interface like this would encapsulate knowledge about time conversion issues, exposing a high-level view of the possible real-time values implied by wall-clock time records. I think it would lead to better, more robust interfaces for conversions, and reduce programmer errors.

Comments?

Nathan Myers
ncm@cantrip.org
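The "2, 1, or <1 valid conversions" point can be illustrated with a small sketch. It assumes a hypothetical description of a single transition day in seconds since local midnight; the names `transition_day` and `count_instants` are illustrative only and are not part of the proposal, and no bounds checking on the length of the day is attempted.

```c
/* Hypothetical description of one transition day; not part of the proposal. */
struct transition_day {
    long at;     /* elapsed seconds since local midnight when the change occurs */
    long shift;  /* seconds added to the wall clock: +3600 spring, -3600 autumn */
};

/*
 * Count the real instants on that day at which a correctly set wall clock
 * shows `wall` (seconds since midnight as displayed). Each match is stored
 * in out[] as elapsed seconds since local midnight. Returns 0, 1, or 2.
 */
int count_instants(const struct transition_day *d, long wall, long out[2])
{
    int n = 0;

    /* Before the change the clock shows elapsed time directly. */
    if (wall < d->at)
        out[n++] = wall;

    /* After the change the clock shows elapsed + shift. */
    if (wall - d->shift >= d->at)
        out[n++] = wall - d->shift;

    return n;
}
```

With a spring day of `{7200, 3600}`, a reading of 02:30 (9000) matches no instant; with an autumn day of `{7200, -3600}`, a reading of 01:30 (5400) matches two. A fuller conversion function would attach interpretation tags to each match rather than just counting them.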
long wall_offset; /* limited to +/- 43200 */
Re. the above - I may have missed something here but if you restrict the offset to +/-12 hours, you would run into difficulties with New Zealand DST which is UTC+13h. Ian Tragen -- World Clock Page:- http://www.page-1.com/time
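The arithmetic behind this objection can be sketched as follows, assuming (as the comment above reads it) that `wall_offset` holds the zone's offset from UTC in seconds. The helper name `wall_offset_fits` and the macro are hypothetical.

```c
#define WALL_OFFSET_LIMIT 43200L  /* +/- 12 h, the bound in the quoted comment */

/* Returns 1 if a UTC offset in seconds fits within the proposed bound, else 0. */
int wall_offset_fits(long utc_offset_seconds)
{
    return utc_offset_seconds >= -WALL_OFFSET_LIMIT
        && utc_offset_seconds <= WALL_OFFSET_LIMIT;
}
```

An offset of 13 hours is 46800 seconds, which exceeds 43200, so `wall_offset_fits(13 * 3600L)` returns 0 while `wall_offset_fits(12 * 3600L)` returns 1.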
Nathan Myers writes:
I believe that conversions should yield a struct containing a reference timestamp, followed by up to four offsets tagged with official wall-clock offset, interpretation, and confidence values.
How exactly do you expect programs to use this information? I can imagine a warning such as ``The time is jumping from 2:00 CDT back to 1:00 CST; I assume you meant 1:30 CDT; if you actually wanted 1:30 CST, type 1:30 CST.'' But this doesn't use your vague ``confidence values''; it uses hard data about the time zone.

---Dan
1000 recipients, 28.8 modem, 10 seconds. http://pobox.com/~djb/qmail/mini.html
Nathan has provided an interesting list of ambiguities, but I see a few problems with the list. Nathan Myers wrote:
0. Unambiguous time, e.g. 04:30 morning of a time change
t0=123456789 offset 0: 0 sec; wall-clock offset: -3600 sec; interpretation: unambiguous; confidence: certain offset 0: -3600 sec; wall-clock offset: -3600 sec. interpretation: suggested substitute; confidence: doubtful
1. Spring ambiguity, enter 02:30 when it doesn't exist because civil time proceeded 01:59:59 -> 03:00:00. (Or 02:00:00 to 03:00:01? I don't know.)
t0=123456789 offset 0: 0 sec; wall-clock offset: -3600 sec. interpretation: suggested substitute; confidence: doubtful
2. Spring ambiguity, enter 02:30 when it doesn't exist because civil time proceeded 01:59:59 -> 03:00:00 (or whatever).
t0=123456789 offset 0: 0 sec; wall-clock offset: 0 sec. interpretation: official choice; confidence: Nominally unambiguous offset 1: -3600 sec; wall-clock offset: 0 sec. interpretation: suggested substitute; confidence: doubtful
3. Autumn ambiguity, enter 01:30 on morning when civil time proceeds from 01:59:59 to 01:00; is it the first or second 01:30 event?
t0=123456789 offset 0: 0 sec; wall-clock offset: 3600 sec. interpretation: ambiguous choice; confidence: equal alternative offset 1: 3600 sec; wall-clock offset: 0 sec. interpretation: ambiguous choice; confidence: equal alternative
4. Autumn, enter 02:30, same morning as above; did they mean the official 02:30, or did they mean the second 01:30 because they failed to reset their clock?
t0=123456789 offset 0: 0 sec; wall-clock offset: 0 sec. interpretation: official choice; confidence: Nominally unambiguous offset 1: 3600 sec; wall-clock offset: 0 sec. interpretation: unofficial choice; confidence: Possible alternative
"Because they failed to reset their clock"! That possibility could apply to all times both DLS and non-DLS during all days of the year (or at least during some fuzzy set period in and around each time change), so the additional entry in #4 (caused by reading from an unofficial clock that wasn't changed) should be the same as the additional entry in #0 (caused by reading from an unofficial clock that wasn't changed).

In the #0 your second possibility is "doubtful ... substitution", in #4 you have "possible ... unofficial". I don't see them as different. What are you trying to suggest by having so many categories? How can you really differentiate between them?

At a minimum this appears to suggest that there is a redundancy in your proposed categories, but maybe I don't completely understand the use of the two offsets, which also differ between the two possibilities, but I can't see why they would.

Speaking of ambiguity, could you explain what you are really trying to capture in #1 and #2, because your descriptions are of the same circumstance. Maybe you meant #1 is one half hour after the time change ("2:30" is a Standard Time) and #2 is one half-hour before the time change ("2:30" is a DLS Time). If so, these are also the result of reading a clock that wasn't reset correctly and putting that value in the tm struct, so why are your return results different by more than just a sign in the offset?

This seems to leave us with only one other possibility, #3, the one hour of time that really does correctly exist twice on a wall clock running in both Standard and DLS time correctly.

thanks,
-Paul Hill
Thank you, Dan, Paul, and Ian, for your comments. DJB wrote (in his acerbic way):
Nathan Myers writes:
I believe that conversions should yield a struct containing a reference timestamp, followed by up to four offsets tagged with official wall-clock offset, interpretation, and confidence values.
How exactly do you expect programs to use this information?
I can imagine a warning such as ``The time is jumping from 2:00 CDT back to 1:00 CST; I assume you meant 1:30 CDT; if you actually wanted 1:30 CST, type 1:30 CST.'' But this doesn't use your vague ``confidence values''; it uses hard data about the time zone.
Current programs handle time and time zones badly in large part because programmers understand the problems poorly or not at all. _That won't change._ Exposing programmers to more complexity than they understand, such as handing them a time zone transition database to explore (to see if their apparently successful conversion was in fact dubious) will not help matters appreciably. At best, it will slow down conversions in correct programs by orders of magnitude.

The goal is to increase the number of programs that do something meaningful, achieved by encapsulating complexity. Certainly any program that needs to delve into a transition database should do so, but most should not. Users presented with such a level of detail are ill-equipped to evaluate it anyway. We improve matters by capturing our understanding in the code, and presenting a summary of meaningful results.

Yes, the annotations I wrote were vague -- as noted in the posting, and as is appropriate in the sketch for a design. In a firm design things become a lot more precise. (Where I was too precise, e.g. "+/- 43200" seconds, it created a distraction.)

Different programs will use the information differently; that's the *point*. A program that just needs a low-precision timestamp can use t0 and ignore the quibbles. An interactive program can present a confirmation query to the user. Non-interactive programs might log parts of the conversion report along with the timestamp, or use the report as a clue that they need to spend the time digging into the transition database -- otherwise a waste of time, in the common case.

Some programs "know" that the time being entered wasn't just read off a wall clock, and can ignore the likelihood of the clock not having been reset. The conversion function doesn't know that, but the program can use the fact in its interpretation of the conversion result.

Paul Hill wrote:
Nathan has provided an interesting list of ambiguities, but I see a few problems with the list. Nathan Myers wrote:
0. Unambiguous time, e.g. 04:30 morning of a time change
t0=123456789 offset 0: 0 sec; wall-clock offset: -3600 sec; interpretation: unambiguous; confidence: certain offset 0: -3600 sec; wall-clock offset: -3600 sec. interpretation: suggested substitute; confidence: doubtful
1. Spring ambiguity, enter 02:30 when it doesn't exist because civil time proceeded 01:59:59 -> 03:00:00. (Or 02:00:00 to 03:00:01? I don't know.)
t0=123456789 offset 0: 0 sec; wall-clock offset: -3600 sec. interpretation: suggested substitute; confidence: doubtful
2. Spring ambiguity, enter 02:30 when it doesn't exist because civil time proceeded 01:59:59 -> 03:00:00 (or whatever).
t0=123456789 offset 0: 0 sec; wall-clock offset: 0 sec. interpretation: official choice; confidence: Nominally unambiguous offset 1: -3600 sec; wall-clock offset: 0 sec. interpretation: suggested substitute; confidence: doubtful
3. Autumn ambiguity, enter 01:30 on morning when civil time proceeds from 01:59:59 to 01:00; is it the first or second 01:30 event?
t0=123456789 offset 0: 0 sec; wall-clock offset: 3600 sec. interpretation: ambiguous choice; confidence: equal alternative offset 1: 3600 sec; wall-clock offset: 0 sec. interpretation: ambiguous choice; confidence: equal alternative
4. Autumn, enter 02:30, same morning as above; did they mean the official 02:30, or did they mean the second 01:30 because they failed to reset their clock?
t0=123456789 offset 0: 0 sec; wall-clock offset: 0 sec. interpretation: official choice; confidence: Nominally unambiguous offset 1: 3600 sec; wall-clock offset: 0 sec. interpretation: unofficial choice; confidence: Possible alternative
"Because they failed to reset their clock"! That possibility could apply to all times both DLS and non-DLS during all days of the year (or at least during some fuzzy set period in and around each time change), so the additional entry in #4 (caused by reading from an unofficial clock that wasn't changed) should be the same as the additional entry in #0 (caused by reading from an unofficial clock that wasn't changed).
We're dealing with _humans_, here. It's a reasonable goal to try to increase the reliability of time entries by noting likely errors. The reality is that it is extremely common for clocks to go unchanged for a few hours (or even a day or two) after the "official" time change. (I have shown up for work an hour early myself, as a result; I never look at a clock most Sundays.) A note that the time changed recently is *extremely* helpful when getting confirmation of a time entry, but most programmers are not equipped or inclined to root about in a transition database, especially when the conversion function has just done so and is far better-equipped to report what it found.
In the #0 your second possibility is "doubtful ... substitution", in #4 you have "possible ... unofficial". I don't see them as different. What are you trying to suggest by having so many categories? How can you really differentiate between them?
I agree that the two examples are too similar. Originally I had #0 as the canonical "certain" time, and then realized that (as you noted) that condition is rare. At some point the likelihood of an error due to a transition becomes smaller than that of a simple typo, which can only be determined empirically.
At a minimum this appears to suggest that there is a redundancy in your proposed categories, but maybe I don't completely understand the use of the two offsets, which also differ between the two possibilities, but I can't see why they would.
Consider #0, then, to be the case the day after the time change. Two days after, we can say that a simple data entry "typo" is equally likely and drop the alternative.
Speaking of ambiguity, could you explain what you are really trying to capture in #1 and #2, because your descriptions are of the same circumstance. Maybe you meant #1 is one half hour after the time change ("2:30" is a Standard Time) and #2 is one half-hour before the time change ("2:30" is a DLS Time). If so, these are also the result of reading a clock that wasn't reset correctly and putting that value in the tm struct, so why are your return results different by more than just a sign in the offset?
Er, that's a typo. (Note the hour of the posting. Ironic, isn't it? :-) My apologies for the confusion. In the original posting I had two lists and it was correct in the first list, but transcribed wrong. The intended entry for #2 was 03:30. This is a valid "official" time, unlike 02:30. However, it is very likely to be wrong.
This seems to leave us with only one other possibility, #3, the one hour of time that really does correctly exist twice on a wall clock running in both Standard and DLS time correctly.
Yes, this is a case that unambiguously needs attention, and is frequently assumed to be the only one such. As Dan mentioned, a program that needs precision must sometimes root around in the transition database. A conversion function can offer a starting point, and can tell us whether we might need to look. Anything it can do to make looking unnecessary is good if it doesn't add too much complexity. Cases that are identical as far as the program is concerned (time is an hour off because of a clock not reset) are quite different in how the user experiences them. That distinction is worth preserving when presenting alternatives to the user who entered a datum. Nathan Myers ncm@cantrip.org
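How different callers might consume such a conversion report can be sketched as below. This is only an illustration under stated assumptions: `time_stamp` is taken to be `long` (the proposal leaves it open), each interpretation's `offset` is assumed to be added to the reference time, and the helper names `simple_timestamp`, `should_confirm`, and `interpretation_time` are hypothetical. The struct members are written with the `struct` keyword so the fragment compiles as C.

```c
typedef long time_stamp;  /* assumption: the proposal leaves this type open */
typedef enum { official, unofficial, ambiguous, suggested } time_meaning;
typedef enum { nominal, alternative, possible, doubtful } time_confidence;

struct time_interpretation {
    int offset;        /* seconds relative to reference_time (assumed) */
    long wall_offset;
    time_meaning meaning;
    time_confidence confidence;
};

struct time_from_civil {
    time_stamp reference_time;
    int interpretation_count;  /* range 0-4 */
    struct time_interpretation interpretation[4];
};

/* Low-precision caller: take t0 and ignore the annotations. */
time_stamp simple_timestamp(const struct time_from_civil *r)
{
    return r->reference_time;
}

/* Timestamp implied by interpretation i, assuming offsets are relative to t0. */
time_stamp interpretation_time(const struct time_from_civil *r, int i)
{
    return r->reference_time + r->interpretation[i].offset;
}

/*
 * Interactive caller: ask the user to confirm unless the report holds
 * exactly one interpretation and it is tagged official and nominal.
 */
int should_confirm(const struct time_from_civil *r)
{
    if (r->interpretation_count != 1)
        return 1;
    return r->interpretation[0].meaning != official
        || r->interpretation[0].confidence != nominal;
}
```

A non-interactive program could apply the same test as `should_confirm` to decide whether to log the report or to consult the transition database, which keeps the common case cheap.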
Nathan Myers wrote:
We're dealing with _humans_, here. It's a reasonable goal to try to increase the reliability of time entries by noting likely errors. The reality is that it is extremely common for clocks to go unchanged for a few hours (or even a day or two) after the "official" time change. (I have shown up for work an hour early myself, as a result; I never look at a clock most Sundays.)
A note that the time changed recently is *extremely* helpful when getting confirmation of a time entry, but most programmers are not equipped or inclined to root about in a transition database,
Why would a programmer want to? The programmer asked to please convert the fields into a timestamp given the TZ provided. I don't see how going to look for other transition rules has anything to do with it. There is still a list of possible interpretations given just one possible fully accurate TZ definition. Going looking in a transition DB only provides more entries for the list returned. (An additional entry might capture a possible alternate based on the possibility that the programmer really did not pass the library the right TZ, because the library code can see that the given rules did not apply on the date provided in the time struct.)
I agree that the two examples are too similar. Originally I had #0 as the canonical "certain" time, and then realized that (as you noted) that condition is rare. At some point the likelihood of an error due to a transition becomes smaller than that of a simple typo, which can only be determined empirically.
So how are you going to put that fuzzy set in the library code so that sometimes you can return a "doubtful substitution" and sometimes you can return a "possible unofficial time"?
At a minimum this appears to suggest that there is a redundancy in your proposed categories, but maybe I don't completely understand the use of the two offsets, which also differ between the two possibilities, but I can't see why they would.
Consider #0, then, to be the case the day after the time change. Two days after, we can say that a simple data entry "typo" is equally likely and drop the alternative.
You are losing me here. Would the time in question be a time that appears to be near the time change but is actually being converted (or re-examined) two days later, or would the time in question be a time that is two days later?
Speaking of ambiguity, could you explain what you are really trying to capture in #1 and #2, because your descriptions are of the same circumstance. Maybe you meant #1 is one half hour after the time change ("2:30" is a Standard Time) and #2 is one half-hour before the time change ("2:30" is a DLS Time). If so, these are also the result of reading a clock that wasn't reset correctly and putting that value in the tm struct, so why are your return results different by more than just a sign in the offset?
Er, that's a typo. (Note the hour of the posting. Ironic, isn't it? :-) My apologies for the confusion. In the original posting I had two lists and it was correct in the first list, but transcribed wrong.
The intended entry for #2 was 03:30. This is a valid "official" time, unlike 02:30. However, it is very likely to be wrong.
But that makes the #2 entry gibberish, if I rewrite it as:

2. Spring ambiguity, enter 03:30 when it doesn't exist because civil time proceeded 01:59:59 -> 03:00:00 (or whatever).

Say what? 3:30 does exist in DLS, but it might be an old time in ST. Maybe it should be:

2. Spring ambiguity, enter 03:30, but that is so close to the change time it is suspicious.

It would seem you could do the discussion a favor and repost a complete list of possible combinations and what kind of rules you are going to use to put your tags on all of the different interpretations.
Cases that are identical as far as the program is concerned (time is an hour off because of a clock not reset) are quite different in how the user experiences them.
"identical as far as the program is concerned", but you are trying to return two different values from the "alternative" time. What magic is used to generate the labels on the alternatives? Listing them as alternatives, okay, I see that. I just don't see how you can label the alternatives differently.
In the #0 your second possibility is "doubtful ... substitution", in #4 you have "possible ... unofficial".
It would seem that deciding the importance of an alternative, as you state, is up to the programmer who called this service, so all you can do is provide alternatives. One has to ask: since you have 16 possible return flag combinations, could you please explain why only 5 of them are covered in your list (and I see only 2), and what is right or wrong about the other 11, i.e. "doubtful ... unofficial" etc. If you only have 5, just have one enum that lists them and do not imply states that don't exist.

-Paul
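The single-enum alternative suggested here might look like the following sketch. The tag names and the helper `reading_kind_label` are hypothetical; the five values simply mirror the five interpretation/confidence pairings that appear in the list under discussion, so that no unlisted combination can be expressed.

```c
/* Hypothetical single tag: one value per pairing that appears in the list. */
typedef enum {
    reading_unambiguous_certain,
    reading_suggested_doubtful,
    reading_official_nominal,
    reading_ambiguous_equal,
    reading_unofficial_possible,
    reading_kind_count  /* number of tags, not a tag itself */
} reading_kind;

/* Map a tag back to the wording used for that pairing in the list. */
const char *reading_kind_label(reading_kind k)
{
    switch (k) {
    case reading_unambiguous_certain: return "unambiguous; certain";
    case reading_suggested_doubtful:  return "suggested substitute; doubtful";
    case reading_official_nominal:    return "official choice; nominally unambiguous";
    case reading_ambiguous_equal:     return "ambiguous choice; equal alternative";
    case reading_unofficial_possible: return "unofficial choice; possible alternative";
    default:                          return "unknown";
    }
}
```

The trade-off is that a combined tag cannot be extended along one axis without adding a value for every pairing that becomes meaningful, whereas two separate enums admit all 16 combinations whether or not the conversion function ever produces them.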