Tzdb and the Sunshine Protection Act
You may have seen that the Sunshine Protection Act is back: "Congress to consider making daylight saving time permanent" https://www.axios.com/2023/03/02/daylight-saving-time-change-permanent-congr...

As I read the law, it would shift STDOFF by an hour; for example, America/New_York would move to a STDOFF of -04:00, essentially moving "Eastern Time" into the Atlantic time zone. Historically tzdb has honored the laws as faithfully as possible, so that might be the right approach. But it seems to me this would be technically disruptive, necessitating a great deal of careful modification of existing implementations.

The other method is to shift the DST rules to "permanent" DST. The law might arguably be interpreted this way. That might be less disruptive and would leave the option of returning to DST when it's no longer "permanent", as happened in the 1970s.

There are many arguments for going to "permanent standard time" instead. One reason not mentioned is that it would be technically much easier and less disruptive.

How will tzdb manage this?
On 3/2/23 14:22, Brooks Harris via tz wrote:
How will tzdb manage this?
Traditionally we've treated "permanent daylight saving" as standard time, and I'd rather continue this tradition than make an exception for the US. That is, tm_isdst would be 0. (Most people don't care about the tm_isdst flag, but POSIX and C standard nerds do.) Whether the adjusted time in (say) New York would be abbreviated "EST" or "AST" or "EDT" is up to common practice. We could use the abbreviation "-04" until common practice settles down. If common practice becomes "ET" we couldn't use that, unfortunately, as POSIX requires at least three characters. At some point "EST" might become the best of the alternatives. My biggest worry is the set of backward compatibility zones EST5EDT, CST6CDT, MST7MDT, PST8PDT as their continued use would lead to so much confusion that they'd be more trouble than they're worth. Presumably we would retire them by moving them to "backzone". "EST" and "MST" might need to retire as well. (Luckily, there is no "CST" or "PST".) Similar issues will come up if EU regions go to "permanent daylight saving", as they have threatened to do for years. Whatever we do in this area, it will be a mess.
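To make the tm_isdst point concrete, here is a minimal Python sketch of a zone that sits at -04:00 all year and is modelled as standard time. It is only an illustration: the class name PermanentOffset and the variable names are hypothetical, the "-04" abbreviation is the placeholder suggested above, and nothing here reflects what tzdb would actually ship.

```python
from datetime import datetime, timedelta, tzinfo


class PermanentOffset(tzinfo):
    """Fixed UTC offset modelled as standard time (dst() is always zero)."""

    def __init__(self, hours_east: int, abbreviation: str) -> None:
        self._offset = timedelta(hours=hours_east)
        self._abbreviation = abbreviation

    def utcoffset(self, dt):
        return self._offset

    def dst(self, dt):
        # Zero rather than None: the zone is known to have no DST in
        # effect, so timetuple() reports tm_isdst == 0 instead of -1.
        return timedelta(0)

    def tzname(self, dt):
        return self._abbreviation


# Hypothetical post-change New York: UTC-04:00 all year, placeholder name "-04".
new_york_permanent = PermanentOffset(-4, "-04")
january = datetime(2024, 1, 1, 12, 0, tzinfo=new_york_permanent)
july = datetime(2024, 7, 1, 12, 0, tzinfo=new_york_permanent)

for moment in (january, july):
    print(moment.strftime("%Y-%m-%d %H:%M %Z %z"),
          "tm_isdst =", moment.timetuple().tm_isdst)
```

Both dates report the same offset and a tm_isdst of 0, which is the "permanent daylight saving is just standard time" treatment described above.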
On Thu, 2 Mar 2023 at 17:45, Paul Eggert via tz <tz@iana.org> wrote:
Whether the adjusted time in (say) New York would be abbreviated "EST" or "AST" or "EDT" is up to common practice.
[…]
At some point "EST" might become the best of the alternatives.
Worth considering that, if "EST" were to become standard for -04, it would require modifications to supported, but obsoleted, formats in RFC 2822 §4.3, which state:

    EDT is semantically equivalent to -0400
    EST is semantically equivalent to -0500
    CDT is semantically equivalent to -0500
    CST is semantically equivalent to -0600
    MDT is semantically equivalent to -0600
    MST is semantically equivalent to -0700
    PDT is semantically equivalent to -0700
    PST is semantically equivalent to -0800

-- Tim Parenti
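For readers who want the table above in executable form, a small Python sketch follows. The dictionary simply transcribes the eight equivalences quoted from RFC 2822 §4.3; the names OBS_ZONE_OFFSETS and obs_zone_to_minutes are hypothetical helpers, not part of any library.

```python
# Legacy zone designations and their fixed numeric meanings, transcribed
# from the RFC 2822 section 4.3 list quoted above.
OBS_ZONE_OFFSETS = {
    "EDT": "-0400", "EST": "-0500",
    "CDT": "-0500", "CST": "-0600",
    "MDT": "-0600", "MST": "-0700",
    "PDT": "-0700", "PST": "-0800",
}


def obs_zone_to_minutes(zone: str) -> int:
    """Return minutes east of UTC for a legacy zone designation."""
    numeric = OBS_ZONE_OFFSETS[zone.upper()]
    sign = -1 if numeric[0] == "-" else 1
    hours, minutes = int(numeric[1:3]), int(numeric[3:5])
    return sign * (hours * 60 + minutes)


for name in ("EST", "EDT", "PST"):
    print(name, obs_zone_to_minutes(name))
```

The point of keeping this as a fixed table is that a "Date:" header in archived mail that says "EST" must keep meaning -0500 regardless of what New York clocks do later.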
RFC 2822 is already obsolete, and this part was somewhat fixed in RFC 5322. See Section 3.3 of that document, in that those old zone names are marked as obsolete, but allowed. Pete can say more if he wants.

Eliot
Eliot Lear wrote in <867b8fc8-1229-dca7-8408-2c80818dfc6b@lear.ch>:
 ...
 |On 03.03.23 00:04, Tim Parenti via tz wrote:
 |> On Thu, 2 Mar 2023 at 17:45, Paul Eggert via tz <tz@iana.org> wrote:
 |> Whether the adjusted time in (say) New York would be abbreviated "EST"
 |> or "AST" or "EDT" is up to common practice.
 ...
 |> Worth considering that, if "EST" were to become standard for -04, it
 |> would require modifications to supported, but obsoleted, formats in
 |> RFC 2822 §4.3, which state:
 |>
 |> EDT is semantically equivalent to -0400
 ...
 |RFC 2822 is already obsolete, and this part was somewhat fixed in RFC
 |5322. See Section 3.3 of that document, in that those old zone names
 |are marked as obsolete, but allowed. Pete can say more if he wants.

Destructive comment, but i do not copy this. Unless i give in and accept that over forty years of email soon disappear in the void of digital history, except for bitsavers or wayback machine(s).

 ...

--steffen
On 4 Mar 2023, at 11:46, Steffen Nurpmeso wrote:
Eliot Lear wrote in <867b8fc8-1229-dca7-8408-2c80818dfc6b@lear.ch>:
 ...
 |On 03.03.23 00:04, Tim Parenti via tz wrote:
 |> On Thu, 2 Mar 2023 at 17:45, Paul Eggert via tz <tz@iana.org> wrote:
 |> Whether the adjusted time in (say) New York would be abbreviated "EST"
 |> or "AST" or "EDT" is up to common practice.
 ...
 |> Worth considering that, if "EST" were to become standard for -04, it
 |> would require modifications to supported, but obsoleted, formats in
 |> RFC 2822 §4.3, which state:
 |>
 |> EDT is semantically equivalent to -0400
To be perfectly clear, even though the syntax was called "obsolete" in 2822 and 5322, we are clarifying in the full Standard (being completed soon) that it would probably have been better to call it "legacy": The elements in that section are not going away in any meaningful way, but rather they exist in archives of email and are generated by legacy systems from time to time, as Steffen rightly (though perhaps indelicately) points out. So as far as this syntactic form goes, "EDT" and the like should only exist in email archives or as generated by legacy systems (which no doubt will always believe that "EDT" is semantically equivalent to -0400), and therefore the meaning has not changed for purposes of interpreting extant email message "Date:" header fields. In current conformant email, those time zone designations are not generated.
|RFC 2822 is already obsolete, and this part was somewhat fixed in RFC |5322. See Section 3.3 of that document, in that those old zone names |are marked as obsolete, but allowed. Pete can say more if he wants.
There were no changes to this section between RFC 2822 and RFC 5322, and there won't be any changes when we move to full Standard. As above, when used in the interpretation of legacy email, those definitions are correct, independent of what happens in the future.
Destructive comment, but i do not copy this. Unless i give in and accept that over fourty years of email soon disappear in the void of digital history, except for bitsavers or wayback machine(s).
A rather quarrelsome way to say it, but yes, in legacy email, these forms will still exist and still mean what they always meant.

Cheers,
pr
--
Pete Resnick https://www.episteme.net/
All connections to the world are tenuous at best
On 04.03.23 22:12, Pete Resnick wrote:
To be perfectly clear, even though the syntax was called "obsolete" in 2822 and 5322, we are clarifying in the full Standard (being completed soon) that it would probably have been better to call it "legacy": The elements in that section are not going away in any meaningful way, but rather they exist in archives of email and are generated by legacy systems from time to time, as Steffen rightly (though perhaps indelicately) points out.
Right. The point I'm making is that there is really nothing to do on the standards front in this case. We all know that email lives forever.

Eliot
On 3/2/23 15:44:49, Paul Eggert via tz wrote:
On 3/2/23 14:22, Brooks Harris via tz wrote:
How will tzdb manage this? ... My biggest worry is the set of backward compatibility zones EST5EDT, CST6CDT, MST7MDT, PST8PDT as their continued use would lead to so much confusion that they'd be more trouble than they're worth. Presumably we would retire them by moving them to "backzone". "EST" and "MST" might need to retire as well. (Luckily, there is no "CST" or "PST".)
The forms containing numbers are mandated by POSIX: <https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag...>

    ...
    The expanded format (for all TZs whose value does not have a <colon> as the first character) is as follows:

        stdoffset[dst[offset][,start[/time],end[/time]]]

and are the only forms supported by IBM's flagship operating system, z/OS: <https://www.ibm.com/docs/en/zos/2.5.0?topic=variable-command-format>.

In the long term the change will prove pointless. People will readjust their hours of activity to match the hours of daylight as happened when clocks were advanced 6 hours in the past 4 centuries: <https://www.bible.com/bible/1/MAT.20.1-16.KJV>, then clamor for another adjustment.

-- gil
On 3/2/23 15:32, Paul Gilmartin via tz wrote:
The forms containing numbers are mandated by POSIX:
POSIX does not specify TZ strings like TZ='EST5EDT'; they are TZDB extensions. If you want a TZ string whose meaning is specified, you need something like TZ='EST5EDT,M3.2.0,M11.1.0'. You can see this by looking a few lines before the lines you quoted, which say that a TZ string's contents are "std offset dst offset, rule". Admittedly this part of POSIX could be written more clearly.
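As an illustration of the rule part of TZ='EST5EDT,M3.2.0,M11.1.0', here is a minimal Python sketch that resolves a POSIX "Mm.w.d" rule to a calendar date (month m, week w where 5 means "last", weekday d with 0 meaning Sunday). The function name m_rule_date is hypothetical, and the sketch ignores the optional "/time" suffix.

```python
import calendar
import datetime


def m_rule_date(year: int, month: int, week: int, weekday: int) -> datetime.date:
    """Resolve a POSIX Mm.w.d rule to a date.

    weekday follows POSIX numbering (0 = Sunday .. 6 = Saturday);
    week 5 means the last such weekday in the month.
    """
    first = datetime.date(year, month, 1)
    # Python's date.weekday() uses Monday = 0, so shift to Sunday = 0.
    first_posix_weekday = (first.weekday() + 1) % 7
    day = 1 + (weekday - first_posix_weekday) % 7   # first matching weekday
    day += 7 * (week - 1)                           # advance to the w-th one
    days_in_month = calendar.monthrange(year, month)[1]
    while day > days_in_month:                      # week 5 may overshoot
        day -= 7
    return datetime.date(year, month, day)


print(m_rule_date(2023, 3, 2, 0))    # M3.2.0: second Sunday of March
print(m_rule_date(2023, 11, 1, 0))   # M11.1.0: first Sunday of November
```

The same helper handles the "last Sunday" rules that appear later in the thread, such as M3.5.0 and M10.5.0.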
Date: Thu, 2 Mar 2023 15:49:09 -0800
From: Paul Eggert via tz <tz@iana.org>
Message-ID: <555133d4-2fcc-0a25-e4a0-1ad0a569e661@cs.ucla.edu>

| POSIX does not specify TZ strings like TZ='EST5EDT'; they are TZDB
| extensions. If you want a TZ string whose meaning is specified, you need
| something like TZ='EST5EDT,M3.2.0,M11.1.0'.

That's not correct, otherwise TZ=UTC0 would not be POSIX, and it certainly is (there's obviously no way to specify summer time or times when it begins and ends, for UTC).

| You can see this by looking a few lines before the lines you quoted,
| which say that a TZ string contents are "std offset dst offset, rule".

That's just a generic hint, the actual specification is later (and is unchanged in the most recent drafts, other than the reference to "for all TZs whose..." which has been altered to allow tzdata type names also). It includes:

    The expanded format (for all TZs whose value does not have a <colon> as the first character) is as follows:

        stdoffset[dst[offset][,start[/time],end[/time]]]

    Where:

    std and dst
        Indicate no less than three, nor more than {TZNAME_MAX}, bytes that are the designation for the standard (std) or the alternative (dst--such as Daylight Savings Time) timezone. Only std is required; if dst is missing, then the alternative time does not apply in this locale.

That's quite clear that all that is needed is XXXn to be a "POSIX" TZ specification, though any more included must meet the required syntax.

The current standard doesn't say what to do when "dst" is given but the rule (everything after the first comma) is not - making it "implicitly" unspecified. The latest draft (currently available) of the forthcoming standard is clearer, it adds:

    If the dst field is specified and the rule field is not, it is implementation-defined when the changes to and from Daylight Saving Time occur.
It shouldn't say "Daylight Saving Time" there; while the symbol in the grammar is "dst", it is otherwise described as "alternative timezone" and should be there as well. That may have already been fixed; if not, it will be before the new version of the standard gets published.

Also note that sometime in the future this "POSIX TZ" format will almost certainly be deprecated, and then removed. I believe there is now general acceptance that it is simply inadequate for real world timezones, other than the simplest ones - and is never able to describe anything other than a single shift to & from a single alternative timezone in one year (though the "single shift" could be fixed by allowing more than one set of start,end pairs - similarly, more XXXn fields could be added to specify more different zone offsets than just the two currently possible, but I very much doubt that anyone is going to work out how to specify that, particularly not if no-one is stupid enough to try to implement some version of this).

The current draft also contains this:

    Daylight Saving Time is in effect all year if it starts January 1 at 00:00 and ends December 31 at 24:00 plus the difference between Daylight Saving Time and standard time, leaving no room for standard time in the calendar. For example, TZ='EST5EDT,0/0,J365/25' represents a time zone that observes Daylight Saving Time all year, being 4 hours west of UTC with abbreviation "EDT".

which suggests an intent to be able to support "permanent summer time", though that complicated mess achieves little more than TZ=EDT4 except for the value of tm_isdst, which as you mentioned in an earlier message really has no effect on anything - it was more or less an index into the tzname[] array, which it doesn't do well at, as while that array has no defined upper bound on its index, tm_isdst is only permitted to be 0 or 1. That's all now largely obsoleted (though not yet in the standard) by tm_zone (and tm_gmtoff) (which will be in the standard).
In this regard note that the standard already says:

    Implementations are encouraged to use the time zone database maintained by IANA to determine when Daylight Saving Time changes occur and to handle TZ values that start with a <colon>. See RFC 6557.

That is, it has already been noted that POSIX TZ isn't really good enough. In the next draft, that will be altered to say

    Implementations are encouraged to incorporate the IANA timezone database into the timezone database used for TZ values specifying geographical and special timezones, and to provide a way to update it in accordance with RFC 6557.

POSIX TZ strings are on their way to oblivion, fortunately. However, while they remain (which will be at least until the next (major) version of the standard, after the coming one - ie: at least another decade) the specification is that if the TZ value can be interpreted as a valid POSIX TZ string, then that is what it is. If that fails - which it will for anything which does not start xxxxN (at least 3 chars in the xxxx field), but can for many other reasons as well - then it is to be interpreted (if possible) as a geographic/special TZ string (eg: as a tzdata zone name).

And while I'm here, in an earlier message you said:

| If common practice becomes "ET" we couldn't use that,
| unfortunately, as POSIX requires at least three characters.

That's also incorrect. It is true that to use a POSIX TZ string, in the form normally seen in the wild, like TZ=UTC0 (as above) the "std" (and "dst" field if given) must be at least 3 chars. But that field is allowed to be in what POSIX calls "quoted" format, where the first char is '<' and the last is '>' and those two count in the required 3 chars, but are not part of the name created (the minimum is three so that in quoted form, there is at least one meaningful character remaining; TZ='<>0' isn't valid). There is no problem with TZ='<Z>0' if you want to set "zulu" time.
That has 3 chars of "std", but the quoting chars aren't part of the tzname defined, leaving just "Z".

This 3 char rule also applies only to POSIX form TZ strings; the zone names specified by tzdata format TZ specifications (or whatever other provider of timezone data an implementation chooses to use) have no such restriction. There's no reason at all tzdata could not use "ET" if it wanted to (even now it really makes more sense to call what is currently EST and EDT as just "ET", all anyone really cares about is that it is eastern (US) time; USET would be better, as other places have an "east" too, and some of them have timezones that apply in their eastern areas - and that is > 3 chars...).

Even a POSIX TZ string can handle that: TZ='<ET>5<ET>4,whatever' should work on any conforming implementation, right now (with a suitable value filled in for "whatever" of course, or with it and its preceding comma omitted - in which case the implementation is expected to supply the rule for when the switch occurs, but nothing, anywhere, requires that rule to be in any way consistent with any actual timezone on the planet, or to supply any actual switch times at all).

kre
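The syntax under discussion can be sketched in Python as follows. This is a simplified illustration, not a conforming parser: it splits a POSIX-form TZ value into std, offset, optional dst and an unparsed rule, and it exposes the disputed three-byte minimum as an explicit switch (count_quotes) rather than taking a side. All names here (parse_tz, TzHead, meets_three_byte_minimum) are hypothetical.

```python
import re
from typing import NamedTuple, Optional

_TZ_RE = re.compile(
    r"^(?:<(?P<std_q>[A-Za-z0-9+-]+)>|(?P<std>[A-Za-z]+))"
    r"(?P<off>[+-]?\d{1,2}(?::\d{2}(?::\d{2})?)?)"
    r"(?:(?:<(?P<dst_q>[A-Za-z0-9+-]+)>|(?P<dst>[A-Za-z]+))"
    r"(?P<dst_off>[+-]?\d{1,2}(?::\d{2}(?::\d{2})?)?)?"
    r"(?:,(?P<rule>.+))?)?$"
)


class TzHead(NamedTuple):
    std: str
    std_quoted: bool
    std_seconds_east: int     # sign flipped from the POSIX convention
    dst: Optional[str]
    dst_quoted: bool
    rule: Optional[str]       # left unparsed in this sketch


def _seconds_east(text: str) -> int:
    sign = -1 if text.startswith("-") else 1
    parts = [int(p) for p in text.lstrip("+-").split(":")]
    parts += [0] * (3 - len(parts))
    hours, minutes, seconds = parts
    # POSIX offsets are positive WEST of Greenwich, hence the negation.
    return -sign * (hours * 3600 + minutes * 60 + seconds)


def parse_tz(value: str) -> TzHead:
    m = _TZ_RE.match(value)
    if m is None:
        raise ValueError(f"not in POSIX std/offset form: {value!r}")
    std_quoted = m.group("std_q") is not None
    dst_quoted = m.group("dst_q") is not None
    return TzHead(
        std=m.group("std_q") if std_quoted else m.group("std"),
        std_quoted=std_quoted,
        std_seconds_east=_seconds_east(m.group("off")),
        dst=m.group("dst_q") if dst_quoted else m.group("dst"),
        dst_quoted=dst_quoted,
        rule=m.group("rule"),
    )


def meets_three_byte_minimum(name: str, quoted: bool, count_quotes: bool) -> bool:
    """Apply the 3-byte minimum under either reading of the standard."""
    length = len(name.encode("ascii")) + (2 if quoted and count_quotes else 0)
    return length >= 3


zulu = parse_tz("<Z>0")
print(zulu.std, meets_three_byte_minimum(zulu.std, zulu.std_quoted, True))
print(zulu.std, meets_three_byte_minimum(zulu.std, zulu.std_quoted, False))
```

With count_quotes=True the '<' and '>' count toward the minimum (the reading argued in the message above); with count_quotes=False only the name itself counts. Which reading the standard intends is exactly what the thread disagrees about.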
On 2023-03-03 00:45, Robert Elz wrote:
The latest draft (currently available) of the forthcoming standard is clearer, it adds:
If the dst field is specified and the rule field is not, it is implementation-defined when the changes to and from Daylight Saving Time occur.
Thanks, I didn't know that. In other words, in the current POSIX standard TZ='EST5EDT' has unspecified behavior, whereas in the draft next POSIX standard TZ='EST5EDT' has partly-specified behavior in that the implementation must only shuttle back and forth between standard time and DST via some schedule. If I understand things correctly, the draft allows for more than two transitions per year, e.g., one for Ramadan and another for summer as Morocco used to do. (Or is this really required? could an implementation use permanent standard time? or permanent DST? it's not clear from the text you quoted.)
Also note, that sometime in the future this "POSIX TZ" format will almost certainly be deprecated, and then removed.
That could lead to problems, as Internet RFC 8536 relies on POSIX TZ format, and the format is embedded in the TZif files interpreted by tzcode and by lots of other downstream code. For example, on the Ubuntu workstation I'm typing this message on, /usr/share/zoneinfo/Europe/Paris contains the string 'CET-1CEST,M3.5.0,M10.5.0/3' and glibc uses this string to process future time stamps. I suppose if POSIX stops specifying strings like this, we could move the spec to the successor of RFC 8536. But what would be the point? Every tzcode-like implementation would still need to parse such strings, and there seems little point to deprecating the exposure of that parser to the user.
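To show where that string lives, here is a short Python sketch that extracts the footer from the bytes of a TZif file. It assumes the RFC 8536 layout in which version 2 and later files end with a newline-enclosed TZ string; the function name tzif_footer and the synthetic sample bytes are illustrative only.

```python
def tzif_footer(data: bytes) -> str:
    """Return the POSIX-TZ-style footer of a version 2+ TZif file."""
    if data[:4] != b"TZif":
        raise ValueError("missing TZif magic")
    if data[4:5] not in (b"2", b"3", b"4"):
        raise ValueError("version 1 files carry no footer")
    if not data.endswith(b"\n"):
        raise ValueError("footer is not newline-terminated")
    # The footer is the final newline-enclosed line of the file.
    start = data.rfind(b"\n", 0, len(data) - 1)
    if start < 0:
        raise ValueError("footer start not found")
    return data[start + 1:-1].decode("ascii")


# Synthetic stand-in for a real file: magic, version, filler, then footer.
sample = b"TZif2" + bytes(15) + b"\x01\x0a\x02" + b"\nCET-1CEST,M3.5.0,M10.5.0/3\n"
print(tzif_footer(sample))
```

On a system that ships compiled zoneinfo files, the same function could be applied to the bytes of a file such as /usr/share/zoneinfo/Europe/Paris, though the path and the footer contents depend on the installation.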
I believe there is now general acceptance that it is simply inadequate for real world timezones
Yes, it's certainly inadequate if the goal is to represent all timestamps since 1970. However, it's useful for specific use cases, e.g., if you care only about timestamps now and in the future (this is how TZif files use it). So it would make sense to keep it in POSIX, to support those use cases.
The current draft also contains this:
Daylight Saving Time is in effect all year if it starts January 1 at 00:00 and ends December 31 at 24:00 plus the difference between Daylight Saving Time and standard time, leaving no room for standard time in the calendar. For example, TZ='EST5EDT,0/0,J365/25' represents a time zone that observes Daylight Saving Time all year, being 4 hours west of UTC with abbreviation "EDT".
Yes, as I recall this was put in at my suggestion, before Michael Deckers pointed out on this list that this draft change to POSIX is not necessary. Instead of TZ='EST5EDT,0/0,J365/25' you can use TZ='XXX3EDT4,0/0,J365/23' which conforms to current POSIX, so there's no need for the draft POSIX change (though it doesn't hurt, I suppose, and Internet RFC 8536 refers to it...).
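The arithmetic behind both strings can be checked with a short Python sketch. It assumes the usual POSIX reading that the start time is expressed in standard time and the end time in the alternative time, and that offsets are positive west of Greenwich; the helper names to_utc and dst_window are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def to_utc(year, month, day, local_hour, posix_offset_hours):
    """Convert a local wall-clock hour to UTC (UTC = local + POSIX offset)."""
    midnight = datetime(year, month, day, tzinfo=timezone.utc)
    return midnight + timedelta(hours=local_hour + posix_offset_hours)


def dst_window(year, std_offset, dst_offset, end_hour):
    """UTC start and end of the alternative time for rule '0/0,J365/<end_hour>'."""
    # Start rule "0/0": zero-based day 0 (January 1) at 00:00 standard time.
    start = to_utc(year, 1, 1, 0, std_offset)
    # End rule "J365/<end_hour>": December 31 at end_hour alternative time.
    end = to_utc(year, 12, 31, end_hour, dst_offset)
    return start, end


# TZ='EST5EDT,0/0,J365/25': std is 5h west, EDT defaults to 4h west.
draft_start, draft_end = dst_window(2023, 5, 4, 25)
# TZ='XXX3EDT4,0/0,J365/23': std is 3h west, EDT is explicitly 4h west.
alt_start, alt_end = dst_window(2023, 3, 4, 23)

print(draft_end == dst_window(2024, 5, 4, 25)[0])
print(alt_end == dst_window(2024, 3, 4, 23)[0])
```

In both cases the end of one year's alternative time lands on the same UTC instant as the start of the next year's, so standard time never gets a turn and the clock stays 4 hours west of UTC with abbreviation "EDT".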
There is no problem with
TZ='<Z>0'
No, there is a real problem, in current POSIX anyway, since POSIX says for this case "the std and dst fields in this case shall not include the quoting characters" ('<' and '>') and it also says that std must be at least three characters.

This is not just a standard-lawyer quibble. Real-world software breaks if you set TZ='<Z>0'. For example, on current Ubuntu:

    $ TZ='<Z>0' date
    Fri Mar 3 17:49:16  2023

with no Z in the output anywhere. This Ubuntu behavior conforms to POSIX since POSIX doesn't say what to do with nonconforming strings like '<Z>0'.
This 3 char rule also applies only to POSIX form TZ strings, the zone names specified by tzdata format TZ specifications (or whatever other provider of timezone data an implementation chooses to use) have no such restriction.
I suppose you're right about that, if it's merely an issue of conforming to POSIX. That is, in theory TZ='Europe/Paris' can use whatever time zone abbreviation we like (including the empty string, or a string containing newlines :-).

Still, I hesitate to depart from the POSIX form, as too much software expects it. TZDB used to depart from the POSIX form, in that 'date' and 'strftime' %Z would sometimes expand to strings containing spaces. However, this led to downstream trouble, in that parsers of the output of 'date' and 'strftime' got confused. I would not be surprised if we encountered similar problems with time zone abbreviations containing fewer than 3 characters, for reasons similar to why Ubuntu 'date' does not do what you want with TZ='<Z>0' or with TZ='<ET>4'.
Date: Fri, 3 Mar 2023 10:11:12 -0800
From: Paul Eggert <eggert@cs.ucla.edu>
Message-ID: <5f1bb439-5100-7c0d-fc27-32ab2e5fee08@cs.ucla.edu>

| Thanks, I didn't know that. In other words, in the current POSIX
| standard TZ='EST5EDT' has unspecified behavior,

I think so, unless I missed something (I looked hard to find where it said what happens, and found nothing, but not finding doesn't actually mean not present, I might have been looking under the wrong bush). Then the general rule is that if nothing is specified, it is unspecified (the latter in the POSIX technical sense). Note not undefined, something semi-rational has to happen (not a core dump), the standard just doesn't say what.

| whereas in the draft
| next POSIX standard TZ='EST5EDT' has partly-specified behavior in that
| the implementation must only shuttle back and forth between standard
| time and DST via some schedule.

Yes, act as if the rule is there, somewhere, hidden and invisible, and (ideally) actually specifying something sane. The way tzcode/tzdata handles this, as an (effective) alias for America/New_York is fine, nothing says that the implementation defined rule needs to be as limited as the POSIX string would be.

Since it is implementation defined (will be) the implementation needs to say somewhere what happens, then its users will know if using a TZ spec like that, with no attached rule, but with summer time specified to happen, is adequate for their needs. If so it makes sense to use it, as the implementation is more likely to adapt to changes in the regulations than the user's .profile file.

| If I understand things correctly, the draft allows for more than two
| transitions per year, e.g., one for Ramadan and another for summer as
| Morocco used to do. (Or is this really required? could an implementation
| use permanent standard time? or permanent DST? it's not clear from the
| text you quoted.)
Nothing in POSIX (aside from the POSIX TZ string definition) defines what a timezone can be, or what the rules are (C defines even less). That's hardly surprising, those things are defined by various national (or similar) legislative or administrative bodies, and are completely out of POSIX's sphere of influence. The US Dept of Commerce (is it?) doesn't care if it conforms to POSIX or not, it isn't seeking the magic badge of approval.

All that matters is that somehow there be a mechanism that will convert a time_t into a struct tm, in some specified timezone, according to the rules of that timezone, as set by whoever. And vice versa.

The closest POSIX comes (C has none of this) is that specification of the TZ string format, which allows simple cases to be specified for times for which the current rules are adequate (provided the needed rules are simple enough).

Eg: someone's legislation might state that summer time begins at 01:00 on some particular Sunday (say the last Sunday of some month), with the time skipping forward to 02:00. That's all simple enough, and exactly the kind of thing we're used to. But let's suppose the legislation also says "If the last Sunday of the month is the last day of the month, summer time will instead start at 03:00 which becomes 04:00". The rules in a POSIX TZ string cannot handle that. In tzdata (as I understand it) we can't handle it either - other than by manually inserting a one off rule every time the last Sunday of the month when summer time starts happens to be the last day of that month, and then reverting to a LastSun 01:00 rule for the following years, until it happens again. Once that's done, everything will be fine, but there's nothing automated about it.

| That could lead to problems, as Internet RFC 8536 relies on POSIX TZ
| format,

If it relies upon it by reference, then it should probably start being updated to specify whatever it needs itself. Just in case. But there's no hurry.
That would be a good thing to do in any case, relying upon someone else not making a change which might break your usage doesn't seem like the right thing to do to me.

Do note that it will certainly be at least a decade, more likely 2 or 3 decades, and perhaps even more, before this would actually happen - the format needs to be marked obsolete first, and even that hasn't happened yet (if it doesn't happen in the next standard, expected next year now (Posix-2024 perhaps ... Issue 8 certainly) then it won't until (at least) Issue 9, which (my guess would be) won't happen until the middle 2030's at the earliest (more likely late 2030's)). Then (possibly) after having been marked obsolete in Issue 9, it might be removed in the next one (Issue 10, 2050's sometime perhaps) or the one after (Issue 11, late 2060's) ... There is LOTS of time to get everything else in place before anything changes here (and I am still just speculating that it ever will - it simply seems like a logical future step to me).

| and the format is embedded in the TZif files interpreted by tzcode

That's harmless - removing it from POSIX doesn't mean it must stop working, even less that its use in TZif files needs to end. If anything just the contrary, if things are no longer constrained by the POSIX spec, and if there's a need, that format could be extended to handle more than two transitions per year, or rules that are more complex than the ones POSIX allows to be specified (of course, you could do that anyway, update the RFC and TZif files don't need to be constrained by POSIX regardless of what happens).

| and by lots of other downstream code.

What kind? I doubt that anything other than tzset() and related stuff ever parses a TZ string contents, though I guess someone might have written a TZ string -> what it means converter, to help users get that right.
| For example, on the Ubuntu
| workstation I'm typing this message on, /usr/share/zoneinfo/Europe/Paris
| contains the string 'CET-1CEST,M3.5.0,M10.5.0/3' and glibc uses this
| string to process future time stamps.

That's fine - but you don't need the definition in POSIX to do that. All that removing it does is tell users that they cannot necessarily expect a string of that format to work. It still might, as what will be left, with that gone, is "If the TZ value starts with a ':' what it means is implementation defined, if it doesn't then it, by magic means unspecified here (possibly the IANA tzdata database), the implementation will discover from the value a means to convert between time_t and local time." If the implementation wants to keep parsing old style POSIX strings (and for backwards compat for any users who use them, most will I would guess), that is just fine. As far as we're concerned here, nothing needs to change at all.

| I suppose if POSIX stops specifying strings like this, we could move the
| spec to the successor of RFC 8536. But what would be the point?

As long as it remains in POSIX, users can keep insisting upon their right to use those strings, and implementations (even ones not based upon tzcode, which have no use for that nonsense at all) have to keep supporting it. This is exactly the same rationale as lots of other ancient crud has been retired from the standard over time. POSIX used to specify uucp ... it doesn't any more. That doesn't mean that an implementation cannot continue to support that, it just means that users can't complain about a POSIX violation if they choose not to. Same here with POSIX TZ strings.

| Every
| tzcode-like implementation would still need to parse such strings, and
| there seems little point to deprecating the exposure of that parser to
| the user.

For tzcode, perhaps - but nothing anywhere requires that only tzcode be used to provide the translation service.
A different implementation of a similar service, in a world where POSIX no longer specifies its TZ string format, would not need to parse those things. Why would it?

| So it would make sense to keep it in POSIX, to support those use cases.

No, it doesn't. It might make sense to keep it in the implementation to support those cases, it doesn't need to be in the standard for that to happen.

| > There is no problem with
| >
| > TZ='<Z>0'
|
| No there is a real problem, in current POSIX anyway, since POSIX says
| for this case "the std and dst fields in this case shall not include the
| quoting characters" ('<' and '>') and it also says that std must be at
| least three characters.

Yes, but you are misinterpreting what "std" is. That is not the abbreviation (or tzname, or whatever one wants to call it), it is the field of the TZ string in which that name is specified. If there are no quoting chars, then it turns out the two are the same, the contents of the field (provided it meets the other requirements) is the abbreviation. If it is quoted, then the charset restrictions are relaxed (not just alpha chars).

I read the sentence you quoted as meaning "the abbreviations extracted from the std and dst fields shall not include...". It does say that std and dst must be at least 3 bytes, but that is earlier, before it starts on the format of those fields (which is irrelevant if they aren't at least 3 bytes long). That (at least 3 byte) std field might start with a '<' and end with a '>' in which case the tzname (abbreviation, whatever) is what is between (assuming correct syntax).
Note the length limit is (or will be) bytes, not characters - not in the version that is coming, where an effort has been made to be more careful about the difference between a byte and a character, and use the intended word in each case, not just assume they are the same thing, which much of old POSIX used to do, and use the words interchangeably, preferring "character" when text was being discussed, and "byte" when the contents were arbitrary - so malloc(n) allocated space for n bytes, strlen(p) returned the number of characters in p. No more.

In my reading of this, the "std" string in the format is '<' 'Z' '>' which is 3 bytes (all ASCII in this case).

Beyond that, even if I'm wrong, POSIX has (for ages) permitted implementation defined TZ specifications, beginning with ':' - those specify no rules on the length, or character set, or existence, of tzname abbreviations the way that the POSIX TZ string does. Implementations are free to support anything they like, and users are free to set TZ to any such string supported by the implementation. Application code needs to learn to deal with that - pandering to broken assumptions by insisting on pseudo-rules just because there's some (unjustified) belief that this is the way it is supposed to be, doesn't really help anyone.

| This is not just a standard-lawyer quibble. Real-world software breaks
| if you set TZ='<Z>0'.

I consider glibc broken in that case. On NetBSD, which is using tzcode (in this area, probably much less modified than what glibc uses), I get:

    jacaranda$ TZ='<Z>0' date
    Fri Mar 3 20:45:25 Z 2023

That's as it should be. What did glibc (or perhaps Ubuntu) do to things to break that? And why? What does a pure (as distributed) tzcode version do in this case?

| since POSIX doesn't say what to do with nonconforming strings like '<Z>0'.

Only if that is indeed non-conforming. I can see another POSIX bug report coming up, this area clearly needs more clarification.
Note that even should it be decided that this is indeed non-conforming, an implementation can certainly support TZ=':<Z>0' or even just TZ=:Z0 and set the abbreviation to Z and the offset to 0, and POSIX has no rule against that at all. Application code needs to learn to deal with it. "I've never seen that happen, so it must not be possible" is a common, but bogus, argument. Implementations are not required to support that, so applications cannot depend upon using it - but implementations are allowed to support it, and users of that implementation are allowed to use it, applications running on that implementation must be able to deal with the consequences.

  | I suppose you're right about that, if it's merely an issue of conforming
  | to POSIX, That is, in theory TZ='Europe/Paris' can use whatever time
  | zone abbreviation we like (including the empty string, or a string
  | containing newlines :-).

Yes, if it were merely a conformance issue... - though I haven't checked to see if POSIX decided to impose any rules on what is allowed in the tm_zone field of a struct tm. That might limit things, if there are any restrictions there. glibc has obviously decided the empty string is OK, as that is what the example you showed uses:

  $ TZ='<Z>0' date
  Fri Mar 3 17:49:16  2023

notice the two spaces between "16" and "2023". The abbreviation is inserted between those, and is clearly empty in this case.

  | Still, I hesitate to depart from the POSIX form, as too much software
  | expects it.

We already made (and forced) a change, by sticking in +07 type abbreviations, which are not the 3 or more alpha chars that used to be the norm (and even longer ago, exactly 3 alpha chars, always). What I get locally (now) (and I'd much prefer if ICT came back)

  jacaranda$ date
  Sat Mar 4 03:52:30 +07 2023

which is nothing like what used to be the normal format.
Applications need to learn to tell the difference between what is guaranteed (in this area, almost nothing) and what is commonly seen (which is irrelevant, unless some code wants to optimise for that case, which would be reasonable).

  | I would not be surprised if we
  | encountered similar problems with time zone abbreviations containing
  | less than 3 characters,

I'd expect even more problems if the name doesn't appear at all. But Ubuntu seems to be surviving that, so I suspect it would survive shorter than 3 byte abbreviations as well.

  | for reasons similar to why Ubuntu 'date' does
  | not do what you want with TZ='<Z>0' or with TZ='<ET>4'.

You didn't ever say what those reasons are, other than some desire to conform to something I don't believe POSIX actually requires. To be a conforming POSIX TZ string, just perhaps, but nothing else gives any guarantees, and no-one is required to (and few people do) use those strings. Most code runs without TZ set at all - in that case it is clear that there's no 3 byte limit on anything, as the spec for parsing TZ cannot be relevant if there is no TZ set anywhere to parse.

What happens using glibc with TZ='<A>1' ? (nb: I'm not sure that A, as in the US Military timezone designated 'A', really is -0100, it might be +0100, or something else entirely, that doesn't matter here) What I see is:

  jacaranda$ TZ='<A>1' date ; TZ='<Z>0' date
  Fri Mar 3 19:58:25 A 2023
  Fri Mar 3 20:58:25 Z 2023

What does Ubuntu (glibc) do in that case? Again, what I see is almost certainly just what tzcode does, and it is almost certainly correct. I certainly see no application conformance benefit in the Ubuntu behaviour you described - at least in the NetBSD (tzcode?) version, there is an abbreviation present, it might be shorter than some software might expect but it isn't absent.
glibc obviously isn't treating the TZ string as garbage or we'd get something like:

  jacaranda$ TZ=/---+99 date
  Fri Mar 3 21:03:02 GMT 2023

with the fallback to GMT (or UTC perhaps) when the TZ string specifies nothing meaningful at all, it isn't doing that. So since it did seem to give UTC time (the '0') I'm assuming that it parsed the string, and then simply decided to break things, because someone believes (incorrectly) that POSIX requires otherwise (leading to unspecified behaviour, so you can do what you like - but in that case, doing the reasonable thing, rather than the vindictive one, seems more beneficial to me).

Further, as tzdata (or other implementations using the newly added TZ specification type) and : TZ specs have no limits imposed like the POSIX TZ string imposes, applications need to deal with whatever comes from them as well.

kre
On 2023-03-03 14:47, Robert Elz wrote:
| That could lead to problems, as Internet RFC 8536 relies on POSIX TZ | format,
If it relies upon it by reference, then it should probably start being updated to specify whatever it needs itself. Just in case.
That's not strictly necessary, as the RFC specifies the POSIX version, so even if POSIX comes out with a new version the RFC is still valid. When writing RFC 8536 I didn't want to duplicate the POSIX spec. I wanted to refer to an existing standard; that way, we could avoid errors that inevitably arise when duplicating, and readers could easily see that they can reuse their POSIX code to implement the spec. This sort of thing is common practice.
| and by lots of other downstream code.
What kind? I doubt that anything other than tzset() and related stuff ever parses a TZ string contents, though I guess someone might have written a TZ string -> what it means converter, to help users get that right.
There's a partial list at <https://data.iana.org/time-zones/tz-link.html#TZif>. There's plenty of other code like that, both to parse TZif files and to deal with other uses of POSIX TZ strings. A quick search reports <https://github.com/Ryujinx/Ryujinx/blob/master/Ryujinx.HLE/HOS/Services/Time...> for example; this is part of a Nintendo Switch emulator written in C#.
As long as it remains in POSIX, users can keep insisting upon their right to use those strings, and implementations (even ones not based upon tzcode, which have no use for that nonsense at all) have to keep supporting it.
I see uses for these POSIX TZ strings, even with tzcode and tzdata. Here's a scenario: your government abruptly changed the DST rules and you don't have access to the network (or perhaps your distributor hasn't updated its copy of tzdata yet) and so you can't get the latest tzdata easily. You can work around the problem with one of these POSIX TZ strings. Even if your platform is built out of a bunch of different modules, all the code should still work because they all conform to this longstanding POSIX standard. I don't much like POSIX TZ strings either. However, now that we have them, they're useful on occasions like these, and removing them from POSIX would be a small benefit to implementers and a significant hassle for some use cases.
| > There is no problem with | > | > TZ='<Z>0' | | No there is a real problem, in current POSIX anyway, since POSIX says | for this case "the std and dst fields in this case shall not include the | quoting characters" ('<' and '>') and it also says that std must be at | least three characters.
Yes, but you are misinterpreting what "std" is. That is not the abbreviation (or tzname, or whatever one wants to call it), it is the field of the TZ string in which that name is specified.
No, because POSIX says that for TZ strings "The std and dst fields in this case shall not include the quoting characters." In the TZ setting TZ='<+1245>-12:45<+1345>,M9.5.0/2:45,M4.1.0/3:45' (isn't that a *beauty* :-) the std field is simply "+1245", without the angle brackets. I realize your interpretation of that wording differs. However, my interpretation is more plausible and better reflects existing practice.
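To make the two readings concrete, here is a minimal Python sketch (not taken from tzcode, glibc, or POSIX itself) that splits a POSIX-style TZ string and reports two byte counts for a name field: one counting the angle brackets and one not. The regular expression is a simplified assumption about the grammar, and the helper name `name_lengths` is hypothetical.

```python
import re

# Simplified grammar (an assumption, not the full POSIX definition):
# a name field is either <quoted> (alphanumerics, '+', '-') or bare letters.
_NAME = r"(?P<{0}raw><(?P<{0}q>[A-Za-z0-9+-]+)>|(?P<{0}b>[A-Za-z]+))"
_OFFSET = r"[+-]?\d{1,2}(?::\d{1,2}){0,2}"
_TZ = re.compile(
    "^" + _NAME.format("std") + _OFFSET
    + "(?:" + _NAME.format("dst") + "(?:" + _OFFSET + ")?)?"
    + "(?:,.*)?$"
)


def name_lengths(tz, field="std"):
    """Return (bytes including any <> quoting, bytes of the bare name).

    Returns None if the string does not match the simplified grammar
    or the requested field ("std" or "dst") is absent.
    """
    m = _TZ.match(tz)
    if m is None or m.group(field + "raw") is None:
        return None
    raw = m.group(field + "raw")
    name = m.group(field + "q") or m.group(field + "b")
    return len(raw.encode()), len(name.encode())


if __name__ == "__main__":
    for tz in ("<Z>0", "<+1245>-12:45<+1345>,M9.5.0/2:45,M4.1.0/3:45",
               "EST5EDT,M3.2.0,M11.1.0"):
        print(tz, name_lengths(tz, "std"), name_lengths(tz, "dst"))
```

For TZ='<Z>0' the first count is 3 and the second is 1, which is exactly the gap between the two interpretations: the three-byte minimum is met if the brackets count as part of std and missed if they do not.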
I consider glibc broken in that case.
macOS behaves like glibc. That's an independent code base, but evidently both sets of developers read POSIX the way that I'm reading it, and it'd be a stretch to say we're all wrong. AIX and Solaris behave in yet a third way: they treat TZ='<Z>0' as if it were TZ='<Z  >0' (i.e., two spaces after the "Z"). All these behaviors conform to POSIX because POSIX doesn't specify the behavior when dst has fewer than 3 characters.
What does a pure (as distributed) tzcode version do in this case?
It behaves like NetBSD, which isn't surprising as NetBSD is derived from tzcode.
I'd expect even more problems if the name doesn't appear at all. But Ubuntu seems to be surviving that
This is backwards. People don't use TZ='<Z>0' or TZ='<ET>4' because those usages are nonconforming and don't work in general. If TZ='America/New_York' started saying just 'ET', that would be more like what the situation was when TZDB put spaces in time zone abbreviations. But I'd be loath to do that.
What happens using glibs with TZ='<A>1' ?
  $ TZ='<A>1' date; TZ='<Z>0' date; date -u
  Sat Mar 4 00:35:26  2023
  Sat Mar 4 00:35:26  2023
  Sat Mar 4 00:35:26 UTC 2023

That is, both TZ settings are invalid, and in that case glibc uses UTC without any abbreviation (POSIX says %Z is empty when unknown). When NetBSD sees an invalid TZ setting it does something similar, except it uses the abbreviation "GMT" instead of "", and it extends POSIX in a different way so it has a different opinion about what is invalid. These behaviors all conform to POSIX since the TZ settings don't conform to POSIX.

Here's a more-outlandish example, run on NetBSD:

  $ TZ="$(awk 'BEGIN {for (i=0; i<512; i++) printf "A"; print "4"}')" date; date -u
  Sat Mar 4 00:28:09 GMT 2023
  Sat Mar 4 00:28:09 UTC 2023

Here NetBSD treats the TZ setting as invalid (a time zone abbreviation of 512 "A"s!) and silently substitutes GMT. Glibc treats this same example as specifying a 512-byte abbreviation for a time zone 4 hours west of Greenwich. Both behaviors conform to POSIX since the TZ string exceeds POSIX length limits.
    Date: Fri, 3 Mar 2023 16:38:48 -0800
    From: Paul Eggert <eggert@cs.ucla.edu>
    Message-ID: <53ba0709-0cfa-8859-8021-6b91909a9b62@cs.ucla.edu>

  | That's not strictly necessary, as the RFC specifies the POSIX version,
  | so even if POSIX comes out with a new version the RFC is still valid.

Technically yes, but it is much harder to acquire access to old versions of POSIX standards than it is old RFCs, most people will just give up when all they find is the current one (whichever that is).

  | > What kind? I doubt that anything other than tzset() and related
  | > stuff ever parses a TZ string contents,
  | There's a partial list at
  | <https://data.iana.org/time-zones/tz-link.html#TZif>.

Oh, sorry, that's what I meant by "tzset() and related stuff", obviously anything doing localtime() or its equivalent in some other language or system is going to have to deal with it. I was asking about any applications that are not implementations of that functionality - things that might not simply get updated if the TZif file format were to alter -- hence the possible example of a program which would parse TZ and explain in natural language (or perhaps just something less baroque than the POSIX TZ string rule format) what it all means.

  | I see uses for these POSIX TZ strings, even with tzcode and tzdata.
  | Here's a scenario: your government abruptly changed the DST rules and
  | you don't have access to the network (or perhaps your distributor hasn't
  | updated its copy of tzdata yet) and so you can't get the latest tzdata
  | easily.

Frankly, that would be horrible advice to give someone. Anyone doing that is going to change the interpretation of all previous recorded timestamps to match the new rules. That absurd behaviour (which I'm sure you're well aware of) is one of the reasons those POSIX TZ strings are useless. In this case they're even worse than usual, where normally the rules aren't changing for long periods.
The user would be better to make sure that they were equipped with zic, and the tzdata source files (or whatever equivalent applies to the system in use) and update things themselves. If that's impossible, or just impractical, then it would be better to simply allow the localtime conversion to be inaccurate for a while (all system timestamps, in UTC, will still be correct) until things can be fixed the normal way - just as probably the microwave oven (and other dumb clocks) usually shows the wrong time for who knows how many days after one of the time shifts, until it annoys someone enough to correct it. In this case, the user can even, justifiably, blame the government for not giving sufficient notice. So, please, never tell someone:

  | You can work around the problem with one of these POSIX TZ strings.

  | I realize your interpretation of that wording differs. However, my
  | interpretation is more plausible and better reflects existing practice.

Some existing practice. Not all. In any case https://austingroupbugs.net/view.php?id=1639 now exists, and we will eventually get a resolution, one way or the other.

  | All these behaviors conform to POSIX because POSIX doesn't specify the
  | behavior when dst has fewer than 3 characters.

But if that's correct, and it might turn out to be, why not simply do the reasonable thing that tzcode does, and just allow 1 character. Surely that's better than none, or one with trailing spaces?

  | Glibc treats this same
  | example as specifying a 512-byte abbreviation for a time zone 4 hours
  | west of Greenwich.

That's a very odd choice. Why would they choose to ignore the max length limit (which is easily capable of making slightly sloppy programs overflow buffer sizes) but then enforce the minimum length, which (since with my interpretation, that is still not 0) is not very likely to have any noticeable effects at all. Weird. Certainly glad I'm not a glibc user.

kre
On 2023-03-04 23:32, Robert Elz wrote:
Technically yes, but it is much harder to acquire access to old versions of POSIX standards than it is old RFCs, most people will just give up when all they find is the current one (whichever that is).
Although that used to be true, POSIX has gotten better. It's now no trouble to access older POSIX versions so long as you don't want to go back before 2001. For example, the previous (Issue 6, revised 2004) Open Group spec for POSIX can be read here: https://pubs.opengroup.org/onlinepubs/009695399/ and Internet RFC 8536, which contains a similar URL to point to a later POSIX edition, should be good for quite some time.
I was asking about any applications that are not implementations of that functionality - things that might not simply get updated if the TZif file format, were to alter
I'm afraid that I don't understand the premise of the question then. But it's not important. POSIX won't obsolete these TZ strings any time soon, and even if it does so (which in my view would be a mistake) those programs will still need to support those strings since they're required for TZif.
Anyone doing that is going to change the interpretation of all previous recorded timestamps to match the new rules.
Sure, but for many applications that's preferable to screwing up today's timestamps. There is no perfection in situations like these, only the best of unappetizing solutions. Lots of people would prefer having localtime work today, than to require users to run on UTC, even if this means some old timestamps are wrong by an hour (after all, that's better than their being wrong by *12* hours....). Again, I'm not recommending this over having a properly updated TZDB. Obviously the latter would be preferable. It's just that sometimes it's not feasible.
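The trade-off in this exchange can be observed directly. The sketch below is an illustration only, assuming a Unix-like system where Python's `time.tzset()` is available and where the C library accepts fully specified POSIX TZ strings; the helper name `local_hour_and_isdst` is hypothetical. It converts one timestamp from March 2006 under two TZ strings: one carrying the US rule in force since 2007, and one carrying the rule that applied in 2006 (first Sunday in April to last Sunday in October).

```python
import calendar
import os
import time


def local_hour_and_isdst(tz, utc_tuple):
    """Interpret a UTC (Y, M, D, h, m, s) tuple under the given TZ string.

    Returns (tm_hour, tm_isdst) as reported by the C library's localtime.
    Assumes a Unix-like platform; time.tzset() does not exist on Windows.
    """
    saved = os.environ.get("TZ")
    os.environ["TZ"] = tz
    time.tzset()
    try:
        lt = time.localtime(calendar.timegm(utc_tuple))
        return lt.tm_hour, lt.tm_isdst
    finally:
        # Restore the caller's TZ so the probe has no lasting effect.
        if saved is None:
            del os.environ["TZ"]
        else:
            os.environ["TZ"] = saved
        time.tzset()


if __name__ == "__main__":
    stamp = (2006, 3, 20, 17, 0, 0)  # 2006-03-20 17:00:00 UTC
    print(local_hour_and_isdst("EST5EDT,M3.2.0,M11.1.0", stamp))
    print(local_hour_and_isdst("EST5EDT,M4.1.0,M10.5.0", stamp))
```

A POSIX TZ string carries a single rule with no history, so the first setting applies today's rule to the 2006 timestamp and should place it in daylight time, an hour off from what clocks in New York showed that day. The second setting gets 2006 right but would be wrong for dates from 2007 on. Only the tzdata entry records both.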
https://austingroupbugs.net/view.php?id=1639
now exists, and we will eventually get a resolution, one way or the other.
Thanks, I've followed up there.
| Glibc treats this same | example as specifying a 512-byte abbreviation for a time zone 4 hours | west of Greenwich.
... Why would they choose to ignore the max length limit
They don't ignore it. It's just that the upper bound is much bigger than the upper bound in tzcode/NetBSD.
On 2023-03-02 23:49, Paul Eggert via tz wrote:
On 3/2/23 15:32, Paul Gilmartin via tz wrote:
The forms containing numbers are mandated by POSIX:
POSIX does not specify TZ strings like TZ='EST5EDT'; they are TZDB extensions. If you want a TZ string whose meaning is specified, you need something like TZ='EST5EDT,M3.2.0,M11.1.0'.
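For readers unfamiliar with the rule suffix, the Mm.w.d form names month m, week w (1 to 5, where 5 means the last occurrence) and weekday d (0 is Sunday). The following Python sketch shows how such a rule maps to a calendar date. It is an illustration only, not how tzcode or any C library implements it, and the function name `m_rule_date` is hypothetical. Any "/time" suffix is ignored here.

```python
import datetime


def m_rule_date(year, rule):
    """Return the date selected by a POSIX 'Mm.w.d' rule in the given year.

    Week 5 means "the last such weekday in the month". A trailing
    '/time' part (for example 'M10.5.0/3') is ignored in this sketch.
    """
    month, week, weekday = (int(x) for x in rule.split("/")[0][1:].split("."))
    first = datetime.date(year, month, 1)
    # Python counts Monday=0..Sunday=6; POSIX counts Sunday=0..Saturday=6.
    first_posix = (first.weekday() + 1) % 7
    day = 1 + (weekday - first_posix) % 7 + 7 * (week - 1)
    # Number of days in the month, via the first day of the next month.
    next_first = datetime.date(year + (month == 12), month % 12 + 1, 1)
    days_in_month = (next_first - first).days
    while day > days_in_month:  # only reachable when week == 5
        day -= 7
    return datetime.date(year, month, day)


if __name__ == "__main__":
    # The rule pair from TZ='EST5EDT,M3.2.0,M11.1.0', evaluated for 2023.
    print(m_rule_date(2023, "M3.2.0"), m_rule_date(2023, "M11.1.0"))
```

For 2023 the two rules select the second Sunday in March and the first Sunday in November, which are the dates on which US clocks changed that year.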
Weren't TZ strings like "EST5EDT" in use by Unix before POSIX made them non-standard? -- -=( Ian Abbott <abbotti@mev.co.uk> || MEV Ltd. is a company )=- -=( registered in England & Wales. Regd. number: 02862268. )=- -=( Regd. addr.: S11 & 12 Building 67, Europa Business Park, )=- -=( Bird Hall Lane, STOCKPORT, SK3 0XA, UK. || www.mev.co.uk )=-
On 2023-03-03 02:09, Ian Abbott wrote:
Weren't TZ strings like "EST5EDT" in use by Unix before POSIX made them non-standard?
Yes, as I recall System V supported settings like TZ='EST5' and TZ='EST5EDT'. Time zone abbreviations had to be exactly three letters long, and there was no way to specify DST rules in the TZ string. I believe US DST rules were hard-coded in the C library, though I suppose some enterprising hackers in Europe may have changed the source code and recompiled everything.

For reasons of backward compatibility tzcode still supports this sort of thing, though it does not insist on exactly three letters, and it lets builders more easily alter the compiled-in DST rules by building with something like this for Europe:

  make CFLAGS='-DTZDEFRULESTRING=\",M3.5.0,M10.5.0/3\"'

as noted in the Makefile. This flexibility is a two-edged sword, though, and I hope that pretty much nobody needs or uses it. Users in France should use TZ='Europe/Paris', or at worst TZ='CET-1CEST,M3.5.0,M10.5.0/3'; they should not use a System V style setting like TZ='CET-1CES' as this will give wrong answers on most platforms: these platforms assume US DST rules if they assume anything at all.
On Mär 02 2023, Paul Eggert via tz wrote:
If common practice becomes "ET" we couldn't use that, unfortunately, as POSIX requires at least three characters.
When /dst/ is missing, /std/ can be less than 3 bytes. -- Andreas Schwab, schwab@linux-m68k.org GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 "And now for something completely different."
    Date: Fri, 03 Mar 2023 09:04:48 +0100
    From: Andreas Schwab via tz <tz@iana.org>
    Message-ID: <87y1oes1jz.fsf@linux-m68k.org>

  | On Mär 02 2023, Paul Eggert via tz wrote:
  |
  | > If common practice becomes "ET" we couldn't use that, unfortunately,
  | > as POSIX requires at least three characters.
  |
  | When /dst/ is missing, /std/ can be less than 3 bytes.

What gives you that impression? Where it says

  The interpretation of these fields is unspecified if either field is less than three bytes (except for the case when dst is missing),

it is just allowing for a missing dst field being (obviously) 0 bytes, and hence less than 3 - it is the missing dst field that is allowed to be less than 3 bytes in that case, not the std field. But perhaps that could be clearer.

kre
On Mär 03 2023, Robert Elz via tz wrote:
Date: Fri, 03 Mar 2023 09:04:48 +0100 From: Andreas Schwab via tz <tz@iana.org> Message-ID: <87y1oes1jz.fsf@linux-m68k.org>
| On Mär 02 2023, Paul Eggert via tz wrote:
|
| > If common practice becomes "ET" we couldn't use that, unfortunately,
| > as POSIX requires at least three characters.
|
| When /dst/ is missing, /std/ can be less than 3 bytes.
What gives you that impression?
Where it says
The interpretation of these fields is unspecified if either field is less than three bytes (except for the case when dst is missing),
it is just allowing for a missing dst field being (obviously) 0 bytes, and hence less than 3 - it is the missing dst field that is allowed to be less than 3 bytes in that case, not the std field.
My understanding is that a missing field does not have a length. Though I admit that without explicit delimiters, a missing field is difficult to distinguish from a zero-length field.
But perhaps that could be clearer.
Perhaps it should say: "except that dst can be missing". -- Andreas Schwab, schwab@linux-m68k.org GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 "And now for something completely different."
    Date: Fri, 03 Mar 2023 13:03:48 +0100
    From: Andreas Schwab <schwab@linux-m68k.org>
    Message-ID: <87edq63uu3.fsf@igel.home>

  | My understanding is that a missing field does not have a length.

That makes sense (though I don't know if POSIX anywhere says something like that) - but a field without a length would still not be at least 3 bytes long...

  | > But perhaps that could be clearer.
  | Perhaps it should say: "except that dst can be missing".

I have entered a defect report into the POSIX (austin group) database of such things. https://austingroupbugs.net/view.php?id=1638 (Those are all available for public viewing). We'll see what happens. I don't expect the (ugly) language I suggested as a resolution to be adopted, but something will be. Watch that space...

kre
On 2023-03-02 5:44 PM, Paul Eggert wrote:
On 3/2/23 14:22, Brooks Harris via tz wrote:
How will tzdb manage this?
Traditionally we've treated "permanent daylight saving" as standard time, and I'd rather continue this tradition than make an exception for the US. That is, tm_isdst would be 0. (Most people don't care about the tm_isdst flag, but POSIX and C standard nerds do.)
Whether the adjusted time in (say) New York would be abbreviated "EST" or "AST" or "EDT" is up to common practice. We could use the abbreviation "-04" until common practice settles down. If common practice becomes "ET" we couldn't use that, unfortunately, as POSIX requires at least three characters. At some point "EST" might become the best of the alternatives.
My biggest worry is the set of backward compatibility zones EST5EDT, CST6CDT, MST7MDT, PST8PDT as their continued use would lead to so much confusion that they'd be more trouble than they're worth. Presumably we would retire them by moving them to "backzone". "EST" and "MST" might need to retire as well. (Luckily, there is no "CST" or "PST".)
Similar issues will come up if EU regions go to "permanent daylight saving", as they have threatened to do for years.
Whatever we do in this area, it will be a mess.
"A mess" doesn't sound good. Apparently there is still debate of "permanent DST" vs. "permanent standard time". Many arguments have been made for "permanent standard time" but amongst them is no discussion or recognition of the potential technical difficulties, disruptions, and costs associated with "permanent DST". I would think the major implementers would be very concerned about this. I'm guessing all industries, from finance to transportation, would be affected and many don't realize the difficulties and costs they may face to adapt to the change. As I understand it, going to "permanent standard time" is simple and straightforward for tzdb and the entire downstream infrastructure. Wouldn't it be a good idea to try to inform the public and politicians of the technical challenges posed by "permanent DST"?
Brooks Harris wrote:
Apparently there is still debate of "permanent DST" vs. "permanent standard time". Many arguments have been made for "permanent standard time" but amongst them is no discussion or recognition of the potential technical difficulties, disruptions, and costs associated with "permanent DST". I would think the major implementers would be very concerned about this. I'm guessing all industries, from finance to transportation, would be affected and many don't realize the difficulties and costs they may face to adapt to the change.
<flame> We have spent decades, for many of us our entire lives, changing the clocks regularly every spring and fall, and only in the last few years have I become aware that this is somehow a tremendous burden and a massive inconvenience for me and everyone else. Having self-evidently settled the debate that changing the clocks is a terrible thing and a national priority, the only solution being formally presented is to shift our time zones 15° east year-round. No amount of evidence about the failed experiment in 1974, and the schoolchildren killed in early-morning accidents, seems to convince anyone. It seems unlikely that appeals to technical difficulties would convince these people either, as long as the golf and Halloween candy industries can be shown to turn a greater profit, and as long as flawed studies showing supposed energy savings can be produced. Marco Rubio is behind this bill, and he has never shied away from claiming that he or Florida represent the entire United States, so I don’t know why the DST situation should be any different. </flame>

Sorry about that. I feel better now.

-- Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org
participants (11): Andreas Schwab, Brooks Harris, Doug Ewell, Eliot Lear, Ian Abbott, Paul Eggert, Paul Gilmartin, Pete Resnick, Robert Elz, Steffen Nurpmeso, Tim Parenti