Add ICU's Etc/Unknown Zone to IANA Time Zone Database?
Hi TZ friends -
Should the time zone identifier "Etc/Unknown" (standardized in Unicode Technical Standard 35 and used by the ICU <https://icu.unicode.org/> library that implements time zone support in all major web browsers, in Java, and in other platforms) be added to the IANA Time Zone Database?
Etc/Unknown would behave the same as the existing time zone identifier "Factory". Ideally, Etc/Unknown could be a Zone and Factory could be turned into a Link pointing to Etc/Unknown, because both of them have the meaning of "the time zone of this computer is not known" but Etc/Unknown is more self-describing and is better aligned with modern Zone naming conventions.
Here's more context: "Etc/Unknown" is standardized in https://unicode.org/reports/tr35/#Time_Zone_Identifiers. Here's the relevant text from the standard:
There is a special code "unk" for an Unknown or Invalid time zone. This can be expressed in the tz database style ID "Etc/Unknown", although it is not defined in the tz database.
Following this standard, Etc/Unknown is returned by ICU <https://icu.unicode.org/> when the time zone of a computer cannot be determined. Here's an example of an ICU method that can return Etc/Unknown: icu::TimeZone::detectHostTimeZone() <https://unicode-org.github.io/icu-docs/apidoc/dev/icu4c/classicu_1_1TimeZone...> .
The fact that Etc/Unknown looks like an IANA identifier but is not actually in TZDB has been a long-running source of problems. Most recently, this week Chrome is rushing out a patch <https://issues.chromium.org/issues/381620359> that reverts a recent change that added support for the "Factory" zone by making it an alias for "Etc/Unknown". Given that ICU previously didn't recognize Factory and so returned Etc/Unknown for computers in the Factory zone, this was assumed by everyone to be a safe change. But this change turned out to break some 3rd-party libraries and had to be reverted.
After discussing this recent bug with engineers at Google, our consensus was that it'd be helpful for the time zone ecosystem if Etc/Unknown stopped being a special case and started being a regular Zone in the IANA Time Zone database. Especially if we could Link-ify Factory at the same time so that Zone picker UIs won't have two distinct "I don't know what time zone this is" choices.
Thanks for considering this proposal.
Best, Justin Grant
May be prudent to keep Etc/Factory and Etc/Unknown separate to guard against the possibility that a vendor does something funky with Factory (such as automatically modifying the distribution so that Factory is their local time). There's a small price to pay given the small sizes of the existing Factory and proposed Unknown binaries. @dashdashado
On 07/12/2024 02:19, Arthur Olson via tz wrote:
May be prudent to keep Etc/Factory and Etc/Unknown separate to guard against the possibility that a vendor does something funky with Factory (such as automatically modifying the distribution so that Factory is their local time). There's a small price to pay given the small sizes of the existing Factory and proposed Unknown binaries.
The two are different concepts and should not be treated as mirrors of one another. Not knowing which timezone data to use is what 'Unknown' means; that fact is in general outside the scope of the database, but wouldn't an entry that identifies it be prudent? Naturally it would have an offset of zero, so any time value would be handled as UTC.
Geographic databases that supply a timezone identifier may well return a NULL value if there is no matching entry, and other steps, again outside the scope of the tz database, may then be needed to provide a better offset than plain UTC, such as a simple 'solar' offset, which has nothing to do with 'Factory' and is application dependent.
A cross-project agreement on a standard way of handling an unknown tz identifier is also outside the scope, but isn't that why this thread has come into existence?
-- Lester Caine
As well as many distros building fat format (as third party readers break on slim) from rearguard with backward, backzone, and zone.tab for compatibility, and some add posixrules America/New_York, it would not surprise me if others, and commercial vendors set posixrules and/or Factory to some localized zone so users always have a valid default and never see -00. I could see this being a sound and convenient decision for orgs who provide localized installers, especially as most countries with their own languages also have only a single timezone, so Factory, /etc/localtime, and /etc/timezone can default to the same zone. If the other UTC (Unicode Technical Committee) or ICU would like Etc/Unknown to be instantiated as a timezone looking like the default Factory, and submit a patch, it should be included. I do not think any link between Factory and Unknown should ever be assumed.
-- Take care. Thanks, Brian Inglis Calgary, Alberta, Canada
La perfection est atteinte non pas lorsqu'il n'y a plus rien à ajouter mais lorsqu'il n'y a plus rien à retirer
Perfection is achieved not when there is no more to add but when there is no more to cut
-- Antoine de Saint-Exupéry
Let me fill in a bit of background.
For internationalization identifiers in general, including languages, scripts, countries, and so on, we need some identifier value that means "the value for that identifier is unknown". It is also often used in APIs for "the value supplied was invalid.". Some examples are in https://cldr-smoke.unicode.org/spec/main/ldml/tr35.html#Unknown_or_Invalid_I... .
(Note that in that table there is 'unk'. CLDR provides short, stable identifiers for IANA TZDB identifiers, based where possible on UN/LOCODEs. All of these are defined by a mapping to the regular long IANA TZDB identifiers (plus Etc/Unknown), as per https://github.com/unicode-org/cldr/blob/main/common/bcp47/timezone.xml.)
Around 2010, we added the "Etc/Unknown" in CLDR for use with the TZDB, to serve the purpose of a TZDB identifier for 'unknown'. The name was chosen so that it would be very unlikely to collide with any identifier that the TZDB would itself define in the future with a different meaning.
Much more recently (2023) an effort was started to support some legacy POSIX identifiers that had been deliberately omitted from CLDR. You can read more about that in https://unicode-org.atlassian.net/browse/CLDR-17111, although the reasoning for the choices is not documented there. As a part of that effort, the best mapping for Factory (based upon its usage) was concluded to be Etc/Unknown. That led to the issue being discussed (see https://issues.chromium.org/issues/381620359).
Justin's proposal for resolving the issue by formally adding Etc/Unknown to the TZDB would be very welcome. For the definition of Etc/Unknown in CLDR, we would then simply refer to the IANA TZDB. We could do this as early as the release of CLDR 47, due mid-March 2025.
On Dec 7, 2024, at 5:24 PM, Mark Davis Ⓤ via tz <tz@iana.org> wrote:
For internationalization identifiers in general, including languages, scripts, countries, and so on, we need some identifier value that means "the value for that identifier is unknown". It is also often used in APIs for "the value supplied was invalid.". Some examples are in https://cldr-smoke.unicode.org/spec/main/ldml/tr35.html#Unknown_or_Invalid_I....
So is there any specification for the behavior you'd get if, for example, LANG is set to "und_US"? Or is it "undefined behavior"?
Around 2010, we added the "Etc/Unknown" in CLDR for use with the TZDB, to serve the purpose of a TZDB identifier for 'unknown'. The name was chosen so that it would be very unlikely to collide with any identifier that the TZDB would itself define in the future with a different meaning.
If there's any specification for the LANG=und_US behavior, what should an equivalent specification for TZ=Etc/Unknown say? If not, presumably the behavior is undefined, so making it equivalent to UTC, or McMurdo Sound, or whatever would be OK.
The usage is more akin to:
    x = getScriptForLanguage("aay"); // an undefined language code
or
    x = getScriptForLanguage("$#!&("); // an ill-formed language code
    // afterwards, x == "Zzzz" (unknown script)
That is, the API is returning a value indicating that what you asked for doesn't have a well-defined answer.
On Dec 7, 2024, at 9:15 PM, Mark Davis Ⓤ <mark@unicode.org> wrote:
That is, the API is returning a value indicating that what you asked for doesn't have a well-defined answer.
Then, presumably, all that is needed is a guarantee from the tzdb maintainers that "Etc/Unknown" will never be assigned to a timezone.
Given that tzset() does not return a success/failure indication, and that the 2024 Single UNIX Specification's page for tzset() says nothing about the behavior if TZ isn't set to a valid value (for some definition of "valid"):
https://pubs.opengroup.org/onlinepubs/9799919799/functions/tzset.html
nor does the 2024 Single UNIX Specification page on environment variables specify the value of TZ in a fashion that permits all possible values to be determined to be valid or invalid:
https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap08.html
the behavior after calling tzset() if TZ is set to Etc/Unknown isn't specified anywhere, and it could be the same as the behavior if tzset() is called with TZ set to "This/Does/Not/Exist" on a system that just uses the tzdb files, so there doesn't appear to be any need to put Etc/Unknown into the database - a guarantee that Etc/Unknown will never be used for any timezone in the tzdb should suffice.
Yes, that would suffice. It would, however, be a bit cleaner (and easier for programmers to understand) if CLDR could just reference the TZDB for all identifiers.
On 2024-12-08 10:26, Mark Davis Ⓤ via tz wrote:
Yes, that would suffice. It would, however, be a bit cleaner (and easier for programmers to understand) if CLDR could just reference the TZDB for all identifiers.
It's cleaner in some ways but messier in others. UTS#35 says "unk" means "Unknown or Invalid Time Zone". But if we added Etc/Unknown to TZDB then "unk" would correspond to a known and valid setting TZ="Etc/Unknown", equivalent in behavior to TZDB's TZ="Factory" or POSIX.1-2017's TZ="<-00>0". That could well be confusing, and it would disagree with the longstanding behavior of TZ="Etc/Unknown" in some implementations.
JavaScript implementations conforming to UTS#35 need not worry about what TZDB does with "Etc/Unknown", as they can special-case that string and not inspect the TZDB data. So the question whether to add Etc/Unknown to TZDB is mostly about what non-JavaScript implementations should do with "Etc/Unknown". Many of these implementations, including tzcode, have a different mechanism for detecting whether a TZ string is known. In tzcode, for example, tzalloc("Etc/Unknown") returns a null pointer, following NetBSD's precedent.
If we added Etc/Unknown to TZDB, tzcode and similar implementations would become trickier to use. For example, a caller would need to know that tzalloc can sometimes return a non-null pointer even if the timezone is unknown, if tzalloc's argument happens to be "Etc/Unknown" or "/usr/share/zoneinfo/Etc/Unknown" or whatever. Or perhaps tzalloc would need to have "Etc/Unknown" hardwired into it, for compatibility with JavaScript - but older tzcode implementations won't have that hardwiring. Or perhaps we'd extend the TZif format to represent "unknown" timezones, complicating all TZif readers. Et cetera.
One more complication: CLDR currently <https://github.com/unicode-org/cldr-json/releases/tag/46.0.0> lists both "Etc/Unknown" and "Factory" as aliases for "unk". So it appears that currently CLDR wants Etc/Unknown and Factory to be equivalent, which contradicts Arthur's suggestion (seconded by Brian) to keep the names distinct as they have different motivations.
All in all it sounds better to let sleeping dogs lie and document the guarantee that Guy suggested, without changing TZDB's data. As you write, that would suffice. It means that JavaScript can return the equivalent of "Etc/Unknown" in places where tzcode returns a null pointer - to some extent JS simply uses a string where C uses a null pointer to represent failure. Proposed patch attached and installed into the TZDB development repository. Assuming this proposal is acceptable, it might be better for a future version of CLDR to not say that Etc/Unknown and Factory are aliases, as the names have differing intents and behaviors in non-JavaScript implementations.
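The contrast drawn above, a sentinel string in one API style and a null pointer in the other, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `c_style_lookup`, `js_style_lookup` and the `INSTALLED` set are hypothetical names standing in for a tzalloc-like call, an ICU-like call and the set of installed TZif files; none of them is a real tzcode or ICU API.

```python
from typing import Iterable, Optional

# Identifier that UTS #35 reserves for "unknown or invalid time zone".
UNKNOWN_ID = "Etc/Unknown"


def c_style_lookup(name: str, installed: Iterable[str]) -> Optional[str]:
    """Null-pointer convention: return None when the zone is not installed."""
    return name if name in set(installed) else None


def js_style_lookup(name: str, installed: Iterable[str]) -> str:
    """Sentinel-string convention: return "Etc/Unknown" instead of None."""
    found = c_style_lookup(name, installed)
    return UNKNOWN_ID if found is None else found


# Hypothetical stand-in for the zones present on a system.
# "Etc/Unknown" is deliberately absent, matching the proposed guarantee.
INSTALLED = {"America/New_York", "Etc/UTC", "Factory"}

print(c_style_lookup("Etc/Unknown", INSTALLED))
print(js_style_lookup("No/Such/Zone", INSTALLED))
print(js_style_lookup("America/New_York", INSTALLED))
```

As long as "Etc/Unknown" is never installed as a real zone, the two conventions stay in step: every name that maps to None in the first style maps to the sentinel in the second.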
On Dec 8, 2024, at 1:04 PM, Paul Eggert <eggert@cs.ucla.edu> wrote:
Many of these implementations, including tzcode, have a different mechanism for detecting whether a TZ string is known. In tzcode, for example, tzalloc("Etc/Unknown") returns a null pointer, following NetBSD's precedent.
Unfortunately, tzset() doesn't return a value, so it can't indicate that TZ is set to a value not in the tzdb. (And, unfortunately, there's also no way to indicate "well, we *do* have a file for that tzid, but it's not a valid file" - e.g., tzh_timecnt specifies more transition times than are in the file - in a way that it can be distinguished from "we don't have a file for that tzid".)
All in all it sounds better to let sleeping dogs lie and document the guarantee that Guy suggested, without changing TZDB's data. As you write, that would suffice. It means that JavaScript can return the equivalent of "Etc/Unknown" in places where tzcode returns a null pointer - to some extent JS simply uses a string where C uses a null pointer to represent failure. Proposed patch attached and installed into the TZDB development repository.
I agree. For "It would, however, be a bit cleaner (and easier for programmers to understand) if CLDR could just reference the TZDB for all identifiers.", does "reference the TZDB" mean "refer to a name that corresponds to a file" or does it just mean "refer to something that the tzdb has assigned"? If the latter, this suffices. (What happens if I set LANG to unk_US? Is there a guarantee that there will be information for that locale, does setlocale() return NULL, or what? This strikes me as an analogous case.)
On 2024-12-08 13:21, Guy Harris wrote:
(And, unfortunately, there's also no way to indicate "well, we *do* have a file for that tzid, but it's not a valid file" - e.g., tzh_timecnt specifies more transition times than are in the file - in a way that it can be distinguished from "we don't have a file for that tzid".)
Although POSIX provides no way to do that, tzcode does because tzalloc sets errno on failure and zdump uses strerror to tell the user. For example:
    $ zdump No/Such/Zone
    zdump: unknown timezone 'No/Such/Zone': No such file or directory
    $ zdump zone.tab
    zdump: unknown timezone 'zone.tab': Invalid argument
    $ zdump .
    zdump: unknown timezone '.': Is a directory
(For extra credit, try 'zdump /etc/tty'. Maybe we should disallow that sort of thing....)
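The errno-plus-strerror pattern described above can be mimicked in Python, where a failed open raises OSError carrying the same error number. This is a rough sketch rather than what tzcode does internally: `open_errno` is a hypothetical helper, and it only covers failures of the open itself (a missing file, a directory), not the content validation that produces "Invalid argument" for zone.tab.

```python
import errno
import os
import tempfile


def open_errno(path: str) -> int:
    """Try to open path for reading; return 0 on success, else the errno value."""
    try:
        with open(path, "rb"):
            return 0
    except OSError as exc:
        # exc.errno plays the role that errno plays after a failed C call.
        return exc.errno or errno.EIO


def report(path: str) -> str:
    """Build a zdump-like diagnostic, or an empty string if the open succeeds."""
    code = open_errno(path)
    return "" if code == 0 else f"unknown timezone '{path}': {os.strerror(code)}"


with tempfile.TemporaryDirectory() as tmp:
    missing = os.path.join(tmp, "No", "Such", "Zone")
    print(report(missing))  # the message text is platform dependent
```

The point of keeping the numeric code separate from the message is that a caller can tell "no such file" apart from other failures without parsing text.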
On 2024-12-08 14:41, Paul Eggert via tz wrote:
On 2024-12-08 13:21, Guy Harris wrote:
(And, unfortunately, there's also no way to indicate "well, we *do* have a file for that tzid, but it's not a valid file" - e.g., tzh_timecnt specifies more transition times than are in the file - in a way that it can be distinguished from "we don't have a file for that tzid".)
Although POSIX provides no way to do that, tzcode does because tzalloc sets errno on failure and zdump uses strerror to tell the user. For example:
    $ zdump No/Such/Zone
    zdump: unknown timezone 'No/Such/Zone': No such file or directory
    $ zdump zone.tab
    zdump: unknown timezone 'zone.tab': Invalid argument
    $ zdump .
    zdump: unknown timezone '.': Is a directory
(For extra credit, try 'zdump /etc/tty'. Maybe we should disallow that sort of thing....)
Disallowing these would also likely disallow variations on:
    $ zdump /usr/share/zoneinfo/.../...
    $ zdump <(...)
unless you just check the magic cookie?
-- Take care. Thanks, Brian Inglis Calgary, Alberta, Canada
A data point, obtained from a build of the time zone package as distributed:
    Script started on 2024-12-08 22:23:57+00:00 [TERM="xterm-256color" TTY="/dev/tty1" COLUMNS="82" LINES="30"]
    $ export TZ=America/New_York
    $ ./date
    Sun Dec 8 17:24:32 EST 2024
    $ export TZ=Etc/Unknown
    $ ./date
    Sun Dec 8 22:24:55 UTC 2024
    $ exit
    exit
    Script done on 2024-12-08 22:24:57+00:00 [COMMAND_EXIT_CODE="0"]
I could make an argument for: with TZ set to Etc/Unknown (or any other unfathomable value), some obnoxiously wrong results are best seen; a time zone abbreviation of ERR has been suggested as a possibility in the past.
Side note, brought to mind by /etc/tty and /dev/tty: if TZ is "-", should the time zone information be read from the standard input? :-)
@dashdashado
On 2024-12-08 14:42, Arthur Olson via tz wrote:
I could make an argument for: with TZ set to Etc/Unknown (or any other unfathomable value), some obnoxiously wrong results are best seen; a time zone abbreviation of ERR has been suggested as a possibility in the past.
In other places TZDB currently uses the abbreviation "-00" to indicate invalid TZ settings. However, it neglects to use "-00" in the situation you mention. Fixed in the attached proposed patch.
Side note, brought to mind by /etc/tty and /dev/tty: if TZ is "-", should the time zone information be read from the standard input? :-)
This is related to Brian's comment about "zdump <(...)", where further thought is needed. I suspect we'll need to distinguish 'zdump "-"' (where reading from stdin could be useful) from TZ="-" (where we surely don't want a program to read from standard input merely because it wants the time of day and someone used a squirrelly TZ string).
On 2024-12-08 23:52, Paul Eggert wrote:
I suspect we'll need to distinguish 'zdump "-"' (where reading from stdin could be useful) from TZ="-" (where we surely don't want a program to read from standard input merely because it wants the time of day and someone used a squirrelly TZ string).
I did that by installing the attached proposed further patch.
On 2024-12-08 14:06, Brian Inglis via tz wrote:
Disallowing these would also likely disallow variations on:
$ zdump /usr/share/zoneinfo/.../...
$ zdump <(...)
unless you just check the magic cookie?
It shouldn't disallow the first variation, assuming the attached proposed patch which I've installed into the development repository. That's because /usr/share/zoneinfo/whatever is a regular file. Although it disallows the second variation, that's also true for zdump when built with glibc, which uses lseek to move around in the TZif file. The second variation causes such a zdump to read from a pipe where the lseek fails which means glibc tzset fails. Perhaps we can change zdump to allow that sort of thing, but this will need further thought.
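The two checks discussed here, "is it a regular file" and Brian's "check the magic cookie", can be combined in a short Python sketch. It is not the patch installed in tzcode, which is not shown in this thread; `looks_like_tzif` is a hypothetical helper, and the only format fact it relies on is that a TZif file begins with the four bytes "TZif".

```python
import os
import stat
import tempfile

TZIF_MAGIC = b"TZif"  # the first four bytes of every TZif file


def looks_like_tzif(path: str) -> bool:
    """Return True if path is a regular file that starts with the TZif magic."""
    try:
        if not stat.S_ISREG(os.stat(path).st_mode):
            return False  # directories, devices, pipes and so on are rejected
        with open(path, "rb") as f:
            return f.read(len(TZIF_MAGIC)) == TZIF_MAGIC
    except OSError:
        return False  # missing or unreadable paths are rejected too


with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample")
    with open(sample, "wb") as f:
        f.write(TZIF_MAGIC + b"2" + bytes(15))  # magic, version byte, reserved bytes
    print(looks_like_tzif(sample), looks_like_tzif(tmp))
```

Checking the file type first means a path such as a terminal device is rejected without being read, while the magic check alone would still have to read from it. Dropping the regular-file test and keeping only the magic check is what would let pipes such as `<(...)` through.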
On 2024-12-08 23:45, Paul Eggert wrote:
On 2024-12-08 14:06, Brian Inglis via tz wrote:
Disallowing these would also likely disallow variations on:
$ zdump /usr/share/zoneinfo/.../...
$ zdump <(...)
... Although it disallows the second variation, that's also true for zdump when built with glibc.... Perhaps we can change zdump to allow that sort of thing, but this will need further thought.
I did that for zdump when built with tzcode by installing the attached proposed patch.
I recently discovered that since 2006, the ICU file icu4c/source/tools/tzcode/icuzones has had a line "Zone Etc/Unknown 0 - Unknown"[1]. Any POSIX-like system that feeds this line into 'zic' might run afoul of the recently-proposed TZDB guideline[2] that Etc/Unknown is reserved and is not used by TZDB. This is because the line will cause Etc/Unknown to be a valid timezone with UT offset zero and abbreviation "Unknown", so that, for example, tzalloc("Etc/Unknown") will succeed instead of failing. I don't think this is a practical problem, as I don't know of any platform that feeds that icuzones line into zic and thus makes "Etc/Unknown" a valid timezone. However, it might be helpful to add a comment to icuzones about the situation. Proposed ICU patch attached. I am cc'ing this to Markus Scherer (who added the line) and to Yoshito Umaoka (who most recently updated icuzones), to give them a heads-up on the situation. [1]: https://github.com/unicode-org/icu/blob/main/icu4c/source/tools/tzcode/icuzo... [2]: https://lists.iana.org/hyperkitty/list/tz@iana.org/message/5QWH2DCQ6SLY6JFJT...
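For application code, one defensive option is to treat Etc/Unknown as reserved regardless of what happens to be installed. The sketch below uses Python's standard zoneinfo module. The constant and the helper load_zone are hypothetical names chosen for illustration, and the policy of returning None is an assumption, not something TZDB or ICU prescribes.

```python
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError

# Reserved by the proposed TZDB guideline; treated here as "no zone".
RESERVED_UNKNOWN = "Etc/Unknown"


def load_zone(name):
    """Return a ZoneInfo for name, or None if it is reserved or unavailable.

    Hypothetical helper: Etc/Unknown is rejected up front, so a stray
    compiled Etc/Unknown file (for example from icuzones fed to zic)
    would not make it look like a valid timezone.
    """
    if name == RESERVED_UNKNOWN:
        return None
    try:
        return ZoneInfo(name)
    except (ZoneInfoNotFoundError, ValueError, OSError):
        # Not found, malformed key, or an unreadable path.
        return None


if __name__ == "__main__":
    print(load_zone("Etc/Unknown"))
    print(load_zone("This/Does/Not/Exist"))
```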
Paul Eggert via tz wrote in
 <bfa776a5-be6d-49f5-add4-63b672dd22b9@cs.ucla.edu>:
 |I recently discovered that since 2006, the ICU file
 |icu4c/source/tools/tzcode/icuzones has had a line "Zone Etc/Unknown 0 -
 |Unknown"[1]. Any POSIX-like system that feeds this line into 'zic' might
 |run afoul of the recently-proposed TZDB guideline[2] that Etc/Unknown is
 |reserved and is not used by TZDB. This is because the line will cause
 |Etc/Unknown to be a valid timezone with UT offset zero and abbreviation
 |"Unknown", so that, for example, tzalloc("Etc/Unknown") will succeed
 |instead of failing.
 |
 |I don't think this is a practical problem, as I don't know of any
 |platform that feeds that icuzones line into zic and thus makes
 |"Etc/Unknown" a valid timezone. However, it might be helpful to add a
 |comment to icuzones about the situation. Proposed ICU patch attached.
 |
 |I am cc'ing this to Markus Scherer (who added the line) and to Yoshito

You actually had forgotten Markus Scherer.

 |Umaoka (who most recently updated icuzones), to give them a heads-up on
 |the situation.
 |
 |[1]:
 |https://github.com/unicode-org/icu/blob/main/icu4c/source/tools/tzcode/i\
 |cuzones#L17
 |[2]:
 |https://lists.iana.org/hyperkitty/list/tz@iana.org/message/5QWH2DCQ6SLY6\
 |JFJTDMJQC2LYHTJKB7O/
 |From 69218353e5e544348c8c0b138e6ab9c2e63c40aa Mon Sep 17 00:00:00 2001
 |From: Paul Eggert <eggert@cs.ucla.edu>
 |Date: Tue, 10 Dec 2024 10:53:14 -0800
 |Subject: [PATCH] Comment that icuzones should not be fed to zic
 |
 |---
 | icu4c/source/tools/tzcode/icuzones | 4 ++++
 | 1 file changed, 4 insertions(+)
 |
 |diff --git a/icu4c/source/tools/tzcode/icuzones b/icu4c/source/tools/tzc\
 |ode/icuzones
 |index 940b0557acc..2a99c8645b9 100644
 |--- a/icu4c/source/tools/tzcode/icuzones
 |+++ b/icu4c/source/tools/tzcode/icuzones
 |@@ -11,6 +11,10 @@
 | # that are in CLDR and also include legacy ICU time zones originally
 | # in tz.alias for rataining backward compatibility.
 |
 |+# This file is not intended for use by zic, the TZDB timezone compiler.
 |+# Feeding it to zic would define Etc/Unknown as a known, valid timezone,
 |+# whereas Etc/Unknown should stand for an unknown or invalid timezone.
 |+
 | # Add Etc/Unknown, defined by CLDR. Give it Etc/GMT behavior.
 |
 | # Zone NAME GMTOFF RULES FORMAT
 |--
 |2.47.1
 |
 --End of <bfa776a5-be6d-49f5-add4-63b672dd22b9@cs.ucla.edu>

--steffen
|
|Der Kragenbaer, The moon bear,
|der holt sich munter he cheerfully and one by one
|einen nach dem anderen runter wa.ks himself off
|(By Robert Gernhardt)
|
|And in Fall, feel "The Dropbear Bard"s ball(s).
|
|The banded bear
|without a care,
|Banged on himself for e'er and e'er
|
|Farewell, dear collar bear
Steffen Nurpmeso via tz wrote in
 <20241210193131.fFgh9os9@steffen%sdaoden.eu>:
 |Paul Eggert via tz wrote in
 | <bfa776a5-be6d-49f5-add4-63b672dd22b9@cs.ucla.edu>:
 ||I recently discovered that since 2006, the ICU file
 ||icu4c/source/tools/tzcode/icuzones has had a line "Zone Etc/Unknown 0 -
 ||Unknown"[1]. Any POSIX-like system that feeds this line into 'zic' might
 ...
 ||I am cc'ing this to Markus Scherer (who added the line) and to Yoshito
 |
 |You actually had forgotten Markus Scherer.

Well it seems you had not, but the mailing-list software filtered him out, likely because he is also a subscriber here. One should have known that better. *Of course*. But still..

(Once again bad that Mail-Followup-To: was never standardized, and then he likely configured to not receive duplicates, and the ML thus not only removed him from the RFC 5321 envelope / SMTP level, but also from the RFC 5322 email headers, so he will not be addressed in replies. Isn't it a pity that on the one hand the X via Y notation is used for the one thing, but not for the other? Like that his name would have been visible *at least* when the ML would have created the message; it likely would be removed in replies then later. Fwiw.)

--Steffen Schönbein
* Guy Harris via tz:
On Dec 7, 2024, at 9:15 PM, Mark Davis Ⓤ <mark@unicode.org> wrote:
That is, the API is returning a value indicating that what you asked for doesn't have a well-defined answer.
Then, presumably, all that is needed is a guarantee from the tzdb maintainers that "Etc/Unknown" will never be assigned to a timezone.
Given that tzset() does not return a success/failure indication, and that the 2024 Single UNIX Specification's page for tzset() says nothing about the behavior if TZ isn't set to a valid value (for some definition of "valid"):
https://pubs.opengroup.org/onlinepubs/9799919799/functions/tzset.html
nor does the 2024 Single UNIX Specification page on environment variables specify the value of TZ in a fashion that permits all possible values to be determined to be valid or invalid:
https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap08.html
the behavior after calling tzset() if TZ is set to Etc/Unknown isn't specified anywhere, and it could be the same as the behavior if tzset() is called with TZ set to "This/Does/Not/Exist" on a system that just uses the tzdb files, so there doesn't appear to be any need to put Etc/Unknown into the database - a guarantee that Etc/Unknown will never be used for any timezone in the tzdb should suffice.
It's also not possible to detect on some POSIX-like systems (that use the IANA database) whether the system administrator has used a specific IANA identifier to configure the system. Applications only see the data blob describing the time zone behavior and (sometimes) abbreviations, and that doesn't include the identifier. Thanks, Florian
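To make the "data blob" point concrete: a version 2 or later TZif file ends with a footer holding a POSIX-style TZ string, but nothing in the file records the identifier it was installed under. The following Python sketch reads only that footer. It is a simplified illustration, not a full TZif parser, and the function name tzif_footer is hypothetical.

```python
def tzif_footer(data):
    """Return the footer TZ string of a TZif v2+ blob, or None if absent.

    Simplified sketch: it checks the magic and version byte, then takes
    the text between the last two newlines. It does not parse the
    transition tables.
    """
    if data[:4] != b"TZif":
        raise ValueError("not a TZif file")
    # Version 1 files (version byte NUL) have no footer.
    if data[4:5] == b"\x00" or not data.endswith(b"\n"):
        return None
    start = data.rfind(b"\n", 0, len(data) - 1)
    if start < 0:
        return None
    return data[start + 1:-1].decode("ascii")


if __name__ == "__main__":
    try:
        with open("/etc/localtime", "rb") as f:
            # Shows the rule string only; the zone's name is not in the file.
            print(tzif_footer(f.read()))
    except OSError as e:
        print("could not read /etc/localtime:", e)
```

The footer may tell a program how future timestamps behave, but two differently named zones can share it, so it cannot stand in for the identifier.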
On 2024-12-10 03:31, Florian Weimer via tz wrote:
* Guy Harris via tz:
On Dec 7, 2024, at 9:15 PM, Mark Davis Ⓤ <mark@unicode.org> wrote:
That is, the API is returning a value indicating that what you asked for doesn't have a well-defined answer.
Then, presumably, all that is needed is a guarantee from the tzdb maintainers that "Etc/Unknown" will never be assigned to a timezone.
Given that tzset() does not return a success/failure indication, and that the 2024 Single UNIX Specification's page for tzset() says nothing about the behavior if TZ isn't set to a valid value (for some definition of "valid"):
https://pubs.opengroup.org/onlinepubs/9799919799/functions/tzset.html
nor does the 2024 Single UNIX Specification page on environment variables specify the value of TZ in a fashion that permits all possible values to be determined to be valid or invalid:
https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap08.html
the behavior after calling tzset() if TZ is set to Etc/Unknown isn't specified anywhere, and it could be the same as the behavior if tzset() is called with TZ set to "This/Does/Not/Exist" on a system that just uses the tzdb files, so there doesn't appear to be any need to put Etc/Unknown into the database - a guarantee that Etc/Unknown will never be used for any timezone in the tzdb should suffice.
It's also not possible to detect on some POSIX-like systems (that use the IANA database) whether the system administrator has used a specific IANA identifier to configure the system. Applications only see the data blob describing the time zone behavior and (sometimes) abbreviations, and that doesn't include the identifier.
You should be able to figure it out whether using symlinks, hardlinks, or check sums of some kind:
$ head -v /etc/timezone; \
  tail -vn1 /etc/localtime; echo; \
  llgo /etc/localtime; echo; \
  find /usr/share/zoneinfo/ -inum `stat -L -c%i /etc/localtime`; echo; \
  cksum /etc/localtime; echo; \
  find /usr/share/zoneinfo/ -size `cksum /etc/localtime | cut -d' ' -f2`c \
  | xargs cksum
==> /etc/timezone <==
America/Edmonton

==> /etc/localtime <==
MST7MDT,M3.2.0,M11.1.0

lrwxrwxrwx 1 31 May 25 2020 /etc/localtime -> ../usr/share/zoneinfo/localtime

/usr/share/zoneinfo/America/Edmonton
/usr/share/zoneinfo/Canada/Mountain
/usr/share/zoneinfo/Etc/localtime
/usr/share/zoneinfo/localtime
/usr/share/zoneinfo/posix/America/Edmonton
/usr/share/zoneinfo/posix/Canada/Mountain

1365654935 2332 /etc/localtime

1365654935 2332 /usr/share/zoneinfo/America/Edmonton
1365654935 2332 /usr/share/zoneinfo/Canada/Mountain
1365654935 2332 /usr/share/zoneinfo/Etc/localtime
1365654935 2332 /usr/share/zoneinfo/localtime
1365654935 2332 /usr/share/zoneinfo/posix/America/Edmonton
1365654935 2332 /usr/share/zoneinfo/posix/Canada/Mountain

--
Take care. Thanks, Brian Inglis
Calgary, Alberta, Canada

La perfection est atteinte                    Perfection is achieved
non pas lorsqu'il n'y a plus rien à ajouter   not when there is no more to add
mais lorsqu'il n'y a plus rien à retirer      but when there is no more to cut
                                -- Antoine de Saint-Exupéry
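The checksum approach in the shell session above can also be written as a small Python sketch. It compares file contents by SHA-256 instead of cksum, which is a substitution made here for convenience. The function name zones_matching and its parameters are hypothetical.

```python
import hashlib
import os


def zones_matching(localtime_path, zoneinfo_dir):
    """Return sorted names under zoneinfo_dir whose bytes equal localtime_path.

    Content-based matching only: it can return several names when
    different identifiers share identical data.
    """
    with open(localtime_path, "rb") as f:
        want = hashlib.sha256(f.read()).digest()
    matches = []
    for root, _dirs, files in os.walk(zoneinfo_dir):
        for fn in files:
            path = os.path.join(root, fn)
            try:
                with open(path, "rb") as f:
                    same = hashlib.sha256(f.read()).digest() == want
            except OSError:
                continue  # unreadable entry; skip it
            if same:
                matches.append(os.path.relpath(path, zoneinfo_dir))
    return sorted(matches)


if __name__ == "__main__":
    print(zones_matching("/etc/localtime", "/usr/share/zoneinfo"))
```

As in the session output, the result is a set of candidate names rather than a single answer, and it is empty on systems that ship only /etc/localtime.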
* Brian Inglis via tz:
It's also not possible to detect on some POSIX-like systems (that use the IANA database) whether the system administrator has used a specific IANA identifier to configure the system. Applications only see the data blob describing the time zone behavior and (sometimes) abbreviations, and that doesn't include the identifier.
You should be able to figure it out whether using symlinks, hardlinks, or check sums of some kind:
$ head -v /etc/timezone; \
  tail -vn1 /etc/localtime; echo; \
  llgo /etc/localtime; echo; \
  find /usr/share/zoneinfo/ -inum `stat -L -c%i /etc/localtime`; echo; \
  cksum /etc/localtime; echo; \
  find /usr/share/zoneinfo/ -size `cksum /etc/localtime | cut -d' ' -f2`c \
  | xargs cksum
==> /etc/timezone <==
America/Edmonton

==> /etc/localtime <==
MST7MDT,M3.2.0,M11.1.0
Some systems (especially containers) have just /etc/localtime, and do not provide all those unused files under /usr/share/zoneinfo. Content-based matching fails for links and files that just happen to be the same (which would apply to Etc/Unknown, I expect). Thanks, Florian
On 2024-12-11 05:09, Florian Weimer wrote:
* Brian Inglis via tz:
It's also not possible to detect on some POSIX-like systems (that use the IANA database) whether the system administrator has used a specific IANA identifier to configure the system. Applications only see the data blob describing the time zone behavior and (sometimes) abbreviations, and that doesn't include the identifier.
You should be able to figure it out whether using symlinks, hardlinks, or check sums of some kind:
$ head -v /etc/timezone; \
  tail -vn1 /etc/localtime; echo; \
  llgo /etc/localtime; echo; \
  find /usr/share/zoneinfo/ -inum `stat -L -c%i /etc/localtime`; echo; \
  cksum /etc/localtime; echo; \
  find /usr/share/zoneinfo/ -size `cksum /etc/localtime | cut -d' ' -f2`c \
  | xargs cksum
==> /etc/timezone <==
America/Edmonton

==> /etc/localtime <==
MST7MDT,M3.2.0,M11.1.0
Some systems (especially containers) have just /etc/localtime, and do not provide all those unused files under /usr/share/zoneinfo. Content-based matching fails for links and files that just happen to be the same (which would apply to Etc/Unknown, I expect).
So far there don't appear to be any: same content == same timezone, e.g. Navajo == Mountain == US/Mountain == America/Shiprock == America/Denver; localtime == Canada/Mountain == America/Edmonton; et al. Including posix and right, 1582 files (5.9MB) become 992 links => 590 inodes (3.6MB).
--
Take care. Thanks, Brian Inglis
Calgary, Alberta, Canada
On 2024-12-11 08:43, Brian Inglis via tz wrote:
So far there don't appear to be any - same content == same timezone
Indeed, Makefile's now.ck verifies that every zonenow.tab entry has distinct timestamp behavior. Although not quite the same as checking that every distinct file in /usr/share/zoneinfo has distinct contents, the overall intent is the same: avoiding duplicate Zones.
On 2024-12-06 18:10, Justin Grant via tz wrote:
Hi TZ friends - Should the time zone identifier "Etc/Unknown" (standardized in Unicode Technical Standard 35 and used by the ICU <https://icu.unicode.org/> library that implements time zone support in all major web browsers, in Java, and in other platforms) be added to the IANA Time Zone Database?
If they chose "Etc/Unknown" because they wanted something that is not a defined TZDB name, we shouldn't define it in TZDB, because then we'll screw up whatever thing they had in mind. On the other hand, if they want their internal name "unk" to correspond to a TZDB name, wouldn't it be better to change UTS#35's two instances of the string "Etc/Unknown" to "Factory"? That way, UTS#35 would work with both current and previous TZDB releases. Either way, it'd be helpful to get clarification from the maintainers of TR35 as to what they want for "unk". I created an issue CLDR-18167[1] asking about this. It's currently assigned to Mark Davis, the editor for UTS#35. [1]: https://unicode-org.atlassian.net/browse/CLDR-18167
Thanks Paul, it's a good idea to loop in Mark Davis. He may know more about the history of Etc/Unknown. Mark also has some context about this topic because he was part of the group that helped to ship CLDR-17111 <https://unicode-org.atlassian.net/browse/CLDR-17111> which was the CLDR change that broke Chrome and prompted this thread. But I haven't spoken to Mark about this proposed change to TZDB so it'd be good to close the loop.
wouldn't it be better to change UTS#35's two instances of the string "Etc/Unknown" to "Factory"? That way, UTS#35 would work with both current and previous TZDB releases.
IMO it would have been ideal if UTS 35 and ICU had adopted Factory from the outset. But ICU changing to Factory now after many years of Etc/Unknown would have significant compatibility issues. Many ICU callers expect "Etc/Unknown" to be returned by ICU for time zones that are not in CLDR. Callers could have used ICU's getUnknown() <https://unicode-org.github.io/icu-docs/apidoc/dev/icu4c/classicu_1_1TimeZone...> method to avoid hardcoding, but judging by the 12.5K GitHub results for Etc/Unknown <https://github.com/search?q=%22Etc%2FUnknown%22&type=code>, not every developer got the memo. :-(
ICU generally avoids breaking changes like this. Given that ICU is UTS 35's largest implementation, it's unlikely that the Unicode folks would change the standard to something that ICU wouldn't be able to adopt.
Speaking of standards, for the last few years I've been helping to clarify parts of the ECMAScript spec that deal with time zones. Much of this new spec text <https://402.ecma-international.org/2.0/index.html#sec-canonicalizetimezonena...> is documenting how CLDR uses and adapts TZDB data. I've been meaning to post here on the TZ list to share this link in case folks are curious how JS engines use TZDB. Also, thanks so much for maintaining TZDB! JS developers are grateful.
One unresolved issue in the ECMAScript spec is what to do about Etc/Unknown, because there's text in ECMA-402 <https://tc39.es/ecma402/#sec-use-of-iana-time-zone-database> that all browsers violate:
No String may be an available named time zone identifier unless it is a Zone name or a Link name in the IANA Time Zone Database
Although that language is from a PR of mine, I'm only describing existing behavior that has been in ECMAScript for ten years <https://402.ecma-international.org/2.0/index.html#sec-canonicalizetimezonena...>.
To finally get browsers into compliance with the spec, I was planning to document Etc/Unknown as a special case in ECMA-402, but before proposing that change I wanted to check if there was interest in adding Etc/Unknown to TZDB instead.
Best, Justin
On Fri, Dec 6, 2024 at 10:51 PM Paul Eggert <eggert@cs.ucla.edu> wrote:
On 2024-12-06 18:10, Justin Grant via tz wrote:
Hi TZ friends - Should the time zone identifier "Etc/Unknown" (standardized in Unicode Technical Standard 35 and used by the ICU <https://icu.unicode.org/> library that implements time zone support in all major web browsers, in Java, and in other platforms) be added to the IANA Time Zone Database?
If they chose "Etc/Unknown" because they wanted something that is not a defined TZDB name, we shouldn't define it in TZDB, because then we'll screw up whatever thing they had in mind.
On the other hand, if they want their internal name "unk" to correspond to a TZDB name, wouldn't it be better to change UTS#35's two instances of the string "Etc/Unknown" to "Factory"? That way, UTS#35 would work with both current and previous TZDB releases.
Either way, it'd be helpful to get clarification from the maintainers of TR35 as to what they want for "unk". I created an issue CLDR-18167[1] asking about this. It's currently assigned to Mark Davis, the editor for UTS#35.
participants (10)
- Arthur Olson
- brian.inglis@systematicsw.ab.ca
- Florian Weimer
- Guy Harris
- Justin Grant
- Lester Caine
- Mark Davis Ⓤ
- Paul Eggert
- Steffen Nurpmeso
- Steffen Schönbein