On 2024-12-08 10:26, Mark Davis Ⓤ via tz wrote:
> Yes, that would suffice. It would, however, be a bit cleaner (and easier for programmers to understand) if CLDR could just reference the TZDB for all identifiers.
It's cleaner in some ways but messier in others. UTS #35 says "unk" means "Unknown or Invalid Time Zone". But if we added Etc/Unknown to TZDB, then "unk" would correspond to a known and valid setting TZ="Etc/Unknown", equivalent in behavior to TZDB's TZ="Factory" or POSIX.1-2017's TZ="<-00>0". That could well be confusing, and it would disagree with the longstanding behavior of TZ="Etc/Unknown" in some implementations.
JavaScript implementations conforming to UTS #35 need not worry about what TZDB does with "Etc/Unknown", as they can special-case that string and not inspect the TZDB data. So the question whether to add Etc/Unknown to TZDB is mostly about what non-JavaScript implementations should do with "Etc/Unknown".
Many of these implementations, including tzcode, have a different mechanism for detecting whether a TZ string is known. In tzcode, for example, tzalloc("Etc/Unknown") returns a null pointer, following NetBSD's precedent. If we added Etc/Unknown to TZDB, tzcode and similar implementations would become trickier to use. For example, a caller would need to know that tzalloc can sometimes return a non-null pointer even if the timezone is unknown, if tzalloc's argument happens to be "Etc/Unknown" or "/usr/share/zoneinfo/Etc/Unknown" or whatever. Or perhaps tzalloc would need to have "Etc/Unknown" hardwired into it for compatibility with JavaScript, but older tzcode implementations won't have that hardwiring. Or perhaps we'd extend the TZif format to represent "unknown" timezones, complicating all TZif readers. Et cetera.
One more complication: CLDR currently <https://github.com/unicode-org/cldr-json/releases/tag/46.0.0> lists both "Etc/Unknown" and "Factory" as aliases for "unk". So it appears that CLDR currently wants Etc/Unknown and Factory to be equivalent, which contradicts Arthur's suggestion (seconded by Brian) to keep the names distinct because they have different motivations.
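For concreteness, here is a minimal C sketch of the special case a tzalloc caller might have to add if TZDB did ship an Etc/Unknown zone, since tzalloc would then return a non-null pointer for it. The helper names_unknown_zone is hypothetical (it is not part of tzcode), and it assumes only the spellings mentioned above: the bare identifier, an optional leading ':' (which POSIX reserves for implementation-defined TZ values), and a path ending in "/Etc/Unknown".

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: the extra check a tzalloc caller would need if TZDB
   shipped an Etc/Unknown zone, because tzalloc would then succeed for it.
   Returns true for the bare identifier (optionally preceded by ':') or for
   any path that ends in "/Etc/Unknown". */
static bool names_unknown_zone(const char *tz)
{
    static const char id[] = "Etc/Unknown";
    size_t idlen = sizeof id - 1;
    size_t len;

    /* POSIX says a TZ value starting with ':' is implementation-defined;
       tzcode treats the rest of the string as a file name. */
    if (*tz == ':')
        tz++;

    if (strcmp(tz, id) == 0)
        return true;

    /* A path such as "/usr/share/zoneinfo/Etc/Unknown": require a '/'
       just before the identifier so that "MyEtc/Unknown" does not match. */
    len = strlen(tz);
    return len > idlen
           && tz[len - idlen - 1] == '/'
           && strcmp(tz + len - idlen, id) == 0;
}
```

Even this check misses spellings such as a symbolic link to the Etc/Unknown file or a path containing "..", which illustrates why hardwiring the name into callers is fragile.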
All in all, it sounds better to let sleeping dogs lie and document the guarantee that Guy suggested, without changing TZDB's data. As you write, that would suffice. It means that JavaScript can return the equivalent of "Etc/Unknown" in places where tzcode returns a null pointer; to some extent, JS simply uses a string where C uses a null pointer to represent failure. Proposed patch attached and installed into the TZDB development repository. Assuming this proposal is acceptable, it might be better for a future version of CLDR to not say that Etc/Unknown and Factory are aliases, as the names have differing intents and behaviors in non-JavaScript implementations.
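The string-versus-null-pointer correspondence can be sketched in C as follows. This is an illustration under stated assumptions, not tzcode's code: open_zone is a hypothetical stand-in for tzalloc that consults a tiny hard-coded table instead of reading TZif files, and zone_id_or_unknown is a hypothetical wrapper that reports failure the string-based way, as "Etc/Unknown".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical zone handle; a real one would hold parsed TZif data. */
struct zone { const char *name; };

/* Illustrative table only.  "Factory" is listed because TZDB ships it as an
   ordinary zone, so tzalloc("Factory") succeeds; Etc/Unknown is absent. */
static const struct zone known_zones[] = {
    { "America/New_York" }, { "Europe/Paris" }, { "Factory" },
};

/* Hypothetical stand-in for tzalloc(): C convention, failure is a null
   pointer. */
static const struct zone *open_zone(const char *name)
{
    for (size_t i = 0; i < sizeof known_zones / sizeof known_zones[0]; i++)
        if (strcmp(known_zones[i].name, name) == 0)
            return &known_zones[i];
    return NULL;
}

/* String convention: failure is reported as the identifier "Etc/Unknown",
   so the caller always gets a string back instead of a null pointer. */
static const char *zone_id_or_unknown(const char *name)
{
    const struct zone *z = open_zone(name);
    return z ? z->name : "Etc/Unknown";
}
```

In this sketch "Factory" maps to itself rather than to "Etc/Unknown", which matches the distinction between the two names drawn above.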