CLDR+Java cannot handle Irish time correctly for past timestamps
We have not seen any demand for names before 1970, so we haven't designed for more than two (regular) offsets per year for a given zone. It would not be hard, however, to add additional offsets, either for historic times or if for some reason that becomes fashionable in the future. (Luckily, the tendency seems to be in the other direction, collapsing from 2 offsets into 1.)

Mark

On Sat, Jun 8, 2019 at 9:20 PM Paul Eggert <eggert@cs.ucla.edu> wrote:
Steve Summit wrote:
I'm not sure that's an entirely fair challenge, though. Given that (as I understand it) Java and ICU/CLDR use tt_isdst to decide whether to display their equivalents of "GMT" or "IST", I don't think they *can* get the right answer near 1970
Yes, Ireland in 1970 is an "unfair" challenge. That was its point. It was intended to illustrate the inadequacy of the current CLDR/Java model to represent real-world aspects of civil timekeeping.
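To make the two-name model concrete, here is a minimal Java sketch under stated assumptions: the class `TwoSlotNames`, its record `MetazoneNames` and the map key `America_Pacific` are hypothetical stand-ins, not CLDR's actual data structures. Each zone gets exactly one standard name and one daylight name, and an isdst-style flag picks between them, which is roughly the model described above.

```java
import java.util.Map;

// Hypothetical two-slot naming model: one standard and one daylight name per
// metazone, selected by an isdst-style flag. Not CLDR's actual data structure.
public class TwoSlotNames {
    record MetazoneNames(String standard, String daylight) {
        // The only input is the DST flag, so at most two names are reachable.
        String pick(boolean isDst) {
            return isDst ? daylight : standard;
        }
    }

    // "America_Pacific" is used here as an illustrative metazone key.
    static final Map<String, MetazoneNames> NAMES = Map.of(
        "America_Pacific",
        new MetazoneNames("Pacific Standard Time", "Pacific Daylight Time"));

    public static void main(String[] args) {
        MetazoneNames pacific = NAMES.get("America_Pacific");
        // tzdb flags Pacific War Time (e.g. January 1943) with isdst=1, so the
        // daylight slot is chosen; there is no slot for "Pacific War Time".
        System.out.println("isdst=1: " + pacific.pick(true));
        System.out.println("isdst=0: " + pacific.pick(false));
    }
}
```

Because a flag has only two values, a model like this can never answer "Pacific War Time"; supporting it needs either a third slot or a key other than the flag.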
tzdb changed its mind about the mapping at that point.
I'm not sure what you mean by "mapping", but the 2018a change to Irish data was in response to a bug report about Irish time, a bug report that was investigated and found to be valid. Since tzdb can represent the Irish data as per Irish law and common use, the change was warranted from the tzdb point of view. And since Java's TZUpdater program currently rejects the changed data, I developed a 'rearguard' option to tzdb that lossfully converts the main-format tzdata into a rearguard format that pacifies TZUpdater.
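The arithmetic behind a rearguard-style conversion can be sketched in Java. This is a hypothetical illustration, not the tzdb implementation (which operates on the tzdb source files); the `RearguardSketch` class and `Period` record are invented for this example. The idea is to shift each period's standard offset by the most negative save value so that no save is negative. Every UT offset stays the same, but which Irish period is flagged as DST flips, and that flip is the lossy part.

```java
import java.time.Duration;
import java.util.List;

// Hypothetical sketch of the arithmetic behind a rearguard-style conversion.
// Not the tzdb implementation; Period and toRearguard are invented names.
public class RearguardSketch {
    // One kind of period within a year: abbreviation, standard offset, and save.
    record Period(String abbr, Duration stdOff, Duration save) {
        Duration utOffset() { return stdOff.plus(save); }
        boolean isDst() { return !save.isZero(); }
    }

    // Shift standard offsets by the most negative save so that no save is negative.
    // UT offsets are unchanged; only the std/save split (and thus isDst) moves.
    static List<Period> toRearguard(List<Period> periods) {
        Duration shift = periods.stream()
            .map(Period::save)
            .filter(Duration::isNegative)
            .min(Duration::compareTo)
            .orElse(Duration.ZERO);
        return periods.stream()
            .map(p -> new Period(p.abbr(), p.stdOff().plus(shift), p.save().minus(shift)))
            .toList();
    }

    public static void main(String[] args) {
        // Main-format Irish data in recent decades: standard time is IST (+01),
        // and winter GMT is expressed as a negative one-hour save.
        List<Period> vanguard = List.of(
            new Period("GMT", Duration.ofHours(1), Duration.ofHours(-1)),
            new Period("IST", Duration.ofHours(1), Duration.ZERO));
        for (Period p : toRearguard(vanguard)) {
            System.out.println(p.abbr() + ": std=" + p.stdOff() + " save=" + p.save()
                + " isDst=" + p.isDst() + " ut=" + p.utOffset());
        }
    }
}
```

After the shift, GMT is standard time at +00 with no save and IST is +00 standard with one hour of save, the same shape Irish data had before 2018a.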
However, even with the rearguard option (and even if we go back to circa 2017 code and data before this latest kerfuffle started), CLDR+Java cannot handle Irish time correctly for past timestamps due to what appear to be shortcomings in its model. This problem is not limited to Irish time; it also occurs for time in Los Angeles during World War II (see example below) and in several other areas, including Morocco right this minute and quite possibly in North America and Europe in the near future.
$ jshell
|  Welcome to JShell -- Version 12.0.1
|  For an introduction type: /help intro

jshell> var jan1943 = java.time.Instant.ofEpochSecond(-852051600)
jan1943 ==> 1943-01-01T07:00:00Z

jshell> var zone = java.time.ZoneId.of("America/Los_Angeles")
zone ==> America/Los_Angeles

jshell> var dtf = java.time.format.DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss Z z (zzzz)")
dtf ==> Value(YearOfEra,4,19,EXCEEDS_PAD)'-'Value(MonthOf ... RT)' ''('ZoneText(FULL)')'

jshell> jan1943.atZone(zone).format(dtf)
$4 ==> "1943-01-01 00:00:00 -0700 PDT (Pacific Daylight Time)"

jshell>
$ TZ=America/Los_Angeles date -d@-852051600 +"%s %Y-%m-%d %H:%M:%S %z %Z"
-852051600 1943-01-01 00:00:00 -0700 PWT
Near the end of the example above, Java says "PDT" where tzdb says "PWT", because Java can't handle PWT.
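The data behind Java's answer can be inspected with `java.time.zone.ZoneRules`. A minimal sketch (the class name `War1943` is arbitrary): with tzdb data, January 1943 in Los Angeles is at -07:00 while the standard offset is -08:00, so `isDaylightSavings` returns true. A formatter that chooses between standard and daylight names on that flag, as the JDK appears to, has only "PDT" to offer.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.zone.ZoneRules;

// Minimal sketch: inspect the offset data behind Java's "PDT" for January 1943.
public class War1943 {
    static final Instant JAN_1943 = Instant.ofEpochSecond(-852051600L); // 1943-01-01T07:00:00Z

    public static void main(String[] args) {
        ZoneRules rules = ZoneId.of("America/Los_Angeles").getRules();
        ZoneOffset actual = rules.getOffset(JAN_1943);
        ZoneOffset standard = rules.getStandardOffset(JAN_1943);
        // isDaylightSavings is true whenever the actual offset differs from the
        // standard offset; it carries no notion of "war time".
        boolean dst = rules.isDaylightSavings(JAN_1943);
        System.out.println("offset=" + actual + " standard=" + standard + " daylight=" + dst);
    }
}
```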
Now, it's true, isdst might not be the best key to use for this sort of thing any more. Do we have recommendations for what projects like Java and ICU/CLDR ought to be keying off of, if not isdst? (I suppose tt_abbrind, or more likely the actual string it indexes, might be better.)
I'm afraid they will need to solve this problem largely on their own, as one cannot look at tzdata and automatically derive strings like "Pacific War Time" or "Central Africa Ramadan Time": those strings are not in the data (not even in English), and there are no numeric equivalents either. The only partially-relevant info in tzdata consists of abbreviations like "IST" and "PDT" and unfortunately these abbreviations are well-documented to be ambiguous and historically inaccurate in some cases.
It should be possible for CLDR+Java to develop reasonably reliable heuristics for guessing what string to use in some cases. For example, they could have a heuristic that "IST" means "India Standard Time" in Asia/Kolkata, "Israel Standard Time" in Asia/Gaza, Asia/Hebron and Asia/Jerusalem, "Irish Summer Time" in Ireland before 1968-10-27, and "Irish Standard Time" in Ireland starting 1968-10-27. Similar heuristics could be used for other abbreviations, and if CLDR+Java tune the heuristics enough they'd be accurate. However, they'd have to do most of this work on their own. For example, tzdb does not have an alphabetic abbreviation for the current time in Morocco (+00, a 1-hour negative DST where standard time is +01), so CLDR would have to invent an abbreviation there (presumably something like "Central Africa Ramadan Time" in English) and base its use on a heuristic like "when Africa/Casablanca is at +00 in the year 2019 or later, its time zone abbreviation is 'Central Africa Ramadan Time'".
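A heuristic table of this kind might look like the following Java sketch. Everything in it is hypothetical: the `NameHeuristics` class, the `Rule` record and the rule list merely transcribe the examples in the preceding paragraph, and "Central Africa Ramadan Time" is an invented name, not CLDR data. Lookup keys on zone, tzdb abbreviation (including numeric ones like "+00") and date, and returns the first matching rule.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical heuristic table mapping (zone, tzdb abbreviation, date) to a
// long name. The rules transcribe the examples in the text; none is CLDR data.
public class NameHeuristics {
    // A null bound means "unbounded"; 'from' is inclusive, 'until' is exclusive.
    record Rule(Set<String> zones, String abbr, LocalDate from, LocalDate until, String longName) {
        boolean matches(String zone, String a, LocalDate d) {
            return zones.contains(zone) && abbr.equals(a)
                && (from == null || !d.isBefore(from))
                && (until == null || d.isBefore(until));
        }
    }

    static final LocalDate IRISH_CHANGE = LocalDate.of(1968, 10, 27);

    static final List<Rule> RULES = List.of(
        new Rule(Set.of("Asia/Kolkata"), "IST", null, null, "India Standard Time"),
        new Rule(Set.of("Asia/Gaza", "Asia/Hebron", "Asia/Jerusalem"), "IST", null, null,
                 "Israel Standard Time"),
        new Rule(Set.of("Europe/Dublin"), "IST", null, IRISH_CHANGE, "Irish Summer Time"),
        new Rule(Set.of("Europe/Dublin"), "IST", IRISH_CHANGE, null, "Irish Standard Time"),
        // tzdb uses only the numeric abbreviation "+00" here; this name is invented.
        new Rule(Set.of("Africa/Casablanca"), "+00", LocalDate.of(2019, 1, 1), null,
                 "Central Africa Ramadan Time"));

    // First matching rule wins, so rule order matters.
    static Optional<String> longName(String zone, String abbr, LocalDate date) {
        return RULES.stream()
            .filter(r -> r.matches(zone, abbr, date))
            .map(Rule::longName)
            .findFirst();
    }

    public static void main(String[] args) {
        String[][] queries = {
            {"Asia/Kolkata", "IST", "2019-06-08"},
            {"Europe/Dublin", "IST", "1960-07-01"},
            {"Europe/Dublin", "IST", "2019-06-08"},
            {"Africa/Casablanca", "+00", "2019-05-15"},
            {"Africa/Casablanca", "+00", "2010-01-01"},
        };
        for (String[] q : queries) {
            String name = longName(q[0], q[1], LocalDate.parse(q[2])).orElse("(no rule)");
            System.out.println(q[0] + " " + q[1] + " " + q[2] + " -> " + name);
        }
    }
}
```

A real table would need many more rules, and first-match ordering, while simple, makes the order of rules significant.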