Re: [tz] Java & Rearguard

June 9, 2019


      ...
CLDR+Java cannot handle Irish time correctly for past timestamps
We have not seen any demand for names before 1970, thus we haven't designed
for more than two (regular) offsets per year for a given zone. It would not
be hard, however, to add additional offsets, either for historic times, or
if for some reason that becomes fashionable in the future.

(Luckily, the tendency seems to be in the other direction, collapsing from
2 offsets into 1.)

Mark


On Sat, Jun 8, 2019 at 9:20 PM Paul Eggert <eggert@cs.ucla.edu> wrote:
...
Steve Summit wrote:
...
I'm not sure that's an entirely fair challenge, though.
Given that (as I understand it) Java and ICU/CLDR use tt_isdst
to decide whether to display their equivalents of "GMT" or "IST",
I don't think they *can*  get the right answer near 1970
Yes, Ireland in 1970 is an "unfair" challenge. That was its point. It was
intended to illustrate the inadequacy of the current CLDR/Java model to
represent real-world aspects of civil timekeeping.
...
tzdb changed its mind about the mapping at that point.
I'm not sure what you mean by "mapping", but the 2018a change to Irish data
was in response to a bug report about Irish time, a bug report that was
investigated and found to be valid. Since tzdb can represent the Irish data
as per Irish law and common use, the change was warranted from the tzdb
point of view. And since Java's TZUpdater program currently rejects the
changed data, I developed a 'rearguard' option to tzdb that lossfully
converts the main-format tzdata into a rearguard format that pacifies
TZUpdater.
However, even with the rearguard option (and even if we go back to circa
2017 code and data before this latest kerfuffle started), CLDR+Java cannot
handle Irish time correctly for past timestamps due to what appear to be
shortcomings in its model. This problem is not limited to Irish time; it
also occurs for time in Los Angeles during World War II (see example below)
and in several other areas, including Morocco right this minute and quite
possibly in North America and Europe in the near future.
$ jshell
   |  Welcome to JShell -- Version 12.0.1
   |  For an introduction type: /help intro
jshell> var jan1943 = java.time.Instant.ofEpochSecond(-852051600)
   jan1943 ==> 1943-01-01T07:00:00Z
jshell> var zone = java.time.ZoneId.of("America/Los_Angeles")
   zone ==> America/Los_Angeles
jshell> var dtf = java.time.format.DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss Z z (zzzz)")
   dtf ==> Value(YearOfEra,4,19,EXCEEDS_PAD)'-'Value(MonthOf ... RT)' ''('ZoneText(FULL)')'
jshell> jan1943.atZone(zone).format(dtf)
   $4 ==> "1943-01-01 00:00:00 -0700 PDT (Pacific Daylight Time)"
jshell>
   $ TZ=America/Los_Angeles date -d@-852051600 +"%s %Y-%m-%d %H:%M:%S %z %Z"
   -852051600 1943-01-01 00:00:00 -0700 PWT
Near the end of the example above, Java says "PDT" where tzdb says "PWT",
because Java can't handle PWT.
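
A minimal sketch of what the Java side has to work with for that instant,
assuming the standard java.time.zone.ZoneRules API and the tz data bundled
with the JDK. The class and method names (IsDstProbe, offsetAt, isDstAt) are
made up for illustration. The rules expose a total offset and a
daylight-saving flag, which is enough to choose between a "standard" and a
"daylight" name but not to tell war time from ordinary daylight time:

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.zone.ZoneRules;

// Hypothetical helper class, not part of Java, CLDR, or tzdb.
final class IsDstProbe {
    /** Total UTC offset in effect in the zone at the given epoch second. */
    static ZoneOffset offsetAt(String zoneId, long epochSecond) {
        ZoneRules rules = ZoneId.of(zoneId).getRules();
        return rules.getOffset(Instant.ofEpochSecond(epochSecond));
    }

    /** Whether the zone's rules flag the given epoch second as daylight saving. */
    static boolean isDstAt(String zoneId, long epochSecond) {
        ZoneRules rules = ZoneId.of(zoneId).getRules();
        return rules.isDaylightSavings(Instant.ofEpochSecond(epochSecond));
    }

    public static void main(String[] args) {
        long jan1943 = -852051600L; // same timestamp as in the jshell session
        String zone = "America/Los_Angeles";
        // Only an offset and a boolean come back; nothing distinguishes
        // "war time" from ordinary "daylight time".
        System.out.println("offset = " + offsetAt(zone, jan1943));
        System.out.println("isDst  = " + isDstAt(zone, jan1943));
    }
}
```

Any display name derived from these two values alone can at best pick one of
two strings per zone, which is consistent with the "PDT" seen above.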
...
Now, it's true, isdst might not be the best key to use for this
sort of thing any more.  Do we have recommendations for what
projects like Java and ICU/CLDR ought to be keying off of,
if not isdst? (I suppose tt_abbrind, or more likely the actual
string it indexes, might be better.)
I'm afraid they will need to solve this problem largely on their own, as one
cannot look at tzdata and automatically derive strings like "Pacific War
Time" or "Central Africa Ramadan Time": those strings are not in the data
(not even in English), and there are no numeric equivalents either. The only
partially-relevant info in tzdata consists of abbreviations like "IST" and
"PDT" and unfortunately these abbreviations are well-documented to be
ambiguous and historically inaccurate in some cases.
It should be possible for CLDR+Java to develop reasonably-reliable
heuristics for guessing what string to use in some cases. For example, they
could have a heuristic that "IST" means "India Standard Time" in
Asia/Kolkata, "Israel Standard Time" in Asia/Gaza, Asia/Hebron and
Asia/Jerusalem, "Irish Summer Time" in Ireland before 1968-10-27, and "Irish
Standard Time" in Ireland starting 1968-10-27. Similar heuristics could be
used for other abbreviations, and if CLDR+Java tune the heuristics enough
they'd be accurate. However, they'd have to do most of this work on their
own. For example tzdb does not have an alphabetic abbreviation for the
current time in Morocco (+00, a 1-hour negative DST where standard time is
+01), so CLDR would have to invent an abbreviation there (presumably
something like "Central Africa Ramadan Time" in English) and base its use on
a heuristic like "when Africa/Casablanca is at +00 in the year 2019 or
later, its time zone abbreviation is 'Central Africa Ramadan Time'".
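
The heuristics described above could be sketched roughly as follows. This is
an illustration only, not anything CLDR or Java actually implements: the
class AbbreviationHeuristic and its longName method are hypothetical, and
using Europe/Dublin to stand for "in Ireland" is an assumption. The zone
names, strings, and cutover dates are the ones given in the text.

```java
import java.time.LocalDate;
import java.util.Optional;
import java.util.Set;

// Hypothetical lookup, not an existing CLDR or java.time API.
final class AbbreviationHeuristic {
    private static final Set<String> ISRAEL_ZONES =
            Set.of("Asia/Gaza", "Asia/Hebron", "Asia/Jerusalem");
    private static final LocalDate IRISH_CUTOVER = LocalDate.of(1968, 10, 27);

    /**
     * Guess an English long name from zone, tzdb abbreviation, total UTC
     * offset in seconds, and local date. Empty when no heuristic applies.
     */
    static Optional<String> longName(String zoneId, String abbreviation,
                                     int offsetSeconds, LocalDate date) {
        if ("IST".equals(abbreviation)) {
            if (zoneId.equals("Asia/Kolkata")) {
                return Optional.of("India Standard Time");
            }
            if (ISRAEL_ZONES.contains(zoneId)) {
                return Optional.of("Israel Standard Time");
            }
            if (zoneId.equals("Europe/Dublin")) { // assumed zone for Ireland
                return Optional.of(date.isBefore(IRISH_CUTOVER)
                        ? "Irish Summer Time"
                        : "Irish Standard Time");
            }
        }
        // Morocco has no alphabetic abbreviation in tzdb, so key on the offset.
        if (zoneId.equals("Africa/Casablanca") && offsetSeconds == 0
                && date.getYear() >= 2019) {
            return Optional.of("Central Africa Ramadan Time");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(longName("Europe/Dublin", "IST", 3600,
                LocalDate.of(1960, 7, 1)));
        System.out.println(longName("Europe/Dublin", "IST", 3600,
                LocalDate.of(1970, 1, 1)));
        System.out.println(longName("Africa/Casablanca", "+00", 0,
                LocalDate.of(2019, 5, 20)));
    }
}
```

Each rule is a hand-maintained special case keyed on zone, abbreviation,
offset, and date, which is why this work cannot be derived automatically
from tzdata.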

Mark Davis ☕️