Re: [tz] OpenJDK/CLDR/ICU/Joda issues with Ireland change

Jan. 25, 2018

      Guy Harris <guy@alum.mit.edu> wrote on 01/24/2018 05:17:24 PM:
...
From: Guy Harris <guy@alum.mit.edu>
To: Yoshito Umaoka <yoshito_umaoka@us.ibm.com>
Cc: Stephen Colebourne <scolebourne@joda.org>, Time Zone Mailing 
List <tz@iana.org>, tz <tz-bounces@iana.org>
Date: 01/24/2018 05:17 PM
Subject: Re: [tz] OpenJDK/CLDR/ICU/Joda issues with Ireland change
On Jan 24, 2018, at 1:17 PM, Yoshito Umaoka <yoshito_umaoka@us.ibm.com> 
wrote:
...
CLDR XML (or JSON) data is consumed by other projects such as ICU 
and Java, and these external projects know those offsets.
CLDR only specifies that the daylight saving time name used for
Europe/Dublin is "Irish Standard Time".
ICU/Java imports zoneinfo from tz database, obtains the offset at a
given time, and then decides whether it's in standard time or daylight time.
The tz binary database has, for all transition times, an indication 
of whether, after the transition, you are in "DST".  If the tz 
binary database is what Java time zone code imports, it doesn't need
to look at offsets to determine whether the times are standard or 
"DST", it can just use those values.  (I say "DST" because that's 
used to set tm_isdst.)
I cannot speak for Java.
ICU does not use the tz binaries - ICU generates its own binary resources
from the tzdata source files. The information equivalent to tm_isdst is
stored in the ICU binary format. In addition to this, ICU also stores the
raw offset and the DST saving amount, which are not available in the tz
binaries. ICU preserves this information to support some legacy APIs -
getRawOffset, etc.
...
It does *not* contain any offsets other than, for each transition, 
what the offset from UTC is.  Thus, it provides no notion of "raw-
offset" vs. "actual-offset", and you can't determine both a "raw-
offset" and an "actual-offset" from the tz binary database without 
either 1) additional data or 2) some possibly-incorrect assumptions 
being made, such as "the only reason why an entry in the table of 
transitions has a different tt_gmtoff value is that the transition 
represents starting or ending DST" (that latter assumption has been 
false for a very very very very very long time for some tzdb 
regions, as a given region might switch from one time zone to another).
The tzdb *source* files, however, give the "standard" offset from 
UTC in zone lines and the "amount to save", to be added to the 
"standard" offset, in rule lines, so code that parses those files 
independently, rather than relying on the binary files produced by 
zic parsing the files, can get both the "standard" and "current" 
offset from UTC.
Correct. As I explained above, ICU's modified zic also stores the raw
(standard) offset and the DST amount.
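
To make the difference concrete, here is a minimal sketch in Python. It is
not ICU or zic code. The table layout and the function names are
hypothetical, the values are a simplified picture of Europe/Dublin before
and after the change, and it assumes the is-DST flag is simply "save is
non-zero". It shows two source-style descriptions that give the same total
offsets from UTC but a different raw offset and a different DST flag:

```python
from datetime import timedelta

HOUR = timedelta(hours=1)
ZERO = timedelta(0)

# Hypothetical source-style data: period -> (standard offset, save amount).
# OLD_STYLE: standard time in winter, +1:00 saved in summer.
# NEW_STYLE: standard time in summer, -1:00 "saved" in winter.
OLD_STYLE = {"winter": (ZERO, ZERO), "summer": (ZERO, HOUR)}
NEW_STYLE = {"winter": (HOUR, -HOUR), "summer": (HOUR, ZERO)}

def total_offsets(entries):
    """Total offset from UTC per period: all a binary-style table keeps."""
    return {period: std + save for period, (std, save) in entries.items()}

def dst_flags(entries):
    """Is-DST flag per period, assuming the flag means 'save is non-zero'."""
    return {period: save != ZERO for period, (std, save) in entries.items()}

# The clocks agree, so the total offsets are identical ...
print(total_offsets(OLD_STYLE) == total_offsets(NEW_STYLE))  # True
# ... but the standard/daylight classification is swapped.
print(dst_flags(OLD_STYLE) == dst_flags(NEW_STYLE))          # False
```

Given only the total offsets, there is no way to tell which of the two
decompositions was intended, which is why a consumer that needs the raw
offset has to carry it separately.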
...
Which of those two things does the Java code that "imports zoneinfo 
from tz database" do?  Does it read the binary data, or independently
read the source data (or read binary data produced by a parser other
than zic)?

CLDR and ICU are two separate projects, although CLDR was originally
part of the ICU project.

Our biggest issue with the change in 2018a/b was not actually the negative
DST offset. The bigger issue is the swapping of the standard and daylight
saving names. (That said, adopting such a rule is still a problem: all ICU
versions released in the past have a bug in which our code invalidates a
negative DST saving amount, so we need to distribute a patch to handle
that case.)

At the moment, the TZ database project does not do anything for i18n. The
names used for displaying time zones are pretty much US-centric. But there
are many external projects that want to use its rules for clock changes.
CLDR is trying to provide localized time zone names in many different
languages.

CLDR assumes that zone names are very stable. For example, "Pacific
Standard Time" represents the standard time used on the US Pacific coast,
and the name itself does not change from time to time.

However, transition rules change much more frequently, which is why there
are so many releases of the tz database.

To localize time zone display names, CLDR needs to assign a unique key to
each translatable text. CLDR uses a combination of the zone ID and the
standard/daylight distinction as that key.
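
As a rough illustration of why that keying is sensitive to the swap, here
is a small Python sketch. It is not CLDR data or ICU code. The table, the
function name and the "Greenwich Mean Time" standard entry are assumptions
made for the example; only the "Irish Standard Time" daylight name comes
from the discussion above:

```python
# Hypothetical name table keyed by (zone ID, "standard" or "daylight").
NAMES = {
    ("Europe/Dublin", "standard"): "Greenwich Mean Time",  # assumed entry
    ("Europe/Dublin", "daylight"): "Irish Standard Time",
}

def display_name(zone_id, is_dst):
    """Pick a display name from the zone ID and the is-DST flag only."""
    kind = "daylight" if is_dst else "standard"
    return NAMES[(zone_id, kind)]

# Is-DST flag seen by the consumer in summer, before and after the change.
summer_flag_before = True   # summer treated as daylight time
summer_flag_after = False   # summer treated as standard time

print(display_name("Europe/Dublin", summer_flag_before))  # Irish Standard Time
print(display_name("Europe/Dublin", summer_flag_after))   # Greenwich Mean Time
```

With the name table left unchanged, the same summer instant picks up the
winter name as soon as the flag flips, even though the clock offset itself
did not move.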

Because names are assumed to be very stable, a consumer of CLDR usually
does not provide a mechanism for distributing updated names.

Of course, if CLDR and ICU were one project and the data were consumed
only by ICU, it would be relatively easy to adopt such a change. We would
just need to update the zone name data and the code handling the clock at
the same time.

But they are two separate projects, and CLDR is consumed by a number of
other projects that do not have any control over clock calculation.
So such a change could easily break downstream consumers that use the
TZ database.

I'm not sure what we want to do in CLDR if this change is brought back
to the TZ database at this point. The CLDR technical committee may decide
not to make the corresponding change; instead, we might just change the
definition of the keys assigned to each zone name string.

Thanks,
Yoshito (ICU/CLDR)