Guy Harris <guy@alum.mit.edu> wrote on 01/24/2018 05:17:24 PM:
From: Guy Harris <guy@alum.mit.edu>
To: Yoshito Umaoka <yoshito_umaoka@us.ibm.com>
Cc: Stephen Colebourne <scolebourne@joda.org>, Time Zone Mailing List <tz@iana.org>, tz <tz-bounces@iana.org>
Date: 01/24/2018 05:17 PM
Subject: Re: [tz] OpenJDK/CLDR/ICU/Joda issues with Ireland change
On Jan 24, 2018, at 1:17 PM, Yoshito Umaoka <yoshito_umaoka@us.ibm.com> wrote:
CLDR XML (or JSON) data is consumed by other projects such as ICU and Java, and these external projects know those offsets. CLDR only specifies that the daylight saving time name used for Europe/Dublin is "Irish Standard Time". ICU/Java import zoneinfo from the tz database, obtain the offset at a given time, and then decide whether it's in standard time or daylight time.
The tz binary database has, for all transition times, an indication of whether, after the transition, you are in "DST". If the tz binary database is what Java time zone code imports, it doesn't need to look at offsets to determine whether the times are standard or "DST", it can just use those values. (I say "DST" because that's used to set tm_isdst.)
I cannot speak for Java. ICU does not use the tz binaries; ICU generates its own binary resources from the tzdata source files. The information equivalent to tm_isdst is stored in the ICU binary format. In addition, ICU also stores the raw offset and the DST saving amount, which are not available in the tz binaries. ICU preserves this information to support some legacy APIs (getRawOffset, etc.).
It does *not* contain any offsets other than, for each transition, what the offset from UTC is. Thus, it provides no notion of "raw-offset" vs. "actual-offset", and you can't determine both a "raw-offset" and an "actual-offset" from the tz binary database without either 1) additional data or 2) some possibly-incorrect assumptions being made, such as "the only reason why an entry in the table of transitions has a different tt_gmtoff value is that the transition represents starting or ending DST" (that latter assumption has been false for a very very very very very long time for some tzdb regions, as a given region might switch from one time zone to another).
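To make the point concrete, here is a minimal Python sketch of why a per-transition total offset plus a DST flag cannot be split into "raw" and "saving" parts without an assumption. `TtInfo` and `naive_split` are hypothetical names, not part of any tz, ICU, or Java API, and the one-hour default is exactly the kind of possibly-incorrect assumption described above. The sample record is illustrative, not read from a real tz binary file.

```python
from typing import NamedTuple, Tuple


class TtInfo(NamedTuple):
    """Hypothetical stand-in for what one transition entry tells us."""
    utoff: int   # total offset from UTC in seconds (tt_gmtoff above)
    isdst: bool  # the flag that is used to set tm_isdst


def naive_split(tt: TtInfo, assumed_save: int = 3600) -> Tuple[int, int]:
    """Guess (raw_offset, dst_saving) from a binary-style record.

    ASSUMPTION: whenever isdst is set, DST adds exactly assumed_save
    seconds. Nothing in the record itself confirms this.
    """
    save = assumed_save if tt.isdst else 0
    return tt.utoff - save, save


# Illustrative winter record for a zone modeled with negative DST:
# total offset 0, flagged as "DST".
winter = TtInfo(utoff=0, isdst=True)

# The guess is (-3600, 3600), while a source-level model of
# standard +1:00 with -1:00 saved would give (3600, -3600).
guessed_raw, guessed_save = naive_split(winter)
print(guessed_raw, guessed_save)
```

The total offset is the same under either split, so the record alone cannot say which one is right.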
The tzdb *source* files, however, give the "standard" offset from UTC in zone lines and the "amount to save", to be added to the "standard" offset, in rule lines, so code that parses those files independently, rather than relying on the binary files produced by zic parsing the files, can get both the "standard" and "current" offset from UTC.
Correct. As I explained above, ICU's modified zic also stores the raw (standard) offset and the DST amount.
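As a rough illustration of the source-file view, the Python sketch below adds a zone line's "standard" offset to a rule line's "amount to save". `parse_offset` and `SourceView` are hypothetical helpers, the input strings are simplified illustrative values rather than lines copied from a tzdb release, and this is not a description of how ICU's modified zic is implemented.

```python
from dataclasses import dataclass


def parse_offset(text: str) -> int:
    """Convert an offset such as '1:00', '-1:00', or '0' to seconds."""
    sign = -1 if text.startswith("-") else 1
    parts = [int(p) for p in text.lstrip("-").split(":")]
    parts += [0] * (3 - len(parts))  # pad missing minutes/seconds
    hours, minutes, seconds = parts
    return sign * (hours * 3600 + minutes * 60 + seconds)


@dataclass(frozen=True)
class SourceView:
    stdoff: int  # "standard" offset from the zone line, in seconds
    save: int    # "amount to save" from the active rule line, in seconds

    @property
    def total(self) -> int:
        """Current offset from UTC: standard offset plus saved amount."""
        return self.stdoff + self.save


# Two hypothetical ways to model the same winter clock reading.
negative_dst = SourceView(parse_offset("1:00"), parse_offset("-1:00"))
positive_dst = SourceView(parse_offset("0"), parse_offset("0"))

print(negative_dst.total, positive_dst.total)
```

Both modelings yield the same total offset, which is why the split has to come from the source data (or from extra stored fields such as the raw offset and DST amount mentioned above) rather than from the total alone.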
Which of those two things does the Java code that "imports zoneinfo from tz database" do? Does it read the binary data, or independently read the source data (or read binary data produced by a parser other than zic)?

CLDR and ICU are two separate projects, although CLDR was originally part of the ICU project. Our biggest issue with the change in 2018a/b was not actually the negative DST offset. The bigger issue is the swapping of the standard and daylight saving names. (The negative offset is still a problem to adopt, though, because a bug in our code invalidates a negative DST saving amount in all ICU versions released in the past, and we need to distribute a patch to handle such a case.)

At this moment, the TZ database project does nothing with i18n. The names used for displaying time zones are pretty much US-centric. But there are many other external projects that want to use the rules for clock changes. CLDR is trying to provide localized time zone names in various languages. CLDR assumes that zone names are very stable. For example, "Pacific Standard Time" represents the standard time used on the US Pacific coast, and the name itself does not change from time to time. Transition rules, however, change much more frequently, so there are many releases of the tz database.

To localize time zone display names, CLDR needs to assign a unique key to each translatable text, and CLDR uses a combination of zone ID and the standard/daylight distinction. Because names are assumed to be very stable, a consumer of CLDR usually does not provide a mechanism to distribute updated names.

Of course, if CLDR and ICU were one project and the data were only consumed by ICU, it would be relatively easy to adopt such a change. We would just need to update the zone name data and the code handling the clock at the same time. But they are two separate projects, and CLDR is consumed by a number of other projects, without any control over their clock calculation. So such a change could easily break downstream consumers who use the TZ database.

I'm not sure what we want to do in CLDR if this change is brought back to the TZ database at this moment.
The CLDR technical committee may decide not to make the corresponding change; instead, we might just change the definition of the keys assigned to each zone name string.

Thanks,
Yoshito (ICU/CLDR)
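
The keying scheme described above, a zone ID combined with a standard/daylight distinction, can be sketched in Python as follows. `ZONE_NAMES` and `display_name` are hypothetical and do not reflect CLDR's actual data format. The "Irish Standard Time" entry comes from this thread; the "Greenwich Mean Time" entry is an assumption added for illustration.

```python
from typing import Dict, Tuple

# Hypothetical name table keyed by (zone ID, name type).
ZONE_NAMES: Dict[Tuple[str, str], str] = {
    ("Europe/Dublin", "daylight"): "Irish Standard Time",  # from the thread
    ("Europe/Dublin", "standard"): "Greenwich Mean Time",  # assumed entry
}


def display_name(zone_id: str, is_dst: bool) -> str:
    """Pick a name using only the zone ID and the DST flag from the rules."""
    name_type = "daylight" if is_dst else "standard"
    return ZONE_NAMES[(zone_id, name_type)]


# Rules that flag summer as DST select the "daylight" entry in summer.
print(display_name("Europe/Dublin", is_dst=True))

# If the rules swap the flag, summer becomes "standard", and the same
# unchanged name table now returns the other name for a summer date.
print(display_name("Europe/Dublin", is_dst=False))
```

A consumer that updates the rules without also updating the name table (or the meaning of its keys) would therefore show the wrong name, which is the breakage described above.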