From: Stephen Colebourne <scolebourne@joda.org>
Date: Tue, 23 Jan 2018 18:42:29 +0000
Subject: Re: [tz] OpenJDK/CLDR/ICU/Joda issues with Ireland change

| Java time-zone data is updated using the tzupdater tool
| [URL omitted here]
| This will update the tzdb data, but not the CLDR-driven data that
| drives the text.

That is most probably a mistake - the two should be linked. It is entirely possible that a zone might change its names (regardless of issues of when transitions occur, or what, if anything, is regarded as the "standard" time).

| Were the change to proceed, anyone running tzupdater
| with the Ireland change would invert the meaning of inDaylightTime()
| and access the wrong array element in the CLDR-driven data - a bug.

Yes, it would be, and CLDR or Java (whichever has the issue, or both) should fix it. And fix it soon.

| And code changes don't help, as we'll see below.

Of course code changes help - there's a bug, and fixing the bug will fix that. And also of course, for people who don't update, the bug will continue to appear - as for any other bug, or security vulnerability, that is found and fixed. There is nothing that we can do about that. People who won't, or can't, update get screwed by all kinds of things.

| There is no possible fix to Java, as this is primarily an issue
| between CLDR and TZDB. The two have a subtle API linkage which has
| perhaps never been clearly spelled out here.

Yes, they do, that ought to be obvious - the linkage is not (or should not be) subtle - it should be obvious.

| CLDR provides textual names for time-zones, as an array [winter,
| summer].

That itself is a bug. It assumes there are just two (not including the "generic" name, mentioned in a later message from Yoshito Umaoka, which is probably the most useful of the three anyway) - and there is no guarantee that will remain (or even always has been) true.
There is nothing to stop some locality (probably one at a high latitude) from deciding that they should advance the clocks in early spring, and then advance them further in early-mid summer, returning to the intermediate (or some other) value in late summer, and then to the original in late autumn (or fall if autumn happens to be called that in the relevant location). What's more, they could give 4 different names to the 3 (or 4) different offsets, perhaps "winter time", "spring time", "summer time" and "autumn time", with 4 different abbreviations. There could even be a mid-winter fallback of even more, just as there could be a mid-summer skip forward of more.

Calling any of those offsets "standard" and the others something different is really nonsense, though the jurisdiction (and people) might pick that label - but when they do, we should all remember that it is just another name. One offset is not more blessed than any other because it happens to be labelled as the "standard" time. It might be different if we defined "standard time" to be the nearest "natural" offset based on lines of longitude - but with what resolution? And how would you apply that to China or India? So we don't do that. No-one does.

CLDR (and its clients) need to be able to represent all this. Tzdb can. CLDR must also handle places which (given the durations of the two periods that are common these days) decide that "standard" time should be the one that applies for longer each year, and so should be the time in summer, and that in winter the clocks should be set backwards some number of minutes for a few months, so it does not remain dark quite so late in the mornings ("darkness saving time" - aka DST).

| As a much larger project with considerable history the order
| of that array is not going to change.

More than that needs to change; the order is not, or should not be, material. Just accept it - the design is broken, and must be fixed.
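To make the complaint above concrete, here is a minimal sketch of a two-slot [winter, summer] lookup driven by the "raw offset differs from actual offset" test described in this thread. Everything in it is hypothetical: the `Period` class, the `two_slot_index` helper and the placeholder names are invented for illustration and are not CLDR, ICU or Java code. "Raw offset" is assumed here to mean the offset before any SAVE adjustment is applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Period:
    """One stretch of local time, as a tzdb-style source rule might describe it."""
    label: str       # what the period really is (hypothetical label)
    raw_offset: int  # minutes east of UTC before SAVE is applied (assumption)
    save: int        # SAVE adjustment in minutes; may be negative

    @property
    def actual_offset(self) -> int:
        return self.raw_offset + self.save


def two_slot_index(p: Period) -> int:
    """The quoted heuristic: raw != actual is taken to mean 'summer', slot 1."""
    return 1 if p.raw_offset != p.actual_offset else 0


# Stand-in for a [winter, summer] pair of display names.
NAMES = ["winter name", "summer name"]

# Same clocks, two encodings: positive SAVE in summer, or negative SAVE in winter.
positive = [Period("winter", 0, 0), Period("summer", 0, 60)]
negative = [Period("winter", 60, -60), Period("summer", 60, 0)]

# A hypothetical zone with four named periods over three offsets.
four = [Period("winter", 0, 0), Period("spring", 0, 60),
        Period("summer", 0, 120), Period("autumn", 0, 60)]


def picked(periods):
    """Map each period's real label to the name the two-slot lookup selects."""
    return {p.label: NAMES[two_slot_index(p)] for p in periods}


print(picked(positive))
print(picked(negative))
print(picked(four))
```

With the positive encoding each period gets its own name; with the negative encoding the same clocks get the names swapped; and the four-period zone can never receive more than two distinct names, whatever the encoding.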
| (I'm using winter and summer for CLDR for this email to aid clarity,
| they refer to them as standard and daylight).

Either way exposes the broken assumption that there are just two.

| TZDB provides the offsets, SAVE values and a short text string. This
| text string - GMT/IST or IST/GMT - is not directly linkable to the
| data CLDR provides.

It probably should be. The text string, accompanied by the relevant time and probably the offset (perhaps the offset is less needed, or useful), should be the key to the translated strings. But not as indexes into an array, that's just plain stupid. As database keys (for "database" in the general sense, not implying anything SQL based or similar).

Alternatively, perhaps localized zoneinfo files should be used instead, built from a modified zic, which embeds the localized names (for some particular locality) with the raw data (probably in a similar way to, or perhaps instead of, how the abbreviations are handled now). That would mean one set of zoneinfo files for each locality an installation wants to support, but zoneinfo files are not really all that big (and adding a few extra strings to them would not make much difference), so this should not be seen as too much of a drawback - then CLDR users would simply use those files instead of the normal ones (if those even continue to exist on the system) for all purposes.

This would obviously handle the problem of the two being updated independently fairly easily. It does mean that if the "normal" files continue to exist, because both CLDR and older applications exist on the system, then those would need to be updated together. This should not be a problem: the update of one is simply not made available until both are ready.
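The "database key" idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: the zone name, abbreviations, dates and display strings are all invented, the `TABLE` layout and the `localized_name` function are hypothetical, and nothing here reflects how CLDR actually stores or looks up its data. The key is the zone plus the tzdb text string, and each key carries a history of names with the instant from which each applies, so that the same abbreviation can map to different names at different times.

```python
from bisect import bisect_right
from datetime import datetime, timezone

UTC = timezone.utc

# Hypothetical data: (zone, tzdb abbreviation) -> name history, sorted by start.
# Each history entry is (valid_from, display name).
TABLE = {
    ("Example/Zone", "ABC"): [
        (datetime(1970, 1, 1, tzinfo=UTC), "Old Long Name"),
        (datetime(2000, 1, 1, tzinfo=UTC), "New Long Name"),
    ],
    ("Example/Zone", "DEF"): [
        (datetime(1970, 1, 1, tzinfo=UTC), "Other Long Name"),
    ],
}


def localized_name(zone: str, abbr: str, when: datetime) -> str:
    """Return the display name in force for (zone, abbr) at instant `when`.

    Falls back to the tzdb text itself when no entry covers that instant.
    """
    history = TABLE.get((zone, abbr))
    if not history:
        return abbr
    starts = [start for start, _ in history]
    i = bisect_right(starts, when) - 1  # last entry starting at or before `when`
    if i < 0:
        return abbr
    return history[i][1]


print(localized_name("Example/Zone", "ABC", datetime(1990, 6, 1, tzinfo=UTC)))
print(localized_name("Example/Zone", "ABC", datetime(2010, 6, 1, tzinfo=UTC)))
```

No array position and no notion of "standard" or "daylight" is involved; a zone with one name, or with four, is handled the same way. The offset could be added to the key if the abbreviation and time alone turn out not to be sufficient.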
| Although it may seem that you can use the text
| from TZDB as a key to lookup the correct value in CLDR, I know from
| painful experience that approach fails (as the TZDB text varies over
| time,

Yes, and when it does the CLDR strings ("translations" into local formats) [ translations in quotes as I know that is not exactly what they are ] may need to change as well. There are multiple reasons why the TZDB names might change. Some are, frankly, silly, but others represent real changes in what the local users call their times. In some cases the CLDR strings may have already matched local expectations, and nothing needs to alter, but in others the local name might have changed (in their language, as well as in English) and the CLDR strings need to be updated (augmented). This is why the CLDR data should really be updated (if required) and (always) transmitted whenever the tzdb (zoneinfo) data changes.

| has the same text in winter and summer, or isn't even text).

I have no idea what the latter means - they are all text (we do not define zone abbreviations as random binary), unless you mean the +04 types, which are text, just text containing digits and +/- signs, rather than only letters. But you're right that the "sometimes the same" case (which is actually a very sane choice) means that you cannot use the abbreviation alone to map. However, the name, and the time to which it is being applied, is enough (and, to avoid running that time through localtime() or its equivalent again just to get the offset, probably the offset as a parameter as well. We know localtime() must have been run already, or the data currently used would not be available.)

| Thus, the only reliable way to pick which piece of CLDR data is needed
| is from the offsets.

Not even that alone, as the same offset can have different names during different periods. That (unlike some of my potential scenarios) has actually been observed in the past, and CLDR needs to handle that as well.
It is simply untrue, and incorrect, to assume that if (in locality X) times at offset N are called ABC and times at offset M are called DEF today, then that was true last year. The old and the new names need to be available and applied to the appropriate times. This is true just as it is true that CLDR data is needed for more than calendaring applications - what matters is not only when the next meeting is scheduled (with the day and month, and timezone names, converted to the correct local forms).

| For 20 years, this has been done in a simple and straightforward way -
| if (raw-offset != actual-offset) then CLDR uses summer text and array
| element 1.

So, for 20 years there has been a latent bug. If for 20 years there has been a latent bug that allows a security breach, are you going to simply say "it has been there too long, we can't fix it now"? Really? It makes no difference how old it is, a bug is a bug, and needs to be fixed.

| This provides the necessary glue to link the two projects:

It is the wrong glue.

| TZDB has always had the raw and actual offsets

What on earth is the "raw offset"? I somehow suspect that you (and perhaps CLDR in general) are reading too much into the tzdb source files. 99.9999% of people (not being zic) should really be ignoring those files, and everything they contain (the remaining percentage are the people who maintain the data - all 10 or 20 or so of them in the world). Everything else should be based upon the zoneinfo output files from zic - and that has no notion of a "raw" offset at all. All that exists, and all that you can ever assume, is that for some period of time (of indefinite length, starting at arbitrary and often unpredictable instants) a particular timezone will be at some offset from UTC.
It might also be associated with some name. (In reality, many are not. As Paul keeps pointing out, many of the abbreviated names that tzdb contains were purely invented for tzdb, because the (US centric) UNIX API/ABI required them - some of those are the ones being turned into numeric offsets represented as text strings. It makes no difference in the zone concerned, as there the time is just "the time", it has no other name; we really should have no abbreviation at all, and CLDR should have no translation of it.)

| the same in winter and different in summer,

Once upon a time, the world was always flat, everyone knew that, the pope even proclaimed it...

| so this has always worked.

The latent bug was not exposed. That is not "worked", it is rather "managed to survive".

| The Ireland proposal breaks this, with (raw-offset != actual-offset)
| meaning winter, instead of summer. It is fair for TZDB to complain
| that CLDR is inflexible with its definitions, but the reality is that
| this was and is the only way to connect two separately developed
| projects (where API stability is vital).

Nonsense. It was just someone's idea of something they thought would work, and which seemed to - but it was based upon unfounded (and incorrect) assumptions about the nature of civil time, and how it can be expected to work.

| In order for TZDB and CLDR to co-exist, it is *required* that the raw
| offset equals the actual offset in winter,

No, it is *required* for CLDR to be fixed. What is happening now is obviously incorrect.

| This isn't a change that can be delayed for a year.

Oh good, so we can make it now?

| This interpretation of inSummerTime() relies on positive SAVE values,

So, fix it. It is broken.

| is part of the public API of TZDB just as much as the source code file
| format is.
If that's all, then we have no problem, as the source file format should not be regarded as part of anything except the method by which we happen to represent the data before zic converts it to zoneinfo. The source format has changed, and will change again - that is guaranteed. The zoneinfo format (in binary form, or converted to text) is designed to be immune to all of the shenanigans that go on, and really is what everyone should be using. If anyone believes that they need the source files for anything other than feeding to zic (or some equivalent program for systems that cannot run it, if there are any) then that almost guarantees that they are making some unsustainable assumptions, which will, one day, be proven false.

We (of course) attempt to remain backward compatible, but as legislatures (and the people under their governance) do weirder and weirder things, we are likely to find that the current language is incapable of expressing what needs to be expressed, and it will be extended. I know there are others that read it, but this should be treated in a similar way to the way that compilers treat programming language specifications - when the language is extended (as happens to all that are not dead) the compilers all need to be updated to deal with it. Similarly, when tax legislation is amended (about the only thing that changes even more frequently, and for less rational reasons, than timezones) the accountants, and the software they use, need to be updated to deal with that.

Updates/changes are simply a fact of life; there is nothing (not really even death or taxes) that we can promise will never change. Hopefully zoneinfo files will not need much - though the format already has changed when 64 bit time support was added, and might need more, if people dealing with 2038 issues find some innovative way to allow 32 bit timestamps to keep working, in some fashion, beyond 2038 in order to retain compat with old databases that cannot be updated easily.
Everyone needs to remain aware of this. Sticking our heads in the sand and proclaiming "it always worked in the past, it must be made to continue working in the future" is, frankly, absurd.

kre

ps: I am sure apologies will be needed, I have tried to find and correct all my typos, but right now, my e-mail environment is horribly challenged, and I have no way to rationally do spell or grammar checks I normally would (well sometimes) attempt. So, consider that for any unfound mistakes, apologies are tendered.