Since McMurdo was not founded until 16 Feb 1956 (Wikipedia), I think it is simply incorrect to assign any time-zone transitions before that date. There simply were no DST transitions in McMurdo in the 1930s, as it did not exist. https://github.com/jodastephen/tzdiff/commit/57aac13605f184a46f74d22f7dddc90... This is a case where an ID that has existed for a long time with reasonably good data has now been degraded through a Zone-to-Link conversion. Stephen (I hadn't realised the extent of the data pollution of McMurdo until I did my recent data analysis)
McMurdo is a special case of a more general problem, which is the representation of locations while uninhabited. *Every* location in the tz database was uninhabited at *some* point, and the tz database does not attempt to systematically record the details of when a location was inhabited and when it wasn't, as that's outside the scope (and the values are typically unknown anyway). In practice I've used "zzz" entries for uninhabited locations only when the database format *forced* a value. When it doesn't force a value I haven't worried about it, and I'd rather not start worrying about it now.

There's an amusing instance of the opposite problem in Pacific/Johnston, which was inhabited when we created the tz database but became uninhabited in 2004. Its entry cheerily says "We're just like Honolulu!", and really, that's OK: worrying about the discrepancy would be more trouble for everybody than it's worth.
Just to note that I firmly disagree with this analysis. We would not and should not create an ID for an uninhabited location, but where somewhere is or was inhabited we should make best efforts to define accurate data. The new McMurdo data is clearly not accurate prior to 1956. For example, someone can use the APIs I write to ask the question "which locations had DST in 1932?". That answer is now wrong for McMurdo.

The key problem with the change for data consumers is that the fact that McMurdo was uninhabited in the 1930s is *external* information, which an application would now need to *separately* know in order to get the correct result for McMurdo. I cannot inflict that pain on my users.

The problem I have is that I'm no longer sure I can trust tzdb to safely be the guardian of the limited pre-1970 data which it has always possessed and which Java has long used. I will be talking to Oracle people this week to discuss what options we have for Java, probably requiring manual workarounds of the damaged data. <shakes head in despair>

Stephen

(BTW, the "everywhere was uninhabited" point does not make sense. An uninhabited location would effectively be on LMT, so tzdb is accurate as far as it can be. Only locations like McMurdo change from uninhabited to inhabited at a known date, and LMT should apply before that date.)

On 20 September 2013 17:03, Paul Eggert <eggert@cs.ucla.edu> wrote:
McMurdo is a special case of a more general problem, which is the representation of locations while uninhabited. *Every* location in the tz database was uninhabited at *some* point, and the tz database does not attempt to systematically record the details of when a location was inhabited and when it wasn't, as that's outside the scope (and the values are typically unknown anyway). In practice I've used "zzz" entries for uninhabited locations only when the database format *forced* a value. When it doesn't force a value I haven't worried about it, and I'd rather not start worrying about it now.
There's an amusing instance of the opposite problem in Pacific/Johnston, which was inhabited when we created the tz database but became uninhabited in 2004. Its entry cheerily says "We're just like Honolulu!", and really, that's OK: worrying about the discrepancy would be more trouble for everybody than it's worth.
Stephen Colebourne <scolebourne@joda.org> writes:
We would not and should not create an ID for an uninhabited location, but where somewhere is or was inhabited we should make best efforts to define accurate data. The new McMurdo data is clearly not accurate prior to 1956.
There is no such thing as local time in McMurdo prior to 1956. There is no standard for accuracy; the entire concept of accuracy of such a thing is meaningless. Local time is not a physical property. It's something created by humans who make shared rules about how to set their clocks, and in the absence of human presence, it doesn't exist. Local time in McMurdo prior to its habitation is undefined.

To use a Java analogy, you're doing the equivalent of complaining that finalize() isn't running at the point in your program where you expected it to and where it ran in a previous release of the JVM. You're getting about as much sympathy here as you'd get with that plea in a Java community.

As with any situation with undefined inputs, the output is basically at the discretion of the software, and returning either an error or some reasonably convenient answer are both standard approaches. Personally, I like the idea of returning an error, since I don't like undefined inputs resulting in apparently accurate outputs with no error. But, historically, the code has always returned some arbitrary but vaguely reasonable response (usually either a blind backwards-projection of current rules or whatever was the prevailing time standard in some reasonably nearby location) instead of producing an error, and there's a backwards compatibility challenge with changing that behavior to produce errors.
The key problem with the change for data consumers is the fact that McMurdo was uninhabited in the 1930s is *external* information, that an application would now need to *separately* know in order to get the correct result for McMurdo.
There's no such thing as a correct result for McMurdo in the 1930s because the question is not well-formed. The application cannot get something that doesn't exist.
The problem I have is that I'm no longer sure I can trust tzdb to safely be the guardian of the limited pre-1970 data which it has always possessed and which Java has long used. I will be talking to Oracle people this week to discuss what options we have for Java probably requiring manual workarounds of the damaged data. <shakes head in despair>
I once again encourage you to start your own separate project. I think that would make quite a few people much happier, including you. -- Russ Allbery (rra@stanford.edu) <http://www.eyrie.org/~eagle/>
On Sep 21, 2013, at 10:51 PM, Russ Allbery <rra@stanford.edu> wrote:
As with any situation with undefined inputs, the output is basically at the discretion of the software, and returning either an error or some reasonably convenient answer are both standard approaches.
I.e., there's limits to the sympathy you're likely to get for a complaint that the software and database changed from returning one technically-incorrect answer to a different technically-incorrect answer. (I suppose that one might get more sympathy for that than for a complaint that it changed from giving a technically-incorrect answer to a technically-correct answer.)

Anybody who depends on the tzdb to give a technically correct answer for times arbitrarily far back in the past is pretty much *guaranteed* to be disappointed.

Anybody who depends on the tzdb to give a technically correct answer for times subsequent to the official legal establishment of some form of standard time as civil time had better be prepared to be disappointed unless they're willing to put forth the effort to find officially-supported information about civil time in the location(s) about which they care and are willing to live with the results being incorrect until that information is incorporated into whatever version of the tzdb they use (whether that's the official version or their own privately-patched version).

Anybody whose goal is to have their APIs return *an* answer for all times, regardless of whether it's technically correct or not, shouldn't worry that much about the accuracy of the tzdb.
Personally, I like the idea of returning an error, since I don't like undefined inputs resulting in apparently accurate outputs with no error. But, historically, the code has always returned some arbitrary but vaguely reasonable response (usually either a blind backwards-projection of current rules or whatever was the prevailing time standard in some reasonably nearby location) instead of producing an error, and there's a backwards compatibility challenge with changing that behavior to produce errors.
+1.
The key problem with the change for data consumers is the fact that McMurdo was uninhabited in the 1930s is *external* information, that an application would now need to *separately* know in order to get the correct result for McMurdo.
There's no such thing as a correct result for McMurdo in the 1930s because the question is not well-formed. The application cannot get something that doesn't exist.
+1.
I completely agree with the analysis of Guy Harris and would add the following note for participants who are not very familiar with the Java APIs.

Up to now, standard Java APIs like the class java.util.GregorianCalendar have forced users to apply timezone calculations even in use cases where that is not appropriate at all. Example: calculating age differences of living persons. While it would be perfectly fine to base such a calculation purely on julian days, without even considering timezone offsets, the sad practice is that users have to use a timezone-dependent data type for such calculations. So they often implicitly apply tz calculations and are therefore strongly dependent on the accuracy of tz data, even pre-1970. Against this background it is surely understandable that some Java users are now so concerned about newly published changes to tz data where they formerly never worried about it, but just took the tz data for granted or didn't even see the tz calculations involved in their own software.

Well, the new JSR-310 date-and-time API (S. Colebourne is one of the project leads) has finally introduced alternative types like LocalDate, which is independent of timezone data. That is a huge improvement. But unfortunately JSR-310 also continues the traditional way of returning *an* answer for ALL times with regard to timezone calculations, i.e. it has no concept of limited validity of such requests - see the new Java class ZonedDateTime. Furthermore, the old timezone-dependent types in Java will continue to exist (probably forever) and are not even declared deprecated - a huge mistake. But the tzdb itself is not responsible for any of this.

Summarizing, I get the impression that the whole discussion here is happening mainly because external API problems are being projected onto the tz mailing list. I think it would be better for external users of the tzdb, such as API designers, to think about new validity concepts when formulating requests for tz data.
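The LocalDate point can be made concrete: a date difference computed with the released java.time API never consults timezone data at all, so no tzdb change, pre- or post-1970, can affect it. A minimal sketch, using arbitrary illustrative dates:

```java
import java.time.LocalDate;
import java.time.Period;
import java.time.temporal.ChronoUnit;

public class AgeDifference {
    public static void main(String[] args) {
        // LocalDate carries no zone, so no tz lookup (and no tzdb data) is involved.
        LocalDate birth = LocalDate.of(1956, 2, 16);  // arbitrary illustrative date
        LocalDate today = LocalDate.of(2013, 9, 22);
        long days = ChronoUnit.DAYS.between(birth, today);
        Period age = Period.between(birth, today);
        System.out.println(days + " days, i.e. " + age.getYears() + " years");
        // prints "21038 days, i.e. 57 years"
    }
}
```

Contrast this with java.util.GregorianCalendar, which always binds the calculation to a TimeZone even when the question is purely calendrical.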
This mailing list can then simply be limited to traditional timekeeping tasks and timezone research. Just my 2 cents. On 22.09.2013 09:30, Guy Harris wrote:
On Sep 21, 2013, at 10:51 PM, Russ Allbery <rra@stanford.edu> wrote:
As with any situation with undefined inputs, the output is basically at the discretion of the software, and returning either an error or some reasonably convenient answer are both standard approaches.

I.e., there's limits to the sympathy you're likely to get for a complaint that the software and database changed from returning one technically-incorrect answer to a different technically-incorrect answer. (I suppose that one might get more sympathy for that than for a complaint that it changed from giving a technically-incorrect answer to a technically-correct answer.)
Anybody who depends on the tzdb to give a technically correct answer for times arbitrarily far back in the past is pretty much *guaranteed* to be disappointed.
Anybody who depends on the tzdb to give a technically correct answer for times subsequent to the official legal establishment of some form of standard time as civil time had better be prepared to be disappointed unless they're willing to put forth the effort to find officially-supported information about civil time in the location(s) about which they care and are willing to live with the results being incorrect until that information is incorporated into whatever version of the tzdb they use (whether that's the official version or their own privately-patched version).
Anybody whose goal is to have their APIs return *an* answer for all times, regardless of whether it's technically correct or not, shouldn't worry that much about the accuracy of the tzdb.
Personally, I like the idea of returning an error, since I don't like undefined inputs resulting in apparently accurate outputs with no error. But, historically, the code has always returned some arbitrary but vaguely reasonable response (usually either a blind backwards-projection of current rules or whatever was the prevailing time standard in some reasonably nearby location) instead of producing an error, and there's a backwards compatibility challenge with changing that behavior to produce errors.

+1.
The key problem with the change for data consumers is the fact that McMurdo was uninhabited in the 1930s is *external* information, that an application would now need to *separately* know in order to get the correct result for McMurdo.

There's no such thing as a correct result for McMurdo in the 1930s because the question is not well-formed. The application cannot get something that doesn't exist.

+1.
Meno Hochschild wrote:
Summarizing, I get the impression that the whole discussion here is happening mainly because external API problems are being projected onto the tz mailing list. I think it would be better for external users of the tzdb, such as API designers, to think about new validity concepts when formulating requests for tz data. This mailing list can then simply be limited to traditional timekeeping tasks and timezone research. Just my 2 cents.
A similar situation has now been created in the PHP API, which has also been switched to using TZ as the 'bible' when it comes to DST information. So the above statement applies ... except that the TZ data needs to return 'invalid' when a request is made that it cannot process, so that we can then revert to an alternate lookup. Alternatively, one simply assumes that anything prior to 1970 is always wrong, and looks up an alternate database anyway?

It may well be that external APIs were wrong to adopt TZ as their database for DST data if it was never going to support accurate historic data. The question is what should be used instead, since SOMETHING is required to provide that data now that APIs have switched to it to properly support DST. McMurdo is a clean example of where pruning pre-1970 data loses perfectly valid and auditable data, and for other areas where we know the historic data diverges but don't have accurate material, it is equally wrong to return data that we know is simply an alternate guess. TZ needs to be honest, and pre-1970 lookups are just as valid as post-1970 ones, even if a much smaller number of hits require that data.

-- Lester Caine - G8HFL ----------------------------- Contact - http://lsces.co.uk/wiki/?page=contact L.S.Caine Electronic Services - http://lsces.co.uk EnquirySolve - http://enquirysolve.com/ Model Engineers Digital Workshop - http://medw.co.uk Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
On Sep 22, 2013, at 9:57 PM, Lester Caine <lester@lsces.co.uk> wrote:
A similar situation has now been created in the PHP API, which has also been switched to using TZ as the 'bible' when it comes to DST information. So the above statement applies ... except that the TZ data needs to return 'invalid' when a request is made that it can not process.
Such as any request for information given a date/time prior to the establishment of some form of standard time in the specified tzdb zone.
So that we can then revert to an alternate lookup.
There do not appear to be any APIs in http://www.php.net/manual/en/refs.calendar.php that take a longitude (and perhaps latitude) as an argument, so there's nothing those APIs can do to convert times prior to the establishment of standard time. I think it's a good idea for the tzdb to, if possible, contain the date of the initial establishment of standard time (which I'd define as a time scheme, specified in law, that applies to *all* of the tzdb zone in question, so this rules out, among other things, "use local mean time at the specific location" as a scheme) for each tzdb zone. Anybody who wants to handle times prior to that time is on their own, at least with respect to the tzdb.
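Converting such pre-standard-time stamps to local mean time is pure arithmetic on the longitude (360 degrees = 24 hours, so one degree = 4 minutes of time), which is exactly why a longitude argument would be needed and the tzdb alone is not sufficient. A minimal sketch; the McMurdo coordinate used here is an assumed figure for illustration, not something taken from the tzdb:

```java
public class LocalMeanTime {
    // 86400 seconds / 360 degrees = 240 seconds of clock time per degree of longitude.
    static long lmtOffsetSeconds(double degreesEastOfGreenwich) {
        return Math.round(degreesEastOfGreenwich * 240.0);
    }

    public static void main(String[] args) {
        // McMurdo Station is at roughly 166.6 degrees east (assumed for illustration):
        long s = lmtOffsetSeconds(166.6);
        System.out.printf("LMT offset: +%d:%02d:%02d%n", s / 3600, (s % 3600) / 60, s % 60);
        // prints "LMT offset: +11:06:24"
    }
}
```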
A concrete example: "Exactly two years before McMurdo was established, blaa blaa seems to have happened there." It is reasonable to have that "two years before" return something reasonable in local time too. I think the 2013e version is the more reasonable output.

2013d:
$ TZ=Antarctica/McMurdo date
Mon Sep 23 18:56:23 NZST 2013
$ TZ=Antarctica/McMurdo date -d '59 years ago'
Thu Sep 23 18:56:24 zzz 1954

2013e:
$ TZ=Antarctica/McMurdo date
Mon Sep 23 18:57:52 NZST 2013
$ TZ=Antarctica/McMurdo date -d '59 years ago'
Thu Sep 23 18:57:55 NZST 1954

Regards, Jaakko
On Sep 22, 2013, at 9:57 PM, Lester Caine <lester@lsces.co.uk> wrote:
A similar situation has now been created in the PHP API, which has also been switched to using TZ as the 'bible' when it comes to DST information. So the above statement applies ... except that the TZ data needs to return 'invalid' when a request is made that it can not process.
On Sun, 22 Sep 2013, Guy Harris wrote:
Such as any request for information given a date/time prior to the establishment of some form of standard time in the specified tzdb zone.
-- Foreca Ltd Jaakko.Hyvatti@foreca.com Keilaranta 1, FI-02150 Espoo, Finland http://www.foreca.com
On Sep 23, 2013, at 12:01 AM, Jaakko Hyvätti <jaakko.hyvatti@foreca.com> wrote:
A concrete example: "Exactly two years before McMurdo was established, blaa blaa seems to have happened there." It is reasonable to have that two years before return something reasonable in local time too.
"Local time" as in "local mean time" (in which case you need to know the longitude of "there" - and the tzdb is neither necessary nor sufficient) or "local time" as in "local time for some zone in which civil time is established" (in which case you need to know what zone that is)?
Guy Harris wrote:
On Sep 22, 2013, at 9:57 PM, Lester Caine <lester@lsces.co.uk> wrote:
A similar situation has now been created in the PHP API, which has also been switched to using TZ as the 'bible' when it comes to DST information. So the above statement applies ... except that the TZ data needs to return 'invalid' when a request is made that it can not process.
Such as any request for information given a date/time prior to the establishment of some form of standard time in the specified tzdb zone.
So that we can then revert to an alternate lookup.
There do not appear to be any APIs in
http://www.php.net/manual/en/refs.calendar.php
that take a longitude (and perhaps latitude) as an argument, so there's nothing those APIs can do to convert times prior to the establishment of standard time.
DateTimeZone was added in PHP 5.2 ... There were problems when it was introduced, since the original attempt trod on the toes of older tools which did things differently. DateTimeZone was introduced using the TZ data as its reference source, but currently there is no warning that, while DateTime transparently goes back in time (and it REQUIRES that a timezone is set), the data returned prior to 1970 may not be appropriate. Obviously this is not a problem that TZ would accept, but it demonstrates that end users - such as myself! - would have no idea that there WAS a problem until we dug deeper. PHP's documentation needs an update, but we still have the problem of plugging this hole going forward?
On 23 September 2013 03:48, Meno Hochschild <mhochschild@gmx.de> wrote:
Up to now, standard Java APIs like the class java.util.GregorianCalendar have forced users to apply timezone calculations even in use cases where that is not appropriate at all. Example: calculating age differences of living persons. While it would be perfectly fine to base such a calculation purely on julian days, without even considering timezone offsets, the sad practice is that users have to use a timezone-dependent data type for such calculations. So they often implicitly apply tz calculations and are therefore strongly dependent on the accuracy of tz data, even pre-1970. Against this background it is surely understandable that some Java users are now so concerned about newly published changes to tz data where they formerly never worried about it, but just took the tz data for granted or didn't even see the tz calculations involved in their own software. Well, the new JSR-310 date-and-time API (S. Colebourne is one of the project leads) has finally introduced alternative types like LocalDate, which is independent of timezone data. That is a huge improvement. But unfortunately JSR-310 also continues the traditional way of returning *an* answer for ALL times with regard to timezone calculations, i.e. it has no concept of limited validity of such requests - see the new Java class ZonedDateTime. Furthermore, the old timezone-dependent types in Java will continue to exist (probably forever) and are not even declared deprecated - a huge mistake. But the tzdb itself is not responsible for any of this.
Exactly, recent changes affect the data seen by every Java developer via GregorianCalendar. For JSR-310, there are two separate issues mixed together here:

1) should a result be returned when using a time-zone for ancient times (pre-1800)? ZonedDateTime and GregorianCalendar do so, because the only alternative is an exception, and that would cause more pain for a typical developer than returning some vague result based on LMT or similar.

2) should a result be returned for recent history (1800-1970)? In this case, plenty of developers will be querying local time, for birth dates, historical documents, contracts etc., and they would expect a reasonable answer. It is clear that tzdb data affecting this period is being changed, and thus users are affected.

Stephen
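Point 1 is observable directly in the java.time API as eventually shipped: ZonedDateTime resolves a pre-standard-time local date-time without throwing, using whatever rules the JRE's bundled tzdb copy supplies. A sketch; the exact offset printed depends on which tzdb release the JRE bundles, so none is claimed here:

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class AncientZonedQuery {
    public static void main(String[] args) {
        // Resolving a 1930 local time in Antarctica/McMurdo: no exception is thrown.
        // The offset is whatever the bundled tzdb rules say for that instant
        // (LMT, a back-projection, or New Zealand time, depending on the tzdb release).
        ZonedDateTime zdt = LocalDateTime.of(1930, 1, 1, 12, 0)
                                         .atZone(ZoneId.of("Antarctica/McMurdo"));
        System.out.println(zdt.getOffset());
    }
}
```

This is the "limited validity" gap Meno describes: the API surface gives the caller no signal that the input predates any meaningful local time.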
Guy/Paul/Russ, I thought I'd explained pretty clearly why this recent change is problematic. All the replies miss the point. The tzdb is not some kind of theoretical project; it is used directly as input data by millions of developers. Those developers have never previously had to cross-reference tzdb data with any other data (i.e. when somewhere was inhabited) to get a reasonable answer to the question "what is local time in 1930?" (reasonable, not accurate). Describing the input as malformed is unhelpful to the debate, because a developer just using the data has no idea from the tzdb that this input *is* malformed. On 22 September 2013 08:43, Guy Harris <guy@alum.mit.edu> wrote:
If you care about getting the *right* answer, the only concerns should be about cases where the current pre-1970 data is known to be correct and the changes will eliminate correct data. Discarding data not known to be correct could just replace one incorrect answer with another. If you only care about getting *an* answer, it obviously doesn't matter.
I don't just want *an* answer, and I'm not obsessed with the *right* answer; my problem here is a *wrong* answer (any clock change, including DST, before 1956 is wrong). Such wrong data implies human activity when there was none. There are a variety of answers I would accept before 1956, including LMT and UTC, but changes in offset I cannot. My problem more generally is tinkering: making changes from one guesswork answer to another guesswork answer.

Note that if you read the above, there is a solution here. It is absolutely fine to say that 1930 is bad input data for McMurdo, but it is absolutely essential for the tzdb to provide that data. What is completely objectionable is to say that we need some other source of data to find that out. Why? Firstly, because the tzdb data has previously been complete within itself; now it is not. Secondly, because there is no such data source that maps tzdb IDs to habitation/accuracy dates (since 1800). On 22 September 2013 07:40, Paul Eggert <eggert@cs.ucla.edu> wrote:
More generally, the tz database isn't designed to answer questions about which parts of the Earth were inhabited when, and it's implausible that actual users would use it that way
I'm not asking for that!!! I'm simply asking the tzdb to provide reasonable local time information since the start of global offset fixing for the IDs it provides, keeping the data stable where entries are guesswork. From my perspective as a data consumer, that is what the data has always provided. I do wish there were a little more acceptance of how the data in tzdb is actually being used. I'm your customer, and I, and those downstream of me, only see the data, not the rationale/discussion/justification/theory made here. Focus solely on the data visible downstream, and my problem should be obvious. Stephen On 22 September 2013 06:51, Russ Allbery <rra@stanford.edu> wrote:
Stephen Colebourne <scolebourne@joda.org> writes:
We would not and should not create an ID for an uninhabited location, but where somewhere is or was inhabited we should make best efforts to define accurate data. The new McMurdo data is clearly not accurate prior to 1956.
There is no such thing as local time in McMurdo prior to 1956. There is no standard for accuracy; the entire concept of accuracy of such a thing is meaningless. Local time is not a physical property. It's something created by humans who make shared rules about how to set their clocks, and in the absence of human presence, it doesn't exist. Local time in McMurdo prior to its habitation is undefined.
To use a Java analogy, you're doing the equivalent of complaining that finalize() isn't running at the point in your program where you expected it to and where it ran in a previous release of the JVM. You're getting about as much sympathy here as you'd get with that plea in a Java community.
As with any situation with undefined inputs, the output is basically at the discretion of the software, and returning either an error or some reasonably convenient answer are both standard approaches. Personally, I like the idea of returning an error, since I don't like undefined inputs resulting in apparently accurate outputs with no error. But, historically, the code has always returned some arbitrary but vaguely reasonable response (usually either a blind backwards-projection of current rules or whatever was the prevailing time standard in some reasonably nearby location) instead of producing an error, and there's a backwards compatibility challenge with changing that behavior to produce errors.
The key problem with the change for data consumers is the fact that McMurdo was uninhabited in the 1930s is *external* information, that an application would now need to *separately* know in order to get the correct result for McMurdo.
There's no such thing as a correct result for McMurdo in the 1930s because the question is not well-formed. The application cannot get something that doesn't exist.
The problem I have is that I'm no longer sure I can trust tzdb to safely be the guardian of the limited pre-1970 data which it has always possessed and which Java has long used. I will be talking to Oracle people this week to discuss what options we have for Java probably requiring manual workarounds of the damaged data. <shakes head in despair>
I once again encourage you to start your own separate project. I think that would make quite a few people much happier, including you.
-- Russ Allbery (rra@stanford.edu) <http://www.eyrie.org/~eagle/>
Stephen Colebourne <scolebourne@joda.org> writes:
Guy/Paul/Russ, I thought I'd explained pretty clearly why this recent change is problematic. All the replies miss the point.
I got your point just fine. I just think you're wrong. It's quite possible for me to have a complete understanding of your position and continue to disagree with you, as is the case here.
The tzdb is not some kind of theoretical project; it is used directly as input data by millions of developers. Those developers have never previously had to cross-reference tzdb data with any other data (i.e. when somewhere was inhabited) to get a reasonable answer to the question "what is local time in 1930?" (reasonable, not accurate).
And this has not changed. "What is local time in 1930" still returns a reasonable but not accurate response for McMurdo, insofar as such a thing exists given that the question is undefined. It's just a different reasonable but not accurate answer than it used to return, similar to how finalize() now runs at a different but still reasonable point in a Java program with a new version of the JVM.
Describing the input as malformed is unhelpful to the debate, because a developer just using the data has no idea from the tzdb that this input *is* malformed.
I completely agree that this would be a nice thing to fix. That was my point about preferring to return errors for undefined inputs.

However, it's very difficult to do this for exactly the same sorts of reasons as why finalize() is still part of the Java language despite the fact that almost every use of it is wrong. This software and database has existed for many years, and its behavior has always been to return a reasonable but inaccurate response for dates in the past prior to standardized time.

If we had a time machine to go back and change the original behavior to cause localtime() to return an error for such inputs, that would probably be, overall, a better situation. However, we don't. The current reality is that innumerable programs exist in the wild that will find localtime() failing to be highly surprising. Yes, POSIX and other standards say that it *can* fail, but in practice it *doesn't* fail, which means that a lot of software does not handle the failure case at all.

Making this change would probably involve creating a new interface (not only in C but in the other languages that have relied on the historic behavior of the API) that can now return errors for undefined questions, and then a lot of data collection about where the boundaries of undefined should be. It's quite a large project. I do think the world would be a better place if someone completed that project, but I also don't think that it's that horribly important. Computing has survived for many years with the current behavior.
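One way to picture the kind of new interface being described here is a lookup that can report "undefined" instead of always producing an offset. This is purely a hypothetical sketch: the BoundedZone class, its single fixed offset, and the habitation date it requires are all invented for illustration, since the tzdb records no such boundary today.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.util.Optional;

// Hypothetical: a zone paired with the (externally supplied) date from which
// its local time is considered defined.
public class BoundedZone {
    private final ZoneOffset offset;
    private final LocalDateTime definedFrom;

    public BoundedZone(ZoneOffset offset, LocalDateTime definedFrom) {
        this.offset = offset;
        this.definedFrom = definedFrom;
    }

    // An empty result models "local time did not exist here at that point",
    // instead of silently back-projecting an offset.
    public Optional<ZoneOffset> offsetAt(LocalDateTime local) {
        return local.isBefore(definedFrom) ? Optional.empty() : Optional.of(offset);
    }

    public static void main(String[] args) {
        BoundedZone mcmurdo = new BoundedZone(ZoneOffset.ofHours(12),
                                              LocalDateTime.of(1956, 2, 16, 0, 0));
        System.out.println(mcmurdo.offsetAt(LocalDateTime.of(1930, 1, 1, 0, 0)));
        System.out.println(mcmurdo.offsetAt(LocalDateTime.of(1960, 1, 1, 0, 0)));
    }
}
```

Callers would then be forced to handle the empty case, which is exactly the behavioral break with existing software that makes the migration such a large project.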
I don't just want *an* answer, and I'm not obsessed by the *right* answer, my problem here is a *wrong* answer (any clock change including DST before 1956 is wrong).
I don't see why you think that a DST shift somehow crosses some line into making the database responses unreasonable.
Such wrong data implies human activity when there was none.
I don't see any such implication. It's a simple backward-projection of current rules into the past. If anything, it's your proposal that implies human activity and makes the clocks less accurate, since it implies that at some point someone made a conscious decision to introduce DST to a location that previously had non-DST local time. But that's not what happened. Instead, people brought the existing DST rules (and time zone) of their staging base with them, and if you had asked those original inhabitants about times prior to their arrival, they would have projected those rules backwards, just like the database now does.
On Sep 22, 2013, at 5:17 AM, Stephen Colebourne <scolebourne@joda.org> wrote:
On 22 September 2013 08:43, Guy Harris <guy@alum.mit.edu> wrote:
If you care about getting the *right* answer, the only concerns should be about cases where the current pre-1970 data is known to be correct and the changes will eliminate correct data. Discarding data not known to be correct could just replace one incorrect answer with another. If you only care about getting *an* answer, it obviously doesn't matter.
I don't just want *an* answer, and I'm not obsessed by the *right* answer, my problem here is a *wrong* answer (any clock change including DST before 1956 is wrong).
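[Editor's note: the complaint is easy to reproduce. The sketch below assumes Python's standard zoneinfo module and a tzdata release in which Antarctica/McMurdo is a Link to Pacific/Auckland, as it became in 2013e.]

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

mcmurdo = ZoneInfo("Antarctica/McMurdo")

# January 1932 is well before the station was founded (February 1956), yet
# with Antarctica/McMurdo linked to Pacific/Auckland the database reports
# New Zealand's 1932 offset and summer-time rules for this instant.
dt = datetime(1932, 1, 15, tzinfo=timezone.utc).astimezone(mcmurdo)
print(dt.utcoffset(), dt.dst())
```

The returned dst() value for this instant reflects whatever New Zealand's summer-time rules were in 1932, which is exactly the "wrong answer for an uninhabited location" being objected to here.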
OK, so, in that particular case, given that it would be misleading to say DST was in effect or that the time was some form of New Zealand time prior to the establishment of the base, I'd be inclined to say that the old entry for McMurdo Sound should perhaps be re-instated, complete with "zzz".

As for the general case, the "Scope of the tz database" section of the Theory file says:

    The tz database attempts to record the history and predicted future of all computer-based clocks that track civil time.

The Wikipedia article for "civil time" says:

    In modern usage, civil time refers to statutory time scales designated by civilian authorities, or to local time indicated by clocks. Modern civil time is generally standard time in a time zone at a fixed offset from Coordinated Universal Time (UTC) or from Greenwich Mean Time (GMT), possibly adjusted by daylight saving time during part of the year. UTC is calculated by reference to atomic clocks, and was adopted in 1972. Older systems use telescope observations.

    In traditional astronomical usage, civil time was mean solar time reckoned from midnight. Before 1925, the astronomical time 00:00:00 meant noon, twelve hours after the civil time 00:00:00 which meant midnight. HM Nautical Almanac Office in the United Kingdom used Greenwich Mean Time (GMT) for both conventions, leading to ambiguity, whereas the Nautical Almanac Office at the United States Naval Observatory used GMT for the pre-1925 convention and Greenwich Civil Time (GCT) for the post-1924 convention until 1952. In 1928, the International Astronomical Union introduced the term Universal Time for GMT beginning at midnight, but the two Nautical Almanac Offices did not accept it until 1952.

which doesn't give a definitive answer as to whether "civil time" has to refer to "statutory time scales designated by civilian authorities".
My inclination would be to explicitly state that it *does* refer to statutory time scales, other than local mean time, designated by a government authority, i.e. that, prior to the establishment in law of a time scale other than "use local mean time for *your* location", the tzdb doesn't attempt to record anything - it only records the point in time at which that standardized time scale was first established. I'd then have the first Zone line for a tzdb zone give, as the Until column, the point in time at which the standardized time scale was first established. I don't strongly care what time offset or zone abbreviation is chosen for them; my personal choice would be to use the initially established standardized time offset and time zone abbreviation (so as to project standardized time infinitely far back into the past).
Stephen Colebourne wrote:
I'm simply asking the tzdb to provide reasonable local time information since the start of global offset fixing for the IDs it provides
That's being done for McMurdo -- as far as we know, its entry is accurate for all time stamps used since it was founded, and (as Russ Allbery pointed out) for time stamps before McMurdo was founded and for which localtime is undefined, it's reasonable to interpret the new entry as being more "accurate" than the old.
keeping the data stable if entries are guesswork.
That's not a reasonable request even for near-future time stamps that end users actually care about, much less for these long-ago time stamps for uninhabited locations, where they don't care. We're about to change near-future data (even though our changes are guesswork) for Tocantins and for Jordan. This is not because we *like* making guesses; it's because guessing is the best we can *do*, the system does not allow us to refuse to make these guesses, and better guesses are preferable to worse guesses.
On Sep 22, 2013, at 3:54 PM, Paul Eggert <eggert@cs.ucla.edu> wrote:
Stephen Colebourne wrote:
I'm simply asking the tzdb to provide reasonable local time information since the start of global offset fixing for the IDs it provides
That's being done for McMurdo -- as far as we know, its entry is accurate for all time stamps used since it was founded, and (as Russ Allbery pointed out) for time stamps before McMurdo was founded and for which localtime is undefined, it's reasonable to interpret the new entry as being more "accurate" than the old.
The old entry said that Something Changed in 1956; the new entry says "just like Auckland". If nobody ever kept standardized time there before the station was established, then I wouldn't say the new entry is *more* accurate, as the answer to "what was the UTC offset and DST rules, if any, for McMurdo Sound before the station was established?" would, as I see it, be "mu!", so all possible answers other than failing are equally inaccurate. If there were people there who *did* keep standardized time between 1868-11-02 and {whatever the appropriate date is for the establishment of the station}, *and* they kept New Zealand time, then I *would* say the new entry is more accurate.
Recent arguments on this list mostly sound like arguments in semantics. But I believe the primary concern many of us have is that the data provided by the database has been changed unnecessarily, forcing rework and re-interpretation for those that use the database in ways other than those assumed by the maintainers. In many cases this will mean using data from other sources, when the data could easily have been maintained in a single location.
David Patte ₯ <dpatte@relativedata.com> writes:
But I believe the primary concern many of us have is that the data provided by the database has been changed unnecessarily, forcing rework and re-interpretation for those that use the database in ways other than those assumed by the maintainers. In many cases this will mean using data from other sources, when the data could easily have been maintained in a single location.
Many people have asked repeatedly for a specific example of a problem, any problem, that occurred or would occur in a real-world program due to these changes. Those requests have been to no avail; all that people have posted in return are opinions, theoretical discussions, and misunderstandings of the database or its maintenance practices.

It sounds like you have a concrete example at your fingertips, so please enlighten us about the details! In particular, what would be extremely useful is a specific case involving real-world code where one of the changes in 2013e caused or will cause an actual program to break or misbehave. Please include the specific use case, why the program was using that tzid, what the program did prior to the data change, what the program did after the data change, and why you believe the new behavior is incorrect. And please describe exactly what change to code (rework) you had to do or believe you have to do in order to remedy this problem.

Contentless assertions that this has happened or will happen are neither useful nor actionable. Real-world test cases are both, and are considerably more persuasive. -- Russ Allbery (rra@stanford.edu) <http://www.eyrie.org/~eagle/>
On 23 September 2013 02:27, David Patte ₯ <dpatte@relativedata.com> wrote:
Recent arguments on this list mostly sound like arguments in semantics.
But I believe the primary concern many of us have is that the data provided by the database has been changed unnecessarily, forcing rework and re-interpretation for those that use the database in ways other than those assumed by the maintainers. In many cases this will mean using data from other sources, when the data could easily have been maintained in a single location.
+1
From my perspective as a consumer of the data, the tzdb now provides worse data than it did before. The rationale for making the changes at all is very weak, and the changes made are pretty arbitrary (e.g., fixes are applied to Switzerland in the 1940s, yet McMurdo now contains nonsense for the 1940s). The follow-up emails jump through huge hoops and use weasel words to try to justify these changes, because they are not enhancements, just change for change's sake.
The tzdb should be a simple project in data terms: future changes, plus enhancements to past data only. These ridiculous cleanups are a huge net negative. Stephen
Guy Harris wrote:
If there were people there who *did* keep standardized time between 1868-11-02 and {whatever the appropriate date is for the establishment of the station}, *and* they kept New Zealand time, then I *would* say the new entry is more accurate.
Certainly people kept standardized time at that location before McMurdo was officially established, and even before the arbitrary cutoff of 1956-01-01 00:00:00 UT that was in the 2013d entry. Most likely the 1955 advance party kept New Zealand time, so in that (limited) sense the 2013e entry is more accurate than the 2013d entry was. (Not that any of this matters to any tz end users....)
Stephen Colebourne wrote:
An uninhabited location would effectively be on LMT
That's not been the usual practice with the tz database. LMT has been a standin for "there were clocks here, and they were on local mean time or solar time or something like that, and users who care about the details are barking up the wrong tree". It's not just the LMT offsets; even the transition times from LMT to standard time are mostly standins that should not be taken seriously, for reasons already discussed. And even if we were to change the tz database to agree with this new interpretation, the old McMurdo data would still be incorrect, because it did not say that McMurdo was on LMT while uninhabited. More generally, the tz database isn't designed to answer questions about which parts of the Earth were inhabited when, and it's implausible that actual users would use it that way, not merely because it's an odd question for them to ask, but also because they'd typically get the wrong answer regardless of which version of the tz data they used.
On Sep 21, 2013, at 9:47 PM, Stephen Colebourne <scolebourne@joda.org> wrote:
The problem I have is that I'm no longer sure I can trust tzdb to safely be the guardian of the limited pre-1970 data which it has always possessed and which Java has long used. I will be talking to Oracle people this week to discuss what options we have for Java probably requiring manual workarounds of the damaged data. <shakes head in despair>
If you care about getting the *right* answer, the only concerns should be about cases where the current pre-1970 data is known to be correct and the changes will eliminate correct data. Discarding data not known to be correct could just replace one incorrect answer with another. If you only care about getting *an* answer, it obviously doesn't matter.
(BTW, the "everywhere was uninhabited" point does not make sense. An uninhabited location would effectively be on LMT,
As per my +1 to Russ Allbery's mail, an uninhabited location is on "mu!" time, unless there are living beings at that location that keep time, or there are humans who keep track of time at that location even though there's nobody there. (If you're trying to, for example, determine where the Sun is in the sky, or something such as that, at that location at a given time, the entire tzdb is irrelevant.)
Only locations like McMurdo change from uninhabited to inhabited at a known date, and LMT should apply before that date)
As far as I'm concerned, the tzdb data needn't say anything about time in a given tzdb zone prior to the establishment of standard time for that zone, and the tz code is entitled to return, for that zone, whatever it wants for times before the establishment of standard time in that zone; my inclination would be to project standard time, without DST rules, back into -infinity. Other code using the tzdb data could choose what to do; if, for example, that code has an API that's passed a time and a longitude and latitude, they could look up the tzdb zone containing that location and, if the time is prior to the establishment of standard time as official civil time in the region covered by that zone, calculate LMT based on the longitude.
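[Editor's note: the longitude-based LMT fallback described above is simple arithmetic, sketched below. The function name is illustrative; "LMT" here means mean solar time, one hour per 15 degrees of longitude, east positive.]

```python
from datetime import timedelta

def lmt_offset(longitude_deg: float) -> timedelta:
    """Local mean time offset from UT: one hour per 15 degrees of
    longitude, with east longitudes positive."""
    return timedelta(hours=longitude_deg / 15.0)

# McMurdo Sound lies at roughly 166.668 degrees east, so its local mean
# time is roughly UT+11:06:40.
offset = lmt_offset(166.668)
```

Code with a latitude/longitude API could apply this before the establishment of standard time in the region covered by the zone, and switch to the tzdb data afterwards.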
Guy Harris wrote:
Other code using the tzdb data could choose what to do; if, for example, that code has an API that's passed a time and a longitude and latitude, they could look up the tzdb zone containing that location and, if the time is prior to the establishment of standard time as official civil time in the region covered by that zone, calculate LMT based on the longitude.
Exactly what I've been asking as the base line .... http://www.openstreetmap.org/browse/node/515065315 is just missing a start time and a link to a timezone ... and you can even get it as an XML packet :) -- Lester Caine - G8HFL ----------------------------- Contact - http://lsces.co.uk/wiki/?page=contact L.S.Caine Electronic Services - http://lsces.co.uk EnquirySolve - http://enquirysolve.com/ Model Engineers Digital Workshop - http://medw.co.uk Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
Lester Caine wrote:
Guy Harris wrote:
Other code using the tzdb data could choose what to do; if, for example, that code has an API that's passed a time and a longitude and latitude, they could look up the tzdb zone containing that location and, if the time is prior to the establishment of standard time as official civil time in the region covered by that zone, calculate LMT based on the longitude.
Exactly what I've been asking as the base line .... http://www.openstreetmap.org/browse/node/515065315 is just missing a start time and a link to a timezone ... and you can even get it as an XML packet :)
Which has now been addressed ... but I need to know just WHAT timezone data link to add :) -- Lester Caine - G8HFL
participants (8)
- David Patte ₯
- Guy Harris
- Jaakko Hyvätti
- Lester Caine
- Meno Hochschild
- Paul Eggert
- Russ Allbery
- Stephen Colebourne