Unacceptable recent changes [was Re: [PATCH 2/4] Move obsolescent Americas entries to 'backward'.]
On 26 August 2013 18:16, Paul Eggert <eggert@cs.ucla.edu> wrote:
 # Shanks & Pottenger say that Atikokan has agreed with Rainy River
 # ever since standard time was introduced, but the information from
 # McKinnon sounds more authoritative. For now, assume that Atikokan
 # switched to EST immediately after WWII era daylight saving time
 # ended. This matches the old (less-populous) America/Coral_Harbour
 # entry since our cutoff date of 1970, so we can move
-# America/Coral_Harbour to the 'backward' file.
+# America/Coral_Harbour to the 'backward' file. And Atikokan itself
+# is the same as America/Panama since 1970, so we can move
+# America/Atikokan to the 'backward' file as well.
-Zone America/Atikokan	-6:06:28 -	LMT	1895
-			-6:00	Canada	C%sT	1940 Sep 29
-			-6:00	1:00	CDT	1942 Feb 9 2:00s
-			-6:00	Canada	C%sT	1945 Sep 30 2:00
-			-5:00	-	EST
So if I understand correctly, all changes to Atikokan before 1970 have just been lost?
And they are supposed to use Panama from now on?
Here is what JSR-310 can tell us about the two sets of rules:
America/Atikokan
Transition[Gap at 1895-01-01T00:00-06:06:28 to -06:00]
Transition[Gap at 1918-04-14T02:00-06:00 to -05:00]
Transition[Overlap at 1918-10-27T02:00-05:00 to -06:00]
Transition[Gap at 1940-09-29T00:00-06:00 to -05:00]
America/Panama
Transition[Overlap at 1890-01-01T00:00-05:18:08 to -05:19:36]
Transition[Gap at 1908-04-22T00:00-05:19:36 to -05:00]
Merging these is clearly utterly RIDICULOUS.
They contain completely different data, and the 1970 argument simply won't wash. Many users will have stored data in databases or other long term storage that refers to these time-zones being made obsolete. When they read the data back, they will get completely different times for historic events - utterly unacceptable.
More broadly, no one can pretend that the IDs we use are meaningless. Whether they should or should not have been created this way is not relevant - what matters is that these IDs are meaningful and getting rid of them would be a disaster. The IDs are clearly localized, and they have a huge and longstanding usage globally. Panama is nowhere near Atikokan, nor would any reasonable user connect the two.
Paul, you need to STOP NOW.
You need to reverse the vast majority of your recent changes. Focus on actual changes happening in 2013. Keep the rest of the database totally and utterly stable. And just accept that there is a measure of politics incumbent in time-zones.
NONE of these changes should be happening. STABILITY is vital.
Stephen
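To make the stored-data concern concrete, here is a minimal Python sketch (an illustration, not code from JSR-310 or tz) that encodes the two JSR-310 transition listings above as UTC instants and renders one stored UTC timestamp under each. It assumes the listings are complete for the period shown and models only the total UTC offset, ignoring abbreviations and the DST flag; the 1930 timestamp and all function names are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime, timedelta, timezone


def utc(*fields):
    """Build an aware UTC datetime from year, month, day[, hour, minute, second]."""
    return datetime(*fields, tzinfo=timezone.utc)


def offset(hours, minutes=0, seconds=0):
    """Signed UTC offset; pass every component with the same sign."""
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


# Each zone is (offset before the first transition, [(UTC instant, new offset), ...]).
# The wall-clock transition times from the JSR-310 listing are converted to UTC
# by subtracting the offset in force just before each transition.
ATIKOKAN_OLD = (offset(-6, -6, -28), [
    (utc(1895, 1, 1, 6, 6, 28), offset(-6)),
    (utc(1918, 4, 14, 8), offset(-5)),
    (utc(1918, 10, 27, 7), offset(-6)),
    (utc(1940, 9, 29, 6), offset(-5)),
])
PANAMA = (offset(-5, -18, -8), [
    (utc(1890, 1, 1, 5, 18, 8), offset(-5, -19, -36)),
    (utc(1908, 4, 22, 5, 19, 36), offset(-5)),
])


def offset_at(zone, instant):
    """Return the UTC offset in force at an aware UTC instant."""
    initial, transitions = zone
    i = bisect_right([when for when, _ in transitions], instant)
    return initial if i == 0 else transitions[i - 1][1]


def wall_clock(zone, instant):
    """Render a stored UTC instant as naive local wall-clock time."""
    return (instant + offset_at(zone, instant)).replace(tzinfo=None)


if __name__ == "__main__":
    stored = utc(1930, 6, 1, 18)  # an arbitrary pre-1970 event
    print("Atikokan:", wall_clock(ATIKOKAN_OLD, stored))  # 1930-06-01 12:00:00
    print("Panama:  ", wall_clock(PANAMA, stored))        # 1930-06-01 13:00:00
```

Both zones report -5:00 for any instant after 1970, which is the equivalence the patch relies on; for most instants before 1940 they differ, by as much as an hour.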
I agree. While the primary focus of the TZ effort is complete coverage of dates from 1970 onward, it has a large body of knowledge relating to older dates. The fact that the project has no commitment to make that older information 100% complete is not even remotely a valid argument for removing it outright.
paul
On Aug 28, 2013, at 12:20 PM, Stephen Colebourne <scolebourne@joda.org> wrote:
On 26 August 2013 18:16, Paul Eggert <eggert@cs.ucla.edu> wrote:
 # Shanks & Pottenger say that Atikokan has agreed with Rainy River
 # ever since standard time was introduced, but the information from
 # McKinnon sounds more authoritative. For now, assume that Atikokan
 # switched to EST immediately after WWII era daylight saving time
 # ended. This matches the old (less-populous) America/Coral_Harbour
 # entry since our cutoff date of 1970, so we can move
-# America/Coral_Harbour to the 'backward' file.
+# America/Coral_Harbour to the 'backward' file. And Atikokan itself
+# is the same as America/Panama since 1970, so we can move
+# America/Atikokan to the 'backward' file as well.
-Zone America/Atikokan	-6:06:28 -	LMT	1895
-			-6:00	Canada	C%sT	1940 Sep 29
-			-6:00	1:00	CDT	1942 Feb 9 2:00s
-			-6:00	Canada	C%sT	1945 Sep 30 2:00
-			-5:00	-	EST
So if I understand correctly, all changes to Atikokan before 1970 have just been lost?
And they are supposed to use Panama from now on?
Here is what JSR-310 can tell us about the two sets of rules:
America/Atikokan
Transition[Gap at 1895-01-01T00:00-06:06:28 to -06:00]
Transition[Gap at 1918-04-14T02:00-06:00 to -05:00]
Transition[Overlap at 1918-10-27T02:00-05:00 to -06:00]
Transition[Gap at 1940-09-29T00:00-06:00 to -05:00]
America/Panama
Transition[Overlap at 1890-01-01T00:00-05:18:08 to -05:19:36]
Transition[Gap at 1908-04-22T00:00-05:19:36 to -05:00]
Merging these is clearly utterly RIDICULOUS.
They contain completely different data, and the 1970 argument simply won't wash. Many users will have stored data in databases or other long term storage that refers to these time-zones being made obsolete. When they read the data back, they will get completely different times for historic events - utterly unacceptable.
More broadly, no one can pretend that the IDs we use are meaningless. Whether they should or should not have been created this way is not relevant - what matters is that these IDs are meaningful and getting rid of them would be a disaster. The IDs are clearly localized, and they have a huge and longstanding usage globally. Panama is nowhere near Atikokan, nor would any reasonable user connect the two.
Paul, you need to STOP NOW.
You need to reverse the vast majority of your recent changes. Focus on actual changes happening in 2013. Keep the rest of the database totally and utterly stable. And just accept that there is a measure of politics incumbent in time-zones.
NONE of these changes should be happening. STABILITY is vital.
Stephen
On Wed, Aug 28, 2013 at 12:20 PM, Stephen Colebourne <scolebourne@joda.org> wrote:
They contain completely different data, and the 1970 argument simply won't wash. Many users will have stored data in databases or other long term storage that refers to these time-zones being made obsolete. When they read the data back, they will get completely different times for historic events - utterly unacceptable.
I second this -- we actively use the database for *all* dates, including those pre-1970, because historical financial information extends back to 1895 in some cases. There is no other database that captures this data accurately. If the IANA tz database is shrugging off all the work that went into making this database what it is, we'll have no choice but to fork it.
-Andrew
On 08/28/13 09:20, Stephen Colebourne wrote:
Many users will have stored data in databases or other long term storage that refers to these time-zones being made obsolete.
The zones are not being made obsolete. TZ=America/Atikokan will still work, and will give the same results as before for post-1970 timestamps, which are the only timestamps in scope for this project.
I sincerely doubt that the proposed change will cause any problem in practice. It's simply not the case that "many users" have pre-1970 timestamps for Atikokan and will notice or care about the proposed change. After all, the pre-1970 data for Atikokan was almost surely incorrect, and nobody cared about that either.
That being said, I do take the point that we should not discard the old data even if it's wrong or problematic, so I'll work on a further patch that will resurrect it, along the lines already suggested.
On 28 August 2013 17:46, Paul Eggert <eggert@cs.ucla.edu> wrote:
On 08/28/13 09:20, Stephen Colebourne wrote:
Many users will have stored data in databases or other long term storage that refers to these time-zones being made obsolete.
The zones are not being made obsolete. TZ=America/Atikokan will still work, and will give the same results as before for post-1970 timestamps, which are the only timestamps in scope for this project.
Some systems will automatically map the old "backward" forms of IDs to the "correct" new forms. Thus users will see their Atikokan replaced by Panama, something which is clearly ridiculous. The "backward" file is intended for IDs that have been replaced on spelling or city size grounds, not for IDs which you no longer feel like maintaining.
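The remapping described here is easy to picture. Below is a minimal sketch, assuming a consumer that reads zic-style `Link TARGET LINK-NAME` lines and always reports the final target; the `BACKWARD` excerpt is a hypothetical model of the proposed patch, not the contents of the real 'backward' file, and the function names are illustrative.

```python
def parse_links(text):
    """Map each link name to its target, from zic lines 'Link TARGET LINK-NAME'."""
    links = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) == 3 and fields[0] == "Link":
            _, target, name = fields
            links[name] = target
    return links


def canonical(zone_id, links):
    """Follow links until reaching a name that is not itself a link."""
    seen = set()
    while zone_id in links:
        if zone_id in seen:
            raise ValueError(f"link cycle involving {zone_id}")
        seen.add(zone_id)
        zone_id = links[zone_id]
    return zone_id


# Hypothetical 'backward' excerpt modelling the proposed patch.
BACKWARD = """
Link America/Panama    America/Atikokan
Link America/Atikokan  America/Coral_Harbour
"""

links = parse_links(BACKWARD)
print(canonical("America/Coral_Harbour", links))  # America/Panama
```

A system that stores the canonical name in place of the user's ID would therefore record America/Panama for an Atikokan user.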
I sincerely doubt that the proposed change will cause any problem in practice. It's simply not the case that "many users" have pre-1970 timestamps for Atikokan and will notice or care about the proposed change. After all, the pre-1970 data for Atikokan was almost surely incorrect, and nobody cared about that either.
Yes, they really do care. By being the world's only real source of TZ data, the database de facto defines what time was around the world pre-1970. Clearly you do not like that data, as you are treating it with utter disdain. The pre-1970 data (for many different zones) is encoded in a vast number of systems around the world, in databases and more. Removing or otherwise hiding it forces either a fork or your removal. To be clear: deletion of that data, or making it far harder to access, is unacceptable.
That being said, I do take the point that we should not discard the old data even if it's wrong or problematic, so I'll work on a further patch that will resurrect it, along the lines already suggested.
That isn't good enough. Sorry to be unsubtle, but I think we're approaching vote-of-no-confidence territory here. Your actions are causing severe damage to the database built up over many years, and that cannot be allowed to stand.
Stephen
On 08/28/13 09:55, Stephen Colebourne wrote:
the database de facto defines what time was around the world pre-1970
It absolutely does not do that. For the vast majority of pre-1970 history, the tz database's time stamps are simply incorrect. Any attempt to pass the tz database off as the definition of time before 1970 should be unacceptable to anybody who cares about the facts. (And that includes astrologers. :-)
That is not a good reason to destroy what has been correct and working for pre-1970 data. And there is no need to put smilies onto the mention of astrologers. They, like Thomas Shanks, Rique Pottenger and Neil Michelsen, have been the most important compilers of timezone history information, and should be honored within the TZ project, not 'smilied' at.
On 28.08.13 19:40, Paul Eggert wrote:
It absolutely does not do that. For the vast majority of pre-1970 history, the tz database's time stamps are simply incorrect. Any attempt to pass the tz database off as the definition of time before 1970 should be unacceptable to anybody who cares about the facts. (And that includes astrologers. :-)
On Wed, Aug 28, 2013, at 15:34, Alois Treindl wrote:
And there is no need to put smilies onto the mention of astrologers.
They, like Thomas Shanks, Rique Pottenger and Neil Michelsen, have been the most important compilers of timezone history information, and should be honored within the TZ project, not 'smilied' at.
Some astrologers are also responsible for a lawsuit that disrupted the maintenance of this project for a while recently, so it's unsurprising to find some hard feelings here.
Alois Treindl <alois@astro.ch> writes:
That is not a good reason to destroy what has been correct and working for pre-1970 data.
The data is not destroyed. There are very good retrospective tools for Git; digging these things up is easy.
Thanks,
PM
Petr Machata <pmachata@redhat.com> wrote:
Alois Treindl <alois@astro.ch> writes:
That is not a good reason to destroy what has been correct and working for pre-1970 data.
The data is not destroyed. There are very good retrospective tools for Git; digging these things up is easy.
I don't think this argument works, because for timezonedb consumers the data is not available. Only developers and experts can do anything with the (unofficial) Git repository.
Derick -- http://derickrethans.nl | http://xdebug.org twitter: @derickr and @xdebug
On 28 August 2013 18:40, Paul Eggert <eggert@cs.ucla.edu> wrote:
On 08/28/13 09:55, Stephen Colebourne wrote:
the database de facto defines what time was around the world pre-1970
It absolutely does not do that. For the vast majority of pre-1970 history, the tz database's time stamps are simply incorrect. Any attempt to pass the tz database off as the definition of time before 1970 should be unacceptable to anybody who cares about the facts.
The point I'm making is that end-users of this data assume it to be stable and reliable. Moreover, the vast majority of those users do not care whether the data pre-1970 is accurate or not. Stable yes, accurate no.
What your changes do is change the pre-1970 times for places to be the pre-1970 times of some other location entirely. How can you not see that this is unreasonable?
I'll say it again: the pre-1970 data is used by many users globally. Changing it from something which may or may not be accurate (the LMT is usually accurate) to somewhere else's pre-1970 times (definitely inaccurate, especially the LMT) is nonsense. I say this on behalf of a very large number of people who just use the data and don't ever think about it. Your changes are going to severely damage that data, and that is absolutely unacceptable.
To be honest, I'm really surprised that you're attempting to make these changes. Destruction of data is a huge no-no to me. Stability, stability, stability.
Stephen
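The LMT point can be checked with simple arithmetic: mean solar time shifts by four minutes per degree of longitude, so an LMT offset in the data implies a longitude. The sketch below (illustrative only; the function names are hypothetical) applies that to the two LMT values quoted earlier in this thread, -6:06:28 for Atikokan and -5:18:08 for Panama.

```python
def parse_offset(text):
    """Parse a zic-style offset such as '-6:06:28' into signed seconds."""
    sign = -1 if text.startswith("-") else 1
    parts = [int(p) for p in text.lstrip("+-").split(":")]
    hours, minutes, seconds = parts + [0] * (3 - len(parts))
    return sign * (hours * 3600 + minutes * 60 + seconds)


def implied_longitude(offset_seconds):
    """Longitude in degrees (east positive) whose mean solar time has this offset.

    Mean solar time shifts by 4 minutes (240 seconds) per degree of longitude.
    """
    return offset_seconds / 240


atikokan = parse_offset("-6:06:28")  # Atikokan LMT from the Zone entry above
panama = parse_offset("-5:18:08")    # Panama LMT from the JSR-310 listing
print(round(implied_longitude(atikokan), 2))  # -91.62
print(round(implied_longitude(panama), 2))    # -79.53
print(panama - atikokan)                      # 2900 (seconds, i.e. 48 min 20 s)
```

The implied longitudes, about 91.6°W and 79.5°W, roughly match the two places, and the 48 minutes 20 seconds between them is the error a reader would get by applying Panama's LMT to Atikokan.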
Stephen Colebourne <scolebourne@joda.org> writes:
The point I'm making is that end-users of this data assume it to be stable and reliable. Moreover, the vast majority of those users do not care whether the data pre-1970 is accurate or not. Stable yes, accurate no.
I don't think you can assume stability either. We occasionally revisit old data to correct them. There's some guesswork involved in the way some zones are defined even in relatively recent past, let alone pre-1970, and when better information becomes available, those entries are changed. Thanks, PM
On 29 August 2013 14:53, Petr Machata <pmachata@redhat.com> wrote:
Stephen Colebourne <scolebourne@joda.org> writes:
The point I'm making is that end-users of this data assume it to be stable and reliable. Moreover, the vast majority of those users do not care whether the data pre-1970 is accurate or not. Stable yes, accurate no.
I don't think you can assume stability either. We occasionally revisit old data to correct them. There's some guesswork involved in the way some zones are defined even in relatively recent past, let alone pre-1970, and when better information becomes available, those entries are changed.
That is known and acceptable. It still falls under the "stable" category. What isn't stable is deleting data or changing IDs without really good reason. Stephen
Petr Machata wrote:
Stephen Colebourne <scolebourne@joda.org> writes:
The point I'm making is that end-users of this data assume it to be stable and reliable. Moreover, the vast majority of those users do not care whether the data pre-1970 is accurate or not. Stable yes, accurate no.
I don't think you can assume stability either. We occasionally revisit old data to correct them. There's some guesswork involved in the way some zones are defined even in relatively recent past, let alone pre-1970, and when better information becomes available, those entries are changed.
And the 1970 date is purely artificial itself, just being a convenient zero at that point in time. The whole problem here is that both pre- and post-1970 data need to coexist, so why can't they coexist in the one database? When and if more accurate data is established, it gets updated, going forwards or back, but we still need historic timezone data for looking at, say, 10 years ago, so why should 1969 simply stop working? And if 1969 is wrong, then it can be fixed.
--
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
Petr Machata wrote:
I don't think you can assume stability either.
That's correct; the tz database evolves, we hope for the better. I've compiled some "attic" data (appended to this email) which makes it clear that we have regularly replaced zones by links during tz maintenance. This practice hasn't caused hardships for users. I plan to propose this attic data, or something like it, more formally as a diff, and this should address the concerns expressed on this list about removing old data.
I'm still thinking about good ways to use the attic data. Using it all wouldn't simply revert to version 2013d; instead, it would resurrect old data from a few or even many years ago in the tz database. Although this would be an enhancement to the backward-compatibility facility that we already have in the "backward" file, it'd be unwise to dump all this data into zic willy-nilly, as its reliability is dubious (one of the entries has "????" in the data!) and we've gotten along without the older zone data for many years without problems. So we need a mechanism for conditionally including this data.
I like Zefram's suggestion for allowing a multitier structure for the tz database. One way to do that would be to add a cutoff year to the Makefile. It'd default to 1970. People who are interested in older timestamps could decrease the cutoff to (say) 1900; this would cause links to be turned into zones if they differ from the existing zones after 1900. Conversely, people who are not interested in (say) pre-2000 timestamps could increase the cutoff to 2000, which would result in a smaller database that turns a zone into a link if it's equivalent to another zone after 2000.
Another way to filter the data might be by the tz version in which the zone was turned into a link. The attic data below has a Version line to help with this sort of filtering. Both filters could be implemented, and they could be applied in series.
# Attic data
# This file contains zones that were formerly in the tz database,
# but were later removed or replaced by links to other locations.
# Entries are sorted by Zone name. Each entry is preceded by the name
# of the country that the entry is in, to help identify the location,
# along with any other commentary associated with the entry.
# Data are also preceded by a Version line, which lists the last version
# of the tz database in which the corresponding entry appeared as a zone.
# This is intended for use in automated processing that selectively
# retrieves data from the attic for backward-compatibility reasons.
# Zone	NAME		GMTOFF	RULES	FORMAT	[UNTIL]
# Mali
# no longer different from Bamako, but too famous to omit
Version 2005k
Zone	Africa/Timbuktu	-0:12:04 -	LMT	1912
			0:00	-	GMT
# Anguilla
Version 2013d
Zone	America/Anguilla	-4:12:16 -	LMT	1912 Mar 2
			-4:00	-	AST
# Antigua and Barbuda
Version 2013d
Zone	America/Antigua	-4:07:12 -	LMT	1912 Mar 2
			-5:00	-	EST	1951
			-4:00	-	AST
# Argentina
# Chubut (CH)
# The name "Comodoro Rivadavia" exceeds the 14-byte POSIX limit.
Version 2005k
Zone	America/Argentina/ComodRivadavia -4:30:00 - LMT	1894 Oct 31
			-4:16:48 -	CMT	1920 May
			-4:00	-	ART	1930 Dec
			-4:00	Arg	AR%sT	1969 Oct 5
			-3:00	Arg	AR%sT	1991 Mar 3
			-4:00	-	WART	1991 Oct 20
			-3:00	Arg	AR%sT	1999 Oct 3
			-4:00	Arg	AR%sT	2000 Mar 3
			-3:00	-	ART	2004 Jun 1
			-4:00	-	WART	2004 Jun 20
			-3:00	-	ART
# Aruba
Version 2013d
Zone	America/Aruba	-4:40:24 -	LMT	1912 Feb 12	# Oranjestad
			-4:30	-	ANT	1965	# Netherlands Antilles Time
			-4:00	-	AST
# Canada
Version 2013d
Zone	America/Atikokan	-6:06:28 -	LMT	1895
			-6:00	Canada	C%sT	1940 Sep 29
			-6:00	1:00	CDT	1942 Feb 9 2:00s
			-6:00	Canada	C%sT	1945 Sep 30 2:00
			-5:00	-	EST
Version 2013d
Zone	America/Blanc-Sablon	-3:48:28 -	LMT	1884
			-4:00	Canada	A%sT	1970
			-4:00	-	AST
# Cayman Is
Version 2013d
Zone	America/Cayman	-5:25:32 -	LMT	1890	# Georgetown
			-5:07:12 -	KMT	1912 Feb	# Kingston Mean Time
			-5:00	-	EST
# Canada
Version 2006g
Zone	America/Coral_Harbour	-5:32:40 -	LMT	1884
			-5:00	NT_YK	E%sT	1946
			-5:00	-	EST
# Curacao
Version 2013d
Zone	America/Curacao	-4:35:47 -	LMT	1912 Feb 12	# Willemstad
			-4:30	-	ANT	1965	# Netherlands Antilles Time
			-4:00	-	AST
# Dominica
Version 2013d
Zone	America/Dominica	-4:05:36 -	LMT	1911 Jul 1 0:01	# Roseau
			-4:00	-	AST
# Mexico
Version 1999h
Zone	America/Ensenada	-7:46:28 -	LMT	1922 Jan 1 0:13:32
			-8:00	-	PST	1927 Jun 10 23:00
			-7:00	-	MST	1930 Nov 16
			-8:00	-	PST	1942 Apr
			-7:00	-	MST	1949 Jan 14
			-8:00	-	PST	1996
			-8:00	Mexico	P%sT
# US
Version 1999l
Zone	America/Fort_Wayne	-5:00	US	E%sT	1946
			-5:00	-	EST	# Always EST as of 1986
# Grenada
Version 2013d
Zone	America/Grenada	-4:07:00 -	LMT	1911 Jul	# St George's
			-4:00	-	AST
# Guadeloupe
Version 2013d
Zone	America/Guadeloupe	-4:06:08 -	LMT	1911 Jun 8	# Pointe a Pitre
			-4:00	-	AST
# Canada
Version 2013d
# Rule	NAME	FROM	TO	TYPE	IN	ON	AT	SAVE	LETTER/S
Rule	Mont	1917	only	-	Mar	25	2:00	1:00	D
Rule	Mont	1917	only	-	Apr	24	0:00	0	S
Rule	Mont	1919	only	-	Mar	31	2:30	1:00	D
Rule	Mont	1919	only	-	Oct	25	2:30	0	S
Rule	Mont	1920	only	-	May	2	2:30	1:00	D
Rule	Mont	1920	1922	-	Oct	Sun>=1	2:30	0	S
Rule	Mont	1921	only	-	May	1	2:00	1:00	D
Rule	Mont	1922	only	-	Apr	30	2:00	1:00	D
Rule	Mont	1924	only	-	May	17	2:00	1:00	D
Rule	Mont	1924	1926	-	Sep	lastSun	2:30	0	S
Rule	Mont	1925	1926	-	May	Sun>=1	2:00	1:00	D
# The 1927-to-1937 rules can be expressed more simply as
# Rule	Mont	1927	1937	-	Apr	lastSat	24:00	1:00	D
# Rule	Mont	1927	1937	-	Sep	lastSat	24:00	0	S
# The rules below avoid use of 24:00
# (which pre-1998 versions of zic cannot handle).
Rule	Mont	1927	only	-	May	1	0:00	1:00	D
Rule	Mont	1927	1932	-	Sep	lastSun	0:00	0	S
Rule	Mont	1928	1931	-	Apr	lastSun	0:00	1:00	D
Rule	Mont	1932	only	-	May	1	0:00	1:00	D
Rule	Mont	1933	1940	-	Apr	lastSun	0:00	1:00	D
Rule	Mont	1933	only	-	Oct	1	0:00	0	S
Rule	Mont	1934	1939	-	Sep	lastSun	0:00	0	S
Rule	Mont	1946	1973	-	Apr	lastSun	2:00	1:00	D
Rule	Mont	1945	1948	-	Sep	lastSun	2:00	0	S
Rule	Mont	1949	1950	-	Oct	lastSun	2:00	0	S
Rule	Mont	1951	1956	-	Sep	lastSun	2:00	0	S
Rule	Mont	1957	1973	-	Oct	lastSun	2:00	0	S
# Zone	NAME		GMTOFF	RULES	FORMAT	[UNTIL]
Zone	America/Montreal	-4:54:16 -	LMT	1884
			-5:00	Mont	E%sT	1918
			-5:00	Canada	E%sT	1919
			-5:00	Mont	E%sT	1942 Feb 9 2:00s
			-5:00	Canada	E%sT	1946
			-5:00	Mont	E%sT	1974
			-5:00	Canada	E%sT
# Montserrat
Version 2013d
Zone	America/Montserrat	-4:08:52 -	LMT	1911 Jul 1 0:01	# Cork Hill
			-4:00	-	AST
# Bahamas
Version 2013d
# Rule	NAME	FROM	TO	TYPE	IN	ON	AT	SAVE	LETTER/S
Rule	Bahamas	1964	1975	-	Oct	lastSun	2:00	0	S
Rule	Bahamas	1964	1975	-	Apr	lastSun	2:00	1:00	D
# Zone	NAME		GMTOFF	RULES	FORMAT	[UNTIL]
Zone	America/Nassau	-5:09:30 -	LMT	1912 Mar 2
			-5:00	Bahamas	E%sT	1976
			-5:00	US	E%sT
# Trinidad and Tobago
Version 2013d
Zone	America/Port_of_Spain	-4:06:04 -	LMT	1912 Mar 2
			-4:00	-	AST
# Brazil
# Rio_Branco is too ambiguous, since there's a Rio Branco in Uruguay too.
Version 2000h
Zone	America/Porto_Acre	-4:31:12 -	LMT	1914
			-5:00	Brazil	AC%sT	1988 Sep 12
			-5:00	-	ACT
# Argentina
# Santa Fe (SF), Entre Rios (ER), Corrientes (CN), Misiones (MN), Chaco (CC),
# Formosa (FM), La Pampa (LP), Chubut (CH)
Version 2002b
Zone	America/Rosario	-4:02:40 -	LMT	1894 Nov
			-4:16:44 -	CMT	1920 May
			-4:00	-	ART	1930 Dec
			-4:00	Arg	AR%sT	1969 Oct 5
			-3:00	Arg	AR%sT	1991 Jul
			-3:00	-	ART	1999 Oct 3 0:00
			-4:00	Arg	AR%sT	2000 Mar 3 0:00
			-3:00	-	ART
# St Kitts-Nevis
Version 2013d
Zone	America/St_Kitts	-4:10:52 -	LMT	1912 Mar 2	# Basseterre
			-4:00	-	AST
# St Lucia
Version 2013d
Zone	America/St_Lucia	-4:04:00 -	LMT	1890	# Castries
			-4:04:00 -	CMT	1912	# Castries Mean Time
			-4:00	-	AST
# Virgin Is
Version 2013d
Zone	America/St_Thomas	-4:19:44 -	LMT	1911 Jul	# Charlotte Amalie
			-4:00	-	AST
# St Vincent and the Grenadines
Version 2013d
Zone	America/St_Vincent	-4:04:56 -	LMT	1890	# Kingstown
			-4:04:56 -	KMT	1912	# Kingstown Mean Time
			-4:00	-	AST
# British Virgin Is
Version 2013d
Zone	America/Tortola	-4:18:28 -	LMT	1911 Jul	# Road Town
			-4:00	-	AST
# McMurdo, Ross Island, since 1955-12
Version 2013d
Zone	Antarctica/McMurdo	0	-	zzz	1956
			12:00	NZAQ	NZ%sT
# Japan
Version 1999a
Zone	Asia/Ishigaki	8:16:36 -	LMT	1896
			8:00	-	CST
# Israel
Version 1996a
Zone	Asia/Tel_Aviv	2:19:04 -	LMT	1880
			2:21	-	JMT	1918
			2:00	Zion	I%sT
# Russia
Version 1996g
Zone	Asia/Tomsk	5:39:52 -	LMT	1924 May 2
			6:00	-	TSK	1957 Mar
			7:00	Russia	TS%s	1991 Mar 31 2:00s
			6:00	1:00	TSD	1991 Sep 29 2:00s
			6:00	-	TSK	1992 Jan 19 2:00s
			7:00	Russia	TS%s
# Svalbard & Jan Mayen
Version 2001b
Zone	Atlantic/Jan_Mayen	-1:00	-	EGT
# Australia
Version 1995l
Zone	Australia/Canberra	9:56:32 -	LMT	1895 Feb
			10:00	-	EST	1917 Jan 1 0:01
			10:00	Aus	EST	1971 Oct 31 2:00
			10:00	AN	EST	1981 Oct 25 2:00
			10:00	1:00	EST	1982 Apr 4 3:00
			10:00	AN	EST
# UK
Version 2005k
Zone	Europe/Belfast	-0:23:40 -	LMT	1880 Aug 2
			-0:25:21 -	DMT	1916 May 21 2:00	# Dublin/Dunsink MT
			-0:25:21 1:00	IST	1916 Oct 1 2:00s	# Irish Summer Time
			0:00	GB-Eire	%s	1968 Oct 27
			1:00	-	BST	1971 Oct 31 2:00u
			0:00	GB-Eire	%s	1996
			0:00	EU	GMT/BST
# Slovenia
Version 1997j
Zone	Europe/Ljubljana	0:58:04	-	LMT	1884
			1:00	-	CET	1941 Apr 18 23:00
			1:00	C-Eur	CE%sT	1945 May 8 2:00s
			1:00	1:00	CEST	1945 Sep 16 2:00s
			1:00	-	CET	1982 Nov 27
			1:00	EU	CE%sT
# Bosnia and Herzegovina
Version 1997j
Zone	Europe/Sarajevo	1:13:40	-	LMT	1884
			1:00	-	CET	1941 Apr 18 23:00
			1:00	C-Eur	CE%sT	1945 May 8 2:00s
			1:00	1:00	CEST	1945 Sep 16 2:00s
			1:00	-	CET	1982 Nov 27
			1:00	EU	CE%sT
# Macedonia
Version 1997j
Zone	Europe/Skopje	1:25:44	-	LMT	1884
			1:00	-	CET	1941 Apr 18 23:00
			1:00	C-Eur	CE%sT	1945 May 8 2:00s
			1:00	1:00	CEST	1945 Sep 16 2:00s
			1:00	-	CET	1982 Nov 27
			1:00	EU	CE%sT
# Moldova
Version 2000h
Zone	Europe/Tiraspol	1:58:32	-	LMT	1880
			1:55	-	CMT	1918 Feb 15	# Chisinau MT
			1:44:24	-	BMT	1931 Jul 24	# Bucharest MT
			2:00	Romania	EE%sT	1940 Aug 15
			2:00	1:00	EEST	1941 Jul 17
			1:00	C-Eur	CE%sT	1944 Aug 24
			3:00	Russia	MSK/MSD	1991 Mar 31 2:00
			2:00	Russia	EE%sT	1992 Jan 19 2:00
			3:00	Russia	MSK/MSD
# Croatia
# Zone	NAME		GMTOFF	RULES	FORMAT	[UNTIL]
Version 1997j
Zone	Europe/Zagreb	1:03:52	-	LMT	1884
			1:00	-	CET	1941 Apr 18 23:00
			1:00	C-Eur	CE%sT	1945 May 8 2:00s
			1:00	1:00	CEST	1945 Sep 16 2:00s
			1:00	-	CET	1982 Nov 27
			1:00	EU	CE%sT
# Micronesia
Version 2005k
Zone	Pacific/Yap	9:12:32	-	LMT	1901	# Colonia
			9:00	-	YAPT	1969 Oct	# Yap Time
			10:00	-	YAPT
# System V
Version 2000h
Zone	SystemV/AST4ADT	-4:00	SystemV	A%sT
Zone	SystemV/EST5EDT	-5:00	SystemV	E%sT
Zone	SystemV/CST6CDT	-6:00	SystemV	C%sT
Zone	SystemV/MST7MDT	-7:00	SystemV	M%sT
Zone	SystemV/PST8PDT	-8:00	SystemV	P%sT
Zone	SystemV/YST9YDT	-9:00	SystemV	Y%sT
Zone	SystemV/AST4	-4:00	-	AST
Zone	SystemV/EST5	-5:00	-	EST
Zone	SystemV/CST6	-6:00	-	CST
Zone	SystemV/MST7	-7:00	-	MST
Zone	SystemV/PST8	-8:00	-	PST
Zone	SystemV/YST9	-9:00	-	YST
Zone	SystemV/HST10	-10:00	-	HST
# Soviet Union
Version 1995b
Zone	W-SU	3:00	M-Eur	????
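The Version lines make the attic data straightforward to filter mechanically. Here is a minimal Python sketch of the version-based filter proposed above, assuming every Zone entry is preceded by a Version line and that releases are named as a four-digit year plus letters (true of every entry above); the function names are hypothetical, and the sample is three entries copied from the attic data.

```python
import re


def version_key(version):
    """Order tz release names such as '1997j' or '2013d' (four-digit year + letters)."""
    match = re.fullmatch(r"(\d{4})([a-z]*)", version)
    if match is None:
        raise ValueError(f"unrecognized version {version!r}")
    return int(match.group(1)), match.group(2)


def attic_zones(text):
    """Yield (version, zone name) for every Zone line, using the latest Version line."""
    version = None
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # ignore comments and blank lines
        if not fields:
            continue
        if fields[0] == "Version":
            version = fields[1]
        elif fields[0] == "Zone":
            if version is None:
                raise ValueError(f"Zone {fields[1]} has no preceding Version line")
            yield version, fields[1]


def zones_linked_since(text, oldest):
    """Names whose last appearance as a zone was in release `oldest` or later."""
    return [name for version, name in attic_zones(text)
            if version_key(version) >= version_key(oldest)]


SAMPLE = """
# Mali
Version 2005k
Zone Africa/Timbuktu -0:12:04 - LMT 1912
                     0:00 - GMT
# Canada
Version 2013d
Zone America/Atikokan -6:06:28 - LMT 1895
                      -6:00 Canada C%sT 1940 Sep 29
                      -6:00 1:00 CDT 1942 Feb 9 2:00s
                      -6:00 Canada C%sT 1945 Sep 30 2:00
                      -5:00 - EST
Version 2006g
Zone America/Coral_Harbour -5:32:40 - LMT 1884
                           -5:00 NT_YK E%sT 1946
                           -5:00 - EST
"""

print(zones_linked_since(SAMPLE, "2013d"))  # ['America/Atikokan']
```

Continuation lines are skipped because their first field is an offset rather than a keyword, so only the Zone header of each entry is reported.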
On Aug 29, 2013, at 12:28 PM, Paul Eggert <eggert@cs.ucla.edu> wrote:
... I like Zefram's suggestion for allowing a multitier structure for the tz database. One way to do that would be to add a cutoff year to the Makefile. It'd default to 1970. People who are interested in older timestamps could decrease the cutoff to (say) 1900; this would cause links to be turned into zones if they differ from the existing zones after 1900. Conversely, people who are not interested in (say) pre-2000 timestamps could increase the cutoff to 2000, which would result in a smaller database that turns a zone into a link if it's equivalent to another zone after 2000.
That's a useful thing to be able to do. It's a pretty easy addition to "zic" which I implemented in the version we use internally for the products I work on, exactly for the reason you mention. So the way to do that isn't so much to partition the source data, but rather to filter which subset of that data is transformed into the output that zic generates. paul
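Here is a minimal sketch of the horizon filter being discussed, in Python rather than as a zic patch: zones are grouped by what they do on or after a cutoff year, each group keeps its most populous name as a Zone, and the rest become Links. It compares only UTC offsets (real zic output also carries abbreviations and the DST flag), reuses the Atikokan and Panama transitions quoted earlier in the thread, and takes its population figures from Paul Eggert's reply below; all function names are hypothetical, and the emitted lines only name what would be kept, without zone data.

```python
from collections import defaultdict
from datetime import datetime, timezone


def ts(*fields):
    """Unix seconds for a UTC date/time (negative before 1970)."""
    return int(datetime(*fields, tzinfo=timezone.utc).timestamp())


# (initial UTC offset in seconds, [(transition instant, new offset), ...])
ZONES = {
    "America/Atikokan": (-21988, [(ts(1895, 1, 1, 6, 6, 28), -21600),
                                  (ts(1918, 4, 14, 8), -18000),
                                  (ts(1918, 10, 27, 7), -21600),
                                  (ts(1940, 9, 29, 6), -18000)]),
    "America/Panama": (-19088, [(ts(1890, 1, 1, 5, 18, 8), -19176),
                                (ts(1908, 4, 22, 5, 19, 36), -18000)]),
}
# Rough populations, used only as preference values.
POPULATION = {"America/Atikokan": 3_000, "America/Panama": 1_000_000}


def signature(zone, cutoff):
    """What is observable from the cutoff on: the offset then, plus later transitions."""
    current, transitions = zone
    later = []
    for when, new in transitions:
        if when <= cutoff:
            current = new
        else:
            later.append((when, new))
    return current, tuple(later)


def plan(zones, cutoff_year, population):
    """One Zone per equivalence class (most populous name), Links for the rest."""
    cutoff = ts(cutoff_year, 1, 1)
    classes = defaultdict(list)
    for name, zone in zones.items():
        classes[signature(zone, cutoff)].append(name)
    lines = []
    for names in classes.values():
        keep = max(names, key=lambda n: population.get(n, 0))
        lines.append(f"Zone {keep}")
        lines += [f"Link {keep} {n}" for n in sorted(names) if n != keep]
    return lines


print(plan(ZONES, 1970, POPULATION))  # ['Zone America/Panama', 'Link America/Panama America/Atikokan']
print(plan(ZONES, 1900, POPULATION))  # ['Zone America/Atikokan', 'Zone America/Panama']
```

Moving the cutoff earlier splits a link back into two zones, and moving it later merges more zones into links, so the same grouping step covers both directions raised in this thread.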
On 08/29/2013 09:36 AM, Paul_Koning@Dell.com wrote:
It's a pretty easy addition to "zic" which I implemented in the version we use internally for the products I work on, exactly for the reason you mention.
Thanks, is that code something that you and your company would donate to the public domain? If so, it could save us some work. How does the extension choose which of the alternate pre-1970 (or whatever) histories to use? For example, suppose America/Toronto and America/Montreal are the same after 1970, so zic makes them links. Which pre-1970 history does it use, Toronto's or Montreal's?
On Aug 29, 2013, at 1:51 PM, Paul Eggert <eggert@CS.UCLA.EDU> wrote:
On 08/29/2013 09:36 AM, Paul_Koning@Dell.com wrote:
It's a pretty easy addition to "zic" which I implemented in the version we use internally for the products I work on, exactly for the reason you mention.
Thanks, is that code something that you and your company would donate to the public domain? If so, it could save us some work.
Quite possibly; I will ask. We have done open source contributions before, there is a process in place that I can use.
How does the extension choose which of the alternate pre-1970 (or whatever) histories to use? For example, suppose America/Toronto and America/Montreal are the same after 1970, so zic makes them links. Which pre-1970 history does it use, Toronto's or Montreal's?
That's a case I don't have covered; our code only handles cases where you set a horizon later than 1970. I suppose that the case you mention would have to be handled by recognizing that the data now aren't the same, so it has to generate two real output files instead of one file and one link. (By the same token, with a later horizon you end up with several files that match when they didn't at the 1970 horizon, in which case more links could be generated as a result. I didn't bother doing that because our kit building process takes care of such optimizations at a later step.) paul
On 29 August 2013 18:51, Paul Eggert <eggert@cs.ucla.edu> wrote:
How does the extension choose which of the alternate pre-1970 (or whatever) histories to use? For example, suppose America/Toronto and America/Montreal are the same after 1970, so zic makes them links. Which pre-1970 history does it use, Toronto's or Montreal's?
That's the key question here, isn't it? For example, why did you choose to keep America/Panama's historic data and delete America/Atikokan's? Why not the other way around?
Stephen
On 08/29/2013 11:37 AM, Stephen Colebourne wrote:
why did you choose to keep America/Panama's historic data and delete America/Atikokan's?
It's the usual rule stated in the Theory file, which is to choose the location with the greatest population. Atikokan has about three thousand people, Panama City about a million.
Paul Eggert wrote:
I like Zefram's suggestion for allowing a multitier structure for the tz database. One way to do that would be to add a cutoff year to the Makefile. It'd default to 1970.
If this is a popular idea, I think I should expand a bit on how to deal with it. I alluded earlier to a problem in manual zone selection, where the user may be forced to choose between zones that are equivalent for eir purposes. The same approach really addresses both issues, and it's worth generalising.
The essential process is to winnow a set of timezones so that only inequivalent zones remain. Equivalence is in general defined by the user, specifically by the user indicating a range of years that is of interest. The kind of cutoff discussed so far describes the lower end of the range; some applications would also benefit from being able to specify an upper end. From among the full set of zones we group together those that have the same behaviour within the user's year range, and the winnowed zone set is made up of one representative zone from each equivalence class. Following the principles that we already use to select a representative location for each zone, the representative zone for each class should normally be the one whose principal location has the greatest population. We therefore need to store these population figures, for this use as preference values.
Once we've got this winnowing operation implemented, there are multiple places where we should use it. First (both most important and the best testbed) in manual zone selection, under tzselect and the like. tzselect should (optionally) accept a year range from the user, and as the final stage of selection offer the user a choice between only the representative inequivalent zones. In a multi-level selection process, such as the tzselect method involving selecting an ISO 3166 region and then a zone from those associated with that region, the winnowing would be applied to the limited set of zones remaining after the earlier stages of selection.
To support different kinds of installation, we'd want to apply equivalence winnowing at build/install time.
The sysadmin/packager specifies a year range, and only the representative inequivalent zones for that range get installed as distinct files. We'd want to install links for the equivalent zones that were winnowed out. Desktop/server OS builds will probably want to install everything; embedded devices can get a smaller set that only covers the actual application needs.
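The winnowing step described above can be sketched in Python. This is a minimal illustration under assumed conventions, not code from the tz project: each zone's history is a sorted list of `(start_utc, local_time_type)` pairs, where a local time type is a hashable `(utc_offset_seconds, abbreviation)` tuple and a start of `None` means "since the beginning of the data". The helpers `y`, `behaviour_in_window` and `winnow`, the zone names `Zone/A` to `Zone/C`, their histories and the population figures are all hypothetical toy values.

```python
from collections import defaultdict
from datetime import datetime, timezone


def y(year):
    """Unix time at the start of the given year (UTC); works before 1970 too."""
    return int(datetime(year, 1, 1, tzinfo=timezone.utc).timestamp())


def behaviour_in_window(history, lo, hi=None):
    """Canonical key for a zone's behaviour in [lo, hi); hi=None means open-ended."""
    # Local time type already in effect when the window opens.
    current = None
    for start, ltype in history:
        if start is None or start <= lo:
            current = ltype
        else:
            break
    key = [current]
    last = current
    # Record real changes inside the window; no-op transitions are collapsed.
    for start, ltype in history:
        if start is None or start <= lo:
            continue
        if hi is not None and start >= hi:
            break
        if ltype != last:
            key.append((start, ltype))
            last = ltype
    return tuple(key)


def winnow(zones, population, lo, hi=None):
    """Map each representative zone to the sorted list of zones it absorbs."""
    classes = defaultdict(list)
    for name, history in zones.items():
        classes[behaviour_in_window(history, lo, hi)].append(name)
    result = {}
    for members in classes.values():
        # Most populous principal location wins; ties go to the first name.
        rep = min(members, key=lambda n: (-population.get(n, 0), n))
        result[rep] = sorted(n for n in members if n != rep)
    return result


# Toy data (hypothetical, not real tz entries).
EST = (-5 * 3600, "EST")
ZONES = {
    "Zone/A": [(None, (-21988, "LMT")), (y(1895), (-6 * 3600, "CST")), (y(1945), EST)],
    "Zone/B": [(None, (-19088, "LMT")), (y(1908), EST)],
    "Zone/C": [(None, EST), (y(1990), (-4 * 3600, "AST"))],
}
POP = {"Zone/A": 5_000, "Zone/B": 900_000, "Zone/C": 100}
```

With a 1970 lower bound, `winnow(ZONES, POP, y(1970))` returns `{"Zone/B": ["Zone/A"], "Zone/C": []}`: Zone/A is absorbed by the more populous Zone/B, and an installer could install one file per representative plus links for the absorbed names. Moving the lower bound back to 1900 keeps all three distinct, and adding an upper bound of 1980 merges all three, which is the year-range behaviour proposed above.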
# Attic data
Thanks for posting this. -zefram
On 08/29/2013 10:16 AM, Zefram wrote:
some applications would also benefit from being able to specify an upper end.
Good point, thanks
We therefore need to store these population figures, for this use as preference values.
Yes, I'd thought about that. Maintaining the population figures would be a bit of a pain (and another source of arguments!), but we could cut down on that by merely maintaining a list roughly sorted in order of increasing population. The list wouldn't need to be complete, only enough to resolve ambiguities. I had considered only winnowing at build/install time; winnowing at tzselect time is also a good idea. Thanks!
Paul Eggert wrote:
Yes, I'd thought about that. Maintaining the population figures would be a bit of a pain (and another source of arguments!),
It doesn't require much maintenance. We don't really need to update them, because the relative population figures only change slowly. It should suffice to collect a single figure per location, the estimated population for a common reference date. I suggest a nominal reference date of 2010 (-01-01T00:00:00Z, for the picky). We can think about switching to a new reference date in 2020. (There's a thought here to the possible future extension to sort on population as it was at the user's choice of historical era, for which population figures at decadal intervals should be adequate.) As a test, I just picked five entries from tz2013d zone.tab at random, and looked up the locations on Wikipedia. (Indian/Maldives, Asia/Almaty, Asia/Vientiane, America/Martinique, Atlantic/Reykjavik.) In all five cases the infobox at the top of the article gives a recent population figure, with date, dates ranging from 2007-01-01 to 2013-02-01. Three out of five gave a citation specifically for the population figure. Using those figures as-is should suffice for our purposes, or we can crudely correct for the variable dates by applying an exponential model of short-term population growth. I'd be inclined to stick the population figures in the zone source files, probably as a magic comment to avoid breaking anyone's zic. An automated process can generate a .tab from that. -zefram
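The crude correction for differently dated figures amounts to projecting each figure to the reference date with P_ref = P_obs * exp(r * dt). A minimal Python sketch follows, assuming an annual growth rate r is available for each location (here estimated from two dated figures for the same place) and measuring time in mean Gregorian years of 365.2425 days; the function names are hypothetical.

```python
import math
from datetime import date

# Nominal reference date suggested in the message above.
REFERENCE = date(2010, 1, 1)


def years_between(start, end):
    """Elapsed time in mean Gregorian years (365.2425 days); negative if end < start."""
    return (end - start).days / 365.2425


def growth_rate(p1, d1, p2, d2):
    """Annual exponential growth rate implied by two dated population figures."""
    return math.log(p2 / p1) / years_between(d1, d2)


def to_reference(pop, observed, rate, reference=REFERENCE):
    """Project a figure observed on `observed` to `reference`: P * exp(r * dt)."""
    return pop * math.exp(rate * years_between(observed, reference))
```

A figure dated 2013 for a growing place is scaled down slightly to approximate its 2010 value; since only the relative order of populations matters for choosing representatives, this level of precision should be ample.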
On 29 August 2013 17:28, Paul Eggert <eggert@cs.ucla.edu> wrote:
> the tz database evolves
+1
> we hope for the better.
So why are you making it worse? That's what I cannot fathom. The data that is being changed/deleted results in nonsense pre-1970. Until you have a comprehensive solution to the pre-1970 issue, you should revert the commit that has resulted in the nonsense.
As a reminder, "America/Atikokan" used to be this:
LMT -06:06:28
Transition[Gap at 1895-01-01T00:00-06:06:28 to -06:00]
Transition[Gap at 1918-04-14T02:00-06:00 to -05:00]
Transition[Overlap at 1918-10-27T02:00-05:00 to -06:00]
Transition[Gap at 1940-09-29T00:00-06:00 to -05:00]
but is now the same as "America/Panama":
LMT -05:18:08
Transition[Overlap at 1890-01-01T00:00-05:18:08 to -05:19:36]
Transition[Gap at 1908-04-22T00:00-05:19:36 to -05:00]
From this we can say (as fact, not opinion):
- the LMT value has changed (Panama is nowhere near Atikokan)
- the history of data before 1940 in Atikokan has changed
- the history previously showed Atikokan started defining zones in 1895
- the history now shows Atikokan started defining zones in 1908
- the history now shows Atikokan as never having had a -06:00 offset
The other IDs being altered have similar issues, but it's easier to focus on one.
You might argue that it is just pre-1970 data which is inaccurate and should not be relied on. I simply argue that you've taken the data from unknown quality to definitely inaccurate - clearly worse. (And that pre-1970 data is very visible in the work I do.)
> I've compiled some "attic" data (appended to this
> email) which makes it clear that we have regularly replaced
> zones by links during tz maintenance. This practice hasn't
> caused hardships for users.
Links for spelling mistakes obviously cause no issues. Beyond that, they are clearly going to be losers unless the entire history of the two zones is exactly identical. By entire history, I mean LMT, pre-1970 and post-1970.
Bear in mind that the entire source data is visible in Java (LMT, pre-1970 and post-1970) and that we parse the source files directly. zic is a distraction. Looking at the attic data, it's clear that the LMT has been treated as irrelevant in the past. Every time a zone with a unique ID is converted to a Link, its LMT is lost.
> Both filters could be implemented, and they could be applied
> in series.
This is all very well, but it ignores the fact that other applications parse the source files, including Java. Those applications would need additional complex logic to fix up the data. There is too much focus on the C code developed here and Unix, and not enough on other consumers of the data.
FWIW, I also think that filtering like this isn't really a good idea in practical terms for users. For example, say you filter most of central Europe after, say, 2010; you'll only get one zone, as everywhere uses the same time. Now, lots of people set up their machines to that one central zone. Then imagine the case where Greece leaves the EU and starts to set its own time-zone. Everyone in Greece will now need to reset their zone ID. Whereas, if everyone had just selected Europe/Athens up front, there would have been no problem. In other words, zones split as well as merge, and they will often do so on the historic boundaries that are already captured in the tzdb.
As I said above, you should start by reverting the controversial changes (see my other email). That takes the heat out of the immediate issue. Then, only make changes once you have a fully agreed strategy for handling pre-1970 data that is not destructive, and that gives enough notice to others to be able to adapt.
Stephen
On 08/29/2013 10:44 AM, Stephen Colebourne wrote:
Links for spelling mistakes obviously cause no issues.
The "attic" data that I posted does not contain these links. Its zones all contain pre-1970 data we discarded, typically by turning the zones into links. This common practice hasn't caused problems for users.
other applications parse the source files, including Java.
I don't think we need to change the zic input format, so these applications should be OK. Any "Version" line in the attic data can be filtered out before presenting it to zic.
zones split as well as merge
Sure, and when that happens we should create new zones as needed. It's not practical or necessary for us to anticipate zone splits that might or might not occur.
Until you have a comprehensive solution to the pre-1970 issue
We are not likely to ever have a comprehensive solution to the pre-1970 issue. Too many data were never recorded or were lost, and much of what little we have is fabricated. Insisting on a comprehensive solution before making improvements would mean we could never make improvements. The changes in the github tz repository are experimental, and there needs to be room to experiment. I plan to restore the data you've expressed concern about, along with other data that's been discarded over the years. Please be patient while we figure out a good way to go about this.
On 29 August 2013 20:13, Paul Eggert <eggert@cs.ucla.edu> wrote:
other applications parse the source files, including Java.
I don't think we need to change the zic input format, so these applications should be OK.
They are not OK if they are relying on pre-1970 data, as Java does. Unless the new data consists of no more than a new file in the same format that is to be parsed, then there will be work to do.
Until you have a comprehensive solution to the pre-1970 issue
We are not likely to ever have a comprehensive solution to the pre-1970 issue. Too many data were never recorded or were lost, and much of what little we have is fabricated. Insisting on a comprehensive solution before making improvements would mean we could never make improvements.
Not quite what I meant. I fully accept that we cannot know all time-zone data before 1970. What I mean is that you need to carry everyone with you when making changes. You need to explain how the pre-1970 data that we do have will be retained and accessible. As I've said before, the users I represent don't care about the accuracy, but they will care if it is made willfully inaccurate. A comprehensive solution might be two sets of files, one pre and one post 1970 with enough notice for everyone to be able to adapt. Or it might be simply leaving the data as is.
The changes in the github tz repository are experimental, and there needs to be room to experiment. I plan to restore the data you've expressed concern about, along with other data that's been discarded over the years. Please be patient while we figure out a good way to go about this.
I want to ensure that we don't see a release from the current set of commits. Stephen
On 08/29/2013 12:33 PM, Stephen Colebourne wrote:
You need to explain how the pre-1970 data that we do have will be retained and accessible.
Yes, that's the goal. I expect the code is what's needed next, and it should help explain things. I plan to write it soon.
the users I represent don't care about the accuracy, but they will care if it is made willfully inaccurate.
I'm afraid I don't follow. As we've seen, similar changes have been made to the tz database regularly, and evidently those users didn't notice or care. What's different about these changes?
A comprehensive solution might be two sets of files, one pre and one post 1970 with enough notice for everyone to be able to adapt.
Yes, though I'm hoping we can be a bit more flexible, with settable cutoff dates, something along the lines that Zefram suggested.
I want to ensure that we don't see a release from the current set of commits.
The current experimental version will definitely not be released as-is, so please don't worry about that.
On 29 August 2013 21:18, Paul Eggert <eggert@cs.ucla.edu> wrote:
On 08/29/2013 12:33 PM, Stephen Colebourne wrote:
the users I represent don't care about the accuracy, but they will care if it is made willfully inaccurate.
I'm afraid I don't follow. As we've seen, similar changes have been made to the tz database regularly, and evidently those users didn't notice or care. What's different about these changes?
Firstly, there are a lot of recent changes. Secondly, some of them are more political than in the past, notably cross border. I can't say without a lot more research whether, and to what degree, the previous changes caused visible changes in the historic data of an ID.
A comprehensive solution might be two sets of files, one pre and one post 1970 with enough notice for everyone to be able to adapt.
Yes, though I'm hoping we can be a bit more flexible, with settable cutoff dates, something along the lines that Zefram suggested.
Just to emphasise that anything coded in zic is not of use to me or those I represent directly. Any change that requires filters means I have to expend effort to fix code implemented in three other places, one of which (the JDK) is getting more restrictive before its main release to 9 million developers.
As part of this debate, I have wondered if I should be looking at bypassing the tzdb and making up our own zone ID system for Java, due to the apparent instability here. It's really, really not something I want to see happen, however.
I would therefore offer a simple suggestion: Add a "historical data reliability" indicator to each zone. Say, the earliest date from which the data is regarded as being acceptably reliable. For London it would be right back to the start of zones (i.e., excluding LMT), but for other zones it might only be 1970. This would give a solid basis for filtering, rather than an arbitrary one, and not require any change to the main data (other than retaining it and not deleting it or moving it elsewhere).
For the record, I'm not that interested in resurrecting long dead data, just ensuring that nothing more is lost. In data creation terms, the proposal email outlines what would be necessary - at least one full-history time-zone per ISO 3166 code.
Stephen
Stephen Colebourne wrote:
On 29 August 2013 21:18, Paul Eggert <eggert@cs.ucla.edu> wrote:
As we've seen, similar changes have been made to the tz database regularly, and evidently those users didn't notice or care. What's different about these changes?
Firstly, there are a lot of recent changes.
Sure, but there were more changes in the past, and they didn't cause problems.
Secondly, some of them are more political than in the past, notably cross border.
Sorry, but that's incorrect. The past changes were cross-border and were more political than the current proposal. For example, a year after the Siege of Sarajevo (the longest siege of a capital city in the history of modern warfare, and the focus of an intensely bitter Balkan war), we merged Sarajevo with Belgrade, on the grounds that their post-1970 time stamps were identical. Nobody noticed or cared. Nothing in the current proposal is remotely close to that merger, in terms of political controversy. So our practical experience suggests that the proposed changes won't cause any real problems.
Add a "historical data reliability" indicator to each zone. Say, the earliest date from which the data is regarded as being acceptably reliable.
I'm afraid that sounds like a lot of work, and it's not something that can be reliably determined -- at least, not unless we arbitrarily put in "1970" for a large majority of entries, and then what's the point?
I'm not that interested in resurrecting long dead data
OK, in that case we can make the attic smaller, and use it more on a going-forward basis, with 2003d as the starting point. This will be a smaller change to the database now, which I assume is a good thing.
Paul Eggert wrote:
I'm not that interested in resurrecting long dead data
OK, in that case we can make the attic smaller, and use it more on a going-forward basis, with 2003d as the starting point. This will be a smaller change to the database now, which I assume is a good thing.
In which case we have to start setting up a second database as a home for the historical material that IS being used ... accurate documentation for daylight saving exists back to the 1960s, and presumably there is also a record of the confusion in the States prior to that? Certainly, going back to diary and calendar entries with pre-2000 dates should return the correct time. This was the area where I first found inconsistencies, because different rules were being applied, and we got meetings moved by an hour as a result! Moving back in time from that, getting the correct time difference between locations is equally important, even if arbitrarily calculated for a location; but where historic material does exist there should be SOME mechanism to combine that with the rest ... it was pulling the available data together that started the existing database, and EVERYTHING gathered is important!
-- Lester Caine - G8HFL
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
Lester Caine wrote:
This will be a smaller change to the database now, which I assume is a good thing.
In which case we have to start setting up a second database
It's no big deal either way, so if you'd rather have a larger attic and if Stephen doesn't care, let's make it larger. I'll send out a proposed patch shortly, which creates the attic in zic input format, so that it can be used to further populate the database if you like. This is all still experimental, but at least the latest experimental version has recovered the pre-1970 data that people were concerned about.
Paul Eggert <eggert@cs.ucla.edu> wrote on Fri, 30 Aug 2013 at 01:08:13 -0700 in <522052ED.9090605@cs.ucla.edu>:
I'll send out a proposed patch shortly, which creates the attic in zic input format, so that it can be used to further populate the database if you like. This is all still experimental, but at least the latest experimental version has recovered the pre-1970 data that people were concerned about.
I have a dumb question that shows I need to review the past week's email in more detail: Do we have a consensus that putting the pre-1970 data in a separate place (attic) is wise? It seems to me that it creates more problems than it solves.
--jhawk@mit.edu John Hawkinson
John Hawkinson wrote:
Do we have a consensus that putting the pre-1970 data in a separate place (attic) is wise?
I've mentioned the idea of an attic file a couple of times. As far as I recall, you're the first to have expressed qualms about the idea. What problems do you see?
On 30 August 2013 09:08, Paul Eggert <eggert@cs.ucla.edu> wrote:
Lester Caine wrote:
This will be a smaller change to the database now, which I assume is a good thing.
In which case we have to start setting up a second database
It's no big deal either way, so if you'd rather have a larger attic and if Stephen doesn't care, let's make it larger. I'll send out a proposed patch shortly, which creates the attic in zic input format, so that it can be used to further populate the database if you like. This is all still experimental, but at least the latest experimental version has recovered the pre-1970 data that people were concerned about.
FWIW, I can see an attic file working if there is an additional loader rule. The loader rule would have to say that a zone in the attic file takes precedence over a link with the same ID. I think that is doable. Please make clear any other restrictions that your approach has.
BTW, I haven't researched it in detail, but according to the proposal I put forward, I would prefer there to be separate zones for each of the Balkan countries, which may affect what you put in the attic. I'd use that proposal to guide what the initial set of things in the attic is.
Stephen
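The proposed loader rule can be sketched in Python as a name-resolution pass over two zic-format inputs. This is a simplified illustration, not zic's or any Java loader's code: only the `Zone` and `Link` keywords are recognised, comments start at `#`, Zone continuation lines are skipped, and treating a name that is a Zone in both files as an error is an assumption. The function names are hypothetical; the Atikokan entry is taken from the patch quoted at the start of this thread, and the Panama entry is an abridged reconstruction from the offsets quoted above.

```python
def scan_ids(text):
    """Collect Zone names and Link definitions from zic-format source text."""
    zones, links = set(), {}
    for raw in text.splitlines():
        fields = raw.split("#", 1)[0].split()
        if not fields:
            continue
        if fields[0] == "Zone" and len(fields) >= 2:
            zones.add(fields[1])
        elif fields[0] == "Link" and len(fields) >= 3:
            # zic syntax: Link TARGET LINK-NAME
            links[fields[2]] = fields[1]
    return zones, links


def resolve_ids(main_text, attic_text):
    """Map each ID to ("zone", source) or ("link", target) under the attic rule."""
    main_zones, main_links = scan_ids(main_text)
    attic_zones, attic_links = scan_ids(attic_text)
    result = {name: ("zone", "main") for name in main_zones}
    result.update({name: ("link", target) for name, target in main_links.items()})
    for name in attic_zones:
        if name in main_zones:
            # Assumption: a Zone defined in both places is treated as an error.
            raise ValueError(f"{name} is a Zone in both main and attic data")
        # The proposed rule: an attic Zone takes precedence over a Link.
        result[name] = ("zone", "attic")
    for name, target in attic_links.items():
        result.setdefault(name, ("link", target))
    return result


MAIN = """\
Zone America/Panama -5:18:08 - LMT 1890
            -5:19:36 - CMT 1908 Apr 22
            -5:00 - EST
Link America/Panama America/Atikokan
"""

ATTIC = """\
Zone America/Atikokan -6:06:28 - LMT 1895
            -6:00 Canada C%sT 1940 Sep 29
            -6:00 1:00 CDT 1942 Feb 9 2:00s
            -6:00 Canada C%sT 1945 Sep 30 2:00
            -5:00 - EST
"""
```

Loading both inputs makes America/Atikokan a full zone again, while loading the main data alone leaves it a link to America/Panama; the cost, as noted in the thread, is that every consumer that parses the sources must implement the same precedence rule.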
Stephen Colebourne wrote:
I can see an attic file working if there is an additional loader rule. The loader rule would have to say that a zone in the attic file takes precedence over a link with the same ID.
That's pretty much what I implemented, with the loader rule being the BACKWARD setting in the Makefile, but you and others then objected. Oh well.
Marc Lehmann wrote:
Never before has the tzdb seen so many changes that are not actually corrections
Not true; it had far more changes the last time we reorganized it, and another set of comparable changes after the Balkan wars upset clocks in the region. What's different now is that there are more people involved, and a lot more bureaucracy, and considerably more resistance to change. It's understandable. Every long-running project has this sort of problem eventually. Given the reviews, it appears that we should undo the addition of the attic file, and revert the recent changes that lost pre-1970 data, so I'll do that shortly. Perhaps we can think of a better way to address the underlying problems.
On Thu, Aug 29, 2013 at 08:33:13PM +0100, Stephen Colebourne wrote:
pre-1970 data that we do have will be retained and accessible.
I'd prefer the same, on behalf of coworkers. In discussions with computational climatologist friends and colleagues I've found that they care about historical time differences at any given meteorological, tide, etc. measurement station location. They use historical climate records to validate models, among other things. They take note of the difference between local time and what became UTC when a time of day is specified for historical temperatures and tides. The TZ project's collection of historical time differences is useful input for their work, especially when it comes with references regarding what the local meteorologist and their neighbors were likely to consider "the time". On their behalf, I'd prefer to see what's been deduced regarding pre-1970 times retained in the project, or at least exported intact (or as intact as it's become :) ) in a usable format. Richard
On Thu, Aug 29, 2013 at 09:28:48AM -0700, Paul Eggert <eggert@cs.ucla.edu> wrote:
That's correct; the tz database evolves, we hope for the better.
No, "evolves" is what it did in the past; right now, it is undergoing a revolution. Never before has the tzdb seen so many changes that are not actually corrections and that do not improve the data. -- The choice of a Deliantra, the free code+content MORPG -----==- _GNU_ http://www.deliantra.net ----==-- _ generation ---==---(_)__ __ ____ __ Marc Lehmann --==---/ / _ \/ // /\ \/ / schmorp@schmorp.de -=====/_/_//_/\_,_/ /_/\_\
On Thu, Aug 29, 2013 at 03:53:57PM +0200, Petr Machata <pmachata@redhat.com> wrote:
We occasionally revisit old data to correct them.
That is what indeed has happened in the past, but since the direction is to remove the data, or archive it away, this would also change. I think the whole point is that the tzdb wasn't broken before, and only recently has started to acquire a lot of politically-motivated or unmotivated changes. For many tzdb users, this is the kind of instability that scares them, not the occasional best-effort correction of old data.
-- Marc Lehmann
Stephen Colebourne wrote:
Sorry to be unsubtle, but I think we're approaching vote of no confidence territory here.
I, for one, continue to have confidence in Paul Eggert's maintainership. On zone.tab he's reached a good compromise, which reduced the political pressure while avoiding throwing away the bulk of useful data. On pre-1970 data, throwing away all pre-1970 distinctions obviously isn't going to fly, and Eggert has already acknowledged this and is modifying his approach. He is responsive to the discussion on the list. My opinion on pre-1970 data is that I'd rather not lose any of the data the project has collected, but the stated scope of the tz project does have effects that can't be ignored. If you want proper handling of pre-1970 timestamps then the tz database isn't sufficient: the zones that it distinguishes pre-1970 are a matter of historical accident, not any systematic treatment. The zones whose conversion to links Paul Eggert has proposed are indeed an anomaly. There should be some project that tackles pre-1970 timezones systematically. My preference would be for the tz project itself to do this. It could be done gradually, by first permitting new zones (for which there's decent data) to be added that differ from existing ones only pre-1970, and eventually by progressively reducing that 1970 threshold to increase the project's formal scope. This is essentially the opposite of Paul Eggert's approach to resolving the anomaly. Another sane option is to limit the official tz database to zones that are distinct post-1970 and have a separate project develop a larger database that takes the more inclusive approach. Or a middle course is to maintain both collections in the tz project with a two-tier structure, the larger database being installed only where specifically requested. The "attic" concept is a step in that direction. Any proposal for a tz database of wider scope has to deal with the technical effects of a larger collection of zones. 
The compiled tzfiles (and to a lesser extent the source files) don't share data representation between zones, so twice as many zones means twice as much disk usage. Selection of a timezone for present-day purposes is hampered by the existence of multiple zones that don't differ recently enough to matter. There are ways to fix these problems, of course, but they mean that the database can't just be casually expanded. Hence the sense of a two-tier structure. So, in summary: if you want proper handling of pre-1970 timezones then you want something that the tz database doesn't supply. We need some other kind of effort to supply it. And how that effort relates to the present tz project, from among the sane options, isn't a matter for no-confidence votes. -zefram
On Wed, 28 Aug 2013, Zefram wrote:
There should be some project that tackles pre-1970 timezones systematically. My preference would be for the tz project itself to do this. It could be done gradually, by first permitting new zones (for which there's decent data) to be added that differ from existing ones only pre-1970, and eventually by progressively reducing that 1970 threshold to increase the project's formal scope.
I agree. It seems clear that some people care about pre-1970 timestamps. If they care enough to contribute data, then I think that the tz project should not turn them away. Increasing the quality of pre-1970 data may require creating new zones, and I think it would be fine if such new zones were handled in a way that allows distributors to choose whether or not to include them. For example, there could be different versions of zone.tab that do or do not include cities that differ only in pre-1970 timestamps, and there could be Makefile options to control whether or not such additional zones are installed. --apb (Alan Barrett)
I feel like i also have to comment.
Zefram <zefram@fysh.org> wrote:
|I, for one, continue to have confidence in Paul Eggert's maintainership.
Paul Eggert is one of the major contributors to the TZ database, and as such i have to and also want to show respect and thanks. I've used the TZ data to make a living. And indirectly use it daily, everywhere.
|On zone.tab he's reached a good compromise, which reduced the
I disagree with this opinion of yours; the generated time.tab still maps an ID (Europe/Belgrade iirc) to a different country. The two countries in question were one for quite some time, and there was a terrible war after they broke up. Even if the dataset, which i would simply describe as "horrible", is now generated by a script, it is *still* generated. And this is a step that imho no one but the involved tribes may take. It wasn't really funny that the Korn shell script that requires GNU bash to work (?) now supports geographical coordinates, whereas column 1 of time.tab still exists. I personally have never had problems with ISO 3166. Also, and this really upset me, tzselect.8 has been changed to use the mentioned horrible dataset for an example. That was utterly unnecessary! No!
[...]
|There should be some project that tackles pre-1970 timezones
|systematically. My preference would be for the tz project itself to
|do this. It could be done gradually, by first permitting new zones
That would be great!
|-zefram
--steffen
Paul Eggert wrote:
If we relaxed this rule, and allowed multiple regions even though their clocks agreed since 1970, that will be a recipe for more political disputes. For example, why does the Navajo Nation have its own entry while the Hopi Nation doesn't? Or, why does Quebec have its own entry while Prince Edward Island lacks one? Currently, our only real answer is "because we felt like it". That is not a fair answer, and it will inevitably lead to more political problems in the future.
Some new rule would have to be created: for instance, defining a degree of accuracy for splitting pre-1970 regions.
Zefram wrote:
My opinion on pre-1970 data is that I'd rather not lose any of the data the project has collected, but the stated scope of the tz project does have effects that can't be ignored. If you want proper handling of pre-1970 timestamps then the tz database isn't sufficient: the zones that it distinguishes pre-1970 are a matter of historical accident, not any systematic treatment. The zones whose conversion to links Paul Eggert has proposed are indeed an anomaly.
There should be some project that tackles pre-1970 timezones systematically. My preference would be for the tz project itself to do this.
+1
It could be done gradually, by first permitting new zones (for which there's decent data) to be added that differ from existing ones only pre-1970, and eventually by progressively reducing that 1970 threshold to increase the project's formal scope.
+1
Any proposal for a tz database of wider scope has to deal with the technical effects of a larger collection of zones. The compiled tzfiles (and to a lesser extent the source files) don't share data representation between zones, so twice as many zones means twice as much disk usage. Selection of a timezone for present-day purposes is hampered by the existence of multiple zones that don't differ recently enough to matter. There are ways to fix these problems, of course, but they mean that the database can't just be casually expanded. Hence the sense of a two-tier structure.
This is a technical problem, and suitable for technical solutions. For instance, configuring the package could accept a cutoff date as a parameter. So I could configure and install the database with zones for the last century, since the Unix epoch, since I was born, since the foundation of the company, since I bought my first computer... Many people don't even need accurate data back to 1970, and Windows is a good example of that. I'm sure I could work with European countries being hard links to 4-5 files.
Stephen Colebourne <scolebourne@joda.org> writes:
Some systems will automatically map the old "backward" forms of IDs to the "correct" new forms. Thus users will see their Atikokan replaced by Panama, something which is clearly ridiculous.
How would they even do it? They will keep seeing two different zones that happen to have the same content (and are actually hard-links of each other). The user will still see either "CA: Eastern Standard Time - Atikokan, Ontario and Southampton I, Nunavut", or "PA", depending on whether their zone is set to America/Atikokan, or America/Panama. Thanks, PM
Petr Machata <pmachata@redhat.com> writes:
Stephen Colebourne <scolebourne@joda.org> writes:
Some systems will automatically map the old "backward" forms of IDs to the "correct" new forms.
How would they even do it?
Actually, yeah, they could. Java behaves this way, apparently. As a last resort it goes on looking for a file with matching contents, so for one of the links it finds the wrong file. But that's not new, that happens with Busingen and Zurich already. Thanks, PM
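The ambiguity Petr describes can be illustrated with a small Python sketch of a last-resort lookup that scans a zoneinfo tree for a file whose bytes match the local time file. This is not the JDK's code: the alphabetical walk order and the names `matching_ids` and `demo` are assumptions made for the illustration, and the file contents are placeholder bytes rather than real TZif data. When two IDs are hard links or byte-identical copies, a first-match lookup returns whichever it reaches first, so a user who chose America/Panama can be reported as America/Atikokan.

```python
import os
import tempfile


def matching_ids(localtime_path, zoneinfo_dir):
    """Yield zone IDs whose files equal localtime_path byte for byte."""
    with open(localtime_path, "rb") as f:
        wanted = f.read()
    for dirpath, dirnames, filenames in os.walk(zoneinfo_dir):
        dirnames.sort()  # deterministic, alphabetical walk (an assumption)
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                if f.read() == wanted:
                    rel = os.path.relpath(path, zoneinfo_dir)
                    yield rel.replace(os.sep, "/")


def demo():
    """Two IDs with identical contents: a first-match lookup cannot tell them apart."""
    data = b"placeholder bytes, not a real TZif file"
    with tempfile.TemporaryDirectory() as tmp:
        zoneinfo = os.path.join(tmp, "zoneinfo")
        os.makedirs(os.path.join(zoneinfo, "America"))
        for name in ("Atikokan", "Panama"):
            with open(os.path.join(zoneinfo, "America", name), "wb") as f:
                f.write(data)
        # The user chose America/Panama, but only a copy of its bytes survives.
        localtime = os.path.join(tmp, "localtime")
        with open(localtime, "wb") as f:
            f.write(data)
        return list(matching_ids(localtime, zoneinfo))
```

Here `demo()` returns `["America/Atikokan", "America/Panama"]`, so any lookup that takes the first match reports Atikokan; only a symlink or an explicitly stored ID name preserves the user's actual choice.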
At 2013-08-28 09:46, Paul Eggert wrote:
On 08/28/13 09:20, Stephen Colebourne wrote:
Many users will have stored data in databases or other long term storage that refers to these time-zones being made obsolete.
The zones are not being made obsolete. TZ=America/Atikokan will still work, and will give the same results as before for post-1970 timestamps, which are the only timestamps in scope for this project.
I just want to point out that this last clause is the very heart of the issue.
-- Alan Mintz <Alan_Mintz+TZ_IANA@Earthlink.net>
On Aug 28, 2013, at 12:46 PM, Paul Eggert <eggert@cs.ucla.edu> wrote:
On 08/28/13 09:20, Stephen Colebourne wrote:
Many users will have stored data in databases or other long term storage that refers to these time-zones being made obsolete.
The zones are not being made obsolete. TZ=America/Atikokan will still work, and will give the same results as before for post-1970 timestamps, which are the only timestamps in scope for this project.
I don't agree, and the Theory file you pointed to does not, either. It mentions "1970" in 8 places, none of which say anything remotely like "dates before 1970 are out of scope". On the contrary, dates prior to 1970 are explicitly mentioned in the "Scope of the tz database" section. The most you can argue from its words is that data prior to 1970 is definitely not complete (as opposed to later data, which at least aims to be complete though it might not succeed). It's pretty clear to me that there is not "rough consensus and running code" for the changes you have proposed. paul
On 08/28/13 10:05, Paul_Koning@Dell.com wrote:
The most you can argue from its words is that data prior to 1970 is definitely not complete
The rules for specifying how to partition the world into regions specify the "somewhat-arbitrary cutoff point of the POSIX Epoch (1970-01-01 00:00:00 UTC)". If clocks in two locations agree after 1970, the two locations are in the same region. If we relaxed this rule and allowed multiple regions even though their clocks have agreed since 1970, that would be a recipe for more political disputes. For example, why does the Navajo Nation have its own entry while the Hopi Nation doesn't? Or, why does Quebec have its own entry while Prince Edward Island lacks one? Currently, our only real answer is "because we felt like it". That is not a fair answer, and it will inevitably lead to more political problems in the future.
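The partition rule above ("if clocks in two locations agree after 1970, the two locations are in the same region") can be made concrete with a small sketch. It reduces each zone to its UTC offsets only, ignoring abbreviations and DST flags (a simplification), and uses the JSR-310 transitions for America/Atikokan and America/Panama quoted at the top of this thread. `Zone`, `agree_since`, `hms`, and `utc_of` are hypothetical names for illustration; this is not how zic or java.time decides anything.

```python
from bisect import bisect_right
from calendar import timegm


def hms(text):
    """Convert an offset such as '-06:06:28' or '-05:00' to signed seconds east of UTC."""
    sign = -1 if text.startswith("-") else 1
    h, m, s = ([int(p) for p in text.lstrip("+-").split(":")] + [0, 0])[:3]
    return sign * (h * 3600 + m * 60 + s)


def utc_of(y, mo, d, hh, mm, offset_before):
    """POSIX seconds for a wall-clock time read in the offset in force before a transition."""
    return timegm((y, mo, d, hh, mm, 0)) - hms(offset_before)


class Zone:
    """A zone reduced to UTC offsets: an initial offset plus (utc_seconds, new_offset) steps."""

    def __init__(self, initial, steps):
        self.initial = hms(initial)
        self.steps = sorted((t, hms(o)) for t, o in steps)
        self.times = [t for t, _ in self.steps]

    def offset_at(self, t):
        """UTC offset in seconds at POSIX time t (a step applies from its own instant on)."""
        i = bisect_right(self.times, t)
        return self.initial if i == 0 else self.steps[i - 1][1]


def agree_since(a, b, cutoff=0):
    """True if a and b give the same UTC offset at every instant >= cutoff (0 = POSIX Epoch).

    Offsets are piecewise constant, so comparing them at the cutoff and at every
    later transition of either zone covers all instants from the cutoff on.
    """
    checks = {cutoff} | {t for t in a.times + b.times if t >= cutoff}
    return all(a.offset_at(t) == b.offset_at(t) for t in checks)


# Transitions as reported by JSR-310 earlier in this thread (wall time, offset before).
ATIKOKAN = Zone("-06:06:28", [
    (utc_of(1895, 1, 1, 0, 0, "-06:06:28"), "-06:00"),
    (utc_of(1918, 4, 14, 2, 0, "-06:00"), "-05:00"),
    (utc_of(1918, 10, 27, 2, 0, "-05:00"), "-06:00"),
    (utc_of(1940, 9, 29, 0, 0, "-06:00"), "-05:00"),
])
PANAMA = Zone("-05:18:08", [
    (utc_of(1890, 1, 1, 0, 0, "-05:18:08"), "-05:19:36"),
    (utc_of(1908, 4, 22, 0, 0, "-05:19:36"), "-05:00"),
])

if __name__ == "__main__":
    print(agree_since(ATIKOKAN, PANAMA))  # True: both at -05:00 since 1970
    mid_1930 = timegm((1930, 7, 1, 12, 0, 0))
    print(ATIKOKAN.offset_at(mid_1930), PANAMA.offset_at(mid_1930))  # -21600 -18000
```

The same data also shows what a link discards: the two entries agree from 1970 on, but in mid-1930 they differ by an hour, so a pre-1970 timestamp stored against America/Atikokan would be interpreted differently once the name resolves to America/Panama.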
On 28 August 2013 18:23, Paul Eggert <eggert@cs.ucla.edu> wrote:
On 08/28/13 10:05, Paul_Koning@Dell.com wrote:
The most you can argue from its words is that data prior to 1970 is definitely not complete
The rules for specifying how to partition the world into regions specify the "somewhat-arbitrary cutoff point of the POSIX Epoch (1970-01-01 00:00:00 UTC)". If clocks in two locations agree after 1970, the two locations are in the same region.
If we relaxed this rule and allowed multiple regions even though their clocks have agreed since 1970, that would be a recipe for more political disputes. For example, why does the Navajo Nation have its own entry while the Hopi Nation doesn't? Or, why does Quebec have its own entry while Prince Edward Island lacks one? Currently, our only real answer is "because we felt like it". That is not a fair answer, and it will inevitably lead to more political problems in the future.
Adding data does not cause great problems. Removing it does. That is the nature of backwards compatibility in a project vital to world computing. Time-zones are a political thing. That can't be escaped. Stephen
On Aug 28, 2013, at 1:23 PM, Paul Eggert <eggert@cs.ucla.edu> wrote:
On 08/28/13 10:05, Paul_Koning@Dell.com wrote:
The most you can argue from its words is that data prior to 1970 is definitely not complete
The rules for specifying how to partition the world into regions specify the "somewhat-arbitrary cutoff point of the POSIX Epoch (1970-01-01 00:00:00 UTC)". If clocks in two locations agree after 1970, the two locations are in the same region.
If we relaxed this rule and allowed multiple regions even though their clocks have agreed since 1970, that would be a recipe for more political disputes. For example, why does the Navajo Nation have its own entry while the Hopi Nation doesn't? Or, why does Quebec have its own entry while Prince Edward Island lacks one? Currently, our only real answer is "because we felt like it".
For your first example, that is clearly not correct. Navajo Nation is (in part anyway) contained within Arizona, but it keeps time different from the rest of Arizona. Hopi nation follows Arizona rules and is in Arizona, so it doesn't get its own entry. So the answer is not "we felt like it" but "the data are different". paul
On 08/28/13 11:07, Paul_Koning@Dell.com wrote:
Navajo Nation is (in part anyway) contained within Arizona, but it keeps time different from the rest of Arizona. Hopi nation follows Arizona rules
America/Phoenix does not mean "Arizona", just as America/Toronto does not mean "Ontario" and America/Denver does not mean "Colorado". These names stand for regions of clocks that have agreed since 1970, for which Phoenix, Toronto, and Denver are the main locations. The names do not stand for their containing political regions; this has always been true and I'm not proposing to change this. For time stamps after 1970, the Navajo reservation follows the same rules as America/Denver, just as the Hopi reservation follows the same rules as America/Phoenix. It is inconsistent to give the Navajo reservation its own zone while not extending the same privileges to the Hopi. And a similar argument applies to the Osage, Yakama, Flathead, Wind River, and Rosebud reservations. There are hundreds more.
Stephen Colebourne said:
Paul, you need to STOP NOW.
You need to reverse the vast majority of your recent changes. Focus on actual changes happening in 2013. Keep the rest of the database totally and utterly stable. And just accept that there is a measure of politics incumbent in time-zones.
I don't agree with the language used here, but I do agree with the sentiment. Where we have data that differs before 1970, we should be keeping it. These amalgamations need to be backed out *completely* unless the zones in question are actually identical all the way back to the year Dot. -- Clive D.W. Feather | If you lie to the compiler, Email: clive@davros.org | it will get its revenge. Web: http://www.davros.org | - Henry Spencer Mobile: +44 7973 377646
On 28 August 2013 13:21, Marc Lehmann <schmorp@schmorp.de> wrote:
The interpretation I always gave this, and that was consistent with the "old-style maintenance regime", is that pre-1970 data is not guaranteed to be correct, useful, or complete, but is occasionally maintained on a best-effort basis. This can be witnessed in the many pre-1970 changes to the data over the years.
I think the part you quoted contradicts your recent actions - the fact that the data isn't provided everywhere implies that data is provided somewhere, and overall, this part of the Theory file gives no reason to actively remove pre-1970 data.
The pre-1970 cut-off, IMHO, was always meant as the date from which the tz database really cares and really tries to get things right, while earlier tz data might be completely wrong, and is definitely much less certain.
I never thought the scope of the tz project to have been exclusionary, e.g., "pre-1970 is NOT in scope", but rather inclusionary, e.g., "post-1970 is IN scope". I never got the sense previously that pre-1970 data was an ending point for things to throw away, only that it was a practical starting point for things TO include. I think the reasons behind this were fairly well-established. It has been my understanding that the tz project has evolved into just as much a historical document as a systems project, and it is highly regarded by many for this purpose as well as its original intent (see, e.g., http://blog.jonudell.net/2009/10/23/a-literary-appreciation-of-the-olsonzone...). In observing list discussion prior to this point, the consensus Rule #0 has always seemed to be stability over everything. Even when this project has occasionally strayed outside the scope of Theory, we have VERY strongly avoided changing ANY data unless it's demonstrably wrong. Many of the currently proposed changes represent a major shift in that philosophy, I think for misguided reasons.
On 28 August 2013 12:46, Paul Eggert <eggert@cs.ucla.edu> wrote:
After all, the pre-1970 data for Atikokan was almost surely incorrect, and nobody cared about that either.
If, by "almost surely incorrect", you mean "as good as we or anyone has really ever been able to compile".
On 28 August 2013 13:23, Paul Eggert <eggert@cs.ucla.edu> wrote:
If we relaxed this rule and allowed multiple regions even though their clocks have agreed since 1970, that would be a recipe for more political disputes.
The problem here is that that rule was already relaxed long ago, presumably with some reason behind it (even if flawed). Removing this data may solve one problem (although I'd argue it doesn't), but it certainly causes more.
On 28 August 2013 15:18, Paul Eggert <eggert@cs.ucla.edu> wrote:
It is inconsistent to give the Navajo reservation its own zone while not extending the same privileges to the Hopi. And a similar argument applies to the Osage, Yakama, Flathead, Wind River, and Rosebud reservations.
Actually, the America/Shiprock change is just about the only proposed 'backward' change I support, for this very reason.
On 28 August 2013 13:49, Zefram <zefram@fysh.org> wrote:
There should be some project that tackles pre-1970 timezones systematically. My preference would be for the tz project itself to do this.
This is my preference as well, and we are most equipped to handle it, since this is by far one of the most established projects in this space.
On 28 August 2013 15:59, Alan Barrett <apb@cequrux.com> wrote:
Increasing the quality of pre-1970 data may require creating new zones, and I think it would be fine if such new zones were handled in a way that allows distributors to choose whether or not to include them. For example, there could be different versions of zone.tab that do or do not include cities that differ only in pre-1970 timestamps, and there could be Makefile options to control whether or not such additional zones are installed.
And this seems a very reasonable way to approach this. I, for one, am not currently in the "no confidence" camp, as Paul has been responsive and engaged with these concerns; however, these proposed changes to the data do have me very concerned about the future direction of this project. I hope that they are summarily reversed. -- Tim Parenti
"On 28 August 2013 13:49, Zefram <zefram@fysh.org> wrote: There should be some project that tackles pre-1970 timezones systematically. My preference would be for the tz project itself to do this. This is my preference as well, and we are most equipped to handle it, since this is by far one of the most established projects in this space." Perhaps we should have a way to create smaller working groups within our community, who research these things and present findings to the larger group. Open to any of us who are interested, of course.
Stephen Colebourne <scolebourne@joda.org> writes:
Merging these is clearly utterly RIDICULOUS. They contain completely different data, and the 1970 argument simply won't wash.
It's "always" been the case that we only split zones that are unique post-1970. By the same token, we could merge zones that are the same post-1970. Whether there's value in these merges is debatable. But since differing pre-1970 data was never a basis for forking a zone, you can't rely on the correctness and completeness of that data anyway, so it's questionable whether the effort expended in maintaining it is justified at all. Thanks, PM
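The observation that the same post-1970 test can drive merges as well as splits can be sketched by grouping zones on a canonical signature of their offsets from the Epoch on. This assumes a simplified model with UTC offsets only (no abbreviations or DST flags). `post_cutoff_signature` and `merge_by_signature` are hypothetical names, and `Example/Central` is an invented zone used only for contrast. The Atikokan and Panama figures are the JSR-310 transitions quoted at the top of the thread, converted to UTC.

```python
from calendar import timegm

EPOCH = 0  # 1970-01-01T00:00:00Z in POSIX seconds

# Each zone: (initial UTC offset in seconds, [(utc_seconds, new_offset), ...] in time order).
# Atikokan and Panama follow the JSR-310 listing earlier in this thread, converted to UTC.
ZONES = {
    "America/Panama": (-19088, [
        (timegm((1890, 1, 1, 5, 18, 8)), -19176),
        (timegm((1908, 4, 22, 5, 19, 36)), -18000),
    ]),
    "America/Atikokan": (-21988, [
        (timegm((1895, 1, 1, 6, 6, 28)), -21600),
        (timegm((1918, 4, 14, 8, 0, 0)), -18000),
        (timegm((1918, 10, 27, 7, 0, 0)), -21600),
        (timegm((1940, 9, 29, 6, 0, 0)), -18000),
    ]),
    # Hypothetical zone, for contrast only: a constant -06:00.
    "Example/Central": (-21600, []),
}


def post_cutoff_signature(initial, steps, cutoff=EPOCH):
    """Canonical form of a zone's UTC offsets from cutoff onward.

    Starts with the offset in force at the cutoff, then records only later
    steps that actually change the offset, so equal signatures mean the
    clocks agree at every instant from the cutoff on.
    """
    current = initial
    for t, offset in steps:
        if t <= cutoff:
            current = offset  # steps are in time order, so the last one wins
    signature = [(cutoff, current)]
    for t, offset in steps:
        if t > cutoff and offset != current:
            signature.append((t, offset))
            current = offset
    return tuple(signature)


def merge_by_signature(zones, cutoff=EPOCH):
    """Keep one zone per distinct signature and map every other name to it as a link.

    The first name listed for a signature survives; that tie-break is an
    assumption of this sketch, not the tz project's rule for picking names.
    """
    kept = {}   # signature -> surviving zone name
    links = {}  # merged name -> surviving zone name
    for name, (initial, steps) in zones.items():
        sig = post_cutoff_signature(initial, steps, cutoff)
        if sig in kept:
            links[name] = kept[sig]
        else:
            kept[sig] = name
    return sorted(kept.values()), links


if __name__ == "__main__":
    zones, links = merge_by_signature(ZONES)
    print(zones)  # ['America/Panama', 'Example/Central']
    print(links)  # {'America/Atikokan': 'America/Panama'}
```

With the 1970 cutoff, Atikokan collapses into Panama and its pre-1970 history disappears from the output; passing an earlier cutoff such as 1900 keeps all three zones separate, which shows how much rides on the choice of cutoff.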
participants (20)
- Alan Barrett
- Alan Mintz
- Alois Treindl
- Andrew Paprocki
- Andy Lipscomb
- Clive D.W. Feather
- Derick Rethans
- John Hawkinson
- Lester Caine
- Marc Lehmann
- Paul Eggert
- Paul_Koning@Dell.com
- Petr Machata
- random832@fastmail.us
- Richard Johnson
- Steffen Daode Nurpmeso
- Stephen Colebourne
- Tim Parenti
- Zefram
- Ángel González