How good is pre-1970 time zone history represented in TZ database?
Alois Treindl said:
I have analyzed to what extent TZ database represents the time zone information correctly, for these data.
I think the results are of some interest for the TZ community.
The overall sums are:
tz_count   24'674'767  data records, 100%
tz_irange  23'675'636  in time range and region which TZ covers correctly, 96%
tz_good       736'769  in time range and region where TZ is unreliable, but correct, 3%
tz_bad        262'362  in time range and region where TZ database gives false result, 1%
Can you please explain how you've done this analysis. In particular, how do you know that TZ is "unreliable, but correct" or gives a bad result?

--
Clive D.W. Feather          | If you lie to the compiler,
Email: clive@davros.org     | it will get its revenge.
Web: http://www.davros.org  |   - Henry Spencer
Mobile: +44 7973 377646
Alois Treindl said:
tz_count   24'674'767  data records, 100%
tz_irange  23'675'636  in time range and region which TZ covers correctly, 96%
tz_good       736'769  in time range and region where TZ is unreliable, but correct, 3%
tz_bad        262'362  in time range and region where TZ database gives false result, 1%
Can you please explain how you've done this analysis. In particular, how do you know that TZ is "unreliable, but correct" or gives a bad result?
[...]

Okay, to paraphrase what you said:

* "irange" means that the zone that covers the relevant place now covered it at the relevant time. For example, the Europe/Berlin rules applied at the relevant time, even if that was pre-1970.

* "good" means that the relevant place had different time rules before 1970 to the zone that now applies, so is in a different zone if you include pre-1970 rules, but the two zones had the same offset at the relevant time. For example, the Europe/Berlin rules disagreed with (hypothetical) Europe/Kiel rules but gave the same result at the relevant time.

* "bad" means that according to the information you hold the relevant place had a different offset at the relevant time to the zone that now covers it. For example, Europe/Berlin and Europe/Kiel have different offsets at the relevant time.

Have I understood correctly?
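The three-way paraphrase above can be written down as a small decision function. The sketch below only illustrates that paraphrase; it is not Alois's actual analysis code. The names `Verdict` and `classify` and the flag `zone_rules_applied` are hypothetical, and the caller is assumed to already know both offsets for the place and time in question.

```python
from datetime import timedelta
from enum import Enum


class Verdict(Enum):
    IRANGE = "irange"  # the present-day zone's rules really applied there
    GOOD = "good"      # different rules applied, but the offset matched
    BAD = "bad"        # different rules applied and the offset differed


def classify(zone_rules_applied: bool,
             tz_offset: timedelta,
             reference_offset: timedelta) -> Verdict:
    """Classify one record (hypothetical helper).

    zone_rules_applied: whether the zone that covers the place now
        also covered it at the relevant time.
    tz_offset: UTC offset given by the TZ database for that zone.
    reference_offset: UTC offset according to the independent source.
    """
    if zone_rules_applied:
        return Verdict.IRANGE
    if tz_offset == reference_offset:
        return Verdict.GOOD
    return Verdict.BAD
```

For example, `classify(False, timedelta(hours=1), timedelta(hours=2))` corresponds to the Europe/Berlin versus (hypothetical) Europe/Kiel case where the offsets disagree.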
On 03/06/2020 11:01, Alois Treindl wrote:
It is known that pre-1970 time zone history is represented in TZ database only for the cities which define the zone names. Other areas of countries are often not correctly represented in TZ database before 1970.
In Europe, there are several rules pushed back to the backzone file which are accurate for all of the times back to the introduction of 'standard time', but if backzone is omitted from the TZ data then those areas fall back to the general rules from the current set prior to 1970. In particular, the data during the Second World War for the islands around the UK is missing. The amount of missing data is only small, but the fact that it is missing is not always obvious.

--
Lester Caine - G8HFL
-----------------------------
Contact - https://lsces.uk/wiki/Contact
L.S.Caine Electronic Services - https://lsces.uk
Model Engineers Digital Workshop - https://medw.uk
Rainbow Digital Media - https://rainbowdigitalmedia.uk
On 03/06/2020 12:09, Alois Treindl wrote:
The backzone zones cover only a very small part of the problems we know about.
Using the backzone prevents using the official tzdata distribution, as one has to compile with special procedures so that backzone data are compiled in. This would prevent us from using the official tzdata packages provided automatically for Red Hat Enterprise Linux.
THAT has been my own complaint for years ... tzdist at least accepted that some rule sets are only valid for a reduced timeframe, and reports that fact, but there are as yet no sources of this more accurate data. A tzdist publisher would have the ability to improve their own historic data and, in addition, flag when rules have changed since a previous normalization was applied.

The CURRENT problem is even when present day time offsets change, a distribution will simply apply the new rules without any flagging that currently stored data may now be incorrect. One needs to be able to view the rule used for the stored data in parallel with the new rule, even if only to flag that human intervention is needed.
Date: Wed, 3 Jun 2020 12:22:32 +0100
From: Lester Caine <lester@lsces.uk>
Message-ID: <3576438c-e0bd-7188-2153-43d80e9054b8@lsces.uk>

| The CURRENT problem is even when
| present day time offsets change, a distribution will simply apply the
| new rules without any flagging that currently stored data may now be
| incorrect.

I'm not sure what you're getting at there.

There are two kinds of changes that may be made to a zone.

One affects only future timestamps, and is the common old garden variety "government interference" or however you think of it. These are far and away the most common changes. While when we have insufficient notification of a change, it may be applied retrospectively, when that has happened, everyone tends to be very aware that there were bad time conversions for a while. In any case, old historic stored data isn't affected at all by this kind of change, it is as valid (or not) after the change as it was before.

The second kind of change is a correction to historic data. This happens when we discover an error in what was present (and these days, almost only ever affects pre-1970 timestamps). In those, if someone had stored the UTC converted form of some local timestamp, then after the correction they wouldn't get back the data that was originally used to produce it.

The problem there is having discarded the original data instead of retaining it. Always retain the original source data. Then by all means, when computing, convert timestamps from their various local values to UTC so they can be more easily correctly ordered (or whatever) but use those converted values only for transient computations. Store the original. Always.

If that is done correctly, then after a correction to old data, the results might be different than they were before - but that's only because they were wrong before, and (hopefully) better after the fix.

The only time it makes sense to store timestamps in other than the original form is when we *know* that the conversion is correct (and hence, no later correction will change it). For users of tzdata that really only applies to post-1970 timestamps.

kre
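The "store the original, convert only transiently" advice above might look like this in Python. This is a minimal sketch, not something proposed in the thread: the `StoredEvent` class, its fields and `sort_by_instant` are hypothetical, and it assumes the standard-library `zoneinfo` module (Python 3.9+) with tz data available on the system.

```python
from dataclasses import dataclass
from datetime import datetime, timezone, tzinfo
from typing import Callable, Iterable, List
from zoneinfo import ZoneInfo


@dataclass(frozen=True)
class StoredEvent:
    """What gets persisted: the original local reading plus its zone."""
    local_time: datetime  # naive wall-clock time exactly as recorded
    zone_name: str        # e.g. "Europe/Berlin"

    def to_utc(self, resolve: Callable[[str], tzinfo] = ZoneInfo) -> datetime:
        """Derive UTC on demand; the result is never written back."""
        aware = self.local_time.replace(tzinfo=resolve(self.zone_name))
        return aware.astimezone(timezone.utc)


def sort_by_instant(events: Iterable[StoredEvent]) -> List[StoredEvent]:
    """Order events by their instant, using UTC only as a transient key."""
    return sorted(events, key=lambda e: e.to_utc())
```

Because only `local_time` and `zone_name` are stored, a later correction to the tz data simply changes what `to_utc()` returns the next time it is called; nothing persisted has to be repaired.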
On 03/06/2020 13:31, Robert Elz wrote:
The problem there is having discarded the original data instead of retaining it. Always retain the original source data. Then by all means, when computing, convert timestamps from their various local values to UTC so they can be more easily correctly ordered (or whatever) but use those converted values only for transient computations. Store the original. Always.
To some extent I totally agree with you, but unfortunately many systems simply assume that there is never a problem with the TZ data so they don't need to store both values. In fact they simply assume that ALL they need to store is the offset, so have no idea that in six months' time the offset may be different. Firebird is just 'adding timezone datatypes' and simply ignores anything pre-1970, so has no way to store even the initial seconds offsets. Just go with what works for most people and ignore the rest? SQL standards are just a mess when it comes to timezone data types and simply storing some tz identifier is not the whole story!

I came into this 20 years ago now while working with a data archive which has now been simply dumped because we had no idea what rules were used to produce the normalised data. Nowadays yes it does make sense to store both an original time and a normalised time, along with a location, and a record of which version of rules was used to do the normalization. Add to that a flag that indicates if the UTC time is fixed!

My point is that for a minuscule number of users a short term change in TZ data may affect their plans, yet much of the time it is judged too difficult to bother about. Now with more virtual international meetings, the potential of sessions moving an hour is not zero and being able to easily identify that the recorded normalized time has now changed at a very minimum flags that you need to check if the local time now needs to change since the UTC time is fixed by the event calendar.

In the past I have had cases where international medical meetings have been disrupted over Ramadan when sessions started at local time, ignoring the fact that it was now an hour adrift from the master timetable. The website may have been updated at short notice, but how long do browsers hold 'old' copies? In most cases the 'local' staff did not even think about their overseas venues :(

For historic data then the gold standard is two timestamps, location information and the rule set version ... along with a tolerance value, but the published time is UTC ...
Date: Wed, 3 Jun 2020 16:02:45 +0100
From: Lester Caine <lester@lsces.uk>
Message-ID: <3b7f0c78-4dd7-0f2e-a8e3-2b24401e7e1c@lsces.uk>

| To some extent I totally agree with you, but unfortunately many systems
| simply assume that there is never a problem with the TZ data so they
| don't need to store both values.

They don't in any case - the only one needed is the authoritative timestamp, which is almost always local time, somewhere (occasionally it might be UTC, but for most of us, that's rare - for astronomers perhaps less so).

| I came into this 20 years ago

I've been involved with it for longer than that - back to my first unix experience, in '76, where the US tz rules were compiled into the code, and most people in AU simply adjusted their computer's clock (their offset from UTC) 4 times a year (when the US switched summer time on and off, and when AU did - and yes, that meant that the generated GMT timestamps were wrong, most of the time). From that (a bit later) I was responsible for the mess that existed until ado invented tzdata (and yes, I mean the 2nd arg to gettimeofday()).
From all of this I have learned that time is hard. Really hard.
Many people believe that since they learned to tell the time when they were 4 or 5 years old, and have been doing it ever since, they know all there is to know. That's sad...

| now while working with a data archive
| which has now been simply dumped because we had no idea what rules were
| used to produce the normalised data.

That's a pity, but sometimes past mistakes simply come back to bite, and sometimes bite hard. Note that the error there was normalising the data, if that hadn't been done, none of the rest of it would matter, you'd now have the original data and could manipulate it however seems best, for now, regardless of what anyone did with it decades ago (and if you get it all wrong, future generations could cope, because they'd also still have the original data, and can fix any errors).

| Nowadays yes it does make sense to
| store both an original time and a normalised time,

No, it doesn't, just the original, plus ...

| along with a location,

yes, something which can be mapped into a timezone - and as accurate a location as possible.

| and a record of which version of rules was used to do the
| normalization.

Don't care about that, since the result won't be being saved.

| Add to that a flag that indicates if the UTC time is fixed!

If the UTC time is the authoritative one, that is what is stored. No need for extra flags. Just the authoritative time - the one which defines whatever it is that is being recorded.

| Now with more virtual international meetings, the potential of
| sessions moving an hour is not zero and being able to easily
| identify that the recorded normalized time has now changed

Here you're talking about current timestamps, and updates to the rules. If you need that kind of info, then as well as the authoritative time, you can also store the "reported time" - then if a later conversion generates something different, you can do whatever is appropriate (but the recorded reported time is used only for that comparison).
| at a very minimum flags that you need to check if the local time
| now needs to change since the UTC time is fixed by the event calendar.

This kind of thing should be planned well into the future, with the local times for several future meetings available to all to peruse (whatever the base, or fixed, time is .. many of the meetings like this that I've been involved with actually anchor the time to some US zone, yes, parochial, but what that is is unimportant, as long as the auth time and info leading to its zone are the primary source (and not some conversion derived from that)).

There will still be disruptions when time offsets change with little notice, nothing much anyone can do about that except harangue whoever is responsible for the problems they cause - but there's no good reason that international e-meetings should be any better off than airlines, etc.

kre
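The idea of keeping a "reported time" next to the authoritative time, purely for later comparison, can be sketched as follows. This is an illustration only: the `Meeting` class, its field names, `needs_review` and `fixed_offset_converter` are all hypothetical. The converter is passed in as a function so the example does not depend on any particular tz data release; the fixed-offset converter merely stands in for "the rules as of some release".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable

Converter = Callable[[datetime, str], datetime]


@dataclass(frozen=True)
class Meeting:
    auth_local: datetime    # authoritative wall-clock time (naive)
    auth_zone: str          # zone the authoritative time is anchored to
    reported_utc: datetime  # what was announced; kept only for comparison


def needs_review(meeting: Meeting, to_utc: Converter) -> bool:
    """True if converting the authoritative time with the current rules
    no longer matches what was originally reported."""
    return to_utc(meeting.auth_local, meeting.auth_zone) != meeting.reported_utc


def fixed_offset_converter(hours: int) -> Converter:
    """Stand-in for one tz data release: treats every zone as UTC+hours."""
    def convert(local: datetime, zone: str) -> datetime:
        offset = timezone(timedelta(hours=hours))
        return local.replace(tzinfo=offset).astimezone(timezone.utc)
    return convert
```

With `old_rules = fixed_offset_converter(2)` and `new_rules = fixed_offset_converter(3)`, a meeting whose `reported_utc` was produced by `old_rules` passes `needs_review(m, old_rules)` unchanged but is flagged by `needs_review(m, new_rules)`. The authoritative local time itself is never rewritten.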
Just to be clear: The statistics below represent the geographical distribution of the birth places of users of astro.com, i.e. of people with an active interest in astrology. The figures do not relate to the distribution of general population figures, or state surfaces or anything like that.

On 03.06.20 12:01, Alois Treindl wrote:
It is known that pre-1970 time zone history is represented in TZ database only for the cities which define the zone names. Other areas of countries are often not correctly represented in TZ database before 1970.
The worst are the USA and Canada, because there, in many states, the authority about daylight time adherence was in the hands of counties or towns, until the Uniform Time Act of 1966 took effect in 1967.
I am personally responsible for providing correct timezone history for the astrological service company Astrodienst AG in Switzerland, which runs, among other things, the website www.astro.com.
As source for timezone history we use, besides TZ database, the International and American Atlas by Thomas Shanks, and our own research.
The data we have are still far from perfect, because many local details of time zone and daylight saving time use are unknown, especially in the US and Canada.
We have about 25 million birth data records in our database.
I have analyzed to what extent TZ database represents the time zone information correctly, for these data.
I think the results are of some interest for the TZ community.
The overall sums are:
tz_count   24'674'767  data records, 100%
tz_irange  23'675'636  in time range and region which TZ covers correctly, 96%
tz_good       736'769  in time range and region where TZ is unreliable, but correct, 3%
tz_bad        262'362  in time range and region where TZ database gives false result, 1%
This means that only 1% of our astrological charts would be false, if we used TZ database only, without any extra time zone history sources.
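As a quick arithmetic check of the overall sums above: the three categories add up to the total, and the rounded shares can be recomputed from the counts. The snippet only re-derives figures quoted in this message; the variable names are arbitrary.

```python
# Counts as quoted in the overall sums.
tz_count = 24_674_767
counts = {
    "tz_irange": 23_675_636,
    "tz_good": 736_769,
    "tz_bad": 262_362,
}

# The three categories partition the data set.
assert sum(counts.values()) == tz_count

# Share of each category in percent of all records.
shares = {name: 100 * n / tz_count for name, n in counts.items()}

for name, pct in shares.items():
    print(f"{name:10s} {pct:6.2f}%  (rounded: {round(pct)}%)")
```

Rounded to whole percent this gives the 96%, 3% and 1% quoted above; the "bad" share is slightly above 1% before rounding.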
More details by country or US/Canada state.

state       count  in_range %  good %  bad %

USA
AK (US)     19889        99.6     0.0    0.3  Alaska
AL (US)     64975        78.6    13.3    8.1  Alabama
AR (US)     43159        77.2    12.7   10.0  Arkansas
AZ (US)     96545        97.9     2.0    0.0  Arizona
CA (US)    981941       100.0     0.0    0.0  California
CO (US)    107204        99.5     0.5    0.0  Colorado
CT (US)     89273        95.8     4.2    0.0  Connecticut
DC (US)     58401        85.7    13.1    1.1  District of Columbia
DE (US)     14376        91.3     8.5    0.3  Delaware
FL (US)    241083        89.0     6.5    4.5  Florida
GA (US)    114613        85.8     7.1    7.2  Georgia (US)
HI (US)     44070       100.0     0.0    0.0  Hawaii
IA (US)     50194        72.7    17.1   10.2  Iowa
ID (US)     23877        92.2     6.4    1.4  Idaho
IL (US)    267459        90.4     9.5    0.1  Illinois
IN (US)     88263        76.8    21.6    1.6  Indiana
KS (US)     45462        74.4    15.0   10.6  Kansas
KY (US)     56494        74.6    12.5   12.9  Kentucky
LA (US)     76017        80.3    11.9    7.8  Louisiana
MA (US)    193439        87.1    12.9    0.1  Massachusetts
MD (US)     91577        91.2     8.1    0.7  Maryland
ME (US)     31873        96.8     2.8    0.5  Maine
MI (US)    185568        86.0    13.3    0.7  Michigan
MN (US)     95725        75.0    17.0    7.9  Minnesota
MO (US)     91038        79.4    14.4    6.2  Missouri
MS (US)     28859        70.0    20.4    9.6  Mississippi
MT (US)     19767        73.7    24.7    1.5  Montana
NC (US)    106991        82.9    10.2    6.9  North Carolina
ND (US)     12427        71.5    17.5   10.9  North Dakota
NE (US)     28670        69.1    18.2   12.7  Nebraska
NH (US)     23707        91.2     8.2    0.6  New Hampshire
NJ (US)    190614        99.6     0.3    0.1  New Jersey
NM (US)     42651        80.6    18.4    0.9  New Mexico
NV (US)     36053        99.4     0.4    0.3  Nevada
NY (US)    628944        96.8     3.0    0.1  New York
OH (US)    191745        73.9    18.3    7.8  Ohio
OK (US)     55229        73.7    16.5    9.8  Oklahoma
OR (US)     98166        87.1     9.8    3.1  Oregon
PA (US)    228064        89.8     9.7    0.6  Pennsylvania
PR (US)     37149       100.0     0.0    0.0  Puerto Rico
RI (US)     26089        97.1     2.8    0.1  Rhode Island
SC (US)     40233        82.9    10.0    7.1  South Carolina
SD (US)     12863        72.7    18.6    8.7  South Dakota
TN (US)     75125        82.9     9.3    7.8  Tennessee
TX (US)    327328        83.3     9.8    6.9  Texas
UT (US)     48909        82.7    16.2    1.1  Utah
VA (US)    109591        85.5    10.1    4.4  Virginia
VT (US)     17598        86.4    12.7    0.9  Vermont
WA (US)    159835        84.8    11.9    3.3  Washington
WI (US)     85917        75.5    18.5    5.9  Wisconsin
WV (US)     22800        72.7    20.3    7.0  West Virginia
WY (US)      8962        75.1    23.9    0.9  Wyoming

Canada
AB (CAN)    64249        97.9     2.0    0.1  Alberta
BC (CAN)    99214        98.0     2.0    0.0  British Columbia (CAN)
MB (CAN)    21563        69.8    27.5    2.6  Manitoba (CAN)
NB (CAN)    10438        89.0     8.3    2.7  New Brunswick (CAN)
NF (CAN)     7329        98.4     1.2    0.4  Newfoundland (CAN)
NS (CAN)    18829        75.4    16.3    8.2  Nova Scotia (CAN)
NT (CAN)     1137        80.7     0.0   19.3  Northwest Territories (CAN)
ON (CAN)   239755        86.3    12.4    1.3  Ontario (CAN)
PE (CAN)     2036        80.0    16.1    3.9  Prince Edward Island (CAN)
QU (CAN)   126300        88.4     9.8    1.8  Quebec (CAN)
SK (CAN)    24390        78.2    12.4    9.5  Saskatchewan (CAN)
YK (CAN)      932       100.0     0.0    0.0  Yukon Territories (CAN)

Other countries with significant issues in TZ database, only those with > 1% bad results
BY          13966        97.1     1.2    1.7  Belarus
CAR           268        33.2     0.4   66.4  Central African Repub
CHINA      214617        96.0     2.4    1.6  China
CONGO        1026        49.9     0.0   50.1  Congo (Brazzaville)
GABON         704        47.7     0.0   52.3  Gabon
GUAM         3613        45.6     0.0   54.4  Guam
INDSA       51950        97.2     0.8    2.0  Indonesia
NIGER         294        76.9     3.7   19.4  Niger
NIRE (UK)   16228        97.1     0.7    2.1  Northern Ireland
RU         177880        89.4     8.5    2.1  Russian Federation
SUDAN        2495         0.0     0.0  100.0  Sudan (?? may be my fault)
TAAF            2         0.0     0.0  100.0  French Kerguelen Islands
TANZ         4162        68.3    30.2    1.5  Tanzania
TOK            17        41.2     0.0   58.8  Tokelau
UA          71260        95.9     2.4    1.7  Ukraine
VIET       299630        97.6     0.8    1.6  Vietnam
participants (4)
- Alois Treindl
- Clive D.W. Feather
- Lester Caine
- Robert Elz