
Hello,

I'm investigating how to deal with timezones for an application that will get to process billions of timestamped events, generated in just about any part of the world. Timestamps are in local time (without timezone, only post- and other codes) and we need them in UTC time. Money is involved, so it is of critical importance that intervals between events are calculated correctly.

There are two steps involved in this:
- given some geo data and some lookup tables from a database like geopostcodes.com, determine the timezone id
- convert to UTC using the latest Time Zone Database

Development-wise it's fairly simple to automate. We're using joda time and this library allows plugging in a custom Timezone provider which can then use the latest TZ database. However, I see some issues:

1) I noticed that certain timezone ids are deleted in newer versions: tzdata2014a contains America/Shiprock, newer versions don't. This may lead to issues whenever recalculations are needed in the future. What is the likelihood that ids are deleted?

2) Do rules from the past ever change? In other words, can we assume that a recalculation of a past local date to a UTC date will always yield the same result with newer versions of the TZ database? If so, is this a common thing and would one then recommend saving the TZ database version with which the conversion was performed along with the UTC date?

Hope one of you can shed some light!

Cheers, Joris

Joris Van den Bogaert wrote:
1) I noticed that certain timezone ids are deleted in newer versions: tzdata2014a contains America/Shiprock, newer versions don't.
No, America/Shiprock is still present in the latest tz release. It's in the 'backward' file. If you're concerned about backward compatibility, you should use the 'backward' file.
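For illustration, here is a minimal Python sketch of how a legacy id can be resolved through `backward`-style link lines. It assumes the `Link TARGET LINK-NAME` line format used in the tz source files; the sample text and the helper names `parse_links` and `canonical_id` are made up for this example and do not belong to any library.

```python
# Sample text in the style of the tz 'backward' file (illustrative excerpt,
# not copied from a specific release).
SAMPLE_BACKWARD = """\
# Link  TARGET           LINK-NAME
Link    America/Denver   America/Shiprock
"""


def parse_links(text):
    """Return {link_name: target} for every 'Link TARGET LINK-NAME' line."""
    links = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "Link":
            links[fields[2]] = fields[1]
    return links


def canonical_id(tzid, links):
    """Follow links until reaching an id that is not itself a link."""
    seen = set()
    while tzid in links and tzid not in seen:
        seen.add(tzid)
        tzid = links[tzid]
    return tzid


links = parse_links(SAMPLE_BACKWARD)
print(canonical_id("America/Shiprock", links))  # America/Denver
```

An id that is not a link, such as Europe/Madrid here, is returned unchanged.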
2) Do rules from the past ever change?
Yes, it happens all the time as we find out more about the past (or, in some cases, find out that what we thought we knew was bogus). A proposed change of that sort for Moscow in 1921 was published today, for example; see: https://github.com/eggert/tz/commit/8d558674ce15736f4db98332e9d1e86b1555c340
would one then recommend saving the TZ database version
One might, if one knew that one was using exactly a particular TZ database version. But that's often not the case; if someone else installed your database, they may have applied their own updates, e.g., point updates for pressing changes. And depending on your configuration, perhaps the TZ database might be updated during your computation. So it might be wise for you to record both UTC and local time, instead of trying to rely on tz version.
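As a rough sketch of what recording both values could look like, the Python snippet below keeps the local wall-clock time as received next to the UTC instant computed from it. The names `EventTime` and `make_event` are hypothetical, and the UTC offset is passed in by hand here; in a real system it would come from the tz data for the event's zone.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class EventTime:
    local: datetime  # naive wall-clock time exactly as received
    utc: datetime    # aware UTC instant computed at ingestion time

    @property
    def offset(self) -> timedelta:
        """UTC offset that was applied when the event was converted."""
        return self.local - self.utc.replace(tzinfo=None)


def make_event(local: datetime, offset: timedelta) -> EventTime:
    """Build a record from a naive local time and the offset chosen for it."""
    utc = (local - offset).replace(tzinfo=timezone.utc)
    return EventTime(local=local, utc=utc)


# Two events from different places, with offsets assumed for the example.
a = make_event(datetime(2014, 6, 1, 2, 2), timedelta(hours=2))
b = make_event(datetime(2014, 6, 1, 9, 30), timedelta(hours=-7))
print(a.offset)       # 2:00:00
print(b.utc - a.utc)  # interval computed on the UTC instants
```

Because the local value is kept untouched, the offset that was applied can always be recovered later, whatever tz data happened to be installed at conversion time.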

Hi Paul,

Oops, I forgot to compile and include "backward" as well. Many thanks.

Looking at the git log, it looks like most changes are typos and adjustments to data far away in the past, like the early 1900's or Big Bang :) changes and sometimes the very near future, like Egypt and Turkey. I didn't find any substantial changes that were made in the near past.

We were already planning to record both UTC and local time, but we were considering adding an extra field TZ_VERSION. With the inclusion of the "backward" file in the compilation process, that does not seem to be necessary.

In addition to storing the local and UTC time, I believe we need to include the timezone id as well. If not we'd have to version the GEO lookup tables as well.

Hypothetically, let's say Catalonia becomes a separate state on 1/1/2015 and decides to have a timezone UTC+01:30 instead of CET. Do you guys then decide to create a new timezone Europe/Barcelona? What (should) happens when converting the datetime 1/1/2014 with "Europe/Barcelona", a timezone that didn't exist at that time?

Sorry for the trivial questions, I'm new to this field.

Cheers, Joris

-----Original Message-----
From: Paul Eggert
Sent: Sunday, June 01, 2014 2:02 AM
To: Joris Van den Bogaert; tz@iana.org
Subject: Re: [tz] changes to the TZ database over versions

Joris Van den Bogaert wrote:
1) I noticed that certain timezone ids are deleted in newer versions: tzdata2014a contains America/Shiprock, newer versions don't.
No, America/Shiprock is still present in the latest tz release. It's in the 'backward' file. If you're concerned about backward compatibility, you should use the 'backward' file.
2) Do rules from the past ever change?
Yes, it happens all the time as we find out more about the past (or, in some cases, find out that what we thought we knew was bogus). A proposed change of that sort for Moscow in 1921 was published today, for example; see: https://github.com/eggert/tz/commit/8d558674ce15736f4db98332e9d1e86b1555c340
would one then recommend saving the TZ database version
One might, if one knew that one was using exactly a particular TZ database version. But that's often not the case; if someone else installed your database, they may have applied their own updates, e.g., point updates for pressing changes. And depending on your configuration, perhaps the TZ database might be updated during your computation. So it might be wise for you to record both UTC and local time, instead of trying to rely on tz version.

On Sun, 01 Jun 2014, Joris Van den Bogaert wrote:
Looking at the git log, it looks like most changes are typos and adjustments to data far away in the past, like the early 1900's or Big Bang :) changes and sometimes the very near future, like Egypt and Turkey. I didn't find any substantial changes that were made in the near past.
There are occasionally changes that affect the recent past, such as when governments make changes with very short notice. It is sometimes the case that the change has already taken effect in the real world before the tz database is updated, or at least before the new version is published and widely used.
Hypothetically, let's say Catalonia becomes a separate state on 1/1/2015 and decides to have a timezone UTC+01:30 instead of CET. Do you guys then decide to create a new timezone Europe/Barcelona?
In that hypothetical situation, a new timezone would be created, and named after the largest city or population centre in the affected area, which could well be Barcelona.
What (should) happens when converting the datetime 1/1/2014 with "Europe/Barcelona", a timezone that didn't exist at that time?
Any hypothetical new Europe/Barcelona zone would attempt to report the time as it was in the city of Barcelona at least as far into the past as 1 Jan 1970 (or perhaps farther, if good records exist), and as far into the future as can reasonably be predicted from legislation and other sources. So 1 Jan 2014 in a hypothetical new Europe/Barcelona zone would be identical to 1 Jan 2014 in the existing Europe/Madrid zone.
--apb (Alan Barrett)

On 01/06/14 17:51, Joris Van den Bogaert wrote:
Hi Paul,
Oops, I forgot to compile and include "backward" as well. Many thanks.
Looking at the git log, it looks like most changes are typos and adjustments to data far away in the past, like the early 1900's or Big Bang :) changes and sometimes the very near future, like Egypt and Turkey. I didn't find any substantial changes that were made in the near past.
That's because the near past is generally well-known and thus 100% accurate :)
Hypothetically, let's say Catalonia becomes a separate state on 1/1/2015 and decides to have a timezone UTC+01:30 instead of CET. Do you guys then decide to create a new timezone Europe/Barcelona?
Yes.
What (should) happens when converting the datetime 1/1/2014 with "Europe/Barcelona", a timezone that didn't exist at that time?
It will provide the same values as if computed with Europe/Madrid.
Don't think of it as “the timezone didn't exist” but as “using the same time rules as Barcelona at that time”.
Sorry for the trivial questions, I'm new to this field.
Cheers, Joris
You're welcome :)

Joris Van den Bogaert wrote:
I didn't find any substantial changes that were made in the near past.
That depends on what you mean by "substantial" and "near", but for example 2014a made these changes:
* Fiji ended DST on 2014-01-19 at 02:00, not the previously-scheduled 03:00.
* Ukraine switched from Moscow to Eastern European time on 1990-07-01 (not 1992-01-01), and observed DST during the entire next winter.
* In 1988 Israel observed DST from 04-10 to 09-04, not 04-09 to 09-03.
Unfortunately a significant fraction of the entries in the tz database are incorrect. We do our best to fix errors, though, so please continue to expect fixes in the future.
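To make the effect of such fixes concrete, here is a small Python sketch that re-converts stored local times and reports the records whose UTC value would move. The two converters are hypothetical stand-ins for "the tz data before a fix" and "the tz data after a fix"; they apply fixed offsets of +13 and +12 hours, loosely echoing the Fiji correction, and do not read any real tz data.

```python
from datetime import datetime, timedelta, timezone


def fixed_offset_converter(hours):
    """Stand-in for 'convert local time to UTC with one tz data version'."""
    def to_utc(local):
        return (local - timedelta(hours=hours)).replace(tzinfo=timezone.utc)
    return to_utc


def changed_records(records, to_utc):
    """Return the (local, stored_utc, new_utc) triples whose UTC value moved."""
    changed = []
    for local, stored_utc in records:
        new_utc = to_utc(local)
        if new_utc != stored_utc:
            changed.append((local, stored_utc, new_utc))
    return changed


old_rules = fixed_offset_converter(13)  # hypothetical: DST still assumed in force
new_rules = fixed_offset_converter(12)  # hypothetical: corrected data, standard time

local = datetime(2014, 1, 19, 2, 30)
stored = [(local, old_rules(local))]  # UTC as computed at ingestion time

print(len(changed_records(stored, old_rules)))  # 0
print(len(changed_records(stored, new_rules)))  # 1
```

A check of this kind only works if the original local time was stored alongside the UTC value.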

On 01/06/14 00:36, Joris Van den Bogaert wrote:
Hello, I'm investigating how to deal with timezones for an application that will get to process billions of timestamped events, generated in just about any part of the world. Timestamps are in local time (without timezone, only post- and other codes) and we need them in UTC time. Money is involved, so it is of critical importance that intervals between events are calculated correctly. There are two steps involved in this: - given some geo data and some lookup tables from a database like geopostcodes.com, determine the timezone id - convert to UTC using the latest Time Zone Database
Matching geo data to a timezone is a bit fuzzy. On the other hand, you state that correct calculation is critical.
2) Do rules from the past ever change? In other words, can we assume that a recalculation of a past local date to a UTC date will always yield the same result with newer versions of the TZ database? If so, is this a common thing and would one then recommend saving the TZ database version with which the conversion was performed along with the UTC date?
Yes. If new data is discovered and it turns out the previous rule was wrong, it is updated. The new one will (hopefully) be more accurate, but the result will differ from a conversion done with the previous version. Also note that the applicable timezone could change.
You may know more than us how likely it is that the information about the period you are treating wasn't accurate. Now I'm thinking that perhaps you aren't working on historic records but with future ones, in which case it's very likely that tz will be right (but in that case why not store them in UTC directly?) ... excluding the cases where a Government decides to change time and tz/your team is not able to provide/install the new rules before the change. :/

Hi Angel,
Matching geo data to a timezone is a bit fuzzy. On the other hand, you state that correct calculation is critical.
We’re working with old protocols that don’t include TZ information in the data that is sent to us, so we have to get the TZ info through geo mapping tables. We’ve been looking at geopostcodes.com. We’d like it to be as correct as possible, but realize it’s not possible to cover everything. E.g. 2014-10-26T02:05:00 may refer to 2014-10-26T02:05:00.000+02:00 or 2014-10-26T02:05:00.000+01:00.
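The ambiguity in that example can be spelled out with a short Python sketch. It uses only fixed UTC offsets from the standard library and takes the two candidate offsets (+02:00 and +01:00) directly from the message above; the variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone

local = datetime(2014, 10, 26, 2, 5)  # wall-clock reading from the example

# The two candidate interpretations named in the message.
as_summer = local.replace(tzinfo=timezone(timedelta(hours=2)))
as_winter = local.replace(tzinfo=timezone(timedelta(hours=1)))

utc_summer = as_summer.astimezone(timezone.utc)
utc_winter = as_winter.astimezone(timezone.utc)

print(utc_summer.isoformat())   # 2014-10-26T00:05:00+00:00
print(utc_winter.isoformat())   # 2014-10-26T01:05:00+00:00
print(utc_winter - utc_summer)  # 1:00:00
```

The same local reading maps to two UTC instants an hour apart, so any interval that starts or ends inside the repeated hour is uncertain by that much unless the source supplies extra information.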
Now I'm thinking that perhaps you aren't working on historic records but with future ones, in which case it's very likely that tz will be right (but in that case why not store them in UTC directly?)
Our data is all about real-time events, but when reports need to be regenerated for some reason in the future, they will be historic. Hence the questions about tz versions. Cheers, Joris

On 01/06/14 18:03, Joris Van den Bogaert wrote:
Hi Angel,
Matching geo data to a timezone is a bit fuzzy. On the other hand, you state that correct calculation is critical.
We’re working with old protocols that don’t include TZ information in the data that is sent to us, so we have to get the TZ info through geo mapping tables. We’ve been looking at geopostcodes.com.
I would evaluate switching the software to start reporting in UTC after some epoch (2015-01-01? 2014-07-01?).
We’d like it to be as correct as possible, but realize it’s not possible to cover everything. Eg. 2014-10-26T02:05:00 may refer to 2014-10-26T02:05:00.000+02:00 or 2014-10-26T02:05:00.000+01:00
Right. The local time alone is ambiguous. Although I was thinking of cases like "We are not sure if this valley uses the timezone of the town 10 km north or the one 8 km south". As you have now clarified that you are dealing with «recent» events, that's unlikely to happen. Actually, you may be able to tag each source with the tzid.
Now I'm thinking that perhaps you aren't working on historic records but with future ones, in which case it's very likely that tz will be right (but in that case why not store them in UTC directly?)
Our data is all about real-time events, but when reports need to be regenerated for some reason in the future, they will be historic. Hence the questions about tz versions.
If you convert them properly to UTC today, then you won't need them regenerated in the future. There is a big difference between processing what happened 40 years ago and what is happening right now. Nowadays, you can easily know your offset.
Cheers

On Sun, 1 Jun 2014, Joris Van den Bogaert wrote:
Hi Angel,
Matching geo data to a timezone is a bit fuzzy. On the other hand, you state that correct calculation is critical.
We’re working with old protocols that don’t include TZ information in the data that is sent to us, so we have to get the TZ info through geo mapping tables. We’ve been looking at geopostcodes.com.
If you have lat/lon, have a look at what I did here: http://derickrethans.nl/what-time-is-it.html There is also a (free) dataset at http://download.geonames.org/export/dump/ linking nearly every city in the world with a TZiD -- in case you need to cover more than just the US. cheers, Derick
participants (5)
- Alan Barrett
- Derick Rethans
- Joris Van den Bogaert
- Paul Eggert
- Ángel González