Time zone confusion and implementation hints
Hi,
I'm currently implementing an API for the tz database that should look like .NET's TimeZoneInfo class, so that it can replace that class in my application. The API includes:
* Convert a date and time from UTC to a time zone
* Convert a date and time from a time zone to UTC
* Return the base UTC offset of a time zone for a date and time
* Return the save offset of a time zone for a date and time
* Determine whether a date and time is ambiguous in a time zone
* Determine whether a date and time is invalid in a time zone
It will come with a code generator that "compiles" the text files into C# code so that there's no need to deploy additional files with the application. The API's internal data structures are very similar to the tz database records; these are already coded.
Currently I'm stuck because of the flexibility of the tz rules. These can define transition times in local, standard or universal time. Additionally, the caller of my API passes a date and time that is either local or universal. All this needs to be matched correctly. The right rule needs to be selected, and there can be ambiguous or invalid times all along the way. It's not only hard to implement, it's also hard to describe; it's very confusing not to have a common base [zone] you can hold on to.
I have now read just about every man page and code file from the tzcode distribution, starting with zic.c and going backwards in the list. I think the file localtime.c does the actual conversion, but it's hard to read because it contains only a few comments and uses short all-lowercase global variable names. I couldn't follow it to its core where the magic happens.
I have analysed .NET's TimeZoneInfo implementation with .NET Reflector, but it's not half as powerful as what the tz database can express. I also looked at the C# library ZoneInfo, but it uses an incompatible license, reads the text files at runtime and, last but not least, is quite inexact around transition times, which is unacceptable for a calendar application. Also, I think it's a good idea to write my own API for it because I'll need a Java port later as well, working on the exact same data.
Could somebody assist me in this, please? Are there helpful thoughts to use when implementing such a program? Right now my biggest problem is the transition times and, in consequence, finding the right rule to use for a given date and time.
-- Yves Goergen "LonelyPixel" <nospam.list@unclassified.de> Visit my web laboratory at http://beta.unclassified.de
Hi Yves, There was an effort made a couple of years ago on CodePlex, implementing the TZDatabase. Here's the link: http://publicdomain.codeplex.com/ Regards, Tom
If you're looking at porting to Java at some stage, you might want to consider using noda-time (http://code.google.com/p/noda-time/), which is itself a port of joda-time (http://joda-time.sourceforge.net/); both use the tz database as their base.
On 2 July 2010 08:29, Thomas KIPP <trwk76@gmail.com> wrote:
Hi Yves,
There was an effort made a couple of years ago on CodePlex, implementing the TZDatabase.
Here's the link: http://publicdomain.codeplex.com/
Regards,
Tom
Hi Yves,
I have implemented two functions which use the tz database. They are original code written in standard C++, based on internal tables read directly from the tz gz files.
1) SetWT(const string &zonename, const RDTime &wt)
2) SetUT(const string &zonename, const RDTime &ut)
The first takes a zone name (e.g. 'America/Montreal') and its wall time and fills a data structure with the matching wall, standard and UT times, along with the zone abbreviation strings ('EDT', 'EST'). The second is the same, except that it takes a local zone name and the current UT.
Combined, these two functions can convert any datetime in the past or future to another datetime in a different zone.
I'd show you the code, but unfortunately it is part of a series of commercial products. But perhaps I can give you some input on how it was implemented, or if there is enough interest I could convince the boss to release licensed lib files for a small cost.
David Patte
Senior Designer, C++
Relative Data Inc.
Yves Goergen wrote:
Hi,
I'm currently implementing an API for the tz database that should look like .NET's TimeZoneInfo class to replace that in my application. The API includes: ....
If you want yet another implementation, the Android mobile phone software has a Java version under the Apache license that works from a packed archive of files compiled with zic. The source code is at http://android.git.kernel.org/?p=platform/libcore.git;a=tree;f=luni/src/main...
Eric
On Jul 1, 2010, at 12:22 PM, Yves Goergen wrote:
Currently I'm stuck because of the flexibility of the tz rules. These can define transition times either in local, standard or universal time. Additionally, the caller of my API passes a date and time that is either local or universal. All this needs to be matched correctly. The right rule needs to be selected and there can be ambiguous or invalid times all along the way. It's not only hard to implement, it's also hard to describe, it's very confusing not to have a common base [zone] you can hold on to.
I have now read just about every man page and code file from the tzcode distribution, starting with zic.c and going backwards in the list. I think the file localtime.c does the actual conversion, but it's hard to read because it contains only a few comments and uses short all-lowercase global variable names. I couldn't follow it to its core where the magic happens.
Note that the localtime.c code reads in the compiled time zone files and uses those; those files are much simpler than the rule files. They have an array of "transition times", all of which are represented as UTC, each referring to an item that describes the local time state that comes into effect at that transition time:
* offset from GMT;
* indication of whether DST/summer time is in effect;
* reference to the time zone abbreviation (e.g., EST vs. EDT for US/Canada Eastern Standard Time vs. Eastern Daylight Time).
So, for any given UTC time, it just has to find the latest transition time <= the given time and use the local time state that comes into effect at that transition time to convert the UTC time to local time.
The flexibility of the tz rules is handled in zic.c - it's the compiler that translates the text files with rules into the compiled time zone files.
As for "the caller of my API passes a date and time that is either local or universal", note that localtime() takes a universal-time argument (seconds since January 1, 1970, 00:00:00 UTC) and mktime() takes a local-time argument (year, month, day, hours, minutes, seconds), so you might want to look at those to see how they find the right transition time.
I would *STRONGLY* suggest that your compiler duplicate the work that zic does, and compile the text files into code that just has universal-time transition times and, for each transition time, data, or a reference to data, giving the time zone offset (and any other information your APIs need, such as "is this daylight saving/summer time?" and "what's the time zone abbreviation?").
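A minimal sketch of the lookup described above, assuming the compiled data has already been loaded into memory. The names ZoneTable, LocalType, type_at, utc_to_local and local_to_utc are hypothetical and are not taken from localtime.c. The reverse direction is done here by trying each known offset and keeping the candidates that are consistent; this is one simple way to detect invalid and ambiguous local times, not necessarily how mktime() does it.

```python
from bisect import bisect_right
from typing import NamedTuple


class LocalType(NamedTuple):
    utoff: int    # seconds to add to UTC to get local time
    is_dst: bool  # whether DST/summer time is in effect
    abbr: str     # time zone abbreviation, e.g. "EST"


class ZoneTable:
    """Hypothetical in-memory form of one compiled zone."""

    def __init__(self, initial, transitions):
        # initial: LocalType in effect before the first transition.
        # transitions: list of (utc_seconds, LocalType), sorted by time.
        self.initial = initial
        self.times = [t for t, _ in transitions]
        self.types = [lt for _, lt in transitions]

    def type_at(self, utc):
        # bisect_right counts the transitions at or before `utc`,
        # so index n - 1 is the latest transition <= utc.
        n = bisect_right(self.times, utc)
        return self.initial if n == 0 else self.types[n - 1]

    def utc_to_local(self, utc):
        return utc + self.type_at(utc).utoff

    def local_to_utc(self, local):
        # Try every offset the zone has ever used; keep a candidate only if
        # the zone really uses that offset at the resulting UTC instant.
        # Result: [] for a skipped (invalid) local time, one instant for an
        # ordinary time, more than one for a repeated (ambiguous) time.
        offsets = {self.initial.utoff} | {lt.utoff for lt in self.types}
        return sorted(local - off for off in offsets
                      if self.type_at(local - off).utoff == off)


# Made-up zone: UTC+1, then UTC+2 from t=1_000_000 until t=2_000_000.
std = LocalType(3600, False, "STD")
dst = LocalType(7200, True, "DST")
zone = ZoneTable(std, [(1_000_000, dst), (2_000_000, std)])
print(zone.utc_to_local(1_000_000))   # 1007200
print(zone.local_to_utc(1_005_000))   # [] (inside the skipped hour)
print(zone.local_to_utc(2_005_000))   # [1997800, 2001400] (repeated hour)
```

Because every transition is stored in UTC, the forward lookup is a single binary search, and the questions about ambiguous or invalid local times reduce to counting how many candidates survive the consistency check.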
[This part of my reply was composed after reading several replies:]
When a zone defines that a rule set is valid until before 1990 Sep 3 at 2:00w or 2:00s, how exactly am I supposed to find out when that is? Either way, the exact time of that switch depends on the UTC and save offsets around that point in time, i.e. the rule set (Zone definition) and/or the relevant rule in the moment before and/or after that time. Both the base UTC offset and the additional save offset may change in that moment. Which of them shall be applied, the previous or the next interval's values?
Is there some higher-level algorithm available for how that works? Right now I can't even think of a way to do it 'manually' by reading the text files and calculating stuff in my brain. How should I tell a stupid computer what to do?
The ZoneInfo code (C#) doesn't even care about it and produces errors. The (now dead?) (full-blown) PublicDomain library (C#) seems to be restricted to two transition times per year (like Windows) and also doesn't seem to care very well about the exact moment of change. I don't understand the cryptic original code (C) of the tz distribution. Android's implementation (Java) in org.apache.harmony seems too short to be complete. The tz format description doesn't explain how to use the data programmatically. So right now I understand the data format but don't see how it can be used for real computing.
[This part of my reply was composed after reading Guy Harris' reply:]
I was totally unaware that zic does all that. It sounds interesting though. I think I could find an algorithm for UTC to local conversion with all transition times in UTC. The reverse function, however, would still be educated trial and error.
I can't compile zic.c with Visual Studio 2008; some unixoid libraries are missing and I don't think I have them. Using the Makefile doesn't look more promising. So I need to implement zic's function - or something similar - myself.
The format description of the compiled tzfile is very verbose, which makes it hard to get an overview of what is stored where. I have attached an incomplete pseudo-code description of what I think it might look like. Anyway, it would just be an aid in reverse-analysing the zic code. (I find it helpful to start from the output so I won't see anything I don't need for that.)
But now I'm wondering where the annual rules have gone. The compiled tzfile only seems to contain fixed transition times. Anything that's not stored literally in the file is unknown. Are all dates between 0001-01-01 and 9999-12-31 stored in the file? Or just between 1970-01-01 and 2038-xx-xx, or maybe even in a configurable time span? Doesn't that produce masses of redundant data and limit the usage of the database?
-- Yves Goergen "LonelyPixel" <nospam.list@unclassified.de> Visit my web laboratory at http://beta.unclassified.de
struct TZFileFormat {
    char8[4] magic = "TZif";
    char8 version = '2';
    byte[15] reserved;
    int32 ttIsGmtCount;
    int32 ttIsStdCount;
    int32 leapCount;
    int32 timeCount;
    int32 typeCount;
    int32 charCount;
    int32[timeCount] transitionTimes;  // UNIX timestamp* of each transition time (that's when the rule changes)
    byte[timeCount] ttLocalTimeTypes;  // specify 'local time type' (see timeTypes) for the same-indexed transitionTimes item
    ttInfo[typeCount] timeTypes;       // purpose unclear ???
    // array of time zone abbreviation characters (undefined)
    leapData[leapCount] leapSeconds;   // sorted by .time
    byte[ttIsStdCount];                // standard/wall indicators for transition times associated with local time types - unused?
    byte[ttIsGmtCount];                // UTC/local indicators for transition times associated with local time types - unused?
    // now comes version-2-specific data (see tzfile(5), last paragraph)
    // which is the entire file once again, only with int64 types for all transition times and leap seconds times
    // followed by char8[] = /\n[^\n]*\n/ (purpose unclear)
}
struct ttInfo {
    int32 gmtOffset;  // offset to UTC in seconds
    byte isDst;       // indicates whether this is DST
    byte abbrIndex;   // index into "array of time zone abbreviation characters" (undefined)
}
struct leapData {
    int32 time;   // UNIX timestamp* of the time when a leap second occurs
    int32 total;  // total number of leap seconds to be applied after that time
}
// *) UNIX timestamp: seconds since 1970-01-01T00:00:00Z
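The layout in the pseudo-code above can be read with a few struct calls. The following is only a sketch of the 32-bit (version 1) block, with a hypothetical function name parse_v1_block; it skips leap seconds and the standard/wall and UTC/local indicator arrays. It treats each ttInfo entry as the "local time type" (offset, DST flag, abbreviation index) that a transition switches to. As for the newline-enclosed trailer after the 64-bit block, tzfile(5) describes it as a POSIX-TZ-style string for instants after the last stored transition, which appears to be where the open-ended annual rules end up.

```python
import struct

# Header: magic, version, 15 reserved bytes, then six big-endian counts
# in file order: isutcnt, isstdcnt, leapcnt, timecnt, typecnt, charcnt.
HEADER = struct.Struct(">4sc15x6L")


def parse_v1_block(data):
    """Return [(utc_seconds, utoff, is_dst, abbr), ...] from the 32-bit block."""
    (magic, _version, _isutcnt, _isstdcnt, _leapcnt,
     timecnt, typecnt, charcnt) = HEADER.unpack_from(data, 0)
    if magic != b"TZif":
        raise ValueError("not a TZif file")
    pos = HEADER.size

    # Transition times (int32 each), then one type index byte per transition.
    times = struct.unpack_from(f">{timecnt}l", data, pos)
    pos += 4 * timecnt
    type_idx = struct.unpack_from(f">{timecnt}B", data, pos)
    pos += timecnt

    # Each ttinfo entry is 6 bytes: int32 offset, isdst flag, abbreviation index.
    ttinfos = [struct.unpack_from(">lBB", data, pos + 6 * i)
               for i in range(typecnt)]
    pos += 6 * typecnt
    abbr_chars = data[pos:pos + charcnt]

    def abbr(start):
        # Abbreviations are NUL-terminated strings inside one shared char array.
        end = abbr_chars.index(b"\0", start)
        return abbr_chars[start:end].decode("ascii")

    result = []
    for when, idx in zip(times, type_idx):
        utoff, is_dst, abbr_start = ttinfos[idx]
        result.append((when, utoff, bool(is_dst), abbr(abbr_start)))
    return result
```

The returned list is already the flat "UTC transition time plus local time type" table, so it can be fed to a binary-search lookup or written out by a code generator.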
On 07/02/10 07:18, Yves Goergen wrote:
When a zone defines that a rule set is valid until before 1990 Sep 3 at 2:00w or 2:00s, how exactly am I supposed to find out when that is?
You use the rules that were in effect just before the change. It wouldn't make sense any other way, surely.
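Read that way, resolving an UNTIL time is a single subtraction. The sketch below assumes the caller already knows the base offset and the save value in effect just before the change; the function name until_to_utc and the one-hour values in the usage lines are illustrative assumptions, not data from any real zone.

```python
from datetime import datetime, timedelta, timezone


def until_to_utc(until, suffix, stdoff, save):
    """Convert an UNTIL time to a UTC instant.

    until:  naive datetime as written in the UNTIL column
    suffix: 'w' (wall clock), 's' (standard) or 'u'/'g'/'z' (universal)
    stdoff: base UTC offset in effect just before the change
    save:   DST save amount in effect just before the change
    """
    if suffix in ("u", "g", "z"):
        offset = timedelta(0)
    elif suffix == "s":
        offset = stdoff
    elif suffix == "w":
        offset = stdoff + save
    else:
        raise ValueError(f"unknown time suffix: {suffix!r}")
    return (until - offset).replace(tzinfo=timezone.utc)


# Illustrative values only: base offset +1h with 1h of DST before the change.
hour = timedelta(hours=1)
print(until_to_utc(datetime(1990, 9, 3, 2, 0), "w", hour, hour))
print(until_to_utc(datetime(1990, 9, 3, 2, 0), "s", hour, hour))
```

The offsets of the following interval never enter the calculation; they only take effect from the resulting UTC instant onwards.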
I don't understand the cryptic original code (C) of the tz distribution.
Hey, it's not _that_ cryptic! But I'm afraid that it's the canonical version. (Perhaps you can find an old, doddering C expert nearby who can explain it to you. :-)
Paul Eggert wrote:
Hey, it's not _that_ cryptic! But I'm afraid that it's the canonical version. (Perhaps you can find an old, doddering C expert nearby who can explain it to you. :-)
Since I've been coding C algorithms since the era of the first VAX machines, I may actually qualify as doddering :)
Yves, there are of course many approaches to analysing the data, but parsing (or compiling) the files in the gz into zone, link, rule records is fairly straightforward. You can then sort the records, remove redundant records, and link the structures.
The resulting table can then be searched using binary searching and a little bit of table walking to identify the rule in effect for a particular zone & time. You can then apply your time offsets at that point in the structure to determine the corresponding wall, standard, and universal times.
I have acquired permission from this end to produce a commercial DLL to wrap our tz parser and time converter algorithms as a licensable product if there is enough interest. I might even be able to convince the boss to let me release it as open source - but that would take some convincing... If anyone has suggestions I can relay, I can pass them on.
On 02.07.2010 18:59 CE(S)T, David Patte wrote:
Yves, there are of course, many approaches to analysing the data, but parsing (or compiling) the files in the gz into zone, link, rule records is fairly straightforward. You can then sort the records, remove redundant records, and link the structures.
The resulting table can then be searched using binary searching and a little bit of table walking to identify the rule in effect for a particular zone & time.
A (binary searchable) table implies a flat two-dimensional structure. Zones and rules are more than two-dimensional. At the point when I resolve zones and rules into a linear table, I think I lose the efficient storage and need to write out the transition times for all years separately, which will be huge. I think I need to retain the original data structure in my code and then evaluate it all at runtime, finding the right rule set in the zone, then iterating all rules to find the current UTC offset.
I have acquired permission from this end to produce a commercial DLL to wrap our tz parser and time converter algorithms as a licensable product if there is enough interest.
Sorry, not by me. I am creating a free (maybe open-source) product, and where there is no money, none can flow. Also, yours is probably native C code, which I don't like to use in C# and probably cannot use in Java on Android.
Actually I don't see the big problem in writing the code that does all this. The problem is finding a working algorithm at all, or at least one that I can run in my head.
-- Yves Goergen "LonelyPixel" <nospam.list@unclassified.de> Visit my web laboratory at http://beta.unclassified.de
<<On Fri, 02 Jul 2010 21:51:03 +0200, Yves Goergen <nospam.list@unclassified.de> said:
I think I need to retain the original data structure in my code and then evaluate it all at runtime, finding the right rule set in the zone, then iterating all rules to find the current UTC offset.
I think you are much, much better off parsing the output of zic(1) rather than its input. The world really, truly doesn't need any more buggy, incomplete reimplementations of part of zic. (You don't need to ship zic or the source files at all in your package; the output files are machine-independent. It should not be hard to write a program that reads in all of the output files and writes an assembly containing a dictionary of constants with one entry for each one.) -GAWollman
On 2 July 2010 20:51, Yves Goergen <nospam.list@unclassified.de> wrote:
Sorry, not by me. I am creating a free (maybe open-source) product, and where there is no money, none can flow. Also, yours is probably native C code, which I don't like to use in C# and probably cannot use in Java on Android.
There is a compiler in Joda-Time and another in JSR-310. Both are written in Java, so you can read them. The JSR-310 one is BSD licensed, so it is easy to reuse.
Stephen
On Fri, 2 Jul 2010, Yves Goergen wrote:
A (binary searchable) table implies a flat two-dimensional structure. Zones and rules are more than two-dimensional. At the point when I resolve zones and rules into a linear table, I think I lose the efficient storage and need to write out the transition times for all years separately, which will be huge.
You can fit a table entry into four bytes, so for two transitions each year between 1970 and 2038 (136 entries) you need just over half a kilobyte. Tony. -- f.anthony.n.finch <dot@dotat.at> http://dotat.at/ FAIR ISLE FAEROES: WEST, BACKING SOUTHWEST, 4 OR 5. MODERATE OR ROUGH. SHOWERS. MODERATE OR GOOD.
On 05.07.2010 15:43 CE(S)T, Tony Finch wrote:
You can fit a table entry into four bytes, so for two transitions each year between 1970 and 2038 (136 entries) you need just over half a kilobyte.
That's the transition dates. Still need the offsets. And it's only until 2038... Okay, if I just specify that my calendar won't work before 1900 and after 2100, it's 2.4 kB. Per timezone.
-- Yves Goergen "LonelyPixel" <nospam.list@unclassified.de> Visit my web laboratory at http://beta.unclassified.de
On Mon, 5 Jul 2010, Yves Goergen wrote:
On 05.07.2010 15:43 CE(S)T, Tony Finch wrote:
You can fit a table entry into four bytes, so for two transitions each year between 1970 and 2038 (136 entries) you need just over half a kilobyte.
That's the transition dates. Still need the offsets.
The offsets and other info can be (and are) stored in a separate table referred to by the main table entries. The tz code uses 5 bytes per entry but this could easily be reduced to 4 by one-minute time resolution instead of one-second resolution.
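One way to picture the four-byte entry is to store the transition time in whole minutes and use the bits this frees up for the index into the separate offsets table. The 27-bit/5-bit split below is an assumption made for this sketch, not something the tz code or the message above specifies, and the names pack_entry and unpack_entry are hypothetical. Transitions that do not fall on a whole minute in UTC (for example those leaving a local mean time offset with odd seconds) cannot be stored this way and are rejected.

```python
import struct

MINUTE_BITS = 27                  # assumed: biased minutes since the epoch
INDEX_BITS = 5                    # assumed: index into the offsets table
BIAS = 1 << (MINUTE_BITS - 1)     # lets times before 1970 be stored unsigned


def pack_entry(utc_seconds, type_index):
    """Pack one transition into 4 bytes at one-minute resolution."""
    minutes, remainder = divmod(utc_seconds, 60)
    if remainder:
        raise ValueError("transition is not on a whole minute")
    biased = minutes + BIAS
    if not 0 <= biased < (1 << MINUTE_BITS):
        raise ValueError("transition time out of range")
    if not 0 <= type_index < (1 << INDEX_BITS):
        raise ValueError("type index out of range")
    return struct.pack(">I", (biased << INDEX_BITS) | type_index)


def unpack_entry(raw):
    """Return (utc_seconds, type_index) from a 4-byte entry."""
    (word,) = struct.unpack(">I", raw)
    minutes = (word >> INDEX_BITS) - BIAS
    return minutes * 60, word & ((1 << INDEX_BITS) - 1)


# Two transitions a year from 1970 to 2038 is 136 entries.
table_bytes = 136 * len(pack_entry(0, 0))
print(table_bytes)  # 544
```

With this split, 27 bits of minutes cover roughly 127 years on either side of 1970, which is more than a signed 32-bit time_t can represent, and 544 bytes matches the "just over half a kilobyte" estimate.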
And it's only until 2038...
Because time_t is still usually signed 32 bits. (I don't know how the tz code deals with 64 bit time_t.) Tony. -- f.anthony.n.finch <dot@dotat.at> http://dotat.at/ FORTIES CROMARTY FORTH TYNE DOGGER: WEST OR SOUTHWEST 4 OR 5, INCREASING 6 FOR A TIME. SLIGHT OR MODERATE, OCCASIONALLY ROUGH AT FIRST IN FORTIES. SHOWERS. GOOD.
Date: Mon, 5 Jul 2010 17:26:54 +0100
From: Tony Finch <dot@dotat.at>
Message-ID: <alpine.LSU.2.00.1007051718140.10878@hermes-2.csi.cam.ac.uk>
| Because time_t is still usually signed 32 bits. (I don't know how the tz
| code deals with 64 bit time_t.)
It deals with it just fine - handling 64 bit time_t's correctly was added some time ago now - and several systems are using 64 bit time_t's with no problems of note (current NetBSD is one of those - that is, when NetBSD 6 is released it will be 64 bit time_t's throughout, FreeBSD is as well probably).
However ...
Yves Goergen <nospam.list@unclassified.de> said:
| Okay, if I just specify that my calendar won't work before 1900 and after
| 2100, it's 2.4 kB. Per timezone.
Before 1900 (approximately, it varies from timezone to timezone) all of this is nonsense anyway, as there was no real standardised time - you could perhaps go back another 100 years in some areas, but we don't have much in the way of reliable data for anywhere in that period, and essentially no incentive to go and collect it where there are any historical records from which we might be able to get the data - it just isn't important for any practical purpose.
Into the future, for any data past July 6, 2010, we're all just speculating. That is, any answer is a guess. For the near future (say, July 7, 2010) we have a very high confidence level that our speculation will turn out to be correct - the further into the future we go, the more that level drops. Guessing time offsets more than about 3 or 4 years into the future in many timezones seems to be a wildly dangerous thing to do if you're going to claim any degree of confidence in your answer - that is, if you're not clearly labelling it as a guess.
I am at least glad that Guy Harris' message is being treated seriously. I have been continually amazed by the number of messages we see here from people who want to work with the timezone data in some software or other, and who then set out to attempt to write parsers for the source data files. Other than as an academic exercise, frankly, that's insane.
The only purpose of the source files is to be maintained by the people on this list to be as accurate a representation as possible of the current (and past) known intentions of the various authorities who set the world's time offsets. The format of those files can, has, and will again, change whenever it is needed to be able to better express what is expressed in the various policies.
For example, one thing that we're currently lacking, which hasn't yet been enough of a problem for us to need to fix - but might be one day - is any way of expressing dates that vary depending upon the various variable religious holidays that exist. We have the "yearistype" method of handling conditional evaluation (though that isn't generally regarded as being a particularly good solution, I don't think), but nothing that calculates the dates of the various events that have been known to affect summer time transition events when they clash. To the best of my knowledge, all of those dates can be calculated with sufficient code, if and when we ever decide it is needed - if we do that, it would probably mean another change of some form to the timezone source data files. When that happens, for sure, zic will deal with it.
The other representation is the zic output - that's simple, and in a format that is essentially never going to change in any incompatible way, as that's the format that software everywhere is reading to actually convert times between UTC and local time (both directions) - and because its format already allows for everything (being so simple).
Yes, it is verbose - but that's OK, computers easily deal with volumes of data, and while this stuff is growing, and will continue to grow, it is growing at a much slower rate than RAM and processing speeds (even I/O transfer rates).
Any code that is being written with a purpose of actually usefully translating times between UTC and the various local timezones, which you actually expect should last longer than a year or so (and which isn't being written just to prove that "yes, it is possible to write a tzdata parser in lisp/fortran/apl/...") really should be working exclusively with the output from zic - if the binary format is a problem, then a 10 minute conversion program could convert the file into any other format that you'd prefer instead.
Yves - this rant has not been aimed at you particularly, you're at least considering Guy's message, but at all the others out there who keep trying to do the same thing, and then attempt to argue against changes to the tzdata source format, because their particular parser wouldn't be able to handle that. Tough!
kre
Hi Robert,
I enjoyed reading your rant, and I agree with a lot that you say, but not all.
Certainly, the source file format is something that one would think all users of the data should have the chance to comment on. I'm sure there are many people who would prefer there never be any changes at all, and others who could provide novel suggestions for enhancements. I would hope that all suggestions about the data, from all users, would be at least entertained. The needs of all users are different, and the preferred format for their needs will therefore be different.
That said, of course, it's up to the maintainers, knowing their users, to decide in which format the data should be released, and if that doesn't conform with the wishes of some of the users, then of course it's up to those users to find other solutions.
In our case, our requirements were that we could provide updated timezone information to our users within 24 hours of being aware of it. Since we are not in charge of generating the original data, we are dependent on its releases for updates, as we all are. As well, since we do not maintain zic, we are dependent upon any changes that may need to be applied to it as well.
Our solution was to write our own parser to extract the data we needed directly from the tar.gz, and put it into a format that was most appropriate for our own use, bypassing zic altogether. This removes one level of dependency, and also gets the data into our preferred format immediately. When a new tar.gz is released, our users can have updated data in their systems automatically by an automated ftp.
The basis of our current requirements is to provide time and date information for long-range (past and future) research. Our parsers are used in astronomy products, archaeoastronomy products, as well as for genealogy and other historical research. Our time & date conversions reach back and look forward tens of thousands of years, and can generate the solar, mean and 'zone' time for any date over a 100,000 year span, as well as matching against dates in other calendars.
Though it's true that no one was wondering about daylight saving and zone offsets 10000 years ago, the dates we compute for 10000 years ago give users a 'feel' for the time of an event that happened that long ago. Most people don't know the difference between apparent time and mean time, but saying that an event likely happened near 6PM EST June 1st 8000BC is something that most people can relate to.
More importantly, for more recent historical research, knowing when a region generally went from solar time to local mean time, then to standard time, is very useful for pinning down the accuracy and synchronicity of recorded historical events - such as eclipses.
I'm sure the tz database was never originally intended for this purpose, but no doubt there are others on this list who use what they can from tz, then augment it with data and algorithms from other sources. If we suggest changes to tz to make things easier for us, without interfering with the needs of current users, it's because the data is now being used far beyond its originally intended purpose.
As for my thoughts about changing the actual format for my own needs, I have made a few suggestions, but I think most of my ideas would probably cause more grief to others than the advantage they might gain me, so most of my suggestions have remained unmentioned and are handled instead in our parsers.
David Patte said:
Though it's true that no one was wondering about daylight saving and zone offsets 10000 years ago, the dates we compute for 10000 years ago give users a 'feel' for the time of an event that happened that long ago. Most people don't know the difference between apparent time and mean time, but saying that an event likely happened near 6PM EST June 1st 8000BC is something that most people can relate to.
Except that, that far back, leap seconds probably make a significant difference as well. That far back, the required correction (i.e. TAI - UT1) is about 86 hours (that is, 3.5 days).
--
Clive D.W. Feather
Email: clive@davros.org
Web: http://www.davros.org
Mobile: +44 7973 377646
"If you lie to the compiler, it will get its revenge." - Henry Spencer
Yes, delta-T is very important in all astronomical calculations.
Clive D.W. Feather wrote:
David Patte said:
Though it's true that no one was wondering about daylight saving and zone offsets 10000 years ago, the dates we compute for 10000 years ago give users a 'feel' for the time of an event that happened that long ago. Most people don't know the difference between apparent time and mean time, but saying that an event likely happened near 6PM EST June 1st 8000BC is something that most people can relate to.
Except that, that far back, leap seconds probably make a significant difference as well.
That far back, the required correction (i.e. TAI - UT1) is about 86 hours (that is, 3.5 days).
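The 86-hour figure can be sanity-checked with a long-term parabolic approximation for delta-T. The sketch below assumes the fit delta-T = -20 + 32 * u^2 seconds, with u = (year - 1820) / 100, which is commonly attributed to Morrison and Stephenson. It is a rough extrapolation, not something taken from the tz code, and the function name is hypothetical.

```c
/* Rough long-term estimate of delta-T (TT - UT1) in seconds.
 * Assumption: the parabolic fit -20 + 32 * u^2, where u is the number
 * of centuries from 1820. `year` is an astronomical year number, so
 * 8000 BC is passed as -7999. */
double delta_t_seconds(double year)
{
    double u = (year - 1820.0) / 100.0;
    return -20.0 + 32.0 * u * u;
}
```

For 8000 BC, `delta_t_seconds(-7999) / 3600.0` comes out between 85 and 86 hours, in line with the estimate quoted above. Strictly, this approximates TT - UT1, which differs from TAI - UT1 by a constant 32.184 seconds; that is negligible at this scale.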
On 06.07.2010 04:06 CE(S)T, Robert Elz wrote:
When that happens, for sure, zic will deal with it.
zic deals exactly nothing for me right now. Is there a Windows binary available somewhere? Are the generated zic files available? Could somebody assist me in building zic for Windows? BTW, your comments and answers to my questions have so far been very interesting and I think they might be helpful to others as well who consider implementing their own parser for X#++. I've been reading the project website which doesn't contain any of this background information. I didn't know what zic does, I didn't know the format of those generated files nor where the specification of the input files is, I wasn't clear about the problems my implementation is going to have. All this could have helped me in advance. And I'm sure someone else will come along in a year and will ask the same questions as me. As probably somebody did a year ago. I'm sure you can answer them as well but for those, and for those who don't ask, this information would help. Give it a catchy title and truly interested people (or those already experiencing problems) will read it. -- Yves Goergen "LonelyPixel" <nospam.list@unclassified.de> Visit my web laboratory at http://beta.unclassified.de
On Tuesday, July 6 2010, "Yves Goergen" wrote to "tz@lecserver.nci.nih.gov" saying:
On 06.07.2010 04:06 CE(S)T, Robert Elz wrote:
When that happens, for sure, zic will deal with it.
zic deals exactly nothing for me right now. Is there a Windows binary available somewhere? Are the generated zic files available? Could somebody assist me in building zic for Windows?
zic uses the low-level POSIX file I/O calls (open/read/write) rather than the high-level C file I/O calls (fopen/fread/fwrite), as well as some APIs without C equivalents (mkdir) and some POSIX-specific concepts (link/symlink), so directly porting it to a non-POSIX system without an emulation layer would probably be a fair amount of trouble. Fortunately, Cygwin <http://cygwin.com/> provides an excellent POSIX emulation for Windows, and includes the tzcode binaries (zic and friends) and the generated tzdata files in its base distribution. -- Jonathan Lennox lennox@cs.columbia.edu
I have now tried to compile it with MinGW using the makefile but it doesn't work either. Here's the output:
C:\Programme\MinGW\tz>mingw32-make
makefile:306: warning: overriding commands for target `install'
makefile:287: warning: ignoring old commands for target `install'
sed \
    -e 's|AWK=[^}]*|AWK=nawk|g' \
    -e 's|TZDIR=[^}]*|TZDIR=/usr/local/etc/zoneinfo|' \
    <tzselect.ksh >tzselect
chmod +x tzselect
process_begin: CreateProcess(NULL, chmod +x tzselect, ...) failed.
make (e=2): Das System kann die angegebene Datei nicht finden.
mingw32-make: *** [tzselect] Error 2
(The second-last line says: The system cannot find the specified file.) On 06.07.2010 17:16 CE(S)T, lennox@cs.columbia.edu wrote:
zic uses the low-level POSIX file I/O calls (open/read/write) rather than the high-level C file I/O calls (fopen/fread/fwrite)
POSIX shouldn't be too much of a problem on Windows; AFAIK it offers such an API. IIRC with VS2008 it was functions like _getopt or so that were eventually unresolved, which caused the linker to fail. The code itself compiled. I specified the single .c files though (trial and error to find out the set of files) and not the makefile.
Fortunately, Cygwin <http://cygwin.com/> provides an excellent POSIX emulation for Windows, and includes the tzcode binaries (zic and friends) and the generated tzdata files in its base distribution.
For some reason I traditionally dislike cygwin. The last few times I saw it, it was huge and bloated. It felt like installing an entire Linux on top of Windows. Also this seems like a massive measure only to get the timezone data converted into a processable format. I have rethought the requirements for my calendar application and I think I can restrict the operation time from 1970 to 2099. That seems a reasonable timespan to use my calendar app in. Since the original zic would generate transition times from something around 1970 to 2038, I'd need to change that anyway. So I must be able to compile my customised version of zic, if I got that right. And if that changes a lot and it's up to me (as the application publisher) to handle updates, then I'd really like to be able to quickly build a new zic and not wait for cygwin to gracefully release an update. -- Yves Goergen "LonelyPixel" <nospam.list@unclassified.de> Visit my web laboratory at http://beta.unclassified.de
On Tue, 06 Jul 2010, Yves Goergen wrote:
I have now tried to compile it with MinGW using the makefile but it doesn't work either. Here's the output:
C:\Programme\MinGW\tz>mingw32-make
makefile:306: warning: overriding commands for target `install'
makefile:287: warning: ignoring old commands for target `install'
sed \
    -e 's|AWK=[^}]*|AWK=nawk|g' \
    -e 's|TZDIR=[^}]*|TZDIR=/usr/local/etc/zoneinfo|' \
    <tzselect.ksh >tzselect
chmod +x tzselect
process_begin: CreateProcess(NULL, chmod +x tzselect, ...) failed.
make (e=2): Das System kann die angegebene Datei nicht finden.
mingw32-make: *** [tzselect] Error 2
(The second-last line says: The system cannot find the specified file.)
I thought you were trying to build zic? tzselect is something different. (Try "make zic" instead of just "make".) --apb (Alan Barrett)
On 06.07.2010 18:30 CE(S)T, Alan Barrett wrote:
I thought you were trying to build zic? tzselect is something different. (Try "make zic" instead of just "make".)
That leads to:
C:\Programme\MinGW\tz>mingw32-make zic
makefile:306: warning: overriding commands for target `install'
makefile:287: warning: ignoring old commands for target `install'
cc -DTZDIR=\"/usr/local/etc/zoneinfo\" -c -o zic.o zic.c
process_begin: CreateProcess(NULL, cc -DTZDIR=\"/usr/local/etc/zoneinfo\" -c -o zic.o zic.c, ...) failed.
make (e=2): Das System kann die angegebene Datei nicht finden.
mingw32-make: *** [zic.o] Error 2
Seems tz resists building on Windows? Has nobody ever tried yet? Think I'll retry with Visual Studio and modify the source code to define some macros to make it more friendly. -- Yves Goergen "LonelyPixel" <nospam.list@unclassified.de> Visit my web laboratory at http://beta.unclassified.de
Some testing for me seems to reveal that if you:

* install MSYS as well as MINGW, and add both the MINGW and MSYS bin directories to your Windows PATH
* use the MSYS make
* set CFLAGS in the tzcode Makefile to -DHAVE_SYS_WAIT_H=0 -DHAVE_SYS_STAT_H=0 -DHAVE_SYMLINK=0
* set cc in the tzcode Makefile to gcc
* write a wrapper function for link() mapping it to the Windows NT CreateHardLink function:

    #include <errno.h>
    #define _WIN32_WINNT 0x0500
    #include <windows.h>

    int link(const char* path1, const char* path2)
    {
        /* CreateHardLinkA takes the new name first, then the existing file. */
        BOOL createdLink = CreateHardLinkA(path2, path1, NULL);
        if (!createdLink) {
            errno = EACCES; /* ?? Should map GetLastError to errno */
            return -1;
        }
        return 0;
    }

* include the source file containing link() in the Makefile's TZCSRCS variable, and its corresponding object file in TZCOBJS
* and run on a file system that supports hard links, e.g. NTFS

Then make zic; make zones works. You'll probably also want to set the TZDIR variable in the makefile to something more Windows-sensible than /usr/local/etc/zoneinfo. If you need to support older filesystems, e.g. FAT32, you could alternately map link() to CopyFile instead.
-- Jonathan Lennox lennox@cs.columbia.edu

On Tuesday, July 6 2010, "Yves Goergen" wrote to "tz@lecserver.nci.nih.gov" saying:
On 06.07.2010 18:30 CE(S)T, Alan Barrett wrote:
I thought you were trying to build zic? tzselect is something different. (Try "make zic" instead of just "make".)
That leads to:
C:\Programme\MinGW\tz>mingw32-make zic
makefile:306: warning: overriding commands for target `install'
makefile:287: warning: ignoring old commands for target `install'
cc -DTZDIR=\"/usr/local/etc/zoneinfo\" -c -o zic.o zic.c
process_begin: CreateProcess(NULL, cc -DTZDIR=\"/usr/local/etc/zoneinfo\" -c -o zic.o zic.c, ...) failed.
make (e=2): Das System kann die angegebene Datei nicht finden.
mingw32-make: *** [zic.o] Error 2
Seems tz resists building on Windows? Has nobody ever tried yet?
Think I'll retry with Visual Studio and modify the source code to define some macros to make it more friendly.
-- Yves Goergen "LonelyPixel" <nospam.list@unclassified.de> Visit my web laboratory at http://beta.unclassified.de
On 06.07.2010 20:57 CE(S)T, lennox@cs.columbia.edu wrote:
* set CFLAGS in the tzcode Makefile to -DHAVE_SYS_WAIT_H=0 -DHAVE_SYS_STAT_H=0 -DHAVE_SYMLINK=0
* write a wrapper function for link() mapping it to the Windows NT CreateHardLink function:
Thank you for these tips. I could now build zic with Visual Studio and the following command:

    cl -DHAVE_SYS_WAIT_H=0 -DHAVE_SYMLINK=0 -DHAVE_UNISTD_H=0 zic.c scheck.c ialloc.c my_getopt.c my_link.c

I needed to insert #include "getopt.h" in private.h (I thought it is an appropriate location) and copy my_getopt (see other message) and another c file with your link function into it. I now have the following source files:

    getopt.h
    ialloc.c
    my_getopt.c
    my_getopt.h
    my_link.c
    private.h
    scheck.c
    tzfile.h
    zic.c

In the end I have a zic.exe that seems to do the job. I run it like this:

    zic -d output input/africa input/antarctica input/asia input/australasia input/etcetera input/europe input/northamerica input/pacificnew input/southamerica

(Windows CMD doesn't expand * like bash would do and zic doesn't do it either so I need to specify all files separately.) It creates 457 files in 16 directories with a total size of 536 kB.

After looking into the resulting files, I have the impression that the transition timestamps are all 32-bit, even in the 64-bit v2 section (half of the bytes in that area are 0). I assume that zic only creates transition records until before 2038. How can I extend that range? I didn't find a good place in the code.
If you need to support older filesystems, e.g. FAT32, you could alternately map link() to CopyFile instead.
What does zic use links for? Does it link zone aliases? I think I won't need them in my calendar application, do I? Could I disable creating duplicate files (or linking them)? -- Yves Goergen "LonelyPixel" <nospam.list@unclassified.de> Visit my web laboratory at http://beta.unclassified.de
After looking into the resulting files, I have the impression that the transition timestamps are all 32-bit, even in the 64-bit v2 section (half of the bytes in that area are 0). I assume that zic only creates transition records until before 2038. How can I extend that range? I didn't find a good place in the code.
Some 64-bit timestamps are created--take a look, for example, at America/Godthab. But for those places whose current time zone rules can be represented by POSIX-style environment strings, the POSIX string is written at the very end of the file and is used for times after the last transition time recorded in the file, reducing the number of explicit transitions needed. Even for places (such as Godthab) that CAN'T be represented by POSIX rules, only 400 years of transitions are written. Since the Gregorian calendar is on a grand 400-year cycle (January 1, 2000 falls on a Saturday; January 1, 2400 falls on a Saturday) localtime can handle far-future timestamps using modular arithmetic. Transition times through 2037 are written for the benefit of old systems that don't handle the POSIX-style string at the end. --ado
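The 400-year cycle argument can be checked directly. The following is a minimal sketch, not the code in localtime.c: it counts the days in one 400-year span, computes weekdays with Sakamoto's method, and folds a far-future year back by whole cycles. All function names here are hypothetical.

```c
/* 1 if `y` is a leap year in the Gregorian calendar, else 0. */
int is_leap(long y)
{
    return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
}

/* Number of days in the years from `from` up to but excluding `to`. */
long days_in_years(long from, long to)
{
    long days = 0;
    for (long y = from; y < to; y++)
        days += is_leap(y) ? 366 : 365;
    return days;
}

/* Day of week, 0 = Sunday ... 6 = Saturday (Sakamoto's method).
 * Intended for positive Gregorian years only. */
int day_of_week(long y, int m, int d)
{
    static const int t[] = {0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4};
    if (m < 3)
        y -= 1;
    return (int)((y + y / 4 - y / 100 + y / 400 + t[m - 1] + d) % 7);
}

/* Map a year beyond `last` onto a year <= `last` that lies a whole
 * number of 400-year cycles earlier, so dates keep their weekdays. */
long fold_year(long y, long last)
{
    if (y > last)
        y -= ((y - last + 399) / 400) * 400;
    return y;
}
```

Any 400 consecutive years contain 146097 days, which is exactly 20871 weeks, so a transition rule evaluated in a folded year lands on the same month, day and weekday as in the original year.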
Date: Tue, 06 Jul 2010 21:58:29 +0200
From: Yves Goergen <nospam.list@unclassified.de>
Message-ID: <4C338AE5.7010309@unclassified.de>

| What does zic use links for? Does it link zone aliases?

Yes, and the default timezone.

| I think I won't need them in my calendar application, do I?

Probably not, they're mostly for backwards compatibility (old names of zones that have been renamed).

| Could I disable creating duplicate files (or linking them)?

So you can avoid using NTFS, yes, just have link return -1, as I suggested (in mail I sent after you sent this one...) and you'll get symlinks instead. I think those will work on FAT - trying to make them go away isn't worth the bother (you can always remove anything that you don't need later).

With 64 bit times, attempting to write the complete range of years to the file would be absurd, so we no longer do that - the file still retains the complete range of 32 bit time_t years (1970-2038) as that's what it used to be like, and old systems expect to find that data. But for more modern systems, we recognise that times in the future (beyond some near future year, usually this year, occasionally next year, and for now at least, all well before 2038) are obtained by guessing from a rule. Historical times have all kinds of anomalies, and future times will as well - but we don't know what the future ones are yet... So, once we get out beyond where we have any recorded data, we just use the rule (embedded in the tzfile) to encode any future time. It will be wrong in many cases, but it is as good as is possible.

So, use the data to deal with all the weirdness until the data runs out, then just use the rule. This is the comment from tzfile.h that describes all of this ...
/*
** If tzh_version is '2' or greater, the above is followed by a second instance
** of tzhead and a second instance of the data in which each coded transition
** time uses 8 rather than 4 chars,
** then a POSIX-TZ-environment-variable-style string for use in handling
** instants after the last transition time stored in the file
** (with nothing between the newlines if there is no POSIX representation for
** such instants).
*/

The "POSIX-TZ-environment-variable-style string" is the rule.

kre
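To make the layout in that comment concrete, here is a minimal sketch that locates the trailing POSIX-style string in a file image already loaded into memory. It assumes the header layout documented in tzfile(5): a 44-byte header (4-byte magic, 1-byte version, 15 reserved bytes, six big-endian 4-byte counts) followed by the data block. The function names are hypothetical, and error handling is kept to simple bounds checks.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a big-endian unsigned 32-bit count, one byte at a time. */
static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Size of the data block that follows the 44-byte header at `hdr`.
 * `time_size` is 4 for the first (32-bit) block and 8 for the second. */
static size_t data_block_size(const unsigned char *hdr, size_t time_size)
{
    size_t isgmtcnt = be32(hdr + 20);
    size_t isstdcnt = be32(hdr + 24);
    size_t leapcnt  = be32(hdr + 28);
    size_t timecnt  = be32(hdr + 32);
    size_t typecnt  = be32(hdr + 36);
    size_t charcnt  = be32(hdr + 40);
    return timecnt * time_size      /* transition times */
         + timecnt                  /* one type index per transition */
         + typecnt * 6              /* ttinfo records: 4 + 1 + 1 bytes */
         + charcnt                  /* abbreviation characters */
         + leapcnt * (time_size + 4)/* leap second pairs */
         + isstdcnt + isgmtcnt;     /* standard/wall and UT/local flags */
}

/* Offset of the first character of the POSIX-style string (just after
 * the first newline), or -1 if the image is not a version 2+ file. */
long tzif_posix_string_offset(const unsigned char *buf, size_t len)
{
    if (len < 44 || memcmp(buf, "TZif", 4) != 0 || buf[4] < '2')
        return -1;
    size_t off = 44 + data_block_size(buf, 4);       /* skip 32-bit part */
    if (off > len || len - off < 44 || memcmp(buf + off, "TZif", 4) != 0)
        return -1;
    off += 44 + data_block_size(buf + off, 8);       /* skip 64-bit part */
    if (off >= len || buf[off] != '\n')
        return -1;
    return (long)(off + 1);
}
```

The string runs from the returned offset up to the next newline; an empty string there means the zone has no POSIX representation, as the comment says.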
On 06.07.2010 23:06 CE(S)T, Robert Elz wrote:
So, use the data to deal with all the weirdness until the data runs out, then just use the rule.
Hm, so I need to figure out how to parse that rule and compute further data myself. I'm using zic to not do that. I hoped there would be a simple switch in zic.c where I could extend the last generated year. -- Yves Goergen "LonelyPixel" <nospam.list@unclassified.de> Visit my web laboratory at http://beta.unclassified.de
Date: Wed, 07 Jul 2010 07:47:38 +0200
From: Yves Goergen <nospam.list@unclassified.de>
Message-ID: <4C3414FA.4010005@unclassified.de>

| Hm, so I need to figure out how to parse that rule and compute further
| data myself.

The rule is POSIX specified, so you can look up its definition - but you'll also find the code to process it in localtime.c somewhere. You could just use that - it isn't very difficult, just tedious (much much easier than parsing the tzdata source files!)

| I'm using zic to not do that. I hoped there would be a
| simple switch in zic.c where I could extend the last generated year.

There isn't, but you could fairly easily add one, you just need to alter its setting of max_year (look for "2037" in the code).

The advantage of using the rule is that you can generate transitions for any future year that seems useful (with no guarantee of correctness - but few people expect us to cope with unannounced legislative changes before they happen). That is, you don't have to have an upper bound (for a calendar application, a backwards limit is generally OK, as no-one wants to plan meetings in the past - but planning future events many years ahead is sometimes desirable).

kre
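As an illustration of how little arithmetic the rule needs, here is a sketch that resolves the date part of a POSIX "Mm.w.d" rule (month m, week w from 1 to 5 where 5 means "last", weekday d with 0 = Sunday) for a given year. It is not the localtime.c code; the function names are hypothetical, and parsing the string, the time of day and the offsets are all left out.

```c
/* Day of week, 0 = Sunday ... 6 = Saturday (Sakamoto's method). */
int day_of_week(int y, int m, int d)
{
    static const int t[] = {0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4};
    if (m < 3)
        y -= 1;
    return (y + y / 4 - y / 100 + y / 400 + t[m - 1] + d) % 7;
}

/* Number of days in month `m` (1..12) of Gregorian year `y`. */
int days_in_month(int y, int m)
{
    static const int len[] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    int leap = (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
    return (m == 2 && leap) ? 29 : len[m - 1];
}

/* Day of the month selected by the rule Mm.w.d in `year`. */
int posix_rule_day(int year, int m, int w, int d)
{
    int first_dow = day_of_week(year, m, 1);
    int day = 1 + (d - first_dow + 7) % 7;  /* first weekday d in month */
    day += 7 * (w - 1);                     /* advance to the w-th one */
    while (day > days_in_month(year, m))    /* only triggers for w == 5 */
        day -= 7;
    return day;
}
```

For example, the US rule M3.2.0 (second Sunday of March) gives `posix_rule_day(2010, 3, 2, 0) == 14`, and the EU rule M10.5.0 (last Sunday of October) gives `posix_rule_day(2010, 10, 5, 0) == 31`.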
On 07.07.2010 08:45 CE(S)T, Robert Elz wrote:
| I'm using zic to not do that. I hoped there would be a | simple switch in zic.c where I could extend the last generated year.
There isn't, but you could fairly easily add one, you just need to alter its setting of max_year (look for "2037" in the code).
I've tried it again and it worked. The first time I did it, I think I didn't see any effect, but I was reading the binary files myself where I might have made a mistake. Now that I have my binary file reader in C#, I can see transition times up until 2099 or whatever I set in zic.c. I just need to polish it a bit so I can publish all of it on my website for those interested. -- Yves Goergen "LonelyPixel" <nospam.list@unclassified.de> Visit my web laboratory at http://beta.unclassified.de
On Tuesday, July 6 2010, "Yves Goergen" wrote to "tz@lecserver.nci.nih.gov" saying:
If you need to support older filesystems, e.g. FAT32, you could alternately map link() to CopyFile instead.
What does zic use links for? Does it link zone aliases? I think I won't need them in my calendar application, do I? Could I disable creating duplicate files (or linking them)?
Other than the default timezone (which doesn't make much sense to set on Windows), links correspond exactly to Link lines in the tzdata source files. Most of these do indeed come from the "backward" file, which you can probably safely omit, but some of them are for cases where two separate countries (by the ISO 3166 definition) have had identical timezone histories since 1970. You could probably get away with requiring people in Vatican City to use Europe/Rome, or those on the isle of Guernsey to use Europe/London. However, forcing people in the other countries of the former Yugoslavia to use Serbian time, or Slovaks to use Czech time, is sufficiently politically fraught that it's probably a lot less trouble in the long run just to get links working. -- Jonathan Lennox lennox@cs.columbia.edu
On Tue, 6 Jul 2010, lennox@cs.columbia.edu wrote:
fraught that it's probably a lot less trouble in the long run just to get links working.
Or write a link() that does file copy. Jaakko -- Foreca Ltd Jaakko.Hyvatti@foreca.com Tammasaarenkatu 5, FI-00180 Helsinki, Finland http://www.foreca.com
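A copying stand-in for link() could look like the sketch below. It uses only standard C file I/O so it does not depend on any Windows or POSIX call; the name copy_link is hypothetical, and whether zic is happy with it in every situation has not been verified here.

```c
#include <stdio.h>

/* Stand-in for link(existing, newpath) that copies the file instead.
 * Returns 0 on success, -1 on failure. Unlike a real hard link, it
 * overwrites `newpath` if it exists, and later changes to one file
 * are not reflected in the other. */
int copy_link(const char *existing, const char *newpath)
{
    FILE *in = fopen(existing, "rb");
    if (in == NULL)
        return -1;
    FILE *out = fopen(newpath, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    char buf[4096];
    size_t n;
    int ok = 1;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            ok = 0;
            break;
        }
    }
    if (ferror(in))
        ok = 0;
    fclose(in);
    if (fclose(out) != 0)   /* flush errors show up here */
        ok = 0;
    if (!ok)
        remove(newpath);    /* do not leave a partial copy behind */
    return ok ? 0 : -1;
}
```

To use it in place of the real call, one option (an assumption, not something zic provides) is to compile zic with `-Dlink=copy_link` and add this file to the build.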
On 07.07.2010 04:42 CE(S)T, Jaakko Hyvätti wrote:
On Tue, 6 Jul 2010, lennox@cs.columbia.edu wrote:
fraught that it's probably a lot less trouble in the long run just to get links working.
Or write a link() that does file copy.
Oh links do seem to work, but those files are not the end of my processing. I'd like to further compile them to code to reduce the need of ~400 files... I think my post-processor could recognise links and generate the code accordingly, so all's fine. :-) -- Yves Goergen "LonelyPixel" <nospam.list@unclassified.de> Visit my web laboratory at http://beta.unclassified.de
On Jul 6, 2010, at 8:55 AM, Yves Goergen wrote:
I have now tried to compile it with MinGW using the makefile but it doesn't work either. Here's the output:
C:\Programme\MinGW\tz>mingw32-make
makefile:306: warning: overriding commands for target `install'
makefile:287: warning: ignoring old commands for target `install'
sed \
    -e 's|AWK=[^}]*|AWK=nawk|g' \
    -e 's|TZDIR=[^}]*|TZDIR=/usr/local/etc/zoneinfo|' \
    <tzselect.ksh >tzselect
chmod +x tzselect
process_begin: CreateProcess(NULL, chmod +x tzselect, ...) failed.
make (e=2): Das System kann die angegebene Datei nicht finden.
mingw32-make: *** [tzselect] Error 2
(The second-last line says: The system cannot find the specified file.)
The specified file might be chmod; that command is marking the tzselect command as executable - it's a shell script, so that's necessary.
On 06.07.2010 17:16 CE(S)T, lennox@cs.columbia.edu wrote:
zic uses the low-level POSIX file I/O calls (open/read/write) rather than the high-level C file I/O calls (fopen/fread/fwrite)
POSIX shouldn't be too much of a problem on Windows, AFAIK it offers such an API.
The high-level C file I/O calls are offered, of course; I don't remember why we didn't use it. The low-level equivalents of open(), read(), and write() in Windows are CreateFile(), ReadFile(), and WriteFile(). (In UN*X, you create files with open(); in Windows, you open files with CreateFile(). :-))
IIRC with VS2008 it was functions like _getopt or so that were eventually unresolved which caused the linker to fail.
getopt() is oriented towards the UN*X command-line option conventions, and isn't part of the C standard. Wireshark uses the GNU libc version of getopt() on Windows; the BSD version might also work if the GNU Public License is a problem.
On Jul 6, 2010, at 9:51 AM, Guy Harris wrote:
The low-level equivalents of open(), read(), and write() in Windows are CreateFile(), ReadFile(), and WriteFile(). (In UN*X, you create files with open(); in Windows, you open files with CreateFile(). :-))
...and the Microsoft C library offers _open(), _read(), _write(), _close(), etc., which function similarly to the UN*X routines - similarly enough that GLib (not to be confused with glibc) and Wireshark use them on Windows - so, on Windows, if you define "open" as "_open", "read" as "_read", etc., that might take care of those routines.
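A sketch of that renaming, under the assumption that the Microsoft routines live in <io.h> and accept the same arguments for the simple cases used here. It has not been tried against the zic sources; the helper read_file_prefix is hypothetical and only exists to show the shim in use.

```c
#ifdef _WIN32
#include <io.h>        /* _open, _read, _write, _close */
#define open  _open
#define read  _read
#define write _write
#define close _close
#else
#include <unistd.h>    /* read, write, close */
#endif
#include <fcntl.h>     /* open flags such as O_RDONLY */

#ifndef O_BINARY
#define O_BINARY 0     /* POSIX has no text/binary distinction */
#endif

/* Read up to `cap` bytes from the start of `path` into `buf`.
 * Returns the number of bytes read, or -1 on failure. */
long read_file_prefix(const char *path, void *buf, unsigned cap)
{
    int fd = open(path, O_RDONLY | O_BINARY);
    if (fd < 0)
        return -1;
    long n = (long)read(fd, buf, cap);
    close(fd);
    return n;
}
```

On Windows the O_BINARY flag matters for files like the zic output, since text mode would translate line endings inside binary data.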
On 06/07/10 16:55, Yves Goergen wrote:
IIRC with VS2008 it was functions like _getopt or so that were eventually unresolved, which caused the linker to fail. The code itself compiled. I specified the single .c files though (trial and error to find out the set of files) and not the makefile.
You can get a BSD-style licensed version of getopt from <http://xent.com/~bsittler/geocities/>. That's Benjamin Sittler's "my_getopt" package. There are other free getopt implementations out there too, as well as the LGPL'd version in the GNU C Library. -- -=( Ian Abbott @ MEV Ltd. E-mail: <abbotti@mev.co.uk> )=- -=( Tel: +44 (0)161 477 1898 FAX: +44 (0)161 718 3587 )=-
On 06.07.2010 18:56 CE(S)T, Ian Abbott wrote:
You can get a BSD-style licensed version of getopt from <http://xent.com/~bsittler/geocities/>. That's Benjamin Sittler's "my_getopt" package.
I tried that and it helps. The getopt missing references are gone. There just remains a missing _link reference that I cannot resolve. Here's what I did after editing private.h to minimise the error messages and adding my_getopt: cl zic.c scheck.c ialloc.c my_getopt.c Output:
Microsoft (R) 32-Bit C/C++-Optimierungscompiler Version 14.00.50727.762 für 80x86
Copyright (C) Microsoft Corporation. Alle Rechte vorbehalten.

zic.c
zic.c(103) : warning C4028: Formaler Parameter '2' unterscheidet sich von der Deklaration
zic.c(1490) : warning C4113: 'int (__cdecl *)()' weicht in der Parameterliste von 'int (__cdecl *)(const void *,const void *)' ab
scheck.c
ialloc.c
my_getopt.c
Code wird generiert...
Microsoft (R) Incremental Linker Version 8.00.50727.762
Copyright (C) Microsoft Corporation. All rights reserved.

/out:zic.exe
zic.obj
scheck.obj
ialloc.obj
my_getopt.obj
zic.obj : error LNK2019: Verweis auf nicht aufgelöstes externes Symbol "_link" in Funktion "_dolink".
zic.exe : fatal error LNK1120: 1 nicht aufgelöste externe Verweise.
There are warnings that I don't understand. The present C syntax with split function arguments was entirely new to me. Looks a bit like Pascal. I searched all files for "_link" but could only find it in zic.obj, not in any of the source files. Where can I get that unused function from? And why do I need to? Can I just define my own _link() { } ? -- Yves Goergen "LonelyPixel" <nospam.list@unclassified.de> Visit my web laboratory at http://beta.unclassified.de
Date: Tue, 06 Jul 2010 20:01:22 +0200
From: Yves Goergen <nospam.list@unclassified.de>
Message-ID: <4C336F72.9000708@unclassified.de>

| There just remains a missing _link reference that I cannot resolve.

Just make a link() function that does

    link() { return -1; }

That will be enough to cause zic to fall back to symlinks() instead, and those should work on Windows. So, ...

| And why do I need to? Can I just define my own _link() { } ?

yes, but it needs to return an error code to indicate it failed, then all should be OK.

kre
On Wednesday, July 7 2010, "Robert Elz" wrote to "tz@lecserver.nci.nih.gov" saying:
Just make a link() function that does link() { return -1; } That will be enough to cause zic to fall back to symlinks() instead, and those should work on Windows.
Symbolic links only exist on Windows Vista or later (and only on NTFS), and there's no symlink() system function -- you'd have to wrap CreateSymbolicLink. Shortcut files (.lnk files) have existed for a very long time, but they're a higher-level abstraction which is opaque to low-level file system calls (i.e., the low-level APIs just see the file with the .lnk extension). This is why I recommended that the tzcode be built with -DHAVE_SYMLINK=0 for MinGW. If you have neither a working link() nor a working symlink(), zic bails out at the first Link in the tzdata, which happens to be Antarctica/South_Pole -> Antarctica/McMurdo. -- Jonathan Lennox lennox@cs.columbia.edu
On Tue, 06 Jul 2010, Yves Goergen wrote:
Could somebody assist me in building zic for Windows?
If you post sufficient details about the problems you encounter, then it is likely that somebody could help.
I didn't know what zic does, I didn't know the format of those generated files nor where the specification of the input files is,
The format of the input is documented in the zic(8) man page. The format of the output is documented in the tzfile(5) man page. --apb (Alan Barrett)
On 06.07.2010 18:22 CE(S)T, Alan Barrett wrote:
On Tue, 06 Jul 2010, Yves Goergen wrote:
I didn't know what zic does, I didn't know the format of those generated files nor where the specification of the input files is,
The format of the input is documented in the zic(8) man page. The format of the output is documented in the tzfile(5) man page.
Oh, yes, sorry, I forgot. This is classic Unix/Linux. Here's no fancy web pages yet that could welcome and inform the user efficiently. Here's manpages, C codes and scattered text files in archives that you first need to download. Have I overlooked something? I understand that maintaining a more or less beautiful but more importantly informative website causes work, but so does answering e-mails which again seemed feasible. It's the overview that lacks here. I could read all pieces but would still not know how they work together and what their intention is. Anyway, it was just a suggestion. You already helped me and I'm thankful for that. -- Yves Goergen "LonelyPixel" <nospam.list@unclassified.de> Visit my web laboratory at http://beta.unclassified.de
On Mon, Jul 5, 2010 at 9:06 PM, Robert Elz <kre@munnari.oz.au> wrote:
The other representation is the zic output - that's simple, and in a format that is essentially never going to change in any incompatible way, as that's the format that software everywhere is reading to actually convert times between UTC and local time (both directions) - and because its format already allows for everything (being so simple).
Almost, but not quite. There are two things that I think are wrong with the zic output:

- There's just a boolean indicating /whether/ we're observing some kind of advanced time (usually, but not necessarily, summer time, a.k.a., daylight saving time). There's no way to recover the standard offset from UTC from a single entry. You have to backtrack to the most recent entry for standard time and hope that it hasn't changed since then.

- Multi-byte integers are aligned on arbitrary byte boundaries. This can be a hassle on systems that require stricter alignment. (This isn't a fundamental loss of data like the DST switch is, just a pain in the rear.)

--Bill Seymour
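The alignment point is usually sidestepped by never casting the buffer to an integer pointer and instead assembling each value from individual bytes. A minimal sketch (the function name read_be32 is hypothetical) for the signed big-endian 32-bit fields in the file:

```c
#include <stdint.h>

/* Decode a signed big-endian 32-bit integer from any byte address.
 * Reading byte by byte works regardless of alignment and of the
 * host's own byte order. */
int32_t read_be32(const unsigned char *p)
{
    uint32_t u = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
               | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
    /* Convert to signed without an implementation-defined cast. */
    if (u <= INT32_MAX)
        return (int32_t)u;
    return (int32_t)(u - 0x80000000u) - INT32_MAX - 1;
}
```

For example, the four bytes 00 00 0e 10 decode to 3600 (one hour east of UTC) and ff ff f1 f0 decode to -3600, even when they start at an odd address.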
I have a reply to David Patte's message typed, but I have decided to wait a bit before sending it. But these two are easy...

Date: Tue, 6 Jul 2010 03:58:42 -0500
From: Bill Seymour <stdbill.h@pobox.com>
Message-ID: <AANLkTil04xyA_bV9jJ4T-BfNF8bJQDtADVUICKa0Wb0D@mail.gmail.com>

| There's no way to recover the standard offset from UTC

That is mostly as the concept isn't real - there is really no "standard" offset from UTC, we'd like it if there were, and we often like to pretend it exists, and in some locales, it is even reasonable to say it does, but when legislatures like to arbitrarily shift their "standard" from one value to another, it starts to become fairly meaningless to settle on anything as really being a standard - all we really have (or need) is the offset from UTC that applies in some area at a particular (universal) time. Expecting more than that is asking for more than the world really gives us.

| - Multi-byte integers are aligned on arbitrary byte boundaries.

This is just nuisance - if it bothers you, write a converter, and convert the binary file into something easier for your application to parse. It is difficult (or worse...) to change for existing applications, which expect the current format, but trivial to overcome for new ones (so if you have an application that is using the data at a rate that makes this kind of issue significant, which isn't your typical application's use of the data, then avoiding this problem is a 10-20 minute conversion prog from the current data format to whatever suits you best.)

Yves Goergen <nospam.list@unclassified.de> said:

| zic deals exactly nothing for me right now. Is there a Windows binary
| available somewhere?

I think you asked that before (perhaps indirectly) and no-one answered, which is a surprise to me, as I cannot imagine anything about zic that would make it particularly difficult to port to anything with a C compiler.
And I know that includes Windows - though I'm not a Windows user and cannot help personally. Sure, some of the frills may need to be trimmed away, but they're not important for the primary task. You said that you tried to build one, and there were unix libc() function calls that were undefined - what were they? Perhaps we can either just give you copies of those functions to use, or tell you they're unnecessary, and you can just delete the references to them. Or at worst, tell you what the function needs to do (if it is an OS interface function) and you can write whatever it takes on Windows to achieve the same effect. kre
At worst you can always install a complete linux like environment and tools to your windows box from cygwin.com . I think the basic installation includes timezone files and zic. If not, click on them in the installer to install them. Yves Goergen <nospam.list@unclassified.de> said:
zic deals exactly nothing for me right now. Is there a Windows binary available somewhere?
-- Foreca Ltd Jaakko.Hyvatti@foreca.com Tammasaarenkatu 5, FI-00180 Helsinki, Finland http://www.foreca.com
On 02.07.2010 20:21 CE(S)T, Bill Seymour wrote:
Attached is my understanding of the format of the binaries.
Ehm, where's the time zone name? Does the binary file only contain a single time zone? -- Yves Goergen "LonelyPixel" <nospam.list@unclassified.de> Visit my web laboratory at http://beta.unclassified.de
On Fri, Jul 2, 2010 at 2:42 PM, Yves Goergen <nospam.list@unclassified.de> wrote:
On 02.07.2010 20:21 CE(S)T, Bill Seymour wrote:
Attached is my understanding of the format of the binaries.
Ehm, where's the time zone name? Does the binary file only contain a single time zone?
Yes, and the time zone name is the file name. The '/' characters in the time zone name are Unix directory separators. My understanding is that the directory where all the tz binaries reside can be found in some environment variable...I'm not sure which one, but I imagine others can tell you. In that directory, there will be, for example, a subdirectory named "Pacific", and in that subdirectory, there will be a file named "Honolulu". That's the time zone binary for Hawaii. If a time zone name has three elements, that just indicates another level of subdirectories. For example, America has four subdirectories, Argentina, Indiana, Kentucky, and North_Dakota. North_Dakota contains two files, Center and New_Salem. TZ gurus: did I get all that right? --Bill
On Fri, Jul 2, 2010 at 1:34 PM, Bill Seymour <stdbill.h@pobox.com> wrote:
On Fri, Jul 2, 2010 at 2:42 PM, Yves Goergen <nospam.list@unclassified.de> wrote:
On 02.07.2010 20:21 CE(S)T, Bill Seymour wrote:
Attached is my understanding of the format of the binaries.
Ehm, where's the time zone name? Does the binary file only contain a single time zone?
Yes, and the time zone name is the file name. The '/' characters in the time zone name are Unix directory separators. My understanding is that the directory where all the tz binaries reside can be found in some environment variable...I'm not sure which one, but I imagine others can tell you.
In that directory, there will be, for example, a subdirectory named "Pacific", and in that subdirectory, there will be a file named "Honolulu". That's the time zone binary for Hawaii.
If a time zone name has three elements, that just indicates another level of subdirectories. For example, America has four subdirectories, Argentina, Indiana, Kentucky, and North_Dakota. North_Dakota contains two files, Center and New_Salem.
TZ gurus: did I get all that right?
Yes - that is the basic structure as I understand it. The scheme is analogous to the scheme adopted by terminfo for terminals, and similar hierarchical schemes are used in CPAN (Comprehensive Perl Archive Network) and other places too. -- Jonathan Leffler <jonathan.leffler@gmail.com> #include <disclaimer.h> Guardian of DBD::Informix - v2008.0513 - http://dbi.perl.org "Blessed are we who can laugh at ourselves, for we shall never cease to be amused."
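
The name-to-file mapping described above can be sketched in a few lines of Python. This is only an illustration, not part of tzcode: the function name `zone_file_path` is hypothetical, and the use of a `TZDIR` environment variable with `/usr/share/zoneinfo` as fallback is an assumption about a typical Unix-like installation (the thread itself leaves the variable unnamed).

```python
import os
from pathlib import PurePosixPath


def zone_file_path(zone_name, tzdir=None):
    """Map a zone name such as 'Pacific/Honolulu' to its binary file path.

    Assumption: the base directory comes from TZDIR if set, otherwise
    /usr/share/zoneinfo. Neither is confirmed by the thread.
    """
    base = tzdir or os.environ.get("TZDIR", "/usr/share/zoneinfo")
    parts = zone_name.split("/")
    # Reject empty components and path tricks such as '..'
    if not zone_name or any(p in ("", ".", "..") for p in parts):
        raise ValueError("not a valid zone name: %r" % zone_name)
    # Each '/' in the zone name is one directory level
    return PurePosixPath(base).joinpath(*parts)


if __name__ == "__main__":
    print(zone_file_path("Pacific/Honolulu"))
    print(zone_file_path("America/North_Dakota/Center"))
```

A three-element name simply yields one more directory level, as in the North_Dakota example.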
On Jul 2, 2010, at 11:21 AM, Bill Seymour wrote:
Attached is my understanding of the format of the binaries.
Not quite. The tzh_ttisgmtcnt, tzh_ttisstdcnt, tzh_leapcnt, tzh_timecnt, tzh_typecnt, and tzh_charcnt fields are always 4 bytes, in both version 1 and 2. The transition times, and the leap-second times (but not the leap-second counts), are 4 bytes in version 1 and 8 bytes in version 2. You might want to use uint32_t and uint64_t, rather than long, to make it clear that the length of the fields in question doesn't necessarily match the size of a long on the platform on which the code is running.
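
To make the fixed field widths concrete, here is a minimal Python sketch that reads the six header counts with explicit big-endian 4-byte fields. The 44-byte layout it assumes (4-byte magic "TZif", 1 version byte, 15 reserved bytes, then the six counts in the order listed above) follows the tzfile format as I understand it. The names `parse_header` and `data_block_size` are hypothetical, and the size formula assumes 8-byte times in a version 2 block, matching the uint64_t suggestion.

```python
import struct

# ">" = big-endian with no padding: magic, version, 15 reserved bytes, six uint32 counts
HEADER = struct.Struct(">4sc15x6I")


def parse_header(buf, offset=0):
    """Decode one header from buf, starting at offset."""
    (magic, version, isgmtcnt, isstdcnt,
     leapcnt, timecnt, typecnt, charcnt) = HEADER.unpack_from(buf, offset)
    if magic != b"TZif":
        raise ValueError("not a tz binary file")
    return {"version": version, "isgmtcnt": isgmtcnt, "isstdcnt": isstdcnt,
            "leapcnt": leapcnt, "timecnt": timecnt, "typecnt": typecnt,
            "charcnt": charcnt}


def data_block_size(h, time_size):
    """Bytes following the header; time_size is 4 (version 1) or 8 (version 2)."""
    return (h["timecnt"] * time_size         # transition times
            + h["timecnt"]                   # one type index per transition
            + h["typecnt"] * 6               # 4-byte offset, isdst byte, abbreviation index
            + h["charcnt"]                   # abbreviation strings
            + h["leapcnt"] * (time_size + 4) # leap-second time plus 4-byte count
            + h["isstdcnt"] + h["isgmtcnt"]) # one byte per indicator


if __name__ == "__main__":
    sample = struct.pack(">4sc15x6I", b"TZif", b"2", 0, 0, 0, 2, 1, 4)
    h = parse_header(sample)
    print(h["timecnt"], data_block_size(h, 4), data_block_size(h, 8))
```

Using `struct` with an explicit byte order also sidesteps the unaligned-integer complaint, since no field is read through a native pointer.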
On 02.07.2010 16:18 CE(S)T, Yves Goergen wrote:
Is there some higher-level algorithm available of how that works?
Here's my first draft of today. It describes a function to determine the effective UTC offset for local time at a specified UTC timestamp. This can be used for converting from UTC to local time.

The basic idea is to keep the tz database text files' data structure and work on it at runtime. It may be a little slower, but I intend to add caching to it later. The advantage of this is that it can be used for an arbitrary date while keeping the data storage as small as possible. (You know, 2038 is not the end of the world...)

My algorithm assumes that time zones (the Zone lines with all their continuation lines), rule sets (the individual Zone and continuation lines) and rules (the Rule lines) are available as application data.

Is somebody able and willing to take a look at it to spot big mistakes?

BTW, is there some kind of test suite to validate implementations against the original C code?

-- Yves Goergen "LonelyPixel" <nospam.list@unclassified.de> Visit my web laboratory at http://beta.unclassified.de

tz algorithm
============

Function: Get UTC offset at a specified UTC time (for converting from UTC to local time)

Input: Requested UTC time (UTC date/time to determine the effective UTC offset for)
Input: Time zone
Output: Base UTC offset and save offset in effect at the requested time
Data: All time zones with their rule sets with their rules, as stored in the tz database text files

Select first rule set of the time zone
Set base UTC offset from the rule set
Set save offset = 0
Set active year = Year(End date of the rule set) - 1
Set date pointer to Jan 1st of the active year
For each rule set of the time zone:
    Set base UTC offset from the rule set
    If the rule set specifies rules:
        Loop with all rules specified in the rule set:
            While the date pointer is in the active year:
                Select the rule with the first transition month/day/time on or after the date pointer from all rules valid in the active year
                Break if no such rule was found
                Compute the UTC transition time from its relative date, its time type and the active base UTC/save offset
                Recompute the rule set's UTC end time from its time type and the active base UTC/save offset
                If the UTC transition time is greater than or equal to the rule set's UTC end time:
                    Set date pointer = rule set's UTC end time
                    Continue with the next rule set
                If the requested UTC time is smaller than the UTC transition time:
                    Return with the current base UTC/save offset
                Set save offset from the selected rule
                Set date pointer to the rule's UTC transition time + 1 day
            Increase active year by 1
    If the rule set does not specify rules:
        Recompute the rule set's UTC end time from its time type and the active base UTC/save offset
        If the UTC transition time is greater than or equal to the rule set's UTC end time:
            Set date pointer = rule set's UTC end time
            Continue with the next rule set
        If the requested UTC time is smaller than the rule set's UTC end time:
            Return with the current base UTC/save offset
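
The step "Compute the UTC transition time from its relative date, its time type and the active base UTC/save offset" is where the local/standard/universal distinction gets resolved, so a small Python sketch may help. It is not taken from tzcode. The function name `transition_to_utc` is hypothetical, and the single-letter time types (`w` for wall clock, `s` for standard, `u`, `g` or `z` for universal) are assumed to mirror the suffixes used on times in the tz source files.

```python
from datetime import datetime, timedelta


def transition_to_utc(stamp, time_type, base_offset, save_offset):
    """Convert a rule's transition time to UTC.

    stamp       -- naive datetime as written in the rule
    time_type   -- 'w' (wall clock), 's' (standard) or 'u'/'g'/'z' (universal)
    base_offset -- base UTC offset in effect just before the transition
    save_offset -- save offset in effect just before the transition
    """
    if time_type == "w":
        # Wall clock time includes both the base offset and the current save
        return stamp - base_offset - save_offset
    if time_type == "s":
        # Standard time ignores the save offset
        return stamp - base_offset
    if time_type in ("u", "g", "z"):
        # Already universal time
        return stamp
    raise ValueError("unknown time type: %r" % time_type)


if __name__ == "__main__":
    base = timedelta(hours=-5)   # e.g. a zone five hours behind UTC
    save = timedelta(hours=1)    # daylight saving currently in effect
    local = datetime(2010, 11, 7, 2, 0)
    print(transition_to_utc(local, "w", base, save))  # 2010-11-07 06:00:00
    print(transition_to_utc(local, "s", base, save))  # 2010-11-07 07:00:00
```

Note that the offsets passed in are those in effect before the transition, which is why the draft recomputes the transition and end times each time the save offset changes.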
Hi, Yves, If you go to http://www.cstdbill.com/tzdb/db.html and scroll all the way down to Appendix B, you'll find an open-source program that reads the tz source files and generates text files, either comma-separated or tab-separated, for loading into a relational database. You might get some ideas from that. --Bill Seymour
participants (19): Alan Barrett, Bill Seymour, Clive D.W. Feather, David Patte, Eric Fischer, Garrett Wollman, Guy Harris, Ian Abbott, Jaakko Hyvätti, Jonathan Leffler, lennox@cs.columbia.edu, Olson, Arthur David (NIH/NCI) [E], Paul Eggert, Robert Elz, Stephen Colebourne, Stephen Moir, Thomas KIPP, Tony Finch, Yves Goergen