I'm forwarding this message from John Dlugosz, who is not on the time zone mailing list. Those of you who are on the list, please direct replies appropriately.

--ado

-----Original Message-----
From: John Dlugosz [mailto:JDlugosz@TradeStation.com]
Sent: Tuesday, January 06, 2009 7:10
To: tz@lecserver.nci.nih.gov
Subject: time zone library

I downloaded the timezone library, and in adapting it to my needs I've been figuring out how it works. I have a few questions, and also (hopefully) a contribution.

First, my contribution. Regarding

    /*
    ** Adapted from code provided by Robert Elz, who writes:
    **	The "best" way to do mktime I think is based on an idea of Bob
    **	Kridle's (so its said...) from a long time ago.
    **	It does a binary search of the time_t space. Since time_t's are
    **	just 32 bits, its a max of 32 iterations (even at 64 bits it
    **	would still be very reasonable).
    */

I'm using 64-bit time_t values, and find that it usually takes 62 to 64 iterations. Basically, it is always worst case. I tried using a rough guess for the initial hi and low:

    lo = ((yourtm.tm_year-70) * 365i64 + yourtm.tm_mon * 28) * SecondsPerDay;
    hi = ((yourtm.tm_year-70) * 366i64 + (yourtm.tm_mon+1)*31) * SecondsPerDay;

when the resulting time_t would be positive, and it completes in 20 to 23 iterations. I punted on the negative case, but a very similar calculation would work. When tm_year < 70, I just used the smallest value I care about for lo, and 0 for hi. That gives me worst case 35 iterations, because I'm only supporting values back to 1752 CE, not the full range of a 64-bit number, which would be 21 times longer than the age of the universe.

Now for my first question. TZ_MAX_TIMES is 1200, and that is exactly the length of the arrays for state::ats and state::types. I suppose it is noting every time change, e.g. DST adjustments twice a year for 600 years? There is no need to store anything before the first reported time in the configuration files, as it simply uses the first entry for anything older (no DST in pre-history); it seems to have code for dealing with dates beyond the end of the table. Does it extrapolate the current DST rule in the future? I'd like to cut off the table at a more reasonable date for my application. Where is that table populated? Is it when the data file is loaded, or in zic when it is created?

My second question concerns leap seconds. It appears that the leap second data is associated with every zone file. Why isn't it simply the same for everything, and global?

--John
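The bounded search described in the message above can be sketched roughly as follows. This is an illustration under stated assumptions, not the tz library's code: the names `tm_cmp`, `search_time` and `guess_bounds` are hypothetical, standard `gmtime()` stands in for the library's internal conversion routine, `time_t` is assumed to be a 64-bit integer count of seconds, and the MSVC-specific `365i64` literals are replaced by `int64_t` arithmetic. The one-day padding on each bound is an addition made here, because a zone's UTC offset can push a local time just outside the unpadded estimates (for example local midnight on 1 January 1971 in a zone east of Greenwich).

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define SECS_PER_DAY INT64_C(86400)

/* Order two broken-down times by year, month, day, hour, minute, second. */
static int tm_cmp(const struct tm *a, const struct tm *b)
{
    if (a->tm_year != b->tm_year) return a->tm_year < b->tm_year ? -1 : 1;
    if (a->tm_mon  != b->tm_mon)  return a->tm_mon  < b->tm_mon  ? -1 : 1;
    if (a->tm_mday != b->tm_mday) return a->tm_mday < b->tm_mday ? -1 : 1;
    if (a->tm_hour != b->tm_hour) return a->tm_hour < b->tm_hour ? -1 : 1;
    if (a->tm_min  != b->tm_min)  return a->tm_min  < b->tm_min  ? -1 : 1;
    if (a->tm_sec  != b->tm_sec)  return a->tm_sec  < b->tm_sec  ? -1 : 1;
    return 0;
}

/* Binary-search [lo, hi] for the value whose gmtime() equals *want.
   Returns the number of probes used and stores the value in *out,
   or returns 0 if no value in the range matches. */
static int search_time(const struct tm *want, int64_t lo, int64_t hi,
                       int64_t *out)
{
    int probes = 0;
    while (lo <= hi) {
        /* Midpoint computed in unsigned arithmetic to avoid overflow. */
        int64_t mid = lo + (int64_t)(((uint64_t)hi - (uint64_t)lo) / 2);
        time_t t = (time_t)mid;
        const struct tm *got = gmtime(&t);
        int dir;
        ++probes;
        if (got == NULL)              /* year does not fit in an int */
            dir = mid > 0 ? 1 : -1;
        else
            dir = tm_cmp(got, want);
        if (dir == 0) {
            *out = mid;
            return probes;
        }
        if (dir > 0)
            hi = mid - 1;
        else
            lo = mid + 1;
    }
    return 0;
}

/* Rough bounds for years from 1970 on: at least 365 days per year and
   28 per month below, at most 366 and 31 above, padded by one day. */
static void guess_bounds(const struct tm *want, int64_t *lo, int64_t *hi)
{
    int64_t years = (int64_t)want->tm_year - 70;
    assert(years >= 0);               /* earlier years need other bounds */
    *lo = (years * 365 + (int64_t)want->tm_mon * 28 - 1) * SECS_PER_DAY;
    *hi = (years * 366 + ((int64_t)want->tm_mon + 1) * 31 + 1) * SECS_PER_DAY;
}
```

For a date in 2009 the guessed window is about 72 days wide, so the search needs at most 23 probes, which is consistent with the 20 to 23 iterations reported above. Calling `search_time` with `INT64_MIN` and `INT64_MAX` as bounds reproduces the unbounded behaviour for comparison.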
Date: Mon, 12 Jan 2009 13:09:02 -0500
From: "Olson, Arthur David (NIH/NCI) [E]" <olsona@dc37a.nci.nih.gov>
Message-ID: <B410D30A78C6404C9DABEA31B54A2813029A0407@nihcesmlbx10.nih.gov>

I'm going to leave your questions for someone else, but ...

| I'm using 64-bit time_t values, and find that it usually takes 62 to 64
| iterations. Basically, it is always worst case.

[...]

| when the resulting time_t would be positive, and it completes in 20 to
| 23 iterations.

First, I am not surprised at the "always worst case", that's what I would expect, it can only be quicker by pure fluke, which should save n iterations once in every 2^n cases (ie: you might expect to have it take 54 iterations (64 bit time_t) one time in a thousand or so).

What really matters is that "worst" here doesn't really mean very much. What kind of application do you have for which the performance of mktime() matters enough that it is worth complicating the algorithm?

The "even at 64 bits it would still be very reasonable" comment was written in the time when the computations were being done on Vax 11/780 (and perhaps even more commonly) 750 systems. These days where CPUs are a thousand times faster (or more), and even baby embedded on a chip systems are likely 10-100 times quicker than the systems of the time, it is really hard to imagine an application that could really have a need to call mktime() enough for its CPU cost to matter in the slightest.

If such an application did exist, it would probably be better to tailor an algorithm that could make use of the results of all the thousands of conversions it must be doing to really optimise the calculations, rather than just building in a constant heuristic.

kre
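The "pure fluke" estimate above can be checked on a small scale. The sketch below is only an illustration with hypothetical names (`probes_to_find`, `count_early`): it runs a textbook binary search over a space of 2^bits - 1 values, chosen so that the search tree is perfectly balanced, and counts how many targets are found within a given number of probes.

```c
#include <assert.h>
#include <stdint.h>

/* Probes a textbook binary search over [0, size-1] needs to hit target. */
static int probes_to_find(uint32_t size, uint32_t target)
{
    uint32_t lo = 0, hi = size - 1;
    int probes = 0;
    assert(size > 0 && target < size);
    for (;;) {
        uint32_t mid = lo + (hi - lo) / 2;
        ++probes;
        if (mid == target)
            return probes;
        if (mid < target)
            lo = mid + 1;
        else
            hi = mid - 1;   /* mid > target >= 0, so this cannot wrap */
    }
}

/* Count the targets in a space of 2^bits - 1 values that are found
   using at most max_probes probes. */
static uint32_t count_early(int bits, int max_probes)
{
    uint32_t size, found = 0;
    assert(bits >= 1 && bits <= 31);
    size = (UINT32_C(1) << bits) - 1;
    for (uint32_t t = 0; t < size; ++t)
        if (probes_to_find(size, t) <= max_probes)
            ++found;
    return found;
}
```

With `bits` set to 16, only 63 of the 65535 possible targets are found in 6 probes or fewer, that is, 10 probes short of the maximum. That is roughly one in a thousand, the same ratio as the 54-out-of-64 figure quoted above.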
When you have a server farm of over 200 high-end multicore machines, eking a little more performance out is worth some programmer time, as compared with the price of buying more servers. Even 1% is basically 2 more machines on the rack, with its inherent cost of ownership. Basically, I found me a niche where performance still matters <grin>. www.tradestation.com.

Actually, I think that the more complex line of code isn't _that_ complex, and saves a lot. The real issue is testing. "Check everything" is less likely to have a mistake. I'm actually making a comprehensive test which checks every 15-30 minutes from 1930 through 2010, for every TZ file. That will let me do a full regression test of my code against your original.

Since I'm dealing with local times of various places in the application, a major difference is to make timezone objects that can be instantiated, rather than a single global setting.

BTW, what is the official way to refer to or cite this code and the associated database? I see several names and abbreviations in use.

--John

-----Original Message-----
From: kre@munnari.OZ.AU [mailto:kre@munnari.OZ.AU]
Sent: Monday, January 12, 2009 3:57 PM
To: John Dlugosz
Cc: tz@elsie.nci.nih.gov
Subject: Re: FW: time zone library

[...]
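A regression sweep of the kind John describes could look something like the sketch below. It is not his test: the function name `count_roundtrip_mismatches` is hypothetical, it uses only the standard `localtime()` and `mktime()` against whatever zone the process is running in, and it assumes `time_t` is an integer count of seconds. Comparing two implementations would mean running the same sweep against each and comparing the results.

```c
#include <assert.h>
#include <time.h>

/* Walk [start, end) in fixed steps. Convert each instant to local
   broken-down time and back, and count the instants that do not
   come back unchanged. */
static long count_roundtrip_mismatches(time_t start, time_t end, long step)
{
    long mismatches = 0;
    assert(step > 0 && start <= end);
    for (time_t t = start; t < end; t += step) {
        const struct tm *local = localtime(&t);
        struct tm copy;
        if (local == NULL) {
            ++mismatches;
            continue;
        }
        copy = *local;          /* mktime may rewrite its argument */
        if (mktime(&copy) != t)
            ++mismatches;
    }
    return mismatches;
}
```

Because `localtime()` fills in `tm_isdst`, `mktime()` can normally tell the two readings of a repeated hour apart. A zone whose offset changed without a change of DST flag may still report a few mismatches that are not bugs, so a non-zero count needs inspection rather than automatic failure.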
Date: Mon, 12 Jan 2009 17:38:48 -0500
From: "John Dlugosz" <JDlugosz@TradeStation.com>
Message-ID: <450196A1AAAE4B42A00A8B27A59278E70925C92F@EXCHANGE.trad.tradestation.com>

| When you have a server farm of over 200 high-end multicore machines,
| eking a little more performance out is worth some programmer time, as
| compared with the price of buying more servers. Even 1% is basically 2
| more machines on the rack, with its inherent cost of ownership.

What kind of applications are you running that call mktime() enough that even if you reduced mktime()'s CPU cost to 0 you could possibly save 1% of the total system CPU usage? Have you measured this saving, or are you just speculating, as in "this is obviously faster so it must be worthwhile"?

If you were concerned more about low end embedded systems I'd tend to be a little more believing, but for high end multi-core machines if you can even measure the cost of mktime (unless you have very unusual applications) I'd be astounded. It certainly is not where I'd be spending my programmer effort!

| Basically, I found me a niche where performance still matters <grin>.
| www.tradestation.com.

Performance matters just about everywhere, and if you can convince us (which does not necessarily mean me - in fact almost certainly doesn't) that there's actually a measurable performance win from the change, then it might get considered, but that has to be in an overall system context, not in a bogus "run mktime() a million times and count the cost" test.

| Actually, I think that the more complex line of code isn't _that_
| complex, and saves a lot.

I suspect it saves some of the cost of mktime() which for most people is saving some of nothing...

| The real issue is testing. "Check everything" is less likely to have
| a mistake. I'm actually making a comprehensive test which checks
| every 15-30 minutes from 1930 through 2010, for every TZ file.

Why stop at 2010? That's a little close. Also remember, this is mktime() - it has to convert all kinds of (bogus) struct tm's into a time_t - fully testing it is non-trivial.

| Since I'm dealing with local times of various places in the application,
| a major difference is to make timezone objects that can be instantiated,
| rather than a single global setting.

That certainly seems worthwhile.

| BTW, what is the official way to refer to or cite this code and the
| associated database? I see several names and abbreviations in use.

Very good question, and again, I will defer to someone who might have a suggestion ("The ado timezone package" is probably what I'd use).

kre
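To illustrate the remark about bogus struct tm values: standard C requires mktime() to accept fields outside their normal ranges and to normalize them, so a test limited to well-formed local times covers only part of its behaviour. The helper below, `normalize_local_noon`, is a hypothetical name used only for this sketch. It builds a possibly out-of-range date at local noon and returns what mktime() made of it.

```c
#include <assert.h>
#include <time.h>

/* Fill a struct tm with a calendar date that may be out of range
   (month 13, day 32, ...) at 12:00 local time, let mktime() normalize
   it, and hand back both the time_t and the normalized fields. */
static time_t normalize_local_noon(int year, int month, int mday,
                                   struct tm *out)
{
    struct tm tm = {0};
    time_t t;
    assert(out != NULL);
    tm.tm_year = year - 1900;
    tm.tm_mon = month - 1;      /* may fall outside 0..11 */
    tm.tm_mday = mday;          /* may fall outside 1..31 */
    tm.tm_hour = 12;
    tm.tm_isdst = -1;           /* let mktime decide whether DST applies */
    t = mktime(&tm);
    *out = tm;
    return t;
}
```

Passing 32 January 2009 should come back as 1 February 2009, a Sunday, and month 13 of 2009 should come back as January 2010. Noon is used so that the example stays clear of typical DST transition times.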
My second question concerns leap seconds. It appears that the leap second data is associated with every zone file. Why isn't it simply the same for everything, and global?
There was a year when New York observed the leap second at midnight local time rather than midnight UCT; this allowed the countdown at Times Square to go "3...2...1...leap...Happy New Year." While the data files reflect the legal reality of the situation, we've left code in place to allow oddball observance of leap seconds for those who want it. A consequence is that leap data must be stored on a zone-by-zone basis. The amount of leap data per zone is, of course, fairly small. --ado
"Arthur" == Arthur David Olson <Olson> writes:
> My second question concerns leap seconds. It appears that the leap
> second data is associated with every zone file. Why isn't it simply
> the same for everything, and global?
Arthur> There was a year when New York observed the leap second at
Arthur> midnight local time rather than midnight UCT; this allowed
Arthur> the countdown at Times Square to go "3...2...1...leap...Happy
Arthur> New Year." While the data files reflect the legal reality of
Arthur> the situation, we've left code in place to allow oddball
Arthur> observance of leap seconds for those who want it. A
Arthur> consequence is that leap data must be stored on a
Arthur> zone-by-zone basis. The amount of leap data per zone is, of
Arthur> course, fairly small.

I find this odd. Leap seconds are defined by BIH, i.e., they are an international standard, not merely the local whim of some random mayor. While it is fun to think of a leap second observation at Times Square, that doesn't mean it's real. I would think it's better to ignore such games.

paul
participants (4)
- John Dlugosz
- Olson, Arthur David (NIH/NCI) [E]
- Paul Koning
- Robert Elz