suggestion for simplified representation of timezone rules
Hello,

I have been doing extensive work with the tz database, and while the data is quite useful, it is not in the most accessible format. I have had little success using the zic compiler or any of the source code, and have been forced to do my own parsing and analysis of the data.

One aspect of the representation of rules I find awkward is the notation for the day on which a rule takes effect. There are rules such as "lastSun" or "Sun>=8", etc. While descriptive, they are hard to process.

I would like to suggest a better approach. It takes the form "d op nn", where

  'd'   is the day-of-week, 0=Sun to 6=Sat
  'op'  is a comparison code of '<', '=' or '>', and
  'nn'  is a 2-digit day number (may be 00 or 32)

The comparison code works as follows:

  '='   nn is an exact value
  '<'   rule describes the last day less than nn
  '>'   rule describes the first day greater than nn

Examples:

  0>07  first Sun > 7 (same as first Sun >= 8)
  0<32  last Sun < 32 (same as "lastSun")
  0=27  Sun the 27th of the month

When a range of years is involved in a rule, there is no exact day-of-week, and in such cases, '9' may be used as a place-holder:

  9=01  the first day of the month, regardless of what day-of-week it is

There is little need to use a day-of-week with >, since the value would always be the same. Example: 9>07 means: first day of the month (regardless of day of week) that is > 7 is always 8.

It is conceivable that 9<nn could be used, but the only case where it would have any merit is in February:

  9<32  last day of the month

For months other than Feb, the value is a constant for a given month, and for Feb it depends on the leap-year status. Since no one (that I know of) does anything about DST in Feb, we can safely eliminate the 9<nn form as well as the 9>nn form.

The conversion from existing rules to this format is pretty straightforward:

  lastSun      0<32  '32' is constant for 'last'
  Sun>=8       0>07  subtract 1 from the tz rule value
  Sun<=21      0<22  add 1 to the tz rule value
  May 1        9=01  for May 1 of any year
  May 1 1980   4=01  May 1 1980 is a Thursday

Hope you find this interesting.

Regards,
Robert Hodge
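To illustrate how a "d op nn" code might be evaluated for a particular month, here is a minimal Python sketch. The function name resolve_day and the reading of '9' as "any day-of-week" are assumptions made for illustration; nothing here comes from zic or the tz source code.

```python
import calendar
import datetime


def resolve_day(code, year, month):
    """Return the day of month selected by a 'd op nn' code, or None.

    Hypothetical helper: d is 0=Sun..6=Sat (9 = any day-of-week),
    op is '<', '=' or '>', and nn is a 2-digit day number.
    """
    d, op, nn = int(code[0]), code[1], int(code[2:])
    days_in_month = calendar.monthrange(year, month)[1]

    def matches(day):
        if d == 9:
            return True
        # date.weekday() is 0=Mon..6=Sun; shift so that 0=Sun..6=Sat.
        return (datetime.date(year, month, day).weekday() + 1) % 7 == d

    if op == "=":
        candidates = [nn] if 1 <= nn <= days_in_month else []
    elif op == "<":
        # Last matching day strictly below nn: scan downward.
        candidates = range(min(nn - 1, days_in_month), 0, -1)
    elif op == ">":
        # First matching day strictly above nn: scan upward.
        candidates = range(max(nn + 1, 1), days_in_month + 1)
    else:
        raise ValueError("unknown comparison code: %r" % op)

    for day in candidates:
        if matches(day):
            return day
    return None
```

With this sketch, resolve_day("0>07", 2007, 3) picks the second Sunday of March 2007, and resolve_day("4=01", 1981, 5) returns None because May 1 1981 is not a Thursday.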
I'm forwarding this message from Robert Hodge, who is not on the time zone mailing list. Those of you who are on the time zone mailing list should direct replies appropriately.

--ado

-----Original Message-----
From: Robert Hodge [mailto:roberth@sisconet.com]
Sent: Friday, January 05, 2007 9:26 AM
To: tz@lecserver.nci.nih.gov
Subject: suggestion for simplified representation of timezone rules
One aspect of the representation of rules I find awkward is the notation for the day on which a rule takes effect. There are rules such as "lastSun" or "Sun>=8", etc. While descriptive, they are hard to process.
Not that hard. And it's a truism that it's better to have the computer do the work than humans.
I would like to suggest a better approach. It takes the form "d op nn", where

  'd'   is the day-of-week, 0=Sun to 6=Sat
  'op'  is a comparison code of '<', '=' or '>', and
  'nn'  is a 2-digit day number (may be 00 or 32)

The comparison code works as follows:

  '='   nn is an exact value
  '<'   rule describes the last day less than nn
  '>'   rule describes the first day greater than nn
It helps humans to allow ">=" and "<=".
Examples:
  0>07  first Sun > 7 (same as first Sun >= 8)
  0<32  last Sun < 32 (same as "lastSun")
  0=27  Sun the 27th of the month
I'm afraid I don't find this better. It's harder to interpret "0" than "Sun". So why not "Sun>07" or "Wed<32"? And why force the day number to be 2 digits? Does it hurt to allow "Sun>7"?
When a range of years is involved in a rule, there is no exact day-of-week, and in such cases, '9' may be used as a place-holder:
Again, surely "Any" (or even the old standby "*") is better.
9=01 the first day of the month, regardless of what day-of-week it is
"Any=01" But then why not just say "1"?
It is conceivable that 9<nn could be used, but the only case where it would have any merit is in February:
9<32 last day of the month
For months other than Feb, the value is a constant for a given month, and for Feb it depends on the leap-year status. Since no one (that I know of) does anything about DST in Feb, we can safely eliminate the 9<nn form as well as the 9>nn form.
No, because we don't *know* that nobody will ever introduce "last Thursday in February".
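To make the February point concrete, here is a small sketch using Python's standard calendar module. The helper name last_weekday_of_month is made up for illustration. It shows that both the last day of February and the last Thursday of February move from year to year, so neither can be written as a fixed day number.

```python
import calendar


def last_weekday_of_month(year, month, weekday):
    """Day of month of the last given weekday (calendar.MONDAY=0 .. SUNDAY=6)."""
    days_in_month = calendar.monthrange(year, month)[1]
    last_dow = calendar.weekday(year, month, days_in_month)
    # Step back from the last day of the month to the requested weekday.
    return days_in_month - (last_dow - weekday) % 7


for year in (2007, 2008):
    # Prints the year, the length of February, and its last Thursday.
    print(year,
          calendar.monthrange(year, 2)[1],
          last_weekday_of_month(year, 2, calendar.THURSDAY))
```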
The conversion from existing rules to this format is pretty straightforward:
lastSun 0<32 '32' is constant for 'last'
But, again, what's wrong with "last"?
Sun>=8 0>07 subtract 1 from the tz rule value
If we go your way, what's wrong with "0>=08", or "0>8", or "Sun>8"?
Sun<=21 0<22 add 1 to the tz rule value
Mut.mut.
May 1 1980 4=01 May 1 1980 is a Thursday
But so are many other May 1sts.

-- 
Clive D.W. Feather  | Work:  <clive@demon.net>   | Tel:    +44 20 8495 6138
Internet Expert     | Home:  <clive@davros.org>  | Fax:    +44 870 051 9937
Demon Internet      | WWW: http://www.davros.org | Mobile: +44 7973 377646
THUS plc            |                            |
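The conversion table quoted in this reply is mechanical enough to sketch in code. The following Python is only an illustration under stated assumptions: the name to_d_op_nn is hypothetical, weekday names are assumed to be the three-letter abbreviations used in this thread, and a plain day number is always mapped to the '9' place-holder because a day-of-week can only be filled in when a single year is known.

```python
import re

# Assumed weekday abbreviations, indexed 0=Sun..6=Sat as in the proposal.
DAYS = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]


def to_d_op_nn(on_field):
    """Convert a tz-style day spec (e.g. 'lastSun', 'Sun>=8') to 'd op nn'."""
    if on_field.startswith("last"):
        # '32' is constant for 'last'.
        return "%d<32" % DAYS.index(on_field[4:7])

    m = re.fullmatch(r"([A-Z][a-z]{2})(>=|<=)(\d{1,2})", on_field)
    if m:
        d = DAYS.index(m.group(1))
        n = int(m.group(3))
        if m.group(2) == ">=":
            return "%d>%02d" % (d, n - 1)   # subtract 1 from the tz value
        return "%d<%02d" % (d, n + 1)       # add 1 to the tz value

    if on_field.isdigit():
        # Plain day of month: no particular day-of-week, so use '9'.
        return "9=%02d" % int(on_field)

    raise ValueError("unrecognized day spec: %r" % on_field)
```

For example, to_d_op_nn("Sun>=8") gives "0>07" and to_d_op_nn("lastSun") gives "0<32", matching the table.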
From: Robert Hodge [mailto:roberth@sisconet.com] Sent: Friday, January 05, 2007 9:26 AM
I have had little success using the zic compiler or any of the source code, and have been forced to do my own parsing and analysis of the data.
What problems have you had with zic, or with the source code? The C source code itself is supposed to be portable; if you tell us exactly what problems you're having, quite likely we can fix it.

Also, you can get prebuilt copies of zic with most GNU/Linux distributions, and commonly these can boot on any x86 platform without modifying your existing operating system. E.g., see Knoppix <http://www.knopper.net/knoppix/index-en.html> or Ubuntu <http://www.ubuntu.com/>.

Or, if C and/or GNU/Linux is too much hassle for you, there is a list of tz compilers written for other languages (C++, Java, JavaScript, Python, Ruby); see <http://www.twinsun.com/tz/tz-link.htm> and look for "Other tz compilers".

The tz data notation itself could be extended, but it's unlikely that it would be changed in an incompatible way at this point, as there is too much existing practice.
participants (4)
- Clive D.W. Feather
- Olson, Arthur David (NIH/NCI) [E]
- Paul Eggert
- Robert Hodge