Changing 24:00 to 0:00 where possible
While writing a parser for tzdb files, I noticed that some rules are for a given date at 24:00, rather than the following day at 0:00. While in some cases this is unavoidable (Egypt), in others there is no reason this is necessary (Belize).
I think it would be reasonable to change 24:00 to 0:00 where possible, incrementing the day, day of week, and month as appropriate. This would reduce the need for special casing values to those that have a technical reason.
Jacob Pratt
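Here is a minimal sketch of the rewrite proposed above. It is not code from the thread. It assumes a multi-year Rule line, so the next day must stay inside the month in every year, and it therefore checks each ON form against the shortest possible month length (28 days for February). The function name `normalize_midnight` is hypothetical. It leaves `lastX` forms, month and year rollovers, and AT values other than 24:00 untouched.

```python
import calendar
import re
from typing import Optional, Tuple

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Shortest length each month can have in any year (2001 is not a leap year).
MIN_DAYS = {m: calendar.monthrange(2001, m)[1] for m in range(1, 13)}

def normalize_midnight(month: int, on: str, at: str) -> Optional[Tuple[int, str, str]]:
    """Hypothetical helper: rewrite "ON 24:00" as "next day 0:00".

    Returns the input unchanged when AT is not 24:00, the rewritten
    (month, on, at) when the next day can never leave the month,
    and None when no such rewrite is safe.
    """
    t = re.fullmatch(r"24(?::00(?::00)?)?([wsugz]?)", at)
    if t is None:
        return (month, on, at)          # not midnight at the end of the day
    suffix = t.group(1)                 # keep any w/s/u/g/z time-type suffix
    last = MIN_DAYS[month]

    if on.isdigit():                    # fixed day of month, e.g. "15"
        day = int(on)
        return (month, str(day + 1), "0:00" + suffix) if day + 1 <= last else None

    m = re.fullmatch(r"(Mon|Tue|Wed|Thu|Fri|Sat|Sun)(>=|<=)(\d+)", on)
    if m is None:                       # "lastSun" and friends cannot be rewritten
        return None
    weekday, op, n = m.group(1), m.group(2), int(m.group(3))
    next_weekday = DAYS[(DAYS.index(weekday) + 1) % 7]
    # Latest day of the month the ON field can name: n+6 for ">=", n for "<=".
    latest = n + 6 if op == ">=" else n
    if latest + 1 > last:
        return None                     # the following day might fall in the next month
    return (month, f"{next_weekday}{op}{n + 1}", "0:00" + suffix)
```

For example, `normalize_midnight(3, "Sun>=8", "24:00")` returns `(3, "Mon>=9", "0:00")`. By contrast, `normalize_midnight(2, "Sun>=22", "24:00")` returns `None`, because in a 28-day February that Sunday can be the 28th.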
On Tue, Apr 4, 2023, at 02:49, Jacob Pratt via tz wrote:
I think it would be reasonable to change 24:00 to 0:00 where possible, incrementing the day, day of week, and month as appropriate. This would reduce the need for special casing values to those that have a technical reason.
I wrote my own TZDB parser as well, and my early versions ran into these problems. The solution was to implement an internal version of the "date/time" class/type that could handle 24:00 without complaining. Your code will have to handle Japan's bizarre 25:00 rule for 1948-1951 anyway (Rule Japan 1948 1951 - Sep Sat>=8 25:00 0 S). Once it handles 25:00, it should be able to handle the 24:00.
I don't speak for the TZDB maintainers, but my impression is that these 24:00 correspond directly to how the regulations and laws are written. That probably makes it easier for human beings to maintain them.
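Brian's advice can be sketched in Python by resolving the ON field to a calendar day and then adding AT as a plain duration. With that approach, 24:00 and 25:00 roll into the next day without special cases. This sketch rests on our own assumptions. The function names are hypothetical, and negative AT values are not handled. The w/s/u/g/z suffix is parsed but ignored, so the result is in whichever time scale the suffix selects.

```python
import calendar
import re
from datetime import date, datetime, time, timedelta

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]  # same order as date.weekday()

def resolve_on(year: int, month: int, on: str) -> date:
    """Resolve a Rule ON field such as "15", "lastSun", "Sat>=8" or "Sun<=25"."""
    if on.isdigit():
        return date(year, month, int(on))
    if on.startswith("last"):
        weekday = DAYS.index(on[4:])
        d = date(year, month, calendar.monthrange(year, month)[1])
        return d - timedelta(days=(d.weekday() - weekday) % 7)
    m = re.fullmatch(r"(Mon|Tue|Wed|Thu|Fri|Sat|Sun)(>=|<=)(\d+)", on)
    if m is None:
        raise ValueError(f"unsupported ON field: {on!r}")
    weekday, op, n = DAYS.index(m.group(1)), m.group(2), int(m.group(3))
    d = date(year, month, n)
    if op == ">=":
        return d + timedelta(days=(weekday - d.weekday()) % 7)
    return d - timedelta(days=(d.weekday() - weekday) % 7)

def parse_at(at: str) -> timedelta:
    """Parse an AT field like "2:00", "24:00" or "25:00"; a w/s/u/g/z suffix is ignored."""
    m = re.fullmatch(r"(\d+)(?::(\d+))?(?::(\d+))?[wsugz]?", at)
    if m is None:
        raise ValueError(f"unsupported AT field: {at!r}")
    hours, minutes, seconds = (int(g or 0) for g in m.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)

def transition_local(year: int, month: int, on: str, at: str) -> datetime:
    """Start of the resolved day plus AT; 24:00 and 25:00 roll over naturally."""
    return datetime.combine(resolve_on(year, month, on), time.min) + parse_at(at)

print(transition_local(1948, 9, "Sat>=8", "25:00"))  # 1948-09-12 01:00:00
```

Here the 1948 Japan rule resolves to Saturday 11 September, and 25:00 lands at 01:00 on Sunday the 12th.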
On Tue, Apr 4, 2023, 10:53 Brian Park <brian@xparks.net> wrote:
The solution was to implement an internal version of the "date/time" class/type that could handle 24:00 without complaining. Your code will have to handle Japan's bizarre 25:00 rule for 1948-1951 anyway (Rule Japan 1948 1951 - Sep Sat>=8 25:00 0 S). Once it handles 25:00, it should be able to handle the 24:00.
That is definitely what I intend to do for where it's necessary, but the code path will almost certainly be slower, which is a significant factor to consider in my situation (it's a widely used library).
Ultimately it would be great if there were a generated file in a common format (such as JSON) so that everyone doesn't have to write their own parser. But that's a separate issue.
With regard to Japan, did the clock read 24:59:59 before going to 0:00? I'm not sure how else to interpret 25:00.
On Tue, Apr 4, 2023, at 12:24, Jacob Pratt wrote:
That is definitely what I intend to do for where it's necessary, but the code path will almost certainly be slower, which is a significant factor to consider in my situation (it's a widely used library).
I am curious to hear that your TZDB parsing is exposed to your end-users and is in the critical path. I would have thought that the parsing would be done offline (e.g. after each TZDB release), and the TZDB data would be converted into a different format that is more amenable to the computation that your code is performing.
With regards to the data format of the TZDB, I have not seen great problems with the parsing. It's the *interpretation* of that data which is incredibly difficult and tricky. I'm not sure that using a different format, like JSON, would help with the interpretation part.
With regards to 25:00, I believe that should be interpreted as: "This transition occurs exactly 25 hours after the beginning of the first Saturday on or after the 8th of September". The implicit 'w' for "wall time" is not a real physical clock on the wall, but a hypothetical clock keeping time in the local time zone just before the transition.
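Brian's reading of 25:00 can be checked with a little arithmetic. This Python sketch assumes the Asia/Tokyo offsets of the time: JST at UTC+9, with one hour of daylight saving in force before the September transition. The helper names are ours. The sketch converts the 25:00 wall-clock time to UT using the offset in force just before the transition, then shows what a local clock would display on either side of that instant.

```python
from datetime import datetime, timedelta, timezone

# Assumed offsets for Asia/Tokyo in 1948: JST is UTC+9; daylight time adds one hour.
STD = timedelta(hours=9)
SAVE_BEFORE = timedelta(hours=1)   # daylight saving in effect until the transition
SAVE_AFTER = timedelta(0)          # SAVE column of "Rule Japan 1948 1951 - Sep Sat>=8 25:00 0 S"

def wall_to_utc(wall: datetime, std: timedelta, save: timedelta) -> datetime:
    """Read a naive wall-clock time using the offset in force *before* the transition."""
    return (wall - std - save).replace(tzinfo=timezone.utc)

def local_view(instant: datetime, offset: timedelta) -> datetime:
    """What a local clock with the given UTC offset displays at `instant` (naive result)."""
    return (instant + offset).replace(tzinfo=None)

# "Sat>=8" in September 1948 is Saturday the 11th; 25:00 is 25 hours after it begins.
wall = datetime(1948, 9, 11) + timedelta(hours=25)
instant = wall_to_utc(wall, STD, SAVE_BEFORE)

last_dst_second = local_view(instant - timedelta(seconds=1), STD + SAVE_BEFORE)
first_std_moment = local_view(instant, STD + SAVE_AFTER)

print(instant)            # 1948-09-11 15:00:00+00:00
print(last_dst_second)    # 1948-09-12 00:59:59
print(first_std_moment)   # 1948-09-12 00:00:00
```

Under these assumptions, the clock never showed 24:59:59. It showed 00:59:59 on Sunday, 12 September, and was then set back to 00:00; the 25:00 is simply a count of hours from the start of Saturday.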
<<On Tue, 04 Apr 2023 12:45:54 -0700, Brian Park via tz <tz@iana.org> said:
I am curious to hear that your TZDB parsing is exposed to your end-users and is in the critical path. I would have thought that the parsing would be done offline (e.g. after each TZDB release), and the TZDB data would be converted into a different format that is more amenable to the computation that your code is performing.
Since the very beginning of this project, there has been a standard parsed data format (now called "TZif") and the `zic` utility has been shipped to do the parsing. Indeed, for a long time, the behavior of `zic` was the only specification of the human-readable data format.
Of course, this format was designed for easy access by the C standard library routines and not by JavaScript, but it would in theory not be difficult to modify the `zic` source code to generate a different output format that was more amenable to document-processing languages.
-GAWollman
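Garrett's point can be made concrete. The TZif files that `zic` writes begin with a fixed 44-byte header, whose layout is documented in tzfile(5) and RFC 8536. The Python sketch below reads that header. For version 2 and later files, it also skips the version-1 data block to reach the second header, which introduces the 64-bit data. The function and type names are ours, and the sketch does not decode the transitions themselves.

```python
import struct
from typing import NamedTuple

# Magic "TZif", 1 version byte, 15 unused bytes, then six big-endian 32-bit counts.
HEADER = struct.Struct(">4sc15x6L")

class TzifHeader(NamedTuple):
    version: bytes   # b"\x00" for version 1, otherwise an ASCII digit such as b"2" or b"3"
    isutcnt: int
    isstdcnt: int
    leapcnt: int
    timecnt: int
    typecnt: int
    charcnt: int

def parse_header(buf: bytes, offset: int = 0) -> TzifHeader:
    magic, version, *counts = HEADER.unpack_from(buf, offset)
    if magic != b"TZif":
        raise ValueError("not a TZif file")
    return TzifHeader(version, *counts)

def v1_data_size(h: TzifHeader) -> int:
    """Size of the version-1 data block, which uses 32-bit transition times."""
    return (h.timecnt * 4        # transition times
            + h.timecnt          # transition type indices
            + h.typecnt * 6      # local time type records
            + h.charcnt          # time zone designation strings
            + h.leapcnt * 8      # leap-second records
            + h.isstdcnt         # standard/wall indicators
            + h.isutcnt)         # UT/local indicators

def v2_header(buf: bytes) -> TzifHeader:
    """Skip the version-1 block and parse the second header (64-bit data follows it)."""
    first = parse_header(buf)
    if first.version == b"\x00":
        raise ValueError("version 1 file: no 64-bit data block")
    return parse_header(buf, HEADER.size + v1_data_size(first))
```

On many Unix-like systems, the compiled files live under /usr/share/zoneinfo. On such a system, `parse_header(pathlib.Path("/usr/share/zoneinfo/Asia/Tokyo").read_bytes())` should work, but that path is an assumption about the host.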
On Tue, Apr 4, 2023 at 3:57 PM Garrett Wollman via tz <tz@iana.org> wrote:
Since the very beginning of this project, there has been a standard parsed data format (now called "TZif") and the `zic` utility has been shipped to do the parsing. Indeed, for a long time, the behavior of `zic` was the only specification of the human-readable data format.
Indeed there was no standards-organization documentation in the early days; the only thing available was a manual entry (tzfile.5); that entry did not always capture the complete picture.
@dashdashado
On Tue, Apr 4, 2023 at 4:08 PM Arthur David Olson via tz <tz@iana.org> wrote:
I would have thought that the parsing would be done offline (e.g. after each TZDB release), and the TZDB data would be converted into a different format that is more amenable to the computation that your code is performing.
It will be! However, it is quite likely that some users will want to be able to use their system's copy of tzdb, as the language I'm using is statically linked.
there has been a standard parsed data format (now called "TZif")
I was not aware of this! Now that I search, I see that there is existing tooling from the Unicode Consortium to parse this format. Given that, it'll make things quite a bit easier on my end.
In any situation, having 24:00 where 0:00 is a possibility remains a bit odd, even if that is what the original sources say.
Jacob Pratt
On Tue 2023-04-04T17:00:45-0400 Jacob Pratt via tz hath writ:
In any situation, having 24:00 where 0:00 is a possibility remains a bit odd, even if that is what the original sources say.
If we could go back and tell folks that their description of time was suboptimal we would, but they would not listen, and we would still have to handle what they did.
--
Steve Allen                    <sla@ucolick.org>              WGS-84 (GPS)
UCO/Lick Observatory--ISB 260  Natural Sciences II, Room 165  Lat +36.99855
1156 High Street               Voice: +1 831 459 3046         Lng -122.06015
Santa Cruz, CA 95064           https://www.ucolick.org/~sla/  Hgt +250 m
On Tue, Apr 4, 2023 at 5:25 PM Steve Allen via tz <tz@iana.org> wrote:
If we could go back and tell folks that their description of time was suboptimal we would, but they would not listen, and we would still have to handle what they did.
But is there actually a difference between 24:00 and 0:00 of the next day? I wouldn't think so, hence my asking.
Jacob Pratt
On Tue, Apr 4, 2023 at 5:27 PM Jacob Pratt via tz <tz@iana.org> wrote:
But is there actually a difference between 24:00 and 0:00 of the next day? I wouldn't think so, hence my asking.
Sure, but how would you translate, say, "lastSat 24:00" to something using "0:00 of the next day"?
On Tue, Apr 4, 2023, 19:36 Bradley White <bww@acm.org> wrote:
Sure, but how would you translate, say, "lastSat 24:00" to something using "0:00 of the next day"?
That's why I've said "where possible" or similar in previous messages. lastX can't be replaced, but Sun>=8 24:00 could be replaced by Mon>=9 0:00.
Jacob Pratt
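Jacob's claim and Bradley's counterexample can both be checked by brute force. The Python sketch below uses helper names of our own. It resolves the rules for every month from 1900 through 2100 and confirms that "Sun>=8 24:00" and "Mon>=9 0:00" always name the same instant. It also shows that "lastSat 24:00" sometimes equals "lastSun 0:00" and sometimes spills into the first day of the following month, so no single 0:00 rewrite exists for it.

```python
import calendar
from datetime import date, datetime, time, timedelta

MON, SAT, SUN = 0, 5, 6   # date.weekday() numbering

def on_or_after(year: int, month: int, weekday: int, n: int) -> date:
    """Resolve "<weekday> >= n", e.g. Sun>=8."""
    d = date(year, month, n)
    return d + timedelta(days=(weekday - d.weekday()) % 7)

def last_weekday(year: int, month: int, weekday: int) -> date:
    """Resolve "last<weekday>", e.g. lastSat."""
    d = date(year, month, calendar.monthrange(year, month)[1])
    return d - timedelta(days=(d.weekday() - weekday) % 7)

def at(d: date, hours: int) -> datetime:
    """The instant `hours` after the start of day `d` (hours may be 24 or more)."""
    return datetime.combine(d, time.min) + timedelta(hours=hours)

months = [(y, m) for y in range(1900, 2101) for m in range(1, 13)]

# "Sun>=8 24:00" and "Mon>=9 0:00" agree in every month.
same = all(at(on_or_after(y, m, SUN, 8), 24) == at(on_or_after(y, m, MON, 9), 0)
           for y, m in months)

# "lastSat 24:00" is either "lastSun 0:00" or the 1st of the next month.
spills = [(y, m) for y, m in months if at(last_weekday(y, m, SAT), 24).month != m]
matches_last_sun = all(at(last_weekday(y, m, SAT), 24) == at(last_weekday(y, m, SUN), 0)
                       for y, m in months if (y, m) not in spills)

print(same, matches_last_sun, spills[0])  # True True (1900, 3)
```

The spills are exactly the months whose last day is a Saturday; March 1900, which ends on Saturday the 31st, is the first of them in this range.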
On 4/4/23 17:38:20, Jacob Pratt via tz wrote:
That's why I've said "where possible" or similar in previous messages. lastX can't be replaced, but Sun>=8 24:00 could be replaced by Mon>=9 0:00.
Yes, but "the second Sunday at 24:00" can't be replaced by "Monday at 00:00" because that may be either the second Monday or the third. The code to handle the hard case must exist; it's easy to branch around it when hour<24:00.
-- gil
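Gil's caveat concerns ordinal wording rather than the Mon>=9 form: the day after the second Sunday is not always the second Monday. A quick Python check over the months of 2000 through 2029 illustrates this; the helper names are ours.

```python
from datetime import date, timedelta

def second_sunday(year: int, month: int) -> date:
    """Resolve Sun>=8, i.e. the second Sunday of the month."""
    d = date(year, month, 8)
    return d + timedelta(days=(6 - d.weekday()) % 7)   # 6 == Sunday in date.weekday()

def weekday_ordinal(d: date) -> int:
    """1 if d is the first such weekday of its month, 2 if the second, and so on."""
    return (d.day - 1) // 7 + 1

ordinals = {weekday_ordinal(second_sunday(y, m) + timedelta(days=1))
            for y in range(2000, 2030) for m in range(1, 13)}
print(sorted(ordinals))  # [2, 3]
```

The third-Monday case arises when a month starts on a Monday. Then the second Sunday is the 14th, and the following day is the 15th. A day-of-month anchor such as Mon>=9 does not depend on that, whereas a "second Monday" wording would.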
On 4/4/23 02:49, Jacob Pratt via tz wrote:
I think it would be reasonable to change 24:00 to 0:00 where possible,
When entering that data I prefer 0:00 to 24:00 when the source is unclear, but when the source says something like "midnight at the end of the day" I use 24:00, as that is closer to the original source and makes things a bit easier to check later.
participants (8)
- Arthur David Olson
- Bradley White
- Brian Park
- Garrett Wollman
- Jacob Pratt
- Paul Eggert
- Paul Gilmartin
- Steve Allen