Fractional seconds in zic input
For many years I've chafed at tzdata's lack of support for fractional seconds. We have good evidence of well-established standard-time UT offsets that were not multiples of one second; for example, the Netherlands before 1937. These can't be recorded in tzdata except as comments.

Ideally tzcode would support fractional-second times and UT offsets all the way down the chain. This would mean changes to the tz binary format and to the runtime API, though, which is not something we'd do lightly, if ever. However, it's easy to change the zic spec to allow fractional seconds, and to change zic to accept and ignore the fractions, so that fractional seconds can be documented more formally in the data; this could well be useful to applications other than tzcode. Proposed patch attached, hairy sscanf format and all.

This patch does not actually change the data, as we'll need time, and/or a procedure to automatically generate data compatible with zic 2018c and earlier.
On Feb 4, 2018, at 11:37 AM, Paul Eggert <eggert@cs.ucla.edu> wrote:
For many years I've chafed at tzdata's lack of support for fractional seconds. We have good evidence of well-established standard-time UT offsets that were not multiples of one second; for example, the Netherlands before 1937. These can't be recorded in tzdata except as comments.
Ideally tzcode would support fractional-second times and UT offsets all the way down the chain. This would mean changes to the tz binary format and to the runtime API, though, which is not something we'd do lightly, if ever. However, it's easy to change the zic spec to allow fractional seconds, and to change zic to accept and ignore the fractions, so that fractional seconds can be documented more formally in the data; this could well be useful to applications other than tzcode. Proposed patch attached, hairy sscanf format and all.
This patch does not actually change the data, as we'll need time, and/or a procedure to automatically generate data compatible with zic 2018c and earlier. <0001-Add-fractional-seconds-to-data-format.patch>
If we are to add fractional second support, it should come with a specification of the precision, or precision-range that is supported (e.g. centiseconds, milliseconds, whatever). Without such documentation, 3rd party zic compilers will have to make an assumption based on the finest precision example in the data file, which may subsequently break when a finer-precision example is later introduced. To require compilers to support arbitrarily fine precision chosen at run time is neither practical nor terribly useful.

In choosing a finest supported precision, I would encourage the choice of something coarser than nanoseconds. Not only are there likely to not ever be any such real-world examples, but it is convenient to traffic in time points represented by signed 64 bit 2’s complement, which at nanosecond precision has a range of +/- 292 years (too small of a range imho). I think the finest practical precision would be microseconds (+/-292,000 years range), and even that is almost certainly overkill for any real-world example.

Also, do you anticipate a similar refinement of the “SAVE” quantity? Currently the finest precision given is minutes. I don’t currently recall if that precision is specified, or is simply a de-facto standard.

If such changes to the database are to be made, I would much prefer they be made asap. Syntax and semantics changes to the database are a much bigger deal to me than changes to zic. And I am in midstream of attempting to base an international C++ language standard on the syntax and semantics of this database. “Optional support” of things such as UTC offsets with centisecond precision are undesirable as it will lead to non-portable behavior for such things as equality of time points. The finest representable precision of a UTC offset should be a concrete, portable specification.

Howard
Howard Hinnant wrote:
If we are to add fractional second support, it should come with a specification of the precision, or precision-range that is supported (e.g. centiseconds, milliseconds, whatever).
zic currently supports only 1-second precision in its output. It accepts any amount of precision on input, limited only by size of memory, and ignores fractional seconds. Come to think of it, zic should round instead; I'll add that shortly. It's not clear to me that the zic man page should specify how much precision is intended to be significant.
Also, do you anticipate a similar refinement of the “SAVE” quantity?
Yes, I think the current zic man page says that both time-of-day and UT offset can have fractional seconds. I imagine both could be useful, for the same reason that to-the-second values were useful before now.
If such changes to the database are to be made, I would much prefer they be made asap.
We should have tzdb generate both bleeding-edge format (with fractional seconds and negative DST offsets) and trailing-edge format (without either feature, at least for now). So you'll be able to test with bleeding-edge if you like. I'll try to publish a patch to do that soon.
Paul Eggert <eggert@cs.ucla.edu> writes:
Howard Hinnant wrote:
If such changes to the database are to be made, I would much prefer they be made asap.
We should have tzdb generate both bleeding-edge format (with fractional seconds and negative DST offsets) and trailing-edge format (without either feature, at least for now). So you'll be able to test with bleeding-edge if you like. I'll try to publish a patch to do that soon.
I'm a bit astonished at the direction this discussion is taking. You're proposing to impose rather massive costs on all downstream consumers of tzdb --- maybe not immediately, but eventually --- in order to be able to represent local time in the Netherlands pre-1937 a fraction of a second more accurately?

Last I heard, pre-1970 timestamps weren't really even in scope for tzdb, so this seems like seriously poor judgment of what is worth spending people's time on. And it's not even your own time that you're proposing to expend.

The original idea of adding the fractional seconds to the source files, and then dropping them again on output, sounded about right from here.

regards, tom lane
Tom Lane wrote:
The original idea of adding the fractional seconds to the source files, and then dropping them again on output, sounded about right from here.
That's all that's been proposed for tzdb. I'm not proposing any change to the binary output files, just to the source files. I think you may have misunderstood comments about source files to be comments about binary files.
On Sun 2018-02-04T21:07:54-0800 Paul Eggert hath writ:
It's not clear to me that the zic man page should specify how much precision is intended to be significant.
At the 10th plenary assembly of the CCIR in 1963 Rec. 374 specified that radio broadcast time signals should be within 100 ms of the intended correct time.

At the 12th general assembly of the IAU in 1964 Commission 31 (Time) reported that radio broadcast time signals permitted worldwide synchronization to within 2 ms. But the tabulated differences of time signals published by the BIH show that not all broadcasts were achieving the 100 ms target.

At the 11th plenary assembly of the CCIR in 1966 Rec. 374-1 still specified 100 ms as the time offset.

At the 12th plenary assembly of the CCIR in 1970 Rec. 460 specified the use of leap seconds, without placing any limit on the offset.

It was not until the 13th plenary assembly of the CCIR in 1974 July that Rec. 460-1 specified that the signals should not deviate by more than 1 ms. The date for this to become effective was 1975-01-01.

So fractional second offsets in tz with a precision any smaller than 1 ms are already beyond the level of technical conformance which was being achieved by the broadcasts that served as the basis of legal time for various jurisdictions at 1970-01-01.

--
Steve Allen <sla@ucolick.org>                  WGS-84 (GPS)
UCO/Lick Observatory--ISB 260  Natural Sciences II, Room 165  Lat +36.99855
1156 High Street               Voice: +1 831 459 3046         Lng -122.06015
Santa Cruz, CA 95064           http://www.ucolick.org/~sla/   Hgt +250 m
On 2018-02-04 22:59, Steve Allen wrote:
On Sun 2018-02-04T21:07:54-0800 Paul Eggert hath writ:
It's not clear to me that the zic man page should specify how much precision is intended to be significant.
At the 10th plenary assembly of the CCIR in 1963 Rec. 374 specified that radio broadcast time signals should be within 100 ms of the intended correct time.
At the 12th general assembly of the IAU in 1964 Commission 31 (Time) reported that radio broadcast time signals permitted worldwide synchronization to within 2 ms. But the tabulated differences of time signals published by the BIH show that not all broadcasts were achieving the 100 ms target.
At the 11th plenary assembly of the CCIR in 1966 Rec. 374-1 still specified 100 ms as the time offset.
At the 12th plenary assembly of the CCIR in 1970 Rec. 460 specified the use of leap seconds, without placing any limit on the offset.
It was not until the 13th plenary assembly of the CCIR in 1974 July that Rec. 460-1 specified that the signals should not deviate by more than 1 ms. The date for this to become effective was 1975-01-01.
So fractional second offsets in tz with a precision any smaller than 1 ms are already beyond the level of technical conformance which was being achieved by the broadcasts that served as the basis of legal time for various jurisdictions at 1970-01-01.
Have the ITU not dropped those requirements? Digital production and distribution systems introduce delays of many milliseconds to seconds in broadcast time signals, even on non-FM BBC stations, and many stations are off UTC by seconds relative to higher-precision sources like NTP and GPS. Only the few dedicated national standard time broadcasts left seem to be within those limits, after correcting for distance and path delays.

--
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada
On Mon 2018-02-05T15:32:06-0700 Brian Inglis hath writ:
Have the ITU not dropped those requirements?
The current rec is https://www.itu.int/rec/R-REC-TF.460-6-200202-I/en It still demands 1 ms.
Digital production and distribution systems introduce many ms to s delays in broadcast time signals, even non-FM BBC stations, and many stations are off UTC by seconds from higher precision sources like NTP and GPS.
TF.460 is not about generic broadcasts of content; it is about broadcasts whose specific purpose is to be precision time signals. But most devices which provide access to precise values of UTC obtain that using newer technologies, not from these radio broadcasts.

--
Steve Allen <sla@ucolick.org>
Howard Hinnant wrote:
If we are to add fractional second support, it should come with a specification of the precision, or precision-range that is supported (e.g. centiseconds, milliseconds, whatever). Without such documentation, 3rd party zic compilers will have to make an assumption
No, there is no such necessity. Anything that parses the zic input format will need to round values to the resolution of its output format, which for mainline zic is the 1 s resolution of the tzfile format. It is not a problem for such a parser that the string input format can extend to finer resolution.

Your comments would make a lot more sense if we were discussing a change to the tzfile format to accommodate fractional seconds. (We probably will have such a change in due course.) To be consistent with the existing nature of the file format, we'd most likely have a defined resolution for subsecond times.
And I am in midstream of attempting to base an international C++ language standard on the syntax and semantics of this database.
The obvious choice there would be something like POSIX's struct timespec: fixed resolution defined as part of the API. It doesn't make sense to try to squeeze the subsecond part with the integer part into 64 bits: we're already settled on 64 bits for the integer part alone. Follow the consensus of using a separate fractional part, with a decimal-based resolution, at least as fine as existing APIs (1 ns).

Your API design doesn't have to precisely match the resolution of tzfile (of any version), let alone anything further upstream. It would be good for it to be at least as fine as whatever we eventually put into tzfile, but even that isn't vital.

-zefram
On Feb 4, 2018, at 7:21 PM, Howard Hinnant <howard.hinnant@gmail.com> wrote:
On Feb 4, 2018, at 11:37 AM, Paul Eggert <eggert@cs.ucla.edu> wrote:
For many years I've chafed at tzdata's lack of support for fractional seconds. We have good evidence of well-established standard-time UT offsets that were not multiples of one second; for example, the Netherlands before 1937. These can't be recorded in tzdata except as comments.
Ideally tzcode would support fractional-second times and UT offsets all the way down the chain. This would mean changes to the tz binary format and to the runtime API, though, which is not something we'd do lightly, if ever. However, it's easy to change the zic spec to allow fractional seconds, and to change zic to accept and ignore the fractions, so that fractional seconds can be documented more formally in the data; this could well be useful to applications other than tzcode. Proposed patch attached, hairy sscanf format and all.
This patch does not actually change the data, as we'll need time, and/or a procedure to automatically generate data compatible with zic 2018c and earlier. <0001-Add-fractional-seconds-to-data-format.patch>
If we are to add fractional second support, it should come with a specification of the precision, or precision-range that is supported (e.g. centiseconds, milliseconds, whatever). Without such documentation, 3rd party zic compilers will have to make an assumption based on the finest precision example in the data file, which may subsequently break when a finer-precision example is later introduced. To require compilers to support arbitrarily fine precision chosen at run time is neither practical nor terribly useful.
In choosing a finest supported precision, I would encourage the choice of something coarser than nanoseconds. Not only are there likely to not ever be any such real-world examples, but it is convenient to traffic in time points represented by signed 64 bit 2’s complement, which at nanosecond precision has a range of +/- 292 years (too small of a range imho). I think the finest practical precision would be microseconds (+/-292,000 years range), and even that is almost certainly overkill for any real-world example.
Also, do you anticipate a similar refinement of the “SAVE” quantity? Currently the finest precision given is minutes. I don’t currently recall if that precision is specified, or is simply a de-facto standard.
If such changes to the database are to be made, I would much prefer they be made asap. Syntax and semantics changes to the database are a much bigger deal to me than changes to zic. And I am in midstream of attempting to base an international C++ language standard on the syntax and semantics of this database. “Optional support” of things such as UTC offsets with centisecond precision are undesirable as it will lead to non-portable behavior for such things as equality of time points. The finest representable precision of a UTC offset should be a concrete, portable specification.
I should’ve added two things:

1. Remember when I gave an early warning that negative SAVEs would break things? (http://mm.icann.org/pipermail/tz/2017-December/025694.html). This change to the database will make that change look like a walk in the park.

2. Doing this without specifying a maximum precision will mean the substantial breakage I speak of in 1) will happen every time the precision is increased.

Howard
On 02/05/2018 04:55 AM, Howard Hinnant wrote:
2. Doing this without specifying a maximum precision will mean the substantial breakage I speak of in 1) will happen every time the precision is increased.

What sort of breakage do you see? Is the problem that different downstream users will compare calculations and disagree about the exact results because they use differing precisions? But we already have that problem, as at least one downstream user already discards sub-minute information, namely Kerry Shetline's recently-discussed tzdata compressor.
I think it is a very serious workflow problem that these potentially destabilizing changes are committed to the tz source tree prior to any discussion. Indeed, in this case, they were committed on Wed Jan 31 23:13:35 2018 -0800 and then discussion was initiated at Sun, 4 Feb 2018 08:37:52 -0800.

And so now the discussion is all about making a case to remove something that's already been committed, rather than a discussion about whether it's appropriate. I don't think that's a good way for the burden to shift. The burden should be on the proposer of destabilizing changes to justify them; not the other way 'round.

While I don't think that most forward-looking changes need to have discussion and consensus prior to being committed, I tend to think that other kinds of changes probably should. I realize this is more work for Paul -- a workflow where he can't casually commit to the master branch means keeping track of more stuff. And I'm not trying to suggest there should be a github-oriented "pull request" workflow where each of these changes gets committed on its own branch and only merged to the mainline with consensus or after review. But a lot of projects adopt that kind of workflow (including many that are less likely to cause stability issues throughout the operating system ecosystem) and it is found to be workable.

Personally, I'm on the fence as to whether fractional seconds are destabilizing enough to be excluded. I tend to think they are not going to help much, for very little gain. Also, not all downstream consumers are equal.

--jhawk@mit.edu
John Hawkinson
Quoting John Hawkinson on Monday February 05, 2018:
I think it is a very serious workflow problem that these potentially destabilizing changes are committed to the tz source tree prior to any discussion.
What is the expectation of the master branch of Paul's repository? Paul has repeatedly referred to it as his experimental repository. If there is a mismatch of expectations and folks are relying upon it for more than that, I think that needs to be clarified. kim
Kim Davies <kim.davies@iana.org> wrote on Mon, 5 Feb 2018 at 09:26:24 -0800 in <20180205172622.GA57570@KIDA-6861.local>:
What is the expectation of the master branch of Paul's repository? Paul has repeatedly referred to it as his experimental repository.
IMO, this has always been misleading. Regardless of whether it was labelled "experimental" or "mainline" or "development," the fact is that it is the source from which releases are cut. So once something is added to it, the debate becomes about removal. And sometimes, I allege, the debate should be about whether or not that something should be added.

I think Paul has bowed to this reality, since he wrote on Fri, 19 Jan 2018 at 09:07:35 -0800 in <94408a93-9463-d05c-0b06-36677f666cd3@cs.ucla.edu>:

| Yes, that's always been the intent of the GitHub version. Its master branch
| is supposed to be always suitable for external testing. To help emphasize
| that, I'm no longer calling it "the experimental version" and am now calling
| it "the development version".
If there is a mismatch of expectations and folks are relying upon it for more than that, I think that needs to be clarified.
Paul is clearly relying on it for more than that -- it's what he cuts releases from. --jhawk@mit.edu John Hawkinson
On 02/05/2018 09:34 AM, John Hawkinson wrote:
Regardless of whether it was labelled "experimental" or "mainline" or "development," the fact is that it is the source from which releases are cut.
Yes, as tz-link.html says, the GitHub repository is a development repository. Its latest version is always intended to be suitable for release on pretty much a moment's notice (since governments often don't give us much notice); this was true even when the GitHub repository was called "experimental". None of the recently-installed changes were intended to depart from this procedure.

The only changes since 2018c that affect the data are relatively routine: they are a 1-day correction to Kiribati's change-of-day in 1994/5, and a 1-second correction to Jamaica etc. before 1913. In the code there is also a porting fix for macOS awk that to my mind is the most-important change that would prompt a new release.

Although other recently-installed changes are significant, they are important only for future releases, not for the current development code or data as used by current downstream packages. Although we could change tzdata in the future, these are changes that haven't been installed into the development version, or even proposed in detail as patches. If and when they are made, they should be accompanied by a procedure that continues to generate data in the current format, a draft of which is in the repository now.
On 02/05/2018 09:18 AM, John Hawkinson wrote:
I think it is a very serious workflow problem that these potentially destabilizing changes are committed to the tz source tree prior to any discussion.
I don't see anything potentially destabilizing about these particular changes. They make zic more-generous about the data it will accept, and they do not change data used by current downstream users. The potentially-destabilizing thing here is the possibility that new data will contain fractional seconds, with no alternative available that uses the old format. That would indeed be potentially-destabilizing, but it's not what's been installed and it's definitely something that should be discussed before it happens.
they were committed on Wed Jan 31 23:13:35 2018 -0800 and then discussion was initiated at Sun, 4 Feb 2018 08:37:52 -0800.
The changes were not published on GitHub until the weekend (the exact timestamp of the push isn't maintained on GitHub as far as I know). The Wednesday timestamp is that of a patch I prepared on Wednesday but did not push to GitHub until this weekend. (At times I pause work on tzdb and work on something else, and one of those times was Wednesday through the weekend.) I initiated discussion on the topic at the moment I noticed the changes hitting GitHub.
Paul Eggert <eggert@cs.ucla.edu> wrote on Mon, 5 Feb 2018 at 09:51:01 -0800 in <b4466354-1e55-f0ce-c318-da3be94f2a34@cs.ucla.edu>:
The changes were not published on GitHub until the weekend (the exact timestamp of the push isn't maintained on GitHub as far as I know).
Sorry, yes. They were pushed to github sometime before Sat, 03 Feb 2018 21:00:04 -0000 (which is when I got notice, via RSS at 20 min. granularity). Err, that's Sat Feb 3 13:00:04 PST 2018 for comparison with the thread discussion time: Sun, 4 Feb 2018 08:37:52 -0800.
The Wednesday timestamp is that of a patch I prepared on Wednesday but did not push to GitHub until this weekend. (At times I pause work on tzdb and work on something else and one of those times was Wednesday through the weekend.) I initiated discussion on the topic at the moment I noticed the changes hitting GitHub.
A case could be made that both of these are things that matter. That is, one of the strongest reasons to favor discussion-before-committing is to avoid having the community have to argue to remove something that you (Paul) have worked hard on, and therefore is a sunk cost for you. It is unpleasant to argue to someone that their past work is wasted, and especially hard to argue when you put in so much time dedicated to the project. So from that perspective, the amount of time from when the changes were implemented to when discussion started (e.g. time to bake in your head and grow comfortable with them) is pertinent.

But on the other hand, sure, we all work on stuff and refactor it (especially in the world of git and other DVCSs) and the timestamps can reflect the time work was started (perhaps abortively) rather than completed, and so can be somewhat meaningless, so maybe we should just look at when they are published to the world for discussion.

In any case, my chief concern isn't the amount of time. It's the burden of persuasion or the presumption of release. I'd say that any tzcode change is a potentially destabilizing change, that future changes to tzdata are generally not potentially destabilizing, and that past changes to tzdata are potentially destabilizing. I'm not sure that's the right framework though, but it is what comes to mind.

--jhawk@mit.edu
John Hawkinson
On Feb 5, 2018, at 12:02 PM, Paul Eggert <eggert@cs.ucla.edu> wrote:
On 02/05/2018 04:55 AM, Howard Hinnant wrote:
2. Doing this without specifying a maximum precision will mean the substantial breakage I speak of in 1) will happen every time the precision is increased.

What sort of breakage do you see? Is the problem that different downstream users will compare calculations and disagree about the exact results because they use differing precisions? But we already have that problem, as at least one downstream user already discards sub-minute information, namely Kerry Shetline's recently-discussed tzdata compressor.
A tzdb compiler will either exactly represent all of the data contained in the tzdb or it won’t. Let’s assume for the moment that the tzdb compiler desires to exactly represent all of the data contained in the tzdb. To do that, it will have to exactly represent the UTC offsets to whatever precision is in the database. To do so in a practical way, the compiler is likely to choose a precision that is at least as fine as the finest-precision UTC offset (today that is seconds precision). The compiler _could_ choose to represent precisions finer than the current finest-precision offset, but such a choice is not free: it costs range. So there is pressure to design the compiler to not represent uselessly fine precisions.

Given that there is upward pressure on the finest precision that the compiler can handle, one must assume that at least some compilers, if not all of them, will design themselves around whatever is the current finest precision in the database (today, seconds). To modify a compiler to handle a precision finer than it is currently designed for is a moderate-sized rewrite, likely to break API and ABI in its interface if said compiler is in library form (as is mine and others).

So you’re going to break me (and most others) in the move from seconds to centiseconds. If a year from now you again move from centiseconds to milliseconds, you’re going to break me just as badly as the seconds-to-centiseconds move. If you keep breaking me, I’m eventually going to give up on you being a reliable source of data because I won’t be able to afford the maintenance. It won’t be that I won’t be able to keep up with the work; it will be that my customers won’t put up with my passing along your breakage to them in the form of API/ABI changes.

Seconds to centiseconds (or whatever) is going to be a huge amount of breakage for a very limited amount of benefit. It would be a mistake to do it once. It would be a colossal mistake to _plan_ on doing it multiple times.
But let’s take the second choice now: the tzdb compiler may or may not exactly represent all of the data contained in the tzdb.

Now if two computers are given the same UTC time point, say to microseconds precision, and both computers map that time point to the same local time using the same time zone specification from the tzdb, they are no longer guaranteed to have equal local times when they communicate with each other (comparing their computed local times) over http. This is a broken invariant that will inevitably lead to run-time errors.

So either tzdb compilers must universally exactly represent all data in the tzdb, or tzdb compilers must universally agree on the subset of data to extract from the tzdb so that they all have the same mapping (identical mapping was also essentially the motivation for the relatively recently introduced machine-readable versioning). In the latter case, the portion of the data in the tzdb that is universally ignored by all tzdb compilers has zero benefit, and a non-zero cost because of the programming effort to ignore it and the risk of accidentally not ignoring it. Zero/non-zero is a horrible benefit/cost ratio.

Howard
On 02/05/2018 09:27 AM, Howard Hinnant wrote:
Let’s assume for the moment that the tzdb compiler desires to exactly represent all of the data contained in the tzdb.

Thanks, I hadn't considered the possibility that a compiler would be trying to track tzdata itself.
In that case, how about if we follow POSIX's lead and specify nanosecond resolution as the highest the format supports? Although that's likely overkill, it does match a widely used standard; and better overkill than underkill.
On Feb 5, 2018, at 1:01 PM, Paul Eggert <eggert@cs.ucla.edu> wrote:
In that case, how about if we follow POSIX's lead and specify nanosecond resolution as the highest the format supports? Although that's likely overkill, it does match a widely used standard; and better overkill than underkill.
On Feb 4, 2018, at 7:21 PM, Howard Hinnant <howard.hinnant@gmail.com> wrote:
In choosing a finest supported precision, I would encourage the choice of something coarser than nanoseconds. Not only are there likely to not ever be any such real-world examples, but it is convenient to traffic in time points represented by signed 64 bit 2’s complement, which at nanosecond precision has a range of +/- 292 years (too small of a range imho). I think the finest practical precision would be microseconds (+/-292,000 years range), and even that is almost certainly overkill for any real-world example.
Howard
On Feb 5, 2018, at 1:21 PM, Howard Hinnant <howard.hinnant@gmail.com> wrote:
On Feb 5, 2018, at 1:01 PM, Paul Eggert <eggert@cs.ucla.edu> wrote:
In that case, how about if we follow POSIX's lead and specify nanosecond resolution as the highest the format supports? Although that's likely overkill, it does match a widely used standard; and better overkill than underkill.
On Feb 4, 2018, at 7:21 PM, Howard Hinnant <howard.hinnant@gmail.com> wrote:
In choosing a finest supported precision, I would encourage the choice of something coarser than nanoseconds. Not only are there likely to not ever be any such real-world examples, but it is convenient to traffic in time points represented by signed 64 bit 2’s complement, which at nanosecond precision has a range of +/- 292 years (too small of a range imho). I think the finest practical precision would be microseconds (+/-292,000 years range), and even that is almost certainly overkill for any real-world example.
<sigh> I’m going to have to stop responding without completing my thought, sorry.

I should clarify that the precision of the UTC<->local_time mapping places no theoretical limit on the finer side of the precision of the tools/clients that use said mapping. For example, my library’s clients can traffic in nanosecond-precision time stamps and still make good use of today’s seconds-precision IANA UTC<->local_time mapping. The precision of the IANA UTC<->local_time mapping creates a limit on the _coarseness_ of the client’s time stamp (assuming they want exact mappings), but it _does_not_ place a limit on the _fineness_ (I think I’m inventing words) of the client’s time stamp. For example, my clients cannot today traffic in time stamps coarser than a second if they want to exactly represent time zone mappings, but they can traffic in time stamps finer than a second.

If the IANA mapping claims precision down to a nanosecond, then all downstream clients no longer have the option of trafficking in anything less precise than nanosecond precision. The more I think about this direction, the worse it gets.

Howard
On 02/05/2018 10:21 AM, Howard Hinnant wrote:
In choosing a finest supported precision, I would encourage the choice of something coarser than nanoseconds.

Suppose an old UT offset uses sexagesimal notation, or something derived from it? In that case, the exact offset might not be representable as a decimal number, and nanosecond resolution will provide a comfortable excess of precision. Sexagesimal is not entirely hypothetical, as we have good evidence that civil time in Vietnam from 1906 to 1911 was 104° 17′ 17″ east of Paris.
I guess I'm not seeing the harm to go with nanoseconds in the data format; if a downstream user wants less precision they can easily round. And following Steve Allen's lead, we can mention in the documentation that there's no practical use of sub-millisecond precision in these old timestamps.
On Feb 5, 2018, at 1:38 PM, Paul Eggert <eggert@cs.ucla.edu> wrote:
I guess I'm not seeing the harm to go with nanoseconds in the data format; if a downstream user wants less precision they can easily round.
If two clients (different platforms) want to maintain the invariant that equal time_points remain equal after mapping, then they must operate at the precision of the mapping (or finer). I cannot overstate the importance of maintaining this invariant, not just for a single application, but for disparate applications built in different programming languages, using different tzdb compilers, and running on different computers. A downstream user can not choose less precision than the IANA mapping and maintain this invariant.

Howard
On 02/05/2018 10:46 AM, Howard Hinnant wrote:
If two clients (different platforms) want to maintain the invariant that equal time_points remain equal after mapping, then they must operate at the precision of the mapping (or finer).

We already have clients that don't want to do that, as they discard sub-minute resolution. But I take your point that some clients may want to do that and we should cater to this subclass of clients too. In that case, how about if we stick to at most 1-ms resolution in the data, and note in zic.8 that 1 ms resolution is the way to go? I say "1 ms" because of Steve Allen's email.
On Feb 5, 2018, at 1:50 PM, Paul Eggert <eggert@cs.ucla.edu> wrote:
In that case, how about if we stick to at most 1-ms resolution in the data, and note in zic.8 that 1 ms resolution is the way to go? I say "1 ms" because of Steve Allen's email.
I can live with 1ms resolution. However I do want to be clear: we’re now no longer talking about a catastrophic mistake, but simply a mistake. Your downstream clients will holler much more loudly than they did with the negative SAVE issue. And the benefit is simply that we can model centisecond precision for time stamps that are so old that they could have only been measured with quartz technology at best.

Having said that, I’ll shut up now, and thank you for the 1ms limit. :-)

Howard
Maybe I'm missing something, but are we talking about fractional seconds in *offsets* or fractional seconds for the time of the change?

For offsets, why would we care whether it can represent +/- 292,000 years, since it's fantastically unlikely that a time zone offset would ever be outside of +/- 24 hours? While both outcomes are very unlikely, I think an offset best represented in nanoseconds is much more likely than an offset of +/- 292 years...
On Feb 5, 2018, at 2:13 PM, Paul G <paul@ganssle.io> wrote:
Maybe I'm missing something, but are we talking about fractional seconds in *offsets* or fractional seconds for the time of the change?
For offsets, why would we care whether it can represent +/- 292,000 years, since it's fantastically unlikely that a time zone offset would even be outside of +/- 24 hours. While both outcomes are very unlikely, I think an offset best represented in nanoseconds is much more likely than an offset +/- 292 years...
Let’s say, just for example, that we have a UTC offset of 1ns for Zone X. Let’s further assume that I want to map arbitrary time points between UTC and X, exactly.

Well, in order to be sure that I can map UTC to X and back to UTC again, with no loss of information, time points in both UTC and in X must have nanosecond (or finer) precision. (disclaimer: I’m using the term “finer” here in a very coarse manner. :-) The actual requirement is that the precision of the UTC and X time points must be able to exactly represent nanosecond precision, in the same way you can exactly represent minutes precision with a type holding milliseconds precision — but not vice-versa.)

If I can only represent (for example) microsecond precision in UTC and X, then when I map a time point from UTC to X (or vice-versa), the 1ns offset will be lost when I add it to a count of microseconds and truncate the result to microseconds. Subsequently my X time point will not be an accurate representation of the specified mapping for the X time zone. For example if I subtract UTC from local time I should get the offset, but in this example I would get 0.

Howard
Yes, but you are always necessarily truncating time to the precision of your representation, because time has significantly greater than nanosecond precision. Your compiler can always truncate the offsets if you want to represent "point in time" as an offset from some fixed point (e.g. the unix epoch) and you want offsets *within that representation* to have the same precision as the offset you're using to represent the point in time. Given that the realistic range of offsets is on the order of +/- 1 day, I don't think we should limit their precision to the same precision as the timestamps they may be used with.

If, as seems to be increasingly the case, we're concerned with future-proofing tzdb, it would make sense to support a very high precision like nanoseconds, or go with a precision specification scheme that is effectively unlimited. Compilers are free to truncate to whatever level of precision they want in their output data.

Best,
Paul
Allow me to try again:

I want no truncation whatsoever. I want to do exact time arithmetic. If I have an offset of 1ns, and I add that to a time point of 1us UTC, the result is 1001ns in time zone X. To be able to accurately represent the time point in Zone X, I have to be able to exactly represent 1001ns.

Howard
I want no truncation whatsoever. I want to do exact time arithmetic.
Then why are you advocating for a 1ms precision? If you don't want any truncation, then you should be arguing for unlimited precision representations. Anything else will necessarily be a truncation.
If I have an offset of 1ns, and I add that to a time point of 1us UTC, the result is 1001ns in time zone X. To be able to accurately represent the time point in Zone X I have to be able to exactly represent 1001ns.
True. This project does not decide what the time zones will be, though. You will have this problem if and only if some zone decides on an offset with nanosecond precision, and if that happens, tzdb will either have to truncate the real data to fit this arbitrary cutoff, or a second change to the precision supported will need to happen. Of course it's unlikely that any zone will actually implement an offset with sub-millisecond precision, but I'm not buying arbitrarily limiting it to milliseconds on the *input* to the compiler on that basis.
On Feb 5, 2018, at 7:31 PM, Paul G <paul@ganssle.io> wrote:
I want no truncation whatsoever. I want to do exact time arithmetic.
Then why are you advocating for a 1ms precision? If you don't want any truncation, then you should be arguing for unlimited precision representations. Anything else will necessarily be a truncation.
We’re having a philosophical argument. We both want the “truth”, but the “truth” is also elusive. For example, if the two of us agreed that nanosecond precision of an offset is what was agreed upon in 1937 for some time zone, what is to prevent someone later from coming along and saying, no, actually we need picosecond resolution? Or femtosecond resolution?! Ultimately we could argue ourselves down to Planck-time resolution. This would obviously be ridiculous. And if we accept that it is ridiculous, then somewhere between Planck-time resolution and gigasecond resolution is the optimum answer. Finer is not always better and coarser is not always better. There exists an optimum between these two ridiculous extremes.

If you’re going to argue a specific resolution (e.g. nanosecond), I would like to base that on something better than “finer is better”, because I can go finer than nanoseconds, no problem. And modern CPUs have a clock tick at sub-nanosecond levels, so there’s a reasonable argument to go there.

Couple that with: finer precision implies shorter range for a given number of bits. And we have an engineering tradeoff of precision vs range. We can have the ultimate precision or the ultimate range, but not both. We need to factor in engineering judgement on the best tradeoff of precision vs range for a given sizeof(representation).
If I have an offset of 1ns, and I add that to a time point of 1us UTC, the result is 1001ns in time zone X. To be able to accurately represent the time point in Zone X I have to be able to exactly represent 1001ns.
True. This project does not decide what the time zones will be, though. You will have this problem if and only if some zone decides on an offset with nanosecond precision, and if that happens, tzdb will either have to truncate the real data to fit this arbitrary cutoff, or a second change to the precision supported will need to happen.
Of course it's unlikely that any zone will actually implement an offset with sub-millisecond precision, but I'm not buying arbitrarily limiting it to milliseconds on the *input* to the compiler on that basis.
I have an engineering background and can not help but view things through a benefit/cost ratio analysis. I am 100% against prioritizing one dimension (e.g. precision) while ignoring other dimensions (e.g. sizeof, range, real-world application, backwards compatibility, etc.). To prioritize precision above all else means that we represent the offset, time points, and time durations with a “BigNum type” that allocates memory on the heap to represent arbitrary precision and range. That (imho) is not on the table.

Howard
My suggestion was that the input to the compiler (which is not strongly typed, so has no `sizeof`) should either have infinite precision or a clear upgrade path (e.g. a precision specification). So far Paul's patch has no effect on the *output* of the zic compiler, and I think it's at least reasonable for the tzdb (the *input* to zic) to be able to support arbitrarily fine inputs, considering that it is the ultimate "source of truth", even without any of the other engineering concerns.

With regards to the other engineering concerns, that was what I was trying to appeal to when I said that nanoseconds are a more reasonable choice for precision *if we're arbitrarily limiting this anyway*. By selecting milliseconds or even microseconds, you're sacrificing precision in exchange for completely unnecessary range. Time zone offsets outside of +/- 1 day are dubious (and unsupported in many environments), +/- 1 week is very unlikely, and +/- 1 year is absurd. While both are pretty unlikely, I think nanosecond-precision offsets are much more likely than >219 year timezone offsets, so assuming that you want to truncate the inputs *at all*, it would be preferable to use nanosecond precision than millisecond. Honestly, I'm fine with (assuming infinite precision isn't supported) any resolution such that the range is ~1 week.
On Feb 5, 2018, at 9:02 PM, Paul G <paul@ganssle.io> wrote:
My suggestion was that the input to the compiler (which is not strongly typed, so has no `sizeof`) should either have infinite precision or a clear upgrade path (e.g. a precision specification). So far Paul's patch has no effect on the *output* of the zic compiler, and I think it's at least reasonable for the tzdb (the *input* to zic) to be able to support arbitrarily fine inputs, considering that is the ultimate "source of truth", even without any of the other engineering concerns.
Ah, I see our disconnect here. I have no concern about the zic compiler. The zic compiler is not in my workflow. The *input* to the zic compiler is the *only* thing that concerns me. I (and several others) have our own “zic compilers” which take this input, process it, and deliver it to our customers. For us, the product of the IANA repository is only the tz database, and not the tz code. Furthermore this “product”, the tz database, will be consumed by not just one alternative “zic compiler”, but many, and in many different languages on many different platforms.
With regards to the other engineering concerns, that was what I was trying to appeal to when I said that nanoseconds are a more reasonable choice for precision *if we're arbitrarily limiting this anyway*. By selecting milliseconds or even microseconds, you're sacrificing precision in exchange for completely unnecessary range. Time zone offsets outside of +/- 1 day are dubious (and unsupported in many environments), +/- 1 week are very unlikely, and +/- 1 year is absurd. While both are pretty unlikely, I think nanosecond precision offsets are much more likely than >219 year timezone offsets, so assuming that you want to truncate the inputs *at all*, it would be preferable to use nanosecond precision than millisecond. Honestly, I'm fine with (assuming infinite precision isn't supported) any resolution such that the range is ~1 week.
As I’ve repeatedly expressed, a precision of offset demands a precision in time point. And it is the range of the time point, not the offset, that concerns me.

utc_offset == local_time - utc_time

This is an algebraic equation that _must_ be true. If utc_offset has precision nanoseconds, then either local_time or utc_time must have precision of nanoseconds or finer to make this equation true. This is a mathematical reality. If utc_offset == 1ns, and neither local_time nor utc_time have the ability to represent nanosecond precision, how can the above equation possibly work, aside from coincidental examples where the number of nanoseconds in the utc_offset is an integral multiple of the precision of local_time or utc_time (e.g. a billion nanoseconds if both local_time and utc_time are seconds precision)?

Howard
On 2/5/18 22:31, Howard Hinnant wrote:
Ah, I see our disconnect here. I have no concern about the zic compiler. The zic compiler is not in my workflow. The *input* to the zic compiler is the *only* thing that concerns me.

Again - writing a filter so that something that looks like the current tzdb can be generated from the real tzdb - cannot be too difficult. Could even generalize it to round:

./tzfilter --precision=millis
On Mon, Feb 5, 2018 at 10:31 PM, Howard Hinnant <howard.hinnant@gmail.com> wrote:
utc_offset == local_time - utc_time
This is an algebraic equation that _must_ be true.
That presumes that "utc_time", "local_time", and "utc_offset" are useful concepts. Jettison them and the problem goes away.
In the "Fractional seconds in zic input" thread, Howard Hinnant wrote:
Ah, I see our disconnect here. I have no concern about the zic compiler. The zic compiler is not in my workflow. The *input* to the zic compiler is the *only* thing that concerns me. I (and several others) have our own "zic compilers" which take this input, process it, and deliver it to our customers.
This strikes me as a very significant statement. If this is already a well-settled issue, pardon me for reraising it, but: is it agreed that the zic *input* files are as much of a product of this project as are the compiled output files and the reference code that reads and interprets them?

In the beginning I would have thought that the project's product was the database *and* the reference code, such that the project could change the database format as much as it wanted to as long as it released updated code to match. Soon enough, of course, there were other consumers of the database files (one in fact written by the project's current maintainer), such that more caution was needed when making changes to the zonefile format, changes which now necessitate corresponding changes by the authors of all other consumers. But I would have thought that, working our way upstream, the project could still change the zic input format as much as it wanted to as long as it updated zic to match. But now Howard is saying:
For us, the product of the IANA repository is only the tz database, and not the tz code.
And he is using the words "the tz database" to refer, not to the zic output files, but to the *input* files!

I'm not trying to blame Howard, and he's certainly not alone. There was of course that big long thread last month about the issues with OpenJDK/CLDR/ICU/Joda and the Ireland change, and just today we heard about Kerry Shetline's new compiler.

My point is simply that it's pretty hugely significant whether the zic input files are an officially-supported product or not. If they are, the project has to be much more conservative about making these kinds of changes, probably more conservative than it otherwise wants to be. But if they're not an officially-supported product, then people have to be discouraged from writing their own compilers; or, if for whatever reason they truly need their own compilers and their own compiled data formats different from zic's, it seems to me we're going to inexorably wind up feeling the need to fork the project, a possibility which Stephen Colebourne has already raised.

But, again, apologies if I'm being overly melodramatic, or raising an issue that's already well understood.

(But in any case, we may need some clearer terminology. In particular: do those words "the tz database" properly refer to the zic output files, or the input files, or both?)
From my perspective as a prospective consumer, the input files to zic are the part that I can use in the system into which they might be integrated.
There are multiple reasons why the library that uses the zic-based output is not usable by the project (many of the reasons are peculiar to that software). I'm not entirely happy with the output format from zic, but that's somewhat separate — that could be lived with. But my project would probably be based on the data files that are the input to zic, and changing the format of those would become an issue. Adding comments etc isn't a problem; changing the format of the operational textual data that is the source for zic could be a problem.

From my perspective, zic plus the library is a sample consumer of the input data — it is not the sole consumer. The input format to zic is the crucial data — the output from zic is coincidental (useful for checking, etc, but not the end data that will be used). I get the impression from the other emails on the list that lots of other people consider it similarly — not necessarily for the same reasons, of course.

On Mon, Feb 5, 2018 at 8:57 PM, Steve Summit <scs@eskimo.com> wrote:
In the "Fractional seconds in zic input" thread, Howard Hinnant wrote:
Ah, I see our disconnect here. I have no concern about the zic compiler. The zic compiler is not in my workflow. The *input* to the zic compiler is the *only* thing that concerns me. I (and several others) have our own "zic compilers" which take this input, process it, and deliver it to our customers.
This strikes me as a very significant statement. If this is already a well-settled issue, pardon me for reraising it, but: is it agreed that the zic *input* files are as much of a product of this project as are the compiled output files, and the reference code that reads and interprets them?
In the beginning I would have thought that the project's product was the database *and* the reference code, such that the project could change the database format as much as it wanted to as long as it released updated code to match.
Soon enough, of course, there were other consumers of the database files (one in fact written by the project's current maintainer), such that more caution was needed when making changes to the zonefile format, changes which now necessitate corresponding changes by the authors of all other consumers.
But I would have thought that, working our way upstream, the project could still change the zic input format as much as it wanted to as long as it updated zic to match. But now Howard is saying:
For us, the product of the IANA repository is only the tz database, and not the tz code.
And he is using the words "the tz database" to refer, not to the zic output files, but to the *input* files!
I'm not trying to blame Howard, and he's certainly not alone. There was of course that big long thread last month about the issues with OpenJDK/CLDR/ICU/Joda and the Ireland change, and just today we heard about Kerry Shetline's new compiler.
My point is simply that it's pretty hugely significant whether the zic input files are an officially-supported product or not. If they are, the project has to be much more conservative about making these kinds of changes, probably more conservative than it otherwise wants to be. But if they're not an officially-supported product, then people have to be discouraged from writing their own compilers, or if for whatever reason they truly need their own compilers and their own compiled data formats different from zic's, it seems to me we're going to inexorably wind up feeling the need to fork the project, a possibility which Stephen Colebourne has already raised.
But, again, apologies if I'm being overly melodramatic, or raising an issue that's already well understood.
(But in any case, we may need some clearer terminology. In particular: do those words "the tz database" properly refer to the zic output files, or the input files, or both?)
-- Jonathan Leffler <jonathan.leffler@gmail.com> #include <disclaimer.h> Guardian of DBD::Informix - v2015.1101 - http://dbi.perl.org "Blessed are we who can laugh at ourselves, for we shall never cease to be amused."
On 6 February 2018 at 04:57, Steve Summit <scs@eskimo.com> wrote:
In the "Fractional seconds in zic input" thread, Howard Hinnant wrote:
Ah, I see our disconnect here. I have no concern about the zic compiler. The zic compiler is not in my workflow. The *input* to the zic compiler is the *only* thing that concerns me. I (and several others) have our own "zic compilers" which take this input, process it, and deliver it to our customers.
This strikes me as a very significant statement. If this is already a well-settled issue, pardon me for reraising it, but: is it agreed that the zic *input* files are as much of a product of this project as are the compiled output files, and the reference code that reads and interprets them?
The downstream consumers I have experience of use the zic source files (europe, america, asia, etc). For those projects, the source files are the primary form of the tz database. Stephen
Date: Mon, 05 Feb 2018 23:57:59 -0500
From: scs@eskimo.com (Steve Summit)
Message-ID: <2018Feb05.2357.scs.0001@quinine.home>

| In the beginning I would have thought that the project's product
| was the database *and* the reference code,

The reference code is a side issue, useful to show how to use the data, and to assist in verifying its correctness, but it is the data that matters. Note: the data, not the format in which it is expressed; that's an even smaller side issue.

The part of all of this that is difficult (aside from attempting to make the code work on a zillion different systems in a sane way) is actually collecting and, as best as is possible, verifying the data.

That is all that really matters. All the rest is just frills, and as anyone is free to take the collected data and write it down in whatever format they like, no-one should really be overly concerned with the format in which we write it, nor how often that changes.

Getting as much data as possible, as accurately as we can determine it (down to tiny fractions of a second, when it matters) is all that is really important. When the format cannot represent the data, we change the format, never compromise the data. The format can also change just because something new happens to be more convenient.

kre
I agree that the data is key, and by that I mean the distributed 'zic input data' (eg "southamerica"). However, I disagree strongly about:
no-one should really be overly concerned with the format in which we write it
When a data file format is in very widespread use, changes to it are extremely painful for downstream clients. I've had plenty of experience with this with Unicode, BCP47, CLDR, and similar levels of internal changes at the companies I've worked at. Seemingly trivial changes have a way of screwing up lots of programs and millions of people. If the TZDB were not important, arbitrary changes would not matter. But it is a crucial part of the world's software stack; its very importance cries out for stability. (As a trivial counterexample to "no-one should really be overly concerned with the format", try changing the character set of the files to EBCDIC and see how many squawks you get from users.)

Now, there are ways to both expand the format and retain stability. Here are a couple of ways to do that.

A. Bifurcate the data

1. Core: Always make available a set of data files in the current format. No changes to support "advanced" features like SAVE<0, fractional digits, etc. No splitting IDs because of advanced features either.

2. Advanced: The format of these data files can change "with no concern", in order to support "advanced" features. One way to make this practical is to always have a program that generates the core data by filtering the advanced. It is important, however, that such a program strictly minimize the textual changes to the core, so that diffing produces changes on the order of what is done now for updates to country rules.

B. Add conditionals

Another way is to have just one set of files, but have well-defined "conditionals" to enable new features. Here is an example, just for illustration:

# @ IF FRACTIONAL
# @ Rule Arg 2007 only - Dec 30 0:00.0000001 1:00 S
# @ ELSE
Rule Arg 2007 only - Dec 30 0:00 1:00 S
# @ END

The key to having that work is that older implementations will just ignore the # @... lines, and newer implementations that want to support the features can use them.

Mark

On Tue, Feb 6, 2018 at 12:14 PM, Robert Elz <kre@munnari.oz.au> wrote:
Date: Mon, 05 Feb 2018 23:57:59 -0500 From: scs@eskimo.com (Steve Summit) Message-ID: <2018Feb05.2357.scs.0001@quinine.home>
| In the beginning I would have thought that the project's product | was the database *and* the reference code,
The reference code is a side issue, useful to show how to use the data, and to assist in verifying its correctness, but it is the data that matters. Note: the data, not the format in which it is expressed, that's an even smaller side issue.
The part of all of this that is difficult (aside from attempting to make the code work on a zillion different systems in a sane way) is actually collecting and, as best as is possible, verifying the data.
That is all that really matters. All the rest is just frills, and as anyone is free to take the collected data and write it down in whatever format they like, no-one should really be overly concerned with the format in which we write it, nor how often that changes.
Getting as much data as possible, as accurately as we can determine it (down to tiny fractions of a second, when it matters) is all that is really important. When the format cannot represent the data, we change the format, never compromise the data. The format can also change just because something new happens to be more convenient.
kre
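[Editorial note: Mark Davis's illustrative "# @" conditional scheme could be consumed along the following lines. This is a sketch of my own, not an existing tool; the function name and the exact directive semantics are assumptions. A legacy consumer needs no code at all, since it already skips every "# @" line as a comment — which is the whole point of the scheme.]

```python
import re

def expand_advanced(lines, features):
    """Yield the effective data lines for a consumer supporting `features`.

    Lines beginning with '# @' are directives (IF/ELSE/END) or advanced
    payload; everything else is ordinary tz source data.
    """
    active = None          # None = outside any conditional block
    in_else = False
    for line in lines:
        m = re.match(r'#\s*@\s*(.*)$', line)
        if m:
            body = m.group(1)
            if body.startswith('IF '):
                active, in_else = body[3:].strip() in features, False
            elif body == 'ELSE':
                in_else = True
            elif body == 'END':
                active, in_else = None, False
            elif active and not in_else:
                yield body                     # advanced payload, feature enabled
        elif active is None or (in_else and not active):
            yield line                         # ordinary line, or core fallback
        # else: core fallback shadowed by an enabled advanced branch

sample = [
    "# @ IF FRACTIONAL",
    "# @ Rule Arg 2007 only - Dec 30 0:00.0000001 1:00 S",
    "# @ ELSE",
    "Rule Arg 2007 only - Dec 30 0:00 1:00 S",
    "# @ END",
    "Zone Foo 1:00 - CET",
]
```

An advanced consumer would then see the fractional rule, while filtering with an empty feature set reproduces exactly what a 2018c-era zic would parse.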
On 02/05/2018 08:57 PM, Steve Summit wrote:
it's pretty hugely significant whether the zic input files are an officially-supported product or not.
This general issue is covered here: https://data.iana.org/time-zones/theory.html#stability
Ah, I see our disconnect here. I have no concern about the zic compiler. The zic compiler is not in my workflow. The *input* to the zic compiler is the *only* thing that concerns me. I (and several others) have our own “zic compilers” which take this input, process it, and deliver it to our customers. For us, the product of the IANA repository is only the tz database, and not the tz code. Furthermore this “product”, the tz database, will be consumed by not just one alternative “zic compiler”, but many, and in many different languages on many different platforms.
This is not the disconnect; we're *both* talking about the input to zic. I was just saying that the current patch has no effect on the data; it's just a patch to the reference compiler (zic).

What I don't understand is why you want to put *any* restrictions on the tzdb data precision at all. All your concerns about precision and range are irrelevant when you don't have sized inputs; all you're asking for is an arbitrary truncation of timezone offsets *at the source*, which is completely unnecessary. If you're writing your own compiler, truncate at 1ms or 1us or 1ns or whatever the precision of your offset data type is.

I will abandon my other line of argument about milliseconds not making sense, because honestly I think the precision in the tzdb should be unlimited anyway; it's just lazy and short-sighted design to put an arbitrary limit on a DSL where that requirement makes no sense.

Re: Malcolm Wallace
Since a 1-millisecond timezone offset corresponds to approximately 30cm longitudinal distance on the surface of the earth, I can't imagine any real (*timezone*) uses for better precision of the offsets.
I certainly don't think it's likely that any real time zone would use sub-millisecond precision, but I think it's still much more likely than anyone using an offset >= 1 year, which is why Howard's much-vaunted "range vs precision" argument should really not land us on milliseconds.

That said, the one realistic scenario I can imagine offhand that would have a strange offset like this (and this would honestly be something of an abuse of the tzdb notation rather than a legitimate time zone) would be if someone encoded a leap second smear directly into their time zone by progressively adjusting their time zone during the course of a day or more. That might require a large number of small entries with odd offsets. (Though honestly if someone does that, it would probably be best to add some sort of smear-specific notation to the tzdb.)

Another possible scenario would be a strange aesthetic choice, like some small data-haven country or religious organization starting to operate in a time zone with some symbolic value down to the nanosecond level (like an offset of 3:14:15.926535897). Again, this is very unlikely, but the pain it causes to support *any* precision in the *input* files is so minimal (as long as we're changing up the format to support fractional seconds anyway) that it is not worth placing arbitrary restrictions on what kinds of time zones we can represent.

On 02/05/2018 10:31 PM, Howard Hinnant wrote:
On Feb 5, 2018, at 9:02 PM, Paul G <paul@ganssle.io> wrote:
My suggestion was that the input to the compiler (which is not strongly typed, so has no `sizeof`) should either have infinite precision or a clear upgrade path (e.g. a precision specification). So far Paul's patch has no effect on the *output* of the zic compiler, and I think it's at least reasonable for the tzdb (the *input* to zic) to be able to support arbitrarily fine inputs, considering that is the ultimate "source of truth", even without any of the other engineering concerns.
With regards to the other engineering concerns, that was what I was trying to appeal to when I said that nanoseconds are a more reasonable choice for precision *if we're arbitrarily limiting this anyway*. By selecting milliseconds or even microseconds, you're sacrificing precision in exchange for completely unnecessary range. Time zone offsets outside of +/- 1 day are dubious (and unsupported in many environments), +/- 1 week very unlikely, and +/- 1 year absurd. While both are pretty unlikely, I think nanosecond-precision offsets are much more likely than >292-year timezone offsets, so assuming that you want to truncate the inputs *at all*, it would be preferable to use nanosecond precision rather than millisecond. Honestly, I'm fine with (assuming infinite precision isn't supported) any resolution such that the range is at least ~1 week.
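[Editorial note: the range figures in this range-vs-precision argument can be checked with quick arithmetic. This is a sketch of my own, assuming offsets are stored as a signed 64-bit tick count, which is where the ~±292-year figure for nanosecond ticks comes from.]

```python
# How far a signed 64-bit integer reaches at a given tick size, in years.
SECONDS_PER_YEAR = 365.2425 * 86400

def int64_range_years(ticks_per_second):
    """Half-range of an int64 count of ticks, expressed in years."""
    return 2**63 / ticks_per_second / SECONDS_PER_YEAR

ns_years = int64_range_years(10**9)   # nanosecond ticks: ~292 years
ms_years = int64_range_years(10**3)   # millisecond ticks: ~292 million years
```

So an int64 of nanoseconds already spans far more than any plausible offset, and milliseconds buy range (hundreds of millions of years) that no time zone offset could ever need.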
As I’ve repeatedly expressed, a precision of offset demands a precision in time point. And it is the range of the time point, not the offset that concerns me.
utc_offset == local_time - utc_time
This is an algebraic equation that _must_ be true.
If utc_offset has nanosecond precision, then either local_time or utc_time must have a precision of nanoseconds or finer to make this equation true. This is simple mathematical reality. If utc_offset == 1ns, and neither local_time nor utc_time can represent nanosecond precision, how can the above equation possibly work, aside from coincidental cases where the number of nanoseconds in the utc_offset is an integral multiple of the precision of local_time or utc_time (e.g. a billion nanoseconds if both local_time and utc_time have seconds precision)?
Howard
Of course it's unlikely that any zone will actually implement an offset with sub-millisecond precision, but I'm not buying arbitrarily limiting it to milliseconds on the *input* to the compiler on that basis.
Since a 1-millisecond timezone offset corresponds to approximately 30cm longitudinal distance on the surface of the earth, I can't imagine any real (*timezone*) uses for better precision of the offsets. M.
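[Editorial note: the back-of-envelope figure above is easy to check; the exact value depends on latitude. A sketch, assuming the WGS-84 equatorial circumference — at the equator 1 ms is closer to 46 cm, and the ~30 cm figure holds around 50° latitude.]

```python
import math

EQUATOR_M = 40_075_017                 # WGS-84 equatorial circumference, metres
MS_PER_DAY = 86_400_000                # the Earth turns through 360 deg per day

m_per_ms_equator = EQUATOR_M / MS_PER_DAY                      # ~0.46 m
m_per_ms_50N = m_per_ms_equator * math.cos(math.radians(50))   # ~0.30 m
```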
I want no truncation whatsoever. I want to do exact time arithmetic.
Then why are you advocating for a 1ms precision? If you don't want any truncation, then you should be arguing for unlimited precision representations. Anything else will necessarily be a truncation.
If I have an offset of 1ns, and I add that to a time point of 1us UTC, the result is 1001ns in time zone X. To be able to accurately represent the time point in Zone X I have to be able to exactly represent 1001ns.
True. This project does not decide what the time zones will be, though. You will have this problem if and only if some zone decides on an offset with nanosecond precision, and if that happens, tzdb will either have to truncate the real data to fit this arbitrary cutoff, or a second change to the precision supported will need to happen. Of course it's unlikely that any zone will actually implement an offset with sub-millisecond precision, but I'm not buying arbitrarily limiting it to milliseconds on the *input* to the compiler on that basis.
Howard
On Feb 5, 2018, at 4:03 PM, Paul G <paul@ganssle.io> wrote:
Yes, but you are always necessarily truncating time to the precision of your representation because time has significantly greater than nanosecond precision. Your compiler can always truncate the offsets if you want to represent "point in time" as an offset from some fixed point (e.g. the unix epoch) and you want offsets *within that representation* to have the same precision as the offset you're using to represent the point in time. Given that the realistic range of offsets is on the order of +/- 1 day, I don't think we should limit their precision to the same precision as the timestamps they may be used with.
If, as seems to be increasingly the case, we're concerned with future-proofing tzdb, it would make sense to support a very high precision like nanoseconds or go with a precision specification scheme that is effectively unlimited. Compilers are free to truncate to whatever level of precision they want in their output data.
Best, Paul
On 02/05/2018 02:51 PM, Howard Hinnant wrote:
On Feb 5, 2018, at 2:13 PM, Paul G <paul@ganssle.io> wrote:
Maybe I'm missing something, but are we talking about fractional seconds in *offsets* or fractional seconds for the time of the change?
For offsets, why would we care whether it can represent +/- 292,000 years, since it's fantastically unlikely that a time zone offset would even be outside of +/- 24 hours. While both outcomes are very unlikely, I think an offset best represented in nanoseconds is much more likely than an offset +/- 292 years...
Let’s say, just for example, that we have a UTC offset of 1ns for Zone X.
Let’s further assume that I want to map arbitrary time points between UTC and X, exactly.
Well, in order to be sure that I can map UTC to X and back to UTC again, with no loss of information, then time points in both UTC and in X must have nanosecond (or finer) precision. (disclaimer: I’m using the term “finer” here in a very coarse manner. :-) The actual requirement is that the precision of the UTC and X time points must be able to exactly represent nanosecond precision, in the same way you can exactly represent minutes precision with a type holding milliseconds precision — but not vice-versa.)
If I can only represent (for example) microsecond precision in UTC and X, then when I map a time point from UTC to X (or vice-versa), the 1ns offset will be lost when I add it to a count of microseconds, and truncate the result to microseconds. Subsequently my X time point will not be an accurate representation of the specified mapping for the X time zone. For example if I subtract UTC from local time I should get the offset, but in this example I would get 0.
Howard
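[Editorial note: Howard's worked example can be checked with plain integer arithmetic. This is an illustration of my own, not tzcode; each variable is an integer count of the unit named in its suffix.]

```python
offset_ns = 1        # Zone X: UT offset of 1 ns
utc_us = 1           # a UTC time point with microsecond precision

# Map UTC -> X: the exact result is 1001 ns, but storing it in a
# microsecond count truncates the offset away entirely.
local_us = (utc_us * 1000 + offset_ns) // 1000     # back to 1 us

# Subtracting UTC from local time should recover the offset, but:
recovered_ns = (local_us - utc_us) * 1000          # 0, not 1

# With nanosecond-precision time points the identity holds exactly:
utc_ns = utc_us * 1000
local_ns = utc_ns + offset_ns
```

The round trip silently yields an offset of 0, exactly the failure of `utc_offset == local_time - utc_time` described above; widening the time points to nanoseconds restores the identity.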
On 2/5/18 13:50, Paul Eggert wrote:
On 02/05/2018 10:46 AM, Howard Hinnant wrote:
If two clients (different platforms) want to maintain the invariant that equal time_points remain equal after mapping, then they must operate at the precision of the mapping (or finer).

We already have clients that don't want to do that, as they discard sub-minute resolution. But I take your point that some clients may want to do that and we should cater to this subclass of clients too. In that case, how about if we stick to at most 1-ms resolution in the data, and note in zic.8 that 1 ms resolution is the way to go? I say "1 ms" because of Steve Allen's email.
The current representation of time in calendars goes down only to the second, and that has been an issue for industries (financial, smart grid, etc.) that want to represent such data. I can ask, but I'm sure that milliseconds are too coarse. The problem of whether DST applies for 90 nanoseconds after 2am is something they will have to resolve.
On Mon 2018-02-05T10:38:56-0800 Paul Eggert hath writ:
I guess I'm not seeing the harm to go with nanoseconds in the data format; if a downstream user wants less precision they can easily round. And following Steve Allen's lead, we can mention in the documentation that there's no practical use of sub-millisecond precision in these old timestamps.
There are serious questions that need context to inform the choices of IANA tz when looking at sub-second precision. I ask that everyone take a look at this writeup and the questions that it poses for what the tz database really wants to encode.

https://www.ucolick.org/~sla/temporary/tzfracsec/

--
Steve Allen <sla@ucolick.org>          WGS-84 (GPS)
UCO/Lick Observatory--ISB 260          Lat +36.99855
Natural Sciences II, Room 165          Lng -122.06015
1156 High Street                       Hgt +250 m
Santa Cruz, CA 95064
Voice: +1 831 459 3046
http://www.ucolick.org/~sla/
On 02/05/2018 11:11 AM, Steve Allen wrote:
https://www.ucolick.org/~sla/temporary/tzfracsec/

Thanks for doing that work. To answer its questions:
For the case of sub-second offsets, does the tz project want to split the America/<pickone> zones to indicate that the US and Canada had different time offsets before and after events like this?
No, the intent of tzdb is to record the (idealized) UT offsets for civil time. Actual clocks often had errors, and we obviously can't record all those errors.
Should tz prefer the intended time offsets from UT/UTC according to legal decrees only (in the absence of technical details), or the offsets from UT/UTC actually used by people who at best had access to some radio broadcast time signals?
More the former. That is, tzdata attempts to record the intended UT/UTC offsets instead of the actual offsets, and similarly for transition times.
If tz chooses to ignore technical details, then most of the sub-second offsets of local mean time are irrelevant because the contemporary practice of time-keeping had systematic offsets and lack of accuracy which do not justify keeping offsets to even 0.1 s of sub-second precision.
Yes, quite true. Until a very few years ago the wall clocks in Boelter Hall at UCLA were set by hand and would stray from the correct time by several minutes. Much of the world still runs this way. We're not trying to record that. Instead, for these non-integer UT offsets, we're actually recording the longitude of the meridians of the affected civil-time zones. (I suppose this should be written down in theory.html somewhere.) My impression is that such longitudes could be measured to a precision no worse than 0.1 s of UT offset (which corresponds to 1.5″ of longitude), even with circa-1930 technology. Admittedly I haven't researched this in detail.
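[Editorial note: the time-to-longitude conversion used above is simple arithmetic; a sketch of my own. The Paris figure used as a sanity check below is the post-1962 BIH value quoted later in this thread.]

```python
# The Earth turns 360 degrees of longitude in 86400 seconds of UT,
# i.e. 15 arcseconds of longitude per second of time.
ARCSEC_PER_TIME_SEC = 360 * 3600 / 86400

# So 0.1 s of UT offset corresponds to 1.5" of longitude:
tenth_second_arcsec = 0.1 * ARCSEC_PER_TIME_SEC

# Sanity check: Paris at 9m 20.921s east of Greenwich should land near
# the Paris Observatory's longitude of about 2 deg 20' 14" E.
paris_deg = (9 * 60 + 20.921) * ARCSEC_PER_TIME_SEC / 3600
```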
On Mon 2018-02-05T17:29:05-0800 Paul Eggert hath writ:
Instead, for these non-integer UT offsets, we're actually recording the longitude of the meridians of the affected civil-time zones. (I suppose this should be written down in theory.html somewhere.) My impression is that such longitudes could be measured to a precision no worse than 0.1 s of UT offset (which corresponds to 1.5" of longitude), even with circa-1930 technology.
No, due to inconsistencies between the global geodetic datums which were used as the basis of the longitudes assumed for calculations of local standard time. Prior to the 1980s there were offsets of hundreds of meters between one regional geodetic datum and another. I am going to argue that sub-second resolutions based on the once-assumed longitudes of sites are down in the "technical details" level that tzdb does not want to record except in comments.

Have a look at Publications of the USNO, second series, Vol. 4, part 4, page G16 (1906): https://books.google.com/books?id=QUrnAAAAMAAJ&pg=RA3-PA16#v=onepage&q&f=fal...

This has a table of the use of time in various nations constructed by the navy and department of state. For France (Paris) this gives a time offset of 9 minutes 20.9 seconds from Greenwich. In this case that offset is almost certainly correct to that precision because:

1) Greenwich had exquisite equipment and skilled astronomers
2) Paris had exquisite equipment and skilled astronomers
3) the distance between the two sites allowed for transport of a chronometer back and forth within the span of about a day
4) the distance between the two sites allowed for telegraphic transmission line delays to be well characterized
5) the distance between the two sites allowed for radio transmission delays to be well characterized
6) the data from the BIH up through 1961 gave 9m 20.935s
7) the new globally adjusted BIH value from 1962 was 9m 20.921s

Pick some other place and many of these criteria will not be met. For example, Zikawei China is given in 1906 USNO as 8h 5m 43.3s and by BIH after 1962 as 8h 5m 42.864s, a notable systematic offset between the once-believed longitude which had been used for local time and the globally self-consistent longitude. Alternatively, in the 1963 plots of time from Rio de Janeiro the curve wanders over a range of 0.6 s during the year. Other observatories at earlier dates did much worse at providing their local legal time.
A strong argument that the decimal seconds are technical matters, rather than the legal or practical matters that tzdb cares to track, is that after the 1884 International Meridian Conference there was resistance to the notion of Greenwich time in Paris. Despite the accurate decimals of the measurements known to the USNO by 1906, in 1911 the law was changed to make legal time in France 9m 21s behind the mean solar time of the Paris observatory: https://books.google.com/books?id=VgkiAQAAMAAJ&pg=PA260#v=onepage&q&f=false

If tzdb adopts sub-second resolution then it needs a very clear set of rules about how those are to be used and interpreted, along with caveats about how much not to believe that level of precision offset from a globally self-consistent time.

-- Steve Allen <sla@ucolick.org>
On 02/05/2018 07:04 PM, Steve Allen wrote:
I am going to argue that sub-second resolutions based on the once-assumed longitudes of sites are down in the "technical details" level that tzdb does not want to record except in comments.
Thanks for looking into the problem and for coming up with such a clear and convincing argument. Although I'm not sure that subsecond precision would be wrong everywhere, it does seem clear that it would be useful only in a very few cases in tzdata, and the exact set of cases would be hard to determine reliably. That's a good argument to not bother to add it, so let's revert that. However, I would still like zic to accept (and simply discard) subsecond data on input, as this is a straightforward extension to the data format, is not likely to collide with any future extension, and may be of some use someday (say, if North Korea decides to use a non-integer UTC offset as some sort of protest against the tyranny of Western timekeeping :-). Proposed patch attached.
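[Editorial note: the accept-and-discard behavior proposed above is easy to sketch outside of C. This is an illustration of my own, not zic's actual sscanf-based parser; the function name and exact field grammar are assumptions.]

```python
import re

def parse_offset(field):
    """Parse a tz-style [-]hh[:mm[:ss[.fraction]]] field into whole seconds.

    Any fractional-second digits are accepted but deliberately discarded,
    mirroring the proposed zic behavior.
    """
    m = re.fullmatch(r'(-?)(\d+)(?::(\d+)(?::(\d+)(?:\.(\d+))?)?)?', field)
    if not m:
        raise ValueError(f'bad time field: {field!r}')
    sign = -1 if m.group(1) else 1
    h = int(m.group(2))
    mnt = int(m.group(3) or 0)
    s = int(m.group(4) or 0)
    # m.group(5), the fraction, is parsed but intentionally ignored
    return sign * (h * 3600 + mnt * 60 + s)
```

For example, the pre-1937 Amsterdam offset `0:19:32.13` would be accepted and silently rounded down to 0:19:32, so the fraction can live in the data as documentation without affecting compiled output.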
Really? Another completely unnecessary change adding no value? All consumers want is a file updated whenever a government changes the clocks! Nothing more.

"Ideally tzcode would support fractional-second times and UT offsets all the way down the chain"

Why? What possible benefit could it bring to the global software community?

"The real problem here is the incessant fiddling with the data. The vast majority of users just want small stable updates representing actual changes in time zones, not the continuous refactoring we've been subjected to in the last few years." http://mm.icann.org/pipermail/tz/2018-January/025822.html

"the focus should be on small, stable updates, not potentially destabilizing "cleanups"" http://mm.icann.org/pipermail/tz/2018-January/025827.html

"any such change should have very, very clear ROI that strongly outweighs the disruption" http://mm.icann.org/pipermail/tz/2018-January/026087.html

Please stop messing around, revert this patch and abandon the idea. TZDB needs to get back to being the pragmatic, practical tool it was intended to be.

Stephen

On 4 February 2018 at 16:37, Paul Eggert <eggert@cs.ucla.edu> wrote:
For many years I've chafed at tzdata's lack of support for fractional seconds. We have good evidence of well-established standard-time UT offsets that were not multiples of one second; for example, the Netherlands before 1937. These can't be recorded in tzdata except as comments.
Ideally tzcode would support fractional-second times and UT offsets all the way down the chain. This would mean changes to the tz binary format and to the runtime API, though, which is not something we'd do lightly, if ever. However, it's easy to change the zic spec to allow fractional seconds, and to change zic to accept and ignore the fractions, so that fractional seconds can be documented more formally in the data; this could well be useful to applications other than tzcode. Proposed patch attached, hairy sscanf format and all.
This patch does not actually change the data, as we'll need time, and/or a procedure to automatically generate data compatible with zic 2018c and earlier.
On 02/05/2018 02:49 AM, Stephen Colebourne wrote:
Another completely unnecessary change adding no value?
Although it doesn't add value for today's timestamps, it is useful for historical timestamps that have been covered by the database for decades. Some applications do deal with older timestamps, and when it's easy (as it is here) it's helpful to correct longstanding data entry errors that were forced by an inadequate format.

Downstream parsers like OpenJDK+CLDR that do not handle fractional seconds can use the file rearguard.zi, which avoids them and so should continue to be compatible. I suggest testing with the development version's rearguard.zi now, to shake out any potential problems in that area.

More generally, tzdb should not be thought of as a project whose format is fixed in stone and will never change. The format has changed in the past (e.g., the "u" suffix) and will undoubtedly change in the future for reasons that we cannot in general anticipate, and it's helpful to have some processes in place to deal with such changes. The proposed scheme with vanguard.zi, main.zi, and rearguard.zi is an attempt to supply such a process: it provides rearguard.zi for downstream users who want to put off changes for as long as possible, and vanguard.zi for downstream users who want to try new features ASAP. Both classes of users have commented in this thread.
Paul Eggert <eggert@cs.ucla.edu> wrote on Mon, 5 Feb 2018 at 09:23:04 -0800 in <c968840d-014f-abdf-8efa-778c297039fc@cs.ucla.edu>:
Although it doesn't add value for today's timestamps, it is useful for historical timestamps that have been covered by the database for decades. Some applications do deal with older timestamps, and when it's easy (as it is here) it's helpful to correct longstanding data entry errors that were forced by an inadequate format.
I question this claim, that it is helpful to correct longstanding data entry errors. The result is that historical timestamps in such zones change, and interval times that may have been previously calculated may change, becoming incorrect (or correct, or "differently incorrect"). Those are not necessarily minor changes. We should pause before "correcting" historical timestamps, even if we think the change is better.

(In practice, I suppose, we have been making changes willy-nilly to historical timestamps for long enough that anyone who cares has likely already been burned by it and adopted a solution to mitigate it, although what that looks like I am not sure. I'm not sure this is a great argument to keep doing so...)

--jhawk@mit.edu
John Hawkinson
On 2/5/18 05:49, Stephen Colebourne wrote:
Please stop messing around, revert this patch and abandon the idea. TZDB needs to get back to being the pragmatic practical tool it was intended to be. Stephen
Reading https://www.iana.org/time-zones:

The Time Zone Database (often called tz or zoneinfo) contains code and data that represent the history of local time for many representative locations around the globe. It is updated periodically to reflect changes made by political bodies to time zone boundaries, UTC offsets, and daylight-saving rules. Its management procedure is documented in BCP 175: Procedures for Maintaining the Time Zone Database <https://www.iana.org/go/rfc6557>.

The tone of the referenced BCP is in line with that. From the outset the data appears to have attempted to maintain an accurate picture of the history as well as current changes. What any group may strongly desire isn't necessarily what was intended. As stated earlier in another thread, it's a simple task to filter the data to avoid problems with applications and operating systems, and I'm guessing such a mechanism could be built into the current distribution.
On 4 February 2018 at 16:37, Paul Eggert <eggert@cs.ucla.edu> wrote:
For many years I've chafed at tzdata's lack of support for fractional seconds. We have good evidence of well-established standard-time UT offsets that were not multiples of one second; for example, the Netherlands before 1937. These can't be recorded in tzdata except as comments.
Ideally tzcode would support fractional-second times and UT offsets all the way down the chain. This would mean changes to the tz binary format and to the runtime API, though, which is not something we'd do lightly, if ever. However, it's easy to change the zic spec to allow fractional seconds, and to change zic to accept and ignore the fractions, so that fractional seconds can be documented more formally in the data; this could well be useful to applications other than tzcode. Proposed patch attached, hairy sscanf format and all.
This patch does not actually change the data, as we'll need time, and/or a procedure to automatically generate data compatible with zic 2018c and earlier.
participants (17)
- Bradley White
- Brian Inglis
- Howard Hinnant
- John Hawkinson
- Jonathan Leffler
- Kim Davies
- Mark Davis ☕️
- Michael Douglass
- Paul Eggert
- Paul G
- Robert Elz
- scs@eskimo.com
- Stephen Colebourne
- Steve Allen
- Tom Lane
- Wallace, Malcolm
- Zefram