RE: Question on abbreviations

So in this case:
Rule US 1942 only - Feb 9 2:00 1:00 W # War
Rule US 1945 only - Aug 14 23:00u 1:00 P # Peace
Why is %s undefined in 1943? This was the question that started the thread. If the time setting carries forward, surely the letter should also.
++PLS
-----Original Message-----
From: tz-request@elsie.nci.nih.gov [mailto:tz-request@elsie.nci.nih.gov] On Behalf Of Ken Pizzini
Sent: Wednesday, September 27, 2006 3:14 PM
To: tz@lecserver.nci.nih.gov
Subject: Re: Question on abbreviations
On Wed, Sep 27, 2006 at 02:37:58PM -0700, Mark Davis wrote:
I share your confusion. If Paul (Eggert's) description is right, then I have to ignore the TO field in some circumstances which are entirely unclear to me. I would much rather see the TO field corrected. That is, if TO=1942 is ignored, and 1945 is the real date, then the line should be corrected to TO=1945.
The key to understanding is that the rules describe a list of *transitions*. After a transition, the described effect on zone offset and abbreviation *remain* in effect until the next transition. The "TO" part of a rule is used to enable a shorthand for a _recurring_ transition, such as "first Tuesday of February", for all years within the range. If "to" is "only", then the *transition* being documented is a singleton, but the transitioned-into offset/abbreviation remains in effect until the _next_ transition, no matter how far in the future.
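As a minimal sketch of the "list of transitions" reading described above, the lookup below returns whatever the most recent transition established. It is not how zic is implemented. The two entries come from the US rules quoted in this thread, but the times of day (2:00, 23:00u) are dropped for brevity, and the names `TRANSITIONS` and `state_on` are hypothetical.

```python
from bisect import bisect_right
from datetime import date

# Simplified transitions taken from the two US rules quoted in the thread.
# Assumption: time of day is ignored, so each transition is just a date.
TRANSITIONS = [
    (date(1942, 2, 9), "1:00", "W"),   # War
    (date(1945, 8, 14), "1:00", "P"),  # Peace
]

def state_on(day, transitions=TRANSITIONS):
    """Return (save, letter) from the latest transition at or before `day`.

    Returns None when `day` precedes every transition in the list.
    """
    keys = [t[0] for t in transitions]
    i = bisect_right(keys, day)  # number of transitions at or before `day`
    if i == 0:
        return None
    _, save, letter = transitions[i - 1]
    return save, letter

# A date in 1943 falls after the 1942 transition and before the 1945 one,
# so the 1942 rule's SAVE and LETTER/S are still in effect.
print(state_on(date(1943, 6, 1)))
```

Under this reading, "only" limits when the transition happens, not how long its effect lasts.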
There are other failures in the parsing. My error messages are: ... I looked into why this is happening, and found:
Zone Europe/Amsterdam 0:19:32 - LMT 1835 0:19:32 Neth %s 1937 Jul 1
But the first LETTER/S defined by Neth is in 1916, so during the range from 1835 to 1916 this is undefined. If the LETTER/S are magically also defined *before* the first FROM, that should be described in the specification.
Yes, this is a failure of the documentation. If a Zone refers to a time within a Rule that is before the first transition mentioned for that rule, then the _oldest_standard_time_ "Letter/s" is used. In this case, AMT.
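The fallback described above could be sketched as follows. This is only an illustration of the stated rule (pick the oldest entry whose SAVE is zero), not zic's actual code. The `RuleLine` type, the function name, and the two sample rows are hypothetical; the rows are shaped so that the result is AMT, as stated above, and are not copied from the tzdata files.

```python
from typing import NamedTuple, Optional, Sequence

class RuleLine(NamedTuple):
    year: int
    month: int
    day: int
    save_minutes: int  # 0 means standard time
    letter: str

def letter_before_first_transition(rules: Sequence[RuleLine]) -> Optional[str]:
    """Return the LETTER/S of the oldest standard-time (SAVE == 0) rule.

    Returns None if no rule has SAVE == 0; what should happen then is
    not covered by the explanation in this thread.
    """
    standard = [r for r in rules if r.save_minutes == 0]
    if not standard:
        return None
    oldest = min(standard, key=lambda r: (r.year, r.month, r.day))
    return oldest.letter

# Illustrative rows only (assumed values, not quoted from tzdata).
ILLUSTRATIVE_RULES = [
    RuleLine(1916, 5, 1, 60, "NST"),
    RuleLine(1916, 10, 1, 0, "AMT"),
]

print(letter_before_first_transition(ILLUSTRATIVE_RULES))
```

Note that the oldest rule overall is skipped here because its SAVE is not zero; only standard-time entries are candidates.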
BTW, the documentation was at first a bit confusing to me, since it says that fields are delimited by spaces, and lists a single Zone UNTIL field. However, if you look carefully at the documentation, there are really 4 fields:
UNTIL_YEAR UNTIL_IN UNTIL_ON UNTIL_AT
which are optional [but only in "truncation" from the end: that is, it corresponds to the (Perl) regex (UNTIL_YEAR (UNTIL_IN (UNTIL_ON (UNTIL_AT)?)?)?)?].
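The "truncation from the end" shape of that regex can be tried out with a small Python equivalent. This is a sketch under assumptions: subfields are treated as opaque non-space tokens, no validation of month names or times is attempted, and `UNTIL_RE` and `split_until` are hypothetical names.

```python
import re

# Nested optional groups: a later subfield can only appear if every
# earlier one is present, which is the truncation-from-the-end rule.
UNTIL_RE = re.compile(r"^(\S+)(?:\s+(\S+)(?:\s+(\S+)(?:\s+(\S+))?)?)?$")

def split_until(text):
    """Split an UNTIL value into four subfields, None for omitted ones."""
    text = text.strip()
    if not text:
        # The whole UNTIL value is optional.
        return (None, None, None, None)
    m = UNTIL_RE.match(text)
    if m is None:
        raise ValueError("too many subfields in UNTIL: %r" % text)
    return m.groups()

print(split_until("1937 Jul 1"))
print(split_until("1835"))
```

Anything beyond four whitespace-separated tokens is rejected rather than silently ignored, which is a design choice of this sketch.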
I'm not the only one to have initially made this mistake: the proposed XML format for the TZ database makes the same mistake.
Confusing: granted. Whether "Until" is one or multiple fields is a matter of interpretation. The _traditional_ understanding is that it is a *single* "timestamp field" which may happen to have spaces within it. BTW the subfields aren't "YEAR IN ON AT", but "YEAR MONTH DAY TIME".
In this regard, a recent addition to the tzcode tarball is zoneinfo2tdf.pl, which translates the more free-with-spaces zone tzdata into a form which strictly uses a single tab between fields. This may make life easier for some by simplifying their parser's requirements. (Or not.)
--Ken Pizzini

My reading of the specification (zic.8.txt) was that the first rule mentioned was operative during the interval from 1942 to "only", that is, during 1942 alone. This was by my reading of:
TO Gives the final year in which the rule applies. In addition to minimum and maximum (as above), the word only (or an abbreviation) may be used to repeat the value of the FROM field.
While it was explained to me what the actual code does, I don't think this is reflected in the above text -- or at least, not at all clearly. According to this text, if I saw the following:
Rule US 1942 1944 - Feb 9 2:00 1:00 W # War
The rule should not apply in 1945. So I request that the text be fixed, because the rule clearly, according to the explanations given on this thread, applies *afterwards* (and the circumstances in which it applies need to be clearly specified). Is it until the next Rule that has a SAVE value with the same SAVE value as this Rule? Until the next Rule that has a SAVE value?...
mark
On 9/27/06, Paul Schauble <Paul.Schauble@ticketmaster.com> wrote:
So in this case: Rule US 1942 only - Feb 9 2:00 1:00 W # War Rule US 1945 only - Aug 14 23:00u 1:00 P # Peace
Why is %s undefined in 1943? This was the question that started the thread. If the time setting carries forward, surely the letter should also.
++PLS
-----Original Message----- From: tz-request@elsie.nci.nih.gov [mailto:tz-request@elsie.nci.nih.gov] On Behalf Of Ken Pizzini Sent: Wednesday, September 27, 2006 3:14 PM To: tz@lecserver.nci.nih.gov Subject: Re: Question on abbreviations
On Wed, Sep 27, 2006 at 02:37:58PM -0700, Mark Davis wrote:
I share your confusion. If Paul (Eggert's) description is right, then I have to ignore the TO field in some circumstances which are entirely unclear to me. I would much rather see the TO field corrected. That is, if TO=1942 is ignored, and 1945 is the real date, then the line should be corrected to TO=1945.
The key to understanding is that the rules describe a list of *transitions*.
After a transition, the described effect on zone offset and abbreviation *remain* in effect until the next transition. The "TO" part of a rule is used to enable a shorthand for a _recurring_ transition, such as "first Tuesday of February", for all years within the range. If "to" is "only", then the *transition* being documented is a singleton, but the transitioned-into offset/abbreviation remains in effect until the _next_ transition, no matter how far in the future.
There are other failures in the parsing. My error messages are: ... I looked into why this is happening, and found:
Zone Europe/Amsterdam 0:19:32 - LMT 1835 0:19:32 Neth %s 1937 Jul 1
But the first LETTER/S defined by Neth is in 1916, so during the range from 1835 to 1916 this is undefined. If the LETTER/S are magically also defined *before* the first FROM, that should be described in the specification.
Yes, this is a failure of the documentation. If a Zone refers to a time within a Rule that is before the first transition mentioned for that rule, then the _oldest_standard_time_ "Letter/s" is used. In this case, AMT.
BTW, the documentation was at first a bit confusing to me, since it says that fields are delimited by spaces, and lists a single Zone UNTIL field. However, if you look carefully at the documentation, there are really 4 fields:
UNTIL_YEAR UNTIL_IN UNTIL_ON UNTIL_AT
which are optional [but only in "truncation" from the end: that is, it corresponds to the (Perl) regex (UNTIL_YEAR (UNTIL_IN (UNTIL_ON (UNTIL_AT)?)?)?)?].
I'm not the only one to have initially made this mistake: the proposed XML format for the TZ database makes the same mistake.
Confusing: granted. Whether "Until" is one or multiple fields is a matter of interpretation. The _traditional_ understanding is that it is a *single* "timestamp field" which may happen to have spaces within it. BTW the subfields aren't "YEAR IN ON AT", but "YEAR MONTH DAY TIME".
In this regard, a recent addition to the tzcode tarball is zoneinfo2tdf.pl, which translates the more free-with-spaces zone tzdata into a form which strictly uses a single tab between fields. This may make life easier for some by simplifying their parser's requirements. (Or not.)
--Ken Pizzini

On Wed, Sep 27, 2006 at 04:13:52PM -0700, Mark Davis wrote:
My reading of the specification (zic.8.txt) was that the first rule mentioned was operative during the interval from 1942 to "only", that is, during 1942 alone. This was by my reading of:
TO Gives the final year in which the rule applies. In addition to minimum and maximum (as above), the word only (or an abbreviation) may be used to repeat the value of the FROM field.
While it was explained to me what the actual code does, I don't think this is reflected in the above text
With the understanding that it is *transitions* being documented, it is reflected.
or at least, not at all clearly.
Apparently not, since this is not the first time that this confusion has come up.
According to this text, if I saw the following:
Rule US 1942 1944 - Feb 9 2:00 1:00 W # War
The rule should not apply in 1945.
Correct: the rule should *not* be applied in 1945. If there is a transition specified between 1944-02-09 and 1945-02-09, then the effect of that transition will still be in effect, with *this* rule *not* applied on 1945-02-09.
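To make the distinction concrete, here is a small sketch of which years a rule line generates a transition in. It handles only integer years and the word "only"; accepting any prefix of "only" is a simplification of "or an abbreviation", and `minimum`/`maximum` are deliberately not handled. The function name `rule_years` is hypothetical.

```python
def rule_years(from_field, to_field):
    """Return the years in which a rule line produces a transition.

    Assumptions: FROM is an integer year; TO is an integer year or
    "only" (any non-empty prefix of it). minimum/maximum are not handled.
    """
    start = int(from_field)
    if to_field and "only".startswith(to_field.lower()):
        end = start  # "only" repeats the value of the FROM field
    else:
        end = int(to_field)
    return list(range(start, end + 1))

# The rule "US 1942 1944" fires in 1942, 1943 and 1944, but not in 1945.
print(rule_years("1942", "1944"))
# The rule "US 1942 only" fires once, in 1942.
print(rule_years("1942", "only"))
```

The list only says when the rule is *applied*. The offset and abbreviation it sets remain in effect after the last listed year, until some other transition replaces them.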
So I request that the text be fixed,
I'll make an attempt at making the text clearer... but then again, since I understood the original text and you found it misleading, perhaps you'd like to take a stab at clarifying it?
--Ken Pizzini

On Wed, Sep 27, 2006 at 05:38:45PM -0700, Ken Pizzini wrote:
I'll make an attempt at making the text clearer...
Okay, here's what I came up with --- not so much rewording as adding clarifications:
[snip]
Input lines are made up of specified fields. Fields are separated from one another by any number of white space characters. Leading and trailing white space on input lines is ignored. An unquoted sharp character (#) in the input introduces a comment which extends to the end of the line the sharp character appears on. White space characters and sharp characters may be enclosed in double quotes (") if they're to be used as part of a field. Any line that is blank (after comment stripping) is ignored. Non-blank lines are expected to be of one of three types: rule lines, zone lines, and link lines.
A rule line has the form
    Rule NAME FROM TO TYPE IN ON AT SAVE LETTER/S
For example:
    Rule US 1967 1973 - Apr lastSun 2:00 1:00 D
Each rule specifies one or more transitions between standard and saving time. The fields that make up a rule line are:
NAME Gives the (arbitrary) name of the set of rules this rule is part of.
FROM Gives the first year in which the transition rule applies. Any integer year can be supplied; the Gregorian calendar is assumed. The word minimum (or an abbreviation) means the minimum year representable as an integer. The word maximum (or an abbreviation) means the maximum year representable as an integer. Rules can describe times that are not representable as time values, with the unrepresentable times ignored; this allows rules to be portable among hosts with differing time value types.
TO Gives the final year in which the rule applies. In addition to minimum and maximum (as above), the word only (or an abbreviation) may be used to repeat the value of the FROM field.
    Note that the offset that the rule transitions will continue on until the next transition occurs, no matter how far in the future of the FROM or TO years that may be.
TYPE Gives the type of year in which the rule applies. If TYPE is - then the rule applies in all years between FROM and TO inclusive. If TYPE is something else, then zic executes the command
    yearistype year type
to check the type of a year: an exit status of zero is taken to mean that the year is of the given type; an exit status of one is taken to mean that the year is not of the given type. This might be used, for example, to have a rule which applies only in leap years.
[...unchanged text omitted...]
LETTER/S Gives the "variable part" (for example, the "S" or "D" in "EST" or "EDT") of time zone abbreviations to be used when this rule is in effect. If this field is -, the variable part is null.
    Note that, as a special case, references to a rule's LETTER/S field (through a %s in a zone line's FORMAT field) for a date which predates the oldest date specified in a given rule will be assigned the "variable part" specified by the oldest standard time (i.e., with a SAVE value of zero) transition specified for the named rule.
A zone line has the form
    Zone NAME GMTOFF RULES/SAVE FORMAT [UNTIL]
[...unchanged text omitted...]
UNTIL The time at which the UTC offset or the rule(s) change for a location. This field, which is logically a single field in the sense of the high-level description, consists of whitespace separated subfields consisting of a year, a month, a day, and a time of day. If this is specified, the time zone information is generated from the given UTC offset and rule change until the time specified. The month, day, and time of day have the same format as the IN, ON, and AT columns of a rule; trailing columns can be omitted, and default to the earliest possible value for the missing columns.
    The next line must be a "continuation" line; this has the same form as a zone line except that the string "Zone" and the name are omitted, as the continuation line will place information starting at the time specified as the UNTIL field in the previous line in the file used by the previous line. Continuation lines may contain an UNTIL field, just as zone lines do, indicating that the next line is a further continuation.
[snip]
And for ADO, here is the corresponding diff to the zic.8 page:
--- zic.8~ 2006-08-21 06:56:12.000000000 -0700
+++ zic.8 2006-09-27 19:54:52.597649904 -0700
@@ -98,7 +98,9 @@
 .B yearistype
 when checking year types (see below).
 .PP
-Input lines are made up of fields.
+Input lines are made up of
+.B specified
+fields.
 Fields are separated from one another by any number of white space characters.
 Leading and trailing white space on input lines is ignored.
 An unquoted sharp character (#) in the input introduces a comment which extends
@@ -122,13 +124,16 @@
 Rule US 1967 1973 \- Apr lastSun 2:00 1:00 D
 .sp
 .fi
+Each rule specifies one or more
+.I transitions
+between standard and saving time.
 The fields that make up a rule line are:
 .TP "\w'LETTER/S'u"
 .B NAME
 Gives the (arbitrary) name of the set of rules this rule is part of.
 .TP
 .B FROM
-Gives the first year in which the rule applies.
+Gives the first year in which the transition rule applies.
 Any integer year can be supplied; the Gregorian calendar is assumed.
 The word
 .B minimum
@@ -153,6 +158,16 @@
 may be used to repeat the value of the
 .B FROM
 field.
+.IP
+Note that the
+.I offset
+that the rule transitions will continue on until the next
+transition occurs,
+no matter how far in the future of the
+.B FROM
+or
+.B TO
+years that may be.
 .TP
 .B TYPE
 Gives the type of year in which the rule applies.
@@ -176,6 +191,8 @@
 to check the type of a year:
 an exit status of zero is taken to mean that the year is of the given type;
 an exit status of one is taken to mean that the year is not of the given type.
+This might be used, for example, to have a rule which applies only in
+leap years.
 .TP
 .B IN
 Names the month in which the rule takes effect.
@@ -263,6 +280,22 @@
 If this field is
 .BR \- ,
 the variable part is null.
+.IP
+Note that, as a special case, references to a rule's
+.B LETTER/S
+field
+(through a %s in a zone line's
+.B FORMAT
+field)
+for a date which predates the oldest date specified in a given rule
+will be assigned the
+.q "variable part"
+specified by the oldest
+.I standard time
+(i.e., with a
+.B SAVE
+value of zero)
+transition specified for the named rule.
 .PP
 A zone line has the form
 .sp
@@ -313,7 +346,9 @@
 .TP
 .B UNTIL
 The time at which the UTC offset or the rule(s) change for a location.
-It is specified as a year, a month, a day, and a time of day.
+This field, which is logically a single field in the sense of the
+high-level description, consists of whitespace separated subfields
+consisting of a year, a month, a day, and a time of day.
 If this is specified,
 the time zone information is generated from the given
 UTC offset and rule change until the time specified.
@@ -321,7 +356,9 @@
 columns of a rule; trailing columns can be omitted, and default to the
 earliest possible value for the missing columns.
 .IP
-The next line must be a
+The next line
+.I must
+be a
 .q continuation
 line; this has the same form as a zone line except that the
 string
--Ken Pizzini

First off, sorry for the duplicated post --- operator error. I meant to be sending this correction:
On Wed, Sep 27, 2006 at 08:34:10PM -0700, Ken Pizzini wrote:
Note that the offset that the rule transitions will continue on until the next transition occurs, no matter how far in the future of the FROM or TO years that may be.
There is a word missing there, and it is poorly worded anyway. I meant something more like:
Note that the offset which results from a rule transition specification will continue to be in effect until the next transition occurs, no matter how far in the future of the given FROM or TO years that may be.
That probably could still be worded better, but at least this version is grammatically sensible.
--Ken Pizzini

Date: Wed, 27 Sep 2006 20:34:10 -0700
From: Ken Pizzini <tz.@explicate.org>
Message-ID: <20060928033410.GA22927@866863.msa.explicate.org>
| Okay, here's what I came up with --- not so much rewording as
| adding clarifications:
I'm not sure that is the best approach - we can keep on adding more and more clarifications, and that just means more and more text to confuse people.
What's really needed is to be as precise as possible, as briefly as possible, so there's less room left for confusion or ambiguity.
| And for ADO, here is the corresponding diff to the zic.8 page:
I'm going to comment on these, as it's much easier to see what changes you're proposing this way ... I will, of course, omit most of the diff, leaving just the specific changes of concern.
| -Input lines are made up of fields.
| +Input lines are made up of
| +.B specified
| +fields.
I doubt that adding that word by itself will accomplish anything, all that leads to is "what are specified fields ?"
| +Each rule specifies one or more
| +.I transitions
| +between standard and saving time.
This one I think is the crucial sentence to add. In fact, perhaps even another sentence:
The offset from UTC that applies at any particular time can be obtained by finding the nearest preceding transition.
And even, perhaps, something about what happens before the first transition.
However, I'm not sure we should say (anywhere) "saving time", as "saving" (with respect to time, or for that matter, daylight) has nothing at all to do with what is happening (saving energy perhaps). It is probably best just to call them "standard" and "alternate" time offsets (or something like that).
| -Gives the first year in which the rule applies.
| +Gives the first year in which the transition rule applies.
Adding that word helps nothing, the earlier addition already made clear that the rules are transition rules.
| +Note that the
| +.I offset
| +that the rule transitions will continue on until the next
| +transition occurs,
| +no matter how far in the future of the
| +.B FROM
| +or
| +.B TO
| +years that may be.
(That's the one you sent alternative (better) wording for in a later message).
I don't think that's needed at all - simply saying (along with the definition of the rules as transitions) that the offset at time X is that given by the most recent transition before (or at) X should be enough.
| +This might be used, for example, to have a rule which applies only in
| +leap years.
That's OK, I guess, but I'm not sure it is needed. As long as the field isn't used, no-one much cares what it might be used for (except the very few people who write data for these files and even then only in unusual cases - most people are concerned with extracting the data). Where the field is used, its purpose is generally both well commented, and really obvious.
| +.IP
| +Note that, as a special case, references to a rule's
| +.B LETTER/S
| +field
| +(through a %s in a zone line's
| +.B FORMAT
| +field)
| +for a date which predates the oldest date specified in a given rule
| +will be assigned the
| +.q "variable part"
| +specified by the oldest
| +.I standard time
| +(i.e., with a
| +.B SAVE
| +value of zero)
| +transition specified for the named rule.
It may be better to have something in general which specifies how to process data before the first transition (and for when there are no transitions at all) - both for the zone name abbreviation, and for the offset to use, as in general we are (or should) be telling people normally to locate the preceding transition - we do need to say what to do when there is no preceding transition.
In there, the text above would explain how the abbreviation is derived (when %s is used anyway) but it wouldn't need the "Note that as a special case" part, as the whole section would be something of a special case.
| -It is specified as a year, a month, a day, and a time of day.
| +This field, which is logically a single field in the sense of the
| +high-level description, consists of whitespace separated subfields
| +consisting of a year, a month, a day, and a time of day.
I think this is the wrong way to explain this - better to explain that the final field contains whatever data is left on the input line, white space included, after all previous fields have been assigned values. Then it will be fine to say that the until field contains year month day & time.
| -The next line must be a
| +The next line
| +.I must
| +be a
Putting "must" in italics is just wrong - if it needed emphasising at all (which I don't think it does), it should be bold, not italics. Just leave that one the way it was.
kre

On Thu, Sep 28, 2006 at 01:22:09PM +0700, Robert Elz wrote:
| Okay, here's what I came up with --- not so much rewording as | adding clarifications:
I'm not sure that is the best approach - we can keep on adding more and more clarifications, and that just means more and more text to confuse people.
What's really needed is to be as precise as possible, as briefly as possible, so there's less room left for confusion or ambiguity.
Sure, but I wasn't finding a better way to remove the confusion that has been seen than just adding the "clarifications". If someone else can do a better job of making the text less susceptible to misinterpretation I'd be happy to see that text used instead.
| -Input lines are made up of fields. | +Input lines are made up of | +.B specified | +fields.
I doubt that adding that word by itself will accomplish anything, all that leads to is "what are specified fields ?"
Yeah, probably so. My intent was to try and get rid of the "there may be zero to three extra space-separated fields in a Zone line if UNTIL is specified" misunderstanding by claiming that the document only *specifies* the six fields. I was probably being optimistic in thinking that the above "magic word" would help. I like your suggestion about wording for the UNTIL field (below), which handles this problem much more neatly.
However, I'm not sure we should say (anywhere) "saving time", as "saving" (with respect to time, or for that matter, daylight) has nothing at all to do with what is happening (saving energy perhaps). It is probably best just to call them "standard" and "alternate" time offsets (or something like that).
I was just following the current convention: there are currently 79 mentions of "saving time", 28 mentions of "savings time" and zero mentions of "alternate time" in the tzcode+tzdata tree. I probably should have said "summer time" though --- 123 mentions of that term.
| -Gives the first year in which the rule applies. | +Gives the first year in which the transition rule applies.
Adding that word helps nothing, the earlier addition already made clear that the rules are transition rules.
I added it for emphasis. Emphasis is a useful thing; if one skimmed too quickly over the first mention, the modifier here would give one pause about "what is that `transition' adjective there for?"
| +Note that the | +.I offset | +that the rule transitions will continue on until the next | +transition occurs, | +no matter how far in the future of the | +.B FROM | +or | +.B TO | +years that may be.
(That's the one you sent alternative (better) wording for in a later message).
I don't think that's needed at all - simply saying (along with the definition of the rules as transitions) that the offset at time X is that given by the most recent transition before (or at) X should be enough.
Should be: maybe. But will it really be? In this very thread I've seen the misinterpretation that this sentence is addressing crop up after already explaining once about "transitions". The depth of the confusion I've seen on this point makes me think that harping on the correct understanding is a good thing.
| +This might be used, for example, to have a rule which applies only in | +leap years.
That's OK, I guess, but I'm not sure it is needed. As long as the field isn't used, no-one much cares what it might be used for (except the very few people who write data for these files and even then only in unusual cases - most people are concerned with extracting the data). Where the field is used, its purpose is generally both well commented, and really obvious.
I added it because I recall someone once being puzzled about its purpose, and I thought I'd address that as long as I was wordsmithing. The places it is currently used are the "pacificnew" zone, which seems a bit obscure, and the yearistype.sh script itself. While by the very nature of the beast, more folk will refer to this document for the purpose of interpreting the tzdata files, I think that the document should nevertheless also be suitable as stand-alone documentation for those who compose rules. Because the use of a TYPE other than "-" is so rare I felt that it would be useful to spell out its rationale in this "primary documentation" of the file format.
| +Note that, as a special case, references to a rule's
| +.B LETTER/S
| +field
| +(through a %s in a zone line's
| +.B FORMAT
| +field)
| +for a date which predates the oldest date specified in a given rule
| +will be assigned the
| +.q "variable part"
| +specified by the oldest
| +.I standard time
| +(i.e., with a
| +.B SAVE
| +value of zero)
| +transition specified for the named rule.
It may be better to have something in general which specifies how to process data before the first transition (and for when there are no transitions at all) - both for the zone name abbreviation, and for the offset to use,
My wording is bad: sure. But I think you're misrepresenting the problem being addressed. If a *Zone* transition is in the future of the first *Rule* transition, the only plausible confusion is what to substitute for a variable-text FORMAT --- the "alternate time" offset should be zero seconds away from the GMTOFF. The text above is trying to specify that the *Rule* line with the oldest date that also has a zero SAVE value is the text to use.
as in general we are (or should) be telling people normally to locate the preceding transition - we do need to say what to do when there is no preceding transition.
Other than a match-up issue between Zone and Rule transitions, there is no well-defined concept of "before the first transition". With the sole exception of handling a variable-text part of a FORMAT, a lack of *Rule* transitions preceding the first (or any other) *Zone* transitions means "there are no transitions, so this is a no-op". I guess we could emphasize that the SAVE offset from the GMTOFF would be zero seconds in this case, but that seems excessive to me.
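The special case discussed above (a %s reference that predates every transition of the named rule) can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this thread: use the LETTER/S of the oldest transition whose SAVE is zero. The `Transition` type, the `letter_before_first` helper, and the sample data are hypothetical; they are loosely modelled on the Amsterdam example and are not taken from zic or from any tzdata file.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Transition:
    year: int          # year of the transition (simplified: no month/day/time)
    save_minutes: int  # SAVE value; 0 means standard time
    letter: str        # LETTER/S text substituted for %s


def letter_before_first(transitions: List[Transition]) -> Optional[str]:
    """Return the LETTER/S to use for times before the first transition."""
    standard = [t for t in transitions if t.save_minutes == 0]
    if not standard:
        return None  # the thread does not say what to do in this case
    # Oldest standard-time transition wins, even if a SAVE != 0 line is older.
    return min(standard, key=lambda t: t.year).letter


# Hypothetical rule set, not real tzdata.
example = [
    Transition(1916, 60, "NST"),
    Transition(1916, 0, "AMT"),
    Transition(1917, 60, "NST"),
]
print("%s".replace("%s", letter_before_first(example)))
```

With this sample data the FORMAT "%s" expands to the standard-time letter even though the first line listed has a nonzero SAVE.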
| -It is specified as a year, a month, a day, and a time of day.
| +This field, which is logically a single field in the sense of the
| +high-level description, consists of whitespace separated subfields
| +consisting of a year, a month, a day, and a time of day.
I think this is the wrong way to explain this - better to explain that the final field contains whatever data is left on the input line, white space included, after all previous fields have been assigned values.
Then it will be fine to say that the until field contains year month day & time.
I agree: that's a better way.
| -The next line must be a
| +The next line
| +.I must
| +be a
Putting "must" in italics is just wrong - if it needed emphasising at all (which I don't think it does), it should be bold, not italics. Just leave that one the way it was.
*shrug* If this were HTML I'd use <em>must</em>, not <strong>must</strong>, as I just wanted to emphasise this requirement. A minor style point; I'll defer to ADO's discretion of what (if any) font change to apply there.

--Ken Pizzini

Date: Thu, 28 Sep 2006 00:29:52 -0700
From: Ken Pizzini <tz.@explicate.org>
Message-ID: <20060928072952.GA23423@866863.msa.explicate.org>

| I was just following the current convention: there are currently
| 79 mentions of "saving time", 28 mentions of "savings time" and
| zero mentions of "alternate time" in the tzcode+tzdata tree.
| I probably should have said "summer time" though --- 123 mentions
| of that term.

What's in the tree is much less important for this than what's in the doc itself - that one should be somewhat consistent. In the data files we tend to get the locally appropriate "slang" name for the clock altering effect, which is fine (so it is daylight savings time in the US, and summer time in Aust - or was until US media influences brought "daylight savings" along with them).

What is perhaps important here (for the doc) is that not all transitions are to/from whatever we call that effect - some are simply alterations from one zone offset to another (the standard time offset is altered). Referring to the transitions as all being to/from summer time (or savings time, or any other name like that) will lead to confusion when someone finds one of those transitions where a locality simply altered its definition of standard time.

| I added it for emphasis. Emphasis is a useful thing; if one skimmed
| too quickly over the first mention, the modifier here would give one
| pause about "what is that `transition' adjective there for?"

I actually disagree with the "emphasis is useful" - precision is useful, anything more than that degrades the result (for anything where technical accuracy is the objective - we're not talking literature here). Sure, very precise docs are dry, and hard to read, but anyone who does the work to read it carefully will be left in no doubt as to the meaning.
Anyone who doesn't bother to read carefully doesn't really want to know anyway, and there's no reason for us to worry about them (eg: here, the people who raised the issues have obviously read the doc very carefully, and know exactly what is there).

| Should be: maybe. But will it really be? In this very thread I've
| seen the misinterpretation that this sentence is addressing crop up
| after already explaining once about "transitions".

You're assuming that your explanation was read (and read before the misinterpretation). That's not necessarily true. I'm certainly happy to have the doc become very clear on this point, which it clearly has not been, but I think once should be enough, I don't think harping on anything actually achieves much in the way of useful results.

[the TYPE explanation]

| I added it because I recall someone once being puzzled about its
| purpose,

Sure, I understand that, and don't object to it, it's just that I am not sure that most of the readers of this doc care, and I suspect that those who do care already know, or would find out in other ways.

| The places it is currently used is the "pacificnew" zone, which seems
| a bit obscure, and the yearistype.sh script itself.

In some ways it is a pity it was yanked from the rules for Australia/Adelaide. It was used there in the early 1990's when summer time ended on the first Sunday in March in odd numbered years (and was consistent with most of the rest of Aust) but on the 3rd Sunday in March in even numbered years (and was inconsistent with everyone else). There we had a case where the TYPE specification was actually being used in a real production zone (unlike pacificnew).

However, by the mid 1990's, summer time was extended (everywhere in Aust) to the end of March, so this even/odd year distinction was irrelevant. Then, once the limits were known, the transitions could be (and now are) handled by a sequence of "only" rules, rather than using the even/odd year rule.
Since that means the yearistype script isn't needed to compile the zone file, everything gets simpler (some systems shipped zic, but not yearistype, which worked everywhere except for the australasia file).

| While by the very nature of the beast, more folk will refer to this
| document for the purpose of interpreting the tzdata files, I think that
| the document should nevertheless also be suitable as stand-alone
| documentation for those who compose rules. Because the use of a TYPE
| other than "-" is so rare I felt that it would be useful to spell out
| its rationale in this "primary documentation" of the file format.

OK, but again (harping on ... and hence probably annoying both you and everyone else...) I'm not sure that there is a need for rationale in the specification. It simply needs to specify what happens, not why someone might want to use it (if for no other reason than that giving an example like that can stifle creativity - people come to believe that this feature is for a particular purpose, and never imagine the other ways it can be used).

For example, the changes in Egypt/Syria/... recently may have been better handled by a "ramadan-at-timeshift" TYPE specifier, and some magic code in yearistype.sh to determine whether the year in question is the one that needs different summer time rules than normal.

| My wording is bad: sure.

Yes, as I said, I saw your update, and the "even better needed" part of that - I wasn't commenting upon the particular wording.

| But I think you're misrepresenting the problem being addressed.

No, I understand that.
I just think that if we make it clear that the way to handle any time conversion (given an absolute time (incl date), find what UTC offset & name applied (or applies) to it for a particular zone) by saying "find the preceding transition, and the offset and name specified there apply" which I think is how we should make it very clear that the rules are just specifying transitions, and not time ranges - then we need some general statement about what to do when there is no previous transition. If we have that, it can include this case as well as part of the general (early time) handling methods. For this I don't think it needs to matter whether the previous transition is one caused by a ZONE or a RULE - it is just "the previous transition".

Also note that it is perfectly possible to define a zone with absolutely no transitions, in fact, we have several already, consider these ...

    Zone Etc/GMT 0 - GMT
    Zone Etc/UTC 0 - UTC
    Zone Etc/UCT 0 - UCT

The fact that the offset there is 0 is (almost) just a coincidence, it is perfectly acceptable to define a zone as

    Zone America/Eastern-Standard -5:00 - EST

And since it is possible to use %s in a FORMAT, I could also do

    Zone America/Eastern-Standard -5:00 - E%sT

but I'm not real sure what that one would mean (where does the %s value come from here?) We probably need to say that %s can only be used when the RULES/SAVE field actually specifies rules (not when it is '-' or an explicit adjustment).

We should also probably do away with the "Alternatively, a slash (/) separates standard and daylight abbreviations." for the FORMAT as a particularly poor idea (fortunately, never used to my knowledge). (Do away as in from the code, as well as the doc, of course...)

| With the sole exception of handling a variable-text part of a FORMAT,
| a lack of *Rule* transitions preceding the first (or any other) *Zone*
| transitions means "there are no transitions, so this is a no-op".
| I guess we could emphasize that the SAVE offset from the GMTOFF would
| be zero seconds in this case, but that seems excessive to me.

It is a matter of how we specify how to handle the data - if we always say "look for the preceding transition", which is what I am suggesting, then we have to actually say what to do when there is none found - even if that is as simple as "zero seconds from the (first) gmtoff for the zone and the substitute for %s in a FORMAT is ..."

| *shrug* If this were HTML I'd use <em>must</em>, not <strong>must</strong>,
| as I just wanted to emphasise this requirement. A minor style point; I'll
| defer to ADO's discretion of what (if any) font change to apply there.

Minor style point perhaps (especially as half the time people read this stuff in effectively ascii anyway, where all of this is rendered in various obscure ways) - but my understanding is/was that italics are supposed to be used for quotations, Latin words, other special terms, and stuff like that, not for emphasis - for emphasis one uses underlining or bold (or bigger) font, or even all capitals.

I kind of suspect (unsupported by much in the way of evidence) that all this got confused in the days of typewriters, where there was no way to get italic font, so underlining got used instead, on the other hand, a kind of bold could be done by overstriking. Then when we went back to typesetting (or the modern equivalents) someone decided that anything which had been underlined should be set in italics - including where the underlining had been intended for emphasis, not because what was written was Latin (eg...) or a quotation.

Now we have a mess, where the conventions are largely gone, and it is hard to ascribe any meaning to what is presented, other than "that is different". Nevertheless, unless there's a very good reason, it makes sense to try and do things the right way.

kre
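The "look for the preceding transition" procedure, including the fallback when none is found, can be sketched as follows. This is a simplified illustration rather than zic's actual algorithm: transitions are keyed by calendar date only (the 02:00s time of day is ignored), the dates are the Tasmanian ones written out elsewhere in this thread, and `GMTOFF_MINUTES`, `TRANSITIONS`, `save_at` and `utc_offset_minutes` are hypothetical names. The zero-SAVE fallback is the behaviour proposed in the discussion, not something the documentation currently states.

```python
import bisect
import datetime as dt

GMTOFF_MINUTES = 10 * 60  # base offset +10:00 from the zone line

# (date of transition, SAVE in minutes), sorted by date.
TRANSITIONS = [
    (dt.date(1991, 3, 31), 0),
    (dt.date(1991, 10, 6), 60),
    (dt.date(1992, 3, 29), 0),
    (dt.date(1992, 10, 4), 60),
    (dt.date(1993, 3, 28), 0),
    (dt.date(1993, 10, 3), 60),
]


def save_at(day: dt.date) -> int:
    """SAVE given by the most recent transition at or before `day`."""
    keys = [when for when, _ in TRANSITIONS]
    i = bisect.bisect_right(keys, day)
    if i == 0:
        return 0  # no preceding transition: assume zero seconds from GMTOFF
    return TRANSITIONS[i - 1][1]


def utc_offset_minutes(day: dt.date) -> int:
    return GMTOFF_MINUTES + save_at(day)


print(utc_offset_minutes(dt.date(1991, 12, 25)))
```

Note that a date far past the last listed transition still gets that transition's offset, which is the "no matter how far in the future" behaviour argued for above.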

In early versions of the time zone package there was no support for "UNTIL" stuff; lines of the same type always had the same number of fields and there was no need to cope with field count variation. This is what's behind the documentation haziness.

Here's the minimal set of changes that might address the issue at hand.

--ado

*** zic.8 Thu Sep 28 10:12:49 2006
--- zic.8.maybe Thu Sep 28 10:11:12 2006
***************
*** 269,275 ****
  .nf
  .ti +.5i
  .ta \w'Zone\0\0'u +\w'Australia/Adelaide\0\0'u +\w'GMTOFF\0\0'u +\w'RULES/SAVE\0\0'u +\w'FORMAT\0\0'u
! Zone NAME GMTOFF RULES/SAVE FORMAT [UNTIL]
  .sp
  For example:
  .sp
--- 269,275 ----
  .nf
  .ti +.5i
  .ta \w'Zone\0\0'u +\w'Australia/Adelaide\0\0'u +\w'GMTOFF\0\0'u +\w'RULES/SAVE\0\0'u +\w'FORMAT\0\0'u
! Zone NAME GMTOFF RULES/SAVE FORMAT [UNTILYEAR [MONTH [DAY [TIME]]]]
  .sp
  For example:
  .sp
***************
*** 311,317 ****
  a slash (/)
  separates standard and daylight abbreviations.
  .TP
! .B UNTIL
  The time at which the UTC offset or the rule(s) change for a location.
  It is specified as a year, a month, a day, and a time of day.
  If this is specified,
--- 311,317 ----
  a slash (/)
  separates standard and daylight abbreviations.
  .TP
! .B UNTILYEAR [MONTH [DAY [TIME]]]
  The time at which the UTC offset or the rule(s) change for a location.
  It is specified as a year, a month, a day, and a time of day.
  If this is specified,
***************
*** 318,325 ****
  the time zone information is generated from the given
  UTC offset and rule change until the time specified.
  The month, day, and time of day have the same format as the IN, ON, and AT
! columns of a rule; trailing columns can be omitted, and default to the
! earliest possible value for the missing columns.
  .IP
  The next line must be a
  .q continuation
--- 318,325 ----
  the time zone information is generated from the given
  UTC offset and rule change until the time specified.
  The month, day, and time of day have the same format as the IN, ON, and AT
! fields of a rule; trailing fields can be omitted, and default to the
! earliest possible value for the missing fields.
  .IP
  The next line must be a
  .q continuation
***************
*** 328,338 ****
  .q Zone
  and the name are omitted, as the continuation line will
  place information starting at the time specified as the
! .B UNTIL
! field in the previous line in the file used by the previous line.
! Continuation lines may contain an
! .B UNTIL
! field, just as zone lines do, indicating that the next line is a further
  continuation.
  .PP
  A link line has the form
--- 328,338 ----
  .q Zone
  and the name are omitted, as the continuation line will
  place information starting at the time specified as the
! .q until
! information in the previous line in the file used by the previous line.
! Continuation lines may contain
! .q until
! information, just as zone lines do, indicating that the next line is a further
  continuation.
  .PP
  A link line has the form

The change to [UNTILYEAR [MONTH [DAY [TIME]]]] is much clearer for anyone parsing the file. A few other items.

TYPE
  Gives the type of year in which the rule applies.

As far as I can tell, this is always "-". Might be nice to have a note indicating the last time this was necessary. (I hope it never becomes necessary to use in the future, since that makes the file unparseable by anything but zic.)

SAVE
  Gives the amount of time to be added to local standard time when the rule is in effect. This field has the same format as the AT field (although, of course, the w and s suffixes are not used).

It doesn't mention the "u (or g or z)" suffixes. If those are also disallowed, they should be mentioned; or the text could be phrased as "of course, the letter suffixes are not used".

GMTOFF
  The amount of time to add to UTC to get standard time in this zone. This field has the same format as the AT and SAVE fields of rule lines; begin the field with a minus sign if time must be subtracted from UTC.

This is incorrect, since the AT and SAVE fields don't have the same format: I presume it is the SAVE format, so "AT and" should be deleted.

Mark

On 9/28/06, Olson, Arthur David (NIH/NCI) [E] <olsona@dc37a.nci.nih.gov> wrote:
In early versions of the time zone package there was no support for "UNTIL" stuff; lines of the same type always had the same number of

A couple more items.

There are currently only three instances where Rules are used before they are defined. I realize that it doesn't make a difference with the current zic, but for other parsers it would be nicer if Rules were only referenced after they were defined. Any possibility of getting these Rule definitions moved before their first usage in Zone lines? It only affects one file.

*** EU in Europe/London
*** EU in Europe/Dublin
*** Romania in Europe/Chisinau

A minor item. The AT field can be wall time (and is, in the default case). In all but the first rule, that is calculated from what the previous transition produced. However, in the first rule, wall time can't be defined by that, since there was no previous transition. Just to be perfectly clear, zic.8.txt should say what the interpretation is in that case (I assume that in that case, it is identical with standard time).

Mark

Date: Wed, 27 Sep 2006 17:38:45 -0700
From: Ken Pizzini <tz.@explicate.org>
Message-ID: <20060928003845.GA17660@866863.msa.explicate.org>

| I'll make an attempt at making the text clearer... but then again,
| since I understood the original text and you found it misleading,
| perhaps you'd like to take a stab at clarifying it?

I suspect the problem (like a lot of things that lead to ambiguities) is that it all depends upon your state of mind when you start (what you already believe is true). If you have it in mind that the rules are defining periods during which a particular offset from UTC (and a particular abbreviation) applies, then you're likely to read the text in an entirely different way than if you start out believing that what is being defined is a set of points at which the offset from UTC (and/or the associated abbreviation) alters.

For anyone who thinks carefully about it, the first of those two is clearly not rational, for example, consider the following two lines (rules) from some version or other of the australasia file (this might not be current, it's just a version I had conveniently lying around)

    Rule AT 1991 1999 - Oct Sun>=1 2:00s 1:00 -
    Rule AT 1991 max - Mar lastSun 2:00s 0 -

If you believe the "specifies a range of times during which an offset applies" is the correct interpretation, then the first of those rules says that from some Sunday early Oct 1991 (the 6th it happened to be) until some Sunday, early Oct 1999, the offset from UTC (for Tasmania) should have been +11:00 (the base offset is +10:00). The second rule says that from some Sunday late March 1991 (the 31st that year), into the unknown indefinite future, the offset from UTC is +10:00.

What that would have to mean is that all during 1992, 1993, ... there were two offsets defined to run concurrently. That would be absurd, so, proof by contradiction (reductio ad absurdum -- or something like that) the original hypothesis must be incorrect.
On the other hand, if you treat the rules as simply saying

    Mar 31 1991 (02:00s) change offset to 10:00
    Oct 6 1991 (02:00s) change offset to 11:00
    Mar 29 1992 (02:00s) change offset to 10:00
    Oct 4 1992 (02:00s) change offset to 11:00
    Mar 28 1993 (02:00s) change offset to 10:00
    Oct 3 1993 (02:00s) change offset to 11:00
    (etc)

That is, as a shorthand notation for writing all of that out (which would also be possible, of course), then it all fits perfectly well, we have the transitions, and the offset (and abbreviation) between any two transitions, and the offset (& abbreviation) after the last transition, and even what applies before the first transition is all trivial to obtain.

If the zic.8 text needs clarifying, perhaps what is needed is not any kind of change to the text that has been suggested, but to make it quite clear that a list of transitions is what is being specified, not a list of ranges of times (those of us who have "grown up" alongside the development of the database simply know this, but it is apparently not as clear to those who have started looking at it more recently).

On spaces separating fields, I suspect the answer is that it all works the same way as the (unix shell) read command - white space separates fields, until we have as many fields as we need - after that all the rest of the input line (including anything which would otherwise be a separator) all just gets included in the value of the final field.

So, to the unix shell

    echo a b c | read var

puts (aside from problems of using "read" from a pipe) "a b c" into var.

    echo a b c | read v1 v2

puts "a" into v1, and "b c" into v2, and

    echo a b c | read v1 v2 v3 v4

puts "a" into v1, "b" into v2, "c" into v3 and "" (empty) into v4.

In the database source format, the "until" is the final field (would be the last name on the "read" command, if there were one), so if this parsing method is assumed, then the "spaces in until field are OK" all just works out...
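The read-style field splitting described above maps directly onto Python's `str.split` with a `maxsplit` argument. The sketch below is only an illustration of that parsing model; `read_fields` is a hypothetical helper, not a zic function, and the assumption that a continuation line has three fixed fields followed by the "until" remainder is taken from the Zone examples quoted in this thread.

```python
from typing import List


def read_fields(line: str, count: int) -> List[str]:
    """Split `line` like the shell's `read` with `count` variable names.

    White space separates the first count-1 fields; whatever is left,
    internal white space included, becomes the final field. Missing
    fields are returned as empty strings.
    """
    parts = line.strip().split(None, count - 1)
    return parts + [""] * (count - len(parts))


print(read_fields("a b c", 1))
print(read_fields("a b c", 2))
print(read_fields("a b c", 4))

# A continuation line: GMTOFF, RULES/SAVE, FORMAT, then the "until" remainder.
print(read_fields("0:19:32 Neth %s 1937 Jul 1", 4))
```

The last call leaves "1937 Jul 1" intact as a single final field, which is the behaviour the "until" information relies on.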
It is also certainly not really harder to parse, the parsing method simply finds the first N fields (delimited by white space) and leaves whatever is left over (if anything) as the final field - that's trivial to code. Explaining it is also not really difficult either - though perhaps a few extra words would help, making it clear that the line has a fixed maximum number of fields, and any excess data is all part of the final field (white space included).

kre
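The reading of the two "Rule AT" lines as shorthand for a list of transitions, given earlier in this message, can be checked mechanically. The following is a rough sketch, not zic's implementation: it only handles the two ON forms used in that example ("Sun>=1" and "lastSun"), ignores the 2:00s time of day, and hard-codes the base offset of +10:00. The helper names are hypothetical.

```python
import datetime as dt


def first_sunday_on_or_after(year, month, day):
    """Date of the first Sunday on or after year-month-day (the Sun>=N form)."""
    d = dt.date(year, month, day)
    return d + dt.timedelta(days=(6 - d.weekday()) % 7)


def last_sunday(year, month):
    """Date of the last Sunday in the given month (the lastSun form)."""
    first_of_next = dt.date(year + (month == 12), month % 12 + 1, 1)
    d = first_of_next - dt.timedelta(days=1)  # last day of the month
    return d - dt.timedelta(days=(d.weekday() - 6) % 7)


def expand(first_year, last_year):
    """List (date, offset) transitions implied by the two Rule AT lines."""
    out = []
    for year in range(first_year, last_year + 1):
        if year >= 1991:           # Rule AT 1991 max  - Mar lastSun 2:00s 0
            out.append((last_sunday(year, 3), "10:00"))
        if 1991 <= year <= 1999:   # Rule AT 1991 1999 - Oct Sun>=1 2:00s 1:00
            out.append((first_sunday_on_or_after(year, 10, 1), "11:00"))
    return sorted(out)


for when, offset in expand(1991, 1993):
    print(when.strftime("%b %d %Y"), "change offset to", offset)
```

Expanding 1991 through 1993 reproduces the six dates listed by hand in the message.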
participants (5)
- Ken Pizzini
- Mark Davis
- Olson, Arthur David (NIH/NCI) [E]
- Paul Schauble
- Robert Elz