Help With Understanding the Binary Files
Could somebody please help me interpret the tz binary files?

I'm having trouble understanding what to do with the two arrays that follow the leap second array.

1. tzfile.h says that they're "indexed by type"; but I don't see anything in the "type" (which I take to mean what tzfile.5 calls "struct ttinfo") that could be used as such an index. Does "indexed by type" mean "having the same index as the type", in other words, the value from the array following the transition time array?

2. tzfile.h seems to imply that these booleans indicate how I should interpret the values in the array of transition times. Is that correct? If so, how do I calculate standard time while DST is being observed given that the SAVE column in the Rule has been lost? Do I assume that the offset for DST is always exactly 1 hour? What about solar89, etc.?

3. tzfile.5 says that these "are used when a time zone file is used in handling POSIX-style time zone environment variables." Does that mean that the "transition times" are the ones in the TZ variable, not the array of transition times in the file (so my question 2 is moot)?

I'm very confused. 8-(

Thanks,

--Bill Seymour
On Dec 9, 2009, at 6:09 AM, Bill Seymour wrote:
> Could somebody please help me interpret the tz binary files?
> I'm having trouble understanding what to do with the two arrays that follow the leap second array.
> 1. tzfile.h says that they're "indexed by type"; but I don't see anything in the "type" (which I take to mean what tzfile.5 calls "struct ttinfo") that could be used as such an index. Does "indexed by type" mean "having the same index as the type", in other words, the value from the array following the transition time array?

Yes. The "number of local time types" is what's stored in the tzh_typecnt field, so the array of structures containing

    coded UTC offset in seconds
    tm_isdst value
    abbreviation list index

is indexed by the local time type (as it has tzh_typecnt elements), and the array following the transition time array is an array of "types of local time starting at above".

See also "man tzfile" on UN*X systems that use the Olson code (and that bother to install the man page).
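To make "indexed by type" concrete, here is a minimal Python sketch. The data is made up for illustration, and the names `TTInfo`, `local_time_type`, `isstd`, and `isgmt` are hypothetical, not taken from tzfile.h. Each transition carries a small integer, and that same integer selects an entry in the ttinfo array and in the two indicator arrays.

```python
from bisect import bisect_right
from typing import NamedTuple


class TTInfo(NamedTuple):
    utoff: int    # coded UTC offset in seconds
    isdst: int    # tm_isdst value
    abbrind: int  # abbreviation list index


def local_time_type(when, transition_times, transition_types):
    """Return the local time type (an index) in effect at `when`."""
    i = bisect_right(transition_times, when) - 1
    if i < 0:
        # Before the first transition. This sketch simply picks type 0,
        # which is a simplification of what real readers do.
        return 0
    return transition_types[i]


# Made-up data: two local time types and three transitions.
ttinfos = [TTInfo(-18000, 0, 0), TTInfo(-14400, 1, 4)]
transition_times = [1000, 2000, 3000]
transition_types = [1, 0, 1]  # one type index per transition time

# The two arrays after the leap second array run parallel to ttinfos:
isstd = [0, 0]  # standard/wall indicator for each local time type
isgmt = [0, 0]  # UTC/local indicator for each local time type

t = local_time_type(1500, transition_times, transition_types)
info, std_flag, gmt_flag = ttinfos[t], isstd[t], isgmt[t]
```

The point of the sketch is only that `t` is used three times: once for `ttinfos` and once for each indicator array.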
> 2. tzfile.h seems to imply that these booleans indicate how I should interpret the values in the array of transition times. Is that correct? If so, how do I calculate standard time while DST is being observed given that the SAVE column in the Rule has been lost? Do I assume that the offset for DST is always exactly 1 hour?
No. Presumably "standard time", at a time when DST is being observed, is what the time would have been had the most recent standard time -> DST transition not happened (note that the offset between GMT and *standard* time in a particular time zone can change over time, so it's not as if a given location has to remain at the same offset from GMT forever), so you should get the offset from GMT for the entry prior to the current entry.
> 3. tzfile.5 says that these "are used when a time zone file is used in handling POSIX-style time zone environment variables." Does that mean that the "transition times" are the ones in the TZ variable,

No.

It means that the TZ variable can be in one of the following forms:

:{pathname}, in which case the part of TZ after the : is the pathname of the time zone file to use - absolute if it begins with /, relative to the directory containing the zoneinfo files if it doesn't begin with /;

{something not beginning with :}, in which case the value of TZ in its entirety is used as a pathname, and if the pathname (again, absolute if it begins with /, relative to the directory containing the zoneinfo files if it doesn't begin with /) refers to a time zone file that can be read, that file is used, otherwise it's used as a POSIX-style time zone setting.

If it's used as a POSIX-style time zone setting rather than as a file name, and the setting of TZ includes no transition rules, the time zone file "posixrules", in the directory containing the zoneinfo files, is used to specify when transitions between standard time and DST happen. See the tzset man page:

    rule    Indicates when to change to and back from summer time. The rule has the form:

                date/time,date/time

            where the first date describes when the change from standard to summer time occurs and the second date describes when the change back happens. Each time field describes when, in current local time, the change to the other time is made.

            The format of date is one of the following:

            Jn      The Julian day n (1 <= n <= 365). Leap days are not counted; that is, in all years -- including leap years -- February 28 is day 59 and March 1 is day 60. It is impossible to explicitly refer to the occasional February 29.

            n       The zero-based Julian day (0 <= n <= 365). Leap days are counted, and it is possible to refer to February 29.

            Mm.n.d  The d'th day (0 <= d <= 6) of week n of month m of the year (1 <= n <= 5, 1 <= m <= 12, where week 5 means ``the last d day in month m'', which may occur in either the fourth or the fifth week). Week 1 is the first week in which the d'th day occurs. Day zero is Sunday.

            The time has the same format as offset except that no leading sign (`-') or (`+') is allowed. The default, if time is not given, is 02:00:00.

    If no rule is present in the TZ specification, the rules specified by the tzfile(5)-format file posixrules in the system time conversion information directory are used, with the standard and summer time offsets from UTC replaced by those specified by the offset values in TZ.
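The decision order described above can be sketched in Python roughly as follows. This is an illustration, not the Olson code: `resolve_tz`, `ZONEINFO_DIR`, and the `is_readable` hook are hypothetical names, the directory is an assumed default that varies by system, and "has a rule" is approximated as "contains a comma", since the rule part of a POSIX TZ string follows a comma.

```python
import os
import posixpath

ZONEINFO_DIR = "/usr/share/zoneinfo"  # assumed location; varies by system


def resolve_tz(tz, zoneinfo_dir=ZONEINFO_DIR, is_readable=os.path.isfile):
    """Classify a TZ value as ("file", path) or ("posix", rules_file_or_None)."""

    def full_path(name):
        # Absolute if it begins with /, else relative to the zoneinfo directory.
        if name.startswith("/"):
            return name
        return posixpath.join(zoneinfo_dir, name)

    if tz.startswith(":"):
        # Form 1: everything after the colon is the pathname.
        return ("file", full_path(tz[1:]))

    # Form 2: try the whole value as a pathname first.
    path = full_path(tz)
    if is_readable(path):
        return ("file", path)

    # Otherwise it is a POSIX-style setting. Approximation: a rule, if
    # present, follows a comma. Without one, posixrules supplies the
    # transition times.
    if "," in tz:
        return ("posix", None)
    return ("posix", full_path("posixrules"))
```

With no readable file named `EST5EDT`, `resolve_tz("EST5EDT", "/zi", lambda p: False)` yields `("posix", "/zi/posixrules")`, while `"EST5EDT,M3.2.0,M11.1.0"` carries its own rule and needs no rules file.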
(*sheesh*)

I was NOT asking about the number of local time types.

I was NOT asking about the difference between standard time and DST.

I was NOT asking how to make POSIX TZ strings.

I WAS asking about the two arrays that follow the leap second array in the zoneinfo binaries... what does it mean to say that they're "indexed by type", and what the heck do I do with the data in them?

--Bill
On Dec 9, 2009, at 10:15 AM, Bill Seymour wrote:
> I was NOT asking about the number of local time types.

You were asking what the index was. I was indicating how you can infer, from what tzfile.h says, that "indexed by type" does, in fact, mean "having the same index as the type"; that inference involves noticing that tzh_typecnt is the number of local time types. I also indicated that the tzfile man page could be used; it says

    Then there are tzh_ttisstdcnt standard/wall indicators, each stored as a one-byte value; they tell whether the transition times associated with local time types were specified as standard time or wall clock time, and are used when a time zone file is used in handling POSIX-style time zone environment variables.

    Finally there are tzh_ttisgmtcnt UTC/local indicators, each stored as a one-byte value; they tell whether the transition times associated with local time types were specified as UTC or local time, and are used when a time zone file is used in handling POSIX-style time zone environment variables.

but that's unfortunately incomplete. In fact, tzh_ttisstdcnt and tzh_ttisgmtcnt must either be zero or equal to tzh_typecnt (this is enforced by the code that reads the file). If tzh_ttisstdcnt is zero, the array is implicitly "all specified as wall-clock time"; if tzh_ttisgmtcnt is zero, the array is implicitly "all specified as local time".
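As a rough illustration of where the two arrays sit and how the "zero or tzh_typecnt" constraint plays out, here is a Python sketch that reads only the version-1 data block of a tzfile. The function name `read_indicators` is hypothetical; the layout assumed is the documented one (a 44-byte header with six big-endian four-byte counts, then transition times, type indices, ttinfo entries, abbreviation characters, leap second pairs, and finally the two indicator arrays). Files with a later version byte carry further data after this block, which the sketch ignores.

```python
import struct

# magic, version byte, 15 reserved bytes, then six 4-byte counts in file order:
# tzh_ttisgmtcnt, tzh_ttisstdcnt, tzh_leapcnt, tzh_timecnt, tzh_typecnt, tzh_charcnt
HEADER = struct.Struct(">4sc15x6L")


def read_indicators(data):
    """Return (isstd, isgmt), each with one 0/1 entry per local time type."""
    (magic, _version, isgmtcnt, isstdcnt,
     leapcnt, timecnt, typecnt, charcnt) = HEADER.unpack_from(data, 0)
    if magic != b"TZif":
        raise ValueError("not a tzfile")
    for cnt in (isstdcnt, isgmtcnt):
        if cnt not in (0, typecnt):
            raise ValueError("indicator count must be 0 or tzh_typecnt")

    # Skip everything that precedes the indicator arrays.
    offset = HEADER.size
    offset += timecnt * 4  # transition times
    offset += timecnt      # one type index per transition
    offset += typecnt * 6  # ttinfo: 4-byte offset, isdst byte, abbr index byte
    offset += charcnt      # abbreviation characters
    offset += leapcnt * 8  # leap second pairs (4 + 4 bytes in this block)
    if len(data) < offset + isstdcnt + isgmtcnt:
        raise ValueError("truncated file")

    # A zero count means "all wall-clock" / "all local", i.e. all zeros.
    isstd = list(data[offset:offset + isstdcnt]) or [0] * typecnt
    offset += isstdcnt
    isgmt = list(data[offset:offset + isgmtcnt]) or [0] * typecnt
    return isstd, isgmt


# Synthetic file: 1 transition, 2 types, standard/wall flags present,
# UTC/local flags absent (so they default to all zeros).
sample = (
    HEADER.pack(b"TZif", b"\0", 0, 2, 0, 1, 2, 8)
    + struct.pack(">l", 1000) + bytes([1])
    + struct.pack(">lBB", -18000, 0, 0) + struct.pack(">lBB", -14400, 1, 4)
    + b"EST\0EDT\0"
    + bytes([1, 0])
)
isstd, isgmt = read_indicators(sample)
```

Either way the caller ends up with two lists of length tzh_typecnt, so they can be indexed by local time type without special cases.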
> I was NOT asking about the difference between standard time and DST.

You said

    If so, how do I calculate standard time while DST is being observed
If by "how do I calculate standard time while DST is being observed" you mean that you have a time_t value and you want to find out, within a given time zone, what the time would be if DST *weren't* being observed, that means that you need the time zone offset for the first entry before the current entry that is an entry for standard time rather than DST (which, contrary to my previous mail, isn't *necessarily* the previous entry, as there's no guarantee that the previous entry won't be for DST as well - unlikely, but not impossible).
> I was NOT asking how to make POSIX TZ strings.
You were asking what "are used when a time zone file is used in handling POSIX-style time zone environment variables" means, and I was explaining under what circumstances a time zone file is used in handling POSIX-style time zone environment variables - it's used when a POSIX TZ string doesn't give any rules. I gave the information of what a "rule" is to clarify what "doesn't give any rules" means - a string such as "EST5EDT" doesn't give any rules. That also means that the Booleans are not used for time zone files other than the posixrules file. It does *not* mean that, in all cases, the transition times are the ones in the TZ variable; the only time when the transition times are the ones in the TZ variable is when the TZ variable includes a rule or rules, not when it refers to a file or when it has a setting such as "PST8PDT" with no rules.
Sorry, I guess I should have searched your answers more carefully. In your first answer, for example, the significant word is "Yes"; the rest I already knew. Since I didn't understand why you'd include all the rest, I wasn't sure that the "Yes" answered the question I was asking. I see now that it did.

As for going backwards to find the standard time, I don't think that's valid. Consider the following excerpts from the source for America/Indiana/Vincennes:

    -5:00  -   EST   2006 Apr  2 2:00
    -6:00  US  C%sT  2007 Nov  4 2:00

After 02:00 on 2006 Apr 2, the standard time should be UTC-6; but if I go to the most recent non-DST entry (the actual previous one in this case), I get UTC-5.

And I see that I escaped too early from your third answer as well. 8-)

My current understanding is that, if the TZ string is POSIX-style (as evidenced by failure to open a file of that name), and if the TZ string has no rule, then I use the posixrules file; and only then do I worry about the two arrays I'm asking about. In this case, the entries in these arrays tell me how to interpret the values in the array of tzh_timecnt transition times. Is that correct?

Thanks,

--Bill
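The Vincennes objection can be made concrete with a small Python sketch. The table below is built by hand from the two source lines quoted above plus the US rules for 2006; it is not read from a compiled file, and the first transition at time 0 is a stand-in for whichever earlier transition began the EST period. The names `TTInfo` and `standard_offset_by_walkback` are hypothetical.

```python
from bisect import bisect_right
from calendar import timegm
from typing import NamedTuple


class TTInfo(NamedTuple):
    utoff: int  # UTC offset in seconds
    isdst: int  # tm_isdst value
    abbr: str


def standard_offset_by_walkback(when, transition_times, transition_types, ttinfos):
    """Walk back from the current entry to the nearest non-DST entry."""
    i = bisect_right(transition_times, when) - 1
    while i >= 0:
        info = ttinfos[transition_types[i]]
        if not info.isdst:
            return info.utoff
        i -= 1
    return None  # no standard-time entry at or before `when`


ttinfos = [
    TTInfo(-18000, 0, "EST"),
    TTInfo(-18000, 1, "CDT"),
    TTInfo(-21600, 0, "CST"),
]
transition_times = [
    0,                                # stand-in start of the EST period
    timegm((2006, 4, 2, 7, 0, 0)),    # 02:00 EST: zone becomes -6:00 with DST on
    timegm((2006, 10, 29, 7, 0, 0)),  # 02:00 CDT: DST ends, CST begins
]
transition_types = [0, 1, 2]

june = timegm((2006, 6, 15, 12, 0, 0))
walked_back = standard_offset_by_walkback(
    june, transition_times, transition_types, ttinfos)
```

For mid-June 2006 the walk-back lands on the EST entry and returns -18000 (UTC-5), whereas the zone's standard offset at that date is UTC-6. The compiled entries alone do not record that the zone's base offset changed at the same instant DST began.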
I've finally gotten around to doing some analysis on actual zoneinfo information files; and I've determined empirically that tzh_ttisstdcnt and tzh_ttisgmtcnt are either zero or tzh_typecnt, as Guy Harris recently said.

But I've also found lots of files, not just posixrules, in which the UTC/local and/or standard/wall indicators exist and are non-zero; so I'm still confused. Do I simply ignore them in all files except posixrules (as Harris seemed to imply, but I might have misread it), or do I need to reinterpret the transition times based on these indicators when they exist?

(If the latter, that seems to be the less desirable design. Since the files are used much more often than they're created, wouldn't it be reasonable to compute the wall clock times eagerly rather than lazily? Note also that, while creating the files, we have enough information to do that correctly. At the time the file is read, we've lost the DST offset amount and have only a switch indicating /whether/ we're observing DST.)

Thanks,

--Bill Seymour
For nearly all purposes, the UTC/local and standard/wall indicators can indeed be ignored. They're only important when a file is used (by creating a link from "posixrules" to it) as the basis for handling POSIX-style TZ environment variables; they control how the instants stored in the file are mapped to the instants when DST begins and ends.

--ado
participants (3)
- Bill Seymour
- Guy Harris
- Olson, Arthur David (NIH/NCI) [E]