Hi, I'm working on compiling the time zone data into an XML file for easier handling in a program. I noticed some inconsistencies between the format of the files and the description in zic.8.txt. For example, in the latest version of the Africa file, line 93 has an extra tab at the beginning. Can anyone confirm this? I've attached screenshots of what I am seeing. Thanks, Colin
Colin Bowern is not on the time zone mailing list; direct replies appropriately. --ado
________________________________ From: Colin Bowern [mailto:Colin.Bowern@officialcommunity.com] Sent: Wednesday, July 19, 2006 1:37 PM To: tz@lecserver.nci.nih.gov Subject: Inconsistent format of data files?
Like the good book (the zic manual page) says... Input lines are made up of fields. Fields are separated from one another by any number of white space characters. Leading and trailing white space on input lines is ignored. So, by definition, there can't be "extra" tabs at the beginnings of lines. While we could make the stuff in the time zone package more consistent, there are presumably files out in the wild created by other folks that wouldn't match whatever consistent pattern we settled on. The safest course for developers is to parse liberally in accordance with the manual page. --ado
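Parsing liberally in this sense takes only a few lines. The following is a minimal Python sketch, not zic's own parser. It assumes the rules from the zic manual page quoted in this thread (any run of white space separates fields, leading and trailing white space is ignored, double quotes protect white space and `#` inside a field) plus the manual's rule that an unquoted `#` starts a comment running to the end of the line. The function name `split_fields` is hypothetical.

```python
def split_fields(line: str) -> list[str]:
    """Split one zic input line into fields (hypothetical helper).

    Any run of white space separates fields, leading/trailing white
    space is ignored, an unquoted '#' starts a comment, and double
    quotes protect white space and '#' inside a field.
    """
    fields: list[str] = []
    current: list[str] = []
    in_field = False   # True once the current field has started
    in_quotes = False  # True between a pair of double quotes
    for ch in line:
        if in_quotes:
            if ch == '"':
                in_quotes = False  # closing quote; the field may continue
            else:
                current.append(ch)  # white space and '#' kept verbatim
        elif ch == '"':
            in_quotes = True
            in_field = True  # so that "" yields an empty field
        elif ch == "#":
            break  # unquoted '#': the rest of the line is a comment
        elif ch.isspace():
            if in_field:
                fields.append("".join(current))
                current = []
                in_field = False
        else:
            current.append(ch)
            in_field = True
    if in_quotes:
        raise ValueError("unterminated quoted field: " + repr(line))
    if in_field:
        fields.append("".join(current))
    return fields
```

With this approach a leading tab like the one reported in the Africa file simply disappears: `split_fields("\tZone\tAntarctica/Vostok\t0\t-\tzzz\t1957 Dec 16")` yields eight fields, the first being `Zone`.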
Hi Arthur, I've trimmed leading and trailing white space from the input in the latest iteration. The problem comes into play when white space separates fields, yet some fields contain several pieces of data separated by spaces. The zic manual page says: "White space characters and sharp characters may be enclosed in double quotes (") if they're to be used as part of a field." Relating to that statement, the problem is a Zone record such as:
    Zone Antarctica/Vostok 0 - zzz 1957 Dec 16
In this example the final Until field contains white space but is not enclosed in double quotes. If all fields were tab-separated, the line above would be easy to interpret; but if separation is that liberal, I would think the Until field should be wrapped in double quotes. Thoughts? Thanks, Colin
-----Original Message----- From: Olson, Arthur David (NIH/NCI) [E] [mailto:olsona@dc37a.nci.nih.gov] Sent: Thursday, July 20, 2006 4:20 PM To: Colin Bowern; tz@elsie.nci.nih.gov Subject: RE: Inconsistent format of data files?
I'm working on a similar project myself, and I think the answer is that, technically, the year, month, and day are three separate fields. One good reason I can see to extend the format is greater localizability. Currently, the tz file can only handle one abbreviation per zone per transition, while CLDR takes no account of changes and allows only the (one or two) current abbreviations to be localized. If you want to localize a date stamp and the location has changed rules since then, the current model fails.
-----Original Message----- From: Colin Bowern [mailto:Colin.Bowern@officialcommunity.com] Sent: Thu 20 July 2006 16:29 To: Olson, Arthur David (NIH/NCI) [E]; tz@lecserver.nci.nih.gov Subject: RE: Inconsistent format of data files?
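The reading above, that year, month and day are separate fields, can be made concrete: a Zone line is read positionally, and whatever follows the FORMAT field is the optional UNTIL part, given as up to four separate fields (year, month, day, time of day), so no quoting is needed. A minimal Python sketch, assuming the line has already been split into fields and that the keyword is spelled exactly `Zone`; the function name and dictionary keys are hypothetical.

```python
def parse_zone_fields(fields: list[str]) -> dict[str, object]:
    """Interpret the white-space-separated fields of a Zone line.

    Layout assumed: Zone NAME OFFSET RULES FORMAT [YEAR [MONTH [DAY [TIME]]]]
    The UNTIL part is not one quoted field; its components are
    separate fields, so it is simply whatever follows FORMAT.
    """
    if len(fields) < 5 or fields[0] != "Zone":
        raise ValueError(f"not a Zone line: {fields!r}")
    until = fields[5:]
    if len(until) > 4:
        raise ValueError(f"too many UNTIL fields: {until!r}")
    return {
        "name": fields[1],
        "offset": fields[2],
        "rules": fields[3],
        "format": fields[4],
        "until": until,  # [] when the zone has no continuation lines
    }
```

For the Vostok record quoted above, `parse_zone_fields("Zone Antarctica/Vostok 0 - zzz 1957 Dec 16".split())` puts `['1957', 'Dec', '16']` under `until`.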
Hi Colin, The intention was to make zone files human-readable. Presumably there are users out there who consider it a good read ;) I'll write a PHP script to convert zone files into a more machine-readable format: http://php-tz.110mb.com/ I just created the account and will probably have something online by the end of the week. One thing you're right about, though: using dashes instead of spaces in date/time stamps would not decrease the files' readability but would help some of us considerably. Srdjan
Hi Srdjan, Thanks, I'll take a look at that. I've got a suggestion from Arthur that I'm going to try this weekend. I've almost got it working as far as reading the data in and writing it back out as XML. I'll be posting my source on CodePlex.com as soon as the project is created. Cheers, Colin
-----Original Message----- From: Srdjan Krajnalic [mailto:ludiskr@yahoo.com] Sent: Friday, July 21, 2006 3:49 AM To: tz@lecserver.nci.nih.gov Subject: RE: Inconsistent format of data files?
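For the XML side of Colin's project, one possible shape is sketched below with Python's standard `xml.etree.ElementTree`. The element and attribute names (`zone`, `period`, `until`, `year` and so on) are hypothetical, not an established schema. The sketch assumes each Zone entry has already been split into the fields that follow `Zone NAME` on its first line and on each continuation line. Keeping year, month, day and time as separate attributes preserves the field structure discussed in this thread instead of folding UNTIL into one string.

```python
import xml.etree.ElementTree as ET

UNTIL_PARTS = ("year", "month", "day", "time")

def zone_to_xml(name: str, periods: list[list[str]]) -> ET.Element:
    """Build a <zone> element for one Zone entry (hypothetical schema).

    Each item of periods is one line of the entry without the leading
    'Zone NAME': OFFSET RULES FORMAT followed by 0-4 UNTIL fields.
    """
    zone = ET.Element("zone", name=name)
    for fields in periods:
        offset, rules, fmt, *until = fields
        period = ET.SubElement(zone, "period", offset=offset, rules=rules, format=fmt)
        if until:
            # One attribute per UNTIL component rather than one string.
            until_el = ET.SubElement(period, "until")
            for part, value in zip(UNTIL_PARTS, until):
                until_el.set(part, value)
    return zone
```

Serializing is then a single call, e.g. `ET.tostring(zone_to_xml("Antarctica/Vostok", [["0", "-", "zzz", "1957", "Dec", "16"]]), encoding="unicode")`.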
Attached should be all the zone data in tab-separated format.
Rules: 2854 lines
Zones: 1831 lines
Links: 119 lines
Total: 4804 lines
Are any stats available to verify this? That host might be little more than a phishing scheme O;-) Any thoughts on free PHP hosting? I cannot make the site public on my company's servers.
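The question above about verifying the line counts could be approached with a small tally over the source files. A minimal Python sketch, under stated assumptions: Zone continuation lines count toward the Zones total (the message does not say how they were counted), quoted `#` characters do not occur, and keywords are spelled exactly `Rule`, `Zone` and `Link`. The function name `count_records` is hypothetical, and nothing here claims to reproduce the figures above.

```python
from collections import Counter
from collections.abc import Iterable

def count_records(lines: Iterable[str]) -> Counter:
    """Tally Rule, Zone and Link lines in zic source text.

    Comments and blank lines are skipped; Zone continuation lines are
    counted as Zone lines; anything else is tallied as 'other'.
    """
    counts = Counter()
    in_zone = False  # True while continuation lines may follow
    for raw in lines:
        fields = raw.split("#", 1)[0].split()
        if not fields:
            continue  # blank or comment-only line
        keyword = fields[0]
        if keyword in ("Rule", "Link"):
            counts[keyword] += 1
            in_zone = False
        elif keyword == "Zone":
            counts["Zone"] += 1
            in_zone = len(fields) > 5  # an UNTIL part means more lines follow
        elif in_zone:
            counts["Zone"] += 1  # continuation line
            in_zone = len(fields) > 3
        else:
            counts["other"] += 1
    return counts
```

Counters can be added, so per-file results can be combined with `sum(per_file_counts, Counter())`.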
Attached is a Perl script to convert the tzdata source file format to a simple tab-delimited file format. I place it in the public domain. It isn't as robust as zic in its parsing, but it should handle all well-formed input fine. ADO: if you think it is useful, feel free to add it to the tzcode distribution. And if not, don't. :-) --Ken Pizzini
On Fri, Jul 21, 2006 at 05:49:34PM -0700, Ken Pizzini wrote:
> Attached is a perl script to convert the tzdata source file format to a simple tab-delimited file format.
I hate when I do that... Okay, this time I'm *really* attaching it... --Ken Pizzini
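Ken's script itself is not reproduced in this archive, so its exact output format is unknown. As a rough, hypothetical analogue, here is a Python sketch of the same idea: it strips comments, splits on white space, and writes one tab-separated row per record, repeating `Zone` and the zone name on continuation lines so every row stands alone. Like Ken's script, it is less robust than zic: it assumes no quoted fields and exact keyword spelling. The function name `to_tsv` is hypothetical.

```python
from collections.abc import Iterable

def to_tsv(lines: Iterable[str]) -> str:
    """Convert zic source lines to tab-separated rows (hypothetical helper).

    Comments and blank lines are dropped; each remaining line becomes
    one row, and Zone continuation lines get 'Zone' and the zone name
    prepended so every row stands alone. Assumes no quoted fields.
    """
    rows = []
    zone_name = None  # zone whose continuation lines may follow
    for raw in lines:
        fields = raw.split("#", 1)[0].split()
        if not fields:
            continue
        if fields[0] == "Zone":
            rows.append(fields)
            # An UNTIL part (more than 5 fields) means a continuation follows.
            zone_name = fields[1] if len(fields) > 5 else None
        elif zone_name is not None and fields[0] not in ("Rule", "Link"):
            rows.append(["Zone", zone_name] + fields)
            if len(fields) <= 3:  # no UNTIL: this was the zone's last line
                zone_name = None
        else:
            rows.append(fields)
            zone_name = None
    return "".join("\t".join(row) + "\n" for row in rows)
```

Because the UNTIL components land in separate columns, a consumer can rejoin them however it likes, for example with dashes as Srdjan suggested.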
Hi Ken, Your script is included here: http://members.lycos.co.uk/phptz/ Because this is a free hosting account, checking the tzdata diff cannot be automated, but I'll check once or twice a week and update if necessary. Could you please send me your tab-separated file directly so I can check it against my results? Cheers
-----Original Message----- From: Ken Pizzini [mailto:tz.@explicate.org] Sent: Saturday, July 22, 2006 2:50 AM To: tz@lecserver.nci.nih.gov Subject: Re: Tab separated tz data files
participants (5)
- Andy Lipscomb
- Colin Bowern
- Ken Pizzini
- Olson, Arthur David (NIH/NCI) [E]
- Srdjan Krajnalic