Recently, this mailing list has discussed the perceived need for standardized time zone abbreviations. There has also been some talk about putting the boundaries of time zones into a GIS (Geographic Information System) format. Several people have volunteered to help with that task. These issues are not new; they've come up repeatedly in the past.
I've just now posted a few sample Web pages to display data from the tz database in a more accessible format. This applies particularly to the geographic extent of time zones, which is hidden in comments in tzdata2000h. I invite anyone who is interested to visit http://www.mindspring.com/~gwil/tz.html and browse around.
Questions to keep in mind: Are the sample pages the best way of displaying the data? Would they be a useful resource for the other tasks mentioned above? Is it feasible to maintain a Web site like this in synch with the tz database?
Gwillim Law
Gwillim, I'm curious how you produced these HTML files? I'm currently working on a little program that will convert the tz data into XML so that it can be more useful in applications like yours that want the raw data, not the binary results of zic.
Chris
----- Original Message -----
From: "Gwillim Law" <gwil@mindspring.com>
To: <tz@elsie.nci.nih.gov>
Sent: Wednesday, February 28, 2001 1:02 AM
Subject: Html-ize the tz database?
Gwillim, I'm curious how you produced these HTML files?
Manually. I started by taking a copy of tzdata2000h and editing it with a text editor. Several reasons for this: I wanted to get a feel for the data; to experiment with different ways of organizing them; to capture whatever useful information I could find in the comments; and to spot any inconsistencies or holes in the data. There are plenty of applications where it makes sense to automate the process, but I think it should be done manually at least once.
Yours, Gwillim Law
Gwillim/Chris/All,
Most of the work is parsing the tzdata file in its current form. Once you do this, you can present the data in any form you like.
I had sent this earlier to a limited group working on parsing tzdata, but received no feedback. The following mail describes the difficulty I faced parsing timezone data. Please go through it and let me know if the suggestions are viable.
I have two scripts: one to extract the rules, and a second to extract a list of all timezones and the timezones under each country.
A sample of each file is attached at the end.
The problems I had were:
1) Identifying the country name.
2) Identifying the end of a Zone (I assume that the start of the next zone, a blank line, or a comment line denotes the end of the current zone).
3) Getting the timezone long name. It is easy to get the ones listed at the top (or bottom) of the file, but difficult if the name is written as a note on the zone line itself. Also, not all of the timezone long names are documented.
4) In some cases, like China, the zone line says
    Zone Asia/Shanghai 8:05:52 - LMT 1928
                       8:00 Shang C%sT 1949
                       8:00 PRC C%sT
but the rule for PRC runs only until 1991. My script interpreted this as Shanghai observing DST with the corresponding rule missing.
My suggestions to overcome these are
----------------------------------------------------
1) Add a tag #<ctryname> before the country name (or #<ctry> ... #<EndCtry> at the beginning and end).
2) Add a #<Zone> tag at the beginning and an #<EndZone> tag at the end of each zone. The same can be done for Rule and Link to make it consistent.
3) List all timezone abbreviations and names in a separate file (since timezones like EET are used across files), and consistently use the same names (we need to figure out a way to handle duplicate names, possibly by using the country name in conjunction, wherever relevant).
4) If any zone has a name like C%sT on the last line of its definition, implying that it observes daylight saving time, then the corresponding rule must have two entries that each run until 'max' (one for the start and one for the end). If not, split the lines as suggested below:
    8:00 PRC C%sT 1991
    8:00 - CST
None of the changes suggested above affects tzcode, as we are adding comment lines only.
Sample files generated by parser,
Thanks
-Syed
-----Original Message-----
From: Gwillim Law [mailto:gwil@mindspring.com]
Sent: Wednesday, February 28, 2001 9:57 AM
To: Chris Sells; tz@elsie.nci.nih.gov
Subject: Re: Html-ize the tz database?
Syed, When I produce the parser that outputs XML, I'll let you know and you can let me know if it helps you produce the data you're looking for. The current data format, while cryptic, seems to be parseable by zic, so I'm leveraging that code to build my own program to output XML. Chris
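A minimal sketch of the kind of conversion discussed here, offered as an illustration only. It is not Chris's program (which builds on the zic sources), and the XML element and attribute names (`tzdata`, `zone`, `era`) are hypothetical. It assumes, following my reading of zic.8, that a zone continues for as long as each line carries an UNTIL field, so the first line without UNTIL ends the zone; blank lines and comments in between are simply skipped. Quoted fields and all other line types are ignored.

```python
import xml.etree.ElementTree as ET

# Illustrative input, taken from the Asia/Shanghai example in this thread.
SAMPLE = """\
# China
Zone Asia/Shanghai 8:05:52 - LMT 1928
            8:00 Shang C%sT 1949
            8:00 PRC C%sT
"""


def parse_zones(text):
    """Collect Zone lines and their continuation lines into dictionaries."""
    zones = []
    current = None  # the zone that still expects continuation lines
    for raw in text.splitlines():
        fields = raw.split("#", 1)[0].split()  # drop comments, split on whitespace
        if not fields:
            continue  # blank or comment-only line: does not end a zone
        if fields[0] in ("Rule", "Link"):
            current = None
            continue  # not handled in this sketch
        if fields[0] == "Zone":
            current = {"name": fields[1], "eras": []}
            zones.append(current)
            fields = fields[2:]
        elif current is None:
            continue  # stray line outside any zone
        era = {
            "gmtoff": fields[0],
            "rules": fields[1],
            "format": fields[2],
            "until": " ".join(fields[3:]),  # empty on the last line of a zone
        }
        current["eras"].append(era)
        if not era["until"]:
            current = None  # no UNTIL field: the zone definition is complete
    return zones


def to_xml(zones):
    """Serialize the parsed zones using hypothetical element names."""
    root = ET.Element("tzdata")
    for zone in zones:
        zone_el = ET.SubElement(root, "zone", name=zone["name"])
        for era in zone["eras"]:
            ET.SubElement(zone_el, "era", **era)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_xml(parse_zones(SAMPLE)))
```

Under these assumptions the last `era` of Asia/Shanghai has an empty `until` and `rules` set to `PRC`, which is the case Syed's script flagged; deciding whether that rule is still active would additionally require reading the Rule lines.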
-----Original Message----- From: Syed Sajjath [mailto:Syed.Sajjath@wcom.com] Sent: Wednesday, February 28, 2001 12:57 PM To: 'Gwillim Law'; 'Chris Sells'; tz@elsie.nci.nih.gov Subject: RE: Html-ize the tz database?
Gwillim Law wrote:
I've just now posted a few sample Web pages to display data from the tz database in a more accessible format.
Very nice effort! This information is excellent for human readers. I especially like the page "State and Province Links" (ls.html) with its many relevant links.
Is it feasible to maintain a Web site like this in synch with the tz database?
That depends. Who's gonna maintain this web site for as long as the tz info is maintained? It involves a lot of work that cannot be programmed. For instance, checking that links still exist can be automated, but finding new ones cannot.
Consistent data
----------------
One job should be done soon, I would like to suggest. That's to improve the consistency of the tz data within its format. If the tz data files were extremely consistent, programmers of non-Unix applications would have less manual labor to do, _much_ less, as I experienced myself with my Macintosh HyperCard application. Two examples:
1. The zone.tab information (country code, coordinates) should be integrated into the tz data files.
2. In most cases the Rule and Zone fields are separated by a tab, but sometimes by a run of spaces and sometimes by nothing at all, just an absolute character position. This gave me severe headaches when transporting the data to my application.
Another issue is that many useful bits of information are commented out (#) and reside at rather random locations.
It has been proposed to xml-ize the tz data files. This would improve the transportability, but it would degrade the human readability. Although Garrett Wollman said on Apr. 18 2000: "And there is no inherent virtue in XML.....", it would make a difference especially if there were a concurrent web site with tz data like Gwillim Law's proposal. The April 2000 "Time Zone Issues" thread discussed many pros and cons around xml. The "format of tz database" thread discussed the VTIMEZONE format.
I think there is a 'market' for extremely consistent and transportable (i.e. script-readable) tz data. But html? Excellent for humans, but I would like to address the issue of application transportability. So XML?
Proposals:
1. Keep the current tz data format for the moment, but clean it up: exactly 1 tab as data field separator, no spaces, always 1 tab.
2. Integrate the zone.tab info in special comments, for instance beginning with #%, with fields separated by tabs if necessary.
3. Move recurring information that is now in ordinary # comments to _special_ comment lines.
In this manner current Unix tz applications can still use the tz data, but the data is much more transportable to other applications. Complex comments are very productive: take for example the OPI comments/directives in PostScript code.
Regards,
Oscar van Vlijmen
2001-03-02
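The first two proposals can be sketched roughly as follows. This is an illustration under stated assumptions, not an agreed format: the layout of a `#%` line (tab-separated fields after the prefix) is hypothetical, as is the choice to give continuation lines two empty leading columns so that they line up with the `Zone` keyword and the zone name. Quoted fields containing spaces, which zic permits, are not handled.

```python
def normalize_line(line):
    """Rejoin the fields of one tz data line with exactly one tab each."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return line  # blank lines and comment lines pass through unchanged
    data, hash_mark, comment = line.partition("#")
    fields = data.split()  # any run of spaces or tabs separates fields
    if line[0].isspace():
        # Continuation line: keep two empty leading columns (an assumption).
        fields = ["", ""] + fields
    out = "\t".join(fields)
    if hash_mark:
        out += "\t#" + comment  # keep a trailing comment, set off by one tab
    return out


def parse_special_comment(line, prefix="#%"):
    """Return the tab-separated fields of a special comment, or None."""
    if not line.startswith(prefix):
        return None
    return line[len(prefix):].strip().split("\t")


if __name__ == "__main__":
    print(repr(normalize_line("Rule  PRC 1986\tonly - May 4 0:00 1:00 D")))
    print(parse_special_comment("#%zone.tab\tCN\t+3114+12128\tAsia/Shanghai"))
```

Because zic accepts any amount of white space between fields, output normalized this way should remain readable by existing tools, while an import script can split each line on single tabs.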
Oscar van Vlijmen said:
Consistent data
----------------
One job should be done soon, I would like to suggest. That's to improve the consistency of the tz data within its format. If the tz data files were extremely consistent, programmers of non-Unix applications would have less manual labor to do, _much_ less, as I experienced myself with my Macintosh HyperCard application.
Do we have a formal syntax for the tz data files? If we did, it would be trivial to write a tool that ensures that files remain consistent (and perhaps pretty-prints them as a side effect, allowing humans to check that the data says what they think it says).
It has been proposed to xml-ize the tz data files. This would improve the transportability, but it would degrade the human readability.
I don't think that's necessary; plain text is perfectly transportable.
--
Clive D.W. Feather | Work: <clive@demon.net> | Tel: +44 20 8371 1138
Internet Expert | Home: <clive@davros.org> | Fax: +44 20 8371 1037
Demon Internet | WWW: http://www.davros.org | DFax: +44 20 8371 4037
Thus plc | | Mobile: +44 7973 377646
From: "Clive D.W. Feather"
Oscar van Vlijmen said:
Consistent data
----------------
One job should be done soon, I would like to suggest. That's to improve the consistency of the tz data within its format.
Do we have a formal syntax for the tz data files?
Yep, kinda: the file "zic.8" in the tzcode folder. The text says, among other things: "Fields are separated from one another by any number of white space characters." Currently fields are in most cases separated by tabs, but not always, and this causes headaches for writers of import scripts.
But I would also like to introduce the concept of complex comment lines, for instance beginning with #%, #$ etcetera, in order to incorporate additional data in a consistent manner without interrupting the usability for Unix systems, the OS the tz system is made for. This concept could be a transitional phase towards an xml or whatever database. If you have consistently formatted and rather complete data now, the transition to something else later goes more smoothly.
Oscar van Vlijmen
2001-03-02
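A consistency checker of the sort asked about in this exchange might look like the sketch below. It tokenizes each line by the white-space rule quoted from zic.8 and checks only field counts. The counts are my reading of zic.8 and should be treated as assumptions: 10 fields for a Rule line, 3 for a Link line, 5 to 9 for a Zone line (the optional UNTIL takes up to four fields), and 3 to 7 for a continuation line. Only Rule, Zone and Link lines are recognized, and quoted fields are not handled.

```python
RULE_FIELDS = 10           # Rule NAME FROM TO TYPE IN ON AT SAVE LETTER/S
LINK_FIELDS = 3            # Link LINK-FROM LINK-TO
ZONE_FIELDS = range(5, 10)  # Zone NAME GMTOFF RULES FORMAT [UNTIL, up to 4 fields]
CONT_FIELDS = range(3, 8)   # GMTOFF RULES FORMAT [UNTIL, up to 4 fields]


def check(text):
    """Return (line_number, message) pairs for lines that look malformed."""
    problems = []
    expect_continuation = False  # True while the previous zone line had UNTIL
    for number, raw in enumerate(text.splitlines(), start=1):
        fields = raw.split("#", 1)[0].split()  # strip comments, split on whitespace
        if not fields:
            continue  # blank or comment-only line
        kind = fields[0]
        if expect_continuation:
            if len(fields) not in CONT_FIELDS:
                problems.append((number, "bad field count in continuation line"))
            expect_continuation = len(fields) > 3
        elif kind == "Rule":
            if len(fields) != RULE_FIELDS:
                problems.append((number, "bad field count in Rule line"))
        elif kind == "Link":
            if len(fields) != LINK_FIELDS:
                problems.append((number, "bad field count in Link line"))
        elif kind == "Zone":
            if len(fields) not in ZONE_FIELDS:
                problems.append((number, "bad field count in Zone line"))
            expect_continuation = len(fields) > 5
        else:
            problems.append((number, f"unknown line type {kind!r}"))
    return problems


if __name__ == "__main__":
    sample = (
        "Rule PRC 1986 only - May 4 0:00 1:00 D\n"
        "Zone Asia/Shanghai 8:05:52 - LMT 1928\n"
        "\t\t\t8:00 Shang C%sT 1949\n"
        "\t\t\t8:00 PRC C%sT\n"
        "Link Asia/Shanghai PRC\n"
    )
    print(check(sample))
```

A check like this says nothing about which separators were used, only that each line tokenizes into a plausible number of fields; a stricter tool could also report lines whose separators are not single tabs.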