Gwillim/Chris/All,

Most of the work is parsing the tzdata file in its current form. Once you have done this, you can present the data in any form you like.

I had sent this earlier to a limited group working on parsing tzdata, but received no feedback. The following mail describes the difficulties I faced parsing timezone data. Please go through it and let me know if the suggestions are viable.

I have two scripts: one to extract the rules, and a second to extract a list of all timezones and the timezones under each country. A sample of each file is attached at the end.

The problems I had were:

1) Identifying the country name.
2) Identifying the end of a Zone (I assume that the start of the next zone, a blank line, or a comment line denotes the end of the current zone).
3) Getting the timezone long name. It is easy to get the ones listed at the top (or bottom) of the file, but difficult if the name is written as a note on the zone line itself. Also, not all of the timezone long names are documented.
4) In some cases, like China, the zone line says

    Zone Asia/Shanghai 8:05:52 - LMT 1928
                       8:00 Shang C%sT 1949
                       8:00 PRC C%sT

   but the rule for PRC runs only until 1991. My script interpreted this as Shanghai observing DST with the corresponding rule missing.

My suggestions to overcome these are
----------------------------------------------------
1) Adding a tag #<ctryname> before the country name (or #<ctry> ... #<EndCtry> at the beginning and end).
2) A #<Zone> tag at the beginning and a #<EndZone> tag at the end of each zone. The same can be done for Rule and Link to keep it consistent.
3) List all timezone abbreviations and names in a separate file (since abbreviations like EET are used across files), and consistently use the same names (we need to figure out a way to handle duplicate names, possibly by using the country name in conjunction, wherever relevant).
4) If any zone has a name like C%sT on the last line of its definition, implying that it observes daylight saving, then the corresponding rule must have two entries that each run until 'max' (one for start and one for end). If not, split the lines as suggested below:

    8:00 PRC C%sT 1991
    8:00 - CST

None of the changes suggested above affects tzcode, as we are adding comment lines only.

Sample files generated by parser,

Thanks
-Syed

-----Original Message-----
From: Gwillim Law [mailto:gwil@mindspring.com]
Sent: Wednesday, February 28, 2001 9:57 AM
To: Chris Sells; tz@elsie.nci.nih.gov
Subject: Re: Html-ize the tz database?
Gwillim, I'm curious how you produced these HTML files?
Manually. I started by taking a copy of tzdata2000h and editing it with a text editor. Several reasons for this: I wanted to get a feel for the data; to experiment with different ways of organizing them; to capture whatever useful information I could find in the comments; and to spot any inconsistencies or holes in the data. There are plenty of applications where it makes sense to automate the process, but I think it should be done manually at least once.

Yours,
Gwillim Law
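
For anyone who does want to automate it, the zone-boundary assumption in problem 2 of Syed's message and the open-ended C%sT check in his suggestion 4 could be sketched in Python roughly as follows. This is only a minimal sketch, not Syed's actual script: the names `parse` and `open_ended_problems` are hypothetical, the end-of-zone heuristic is the one he states (next Zone, blank line, or comment line), and the Rule lines in `SAMPLE` are illustrative input laid out in the tzdata Rule column order. Only the Zone lines are taken from the message.

```python
# Illustrative input: the Zone block is from the message; the Rule lines
# are made-up sample data whose TO column ends in 1991, as the message says.
SAMPLE = """\
# illustrative rules for PRC
Rule PRC 1986 only - May 4        0:00 1:00 D
Rule PRC 1986 1991 - Sep Sun>=11  0:00 0    S
Rule PRC 1987 1991 - Apr Sun>=10  0:00 1:00 D

Zone Asia/Shanghai 8:05:52 - LMT 1928
                   8:00 Shang C%sT 1949
                   8:00 PRC C%sT
"""


def strip_comment(line):
    """Drop everything from the first '#' on (quoted '#' is not handled)."""
    return line.split("#", 1)[0].rstrip()


def parse(text):
    """Return (zones, rules).

    zones maps a zone name to a list of (gmtoff, rules, format, until) tuples.
    rules maps a rule name to the list of its TO fields.
    """
    zones, rules = {}, {}
    current = None  # name of the zone whose continuation lines we are reading
    for raw in text.splitlines():
        fields = strip_comment(raw).split()
        if not fields:
            current = None  # assumption: blank or comment line ends the zone
        elif fields[0] == "Zone":
            current = fields[1]
            zones[current] = [tuple(fields[2:5]) + (" ".join(fields[5:]),)]
        elif fields[0] == "Rule":
            current = None
            rules.setdefault(fields[1], []).append(fields[3])  # TO column
        elif fields[0] == "Link":
            current = None
        elif current is not None and raw[:1].isspace():
            # indented line inside a zone block: a continuation line
            zones[current].append(tuple(fields[0:3]) + (" ".join(fields[3:]),))
    return zones, rules


def open_ended_problems(zones, rules):
    """List (zone, rule) pairs whose last line uses %s but whose rule
    has fewer than two entries running until 'max'."""
    problems = []
    for name, lines in zones.items():
        _gmtoff, rule, fmt, _until = lines[-1]
        if "%s" in fmt and rule != "-" and rules.get(rule, []).count("max") < 2:
            problems.append((name, rule))
    return problems


zones, rules = parse(SAMPLE)
print(open_ended_problems(zones, rules))
```

On this sample the check reports Asia/Shanghai with rule PRC, which is the case Syed describes. Ending the last zone line at 1991 and adding a final `8:00 - CST` line, as he suggests, makes the report go away.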