The "TZ" distibution is targeted towards POSIX-compliant hosts, including BSD, HP, Linux, Network Appliance, SCO, SGI, and Sun. Some non-Unix and even non-C developers would like to use the timezone data, but they cannot find nor read the documentation. The main documentation can be found in the file "Theory" in the tzcode folder. This is an unformatted text-only file. The file "zic.8" in the tzcode folder documents the format used in the tz database files. But, not everyone can read or convert groff/troff formatted files like "zic.8". I know, it's hard to believe, but it's a recurring problem. Therefore and once again, I converted the latest version of the zic.8 file to html. Please find it attached to this email. This point was discussed earlier on April 19 & 20, 2000 and April 7, 2002. The TZ files can still be found at: <ftp://elsie.nci.nih.gov/pub/> Download the files: tzcode2003d.tar.gz or newer, and tzdata2003d.tar.gz or newer These are Gzipped tar-balls. Oscar van Vlijmen 2003-10-09
On Thu, 9 Oct 2003, Oscar van Vlijmen wrote:
Some non-Unix and even non-C developers would like to use the timezone data, but they cannot find nor read the documentation.
If anyone's interested, I've implemented a Perl module that provides a Perl API to the timezone database. It parses the source text files and generates Perl modules, rather than simply overlaying the C API, for a number of reasons. The main reason was that I wanted this to work outside the bounds of the epoch, so using native time_t values wouldn't work.

Anyway, the module is called DateTime::TimeZone, and is at http://search.cpan.org/dist/DateTime-TimeZone/. It's part of a larger project to provide decent date/time support for Perl. More info on the project can be found at datetime.perl.org.

BTW, writing a parser for the TZ data was a bit of a nightmare, as the interrelations between observance changes and rules make it very complex to figure out what rule is in effect at any given time. I don't know if there's a better way to represent this data, but explaining _how_ to do this to someone who wants to write a parser is non-trivial. The zic.8 file doesn't really explain the "how", just the "what".

Also, to make sure I was doing this correctly, I wrote some code to generate tests based on zdump's output. This is a script in the DateTime::TimeZone distro called tools/tests_from_zdump. People working on providing a non-C API may find this script useful, as it generates comprehensive tests for _every_ zone for _every_ change that zdump outputs. It could easily be adapted to generate tests for a different language's API.

-dave

/*=======================
 House Absolute Consulting
 www.houseabsolute.com
 =======================*/
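[The tz source lines whose parsing Dave describes look like the following. This is a hypothetical sketch, not code from DateTime::TimeZone, and in Python rather than Perl; the field order is Rule NAME FROM TO TYPE IN ON AT SAVE LETTER/S per zic(8), but the dictionary key names are my own labels.]

```python
def parse_rule_line(line):
    # Split one whitespace-separated "Rule" record into named fields.
    # This handles only the easy, lexical part; the hard part Dave
    # mentions -- deciding which rule governs a given instant -- needs
    # the interplay of all rules in a group plus the Zone's observances.
    fields = line.split()
    if fields[0] != "Rule" or len(fields) != 10:
        raise ValueError("not a tz Rule line: " + line)
    keys = ("name", "from", "to", "type", "in", "on", "at", "save", "letters")
    return dict(zip(keys, fields[1:]))

# A real line from the 'northamerica' source file (US DST, 1987-2006):
rule = parse_rule_line("Rule US 1987 2006 - Apr Sun>=1 2:00 1:00 D")
```

Note that even a correctly split record like this still leaves "Sun>=1" and "2:00" to be interpreted relative to the zone's standard offset, which is where the real complexity lives.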
Thanks for the heads-up. I'll propose something like the following xhtml in my next proposed tz update. Comments welcome. (I must confess that at Twin Sun we've avoided DateTime in our Perl code for performance reasons -- I didn't know it had tz support these days.)

<li><a href="http://search.cpan.org/dist/DateTime-TimeZone/">DateTime::TimeZone</a> contains a script <code>parse_olson</code> that compiles <code>tz</code> source into <a href="http://www.perl.org/">Perl</a> modules. It is part of the Perl <a href="http://datetime.perl.org/">DateTime Project</a>, which is freely available under both the GPL and the Perl <a href="http://www.perl.com/language/misc/Artistic.html">Artistic License</a>. DateTime::TimeZone also contains a script <code>tests_from_zdump</code> that generates test cases for each clock transition in the <code>tz</code> database.</li>
On Sun, 12 Oct 2003, Paul Eggert wrote:
Thanks for the heads-up. I'll propose something like the following xhtml in my next proposed tz update. Comments welcome. (I must confess that at Twin Sun we've avoided DateTime in our Perl code for performance reasons -- I didn't know it had tz support these days.)
Hmm? It's had tz support since the very beginning. In fact, one of the main reasons for writing _new_ modules was to provide a useful Perl API to the Olson database, since previously the only way to do anything with timezones was to do something like "$ENV{TZ} = 'foo/bar'; POSIX::tzset()". That won't get you very far if you need to deal with datetimes in _multiple_ timezones all at once. As for performance, it's definitely slower than a lot of the alternatives, except for those cases where it's the _only module_ that does what you need, which is true for quite a number of things.
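[The limitation Dave describes is not Perl-specific. A rough Python analogue of the "$ENV{TZ} = 'foo/bar'; POSIX::tzset()" idiom, assuming a POSIX host with the tz database installed, shows why the approach cannot handle several time zones at once: TZ is process-global.]

```python
import os
import time

# Set the process-wide zone and re-read it (time.tzset is POSIX-only).
os.environ["TZ"] = "America/New_York"
time.tzset()
new_york = time.localtime(0)   # the epoch as seen from New York

# Switching zones discards the previous setting for the whole process --
# there is no way to hold two zones "current" simultaneously this way.
os.environ["TZ"] = "Asia/Tokyo"
time.tzset()
tokyo = time.localtime(0)
```

Libraries like DateTime::TimeZone sidestep this by representing each zone as its own object rather than mutating global process state.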
<li><a href="http://search.cpan.org/dist/DateTime-TimeZone/">DateTime::TimeZone</a> contains a script <code>parse_olson</code> that compiles <code>tz</code> source into <a href="http://www.perl.org/">Perl</a> modules. It is part of the Perl <a href="http://datetime.perl.org/">DateTime Project</a>, which is freely available under both the GPL and the Perl <a href="http://www.perl.com/language/misc/Artistic.html">Artistic License</a>. DateTime::TimeZone also contains a script <code>tests_from_zdump</code> that generates test cases for each clock transition in the <code>tz</code> database.</li>
Looks good.

-dave

/*=======================
 House Absolute Consulting
 www.houseabsolute.com
 =======================*/
At Thu, 09 Oct 2003 21:37:46 +0200, Oscar van Vlijmen <ovv@hetnet.nl> writes:
Therefore and once again, I converted the latest version of the zic.8 file to html.
Did you generate the HTML automatically? If so, I'd rather put the automated procedure into the makefile. If not, then I'm a bit dubious. I'd rather not maintain two forms of the documentation by hand.

In my experience, HTML is not a particularly good way to maintain computer documentation. I'd far rather use texinfo, but even man pages are better than HTML.

Ideally the automated procedure, whatever it is, would generate XHTML 1.1, which is the latest version of HTML. (Hmm, I see that tz-link.htm and tz-art.htm specify XHTML 1.0; time to upgrade.)
Paul Eggert <eggert@twinsun.com> writes:
Ideally the automated procedure, whatever it is, would generate XHTML 1.1, which is the latest version of HTML. (Hmm, I see that tz-link.htm and tz-art.htm specify XHTML 1.0; time to upgrade.)
I would be a bit careful about moving to XHTML 1.1 without analyzing all of the issues. XHTML 1.1 is not really the same thing as HTML; for example, it's not supposed to be served out as text/html. Many browsers don't really support XHTML 1.1 yet. XHTML 1.0 or HTML 4.01 Strict are still the best choices for portable HTML.

Future versions of XHTML are not intended to be backward compatible upgrades to HTML, but instead are moving the standard intentionally in a completely different direction, towards a purer XML-based markup world that not all browsers are prepared for.

-- 
Russ Allbery (rra@stanford.edu) <http://www.eyrie.org/~eagle/>
At Sun, 12 Oct 2003 11:55:38 -0700, Russ Allbery <rra@stanford.edu> writes:
I would be a bit careful about moving to XHTML 1.1 without analyzing all of the issues. XHTML 1.1 is not really the same thing as HTML; for example, it's not supposed to be served out as text/html.
I suppose you're right; it's better to be cautious. Though, so long as one sticks to a conservative subset, I don't know of any practical problems in serving XHTML 1.1 as text/html to any browsers that are in real-world use. For the tz web pages, the only difference between XHTML 1.0 Strict and XHTML 1.1 (aside from the new DOCTYPE) is that the <html> element would no longer have a 'lang' attribute.
Therefore and once again, I converted the latest version of the zic.8 file to html.

Did you generate the HTML automatically? If so, I'd rather put the automated procedure into the makefile. If not, then I'm a bit dubious. I'd rather not maintain two forms of the documentation by hand. In my experience, HTML is not a particularly good way to maintain computer documentation. I'd far rather use texinfo, but even man pages are better than HTML. Ideally the automated procedure, whatever it is, would generate XHTML 1.1, which is the latest version of HTML. (Hmm, I see that tz-link.htm and tz-art.htm specify XHTML 1.0; time to upgrade.)
Sorry that I am again (cf. my emails dated 2000-04-19, 2002-04-07, 2003-10-09) not able to make this point clear. Allow me to try to explain the situation again.

If the maintainers of the TZ distribution believe that they should only support the platforms the TZ software is targeted at, then: do nothing at all! Just leave all documentation as is.

The reality, however, is that once in a while some developer tries to use the TZ database files for developing a completely different application. Hard to believe, but true: not everybody has a Unix/Linux operating system. Not everybody has a C compiler (Makefile!). Not everybody has a groff/troff reader (zic.8!). Not everybody has a Texinfo, DocBook or linuxdoc-sgml reader. Not everybody has an XHTML 1.1-capable browser.

At this moment in time there are two viable options for producing documentation for other platforms than the ones the TZ software is aimed at:
- text only, iso-8859-1 encoding, file name ending in .txt
- html 4.01, iso-8859-1 encoding, file name ending in .htm(l)

Whatever fancy software you might have, whatever software a 'regular' TZ user is able to run, at the moment only text-only and html 4 are truly platform independent. Why not something fancy? Why not according to the latest 'standards'? The answer is simple: not all computer platforms are alike, and most software firms are not willing to conform to the latest standards.

So: if TZ is solely targeted to and supported for POSIX-compliant hosts, including BSD, HP, Linux, Network Appliance, SCO, SGI, and Sun, then don't change anything in the documentation. Just ignore others and let them try to find out for themselves what is going on. If the TZ maintainers are willing to offer a small service to users of the TZ database files on other platforms, please consider offering platform-independent documentation in text-only or html 4 format. The least amount of effort would be: put an extra file in the distribution, namely zic.8 converted to text or html.
From Paul Eggert, 2000-04-20: ' "groff -Thtml -man zic.8" would have done it automatically '
Since it is not clear to a non-target user why he/she should ever read zic.8 or zic.html, something of this nature could be mentioned in the file "README", like:

    More information about the files in this distribution and their backgrounds can be read in the documentation file "Theory". The file "zic.html" describes, among other things, the format of the TZ database files in the tzdata directory.

The latter sentence could be repeated as a new item in the file "Theory" after: "Points of interest to folks with other systems:"
I'd rather not maintain two forms of the documentation by hand.

Why not? The files "Theory" and "zic.8" hardly ever change. It would not be too difficult to make a minor change, once every 3 years or so, in two files at once (zic.8, zic.html) -- and it would not even have to be done by hand!
In my experience, HTML is not a particularly good way to maintain computer documentation.

It is a very useful way on other computer platforms like Windows and Macintosh. At any rate, html 4 (yes: 4.01) without too much css is very platform independent and compatible. As is text-only.
... computer documentation. I'd far rather use texinfo, but even man pages are better than HTML.

Not compatible with other platforms.
Did you generate the [zic8.html] HTML automatically?

By hand, in order to keep most of the formatting in the zic.8 file, and because I have no groff converter/reader!
Oscar van Vlijmen 2003-10-12
At Sun, 12 Oct 2003 22:42:37 +0200, Oscar van Vlijmen <ovv@hetnet.nl> writes:
At this moment in time there are two viable options for producing documentation for other platforms than the ones the TZ software is aimed at:
- text only, iso-8859-1 encoding, file name ending in .txt
- html 4.01, iso-8859-1 encoding, file name ending in .htm(l)
The automatic methods for generating HTML are somewhat buggy. However, groff and troff do a passable job at generating plain text, so I suppose it'd be reasonable to put them into the tz Makefile. (ISO-8859-1 isn't universally recognized, though, particularly in the Far East; we should stick with ASCII if we're not prepared to go with UTF-8.)

Arthur, would it be reasonable to assume groff on the part of the maintainers? troff is less used these days. We could distribute the groff output, so people who download the files wouldn't need it.
I'd rather not maintain two forms of the documentation by hand. Why not? The files "Theory" and "zic.8" hardly ever change.
But they still change now and then. They will change again once we add support for 64-bit time_t, for example. I'd rather not worry about their going out-of-sync.
Paul Eggert wrote:
(ISO-8859-1 isn't universally recognized, though, particularly in the Far East; we should stick with ASCII if we're not prepared to go with UTF-8.)
1. In the region of characters below code position 128 (the POSIX.1 portable range), there are no significant differences between the encodings us-ascii, iso-8859-1 and utf-8.

2. If an html page will be viewed off-line, it would be useful to put a <meta> tag in the <head> section describing a character set. iso-8859-1 would be the most compatible, even for (localized versions of) browsers used in countries with other character sets. Some (admittedly few) browsers refuse to open utf-8, are not able to process literally encoded multibyte utf-8, or, very oddly, do not recognise a "us-ascii" character set meta tag.

Oscar van Vlijmen
2003-10-13
From: Paul Eggert <eggert@twinsun.com>
Date: Mon, 13 Oct 2003 01:44:05 -0700
To: Oscar van Vlijmen <ovv@hetnet.nl>
Cc: TZ-list <tz@lecserver.nci.nih.gov>
Subject: Re: zic.8 in html

Sorry I cannot reply to Paul directly. A previously sent email bounced back with a strange error:
"did not reach the following recipient(s): eggert@twinsun.com on Sun, 12 Oct 2003 22:42:38 0200 The recipient could not be processed because it would violate the security policy in force <hnexfe10.hetnet.nl #5.7.0 smtp;553 5.7.0 Header error 170 on a line by itself>"
At Mon, 13 Oct 2003 23:47:00 +0200, Oscar van Vlijmen <ovv@hetnet.nl> writes:
1. There are in the POSIX 1 region of characters below code position 128 no significant differences between the encodings us-ascii, iso-8859-1 and utf-8.
Yes, if we stick to the ASCII subset (and use only TAB and LF among the control characters) we should be OK. Pretty much everybody can read ASCII. ISO-8859-1 is incompatible with UTF-8, EUC, shift-JIS, etc. and so it's a bit more likely to be mishandled. (ISO-8859-1 used to be the default character set for HTML, but that was a while ago now.)
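[Paul's point can be checked directly. The sketch below, in Python for illustration, shows that the ASCII repertoire encodes to identical bytes under US-ASCII, ISO-8859-1 and UTF-8, so documents restricted to that subset (plus TAB and LF) are safe under any of the three labels, while text outside ASCII is exactly where mislabeling bites.]

```python
# Within the ASCII subset, all three encodings agree byte-for-byte.
sample = "Rule\tUS\t1987\n"
assert sample.encode("ascii") == sample.encode("iso-8859-1") == sample.encode("utf-8")

# Outside ASCII they diverge: one byte in Latin-1, two bytes in UTF-8.
assert "é".encode("iso-8859-1") == b"\xe9"
assert "é".encode("utf-8") == b"\xc3\xa9"
```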
2. If an html page will be viewed off-line, it would be useful to put a <meta> tag in the <head> section describing a character set. iso-8859-1 would be the most compatible,
For some time the web pages have had <meta> tags that specify US-ASCII, which is a bit more conservative than ISO-8859-1. They also have their US-ASCII encoding specified in their XML declaration. Hmm, I just noticed that the HTTP header said "Content-Type: text/html; charset=iso-8859-1"; I just fixed this to say "us-ascii" so that it's all consistent.
Date: Mon, 13 Oct 2003 01:44:05 -0700
From: Paul Eggert <eggert@twinsun.com>
Message-ID: <7wzng5il2i.fsf@sic.twinsun.com>

| would it be reasonable to assume groff on the part of the maintainers?

Why not just use nroff? The groff distribution includes a shell script that uses groff to emulate nroff - AT&T derived unix systems are all going to have nroff - it is hard to imagine anyone with the ability to process man(7) who doesn't have an nroff command. If necessary, post-process the output to remove overstriking.

kre
Paul Eggert wrote on 2003-10-12 15:36 UTC:
Ideally the automated procedure, whatever it is, would generate XHTML 1.1, which is the latest version of HTML.
For anyone still believing in the "XHTML is the latest version of HTML" fairy tale, I warmly recommend as tonight's bedtime reading: http://www.hixie.ch/advocacy/xhtml

Markus

-- 
Markus Kuhn, Computer Lab, Univ of Cambridge, GB
http://www.cl.cam.ac.uk/~mgk25/ | __oo_O..O_oo__
participants (6)
- Dave Rolsky
- Markus Kuhn
- Oscar van Vlijmen
- Paul Eggert
- Robert Elz
- Russ Allbery