tabs vs spaces


David Muir Sharnoff

May 1, 2013

5:42 a.m.

I wanted to extract some of the data from the timezone files so I wrote a quick parser for them. In the process I discovered 1,477 lines that have spaces where they should have tabs. Do you care? If so, do you want to fix them with (a) my fixup script (b) a patch that I generate; (c) by hand -- I can list the file names and line numbers? -Dave



Ian Abbott

May 2013

9 a.m.

On 2013-05-01 06:42, David Muir Sharnoff wrote:

...

I wanted to extract some of the data from the timezone files so I wrote a quick parser for them. In the process I discovered 1,477 lines that have spaces where they should have tabs.

Do you care?

If so, do you want to fix them with (a) my fixup script (b) a patch that I generate; (c) by hand -- I can list the file names and line numbers?

Perhaps you should just fix your parser? See the zic man page for the format of the timezone files.

-- -=( Ian Abbott @ MEV Ltd. E-mail: <abbotti@mev.co.uk> )=- -=( Tel: +44 (0)161 477 1898 FAX: +44 (0)161 718 3587 )=-

Tobias Conradi

9:39 a.m.

On Wed, May 1, 2013 at 7:42 AM, David Muir Sharnoff <lists@dave.sharnoff.org> wrote:

...

If so, do you want to fix them with (a) my fixup script (b) a patch that I generate; (c) by hand -- I can list the file names and line numbers?

Could you send the patch to the list, so people can see what you discovered and perceived as wrong?

-- Tobias Conradi Rheinsberger Str. 18 10115 Berlin Germany http://tobiasconradi.com

random832＠fastmail.us

11:35 a.m.

On Wed, May 1, 2013, at 1:42, David Muir Sharnoff wrote:

...

I wanted to extract some of the data from the timezone files so I wrote a quick parser for them. In the process I discovered 1,477 lines that have spaces where they should have tabs.

Tokens in the timezone files are separated by _any whitespace_. In most languages, splitting up by 'any whitespace' is the simplest thing in the world. In C (where nothing is simple), you could reuse the code from zic itself.
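As a rough illustration of splitting on "any whitespace" in C, here is a minimal sketch that treats every run of isspace() characters as one separator. The function name split_ws is hypothetical, and this is not zic's own code; comments and double-quoted fields (which the zic manual also allows) are deliberately ignored here.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: split `line` in place into whitespace-separated
 * fields. Any run of characters for which isspace() is true counts as a
 * single separator, and leading/trailing whitespace yields no empty
 * fields. Stores at most `max` field pointers in `fields` and returns the
 * total number of fields found (which may exceed `max`).
 * Comments ('#') and double-quoted fields are NOT handled. */
size_t split_ws(char *line, char **fields, size_t max)
{
    size_t n = 0;
    char *p = line;

    for (;;) {
        /* Skip a run of whitespace: spaces, tabs, or any mix of them. */
        while (*p != '\0' && isspace((unsigned char)*p))
            p++;
        if (*p == '\0')
            break;
        if (n < max)
            fields[n] = p;
        n++;
        /* Advance to the end of this field. */
        while (*p != '\0' && !isspace((unsigned char)*p))
            p++;
        if (*p == '\0')
            break;
        *p++ = '\0';    /* terminate the field, step past one separator char */
    }
    return n;
}
```

With this approach, "Rule\tUS \t 1967" and "Rule US 1967" produce the same three fields, which is why mixed tabs and spaces do not matter to such a parser.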

Tobias Conradi

12:08 p.m.

On Wed, May 1, 2013 at 1:35 PM, <random832@fastmail.us> wrote:

...

On Wed, May 1, 2013, at 1:42, David Muir Sharnoff wrote:

...
I wanted to extract some of the data from the timezone files so I wrote a quick parser for them. In the process I discovered 1,477 lines that have spaces where they should have tabs.

Tokens in the timezone files are separated by _any whitespace_.

I don't see the word "token" in ftp://ftp.iana.org/tz/code/Theory

And I assume not any white space separates tokens.

...

In most languages, splitting up by 'any whitespace' is the simplest thing in the world.

Evidence? I assume in most languages there are things simpler than that, e.g. splitting by space.

...

In C (where nothing is simple), you could reuse the code from zic itself.

Or not.

-- Tobias Conradi Rheinsberger Str. 18 10115 Berlin Germany http://tobiasconradi.com

random832＠fastmail.us

12:24 p.m.

On Wed, May 1, 2013, at 8:08, Tobias Conradi wrote:

...

On Wed, May 1, 2013 at 1:35 PM, <random832@fastmail.us> wrote:

...
Tokens in the timezone files are separated by _any whitespace_.

I don't see the word "token" in ftp://ftp.iana.org/tz/code/Theory

It's the plain English meaning of the word. Theory does not contain a specification for the format of these files. And you knew that, so what was the point of making this post? Any whitespace is the separation in the format that zic accepts (as evidenced by the fact that these lines don't, in fact, break anything), I have no idea why people are trying to impose a tab-separated format on it, particularly when they don't have a consistent _number_ of tabs regardless (in northamerica alone, there are 455 lines that begin with a tab, 9 that have extra tabs at the end, and 7 that have two tabs between two fields).
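Statistics like these can be gathered with a small per-line classifier. The sketch below is hypothetical (the name classify_tabs and the exact rules are assumptions, not necessarily how the counts above were produced): it flags a line that starts with a tab, ends with a tab, or has exactly two tabs between two non-blank characters.

```c
#include <ctype.h>
#include <string.h>

/* Flags returned by the hypothetical classify_tabs(). */
enum {
    LEADING_TAB  = 1,   /* line begins with a tab */
    TRAILING_TAB = 2,   /* line ends with a tab */
    DOUBLE_TAB   = 4    /* "\t\t" between two non-blank characters */
};

int classify_tabs(const char *line)
{
    size_t len = strlen(line);
    int flags = 0;

    /* Ignore a trailing newline such as the one fgets() keeps. */
    if (len > 0 && line[len - 1] == '\n')
        len--;
    if (len > 0 && line[0] == '\t')
        flags |= LEADING_TAB;
    if (len > 0 && line[len - 1] == '\t')
        flags |= TRAILING_TAB;
    /* Look for exactly two tabs with a non-blank character on each side,
     * so leading or trailing runs of tabs are not counted here. */
    for (size_t i = 1; i + 2 < len; i++)
        if (line[i] == '\t' && line[i + 1] == '\t'
            && !isspace((unsigned char)line[i - 1])
            && !isspace((unsigned char)line[i + 2]))
            flags |= DOUBLE_TAB;
    return flags;
}
```

Feeding each line of a file through this and summing the flags gives counts of the same kind; since zic treats all of these lines identically, the numbers only describe layout, not validity.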

Tobias Conradi

12:46 p.m.

On Wed, May 1, 2013 at 2:24 PM, <random832@fastmail.us> wrote:

...

On Wed, May 1, 2013, at 8:08, Tobias Conradi wrote:

...
On Wed, May 1, 2013 at 1:35 PM, <random832@fastmail.us> wrote:

...
Tokens in the timezone files are separated by _any whitespace_.

I don't see the word "token" in ftp://ftp.iana.org/tz/code/Theory

It's the plain English meaning of the word.

http://www.wordreference.com/definition/token lists seven English meanings.

...

Theory does not contain a specification for the format of these files. And you knew that,

When? And source for that?

...

so what was the point of making this post?

The plain English meaning of the post.

...

Any whitespace is the separation in the format that zic accepts (as evidenced by the fact that these lines don't, in fact, break anything), I have no idea why people are trying to impose a tab-separated format on it,

And because you do not have any idea why, you reject it?

...

particularly when they don't have a consistent _number_ of tabs regardless (in northamerica alone, there are 455 lines that begin with a tab, 9 that have extra tabs at the end, and 7 that have two tabs between two fields).

The variable number - I don't know whether it is not consistent - was a topic on the list before, e.g. for zone.tab:

http://mm.icann.org/pipermail/tz/2012-March/017507.html

I don't know about benefits of having a variable number, but if there is none, it is a benefit to have a constant number, e.g. for opening files in OpenOffice.

-- Tobias Conradi Rheinsberger Str. 18 10115 Berlin Germany http://tobiasconradi.com

Ian Abbott

8:13 p.m.

On 01/05/2013 13:46, Tobias Conradi wrote:

...

On Wed, May 1, 2013 at 2:24 PM, <random832@fastmail.us> wrote:

...
On Wed, May 1, 2013, at 8:08, Tobias Conradi wrote:

...
On Wed, May 1, 2013 at 1:35 PM, <random832@fastmail.us> wrote:

...
Tokens in the timezone files are separated by _any whitespace_.

I don't see the word "token" in ftp://ftp.iana.org/tz/code/Theory

It's the plain English meaning of the word.

http://www.wordreference.com/definition/token lists seven English meanings.

Okay, so it's the common meaning of the word when talking about parsers.

...

...
Any whitespace is the separation in the format that zic accepts (as evidenced by the fact that these lines don't, in fact, break anything), I have no idea why people are trying to impose a tab-separated format on it,

And because you do not have any idea why, you reject it?

One reason for rejecting it is that it is already documented that the fields on a line are separated by any number of white space characters.

...

...
particularly when they don't have a consistent _number_ of tabs regardless (in northamerica alone, there are 455 lines that begin with a tab, 9 that have extra tabs at the end, and 7 that have two tabs between two fields).

The variable number - I don't know whether it is not consistent - was a topic on the list before, e.g. for zone.tab:

http://mm.icann.org/pipermail/tz/2012-March/017507.html

I don't know about benefits of having a variable number, but if there is none, it is a benefit to have a constant number, e.g. for opening files in OpenOffice.

One benefit is that it allows you to line things up neatly, at least if displayed monospace with tab stops every 8 character columns as the Unix god intended.

-- -=( Ian Abbott @ MEV Ltd. E-mail: <abbotti@mev.co.uk> )=- -=( Tel: +44 (0)161 477 1898 FAX: +44 (0)161 718 3587 )=-

Guy Harris

6:31 p.m.

On May 1, 2013, at 5:08 AM, Tobias Conradi <mail.2012@tobiasconradi.com> wrote:

...

I don't see the word "token" in ftp://ftp.iana.org/tz/code/Theory

I do, however, see

Input lines are made up of fields. Fields are separated from one another by any number of white space characters. Leading and trailing white space on input lines is ignored. An unquoted sharp character (#) in the input introduces a comment which extends to the end of the line the sharp character appears on. White space characters and sharp characters may be enclosed in double quotes (") if they are to be used as part of a field. Any line that is blank (after comment stripping) is ignored. Non-blank lines are expected to be of one of three types: rule lines, zone lines, and link lines.

in the zic manual page.
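The rules quoted from the zic manual page translate fairly directly into a small in-place tokenizer. The sketch below is an illustration under those stated rules only; the name split_zic_fields and the choice to drop the quote characters are assumptions, and it is not zic's own getfields().

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical tokenizer following the zic man page rules:
 *  - fields are separated by any run of white space;
 *  - leading and trailing white space is ignored;
 *  - an unquoted '#' starts a comment that runs to end of line;
 *  - double quotes let white space or '#' appear inside a field
 *    (the quote characters themselves are dropped).
 * Rewrites `line` in place; stores at most `max` field pointers.
 * Returns the number of fields, or -1 on an unterminated quote. */
int split_zic_fields(char *line, char **fields, int max)
{
    char *src = line;
    int n = 0;

    for (;;) {
        char *dst;
        char stop;

        while (isspace((unsigned char)*src))
            src++;
        if (*src == '\0' || *src == '#')
            break;                  /* end of line, or comment begins */
        dst = src;                  /* field is rebuilt where it starts */
        if (n < max)
            fields[n] = dst;
        n++;
        while (*src != '\0' && *src != '#' && !isspace((unsigned char)*src)) {
            if (*src == '"') {
                src++;              /* skip opening quote */
                while (*src != '"') {
                    if (*src == '\0')
                        return -1;  /* unterminated quote */
                    *dst++ = *src++;
                }
                src++;              /* skip closing quote */
            } else {
                *dst++ = *src++;
            }
        }
        stop = *src;                /* save before dst may overwrite it */
        *dst = '\0';
        if (stop == '\0' || stop == '#')
            break;
        src++;                      /* step past one separator */
    }
    return n;
}
```

A blank or comment-only line returns 0 fields, which corresponds to the manual's "Any line that is blank (after comment stripping) is ignored."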

John Haxby

7:07 p.m.

On 01/05/13 13:08, Tobias Conradi wrote:

...

I don't see the word "token" in ftp://ftp.iana.org/tz/code/Theory

And I assume not any white space separates tokens.

Theory doesn't need to define "token"; it's common terminology for lexical analysis and very widely used. I don't understand the second sentence, but it does seem to be an ill-founded assumption. jch

Guy Harris

7:39 p.m.

On May 1, 2013, at 5:08 AM, Tobias Conradi <mail.2012@tobiasconradi.com> wrote:

...

And I assume not any white space separates tokens.

If you mean "not all white space characters are valid separators", I suspect that none of Vertical Tab, Carriage Return, or Line Feed/New Line would be considered valid separators, and I don't know whether we support non-ASCII characters in the data portion (rather than the comment portion) of the files, so Non-Breaking Space, for example, might not be valid either. However, I consider treating tabs and spaces differently to be an error (I think Stu Feldman considered it one of the biggest mistakes in Make); we should allow arbitrary numbers of tabs and/or spaces to be valid separators.

Paul_Koning＠Dell.com

7:56 p.m.

On May 1, 2013, at 3:39 PM, Guy Harris wrote:

...

On May 1, 2013, at 5:08 AM, Tobias Conradi <mail.2012@tobiasconradi.com> wrote:

...
And I assume not any white space separates tokens.

If you mean "not all white space characters are valid separators", I suspect that none of Vertical Tab, Carriage Return, or Line Feed/New Line would be considered valid separators, and I don't know whether we support non-ASCII characters in the data portion (rather than the comment portion) of the files, so Non-Breaking Space, for example, might not be valid either.

However, I consider treating tabs and spaces differently to be an error (I think Stu Feldman considered it one of the biggest mistakes in Make); we should allow arbitrary numbers of tabs and/or spaces to be valid separators.

Correct. That is the normal definition for any programming language that is roughly free-form (i.e., just about every well-known language other than Python and Fortran). And even those two allow tabs and spaces to be mixed, but since layout has meaning there, things are slightly more constrained. The intent of the zic rules is clearly the same as the lexical rules of languages like C or Pascal. In other words, the original report was in error, and the files are correct as they stand.

paul

James Cloos

11:51 p.m.

...
"GH" == Guy Harris <guy@alum.mit.edu> writes:

GH> If you mean "not all white space characters are valid separators", I
GH> suspect that none of Vertical Tab, Carriage Return, or Line Feed/New
GH> Line would be considered valid separators,

zic.c uses isascii(3) and isspace(3), so it is libc and locale dependent, although in practice the use of isascii(3) should minimize that dependence. Maybe even eliminate it.

In the C and POSIX locales, ' ', '\f', '\n', '\r', '\t' and '\v' all are valid whitespace, as confirmed by various man pages. The FreeBSD man page notes that that definition comes from ISO C90.

By the time getfields() is called, the buffer is limited to one line, so '\n' would never be seen. Strings of any of the rest will be treated as token separators, or as initial or terminal white space at the start or end of the line.

-JimC
-- James Cloos <cloos@jhcloos.com> OpenPGP: 1024D/ED7DAEA6
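The claim about the "C" locale is easy to check directly. This small sketch (the function name list_space_chars is hypothetical) collects every unsigned char value for which isspace() is nonzero; a program that never calls setlocale() runs in the "C" locale, so it should find exactly the six characters listed above.

```c
#include <ctype.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: write every unsigned char value c for which
 * isspace(c) is nonzero in the current locale into `out` (which must
 * hold UCHAR_MAX + 1 bytes) and return how many were found.
 * In the "C" locale this should be '\t', '\n', '\v', '\f', '\r', ' '. */
size_t list_space_chars(unsigned char *out)
{
    size_t n = 0;
    for (int c = 0; c <= UCHAR_MAX; c++)
        if (isspace(c))
            out[n++] = (unsigned char)c;
    return n;
}
```

Run after setlocale(LC_CTYPE, "") in some other locale, the same loop may report additional characters, which is the locale dependence being discussed.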

Guy Harris

12:52 a.m.

On May 1, 2013, at 4:51 PM, James Cloos <cloos@jhcloos.com> wrote:

...

...
"GH" == Guy Harris <guy@alum.mit.edu> writes:

GH> If you mean "not all white space characters are valid separators", I
GH> suspect that none of Vertical Tab, Carriage Return, or Line Feed/New
GH> Line would be considered valid separators,

zic.c uses isascii(3) and isspace(3), so it is libc and locale dependent, although in practice the use of isascii(3) should minimize that dependence. Maybe even eliminate it.

Perhaps it should use "c == ' ' || c == '\t'" instead, i.e. only horizontal white space.

lord.buddha＠gmail.com

4:16 a.m.

;) We should change to an XML format that can be validated against versioned schema(s) rather than keeping this legacy format accessible only by zic or an organic neural net :) Now a patch to zic for it to translate to a standard XML format would be useful. But then again, nothing seems to be broken at the moment ...

On 2 May 2013 12:52, Guy Harris <guy@alum.mit.edu> wrote:

...

On May 1, 2013, at 4:51 PM, James Cloos <cloos@jhcloos.com> wrote:

...
> "GH" == Guy Harris <guy@alum.mit.edu> writes:

GH> If you mean "not all white space characters are valid separators", I
GH> suspect that none of Vertical Tab, Carriage Return, or Line Feed/New
GH> Line would be considered valid separators,

zic.c uses isascii(3) and isspace(3), so it is libc and locale dependent, although in practice the use of isascii(3) should minimize that dependence. Maybe even eliminate it.

Perhaps it should use "c == ' ' || c == '\t'" instead, i.e. only horizontal white space.

Russ Allbery

6:33 p.m.

James Cloos <cloos@jhcloos.com> writes:

...

zic.c uses isascii(3) and isspace(3), so it is libc and locale dependent, although in practice the use of isascii(3) should minimize that dependence. Maybe even eliminate it.

A lot of software uses isspace(3) in places where it feels to me like isblank(3) would be more appropriate, probably because isblank(3) is relatively new (C99 and POSIX). I wonder if this is one such place. It doesn't particularly matter, since the chances of running into one of the characters that differ are quite remote, but intuitively I would expect an occurrence of a vertical tab in the middle of a zic.c input file line to be more likely to be corruption than an intentional whitespace character.

-- Russ Allbery (rra@stanford.edu) <http://www.eyrie.org/~eagle/>
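To see how close Guy Harris's suggested test and isblank() are, the following sketch compares them over every unsigned char value. The function names are hypothetical; it assumes a C99 compiler (for isblank()) and the default "C" locale, where isblank() is true only for ' ' and '\t' while isspace() also accepts '\n', '\v', '\f' and '\r'.

```c
#include <ctype.h>
#include <limits.h>

/* The explicit test suggested earlier in the thread:
 * only horizontal white space separates fields. */
int is_horizontal_space(int c)
{
    return c == ' ' || c == '\t';
}

/* Return 1 if isblank() agrees with is_horizontal_space() for every
 * unsigned char value in the current locale, 0 otherwise. */
int isblank_matches_horizontal(void)
{
    for (int c = 0; c <= UCHAR_MAX; c++)
        if ((isblank(c) != 0) != is_horizontal_space(c))
            return 0;
    return 1;
}
```

The explicit comparison has the small advantage of not depending on the locale at all, while isblank() documents the intent with a standard name.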

Bennett Todd

7:06 p.m.

My reaction when I see a case like this is to write a lint-style checker, and rig it to be called from the makefile. I'd volunteer, but I do such things in Perl, and that wouldn't be appropriate for tzcode.
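A checker along these lines could also be written in C. The sketch below is hypothetical: the name lint_stream, the 1024-byte line limit, and the choice of checks are all assumptions. Consistent with the rest of the thread, it does not complain about mixed tabs and spaces; it flags only trailing blanks and vertical white space ('\v', '\f', '\r') inside a line, which Russ Allbery notes is more likely corruption than intent.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical lint pass over one zic input stream. Prints one
 * diagnostic per problem to stderr in "file:line: message" form and
 * returns the number of problems, so a makefile rule could fail when
 * the result is nonzero. Assumes lines shorter than the buffer. */
int lint_stream(FILE *in, const char *name)
{
    char buf[1024];
    int lineno = 0, problems = 0;

    while (fgets(buf, sizeof buf, in) != NULL) {
        size_t len = strlen(buf);
        lineno++;
        if (len > 0 && buf[len - 1] == '\n')
            buf[--len] = '\0';          /* drop the newline */
        if (len > 0 && (buf[len - 1] == ' ' || buf[len - 1] == '\t')) {
            fprintf(stderr, "%s:%d: trailing white space\n", name, lineno);
            problems++;
        }
        if (strpbrk(buf, "\v\f\r") != NULL) {
            fprintf(stderr, "%s:%d: vertical white space in line\n",
                    name, lineno);
            problems++;
        }
    }
    return problems;
}
```

A small main() that opens each file named on the command line and exits nonzero when any call returns a positive count would be enough to hook it into a makefile.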

Paul Eggert

6:34 p.m.

On 05/01/13 16:51, James Cloos wrote:

...

zic.c uses isascii(3) and isspace(3), so it is libc and locale dependent, although in practice the use of isascii(3) should minimize that dependence. Maybe even eliminate it.

Normally, zic doesn't call setlocale, so it's running in the C locale. There is a compile-time option HAVE_GETTEXT that enables setlocale but it's not set automatically. The intent of HAVE_GETTEXT was to allow zic's diagnostics to be internationalized; it was not to change the input language of zic. So perhaps the HAVE_GETTEXT stuff should be removed; or if it's kept, the use of ctype.h primitives should be removed (which'd be more work....).

Clive D.W. Feather

8:46 a.m.

James Cloos said:

...

zic.c uses isascii(3) and isspace(3), so it is libc and locale dependent, although in practice the use of isascii(3) should minimize that dependence. Maybe even eliminate it.

"isascii" is not a C Standard function. The C Standard requires that all the <ctype.h> functions work with all possible values of type unsigned char and with EOF, so there is no need for "isascii".

...

In the C and POSIX locales, ' ', '\f', '\n', '\r', '\t' and '\v' all are valid whitespace, as confirmed by various man pages. The freebsd man page notes that that definition comes from ISO C90.

The C Standard says that, in the "C" locale, "isspace" is nonzero for those 6 characters and zero for all others. In other locales it may be nonzero for other characters, but "isalnum(c) && isspace(c)" must always be 0.

[The above statements apply to the 1990, 1994, and 1999 editions of ISO/IEC 9899.]

--
Clive D.W. Feather | If you lie to the compiler,
Email: clive@davros.org | it will get its revenge.
Web: http://www.davros.org | - Henry Spencer
Mobile: +44 7973 377646
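In practice, dropping isascii() leaves one trap: plain char may be signed, so a byte such as 0xA0 can arrive as a negative int, which is neither EOF nor an unsigned char value and makes the call undefined. The usual idiom, sketched here with a hypothetical wrapper name, is to convert through unsigned char first.

```c
#include <ctype.h>

/* Hypothetical wrapper: classify one byte from a char buffer.
 * The cast to unsigned char keeps the argument in the range the
 * <ctype.h> functions are required to accept, so no isascii() guard
 * is needed even when plain char is signed. */
int is_space_char(char c)
{
    return isspace((unsigned char)c) != 0;
}
```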

Ian Abbott

9:44 a.m.

On 2013-05-03 09:46, Clive D.W. Feather wrote:

...

James Cloos said:

...
zic.c uses isascii(3) and isspace(3), so it is libc and locale dependent, although in practice the use of isascii(3) should minimize that dependence. Maybe even eliminate it.

"isascii" is not a C Standard function. The C Standard requires that all the <ctype.h> functions work with all possible values of type unsigned char and with EOF, so there is no need for "isascii".

Perhaps it can be removed now that zic.c assumes the compiler understands prototypes and stuff. It has some hackery to define "isascii" as a macro if it isn't already defined as a macro, which wouldn't work if the host implemented it as a function and only supported "isalnum" etc. in the range 0 to 127.

...

...
In the C and POSIX locales, ' ', '\f', '\n', '\r', '\t' and '\v' all are valid whitespace, as confirmed by various man pages. The freebsd man page notes that that definition comes from ISO C90.

The C Standard says that, in the "C" locale, "isspace" is nonzero for those 6 characters and zero for all others. In other locales it may be nonzero for other characters, but "isalnum(c) && isspace(c)" must always be 0.

Though as mentioned previously, zic does call "setlocale" if compiled with the HAVE_GETTEXT macro set. I guess it should really switch to the "C" locale while parsing the input, but if zic can assume that the important bits of the input (nothing between a '#' and the end of a line) is ASCII encoded, I don't think parsing it in some other locale would make any difference (unless the locale uses EBCDIC or something...).
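One way to do what Ian suggests is to save the current LC_CTYPE setting, switch to "C" for the parse, and restore it afterwards, leaving LC_MESSAGES (used for translated diagnostics) alone. This is a sketch only: with_c_ctype and count_space_bytes are hypothetical names, not zic functions, and since setlocale() is process-wide this is not safe in a multithreaded program.

```c
#include <ctype.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: run parse(arg) with LC_CTYPE temporarily set
 * to "C", then restore the previous LC_CTYPE. Returns parse()'s
 * result, or -1 if the old locale name could not be saved. */
int with_c_ctype(int (*parse)(void *), void *arg)
{
    const char *cur = setlocale(LC_CTYPE, NULL);
    char *saved = NULL;
    int result;

    /* setlocale() may overwrite the string it returned, so copy it. */
    if (cur != NULL) {
        size_t len = strlen(cur) + 1;
        saved = malloc(len);
        if (saved == NULL)
            return -1;
        memcpy(saved, cur, len);
    }
    setlocale(LC_CTYPE, "C");
    result = parse(arg);
    if (saved != NULL) {
        setlocale(LC_CTYPE, saved);
        free(saved);
    }
    return result;
}

/* Example callback: count bytes that isspace() accepts in a string. */
int count_space_bytes(void *arg)
{
    const unsigned char *p = arg;
    int n = 0;
    for (; *p != '\0'; p++)
        if (isspace(*p))
            n++;
    return n;
}
```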

...

[The above statements apply to the 1990, 1994, and 1999 editions of ISO/IEC 9899.]

-- -=( Ian Abbott @ MEV Ltd. E-mail: <abbotti@mev.co.uk> )=- -=( Tel: +44 (0)161 477 1898 FAX: +44 (0)161 718 3587 )=-

random832＠fastmail.us

4:39 p.m.

On Fri, May 3, 2013, at 4:46, Clive D.W. Feather wrote:

...

"isascii" is not a C Standard function. The C Standard requires that all the <ctype.h> functions work with all possible values of type unsigned char and with EOF, so there is no need for "isascii".

The code dates back to at least 1986, which is three years older than the C standard. There has been some cleanup done in recent years to make it more ANSI conformant (rather, more willing to presume ANSI conformance on the part of the environment it is compiled in), but it seems the removal of isascii usage has not been one of those changes so far.

Tobias Conradi

4:09 a.m.

On Wed, May 1, 2013 at 9:39 PM, Guy Harris <guy@alum.mit.edu> wrote:

...

On May 1, 2013, at 5:08 AM, Tobias Conradi <mail.2012@tobiasconradi.com> wrote:

...
And I assume not any white space separates tokens.

...

If you mean "not all white space characters are valid separators",

I had in mind what they do, not whether they are generally valid.

I came from random832@fastmail.us:

"Tokens in the timezone files are separated by _any whitespace_."

and re-used the "any". I wanted to say "And I assume not /every/ 'any white space' separates tokens." I had in mind the spaces in the comment field:

ftp://ftp.iana.org/tz/data/zone.tab

"Rothera Station, Adelaide Island"

But since "token" has not been defined by random832, nor does the word appear in Theory at all, I might be wrong.

...

we should allow arbitrary numbers of tabs and/or spaces to be valid separators.

What is the benefit of having more than one tab or one space as one separator?

One tab would allow easy import of fields into OpenOffice Spreadsheet columns. -- Tobias Conradi Rheinsberger Str. 18 10115 Berlin Germany http://tobiasconradi.com

Guy Harris

4:31 a.m.

On May 1, 2013, at 9:09 PM, Tobias Conradi <mail.2012@tobiasconradi.com> wrote:

...

What is the benefit of having more than one tab or one space as one separator?

It lets people use whatever text editor they want, regardless of what it does with horizontal white space. It lets the columns line up naturally for human reading. It lets existing files (which have more than one tab in a row, at least for leading spaces) be read without having to reformat them.

Tobias Conradi

7:26 p.m.

On Thu, May 2, 2013 at 6:31 AM, Guy Harris <guy@alum.mit.edu> wrote:

...

On May 1, 2013, at 9:09 PM, Tobias Conradi <mail.2012@tobiasconradi.com> wrote:

...
What is the benefit of having more than one tab or one space as one separator?

It lets people use whatever text editor they want, regardless of what it does with horizontal white space.

It lets the columns line up naturally for human reading.

It lets existing files (which have more than one tab in a row, at least for leading spaces) be read without having to reformat them.

If the source files used only tabs instead of spaces and tabs, then columns could still be lined up for human reading, and no compatibility inside the IANA time zone database would be broken?

Sadly the thread starter David Muir Sharnoff <lists@dave.sharnoff.org> hasn't yet replied to http://mm.icann.org/pipermail/tz/2013-May/019162.html; a reply would have given us a visualization of the data containing spaces as separators.

-- Tobias Conradi Rheinsberger Str. 18 10115 Berlin Germany http://tobiasconradi.com

Tobias Conradi

6:16 p.m.

On Thu, May 2, 2013 at 6:31 AM, Guy Harris <guy@alum.mit.edu> wrote:

...

On May 1, 2013, at 9:09 PM, Tobias Conradi <mail.2012@tobiasconradi.com> wrote:

...
What is the benefit of having more than one tab or one space as one separator?

It lets the columns line up naturally for human reading.

Seems not so for the TZ column with preceding tab, using Google Chrome at https://github.com/eggert/tz/commit/188b29d9664cfcf0384e515c69f94a2dfc27c673... while Google Chrome at ftp://ftp.iana.org/tz/data/zone.tab aligns the TZ column well. -- Tobias Conradi Rheinsberger Str. 18 10115 Berlin Germany http://tobiasconradi.com

Guy Harris

4:45 a.m.

On May 1, 2013, at 9:09 PM, Tobias Conradi <mail.2012@tobiasconradi.com> wrote:

...

I came from random832@fastmail.us:

"Tokens in the timezone files are separated by _any whitespace_." and re-used the any.

I wanted to say "And I assume not /every/ 'any white space' separates tokens."

I had in mind the spaces in the comment field:

ftp://ftp.iana.org/tz/data/zone.tab

"Rothera Station, Adelaide Island"

...which isn't a timezone file. That file has a different syntax, as described in the leading comment:

# Columns are separated by a single tab.

In timezone files, none of the columns may contain whitespace, so an arbitrary string of white space (other than NL) separates columns.
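To make the contrast concrete, a zone.tab-style line has to be split on each individual tab, because spaces belong to the field. A minimal sketch follows (the function name split_tabs is hypothetical; the sample line in the usage is written in the style of zone.tab rather than copied from a specific release).

```c
#include <string.h>

/* Hypothetical helper: split a zone.tab-style line in place on single
 * tab characters. Unlike zic input, spaces stay inside a column, so
 * "Rothera Station, Adelaide Island" remains one field. Stores at most
 * `max` column pointers and returns the total number of columns. */
int split_tabs(char *line, char **cols, int max)
{
    int n = 0;
    char *p = line;
    char *nl = strchr(line, '\n');

    if (nl != NULL)
        *nl = '\0';                 /* drop the newline kept by fgets() */
    for (;;) {
        char *tab = strchr(p, '\t');
        if (n < max)
            cols[n] = p;
        n++;
        if (tab == NULL)
            break;                  /* last column */
        *tab = '\0';
        p = tab + 1;                /* exactly one tab per separator */
    }
    return n;
}
```

Running the whitespace-splitting approach on the same line would instead break the comment column into four separate words, which is why the two file kinds need different parsers.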

Tobias Conradi

7:46 p.m.

On Thu, May 2, 2013 at 6:45 AM, Guy Harris <guy@alum.mit.edu> wrote:

...

On May 1, 2013, at 9:09 PM, Tobias Conradi <mail.2012@tobiasconradi.com> wrote:

...
I came from random832@fastmail.us:

"Tokens in the timezone files are separated by _any whitespace_." and re-used the any.

I wanted to say "And I assume not /every/ 'any white space' separates tokens."

I had in mind the spaces in the comment field:

ftp://ftp.iana.org/tz/data/zone.tab

"Rothera Station, Adelaide Island"

...which isn't a timezone file.

Where is the set of what you call "timezone files" defined?

ftp://ftp.iana.org/tz/code/Theory talks of "time zone rule files" and, in the singular: "The daylight saving time rules to be used for a particular time zone are encoded in the time zone file". No "time zone files" or "timezone files" found.

...

That file has a different syntax, as described in the leading comment:

# Columns are separated by a single tab.

So the benefits laid out at http://mm.icann.org/pipermail/tz/2013-May/019175.html, and the other unknown reasons that led to multiple white space characters in some files, are outweighed by something unknown for zone.tab? What could that be?

-- Tobias Conradi Rheinsberger Str. 18 10115 Berlin Germany http://tobiasconradi.com

random832＠fastmail.us

8:05 p.m.

On Thu, May 2, 2013, at 15:46, Tobias Conradi wrote:

...

Where is the set of what you call "timezone files" defined?

We have already said multiple times that we are talking about the files which are used as inputs to the zic program. They are not discussed in the Theory file. The format is described in the manpage for zic (the filename of this manpage is zic.8 - there may be a zic.8.txt in your distribution; I can't connect to the "ftp.iana.org" server you are using) I cannot possibly believe that you did not know this; you seem to be being deliberately disingenuous.

Tobias Conradi

8:18 p.m.

On Thu, May 2, 2013 at 10:05 PM, <random832@fastmail.us> wrote:

...

On Thu, May 2, 2013, at 15:46, Tobias Conradi wrote:

...
Where is the set of what you call "timezone files" defined?

We have already said multiple times that we are talking about the files which are used as inputs to the zic program.

Evidence?

...

They are not discussed in the Theory file.

Since that mentions "time zone rule files", I wonder what these then are, and what the files are that are "are used as inputs to the zic program"

...

The format is described in the manpage for zic (the filename of this manpage is zic.8 - there may be a zic.8.txt in your distribution; I can't connect to the "ftp.iana.org" server you are using)

I cannot possibly believe that you did not know this;

I can.

...

you seem to be being deliberately disingenuous.

To you.

-- Tobias Conradi Rheinsberger Str. 18 10115 Berlin Germany http://tobiasconradi.com

random832＠fastmail.us

8:56 p.m.

On Thu, May 2, 2013, at 16:18, Tobias Conradi wrote:

...

Since that mentions "time zone rule files",

It uses both names to refer to files such as /usr/share/zoneinfo/America/New_York (location of /usr/share/zoneinfo may vary) i.e. the binary files (not part of the distribution) that are read from at runtime to do time conversions. You can infer this from the fact that it talks about them _being named after cities_, which no file in the distribution is.

...

I wonder what these then are, and what the files are that are "are used as inputs to the zic program"

Why did you put that in quotes? What could POSSIBLY be unclear about "used as inputs to the zic program"? The zic program runs, with a file such as northamerica as its input, and files such as America/New_York, America/Los_Angeles, and dozens of others are its output.

Tobias Conradi

9:28 p.m.

On Thu, May 2, 2013 at 10:56 PM, <random832@fastmail.us> wrote:

...

On Thu, May 2, 2013, at 16:18, Tobias Conradi wrote:

...
Since that mentions "time zone rule files",

It uses both names to refer to files such as /usr/share/zoneinfo/America/New_York (location of /usr/share/zoneinfo may vary) i.e. the binary files (not part of the distribution) that are read from at runtime to do time conversions.

You can infer this from the fact that it talks about them _being named after cities_, which no file in the distribution is.

Thank you.

So there are:

A) at least two IANA time zone database files that some people would not call timezone files, namely zone.tab and iso3166.tab, meant for reading by humans and using a single tab as field separator;
B) time zone files that use any amount of white space as field separator, meant for reading by humans;
C) time zone rule files, mentioned in Theory, not meant for reading by humans.

In the distribution are A+B. In /usr/share/zoneinfo on Debian I see A+C.

...

...
I wonder what these then are, and what the files are that are "are used as inputs to the zic program"

Why did you put that in quotes? What could POSSIBLY be unclear about "used as inputs to the zic program"?

Quotes were only used because I copied them from your text; no extra meaning was intended.

...

The zic program runs, with a file such as northamerica as its input, and files such as America/New_York, America/Los_Angeles, and dozens of others are its output.

Thanks.

-- Tobias Conradi Rheinsberger Str. 18 10115 Berlin Germany http://tobiasconradi.com

Alan Perry

9:08 p.m.

On 5/2/13 1:05 PM, random832@fastmail.us wrote:

...

On Thu, May 2, 2013, at 15:46, Tobias Conradi wrote:

...
Where is the set of what you call "timezone files" defined?

We have already said multiple times that we are talking about the files which are used as inputs to the zic program.

They are not discussed in the Theory file. The format is described in the manpage for zic (the filename of this manpage is zic.8 - there may be a zic.8.txt in your distribution; I can't connect to the "ftp.iana.org" server you are using)

zic.8.txt is in the 2013c distribution.

There are links to the latest tzcode and tzdata from http://www.iana.org/time-zones. alan

...

I cannot possibly believe that you did not know this; you seem to be being deliberately disingenuous.

random832＠fastmail.us

8:58 p.m.

New subject: [bug] phrasing in Theory

On Thu, May 2, 2013, at 15:46, Tobias Conradi wrote:

...

"The daylight saving time rules to be used for a particular time zone are encoded in the time zone file"

Should read: ...encoded in the time zone rule file. Since evidently certain people are unable to understand anything other than perfect pedantic precision in describing things.

Alan Perry

9:15 p.m.

New subject: [bug] phrasing in Theory

On 5/2/13 1:58 PM, random832@fastmail.us wrote:

...

On Thu, May 2, 2013, at 15:46, Tobias Conradi wrote:

...
"The daylight saving time rules to be used for a particular time zone are encoded in the time zone file"

Should read: ...encoded in the time zone rule file.

Except that they are called 'time zone data files' in the tzcode README ...

...

Since evidently certain people are unable to understand anything other than perfect pedantic precision in describing things.

The documentation of the tz database sometimes uses inconsistent terminology and doesn't spell everything out like a formal standard or most industry specs would. Does it need that now? alan

John Haxby

8:36 a.m.

New subject: [bug] phrasing in Theory

On 02/05/13 22:15, Alan Perry wrote:

...

On 5/2/13 1:58 PM, random832@fastmail.us wrote:

...
On Thu, May 2, 2013, at 15:46, Tobias Conradi wrote:

...
"The daylight saving time rules to be used for a particular time zone are encoded in the time zone file"

Should read: ...encoded in the time zone rule file.

Except that they are called 'time zone data files' in the tzcode README ...

...
Since evidently certain people are unable to understand anything other than perfect pedantic precision in describing things.

The documentation of the tz database sometimes uses inconsistent terminology and doesn't spell everything out like a formal standard or most industry specs would. Does it need that now?

I don't believe so. We've had one person who thought tabs were important whereas common practice (and the code) treats sequences of tabs and spaces as white space separators. And one person that enjoys winding everyone else up. I don't think we actually need to change anything at all with regard to this: the tzdata and tzcode function perfectly well. (I shall be disappointed if the usual critic doesn't complain about that.) jch

Tobias Conradi

11:07 a.m.

New subject: [bug] phrasing in Theory

On Fri, May 3, 2013 at 10:36 AM, John Haxby <john.haxby@oracle.com> wrote:

...

On 02/05/13 22:15, Alan Perry wrote:

...
On 5/2/13 1:58 PM, random832@fastmail.us wrote:

...
On Thu, May 2, 2013, at 15:46, Tobias Conradi wrote:

...
"The daylight saving time rules to be used for a particular time zone are encoded in the time zone file"

Should read: ...encoded in the time zone rule file.

Except that they are called 'time zone data files' in the tzcode README ...

...
Since evidently certain people are unable to understand anything other than perfect pedantic precision in describing things.

The documentation of the tz database sometimes uses inconsistent terminology and doesn't spell everything out like a formal standard or most industry specs would. Does it need that now?

I don't believe so.

Any reasoning?

...

We've had one person who thought tabs were important whereas common practice (and the code) treats sequences of tabs and spaces as white space separators.

That sounds more like a reason to have better documentation.

...

I don't think we actually need to change anything at all with regard to this: the tzdata and tzcode function perfectly well.

Having better documentation to help people understand the IANA time zone database better wouldn't harm that, would it?

-- Tobias Conradi Rheinsberger Str. 18 10115 Berlin Germany http://tobiasconradi.com

Ian Abbott

8:58 a.m.

New subject: [bug] phrasing in Theory

On 2013-05-02 22:15, Alan Perry wrote:

...

On 5/2/13 1:58 PM, random832@fastmail.us wrote:

...
On Thu, May 2, 2013, at 15:46, Tobias Conradi wrote:

...
"The daylight saving time rules to be used for a particular time zone are encoded in the time zone file" Should read: ...encoded in the time zone rule file.

Except that they are called 'time zone data files' in the tzcode README ...

...
Since evidently certain people are unable to understand anything other than perfect pedantic precision in describing things.

The documentation of the tz database sometimes uses inconsistent terminology and doesn't spell everything out like a formal standard or most industry specs would. Does it need that now?

It would be nice to have some consistency in terminology. Sometimes it's hard to tell immediately if a piece of documentation is talking about the zic input files or the output files (which *may* be a good thing as it forces you to think about it). Having some sort of glossary of terms and sticking to it would be a nice thing to have, but obviously not at all urgent. -- -=( Ian Abbott @ MEV Ltd. E-mail: <abbotti@mev.co.uk> )=- -=( Tel: +44 (0)161 477 1898 FAX: +44 (0)161 718 3587 )=-

Tobias Conradi

10:11 p.m.

New subject: [bug] phrasing in Theory

On Thu, May 2, 2013 at 10:58 PM, <random832@fastmail.us> wrote:

...

On Thu, May 2, 2013, at 15:46, Tobias Conradi wrote:

...
"The daylight saving time rules to be used for a particular time zone are encoded in the time zone file"

Should read: ...encoded in the time zone rule file.

I suggest using A) the plural, to make clear there is more than one such file, or B) "encoded in the corresponding time zone rule file".

...

Since evidently certain people are unable to understand anything other than perfect pedantic precision in describing things.

And evidently the Google Chrome in-file search for "time zone rule file" does not find "time zone file".

-- Tobias Conradi Rheinsberger Str. 18 10115 Berlin Germany http://tobiasconradi.com

Guy Harris

9 p.m.

On May 2, 2013, at 12:46 PM, Tobias Conradi <mail.2012@tobiasconradi.com> wrote:

...

On Thu, May 2, 2013 at 6:45 AM, Guy Harris <guy@alum.mit.edu> wrote:

...
On May 1, 2013, at 9:09 PM, Tobias Conradi <mail.2012@tobiasconradi.com> wrote:

...
I came from random832@fastmail.us:

"Tokens in the timezone files are separated by _any whitespace_." and re-used the any.

I wanted to say "And I assume not /every/ 'any white space' separates tokens."

I had in mind the spaces in the comment field:

ftp://ftp.iana.org/tz/data/zone.tab

"Rothera Station, Adelaide Island"

...which isn't a timezone file.

Where is the set of what you call "timezone files" defined?

ftp://ftp.iana.org/tz/code/Theory talks of "time zone rule files" and singular:

"The daylight saving time rules to be used for a particular time zone are encoded in the time zone file"

No "time zone files" or "timezone files" found.

OK, so I'll change it to "...which isn't a time zone rule file", so that I'm not using terms not used in Theory. (Note, of course, that Theory is not a document mentioned anywhere in http://tools.ietf.org/html/rfc6557 and not mentioned anywhere on http://www.iana.org/time-zones and not mentioned anywhere in any of the man pages or in any of the tzdata files, so it's not a document with any official standing as the ultimate documentation of the time zone database.)

...

...
That file has a different syntax, as described in the leading comment:

# Columns are separated by a single tab.

So, the benefits laid down at http://mm.icann.org/pipermail/tz/2013-May/019175.html and other unknown reasons that led to multiple white space in some files are outweighed by something unknown for zone.tab? What could that be?

Well, this benefit

It lets existing files (which have more than one tab in a row, at least for leading spaces) be read without having to reformat them.

obviously doesn't even apply, because the zone.tab file's existing syntax is different, and if *it* were changed to allow multiple tabs between columns, that would mean that code that read *those* files wouldn't work on the new file.

I.e., different files, different file formats, for better or worse. Changing file formats, and breaking code, just to make all file formats the same is likely to cause more problems than it solves. The others do apply, but, for better or worse, that's not the format that was chosen, and we're stuck with it.

Tobias Conradi

9:59 p.m.

On Thu, May 2, 2013 at 11:00 PM, Guy Harris <guy@alum.mit.edu> wrote:

...

OK, so I'll change it to "...which isn't a time zone rule file", so that I'm not using terms not used in Theory.

Thanks.

...

(Note, of course, that Theory is not a document mentioned anywhere in

http://tools.ietf.org/html/rfc6557

A lot of files are not mentioned there. Also note that the RFC is labeled "Best Current Practice" and "BCP: 175" but diverged from pre-existing Theory and usage:

"IANA and the tz database - diverging from Theory" http://mm.icann.org/pipermail/tz/2011-September/008879.html

...

and not mentioned anywhere on

http://www.iana.org/time-zones

Two clicks from there and I see the file name; one more click and I see the content (using Google Chrome, which here allows http and ftp browsing).

...

and not mentioned anywhere in any of the man pages or in any of the tzdata files, so it's not a document with any official standing as the ultimate documentation of the time zone database.)

Where is the "ultimate documentation of the time zone database"?

...

...
...
That file has a different syntax, as described in the leading comment:

# Columns are separated by a single tab.

So, the benefits laid down at http://mm.icann.org/pipermail/tz/2013-May/019175.html and other unknown reasons that led to multiple white space in some files are outweighed by something unknown for zone.tab? What could that be?

Well, this benefit

It lets existing files (which have more than one tab in a row, at least for leading spaces) be read without having to reformat them.

obviously doesn't even apply, because the zone.tab file's existing syntax is different, and if *it* were changed to allow multiple tabs between columns, that would mean that code that read *those* files wouldn't work on the new file.

I didn't understand that benefit at all.

...

The others do apply, but, for better or worse, that's not the format that was chosen, and we're stuck with it.

OK, that is the current reason. But what might have been the reason when zone.tab was established with a single tab?

The benefit of variable white space that I understand the best is "It lets the columns line up naturally for human reading." And this is broken if I browse to ftp://ftp.iana.org/tz/data/zone.tab

Same-length tzids could fix it.

-- Tobias Conradi Rheinsberger Str. 18 10115 Berlin Germany http://tobiasconradi.com

Guy Harris

10:38 p.m.

On May 2, 2013, at 2:59 PM, Tobias Conradi <mail.2012@tobiasconradi.com> wrote:

...

On Thu, May 2, 2013 at 11:00 PM, Guy Harris <guy@alum.mit.edu> wrote:

...
(Note, of course, that Theory is not a document mentioned anywhere in

http://tools.ietf.org/html/rfc6557

A lot of files are not mentioned there. Also note that the RFC is labeled "Best Current Practice" and "BCP: 175" but diverged from pre-existing Theory and usage:

"IANA and the tz database - diverging from Theory" http://mm.icann.org/pipermail/tz/2011-September/008879.html

Then perhaps it's time to retire Theory in favor of RFC 6557.

...

...
and not mentioned anywhere on

http://www.iana.org/time-zones

Two clicks from there and I see the file name; one more click and I see the content (using Google Chrome, which here allows http and ftp browsing).

"You can find it if you dig into the middle of the code directory" is not quite as much of a "mention" as, for example, "the theory behind the database is described in <a href="{some URL}">the Theory file</a>"

...

...
and not mentioned anywhere in any of the man pages or in any of the tzdata files, so it's not a document with any official standing as the ultimate documentation of the time zone database.)

Where is the "ultimate documentation of the time zone database"?

Perhaps nowhere. Perhaps a number of places, such as the various man pages for the technical details of the format of time zone data files and of the binary files produced by zic, and RFC 6557 for policies.

...

...
Well, this benefit

It lets existing files (which have more than one tab in a row, at least for leading spaces) be read without having to reformat them.

obviously doesn't even apply, because the zone.tab file's existing syntax is different, and if *it* were changed to allow multiple tabs between columns, that would mean that code that read *those* files wouldn't work on the new file.

I didn't understand that benefit at all.

The benefit of

...

...
It lets existing files (which have more than one tab in a row, at least for leading spaces) be read without having to reformat them.

is "if we don't change zic to treat individual white-space characters as column separators, so that

<TAB>A<TAB><TAB>B

is viewed as three columns, one with the value "A", one blank, and one with the value "B", then we don't change zic in such a way that it gets very confused when it reads the existing time zone rule files, and thus we don't have to change all our files to the new format, *and* we don't have to force anybody who's created their own time zone rule files to change *their* files".

That benefit obviously doesn't apply to the zone.tab file, as, in *that* file, individual white-space characters *are* column separators. If we were to allow arbitrary sequences of white-space characters to be column separators, we would, at minimum, break the tzselect.ksh script (I tried adding extra tabs and spaces to the America/Los_Angeles entry in that file and running tzselect.ksh, and it did *NOT* find that entry), and might break other code that uses it (the FreeBSD 9 sysinstall program seems to use it, as it doesn't just throw a bunch of zone names at you, and behaves at least somewhat like tzselect.ksh; surprisingly, the PC-BSD 9 configuration programs *don't* use it, they just throw you a bunch of zone names, although I think they at least *did* say "Pacific" for America/Los_Angeles, for the benefit of those of us about 570 km from Los Angeles).

...

...
The others do apply, but, for better or worse, that's not the format that was chosen, and we're stuck with it.

OK, that is the current reason. But what might have been the reason when zone.tab was established with single tab?

Code to parse it was a bit quicker to whip up with that limitation, given that it was probably not viewed by its creator as being as "core" to the time zone database as the time zone data files?

Tobias Conradi

11:06 p.m.

On Fri, May 3, 2013 at 12:38 AM, Guy Harris <guy@alum.mit.edu> wrote:

...

On May 2, 2013, at 2:59 PM, Tobias Conradi <mail.2012@tobiasconradi.com> wrote:

...
"IANA and the tz database - diverging from Theory" http://mm.icann.org/pipermail/tz/2011-September/008879.html

Then perhaps it's time to retire Theory in favor of RFC 6557.

Or retire the RFC.

...

"You can find it if you dig into the middle of the code directory" is not quite as much of a "mention" as, for example, "the theory behind the database is described in <a href="{some URL}">the Theory file</a>"

My mistake. Agreed.

...

...
Where is the "ultimate documentation of the time zone database"?

Perhaps nowhere. Perhaps a number of places, such as the various man pages for the technical details of the format of time zone data files and of the binary files produced by zic, and RFC 6557 for policies.

The man pages cannot be accessed with a browser and cannot be html-href-linked, can they?

...

...
I didn't understand that benefit at all.

The benefit of

...
...
It lets existing files (which have more than one tab in a row, at least for leading spaces) be read without having to reformat them.

is "if we don't change zic to treat individual white-space characters as column separators, so that

<TAB>A<TAB><TAB>B

is viewed as three columns, one with the value "A", one blank, and one with the value "B", then we don't change zic in such a way that it gets very confused when it reads the existing time zone rule files, and thus we don't have to change all our files to the new format, *and* we don't have to force anybody who's created their own time zone rule files to change *their* files".

Now understood as path-dependency, no benefit at time of creation.

...

...
OK, that is the current reason. But what might have been the reason when zone.tab was established with single tab?

Code to parse it was a bit quicker to whip up with that limitation, given that it was probably not viewed by its creator as being as "core" to the time zone database as the time zone data files?

Might contradict the claim by random832 http://mm.icann.org/pipermail/tz/2013-May/019163.html

1) In most languages, splitting up by 'any whitespace' is the simplest thing in the world.
2) In C (where nothing is simple), you could reuse the code from zic itself.

-- Tobias Conradi Rheinsberger Str. 18 10115 Berlin Germany http://tobiasconradi.com

Guy Harris

11:30 p.m.

On May 2, 2013, at 4:06 PM, Tobias Conradi <mail.2012@tobiasconradi.com> wrote:

...

On Fri, May 3, 2013 at 12:38 AM, Guy Harris <guy@alum.mit.edu> wrote:

...
On May 2, 2013, at 2:59 PM, Tobias Conradi <mail.2012@tobiasconradi.com> wrote:

...
"IANA and the tz database - diverging from Theory" http://mm.icann.org/pipermail/tz/2011-September/008879.html

Then perhaps it's time to retire Theory in favor of RFC 6557.

Or retire the RFC.

Possibly. My preference is to retire Theory.

...

...
...
Where is the "ultimate documentation of the time zone database"?

Perhaps nowhere. Perhaps a number of places, such as the various man pages for the technical details of the format of time zone data files and of the binary files produced by zic, and RFC 6557 for policies.

The man pages cannot be accessed with a browser and cannot be html-href-linked, can they?

They currently cannot be directly accessed from http://www.iana.org/time-zones although the versions for at least some flavors of UN*X *can* be, e.g. http://www.freebsd.org/cgi/man.cgi?query=zic&apropos=0&sektion=0&manpath=Fre... or https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPa... for example. There is, obviously, no *technical* reason why they *couldn't* be. I might suggest that the format of the time zone data files be published in an RFC, and that the format of the binary files produced by zic perhaps be treated as an implementation detail and left in the man page.

...

...
...
OK, that is the current reason. But what might have been the reason when zone.tab was established with single tab?

Code to parse it was a bit quicker to whip up with that limitation, given that it was probably not viewed by its creator as being as "core" to the time zone database as the time zone data files?

Might contradict the claim by random832 http://mm.icann.org/pipermail/tz/2013-May/019163.html

1) In most languages, splitting up by 'any whitespace' is the simplest thing in the world.
2) In C (where nothing is simple), you could reuse the code from zic itself.

Part of the problem with zone.tab is that "some whitespace" is allowed in the last field on the line, so, to allow arbitrary white space to separate columns, the parsing would have to treat whitespace as a field separator between: the country code column and the coordinates column; the coordinates column and the TZ column; the TZ column and the comments column; but not to treat white space *after* that point as a field separator - blanks, at least, in the comments column are part of the entry in that column. I'm not sure there's a quick-and-dirty way to tell Awk, for example, to do that (and "Awk" means "any of the versions of Awk found in the UN*Xes commonly available in late 1996", as that's when tzselect was written; its use of Korn-shell-isms was itself a source of controversy, at least recently).

John Hawkinson

12:35 a.m.

Guy Harris <guy@alum.mit.edu> wrote on Thu, 2 May 2013 at 16:30:20 -0700 in <1B3C2E16-89CD-4876-B8C4-969776CAD2C7@alum.mit.edu>:

...

I might suggest that the format of the time zone data files be published in an RFC, and that the format of the binary files produced by zic perhaps be treated as an implementation detail and left in the man page.

Why would we want the data files to be standardized rather than retaining local control and the ability to alter the format? Is there really some compelling reason for standardization that is worth the concomitant difficulty in changing the format in the future? There are not a lot of non-zic consumers out there. --jhawk@mit.edu John Hawkinson

Guy Harris

1:44 a.m.

On May 2, 2013, at 5:35 PM, John Hawkinson <jhawk@mit.edu> wrote:

...

Why would we want the data files to be standardized rather than retaining local control and the ability to alter the format? Is there really some compelling reason for standardization that is worth the concomitant difficulty in changing the format in the future?

There are not a lot of non-zic consumers out there.

There's at least one, which triggered this thread in the first place. In addition, not all zic consumers are the same; were we to change the format and issue a new tzdata release with files in the new format, a system with an older version of zic that didn't understand the new format would have to either get compiled files built on a system with a new version of zic or would have to build a new version of zic to compile the source files. We should at least indicate what our policy is there, from "we reserve the right to change the format any time we want, deal with it" to something stricter. We should also make our current specification of the format a little easier to find, so that people developing consumers other than zic don't write parsers for what they *think* the format is rather than for what we *say* it is.

Robert Elz

1:32 p.m.

Date: Thu, 2 May 2013 18:44:55 -0700
From: Guy Harris <guy@alum.mit.edu>
Message-ID: <8EF0A340-6E4C-4D70-ACA4-FEFC25B50D83@alum.mit.edu>

| We should also make our current specification of the format a
| little easier to find, so that people developing consumers other than
| zic don't write parsers for what they *think* the format is rather
| than for what we *say* it is.

Actually, I suggest that we should really be discouraging people from creating parsers for the zone input files other than zic - those two should remain closely tied together. From time to time we discover the need to add some new feature to the input language, and doing that is really hard if all kinds of other implementations will suddenly break. Further, the input file format is rather quirky, and hard to explain completely in a way that makes a lot of sense (though it is perfectly fine for zic).

Rather we should be encouraging people to use the binary output files from zic for almost all purposes when they need something more than standard libc functions provide. And if that means documenting that format, more than is already done, then let's do it.

It is already difficult to the point of almost impossibility to make much in the way of changes to that file format, as it is understood by systems' libc functions, which we cannot alter - and even where libc is normally shared (and so can be updated if needed) nothing compels users to use the shared version - static program linking works on every system I'm aware of. If the binary format changes in some incompatible way, all old statically linked programs would stop working, which is an unacceptable result (which is why we never do that; at most we have, very rarely, added to that format in a way that doesn't break existing parsers).

This does mean that people may need to use a new zic to handle new tzdata distributions, but that should be easy (if it isn't, we should be fixing whatever the problem is to make it easy).
Anyone who does insist on parsing the input files themselves should be made to understand the risks involved - we have in the past, and I am sure will again in the future, make changes to that format, with zero advance notice (beyond normal code review here). Of course, that we can change it does not mean that we must, or even should, and there's been nothing in this recent discussion that even suggests to me any need to make any kind of change. kre

random832＠fastmail.us

4:49 p.m.

On Fri, May 3, 2013, at 9:32, Robert Elz wrote:

...

Rather we should be encouraging people to use the binary output files from zic for almost all purposes when they need something more than standard libc functions provide. And if that means documenting that format, more than is already done, then let's do it. It is already difficult to the point of almost impossibility to make much in the way of changes to that file format,

Actually, it is extensible by A) adding new data to the end of the existing data and/or B) altering the 15 bytes of reserved fields available in the headers. It would be necessary to determine if there are any existing libc implementations that would choke on such a modification, of course.

I've been sitting on an idea for a proposal to extend the format, actually, to support localized timezone names and/or keys for looking up such localized information in a database such as CLDR. The main purpose of such a change would be A) to allow programs to support localized information without having to parse XML and B) to remove the need to immediately update and deploy a new version of CLDR if a new zone is added or changes to a different named timezone (e.g. timezones in Indiana moving from Eastern to Central a few years ago).

Guy Harris

5:05 p.m.

On May 3, 2013, at 9:49 AM, random832@fastmail.us wrote:

...

I've been sitting on an idea for a proposal to extend the format, actually, to support localized timezone names and/or keys for looking up such localized information in a database such as CLDR.

Keys such as, say, "metazone names"? If we could indicate what metazone a time zone is in for a given date/time range, that might let the CLDR not keep its own list of moves of time zones between metazones (see common/supplemental/metaZones.xml in the CLDR). I'd personally prefer that the tz database *not* provide localized time zone names; I see that as yet another opportunity for tons of complaints about our choices.

...

and B) to remove the need to immediately update and deploy a new version of CLDR if a new zone is added or changes to a different named timezone (e.g. timezones in Indiana moving from Eastern to Central a few years ago)

Yes, I'd like to see the CLDR have the job of giving information such as localized time zone names for metazones and abbreviations, and *not* have to care about America/Wherever waking up some day and deciding to move from Eastern Time to Central Time - the TZ database should indicate that the zone in question moved from metazone America_Eastern to metazone America_Central.

random832＠fastmail.us

5:26 p.m.

On Fri, May 3, 2013, at 13:05, Guy Harris wrote:

...

On May 3, 2013, at 9:49 AM, random832@fastmail.us wrote:

...
I've been sitting on an idea for a proposal to extend the format, actually, to support localized timezone names and/or keys for looking up such localized information in a database such as CLDR.

Keys such as, say, "metazone names"?

If we could indicate what metazone a time zone is in for a given date/time range, that might let the CLDR not keep its own list of moves of time zones between metazones (see common/supplemental/metaZones.xml in the CLDR).

I'd personally prefer that the tz database *not* provide localized time zone names;

I do agree that CLDR should be who actually maintains the names, but I do think tzcode should provide reference code for _using_ them from C, possibly including a tool to convert them to a non-XML format (which should probably be packaged in CLDR's tools, rather than in tzcode - this might need some coordination between the two projects) Likely the metazone name would live inside the tz file, and the localized names would exist outside of it.

Guy Harris

5:34 p.m.

On May 3, 2013, at 6:32 AM, Robert Elz <kre@munnari.oz.au> wrote:

...

Rather we should be encouraging people to use the binary output files from zic for almost all purposes when they need something more than standard libc functions provide. And if that means documenting that format, more than is already done, then let's do it.

Do we need more than tzfile(5), which we already have? ("More" over and above "making HTMLified man pages available from http://www.iana.org/time-zones or a page under it", which is something I think would be a good thing to do.)

...

It is already difficult to the point of almost impossibility to make much in the way of changes to that file format, as it is understood by system's libc functions, that we cannot alter - and even where libc is normally shared (and so can be updated if needed) nothing compels users to use the shared version - static program linking works on every system I'm aware of.

$ cat static.c
#include <stdio.h>
#include <time.h>

int main(void)
{
	time_t now = time(NULL);
	printf("%s", ctime(&now));
	return 0;
}
$ gcc -static -o static static.c
ld: library not found for -lcrt0.o
collect2: ld returned 1 exit status

and it's

$ uname -sr
Darwin 12.2.1

better known as

$ sw_vers
ProductName: Mac OS X
ProductVersion: 10.8.2
BuildVersion: 12C3012

And also:

$ gcc -static -o static static.c
ld: fatal: library -lc: not found
ld: fatal: file processing errors. No output written to static
collect2: ld returned 1 exit status

and it's

$ uname -sr
SunOS 5.11

better known as "the OS component of Solaris 11".

And, no, as far as I know clang won't help on OS X and Sun C^W^WOracle Studio won't help on Solaris. Both OSes have an explicit policy of *disallowing* static linking, so that they can change implementations of system APIs under the hood *without* having to worry about out-of-date implementations being built into statically-linked binaries. (And, yes, even process 1 runs dynamically-linked code.)

However, there are *other* UN*Xes where static linking *is* allowed, so we still have to worry about that.

As random832@fastmail.us suggested, we can add new information to the end. There's also a version field in the header, and the only check currently done by our localtime.c is to see whether it's '\0' or not; the "new format", with 64-bit time support, has '2' in that field, and, unless some other reader is out there, we could set it to '3' for a new version with additional information at the end (and maybe some information in the reserved field if necessary).

Paul Eggert

2:15 p.m.

On 05/02/2013 08:44 PM, Guy Harris wrote:

...

We should at least indicate what our policy is there, from "we reserve the right to change the format any time we want, deal with it" to something stricter.

Suppose we want to extend the zic input format slightly, to accommodate new rules issued by (say) Palestine, which are of the form "The first Friday after the last Thursday in March". Shouldn't we be able to do this? Or, suppose we want to extend the zic output format slightly, to support the rules that are actually used in Greenland right now, past 2038. Shouldn't we be able to do that too? These aren't entirely hypothetical questions, as both issues have crossed my mind in the past few weeks.

Obviously we don't want to change zic's input or (especially) output format without good reason, but whatever advice we put into place should allow for changes where the bottom line is "deal with it". Perhaps something like the following?

"The code is written in C, and attempts to be portable to a wide variety of systems. The data accepted and produced is also intended to be widely portable. To encourage interoperability with other systems that produce and consume this data, the data format is intended to be stable and changes to it will be made carefully, with the intent that they be upwards compatible as much as possible."

This is just a quick first cut and suggestions for improvement are welcome.

PS. I didn't quite follow all the back-and-forth about the "Theory" file versus the RFC, but the idea is that "Theory" predates the RFC and notes down practical aspects of tz maintenance. If they weren't deemed important enough to put into the RFC we didn't put them there. Some issues are so trivial that they aren't written down even in "Theory", and that should be fine too.

PPS. I'll try to clarify the tabs-in-zone.tab issue by changing "Columns are separated by a single tab" to "Columns are separated by a single tab, except that the Comments column may be preceded by more than one tab" in zone.tab.

Tobias Conradi

6 p.m.

On Fri, May 3, 2013 at 4:15 PM, Paul Eggert <eggert@cs.ucla.edu> wrote:

...

PPS. I'll try to clarify the tabs-in-zone.tab issue by changing "Columns are separated by a single tab" to "Columns are separated by a single tab, except that the Comments column may be preceded by more than one tab" in zone.tab.

That may break processes that rely on single tab in zone.tab. I suggest keeping the two tab files (zone.tab and iso3166.tab) with single tab separators. -- Tobias Conradi Rheinsberger Str. 18 10115 Berlin Germany http://tobiasconradi.com

Paul Eggert

7:24 p.m.

On 05/03/2013 11:00 AM, Tobias Conradi wrote:

...

That may break processes that rely on single tab in zone.tab.

? Any such processes are already broken, as zone.tab has had its current form for several years. All I'm trying to do is document what's already there. Another way to put it, and perhaps a better way, is to say that there are three columns of data separated by tabs, and that a row of data may be followed by a tab and then arbitrary text all of which is ignored.

Tobias Conradi

7:34 p.m.

On Sat, May 4, 2013 at 9:24 PM, Paul Eggert <eggert@cs.ucla.edu> wrote:

...

On 05/03/2013 11:00 AM, Tobias Conradi wrote:

...
That may break processes that rely on single tab in zone.tab.

? Any such processes are already broken, as zone.tab has had its current form for several years.

Not broken since you fixed a double tab:

https://github.com/eggert/tz/commit/c0f1e5d9d6c7959f73506e992a0e3ba2b0c5c5e8... -- Tobias Conradi Rheinsberger Str. 18 10115 Berlin Germany http://tobiasconradi.com

Antoine Leca

8:06 a.m.

Paul Eggert wrote:

...

Another way to put it, and perhaps a better way, is to say that there are three columns of data separated by tabs, and that a row of data may be followed by a tab and then arbitrary text all of which is ignored.

Better indeed; but then it does not closely match the description just above, which explains there are columns numbered from 1 to 4, the last of which is _conditional_ (not merely optional), depending on whether the "country" has multiple rows.

PS: I did not check whether that condition is checked or even if the current content complies. Antoine

Paul Eggert

1:58 p.m.

On 05/08/2013 01:06 AM, Antoine Leca wrote:

...

Better indeed; but then it does not closely match the description just above

OK, thanks, how about this instead?

# Columns are separated by a single tab.
# The last column may be followed by a tab and arbitrary commentary
# (which may include tabs).

random832＠fastmail.us

4:36 p.m.

On Thu, May 2, 2013, at 21:44, Guy Harris wrote:

...

We should at least indicate what our policy is there, from "we reserve the right to change the format any time we want, deal with it" to something stricter.

We should also make our current specification of the format a little easier to find, so that people developing consumers other than zic don't write parsers for what they *think* the format is rather than for what we *say* it is.

It's currently documented in zic.8; maybe you're right that there should be a separate specification for the format. I think that in general there has been some confusion about which documents have "official standing" as documentation for the tz project (either at an IANA level or at a "what the coordinator is going to do, where the IANA gives him authority" level): what's in the RFC alone does not cover all the topics covered in zic.8 or the other man pages.

The RFC itself is very inadequate to the task. It never defines what "the TZ database" actually contains, or the format of any of its files. To the extent that it defines the database by reference, one could interpret it as giving official standing to all of the documentation files within the project. Or maybe all that authority is delegated to the TZ coordinator.

Russ Allbery

4:46 p.m.

random832@fastmail.us writes:

...

Or maybe all that authority is delegated to the TZ coordinator.

I suspect this was the intention. That's the way that the TZ project had always worked since its inception, and one of the goals when moving this process under the partial aegis of IANA was to change as little as possible. After all, the existing process had been working fine for many years. -- Russ Allbery (rra@stanford.edu) <http://www.eyrie.org/~eagle/>

random832＠fastmail.us

4:54 p.m.

On Fri, May 3, 2013, at 12:46, Russ Allbery wrote:

...

random832@fastmail.us writes:

...
Or maybe all that authority is delegated to the TZ coordinator.

I suspect this was the intention. That's the way that the TZ project had always worked since its inception, and one of the goals when moving this process under the partial aegis of IANA was to change as little as possible. After all, the existing process had been working fine for many years.

I guess my main objection is to the recently suggested idea that documentation that has always been part of the distribution and which has no adequate formally-IANA-approved replacement somehow lacks "official standing". The solution is either to find a way to prove it has always had official standing (the TZ coordinator has continued including these files in the release), or to invent a process by which to give it official standing (maybe with a quality review and rewrites/reorganization somewhere in that process).

Guy Harris

5:50 p.m.

On May 3, 2013, at 9:54 AM, random832@fastmail.us wrote:

...

I guess my main objection is to the recently suggested idea that documentation that has always been part of the distribution and which has no adequate formally-IANA-approved replacement somehow lacks "official standing".

Well, perhaps you're referring to somebody else's suggested idea, but I think it *does* lack "any official standing as *the ultimate* documentation of the time zone database", to use my phrase; it's not as if everything has to be mentioned in Theory, including every term ever used to refer to files in the database, in order to have any validity in any discussion of the database. Theory does *not* describe the syntax of the time zone data files, nor does it describe the output files generated by zic. There *is* no single file that is "the ultimate documentation of the time zone database"; there are several. In addition, as has been noted, RFC 6557 has its own notes on "criteria for updates to the database".

...

The solution is either to find a way to prove it has always had official standing (the TZ coordinator has continued including these files in the release), or to invent a process by which to give it official standing (maybe with a quality review and rewrites/reorganization somewhere in that process).

I'd like to see some official indication as to what file or files are the documentation. Whether that involves making a new file, or editing Theory, or splitting it into multiple documents, or moving some or all of the documentation into RFCs, is another matter.

Paul Eggert

7:37 p.m.

On 05/03/2013 10:50 AM, Guy Harris wrote:

...

I'd like to see some official indication as to what file or files are the documentation.

I'm afraid I've lost context and I'm not sure what is being asked for here, but all the files in the tz distribution (both code and data) contain documentation.

random832＠fastmail.us

4:21 p.m.

On Thu, May 2, 2013, at 17:00, Guy Harris wrote:

...

and not mentioned anywhere in any of the man pages or in any of the tzdata files, so it's not a document with any official standing as the ultimate documentation of the time zone database.)

He keeps citing it because I brought it up last time we were talking about his ridiculous proposed abbreviations, because it _is_ the only place the policy on time zone abbreviations is documented.

Random832

11:35 a.m.

On 05/02/2013 12:09 AM, Tobias Conradi wrote:

...

I had in mind the spaces in the comment field:

ftp://ftp.iana.org/tz/data/zone.tab

"Rothera Station, Adelaide Island"

But since "token" has not been defined by random832, nor is the word contained in Theory at all, I might be wrong.

zone.tab is not an input to zic and is not in the format I was discussing.
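The distinction matters in practice: zic's input files separate fields by any run of white space, while zone.tab separates columns with tabs and its comment column contains spaces. The sketch below applies both rules to a zone.tab-style line built from the Rothera example quoted above. The coordinate string is illustrative, the helper names are hypothetical, and the zic-style splitter is deliberately simplified (it ignores zic's '#' comments and double-quoted fields).

```python
def zic_style_fields(line):
    """Split on any run of white space, as zic does for its input files.

    Simplified: ignores zic's '#' comments and double-quoted fields.
    """
    return line.split()


def zone_tab_fields(line):
    """Split on tabs into at most four columns, keeping the comment whole."""
    return line.rstrip("\n").split("\t", 3)


# zone.tab-style line using the Rothera comment quoted above
# (the coordinate string is illustrative).
line = "AQ\t-6734-06808\tAntarctica/Rothera\tRothera Station, Adelaide Island"

print(len(zic_style_fields(line)))  # 7: the comment is broken into 4 tokens
print(zone_tab_fields(line)[3])     # Rothera Station, Adelaide Island
```

Applying the zic rule to zone.tab scatters the comment across several tokens, which is why the two kinds of file need different parsers.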

62 comments

participants (18)

Alan Perry
Antoine Leca
Bennett Todd
Clive D.W. Feather
David Muir Sharnoff
Guy Harris
Ian Abbott
James Cloos
John Hawkinson
John Haxby
lord.buddha＠gmail.com
Paul Eggert
Paul_Koning＠Dell.com
Random832
random832＠fastmail.us
Robert Elz
Russ Allbery
Tobias Conradi
