Tab/Space Formatting Inconsistency - Asia Ver 2023c
Hello,
While writing software to parse through the database, I found the following inconsistencies in the "asia" file:
1) Lines 3463 to 3569 - Rule should be followed by a tab instead of a space
2) Lines 3587, 3589, 3591, 3593 - STDOFF should be followed by a tab instead of a space
3) Line 3096 - STDOFF should be followed by a tab instead of a space
Gary Cho
You might want to read the zic man page, which describes the format of the zic input files (the asia file is one of them):
Input lines are made up of fields. Fields are separated from one another by one or more white space characters. The white space characters are space, form feed, carriage return, newline, tab, and vertical tab. Leading and trailing white space on input lines is ignored. An unquoted sharp character (#) in the input introduces a comment which extends to the end of the line the sharp character appears on. White space characters and sharp characters may be enclosed in double quotes (") if they're to be used as part of a field. Any line that is blank (after comment stripping) is ignored. Nonblank lines are expected to be of one of three types: rule lines, zone lines, and link lines.
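The field rules quoted above can be sketched in a few lines of Python. This is only an illustration of the man page wording, not zic's actual implementation; the function name `split_fields` and the constant `WHITESPACE` are hypothetical, and error handling (for example, an unterminated quote) is left out.

```python
# Sketch of the field-splitting rules quoted from the zic man page.
# Assumption: split_fields and WHITESPACE are illustrative names, not part of tzcode.

# The six white space characters listed in the man page.
WHITESPACE = " \f\r\n\t\v"


def split_fields(line: str) -> list[str]:
    """Split one input line into fields, honoring comments and double quotes."""
    fields = []
    current = []
    in_field = False
    in_quotes = False
    for ch in line:
        if in_quotes:
            if ch == '"':
                in_quotes = False      # closing quote; the field may continue
            else:
                current.append(ch)     # white space and '#' are literal here
        elif ch == '"':
            in_quotes = True
            in_field = True
        elif ch == "#":
            break                      # unquoted '#' starts a comment
        elif ch in WHITESPACE:
            if in_field:               # any run of white space ends the field
                fields.append("".join(current))
                current = []
                in_field = False
        else:
            current.append(ch)
            in_field = True
    if in_field:
        fields.append("".join(current))
    return fields


# A space after "Rule" and tabs elsewhere yield the same fields as tabs throughout.
mixed = split_fields("Rule EgyptAsia\t1957\tonly\t-\tMay\t10\t0:00\t1:00\tS")
tabs_only = split_fields("Rule\tEgyptAsia\t1957\tonly\t-\tMay\t10\t0:00\t1:00\tS")
print(mixed == tabs_only)
```

A parser written this way does not care whether a given separator is a space or a tab, which is the point of the man page text.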
There is no rule about what kind of white space is to be used between fields. If you write a parser, it has to be able to deal with all kinds of white space.
If you want a more machine-readable format, try tzdata.zi, which is the output of zishrink.awk. If you rely on the pre-formatting utilities that are part of the project, your job becomes much easier.
On 07.12.23 15:31, Gary Cho via tz wrote:
Hello,
While writing software to parse through the database, I found the following inconsistencies in the "asia" file
1) Line 3463 to 3569 - Rule should be followed by a tab instead of a space
2) Line 3587, 3589, 3591, 3593 - STDOFF should be followed by a tab instead of space
3) Line 3096 - STDOFF should be followed by a tab instead of space
Gary Cho
On 12/7/23 06:31, Gary Cho via tz wrote:
1) Line 3463 to 3569 - Rule should be followed by a tab instead of a space
2) Line 3587, 3589, 3591, 3593 - STDOFF should be followed by a tab instead of space
3) Line 3096 - STDOFF should be followed by a tab instead of space
As Alois writes, this is merely a visual appearance issue; it doesn't affect the meaning of the data.
Spaces are sometimes used instead of tabs, when using a tab would cause columns or lines to not fit. For example, consider the line:
Rule EgyptAsia	1957	only	-	May	10	0:00	1:00	S
where the first separator is a space but the others are tabs. If the space were changed to tab, the third text column would shift right from character column 17 to character column 25. The remaining text columns would all shift right too, and the resulting line would be 81 character columns rather than its current 73.
We prefer data to be 80 character columns or less for the usual reasons; see, for example <https://www.emacswiki.org/emacs/EightyColumnRule>. So we use a space instead of a tab there.
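The column arithmetic above can be checked with a short Python sketch. It assumes tab stops every 8 columns and rebuilds the line from the description in the message (a space after "Rule", tabs between the other fields); the helper names `display_width` and `column_of` are hypothetical.

```python
# Sketch: measure displayed width with tab stops every 8 columns.
# Assumption: the line is rebuilt from the description in the message,
# with a space after "Rule" and tabs between all other fields.

TAB_WIDTH = 8

FIELDS = ["Rule", "EgyptAsia", "1957", "only", "-", "May", "10", "0:00", "1:00", "S"]

# Current form: space after "Rule", tabs elsewhere.
with_space = FIELDS[0] + " " + "\t".join(FIELDS[1:])
# Proposed form: tabs between every pair of fields.
with_tab = "\t".join(FIELDS)


def display_width(line: str, tab_width: int = TAB_WIDTH) -> int:
    """Number of character columns the line occupies once tabs are expanded."""
    return len(line.expandtabs(tab_width))


def column_of(line: str, text: str, tab_width: int = TAB_WIDTH) -> int:
    """1-based character column where `text` first starts after tab expansion."""
    return line.expandtabs(tab_width).index(text) + 1


print(column_of(with_space, "1957"), display_width(with_space))  # 17 73
print(column_of(with_tab, "1957"), display_width(with_tab))      # 25 81
```

The same `display_width` check could be run over every line of a data file to find lines that would exceed 80 columns after a whitespace change.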
On 12/12/23 17:25:15, Paul Eggert via tz wrote:
... As Alois writes, this is merely a visual appearance issue; it doesn't affect the meaning of the data.
Sometimes it matters. "make" requires build rules to be introduced by tabs, not spaces. And I know a couple lexical analyzers which recognize spaces but not tabs as separating tokens.
Spaces are sometimes used instead of tabs, when using a tab would cause columns or lines to not fit. For example, consider the line: ... Rule EgyptAsia 1957 only - May 10 0:00 1:00 S ... We prefer data to be 80 character columns or less for the usual reasons; see, for example <https://www.emacswiki.org/emacs/EightyColumnRule>. So we use a space instead of a tab there.
Which says, “Thou shalt not cross 80 columns in thy file” originated from IBM 80 column punch cards, ...
On IBM 80 column punch cards, the tab, 0x05 (rarely used), occupies only a single column, regardless of its appearance on output devices.
-- gil
On 12/12/23 17:46, Paul Gilmartin via tz wrote:
On IBM 80 column punch cards, the tab, 0x05 (rarely used), occupies only a single column, regardless of its appearance on output devices.
0x05 was rarely used because columns were aligned without using tab characters. The IBM 029 card punch[1] had a program drum, which let the operator set the equivalent of tab stops at whatever columns were needed. Tab stops didn't need to be every 8 columns, and didn't even need to be at regular intervals. Pressing the SKIP key would cause the card punch to skip ahead to the next tab stop, at a marvelous rate of 80 columns per second. The resulting data therefore did not contain tab characters and did not record where the tab stops were.
We could get a similar effect by expanding all tabs to spaces in the TZDB source code. Some projects do that - partly to avoid distractions like this email thread!
[1]: https://www.masswerk.at/keypunch/manuals/IBM029-GA24-3332-6_Reference_Manual...
Paul Eggert wrote:
We could get a similar effect by expanding all tabs to spaces in the TZDB source code. Some projects do that - partly to avoid distractions like this email thread!
I promise you that for at least some readers, the minutiae of text file formats and conventions are not a greater distraction than the minutiae of C language and POSIX versions which so often dominate this list.
-- Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org
On 2023-12-12 18:46, Paul Gilmartin via tz wrote:
On 12/12/23 17:25:15, Paul Eggert via tz wrote:
... As Alois writes, this is merely a visual appearance issue; it doesn't affect the meaning of the data.
Sometimes it matters. "make" requires build rules to be introduced by tabs, not spaces. And I know a couple lexical analyzers which recognize spaces but not tabs as separating tokens.
Spaces are sometimes used instead of tabs, when using a tab would cause columns or lines to not fit. For example, consider the line: ... Rule EgyptAsia 1957 only - May 10 0:00 1:00 S ... We prefer data to be 80 character columns or less for the usual reasons; see, for example <https://www.emacswiki.org/emacs/EightyColumnRule>. So we use a space instead of a tab there.
Which says, “Thou shalt not cross 80 columns in thy file” originated from IBM 80 column punch cards, ... On IBM 80 column punch cards, the tab, 0x05 (rarely used), occupies only a single column, regardless of its appearance on output devices.
ITYM ...tab, 12-9-5 (EBCDIC 0x05 - rarely used),...
...I would go so far as to say never used except perhaps internally in software.
--
Take care. Thanks, Brian Inglis
Calgary, Alberta, Canada
La perfection est atteinte non pas lorsqu'il n'y a plus rien à ajouter mais lorsqu'il n'y a plus rien à retirer
Perfection is achieved not when there is no more to add but when there is no more to cut
-- Antoine de Saint-Exupéry
On Dec 12, 2023, at 5:46 PM, Paul Gilmartin via tz <tz@iana.org> wrote:
On 12/12/23 17:25:15, Paul Eggert via tz wrote:
... As Alois writes, this is merely a visual appearance issue; it doesn't affect the meaning of the data.
Sometimes it matters. "make" requires build rules to be introduced by tabs, not spaces.
https://beebo.org/haycorn/2015-04-20_tabs-and-makefiles.html quotes a 2015 email from Stu Feldman, author of "make":
"I used tabs because I was trying to use Lex (still in first version) and had trouble with some other patterns. (Make was written over a weekend, rewritten the next weekend ...) So I gave up on being smart and just used a fixed pattern (^\t) to indicate rules. Within a few weeks of writing Make, I already had a dozen friends who were using it. So even though I knew that "tab in column 1" was a bad idea, I didn't want to disrupt my user base. So instead I wrought havoc on tens of millions. I have used that example in software engineering lectures. Side note: I was awarded the ACM Software Systems Award for Make a decade ago. In my one minute talk on stage, I began "I would like to apologize". The audience then split in two - half started laughing, the other half looked at the laughers. A perfect bipartite graph of programmers and non-programmers."
I don't know whether he's blaming the first version of Lex for making it too hard to figure out how to parse Makefiles without requiring tabs, but, if so, he might have been laying some of the blame at the feet of his employer's executive chairman: https://www.cs.utexas.edu/users/novak/lexpaper.htm
And I know a couple lexical analyzers which recognize spaces but not tabs as separating tokens.
Not ideal, unless there's a compelling UI reason for that.
Which says, “Thou shalt not cross 80 columns in thy file” originated from IBM 80 column punch cards, ...
And for the history of *that*, at least according to Wikipedia, see https://en.wikipedia.org/wiki/Punched_card#Formats. (And, yes, my terminal windows default to 80 columns on all UN*X platforms, following the physical terminals I used before that; most of those had 80-column screens, due, as far as I know, to the same 80-column punched cards. Thank you, Mr. Clair D. Lake.)
participants (7)
- Alois Treindl
- Brian Inglis
- Doug Ewell
- Gary Cho
- Guy Harris
- Paul Eggert
- Paul Gilmartin