I'm assuming that "Standard byte order" referred to over and over in the docs means Most Significant Byte first? It should really be documented that way. As far as I know, there is no standard byte order.
Thom Hehl wrote:
I'm assuming that "Standard byte order" referred to over and over in the docs means Most Significant Byte first? ... It should really be documented that way.
It is, and it is. From the first paragraph of description in tzfile(5):
... written in a ``standard'' byte order (the high-order byte of the value is written first).
-zefram
On 2011/11/04 02:36 PM, Zefram wrote:
Thom Hehl wrote:
I'm assuming that "Standard byte order" referred to over and over in the docs means Most Significant Byte first? ... It should really be documented that way.
It is, and it is. From the first paragraph of description in tzfile(5):
... written in a ``standard'' byte order (the high-order byte of the value is written first).
-zefram
Okay, the high-order byte is written first, but it doesn't say anything about the ordering of the remaining bytes of the four- or eight-byte values. (PDP-endian, anybody?) ;-)
-- Ian Abbott @ MEV Ltd.
On 2011/11/04 04:23 PM, Ian Abbott wrote:
Okay, the high-order byte is written first, but it doesn't say anything about the ordering of the remaining bytes of the four- or eight-byte values. (PDP-endian, anybody?) ;-)
Except that PDP-endian wouldn't have the high-order byte first, it would have one of the middle bytes (bits 23..16 for a four-byte value) first.
-- Ian Abbott @ MEV Ltd.
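To make the point above concrete, here is a minimal Python sketch that lays out the same 32-bit value in big-endian order and in PDP-11 "middle-endian" order. It assumes the usual description of the PDP-11 layout (high 16-bit word first, each word stored low byte first); the helper name pdp_endian_bytes is hypothetical and is not part of the tz code.

```python
import struct

def pdp_endian_bytes(value: int) -> bytes:
    """Lay out a 32-bit value PDP-11 style (assumed layout):
    high 16-bit word first, each word stored low byte first."""
    high_word = (value >> 16) & 0xFFFF
    low_word = value & 0xFFFF
    # "<HH" packs two unsigned 16-bit words, each little-endian.
    return struct.pack("<HH", high_word, low_word)

value = 0x76543210
big = struct.pack(">I", value)   # big-endian: high-order byte first
pdp = pdp_endian_bytes(value)

print(big.hex(" "))  # 76 54 32 10
print(pdp.hex(" "))  # 54 76 10 32
```

The first byte of the PDP-style layout is 0x54, that is, bits 23..16 of the value, so a format that writes the high-order byte first cannot be PDP-endian.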
On 04-Nov-2011, Ian Abbott wrote:
On 2011/11/04 04:23 PM, Ian Abbott wrote:
Okay, the high-order byte is written first, but it doesn't say anything about the ordering of the remaining bytes of the four- or eight-byte values. (PDP-endian, anybody?) ;-)
Except that PDP-endian wouldn't have the high-order byte first, it would have one of the middle bytes (bits 23..16 for a four-byte value) first.
Yes, in PDP-11 storage, the lowest byte had the lowest memory address, but they were often written with the bytes swapped within each "word" (16 bits), so that a 32-bit value 0x76543210 would have bytes in the order 0x32 0x10 0x76 0x54. There were other permutations, too.
Perhaps the documentation should say something like:
For the 16-bit value 0x3210, the bytes are 0x32 followed by 0x10.
For the 32-bit value 0x76543210, the bytes are in the order 0x76 0x54 0x32 0x10 (is that the case?)
For the 64-bit value 0xFEDCBA9876543210, ...
For character data, the bytes are written leftmost character first, and sequentially as one would read them left to right in English, and terminated with a null byte.
Dave C.
On Nov 4, 2011, at 10:27 AM, Dave Cantor wrote:
Perhaps the documentation should say something like:
For the 16-bit value 0x3210, the bytes are 0x32 followed by 0x10. For the 32-bit value 0x76543210, the bytes are in the order 0x76 0x54 0x32 0x10 (is that the case?)
Yes, because the data are written in big-endian format, and that's how big-endian format works. No need to worry about whether anything is in PDP-endian order, as it's not. The documentation should use the term "big-endian", for the benefit of those who know what it means, and perhaps give details, for the benefit of those who don't.
For the 64-bit value 0xFEDCBA9876543210, ...
0xFE 0xDC 0xBA 0x98 0x76 0x54 0x32 0x10
For character data, the bytes are written leftmost character first, and sequentially as one would read them left to right in English,
"Leftmost" in what sense?
If all strings in a time zone data file are required to be ASCII, then the "logical order" and display order as per
http://unicode.org/reports/tr9/
are the same.
If not all strings in a time zone data file are required to be ASCII, then we should perhaps specify that they're UTF-8. In the case of non-ASCII strings, they could potentially contain, for example, Arabic/Hebrew/Farsi/etc. text, in which case the "leftmost" character on the screen would, as I understand it, *not* necessarily be the first character in the string.
I think the display order is Not Our Problem, especially if we specify that strings are UTF-8 (so that they're Unicode and the Unicode Bidirectional Algorithm applies), so "left" and "right" need not and should not be used.
We *should*, however, specify what encoding is used - ASCII, meaning "any byte with the 8th bit set is an error" and "the national-use positions of ISO 646 have the US characters", or "all strings are UTF-8", or something else.
and terminated with a null byte.
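The byte sequences discussed above can be checked with a short Python sketch. It uses only int.to_bytes and int.from_bytes from the standard library; the helper name big_endian_bytes is hypothetical and only illustrates what "high-order byte first" means for 16-, 32- and 64-bit values.

```python
def big_endian_bytes(value: int, width: int) -> bytes:
    """Encode an unsigned integer in `width` bytes, high-order byte first."""
    return value.to_bytes(width, byteorder="big")

b16 = big_endian_bytes(0x3210, 2)
b32 = big_endian_bytes(0x76543210, 4)
b64 = big_endian_bytes(0xFEDCBA9876543210, 8)

print(b16.hex(" "))  # 32 10
print(b32.hex(" "))  # 76 54 32 10
print(b64.hex(" "))  # fe dc ba 98 76 54 32 10

# Decoding reverses the process: the first byte is the most significant.
decoded = int.from_bytes(b32, byteorder="big")
```

Reading a value back with byteorder="big" gives the original integer regardless of the byte order of the machine running the code.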
On Nov 4, 2011, at 1:03 PM, Guy Harris wrote:
We *should*, however, specify what encoding is used - ASCII, meaning "any byte with the 8th bit set is an error" and "the national-use positions of ISO 646 have the US characters", or "all strings are UTF-8", or something else.
(...and given the "@iana.org" in the Cc: header, "something else" probably wouldn't be considered an acceptable answer.)
Guy Harris wrote:
If not all strings in a time zone data file are required to be ASCII, then we should perhaps specify that they're UTF-8.
They are in fact all ASCII, and we have a maintenance policy that the abbreviations should always be only ASCII (see the "Time zone abbreviations" section in Theory). This is indeed an omission from tzfile(5). I think we should document that they are strictly printable ASCII. We have no need to accommodate free text, and so can get away without incurring the complications of Unicode.
-zefram
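As a rough illustration of what "strictly printable ASCII, terminated with a null byte" could mean for a reader of the file, here is a minimal Python sketch. The function name read_abbreviation, the sample buffer, and the choice of 0x20..0x7E as the printable range are assumptions made for this example, not something taken from tzfile(5).

```python
def read_abbreviation(buf: bytes, offset: int = 0) -> str:
    """Read a NUL-terminated string starting at `offset` and reject
    any byte outside the printable ASCII range (assumed 0x20..0x7E)."""
    end = buf.index(b"\x00", offset)  # raises ValueError if no NUL is found
    raw = buf[offset:end]
    if not all(0x20 <= byte <= 0x7E for byte in raw):
        raise ValueError("abbreviation contains non-printable or non-ASCII bytes")
    return raw.decode("ascii")

# Hypothetical abbreviation block: three NUL-terminated strings back to back.
sample = b"LMT\x00EST\x00EDT\x00"
first = read_abbreviation(sample, 0)
second = read_abbreviation(sample, 4)
```

With this rule, any byte with the 8th bit set is rejected outright, so no question of display order or Unicode handling arises.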
On 04-Nov-2011, Guy Harris wrote:
On Nov 4, 2011, at 10:27 AM, Dave Cantor wrote:
Perhaps the documentation should say something like:
For the 16-bit value 0x3210, the bytes are 0x32 followed by 0x10. For the 32-bit value 0x76543210, the bytes are in the order 0x76 0x54 0x32 0x10 (is that the case?)
Yes, because the data are written in big-endian format, and that's how big-endian format works. No need to worry about whether anything is in PDP-endian order, as it's not.
Yes, I was just amplifying on what that format looks like _sometimes_. I agree that it doesn't come into play here.
The documentation should use the term "big-endian", for the benefit of those who know what it means, and perhaps give details, for the benefit of those who don't.
Yes, that's a good idea.
For the 64-bit value 0xFEDCBA9876543210, ...
0xFE 0xDC 0xBA 0x98 0x76 0x54 0x32 0x10
For character data, the bytes are written leftmost character first, and sequentially as one would read them left to right in English,
"Leftmost" in what sense?
Leftmost in the sense of reading them (or writing them) in English; of course, I agree that if they were Hebrew characters, you'd expect the first character to be the rightmost and proceed leftward as they would be written. Do we have character strings that are not in English? Don't we use the English names for place names (zone names)?
If all strings in a time zone data file are required to be ASCII, then the "logical order" and display order as per
http://unicode.org/reports/tr9/
are the same.
If not all strings in a time zone data file are required to be ASCII, then we should perhaps specify that they're UTF-8. In the case of non-ASCII strings, they could potentially contain, for example, Arabic/Hebrew/Farsi/etc. text, in which case the "leftmost" character on the screen would, as I understand it, *not* necessarily be the first character in the string.
I think the display order is Not Our Problem, especially if we specify that strings are UTF-8 (so that they're Unicode and the Unicode Bidirectional Algorithm applies), so "left" and "right" need not and should not be used.
We *should*, however, specify what encoding is used - ASCII, meaning "any byte with the 8th bit set is an error" and "the national-use positions of ISO 646 have the US characters", or "all strings are UTF-8", or something else.
I agree with all that.
and terminated with a null byte.
Dave C.
On Fri, Nov 4, 2011 at 10:27 AM, Thom Hehl <Thom@pointsix.com> wrote:
I’m assuming that “Standard byte order” referred to over and over in the docs means Most Significant Byte first?
It should really be documented that way. As far as I know, there is no standard byte order.
Well, of course, actually there are two. And that's why it needs to be specified.
Regards Marshall
LOL. Two standards. That's a good one, Marshall.
I did see the bit about high-order byte, but had never heard that term before, but Google shows it is used.
-----Original Message-----
From: Marshall Eubanks [mailto:marshall.eubanks@gmail.com]
Sent: Friday, November 04, 2011 10:39 AM
To: Thom Hehl
Cc: tz@iana.org
Subject: Re: [tz] "Standard byte order"
On Fri, Nov 4, 2011 at 10:27 AM, Thom Hehl <Thom@pointsix.com> wrote:
I'm assuming that "Standard byte order" referred to over and over in the docs means Most Significant Byte first?
It should really be documented that way. As far as I know, there is no standard byte order.
Well, of course, actually there are two. And that's why it needs to be specified.
Regards Marshall
On Fri, Nov 4, 2011 at 10:40 AM, Thom Hehl <Thom@pointsix.com> wrote:
LOL. Two standards. That's a good one, Marshall.
I did see the bit about high-order byte, but had never heard that term before, but Google shows it is used.
In the IETF the terms are generally "Little Endian" and "Big Endian". This appears to be big endian.
Regards Marshall
-----Original Message-----
From: Marshall Eubanks [mailto:marshall.eubanks@gmail.com]
Sent: Friday, November 04, 2011 10:39 AM
To: Thom Hehl
Cc: tz@iana.org
Subject: Re: [tz] "Standard byte order"
On Fri, Nov 4, 2011 at 10:27 AM, Thom Hehl <Thom@pointsix.com> wrote:
I'm assuming that "Standard byte order" referred to over and over in the docs means Most Significant Byte first?
It should really be documented that way. As far as I know, there is no standard byte order.
Well, of course, actually there are two. And that's why it needs to be specified.
Regards Marshall
On Friday, 4 November 2011 at 10:27 -0400, Thom Hehl wrote:
I’m assuming that “Standard byte order” referred to over and over in the docs means Most Significant Byte first?
It should really be documented that way. As far as I know, there is no standard byte order.
But there is a "Network Byte Order", which is Big Endian (Most Significant Byte first):
http://en.wikipedia.org/wiki/Endianness#Endianness_in_networking
-- Yann Droneaud
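For readers who know the term from networking, the following Python sketch shows that "network byte order" and big-endian produce the same bytes. It relies on the standard struct and socket modules; the variable names are only for illustration.

```python
import socket
import struct

value = 0x76543210

# In the struct module, "!" means network byte order and ">" means big-endian.
network = struct.pack("!I", value)
big = struct.pack(">I", value)
same_layout = network == big

# Reading the bytes in the host's native order and then applying ntohl
# recovers the value on both little-endian and big-endian hosts.
native_view = struct.unpack("=I", network)[0]
recovered = socket.ntohl(native_view)

print(network.hex(" "))  # 76 54 32 10
```

Describing the file format as "network byte order (big-endian)" would therefore match the existing "high-order byte first" wording.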
Absolutely. So let's say that in the docs.
-----Original Message-----
From: Yann Droneaud [mailto:yann@droneaud.fr]
Sent: Friday, November 04, 2011 11:16 AM
To: Thom Hehl
Cc: tz@iana.org
Subject: Re: [tz] "Standard byte order"
On Friday, 4 November 2011 at 10:27 -0400, Thom Hehl wrote:
I’m assuming that “Standard byte order” referred to over and over in the docs means Most Significant Byte first?
It should really be documented that way. As far as I know, there is no standard byte order.
But there is a "Network Byte Order", which is Big Endian (Most Significant Byte first):
http://en.wikipedia.org/wiki/Endianness#Endianness_in_networking
-- Yann Droneaud
The nice thing about standards is that there are so many of them to choose from.
-Andrew S. Tanenbaum
Sometimes we get to choose from multiple wall clock times as well. :)
-mld
participants (9)
- Dave Cantor
- Eliot Lear
- Guy Harris
- Ian Abbott
- Luther Ma
- Marshall Eubanks
- Thom Hehl
- Yann Droneaud
- Zefram