On 04-Nov-2011, Guy Harris wrote:
On Nov 4, 2011, at 10:27 AM, Dave Cantor wrote:
Perhaps the documentation should say something like:
For the 16-bit value 0x3210, the bytes are 0x32 followed by 0x10. For the 32-bit value 0x76543210, the bytes are in the order 0x76 0x54 0x32 0x10. (Is that the case?)
Yes, because the data are written in big-endian format, and that's how big-endian format works. No need to worry about whether anything is in PDP-endian order, as it's not.
Yes, I was just amplifying on what that format looks like _sometimes_. I agree that it doesn't come into play here.
The documentation should use the term "big-endian", for the benefit of those who know what it means, and perhaps give details, for the benefit of those who don't.
Yes, that's a good idea.
For the 64-bit value 0xFEDCBA9876543210, ...
0xFE 0xDC 0xBA 0x98 0x76 0x54 0x32 0x10
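For anyone who wants to check the byte orders above, here is a minimal Python sketch. It is only an illustration of big-endian layout, not code from the tz distribution; the helper names big_endian_bytes and hex_bytes are made up for this example.

```python
def big_endian_bytes(value: int, width: int) -> bytes:
    """Return value as width bytes, most significant byte first."""
    return value.to_bytes(width, byteorder="big")


def hex_bytes(data: bytes) -> str:
    """Format bytes in the 0xNN notation used in this thread."""
    return " ".join(f"0x{b:02X}" for b in data)


# The three example values from the discussion (16-, 32- and 64-bit).
print(hex_bytes(big_endian_bytes(0x3210, 2)))
# prints: 0x32 0x10
print(hex_bytes(big_endian_bytes(0x76543210, 4)))
# prints: 0x76 0x54 0x32 0x10
print(hex_bytes(big_endian_bytes(0xFEDCBA9876543210, 8)))
# prints: 0xFE 0xDC 0xBA 0x98 0x76 0x54 0x32 0x10
```

Reading the bytes back with int.from_bytes(data, byteorder="big") gives the original value, regardless of the byte order of the machine doing the reading.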
For character data, the bytes are written leftmost character first, and sequentially as one would read them left to right in English,
"Leftmost" in what sense?
Leftmost in the sense of reading them (or writing them) in English; of course, I agree that if they were Hebrew characters, you'd expect the first character to be the rightmost and proceed leftward as they would be written. Do we have character strings that are not in English? Don't we use the English names for place names (zone names)?
If all strings in a time zone data file are required to be ASCII, then the "logical order" and display order as per
http://unicode.org/reports/tr9/
are the same.
If not all strings in a time zone data file are required to be ASCII, then we should perhaps specify that they're UTF-8. In the case of non-ASCII strings, they could potentially contain, for example, Arabic/Hebrew/Farsi/etc. text, in which case the "leftmost" character on the screen would, as I understand it, *not* necessarily be the first character in the string.
I think the display order is Not Our Problem, especially if we specify that strings are UTF-8 (so that they're Unicode and the Unicode Bidirectional Algorithm applies), so "left" and "right" need not and should not be used.
We *should*, however, specify what encoding is used - ASCII, meaning "any byte with the 8th bit set is an error" and "the national-use positions of ISO 646 have the US characters", or "all strings are UTF-8", or something else.
I agree with all that.
and terminated with a null byte.
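To make the encoding question concrete, here is a small Python sketch of what a reader of such strings might do under either rule. It assumes strings are stored in logical order and end at the first null byte, as described above; the function names (read_c_string, is_strict_ascii, decode_name) are hypothetical, and which encoding the format should actually require is still the open question in this thread.

```python
def read_c_string(buf: bytes, offset: int = 0) -> bytes:
    """Return the bytes from offset up to, but not including, the next null byte."""
    end = buf.index(b"\x00", offset)  # raises ValueError if no terminator is found
    return buf[offset:end]


def is_strict_ascii(raw: bytes) -> bool:
    """True if no byte has the 8th bit set."""
    return all(b < 0x80 for b in raw)


def decode_name(raw: bytes, encoding: str = "ascii") -> str:
    """Decode strictly; an invalid byte raises UnicodeDecodeError."""
    return raw.decode(encoding)


# Two null-terminated strings stored back to back.
data = b"EST\x00EDT\x00"
print(read_c_string(data, 0))  # first string
print(read_c_string(data, 4))  # second string, starting after the first null

# Under an "ASCII only" rule, a UTF-8 encoded non-ASCII name is rejected.
print(is_strict_ascii(b"EST"))
print(is_strict_ascii("Zürich".encode("utf-8")))
```

Under a "UTF-8" rule, decode_name(raw, "utf-8") would accept the second example. In both cases the bytes come out in logical order, so display order never enters into it.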
Dave C.