Re: [tz] "Standard byte order"

Nov. 4, 2011


On 04-Nov-2011, Guy Harris wrote:
...
On Nov 4, 2011, at 10:27 AM, Dave Cantor wrote:
...
Perhaps the documentation should say something like:
For the 16-bit value 0x3210, the bytes are 0x32 followed by
0x10. For the 32-bit value 0x76543210, the bytes are in the
order
  0x76 0x54 0x32 0x10    (is that the case?)
Yes, because the data are written in big-endian format, and that's
how big-endian format works.  No need to worry about whether
anything is in PDP-endian order, as it's not.
Yes, I was just amplifying on what that format looks like 
_sometimes_.  I agree that it doesn't come into play here.
...
The documentation should use the term "big-endian", for the
benefit of those who know what it means, and perhaps give details,
for the benefit of those who don't.
Yes, that's a good idea.
...
...
For the 64-bit value 0xFEDCBA9876543210, ...
0xFE 0xDC 0xBA 0x98 0x76 0x54 0x32 0x10
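As an illustration only (this is not taken from the tz code), here is a minimal Python sketch that reproduces the byte sequences quoted above. It assumes the standard struct module, where ">" selects big-endian; the helper name hex_bytes is made up for this example.

```python
import struct

# ">" selects big-endian (most significant byte first).
# H, I, and Q are unsigned 16-, 32-, and 64-bit integers.
be16 = struct.pack(">H", 0x3210)
be32 = struct.pack(">I", 0x76543210)
be64 = struct.pack(">Q", 0xFEDCBA9876543210)


def hex_bytes(data):
    """Format bytes as space-separated 0xNN tokens, in storage order."""
    return " ".join(f"0x{b:02X}" for b in data)


print(hex_bytes(be16))  # 0x32 0x10
print(hex_bytes(be32))  # 0x76 0x54 0x32 0x10
print(hex_bytes(be64))  # 0xFE 0xDC 0xBA 0x98 0x76 0x54 0x32 0x10
```

Reading the values back with the same format strings (struct.unpack) gives the original integers regardless of the byte order of the host machine.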
...
For character data, the bytes are written leftmost character
first, and sequentially as one would read them left to right in
English,
"Leftmost" in what sense?
Leftmost in the sense of reading them (or writing them) in
English; of course, I agree that if they were Hebrew characters,
you'd expect the first character to be the rightmost and proceed
leftward as they would be written. Do we have character strings
that are not in English? Don't we use the English names for
place names (zone names)?
...
If all strings in a time zone data file are required to be ASCII,
then the "logical order" and display order as per
http://unicode.org/reports/tr9/
are the same.
If not all strings in a time zone data file are required to be
ASCII, then we should perhaps specify that they're UTF-8.  In the
case of non-ASCII strings, they could potentially contain, for
example, Arabic/Hebrew/Farsi/etc. text, in which case the
"leftmost" character on the screen would, as I understand it,
*not* necessarily be the first character in the string.
I think the display order is Not Our Problem, especially if we
specify that strings are UTF-8 (so that they're Unicode and the
Unicode Bidirectional Algorithm applies), so "left" and "right"
need not and should not be used.
We *should*, however, specify what encoding is used - ASCII,
meaning "any byte with the 8th bit set is an error" and "the
national-use positions of ISO 646 have the US characters", or "all
strings are UTF-8", or something else.
I agree with all that.
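To make the two encoding options concrete, here is a rough Python sketch of what a reader of null-terminated strings might do under each rule. The function names are hypothetical and nothing here reflects what the tz code actually does; it only shows "any byte with the 8th bit set is an error" next to "all strings are UTF-8", and that the first character of a string is defined by storage (logical) order, not by where it appears on screen.

```python
def read_terminated(raw):
    """Return the bytes before the first null byte; the terminator is required."""
    end = raw.find(b"\x00")
    if end < 0:
        raise ValueError("missing null terminator")
    return raw[:end]


def decode_strict_ascii(raw):
    """ASCII rule: any byte with the 8th bit set raises UnicodeDecodeError."""
    return read_terminated(raw).decode("ascii")


def decode_utf8(raw):
    """UTF-8 rule: malformed sequences raise UnicodeDecodeError."""
    return read_terminated(raw).decode("utf-8")


# An ASCII abbreviation decodes the same way under both rules.
print(decode_strict_ascii(b"EST\x00"))
print(decode_utf8(b"EST\x00"))

# A Hebrew string stored as UTF-8. Its first character in the string is
# U+05E9, which a bidirectional display would normally draw rightmost.
hebrew = "\u05e9\u05dc\u05d5\u05dd".encode("utf-8") + b"\x00"
print(hex(ord(decode_utf8(hebrew)[0])))
```

Under the ASCII rule the Hebrew string is rejected outright, which is the practical difference between the two choices.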
...
...
and terminated with a null byte.
Dave C.