[PROPOSED] Do not assume bytes have 8 bits
* zic.c (convert, convert64): Mask bytes with 0xff before storing
them, for portability to machines where bytes have more than 8 bits.
Although this is surely only of theoretical interest, we might as
well be portable.
---
 zic.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/zic.c b/zic.c
index 88b1531..77a5698 100644
--- a/zic.c
+++ b/zic.c
@@ -1927,7 +1927,7 @@ convert(uint_fast32_t val, char *buf)
 	unsigned char *const b = (unsigned char *) buf;
 
 	for (i = 0, shift = 24; i < 4; ++i, shift -= 8)
-		b[i] = val >> shift;
+		b[i] = (val >> shift) & 0xff;
 }
 
 static void
@@ -1938,7 +1938,7 @@ convert64(uint_fast64_t val, char *buf)
 	unsigned char *const b = (unsigned char *) buf;
 
 	for (i = 0, shift = 56; i < 8; ++i, shift -= 8)
-		b[i] = val >> shift;
+		b[i] = (val >> shift) & 0xff;
 }
 
 static void
--
2.27.0
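For anyone who wants to try the change outside zic.c, here is a self-contained sketch of the two patched helpers. The hunks elide the declarations of i and shift, so this sketch simply declares them as int in the for statement. decode32 is a hypothetical inverse added only for illustration; it is not part of this patch or of tz. Because each char holds exactly one octet after masking, a reader that also masks with 0xff gets the original value back whatever CHAR_BIT is.

```c
#include <stdint.h>

/* Store the low 32 bits of val into buf[0..3], most significant octet
   first, one octet per char.  Masking with 0xff keeps each stored char
   in 0..255 even on hosts where CHAR_BIT > 8.  */
static void
convert(uint_fast32_t val, char *buf)
{
	unsigned char *const b = (unsigned char *) buf;

	for (int i = 0, shift = 24; i < 4; ++i, shift -= 8)
		b[i] = (val >> shift) & 0xff;
}

/* Same as convert, but for the low 64 bits of val into buf[0..7].  */
static void
convert64(uint_fast64_t val, char *buf)
{
	unsigned char *const b = (unsigned char *) buf;

	for (int i = 0, shift = 56; i < 8; ++i, shift -= 8)
		b[i] = (val >> shift) & 0xff;
}

/* Hypothetical inverse of convert (not from tz): rebuild the 32-bit
   value, masking each char in case CHAR_BIT > 8.  */
static uint_least32_t
decode32(const char *buf)
{
	const unsigned char *const b = (const unsigned char *) buf;
	uint_least32_t result = 0;

	for (int i = 0; i < 4; ++i)
		result = (result << 8) | (b[i] & 0xff);
	return result;
}
```

For example, convert(0x12345678, buf) leaves the octets 0x12 0x34 0x56 0x78 in buf, and decode32(buf) returns 0x12345678.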
Paul Eggert via tz said:
> * zic.c (convert, convert64): Mask bytes with 0xff before storing them, for portability to machines where bytes have more than 8 bits. Although this is surely only of theoretical interest, we might as well be portable.
About 11 years ago I was programming what was then the third most common CPU in the world. It had a 16-bit byte, which was fine when you were used to it but would confuse new staff, and we programmed it (mostly) in C. [https://en.wikipedia.org/wiki/XAP_processor#XAP2]
--
Clive D.W. Feather          | If you lie to the compiler,
Email: clive@davros.org     | it will get its revenge.
Web: http://www.davros.org  |   - Henry Spencer
Mobile: +44 7973 377646
On 4/23/21 2:07 PM, Clive D.W. Feather wrote:
> It had a 16-bit byte, which was fine when you were used to it but would confuse new staff, and we programmed it (mostly) in C.
Yes, quite a few CPUs like that exist in the embedded world. And historically some mainframish CPUs had 9-, or even variable-width bytes. I put "theoretical interest" in that zic.c commit message only because as I understand it none of those CPUs are practical platforms for running zic today. Come to think of it, even if they were practical platforms the unpatched zic.c code would likely work anyway because high-order bits would be silently discarded on output.
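The remark about the unpatched code rests on C's rule for converting to an unsigned type: a value that does not fit in unsigned char is reduced modulo UCHAR_MAX + 1. The sketch below uses illustrative names (store_unmasked, store_masked, UNMASKED_BITS_KEPT) that are not in zic.c. It shows that with CHAR_BIT == 8 the old and new stores agree, while with a 16-bit byte the unmasked store keeps 16 bits. Whether those extra bits are then dropped "on output" depends on the platform's I/O, which this sketch does not model.

```c
#include <limits.h>
#include <stdint.h>

/* An unmasked store keeps the low CHAR_BIT bits of the shifted value,
   because conversion to unsigned char reduces modulo UCHAR_MAX + 1.  */
enum { UNMASKED_BITS_KEPT = CHAR_BIT };

/* Illustrative: the pre-patch store, relying on implicit conversion.  */
static unsigned char
store_unmasked(uint_fast32_t val, int shift)
{
	return val >> shift;
}

/* Illustrative: the patched store, which keeps exactly one octet.  */
static unsigned char
store_masked(uint_fast32_t val, int shift)
{
	return (val >> shift) & 0xff;
}
```

With val = 0x12345678 and shift = 16, both functions return 0x34 on an 8-bit-byte host; on a 16-bit-byte DSP, store_unmasked would return 0x1234 instead.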
<<On Fri, 23 Apr 2021 14:56:21 -0700, Paul Eggert via tz <tz@iana.org> said:
> Yes, quite a few CPUs like that exist in the embedded world. And historically some mainframish CPUs had 9-, or even variable-width bytes. I put "theoretical interest" in that zic.c commit message only because as I understand it none of those CPUs are practical platforms for running zic today.
POSIX since 2001 requires eight-bit bytes, so the set of platforms this could possibly be relevant for is even smaller. (We only made it explicit in 2008, as I recall, but it was implicit in the requirement that uint8_t be defined, given the requirements C99 places on such a type when it exists.) There was a Unisys 36-bit operating system that was POSIX-certified for a previous edition of the standard (before the networking interfaces were merged, which was the proximate cause of requiring uint8_t). -GAWollman
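The inference from uint8_t to 8-bit bytes can be checked at compile time. This sketch relies on two points. First, <stdint.h> defines UINT8_MAX exactly when it provides uint8_t. Second, C requires uintN_t, when provided, to have width N and no padding bits. Every object occupies at least one char and CHAR_BIT is at least 8, so an 8-bit uint8_t can exist only when CHAR_BIT is exactly 8. HAVE_OCTET_BYTES is an invented macro name, used here only for illustration.

```c
#include <limits.h>
#include <stdint.h>

/* UINT8_MAX is defined if and only if the implementation provides
   uint8_t, which must then have exactly 8 bits and no padding.  */
#ifdef UINT8_MAX
#  if CHAR_BIT != 8
#    error "uint8_t exists but CHAR_BIT != 8; not possible in conforming C"
#  endif
#  define HAVE_OCTET_BYTES 1	/* required on POSIX systems */
#else
#  define HAVE_OCTET_BYTES 0	/* e.g. a DSP with CHAR_BIT == 16 */
#endif
```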
Garrett Wollman via tz said:
> POSIX since 2001 requires eight-bit bytes,
Yes, that was my fault. (I've just been reading through my records of what I sent to the austin-group list; it's fascinating in retrospect.) I pointed out that various network functions (starting with htonl() and friends and then going on to how send() and recv() worked on a socket) needed a better definition to cope with systems where CHAR_BIT > 8. Instead of adopting my suggestions (which were, I accept, complicated), it was decided to require that CHAR_BIT always be 8.
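The difficulty with htonl() and send() is that once CHAR_BIT > 8, an interface has to say how octets map onto chars. One choice, which the zic.c patch makes for its output buffer, is one octet per char with the high bits zero. Another is to pack several octets into each char. The sketch below shows the packed choice. pack_be32 and OCTETS_PER_CHAR are hypothetical names, not POSIX interfaces and not taken from the austin-group discussion. The sketch assumes 8 <= CHAR_BIT < 40, so that one char's worth of octets fits in an unsigned long.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Whole octets per char: 1 when CHAR_BIT == 8, 2 when CHAR_BIT == 16.  */
#define OCTETS_PER_CHAR ((size_t) (CHAR_BIT / 8))

/* Hypothetical helper: write the four big-endian octets of val into out,
   packing OCTETS_PER_CHAR octets into each char (most significant first,
   zero-filled at the end).  Returns the number of chars written.  */
static size_t
pack_be32(uint_least32_t val, unsigned char *out)
{
	const size_t nchars = (4 + OCTETS_PER_CHAR - 1) / OCTETS_PER_CHAR;

	for (size_t c = 0; c < nchars; ++c) {
		unsigned long group = 0;

		for (size_t k = 0; k < OCTETS_PER_CHAR; ++k) {
			/* Octet 0 is the most significant byte of val.  */
			size_t idx = c * OCTETS_PER_CHAR + k;
			unsigned long octet =
			    idx < 4 ? (val >> (24 - 8 * idx)) & 0xff : 0;
			group = (group << 8) | octet;
		}
		out[c] = (unsigned char) group;
	}
	return nchars;
}
```

On an 8-bit-byte host, pack_be32(0x12345678, out) returns 4 and writes 0x12 0x34 0x56 0x78, the same bytes as convert(). With CHAR_BIT == 16 it would return 2 and write 0x1234 0x5678, so the same four-octet wire field takes a different number of chars. That is the ambiguity a socket API would have had to resolve.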
On 2021-04-23 13:27, Paul Eggert via tz wrote:
> * zic.c (convert, convert64): Mask bytes with 0xff before storing them, for portability to machines where bytes have more than 8 bits. Although this is surely only of theoretical interest, we might as well be portable.
Not uncommon on DSPs, which are now being used not only for graphics but also for machine learning and neural-network processing for audio speech recognition, video object recognition and classification, and natural language recognition, e.g. https://www.embecosm.com/2017/04/18/non-8-bit-char-support-in-clang-and-llvm... The models are so large, with billions or trillions of parameters, that they hit limits on systems with 4 GPUs each having 12 GB of local memory and 64-128 GB of system memory, even with the usual optimization of using 16-bit floats.
--
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada
This email may be disturbing to some readers as it contains too much technical detail. Reader discretion is advised.
[Data in binary units and prefixes, physical quantities in SI.]
participants (4): Brian Inglis, Clive D.W. Feather, Garrett Wollman, Paul Eggert