FW: changes to zdump.c and zic.c invocations of is.* macros
-----Original Message----- From: Robert Elz [mailto:kre@munnari.OZ.AU] Sent: Tuesday, December 06, 2005 10:20 PM To: Arthur David Olson Cc: tz@lecserver.nci.nih.gov Subject: Re: changes to zdump.c and zic.c invocations of is.* macros Date: Tue, 6 Dec 2005 13:35:15 -0500 (EST) From: Arthur David Olson <olsona@elsie.nci.nih.gov> Message-ID: <200512061835.jB6IZFWH007741@elsie.nci.nih.gov> | ! while (isascii((unsigned char) *cp) && You sometimes do better if you write that as while (isascii(*(unsigned char *)cp) && It can also be a little clearer what you're intending - there's no intention here to fetch the char, then convert it to unsigned, all we want is the 0..255 value that cp points at. Than again, and I haven't looked at the code again just now to see if it is practical or not, but an alternative might just be to declare cp as being unsigned char * right from the start (even if that means that it needs to stop being an input parameter, and instead be copied from one which would need to remain char * to avoid changing the API). kre ps: I don't think Paul's suggested variation is really the right thing to do - it is certainly true that it's possible to test for digits by using >= '0' && <= '9' tests - but if that's the best way to write it, then that's what isdigit() ought to be doing. If you want to make it shorter, it is safe to stop the isascii() test if the arg is known to be unsigned char which after this change, it will be (or EOF, but that's irrelevant here). Paul's version may be textually shorter, but with that cp++ side effect buried in the middle of the && sequence, it is not nearly as easy to read.
From: Robert Elz [mailto:kre@munnari.OZ.AU] Sent: Tuesday, December 06, 2005 10:20 PM | ! while (isascii((unsigned char) *cp) &&
You sometimes do better if you write that as
while (isascii(*(unsigned char *)cp) &&
It can also be a little clearer what you're intending - there's no intention here to fetch the char, then convert it to unsigned, all we want is the 0..255 value that cp points at.
If memory serves, the latter form (*(unsigned char *)cp) is not portable to all C89 hosts, whereas the former form ((unsigned char) *cp) is. The idea is that some C89 hosts might have padding bits in their unsigned char representation, and it's incorrect to access a char as if it were unsigned char. I believe this issue got cleared up in C99, so the code is portable to C99 compilers. But the zic stuff attempts to be portable to C89 (as well as earlier) compilers.
it is certainly true that it's possible to test for digits by using
= '0' && <= '9' tests - but if that's the best way to write it, then that's what isdigit() ought to be doing.
Alas, that's not true in practice. isdigit is typically slower, and it can be quite a bit slower. For example, on my host (Debian GNU/Linux stable, GCC 4.0.2, gcc -O4), with the following code: int F (char *p) { return isdigit ((unsigned char) *p) != 0; } int G (char *p) { return '0' <= *p && *p <= '9'; } F compiles into 12 instructions that contain a subtroutine call (for a total of 31 instructions executed), whereas G compiles into 10 instructions of straight-line code. I think part of the problem is that isdigit might be sensitive to the locale. So there's a correctness issue here as well; isdigit might actually return the wrong value, since it might think that some other byte code is a digit. (This is just a theoretical issue, as far as I know, though.)
Paul's version may be textually shorter, but with that cp++ side effect buried in the middle of the && sequence, it is not nearly as easy to read.
True, but in my defense that buried cp++ was in the original code. How about this instead? It might be a bit clearer. char c = *cp; if ('0' <= c && c <= '9') { cp++; if (c == '1' && '0' <= *cp && *cp <= '4') cp++; }
Date: Thu, 08 Dec 2005 13:23:21 -0800 From: Paul Eggert <eggert@CS.UCLA.EDU> Message-ID: <873bl39tbq.fsf@penguin.cs.ucla.edu> | But the zic stuff attempts to be portable to C89 (as | well as earlier) compilers. OK, though I can't imagine a C implementation where it wouldn't work (whatever C89 might technically allow...) | Alas, that's not true in practice. isdigit is typically slower, and | it can be quite a bit slower. For zic/zdump I don't care which is faster or slower (it might be different if this code were in localtime or similar). What I said was "better" not "faster". | > Paul's version may be textually shorter, but with that cp++ side effect | > buried in the middle of the && sequence, it is not nearly as easy to | > read. | | True, but in my defense that buried cp++ was in the original code. It isn't the side effect itself I minded, but the buried side effect in the middle of a long expression. I have no problem with if (*cp++ == 0) (or similar) types of expressions, that's ancient C idiom that anyone can (should be able to) comprehend easily. But when you have to look hard to figure out under what circumstances the "++" actually gets executed, then I start to look for a better way to write the code. | How about this instead? It might be a bit clearer. Aside from the switch away from using isdigit() that's pretty much what was there already, isn't it? (OK, it moves out the ++ from the if, but there was no real need to do that in the original). Incidentally, if you wanted to not use isdigit(), rather than writing out the expression longhand, I'd just define a new macro that does the test. That way, it is trivial to switch between using isdigit() or using the specific tests against '0' and '9' just by altering the new macro definition. kre
participants (3)
-
Olson, Arthur David (NIH/NCI) [E] -
Paul Eggert -
Robert Elz