
Thanks to all for feedback on asctime.c. The version below uses %02.2d for
universality, uses %4ld for years to avoid problems if a year isn't four
digits long, and avoids the use of snprintf (which isn't available on some
systems). There are comments on all these changes. Is it soup yet?

				--ado

/*
** This file is in the public domain, so clarified as of
** 1996-06-05 by Arthur David Olson (arthur_david_olson@nih.gov).
*/

#ifndef lint
#ifndef NOID
static char	elsieid[] = "@(#)asctime.c	7.15";
#endif /* !defined NOID */
#endif /* !defined lint */

/*LINTLIBRARY*/

#include "private.h"
#include "tzfile.h"

#define STANDARD_BUFFER_SIZE	26

/*
** A la ISO/IEC 9945-1, ANSI/IEEE Std 1003.1, 2004 Edition.
*/

char *
asctime_r(timeptr, buf)
register const struct tm *	timeptr;
char *				buf;
{
	static const char	wday_name[][3] = {
		"Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"
	};
	static const char	mon_name[][3] = {
		"Jan", "Feb", "Mar", "Apr", "May", "Jun",
		"Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
	};
	register const char *	wn;
	register const char *	mn;
	/*
	** Big enough for something such as
	** ??? ???-2147483648 -2147483648:-2147483648:-2147483648 -2147483648\n
	** (two three-character abbreviations, five strings denoting integers,
	** three explicit spaces, two explicit colons, a newline,
	** and a trailing ASCII nul).
	*/
	char	result[2 * 3 + 5 * INT_STRLEN_MAXIMUM(int) +
			3 + 2 + 1 + 1];

	if (timeptr->tm_wday < 0 || timeptr->tm_wday >= DAYSPERWEEK)
		wn = "???";
	else	wn = wday_name[timeptr->tm_wday];
	if (timeptr->tm_mon < 0 || timeptr->tm_mon >= MONSPERYEAR)
		mn = "???";
	else	mn = mon_name[timeptr->tm_mon];
	/*
	** The format used in the (2004) standard is
	**	"%.3s %.3s%3d %.2d:%.2d:%.2d %d\n"
	** Some systems only handle "%.2d"; others only handle "%02d";
	** "%02.2d" makes (most) everybody happy.
	** All years associated with 32-bit time_t values are exactly
	** four digits long; some years associated with 64-bit time_t
	** values are not four digits long so we throw in the 4 below.
	*/
	/*
	** We avoid using snprintf since it's not available on all systems.
	*/
	(void) sprintf(result, "%.3s %.3s%3d %02.2d:%02.2d:%02.2d %4ld\n",
		wn, mn,
		timeptr->tm_mday, timeptr->tm_hour,
		timeptr->tm_min, timeptr->tm_sec,
		timeptr->tm_year + (long) TM_YEAR_BASE);
	if (strlen(result) >= STANDARD_BUFFER_SIZE) {
		errno = EOVERFLOW;
		return NULL;
	} else {
		(void) strcpy(buf, result);
		return buf;
	}
}

/*
** A la ISO/IEC 9945-1, ANSI/IEEE Std 1003.1, 2004 Edition.
*/

char *
asctime(timeptr)
register const struct tm *	timeptr;
{
	static char	result[STANDARD_BUFFER_SIZE];

	return asctime_r(timeptr, result);
}

Date: Tue, 27 Jul 2004 10:53:36 -0400
From: "Olson, Arthur David (NIH/NCI)" <olsona@dc37a.nci.nih.gov>
Message-ID: <75DDD376F2B6B546B722398AC161106C740274@nihexchange2.nih.gov>

| Thanks to all for feedback on asctime.c. The version below uses %02.2d for
| universality, uses %4ld for years to avoid problems if a year isn't four
| digits long, and avoids the use of snprintf (which isn't available on some
| systems). There are comments on all these changes. Is it soup yet?

| /*
| ** Big enough for something such as
| ** ??? ???-2147483648 -2147483648:-2147483648:-2147483648 -2147483648\n
| ** (two three-character abbreviations, five strings denoting integers,
| ** three explicit spaces, two explicit colons, a newline,
| ** and a trailing ASCII nul).
| */
| char	result[2 * 3 + 5 * INT_STRLEN_MAXIMUM(int) +
| 		3 + 2 + 1 + 1];

That buffer is big enough for that string, but are we sure that string is
the longest that can ever happen? The year is after all printed using
%ld - so there must at least be the potential for that one to be a long,
which might be -9223372036854775808.

Beyond that, C doesn't actually promise that "int" is limited to 32 bits,
does it? Given that, the 2147483648 numbers are just speculation.

Other than validating the input, before conversion to strings, to see that
it will be printable in a reasonable number of characters, snprintf is
really the only good solution. Where it doesn't exist, just use a HUGE
buffer - it is just stack space after all; make it 1KB, then no combination
of 5 integers is going to overflow it (no-one uses ints that require 200
digits to represent ... not yet anyway). Or if you want to ignore snprintf,
always use that BIG buffer.

kre

Robert Elz <kre@munnari.oz.au> writes:
That buffer is big enough for that string, but are we sure that string is the longest that can ever happen?
Yes. It's an information-theoretic argument. At compile-time we know the number of bits in an int, so we can compute at compile-time an upper bound on the number of digits that can be printed. This is true even for weird architectures that have "holes" in their int representation, since the holes can't increase the number of digits.
The year is after all printed using %ld - so there must at least be the potential for that one to be a long, which might be -9223372036854775808.
But that 'long' value is derived by adding 1900 to an 'int'. It can't possibly take more digits than the number of digits printed in an int. Even if adding 1900 overflows INT_MAX, it will add at most one digit to the print width, and that extra byte is already accounted for by the width of INT_MIN (which has a leading minus sign). Hmm, perhaps this fairly-subtle point should be commented. I'll add a comment in my next proposed draft.
Beyond that, C doesn't actually promise that "int" is limited to 32 bits does it? Given that, the 2147483648 numbers are just speculation.
That part of the comment is just an example: the code works even with wider ints. I'll propose a reworded comment to make this clearer.
Other than validating the input, before conversion to strings, to see that it will be printable in a reasonable number of characters, snprintf is really the only good solution. Where it doesn't exist, just use a HUGE buffer - it is just stack space after all,
That approach runs into a different problem. On most modern architectures,
the stack space overflow checking is fairly brain damaged. Sometimes
there's no checking whatsoever (ouch!), but more typically the assumption
is that one does not have HUGE buffers on the stack. If you violate this
assumption the behavior is undefined. A 1K local buffer is safe on all the
platforms that I know about (i.e., stack overflow will be detected if you
use the buffer right away), but an 8K buffer isn't.

Anyway, it's always better to not allocate space you don't need, as this
avoids stack overflow in some cases; this is true even for architectures
with perfect stack-overflow checking.

I agree that in general snprintf is the way to go: it leads to a
higher-performance solution, since it avoids an extra buffer copy.
However, the tradeoff is that it makes the code either less portable
(older hosts don't have snprintf, or have a buggy one) or more complicated
(if you have ifdefs). So I can understand Arthur's preference to stick
with tried-and-true sprintf here.

On Tue, Jul 27, 2004 at 10:48:34PM +0700, Robert Elz wrote:
Beyond that, C doesn't actually promise that "int" is limited to 32 bits does it? Given that, the 2147483648 numbers are just speculation.
my copy of k&r (2ed) says, in brief:

+ short and long are intended (but not required) to be different lengths
+ int will be the "natural size" for a machine
+ shorts and ints are at least 16 bits
+ longs are at least 32 bits
+ sizeof(short) <= sizeof(int) <= sizeof(long)

therefore, int could be 128 bits if you liked, but long would have to be
at least as many, while short could only be at most as many. and that's it.

iirc, the cray was a "csilp64" machine, and that didn't break any rules.

--
|-----< "CODE WARRIOR" >-----|
codewarrior@daemon.org * "ah!  i see you have the internet
twofsonet@graffiti.com (Andrew Brown)                that goes *ping*!"
werdna@squooshy.com * "information is power -- share the wealth."

Andrew Brown said:
my copy of k&r (2ed) says, in brief:
+ short and long are intended (but not required) to be different lengths
+ int will be the "natural size" for a machine
+ shorts and ints are at least 16 bits
+ longs are at least 32 bits
+ sizeof(short) <= sizeof(int) <= sizeof(long)
therefore, int could be 128 bits if you liked, but long would have to be at least as many, while short could only be at most as many. and that's it.
K&R isn't the Standard, and sometimes simplifies. However, your
conclusions here are correct.

-- 
Clive D.W. Feather | Work: <clive@demon.net>  | Tel: +44 20 8495 6138
Internet Expert    | Home: <clive@davros.org> | Fax: +44 870 051 9937
Demon Internet     | WWW: http://www.davros.org | Mobile: +44 7973 377646
Thus plc           |                          |

On Wed, Jul 28, 2004 at 06:01:43AM +0100, Clive D.W. Feather wrote:
Andrew Brown said:
my copy of k&r (2ed) says, in brief:
+ short and long are intended (but not required) to be different lengths
+ int will be the "natural size" for a machine
+ shorts and ints are at least 16 bits
+ longs are at least 32 bits
+ sizeof(short) <= sizeof(int) <= sizeof(long)
therefore, int could be 128 bits if you liked, but long would have to be at least as many, while short could only be at most as many. and that's it.
K&R isn't the Standard, and sometimes simplifies. However, your conclusions here are correct.
true, but the second edition has the ansi stamp on it, and afaik there's
no third edition. i have a c9x draft somewhere around but the last time i
looked, it was more than twice as large as the k&r book i have and at most
half as readable. i doubt that particular aspect of the language was
changed, though.

Andrew Brown said:
K&R isn't the Standard, and sometimes simplifies. true, but the second edition has the ansi stamp on it, and afaik there's no third edition.
The "ansi stamp" simply means that they made changes to match C99. The fact that K&R haven't changed the text to match C99 doesn't make them correct.
i have a c9x draft somewhere around but the last time i looked, it was more than twice as large as the k&r book i have and at most half as readable. i doubt that particular aspect of the language was changed, though.
Actually it was. Significantly - you're looking at the person who wrote
the new text.

[Before you panic, the basic concepts are unchanged, but a whole load of
subtle issues have now been addressed.]

On Thu, Jul 29, 2004 at 03:38:03PM +0100, Clive D.W. Feather wrote:
Andrew Brown said:
K&R isn't the Standard, and sometimes simplifies. true, but the second edition has the ansi stamp on it, and afaik there's no third edition.
The "ansi stamp" simply means that they made changes to match C99. The fact that K&R haven't changed the text to match C99 doesn't make them correct.
no, the ansi stamp means that it has something to do with c89. c99 didn't exist when the second edition was printed.
i have a c9x draft somewhere around but the last time i looked, it was more than twice as large as the k&r book i have and at most half as readable. i doubt that particular aspect of the language was changed, though.
Actually it was. Significantly - you're looking at the person who wrote the new text.
[Before you panic, the basic concepts are unchanged, but a whole load of subtle issues have now been addressed.]
of the five i noted, have any of them changed at all?

Andrew Brown said:
K&R isn't the Standard, and sometimes simplifies.

true, but the second edition has the ansi stamp on it, and afaik there's
no third edition.

The "ansi stamp" simply means that they made changes to match C99.
Argh. That should have said "C90".
Actually it was. Significantly - you're looking at the person who wrote the new text. [Before you panic, the basic concepts are unchanged, but a whole load of subtle issues have now been addressed.] of the five i noted, have any of them changed at all?
Let's see:
+ short and long are intended (but not required) to be different lengths
The Standard tries to avoid "intended". short and long can be the same length or different lengths.
+ int will be the "natural size" for a machine
"A ``plain'' int object has the natural size suggested by the architecture of the execution environment"
+ shorts and ints are at least 16 bits + longs are at least 32 bits
These limits are defined in terms of the minimum range of values rather
than bit counts. This is because types can have bits that don't take part
in calculations; for example, a 16 bit storage unit might only have 12
bits that take part in calculations.

signed short and signed int must be able to hold -32767 to +32767.
unsigned short and unsigned int must be able to hold 0 to 65535.
signed long must be able to hold -2147483647 to +2147483647.
unsigned long must be able to hold 0 to 4294967295.
+ sizeof(short) <= sizeof(int) <= sizeof(long)
This one has gone. The requirements are:

    sizeof(signed short) == sizeof(unsigned short)
    sizeof(signed int) == sizeof(unsigned int)
    sizeof(signed long) == sizeof(unsigned long)
    USHRT_MAX >= SHRT_MAX
    UINT_MAX >= INT_MAX
    ULONG_MAX >= LONG_MAX
    SHRT_MAX <= INT_MAX
    USHRT_MAX <= UINT_MAX
    INT_MAX <= LONG_MAX
    UINT_MAX <= ULONG_MAX

This does allow a perverse implementation where long is 32 bits, all used,
while short is 64 bits but only 17 of them are used. There are reasons
(too off-topic to go into) why we did it this way.
therefore, int could be 128 bits if you liked, but long would have to be at least as many, while short could only be at most as many. and that's it.
See above.

"Clive D.W. Feather" <clive@demon.net> writes:
+ sizeof(short) <= sizeof(int) <= sizeof(long)
This one has gone. This does allow a perverse implementation where long is 32 bits, all used, while short is 64 bits but only 17 of them are used. There are reasons (too off-topic to go into) why we did it this way.
It's not entirely off-topic for tz, as a bit of it (difftime.c) does
assume a relationship between sizeof and the range of values that can be
stored. I suspect there is a reasonable amount of code that makes the C89
sizeof assumptions, and which will silently go wrong if the assumptions
fail to hold. Is there some place that documents why these sizeof
assumptions were removed in C99? Are there actual C99 implementations that
violate these C89 assumptions?

Paul Eggert said:
+ sizeof(short) <= sizeof(int) <= sizeof(long)
This one has gone. This does allow a perverse implementation where long is 32 bits, all used, while short is 64 bits but only 17 of them are used. There are reasons (too off-topic to go into) why we did it this way.
It's not entirely off-topic for tz, as a bit of it (difftime.c) does assume a relationship between sizeof and the range of values that can be stored.
Can you give me an example?
I suspect there is a reasonable amount of code that makes the C89 sizeof
assumptions, and which will silently go wrong if the assumptions fail to
hold.
The only code I can think of that will have this problem is code that assumes you can memcpy() a short into a long and have something sensible happen.
Is there some place that documents why these sizeof assumptions were removed in C99?
I don't recall what, if anything, we wrote in the Rationale.

The whole area of integer types and representations got revisited and
rewritten as part of the C99 process. In doing this, we identified the
properties that we thought were important:

* there's a hierarchy long long, long, int, short, char;
* integer types all come in signed/unsigned pairs;
* the range of lower types is a subset of the range of higher types;
* corresponding signed and unsigned types occupy the same storage;
* unsigned types can hold all possible non-negative values of the
  corresponding signed type.

Note that while we removed the sizeof requirement, we *did* fix problems
in C89:
Are there actual C99 implementations that violate these C89 assumptions?
I'm not aware either way. Please note the following:

    signed short ss = some_signed_value ();
    signed long sl;
    unsigned short us = some_unsigned_value ();
    unsigned long ul;

    sl = ss;
    if (sl != ss)
        printf ("This can't happen in C90 or C99.\n");

    ul = us;
    if (ul != us)
        printf ("This can't happen in C99. This is legal C90.\n");

    if (sizeof ul < sizeof us)
        printf ("This is legal C99. It probably can't happen in C90.\n");

"Clive D.W. Feather" <clive@demon.net> writes:
Can you give me an example?
Sure. Here's an example taken from the code I happened to be looking at 30
seconds before reading your email. It's taken from GNU coreutils "od.c".
I've paraphrased the code slightly to simplify it.

    enum size_spec { NO_SIZE, CHAR, SHORT, INT, LONG };

    enum size_spec integral_type_size[sizeof (long) + 1];

    for (i = 0; i <= MAX_INTEGRAL_TYPE_SIZE; i++)
        integral_type_size[i] = NO_SIZE;
    integral_type_size[sizeof (char)] = CHAR;
    integral_type_size[sizeof (short)] = SHORT;
    integral_type_size[sizeof (int)] = INT;
    integral_type_size[sizeof (long)] = LONG;

This code has undefined behavior if, for example, sizeof (long) is 4 and
sizeof (int) is 8.

The code in difftime.c is a bit more subtle than this, and now that I look
at it more carefully it can't strictly be justified in terms of either C89
or C99 (though it is true on all platforms I know about). However, I'd say
that the general principle that sizeof(int) <= sizeof(long) is hardwired
into a lot of real-world code.

If there aren't any real implementations with sizeof(long) < sizeof(int),
then this is only of academic interest. Still, it's strange that this
longstanding requirement would get removed from the standard. After all,
it's a natural assumption.

Paul Eggert said:
Can you give me an example? Sure. Here's an example taken from the code I happened to be looking at 30 seconds before reading your email. It's taken from GNU coreutils "od.c".
I've just looked at the actual code. This seems to be doing something a bit odd - trying to use printf on various types to output integers of run-time-specified size (if I'm wrong, please say so). My inclination is to say "don't do that".
I've paraphrased the code slightly to simplify it.
enum size_spec { NO_SIZE, CHAR, SHORT, INT, LONG };

enum size_spec integral_type_size[sizeof (long) + 1];
I note the actual code has MAX_INTEGRAL_TYPE_SIZE here. This can be adjusted if necessary. [...]
This code has undefined behavior if, for example, sizeof (long) is 4 and sizeof (int) is 8.
This can be tested at compile time: add to "struct dummy" a number of
fields of the form:

    int assert_size_spec_big_enough_for_char
        [MAX_INTEGRAL_TYPE_SIZE + 1 - sizeof (char)];
    int assert_size_spec_big_enough_for_short
        [MAX_INTEGRAL_TYPE_SIZE + 1 - sizeof (short)];
    int assert_size_spec_big_enough_for_int
        [MAX_INTEGRAL_TYPE_SIZE + 1 - sizeof (int)];
The code in difftime.c is a bit more subtle than this, and now that I look at it more carefully it can't strictly be justified in terms of either C89 or C99 (though it is true on all platforms I know about). However, I'd say that the general principle that sizeof(int) <= sizeof(long) is hardwired into a lot of real-world code.
I'm still dubious about "a lot". What's difftime.c doing that needs that assumption? Note that the related assumption INT_MAX <= LONG_MAX *is* guaranteed in both C89 and C99. UINT_MAX <= ULONG_MAX is guaranteed in C99 but not C89. [see below]
If there aren't any real implementations with sizeof(long) < sizeof(int), then this is only of academic interest. Still, it's strange that this longstanding requirement would get removed from the standard. After all, it's a natural assumption.
We didn't feel so. "long can hold any value that int can" is an important
property. "long takes up more space in core than int does" is less
obviously so.

====

C89 said:

    There are four signed integer types, designated as signed char, short
    int, int, and long int. [...] In the list of signed integer types
    above, the range of values of each type is a subrange of the values of
    the next type in the list. [...] The range of nonnegative values of a
    signed integer type is a subrange of the corresponding unsigned
    integer type,

Therefore:

    INT_MAX <= LONG_MAX
    INT_MAX <= UINT_MAX
    LONG_MAX <= ULONG_MAX

but you can't derive UINT_MAX <= ULONG_MAX from that.

"Clive D.W. Feather" <clive@demon.net> writes:
My inclination is to say "don't do that".
No can do. POSIX requires the od command to "do that". Here's the spec: <http://www.opengroup.org/onlinepubs/009695399/utilities/od.html> The only way to support (say) "od -t xL" is to use a %lx format, selected at run-time.
This can be tested at compile time:
(sarcasm on)

Yes, we can go through millions of lines of code, looking for dozens or
hundreds of places where programmers have made the very natural assumption
that sizeof(int) <= sizeof(long), and rewrite them all to be portable to
hosts where this assumption isn't true. No automated tool can do this
today -- but sure, we can check it all by hand. This would take months --
years maybe -- but we've got plenty of spare time and our people love to
do this sort of thing.

(whew! sarcasm off. hope you didn't mind...)

Seriously: it's not going to happen. We have better things to do with our
limited resources. We have real bugs and real security holes to fix. That
is what I was doing with od.c when your email arrived; see
<http://lists.gnu.org/archive/html/bug-coreutils/2004-08/msg00026.html>
for the result of my efforts. "Bugs" that are merely inventions of the
standardization committee, and aren't a problem on any real host, will not
get "fixed".
I'm still dubious about "a lot".
What can I say? I gave you one example, from code I was working on the minute I received your email (no lie!). As it happens this code is quite widely used, and widely portable, and it has safely made the sizeof(int)<=sizeof(long) assumption since before C89 came out. I could give you other examples but I'm afraid it sounds like your mind was made up before I started.
What's difftime.c doing that needs that assumption?
difftime's problem is slightly different. It's trying to subtract two
POSIX time_t values and return a floating-point answer that is exactly
correct, when possible.

It can't simply subtract the time_t values, because they are typically
integers and we might have integer overflow. And it can't simply convert
to floating point and subtract the results, because that will lose
information in some cases (e.g., if time_t is 64 bits and "double" is IEEE
64-bit double).

So it uses a heuristic, based on the size of time_t, to decide what to do.
This heuristic is that if sizeof (time_t) < sizeof (double), then time_t
can be converted to double without losing information; and similarly for
long double. This heuristic is not guaranteed by C but is true on all
platforms that we know of. (If you know of any counterexamples, please let
us know.) The heuristic is related to the C89 guarantee that sizeof bears
a sane relationship to range, but it's not identical to that guarantee.

As far as I know, there is no portable way in C89 or C99 to implement
POSIX difftime; the heuristic is the best we have come up with so far.
There is more explanation in the difftime source code.

Paul Eggert said:
My inclination is to say "don't do that". No can do. POSIX requires the od command to "do that". Here's the spec: <http://www.opengroup.org/onlinepubs/009695399/utilities/od.html> The only way to support (say) "od -t xL" is to use a %lx format, selected at run-time.
Accepted, though your specific method isn't required. [Incidentally, if I read it correctly you don't need to handle types larger than 16 bytes anyway.]
(whew! sarcasm off. hope you didn't mind...)
No.
I'm still dubious about "a lot". What can I say? I gave you one example, from code I was working on the minute I received your email (no lie!).
I don't doubt you.
As it happens this code is quite widely used, and widely portable, and it has safely made the sizeof(int)<=sizeof(long) assumption since before C89 came out. I could give you other examples but I'm afraid it sounds like your mind was made up before I started.
No. It's just that I am having difficulty figuring out why someone would do that at all. You've given me one example.
What's difftime.c doing that needs that assumption? difftime's problem is slightly different. It's trying to subtract two POSIX time_t values and return a floating-point answer that is exactly correct, when possible. It can't simply subtract the time_t values, because they are typically integers and we might have integer overflow. And it can't simply convert to floating point and subtract the results, because that will lose information in some cases (e.g., if time_t is 64 bits and "double" is IEEE 64-bit double). So it uses a heuristic, based on the size of time_t, to decide what to do. This heuristic is that if sizeof (time_t) < sizeof (double), then time_t can be converted to double without losing information; and similarly for long double. This heuristic is not guaranteed by C but is true on all platforms that we know of. (If you know of any counterexamples, please let us know.)
I'm not familiar enough with architectures to be able to answer that. An implementation could put double in 80 bit storage but still make it an IEEE 64 bit double; that would break you. I can't see why it would, though (but that argument isn't one you like).
As far as I know, there is no portable way in C89 or C99 to implement POSIX difftime; the heuristic is the best we have come up with so far.
I'll do some thinking.

"Clive D.W. Feather" <clive@demon.net> writes:
Accepted, though your specific method isn't required.
Yes, quite true. In all these cases the code could be rewritten if necessary. It's the cost of the rewrite that I object to.
[Incidentally, if I read it correctly you don't need to handle types larger than 16 bytes anyway.]
Yes, POSIX requires support only for 1, 2, 4, 8, sizeof (short), sizeof (int), and sizeof (long).
I am having difficulty figuring out why someone would do that at all. You've given me one example.
OK. Here's another example, obtained by the command "grep 'sizeof.*int'"
in the coreutils source code. There are 59 grep matches, and the 4th match
(I stopped looking after that) is broken on a host where we can't assume
sizeof works as in C89.

This macro takes any integer value x as an argument, and returns
UINTMAX_MAX if x is all-one-bits in x's type; otherwise it returns x
converted to uintmax_t.

    #define PROPAGATE_ALL_ONES(x) \
      ((sizeof (x) < sizeof (uintmax_t) \
        && (~ (x) == (sizeof (x) < sizeof (int) \
                      ? - (1 << (sizeof (x) * CHAR_BIT)) \
                      : 0))) \
       ? UINTMAX_MAX : (x))

Admittedly this is a less-clean example, since it also assumes a two's
complement host in which integers narrower than "int" do not have "holes"
in their representation. (The brief coding standard for this code allows
these assumptions; see "Portability guidelines" under
<http://savannah.gnu.org/cgi-bin/viewcvs/*checkout*/gnulib/gnulib/README?rev=...>.)
However, we still have the case that, within the constraints allowed by
this coding standard, the relaxation of the C89 sizeof rules "breaks" this
code.

Now that I'm thinking of this, I should probably just change the coding
standard to say that it's OK to assume sizeof(int) <= sizeof(long). That
will solve the problem for us, anyway. It wouldn't be the first time that
the GNU coding standards have said it's OK to assume properties guaranteed
by C89 but not C99.

Paul Eggert said:
Accepted, though your specific method isn't required. Yes, quite true. In all these cases the code could be rewritten if necessary. It's the cost of the rewrite that I object to.
Which is understandable.
[Incidentally, if I read it correctly you don't need to handle types larger than 16 bytes anyway.] Yes, POSIX requires support only for 1, 2, 4, 8, sizeof (short), sizeof (int), and sizeof (long).
But if sizeof (long) is 32, you don't have to support it because the entire line would be too long, right?
OK. Here's another example, obtained by the command "grep 'sizeof.*int'" in the coreutils source code. There are 59 grep matches, and the 4th match (I stopped looking after that) is broken on a host where we can't assume sizeof works as in C89. This macro takes any integer value x as an argument, and returns UINTMAX_MAX if x is all-one-bits in x's type; otherwise it returns x converted to uintmax_t.
    #define PROPAGATE_ALL_ONES(x) \
      ((sizeof (x) < sizeof (uintmax_t) \
        && (~ (x) == (sizeof (x) < sizeof (int) \
                      ? - (1 << (sizeof (x) * CHAR_BIT)) \
                      : 0))) \
       ? UINTMAX_MAX : (x))
Admittedly this is a less-clean example, since it also assumes a two's complement host in which integers narrower than "int" do not have "holes" in their representation.
There definitely *are* architectures for which that isn't the case - when
working on C99, we knew that there are systems where the unsigned types
simply ignore the sign bit rather than using it as a most significant bit.
There's even one system where integers are stored as floating-point values
with the exponent ignored.

With that assumption, both C89 and C99 have the "sizeof ordering" you
desire. Without it, I think that code is broken anyway. (I also don't see
why you're special-casing uintmax_t at the start.)

It will also break on a perverse C89 system where USHRT_MAX > UINT_MAX
(C99 forbids this).
However, we still have the case that, within the constraints allowed by this coding standard, the relaxation of the C89 sizeof rules "breaks" this code.
I don't think so, because if there are no holes then the requirements on range imply the requirements on sizeof.
Now that I'm thinking of this, I should probably just change the coding standard to say that it's OK to assume sizeof(int)<=sizeof(long). That will solve the problem for us, anyway. It wouldn't be the first time that the GNU coding standards have said it's OK to assume properties guaranteed by C89 but not C99.
I hope you say this "only for code being maintained, not for new code".
That way it will eventually die out.

"Clive D.W. Feather" <clive@demon.net> writes:
if sizeof (long) is 32, you don't have to support it because the entire line would be too long, right?
Sorry, I don't know. My guess is that if someone ever builds a machine with sizeof(long)==32, then POSIX will have to get fixed.
With that assumption, both C89 and C99 have the "sizeof ordering" you desire. Without it, I think that code is broken anyway.
Well, I admitted it was a less-clean example. Though your conclusion isn't
quite right: the sizeof-using code works correctly on C89 hosts where
"int" and wider types contain holes, but types narrower than "int" do not
contain holes. Quite possibly some of the weird hosts you're talking about
fall into this category. C99's relaxed sizeof rules break the code on any
such hosts.

If these two examples don't satisfy you, here's another one, again taken
from GNU coreutils:

    int
    open_safer (char const *file, int oflag, ...)
    {
      int fd;
      mode_t mode = 0;

      if (oflag & O_CREAT)
        {
          va_list args;
          va_start (args, oflag);
          if (sizeof (int) <= sizeof (mode_t))
            mode = va_arg (args, mode_t);
          else
            mode = va_arg (args, int);
          va_end (args);
        }

      fd = open (file, oflag, mode);
      ...
    }

Now that I think of it, this very subject came up recently in the
austin-group-l mailing list, e.g.,
<http://www.opengroup.org/sophocles/show_mail.tpl?source=L&listname=austin-gr...>.
It was kind of a mess, if I recall.
I hope you say this "only for code being maintained, not for new code". That way it will eventually die out.
The old code won't die out for the foreseeable future, I'm afraid. And I don't see the point of warning programmers even for new code. Why waste programmers' time with worries about porting to theoretical hosts that don't exist now and aren't ever likely to exist? They have more important things to worry about.

On Wed, Aug 04, 2004 at 06:10:11PM +0100, Clive D.W. Feather wrote:
This can be tested at compile time: add to "struct dummy" a number of fields of the form:
    int assert_size_spec_big_enough_for_char [MAX_INTEGRAL_TYPE_SIZE + 1 - sizeof (char)];
    int assert_size_spec_big_enough_for_short [MAX_INTEGRAL_TYPE_SIZE + 1 - sizeof (short)];
    int assert_size_spec_big_enough_for_int [MAX_INTEGRAL_TYPE_SIZE + 1 - sizeof (int)];
if you're going to do that, at least make it

    typedef int assert_size_spec_big_enough_for_char [MAX_INTEGRAL_TYPE_SIZE + 1 - sizeof (char)];
    typedef int assert_size_spec_big_enough_for_short [MAX_INTEGRAL_TYPE_SIZE + 1 - sizeof (short)];
    typedef int assert_size_spec_big_enough_for_int [MAX_INTEGRAL_TYPE_SIZE + 1 - sizeof (int)];

so that the compiler can barf if it wants to, but there's no impact on the generated code.

--
|-----< "CODE WARRIOR" >-----|
codewarrior@daemon.org * "ah! i see you have the internet that goes *ping*!"
twofsonet@graffiti.com (Andrew Brown)
werdna@squooshy.com * "information is power -- share the wealth."

Andrew Brown said:
This can be tested at compile time: add to "struct dummy" a number of fields of the form:
    int assert_size_spec_big_enough_for_char [MAX_INTEGRAL_TYPE_SIZE + 1 - sizeof (char)];
    int assert_size_spec_big_enough_for_short [MAX_INTEGRAL_TYPE_SIZE + 1 - sizeof (short)];
    int assert_size_spec_big_enough_for_int [MAX_INTEGRAL_TYPE_SIZE + 1 - sizeof (int)];
if you're going to do that, at least make it
    typedef int assert_size_spec_big_enough_for_char [MAX_INTEGRAL_TYPE_SIZE + 1 - sizeof (char)];
    typedef int assert_size_spec_big_enough_for_short [MAX_INTEGRAL_TYPE_SIZE + 1 - sizeof (short)];
    typedef int assert_size_spec_big_enough_for_int [MAX_INTEGRAL_TYPE_SIZE + 1 - sizeof (int)];
so that the compiler can barf if it wants to, but there's no impact on the generated code.
That's better, but the code Paul pointed me at already has a structure containing these fields, which is why I did it that way. Paul should change his code in the same way.

"Olson, Arthur David (NIH/NCI)" <olsona@dc37a.nci.nih.gov> writes:
use... %4ld for years to avoid problems if a year isn't four digits long
That "%4ld" doesn't conform to the C standard, which says that leading zeros must not be printed for years. So the following program:

    #include <time.h>
    #include <stdio.h>

    int
    main (void)
    {
      struct tm tm;
      tm.tm_year = 999 - 1900;
      tm.tm_mon = 12 - 1;
      tm.tm_mday = 1;
      tm.tm_wday = 1;
      tm.tm_hour = tm.tm_min = tm.tm_sec = 0;
      puts (asctime (&tm));
      return 0;
    }

which is strictly conforming even on 32-bit time_t hosts, must print "Mon Dec  1 00:00:00 999", without a leading zero before the 999.

Also, that proposal still assumes that EOVERFLOW is defined by <errno.h>. (Perhaps you're defining it in private.h if it's not already defined? That would explain this.)

Finally, there's still a regression in that asctime now sometimes returns NULL when it used to return a valid string. It's fairly common for programs to assume that asctime always succeeds. zdump.c itself is one such program (I've submitted patches for that but we're working on asctime first). Admittedly such programs are unportable, but I don't see why we should have them dump core when it's easy to have them succeed and return a valid string.

Anyway, here's a proposal that incorporates your changes to avoid snprintf entirely, along with comments along the lines that I suggested in my earlier message today, and a few other comments about the regression noted above.

/*
** This file is in the public domain, so clarified as of
** 1996-06-05 by Arthur David Olson (arthur_david_olson@nih.gov).
*/

#ifndef lint
#ifndef NOID
static char	elsieid[] = "@(#)asctime.c	7.15";
#endif /* !defined NOID */
#endif /* !defined lint */

/*LINTLIBRARY*/

#include "private.h"
#include "tzfile.h"

#define STANDARD_BUFFER_SIZE 26

#ifndef EOVERFLOW
# define EOVERFLOW EINVAL
#endif

/*
** A la ISO/IEC 9945-1, ANSI/IEEE Std 1003.1, 2004 Edition.
*/

/*
** Big enough for something such as
** ??? ???-2147483648 -2147483648:-2147483648:-2147483648 -2147483648\n
** (two three-character abbreviations, five strings denoting integers,
** three explicit spaces, two explicit colons, a newline,
** and a trailing ASCII nul).  The above example assumes 32-bit int,
** but the same idea applies to all int widths.
**
** The year is printed as a long int, but it can't possibly take more
** digits than the number of digits printed in an int, because it is
** the sum of 1900 and an int value.  Even if the sum overflows past
** INT_MAX, it will add at most one digit to the print width, and that
** extra byte is already accounted for by the width of INT_MIN (which
** has a leading minus sign).
*/
#define MAX_ASCTIME_SIZE (2 * 3 + 5 * INT_STRLEN_MAXIMUM(int) + 3 + 2 + 1 + 1)

static char *
asctime_rn(timeptr, buf, size)
register const struct tm *	timeptr;
char *				buf;
size_t				size;
{
	static const char	wday_name[][3] = {
		"Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"
	};
	static const char	mon_name[][3] = {
		"Jan", "Feb", "Mar", "Apr", "May", "Jun",
		"Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
	};
	register const char *	wn;
	register const char *	mn;
	char			result[MAX_ASCTIME_SIZE];

	if (timeptr->tm_wday < 0 || timeptr->tm_wday >= DAYSPERWEEK)
		wn = "???";
	else	wn = wday_name[timeptr->tm_wday];
	if (timeptr->tm_mon < 0 || timeptr->tm_mon >= MONSPERYEAR)
		mn = "???";
	else	mn = mon_name[timeptr->tm_mon];
	/*
	** The format used in the (2004) standard is
	**	"%.3s %.3s%3d %.2d:%.2d:%.2d %d\n"
	** Some systems only handle "%.2d"; others only handle "%02d";
	** "%02.2d" makes (most) everybody happy.
	*/
	/*
	** We avoid using snprintf since it's not available on all systems.
	*/
	(void) sprintf(result, "%.3s %.3s%3d %02.2d:%02.2d:%02.2d %ld\n",
		wn, mn,
		timeptr->tm_mday, timeptr->tm_hour,
		timeptr->tm_min, timeptr->tm_sec,
		timeptr->tm_year + (long) TM_YEAR_BASE);
	if (strlen(result) >= size) {
		errno = EOVERFLOW;
		return NULL;
	} else {
		(void) strcpy(buf, result);
		return buf;
	}
}

char *
asctime_r(timeptr, buf)
register const struct tm *	timeptr;
char *				buf;
{
	return asctime_rn(timeptr, buf, STANDARD_BUFFER_SIZE);
}

/*
** A la ISO/IEC 9945-1, ANSI/IEEE Std 1003.1, 2004 Edition,
** with core dump avoidance.
*/

char *
asctime(timeptr)
register const struct tm *	timeptr;
{
	/*
	** The standard requires only STANDARD_BUFFER_SIZE bytes in
	** this static buffer.  However, make it longer so that
	** asctime never returns a null pointer.  This supports the
	** many (admittedly unportable) programs that assume that
	** asctime never fails.
	*/
	static char	result[MAX_ASCTIME_SIZE];

	return asctime_rn(timeptr, result, sizeof result);
}

Date: Tue, 27 Jul 2004 11:52:29 -0700
From: Paul Eggert <eggert@CS.UCLA.EDU>
Message-ID: <87llh5b7v6.fsf@penguin.cs.ucla.edu>

  | That "%4ld" doesn't conform to the C standard, which says that leading
  | zeros must not be printed for years.

%4ld doesn't print a leading 0, did you actually test that?

  | which is strictly conforming even on 32-bit time_t hosts, must print
  | "Mon Dec 1 00:00:00 999", without a leading zero before the 999.

I doubt it. It should print

	Mon Dec  1 00:00:00  999

Otherwise the \n is in buf[24] instead of buf[25] where it belongs (where it must be for old code to keep on working).

kre

Robert Elz <kre@munnari.oz.au> writes:
%4ld doesn't print a leading 0, did you actually test that?
Sorry, no, I misread the format. But leading spaces aren't allowed by the standard, so an implementation can't use %4ld either.
If the standard actually says what you say (I don't have anything to do with it) then the standard is broken, and someone should file a defect report.
Feel free, but when I suggested something like that earlier this year <http://groups.google.com/groups?selm=7wekp4gm9s.fsf%40sic.twinsun.com>, the response from P. J. Plauger (a member of the standardization committee) was that it was not important enough to spend energy on. See <http://groups.google.com/groups?q=g:thl4051128685d&selm=fIhvc.24831%24oh7.21...>.
This one isn't just of academic interest, there's lots of code that does stuff like
printf("The date is: %.24s today\n", asctime(tm));
and expects that there cannot be a newline between the date and the word "today".
Yup, it's a problem all right. However, in my experience code like that is generally nonportable already, since it assumes that asctime can't possibly overrun its static buffer, and this assumption is false for many POSIX platforms. So I don't think supporting this code is important enough to violate the standard.

Also, more typically I see code like this:

    printf("The date is: %.24s today\n", ctime(&t));

and this is definitely broken for arbitrary time_t values, since ctime returns NULL if the time_t is so large that tm_year cannot represent the year.
participants (5):
- Andrew Brown
- Clive D.W. Feather
- Olson, Arthur David (NIH/NCI)
- Paul Eggert
- Robert Elz