Minor (unimportant really) technical UB bug in strftime() ?
The code for the %s strftime conversion starts as: (this taken from tzcode2022f strftime.c - but I suspect that many versions are the same) case 's': { struct tm tm; char buf[INT_STRLEN_MAXIMUM( time_t) + 1]; time_t mkt; tm = *t; mkt = mktime(&tm); The "tm = *t;" line is the problem. It (or more correctly, something to replace that) is needed, as *t is const, strftime() is not permitted to modify it, yet the struct tm * passed to mktime() is not const, as mktime is sometimes required to modify that struct before returning. However, mktime() refers only to the fields tm_year tm_mon tm_mday tm_hour tm_min tm_sec and tm_isdst of the struct passed to it. The %s conversion is defined as producing (in a decimal string format) the result of mktime() applied to the struct tm passed to strftime. Hence an application which intends to use strftime(..., "%s", ...) need only initialise those 7 fields, the other 2 (currently) standard fields (tm_yday and tm_wday) are explicitly ignored by mktime() on input, so there is no need to assign those values, and any other fields that might exist in a particular implementation are out of scope for a portable (conforming) application to touch, so cannot be initialised. Since referring to an uninitialised variable is, I believe, technically undefined behaviour, doing a struct copy (copying all fields) can have that effect. (An application could memset() the entire struct before providing the values it must add, but is not required to do that). It seems likely that the simple fix is to replace the struct assignment with 7 individual assignment statements tm.tm_year = t->tm_year; tm.tm_mon = t->tm_mon; /* ... */ On the other hand, hardware that balks at simply loading or copying, uninitialised values, without using them for any purpose is not exactly common (and half the world's code would probably break if an implementation that worked in such a manner was ever created) so I'd call this one of the less significant issues to worry about fixing. kre
The code has: tm = *t; mkt = mktime(&tm); This makes a copy of the `const struct tm` that is the argument to strftime(), and passes that copy to mktime(), which reads a few fields and sets most fields. But since it is working on a copy of the argument to strftime(), I don't see a problem. Can you elaborate? Is there a good reason not to write this? struct tm tm = *t; time_t mkt = mktime(&tm); It's a stylistic question wholly separate from the const-ness issue. On Tue, Nov 8, 2022 at 6:07 AM Robert Elz via tz <tz@iana.org> wrote:
The code for the %s strftime conversion starts as: (this taken from tzcode2022f strftime.c - but I suspect that many versions are the same)
case 's': { struct tm tm; char buf[INT_STRLEN_MAXIMUM( time_t) + 1]; time_t mkt;
tm = *t; mkt = mktime(&tm);
The "tm = *t;" line is the problem. It (or more correctly, something to replace that) is needed, as *t is const, strftime() is not permitted to modify it, yet the struct tm * passed to mktime() is not const, as mktime is sometimes required to modify that struct before returning.
However, mktime() refers only to the fields tm_year tm_mon tm_mday tm_hour tm_min tm_sec and tm_isdst of the struct passed to it. The %s conversion is defined as producing (in a decimal string format) the result of mktime() applied to the struct tm passed to strftime. Hence an application which intends to use strftime(..., "%s", ...) need only initialise those 7 fields, the other 2 (currently) standard fields (tm_yday and tm_wday) are explicitly ignored by mktime() on input, so there is no need to assign those values, and any other fields that might exist in a particular implementation are out of scope for a portable (conforming) application to touch, so cannot be initialised.
Since referring to an uninitialised variable is, I believe, technically undefined behaviour, doing a struct copy (copying all fields) can have that effect. (An application could memset() the entire struct before providing the values it must add, but is not required to do that).
It seems likely that the simple fix is to replace the struct assignment with 7 individual assignment statements
tm.tm_year = t->tm_year; tm.tm_mon = t->tm_mon; /* ... */
On the other hand, hardware that balks at simply loading or copying, uninitialised values, without using them for any purpose is not exactly common (and half the world's code would probably break if an implementation that worked in such a manner was ever created) so I'd call this one of the less significant issues to worry about fixing.
kre
-- Jonathan Leffler <jonathan.leffler@gmail.com> #include <disclaimer.h> Guardian of DBD::Informix - v2018.1031 - http://dbi.perl.org "Blessed are we who can laugh at ourselves, for we shall never cease to be amused."
On Tue, Nov 08, 2022 at 07:15:39AM -0700, Jonathan Leffler via tz <tz@iana.org> wrote:
This makes a copy of the `const struct tm` that is the argument to strftime(), and passes that copy to mktime(), which reads a few fields and sets most fields. But since it is working on a copy of the argument to strftime(), I don't see a problem. Can you elaborate?
The argument is that some fields might have a trap representation, so copying the whole struct might not be safe. -- Marc Lehmann
Robert Elz via tz said:
The code for the %s strftime conversion starts as: (this taken from tzcode2022f strftime.c - but I suspect that many versions are the same)
case 's': { struct tm tm; char buf[INT_STRLEN_MAXIMUM( time_t) + 1]; time_t mkt;
tm = *t; mkt = mktime(&tm);
The "tm = *t;" line is the problem. It (or more correctly, something to replace that) is needed, as *t is const, strftime() is not permitted to modify it, yet the struct tm * passed to mktime() is not const, as mktime is sometimes required to modify that struct before returning.
However, mktime() refers only to the fields tm_year tm_mon tm_mday tm_hour tm_min tm_sec and tm_isdst of the struct passed to it. The %s conversion is defined as producing (in a decimal string format) the result of mktime() applied to the struct tm passed to strftime. Hence an application which intends to use strftime(..., "%s", ...) need only initialise those 7 fields, the other 2 (currently) standard fields (tm_yday and tm_wday) are explicitly ignored by mktime() on input, so there is no need to assign those values, and any other fields that might exist in a particular implementation are out of scope for a portable (conforming) application to touch, so cannot be initialised.
Since referring to an uninitialised variable is, I believe, technically undefined behaviour, doing a struct copy (copying all fields) can have that effect.
You could replace the assignment by a memcpy. Assignment via unsigned chars (which is what memcpy does) are exempt from the undefined behaviour.
On the other hand, hardware that balks at simply loading or copying, uninitialised values, without using them for any purpose is not exactly common (and half the world's code would probably break if an implementation that worked in such a manner was ever created) so I'd call this one of the less significant issues to worry about fixing.
It might not be common, but it was certainly something we had to consider. It's why I wrote the wording about "trap representation"s in C99 the way I did. -- Clive D.W. Feather | If you lie to the compiler, Email: clive@davros.org | it will get its revenge. Web: http://www.davros.org | - Henry Spencer Mobile: +44 7973 377646
On 11/8/22 06:28, Clive D.W. Feather via tz wrote:
You could replace the assignment by a memcpy. Assignment via unsigned chars (which is what memcpy does) are exempt from the undefined behaviour.
As I understand it, that exemption is for using memcpy to copy a trap representation. There's no similar exemption for using memcpy to copy uninitialized data; for example, the following function has undefined behavior: int y; void f(void) { int x; memcpy(&y, &x, sizeof y); } If I'm right, we can't get by simply by replacing the assignment with memcpy.
Paul Eggert said:
You could replace the assignment by a memcpy. Assignment via unsigned chars (which is what memcpy does) are exempt from the undefined behaviour.
As I understand it, that exemption is for using memcpy to copy a trap representation. There's no similar exemption for using memcpy to copy uninitialized data; for example, the following function has undefined behavior:
int y; void f(void) { int x; memcpy(&y, &x, sizeof y); }
If I'm right, we can't get by simply by replacing the assignment with memcpy.
You're wrong, pure and simple. There's no such thing as uninitialized data in that sense. The following quotes are from a late draft because that's all I have to hand right this second, but the final C99 wording was either the same or effectively so. [6.2.6.1] Values stored in unsigned bit-fields and objects of type unsigned char shall be represented using a pure binary notation. Elsewhere we say that unsigned char doesn't have any padding bits, so it holds values in the range 0 to (1<<CHAR_BIT) - 1 inclusive. Values stored in non-bit-field objects of any other object type consist of n x CHAR_BIT bits, where n is the size of an object of that type, in bytes. The value may be copied into an object of type unsigned char [n] (e.g., by memcpy); the resulting set of bytes is called the object representation of the value. I've omitted the bit about bit-fields. Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined. Such a representation is called a trap representation. So, for any type T other than character types, some byte sequences can be trap representations. Reading or writing a trap representation using type T is undefined behaviour. But reading or writing it using a character type isn't, though in the case of writing the result could be a trap representation. That means that memcpy always has defined behaviour. [3.17.2] indeterminate value either an unspecified value or a trap representation [6.7.8] If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate. So, given, int x; int y = x; within a block, x holds either some unspecified (valid) value or a trap representation. If it's an unspecified value, y is set to the same value. If it's a trap representation, you hit undefined behaviour. In particular, if int has N bits and can hold 1<<N different values, then all possible object representations are valid and therefore x can't hold a trap representation, so the assignment is safe. We very explicitly wanted memcpy to be safe with uninitialized values. That's why it's worded this way. -- Clive D.W. Feather | If you lie to the compiler, Email: clive@davros.org | it will get its revenge. Web: http://www.davros.org | - Henry Spencer Mobile: +44 7973 377646
On 2022-11-08 23:02, Clive D.W. Feather wrote:
You're wrong, pure and simple. There's no such thing as uninitialized data in that sense.
OK, thanks, I stand corrected. However, the C standard is (dare i say it) *peculiar* in this area. And because some implementations don't follow this part of the standard, it's safer to avoid using memcpy on uninitialized data. For example, see the following behavior observed on Fedora 36 x86-64. Although the C standard says this program is OK, Valgrind says it's not. Because Valgrind's interpretation is more likely to be useful in practice (in that it's more likely to catch real bugs, something that's more important than exploiting this peculiarity of the standard), tzcode should be portable to Valgrind and should avoid using memcpy on uninitialized data. $ cat t.c #include <string.h> int x; int main (void) { int y; memcpy (&x, &y, sizeof x); return x ? 27 : 49; } $ gcc t.c $ valgrind ./a.out ==9532== Memcheck, a memory error detector ==9532== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al. ==9532== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info ==9532== Command: ./a.out ==9532== ==9532== Conditional jump or move depends on uninitialised value(s) ==9532== at 0x40111B: main (in /home/eggert/junk/a.out) ...
Paul Eggert via tz <tz@iana.org> writes:
For example, see the following behavior observed on Fedora 36 x86-64. Although the C standard says this program is OK, Valgrind says it's not.
int main (void) { int y; memcpy (&x, &y, sizeof x); return x ? 27 : 49; }
I think you are misinterpreting what Valgrind does, too. It does not complain about that memcpy. What it does during the memcpy is to copy its defined-ness status for the bytes of y to the bytes of x. Then it complains when you use the now-known-undefined value of x in the return statement. So in the case in strftime(), Valgrind would only complain if mktime() actually tried to use any fields that had not been initialized by the caller of strftime(). Which is just what one would want. So AFAICS, both Valgrind and the letter of the C standard agree that using memcpy to copy the tm struct will be fine. With the additional datum that nobody has complained about this code in decades, I can't see an argument for writing longer and more fragile code as a substitute for a full-struct copy. regards, tom lane
I'm not sure what the standard says about struct copies, but I know that most compilers do struct copies by just copying the raw bytes for the size of the structure, and don't copy each field using the type of the field. so replacing the struct copy with a memcpy would not make a functional difference. John On 11/9/2022 12:25 PM, Tom Lane via tz wrote:
Paul Eggert via tz <tz@iana.org> writes:
For example, see the following behavior observed on Fedora 36 x86-64. Although the C standard says this program is OK, Valgrind says it's not. int main (void) { int y; memcpy (&x, &y, sizeof x); return x ? 27 : 49; } I think you are misinterpreting what Valgrind does, too. It does not complain about that memcpy. What it does during the memcpy is to copy its defined-ness status for the bytes of y to the bytes of x. Then it complains when you use the now-known-undefined value of x in the return statement. So in the case in strftime(), Valgrind would only complain if mktime() actually tried to use any fields that had not been initialized by the caller of strftime(). Which is just what one would want.
So AFAICS, both Valgrind and the letter of the C standard agree that using memcpy to copy the tm struct will be fine. With the additional datum that nobody has complained about this code in decades, I can't see an argument for writing longer and more fragile code as a substitute for a full-struct copy.
regards, tom lane
John Marvin via tz said:
I'm not sure what the standard says about struct copies,
Footnote 42: 42)Thus, for example, structure assignment may be implemented element-at-a-time or via memcpy. This is in the context of: When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values.42) The values of padding bytes shall not affect whether the value of such an object is a trap representation. So if all the members of a structure hold object representations that aren't trap representation, then structure copy is valid no matter how implemented, though any padding bytes can change value unpredictably. -- Clive D.W. Feather | If you lie to the compiler, Email: clive@davros.org | it will get its revenge. Web: http://www.davros.org | - Henry Spencer Mobile: +44 7973 377646
On Nov 9, 2022, at 3:07 PM, Clive D.W. Feather via tz <tz@iana.org> wrote:
trap representation.
Does the C specification allow for a C implementation on a hypothetical load-store architecture machine in which: a byte is 8 bits plus an "initialized bit"; the "uninitialize" instruction has a memory-location operand, and clears the initialized bit; a store instruction set the initialized bit on all bytes to which it stores; a load instruction traps if any of the bytes from which it's loading doesn't have the initialized bit set?
On 11/9/22 15:34, Guy Harris via tz wrote:
Does the C specification allow for a C implementation on a hypothetical load-store architecture machine in which:
a byte is 8 bits plus an "initialized bit";
the "uninitialize" instruction has a memory-location operand, and clears the initialized bit;
a store instruction set the initialized bit on all bytes to which it stores;
a load instruction traps if any of the bytes from which it's loading doesn't have the initialized bit set?
No, not if memcpy is implemented via load+store. The C standard says memcpy is supposed to work even if its source bytes are uninitialized. (This was news to me until today, and I'm not a fan of this part of the standard.)
Guy Harris said:
trap representation.
Does the C specification allow for a C implementation on a hypothetical load-store architecture machine in which:
a byte is 8 bits plus an "initialized bit";
the "uninitialize" instruction has a memory-location operand, and clears the initialized bit;
a store instruction set the initialized bit on all bytes to which it stores;
a load instruction traps if any of the bytes from which it's loading doesn't have the initialized bit set?
No. My memory is that we agreed "no hidden bits" when writing that wording. I do recall that parity bits were discussed and we agreed that they weren't part of the representation of a value. Of course, an implementation could do what you suggested provided that there was a way of disabling it so as to produce a conforming implementation. -- Clive D.W. Feather | If you lie to the compiler, Email: clive@davros.org | it will get its revenge. Web: http://www.davros.org | - Henry Spencer Mobile: +44 7973 377646
On 09/11/2022 23:34, Guy Harris via tz wrote:
On Nov 9, 2022, at 3:07 PM, Clive D.W. Feather via tz <tz@iana.org> wrote:
trap representation.
Does the C specification allow for a C implementation on a hypothetical load-store architecture machine in which:
a byte is 8 bits plus an "initialized bit";
the "uninitialize" instruction has a memory-location operand, and clears the initialized bit;
a store instruction set the initialized bit on all bytes to which it stores;
a load instruction traps if any of the bytes from which it's loading doesn't have the initialized bit set?
Uninitialized objects may be smaller than a byte, so this hypothetical machine would require one "initialized" flag per bit, not one per byte. -- -=( Ian Abbott <abbotti@mev.co.uk> || MEV Ltd. is a company )=- -=( registered in England & Wales. Regd. number: 02862268. )=- -=( Regd. addr.: S11 & 12 Building 67, Europa Business Park, )=- -=( Bird Hall Lane, STOCKPORT, SK3 0XA, UK. || www.mev.co.uk )=-
Ian Abbott said:
Uninitialized objects may be smaller than a byte, so this hypothetical machine would require one "initialized" flag per bit, not one per byte.
I left out the Standard text about bit-fields in this context, but it's there. -- Clive D.W. Feather | If you lie to the compiler, Email: clive@davros.org | it will get its revenge. Web: http://www.davros.org | - Henry Spencer Mobile: +44 7973 377646
On 11/9/22 11:25, Tom Lane wrote:
So in the case in strftime(), Valgrind would only complain if mktime() actually tried to use any fields that had not been initialized by the caller of strftime(). Which is just what one would want.
True, and good point. And I understand why Valgrind wants to delay checking here: it wants to avoid false alarms in some programs. Unfortunately Valgrind's delay in reporting is problematic, because Valgrind sometimes neglects to report behavior that the C standard says is undefined. For example: $ cat t.c int main (void) { int a, b=a; return b; } $ gcc t.c $ valgrind ./a.out [no error is reported] Conversely, as my previous email showed, Valgrind sometimes reports behavior that I think is a program failure that should be fixed, but the C standard says is well-defined behavior. Although Valgrind was pickier in that email than the C standard really allows, I wish Valgrind were a bit pickier still and reported each use of an uninitialized variable when it happens and not optionally later (by then, it may be difficult to debug), and I'd like tzcode to work with such a Valgrind-like system.
Hi, Le 10/11/2022 à 01:28, Paul Eggert via tz a écrit :
On 11/9/22 11:25, Tom Lane wrote:
So in the case in strftime(), Valgrind would only complain if mktime() actually tried to use any fields that had not been initialized by the caller of strftime(). Which is just what one would want.
True, and good point. And I understand why Valgrind wants to delay checking here: it wants to avoid false alarms in some programs. Unfortunately Valgrind's delay in reporting is problematic, because Valgrind sometimes neglects to report behavior that the C standard says is undefined. For example:
$ cat t.c int main (void) { int a, b=a; return b; } $ gcc t.c $ valgrind ./a.out [no error is reported]
Here I'm getting: $ valgrind ./a.out ==511562== Memcheck, a memory error detector ==511562== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al. ==511562== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info ==511562== Command: ./a.out ==511562== ==511562== Syscall param exit_group(status) contains uninitialised byte(s) ==511562== at 0x4955DF1: _Exit (_exit.c:30) ==511562== by 0x48B1561: __run_exit_handlers (exit.c:136) ==511562== by 0x48B161F: exit (exit.c:143) ==511562== by 0x4896516: (below main) (libc_start_call_main.h:74) ==511562== ==511562== ==511562== HEAP SUMMARY: ==511562== in use at exit: 0 bytes in 0 blocks ==511562== total heap usage: 0 allocs, 0 frees, 0 bytes allocated ==511562== ==511562== All heap blocks were freed -- no leaks are possible ==511562== ==511562== Use --track-origins=yes to see where uninitialised values come from ==511562== For lists of detected and suppressed errors, rerun with: -s ==511562== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Conversely, as my previous email showed, Valgrind sometimes reports behavior that I think is a program failure that should be fixed, but the C standard says is well-defined behavior. Although Valgrind was pickier in that email than the C standard really allows, I wish Valgrind were a bit pickier still and reported each use of an uninitialized variable when it happens and not optionally later (by then, it may be difficult to debug), and I'd like tzcode to work with such a Valgrind-like system.
With --track-origins=yes : $ valgrind --track-origins=yes ./a.out ==511667== Memcheck, a memory error detector ==511667== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al. ==511667== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info ==511667== Command: ./a.out ==511667== ==511667== Syscall param exit_group(status) contains uninitialised byte(s) ==511667== at 0x4955DF1: _Exit (_exit.c:30) ==511667== by 0x48B1561: __run_exit_handlers (exit.c:136) ==511667== by 0x48B161F: exit (exit.c:143) ==511667== by 0x4896516: (below main) (libc_start_call_main.h:74) ==511667== Uninitialised value was created by a stack allocation ==511667== at 0x109129: main (in /home/ydroneaud/src/test/a.out) ==511667== ==511667== ==511667== HEAP SUMMARY: ==511667== in use at exit: 0 bytes in 0 blocks ==511667== total heap usage: 0 allocs, 0 frees, 0 bytes allocated ==511667== ==511667== All heap blocks were freed -- no leaks are possible ==511667== ==511667== For lists of detected and suppressed errors, rerun with: -s ==511667== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0) Using UBSan and MSan (requires clang) instead of valgrind would catch this too: $ clang -fsanitize=undefined,memory -fsanitize-memory-track-origins=2 -g t.c $ ./a.out ==511862==WARNING: MemorySanitizer: use-of-uninitialized-value #0 0x55a79df79b35 in main .../t.c:1:31 #1 0x7f686dc2350f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16 #2 0x7f686dc235c8 in __libc_start_main csu/../csu/libc-start.c:381:3 #3 0x55a79def3294 in _start (.../a.out+0x1e294) (BuildId: 576806214712a462122845f7fa09af26e1237aee) Uninitialized value was stored to memory at #0 0x55a79df79adf in main .../t.c:1:26 #1 0x7f686dc2350f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16 Uninitialized value was created by an allocation of 'a' in the stack frame of function 'main' #0 0x55a79df799a0 in main .../t.c:1 SUMMARY: MemorySanitizer: use-of-uninitialized-value .../t.c:1:31 in main Exiting Regards. -- Yann Droneaud OPTEYA
Paul Eggert said:
You're wrong, pure and simple. There's no such thing as uninitialized data in that sense.
OK, thanks, I stand corrected. However, the C standard is (dare i say it) *peculiar* in this area. And because some implementations don't follow this part of the standard, it's safer to avoid using memcpy on uninitialized data.
Implementations or tools like lint or valgrind? Are there implementations that can't copy an arbitrary byte of memory to another location? It's quite understandable that valgrind is more picky and often that's what you want. -- Clive D.W. Feather | If you lie to the compiler, Email: clive@davros.org | it will get its revenge. Web: http://www.davros.org | - Henry Spencer Mobile: +44 7973 377646
On 11/10/22 02:17, Clive D.W. Feather wrote:
Implementations or tools like lint or valgrind? Are there implementations that can't copy an arbitrary byte of memory to another location?
It depends on what one means by "implementation". There are combinations of compilers and runtimes that operate that way. Valgrind is one example, and as Yann reports Clang is another if you use certain options. I don't know of any platform that cannot copy uninitialized bytes no matter what: valgrind is optional, and clang's -fsanitize option is optional, and you can run your program without these options. Still, I don't understand why the C committee required implementations to support copying from uninitialized memory. Such copying is not that useful in practice, and since it's quite useful to treat it to be an error, why force implementations to support it? I'm curious about this partly because the C standard's wording in this area is (a) so obscure that I didn't know about it despite long experience reading the standard, and (b) so buggy that the committee had to change the wording again in C23, because the wording in C17 was so unclear that it was misinterpreted. (And this is a tricky business: the wording was changed multiple times in the drafting process for C23.) The C committee has evidently gone to some length to require support for the obscure and troublesome feature of copying uninitialized data, for reasons that escape me.
Paul Eggert via tz <tz@iana.org> writes:
I'm curious about this partly because the C standard's wording in this area is (a) so obscure that I didn't know about it despite long experience reading the standard, and (b) so buggy that the committee had to change the wording again in C23, because the wording in C17 was so unclear that it was misinterpreted. (And this is a tricky business: the wording was changed multiple times in the drafting process for C23.) The C committee has evidently gone to some length to require support for the obscure and troublesome feature of copying uninitialized data, for reasons that escape me.
I don't find that surprising; in fact, simple modifications of the strftime() case can provide pretty good examples. You seem to want to insist that strftime() copy only those fields that mktime() is going to consult later. What if the set of fields that mktime() consults is data-dependent? That leads either to strftime() knowing much more about mktime()'s innards than it ought to, or to having to copy fields that might not get referenced, opening a completely unnecessary risk of hitting trap representations. I think it's a very good thing that strftime() can simply copy the whole struct without making any assumptions about which fields were set by the caller or will be referenced by mktime(). I'm bemused that you seem to want to not use the ability that the C committee has made sure to provide for you. regards, tom lane
On 2022-11-10 16:11, Tom Lane wrote:
simple modifications of the strftime() case can provide pretty good examples. I'm not sure I'm following this argument, but if I understand it correctly I disagree. I very much much want implementations to reject attempts to use uninitialized data, and I don't want the C standard to prohibit such implementations. Use of uninitialized data is a serious problem in real applications - several CVEs have arisen from it - and the C standard shouldn't require implementations to sweep it under the rug.
Let me try to make this more concrete. Suppose we are on a platform with no trap representations (which is the typical case), and suppose we make the following patch to tzcode's implementation of strftime with "%s": --- a/strftime.c +++ b/strftime.c @@ -319,16 +319,7 @@ time_t) + 1]; time_t mkt; - tm.tm_sec = t->tm_sec; - tm.tm_min = t->tm_min; - tm.tm_hour = t->tm_hour; - tm.tm_mday = t->tm_mday; - tm.tm_mon = t->tm_mon; - tm.tm_year = t->tm_year; - tm.tm_isdst = t->tm_isdst; -#if defined TM_GMTOFF && ! UNINIT_TRAP - tm.TM_GMTOFF = t->TM_GMTOFF; -#endif + memcpy(&tm, t, sizeof tm); mkt = mktime(&tm); /* If mktime fails, %s expands to the value of (time_t) -1 as a failure and suppose someone calls tzcode strftime this way: struct buf[1000]; char *f(void) { struct tm uninitialized; return strftime(buf, sizeof buf, "%s", &tm); } This call is obviously buggy since it is passing an uninitialized struct tm to strftime. However, the C standard says the patched strftime's memcpy initializes strftime's local variable tm, and therefore mktime's use of tm's contents is defined and the implementation allow the program's use of uninitialized data. Whatever the reason for this peculiarity of the C standard (and nobody seems to remember why it's there), it's counterproductive. In general C programs shouldn't use uninitialized data. If they do, it's useful to have a debugging implementation that reports any such use the instant that it happens as this will help catch real bugs. And since common debugging implementations like valgrind and UBSan are not following the C standard here, this suggests that the C standard is mistaken, not the implementations.
That leads either to strftime() knowing much more about mktime()'s innards than it ought to
It's not a problem for strftime, since strftime's has to know what's needed in the struct tm; this follows from strftime's API. It might be a problem for other scenarios, such as the one you vaguely suggested though I don't know the details. However, even assuming these other scenarios exist, they don't justify this peculiarity of the C standard. The standard should not say that the ordinary meaning of "don't use uninitialized variables" is wrong and that there's some trickier interpretation that hardly any programmer knows and that was wrong even in C17 and so needed rewording (and perhaps still needs rewording) and that nevertheless must be supported - even though important debugging implementations disagree.
Date: Thu, 10 Nov 2022 15:46:36 -0800 From: Paul Eggert <eggert@cs.ucla.edu> Message-ID: <6822d9e0-bb9d-9cca-8e90-db5ad798d298@cs.ucla.edu> | The | C committee has evidently gone to some length to require support for the | obscure and troublesome feature of copying uninitialized data, for | reasons that escape me. I assume it is so portable code, which because of that cannot touch any implementation extension fields, has a way to copy a struct, based just upon its size (and a pointer to one of course), with the copy copying all fields (including non-standard extras) - because of that, copying field by field is out, and struct copy is out, because any uninit'd fields (probably extensions that the application doesn't know about either) might cause a trap. sizeof(struct whatever) always works to get the size, no matter how many extra fields might have been added, but also includes any padding that might exist, which is unlikely to be initialised unless the application has seen a need to memset() the whole thing. So a means is needed to copy that uninitialised (potentially trapping) memory safely. Kind of a pity they chose to make individual byte reads never trap (especially for architectures with no byte fetch from RAM instruction, where all fetches less than the word size need to be fetch/shift/mask type sequences) and so where if a word fetch generates a trap, the trap handler has to look at the code sequence that was going to be executed next, and hide the trap if just a byte was being accessed, but apparently, that's what would be required. kre
Robert Elz via tz said:
| The | C committee has evidently gone to some length to require support for the | obscure and troublesome feature of copying uninitialized data, for | reasons that escape me.
I assume it is so portable code, which because of that cannot touch any implementation extension fields, has a way to copy a struct, based just upon its size (and a pointer to one of course), with the copy copying all fields (including non-standard extras) - because of that, copying field by field is out, and struct copy is out, because any uninit'd fields (probably extensions that the application doesn't know about either) might cause a trap.
As I said, I don't recall why we did it, but that's a good use case. Struct copy is allowed to use (at least) memcpy or member-by-member assignment, so it is allowed to trap on an uninitialized member.
sizeof(struct whatever) always works to get the size, no matter how many extra fields might have been added, but also includes any padding that might exist, which is unlikely to be initialised unless the application has seen a need to memset() the whole thing.
Right. However, the contents of padding does not affect whether the structure (or union) as a whole is a trap representation or not. In other words, if all the members are initialized, you can safely copy the structure without worrying about the padding. Also note that all padding bytes become unspecified every time anything in the structure changes; they're not required to stay constant. (This is all C99 wording, of course.)
So a means is needed to copy that uninitialised (potentially trapping) memory safely.
Yes, but see above.
Kind of a pity they chose to make individual byte reads never trap (especially for architectures with no byte fetch from RAM instruction, where all fetches less than the word size need to be fetch/shift/mask type sequences) and so where if a word fetch generates a trap, the trap handler has to look at the code sequence that was going to be executed next, and hide the trap if just a byte was being accessed, but apparently, that's what would be required.
A byte is supposed to be individually addressable and accessible. So the word fetch shouldn't be trapping. On the XAP2, the minimum addressable and accessible object is 16 bits, so at least on our implementation that was a byte (i.e. CHAR_BITS == 16). If you wanted to get one or the other octet out of it, you had to do a fetch/shift/mask. -- Clive D.W. Feather | If you lie to the compiler, Email: clive@davros.org | it will get its revenge. Web: http://www.davros.org | - Henry Spencer Mobile: +44 7973 377646
Paul Eggert said:
Implementations or tools like lint or valgrind? Are there implementations that can't copy an arbitrary byte of memory to another location?
It depends on what one means by "implementation". There are combinations of compilers and runtimes that operate that way. Valgrind is one example, and as Yann reports Clang is another if you use certain options.
I don't know of any platform that cannot copy uninitialized bytes no matter what:
That's what I meant. Thanks.
Still, I don't understand why the C committee required implementations to support copying from uninitialized memory. Such copying is not that useful in practice, and since it's quite useful to treat it to be an error, why force implementations to support it?
Because it can be useful. The code that started this discussion is an example. More to the point, at the time we had members of WG14 telling us that there were code bases that relied on it. Given how long ago this was, I don't remember what they were (or even if we were told - our practice was to assume members were being honest about such things). So we made it valid. Note that this *only* applies to copying through character types (and I don't recall why we didn't make that unsigned char only). int x; int y = x; is undefined behaviour because it might assign a trap representation to y.
I'm curious about this partly because the C standard's wording in this area is (a) so obscure that I didn't know about it despite long experience reading the standard, and (b) so buggy that the committee had to change the wording again in C23, because the wording in C17 was so unclear that it was misinterpreted. (And this is a tricky business: the wording was changed multiple times in the drafting process for C23.)
I dropped out a year or two after C99 was released, so I have no idea what was done. I'd be interested to see the wordings you're referring to.
The C committee has evidently gone to some length to require support for the obscure and troublesome feature of copying uninitialized data, for reasons that escape me.
Because it {is,was} useful and because we were asked to make it continue to work. With the emphasis on "continue". -- Clive D.W. Feather | If you lie to the compiler, Email: clive@davros.org | it will get its revenge. Web: http://www.davros.org | - Henry Spencer Mobile: +44 7973 377646
On 2022-11-11 03:55, Clive D.W. Feather wrote:
int x; int y = x;
is undefined behaviour because it might assign a trap representation to y.
It's undefined even on implementations that don't have trap representations. Draft C23 6.3.2.1 "Lvalues, arrays, and function designators" says, "If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined." Hence, in the typical case where there are no trap representations, this code: time_t x, y = x; return !y; has undefined behavior, whereas this code: time_t x, y = x; return time(&x) == (time_t)-1 ? !y : !!y; must return 0 or 1 and the implementation cannot reject the first line's use of the uninitialized variable x. This part of the standard is strange indeed, and I can't imagine debugging implementation wanting to conform to the standard as written.
On 2022-11-11 03:55, Clive D.W. Feather wrote:
I'd be interested to see the wordings you're referring to.
Some of the rewording attempts for this the area of "It's sometimes OK for C code to read uninitialized data but we're not sure exactly what that means" are here (I don't know whether I have them all): https://open-std.org/JTC1/SC22/WG14/www/docs/n2668.pdf https://open-std.org/JTC1/SC22/WG14/www/docs/n2772.pdf https://open-std.org/JTC1/SC22/WG14/www/docs/n2861.pdf The current draft C23 is here: https://open-std.org/JTC1/SC22/WG14/www/docs/n3054.pdf
On 11/8/22 05:07, Robert Elz via tz wrote:
It seems likely that the simple fix is to replace the struct assignment with 7 individual assignment statements
tm.tm_year = t->tm_year; tm.tm_mon = t->tm_mon; /* ... */
Thanks for reporting the problem and suggesting a fix. Strictly speaking, the tzcode implementation of strftime doesn't violate the C standard here, since it implements the %s format, and the C standard doesn't specify the behavior of %s (which therefore can be undefined if you pass unset tm_yday etc.). However, the next release of POSIX will require support for %s, and as I recall the current draft for the next POSIX has messed up its spec for %s, so tzcode is on shaky ground here (as is the draft POSIX spec :-), and we can do better. Worse, there's a similar problem in localtime.c's implementation of mktime, for which the C standard clearly says mktime should not crash merely because tm_yday etc. is uninitialized. Here, tzcode mktime.c clearly is running afoul of the C standard. There's a subtle point here. By default tzcode's implementation of mktime sometimes looks at tm_gmtoff which the C standard says can be uninitialized and therefore has undefined behavior. On typical platforms, reading uninitialized data gives you arbitrary bits without trapping; on these platforms looking at tm_gmtoff will merely yield randomish bits (which tzcode mktime deals with reliably). However, on oddball platforms where reading uninitialized data traps or worse, you're supposed to compile code with -DUNINIT_TRAP so that mktime does not look at tm_gmtoff (and therefore has worse behavior in other ways). As you write, these are not likely to be problems on practical hosts. After all, the code has done it this way since 1989 with no bugs reported. However, we might as well do it correctly. I installed the attached patch which I hope addresses these issues satisfactorily.
participants (10)
-
Clive D.W. Feather -
Guy Harris -
Ian Abbott -
John Marvin -
Jonathan Leffler -
Marc Lehmann -
Paul Eggert -
Robert Elz -
Tom Lane -
Yann Droneaud