    Date:        Tue, 12 Nov 2024 20:54:47 -0800
    From:        Paul Eggert <eggert@cs.ucla.edu>
    Message-ID:  <0a79acab-8509-488e-ba69-287bb92e4c97@cs.ucla.edu>

  | Would there be any objection to changing tzcode strftime so that "%?"
  | and other invalid formats output themselves? That would match GNU/Linux,
  | AIX, and Solaris instead of matching the BSDs.

The BSDs largely run tzcode, so do what they do because tzcode does.
Were I to alter that on NetBSD (that is, no longer do what tzcode does)
it would be to return an error in that case. As long as we're copying
tzcode here, what incorrect thing you choose to do, amongst several choices
(another would be to omit both the % and the ? and any intervening
pretend to be flags/widths/...) doesn't matter very much, applications
should not be doing that anyway.

  | > surely the intent of the option is to allow the implementation
  | > to write the correct value
  |
  | That's not how POSIX usually works.

It actually is, it just isn't the way you want to interpret it.

See section 2.3 (Error Numbers) where it says:

	The ERRORS section on each reference page specifies which error
	conditions shall be detected by all implementations (``shall fail'')
	and which may be optionally detected by an implementation
	(``may fail'').

There's nothing new there, we all knew that. It continues:

	If no error condition is detected, the action requested shall be
	successful.

"shall be successful".

  | To take an extreme example, on a machine with 32-bit signed
  | time_t, POSIX allows time(0) to wrap around after 2038, and return
  | negative time_t values; this is because EOVERFLOW is a "may fail" for
  | "time".

Nonsense, that is not what it is allowing at all.

On a system with 32 bit time_t (which would include NetBSD running an
app compiled on NetBSD 4 or earlier, which we still support) there are
two possibilities.

Either we simply act as if the system really had a 32 bit time_t, and
return the 32 low order bits of the time, which definitely will wrap
around after 2038 (unless we alter the 32->64 bit mapping, which for
present purposes, assume we won't do). That correctly emulates a
true 32 bit time_t system, which is what NetBSD 4 (and earlier) were.

In that case, that is all that is possible - that value must be returned,
and no EOVERFLOW is possible - the system cannot tell the difference between
Tue Jan 19 03:14:08 UTC 2038 (time_t == 0x80000000 on a 64 bit time_t) and
Fri Dec 13 20:45:52 UTC 1901 (time_t == 0x80000000 on a 32 bit time_t),
both in UTC for simplicity, otherwise the actual times vary by timezone.

You might believe it impossible for 1901 to be the correct year, but
it certainly can be - if I decide I want to run my system with the date
set to back then, perhaps as an emulation of events at that time, if only
they had modern computers to record them. It is simply impossible to
tell the difference between a 32 bit time_t that has wrapped, and one that
legitimately has that value (unless you're doing the calculation that
moves from 0x7FFFFFFF to 0x80000000 which the time() system call, or library
function more likely these days, certainly is not).

Whatever system call actually is executed to return the 32 bit time_t
(or storing the value into a mapped memory page or however it is done)
-- and remember everything but the kernel only knows 32 bit time_t values
in this situation -- could have detected that the overflow happened and
returned an error instead (even a genuine 32 bit time_t kernel could have
done that). But if that didn't happen we have no choice but to return
the value the kernel gave us back to the application. The kernel says
X is the number of seconds since the epoch. We trust that, we have no
reason, or justification, to do otherwise.

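To make that ambiguity concrete, here is a small sketch (illustration
only; low32() is a made-up name, not anything in tzcode or NetBSD) of
what keeping just the 32 low order bits of a 64 bit seconds count does.
It assumes two's complement, which the fixed width types guarantee.

```c
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: keep the 32 low order
 * bits of a 64 bit seconds count, as a 32 bit time_t ABI would see it.
 */
int32_t
low32(int64_t secs)
{
	/* Unsigned conversion is well defined: reduce modulo 2^32. */
	uint32_t bits = (uint32_t)((uint64_t)secs & 0xFFFFFFFFu);

	/*
	 * Reinterpret the bits as a signed 32 bit value without relying
	 * on an implementation-defined narrowing conversion.
	 */
	if (bits <= (uint32_t)INT32_MAX)
		return (int32_t)bits;
	return (int32_t)((int64_t)bits - INT64_C(4294967296));
}
```

Both low32(2147483648) (the 2038 instant above) and low32(-2147483648)
(the 1901 instant above) give INT32_MIN. Code that sees only the 32 bit
value has nothing left with which to tell them apart.
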
However, given that modern NetBSD has 64 bit time_t's, when we map that
to the 32 bit value for old system emulation purposes, we can tell the
difference between 2038 and 2001 being the actual system time. In the
latter case simply return that, in the former case POSIX allows us to
set EOVERFLOW as we know we are unable to do as the spec for time
requires, which is:

	The time() function shall return the value of time in seconds
	since the Epoch.

There is no option there, either we do that (and be successful) or (unless
some other error occurs, such as the pointer supplied being invalid) we
must return EOVERFLOW. However, if (as best as we are able to determine)
the time really is 1901 then the time_t that says that must be returned
since even in a 32 bit (signed) time_t, that fits (well, from late Dec 13
1901 onwards it fits).

  | As I hope the "time (0)" example illustrates, whether something is
  | nonsense is in the eye of the beholder. POSIX doesn't have an opinion on
  | that.

You're wrong, it does.

  | Yes of course, and that's why it'd be good to fix tzcode to do the right
  | thing here, which would be to generate the correct value regardless of
  | whether it fits into time_t - something that glibc already does.

There's an obvious intermediate step, trivial to program, which could
be done. It isn't as good as handling any random sized time_t and any
random sized int in the struct tm, but would be adequate for practical
purposes to handle the cases that actually arise.

That is, make an __mktime() function with a signature

	intmax_t __mktime(struct tm *);

(that can take a const struct tm * perhaps, see later).

Then

	time_t
	mktime(struct tm *tm)
	{
		const int sverr = errno;
		time_t t;

		/*
		 * here normalise the values in the struct if
		 * __mktime()'s arg is const, so it cannot do it
		 */
		// code to do that omitted

		errno = 0;
		intmax_t result = __mktime(tm);

		if (result == -1 && errno != 0)
			return (time_t) -1;
		if (result < TIMET_MIN || result > TIMET_MAX) {
			errno = EOVERFLOW;
			return (time_t) -1;
		}
		t = result;
		// do *tm = *localtime(&t); if needed

		errno = sverr;	// success, leave errno as the caller had it
		return t;
	}

and in strftime() simply call __mktime to get the result as an intmax_t
and then format that (whether it would fit in a time_t or not) unless
__mktime() returns an error, in which case strftime() simply returns 0,
leaving errno from __mktime().

The implementation of __mktime() is simply the current mktime() adapted
to use intmax_t instead of time_t.

But you're right, this isn't high priority, and having systems where
strftime(%s) can fail is good to help achieve better code portability.

But as POSIX said above:

	If no error condition is detected, the action requested shall be
	successful.

and for strftime's %s conversion, "successful" means:

	Replaced by the number of seconds since the Epoch as a decimal
	number, calculated as described for mktime().

Nothing at all about fitting in a time_t or actually calling mktime(),
just calculated by the same method as described for mktime(). mktime()
itself has a limitation, it can only return values that fit in a time_t
so is required to fail with EOVERFLOW if the value doesn't fit. Similarly,
implementations of strftime() are allowed to call mktime() to do the
calculation "as described for mktime()" (not required to do it that way,
but permitted) and then if they do, and mktime() fails with EOVERFLOW
then strftime() does as well - but in that case it must return 0 as its
result (and what is in the buffer is unspecified in that case).

There really are no options here beyond that if you want to conform.

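As a rough sketch of those two outcomes for %s (nothing here is tzcode;
fmt_percent_s() and its mk_errno parameter are invented for the example,
standing in for whatever the hypothetical __mktime() above reported):

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper, for illustration only.  secs is the seconds
 * count, mk_errno is 0 or the error from computing it.  Returns the
 * number of bytes written (excluding the NUL), or 0 on failure.
 */
size_t
fmt_percent_s(char *buf, size_t maxsize, intmax_t secs, int mk_errno)
{
	int n;

	if (mk_errno != 0) {
		/* The calculation failed: return 0, errno says why. */
		errno = mk_errno;
		return 0;
	}

	/* Format the value whether or not it would fit in a time_t. */
	n = snprintf(buf, maxsize, "%jd", secs);
	if (n < 0 || (size_t)n >= maxsize)
		return 0;	/* does not fit in the buffer */
	return (size_t)n;
}
```

With a 32 byte buffer, fmt_percent_s(buf, sizeof buf, 4294967296, 0)
writes "4294967296" and returns 10, even though that value would not fit
in a 32 bit time_t, while passing EOVERFLOW as mk_errno returns 0 with
errno set.
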
Either you return the result the standard says you must return, and
a "success" return value (> 0 in this case) or you return 0 (failed)
and set errno. Nothing else conforms.

kre