On 2024-01-14 03:14, Robert Elz wrote:
Date: Sat, 13 Jan 2024 22:51:14 -0800
From: Paul Eggert <eggert@cs.ucla.edu>
Message-ID: <99ee73a9-6336-4f3d-a8b3-e57b2e1817dd@cs.ucla.edu>
| It's not necessarily undefined behavior because we're talking about a
| standard library function, and (as you mentioned elsewhere) such
| functions need not be implemented in standard C.
It might not necessarily actually result in something bad happening, but the application must assume that it might.
There are two things going on here. (A) Does the POSIX strftime spec require the caller to set a struct tm component when this requirement should be unnecessary? And (B) does the POSIX spec fail to require the caller to set a struct tm component when the requirement should be necessary? (A) is of lesser importance in practice. Although it's overkill for the POSIX strftime spec to require struct tm components to be set when they don't need to be set, and this overkill can cause useless work by portable applications, it's not that big a deal. In practice nearly every app calls strftime on the result of localtime etc. and so the components are set anyway. Our thread, if I understand things correctly, is mostly about (B), not (A). More on this below.
| More important, the draft POSIX standard is simply wrong if it says
| strftime can't look at (for example) tm_gmtoff when calculating %z,
The implementation can look at whatever it likes.
That's good news. In that case we're in agreement.
| In POSIX 2017 strftime must also look at TZ for %Z and %z,
I'm not sure "must" is correct; I suspect that the intended implementation for %Z is
  if (_tz_inited)
    strcpy(result, tzname[tm->tm_isdst > 0]);
  else
    strcpy(result, "");
Oh, by "look at TZ" I meant look at data generated from TZ's value. tzname is part of that data, so the code you give is "looking at TZ" in the sense I meant. If by "_tz_inited" you meant "after strftime did its mandatory call to tzset-or-equivalent, that call successfully determined the current timezone", I agree the code you gave reflects the intent for POSIX-2017. (If you meant something else, it'd be helpful to know what it was.) However, it's not at all clear that the code reflects the intent, or should reflect the intent, for POSIX-202x/D4. Similarly for %z.
the implementation can do it some other way if it likes, but it cannot assume that tm->tm_gmtoff or tm->tm_zone has been set to anything interesting (tm_zone might be a pointer to somewhere which generates SIGSEGV if referenced).
That's true for POSIX-2017, but not true for POSIX 202x/D4. It's OK for the implementation to examine tm_zone when processing %Z. See POSIX 202x/D4 line 69872.
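To make the contrast concrete, here is a minimal sketch of the two readings of %Z: one in the POSIX-2017 style of the code quoted above (tm_isdst plus the tzname[] data that tzset() derives from TZ), and one in the POSIX 202x/D4 style that simply reads tm_zone. The function names format_Z_2017 and format_Z_202x are hypothetical, not any library's API; treating a negative tm_isdst as "no timezone information" is an assumption; and the _DEFAULT_SOURCE macro is assumed to be what exposes tm_zone on glibc.

```c
#define _DEFAULT_SOURCE  /* assumption: glibc macro exposing tzset, tzname, tm_zone */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>      /* setenv (), for trying different TZ values */
#include <string.h>
#include <time.h>

/* Hypothetical helper: POSIX-2017-style %Z.  Uses tm_isdst plus the
   global tzname[] data derived from TZ; never reads tm_zone, which a
   POSIX-2017 caller need not have set.  */
static void
format_Z_2017 (char *buf, size_t bufsize, struct tm const *tm)
{
  assert (bufsize > 0);
  tzset ();                /* as though strftime had called tzset () */
  if (tm->tm_isdst < 0)
    buf[0] = '\0';         /* assumption: negative means "no info" */
  else
    snprintf (buf, bufsize, "%s", tzname[tm->tm_isdst > 0]);
}

/* Hypothetical helper: POSIX 202x/D4-style %Z.  Reads tm_zone, which
   the caller must set; TZ plays no part.  */
static void
format_Z_202x (char *buf, size_t bufsize, struct tm const *tm)
{
  assert (bufsize > 0);
  snprintf (buf, bufsize, "%s", tm->tm_zone ? tm->tm_zone : "");
}
```

With TZ="EST5EDT,M3.2.0,M11.1.0", format_Z_2017 should yield "EST" for tm_isdst = 0 and "EDT" for tm_isdst = 1, whereas format_Z_202x reports whatever tm_zone points to, independent of TZ.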
| I'm talking about where draft POSIX
| mistakenly says strftime needs mktime (or equivalent) to implement %s.
It says nothing of the kind. What it says is:
Replaced by the number of seconds since the Epoch as a decimal number, calculated as described for mktime(). [tm_year, tm_mon, tm_mday, tm_hour, tm_min, tm_sec, tm_isdst]
That doesn't say that mktime() needs to be used.
That's why I wrote "mktime (or equivalent)", not "mktime". The draft is imprecisely worded here, and it's easy to misread it as saying this:
  strftime(buf, sizeof buf, "%s", &tm);

needs to be the exact same thing you'd get from

  snprintf(buf, sizeof buf, "%jd", (intmax_t)mktime(&tm));
but this reading isn't quite right. All that's needed is for strftime to compute seconds since the Epoch in the usual way (i.e., using the Gregorian calendar and ignoring leap seconds), and while doing that to infer the UTC offset following the constraints described in the mktime section. Those constraints do not uniquely determine a result in every case, and this gives strftime wiggle room. That is, strftime need not use the same code that mktime does to make its inferences, and on a particular struct tm an implementation's strftime %s could infer a different UTC offset than the same implementation's mktime on the same struct tm.
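A minimal sketch of the "usual way", assuming the UTC offset has already been inferred and is simply passed in; the inference step, which is where the wiggle room lies, is deliberately left out. The arithmetic follows the POSIX "Seconds Since the Epoch" formula (Gregorian calendar, no leap seconds). Since tm_yday is not in %s's list of required members, the sketch derives the day of the year from tm_mon and tm_mday. The name seconds_since_epoch is hypothetical, and the sketch assumes normalized fields and tm_year >= 69, because the formula's truncating divisions go wrong for earlier years.

```c
#include <assert.h>
#include <time.h>

/* Days before the first of each month in a non-leap year.  */
static int const cumdays[12] =
  { 0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334 };

/* Gregorian leap-year rule.  */
static int
is_leap (long long year)
{
  return year % 4 == 0 && (year % 100 != 0 || year % 400 == 0);
}

/* Hypothetical helper: seconds since the Epoch for the local broken-down
   time *TM, given an already-inferred GMTOFF (seconds east of UTC).
   Assumes normalized fields and tm_year >= 69.  */
static long long
seconds_since_epoch (struct tm const *tm, long gmtoff)
{
  long long y = tm->tm_year;
  long long yday;

  assert (0 <= tm->tm_mon && tm->tm_mon < 12 && 69 <= y);
  yday = (cumdays[tm->tm_mon] + tm->tm_mday - 1
          + (tm->tm_mon > 1 && is_leap (y + 1900)));

  /* POSIX formula applied to local fields, then shifted to UTC.  */
  return (tm->tm_sec + tm->tm_min * 60LL + tm->tm_hour * 3600LL
          + yday * 86400
          + (y - 70) * 31536000
          + ((y - 69) / 4) * 86400
          - ((y - 1) / 100) * 86400
          + ((y + 299) / 400) * 86400
          - gmtoff);
}
```

For example, 1969-12-31 16:00:00 with an offset of -28800 (Los Angeles standard time) yields 0, just as 1970-01-01 00:00:00 with offset 0 does. Two implementations that infer different offsets for the same ambiguous struct tm would get different %s results from this same arithmetic.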
Note that in the strftime case though, the tm struct is not altered by the call, which it would be (if required) by the mktime() variant.
Yes, of course.
| First, an implementation of POSIX 202x/D3 strftime doesn't need to go
| near TZ to implement any conversion, not even %s.
Agreed.
That's good.
| Second, the POSIX draft requires strftime to act as if it calls tzset,
No, what it says is:
Local timezone information shall be set as though strftime() called tzset().
I see that as the same idea, just using different words. There should be no practical difference.
So if you want %z or %Z then things act as if tzset() was called (whether it actually is or not), because there's no other way to get the data that would allow those to be possible.
But there is another way. With %z, strftime can use tm_gmtoff; see POSIX 202x/D4 line 69870. And with %Z, strftime can use tm_zone; see line 69872. The same argument applies to %s, if we fix the error on POSIX 202x/D4 line 69837 where tm_gmtoff is mistakenly omitted. There's no need to look at tzset's output if strftime simply uses tm_gmtoff and the other struct tm members listed there.
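A minimal sketch of the %z case: the conversion can be expanded from tm_gmtoff alone, with no TZ lookup and no tzset() call. The name format_z_from_gmtoff is hypothetical; the sketch drops any leftover seconds in the offset (an assumption about how to handle them); and _DEFAULT_SOURCE is assumed to be what exposes tm_gmtoff on glibc.

```c
#define _DEFAULT_SOURCE  /* assumption: glibc macro exposing tm_gmtoff */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: expand %z as "+hhmm" or "-hhmm" using only
   tm_gmtoff (seconds east of UTC); leftover seconds are dropped.  */
static void
format_z_from_gmtoff (char *buf, size_t bufsize, struct tm const *tm)
{
  long off = tm->tm_gmtoff;
  long mag = off < 0 ? -off : off;

  assert (bufsize >= sizeof "+hhmm");
  snprintf (buf, bufsize, "%c%02ld%02ld",
            off < 0 ? '-' : '+', mag / 3600, mag / 60 % 60);
}
```

So tm_gmtoff = -28800 gives "-0800" and tm_gmtoff = 19800 gives "+0530", whatever TZ happens to say.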
| Where? I don't see LC_CTYPE mentioned anywhere in the strftime section
| (202x/D3 lines 69411-69832). Obviously LC_CTYPE is essential but it's
| not explicitly called out in strftime's description.
If it were needed to reference the LC_ vars in every function which uses them in some way, the standard would be even bigger than it is.
That's fine, and I'm not objecting to that. All I'm saying is that the standard does not list every source of information that strftime can use to process conversion specs. For example, lines 69871-69872 say:

  Z  Replaced by the timezone name or abbreviation, or by no bytes if no timezone information exists. [tm_isdst, tm_zone]

This does not mean that %Z's replacement is completely determined by tm_isdst and tm_zone; all it means is that tm_isdst and tm_zone must be set by the caller and must be in the normal range so that strftime can use them (among other things) to determine %Z's replacement.
| something that is not
| explicitly mentioned in the spec; and this demonstrates that the set of
| things that strftime can use is not exhaustive.
Of course not, strftime() can look at whatever it likes.
Good, and this matches what I just wrote above. (I hope we're in violent agreement. :-)
Notice I said "strictly C conforming" not POSIX conforming.
In that environment, this line:
| tm.tm_gmtoff = gmtoff;
is likely to generate a compilation error, so the application cannot include it. That one must be removed to be strictly C conforming.
Oh, good point. So let's use similar code but without tm_gmtoff:

  #include <stdio.h>
  #include <time.h>

  static void
  f (char const *fn, struct tm *tm)
  {
    char buf[100];
    tm->tm_isdst = 0;
    strftime (buf, sizeof buf, "%z", tm);
    printf ("after %s, %%z formats as '%s' with tm_isdst=%d\n",
            fn, buf, tm->tm_isdst);
  }

  int
  main ()
  {
    time_t t = 0;
    f (" gmtime", gmtime (&t));
    f ("localtime", localtime (&t));
  }

On Ubuntu 23.10 with TZ="America/Los_Angeles" in the environment, this outputs:

  after  gmtime, %z formats as '+0000' with tm_isdst=0
  after localtime, %z formats as '-0800' with tm_isdst=0

which is fine and is the sort of behavior that Dag-Erling Smørgrav expected, even though strftime obviously must be using information other than what's in tm_isdst (or even in TZ) to compute the differing strings "+0000" and "-0800". Because strftime can look at whatever it likes, this behavior is OK.
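One way to see where that other information lives, in a POSIX (not strictly conforming C) program: print the extension members alongside %z. This is a sketch assuming glibc, where gmtime and localtime fill in tm_gmtoff and tm_zone and _DEFAULT_SOURCE exposes them; the helper name show is hypothetical.

```c
#define _DEFAULT_SOURCE  /* assumption: glibc macro exposing tm_gmtoff, tm_zone, setenv */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>      /* setenv (), for choosing a TZ when trying this */
#include <time.h>

/* Hypothetical helper: print %z next to the struct tm extension members
   that strftime may consult.  tm_gmtoff and tm_zone are not ISO C
   members, so a strictly conforming program could not do this.  */
static void
show (char const *fn, struct tm *tm)
{
  char buf[16];

  assert (tm);             /* gmtime and localtime can return NULL */
  tm->tm_isdst = 0;        /* same setting as in f () above */
  strftime (buf, sizeof buf, "%z", tm);
  printf ("after %s, %%z formats as '%s'; tm_gmtoff=%ld, tm_zone=%s\n",
          fn, buf, (long) tm->tm_gmtoff,
          tm->tm_zone ? tm->tm_zone : "(null)");
}
```

With t = 0 and a US Pacific TZ setting, glibc should report tm_gmtoff=0 after gmtime and tm_gmtoff=-28800 after localtime, which lines up with the '+0000' and '-0800' strings above even though tm_isdst is 0 both times. Note that gmtime and localtime may share one static buffer, so inspect each result before making the next call.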
| It intends to fix the bug reported by Dag-Erling Smørgrav here:
| https://mm.icann.org/pipermail/tz/2024-January/033488.html
But there is no bug described there, the answers that the example produced are what is intended to happen. That's because of the "same result as mktime() would produce" requirement.
As mentioned above, since mktime's behavior isn't completely determined by POSIX 202x/D4, there's wiggle room in how strftime can behave on Dag-Erling's example. One possibility is that tzcode conforms to POSIX 202x/D4 both before and after the recently-installed tzcode patch <https://mm.icann.org/pipermail/tz/2024-January/033524.html>. (This patch implements Dag-Erling's suggestion, albeit in a different way that avoids some rare overflow issues.) That is, it's possible that the patch didn't fix a POSIX-conformance bug, but merely a user-expectation bug in an area where POSIX allows different behaviors. If this possibility is correct, I guess I can live with it, though I'm a bit disappointed that POSIX allows the confusing behavior that Dag-Erling described. But anyway, this would mean the recently-installed patch is OK as far as POSIX 202x/D4 is concerned.
This is from draft 4, but I don't believe that any of this changed after D3.
Thanks, I didn't know that draft 4 was out. I got a copy and am now referring to it in my comments instead. I too haven't noticed any changes in this area.