Date:        Sat, 13 Jan 2024 22:51:14 -0800
From:        Paul Eggert <eggert@cs.ucla.edu>
Message-ID:  <99ee73a9-6336-4f3d-a8b3-e57b2e1817dd@cs.ucla.edu>

  | It's not necessarily undefined behavior because we're talking about a
  | standard library function, and (as you mentioned elsewhere) such
  | functions need not be implemented in standard C.

It might not necessarily actually result in something bad happening,
but the application must assume that it might.

  | More important, the draft POSIX standard is simply wrong if it says
  | strftime can't look at (for example) tm_gmtoff when calculating %z,

The implementation can look at whatever it likes.  There's no problem
there, but it cannot assume that the application has placed any
meaningful data in the fields that the standard doesn't say that
strftime() is likely to examine.

  | In POSIX 2017 strftime must also look at TZ for %Z and %z,

I'm not sure "must" is correct.  I suspect that the intended
implementation for %Z is

    if (_tz_inited)
        strcpy(result, tzname[tm->tm_isdst > 0]);
    else
        strcpy(result, "");

(ignoring buffer overflows and stuff like that), and similarly for %z,
except using sprintf to generate a string from "timezone" in the case
that a value has earlier been placed there (still "" otherwise).
No examination of TZ (by strftime) is required for those.

Of course, the implementation can do it some other way if it likes,
but it cannot assume that tm->tm_gmtoff or tm->tm_zone has been set
to anything interesting (tm_zone might be a pointer to somewhere which
generates SIGSEGV if referenced).

  | That's another place where the POSIX draft gets things wrong.

  | I'm talking about where draft POSIX
  | mistakenly says strftime needs mktime (or equivalent) to implement %s.

It says nothing of the kind.  What it says is:

        Replaced by the number of seconds since the Epoch as a decimal
        number, calculated as described for mktime().
        [tm_year, tm_mon, tm_mday, tm_hour, tm_min, tm_sec, tm_isdst]

That doesn't say that mktime() needs to be used, just that the value
you get from

    strftime(buf, sizeof buf, "%s", &tm);

needs to be the exact same thing you'd get from

    snprintf(buf, sizeof buf, "%jd", (intmax_t)mktime(&tm));

Note that in the strftime case though, the tm struct is not altered
by the call, which it would be (if required) by the mktime() variant.

Regardless of how those 7 fields of the struct tm got filled in, and
regardless of what (if anything at all) is in the other fields of the
struct (the ones that must exist, and any others the implementation
might have added), if those two ways of generating a string
representation of an integer in buf don't do the same thing (assuming
the mktime() variant doesn't generate an error and return (time_t)-1 -
and perhaps even then) then the implementation is broken.

  | First, an implementation of POSIX 202x/D3 strftime doesn't need to go
  | near TZ to implement any conversion, not even %s.

Agreed.

  | Second, the POSIX draft requires strftime to act as if it calls tzset,

No, what it says is:

        Local timezone information shall be set as though strftime( )
        called tzset( ).

So if you want %z or %Z then things act as if tzset() was called
(whether it actually is or not), because there's no other way to get
the data that would allow those to be possible.

Similarly, with %s, since we need to act as if mktime() was called
(or at least generate the same result) then we need to know the local
timezone, so mktime() is defined to act as if tzset() were called, and
consequently, so does strftime() when using %s.

For other conversions, there's no reference to local time at all,
strftime() simply formats whatever is in the tm handed to it (or
probably nothing, if you ask for the name of the 13th day of the week,
or the 19th month, or something similarly stupid - that's actually
unspecified).

  | Where?
  | I don't see LC_CTYPE mentioned anywhere in the strftime section
  | (202x/D3 lines 69411-69832). Obviously LC_CTYPE is essential but it's
  | not explicitly called out in strftime's description.

If it were needed to reference the LC_ vars in every function which
uses them in some way, the standard would be even bigger than it is.
They are listed for utilities, but for the system interfaces (i.e.
functions) see XBD 7.1 where it says:

        The behavior of some of the C-language functions defined in the
        System Interfaces volume of POSIX.1-202x shall also be modified
        based on a locale selection. The locale to be used by these
        functions can be selected in the following ways:

        [the first two mechanisms that different functions might use
        omitted here]

        3. Some functions, such as catopen( ) and those related to text
           domains, may reference various environment variables and a
           locale category of a specific locale to access files they
           need to use.

And on it goes.  This is from draft 4, but I don't believe that any of
this changed after D3.

  | I mentioned LC_CTYPE only because it's another example of something that
  | strftime can use to calculate conversions,

I assume it might want LC_NUMERIC as well, but that's not mentioned
either.  LC_TIME is, because that one is particularly relevant to some
of the conversions.

  | something that is not
  | explicitly mentioned in the spec; and this demonstrates that the set of
  | things that strftime can use is not exhaustive.

Of course not, strftime() can look at whatever it likes.  What it can't
do is expect the application to have provided any data that the standard
does not require of it.  Locales are easy (in this regard), as
everything defaults to "C" (aka "POSIX") if the application has done
nothing.  So any of the system interfaces can access any locale
information it needs, it is always defined (somehow - how is up to the
implementation).

  | This interpretation is too strict,

Notice I said "strictly C conforming" not POSIX conforming.
In that environment, this line:

  |    tm.tm_gmtoff = gmtoff;

is likely to generate a compilation error, so the application cannot
include it.  That one must be removed to be strictly C conforming.
Given that one is not there, how is it that the strftime() function
would be expected to use the parameter to this function?

  | If the abovementioned interpretation were correct, then to conform to
  | POSIX-2017, strftime %z could look *only* at tm_isdst and could not look
  | at tm_gmtoff, and the Ubuntu behavior would therefore be incorrect
  | because its strftime is clearly looking at tm_gmtoff.

It is fine, provided it continues working when the application hasn't
provided a value for tm_gmtoff, and provided it can somehow tell the
difference between that field (which will still be in the struct, of
course) not having been set and just containing garbage, and having
been set.

That's no longer the case in the forthcoming standard, as %z is allowed
to use the value in tm_gmtoff, and so conforming applications will need
to set it.

  | But the Ubuntu behavior *is* correct: it's good behavior, it's the
  | behavior most people would expect, and it's common on many
  | implementations. If an interpretation of the C and/or POSIX standards
  | says that Ubuntu doesn't conform,

For C, clearly not, as tm_gmtoff doesn't exist there (which doesn't mean
that an implementation cannot add it, but no conforming application can
assume that it has been, so it can never be init'd, except perhaps to 0
by a memset() (or equiv) of the entire struct).

For POSIX, now that tm_gmtoff has been added, the standard has been
amended.

  | It intends to fix the bug reported by Dag-Erling Smørgrav here:
  | https://mm.icann.org/pipermail/tz/2024-January/033488.html

But there is no bug described there, the answers that the example
produced are what is intended to happen.  That's because of the
"same result as mktime() would produce" requirement.  And mktime()
is defined to be the inverse of localtime() not of gmtime().
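To make the two requirements above concrete, here is a minimal sketch, not code from this thread.  The function names (use_zone, roundtrip_local, roundtrip_utc, percent_s_matches_mktime) are hypothetical, and it assumes a strftime() that supports %s, which is a common extension in implementations that predate the new standard.

```c
#define _POSIX_C_SOURCE 200809L   /* localtime_r(), gmtime_r(), setenv(), tzset() */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: select the local timezone for the demonstration. */
void use_zone(const char *tz)
{
    setenv("TZ", tz, 1);
    tzset();
}

/* mktime() applied to localtime()'s output should give back t. */
time_t roundtrip_local(time_t t)
{
    struct tm tm;

    if (localtime_r(&t, &tm) == NULL)
        return (time_t)-1;
    return mktime(&tm);
}

/*
 * mktime() applied to gmtime()'s output reads the UTC fields as if they
 * were local time, so the result differs from t by the local UTC offset
 * (unless the local zone happens to be UTC).
 */
time_t roundtrip_utc(time_t t)
{
    struct tm tm;

    if (gmtime_r(&t, &tm) == NULL)
        return (time_t)-1;
    return mktime(&tm);
}

/*
 * Returns 1 when strftime("%s") and mktime() yield the same digits for
 * *tm.  mktime() works on a copy, since it may normalise its argument,
 * whereas strftime() must leave the struct alone.
 */
int percent_s_matches_mktime(const struct tm *tm)
{
    char via_strftime[32], via_mktime[32];
    struct tm copy = *tm;

    if (strftime(via_strftime, sizeof via_strftime, "%s", tm) == 0)
        return 0;
    snprintf(via_mktime, sizeof via_mktime, "%jd", (intmax_t)mktime(&copy));
    return strcmp(via_strftime, via_mktime) == 0;
}
```

After use_zone("EST5") (a POSIX-format zone five hours west of UTC, with no summer time), roundtrip_local(t) returns t, while roundtrip_utc(t) returns t + 18000.  That difference is the intended behaviour described above, not a bug.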
(C23 or something is supposedly adding timegm() - which is sorely
lacking, and POSIX will then add it in issue 9, in a decade or two,
perhaps, just perhaps, but unlikely, in some earlier TC update).

  | Without the patch, the bug occurs even on systems with tm_gmtoff.

Once again, there is no bug (or at least there was none; if you have
changed how it works in anything like the way that was requested, there
will be a bug now).  It might not have met his expectations, in which
case his expectations were incorrect.

This is really all very simple stuff.

kre