Currently, strftime() implements %s by calling mktime() and then printing the result. This is fine when the struct tm passed to strftime() came from localtime() but not when it didn't. A better solution would be to call timegm() and then manually adjust the result. Of course that's only possible in the TM_GMTOFF case but that's still better than nothing.

The first attachment is a program which demonstrates the issue. It should print the same value for %s in both cases but doesn't:

    % cc -Wall -Wextra -o strftime_s strftime_s.c
    % ./strftime_s
    local 1704903586 2024-01-10 17:19:46 CET
    gm 1704899986 2024-01-10 16:19:46 UTC

The second attachment is my proposed fix.

DES
--
Dag-Erling Smørgrav - des@des.no
Dag-Erling Smørgrav via tz wrote in <86mstdp2nd.fsf@ltc.des.no>:
 |Currently, strftime() implements %s by calling mktime() and then
 |printing the result. This is fine when the struct tm passed to
 |strftime() came from localtime() but not when it didn't. A better

Actually the manual page on Linux says

  %s  The number of seconds since the Epoch, 1970-01-01 00:00:00 +0000 (UTC). (TZ) (Calculated from mktime(tm).)

Maybe instead the newstrftime.3 manual should give the same hint? The POSIX standard does insofar as it says

  s   Replaced by the number of seconds since the Epoch as a decimal number, calculated as described for mktime()

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
On 2024-01-10 08:35, Dag-Erling Smørgrav via tz wrote:
Currently, strftime() implements %s by calling mktime() and then printing the result. This is fine when the struct tm passed to strftime() came from localtime() but not when it didn't. A better solution would be to call timegm() and then manually adjust the result. Of course that's only possible in the TM_GMTOFF case but that's still better than nothing.
Thanks, I hadn't considered that use case.

This is a tricky area, as the C standard and POSIX both require strftime to look only at tm_isdst when formatting %z and %Z. If strftime simply called timegm and munged the result according to the input tm_gmtoff, it wouldn't conform to the standards.

Your example used %s which isn't standardized by C or by current POSIX. However, the latest draft for the next POSIX says for %s that strftime must act as if mktime was called with tm_isdst, and this would conflict with the implementation you're proposing.

Perhaps a better approach would be for tzcode to implement strftime_z a la NetBSD. That way, you could tell strftime_z that the struct tm came from gmtime. See:

https://man.netbsd.org/strftime_z.3

I vaguely recall thinking that strftime_z wasn't needed and therefore omitting it from tzcode, but your example suggests otherwise.
On 1/10/24 12:20:38, Paul Eggert via tz wrote:
... Perhaps a better approach would be for tzcode to implement strftime_z a la NetBSD. That way, you could tell strftime_z that the struct tm came from gmtime. See:
https://man.netbsd.org/strftime_z.3
I vaguely recall thinking that strftime_z wasn't needed and therefore omitting it from tzcode, but your example suggests otherwise.

It's unduly complex, even inconsistent if strftime( "%s" ) does not yield the same result as sprintf( "%d", mktime( ... ) ) similarly taking into account the implied tzset(), TZ, etc.

Seconds since the epoch should not depend on TZ. The epoch is 0 UTC, not 0 local. But the programmer must make TZ consistent with the members of struct tm.

--
gil
Paul Gilmartin wrote:
Seconds since the epoch should not depend on TZ. The epoch is 0 UTC, not 0 local.
Right. So it seems to me that strftime should *not* support %s, or at least, not unless struct tm is expanded to include a field holding the original time_t value. (But, yes, I saw Paul's earlier message mentioning the next POSIX draft talking about it.)
On 1/10/24 15:05:23, Steve Summit via tz wrote:
Paul Gilmartin wrote:
Seconds since the epoch should not depend on TZ. The epoch is 0 UTC, not 0 local.

Right. So it seems to me that strftime should *not* support %s, or at least, not unless struct tm is expanded to include a field holding the original time_t value.

POSIX strftime(), which does not include the proposed %s: <https://pubs.opengroup.org/onlinepubs/9699919799/functions/strftime.html#tag...> says:

"[CX] [Option Start] If a struct tm broken-down time structure is created by localtime() or localtime_r(), or modified by mktime(), and the value of TZ is subsequently modified, the results of the %Z and %z strftime() conversion specifiers are undefined, when strftime() is called with such a broken-down time structure."

That is, it is the responsibility of the programmer to call strftime() with a TZ matching that corresponding to struct tm. I take this to mean that gmtime corresponds to GMT0. There is no need to add a time_t value.
(But, yes, I saw Paul's earlier message mentioning the next POSIX draft talking about it.)
-- gil
On 2024-01-11 11:09, Paul Gilmartin via tz wrote:
That is, it is the responsibility of the programmer to call strftime() with a TZ matching that corresponding to struct tm. I take this to mean that gmtime corresponds to GMT0. There is no need to add a time_t value.
Yes, after looking at this a bit more I'm becoming more inclined to not add a strftime_z (and strftime_lz). It's easy to generate the equivalent of %s with sprintf on the corresponding time_t value, and none of the other strftime formats should care which timezone was used to generate them.

%z and %Z should be calculated from tm_gmtoff and tm_zone (when available) rather than by invoking mktime (which is the best you can do when they're not available). The next POSIX draft says something along these lines, although it botches the details (which admittedly are messy).

If I understand things correctly, it's impossible for an implementation to conform to both POSIX-2017 and draft next POSIX in this area. Oh well.
Paul Eggert via tz wrote in <45af6b1e-4bc8-4b8e-86f5-101046061434@cs.ucla.edu>:
 |On 2024-01-11 11:09, Paul Gilmartin via tz wrote:
 |> That is, it is the responsibility of the programmer to call strftime()
 |> with a TZ matching that corresponding to struct tm. I take this to
 |> mean that gmtime corresponds to GMT0. There is no need to add a
 |> time_t value.
 |
 |Yes, after looking at this a bit more I'm becoming more inclined to not
 |add a strftime_z (and strftime_lz). It's easy to generate the equivalent
 |of %s with sprintf on the corresponding time_t value, and none of the
 |other strftime formats should care which timezone was used to generate
 |them.
 |
 |%z and %Z should be calculated from tm_gmtoff and tm_zone (when
 |available) rather than by invoking mktime (which is the best you can do
 |when they're not available). The next POSIX draft says something along
 |these lines, although it botches the details (which admittedly are messy).
 |
 |If I understand things correctly, it's impossible for an implementation
 |to conform to both POSIX-2017 and draft next POSIX in this area. Oh well.

To point out that the standard text is identical except for an ISO 8601 update from 2004 to 2019 and hinting the involved structure fields. So it seems to me that POSIX now includes the fields necessary to actually perform the necessary operation; possibly at least 25+ years too late, given

  a72c4a2a74
  Author:     Arthur David Olson <ado@elsie>
  AuthorDate: 2000-04-17 10:08:31 -0400
  Commit:     Paul Eggert <eggert@cs.ucla.edu>
  CommitDate: 2012-07-18 03:02:34 -0400

      Eggert mods plus cleanups

      SCCS-file: strftime.c
      SCCS-SID: 7.59

where you say

  +       if (t->tm_isdst < 0)
  +               continue;
  +#ifdef TM_GMTOFF
  +       diff = t->TM_GMTOFF;
  +#else /* !defined TM_GMTOFF */
  +       /*
  +       ** C99 says that the UTC offset must
  +       ** be computed by looking only at
  +       ** tm_isdst. This requirement is
  +       ** incorrect, since it means the code
  +       ** must rely on magic (in this case
  +       ** altzone and timezone), and the
  +       ** magic might not have the correct
  +       ** offset. Doing things correctly is
  +       ** tricky and requires disobeying C99;
  +       ** see GNU C strftime for details.
  +       ** For now, punt and conform to the
  +       ** standard, even though it's incorrect.
  +       */
  +       diff = -(t->tm_isdst ? altzone : timezone);

where altzone is precalculated from some (tz_)gmtoff that in fact is present in Mr. Olson's package aka ado@elsie since 1986-01-13 (164b93d818) -- wow!

ISO C. And it still has no bit enums.

--steffen
Date: Thu, 11 Jan 2024 12:28:54 -0800
From: Paul Eggert via tz <tz@iana.org>
Message-ID: <45af6b1e-4bc8-4b8e-86f5-101046061434@cs.ucla.edu>

  | It's easy to generate the equivalent
  | of %s with sprintf on the corresponding time_t value,

That's true for C code, %s is mostly used with date +%s which doesn't need the modified strftime_*() functions (though I don't believe there's a standard PRIxxx macro for time_t).

  | If I understand things correctly, it's impossible for an implementation
  | to conform to both POSIX-2017 and draft next POSIX in this area.

What do you believe the problem to be? I don't believe there was any intent to change things (beyond adding the new tm_ fields), and if something slipped in by accident it might not be too late to fix it.

kre
On 1/12/24 01:02:10, Robert Elz via tz wrote:
Date: Thu, 11 Jan 2024 12:28:54 -0800 From: Paul Eggert via tz <tz@iana.org> Message-ID: <45af6b1e-4bc8-4b8e-86f5-101046061434@cs.ucla.edu>
| It's easy to generate the equivalent | of %s with sprintf on the corresponding time_t value,
That's true for C code, %s is mostly used with date +%s which doesn't need the modified strftime_*() functions (though I don't believe there's a standard PRIxxx macro for time_t).

The OP was concerned with the effect of calling strftime(%s) with a struct tm that was not generated by localtime() with the same TZ in the environment. That doesn't happen with date +format or with well-designed C code which can call sprintf with the time_t used to generate the struct tm.

--
gil
Date: Fri, 12 Jan 2024 07:50:51 -0700
From: Paul Gilmartin via tz <tz@iana.org>
Message-ID: <c7dfdd7d-0924-4c79-856f-320c6dad3e6f@AIM.com>

  | The OP was concerned with the effect of calling strftime(%s) with
  | a struct tm that was not generated by localtime() with the same
  | TZ in the environment.

I know.

  | That doesn't happen with date +format or
  | with well-designed C code which can call sprintf with the time_t
  | used to generate the struct tm.

I know that too, which is what I think I said, or at least, implied.

Still non-trivial to use printf on a time_t without a PRIxxx designed for time_t use - really need to cast the time_t arg to intmax_t and use %jd which is a bit ugly.

kre
kre wrote:
Paul Gilmartin wrote: | That doesn't happen with date +format or | with well-designed C code which can call sprintf with the time_t | used to generate the struct tm.
I know that too, which is what I think I said, or at least, implied.
The bottom line is that %s is a really, really odd strftime format -- and it's not surprising (it's manifestly inherent) that it's hard to implement. Here's the backstory, at least as I see it:

Ordinary C code has no use for %s. Ordinary C code typically starts with a time_t value, and only calls localtime or gmtime to construct a struct tm if it needs to. And pretty much the only reason to call localtime or gmtime to construct a struct tm is so that you can print out a human-readable time string, perhaps using strftime. But of course %s is not a human-readable time string. And ordinary C code wouldn't need strftime %s to print a raw time_t value in the first place, because (as I was just saying) ordinary C code typically has raw time_t values rattling around already, which are easier to print with printf anyway. (Yes, as kre pointed out, it's unclear whether you'd want %ld or %lld, or some other variant, to do so, but that's another story.)

But, despite their non-human-readableness, time_t values are now so ubiquitous that they are occasionally of interest to humans, so at some point along the way, the 'date' command acquired "%s" as one of its custom output format specifiers.

So the specification for the 'date' command was really strange: when you asked it to print a custom string using "date +fmt", the specification for the format string was basically "anything strftime can do, plus %s".

I don't know off the top of my head how the "official" date command implemented this. I know that in my own work I've several times found myself writing code that took a "strftime-plus-%s" format string, manually scanned for and interpolated any %s specifiers, then handed it off to strftime to take care of the rest. This was unremittingly ugly -- but it was (to my mind, anyway) still vastly preferable to trying to have strftime handle %s by itself, because strftime just doesn't have the right information available to it.
I gather from this thread that someone has decided to "solve" this problem anyway, by officially adding %s to the supported strftime formats. However, it seems to me that the only "problem" here is that the 'date' program has been difficult to write. My own opinion is that dragging the whole rest of the world through this mudpit is not the right way of solving date's implementation problem -- but then, I'm not on the Posix committee, so it doesn't matter what I think.

(Me, I'd say that if strftime is to support %s, then either (a) struct tm has to be augmented with a new field for the original time_t value, or (b) %s has to be supported only with a new strftime variant where you explicitly pass in the time_t value for %s to use, if necessary: strftime_s(char *buf, size_t bufsize, const char *format, struct tm *timeptr, time_t t). I hope that, in the absence of either of these admittedly radical proposals, Posix is at least mandating tm_gmtoff, which we've long needed anyway, and which would at least make the implicit mktime call, necessitated by %s, a tractable problem.)
On Jan 12, 2024, at 11:47 AM, Steve Summit via tz <tz@iana.org> wrote:
But, despite their non-human-readableness, time_t values are now so ubiquitous that they are occasionally of interest to humans, so at some point along the way, the 'date' command acquired "%s" as one of its custom output format specifiers.
At some point, *somebody's* "date" command may have acquired "%s", but it's not in issue 7 of POSIX/SUS.
I gather from this thread that someone has decided to "solve" this problem anyway, by officially adding %s to the supported strftime formats. However, it seems to me that the only "problem" here is that the 'date' program has been difficult to write. My own opinion is that dragging the whole rest of the world through this mudpit is not the right way of solving date's implementation problem -- but then, I'm not on the Posix committee, so it doesn't matter what I think.
From looking at the POSIX 8 draft, %s was added to strftime() as a result of Austin Group Defect 169. That defect noted that there's no POSIX-compliant way to, in a shell script, get a string that's the numerical value of the current time_t, and requested a "%s" output format identifier for the date command, rather than, say, a "-e" (for "Epoch") command-line flag. The result of the discussion for that defect was that %s should be added to strftime().
On 2024-01-12 11:47, Steve Summit via tz wrote:
I don't know off the top of my head how the "official" date command implemented this.
("official" :-)? As I vaguely recall, SunOS "date" implemented %s by 1996 and, inspired by that, I added support for %s to tzcode's strftime.c around then. See:

https://github.com/eggert/tz/commit/1efa2d66a41913eeaaba1f804ac91e408d3057ce
https://mm.icann.org/pipermail/tz/1996-January/009464.html
found myself writing code that took a "strftime-plus-%s" format string, manually scanned for and interpolated any %s specifiers, then handed it off to strftime to take care of the rest. This was unremittingly ugly -- but it was (to my mind, anyway) still vastly preferable to trying to have strftime handle %s by itself, because strftime just doesn't have the right information available to it.
Although that's true of current POSIX (which doesn't have %s) it's not true for POSIX 202x/D3, which has tm_gmtoff as well as %s. If you have tm_gmtoff that's enough info (along with the other struct tm members) to implement %s.
I hope that, in the absence of either of these admittedly radical proposals, Posix is at least mandating tm_gmtoff, which we've long needed anyway, and which would at least make the implicit mktime call, necessitated by %s, a tractable problem.)
Yes, that's what draft POSIX is doing.
On 1/12/24 16:17:07, Paul Eggert via tz wrote:
On 2024-01-12 11:47, Steve Summit via tz wrote:
I hope that, in the absence of either of these admittedly radical proposals, Posix is at least mandating tm_gmtoff, which we've long needed anyway, and which would at least make the implicit mktime call, necessitated by %s, a tractable problem.)
Yes, that's what draft POSIX is doing.

Is that draft publicly available?

But it may make things worse with a 3-way inconsistency:
o TZ at time of localtime()
o TZ at time of strftime()
o tm_gmtoff
o other fields in struct tm.
... any of which I can set from my wristwatch in struct tm.

--
gil
Date: Fri, 12 Jan 2024 14:47:02 -0500
From: scs@eskimo.com (Steve Summit)
Message-ID: <2024Jan12.1447.scs.0002@tanqueray.local>

  | And pretty much the only
  | reason to call localtime or gmtime to construct a struct tm is
  | so that you can print out a human-readable time string, perhaps
  | using strftime.

That's not really the case, not in truly portable C code, as time_t (in C) has a largely unspecified format and resolution. The only portable way to do arithmetic (perhaps even comparisons, though I am less sure about that) is to generate a struct tm, operate upon the fields of that, and then if needed, convert back to a time_t.

That's not required of a POSIX time_t which is defined as an integer count of seconds, so simple addition of an integer containing an interval in seconds achieves the same thing.

  | But, despite their non-human-readableness, time_t values are now
  | so ubiquitous that they are occasionally of interest to humans,
  | so at some point along the way, the 'date' command acquired "%s"
  | as one of its custom output format specifiers.

date +%s isn't (usually) for humans, it is needed for scripts to work with time_t values (at least in a POSIX environment where time_t's are seconds - which is why %s is in POSIX strftime, but not in the C standard, in the latter it is essentially useless if defined to simply represent the time_t - it could be defined to represent the time_t converted to integral seconds however).

That allows scripts to determine how old something is, by subtracting its time_t timestamp from "now" (i.e. date +%s).

  | (to my mind, anyway) still vastly preferable to trying to have
  | strftime handle %s by itself, because strftime just doesn't have
  | the right information available to it.

It does if the correct elements of the struct are filled in, as mktime() can reconstruct the time_t from a struct tm. However it does that assuming that the struct expresses the current local time (as defined by the TZ setting) - and not some other random zone. There are people who don't understand that, and insist that it must also work for other zones - but it simply doesn't.

  | I gather from this thread that someone has decided to "solve"
  | this problem anyway, by officially adding %s to the supported
  | strftime formats.

Yes, it is in the next POSIX.

  | implementation problem -- but then, I'm not on the Posix
  | committee, so it doesn't matter what I think.

It isn't so much what the committee thinks, but what the implementations have done, and essentially all current strftime() implementations support %s.

  | I hope that, in the absence of either of these admittedly radical
  | proposals, Posix is at least mandating tm_gmtoff,

It is. But strftime isn't allowed to use it to implement %s as old applications can't be relied upon to give tm_gmtoff a value before calling strftime() as tm_gmtoff didn't used to be required. Hence if strftime() accesses tm_gmtoff (except possibly for %z)

kre
On 1/13/2024 12:00 AM, Robert Elz via tz wrote:
Date: Fri, 12 Jan 2024 14:47:02 -0500 From: scs@eskimo.com (Steve Summit) Message-ID: <2024Jan12.1447.scs.0002@tanqueray.local>
| And pretty much the only | reason to call localtime or gmtime to construct a struct tm is | so that you can print out a human-readable time string, perhaps | using strftime.
That's not really the case, not in truly portable C code, as time_t (in C) has a largely unspecified format and resolution. The only portable way to do arithmetic (perhaps even comparisons, though I am less sure about that) is to generate a struct tm, operate upon the fields of that, and then if needed, convert back to a time_t.
That's not required of a POSIX time_t which is defined as an integer count of seconds, so simple addition of an integer containing an interval in seconds achieves the same thing.
| But, despite their non-human-readableness, time_t values are now | so ubiquitous that they are occasionally of interest to humans, | so at some point along the way, the 'date' command acquired "%s" | as one of its custom output format specifiers.
date +%s isn't (usually) for humans, it is needed for scripts to work with time_t values (at least in a POSIX environment where time_t's are seconds - which is why %s is in POSIX strftime, but not in the C standard, in the latter it is essentially useless if defined to simply represent the time_t - it could be defined to represent the time_t converted to integral seconds however).
That allows scripts to determine how old something is, by subtracting its time_t timestamp from "now" (ie: date +%s).
| (to my mind, anyway) still vastly preferable to trying to have | strftime handle %s by itself, because strftime just doesn't have | the right information available to it.
It does if the correct elements of the struct are filled in, as mktime() can reconstruct the time_t from a struct tm. However it does that assuming that the struct expresses the current local time (as defined by the TZ setting) - and not some other random zone. There are people who don't understand that, and insist that it must also work for other zones - but it simply doesn't.
| I gather from this thread that someone has decided to "solve" | this problem anyway, by officially adding %s to the supported | strftime formats.
Yes, it is in the next POSIX.
| implementation problem -- but then, I'm not on the Posix | committee, so it doesn't matter what I think.
It isn't so much what the committee thinks, but what the implementations have done, and essentially all current strftime() implementations support %s.
| I hope that, in the absence of either of these admittedly radical | proposals, Posix is at least mandating tm_gmtoff,
It is. But strftime isn't allowed to use it to implement %s as old applications can't be relied upon to give tm_gmtoff a value before calling strftime() as tm_gmtoff didn't used to be required. Hence if strftime() accesses tm_gmtoff (except possibly for %z)
kre
I like Steve's idea of including a tm_time_t member in struct tm. I use something like this in my internal processes. It means a function such as GetUT(tm* ptm) can just return the tm_time_t value. This makes many typical operations simple and fast.

It seems straightforward that localtime() could set tm_time_t. However, as you point out, you still need mktime() to compute the current time_t in cases where parts of the broken-down-time have been intentionally altered. mktime() could update the tm_time_t member.

I'm not sure how feasible it is but perhaps the Posix folks might consider the idea.

-Brooks
On Jan 13, 2024, at 8:43 AM, Brooks Harris via tz <tz@iana.org> wrote:
I like Steve's idea of including a tm_time_t member in struct tm. I use something like this in my internal processes.
It means a function such as GetUT(tm* ptm) can just return the tm_time_t value.
Or you could just avoid calling GetUT(), and just *use* the tm_time_t value, rather than defining a function that needs to remind people, in its documentation, that you need to have arranged that the tm_time_t value has been set in the structure before calling that function, and then deal with all the people who can't be bothered to Read The Fine Manual and ask why their program is giving random results for calls to GetUT().
This makes many typical operations simple and fast.
As long as you have arranged that tm_time_t is set, perhaps by calling a function such as mktime() which isn't as simple and fast as that.
It seems straightforward that localtime() could set tm_time_t.
Not only *could* (given that it's passed a time_t as its argument), but *should* do so (given that, as per the above, plenty of programmers will just assume it's been set).
However, as you point out, you still need mktime() to compute the current time_t in cases where parts of the broken-down-time have been intentionally altered.
Or in cases where tm_time_t wasn't set in the first place. Given that mktime() cannot determine whether it's been set, this means that mktime() must *always* do the conversion work.
mktime() could update the tm_time_t member.
It should *always* set tm_time_t.
On 1/13/2024 3:19 PM, Guy Harris wrote:
On Jan 13, 2024, at 8:43 AM, Brooks Harris via tz <tz@iana.org> wrote:
I like Steve's idea of including a tm_time_t member in struct tm. I use something like this in my internal processes.
It means a function such as GetUT(tm* ptm) can just return the tm_time_t value.

Or you could just avoid calling GetUT(), and just *use* the tm_time_t value, rather than defining a function that needs to remind people, in its documentation, that you need to have arranged that the tm_time_t value has been set in the structure before calling that function, and then deal with all the people who can't be bothered to Read The Fine Manual and ask why their program is giving random results for calls to GetUT().

Sure.

This makes many typical operations simple and fast.

As long as you have arranged that tm_time_t is set, perhaps by calling a function such as mktime() which isn't as simple and fast as that.

Sure.

It seems straightforward that localtime() could set tm_time_t.

Not only *could* (given that it's passed a time_t as its argument), but *should* do so (given that, as per the above, plenty of programmers will just assume it's been set).

Yeah. Just trying to use suggestive language. But, yes, "should", probably "must".

However, as you point out, you still need mktime() to compute the current time_t in cases where parts of the broken-down-time have been intentionally altered.

Or in cases where tm_time_t wasn't set in the first place. Given that mktime() cannot determine whether it's been set, this means that mktime() must *always* do the conversion work.

Only if localtime() hasn't already set it. But applications must be careful to be sure it's set to the current value representing the broken-down-time. Maybe some sort of 'state' flag is needed, like "not set", "set by localtime", "set by mktime()".

mktime() could update the tm_time_t member.

It should *always* set tm_time_t.

Yes. Just trying to politely suggest the obvious.

So I take it you think providing tm_time_t is not a bad idea, generally?

-Brooks
On Jan 13, 2024, at 12:39 PM, Brooks Harris <brooks@edlmax.com> wrote:
On 1/13/2024 3:19 PM, Guy Harris wrote:
Given that mktime() cannot determine whether it's been set, this means that mktime() must *always* do the conversion work.
Only if localtime() hasn't already set it.
Again, how is mktime() to *reliably* know whether it's been set? Shall another element be added to the structure, containing a bitset of elements with 0 meaning "not set" and 1 meaning "set"? And how is mktime() to know that the bitset has correctly been set?
But applications must be careful to be sure it's set to the current value representing the broken-down-time. Maybe some sort of 'state' flag is needed, like "not set", "set by localtime", "set by mktime()"
And how are you to ensure whether the state flag has been set? Perhaps an additional state flag, indicating the state of the first state flag? (I'm sure you can see where that takes you.) This all sounds like a lot of complication for *not* a lot of benefit; most if not all uses of mktime() are in cases where you *don't* know the time_t because it's never been calculated, so all the work that mktime() does is necessary. Is memoizing its result in this fashion really going to buy you much?
So I take it you think providing tm_time_t is not a bad idea, generally?
I'm still unconvinced that it won't create more problems than it solves, at least if it's used for *ANY* purpose other than %s in strftime(), where you are at least somewhat likely to be using a structure either generated by localtime() (which is passed the time_t value as an argument) or processed by mktime() (which can set it to the value it computes). And, in particular, I think using it to speed up mktime() is a colossally bad idea, given that 1) the majority of calls to mktime() are, as far as I know, calls made in order to *derive* a time_t from a date-and-time-of-day value for which you *don't* already know the time_t and 2) trying to make it possible for mktime() to determine whether tm_time_t has been set is complicated and will end up relying on people writing code carefully *and* probably changing existing code.
On 1/13/24 14:09:31, Guy Harris via tz wrote:
On Jan 13, 2024, at 12:39 PM, Brooks Harris<brooks@edlmax.com> wrote:
But applications must be careful to be sure it's set to the current value representing the broken-down-time. Maybe some sort of 'state' flag is needed, like "not set", "set by localtime", "set by mktime()"

And how are you to ensure whether the state flag has been set? Perhaps an additional state flag, indicating the state of the first state flag? (I'm sure you can see where that takes you.)

I could envision, but don't advocate, a field in struct tm which is a strong checksum of the rest of the struct ...

Cui bono?

--
gil
On 1/13/2024 4:09 PM, Guy Harris wrote:
On Jan 13, 2024, at 12:39 PM, Brooks Harris <brooks@edlmax.com> wrote:
On 1/13/2024 3:19 PM, Guy Harris wrote:
Given that mktime() cannot determine whether it's been set, this means that mktime() must *always* do the conversion work.

Only if localtime() hasn't already set it.

Again, how is mktime() to *reliably* know whether it's been set? Shall another element be added to the structure, containing a bitset of elements with 0 meaning "not set" and 1 meaning "set"? And how is mktime() to know that the bitset has correctly been set?

mktime() would set it regardless. It doesn't need to know prior state since it's calculating from the broken-down-time.
But applications must be careful to be sure it's set to the current value representing the broken-down-time. Maybe some sort of 'state' flag is needed, like "not set", "set by localtime", "set by mktime()" And how are you to ensure whether the state flag has been set? Perhaps an additional state flag, indicating the state of the first state flag? (I'm sure you can see where that takes you.)
Yes. I use things like that to flag if my app has initialized some variable to some default value. But, yes, it doesn't really flag "is valid now". The app has to keep track of that.
This all sounds like a lot of complication for *not* a lot of benefit;
Seems to me the benefit is keeping the tm_time_t value coupled to the broken-down-time. localtime() can set that so a call to mktime() isn't needed.

most if not all uses of mktime() are in cases where you *don't* know the time_t because it's never been calculated, so all the work that mktime() does is necessary. Is memoizing its result in this fashion really going to buy you much?

It's not so much about mktime() as localtime().
So I take it you think providing tm_time_t is not a bad idea, generally? I'm still unconvinced that it won't create more problems than it solves, at least if it's used for *ANY* purpose other than %s in strftime(), where you are at least somewhat likely to be using a structure either generated by localtime() (which is passed the time_t value as an argument) or processed by mktime() (which can set it to the value it computes).
And, in particular, I think using it to speed up mktime() is a colossally bad idea, It's not about speeding mktime() but keeping the tm_time_t value coupled to broken-down-time in many instances of struct tm. given that 1) the majority of calls to mktime() are, as far as I know, calls made in order to *derive* a time_t from a date-and-time-of-day value for which you *don't* already know the time_t and 2) trying to make it possible for mktime() to determine whether tm_time_t has been set is complicated and will end up relying on people writing code carefully *and* probably changing existing code.
localtime() sets it, and mktime() sets it. Whichever last called has set it. Well, I think it's a good idea. I find it simplifies many operations. But timekeeping takes time, we've been at it for 2500 years. :-)
On Jan 13, 2024, at 2:27 PM, Brooks Harris <brooks@edlmax.com> wrote:
On 1/13/2024 4:09 PM, Guy Harris wrote:
On Jan 13, 2024, at 12:39 PM, Brooks Harris <brooks@edlmax.com> wrote:
On 1/13/2024 3:19 PM, Guy Harris wrote:
Given that mktime() cannot determine whether it's been set, this means that mktime() must *always* do the conversion work. Only if localtime() hasn't already set it. Again, how is mktime() to *reliably* know whether it's been set? Shall another element be added to the structure, containing a bitset of elements with 0 meaning "not set" and 1 meaning "set"? And how is mktime() to know that the bitset has correctly been set?
mktime() would set it regardless. It doesn't need to know prior state since it's calculating from the broken-down-time.
In other words, we should replace
However, as you point out, you still need mktime() to compute the current time_t in cases where parts of the broken-down-time have been intentionally altered. mktime() could update the tm_time_t member.
with mktime() will never use the tm_time_t member and will always set it.
But applications must be careful to be sure it's set to the current value representing the broken-down-time. Maybe some sort of 'state' flag is needed, like "not set", "set by localtime", "set by mktime()" And how are you to ensure whether the state flag has been set? Perhaps an additional state flag, indicating the state of the first state flag? (I'm sure you can see where that takes you.)
Yes. I use things like that to flag if my app has initialized some variable to some default value.
In that case, you're trusting yourself to properly maintain the flag. However...
But, yes, it doesn't really flag "is valid now". The app has to keep track of that.
...in *this* case the *library* would have to trust *arbitrary* programmers to do that, and relying on *existing* code, written before any presence flags were in struct tm, that uses, for example, mktime(), can't be done.
This all sounds like a lot of complication for *not* a lot of benefit;
Seems to me the benefit is keeping the tm_time_t value coupled to the broken-down-time. localtime() can set that so a call to mktime() isn't needed.
...
It's not about speeding mktime() but keeping the tm_time_t value coupled to broken-down-time in many instances of struct tm.
In other words, we should ignore
it means a function such as GetUT(tm* ptm) can just return the tm_time_t value. This makes many typical operations simple and fast. It seems straightforward that localtime() could set tm_time_t.
because 1) if you have a struct tm where you know for certain that tm_time_t is set, you can just fetch the value without bothering with a GetUT() function; 2) otherwise, you have to call mktime() anyway. Unfortunately, strftime() can only use the tm_time_t member if it knows for certain it's been set and, in general, it has no idea what routine filled in the structure, so it has no idea whether tm_time_t has been set.
given that 1) the majority of calls to mktime() are, as far as I know, calls made in order to *derive* a time_t from a date-and-time-of-day value for which you *don't* already know the time_t and 2) trying to make it possible for mktime() to determine whether tm_time_t has been set is complicated and will end up relying on people writing code carefully *and* probably changing existing code.
localtime() sets it, and mktime() sets it. Whichever last called has set it.
What if the struct tm was filled in by some *other* code, and the structure is untouched either by localtime() or mktime()? And what if that's in *existing* code that calls strftime("%s"), which obviously won't set tm_time_t because *it didn't exist at the time the code was written*?
Brooks Harris via tz wrote in <39dd1363-5618-42b6-b0cc-02f5dbc06750@edlmax.com>: |On 1/13/2024 12:00 AM, Robert Elz via tz wrote: |> Date: Fri, 12 Jan 2024 14:47:02 -0500 |> From: scs@eskimo.com (Steve Summit) |> Message-ID: <2024Jan12.1447.scs.0002@tanqueray.local> ... |> time_t (in C) has a largely unspecified format and resolution. |> The only portable way to do arithmetic (perhaps even comparisons, ... |I like Steve's idea of including a tm_time_t member in struct tm. I use |something like this in my internal processes. ... |I'm not sure how feasible it is but perhaps the Posix folks might |consider the idea. I cannot speak for them, but Robert Elz has brought up the idea of an entirely new interface, and I personally think that *if* anything were to be done, then it should be a real object-based approach. Something "datetime"-like, as many scripting languages and libraries offer, with defined arithmetic, most often with dedicated functions like add_months|years|days|microseconds, what do I know. I.e., no more +1900 at time, no more such things, but accessors and queries on an object interface that does not modify global data. That would surely be a good new foundation. And then there needs to be some calendar object to be able to truly reflect cultural date and time differences. I never was able (nor did I have the need) to work with these real things. It is hard to imagine that ISO C would do anything like that. --steffen
Robert Elz wrote:
date +%s isn't (usually) for humans, it is needed for scripts to work with time_t values...
Ah, yes. Good point. I remember now that it's recommended -- by no less than the comp.unix.shell FAQ list -- that if you need the current time_t value but you don't have date +%s available, you can instead use the not-at-all-obvious awk 'BEGIN { print srand(srand()) }' (But I'm bringing this up merely as a curiosity and for humor value, not as a serious topic of discussion!)
| (to my mind, anyway) still vastly preferable to trying to have | strftime handle %s by itself, because strftime just doesn't have | the right information available to it.
It does if the correct elements of the struct are filled in, as mktime() can reconstruct the time_t from a struct tm. However it does that assuming that the struct expresses the current local time (as defined by the TZ setting) - and not some other random zone.
Right. I confess I had forgotten this point. For anyone else who (like me) wasn't fully paying attention, let me state it another way: The struct tm you hand to strftime() is usually one you just got from localtime() or gmtime() -- BUT IT MIGHT NOT BE. In particular, if the struct tm handed to strftime() is one that, say, the caller just has hand-constructed, it is rather *un*likely to contain proper values for things like tm_gmtoff or the hypothetical tm_time_t I was mentioning and that Brooks picked up on. So (as others have said several times), strftime %s really does have no choice but to do a mktime() based on barely-adequate information -- and part of that information is, alas, the global TZ environment variable. An implication is that if you want to implement strftime %s, you can *not* make it easier on yourself by having localtime or gmtime helpfully stash extra information in struct tm, using fields like tm_gmtoff or tm_time_t. (I feel foolish saying this, because I'm one of the ones who was just suggesting making things easier by filling in extra information like tm_gmtoff or tm_time_t.) And then the other implication is that, dammit, the global TZ environment variable still really matters. You still have to ensure that it's the same for corresponding localtime and mktime calls -- and, if you're using %s, for corresponding strftime calls as well. Keeping it the same might not seem so hard -- who changes environment variables out from under a running program, anyway? -- except that changing TZ is the recommended and pretty much only way of dealing with a time zone other than the current one. (It sounds like Posix is *finally* standardizing the vital tm_gmtoff field. I wonder how many more years we'll have to wait for someone's blessing of BSD's variant _z functions?)
There are people who don't understand that, and insist that it must also work for other zones - but it simply doesn't.
Right. But go easy on them -- it really is almost inhumanly difficult to keep all of the sloppily dovetailing constraints in mind at the same time. In particular, it's absurdly difficult to remember how badly these functions all depend on the global TZ variable, and to remember that there really is no good way of working with time zones other than the current one. Me, I find myself half wishing for a big, bold warning on the strftime man page: The struct tm handed to strftime must be one returned by an immediately preceding call to localtime or gmtime. But that's not even true. That's the warning that would allow an implementor to look at tm_gmtoff (or the hypothetical tm_time_t) when writing strftime %s. But, in fact, callers *are* allowed to pass handcrafted struct tm values to strftime, and implementors are obliged to make this work -- even if there's a %s in the format string. (Which brings me back to my conclusion that %s shouldn't exist, because it's impossible to implement correctly. But, as the saying goes, people who believe a thing to be impossible should not stand in the way of those who are doing it.) So, in fact, the necessary big, bold warning is not one that goes on the strftime man page. No, the big, bold warning is for people like me, who keep dreaming of a set of time-conversion functions that's halfway sane and coherent. I'm not sure where the warning goes, but it would say something like: You might think that the sequence struct tm *tm = localtime(&t); strftime(buf, sizeof buf, "%s", tm); is fundamentally guaranteed to place a decimal representation of t into buf, where "fundamentally" implies that it just *has* to work, even in the face of serious bugs in other, unrelated parts of the time-conversion logic.
But no, this sequence is in fact utterly vulnerable to bugs in other parts of the time-conversion logic, because it is necessarily equivalent to the sequence struct tm *tm = localtime(&t); time_t t2 = mktime(tm); which sets t2 == t only in the presence of a perfectly-implemented mktime, and also given certain other constraints, such as that TZ has not changed. The bottom line is that a call to localtime followed by a call to strftime %s is an only barely, skating-on-thin-ice-ily information-preserving transformation. It'll probably work, but really, it's more in the category of float f = atof(str); sprintf(str2, "%f", f); That is, what you're looking at when you use strftime %s is *not* a straight passing-through of data; you're probably looking at a pair of nominally-inverse but delicate and potentially lossy transformations. Perhaps there *is* a warning worthy of putting on the strftime man page, which is Please rely on %s only if you're the implementor of date(1) or the equivalent. If you're using %s to print a time_t value that your program has explicitly, it is far less error-prone to print that value directly, than to convert it to a struct tm and print it with strftime %s. Apologies for the long message, and for taking so long to understand that, yes, strftime %s really does have to do a full-blown mktime, with all the unreliability and imprecision that implies -- it can *not* do the easy thing and look at tm_gmtoff. (Stay tuned for our next exciting episode, in which the programmers who used to clamor for the nonstandard timegm function now request a strftime variant whose %s specifier assumes UTC.)
On 2024-01-14 06:20, Steve Summit wrote:
strftime %s really does have no choice but to do a mktime() based on barely-adequate information -- and part of that information is, alas, the global TZ environment variable.
Although that's one interpretation of the standard, it's not the only one. As I've been saying, although the POSIX and C standards can easily be misinterpreted, they have a better interpretation which says that on a system with tm_gmtoff and tm_zone strftime need not use mktime or equivalent, not even for %s. This interpretation is better because (a) it's how popular implementations work in many cases and (b) it's what users expect.
if you want to implement strftime %s, you can *not* make it easier on yourself by having localtime or gmtime helpfully stash extra information in struct tm, using fields like tm_gmtoff
Luckily you can - if you use the better interpretation.
I find myself half wishing for a big, bold warning on the strftime man page:
The struct tm handed to strftime must be one returned by an immediately preceding call to localtime or gmtime.
This is good advice, and (at least in a "should" form) it should be in POSIX. Come to think of it, it should be in tzcode's man page too. I installed the attached proposed patch to do that. While I was at it I noticed that the man page doesn't say strftime behaves as if tzset were called (even though this is no longer needed). The attached patch contains a fix for that as well.
callers *are* allowed to pass handcrafted struct tm values to strftime, and implementors are obliged to make this work
Yes, but the standards give leeway as to how to "make this work" for %z and %Z, and this leeway includes using members like tm_gmtoff and tm_zone that the C standard does not specify.
(Which brings me back to my conclusion that %s shouldn't exist, because it's impossible to implement correctly.
It's impossible only if one uses a too-strict interpretation of the standards. Let's not do that, as it would make our implementations worse, our users more confused, and our software buggier.
the big, bold warning is for people like me, who keep dreaming of a set of time-conversion functions that's halfway sane and coherent. I'm not sure where the warning goes, but it would say something like:
You might think that the sequence
struct tm *tm = localtime(&t); strftime(buf, sizeof buf, "%s", tm);
is fundamentally guaranteed to place a decimal representation of t into buf, where "fundamentally" implies that it just *has* to work, even in the face of serious bugs in other, unrelated parts of the time-conversion logic. But no, this sequence is in fact utterly vulnerable to bugs in other parts of the time-conversion logic, because it is necessarily equivalent to the sequence
struct tm *tm = localtime(&t); time_t t2 = mktime(tm);
which sets t2 == t only in the presence of a perfectly-implemented mktime, and also given certain other constraints, such as that TZ has not changed.
Assuming that localtime and strftime both succeed (localtime returns non-null and strftime's output fits), then a warning stated this baldly would be incorrect for current tzcode as its strftime %s is indeed the inverse of localtime.
Perhaps there *is* a warning worthy of putting on the strftime man page, which is
Please rely on %s only if you're the implementor of date(1) or the equivalent. If you're using %s to print a time_t value that your program has explicitly, it is far less error-prone to print that value directly, than to convert it to a struct tm and print it with strftime %s.
It's true that strftime %s has problems on other platforms, so a portability warning is appropriate for tzcode strftime's man page. I put one into the attached proposed patch.
(Stay tuned for our next exciting episode, in which the programmers who used to clamor for the nonstandard timegm function now request a strftime variant whose %s specifier assumes UTC.)
NetBSD's strftime_z does that. But it's not needed in current tzcode, which addresses the problem in a simpler way.
awk 'BEGIN { print srand(srand()) }'
That is *hilarious* and it works on every machine I have easy access to! Alas, it's not guaranteed by POSIX, which merely says it outputs "time of day" not "number of seconds since the Epoch".
Date: Sun, 14 Jan 2024 11:41:10 -0800 From: Paul Eggert <eggert@cs.ucla.edu> Message-ID: <0573ccfb-4c07-4886-916c-f521180e949a@cs.ucla.edu> Note: in what follows, lines starting " | " are quotes of text written by Paul in the message to which I'm replying, except for those lines which start " | >" which are lines Paul quoted from a message Steve sent (see the header for full names & e-mail addresses). | Although that's one interpretation of the standard, it's not the only | one. It is, however, approximately the correct one. | As I've been saying, although the POSIX and C standards can easily | be misinterpreted, Anything can be misinterpreted, that means nothing. | they have a better interpretation which says that on | a system with tm_gmtoff and tm_zone strftime need not use mktime or | equivalent, not even for %s. Nothing says that anything needs to use mktime(). What the spec for strftime("%s") says is that the result, in a POSIX system, must represent the same value as mktime() on the same struct tm would produce. That is, the value that should be produced is specified. That's all. Allowing the implementation to produce any answer it likes would make it kind of difficult for anyone to use reliably, don't you think? The mechanism the implementation uses to produce that specified result is entirely up to it, provided it uses only the data that users are told they need to provide (otherwise the implementation risks using garbage). | > The struct tm handed to strftime must be one returned by | > an immediately preceding call to localtime or gmtime. | | This is good advice, It actually isn't. It isn't required at all. All that is required is that the fields required for the conversions specified in the format string be correctly initialised to the desired values.
Certainly calling one of the functions which fills in a struct tm will do that, and that's a very common usage, but it isn't the only way (using the results from parsedate() on systems that have it is another, as is simply doing a scanf() on a date/time string, perhaps one previously created by strftime()). Or many other ways, including simply reading the struct tm from a file. | and (at least in a "should" form) it should be in POSIX. It certainly should not. | While I was at it I noticed that the man page doesn't say strftime | behaves as if tzset were called (even though this is no longer needed). But in general, it doesn't, only for the 3 conversions that need it. In any case "behaves as if tzset() were called" is more or less (not fully) irrelevant to the call of strftime() itself, what is crucial about that is that calls to tzset() can affect later function results, and the lifetimes of data returned from earlier calls - so it is important to know when that might happen. | Yes, but the standards give leeway as to how to "make this work" for %z | and %Z, and this leeway includes using members like tm_gmtoff and | tm_zone that the C standard does not specify. Certainly, POSIX has added stuff which the C standard does not require to exist - C is trying to be able to run in more environments than just POSIX ones, which necessarily affects just how much it can specify when dealing with interfaces to external systems (like time). Eg: in C, a time_t is *not* a count of seconds since some epoch, and simply printing a time_t value and expecting that to be seconds since the epoch, in a portable C application is incorrect. POSIX specifies it as an integer count of seconds since 1970-01-01T00:00:00Z (at exactly 86400 seconds per day, every day, always). C does not.
A C time_t might be a count of milliseconds, of microseconds, or 2-seconds, or BCD encoded, or almost anything (though I think it is now required to be an integer type - I believe it was once allowed to be a float). | > (Which brings me back to my conclusion that %s | > shouldn't exist, because it's impossible to implement correctly. Nonsense. It is trivial to implement correctly. Perhaps you mean that the specification does not achieve what you want it to produce - that's a different issue entirely. That is, your "correctly" means "what I want" rather than "as specified". And you're certainly right that would be impossible, as what you want, and what I want, and what someone else wants might all be different - the implementation needs to pick one of them (or add more interface to select) - it cannot simply guess which one the current user expects to happen, and implement that. That is impossible. Lots of functions don't do what I'd like them to do. Sad, but true. Live with it. | It's impossible only if one uses a too-strict interpretation of the | standards. I have no idea what that means. It isn't impossible no matter how strictly the standard is interpreted (which should always be "very"). | Let's not do that, as it would make our implementations | worse, our users more confused, and our software buggier. All that is needed is to make it clear just what the %s value represents. It is *not* the time_t value that produced this struct tm - it cannot be, as no such thing need exist. It is the time_t value which localtime() would convert into the same values as are in the struct tm given to strftime() (for the fields that strftime() uses, or might). Note, only localtime() for this, never gmtime() or anything else. 
| > You might think that the sequence | > | > struct tm *tm = localtime(&t); | > strftime(buf, sizeof buf, "%s", tm); | > | > is fundamentally guaranteed to place a decimal representation | > of t into buf, where "fundamentally" implies that it just | > *has* to work, even in the face of serious bugs in other, Come on, be serious. Nothing is ever guaranteed to work in the face of serious bugs. If there's a serious bug in cc, you might not even be able to compile the code to test that (for example). If the startup code (what used to be crt0 but I think that's been replaced by something different - never mind) has a serious bug, your code might never start running. If ... (I could go on forever). | > unrelated parts of the time-conversion logic. But no, this | > sequence is in fact utterly vulnerable to bugs in other | > parts of the time-conversion logic, Everything is vulnerable to bugs in all kinds of things. The whole of tzcode assumes that read(2) works, so that the TZif file can be read to get the information it contains, but if there were a bug in read(2) such that every second byte was complemented, or something else weird like that, nothing would work. Do you worry about that, and abandon all uses of tzcode because of it? I certainly don't. We cannot specify things such that we are assuming that other things will be broken, or we cannot expect to rely upon anything at all. Instead, we assume that everything works, and write code based upon that assumption, and then if something doesn't behave as expected, we first double check that our expectation is correct (that is, don't simply assume that because it looks like as if it should do X, that X is what it must do - verify that the specification says that), and then if that's true, and the implementation isn't doing what it should, we file a bug report and get the thing fixed. | > because it is necessarily | > equivalent to the sequence | > | > struct tm *tm = localtime(&t); | > time_t t2 = mktime(tm); Yes. 
| > which sets t2 == t only in the presence of a perfectly- | > implemented mktime, Of course, and a perfectly implemented localtime(), and a perfectly implemented compiler, and correctly functioning hardware, and ... Incidentally, a bug free localtime() is much harder to achieve than a bug free mktime(), as mktime() can easily be implemented simply by making calls to localtime() and comparing the results with the input struct tm, until the input time_t to localtime() which produces the expected results is found. Perhaps not all that efficient, but very easy, and if localtime() is correct, then so will be mktime(). mktime() needs to normalise the values in the struct tm first, or they'd never compare equal to localtime results of course - strftime() doesn't need to do that, as its results are unspecified if any of the relevant struct tm values are out of their specified ranges. | > and also given certain other constraints, | > such as that TZ has not changed. Yes - mktime() uses the current TZ specified local time to do its conversion, just as does localtime. You might as well say that struct tm *tm1 = localtime(&t); struct tm *tm2 = localtime(&t); isn't guaranteed to produce the same values in *tm1 and *tm2, as it depends upon a perfectly implemented localtime() and that TZ isn't altered between the two calls, and ... (and that t doesn't change in the interim). There's nothing specific to mktime() or strftime("%s") which makes things any different in this area. | Assuming that localtime and strftime both succeed (localtime returns | non-null and strftime's output fits), then a warning stated this baldly | would be incorrect for current tzcode as its strftime %s is indeed the | inverse of localtime. As it should be. Exactly that, and nothing else. Ever.
In this regard, note that localtime() uses the TZ timezone, not anything different, so what you're saying is that strftime("%s") uses the TZ timezone, and never anything else (whatever value might happen to be in the tm_gmtoff field of the struct tm passed to it). | > Please rely on %s only if you're the implementor of | > date(1) or the equivalent. Nonsense. Further the implementor of date(1) doesn't care about %s at all, the '+format' operand is simply passed directly to strftime (and then the leading '+' in the resulting string removed - there are reasons for doing it that way rather than removing the '+' first) without examining it at all. | It's true that strftime %s has problems on other platforms, What platforms have issues? That is, of ones which actually support %s of course. (Though it sounds as if perhaps the current unreleased but patched tzcode might perhaps be one of them.) | so a portability warning is appropriate for tzcode strftime's man | page. It would be better to file bugs against the broken ones. This isn't a case (like say "echo") where there are two competing specifications, and people simply will not agree on which is correct, so we just tell everyone to avoid it for safety. | NetBSD's strftime_z does that. But it's not needed in current tzcode, | which addresses the problem in a simpler way. There is no problem to address. It is just that %s is not designed to do what some people apparently want it to do (which is, in general, not really all that useful). If there is a real need for something different, the way to deal with that is to suggest to the implementors that some other conversion be added (or a modifier applied to the %s conversion perhaps) to achieve different results - not to arbitrarily simply change the specification of %s and by so doing break code which is justifiably relying upon it working as it is specified to work. kre
On Jan 14, 2024, at 2:46 PM, Robert Elz via tz <tz@iana.org> wrote:
Eg: in C, a time_t is *not* a count of seconds since some epoch, and simply printing a time_t value and expecting that to be seconds since the epoch, in a portable C application is incorrect. POSIX specifies it as an integer count of seconds since 1970-01-01T00:00:00Z (at exactly 86400 seconds per day, every day, always). C does not. A C time_t might be a count of milliseconds, of microseconds, or 2-seconds, or BCD encoded, or almost anything (though I think it is now required to be an integer type - I believe it was once allowed to be a float).
According to ISO/IEC 9899:2018, time_t is required to be a "real type", which means either an integer type *or* a real floating-point type. (I think some formalisms in which time is a complex number have been used, but C, at least, does not allow time_t to be a complex floating-point type, and there are no complex integer types.) So, no, at least as of the aforementioned standard, which Wikipedia informs me is "C17", the "2018" in the full name notwithstanding, C does not require time_t to be an integer type - it still may be floating-point.
Date: Sun, 14 Jan 2024 16:17:10 -0800 From: Guy Harris <gharris@sonic.net> Message-ID: <26048A4F-4E76-4984-A862-FE63573D3F49@sonic.net> | According to ISO/IEC 9899:2018, time_t is required to be a "real type", OK, thanks for that, I don't generally deal with the C standard, I just thought that someone had told me that had been altered. I may have confused (in my head) someone saying that POSIX further restricts it to being an integer type (as well as also being a count of seconds since the epoch). Thanks. kre
kre wrote, quoting me:
| > You might think that the sequence | > struct tm *tm = localtime(&t); | > strftime(buf, sizeof buf, "%s", tm); | > is fundamentally guaranteed to place a decimal representation | > of t into buf, where "fundamentally" implies that it just | > *has* to work, even in the face of serious bugs in other,
Come on, be serious.
Oh, I was being perfectly serious, just not in the perfectly literal way you interpreted it. A good portion of the long message of mine you're referring to was, I admit, primarily an internal dialog with myself, and therefore not necessarily worthy of posting to this list. But what I was convincing myself of was precisely, as you put it, that the number generated by strftime %s:
...is *not* the time_t value that produced this struct tm - it cannot be, as no such thing need exist.
But before I convinced myself of this, whenever I used to see that 'date +%s' was for printing what is, for all intents and purposes, a raw time_t value, I imagined that's what it did: print the raw time_t value. So it follows that if someone is trying to implement all of date(1)'s '+' options using strftime, with the implication that strftime has to be able to do %s, it further follows that strftime has to be able to -- somehow -- access that raw time_t value. But, hang on, don't jump down my throat and correct me again, because, I know: that's wrong. But the way for me to *remember* that it's wrong, so I can avoid making these mistakes again, is to remind myself that strftime's computation of %s is *not* a simple operation: it's a complex transformation, more or less exactly equivalent to mktime. It's potentially lossy, and it does what you expect (if your expectation is even correct) only if you use it very carefully, paying attention to subtle facets of the documentation which are easy to overlook or misinterpret. One which has been mentioned is that TZ has not changed. Another is that tzset either has or has not been called. Yet another (which I don't think has been mentioned yet) is that tm_isdst is set correctly. But, yes, if you're careful of all those things, %s will work correctly. But will it do what you want? And, more importantly, are those really the only things you have to be careful of? Or are there others that you've forgotten? Or might there be others in the future? I shouldn't have said "even in the face of serious bugs", that was sort of a personal shorthand. The point is that, even if there *aren't* "bugs in other parts of the time-conversion logic", there might be bugs in your understanding of what %s does, and there might be bugs in your ability to uphold the guarantees that %s requires. 
So if you've got a time_t value you want to eventually print out in raw form, printf with a format of either %ld or %lld is a much, much more reliable way of doing it than going around the barn with localtime and strftime %s. (That's why I brought up the analogy of atof and printf %f.) But, anyway, I think your disagreement is not with my conclusion, but rather, with the arguments I used to reach my conclusion. I assure you, though, that in *my* head, those *are* the arguments necessary to reinforce the correct conclusion! (And then again: given the variety of interpretations just in this thread, and even if you do succeed in convincing everyone here that your interpretation of strftime %s is the correct one, how sure can we be that all other implementations will agree? Perhaps I shouldn't have walked back those words "even in the face of serious bugs"; it may be that those serious bugs -- in some other implementations -- are more or less inevitable!)
| > (Which brings me back to my conclusion that %s | > shouldn't exist, because it's impossible to implement correctly.
Nonsense. It is trivial to implement correctly.
A laughable conclusion, given the complexity of this thread! But I think you mean, the long and the short of a proper %s implementation is to call mktime on the struct tm handed to strftime, and interpolate the result. Right now I do agree that's the correct implementation, and you're right, it's trivial. (But it's like that old joke about the lecturer who, after being questioned about whether a certain result is truly "obvious", spends half an hour alternately deep in thought or scribbling abstrusely on the chalkboard, before triumphantly concluding, "Yes, I was right, it is obvious.")
Perhaps you mean that the specification does not achieve what you want it to produce - that's a different issue entirely.
Indeed, and that's partly what I meant -- but I don't believe it's irrelevant. An implementation that perfectly implements a useless specification isn't useful. strftime %s isn't useless, so let me amend that to, an implementation that perfectly implements a problematic specification may remain problematic.
| > Please rely on %s only if you're the implementor of | > date(1) or the equivalent.
Nonsense. Further the implementor of date(1) doesn't care about %s at all, the '+format' operand is simply passed directly to strftime
No, I know, but if the implementor of date(1) has a specification of the format specifiers accepted by '+', it might be prudent to vet that list against the specification of the strftime call that's about to be used.
Incidentally, a bug free localtime() is much harder to achieve than a bug free mktime(), as mktime() can easily be implemented simply by making calls to localtime() and comparing the results with the input struct tm, until the input time_t to localtime() which produces the expected results is found.
And of course that's precisely how some implementations of mktime *do* work!
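The search-based approach quoted above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code of any real mktime: search_mktime and tm_cmp are hypothetical names, the search window is limited to 1970 through early 2038 so that it fits any time_t, time_t is assumed to be a signed integer type, and the ambiguous or skipped local times around DST transitions are not handled.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Compare the calendar fields of two struct tm values,
   most significant field first.  Returns -1, 0 or 1. */
static int tm_cmp(const struct tm *a, const struct tm *b)
{
    const int fa[] = { a->tm_year, a->tm_mon, a->tm_mday,
                       a->tm_hour, a->tm_min, a->tm_sec };
    const int fb[] = { b->tm_year, b->tm_mon, b->tm_mday,
                       b->tm_hour, b->tm_min, b->tm_sec };
    for (int i = 0; i < 6; i++)
        if (fa[i] != fb[i])
            return fa[i] < fb[i] ? -1 : 1;
    return 0;
}

/* Hypothetical helper: bisect over time_t values, calling
   localtime_r() on each candidate, until the broken-down result
   matches *want.  Returns (time_t)-1 if no match is found in the
   window.  Assumes the fields of *want are already in range. */
time_t search_mktime(const struct tm *want)
{
    assert(want != NULL);
    time_t lo = 0, hi = (time_t)INT32_MAX;   /* arbitrary search window */
    while (lo <= hi) {
        time_t mid = lo + (hi - lo) / 2;
        struct tm got;
        if (localtime_r(&mid, &got) == NULL)
            return (time_t)-1;
        int c = tm_cmp(&got, want);
        if (c == 0)
            return mid;
        if (c < 0)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return (time_t)-1;
}
```

For a local time well away from any DST transition, the value found this way should agree with what mktime() returns for a copy of the same struct tm.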
Date: Sun, 14 Jan 2024 20:33:05 -0500
From: Steve Summit via tz <tz@iana.org>
Message-ID: <2024Jan14.2033.scs.0003@tanqueray.home>

| But what I was convincing myself of was precisely, as you put
| it, that the number generated by strftime %s:
|
| > ...is *not* the time_t value that produced this
| > struct tm - it cannot be, as no such thing need exist.

Note you need to keep the correct standards in your head, and know what each requires, and what each specifies.

What I wrote there is just to make it clear that one may do:

    const char *
    func(void)
    {
        static char buf[80];
        struct tm T = {
            .tm_year = 2024 - 1900,
            .tm_mon = 1 - 1,
            .tm_mday = 15,
            .tm_hour = 12,
            .tm_min = 55,
            .tm_sec = 26,
            .tm_isdst = -1  /* or 0, or 1, your choice */
        };

        strftime(buf, sizeof buf, "%s", &T);
        return buf;
    }

then in my timezone, in a POSIX environment, func() is required to return a pointer to the string "1705298126". The value you'll get will be different, as it depends upon your local time zone, but for any constant local timezone (that is, if you do not alter TZ), the result is the same string, it does not depend upon the current date or time in any way at all. If you do alter TZ you can discover the time_t value for that particular instant of local time in various different time zones, as many as you desire.

[Aside: I did not compile test that, apologies for any random syntax errors, etc, I introduced .. I did however validate the struct tm data to time_t conversion for my timezone.]

The point here is that the %s conversion isn't giving the time_t value that was used to generate T as no time_t value was used for that, just the C code written above. And since the answer depends upon what timezone you execute it in, there is no one right answer, expecting one is a mistake.
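The dependence on TZ described above can be made concrete with a small sketch. It assumes a POSIX system (setenv and tzset); mktime_in_zone is a hypothetical helper, and the TZ strings in the usage note ("UTC0" and "ICT-7", the latter meaning seven hours east of UTC) are illustrative POSIX-style values rather than anything taken from this thread. Note that the helper changes the process-wide TZ and does not restore it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: interpret the wall-clock fields in `local`
   as local time in the zone named by the POSIX TZ string `tz`,
   and return the corresponding time_t.  The struct is passed by
   value, so the caller's copy is never modified.
   Side effect: changes TZ for the whole process. */
time_t mktime_in_zone(const char *tz, struct tm local)
{
    assert(tz != NULL);
    if (setenv("TZ", tz, 1) != 0)
        return (time_t)-1;
    tzset();
    local.tm_isdst = -1;   /* let mktime decide whether DST applies */
    return mktime(&local);
}
```

Called with the fields for 2024-01-15 12:55:26, this should give 1705323326 under "UTC0" and 1705298126 (the value quoted above) under "ICT-7": same struct tm, two different and equally correct answers.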
Anyone who believes that %s (or mktime()) is required (or even just should, and the standard should be changed to allow it) return a particular value for a given struct tm needs to explain how that is to work and keep code like the above functioning correctly. And note, code just like that has always worked for mktime() (since the dark ages when mktime() was first invented, before tzcode or tm_gmtoff existed) and thus by extension for strftime("%s") which traditionally just did mktime() on a copy of the struct tm handed to strftime() and converted the result to a string (using snprintf probably).

Note that T is a local variable in func()'s stack space. With the initializer written above, C zero-initializes the members that are not named; but in the equally common style where the fields are assigned one at a time, the fields of the struct tm that are not explicitly set will contain whatever stack garbage was there before the call to func() and as we can sprinkle calls to func() throughout our code, after any other random function has just put who knows what on the stack, that stack garbage can vary from call to call. Hence, if the implementation were to (say) use the value of tm_gmtoff (which is not explicitly set above) in any way at all to compute the value returned by the %s conversion, then the result would not always be the same. But it must be, there is only one time_t value which you can pass to localtime which will generate 2024-01-15 12:55:26 in your local timezone -- except if that local time happens to be in the overlap period when summer time has just ended and local times run twice - though for that to happen would be unusual indeed, that weird hour (or however long it happens to be) typically happens in the middle of the night, and not in the middle of the day on a Monday.

| But before I convinced myself of this, whenever I used to see
| that 'date +%s' was for printing what is, for all intents and
| purposes, a raw time_t value, I imagined that's what it did:

It is what it does.
But remember that "date" is a POSIX command, and obeys the POSIX standards, the C standard does not specify any commands at all, just the language often used to write them. So date(1) knows it is in a POSIX environment (anywhere else, and what it does with a '+xxxx' operand, and how that would even be specified to it if a date command even exists, is all someone else's problem - some other standard, or some vendor's proprietary specification, or whatever) and so time_t is an integer specifying seconds since the epoch, and so that is what date +%s is guaranteed to print (and given no other args to vary it, seconds since the epoch for "now" when the command is issued). You can rely upon that.

| print the raw time_t value. So it follows that if someone is
| trying to implement all of date(1)'s '+' options using strftime,
| with the implication that strftime has to be able to do %s, it
| further follows that strftime has to be able to -- somehow --
| access that raw time_t value.

Of course it can. A time_t is a numeric value (even in C), it isn't a struct or union, or something like that, so for a particular environment there is some printf format conversion that we can hand to sprintf() to convert the value to a string. It might be necessary to cast it to a long double or something first, but it can always be done.

| But, hang on, don't jump down my throat and correct me again,
| because, I know: that's wrong.

Not really.

| is to remind myself that strftime's computation of %s is *not*
| a simple operation:

Not completely trivial no. But not complex like attempting to measure time at the quantum level, or anything like that either.

| it's a complex transformation, more or less
| exactly equivalent to mktime.

Yes, that is exactly what it is, which is why they are both specified to generate the same results.
| It's potentially lossy, I'm not sure what that means - mktime() is (unfortunately, I have been trying hard to get this changed to something rational, but the POSIX people simply refuse to understand the issues) always defined to return a value, except when the year (or in very unlikely cases year and other fields combined) has an absolute value so large that a time_t doesn't have enough bits to represent it ... which is impossible with a 64 bit POSIX time_t in the common case where "int" (which is what tm_year is) is just 32 bits. Hence strftime(%s) always is as well. | it does what you expect (if your expectation is even correct) only | if you use it very carefully, paying attention to subtle facets | of the documentation which are easy to overlook or misinterpret. | One which has been mentioned is that TZ has not changed. NO! You can change TZ however you like. If you do the result will differ - it is intended to, that's why if you run the above code fragment in your timezone you'll get a different value than I get (presumably, unless you're in UTC+0700). That's intended, and the way things are intended to work. I still think you're hung up on the notion that struct tm must always come from a call to one of the *time() functions which return such a struct (or a pointer to one) given a time_t input, and that the value obtained should be that particular time_t value. Stop believing that, that's not how it has ever worked, or is intended to work. | Another is that tzset either has or has not been called. How does that affect anything? | Yet another (which I don't think has been mentioned yet) is | that tm_isdst is set correctly. Not that either - tm_isdst should be just a hint to mktime() (and consequently to strftime(%s)) for the ambiguous cases. 
Unfortunately, the bizarre desire to use localtime() and mktime() to allow arithmetic operations on C time_t's has the POSIX people demanding that tm_isdst be an instruction, rather than the presumption that the standards have always previously said it was (a presumption which can be rebutted if it turns out to be incorrect - but to be useful for arithmetic, it needs to be mandatory, and override local conventions). Exactly how that is supposed to work still baffles me, as how can someone possibly know what offset would be applied were summer time in effect on some date right in the middle of winter in some jurisdiction which has never had any summer time at all (like where I am)? Yet if I set tm_isdst to 1 in the above fragment, they require mktime to apply a dst correction of unknown magnitude and unknown sign.

| But, yes, if you're careful of all those things, %s will work
| correctly. But will it do what you want?

That depends upon what you want. Obviously. If I want it to magically inflate my bank account, then I will probably be disappointed. If you don't happen to want what it is specified to do, you might be as well. But if you simply want it to behave as it is specified to do, then it should always work. Wanting things to do other than what they are defined to do (like wanting your car to operate as a submarine when submerged in water) is a nice fantasy, but one that only ever seems to work in movies.

| there might be bugs in your understanding of what %s does,

Of course, and you fix that by learning. Everyone starts out knowing almost nothing about almost everything, and learns over time. However, your objective should be to really learn, not guess, experiment a little and "confirm" the guess, and then proclaim your guess to be the rule. Unfortunately, that's what far too many people do, with the "experiment a little" often being "I tried it once and it worked".
And this doesn't just apply here, it applies to everything we believe we know to be true. Make sure, don't just believe because it's easier, and you're less likely to be eventually proven wrong.

| > | > (Which brings me back to my conclusion that %s
| > | > shouldn't exist, because it's impossible to implement correctly.
| >
| > Nonsense. It is trivial to implement correctly.
|
| A laughable conclusion, given the complexity of this thread!

Not at all. I have done it. It isn't hard at all. What is hard is convincing people that their long-held belief of just what must be correct (because they never happen to have observed anything different) is in fact wrong. That is hard.

| But I think you mean, the long and the short of a proper %s
| implementation is to call mktime on the struct tm handed to
| strftime, and interpolate the result.

It would have to be on a copy of the struct tm, not the actual one, as mktime() might modify it, and we don't want that for strftime - there might be other conversions still coming in the format string, and we need to use the original values for those, not ones altered by mktime(). But you only need to worry about that if you're implementing strftime().

That's certainly an easy way - but mktime() first goes about validating the ranges of all the (relevant) struct tm fields, and adjusting them (and then others to compensate) and also setting up the other fields in the struct to agree (tm_wday, tm_yday etc), which strftime() doesn't need to do - it can assume that all the fields are already within range, as its result is unspecified if the user doesn't guarantee that. So it can simply do the latter half of what mktime() does, perhaps using some private internal function which both mktime() and strftime() use, or perhaps just duplicating the code, or using a different algorithm which produces the same result. That's the implementor's choice, and users should not worry about it.
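The "call mktime on a copy" approach described above might look roughly like this. It is a minimal sketch and not the code of tzcode or any other strftime implementation: format_epoch_seconds is a hypothetical name, and time_t is assumed to be a signed integer type that fits in intmax_t.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper showing the core of a %s conversion:
   run mktime() on a private copy so that the caller's struct tm
   is left untouched for any later conversions, then print the
   result as a decimal integer.  Returns what snprintf() returns. */
int format_epoch_seconds(char *buf, size_t size, const struct tm *tm)
{
    assert(buf != NULL && tm != NULL);
    struct tm copy = *tm;        /* mktime() may normalize and fill in fields */
    time_t t = mktime(&copy);
    return snprintf(buf, size, "%jd", (intmax_t)t);
}
```

A caller that had deliberately left tm_wday at a wrong value would find it still wrong afterwards, which is the point: only the copy gets normalized.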
| (But it's like that old joke about the lecturer who, | after being questioned about whether a certain result is truly | "obvious", spends half an hour alternately deep in thought or | scribbling abstrusely on the chalkboard, before triumphantly | concluding, "Yes, I was right, it is obvious.") Yes, heard that before - and that people believe it is humorous are not understanding what it means to be "obvious" - which just means that the result is guaranteed from known facts, and cannot be different, not that the process of determining that is quick. This is another of the things where common use of a word has lost its true meaning - another is "theory" where people will say "my theory is that ..." where they mean (at best) "my hypothesis..." and far more often "my unsupported random guess..." But theory is from the same root as theorem, and means proven. Not a guess. Of course, someone might, one day, find a flaw in the proof, but until that happens, a theory should be regarded as a fact. But in the common mindset, it tends to suggest it is just a guess, as that's how people misuse the word all the time. | An implementation that perfectly implements a | useless specification isn't useful. True. But it must have been useful to someone, sometime, for them to have implemented and specified it that way. That it doesn't meet your particular need doesn't mean it isn't useful to anyone, just not useful to you. You might need something different - just don't break what other people need because it isn't what you need. | No, I know, but if the implementor of date(1) has a specification | of the format specifiers accepted by '+', it might be prudent to | vet that list against the specification of the strftime call | that's about to be used. Why? That's strftime()'s job, it is what really knows, and more specifications can be added over time, without needing to go fiddle with the internals of some command (like date) which just happens to use it. 
strftime() will return "" if there is a problem in the format string - that's why we leave the '+' in the format that date passes to strftime - that way if date gets a "" result, it knows there was an error, and can print a diagnostic. If the result starts with '+', which it must if no error occurred, as that will (for date(1)) always be the first char of the format string passed in, and only the conversions, which always start with a %, are modified, then date knows that strftime() worked, and can simply print the result (without that '+' which is not intended to appear - the user's strftime() format is what followed that '+' in date's arg list).

| And of course that's precisely how some implementations of
| mktime *do* work!

Yes, I know, I did the first of those (not my idea of how to implement it, that I was told about, but I wrote that code) - long long long ago (the actual code has been much improved over time, so I doubt you'll see any of my actual text, unless you look at some ancient archive - and there's no reason to do that).

kre
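The '+' trick described in the message above can be sketched as follows. This is an illustration only, not the source of any actual date(1): format_plus_operand is a hypothetical name, and the sketch relies on the standard behaviour that strftime() returns 0 when the result (including the terminating null) does not fit in the buffer. Because the leading '+' is copied through literally, a successful call never produces an empty result, so a return of 0 can only mean failure.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: format a date(1)-style "+FORMAT" operand.
   The '+' is deliberately left in the string given to strftime().
   Returns a pointer to the text after the '+', or NULL if
   strftime() reported failure. */
const char *format_plus_operand(char *buf, size_t size,
                                const char *operand, const struct tm *tm)
{
    assert(buf != NULL && tm != NULL);
    assert(operand != NULL && operand[0] == '+');
    if (strftime(buf, size, operand, tm) == 0)
        return NULL;             /* caller prints a diagnostic */
    return buf + 1;              /* skip the '+', which is not meant to appear */
}
```

With this arrangement a bare "+" operand yields an empty but valid string, which is distinguishable from the NULL returned on error.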
You wrote:
| But what I was convincing myself of was precisely, as you | put it, that the number generated by strftime %s: | > ...is *not* the time_t value that produced this | > struct tm - it cannot be, as no such thing need exist.
Note you need to keep the correct standards in your head, and know what each requires, and what each specifies.
You also need to be clear about whether you're thinking as a user or an implementor. Also whether you're thinking about the way things are, or ought to be. I probably haven't been doing a good job of either, in this thread. Consequently I'm afraid we're talking at cross purposes. I may be able to pen a longer reply when I have more time, but likely not to the full list, as I doubt anyone else is interested in any more of my internal dialog, or your lengthy rebuttals to your, well, misunderstandings of it. :-)
If you should happen to write up something longer, but choose not to send it to the list, please copy me. I'm not in the standards-writing deliberations, but I've been interested in C, and library implementation, and time, for a long time. This thread may be straying from the work focus of tz-list, perhaps, but I'm enjoying learning from it. -Bennett (subscribed to the list from my older bet@rahul.net address)
On 2024-01-14 14:46, Robert Elz wrote:
| > The struct tm handed to strftime must be one returned by | > an immediately preceding call to localtime or gmtime. | | This is good advice,
It actually isn't. It isn't required at all.
Although it's not required, it's good advice anyway. Calling strftime can be a tricky business (as this thread demonstrates) and following the advice means one needn't worry about many of the tricks. What I've found, when I want to format a timestamp that localtime etc. didn't generate, is that I'm typically better off not using strftime. E.g.: printf ("Etc/GMT%+d", - (gmtoff / (60 * 60))); This is better than stuffing gmtoff into a struct tm's tm_gmtoff and then using strftime followed by printf - notably because strftime can't even generate this particular format. Most applications are in a similar boat, and stick with timestamps generated by localtime etc. when calling strftime. It's exceptional to see otherwise, for good reason.
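The printf one-liner above can be wrapped into a small function to make the sign flip explicit. This is a sketch under assumptions: etc_gmt_name is a hypothetical name, gmtoff is taken to be seconds east of UTC (as tm_gmtoff is), and only whole-hour offsets are accepted, since the Etc/GMT names exist only for whole hours.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: build an "Etc/GMT+N" style name from a UTC
   offset given in seconds east of UTC.  The sign is inverted
   because the Etc/GMT names use the POSIX convention: zones east
   of Greenwich get a minus sign.  Returns what snprintf() returns. */
int etc_gmt_name(char *buf, size_t size, long gmtoff)
{
    assert(buf != NULL);
    assert(gmtoff % (60 * 60) == 0);   /* whole hours only in this sketch */
    return snprintf(buf, size, "Etc/GMT%+ld", -(gmtoff / (60 * 60)));
}
```

An offset of 3600 (one hour east) therefore comes out as "Etc/GMT-1", and -18000 (five hours west) as "Etc/GMT+5".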
A C time_t might be a count of milliseconds, of microseconds, or 2-seconds, or BCD encoded, or almost anything (though I think it is now required to be an integer type - I believe it was once allowed to be a float).
Actually in C a time_t can be floating point. It can even be an enumerated type, or bool. POSIX formerly allowed time_t to be floating-point, but that was changed in POSIX-2008 as a result of DR 327 <https://www.austingroupbugs.net/view.php?id=327>. In POSIX-2017, time_t can still be bool (!) but in POSIX 202x/D4 the constraints on time_t have changed again: now it must be an integer type that is at least 64 bits wide. (Why 64 bits? Surely 60 bits would have been enough for real-world timestamps....)
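Code that wants to know what kind of time_t it has been given can probe it at run time. The following is a rough sketch with hypothetical function names; it is not taken from tzcode. One caveat, relevant to the bool case mentioned above: converting 0.5 to bool yields 1, so the truncation test below would report a bool time_t as "not integer-like".

```c
#include <assert.h>
#include <limits.h>
#include <time.h>

/* Nonzero if converting 0.5 to time_t truncates to zero, as it does
   for ordinary integer types; a floating-point time_t keeps 0.5. */
int time_t_truncates(void)
{
    return (time_t)0.5 == 0;
}

/* Nonzero if time_t can represent negative values. */
int time_t_is_signed(void)
{
    return (time_t)-1 < 0;
}

/* Storage size in bits.  This counts any padding bits too, so it is
   an upper bound on the width, not the width itself. */
size_t time_t_storage_bits(void)
{
    size_t bits = sizeof(time_t) * CHAR_BIT;
    assert(bits > 0);
    return bits;
}
```

On a typical current POSIX system one would expect a truncating, signed time_t with 64 bits of storage, but none of that is guaranteed by the C standard alone.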
On 2024-01-15 18:34, Paul Eggert via tz wrote:
Actually in C a time_t can be floating point. It can even be an enumerated type, or bool.
C 2023 CD has not changed the type specification, and SAS C on IBM mainframes originally used (hex) double with IBM ToD clock epoch 1900-01-01, so they could just tweak the clock value and also get microseconds, then changed to SAS integer date epoch 1960-01-01, before changing to conform to POSIX 1970-01-01. [SAS data still uses JD Julian Date/time, and also integer date with epoch 1960-01-01, in supported range 1582-11-01 to 9999-12-31.]
POSIX formerly allowed time_t to be floating-point, but that was changed in POSIX-2008 as a result of DR 327 <https://www.austingroupbugs.net/view.php?id=327>. In POSIX-2017, time_t can still be bool (!) but in POSIX 202x/D4 the constraints on time_t have changed again: now it must be an integer type that is at least 64 bits wide. (Why 64 bits? Surely 60 bits would have been enough for real-world timestamps....)
In decimal integers, that only supports up to: $ date -d@999999999999999 +%Y 31690708 perhaps they wanted to ensure they could support up to: $ date -d@9999999999999999 +%Y 316889355 ;^p [I use GNU date in scripts as a convenient strftime+ interface to get numeric values for epochs and convert GPS, NTP, Windows time stamps to locale display, the others, Unix, and M/JD: should add IBM and VMS just for historical completeness.] -- Take care. Thanks, Brian Inglis Calgary, Alberta, Canada La perfection est atteinte Perfection is achieved non pas lorsqu'il n'y a plus rien à ajouter not when there is no more to add mais lorsqu'il n'y a plus rien à retirer but when there is no more to cut -- Antoine de Saint-Exupéry
On 2024-01-21 20:36, Brian.Inglis--- via tz wrote:
(Why 64 bits? Surely 60 bits would have been enough for real-world timestamps....)
In decimal integers, that only supports up to:
$ date -d@999999999999999 +%Y 31690708
perhaps they wanted to ensure they could support up to:
$ date -d@9999999999999999 +%Y 316889355
;^p
I'm not sure where those decimal integers came from. A 60-bit time_t, if signed, would support a time_t range of about -5.8e+17 .. 5.8e+17 seconds. The universe is about 13.8e+9 years or about 4.4e+17 seconds old, so a 60-bit signed time_t would cover the known universe's history so far, with a goodly amount of room for the future. Perhaps the POSIX standardizers were thinking that 18 billion years of future timestamps aren't enough, and that some apps need support for at least 292 billion years into the future. But what applications were they thinking of? Also on typical platforms where int is 32 bits, localtime stops working for time_t values greater than around 6.8e+16, so even 60-bit time_t is overkill for today's platforms. Also, the earth's rotation will become incompatible with POSIX long before 60-bit time_t rolls over....
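The back-of-the-envelope figures above can be reproduced with a few lines of arithmetic. This sketch uses hypothetical helper names and assumes an average Gregorian year of 31556952 seconds; results are approximate doubles, which is all the argument needs.

```c
#include <assert.h>
#include <limits.h>

#define SECONDS_PER_GREGORIAN_YEAR 31556952.0   /* 365.2425 days */

/* Approximate largest value of a signed integer with the given
   width in bits, i.e. 2^(width-1), as a double. */
double signed_max_seconds(int width_bits)
{
    assert(width_bits >= 2);
    double v = 1.0;
    for (int i = 1; i < width_bits; i++)
        v *= 2.0;
    return v;
}

/* Convert a count of seconds to average Gregorian years. */
double seconds_to_years(double seconds)
{
    return seconds / SECONDS_PER_GREGORIAN_YEAR;
}

/* Roughly where localtime() must give up because tm_year, an int
   counting years since 1900, can no longer hold the year. */
double localtime_limit_seconds(void)
{
    return ((double)INT_MAX + 1900.0 - 1970.0) * SECONDS_PER_GREGORIAN_YEAR;
}
```

With these, a 60-bit signed time_t tops out near 5.8e+17 seconds (about 18 billion years), a 64-bit one near 292 billion years, and with a 32-bit int the tm_year limit lands near 6.8e+16 seconds, matching the numbers in the message.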
On 2024-01-21 22:45, Paul Eggert wrote:
On 2024-01-21 20:36, Brian.Inglis--- via tz wrote:
(Why 64 bits? Surely 60 bits would have been enough for real-world timestamps....)
In decimal integers, that only supports up to:
$ date -d@999999999999999 +%Y 31690708
perhaps they wanted to ensure they could support up to:
$ date -d@9999999999999999 +%Y 316889355
;^p
I'm not sure where those decimal integers came from. A 60-bit time_t, if signed, would support a time_t range of about -5.8e+17 .. 5.8e+17 seconds. The universe is about 13.8e+9 years or about 4.4e+17 seconds old, so a 60-bit signed time_t would cover the known universe's history so far, with a goodly amount of room for the future.
Does the specification prohibit defining time_t using decimal types supported on legacy mainframes? ;^>
Perhaps the POSIX standardizers were thinking that 18 billion years of future timestamps aren't enough, and that some apps need support for at least 292 billion years into the future. But what applications were they thinking of?
Also on typical platforms where int is 32 bits, localtime stops working for time_t values greater than around 6.8e+16, so even 60-bit time_t is overkill for today's platforms.
Also, the earth's rotation will become incompatible with POSIX long before 60-bit time_t rolls over....
AI will predict those dates, so they do not have to change the standard. ;^> -- Take care. Thanks, Brian Inglis Calgary, Alberta, Canada La perfection est atteinte Perfection is achieved non pas lorsqu'il n'y a plus rien à ajouter not when there is no more to add mais lorsqu'il n'y a plus rien à retirer but when there is no more to cut -- Antoine de Saint-Exupéry
On 2024-01-21 23:50, Brian.Inglis--- via tz wrote:
Does the specification prohibit defining time_t using decimal types supported on legacy mainframes? ;^>
Yes, I think draft POSIX 202x/D4 does that, as it uses a width terminology that means a binary representation, not a BCD or other decimal representation. One bit of wiggle room is that draft POSIX doesn't require two's complement; ones' complement and signed magnitude are also allowed. Even this wiggle room will go away eventually, as C2x will require two's complement and POSIX surely will do so as well in the next major release after 202x.
<<On Sun, 21 Jan 2024 21:45:41 -0800, Paul Eggert via tz <tz@iana.org> said:
Also on typical platforms where int is 32 bits, localtime stops working for time_t values greater than around 6.8e+16, so even 60-bit time_t is overkill for today's platforms.
32 bits being clearly inadequate, what is the next available integer type of greater width, that every implementation is required to provide? In POSIX, that's int64_t. (I don't think given the constraints on intN_t types you could have an int53_t although that's clearly practical to implement if you have support for denormalized IEEE doubles.) -GAWollman
On 1/22/24 12:49, Garrett Wollman wrote:
32 bits being clearly inadequate, what is the next available integer type of greater width, that every implementation is required to provide? In POSIX, that's int64_t.
POSIX doesn't require int64_t. This is true even of POSIX 202x/D4. And even if POSIX required int64_t, it wouldn't need to require time_t to be 64 bits. It could allow 60-bit time_t on implementations that have 60-bit integer types, much as it already allows 60-bit implementations of types like off_t and size_t. This is all hypothetical of course. As far as I know, 60-bit commercial hardware hasn't been produced since CDC Cyber 180 series of the 1980s, 60-bit machines never conformed to POSIX, the only remaining examples are in museums, and as far as I know none of them are actually running now (though simulators are available). If you're interested in computer history, here's an example 60-bitter built in 1988: https://cray-cyber.org/systems/cdc-cyber-180-960/ This machine could run in both 60- and 64-bit mode. The 60-bit mode was for backward compatibility with the CDC 6600 (1964), the first successful supercomputer.
On Mon, Jan 22, 2024 at 4:24 PM Paul Eggert via tz <tz@iana.org> wrote:
POSIX doesn't require int64_t. This is true even of POSIX 202x/D4.
And even if POSIX required int64_t, it wouldn't need to require time_t to be 64 bits. It could allow 60-bit time_t on implementations that have 60-bit integer types, much as it already allows 60-bit implementations of types like off_t and size_t.
This is all hypothetical of course. ...
Forgive me if I am getting standards / versions mixed up, but doesn't POSIX require CHAR_BIT==8? And recent POSIX says time_t is an integral type (compared to an arithmetic type)? Wouldn't this mean that time_t has to at least be a multiple of 8-bit in size now? And for historical purposes, weird non-POSIX platforms with C compilers have long life in the DSP and micro-controller world. I worked on a system into the early 2000's that had CHAR_BIT==32, and sizeof() all types was 1 as a result (along with 40-bit non-IEEE floating point). It also wouldn't surprise me if there is a VAX somewhere that won't die, and someone still is dealing with the MJD epoch and 100ns ticks. --Matthew Donadio (matt@mxd120.com)
On 2024-01-22 15:01, Matthew Donadio via tz wrote:
doesn't POSIX require CHAR_BIT==8? And recent POSIX says time_t is an integral type (compared to an arithmetic type)? Wouldn't this mean that time_t has to at least be a multiple of 8-bit in size now?
No, because integers can have padding bits that do not contribute to their values (or can cause the values to be invalid).
And for historical purposes, weird non-POSIX platforms with C compilers have long life in the DSP and micro-controller world. I worked on a system into the early 2000's that had CHAR_BIT==32, and sizeof() all types was 1 as a result (along with 40-bit non-IEEE floating point).
Sounds like the SHARC processor. All integer types are 32 bits. (The hardware supports 40-bit integers but you shouldn't try to use them from C.) Not a likely target for POSIX or for tzcode.
wouldn't surprise me if there is a VAX somewhere that won't die, and someone still is dealing with the MJD epoch and 100ns ticks.
My own practice is to not worry about porting current software to computers used only in museums. Life is too short. People with museum hardware should run museum software.
On 1/22/24 14:24:41, Paul Eggert via tz wrote:
POSIX doesn't require int64_t. This is true even of POSIX 202x/D4.
And even if POSIX required int64_t, it wouldn't need to require time_t to be 64 bits. It could allow 60-bit time_t on implementations that have 60-bit integer types, much as it already allows 60-bit implementations of types like off_t and size_t. .
Is there a format specification for printing such a type or must it first be converted to long long by either cast or arithmetic? I'm uneasy with the idea of an integral type that can't be printed. Is the format portable? And what does time_t[] look like? Might there be slack bits between members of the array? But this answers my long confusion about why localtime() requires a reference, time_t*, rather than merely a value. There may be no way to pass such an (opaque?) type on the stack. -- gil
<<On Mon, 22 Jan 2024 17:00:26 -0700, Paul Gilmartin via tz <tz@iana.org> said:
But this answers my long confusion about why localtime() requires a reference, time_t*, rather than merely a value. There may be no way to pass such an (opaque?) type on the stack.
time(), localtime(), and similar routines take a pointer because they predate the addition of `long int` to the C language: in the original implementation, the parameter was an array[2] of int, and of course you can't pass arrays by value in C. The arguments were left as pointers in V7 for binary compatibility with code that had not been converted to use `long`.[1] In contrast, interfaces added later, like ANSI C's mktime() and difftime(), use values instead. -GAWollman [1] Obviously this long predates ISO C's abstract machine model where there is a difference between types "pointer to array[2] of int" and "pointer to long", and of course `int` at the time, on a PDP-11, was 16 bits wide.
On 2024-01-22 16:00, Paul Gilmartin via tz wrote:
Is there a format specification for printing such a type or must it first be converted to long long by either cast or arithmetic? I'm uneasy with the idea of an integral type that can't be printed.
A similar problem occurs for lots of other POSIX types, such as off_t. For off_t one way to work around the problem is to convert to intmax_t and format with %jd. For time_t it's a bit more complicated as it might or might not be signed, but one can use %jd and intmax_t if time_t is signed, %ju and uintmax_t otherwise.
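The workaround described above can be sketched as follows. This is only an illustration: the helper name format_time_t is hypothetical, and the sketch assumes time_t is an integer type (as POSIX requires; ISO C alone would also permit a floating type).

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: format a time_t without assuming its width or
   signedness.  Returns whatever snprintf returns.  Assumes time_t is
   an integer type.  */
static int
format_time_t (char *buf, size_t size, time_t t)
{
  assert (buf != NULL && size > 0);
  if ((time_t) -1 < 0)
    /* time_t is signed: widen to intmax_t and use %jd.  */
    return snprintf (buf, size, "%jd", (intmax_t) t);
  else
    /* time_t is unsigned: widen to uintmax_t and use %ju.  */
    return snprintf (buf, size, "%ju", (uintmax_t) t);
}
```

A caller would pass a buffer of a few dozen bytes, which is enough for any intmax_t value on current platforms.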
And what does time_t[] look like? Might there be slack bits between members of the array?
Sure, just like most types. Neither POSIX nor C prohibits padding bits in 'time_t' and similar types. The only types guaranteed to be free of padding bits are signed char, unsigned char, and (if they exist) the intN_t and uintN_t types. In the C standard the intN_t and uintN_t types are optional; POSIX requires them only for N equal to 8, 16, and 32. This stuff is relevant only for unusual machines like the Unisys ClearPath Dorado (36-bit one's complement but 'unsigned' doesn't always work right) and the Unisys ClearPath Libra (40-bit unsigned int and 41-bit signed-magnitude int). You can still buy these two platforms; the hardware, if memory serves, contains Intel Xeons with special microcode. Although I hope tzcode would run on these legacy platforms, it's never been tested to my knowledge and my guess is that there would be porting bugs. C2x will require two's complement so the C committee has stopped worrying about these two dinosaurs.
On Jan 22, 2024, at 10:58 PM, Paul Eggert via tz <tz@iana.org> wrote:
This stuff is relevant only for unusual machines like the Unisys ClearPath Dorado (36-bit one's complement but 'unsigned' doesn't always work right) and the Unisys ClearPath Libra (40-bit unsigned int and 41-bit signed-magnitude int). You can still buy these two platforms; the hardware, if memory serves, contains Intel Xeons with special microcode.
At least for the Dorado (the line that began with the Univac 1107), I think it's more like "with an LLVM-based binary-to-binary translator": https://discourse.llvm.org/t/llvm-job-opportunity-at-unisys/19830 They may have done the same for the Libra (the line that began with the Burroughs 6500), although it would need to handle the tag bits (they probably stuff the 48+3-tag-bit words in 64 bits).
Paul Eggert via tz said:
The only types guaranteed to be free of padding bits are signed char, unsigned char,
char,
and (if they exist) the intN_t and uintN_t types. In the C standard the intN_t and uintN_t types are optional; POSIX requires them only for N equal to 8, 16, and 32.
C requires them for N equal to 8, 16, 32, and 64 if the implementation has a type with the required properties. So they would be required on things like x86 and ARM architectures. -- Clive D.W. Feather | If you lie to the compiler, Email: clive@davros.org | it will get its revenge. Web: http://www.davros.org | - Henry Spencer Mobile: +44 7973 377646
On Jan 23, 2024, at 4:55 AM, Clive D.W. Feather via tz <tz@iana.org> wrote:
Paul Eggert via tz said:
and (if they exist) the intN_t and uintN_t types. In the C standard the intN_t and uintN_t types are optional; POSIX requires them only for N equal to 8, 16, and 32.
C requires them for N equal to 8, 16, 32, and 64 if the implementation has a type with the required properties. So they would be required on things like x86 and ARM architectures.
At least as I read 5.2.4.2.1 "Sizes of integer types <limits.h>":
The values given below shall be replaced by constant expressions suitable for use in #if preprocessing directives. Moreover, except for CHAR_BIT and MB_LEN_MAX, the following shall be replaced by expressions that have the same type as would an expression that is an object of the corresponding type converted according to the integer promotions. Their implementation-defined values shall be equal or greater in magnitude (absolute value) to those shown, with the same sign.
...
— minimum value for an object of type long long int LLONG_MIN -9223372036854775807 // −(2^63−1)
— maximum value for an object of type long long int LLONG_MAX +9223372036854775807 // 2^63−1
— maximum value for an object of type unsigned long long int ULLONG_MAX 18446744073709551615 // 2^64−1
and 6.2.5 "Types" in C11:
...
There are five standard signed integer types, designated as signed char, short int, int, long int, and long long int. (These and other types may be designated in several additional ways, as described in 6.7.2.) There may also be implementation-defined extended signed integer types. The standard and extended signed integer types are collectively called signed integer types.
...
For each of the signed integer types, there is a corresponding (but different) unsigned integer type (designated with the keyword unsigned) that uses the same amount of storage (including sign information) and has the same alignment requirements. The type _Bool and the unsigned integer types that correspond to the standard signed integer types are the standard unsigned integer types. The unsigned integer types that correspond to the extended signed integer types are the extended unsigned integer types. The standard and extended unsigned integer types are collectively called unsigned integer types.
that C11 requires the existence of long long int data types that support, at minimum, 64-bit integral values, *even if the machine's "word size" is smaller*, so even 32-bit x86 (IA-32) and 32-bit ARM (A32/T32) need to support 64-bit integral values - and the same applies to PowerPC/Power ISA, SPARCv{7,8,9}, 32-bit and 64-bit RISC-V, and z/Architecture (and any now-dead ISAs that existed in 2011).
As for the intN_t/uintN_t types, as I read 7.20.1.1 "Exact-width integer types":
These types are optional. However, if an implementation provides integer types with widths of 8, 16, 32, or 64 bits, no padding bits, and (for the signed types) that have a two’s complement representation, it shall define the corresponding typedef names.
C for the Univac/Unisys 36-bit systems and the Burroughs/Unisys 48+3-tag-bits systems need not provide the intN_t/uintN_t types, although power-of-2 word-size two's complement machines would.
And as I read 7.20.1.2 "Minimum-width integer types":
The following types are required:
int_least8_t uint_least8_t int_least16_t uint_least16_t int_least32_t uint_least32_t int_least64_t uint_least64_t
all of them would have to provide those types, even if, for example, the least64_t types are 72-bit or 96-bit-with-two-3-bit-tags-for-each-48-bits on the Unisys machines.
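The distinction drawn above between the required least-width types and the optional exact-width types can be probed at compile time. The following is a minimal sketch, assuming a C11 compiler; the function names are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* int_least64_t is required, so this check should hold on every
   conforming C11 implementation, whatever the word size.  */
static_assert (INT_LEAST64_MAX >= INT64_C (9223372036854775807),
               "int_least64_t must hold at least 64-bit values");

/* Storage size of int_least64_t in bits.  This may include padding
   bits, so it can exceed 64 on unusual machines.  */
static int
least64_storage_bits (void)
{
  return (int) (sizeof (int_least64_t) * CHAR_BIT);
}

/* int64_t is optional.  <stdint.h> defines INT64_MAX only when the
   exact-width type exists, so the macro serves as a feature test.  */
static int
has_exact_int64 (void)
{
#ifdef INT64_MAX
  return 1;
#else
  return 0;
#endif
}
```

On mainstream two's complement platforms has_exact_int64 is expected to return 1; the 36-bit and 48-bit machines discussed above are where it could legitimately return 0.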
C for the Univac/Unisys 36-bit systems and the Burroughs/Unisys 48+3-tag-bits systems need not provide the intN_t/uintN_t types, although power-of-2 word-size two's complement machines would.
I'm wondering if an implementation is required to support all the features of the hardware it is meant for. (If not, implementations on power-of-2 word-size two's complement machines might not have some of the intN_t types.) --ado
On 1/23/24 13:29, Arthur David Olson via tz wrote:
I'm wondering if an implementation is required to support all the features of the hardware it is meant for. (If not, implementations on power-of-2 word-size two's complement machines might not have some of the intN_t types.)
C11 7.20.1.1 says "if an implementation provides integer types with widths of 8, 16, 32, or 64 bits, no padding bits, and (for the signed types) that have a two’s complement representation, it shall define the corresponding typedef names." There are similar requirements in later C versions, and in C2x (which requires two's complement) this has been strengthened to "If an implementation provides standard or extended integer types with a particular width and no padding bits, it shall define the corresponding typedef names." So the implementations you're thinking of must support the usual intN_t types.
Paul wrote:
On 2024-01-14 06:20, Steve Summit wrote:
strftime %s really does have no choice but to do a mktime() based on barely-adequate information -- and part of that information is, alas, the global TZ environment variable.
Although that's one interpretation of the standard, it's not the only one. As I've been saying, although the POSIX and C standards can easily be misinterpreted, they have a better interpretation which says that on a system with tm_gmtoff and tm_zone strftime need not use mktime or equivalent, not even for %s.
I need to go back and read some of the other messages in this thread, and in particular your arguments in favor of this "other interpretation". Here's the use case I'm worried about. Suppose someone wants to write some code to take a human-readable local date/time, and convert it back to a time_t value. Once upon a time there was one way to do that: stuff the human-readable components into a struct tm, and call mktime(). But now there's an alternative: take the same struct tm, and hand it to strftime with a format of %s. (That's a strange, alternative way of doing it, that I at least would never think of, but once strftime %s exists, I can't say that it's wrong.) In either case, however, the programmer probably isn't going to want to, may not be able to, might well have no idea how to, compute and fill in the tm_gmtoff field. (Asking the caller to fill that in for this case -- much less the hypothetical tm_time_t field I was suggesting earlier -- is basically "begging the question".) So I (somewhat reluctantly) come to the same conclusion Robert Elz has, namely that mktime can't look at tm_gmtoff, and that strftime %s can't, either.
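The two routes described above can be put side by side. This is a sketch only: the helper names are hypothetical, %s is an extension that not every strftime supports, and whether the two routes agree depends on whether the implementation's %s consults tm_gmtoff (left as zero here) or recomputes the offset from TZ the way mktime does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: build a handcrafted broken-down local time.
   Nonstandard members such as tm_gmtoff and tm_zone, where they exist,
   are simply zeroed by memset; the caller never computes them.  */
static struct tm
handcrafted (int year, int mon, int mday, int hour, int min, int sec)
{
  struct tm tm;
  memset (&tm, 0, sizeof tm);
  tm.tm_year = year - 1900;
  tm.tm_mon = mon - 1;
  tm.tm_mday = mday;
  tm.tm_hour = hour;
  tm.tm_min = min;
  tm.tm_sec = sec;
  tm.tm_isdst = -1;		/* let the library decide about DST */
  return tm;
}

/* Route 1: the traditional mktime call.  */
static long long
via_mktime (struct tm tm)
{
  return (long long) mktime (&tm);
}

/* Route 2: strftime with the nonstandard %s conversion.
   Returns -1 if nothing was formatted.  */
static long long
via_strftime_s (struct tm tm)
{
  char buf[32];
  if (strftime (buf, sizeof buf, "%s", &tm) == 0)
    return -1;
  return strtoll (buf, NULL, 10);
}
```

Comparing via_mktime (tm) with via_strftime_s (tm) for the same handcrafted value shows directly whether a given library treats the two routes as equivalent.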
callers *are* allowed to pass handcrafted struct tm values to strftime, and implementors are obliged to make this work
Yes, but the standards give leeway as to how to "make this work" for %z and %Z, and this leeway includes using members like tm_gmtoff and tm_zone that the C standard does not specify.
Ugh. Yes, %z and %Z are nasty, too. (It took me years to get around to adding them to the strftime equivalent in my own, homebrew date/time library, and now I'm going to have to go back and see how baldly I might have assumed that the incoming broken-down time struct wasn't hand-constructed.)
(Which brings me back to my conclusion that %s shouldn't exist, because it's impossible to implement correctly.
It's impossible only if one uses a too-strict interpretation of the standards.
I was wrong when I said "it's impossible". For one thing, I had forgotten about TZ, which of course *does* allow %s to be computed, reasonably correctly, even without tm_gmtoff. (For another, I was using "correctly" in a broader sense, trying to include certain additionally woolly-headed quasirequirements that I keep wanting to impose on %s, even though Robert Elz has been doggedly trying to remind me that they're no requirements at all.)
Steve Summit via tz said:
But now there's an alternative: take the same struct tm, and hand it to strftime with a format of %s. (That's a strange, alternative way of doing it, that I at least would never think of, but once strftime %s exists, I can't say that it's wrong.)
Speaking of strange, alternative ways of processing time ... On the few days either side of the 1999-2000 transition, I was site lead for one of the control centres of a major telephone and Internet service provider. More precisely, I was lead 08:00 to 20:00 daily and reserve lead between 20:00 and 08:00, meaning I could go back to the hotel so long as I answered any phone calls. But on The Night I decided to stay there to see what happened and just in case more hands were needed. At about 00:05 we got the only report of a Y2k bug to hit the entire company: a widget that customers could put on their web site to show the date and time was displaying "01 01 19100" instead of "01 01 2000". I volunteered to fix this and dug into the web site code to find the widget and see what was going on. My guess had been that someone had written sprintf with format "19%d" and argument tm_year instead of format "%d" and argument tm_year+1900. No. I have mercifully forgotten the exact code, but it was along the lines of: sprintf (buffer, "19%s%s", int_to_str (tm_year / 10), int_to_str (tm_year % 10)); -- Clive D.W. Feather | If you lie to the compiler, Email: clive@davros.org | it will get its revenge. Web: http://www.davros.org | - Henry Spencer Mobile: +44 7973 377646
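The arithmetic behind "19100" is easy to reproduce. The sketch below is a reconstruction for illustration, not the original widget code: the anecdote's int_to_str helper is replaced by plain %d conversions, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Buggy approach from the anecdote: hard-code "19", then print the
   tens and the units of tm_year separately.  Once tm_year reaches 100
   the "tens" part is 10, which prints as two digits.  */
static void
year_buggy (char *buf, size_t size, int tm_year)
{
  snprintf (buf, size, "19%d%d", tm_year / 10, tm_year % 10);
}

/* Conventional approach: tm_year counts years since 1900.  */
static void
year_fixed (char *buf, size_t size, int tm_year)
{
  snprintf (buf, size, "%d", tm_year + 1900);
}
```

With tm_year == 99 both functions produce "1999"; with tm_year == 100 the buggy one produces "19100" (that is, "19", "10", "0") while the fixed one produces "2000".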
On 2024-01-15 09:10, Clive D.W. Feather via tz wrote:
I have mercifully forgotten the exact code, but it was along the lines of:
sprintf (buffer, "19%s%s", int_to_str (tm_year / 10), int_to_str (tm_year % 10));
Yeowch! Contrast to 7th Edition Unix, which despite being written in the 1970s had few Y2K issues (though of course it had a huge Y2038 issue!). I just did a quick scan of 7th Edition source code and found only "at", "date", and "troff" assuming two-digit years. (Of course I may have missed some bugs.) My guess is that 7th Edition Unix was largely Y2k-safe because much of its source code predated localtime, and simply called ctime and grabbed text from the result. ctime's simple API, which always supplied a full year, made Y2k bugs less likely. Conversely, that same API had serious problems once time_t grew enough to support years before 1000 or past 9999, which is partly why it's now obsolete. PS. 7th edition's implementation of asctime did have this amusing bit of code: if (t->tm_year>=100) { cp[1] = '2'; cp[2] = '0'; } and this was not a bug! Before it's executed cp[1]=='1' and cp[2]=='9', and on the 32-bit time_t platforms that 7th Edition ran on, "19" and "20" were the only possibilities for century digits.
You might want to have a look at __mktime_internal (). I discovered some time ago that mktime() did not return correct time_t values in some cases. For example:
* America/New_York, 1945-08-14 19:00:00, where the only difference is the Abbr (tm_zone), from EWT to EPT
* Africa/Johannesburg, 1944-03-19 01:00:00, where the Abbr (tm_zone) is the same, SAST to SAST
__mktime_internal () was missing points near these transitions (and others) and so returning incorrect results. This was causing much confusion in my testing. The problem was that __mktime_internal () was comparing only isdst where it also needed to compare Abbr, as explained more in comments in the attached modified code. Attached is the __mktime_internal () code with my suggested modifications extracted from glibc-2.38\time\mktime.c as downloaded from https://mirrors.ibiblio.org/gnu/libc/glibc-2.38.tar.xz See the two code blocks commented // Modified by Brooks Harris Since making these changes I have not seen mktime() make any errors at many thousands of test points where localtime() was populating struct tm. I hope this might be helpful. -Brooks
On Jan 15, 2024, at 11:38 AM, Brooks Harris via tz <tz@iana.org> wrote:
The problem was that __mktime_internal () was comparing only isdst where it also needed to compare Abbr, as explained more in comments in the attached modified code.
Presumably you do not mean that it must compare the value of tm_zone, as there is no guarantee whatsoever that the caller of mktime() has set tm_zone.
On 1/15/2024 4:43 PM, Guy Harris wrote:
On Jan 15, 2024, at 11:38 AM, Brooks Harris via tz <tz@iana.org> wrote:
The problem was that __mktime_internal () was comparing only isdst where it also needed to compare Abbr, as explained more in comments in the attached modified code.
Presumably you do not mean that it must compare the value of tm_zone, as there is no guarantee whatsoever that the caller of mktime() has set tm_zone.
The modifications I made do indeed presume tm_zone is set correctly.
The use case I was addressing was where localtime() was populating struct tm so tm_zone is set. And that's really the only use I've put it to and wouldn't really trust using it in other uses that have not set tm_zone or tm_isdst and such. In the cases I've described I think the modifications improve the results. But I'm learning from this email thread others do expect to use it in other ways and that the specifications allow that. This was not considered. I only looked to fix the fact that mktime() did not always return the time_t given to localtime().
On 2024-01-12 00:02, Robert Elz wrote:
| If I understand things correctly, it's impossible for an implementation | to conform to both POSIX-2017 and draft next POSIX in this area.
What do you believe the problem to be?
On reflection I spoke too quickly and these appear to be awkward wordings and misinterpretations rather than conformance issues. Still, the awkwardnesses deserve discussion.
POSIX 202x/D3 and the latest C2x draft (n3096) can both plausibly (but in my view incorrectly) be interpreted to say that strftime cannot look at tm_gmtoff or tm_zone, which is obviously wrong for tzcode, as well as for glibc, FreeBSD, and I assume other strftime implementations. To work around this problem, it'd be helpful if draft POSIX (at least) were fixed to clearly state the interpretation I proposed earlier today <https://mm.icann.org/pipermail/tz/2024-January/033524.html>.
It'd also be helpful if draft POSIX were changed to state the following:
* For %c and %X, strftime can look at tm_gmtoff and tm_zone.
* For %s, strftime can look at tm_gmtoff.
* For %Z, strftime can look at tm_zone.
* For %z, strftime can look at tm_gmtoff and tm_zone. (tm_zone is useful when tm_gmtoff == 0, since it helps distinguish +0000 from -0000.)
This would better reflect existing and reasonable practice.
Less importantly, POSIX and C both say that strftime can look at struct tm members that aren't needed once you have tm_gmtoff and tm_zone (and in some cases aren't needed even if you have those two new members). POSIX should make it clear that the implementation need not look at these members.
* For %c, %s, %Z and %z, strftime need not look at tm_isdst.
* For %U, strftime need not look at tm_year.
* For %X, strftime need not look at tm_year, tm_yday, tm_mon, tm_mday, tm_wday, tm_isdst.
* For %x, strftime need not look at tm_hour, tm_min, tm_sec, tm_gmtoff, tm_zone, tm_isdst.
To try to document all this better from tzcode's point of view, I installed the attached proposed patch to tzcode's strftime man page.
On 2024-01-10 11:20, Paul Eggert wrote:
This is a tricky area, as the C standard and POSIX both require strftime to look only at tm_isdst when formatting %z and %Z.
In rereading the C standard and POSIX, I see I was too hasty here. For strftime the standards actually use wording like the following[1]:
Each conversion specifier is replaced by appropriate characters as described in the following list. The appropriate characters are determined using the LC_TIME category of the current locale and by the values of zero or more members of the broken-down time structure pointed to by timeptr, as specified in brackets in the description.
This wording doesn't specifically say that strftime must ignore all information other than the bracketed members and the LC_TIME category; it merely calls out information that strftime can use. In general strftime cannot ignore all the other information, as in general strftime must look at the TZ setting and the LC_CTYPE category. So the standards' wording (admittedly confusingly) does allow strftime to pay attention to information outside the bracketed list.
Also, my earlier (hasty) reading, which prohibited strftime from inspecting any bracketed members other than those listed, is incompatible with common practice and with POSIX 202x/D3. For example, for %z POSIX 202x/D3 lists "[tm_isdst, tm_gmtoff]" whereas C17 lists only "[tm_isdst]". My earlier (hasty) reading would have C17 prohibiting the use of any member other than tm_isdst to process %z, but this is not the intent of POSIX 202x/D3 and it's not how strftime implementations typically behave: they just consult tm_gmtoff to determine %z.
With this in mind, your proposed patch looks like a good approach. It's much simpler than other approaches mentioned in this thread that would add members to struct tm and therefore would be a major pain, or that would add new functions strftime_z and strftime_lz which would also cause significant pain.
However, your proposed patch has a glitch: in extreme cases its "mkt -= t->TM_GMTOFF" could suffer from integer overflow, which means the behavior could be undefined for cases that should simply fail, and also some valid inputs could mistakenly fail. To fix this glitch we can use timeoff rather than timegm. I installed the attached to implement your patch's idea, but using timeoff rather than timegm. Most of the patch is plumbing that makes timeoff visible to strftime even if timeoff is not otherwise extern. Please give this patch a try.
[1]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/strftime.html#tag...
Date: Sat, 13 Jan 2024 15:04:31 -0800 From: Paul Eggert via tz <tz@iana.org> Message-ID: <cb3042f9-0698-431a-bc9a-542c8aa6d6ca@cs.ucla.edu>
| This wording doesn't specifically say that strftime must ignore all
| information other than the bracketed members and the LC_TIME category;
Not all information, certainly - but unless you have some way to guarantee that the application has set the unspecified fields of the tm struct, referencing any of them may be referencing uninit'd data, and that is undefined behaviour (processor is permitted to execute the hcf instruction).
| it merely calls out information that strftime can use.
More importantly it specifies the fields the application must set, directly or indirectly (as by a call to localtime).
| In general strftime cannot ignore all the other information,
| as in general strftime must look at the TZ setting
Only for %s, and that is specified by reference to mktime() which is where the TZ requirement appears (via tzset()). If there is no use of %s strftime() should not go near TZ.
| and the LC_CTYPE category.
Yes. That one is explicit.
| So the standards' wording (admittedly confusingly) does allow
| strftime to pay attention to information outside the bracketed list.
Implementations are generally permitted to operate however they like. If you have some way to guarantee that referencing any uninit'd fields won't have consequences you don't want to have happen, then go ahead and do it. Generally we assume architectures that don't trap (or worse) on references to uninit'd memory, but simply return garbage, so if that's good enough, then fine, go ahead.
| for %z POSIX 202x/D3 lists "[tm_isdst, tm_gmtoff]" whereas C17 lists
| only "[tm_isdst]".
Yes, if you're being strictly C conforming, then you cannot access tm_gmtoff, as C doesn't demand that field exist, so conforming portable C applications cannot be expected to have set it. POSIX does require it (will require it when this new standard appears) and so applications written to conform to that standard will know to set that field. Older applications will not, so accessing it for them is risky.
| My earlier (hasty) reading would have C17 prohibiting
| the use of any member other than tm_isdst to process %z,
Yes. It (effectively) does.
| With this in mind, your proposed patch looks like a good approach.
Approach for what? I'm not sure what the proposed patch is, or what it is intending to fix - as best I can tell, the tzcode strftime() doesn't really need any changes (unless we want to still attempt to support applications running on systems without tm_gmtoff - which has been very few for a long time now).
| much simpler than other approaches mentioned in this thread that would
| add members to struct tm and therefore would be a major pain,
In practice, it is impossible. struct tm exists in binary form in thousands of places that can't be updated. If it weren't for the fact that tm_gmtoff and tm_zone already exist in essentially every implementation, and have done for years, there's no way they could have been added to the standard (and is why it took so long). Any future struct tm changes are going to need an entire new structure, and also a whole new set of accessor functions (i.e. something to replace localtime() mktime() strftime() ...) and ideally would be defined in a way which doesn't allow applications direct access to its members, or to the size of the struct, but only via the accessor functions (which would then need some form of "set" and "get" functions to allow access to the fields).
| or that would add new functions strftime_z and strftime_lz which
| would also cause significant pain.
No, adding new functions is easy, provided name clashes can be avoided (which probably means only exposing the names if the application, one way or another, requests them).
kre
On Jan 13, 2024, at 5:46 PM, Robert Elz via tz <tz@iana.org> wrote:
| much simpler than other approaches mentioned in this thread that would | add members to struct tm and therefore would be a major pain,
In practice, it is impossible. struct tm exists in binary form in thousands of places that can't be updated. If it weren't for the fact that tm_gmtoff and tm_zone already exist in essentially every implementation, and have done for years, there's no way they could have been added to the standard (and is why it took so long).
So we're presumably ruling out AIX and Solaris, for example, as implementations to count in the set of "essentially every implementation".
Date: Sat, 13 Jan 2024 19:40:40 -0800 From: Guy Harris <gharris@sonic.net> Message-ID: <5437FC1E-09EA-4D68-918E-7378B151A3D5@sonic.net>
| So we're presumably ruling out AIX and Solaris,
Do you seriously believe that either of those is going to do the work needed to upgrade to the coming version of POSIX? Users of those are (I suspect) going to have to live with implementations of the previous standard, forever. There are very many changes in the coming version - the first real update since 2008 (the intervening issues have just been corrections and clarifications) - and this one has more changes than the previous one had over its predecessor.
kre
On Jan 13, 2024, at 8:34 PM, Robert Elz <kre@munnari.OZ.AU> wrote:
Date: Sat, 13 Jan 2024 19:40:40 -0800 From: Guy Harris <gharris@sonic.net> Message-ID: <5437FC1E-09EA-4D68-918E-7378B151A3D5@sonic.net>
| So we're presumably ruling out AIX and Solaris,
Do you seriously believe that either of those is going to do the work needed to upgrade to the coming version of POSIX?
Solaris's certification has lapsed, so probably not. AIX's hasn't, so I do not believe there is enough information out there to seriously believe either that IBM will update it or IBM will not update it.
On 2024-01-13 17:46, Robert Elz wrote:
but unless you have some way to guarantee that the application has set the unspecified fields of the tm struct, referencing any of them may be referencing uninit'd data, and that is undefined behaviour
It's not necessarily undefined behavior because we're talking about a standard library function, and (as you mentioned elsewhere) such functions need not be implemented in standard C. More important, the draft POSIX standard is simply wrong if it says strftime can't look at (for example) tm_gmtoff when calculating %z, because many implementations do exactly that and we shouldn't invalidate them all. (One example is given below.)
| In general strftime cannot ignore all the other information, | as in general strftime must look at the TZ setting
Only for %s
In POSIX 2017 strftime must also look at TZ for %Z and %z, and that hasn't changed in POSIX 202x/D3. (Admittedly this is a messy area and the draft doesn't get this area right.)
that is specified by reference to mktime() which is where the TZ requirement appears (via tzset())
That's another place where the POSIX draft gets things wrong. I'm not talking about issues with mktime itself (it's dicey for other reasons, but that is a separate matter); I'm talking about where draft POSIX mistakenly says strftime needs mktime (or equivalent) to implement %s.
If there is no use of %s strftime() should not go near TZ.
First, an implementation of POSIX 202x/D3 strftime doesn't need to go near TZ to implement any conversion, not even %s. tzcode strftime no longer relies on TZ as of today's patch, and this is fine. Second, the POSIX draft requires strftime to act as if it calls tzset, which means strftime *must* go near TZ even if %s is not used, if only so that strftime conforms to this no-longer-useful POSIX requirement.
| and the LC_CTYPE category.
Yes. That one is explicit.
Where? I don't see LC_CTYPE mentioned anywhere in the strftime section (202x/D3 lines 69411-69832). Obviously LC_CTYPE is essential but it's not explicitly called out in strftime's description. I mentioned LC_CTYPE only because it's another example of something that strftime can use to calculate conversions, something that is not explicitly mentioned in the spec; and this demonstrates that the set of things that strftime can use is not exhaustive.
If you have some way to guarantee that referencing any uninit'd fields won't have consequences you don't want to have happen, then go ahead and do it.
We have that in tzcode, with UNINIT_TRAP. (It's a guarantee by the implementer.)
Yes, if you're being strictly C conforming, then you cannot access tm_gmtoff, as C doesn't demand that field exist, so conforming portable C applications cannot be expected to have set it. POSIX does require it (will require it when this new standard appears) and so applications written to conform to that standard will know to set that field. Older applications will not, so accessing it for them is risky.
This interpretation is too strict, and it doesn't correspond to how implementations behave. For example:

    #include <stdio.h>
    #include <time.h>

    static void
    f (int gmtoff)
    {
      char buf[100];
      struct tm tm;
      tm.tm_gmtoff = gmtoff;
      tm.tm_isdst = 0;
      strftime (buf, sizeof buf, "%z", &tm);
      printf ("%%z formats as '%s' with tm_isdst=%d, tm_gmtoff=%ld\n",
              buf, tm.tm_isdst, tm.tm_gmtoff);
    }

    int
    main ()
    {
      f (0);
      f (3600);
    }

On my Ubuntu 23.10 platform with TZ="America/Los_Angeles" this outputs:

    %z formats as '+0000' with tm_isdst=0, tm_gmtoff=0
    %z formats as '+0100' with tm_isdst=0, tm_gmtoff=3600

If the abovementioned interpretation were correct, then to conform to POSIX-2017, strftime %z could look *only* at tm_isdst and could not look at tm_gmtoff, and the Ubuntu behavior would therefore be incorrect because its strftime is clearly looking at tm_gmtoff.

But the Ubuntu behavior *is* correct: it's good behavior, it's the behavior most people would expect, and it's common on many implementations. If an interpretation of the C and/or POSIX standards says that Ubuntu doesn't conform, then either the standards are wrong or the interpretation is wrong. My previous email today gave an alternate interpretation under which the Ubuntu behavior is conforming, and I expect that this is the best way out of this mess.
I'm not sure what the proposed patch is, or what it is intending to fix
It intends to fix the bug reported by Dag-Erling Smørgrav here: https://mm.icann.org/pipermail/tz/2024-January/033488.html Without the patch, the bug occurs even on systems with tm_gmtoff.
| or that would add new functions strftime_z and strftime_lz which | would also cause significant pain.
No, adding new functions is easy
True, it's easier for implementers to add functions than to add struct tm members. However, it's still a pain for users as it requires them to use a new API simply because they want %z, %Z and %s to work sanely. In contrast, today's proposed tzcode patch means no change to the API, avoiding this unnecessary pain.
Date: Sat, 13 Jan 2024 22:51:14 -0800
From: Paul Eggert <eggert@cs.ucla.edu>
Message-ID: <99ee73a9-6336-4f3d-a8b3-e57b2e1817dd@cs.ucla.edu>

| It's not necessarily undefined behavior because we're talking about a
| standard library function, and (as you mentioned elsewhere) such
| functions need not be implemented in standard C.

It might not necessarily actually result in something bad happening, but the application must assume that it might.

| More important, the draft POSIX standard is simply wrong if it says
| strftime can't look at (for example) tm_gmtoff when calculating %z,

The implementation can look at whatever it likes. There's no problem there, but it cannot assume that the application has placed any meaningful data in the fields that the standard doesn't say that strftime() is likely to examine.

| In POSIX 2017 strftime must also look at TZ for %Z and %z,

I'm not sure must is correct, I suspect that the intended implementation for %Z is

    if (_tz_inited)
        strcpy(result, tzname[tm->tm_isdst > 0]);
    else
        strcpy(result, "");

(ignoring buffer overflows and stuff like that), and similarly for %z (except using sprintf to generate a string from "timezone" in the case that a value has earlier been placed there - still "" otherwise). No examination of TZ (by strftime) required for those. Of course, the implementation can do it some other way if it likes, but it cannot assume that tm->tm_gmtoff or tm->tm_zone has been set to anything interesting (tm_zone might be a pointer to somewhere which generates SIGSEGV if referenced).

| That's another place where the POSIX draft gets things wrong.
| I'm talking about where draft POSIX
| mistakenly says strftime needs mktime (or equivalent) to implement %s.

It says nothing of the kind. What it says is:

    Replaced by the number of seconds since the Epoch as a decimal number, calculated as described for mktime().
[tm_year, tm_mon, tm_mday, tm_hour, tm_min, tm_sec, tm_isdst] That doesn't say that mktime() needs to be used, just that the value you get from strftime(buf, sizeof buf, "%s", &tm); needs to be the exact same thing you'd get from snprintf(buf, sizeof buf, "%jd", (intmax_t)mktime(&tm)); Note that in the strftime case though, the tm struct is not altered by the call, which it would be (if required) by the mktime() variant. Regardless of how those 7 fields of the struct tm got filled in, and regardless of what (if anything at all) is in the other fields of the struct (the ones that must exist, and any others the implementation might have added), if those two ways of generating a string representation of an integer in buf don't do the same thing (assuming the mktime() variant doesn't generate an error and return (time_t)-1 - and perhaps even then) then the implementation is broken. | First, an implementation of POSIX 202x/D3 strftime doesn't need to go | near TZ to implement any conversion, not even %s. Agreed. | Second, the POSIX draft requires strftime to act as if it calls tzset, No, what it says is: Local timezone information shall be set as though strftime( ) called tzset( ). So if you want %z or %Z then things act as if tzset() was called (whether it actually is or not), because there's no other way to get the data that would allow those to be possible. Similarly, with %s, since we need to act as if mktime() was called (or at least generate the same result) then we need to know the local timezone, so mktime() is defined to act as if tzset() were called, and consequently, so does strftime() when using %s. For other conversions, there's no reference to local time at all, strftime() simply formats whatever is in the tm handed to it (or probably nothing, if you ask for the name of the 13th day of the week, or the 19th month, or something similarly stupid - that's actually unspecified). | Where? 
I don't see LC_CTYPE mentioned anywhere in the strftime section | (202x/D3 lines 69411-69832). Obviously LC_CTYPE is essential but it's | not explicitly called out in strftime's description. If it were needed to reference the LC_ vars in every function which uses them in some way, the standard would be even bigger than it is. They are listed for utilities, but for the system interfaces (ie: functions) see XBD 7.1 where it says: The behavior of some of the C-language functions defined in the System Interfaces volume of POSIX.1-202x shall also be modified based on a locale selection. The locale to be used by these functions can be selected in the following ways: [the first two mechanisms that different functions might use omitted here] 3. Some functions, such as catopen( ) and those related to text domains, may reference various environment variables and a locale category of a specific locale to access files they need to use. And on it goes. This is from draft 4, but I don't believe that any of this changed after D3. | I mentioned LC_CTYPE only because it's another example of something that | strftime can use to calculate conversions, I assume it might want LC_NUMERIC as well, but that's not mentioned either. LC_TIME is, because that one is particularly relevant to some of the conversions. | something that is not | explicitly mentioned in the spec; and this demonstrates that the set of | things that strftime can use is not exhaustive. Of course not, strftime() can look at whatever it likes. What it can't do is expect the application to have provided any data that the standard does not require of it. Locales are easy (in this regard), as everything defaults to "C" (aka "POSIX") if the application has done nothing. So any of the system interfaces can access any locale information it needs, it is always defined (somehow - how is up to the implementation). | This interpretation is too strict, Notice I said "strictly C conforming" not POSIX conforming. 
In that environment, this line:

| tm.tm_gmtoff = gmtoff;

is likely to generate a compilation error, so the application cannot include it. That one must be removed to be strictly C conforming. Given that one is not there, how is it that the strftime() function would be expected to use the parameter to this function?

| If the abovementioned interpretation were correct, then to conform to
| POSIX-2017, strftime %z could look *only* at tm_isdst and could not look
| at tm_gmtoff, and the Ubuntu behavior would therefore be incorrect
| because its strftime is clearly looking at tm_gmtoff.

It is fine, provided it continues working when the application hasn't provided a value for tm_gmtoff, and provided it can somehow work out the difference between when that field (which will still be in the struct of course) hasn't been set and just contains garbage, and when it has. That's no longer the case in the forthcoming standard, as %z is allowed to use the value in tm_gmtoff, and so conforming applications will need to set it.

| But the Ubuntu behavior *is* correct: it's good behavior, it's the
| behavior most people would expect, and it's common on many
| implementations. If an interpretation of the C and/or POSIX standards
| says that Ubuntu doesn't conform,

For C, clearly not, as tm_gmtoff doesn't exist there (which doesn't mean that an implementation cannot add it, but no conforming application can assume that it has been - so it can never be init'd, except perhaps to 0 by a memset() (or equiv) of the entire struct). For POSIX, now that tm_gmtoff has been added, the standard has been amended.

| It intends to fix the bug reported by Dag-Erling Smørgrav here:
| https://mm.icann.org/pipermail/tz/2024-January/033488.html

But there is no bug described there, the answers that the example produced are what is intended to happen. That's because of the "same result as mktime() would produce" requirement. And mktime() is defined to be the inverse of localtime(), not of gmtime().
(C23 or something is supposedly adding timegm() - which is sorely lacking, and POSIX will then add it in issue 9, in a decade or two, perhaps, just perhaps, but unlikely, in some earlier TC update).

| Without the patch, the bug occurs even on systems with tm_gmtoff.

Once again, there is no bug (or at least there was none; if you have changed how it works in anything like the way that was requested, there will be a bug now). It might not have met his expectations, in which case his expectations were incorrect.

This is really all very simple stuff.

kre
On 2024-01-14 03:14, Robert Elz wrote:
Date: Sat, 13 Jan 2024 22:51:14 -0800 From: Paul Eggert <eggert@cs.ucla.edu> Message-ID: <99ee73a9-6336-4f3d-a8b3-e57b2e1817dd@cs.ucla.edu>
| It's not necessarily undefined behavior because we're talking about a | standard library function, and (as you mentioned elsewhere) such | functions need not be implemented in standard C.
It might not necessarily actually result in something bad happening, but the application must assume that it might.
There are two things going on here. (A) Does the POSIX strftime spec require the caller to set a struct tm component when this requirement should be unnecessary? And (B) does the POSIX spec fail to require the caller to set a struct tm component when the requirement should be necessary? (A) is of lesser importance in practice. Although it's overkill for the POSIX strftime spec to require struct tm components to be set when they don't need to be set, and this overkill can cause useless work by portable applications, it's not that big a deal. In practice nearly every app calls strftime on the result of localtime etc. and so the components are set anyway. Our thread, if I understand things correctly, is mostly about (B), not (A). More on this below.
| More important, the draft POSIX standard is simply wrong if it says | strftime can't look at (for example) tm_gmtoff when calculating %z,
The implementation can look at whatever it likes.
That's good news. In that case we're in agreement.
| In POSIX 2017 strftime must also look at TZ for %Z and %z,
I'm not sure must is correct, I suspect that the intended implementation for %Z is
if (_tz_inited) strcpy(result, tzname[tm->tm_isdst > 0]); else strcpy(result, "");
Oh, by "look at TZ" I meant look at data generated from TZ's value. tzname is part of that data, so the code you give is "looking at TZ" in the sense I meant. If by "_tz_inited" you meant "after strftime did its mandatory call to tzset-or-equivalent, that call successfully determined the current timezone", I agree the code you gave reflects the intent for POSIX-2017. (If you meant something else then it'd be helpful to know what it was.) However, it's not at all clear that the code reflects the intent, or should reflect the intent, for POSIX-202x/D4. Similarly for %z.
the implementation can do it some other way if it likes, but it cannot assume that tm->tm_gmtoff or tm->tm_zone has been set to anything intersting (tm_zone might be a pointer to somewhere which generates SIGSEGV if referenced).
That's true for POSIX-2017, but not true for POSIX 202x/D4. It's OK for the implementation to examine tm_zone when processing %Z. See POSIX 202x/D4 line 69872.
| I'm talking about where draft POSIX | mistakenly says strftime needs mktime (or equivalent) to implement %s.
It says nothing of the kind. What it says is:
Replaced by the number of seconds since the Epoch as a decimal number, calculated as described for mktime(). [tm_year, tm_mon, tm_mday, tm_hour, tm_min, tm_sec, tm_isdst]
That doesn't say that mktime() needs to be used
That's why I wrote "mktime (or equivalent)", not "mktime". The draft is imprecisely worded here, and it's easy to misread it as saying this:
strftime(buf, sizeof buf, "%s", &tm); needs to be the exact same thing you'd get from snprintf(buf, sizeof buf, "%jd", (intmax_t)mktime(&tm));
but this reading isn't quite right. All that's needed is for strftime to compute seconds since the Epoch in the usual way (i.e., using the Gregorian calendar and ignoring leap seconds), and while doing that to infer the UTC offset following the constraints described in the mktime section. Those constraints do not uniquely determine a result in every case, and this gives strftime wiggle room. That is, strftime need not use the same code that mktime does to make its inferences, and on a particular struct tm an implementation's strftime %s could infer a different UTC offset than the same implementation's mktime on the same struct tm.
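To make "the usual way" concrete, here is a minimal sketch of the arithmetic: count days with the proleptic Gregorian calendar, ignore leap seconds, and subtract a UTC offset that the caller has already inferred somehow. The names days_from_civil and epoch_seconds are hypothetical, the arguments are a calendar year and a 1-based month (not tm_year and tm_mon), and this is not taken from tzcode or from any POSIX text.

```c
#include <assert.h>

/* Days from 1970-01-01 to the given proleptic Gregorian date.
   Hypothetical helper; y is the calendar year, m is 1..12, d is 1..31. */
static long long
days_from_civil (long long y, int m, int d)
{
  assert (1 <= m && m <= 12);
  y -= m <= 2;                        /* treat Jan and Feb as part of the previous year */
  long long era = (y >= 0 ? y : y - 399) / 400;
  long long yoe = y - era * 400;      /* year of era, 0..399 */
  long long doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;  /* day of March-based year */
  long long doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;           /* day of era */
  return era * 146097 + doe - 719468; /* 719468 days from 0000-03-01 to 1970-01-01 */
}

/* Seconds since the Epoch for a civil time that is utc_offset seconds
   east of UTC.  Leap seconds are ignored, as POSIX time does. */
static long long
epoch_seconds (int year, int mon, int mday, int hour, int min, int sec,
               long utc_offset)
{
  return days_from_civil (year, mon, mday) * 86400
         + hour * 3600LL + min * 60LL + sec - utc_offset;
}
```

With the values from Dag-Erling's report, both 17:19:46 at offset +3600 and 16:19:46 at offset 0 on 2024-01-10 map to the same count. The open question in this thread is only where the offset comes from, not the arithmetic.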
Note that in the strftime case though, the tm struct is not altered by the call, which it would be (if required) by the mktime() variant.
Yes, of course.
| First, an implementation of POSIX 202x/D3 strftime doesn't need to go | near TZ to implement any conversion, not even %s.
Agreed.
That's good.
| Second, the POSIX draft requires strftime to act as if it calls tzset,
No, what it says is:
Local timezone information shall be set as though strftime( ) called tzset( ).
I see that as the same idea, just using different words. There should be no practical difference.
So if you want %z or %Z then things act as if tzset() was called (whether it actually is or not), because there's no other way to get the data that would allow those to be possible.
But there is another way. With %z, strftime can use tm_gmtoff; see POSIX 202x/D4 line 69870. And with %Z, strftime can use tm_zone; see line 69872. The same argument applies to %s, if we fix the error on POSIX 202x/D4 line 69837 where tm_gmtoff is mistakenly omitted. There's no need to look at tzset's output if strftime simply uses tm_gmtoff and the other struct tm members listed there.
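As an illustration of the %z case, the sketch below turns a tm_gmtoff-style value (seconds east of UTC) into the +hhmm form without consulting TZ or tzset output at all. The function name format_utc_offset is made up for this example, and real implementations may differ in how they handle sub-minute offsets.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: format an offset in seconds east of UTC as
   +hhmm or -hhmm.  Seconds beyond whole minutes are dropped.
   Returns what snprintf returns. */
static int
format_utc_offset (long gmtoff, char *buf, size_t size)
{
  assert (buf != NULL);
  char sign = gmtoff < 0 ? '-' : '+';
  long minutes = labs (gmtoff) / 60;
  return snprintf (buf, size, "%c%02ld%02ld", sign,
                   minutes / 60, minutes % 60);
}
```

Under these assumptions an offset of 3600 formats as "+0100" and -28800 as "-0800", matching the Ubuntu output quoted elsewhere in this thread.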
| Where? I don't see LC_CTYPE mentioned anywhere in the strftime section | (202x/D3 lines 69411-69832). Obviously LC_CTYPE is essential but it's | not explicitly called out in strftime's description.
If it were needed to reference the LC_ vars in every function which uses them in some way, the standard would be even bigger than it is.
That's fine, and I'm not objecting to that. All I'm saying is that the standard does not list every source of information that strftime can use to process conversion specs. For example, when lines 69871-69872 say: Z Replaced by the timezone name or abbreviation, or by no bytes if no timezone information exists. [tm_isdst, tm_zone] This does not mean that %Z's replacement is completely determined by tm_isdst and tm_zone; all it means is that tm_isdst and tm_zone must be set by the caller and must be in the normal range so that strftime can use them (among other things) to determine %Z's replacement.
| something that is not | explicitly mentioned in the spec; and this demonstrates that the set of | things that strftime can use is not exhaustive.
Of course not, strftime() can look at whatever it likes.
Good, and this matches what I just wrote above. (I hope we're in violent agreement. :-)
Notice I said "strictly C conforming" not POSIX conforming.
In that environment, this line:
| tm.tm_gmtoff = gmtoff;
is likely to generate a compilation error, so the application cannot include it. That one must be removed to be strictly C conforming.
Oh, good point. So let's use similar code but without tm_gmtoff:

    #include <stdio.h>
    #include <time.h>

    static void
    f (char const *fn, struct tm *tm)
    {
      char buf[100];
      tm->tm_isdst = 0;
      strftime (buf, sizeof buf, "%z", tm);
      printf ("after %s, %%z formats as '%s' with tm_isdst=%d\n",
              fn, buf, tm->tm_isdst);
    }

    int
    main ()
    {
      time_t t = 0;
      f ("   gmtime", gmtime (&t));
      f ("localtime", localtime (&t));
    }

On Ubuntu 23.10 with TZ="America/Los_Angeles" in the environment, this outputs:

    after    gmtime, %z formats as '+0000' with tm_isdst=0
    after localtime, %z formats as '-0800' with tm_isdst=0

which is fine and is the sort of behavior that Dag-Erling Smørgrav expected, even though strftime obviously must be using information other than what's in tm_isdst (or even in TZ) to compute the differing strings "+0000" and "-0800". Because strftime can look at whatever it likes, this behavior is OK.
| It intends to fix the bug reported by Dag-Erling Smørgrav here: | https://mm.icann.org/pipermail/tz/2024-January/033488.html
But there is no bug described there, the answers that the example produced are what is intended to happen. That's because of the "same result as mktime() would produce" requirement.
As mentioned above, since mktime's behavior isn't completely determined by POSIX 202x/D4, there's wiggle room in how strftime can behave on Dag-Erling's example. One possibility is that tzcode conforms to POSIX 202x/D4 both before and after the recently-installed tzcode patch <https://mm.icann.org/pipermail/tz/2024-January/033524.html>. (This patch implements Dag-Erling's suggestion, albeit in a different way that avoids some rare overflow issues.) That is, it's possible that the patch didn't fix a POSIX-conformance bug, but merely a user-expectation bug in an area where POSIX allows different behaviors. If this possibility is correct, I guess I can live with it, though I'm a bit disappointed that POSIX allows the confusing behavior that Dag-Erling described. But anyway, this would mean the recently-installed patch is OK as far as POSIX 202x/D4 is concerned.
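For readers who have not looked at the original report, Dag-Erling's suggestion was to call timegm() and then adjust the result by tm_gmtoff. A rough sketch of that idea follows. It is not the installed tzcode patch (which, as noted, takes a different route to avoid overflow), the name epoch_from_tm_gmtoff is invented here, and it assumes a platform that provides both timegm() and the tm_gmtoff member.

```c
#define _DEFAULT_SOURCE   /* assumption: exposes timegm() and tm_gmtoff on glibc */
#include <assert.h>
#include <time.h>

/* Hypothetical helper: seconds since the Epoch for a struct tm whose
   tm_gmtoff is trusted to be valid.  No overflow or error handling;
   a timegm() failure (-1) is passed through uncorrected. */
static long long
epoch_from_tm_gmtoff (struct tm const *tp)
{
  assert (tp != NULL);
  struct tm copy = *tp;               /* timegm may normalize its argument */
  time_t as_if_utc = timegm (&copy);  /* treat the fields as if they were UTC */
  return (long long) as_if_utc - tp->tm_gmtoff;
}
```

The sketch only gives a sensible answer when the caller really has set tm_gmtoff, which is the point under dispute in this thread.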
This is from draft 4, but I don't believe that any of this changed after D3.
Thanks, I didn't know that draft 4 was out. I got a copy and am now referring to it in my comments instead. I too haven't noticed any changes in this area.
Date: Sun, 14 Jan 2024 20:22:33 -0800 From: Paul Eggert <eggert@cs.ucla.edu> Message-ID: <24326f8e-ec8d-4a70-bf82-d62a5790ae7f@cs.ucla.edu> | (A) is of lesser importance in practice. Agreed. | Although it's overkill for the POSIX strftime spec to require | struct tm components to be set when they don't need to be set, You're missing the point of it all. The components that it lists are what some correct implementations need to be set to function. That some other implementation might not is irrelevant - the goal is for the user to be able to write code that will work with any conforming implementation, not just the one that they happen to be using when they write the code. | and this overkill can cause useless work by portable applications, Not useless, just perhaps not needed for a particular implementation but if the application starts caring about that, it might as well delve into any local variation, and cease pretending to be portable. | it's not that big a deal. In practice nearly | every app calls strftime on the result of localtime etc. Is there some evidence to support that? And even if true, why would you penalise those other apps which don't? | Our thread, if I understand things correctly, | is mostly about (B), not (A). Yes. | > The implementation can look at whatever it likes. | That's good news. In that case we're in agreement. Yes, on that we are. Of course, the implementation still needs to implement the result that is actually specified to be produced and not some other result it thinks might be better, and it needs some way to determine whether the other information has actually been provided by anyone or not (otherwise there's nothing there to use). For things like locale info, that's easy, as the rules specify that always exists. For TZ related info, also easy in specific cases, as there the rules specify "as if tzset() were called" and that allows access to TZ and all that it happens to provide. 
But only when the spec says that, not just arbitrarily.

| Oh, by "look at TZ" I meant look at data generated from TZ's value.

Oh - that's not how I interpreted it, but OK.

| If by "_tz_inited" you meant "after strftime did its mandatory call to
| tzset-or-equivalent, that call successfully determined the current
| timezone", I agree the code you gave reflects the intent for POSIX-2017.

Yes. And that is what I meant.

| (If you meant something else then it'd be helpful to know what it was.)
| However, it's not at all clear that the code reflects the intent, or
| should reflect the intent, for POSIX-202x/D4.

I was replying to a comment of yours which started:

    In POSIX 2017 strftime must ...

so I am not sure that what the current drafts say is relevant. But yes, once the next POSIX is published, then the tm_gmtoff field will be available to %z and tm_zone to %Z, and simply using those will be easy to do. Of course, if you do it that way, you're breaking any existing applications which were written to either conform to the C standards (any of them) or versions up to and including the current published POSIX standard (as who knows, when it comes to ISO and IEEE balloting, the current drafts might be rejected, and be sent back to be "fixed".) But since that bridge was crossed a long long time ago, there are unlikely to be many.

| That's true for POSIX-2017, but not true for POSIX 202x/D4.

Again, see just above for the context for my comments. It was your restriction that provoked that response.

| It's OK for the implementation to examine tm_zone when processing %Z.
| See POSIX 202x/D4 line 69872.

Oh, I know that's there, it was my defect report that got those all added properly. It just wasn't relevant to your precondition.

| That's why I wrote "mktime (or equivalent)", not "mktime".

OK.
| The draft is imprecisely worded here, and it's easy to misread it
| as saying this:
|
| > strftime(buf, sizeof buf, "%s", &tm);
| > needs to be the exact same thing you'd get from
| > snprintf(buf, sizeof buf, "%jd", (intmax_t)mktime(&tm));

That is what it (at least) intends to be saying. If the wording needs fixing, now is the time to make that happen, if you can make anyone believe that the words can reasonably be read any other way (within the context of everything that is in the HUGE standard doc).

| but this reading isn't quite right. All that's needed is for strftime
| to compute seconds since the Epoch in the usual way (i.e., using the
| Gregorian calendar and ignoring leap seconds),

But aside from correcting for out of range values, which strftime is not required to do, that's exactly what mktime() is specified to do. Referencing mktime() simply avoids saying all of that in two different places (which could lead to eventual contradictions, if one of them is altered and the other is not).

| infer the UTC offset following the constraints described in the mktime
| section.

That UTC offset *must* come from the TZ value, such that if TZ is altered to refer to some other offset, then the result from mktime() (and hence from strftime(%s)) must change. The contents of the struct tm are not allowed to alter that. The example code we were shown where TZ is purposely altered and then the results of the two calls subtracted from each other to show the time offset between two different timezones (the subtraction only works with a POSIX time_t but for most of us, that's all that matters) is an example of code that must not be broken. I think that was using mktime(), but believe me, mktime() and strftime("%s") are required to produce the exact same number. Always (assuming in range values in the tm).
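The kind of code being defended here might look roughly like the following. This is a reconstruction for illustration only, not the example that was posted earlier in the thread; the helper names are invented, and it uses difftime() rather than plain subtraction so that it does not depend on time_t being an integer count of seconds.

```c
#define _POSIX_C_SOURCE 200809L   /* for setenv() and tzset() */
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: interpret the same broken-down fields as local
   time in the zone named by tz.  Note that it leaves TZ changed. */
static time_t
mktime_in_zone (struct tm const *fields, char const *tz)
{
  assert (fields != NULL && tz != NULL);
  struct tm copy = *fields;       /* mktime may normalize its argument */
  setenv ("TZ", tz, 1);
  tzset ();
  return mktime (&copy);
}

/* Seconds by which the given wall-clock time in tz_a is later (as an
   absolute time) than the same wall-clock time in tz_b. */
static double
zone_difference (struct tm const *fields, char const *tz_a, char const *tz_b)
{
  time_t a = mktime_in_zone (fields, tz_a);
  time_t b = mktime_in_zone (fields, tz_b);
  return difftime (a, b);
}
```

For example, with tm_isdst set to 0 and the POSIX-style TZ strings "EST5" and "UTC0", noon in the first zone is five hours later than noon in the second, so the difference is 18000 seconds. Code of this shape depends on the conversion following TZ and not on whatever happens to be in tm_gmtoff.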
| Those constraints do not uniquely determine a result in every case,

Only in summer time local time warps, and if you believe the POSIX people, not even then unless tm_isdst is set to -1.

| and this gives strftime wiggle room. That is, strftime need not
| use the same code that mktime does to make its inferences,

No, it doesn't need to use the same code, but providing the 7 struct tm values it has to work with are within range, it must produce the same answer as mktime() would, however the code is written.

| and on a particular struct tm an implementation's strftime %s
| could infer a different UTC offset than the same implementation's
| mktime on the same struct tm.

That would be broken. It must generate the same result, not something different. There is no "wriggle" room to allow anything different.

| But there is another way. With %z, strftime can use tm_gmtoff;
| see POSIX 202x/D4 line 69870. And with %Z, strftime can use tm_zone;
| see line 69872.

Yes, those can, because that's what common implementations where those fields exist (which is most of them) actually do, and so applications tend to accommodate that already. That's what is needed for the standard to get updated - it is specifying what actually exists and works (except where known bugs exist). It is not a legislature. Where there is no common ground, the standard just ends up saying that something is unspecified, which is a big red flag for applications to avoid stepping into that pothole.

| The same argument applies to %s,

No it doesn't, because that's not what the implementations have done. Not even yours, until a week or so ago. No working application code expects that. Mistaken users might, and they might send in incorrect bug reports complaining that their code doesn't work because of it, but it is their mistaken belief that needs to be corrected.

| if we fix the error on POSIX 202x/D4 line 69837 where tm_gmtoff
| is mistakenly omitted.

Good luck with that.
And I can assure you, that was not an accident. But by all means, submit a defect report, and see how far that gets you.

| This does not mean that %Z's replacement is completely determined by
| tm_isdst and tm_zone; all it means is that tm_isdst and tm_zone must be
| set by the caller and must be in the normal range so that strftime can
| use them (among other things) to determine %Z's replacement.

Yes. But the implementation needs to know that whatever other data it wants to look at is in fact actual pertinent data, and not just random bits, or it won't be producing the correct result. So if it wants to use tzname[] (as an example, given the %Z assumed) it needs to know that tzname[] has been correctly set (or arrange for it to be). And for tzname[] that would mean (explicitly or via some equivalent) calling tzset(). Not copying tm_zone into tzname[] and then using that - that would be broken. It can just return what is in tm_zone though (or will be able to in the next POSIX - which is there because that's what is currently actually done) and forget about the "among other things" (and ignore tm_isdst completely).

| > Of course not, strftime() can look at whatever it likes.
| Good, and this matches what I just wrote above.
| (I hope we're in violent agreement. :-)

We are, providing you're not proposing to use data that isn't guaranteed to be valid. I'm in violent opposition to that.

| On Ubuntu 23.10 with TZ="America/Los_Angeles" in the environment, this
| outputs:
|
| after gmtime, %z formats as '+0000' with tm_isdst=0
| after localtime, %z formats as '-0800' with tm_isdst=0

which is because it is using the tm_zone extension, that C does not guarantee exists, but Ubuntu is more POSIX like, and is using that, which is how it got added to the forthcoming POSIX spec (because that's how the POSIX world actually works, regardless of what the old spec said).
| which is fine and is the sort of behavior that Dag-Erling Smørgrav
| expected,

Yes, but note, that was a complaint that it didn't work, because that's not how the implementations work. And hence, not how the function is specified to behave, now, or in the next POSIX.

If you want to go into the vanguard, and make changes arbitrarily, knowingly violating POSIX, that's fine (I do that kind of thing in other areas where the standard is stupid, even when the reasons it is stupid were once valid). But expect to get (perhaps many) bug reports over the next decade or two until there's any chance of POSIX being updated to match your implementation, with users pointing to the standard and asking why you're not doing what other conforming implementations do, and requesting you to fix it.

After all, the tzcode strftime() implementation has been how it was for how many decades now, and just how many complaints like that one on this issue have been received in all that time?

| Because strftime can look at whatever it likes, this behavior is OK.

As long as producing garbage answers is OK to you, then, fine.

See my (quite) recent reply to Steve's message to see an example of code that should work, and you'd be breaking by doing this. More likely code would be something more like

    void
    convfile(FILE * ifd, FILE * ofd)
    {
        struct tm T;
        char buf[1024];
        char sbuf[128];
        int line = 0;

        while (fscanf(ifd, "%d-%d-%d %d:%d:%d %1000s",
                      &T.tm_year, &T.tm_mon, &T.tm_mday,
                      &T.tm_hour, &T.tm_min, &T.tm_sec, buf) != EOF) {
            line++;
            T.tm_year -= 1900;
            T.tm_mon -= 1;
            if (invalid_tm_ranges(&T)) {    /* function not supplied here */
                fprintf(stderr, "Line %d, Whatever...", line /*, ... */ );
                continue;
            }
            if (strftime(sbuf, sizeof sbuf, " %s", &T) == 0) {
                fprintf(stderr, "Line %d, cannot convert time", line);
                continue;
            }
            fprintf(ofd, "%s %s\n", sbuf+1, buf);
        }
    }

That needs to work (given the limitations on what I had time to write here, and as with my example in the previous message, not even compile tested, so there may be some stupid bugs, and of course, the internal func that's called needs writing (that's just simple comparisons of the 6 values in the arg struct tm* against the ranges specified - 5 really, as anything goes for tm_year)).

| As mentioned above, since mktime's behavior isn't completely determined
| by POSIX 202x/D4,

That's not what the POSIX writers want you to think, at least when tm_isdst is not -1 (you made it be 0, which is fine) - they're wanting mktime() (and hence strftime("%s") because it is defined by reference) to be usable for arithmetic on the struct tm, as that's the only way a conforming C application can modify a time_t (other than calling one of the functions, like time() or stat() which returns one or more). And supporting conforming C applications is one of the goals.

| there's wiggle room in how strftime can behave on Dag-Erling's example.

No, there really isn't.

| That is, it's possible that the patch didn't fix a POSIX-conformance
| bug,

No, it certainly did not do that, rather it introduced one.

| but merely a user-expectation bug in
| an area where POSIX allows different behaviors.

Try asking that on the Austin Group list, and see how far it gets you.

| If this possibility is correct, I guess I can live with it, though
| I'm a bit disappointed that POSIX allows the confusing behavior that
| Dag-Erling described.

Not allows. Requires. There is no assumption that tm_gmtoff is set to anything at all when mktime() or strftime("%s") are called. Using it for them is a bug.
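One way to probe the equivalence claimed here on a given platform is a small check like the one below. It is only a sketch: the function name is invented, it assumes the platform's strftime supports %s at all, and agreement on one input says nothing about what any standard requires.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical check: returns 1 if strftime's %s output matches what
   mktime yields for the same fields, 0 if they differ, and -1 if either
   call fails (this also rejects a genuine result of -1). */
static int
percent_s_matches_mktime (struct tm const *tp)
{
  assert (tp != NULL);
  char from_strftime[64];
  char from_mktime[64];
  struct tm copy = *tp;           /* mktime may modify its argument */

  if (strftime (from_strftime, sizeof from_strftime, "%s", tp) == 0)
    return -1;
  time_t t = mktime (&copy);
  if (t == (time_t) -1)
    return -1;
  snprintf (from_mktime, sizeof from_mktime, "%jd", (intmax_t) t);
  return strcmp (from_strftime, from_mktime) == 0;
}
```

A struct tm obtained from localtime() should pass on any implementation discussed in this thread; the disagreement is about inputs such as gmtime() results, where tm_gmtoff and TZ point to different offsets.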
Your implementation needs to work for the code in my previous message, and this one (at least if any idiotic typos/thinkos are fixed, and they are fleshed out to be complete programs, with #include added, and all that stuff).

The issue is all based upon the mistaken belief that a struct tm has an underlying time_t upon which it is based, and that's what %s should produce (and since that's defined as the same as mktime() produces, then that can, and would have to be, extended to mktime() as well).

| But anyway, this would mean the recently-installed
| patch is OK as far as POSIX 202x/D4 is concerned.

It isn't. But by all means, there's no need to trust my interpretation, ask on the Austin Group list, or submit a defect report via mantis, and see what happens (I know you know how to do both of those).

kre
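To illustrate the arithmetic use of mktime() described in the message above, here is a minimal sketch, assuming a hand-filled struct tm in which only the standard C members are meaningful. The helper name add_days is hypothetical and does not come from the thread; it is not tzcode's implementation of anything.

```c
#define _POSIX_C_SOURCE 200809L  /* for setenv() and tzset() in callers */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper (not from the thread): move a broken-down local
 * time forward by `days` days.  It sets only members that the C standard
 * defines and never reads or writes tm_gmtoff or tm_zone.
 * Returns the matching time_t, or (time_t)-1 if mktime() fails.
 */
static time_t add_days(struct tm *tm, int days)
{
    assert(tm != NULL);
    tm->tm_mday += days;   /* may push the field out of range ... */
    tm->tm_isdst = -1;     /* ... and leaves the DST decision to mktime() */
    return mktime(tm);     /* normalizes *tm in place */
}
```

With TZ set to UTC0, starting from 2024-01-10 12:00:00 and adding 30 days leaves the struct holding 2024-02-09 12:00:00. A caller that built the struct by hand, as in the fscanf() loop quoted above, has no particular value in tm_gmtoff, which is the situation the message is concerned with.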
On 2024-01-15 01:38, Robert Elz wrote:
| Although it's overkill for the POSIX strftime spec to require | struct tm components to be set when they don't need to be set,
You're missing the point of it all. The components that it lists are what some correct implementations need to be set to function.
I'm not missing the point; I'm trying to clarify and/or fix it. POSIX has never required implementations to look at every listed struct tm member to function correctly in every use of the corresponding conversion spec. Nor has it required that the result of the conversion spec be completely determined by the contents of the struct tm members. The relation between the struct tm member values and the conversion outputs is more subtle than that.
| it's not that big a deal. In practice nearly | every app calls strftime on the result of localtime etc.
Is there some evidence to support that?
Sure, look at tzcode. Or at coreutils. Or at Emacs. Or at 'tar'. Or at pretty much every app that uses strftime. Although it's theoretically possible that there are exceptions for the edge cases we're talking about, so far in this thread we've seen zero real-world examples.
why would you penalise those other apps which don't?
I'm not trying to *penalize* any unusual code that trips over these edge cases. I'm trying to *help* it. If code relies on bugs or incorrect interpretations of odd corners of the POSIX spec, it'll get wrong answers. Standards should be worded clearly to help prevent this sort of confusion.
But yes, once the next POSIX is published, then the tm_gmtoff field will be available to %z and tm_zone to %Z, and simply using those will be easy to do. Of course, if you do it that way, you're breaking any existing applications which were written to either conform to the C standards (any of them)
No, it doesn't break conforming C programs that use these oddball edge cases. The C standard doesn't specify how the implementation determines the timezone. It can be TZ or it can be something else. So even if the behavior changes, it's not a violation of the C standard, as a conforming C program will still work as the C standard requires.
| All that's needed is for strftime | to compute seconds since the Epoch in the usual way (i.e., using the | Gregorian calendar and ignoring leap seconds),
But aside from correcting for out of range values, which strftime is not required to do, that's exactly what mktime() is specified to do.
That's a pretty big aside.... The intent of this part of the strftime spec, as I see it, is to say that strftime should use the standard POSIX way of breaking down time (Gregorian, no leap seconds) - not all the other mktime machinery.
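For readers who want the "usual way" spelled out, the following is a small sketch of the arithmetic expression POSIX gives in its definition of Seconds Since the Epoch (Gregorian calendar, no leap seconds), applied to a broken-down time expressed in UTC. The function name posix_epoch_seconds is hypothetical, and the sketch assumes tm_yday is already set and tm_year is at least 70.

```c
#include <assert.h>
#include <time.h>

/*
 * Seconds since the Epoch for a broken-down UTC time, following the
 * POSIX "Seconds Since the Epoch" expression.  Hypothetical helper:
 * it performs no range correction and no time zone lookup.
 */
static long long posix_epoch_seconds(const struct tm *tm)
{
    long long y = tm->tm_year;          /* years since 1900 */

    assert(y >= 70);                    /* expression is defined from 1970 on */
    return tm->tm_sec
         + tm->tm_min  * 60LL
         + tm->tm_hour * 3600LL
         + tm->tm_yday * 86400LL
         + (y - 70) * 31536000LL        /* 365-day years since 1970 */
         + ((y - 69) / 4) * 86400LL     /* add a day per leap year ... */
         - ((y - 1) / 100) * 86400LL    /* ... except century years ... */
         + ((y + 299) / 400) * 86400LL; /* ... unless divisible by 400 */
}
```

For 2024-01-10 16:19:46 UTC (tm_year 124, tm_yday 9) this gives 1704903586, the value on the "local" line of the output in the original report.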
That UTC offset *must* come from the TZ value, such that if TZ is altered to refer to some other offset, then the result from mktime() (and hence from strftime(%s)) must change.
This is incorrect for two reasons. First, changing TZ need not alter mktime's result. Second and more important, even if you *don't* change TZ, two calls to mktime can yield different answers for the same in-range inputs, so the POSIX 202x/D4 spec does not completely specify strftime's output. This second property is inherent to the inadequacy of mktime's API. And it undercuts any argument that strftime %s and mktime must always produce exactly the same output.
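One way to see that the calendar fields of a struct tm do not by themselves pick out a single time_t is the hour that repeats when clocks fall back. The sketch below is not the example from the defect report; it only assumes a zone with US-style rules, chosen by the caller through a POSIX TZ string, and a hypothetical helper named convert_0130. It shows the same wall-clock fields mapping to two instants depending on tm_isdst. What an implementation returns for tm_isdst = -1 in that hour is deliberately not asserted here.

```c
#define _POSIX_C_SOURCE 200809L  /* for setenv() and tzset() in callers */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: convert 2024-11-03 01:30:00 local time using the
 * given tm_isdst hint.  The caller is expected to have selected a zone
 * in which that wall-clock time occurs twice.
 */
static time_t convert_0130(int isdst)
{
    struct tm tm;

    assert(isdst >= -1 && isdst <= 1);
    memset(&tm, 0, sizeof tm);
    tm.tm_year = 2024 - 1900;
    tm.tm_mon = 10;          /* November */
    tm.tm_mday = 3;          /* first Sunday of November 2024 */
    tm.tm_hour = 1;
    tm.tm_min = 30;
    tm.tm_isdst = isdst;
    return mktime(&tm);
}
```

With TZ set to "PST8PDT,M3.2.0,M11.1.0", convert_0130(1) and convert_0130(0) are expected to differ by 3600 seconds: the first is 01:30 PDT and the second is 01:30 PST an hour later.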
But by all means, submit a defect report, and see how far that gets you.
OK, I've done that here: https://www.austingroupbugs.net/view.php?id=1797 It uses an example that is a bit sharper than what we've discussed so far, in that the example exploits the abovementioned inadequacy of mktime's API.
Paul Eggert <eggert@cs.ucla.edu> writes:
I installed the attached to implement your patch's idea, but using timeoff rather than gmtime. Most of the patch is plumbing that makes timeoff visible to strftime even if timeoff is not otherwise extern. Please give this patch a try.
Thank you. Do you have an estimated date for a 2024a release which would include this? DES -- Dag-Erling Smørgrav - des@des.no
On 2024-01-22 08:45, Dag-Erling Smørgrav via tz wrote:
Paul Eggert <eggert@cs.ucla.edu> writes:
I installed the attached to implement your patch's idea, but using timeoff rather than gmtime. Most of the patch is plumbing that makes timeoff visible to strftime even if timeoff is not otherwise extern. Please give this patch a try.
Thank you. Do you have an estimated date for a 2024a release which would include this?
Tim Parenti already said soon, as KZ will change its zone in just over 5 weeks on 2024-03-01.

-- 
Take care. Thanks, Brian Inglis              Calgary, Alberta, Canada

La perfection est atteinte                   Perfection is achieved
non pas lorsqu'il n'y a plus rien à ajouter  not when there is no more to add
mais lorsqu'il n'y a plus rien à retirer     but when there is no more to cut
                                -- Antoine de Saint-Exupéry
On 1/10/24 09:35:34, Dag-Erling Smørgrav via tz wrote:
Currently, strftime() implements %s by calling mktime() and then printing the result. This is fine when the struct tm passed to strftime() came from localtime() but not when it didn't. A better solution would be to call timegm() and then manually adjust the result. Of course that's only possible in the TM_GMTOFF case but that's still better than nothing.
This modification makes the two results equal, but I suspect it's not what you want. What is your input? What are you trying to do?
    #include <stdlib.h>
    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        char buf[256];
        time_t t;

        time(&t);
        strftime(buf, sizeof(buf), "%s %F %T %Z", localtime(&t));
        printf("local\t%s\n", buf);
        putenv( "TZ=GMT0" );
        strftime(buf, sizeof(buf), "%s %F %T %Z", gmtime(&t));
        printf("gm\t%s\n", buf);
    }

-- gil
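For comparison, the approach in the quoted proposal (call timegm(), then adjust by the offset) can be sketched as follows. This is an illustration under stated assumptions, not the patch that was installed, which used timeoff. The helper name tm_to_time is hypothetical, and both timegm() and the tm_gmtoff member are non-standard extensions found in the BSDs and glibc. It only gives the intended result when the struct tm came from localtime() or gmtime(), so that tm_gmtoff is actually set.

```c
#define _DEFAULT_SOURCE 1  /* glibc: expose timegm() and tm_gmtoff */
#include <assert.h>
#include <time.h>

/*
 * Hypothetical helper: recover the time_t behind a struct tm that was
 * filled in by localtime() or gmtime().  Relies on the non-standard
 * tm_gmtoff member and the non-standard timegm() function.
 */
static time_t tm_to_time(const struct tm *tm)
{
    struct tm copy;
    time_t as_utc;

    assert(tm != NULL);
    copy = *tm;                    /* timegm() may normalize its argument */
    as_utc = timegm(&copy);        /* read the fields as if they were UTC */
    return as_utc - tm->tm_gmtoff; /* undo the zone's offset from UTC */
}
```

Applied to the results of localtime(&t) and gmtime(&t) for the same t, this returns t in both cases without touching TZ. For a struct tm assembled by hand, tm_gmtoff holds whatever the caller left there, which is the objection raised elsewhere in the thread.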