Invalid tm handling in %s pattern in strftime

newer
time zone boundaries contributing...

Almaz Mingaleev

June 6, 2022

5:35 p.m.

Hi Paul, https://github.com/eggert/tz/commit/8b238ec54c09556eb2aa405c1741eedfd12c4a87... added validation of tm argument. I wonder why localtime_r call is done: 1) According to the man page mktime sets errno if tm is invalid. Why is it not safe to just check errno value (saving and restoring it if needed)? 2) man page says that in error case mktime "returns (time_t) -1 and does not alter the members of the broken-down time structure". Wouldn't it be enough to check that tm.tm_yday is >= 0? Thanks, Almaz

Attachments:

attachment.html (text/html — 864 bytes)

Show replies by date

Paul Eggert

June 2022

6:27 p.m.

On 6/6/22 10:35, Almaz Mingaleev wrote:

...

1) According to the man page mktime sets errno if tm is invalid. Why is it not safe to just check errno value (saving and restoring it if needed)?

Because mktime can set errno if tm is valid. mktime is like most C syscalls - when they fail, they set errno to a value that tells you what the error was, but when they succeed they set errno to an unspecified value.

...

2) man page says that in error case mktime "returns (time_t) -1 and does not alter the members of the broken-down time structure". Wouldn't it be enough to check that tm.tm_yday is >= 0?

That's the man page for TZDB mktime; unfortunately the C standard does not guarantee this and strftime.c is attempting to be portable to non-TZDB mktime. Now that I look at the code again, I see that the heuristic was still too trusting of non-TZDB mktime implementations, so I installed the attached to tighten things up. It's just a heuristic, so it can still do the wrong thing on even weirder implementations (though not TZDB). Perhaps some day we can optimize strftime.c if we know that TZDB-compatible mktime is being used.

enh

8:25 p.m.

On Mon, Jun 6, 2022 at 11:27 AM Paul Eggert via tz <tz@iana.org> wrote:

...

On 6/6/22 10:35, Almaz Mingaleev wrote:

...
1) According to the man page mktime sets errno if tm is invalid. Why is it not safe to just check errno value (saving and restoring it if needed)?

Because mktime can set errno if tm is valid. mktime is like most C syscalls - when they fail, they set errno to a value that tells you what the error was, but when they succeed they set errno to an unspecified value.

you can still int saved_errno = errno; errno = 0; if (mktime() == -1 && errno != 0) { return NULL; } errno = saved_errno; though? shouldn't that be cheaper than an extra localtime() call? (the specific issue on 32-bit Android [where time_t is 32-bit] is that this ends up using mktime64() which takes a *const* struct tm for historical reasons. that _will_ set errno -- because it just calls tzcode's mktime() behind the scenes -- but doesn't copy the 32-bit struct tm back into the input struct tm. see https://android.googlesource.com/platform/bionic/+/master/libc/bionic/time64... for the gory details. _changing_ 32-bit-only behavior that's visible to apps in 2023 [we're too late for 2022 already] isn't super.)

...

...
2) man page says that in error case mktime "returns (time_t) -1 and does not alter the members of the broken-down time structure". Wouldn't it be enough to check that tm.tm_yday is >= 0?

That's the man page for TZDB mktime; unfortunately the C standard does not guarantee this and strftime.c is attempting to be portable to non-TZDB mktime.

ah, yeah, looks like POSIX historically said "may" set errno (though 2018 seems to provide the guarantee). that's unfortunate...

...

Now that I look at the code again, I see that the heuristic was still too trusting of non-TZDB mktime implementations, so I installed the attached to tighten things up. It's just a heuristic, so it can still do the wrong thing on even weirder implementations (though not TZDB).

Perhaps some day we can optimize strftime.c if we know that TZDB-compatible mktime is being used.

Paul Eggert

9:41 p.m.

On 6/6/22 13:25, enh wrote:

...

int saved_errno = errno; errno = 0; if (mktime() == -1 && errno != 0) {

That isn't portable, because mktime can set errno and then successfully return -1, which means the caller can't rely on errno to decide whether mktime succeeded. I'm pretty sure this sort of behavior can happen on popular platforms. This sort of thing isn't a problem with syscalls like 'open', where the caller can portably distinguish success from failure by testing whether the return value is negative. Unfortunately, it is a problem with 'mktime', where the caller cannot portably distinguish success from failure. Even main-branch TZDB strftime isn't 100% reliable here on all platforms; it's merely more reliable than inspecting errno would be.

...

(the specific issue on 32-bit Android [where time_t is 32-bit] is that this ends up using mktime64() which takes a *const* struct tm for historical reasons.

Does Android mktime64() reliably leave errno alone when it succeeds? If so, perhaps we could add a compile-time macro like "MKTIME_SETS_ERRNO_ONLY_ON_FAILURE" that would let strftime use code like what's quoted above.

Garrett Wollman

9:52 p.m.

<<On Mon, 6 Jun 2022 14:41:15 -0700, Paul Eggert via tz <tz@iana.org> said:

...

On 6/6/22 13:25, enh wrote:

...
int saved_errno = errno; errno = 0; if (mktime() == -1 && errno != 0) {

...

That isn't portable, because mktime can set errno and then successfully return -1, which means the caller can't rely on errno to decide whether mktime succeeded. I'm pretty sure this sort of behavior can happen on popular platforms.

The general rule for POSIX (and ISO C) interfaces is that unless specified otherwise, a standard library interface may never set errno to zero. As Paul notes, only some interfaces are explicitly specified to leave errno unchanged; those that are not so specified may set it to any nonzero value on success. The current POSIX standard definition of mktime() specifies only the permissive "may set errno to indicate the error" (as an extension to the C standard) and does not require that errno be preserved on success. So portable applications cannot assume that errno will be meaningful even when mktime() does experience an error. -GAWollman

enh

10:51 p.m.

On Mon, Jun 6, 2022 at 2:53 PM Garrett Wollman via tz <tz@iana.org> wrote:

...

<<On Mon, 6 Jun 2022 14:41:15 -0700, Paul Eggert via tz <tz@iana.org> said:

...
On 6/6/22 13:25, enh wrote:

...
int saved_errno = errno; errno = 0; if (mktime() == -1 && errno != 0) {

...
That isn't portable, because mktime can set errno and then successfully return -1, which means the caller can't rely on errno to decide whether mktime succeeded. I'm pretty sure this sort of behavior can happen on popular platforms.

The general rule for POSIX (and ISO C) interfaces is that unless specified otherwise, a standard library interface may never set errno to zero. As Paul notes, only some interfaces are explicitly specified to leave errno unchanged; those that are not so specified may set it to any nonzero value on success. The current POSIX standard definition of mktime() specifies only the permissive "may set errno to indicate the error" (as an extension to the C standard) and does not require that errno be preserved on success. So portable applications cannot assume that errno will be meaningful even when mktime() does experience an error.

that's what it used to say (because, as eggert said, ISO C only has that weak guarantee), but the *current* POSIX text is stronger: The mktime() function shall return the specified time since the Epoch encoded as a value of type time_t. If the time since the Epoch cannot be represented, the function shall return the value (time_t)-1 [CX] [Option Start] and set errno to indicate the error. [Option End] https://pubs.opengroup.org/onlinepubs/9699919799/functions/mktime.html (the "CX" option is how POSIX says "we deviate from ISO C here".)

...

-GAWollman

enh

10:56 p.m.

On Mon, Jun 6, 2022 at 2:41 PM Paul Eggert <eggert@cs.ucla.edu> wrote:

...

On 6/6/22 13:25, enh wrote:

...
int saved_errno = errno; errno = 0; if (mktime() == -1 && errno != 0) {

That isn't portable, because mktime can set errno and then successfully return -1, which means the caller can't rely on errno to decide whether mktime succeeded. I'm pretty sure this sort of behavior can happen on popular platforms.

we certainly try to avoid that mistake on Android, but, yeah, i've definitely seen bugs like that.

...

This sort of thing isn't a problem with syscalls like 'open', where the caller can portably distinguish success from failure by testing whether the return value is negative. Unfortunately, it is a problem with 'mktime', where the caller cannot portably distinguish success from failure. Even main-branch TZDB strftime isn't 100% reliable here on all platforms; it's merely more reliable than inspecting errno would be.

...
(the specific issue on 32-bit Android [where time_t is 32-bit] is that this ends up using mktime64() which takes a *const* struct tm for historical reasons.

Does Android mktime64() reliably leave errno alone when it succeeds? If so, perhaps we could add a compile-time macro like "MKTIME_SETS_ERRNO_ONLY_ON_FAILURE" that would let strftime use code like what's quoted above.

it's your mktime(), so you tell me :-) but if you meant "does the extra cruft that _isn't_ tzcode mktime() touch errno?", no, it doesn't touch errno directly, nor does it call anything other than mktime() that might set errno. (or if you actually meant "look how hard it is to prove such things", yeah :-( in bionic we use RAII to make functions "errno clean" all the time to avoid this kind of problem because it's too hard to reason about...)

Paul Eggert

12:49 a.m.

On 6/6/22 15:56, enh wrote:

...

it's your mktime(), so you tell me :-)

tzdb mktime doesn't make any attempt to preserve errno on success. Even on a platform like bionic with primitives following a preserve-errno-on-success philosophy, tzdb functions can succeed after calling other functions that may fail and set errno. For example, tzset_unlocked can recover from a failed zoneinit (which sets errno) by calling zoneinit again.

...

in bionic we use RAII to make functions "errno clean" all the time to avoid this kind of problem because it's too hard to reason about...)

If we wanted to make tzdb localtime.c and strftime.c "errno clean" it'd require some careful work. It'd be doable but I expect it's not a one-hour job (though perhaps you could prove me wrong :-).

enh

1:23 a.m.

On Mon, Jun 6, 2022 at 5:49 PM Paul Eggert <eggert@cs.ucla.edu> wrote:

...

On 6/6/22 15:56, enh wrote:

...
it's your mktime(), so you tell me :-)

tzdb mktime doesn't make any attempt to preserve errno on success. Even on a platform like bionic with primitives following a preserve-errno-on-success philosophy, tzdb functions can succeed after calling other functions that may fail and set errno. For example, tzset_unlocked can recover from a failed zoneinit (which sets errno) by calling zoneinit again.

...
in bionic we use RAII to make functions "errno clean" all the time to avoid this kind of problem because it's too hard to reason about...)

If we wanted to make tzdb localtime.c and strftime.c "errno clean" it'd require some careful work. It'd be doable but I expect it's not a one-hour job (though perhaps you could prove me wrong :-).

i think worse than that, it's just too hard to reason about --- we'd bitrot anyway. it's really not obvious that there's a good fix (beyond "wait for 32-bit to die") at all... i don't think we can easily fix mktime64() --- i think we'd basically need to fix it differently, so we built tzcode twice (once with the public ABI 32-bit time_t and again with the 64-bit time_t), which at the very least would be quite a behavior change for any code using the 64-bit stuff. we'd also have to change the const-ness of a few arguments, which could be a breaking change. (i mean, it _is_ a breaking change; the question is really "how many apps would break?".) i don't think we can really go the errno route. you've convinced me of that. i don't see how we can save the localtime_r() comparisons --- our fundamental problem is that we don't have anything from mktime64() to compare against. i think the safest choice for bionic is probably just "disable the new check and preserve the old behavior". given that you can only get into this mess by passing a bad `struct tm` to strftime(), and given that most use cases i'm aware of have literally just asked us to fill out that `struct tm` from a `time_t`, so -1 is almost certain to be "legit -1" rather than "error -1", i think i'm fine with that? especially since we haven't seen/heard any complaints about the existing behavior (we haven't yet shipped https://github.com/eggert/tz/commit/8b238ec54c09556eb2aa405c1741eedfd12c4a87, despite it being from 2020 --- it's mingaleev's work trying to get back in sync that uncovered this). it sounds from the checkin comment like there wasn't a motivating example for this behavior change, but let us know if you can think of an actual downside to not taking this change...

Paul Eggert

2:07 a.m.

On 6/6/22 18:23, enh wrote:

...

i think the safest choice for bionic is probably just "disable the new check and preserve the old behavior".

Yes, that sounds right. Come to think of it, that's simpler (and arguably better) on all platforms. That is, if you give strftime a format containing %s, you must also pass a struct tm that came from a successful localtime call; otherwise behavior is undefined. In the undefined case tzdb strftime can simply output (time_t)-1 for %s. Proposed patch attached and installed into the development repository. This doesn't affect tzdb itself, as tzdb never gets into this situation.

enh

9:58 p.m.

On Mon, Jun 6, 2022 at 7:07 PM Paul Eggert <eggert@cs.ucla.edu> wrote:

...

On 6/6/22 18:23, enh wrote:

...
i think the safest choice for bionic is probably just "disable the new check and preserve the old behavior".

Yes, that sounds right.

Come to think of it, that's simpler (and arguably better) on all platforms. That is, if you give strftime a format containing %s, you must also pass a struct tm that came from a successful localtime call; otherwise behavior is undefined. In the undefined case tzdb strftime can simply output (time_t)-1 for %s.

Proposed patch attached and installed into the development repository. This doesn't affect tzdb itself, as tzdb never gets into this situation.

awesome, thanks! not having to keep a diff relative to upstream tzcode is great. (it's the large number of diffs we have so that tzdata can be one big file rather than lots of little files that got us into this mess where we're so far behind ToT in the first place!)

Paul Eggert

10:42 p.m.

On 6/7/22 14:58, enh wrote:

...

(it's the large number of diffs we have so that tzdata can be one big file rather than lots of little files that got us into this mess where we're so far behind ToT in the first place!)

Perhaps we should add support for that to tzcode as well. It's not like there isn't any experience with it....

1446

Age (days ago)

1447

Last active (days ago)

List overview

Download

11 comments

4 participants

participants (4)

Almaz Mingaleev
enh
Garrett Wollman
Paul Eggert

Invalid tm handling in %s pattern in strftime

tags

participants (4)