Re: [tz] strftime %s

Jan. 14, 2024

    Date:        Sun, 14 Jan 2024 11:41:10 -0800
    From:        Paul Eggert <eggert@cs.ucla.edu>
    Message-ID:  <0573ccfb-4c07-4886-916c-f521180e949a@cs.ucla.edu>

Note: in what follows, lines starting "  | " are quotes of
text written by Paul in the message to which I'm replying,
except for those lines which start "  | >" which are lines
Paul quoted from a message Steve sent (see the header for
full names & e-mail addresses).

  | Although that's one interpretation of the standard, it's not the only 
  | one.

It is, however, approximately the correct one.

  | As I've been saying, although the POSIX and C standards can easily 
  | be misinterpreted,

Anything can be misinterpreted; that means nothing.

  | they have a better interpretation which says that on 
  | a system with tm_gmtoff and tm_zone strftime need not use mktime or 
  | equivalent, not even for %s.

Nothing says that anything needs to use mktime().   What the spec for
strftime("%s") says is that the result, in a POSIX system, must represent
the same value as mktime() on the same struct tm would produce.   That
is, the value that should be produced is specified.  That's all.
Allowing the implementation to produce any answer it likes would make
it kind of difficult for anyone to use reliably, don't you think?
The mechanism the implementation uses to produce that specified result
is entirely up to it, provided it uses only the data that users are told
they need to provide (otherwise the implementation risks using garbage).
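
A minimal sketch of that requirement, assuming a POSIX-like system whose
strftime() supports %s (it is not part of ISO C): the text produced by %s
should parse to the value mktime() returns for a copy of the same struct tm.
The helper name percent_s_matches_mktime is made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/*
 * Returns 1 if strftime("%s") on *tm yields the same value that
 * mktime() computes for a copy of *tm, 0 otherwise.
 * Assumes %s is supported; it is a POSIX/BSD/GNU feature, not ISO C.
 */
int percent_s_matches_mktime(const struct tm *tm)
{
    char buf[64];
    struct tm copy;
    time_t expected;

    assert(tm != NULL);
    copy = *tm;                    /* mktime() may normalise its argument */
    expected = mktime(&copy);

    if (strftime(buf, sizeof buf, "%s", tm) == 0)
        return 0;                  /* no output: unsupported or too long */

    return strtoll(buf, NULL, 10) == (long long)expected;
}
```

How the implementation arrives at the value is not visible here, and does
not need to be; only the result is being compared.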

  | > 	The struct tm handed to strftime must be one returned by
  | > 	an immediately preceding call to localtime or gmtime.
  |
  | This is good advice,

It actually isn't.   It isn't required at all.  All that is required
is that the fields required for the conversions specified in the
format string be correctly initialised to the desired values.

Certainly calling one of the functions which fills in a struct tm
will do that, and that's a very common usage, but it isn't the only
way (using the results from parsedate() on systems that have it
is another, as is simply doing a scanf() on a date/time string,
perhaps one previously created by strftime()).   Or many other ways,
including simply reading the struct tm from a file.
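
As an illustration of that point, here is a sketch that never calls
localtime() or gmtime(): it fills in by hand only the fields that the
"%Y-%m-%d" format reads. The function name format_ymd is hypothetical, and
zeroing the remaining fields is a tidy habit rather than a requirement.

```c
#include <assert.h>
#include <string.h>
#include <time.h>

/*
 * Formats a calendar date with "%Y-%m-%d" from a struct tm filled in
 * by hand. That format reads only tm_year, tm_mon and tm_mday, so
 * those are the only fields given meaningful values here.
 * Returns the number of bytes written, or 0 if out is too small.
 */
size_t format_ymd(int year, int month, int day, char *out, size_t outsize)
{
    struct tm tm;

    assert(month >= 1 && month <= 12 && day >= 1 && day <= 31);

    memset(&tm, 0, sizeof tm);     /* keep unused fields at sane values */
    tm.tm_year = year - 1900;      /* years since 1900 */
    tm.tm_mon  = month - 1;        /* months since January, 0..11 */
    tm.tm_mday = day;              /* day of the month, 1..31 */

    return strftime(out, outsize, "%Y-%m-%d", &tm);
}
```

Calling format_ymd(2024, 1, 14, buf, sizeof buf) leaves "2024-01-14" in buf.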

  | and (at least in a "should" form) it should be in POSIX.

It certainly should not.

  | While I was at it I noticed that the man page doesn't say strftime 
  | behaves as if tzset were called (even though this is no longer needed). 

But in general it doesn't; it does so only for the 3 conversions that need it.

In any case "behaves as if tzset() were called" is more or less (not
fully) irrelevant to the call of strftime() itself.  What is crucial
is that calls to tzset() can affect later function results,
and the lifetimes of data returned from earlier calls - so it is
important to know when that might happen.

  | Yes, but the standards give leeway as to how to "make this work" for %z 
  | and %Z, and this leeway includes using members like tm_gmtoff and 
  | tm_zone that the C standard does not specify.

Certainly, POSIX has added stuff which the C standard does not require
to exist - C is trying to be able to run in more environments than
just POSIX ones, which necessarily affects just how much it can
specify when dealing with interfaces to external systems (like time).

Eg: in C, a time_t is *not* a count of seconds since some epoch, and
simply printing a time_t value and expecting that to be seconds since
the epoch, in a portable C application is incorrect.  POSIX specifies
it as an integer count of seconds since 1970-01-01T00:00:00Z (at exactly
86400 seconds per day, every day, always).   C does not.   A C time_t
might be a count of milliseconds, of microseconds, or of 2-second units, or
BCD encoded, or almost anything (though I think it is now required
to be an integer type - I believe it was once allowed to be a float).
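
For code that has to be portable C rather than POSIX, the usual way around
this is to obtain time_t values from mktime() and take differences with
difftime(), which returns seconds whatever the encoding. A sketch, with a
hypothetical function name and an arbitrarily chosen date:

```c
#include <assert.h>
#include <string.h>
#include <time.h>

/*
 * Seconds between two local wall-clock times on 2024-01-14,
 * computed without assuming anything about how time_t is encoded.
 * Returns -1.0 if either time cannot be represented.
 */
double seconds_between(int h1, int m1, int h2, int m2)
{
    struct tm a, b;
    time_t ta, tb;

    assert(h1 >= 0 && h1 < 24 && h2 >= 0 && h2 < 24);

    memset(&a, 0, sizeof a);
    a.tm_year = 2024 - 1900;
    a.tm_mon = 0;
    a.tm_mday = 14;
    a.tm_hour = h1;
    a.tm_min = m1;
    a.tm_isdst = -1;          /* let mktime() decide about DST */
    b = a;
    b.tm_hour = h2;
    b.tm_min = m2;

    ta = mktime(&a);
    tb = mktime(&b);
    if (ta == (time_t)-1 || tb == (time_t)-1)
        return -1.0;

    return difftime(tb, ta);  /* seconds, regardless of time_t's units */
}
```

Subtracting tb - ta directly would give seconds on POSIX systems only.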

  | > (Which brings me back to my conclusion that %s
  | > shouldn't exist, because it's impossible to implement correctly.

Nonsense.   It is trivial to implement correctly.

Perhaps you mean that the specification does not achieve what you
want it to produce - that's a different issue entirely.  That is,
your "correctly" means "what I want" rather than "as specified".

And you're certainly right that would be impossible, as what you
want, and what I want, and what someone else wants might all be
different - the implementation needs to pick one of them (or add
more interface to select) - it cannot simply guess which one the
current user expects to happen, and implement that.  That is impossible.

Lots of functions don't do what I'd like them to do.   Sad, but true.
Live with it.

  | It's impossible only if one uses a too-strict interpretation of the 
  | standards.

I have no idea what that means.   It isn't impossible no matter
how strictly the standard is interpreted (which should always
be "very").

  | Let's not do that, as it would make our implementations 
  | worse, our users more confused, and our software buggier.

All that is needed is to make it clear just what the %s value
represents.   It is *not* the time_t value that produced this
struct tm - it cannot be, as no such thing need exist.  It is
the time_t value which localtime() would convert into the same
values as are in the struct tm given to strftime() (for the fields
that strftime() uses, or might).   Note, only localtime() for this,
never gmtime() or anything else.
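
To make the "only localtime()" point concrete, here is a sketch that uses
mktime() as the inverse of localtime() and measures what happens when the
fields come from gmtime() instead. It assumes a POSIX system (setenv(),
tzset(), POSIX TZ strings such as "EST5"); local_inverse and gmtime_skew
are hypothetical names.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for setenv() and tzset() */
#endif
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/*
 * The time_t that localtime() would convert into the fields of *tm.
 * Works on a copy because mktime() may modify its argument.
 */
time_t local_inverse(const struct tm *tm)
{
    struct tm copy;

    assert(tm != NULL);
    copy = *tm;
    return mktime(&copy);
}

/*
 * Difference, in seconds, between local_inverse() of the gmtime()
 * fields of t and t itself, under the POSIX TZ string tz.
 */
long gmtime_skew(time_t t, const char *tz)
{
    struct tm utc_fields;

    setenv("TZ", tz, 1);
    tzset();
    utc_fields = *gmtime(&t);      /* copy: the result is static storage */
    return (long)difftime(local_inverse(&utc_fields), t);
}
```

With TZ set to "EST5" (five hours west of UTC, no DST) the skew is 18000
seconds; with "UTC0" it is 0. Fields produced by localtime() round-trip
exactly under either setting.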

  | >     You might think that the sequence
  | > 
  | >         struct tm *tm = localtime(&t);
  | > 	strftime(buf, sizeof buf, "%s", tm);
  | > 
  | >     is fundamentally guaranteed to place a decimal representation
  | >     of t into buf, where "fundamentally" implies that it just
  | >     *has* to work, even in the face of serious bugs in other,

Come on, be serious.   Nothing is ever guaranteed to work in the
face of serious bugs.   If there's a serious bug in cc, you might
not even be able to compile the code to test that (for example).
If the startup code (what used to be crt0 but I think that's been
replaced by something different - never mind) has a serious bug,
your code might never start running.   If ... (I could go on forever).

  | >     unrelated parts of the time-conversion logic.  But no, this
  | >     sequence is in fact utterly vulnerable to bugs in other
  | >     parts of the time-conversion logic,

Everything is vulnerable to bugs in all kinds of things.  The whole
of tzcode assumes that read(2) works, so that the TZif file can be
read to get the information it contains, but if there were a bug in
read(2) such that every second byte was complemented, or something
else weird like that, nothing would work.   Do you worry about that,
and abandon all uses of tzcode because of it?   I certainly don't.

We cannot specify things such that we are assuming that other things
will be broken, or we cannot expect to rely upon anything at all.

Instead, we assume that everything works, and write code based upon
that assumption, and then if something doesn't behave as expected,
we first double check that our expectation is correct (that is,
don't simply assume that because it looks as if it should do
X, that X is what it must do - verify that the specification says
that), and then if that's true, and the implementation isn't doing
what it should, we file a bug report and get the thing fixed.

  | > because it is necessarily
  | >     equivalent to the sequence
  | > 
  | >         struct tm *tm = localtime(&t);
  | > 	time_t t2 = mktime(tm);

Yes.

  | >     which sets t2 == t only in the presence of a perfectly-
  | >     implemented mktime,

Of course, and a perfectly implemented localtime(), and a perfectly
implemented compiler, and correctly functioning hardware, and ...

Incidentally, a bug free localtime() is much harder to achieve
than a bug free mktime(), as mktime() can easily be implemented
simply by making calls to localtime() and comparing the results
with the input struct tm, until the input time_t to localtime()
which produces the expected results is found.  Perhaps not all
that efficient, but very easy, and if localtime() is correct,
then so will be mktime().   mktime() needs to normalise the values
in the struct tm first, or they'd never compare equal to localtime
results of course - strftime() doesn't need to do that, as its
results are unspecified if any of the relevant struct tm values
are out of their specified ranges.
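
One way to sketch that construction is a binary search over time_t values.
The binary search is a choice made for this example, not something the
description above prescribes. The sketch assumes a POSIX time_t (a count of
seconds), input fields that are already normalised, and a search range of
1970 to 2033; because local time steps backwards at "fall back" transitions,
it can miss times very close to one. The names tm_cmp and mktime_by_search
are hypothetical.

```c
#include <assert.h>
#include <time.h>

/* Orders two broken-down times by date and time of day: <0, 0 or >0. */
static int tm_cmp(const struct tm *a, const struct tm *b)
{
    int d;

    if ((d = a->tm_year - b->tm_year) != 0) return d;
    if ((d = a->tm_mon  - b->tm_mon)  != 0) return d;
    if ((d = a->tm_mday - b->tm_mday) != 0) return d;
    if ((d = a->tm_hour - b->tm_hour) != 0) return d;
    if ((d = a->tm_min  - b->tm_min)  != 0) return d;
    return a->tm_sec - b->tm_sec;
}

/*
 * Finds the time_t that localtime() converts into *want, using only
 * localtime() and comparisons. Returns (time_t)-1 if none is found.
 * *want must not be the static object that localtime() returns.
 */
time_t mktime_by_search(const struct tm *want)
{
    time_t lo = 0, hi = 2000000000;    /* 1970-01-01 .. 2033-05-18 UTC */
    const struct tm *got;

    assert(want != NULL);

    while (lo < hi) {
        time_t mid = lo + (hi - lo) / 2;

        got = localtime(&mid);
        if (got == NULL)
            return (time_t)-1;
        if (tm_cmp(got, want) < 0)
            lo = mid + 1;              /* mid is too early */
        else
            hi = mid;                  /* mid is the answer or too late */
    }

    got = localtime(&lo);
    return (got != NULL && tm_cmp(got, want) == 0) ? lo : (time_t)-1;
}
```

Each lookup costs about 31 calls to localtime(), which matches the "not all
that efficient, but very easy" trade-off.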

  | > and also given certain other constraints,
  | >     such as that TZ has not changed.

Yes - mktime() uses the current TZ-specified local time to do its
conversion, just as localtime() does.

You might as well say that

	struct tm *tm1 = localtime(&t);
	struct tm *tm2 = localtime(&t);

isn't guaranteed to produce the same values in *tm1 and *tm2, as
it depends upon a perfectly implemented localtime() and that TZ
isn't altered between the two calls, and ...   (and that t doesn't
change in the interim).

There's nothing specific to mktime() or strftime("%s") which
makes things any different in this area.
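
The dependence on TZ can be shown with a small sketch, assuming a POSIX
system (setenv(), tzset(), POSIX TZ strings). The function name round_trip
is hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for setenv() and tzset() */
#endif
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/*
 * Converts t to local fields under tz_before, then back to a time_t
 * under tz_after. Both arguments are POSIX TZ strings.
 */
time_t round_trip(time_t t, const char *tz_before, const char *tz_after)
{
    struct tm fields;

    assert(tz_before != NULL && tz_after != NULL);

    setenv("TZ", tz_before, 1);
    tzset();
    fields = *localtime(&t);       /* copy: the result is static storage */

    setenv("TZ", tz_after, 1);
    tzset();
    return mktime(&fields);        /* interprets fields under tz_after */
}
```

round_trip(t, "EST5", "EST5") gives back t, while round_trip(t, "EST5",
"UTC0") comes out 18000 seconds earlier. Two successive localtime() calls
separated by the same TZ change would disagree in just the same way.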

  | Assuming that localtime and strftime both succeed (localtime returns 
  | non-null and strftime's output fits), then a warning stated this baldly 
  | would be incorrect for current tzcode as its strftime %s is indeed the 
  | inverse of localtime.

As it should be.   Exactly that, and nothing else.   Ever.

In this regard, note that localtime() uses the TZ timezone, not
anything different, so what you're saying is that strftime("%s")
uses the TZ timezone, and never anything else (whatever value might
happen to be in the tm_gmtoff field of the struct tm passed to it).

  | > 	Please rely on %s only if you're the implementor of
  | > 	date(1) or the equivalent.

Nonsense.   Further, the implementor of date(1) doesn't care
about %s at all: the '+format' operand is simply passed
directly to strftime (and then the leading '+' in the resulting
string is removed - there are reasons for doing it that way rather
than removing the '+' first) without examining it at all.
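
A sketch of that handling, not taken from any particular date(1) source;
the function name format_operand and the 256-byte buffer are assumptions.
One plausible reason for keeping the '+', offered here as an assumption,
is that the output is then never empty, so a 0 return from strftime()
can only mean failure.

```c
#include <assert.h>
#include <string.h>
#include <time.h>

/*
 * Formats *tm the way a date(1)-like program might handle its
 * "+format" operand: the operand, '+' included, goes to strftime()
 * unexamined, and the leading '+' is dropped from the result.
 * Returns 0 on success, -1 if the output does not fit.
 */
int format_operand(const char *operand, const struct tm *tm,
                   char *out, size_t outsize)
{
    char buf[256];
    size_t n;

    assert(operand != NULL && operand[0] == '+');

    n = strftime(buf, sizeof buf, operand, tm);
    if (n == 0 || n > outsize)     /* n counts the '+' but not the NUL */
        return -1;

    memcpy(out, buf + 1, n);       /* n - 1 characters plus the NUL */
    return 0;
}
```

An operand of just "+" then succeeds and yields an empty string, which
could not be told apart from an error had the '+' been stripped first.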

  | It's true that strftime %s has problems on other platforms,

What platforms have issues?   That is, among those which actually support
%s, of course.   (Though it sounds as if the current unreleased
but patched tzcode might perhaps be one of them.)

  | so a portability warning is appropriate for tzcode strftime's man
  | page.

It would be better to file bugs against the broken ones.   This
isn't a case (like say "echo") where there are two competing
specifications, and people simply will not agree on which is
correct, so we just tell everyone to avoid it for safety.

  | NetBSD's strftime_z does that. But it's not needed in current tzcode, 
  | which addresses the problem in a simpler way.

There is no problem to address.   It is just that %s is not designed to
do what some people apparently want it to do (which is, in general,
not really all that useful).

If there is a real need for something different, the way to deal with
that is to suggest to the implementors that some other conversion be
added (or a modifier applied to the %s conversion perhaps) to achieve
different results - not to arbitrarily change the specification
of %s and by so doing break code which is justifiably relying upon it
working as it is specified to work.

kre

Robert Elz