Date: Sun, 14 Jan 2024 20:33:05 -0500
From: Steve Summit via tz <tz@iana.org>
Message-ID: <2024Jan14.2033.scs.0003@tanqueray.home>

  | But what I was convincing myself of was precisely, as you put
  | it, that the number generated by strftime %s:
  |
  | > ...is *not* the time_t value that produced this
  | > struct tm - it cannot be, as no such thing need exist.

Note you need to keep the correct standards in your head, and know
what each requires, and what each specifies.

What I wrote there is just to make it clear that one may do:

    #include <time.h>

    const char *
    func(void)
    {
        static char buf[80];
        struct tm T = {
            .tm_year = 2024 - 1900,
            .tm_mon = 1 - 1,
            .tm_mday = 15,
            .tm_hour = 12,
            .tm_min = 55,
            .tm_sec = 26,
            .tm_isdst = -1    /* or 0, or 1, your choice */
        };

        strftime(buf, sizeof buf, "%s", &T);
        return buf;
    }

then in my timezone, in a POSIX environment, func() is required to
return a pointer to the string "1705298126". The value you'll get
will be different, as it depends upon your local time zone, but for
any constant local timezone (that is, if you do not alter TZ), the
result is the same string, it does not depend upon the current date
or time in any way at all. If you do alter TZ you can discover the
time_t value for that particular instant of local time in various
different time zones, as many as you desire.

[Aside: I did not compile-test that, apologies for any random syntax
errors, etc, I introduced .. I did however validate the struct tm
data to time_t conversion for my timezone.]

The point here is that the %s conversion isn't giving the time_t
value that was used to generate T, as no time_t value was used for
that, just the C code written above. And since the answer depends
upon what timezone you execute it in, there is no one right answer,
expecting one is a mistake.

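
To make the TZ dependence concrete, here is a minimal sketch that
converts the same broken-down time under a caller-supplied TZ value.
It assumes a POSIX system with setenv() and tzset(). The helper name
epoch_in_zone() and the two POSIX-style TZ strings mentioned below are
illustrative choices, not anything from the original message.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <time.h>

/*
 * Hypothetical helper: convert 2024-01-15 12:55:26 local time to a
 * time_t, where "local" is defined by the POSIX TZ string given.
 * No time_t value is used to build the struct tm.
 */
time_t
epoch_in_zone(const char *tz)
{
    struct tm T = {
        .tm_year = 2024 - 1900,
        .tm_mon = 1 - 1,
        .tm_mday = 15,
        .tm_hour = 12,
        .tm_min = 55,
        .tm_sec = 26,
        .tm_isdst = -1      /* let mktime() work it out */
    };

    if (setenv("TZ", tz, 1) != 0)
        return (time_t)-1;
    tzset();
    return mktime(&T);
}
```

Called with "UTC0" this should return 1705323326, and with "<+07>-7"
(seven hours east of UTC) it should return 1705298126, the value in
the string quoted above. Same struct tm, different zone, different
time_t.
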
Anyone who believes that %s (or mktime()) is required to return a
particular value for a given struct tm (or even just that it should,
and that the standard should be changed to allow it) needs to explain
how that is to work and keep code like the above functioning
correctly.

And note, code just like that has always worked for mktime() (since
the dark ages when mktime() was first invented, before tzcode or
tm_gmtoff existed) and thus by extension for strftime("%s") which
traditionally just did mktime() on a copy of the struct tm handed to
strftime() and converted the result to a string (using snprintf
probably).

Note that T is a local variable in func()'s stack space. The
initializer used above happens to zero the fields that it does not
name, but code that instead assigns the fields one at a time leaves
the others holding whatever stack garbage was there before the call
to func(), and as we can sprinkle calls to func() throughout our
code, after any other random function has just put who knows what on
the stack, that stack garbage can vary from call to call.

Hence, if the implementation were to (say) use the value of tm_gmtoff
(which such code never sets) in any way at all to compute the value
returned by the %s conversion, then the result would not always be
the same. But it must be, there is only one time_t value which you
can pass to localtime which will generate 2024-01-15 12:55:26 in your
local timezone -- except if that local time happens to be in the
overlap period when summer time has just ended and local times run
twice - though for that to happen would be unusual indeed, that weird
hour (or however long it happens to be) typically happens in the
middle of the night, and not in the middle of the day on a Monday.

  | But before I convinced myself of this, whenever I used to see
  | that 'date +%s' was for printing what is, for all intents and
  | purposes, a raw time_t value, I imagined that's what it did:

It is what it does.
But remember that "date" is a POSIX command, and obeys the POSIX
standards, the C standard does not specify any commands at all, just
the language often used to write them.

So date(1) knows it is in a POSIX environment (anywhere else, and
what it does with a '+xxxx' operand, and how that would even be
specified to it if a date command even exists, is all someone else's
problem - some other standard, or some vendor's proprietary
specification, or whatever) and so time_t is an integer specifying
seconds since the epoch, and so that is what date +%s is guaranteed
to print (and given no other args to vary it, seconds since the epoch
for "now" when the command is issued). You can rely upon that.

  | print the raw time_t value. So it follows that if someone is
  | trying to implement all of date(1)'s '+' options using strftime,
  | with the implication that strftime has to be able to do %s, it
  | further follows that strftime has to be able to -- somehow --
  | access that raw time_t value.

Of course it can. A time_t is a numeric value (even in C), it isn't
a struct or union, or something like that, so for a particular
environment there is some printf format conversion that we can hand
to sprintf() to convert the value to a string. It might be necessary
to cast it to a long double or something first, but it can always be
done.

  | But, hang on, don't jump down my throat and correct me again,
  | because, I know: that's wrong.

Not really.

  | is to remind myself that strftime's computation of %s is *not*
  | a simple operation:

Not completely trivial, no. But not complex like attempting to
measure time at the quantum level, or anything like that either.

  | it's a complex transformation, more or less
  | exactly equivalent to mktime.

Yes, that is exactly what it is, which is why they are both specified
to generate the same results.

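
That equivalence can be checked with a small sketch like the one
below, which formats a struct tm with "%s" and compares the digits
with what mktime() returns for a copy of the same struct. It assumes
an strftime() that supports %s and a time_t that is an integer type
fitting in a long long. The function name percent_s_matches_mktime()
is made up for the illustration.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical check: returns 1 if strftime("%s") on *tp yields the
 * same number that mktime() returns for a copy of *tp, 0 otherwise.
 * Assumes time_t is an integer type that fits in a long long.
 */
int
percent_s_matches_mktime(const struct tm *tp)
{
    char from_strftime[64], from_mktime[64];
    struct tm copy = *tp;       /* mktime() may modify its argument */
    time_t t = mktime(&copy);

    if (t == (time_t)-1)
        return 0;               /* treated here as "no usable value" */
    if (strftime(from_strftime, sizeof from_strftime, "%s", tp) == 0)
        return 0;
    snprintf(from_mktime, sizeof from_mktime, "%lld", (long long)t);
    return strcmp(from_strftime, from_mktime) == 0;
}
```

Handing it a struct tm built by hand, with no time_t anywhere in
sight, is the interesting case.
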
  | It's potentially lossy,

I'm not sure what that means - mktime() is (unfortunately, I have
been trying hard to get this changed to something rational, but the
POSIX people simply refuse to understand the issues) always defined
to return a value, except when the year (or in very unlikely cases
year and other fields combined) has an absolute value so large that a
time_t doesn't have enough bits to represent it ... which is
impossible with a 64 bit POSIX time_t in the common case where "int"
(which is what tm_year is) is just 32 bits. Hence strftime(%s)
always is as well.

  | it does what you expect (if your expectation is even correct) only
  | if you use it very carefully, paying attention to subtle facets
  | of the documentation which are easy to overlook or misinterpret.
  | One which has been mentioned is that TZ has not changed.

NO! You can change TZ however you like. If you do, the result will
differ - it is intended to, that's why if you run the above code
fragment in your timezone you'll get a different value than I get
(presumably, unless you're in UTC+0700). That's intended, and the
way things are intended to work.

I still think you're hung up on the notion that struct tm must always
come from a call to one of the *time() functions which return such a
struct (or a pointer to one) given a time_t input, and that the value
obtained should be that particular time_t value. Stop believing
that, that's not how it has ever worked, or is intended to work.

  | Another is that tzset either has or has not been called.

How does that affect anything?

  | Yet another (which I don't think has been mentioned yet) is
  | that tm_isdst is set correctly.

Not that either - tm_isdst should be just a hint to mktime() (and
consequently to strftime(%s)) for the ambiguous cases.
Unfortunately, the bizarre desire to use localtime() and mktime() to
allow arithmetic operations on C time_t's has the POSIX people
demanding that tm_isdst be an instruction, rather than the
presumption that the standards have always previously said it was (a
presumption which can be rebutted if it turns out to be incorrect -
but to be useful for arithmetic, it needs to be mandatory, and
override local conventions). Exactly how that is supposed to work
still baffles me, as how can someone possibly know what offset would
be applied were summer time in effect on some date right in the
middle of winter in some jurisdiction which has never had any summer
time at all (like where I am, yet if I set tm_isdst to 1 in the above
fragment, they require mktime to apply a dst correction which is of
unknown magnitude and unknown sign).

  | But, yes, if you're careful of all those things, %s will work
  | correctly. But will it do what you want?

That depends upon what you want. Obviously. If I want it to
magically inflate my bank account, then I will probably be
disappointed. If you don't happen to want what it is specified to
do, you might be as well. But if you simply want it to behave as it
is specified to do, then it should always work. Wanting things to do
other than what they are defined to do (like wanting your car to
operate as a submarine when submerged in water) is a nice fantasy,
but one that only ever seems to work in movies.

  | there might be bugs in your understanding of what %s does,

Of course, and you fix that by learning. Everyone starts out knowing
almost nothing about almost everything, and learns over time.
However, your objective should be to really learn, not guess,
experiment a little and "confirm" the guess, and then proclaim your
guess to be the rule. Unfortunately, that's what far too many people
do, with the "experiment a little" often being "I tried it once and
it worked".

And this doesn't just apply here, it applies to everything we believe
we know to be true. Make sure, don't just believe because it's
easier, and you're less likely to be eventually proven wrong.

  | > | > (Which brings me back to my conclusion that %s
  | > | > shouldn't exist, because it's impossible to implement correctly.
  | >
  | > Nonsense. It is trivial to implement correctly.
  |
  | A laughable conclusion, given the complexity of this thread!

Not at all. I have done it. It isn't hard at all. What is hard is
convincing people that their long-held belief of just what must be
correct (because they never happen to have observed anything
different) is in fact wrong. That is hard.

  | But I think you mean, the long and the short of a proper %s
  | implementation is to call mktime on the struct tm handed to
  | strftime, and interpolate the result.

It would have to be on a copy of the struct tm, not the actual one,
as mktime() might modify it, and we don't want that for strftime -
there might be more conversions still coming in the format string,
and we need to use the original values for those, not ones altered by
mktime(). But you only need to worry about that if you're
implementing strftime().

That's certainly an easy way - but mktime() first goes about
validating the ranges of all the (relevant) struct tm fields, and
adjusting them (and then others to compensate), and also setting up
the other fields in the struct to agree (tm_wday, tm_yday etc), and
strftime() doesn't need to do any of that - it can assume that all
the fields are already within range, as its result is unspecified if
the user doesn't guarantee that. So it can simply do the latter half
of what mktime() does, perhaps using some private internal function
which both mktime() and strftime() use, or perhaps just duplicating
the code, or using a different algorithm which produces the same
result. That's the implementor's choice, and users should not worry
about it.

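
A minimal sketch of the easy way, assuming a hypothetical internal
helper named conv_percent_s() that an strftime() implementation might
call when it meets %s in the format. This is not the code from any
real C library; it only shows the copy-then-mktime() step and the
printf-style conversion of the result, and it assumes time_t is an
integer type no wider than long long.

```c
#include <stdio.h>
#include <time.h>

/*
 * Hypothetical helper for an strftime() implementation: write the %s
 * conversion for *tp into buf (of size len).  Returns the number of
 * characters written, or 0 if the result does not fit.
 */
size_t
conv_percent_s(char *buf, size_t len, const struct tm *tp)
{
    struct tm copy = *tp;   /* mktime() may normalise what it is given */
    time_t t = mktime(&copy);
    int n;

    /* Assumption: time_t is an integer type that fits in a long long. */
    n = snprintf(buf, len, "%lld", (long long)t);
    if (n < 0 || (size_t)n >= len)
        return 0;
    return (size_t)n;
}
```

Because only the copy is passed to mktime(), the caller's struct tm
is left untouched for any conversions that follow in the format
string.
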
  | (But it's like that old joke about the lecturer who,
  | after being questioned about whether a certain result is truly
  | "obvious", spends half an hour alternately deep in thought or
  | scribbling abstrusely on the chalkboard, before triumphantly
  | concluding, "Yes, I was right, it is obvious.")

Yes, heard that before - and people who believe it is humorous are
not understanding what it means to be "obvious" - which just means
that the result is guaranteed from known facts, and cannot be
different, not that the process of determining that is quick.

This is another of the things where common use of a word has lost its
true meaning - another is "theory" where people will say "my theory
is that ..." where they mean (at best) "my hypothesis..." and far
more often "my unsupported random guess..." But theory is from the
same root as theorem, and means proven. Not a guess. Of course,
someone might, one day, find a flaw in the proof, but until that
happens, a theory should be regarded as a fact. But in the common
mindset, it tends to suggest it is just a guess, as that's how people
misuse the word all the time.

  | An implementation that perfectly implements a
  | useless specification isn't useful.

True. But it must have been useful to someone, sometime, for them to
have implemented and specified it that way. That it doesn't meet
your particular need doesn't mean it isn't useful to anyone, just not
useful to you. You might need something different - just don't break
what other people need because it isn't what you need.

  | No, I know, but if the implementor of date(1) has a specification
  | of the format specifiers accepted by '+', it might be prudent to
  | vet that list against the specification of the strftime call
  | that's about to be used.

Why? That's strftime()'s job, it is what really knows, and more
specifications can be added over time, without needing to go fiddle
with the internals of some command (like date) which just happens to
use it.

strftime() will return "" if there is a problem in the format string
- that's why we leave the '+' in the format that date passes to
strftime - that way if date gets a "" result, it knows there was an
error, and can print a diagnostic. If the result starts with '+',
which it must if no error occurred, as that will (for date(1)) always
be the first char of the format string passed in, and only the
conversions, which always start with a %, are modified, then date
knows that strftime() worked, and can simply print the result
(without that '+' which is not intended to appear - the user's
strftime() format is what followed that '+' in date's arg list).

  | And of course that's precisely how some implementations of
  | mktime *do* work!

Yes, I know, I did the first of those (not my idea of how to
implement it, that I was told about, but I wrote that code) - long
long long ago (the actual code has been much improved over time, so I
doubt you'll see any of my actual text, unless you look at some
ancient archive - and there's no reason to do that).

kre