strftime %y and negative years

"Olson, Arthur David (NIH/NCI)" <olsona@dc37a.nci.nih.gov> writes:
...the suggested change to strftime is trying to get %y to produce the "year within the century" as output.
Yes, that's correct: it's trying to produce the year modulo 100.
But the latest IEEE Std 1003.1 calls for...
%y Replaced by the last two digits of the year as a decimal number [00,99].
Read literally, this would have undefined behavior for the years -9 through 9 (since they don't have two digits), and would generate "10" for the year -10, and so forth, which obviously disagrees with the proposed "year modulo 100" semantics.

However, I just checked 3 implementations and found that nobody obeys the literal behavior for years before -9, that there's disagreement about negative single-digit years, and that at least one traditional Unix implementation mishandles years before 1900 (not too surprising, since the Unix Version 7 ctime allowed only years in the range 1900..2099).

Here's what I found.

    year (i.e.,     glibc 2.2.5    OpenBSD 3.4    Solaris 9
    tm_year-1900)   (Debian                       patch 112874-29
                    2.2.5-11.5)                   (64-bit sparc)

    -101            99             -1             0/
    -100            00             00             00
    -99             01             -99            ''  (i.e., two apostrophes)
    ...
    -2              98             -2             0.
    -1              99             -1             0/
    0               00             00             00
    1               01             01             ''
    ...
    9               09             09             '/
    10              10             10             '0
    11              11             11             ('
    ...
    99              99             99             0/
    100             00             00             00
    101             01             01             ''
    ...
    1899            99             99             0/
    1900            00             00             00
    ...

The implementations agreed for years after 1899, until they got to the year 2**31:

    2**31 - 1       47             47             47
    2**31           48             -48            48
    2**31 + 1       49             -47            49
    2**31 + 2       50             -46            50
    ....
    2**31 + 1897    45             -51            45
    2**31 + 1898    46             -50            46
    2**31 + 1899    47             -49            47

So it appears that, in practice, the behavior of strftime %y is undefined when tm_year is negative, or when tm_year+1900 exceeds INT_MAX.

I don't know whether the standards committee would consider all this to be a bug in the standard or in the implementations, but here's what I think. The Solaris behavior is clearly buggy for years before 1900. The OpenBSD behavior is clearly buggy for years after 2**31 - 1. For years before 0, I suppose that it's debatable between OpenBSD 3.4 and glibc 2.2.5. However, I'd say that the glibc 2.2.5 behavior is cleaner, since it's more regular, it always outputs two bytes, and it doesn't output "-".

PS. Disclaimer: I wrote that part of glibc 2.2.5 so my opinion is hardly impartial.

PPS. Ironic, isn't it? The consensus is that new code shouldn't use ctime, as it's obsolete and is undefined for years out of the traditional range, and that new code should use strftime. But in practice, strftime has problems too.

PPPS. I'll CC: this to the tz mailing list to see if anybody else has experiences with other implementations in this area, or strong opinions on the subject.
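
The literal reading of "last two digits" described at the top of this message can be sketched in C. This is only an illustration: literal_last_two_digits is a hypothetical helper, not part of any C library, and it simply prints the year in decimal and keeps the last two digit characters.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from any libc): apply the standard's wording
 * literally by printing the year in decimal and keeping its last two
 * digit characters.  Returns 0 on success, or -1 when the year has
 * fewer than two digits, the case the wording leaves undefined.
 */
int literal_last_two_digits(int year, char out[3])
{
    char buf[32];
    const char *digits;
    size_t n;

    assert(out != NULL);
    snprintf(buf, sizeof buf, "%d", year);
    digits = (buf[0] == '-') ? buf + 1 : buf;   /* skip the sign */
    n = strlen(digits);
    if (n < 2)
        return -1;                               /* years -9 through 9 */
    out[0] = digits[n - 2];
    out[1] = digits[n - 1];
    out[2] = '\0';
    return 0;
}
```

Under this sketch the year -10 yields "10", whereas "year modulo 100" with a non-negative result yields 90, which is the disagreement the message points out.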

Paul Eggert said:
Yes, that's correct: it's trying to produce the year modulo 100.
But the latest IEEE Std 1003.1 calls for...
%y Replaced by the last two digits of the year as a decimal number [00,99].
Read literally, this would have undefined behavior for the years -9 through 9 (since they don't have two digits), and would generate "10" for the year -10, and so forth, which obviously disagrees with the proposed "year modulo 100" semantics.
Ouch. This wording comes from ISO C, and we plain didn't think about that. It looks like another DR is required. Note, by the way, that strftime is only supposed to work when the relevant fields are in their "normal range". No such range is given for tm_year.
However, I just checked 3 implementations
It would be interesting to see what they do with %C as well: %C is replaced by the year divided by 100 and truncated to an integer, as a decimal number (00-99).
    year (i.e.,     glibc 2.2.5    OpenBSD 3.4    Solaris 9
    tm_year-1900)   (Debian                       patch 112874-29
                    2.2.5-11.5)                   (64-bit sparc)

    -101            99             -1             0/
    -100            00             00             00
    -99             01             -99            ''  (i.e., two apostrophes)
    ...
    -2              98             -2             0.
    -1              99             -1             0/
    0               00             00             00
    1               01             01             ''
    ...
    9               09             09             '/
    10              10             10             '0
    11              11             11             ('
    ...
    99              99             99             0/
The glibc and OpenBSD behaviours appear to be using the % operator. This is defined by: the expression (a/b)*b + a%b shall equal a.

If b is positive (here, it's 100) and a is negative, there are two possibilities in C90:

(1) a/b rounds towards zero (this is required in C99) and a%b is negative. This is what OpenBSD appears to be doing, with a = tm_year - 1900.

(2) a/b rounds towards more negative numbers and a%b is positive. This is actually the behaviour I prefer, but regrettably it's not available in a simple way in C99. This is what glibc is doing.

As for Solaris, my best guess is that it's calculating:

    '0' + tm_year / 10 % 10
    '0' + tm_year % 10

If tm_year is negative and / rounds towards zero, you'd get that behaviour (the characters ' ( and / are respectively '0'-9, '0'-8, and '0'-1 on an ASCII system).

--
Clive D.W. Feather  | Work:  <clive@demon.net>   | Tel:    +44 20 8495 6138
Internet Expert     | Home:  <clive@davros.org>  | Fax:    +44 870 051 9937
Demon Internet      | WWW: http://www.davros.org | Mobile: +44 7973 377646
Thus plc            |                            |
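
The two division conventions described in this message can be sketched side by side. The function names below are hypothetical and chosen only for illustration; the floored variants are one common way to build possibility (2) on top of the C99 operators, under the assumption that the divisor is positive.

```c
#include <assert.h>

/* Possibility (1), as required by C99: the quotient rounds toward zero
 * and the remainder takes the sign of the dividend. */
int trunc_div(int a, int b) { assert(b > 0); return a / b; }
int trunc_rem(int a, int b) { assert(b > 0); return a % b; }

/* Possibility (2): the quotient rounds toward minus infinity and the
 * remainder is never negative when b > 0. */
int floor_div(int a, int b)
{
    assert(b > 0);
    return a / b - (a % b < 0);   /* step down once if a remainder is owed */
}

int floor_mod(int a, int b)
{
    int r;

    assert(b > 0);
    r = a % b;
    return r < 0 ? r + b : r;     /* shift a negative remainder into [0, b) */
}
```

Both pairs satisfy (a/b)*b + a%b == a. For a = -99 and b = 100, the truncating pair gives a remainder of -99 and the floored pair gives 1, matching the OpenBSD and glibc columns reported for the year -99.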

"Clive D.W. Feather" <clive@demon.net> writes:
Note, by the way, that strftime is only supposed to work when the relevant fields are in their "normal range". No such range is given for tm_year.
I interpret this to mean that strftime is supposed to work regardless of the value of tm_year. However, the standard says that %C always generates a value in the range [00,99] so it would appear there's an inconsistency here. I suppose one could argue that %C has undefined behavior for years outside the range [0, 9999]. But this appears to me to be a defect in the standard -- at least, things are quite unclear here. I would prefer it if strftime were required to handle all tm_year values.

There is no similar restriction on the range for %Y, which suggests that strftime %Y must handle all tm_year values. For %y the range is [00,99], which argues for using modulus rather than remainder.
It would be interesting to see what they do with %C as well:
Solaris interprets %C completely differently: it treats it as a request to output the same string that the "date" command outputs by default. The strftime man page says that there is a "standard-conforming" strftime somewhere but doesn't say how to get it. I couldn't figure it out so I gave up looking for it.
The glibc and OpenBSD behaviours appear to be using the % operator.
glibc uses %, but adjusts negative remainders to make them positive, so that it's actually using modulus. I think OpenBSD uses plain %.
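
A minimal sketch of "remainder adjusted into a modulus" follows. It is not glibc's source; y_from_tm_year is a hypothetical name. The sketch also avoids forming tm_year + 1900, on the observation that 1900 is a multiple of 100, so tm_year and the full year have the same residue modulo 100.

```c
#include <assert.h>

/*
 * Hypothetical helper: two-digit year in [0, 99] computed directly from
 * tm_year, so tm_year + 1900 is never formed and cannot overflow.
 */
int y_from_tm_year(int tm_year)
{
    int r = tm_year % 100;       /* in [-99, 99] under C99 rules */

    if (r < 0)
        r += 100;                /* turn the remainder into a modulus */
    assert(0 <= r && r <= 99);
    return r;
}
```

With a 32-bit int, tm_year == 2**31 - 1900 (the year 2**31) gives 48 here, the value reported for glibc, with no intermediate overflow.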
As for Solaris, my best guess is that it's calculating:

    '0' + tm_year / 10 % 10
    '0' + tm_year % 10
Yes, that sounds plausible, as Unix Version 7 does something similar. Solaris also mishandles %Y for negative and/or large years. For example, strftime %Y prints the year -1 as "000/", and prints the year 2**31 (i.e., tm_year == 2**31 - 1900) as "-*,(". This is consistent with your theory.

Here's a test program you can use to try out your implementation. It's not strictly conforming code (it relies on floating point) but it should work on all practical platforms.

Only glibc "passes" the test, in the sense that it produces a coherent set of values for all inputs (it always uses modulus for %y, and for %C it always truncates towards minus infinity). Solaris botches %C entirely, mishandles %y for years before 1900, and mishandles %Y for years before 0. OpenBSD uses signed remainder for negative years, though I'd argue that having %y generate "-" is bogus. OpenBSD and Solaris both clearly mishandle tm_year values close to INT_MAX.

    #include <string.h>
    #include <limits.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Print tm_year, the corresponding year, and the quoted output of
       %y, %C and %Y for that tm_year.  */
    static void
    process (int tm_year)
    {
      struct tm tm;
      char y[1000];
      char C[1000];
      char Y[1000];

      /* Zero the other fields so strftime never reads indeterminate values.  */
      memset (&tm, 0, sizeof tm);
      tm.tm_year = tm_year;

      /* Format into buf + 1, then wrap the result in double quotes.  */
      strftime (y + 1, sizeof y - 2, "%y", &tm);
      y[0] = '"';
      strcat (y, "\"");
      strftime (C + 1, sizeof C - 2, "%C", &tm);
      C[0] = '"';
      strcat (C, "\"");
      strftime (Y + 1, sizeof Y - 2, "%Y", &tm);
      Y[0] = '"';
      strcat (Y, "\"");
      printf ("%13d %13.0f %13s %13s %13s\n",
              tm_year, tm_year + 1900.0, y, C, Y);
    }

    int
    main (int argc, char **argv)
    {
      printf ("%13s %13s %13s %13s %13s\n",
              "tm_year", "year", "%y", "%C", "%Y");
      if (argc <= 1)
        {
          /* near (x) expands to the tm_year values for years x, x+1, x+2.  */
    #define near(x) (x) - 1900, (x) - 1900 + 1, (x) - 1900 + 2
          static int test[] = {
            near (INT_MIN + 1900),
            near (-1001),
            near (-101),
            near (-11),
            near (-1),
            near (9),
            near (99),
            near (999),
            near (1899),
            near (1969),
            near (1999),
            near (2099),
            near (INT_MAX - 1),
            near (INT_MAX + 1900.0 - 2)
          };
          int i;
          for (i = 0; i < sizeof test / sizeof *test; i++)
            {
              /* Blank line between runs of non-consecutive values.  */
              if (i == 0 || test[i - 1] + 1 != test[i])
                printf ("\n");
              process (test[i]);
            }
        }
      else
        while (*++argv)
          process (atoi (*argv));
      return 0;
    }
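
The digit-arithmetic theory for Solaris discussed in this message can be checked against the reported outputs with a small sketch. naive_digits is a hypothetical reconstruction, not Solaris code. It assumes ASCII, and for the year 2**31 case it assumes that tm_year + 1900 wrapped around to INT_MIN (signed overflow is undefined in C, so that wraparound is only an assumption about that platform).

```c
#include <assert.h>

/*
 * Hypothetical reconstruction of the guessed digit arithmetic: emit the
 * low `ndigits` decimal digits of `value` as '0' + digit, with no
 * correction for negative values.  `out` needs ndigits + 1 bytes.
 */
void naive_digits(int value, int ndigits, char *out)
{
    int i;

    assert(out != 0 && ndigits > 0);
    for (i = ndigits - 1; i >= 0; i--) {
        out[i] = (char) ('0' + value % 10);  /* falls below '0' if negative */
        value /= 10;                          /* C99: rounds toward zero */
    }
    out[ndigits] = '\0';
}
```

Called with tm_year == -1891 (the year 9) and two digits, the sketch yields the string '/ from the table. Called with -1 and four digits it yields 000/, and with INT_MIN and four digits it yields -*,( as reported for %Y.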

Paul Eggert said:
Note, by the way, that strftime is only supposed to work when the relevant fields are in their "normal range". No such range is given for tm_year. I interpret this to mean that strftime is supposed to work regardless of the value of tm_year.
I would agree.
However, the standard says that %C always generates a value in the range [00,99] so it would appear there's an inconsistency here.
Also agreed.
I suppose one could argue that %C has undefined behavior for years outside the range [0, 9999]. But this appears to me to be a defect in the standard -- at least, things are quite unclear here. I would prefer it if strftime were required to handle all tm_year values.
So would I, but persuading WG14 might be harder.
There is no similar restriction on the range for %Y, which suggests that strftime %Y must handle all tm_year values.
For %y the range is [00,99], which argues for using modulus rather than remainder.
All that says is that the value must be in that range. A system that generates %C "-2" %y "34" for the year -234 (235 B.C.) would, I think, conform.
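
The difference between that reading and a floored split can be sketched as follows. The struct and function names are hypothetical. split_trunc is one interpretation of the -234 example (century rounded toward zero, %y taken as the magnitude of the remainder), not something the standard spells out.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical pair holding the numbers behind %C and %y. */
struct century_year { int century; int yy; };

/* Floored split: the century rounds toward minus infinity and yy is a
 * true modulus, so century * 100 + yy == year always holds. */
struct century_year split_floor(int year)
{
    struct century_year s;

    s.century = year / 100 - (year % 100 < 0);
    s.yy = year % 100 < 0 ? year % 100 + 100 : year % 100;
    assert(0 <= s.yy && s.yy <= 99);
    return s;
}

/* Truncated split: the century rounds toward zero and yy is the
 * magnitude of the remainder, so only the century carries the sign
 * (and the sign is lost when the century is 0). */
struct century_year split_trunc(int year)
{
    struct century_year s;

    s.century = year / 100;
    s.yy = abs(year % 100);
    assert(0 <= s.yy && s.yy <= 99);
    return s;
}
```

For the year -234, split_trunc gives century -2 and yy 34, while split_floor gives century -3 and yy 66. Both keep yy within [00,99], but only the floored split lets the year be recovered as century * 100 + yy.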
Here's a test program you can use to try out your implementation.
FreeBSD appears to do the same as OpenBSD.

--
Clive D.W. Feather  | Work:  <clive@demon.net>   | Tel:    +44 20 8495 6138
Internet Expert     | Home:  <clive@davros.org>  | Fax:    +44 870 051 9937
Demon Internet      | WWW: http://www.davros.org | Mobile: +44 7973 377646
Thus plc            |                            |

"Clive D.W. Feather" <clive@demon.net> writes:
The glibc and OpenBSD behaviours appear to be using the % operator. This is defined by:
the expression (a/b)*b + a%b shall equal a.
If b is positive (here, it's 100) and a is negative, there are two possibilities in C90:
(1) a/b rounds towards zero (this is required in C99) and a%b is negative. This is what OpenBSD appears to be doing, with a = tm_year - 1900.

(2) a/b rounds towards more negative numbers and a%b is positive. This is actually the behaviour I prefer, but regrettably it's not available in a simple way in C99. This is what glibc is doing.
I agree with your preference. In fact, I have never found an application for (1). I expect the only reason (1) was ever accepted is that it's easy to implement with common hardware. There used to be a lot of lame floating point implementations around for the same reason. It's a really poor reason for standardizing on (1), though.

- Jim Van Zandt