
Olson, Arthur David (NIH/NCI) wrote:
Is using "%-4ld" to print the year a happy medium?
OK, some advice probably nobody wants from a timezone lurker... :-) I would rather see digits appear in the same position in the string if possible (i.e. right justified, blank filled). My reasoning? The existing behavior has the day of month and hour printed that way (right justified, blank filled). I vote for %4ld, which for me fulfils the principle of least surprise. -- Michael Lindner
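For reference, the two candidate formats differ only in where the padding goes. A minimal sketch, assuming a C99 snprintf; the helper names year_left and year_right are hypothetical and exist only for this illustration:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: left-justified year, as "%-4ld" prints it. */
static int year_left(char *buf, size_t size, long year)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%-4ld", year);
}

/* Hypothetical helper: right-justified, blank-filled year ("%4ld"). */
static int year_right(char *buf, size_t size, long year)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%4ld", year);
}
```

For the year 999 the first helper yields "999 " and the second " 999". Both occupy four columns, and any year of four or more digits prints identically under either format.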

Michael Lindner said:
I would rather see digits appear in the same position in the string if possible (i.e. right justified, blank filled). My reasoning? The existing behavior has the day of month and hour printed that way (right justified, blank filled).
Sorry, but the hour is zero-filled, not blank filled. -- Clive D.W. Feather | Work: <clive@demon.net> | Tel: +44 20 8495 6138 Internet Expert | Home: <clive@davros.org> | Fax: +44 870 051 9937 Demon Internet | WWW: http://www.davros.org | Mobile: +44 7973 377646 Thus plc | |

"Olson, Arthur David (NIH/NCI)" <olsona@dc37a.nci.nih.gov> writes:
Is using "%-4ld" to print the year a happy medium?
You mean, print something like "999 " for the year, with trailing spaces? Sorry, no, that doesn't conform either. If we're going to fail to conform to the standard, we shouldn't mess around with left-justification: it's even more confusing. I agree with Robert Elz that "that code is broken. End of story" is too harsh. That is why I'm advocating that we continue to have asctime always return a valid non-NULL string, even though the standard doesn't require this. This is a good thing -- we shouldn't gratuitously break common usage even if it's no longer conforming. However, I'm not convinced by this example that it's worth departing from the standard here. Here's the example again: printf("The date is: %.24s today\n", asctime(tm)); This code does not work with arbitrary "tm" anyway, even on PDP-11 Unix Version 7 with 16-bit int and 32-bit timestamps. This is because the code will print garbage by truncating long years in some cases. There is no escaping this: asctime can't squish years past 9999 or before -999 into that string without either munging the traditional format (which will break other common usage) or printing the wrong year (which is simply incorrect). Since this example is already broken for years after 9999 or before -999, we needn't worry about the fact that conforming to the standard will also break this code for years in the range -99 through 999. If the example's "tm" was derived from a 32-bit time_t a la traditional Unix, there is no problem since the year will be at least 1901. And if "tm" was derived from a 64-bit time_t, or was constructed from arbitrary data, the code is already broken. So, let's just stick with the standard behavior (extended so that asctime never returns NULL). In practice, it won't break any applications that aren't already broken.
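To make the truncation argument concrete, here is a minimal sketch built on the format string that the C standard gives in its description of asctime(). The names sketch_asctime and date_line are hypothetical, and snprintf assumes a C99 library; this is an illustration, not the tz code.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-in for asctime() that writes into a caller's buffer,
   using the format string from the C standard's description. */
static int sketch_asctime(const struct tm *tm, char *buf, size_t size)
{
    static const char wday_name[7][4] = {
        "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"
    };
    static const char mon_name[12][4] = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
    };
    assert(tm->tm_wday >= 0 && tm->tm_wday < 7);
    assert(tm->tm_mon >= 0 && tm->tm_mon < 12);
    /* The year is printed with a bare %d, so its width varies. */
    return snprintf(buf, size, "%.3s %.3s%3d %.2d:%.2d:%.2d %d\n",
                    wday_name[tm->tm_wday], mon_name[tm->tm_mon],
                    tm->tm_mday, tm->tm_hour, tm->tm_min, tm->tm_sec,
                    1900 + tm->tm_year);
}

/* The caller from the example: keep the first 24 characters only. */
static int date_line(const char *asc, char *out, size_t size)
{
    return snprintf(out, size, "The date is: %.24s today\n", asc);
}
```

With a four-character year the "%.24s" drops exactly the trailing newline. With the year 999 the newline falls inside the first 24 characters and ends up in the middle of the output line; with the year 10000 the last digit is cut off and the line shows 1000.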

Date: Wed, 28 Jul 2004 07:48:44 -0700 From: Paul Eggert <eggert@CS.UCLA.EDU> Message-ID: <87smbcyypf.fsf@penguin.cs.ucla.edu> | That is why I'm advocating that we continue to have | asctime always return a valid non-NULL string, even though the | standard doesn't require this. This is a good thing -- we shouldn't | gratuitously break common usage even if it's no longer conforming. I agree with that, though I suspect my interpretation of "valid" might be different than yours. I'd prefer to always return a string that fits exactly in a 26 byte buffer, with the \n in buf[24] and \0 in buf[25], and every other character position occupied by exactly what is supposed to be there (the colons at known offsets, etc). If that makes the output be "incorrect" when the input struct tm contains values (way) beyond those which asctime() was designed to handle, then that's just fine with me. Where we can handle the value, and retain the output format, we should however. People expecting (or who should be expecting) to deal with arbitrary struct tm values - or in fact, just about any current code expecting to convert times into strings ought to be using strftime, the only code that should be using asctime() is legacy code that predates strftime - and for that, keeping the legacy interface is important. kre
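The layout described in this message can be written down as a small check. This is a sketch only: has_fixed_layout is a hypothetical name, and the offsets are read off the traditional "Sun Sep 16 01:03:52 1973\n" example rather than taken from any implementation.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical check: does s follow the fixed 26-byte layout, with the
   separators at their traditional offsets and '\n' in s[24]? */
static int has_fixed_layout(const char *s)
{
    assert(s != NULL);
    if (strlen(s) != 25)        /* 25 characters, then '\0' in s[25] */
        return 0;
    return s[3] == ' ' && s[7] == ' ' && s[10] == ' ' && s[19] == ' '
        && s[13] == ':' && s[16] == ':'
        && s[24] == '\n';
}
```

A string produced with a three-character or five-character year fails the length test, which is the case the rest of the thread argues about.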

Robert Elz said:
I agree with that, though I suspect my interpretation of "valid" might be different than yours. I'd prefer to always return a string that fits exactly in a 26 byte buffer, with the \n in buf[24] and \0 in buf[25], and every other character position occupied by exactly what is supposed to be there (the colons at known offsets, etc).
The problem is that the only specification for "exactly what is supposed to be there" is what's in the standard.
If that makes the output be "incorrect" when the input struct tm contains values (way) beyond those which asctime() was designed to handle, then that's just fine with me. Where we can handle the value, and retain the output format, we should however.
Assuming that all fields except the year are in range, then we have the following cases:
* Years 1000 to 9999 and -999 to -100: everyone agrees.
* Years before -999 or after 9999: undefined behaviour, so implementations can do what they want.
* Years -99 to 999: the C standard says the string should be shorter, you say the string should contain extra spaces or zeroes.
I don't believe that there has ever been a real expectation for that case. I don't know the history of the asctime() specification, but I suspect that it has more basis than your claim.
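The three cases can be stated as a small classification function. A sketch only; the enum and the name classify_year are hypothetical, and "year" here means the calendar year (1900 + tm_year).

```c
#include <assert.h>

enum year_case {
    YEAR_AGREED,     /* "%d" prints exactly four characters */
    YEAR_UNDEFINED,  /* more than four characters: behaviour undefined */
    YEAR_DISPUTED    /* "%d" prints fewer than four characters */
};

/* Hypothetical helper: which of the three cases does a year fall into? */
static enum year_case classify_year(long year)
{
    if (year < -999 || year > 9999)
        return YEAR_UNDEFINED;
    if (year >= -99 && year <= 999)
        return YEAR_DISPUTED;
    return YEAR_AGREED;
}
```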
People expecting (or who should be expecting) to deal with arbitrary struct tm values - or in fact, just about any current code expecting to convert times into strings ought to be using strftime,
People expecting an exact format for the output should be using strftime. asctime() is a quick-and-dirty interface to produce something that can be output in a log file or similar. It's not intended to be an exact science. -- Clive D.W. Feather
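For code that does want an exact format, a strftime call along these lines is one option. A minimal sketch: format_stamp is a hypothetical wrapper, %e assumes a C99 library, and the %a and %b names depend on the current locale.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical wrapper: format a broken-down time with an explicit layout.
   Returns the number of characters written, or 0 if buf is too small. */
static size_t format_stamp(const struct tm *tm, char *buf, size_t size)
{
    assert(tm != NULL && buf != NULL);
    /* %e is the blank-padded day of the month; %Y is the full year. */
    return strftime(buf, size, "%a %b %e %H:%M:%S %Y", tm);
}
```

Unlike asctime(), the caller supplies the buffer, and a result that does not fit is reported by a return value of 0 instead of being written past the end.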

Date: Thu, 29 Jul 2004 15:52:46 +0100 From: "Clive D.W. Feather" <clive@demon.net> Message-ID: <20040729145246.GY34961@finch-staff-1.thus.net> | I don't know the history of the asctime() specification, but I suspect that | it has more basis than your claim. The original asctime() specified exactly what the out buffer would contain. Sure, it was never really expecting to be dealing with years before 1900, but as it always took a struct tm, that was always possible. | People expecting an exact format for the output should be using strftime. This much I agree with - but that's only rational for code people are writing (or at least, maintaining) today. What matters here is old orphaned code, written 10-20 years ago (or more), that still works. | asctime() is a quick-and-dirty interface to produce something that can be | output in a log file or similar. It's not intended to be an exact science. asctime() is a historic interface - no-one would be inventing it today. When it was created, it was the only time -> printable string routine around (after asctime() was invented, I vaguely recall that ctime(), which produces the same output format, predates it - once asctime() was created, ctime() became its most frequent client, of course). Now, the only justification really to keep asctime() is for old code that is using it - and expecting its old interface. The really poor thing about this, is that the time functions (dst and zone representations excluded) have been one of the C/unix functions that have never been subject to the "my version is different than yours" syndrome that plagued so many of the other interfaces. There was never any justification for any changes to this interface. kre

Robert Elz said:
The original asctime() specified exactly what the out buffer would contain.
Did it? How? That is, what did old manual pages actually say?
The really poor thing about this, is that the time functions (dst and zone representations excluded) have been one of the C/unix functions that have never been subject to the "my version is different than yours" syndrome that plagued so many of the other interfaces. There was never any justification for any changes to this interface.
I'm not convinced that there has actually been a change. I've fired off a query to WG14 about the history of this interface. -- Clive D.W. Feather

On Jul 29, 2004, at 11:18 AM, Clive D.W. Feather wrote:
Robert Elz said:
The original asctime() specified exactly what the out buffer would contain.
Did it? How? That is, what did old manual pages actually say?
If the FreeBSD man page repository is to be believed:
http://www.freebsd.org/cgi/man.cgi?query=ctime&manpath=Unix+Seventh+Edition&format=html
then it said:
Ctime converts a time pointed to by clock such as returned by time(2) into ASCII and returns a pointer to a 26-character string in the following form. All the fields have constant width.
Sun Sep 16 01:03:52 1973\n\0
Localtime and gmtime return pointers to structures containing the broken-down time. Localtime corrects for the time zone and possible daylight savings time; gmtime converts directly to GMT, which is the time UNIX uses. Asctime converts a broken-down time to ASCII and returns a pointer to a 26-character string.
However, that man page has a large chunk of a BSD include file in it, which leads one to suspect that the man page might not be the one from V7. However, if we go to the V7 archives at Bell Labs: http://plan9.bell-labs.com/7thEdMan/index.html and get the section 3 bundle and extract it, we get a "ctime.3" that has
The structure declaration from the include file is:
.RS
.PP
.nf
.so /usr/include/time.h
.fi
.RE
which suggests that the BSD man page might really *be* the V7 one, as the V7 one produces different results on different OSes as it includes the OS's "time.h". In any case, it says the same thing as the one from the FreeBSD site in the paragraphs I cited above. It doesn't *explicitly* say that the 26-character string returned by "asctime()" has the same format as the one returned by "ctime()", although I think they intended to imply that - at least for a "struct tm" generated from a "time()" value. They probably didn't even think about what would or should happen if you handed it a "struct tm" with values that couldn't come from a "time()" value.

Guy Harris said:
The original asctime() specified exactly what the out buffer would contain. Did it? How? That is, what did old manual pages actually say? If the FreeBSD man page repository is to be believed: http://www.freebsd.org/cgi/man.cgi?query=ctime&manpath=Unix+Seventh+Edition&format=html
then it said:
Ctime converts a time pointed to by clock such as returned by time(2) into ASCII and returns a pointer to a 26-character string in the following form. All the fields have constant width.
Sun Sep 16 01:03:52 1973\n\0
Right. I've also been pointed at other documents saying basically the same. It looks like the people who turned that specification into C89 got it slightly wrong. But I doubt it's going to get changed now. It looks like the best we'll get is to have HISTORICAL and STANDARDIZED versions, selected at compile time. -- Clive D.W. Feather

"Clive D.W. Feather" <clive@demon.net> writes:
It looks like the people who turned that specification into C89 got it slightly wrong. But I doubt it's going to get changed now.
Can't we easily fix things by changing the standard to say that it's implementation-specified as to whether the format uses %d or %4d for the year? This would allow both fixed-width and strict C89 implementations. In practice, portable programs can't assume the C89 behavior now anyway.
It looks like the best we'll get is to have HISTORICAL and STANDARDIZED versions, selected at compile time.
Sorry, I don't follow this remark -- are you referring to the tz code, or to the standard itself?

Paul Eggert said:
It looks like the people who turned that specification into C89 got it slightly wrong. But I doubt it's going to get changed now. Can't we easily fix things by changing the standard to say that it's implementation-specified as to whether the format uses %d or %4d for the year?
Actually %.4d would be better.
This would allow both fixed-width and strict C89 implementations.
This change could, in theory, break an existing program which relied on the present specification. In addition, it moves certain boundary cases from defined to undefined behaviour, which is generally seen as bad. A case would have to be made for this.
In practice, portable programs can't assume the C89 behavior now anyway.
For some value of "portable". There's "portable to all Standard C implementations" and "portable to both Standard and pre-Standard C". As a general rule, WG14 restricts itself to the former; the whole point of the process was to decide which of the latter should or should not be addressed. I'm not saying this is a hopeless cause, because this *is* clearly a case where WG14/X3J11 messed up. But nobody has spotted this fact in 16 years or longer, meaning it's not exactly been a major concern.
It looks like the best we'll get is to have HISTORICAL and STANDARDIZED versions, selected at compile time. Sorry, I don't follow this remark -- are you referring to the tz code, or to the standard itself?
The tz code. -- Clive D.W. Feather
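A compile-time choice between the two behaviours might look roughly like this. The macro SKETCH_STD_ASCTIME and the helper sketch_year are hypothetical names chosen for this sketch; they are not taken from the tz code.

```c
#include <assert.h>
#include <stdio.h>

/* SKETCH_STD_ASCTIME is a hypothetical switch, not a real tz macro. */
#ifdef SKETCH_STD_ASCTIME
#define SKETCH_YEAR_FMT "%d"    /* standard wording: no minimum width */
#else
#define SKETCH_YEAR_FMT "%4d"   /* historical: at least four columns */
#endif

/* Format the year field according to the variant selected above. */
static int sketch_year(char *buf, size_t size, int year)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, SKETCH_YEAR_FMT, year);
}
```

Built without the macro, the year 97 comes out as "  97"; built with -DSKETCH_STD_ASCTIME it comes out as "97". Years from 1000 to 9999 are unaffected by the choice.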

Date: Mon, 2 Aug 2004 09:16:35 +0100 From: "Clive D.W. Feather" <clive@demon.net> Message-ID: <20040802081635.GD7224@finch-staff-1.thus.net> | Actually %.4d would be better. You'd prefer years with leading zeroes? Why? Year 0103 looks like someone might be playing with octal base output to me ... -- on the other hand, everyone is used to leading zeroes on minutes and seconds, as they come after the ':' usually - the 0 isn't seen as redundant, and in the hours, the leading 0 generally implies that a 24 hour clock is being used (except the time isn't late enough for it to be obvious, so we have 1am or 1pm, but 01 hours.) | This change could, in theory, break an existing program which relied on the | present specification. It could, but should there be any? The mistake that was made here was in standardising asctime/ctime at all when strftime was all that was needed. The older functions clearly have a stupid interface (for any half modern requirements) - the only reason we should be keeping them is for compatibility with old code. There's no reason to do that if they're going to be changed, even slightly, from the historical definitions. And that remains true no matter how much one may be considered to be "just handling previously undefined cases". | For some value of "portable". There's "portable to all Standard C | implementations" and "portable to both Standard and pre-Standard C". There's only one definition of "portable" that matters - if I code in this particular way, can I distribute my code and assume that it will work everywhere (here that means, everywhere there's a compiler that claims to compile C code). Technical arguments about who is right, what the implementation is being compatible with, and all the rest simply don't matter. If my code fails on *any* implementations, it isn't portable. If I want my code to be portable, it has to work on everything that is able to compile it.
| I'm not saying this is a hopeless cause, because this *is* clearly a case | where WG14/X3J11 messed up. But nobody has spotted this fact in 16 years | or longer, meaning it's not exactly been a major concern. That's most likely because no-one has been dealing with years outside the 1900-2140 address block very much (given that's all a 32 bit time_t can represent). Regardless of the possibility to set tm_year to just about anything, almost no-one does. It has been the recent interest in handling 64 bit time_t's that raised this issue now. kre

Robert Elz scripsit:
There's only one definition of "portable" that matters - if I code in this particular way, can I distribute my code and assume that it will work everywhere (here that means, everywhere there's a compiler that claims to compile C code).
"Claims" to compile C code? What if it lies, and actually only accepts Fortran? Seriously, it's one thing to claim conformance to (a particular version of) the C Standard. It's quite another thing to claim to compile C in general. C has had forward- and backward-incompatible changes, and while the effort was made to change as little as possible during standardization, that's not the same as changing *nothing*. If you want Perl-style portability, you know where to find it.
If I want my code to be portable, it has to work on everything that is able to compile it.
My computer that I have in my pocket here has only 256 words of RAM, so your program won't "work" on it for very large values of "your program". I don't see that as a portability failure. -- "You know, you haven't stopped talking John Cowan since I came here. You must have been http://www.reutershealth.com vaccinated with a phonograph needle." jcowan@reutershealth.com --Rufus T. Firefly http://www.ccil.org/~cowan

Date: Mon, 2 Aug 2004 16:26:58 -0400 From: John Cowan <jcowan@reutershealth.com> Message-ID: <20040802202657.GB2655@skunk.reutershealth.com> | "Claims" to compile C code? | What if it lies, and actually only accepts Fortran? That (probably) my C code won't compile, and I'll curse a lot, but aside from if you're in my immediate vicinity at the time, there won't be a problem. | My computer that I have in my pocket here has only 256 words of RAM, so your | program won't "work" on it for very large values of "your program". I don't | see that as a portability failure. I do, if the code is supposed to be able to work there, looks as if it will work there, and attempts to work there. Clive D.W. Feather <clive@demon.net> said: | > You'd prefer years with leading zeroes? Why? | Because that's what "constant width" says to me. Hmm, one of us is confused. A constant width field to me is one with a fixed width (fixed number of bytes used). That by itself says nothing about the characters that get put in the field. As an example, the day of the month field is also constant width (2 characters), yet is space filled for values < 10, not zero filled. Nothing that I've seen cares what characters are in the field (well, a year is wanted, but spaces or zeroes isn't too relevant) - that is, no-one takes the field and tries atoi() (or modern equivalent) on it to get the value of the year - that would be perverse in the extreme. What people want is fixed width fields, in fixed places, so parts that are not wanted can be easily omitted (including especially the trailing \n), but I have also seen programs take just the time, or just the date (the first 10 bytes, and then the last 4 - that is: [20..23] - sometimes other combinations (skip the day of the week and the seconds, ...) Once again, in a modern program, strftime provides all of that, in a far easier and more flexible manner, but strftime didn't always exist. clive@demon.net said: | Why not?
This started because you claimed programs were relying on the | previous specification. Yes, I have seen many. I have never seen one that relies on what has been cited as the current standard. Never once. And I look at quite a lot of code. New code just doesn't race about parsing asctime (or ctime) output strings to get the parts of it, old code used to do that all the time. clive@demon.net said: | I don't believe you can write any code that that statement is true for. | Clue: there are C compilers that don't implement printf(). Of course, and that's no problem - if my code uses printf, then that implementation wouldn't compile it (by which I mean producing a fully linked executable that starts to run). Programs that fail to compile (or link) aren't a serious problem, they're fixable. The problems are programs that seem to work just fine, but then (just sometimes) don't produce the results they're supposed to produce. That's the kind of thing that is the problem here. If the C standards had simply omitted asctime, and then there were implementations that didn't bother to provide it at all, then fine. But providing something different than what programs have been expecting, with no way for the program to discover that it isn't going to get on this system the same behaviour as on all the others, essentially forever, is seriously poor, and is the very thing that standards are supposed to help prevent, not promulgate. kre
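The kind of fixed-offset slicing described in this message looks roughly like the following in old code. A sketch only: slice is a hypothetical helper, and the offsets assume the traditional 26-byte layout.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copy len bytes starting at offset off out of an
   asctime()-style string. out must have room for len + 1 bytes. */
static void slice(const char *asc, size_t off, size_t len, char *out)
{
    assert(asc != NULL && out != NULL);
    assert(strlen(asc) >= off + len);
    memcpy(out, asc + off, len);
    out[len] = '\0';
}
```

Applied to "Sun Sep 16 01:03:52 1973\n", slice(asc, 0, 10, out) gives the date "Sun Sep 16", slice(asc, 11, 8, out) the time "01:03:52", and slice(asc, 20, 4, out) the year "1973". If the year field is narrower than four characters, the last call silently picks up the newline instead of a digit, which is the failure mode being argued about.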

Robert Elz said:
Clive D.W. Feather <clive@demon.net> said: | > You'd prefer years with leading zeroes? Why? | Because that's what "constant width" says to me.
Hmm, one of us is confused. A constant width field to me is one with a fixed width (fixed number of bytes used). That by itself says nothing about the characters that get put in the field.
Okay. However, I'd rather that the year field said "0097" and not " 97", so that some idiot doesn't think the latter means 1997.
clive@demon.net said: | I don't believe you can write any code that that statement is true for. [...] Programs that fail to compile (or link) aren't a serious problem, they're fixable. The problems are programs that seem to work just fine, but then (just sometimes) don't produce the results they're supposed to produce.
Even with that limitation you have problems. There were just too many places where pre-Standard compilers differed and the Standard had to make a choice one way or the other.
But providing something different than what programs have been expecting, with no way for the program to discover that it isn't going to get on this system the same behaviour as on all the others, essentially forever, is seriously poor, and is the very thing that standards are supposed to help prevent, not promulgate.
While asctime() may not be a good example, this is a situation that the authors of C89 had to face time and time again, and *had* to come down on one side or the other of various decisions. A good example is: if (-1 > (unsigned short) 0) which was true on some implementations but became false when that implementation moved to C89. -- Clive D.W. Feather
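The comparison can be tried directly. As I understand the C89 rules, an unsigned short operand is promoted to int whenever int can represent every unsigned short value, which makes the test -1 > 0; some pre-Standard compilers instead promoted to unsigned int, which turns -1 into a very large value. The function name below is hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Returns the value of the comparison quoted in the message above. */
static int minus_one_exceeds_ushort_zero(void)
{
    return -1 > (unsigned short) 0;
}
```

On an implementation where int is wider than unsigned short this returns 0. Where the two have the same width, the operand is promoted to unsigned int even under the Standard's rules and the result is 1.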

Date: Thu, 5 Aug 2004 07:38:17 +0100 From: "Clive D.W. Feather" <clive@demon.net> Message-ID: <20040805063817.GC22268@finch-staff-1.thus.net> | However, I'd rather that the year field said "0097" and not " 97", so that | some idiot doesn't think the latter means 1997. I'm not sure that there's enough difference there to matter, but this isn't important enough to argue about. | Even with that limitation you have problems. There were just too many | places where pre-Standard compilers differed and the Standard had to make a | choice one way or the other. Those cases I don't worry about - portable code simply has to avoid any such usages, or it isn't portable (either avoid them, or do some kind of compile (or run) time feature test, and work out what works on the particular system in question). The standard may have blessed one of the previous implementation choices, or come up with some entirely new way that seems better than any of the earlier ones, or ... That's all fine in a case like this. A truly portable program however simply ignores the standard, it has to, at least until all evidence suggests that no old systems remain, anywhere at all, where they might cause a problem. Programs that don't care as much about true portability can simply have a README (or whatever) that says "a C99 implementation is required for this program". | While asctime() may not be a good example, For the point I am trying to make, asctime() is the perfect example. This isn't a case where there were different implementations doing different things, that needed to be rationalised. There was no requirement that anything be changed. What there was however was a (hopelessly) inadequate interface, almost everything about it is (was) inadequate (its inability to handle anything other than years that can be represented in 4 characters, the use of the static buffer, ...).
When that situation arises, the worst thing a standards body can do is to succumb to the temptation to "fix" the interface, or as in this case, attempt to partially fix it. All that means is that now old code, that had been working with the old interface (apparently adequate for the old code's needs) can no longer trust the function, as some new compilers might have it do something different than what it has always done before. New code still shouldn't go near this interface, as to keep it even seemingly compatible with what previously existed, it must remain largely inadequate (so the static buffer in asctime cannot be "fixed"). The end result is that we end up with a standards blessed function that is absolutely useless for everyone. I can kind of live with ado's last suggested patch around this issue, though I would really much prefer there was no #if to generate the "standards conforming" version of asctime. Simply existing that way does no harm, the problem is that someone might not understand what is happening, and believe that turning on that behaviour is actually the better choice to make. If that happens and the resulting code gets distributed anywhere, then asctime has just turned into yet another of those "any code that uses this is broken by definition" functions that we (unfortunately) have too many of already. So, I'd prefer just the "always fixed width field" version of asctime, the way the old interface was defined - I'd prefer it generate "YYYY" for the year field than "10000" (or anything else) - as long as it remains 4 characters wide - always. kre

On 2004-08-05, Robert Elz wrote: [ Paraphrased: asctime was always stupid, but consistent; new code should have avoided it once we had strftime; we only offer it now for old code that was happy with its hopeless design; there's no excuse for breaking that old code since new code will always use strftime. ] This is a no-brainer; kre is right.
I can kind of live with ado's last suggested patch around this issue, though I would really much prefer there was no #if to generate the "standards conforming" version of asctime. Simply existing that way does no harm, the problem is that someone might not understand what is happening, and believe that turning on that behaviour is actually the better choice to make. If that happens and the resulting code gets distributed anywhere, then asctime has just turned into yet another of those "any code that uses this is broken by definition" functions that we (unfortunately) have too many of already.
So, I'd prefer just the "always fixed width field" version of asctime, the way the old interface was defined - I'd prefer it generate "YYYY" for the year field than "10000" (or anything else) - as long as it remains 4 characters wide - always.
Exactly. This is the only sane solution. We don't need to "fix" asctime; we just need to maintain it. Software that wanders outside its domain will eventually have to be fixed to use strftime, and that's something that will be done when the authors need to do it. The thing that should be added to asctime and friends is a note in the man pages that they are not intended for new code and are provided for historical compatibility; and that new code should be looking at strftime. The stuff that usually appears about standards conformance does rather give the impression to people who stumble across this stuff that it's somehow blessed. It's not; it was always cursed; it should stay that way. Greg

Robert Elz said:
| Actually %.4d would be better.
As Paul points out, this breaks with negative numbers; %04d is better.
You'd prefer years with leading zeroes? Why?
Because that's what "constant width" says to me.
| This change could, in theory, break an existing program which relied on the | present specification. It could, but should there be any?
Why not? This started because you claimed programs were relying on the previous specification.
The mistake that was made here was in standardising asctime/ctime at all when strftime was all that was needed.
Possibly.
| For some value of "portable". There's "portable to all Standard C | implementations" and "portable to both Standard and pre-Standard C". There's only one definition of "portable" that matters - if I code in this particular way, can I distribute my code and assume that it will work everywhere (here that means, everywhere there's a compiler that claims to compile C code).
I don't believe you can write any code that that statement is true for. Clue: there are C compilers that don't implement printf().
If my code fails on *any* implementations, it isn't portable.
I think you need to meet the real world. It doesn't work like that. -- Clive D.W. Feather

"Clive D.W. Feather" <clive@demon.net> writes:
A case would have to be made for this.
The case is pretty simple:
* In practice more programs rely on the exactly-26-byte behavior (which is still documented in many manuals) than on the standard-mandated behavior.
* Many popular implementations fail to conform to the standard for years less than 1000. This includes the current versions of Solaris and HP-UX. No doubt there are others.
Can't we easily fix things by changing the standard to say that it's implementation-specified as to whether the format uses %d or %4d for the year?
Actually %.4d would be better.
Wouldn't that print the year -9 as "-0009"? That would be bad, since programs depend on exactly 26 bytes. If you don't like %4d, then we would go even further and allow more implementation freedom, e.g., allow implementations to use %.4d for positive years and %.3d for negative. Portable code couldn't rely on the exact behavior when years are in that range, but it can't rely on the behavior now anyway. Personally, though, I still think "%d or %4d" is simplest.
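The difference between the candidate formats shows up only for short or negative years. A minimal sketch, assuming a C99 snprintf; the three helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* "%.4d": at least four digits; a minus sign is extra. */
static int year_prec4(char *buf, size_t size, int year)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%.4d", year);
}

/* "%04d": at least four characters, zero-filled, sign included. */
static int year_zero4(char *buf, size_t size, int year)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%04d", year);
}

/* "%4d": at least four characters, blank-filled, sign included. */
static int year_blank4(char *buf, size_t size, int year)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%4d", year);
}
```

For the year -9 these give "-0009" (five characters), "-009" and "  -9" respectively; for the year 97 they give "0097", "0097" and "  97". Only the first can overflow the four-character field for years between -999 and -1.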
I'm not saying this is a hopeless cause, because this *is* clearly a case where WG14/X3J11 messed up.
OK. What's the next step, if we'd like to pursue this more formally? How does one file a defect report against the C Standard these days? I could file one, though I'd like to credit Robert Elz for lighting a fire under this issue. The real problem here isn't the exact format: it's that widespread existing software assumes the exactly-26-byte output, and the C standard breaks this software.

Paul Eggert said: >> A case would have to be made for this. > The case is pretty simple: > * In practice more programs rely on the exactly-26-byte behavior, > (which is still documented in many manuals) than in the > standard-mandated behavior. This may be true. However, making a Quiet Change to a published standard is a serious matter, since it potentially breaks existing working code. Doing this in favour of code that does not conform to the Standard is even more serious. > * Many popular implementations fail to conform to the standard for > years less than 1000. This includes the current versions of > Solaris and HP-UX. No doubt there are others. Broken implementations aren't exactly a great argument either. What about those implementations that are currently conforming? >>> Can't we easily fix things by changing the standard to say that it's >>> implementation-specified as to whether the format uses %d or %4d for >>> the year? >> Actually %.4d would be better. > Wouldn't that print the year -9 as "-0009"? Oops. %04d would be better in my opinion, though I'm willing to debate it. In particular, "0093" is clearly a long time ago; " 93" could be 1993. >> I'm not saying this is a hopeless cause, because this *is* clearly a >> case where WG14/X3J11 messed up. > OK. What's the next step, if we'd like to pursue this more formally? I'm discussing it on the WG14 mailing list. > How does one file a defect report against the C Standard these days? Through your National Body. I will draft one when I have some spare time. -- Clive D.W. Feather

"Clive D.W. Feather" <clive@demon.net> writes:
making a Quiet Change to a published standard is a serious matter, since it potentially breaks existing working code.
Quite true. However, this is an unusual case, since the standard itself made an undocumented Quiet Change to existing documentation, and the overwhelming majority of working code that cares, one way or another, wants the old 26-byte behavior rather than standard behavior. I've never seen code that assumes the standard behavior, but (like Robert Elz) I've seen several pieces of code, over the years, that assume the 26-byte behavior.
Broken implementations aren't exactly a great argument either. What about those implementations that are currently conforming?
They would still conform after the proposed change, no? So I don't see the problem. The tz implementation currently conforms to C99, but the next version will deliberately not conform by default. That's one pretty strong vote for the change, as the tz code is the sole widely-used public-domain implementation of this part of the standard. I am one of the official maintainers of the GNU C Library (which does currently conform), and I'd be happy to see this change. I daresay the BSD folks would agree, if Robert Elz's opinion is any sample. So, overall I think we have a goodly number of "yes" opinions from people whose implementations currently conform. I realize that this isn't everybody, not by a long shot, but it's a reasonable sample of people who are very concerned about compatibility, and they're in favor of this change.
%04d would be better in my opinion, though I'm willing to debate it.
That'd be fine with me too, though it'd be nice to hear Arthur David Olson's opinion on it as well. The main point is that it should be 4 bytes for years within range.
I will draft one when I have some spare time.
Thanks!

"Clive D.W. Feather" <clive@demon.net> writes:
Robert Elz said:
The original asctime() specified exactly what the out buffer would contain.
Did it? How? That is, what did old manual pages actually say?
The Unix Version 7 manuals say that the output of ctime and asctime contain exactly 26 characters (including the trailing newline and null), and that all fields have constant width. When a currently-conforming C implementation prints a year like 999 as "999", without a leading space or blank, it does not conform to the original specification. I should mention again that the V7 implementation was buggy, in that years before 1900 and after 2099 were not rendered correctly. However, even with the bugs, the output was always exactly 26 bytes long and included a trailing newline and null, so that part of the V7 specification was adhered to and applications can and did rely on this property.
I'm not convinced that there has actually been a change.
Well, here's a quote from the horse's mouth <http://cm.bell-labs.com/7thEdMan/vol1/man3.bun> (in the ctime.3 man page section, in troff input form). It's quite clear that the output is fixed-width.

.I Ctime
converts a time pointed to by
.I clock
such as returned by
.IR time (2)
into ASCII and returns a pointer to a 26-character string
in the following form.
All the fields have constant width.
.PP
Sun Sep 16 01:03:52 1973\\n\\0
...
.I Asctime
converts a broken-down time to ASCII and returns a pointer
to a 26-character string.
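To make the 26-byte point concrete, here is a small sketch in C that formats a broken-down time two ways: with the year printed via %d, as in the C99 description of asctime, and via %4d, the fixed-width alternative under discussion. The function name format_tm and its pad_year flag are hypothetical and exist only for this illustration. The sketch assumes tm_wday and tm_mon are in range and returns -1 otherwise.

```c
#include <stdio.h>
#include <time.h>

static const char wday_name[7][4] = {
    "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"
};
static const char mon_name[12][4] = {
    "Jan", "Feb", "Mar", "Apr", "May", "Jun",
    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
};

/* Format *tm in the traditional layout. If pad_year is nonzero the year
   is printed with %4d, otherwise with %d. Returns the number of
   characters produced (excluding the null), or -1 if tm_wday or tm_mon
   is out of range. */
static int format_tm(char *buf, size_t size, const struct tm *tm, int pad_year)
{
    if (tm->tm_wday < 0 || tm->tm_wday > 6 || tm->tm_mon < 0 || tm->tm_mon > 11)
        return -1;
    if (pad_year)
        return snprintf(buf, size, "%.3s %.3s%3d %.2d:%.2d:%.2d %4d\n",
                        wday_name[tm->tm_wday], mon_name[tm->tm_mon],
                        tm->tm_mday, tm->tm_hour, tm->tm_min, tm->tm_sec,
                        1900 + tm->tm_year);
    return snprintf(buf, size, "%.3s %.3s%3d %.2d:%.2d:%.2d %d\n",
                    wday_name[tm->tm_wday], mon_name[tm->tm_mon],
                    tm->tm_mday, tm->tm_hour, tm->tm_min, tm->tm_sec,
                    1900 + tm->tm_year);
}
```

For the man page's sample date in 1973, both variants produce 25 characters plus the null, i.e. 26 bytes. For the year 999, the %d variant produces only 24 characters, so code that relies on the newline sitting at offset 24 (for example by printing with %.24s) no longer finds it there, while the %4d variant keeps the 26-byte layout.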

Paul Eggert said:
Is using "%-4ld" to print the year a happy medium? You mean, print something like "999 " for the year, with trailing spaces? Sorry, no, that doesn't conform either. If we're going to fail to conform to the standard, we shouldn't mess around with left-justification: it's even more confusing.
Right.
I agree with Robert Elz that "that code is broken. End of story" is too harsh. That is why I'm advocating that we continue to have asctime always return a valid non-NULL string, even though the standard doesn't require this. This is a good thing -- we shouldn't gratuitously break common usage even if it's no longer conforming.
I don't have a problem with always getting a text string even when the behaviour is undefined; that's a good approach. -- Clive D.W. Feather
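As an illustration of "always return a valid non-NULL string", here is one possible sketch in C. It is not the tz implementation: the name asctime_never_null, the 80-byte buffer, the "???" placeholders and the fallback string are all assumptions made up for this example.

```c
#include <stdio.h>
#include <time.h>

static const char wday_name[7][4] = {
    "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"
};
static const char mon_name[12][4] = {
    "Jan", "Feb", "Mar", "Apr", "May", "Jun",
    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
};

/* Hypothetical asctime-like function that never returns NULL.
   Out-of-range weekday or month fields are shown as "???", and the
   static buffer is large enough that long years do not overflow it. */
static const char *asctime_never_null(const struct tm *tm)
{
    static char buf[80];
    static const char fallback[] = "??? ??? ?? ??:??:?? ????\n";
    const char *wd, *mn;
    int n;

    if (tm == NULL)
        return fallback;
    wd = (tm->tm_wday >= 0 && tm->tm_wday < 7) ? wday_name[tm->tm_wday] : "???";
    mn = (tm->tm_mon >= 0 && tm->tm_mon < 12) ? mon_name[tm->tm_mon] : "???";
    /* Compute the year as long so that 1900 + tm_year cannot overflow int. */
    n = snprintf(buf, sizeof buf, "%.3s %.3s%3d %.2d:%.2d:%.2d %4ld\n",
                 wd, mn, tm->tm_mday, tm->tm_hour, tm->tm_min, tm->tm_sec,
                 1900L + tm->tm_year);
    if (n < 0 || (size_t)n >= sizeof buf)
        return fallback; /* formatting failed or was truncated */
    return buf;
}
```

The trade-off is the one debated in this thread: for years outside the four-character range the result is longer than 26 bytes, so callers that copy it into a char[26] are still broken, but they at least get a usable string back instead of NULL or undefined behaviour.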
participants (8)
- Clive D.W. Feather
- Greg Black
- Guy Harris
- John Cowan
- Michael Lindner
- Olson, Arthur David (NIH/NCI)
- Paul Eggert
- Robert Elz