comments on draft ISO C9x changes to <time.h>
The ISO committee in charge of the C language has issued a draft for C9x, the next major revision to C. A copy of this (large) document is available in:

   http://osiris.dkuug.dk/JTC1/SC22/open/2620/n2620/

Section 7.16 of this draft C standard proposes a major overhaul of the functions and datatypes defined in <time.h>. It adds a new data type `struct tmx' that is struct tm extended with the following members:

   int tm_version;   // version number
   int tm_zone;      // time zone offset in minutes from UTC [-1439,+1439]
   int tm_leapsecs;  // number of leap seconds applied
   void *tm_ext;     // extension block
   size_t tm_extlen; // size of extension block

Also, a struct tmx's tm_isdst is the positive number of minutes of offset if DST is in effect. The new functions mkxtime and strfxtime use struct tmx instead of struct tm; a new function

   struct tmx *zonetime (const time_t *timer, int zone);

is the rough analog of localtime and gmtime for struct tm.

I've submitted the following comments to the ISO committee for their review. A copy of these comments (along with all other US public comments on Committee Draft 1) can be found in:

   http://osiris.dkuug.dk/JTC1/SC22/WG14/www/docs/n834.htm

Category: Feature that should be removed
Committee Draft subsection: 7.16
Title: changes to <time.h> need a lot of work and should be withdrawn for now
Detailed description:

Background and comments

Draft C9X introduced a new time struct tmx, new macros _NO_LEAP_SECONDS and _LOCALTIME, and new functions mkxtime, zonetime, and strfxtime. These new functions seem to be an invention of the committee; they are not based on existing practice, and in some cases even ignore longstanding existing practice.

The new functions do not address many of the common problems observed with the C89 primitives, notably with mktime. Nor do they add much functionality.

For example, a common extension to C, now required by POSIX.1, is the set of reentrant versions of localtime, gmtime, etc. This fills a genuine need, but it's not addressed by draft C9X. There are also other genuine needs that are not addressed; just look at, say, the harsh words about mktime expressed by the author of the tide-calculation program XTide in its source code <http://www.universe.digex.net/~dave/files/xtide-1.6.2.tar.gz>. Draft C9X addresses few of the needs expressed by this author.

Here are some more detailed comments on technical shortcomings in this area.

Section 7.16.1 paragraph 3.

The tm_zone member is an integer number of minutes. However, common practice (e.g. SunOS 4.x, BSD/OS, Linux) is to have a member named tm_gmtoff that is a long number of seconds. This is required for proper support of POSIX.1, which lets the user specify UTC offset to the second; it is also required for proper support of historical applications. For example, the UTC offset of Liberia was 44 minutes and 30 seconds until May 1972, and any program running on, say, Linux with the TZ environment variable set to "Africa/Monrovia" cannot operate correctly if the UTC offset is required to be a multiple of 60 seconds.

The tm_ext and tm_extlen members are an unprecedented kludge in the standard library spec. This is not C++! If the specification for struct tmx is incomplete, this suggests that the editorial work is not done and this type should be withdrawn from the standard.

Section 7.16.2.3 paragraph 4.

Here, draft C9X added the following new specification for mktime:

   If the call is successful, a second call to the mktime function with the resulting struct tm value shall always leave it unchanged and return the same value as the first call. (*)

This specification is reasonable for mkxtime, but for mktime it requires changes to existing practice in a way that breaks existing software. Existing software often assumes that tm_isdst is either negative, 0, or 1; C89 does not guarantee this, but it is common existing practice, so software that makes this assumption is portable in practice.
Unfortunately, specification (*) cannot be satisfied without either adding hidden members to struct tm (which breaks binary compatibility) or stuffing more information into tm_isdst (which breaks the programs described above). Granted, programs shouldn't assume that a positive tm_isdst is 1, but it's very common in POSIX.1 programs to see expressions like `tzname[tm->tm_isdst]', and these expressions won't work if tm_isdst contains large values.

Section 7.16.2.4 paragraph 3.

If tm_zone was _LOCALTIME, and if tm_isdst is preposterous (e.g. negative, or INT_MAX), this specification is unclear about what to do. The comments in 7.16.2.6 don't help much.

Section 7.16.2.6 paragraph 1.

The specification for tm_isdst does not allow for negative daylight-saving time. I don't know of any historical practice for this, but POSIX.1 allows it, and implementations that support POSIX.1 have to allow for it.

Section 7.16.2.6 paragraph 2.

The limits on ranges for struct tmx members are unreasonable. Common existing practice, for example, is to invoke mktime with a large value for tm_sec to compute a time stamp at some distance from the POSIX.1 epoch. If int and long are the same size, this runs afoul of the new restriction in this section, which limits tm_sec to one-eighth of the potential range. With this limitation I cannot even use mktime to compute today's date on my Unix host from today's time_t value!

The other limits are also unnecessary. A well-written mktime should work in the presence of arbitrary values in struct tm members; similarly for mkxtime.

Section 7.16.2.6 paragraph 3.

There are so many errors in this section that it is hard to determine what is intended. But from what I can tell, the intent is wrong.
For example, it seems to be saying that if the implementation supports leap seconds, and if local time is UTC, and if I have a struct tmx that corresponds to 1997-06-30 00:00:00, and then add 1 to tm_mday and invoke mkxtime, I should get 1997-06-30 23:59:60 due to the intervening leap second. This is not what I, the programmer, want or expect!

The first sentence in this paragraph reads ``Values S and D shall be determined as follows''. But the rules that follow do not _determine_ S and D; they merely place _constraints_ on S and D. This is because the implementation has some leeway in choosing X1 and X2.

It's not clear in this paragraph whether we're looking at C code or mathematics. Are we supposed to be using all the C rules for promotion, conversion, and overflow, or are the calculations to be done using mathematical integer arithmetic?

The last sentence in the comment about X1 and X2 is incoherent; I really can't make out what it means. For the implementation to determine X1 and X2, it needs to know what D and S are. But D and S are computed from X1 and X2! More explanation is needed before I can really figure out what's intended here.

The definition of D is completely unmotivated, and does not obey the rules of the Gregorian calendar. Among other things, it uses / and % in places where it should use QUOT and REM. (And it can't possibly be right without a `100' in it somewhere. :-) The definition should be rewritten to be something like the following. (Sorry, I haven't tested this, as it's less than 30 minutes before the deadline for submitting comments in the US as this sentence is being written.)

   D =  // day offset since 0000-03-01
        // contribution from year
        Z*365            // number of non-leap days since 0000-03-01
        + QUOT(Z, 4)     // Every 4 years ends in a leap year.
        - QUOT(Z, 100)   // Every 100 years ends in a nonleap year.
        + QUOT(Z, 400)   // Every 400 years ends in a leap year.
        // contribution from month; note we start from 03-01
        + ((int []){ ...yday offsets, starting in March ...}) [REM(M - 2, 12)]
        // contribution from day of month
        + tm_mday - 1
        // contribution from time of day
        + QUOT(SS, 86400)

except of course that the expression QUOT(SS, 86400) mishandles leap seconds as described above.

Section 7.16.3.5

This new function zonetime is of only marginal use; it seems to be present mostly as a way of defining how mkxtime works.

The definition of leap seconds is incorrect. Leap seconds are not a UTC-UT1 offset. The absolute value of the difference between UTC and UT1 is at most 0.9 seconds, by definition.

The changes to 7.16 seem to be hastily edited: there are a number of what seem to be typographical errors. The changed text is not explained, and the typos make it hard to understand what was intended. Here are some of the typos that I spotted despite these problems:

Section 7.16.1 paragraph 2.

_LOCALTIME ``must be outside the range [-14400, +14400].'' Presumably this should be [-1440, +1440], i.e. one day's worth not ten.

Section 7.16.2.6 paragraph 3.

The definition for QUOT yields numerically incorrect results if (b)-(a) or (b)-(a)-1 overflows. I suggest replacing it with the following definition, which is clearer and free of problems with overflow. This definition relies on C9X's new guarantees about integer division.

   #define QUOT(a,b) ((a)/(b) - ((a)%(b) < 0))

Similarly, REM can overflow if (b)*QUOT(a,b) overflows. Here is a better version.

   #define REM(a,b) ((a)%(b) + (b) * ((a)%(b) < 0))

The definition of Z can be written more compactly as:

   Z = Y - (M < 2);

Section 7.16.3.6 paragraph 5.

``If this value is outside the normal range, the characters stored are unspecified.'' What is the ``normal range''? The range as output by localtime, the range of the Gregorian calendar, or the limits as specified in 7.16.2.6?

Suggestion

Drop all changes to the <time.h> section for this revision of the C Standard.
Bring in experts in this area for the next revision of the C Standard. I suggest working together with the members of the Time Zone Mailing list <tz@elsie.nci.nih.gov>. Build on existing practice rather than relying on committee inventions, which have been error-prone in this area.

If these suggestions are not followed, a lot of changes are needed to this section, as suggested by the above discussion; please contact me if you need more details.
Paul Eggert wrote:
The ISO committee in charge of the C language has issued a draft for C9x, the next major revision to C. Section 7.16 of this draft C standard proposes a major overhaul of the functions and datatypes defined in <time.h>.
I've submitted the following comments to the ISO committee for their review.
I have submitted the following memo to the committee, in order to handle Paul's comment. I would appreciate any comments about it from the people on this list, since it appears to me that they are among the best informed on these matters.

I noticed in the archives the PC-US0011, from Paul Eggert, and particularly point #14 about the extensions introduced by C9X to the time functions. I intended to propose a change to the draft to solve these issues. Of course, I do not comment on the points of detail regarding the wording; I try to stick to the most basic problems.

I believe that if we want to go beyond the present (C90) state of the time functions, the following needs are to be covered (in this order):

1) remove the dependency on the internal static buffers
2) do 1) in a way compatible with POSIX.1 (*_r functions)
3) have a way to specify the timezone (other than the local and UTC ones) [when timezones are considered as UTC offsets]
4) do 3), including when passing a DST shift or a change of rules, i.e. when timezones are considered as a portion of the world
5) handle leap seconds explicitly

Notes:

There are three kinds of internal static buffers:
- the buffers specified in the Standard which hold the return values of asctime, ctime, gmtime and localtime
- the buffer to hold information about the timezone
- the buffer to hold the locale information for strftime

As we all know, the third kind is bound to the locale model, so I shall not go further in this area. Unfortunately, the POSIX.1 *_r functions remove the dependency on only the 1st kind. So removing the dependency on the 2nd kind will require inventing new stuff.
Then, I believe POSIX's ctime_r and asctime_r can be written using the present standard library, with code like

   char *
   asctime_r(const struct tm *timeptr, char *p)
   {
       char *cur, *old_loc;

       GET_MUTEX_LOCK(locale_lock);       /* if the locale is shared */
       /* setlocale(cat, "C") would return the *new* locale string, and the
          returned string may be overwritten by a later call, so query the
          current locale first and keep a private copy of it */
       cur = setlocale(LC_TIME, NULL);
       old_loc = cur ? strdup(cur) : NULL;
       setlocale(LC_TIME, "C");
       strftime(p, 26, "%c\n", timeptr);
       if (old_loc != NULL) {
           setlocale(LC_TIME, old_loc);   /* restore the caller's locale */
           free(old_loc);
       }
       RELEASE_MUTEX_LOCK(locale_lock);
       return p;
   }

(in part because the behavior of strftime is now better specified in the "C" locale). And if I am wrong, then I believe the wording should be improved for this to be correct (I know about the problem that setlocale must *not* be called from inside the library, but this can be solved by the implementor using an internal alias).

So, to solve the localtime_r/gmtime_r needs and point #3, I then propose two new functions (to be compared with zonetime and mkxtime from the C9X draft) and a new structure [and perhaps another type]. Remarks in brackets are for you to comment on!

The structure is named struct tzinfo, and its first field, named tz_gmtoff, is a long containing the offset in seconds from UTC to a timezone, with positive values meaning ahead of UTC. The structure might contain other [unspecified ? implementation-dependent ?] fields, for example to specify DST rules, but they should be designed such that, when initialized to zero, the designated timezone holds a constant offset from UTC.
The functions are:

   struct tm *zonetime(const time_t *timer,
                       const struct tzinfo *tz,   /* if NULL, use local time */
                       struct tm *timeptr);       /* if NULL, use internal */

   time_t timezone(struct tm *timeptr,
                   const struct tzinfo *tz);      /* if NULL, use local time */

The meaning of these functions should be obvious when I say that the usual functions can be expressed with them:

   time_t mktime(struct tm *timeptr)
   { return timezone(timeptr, NULL); }

   time_t timegm(struct tm *timeptr)
   {
       /* Here, I used the fact that tz_gmtoff is the first field of the
          structure, and that a partly initialized structure is filled
          with zeroes */
       return timezone(timeptr, &(struct tzinfo){0});
   }

   struct tm *localtime(const time_t *timer)
   { return zonetime(timer, NULL, NULL); }

   struct tm *gmtime(const time_t *timer)
   { return zonetime(timer, &(struct tzinfo){0}, NULL); }

   struct tm *localtime_r(const time_t *timer, struct tm *p)
   { return zonetime(timer, NULL, p); }

   struct tm *gmtime_r(const time_t *timer, struct tm *p)
   { return zonetime(timer, &(struct tzinfo){0}, p); }

   struct tm *CD1_zonetime(const time_t *timer, int value)
   { return zonetime(timer, &(struct tzinfo){value}, NULL); }

Another useful call is when you parse a date with an explicit timezone indication, like in an e-mail client, and want to deal with it: just transform "+hhmm" into a count of seconds sec, and call

   t = timezone(&tm, &(struct tzinfo){sec});

Of course, zonetime is allowed to return a NULL pointer if the given inputs are not valid (and localtime must be allowed to do so as well, as pointed out in PC-US0011#13), or if the offset between timer and UTC cannot be determined (as is currently the case with gmtime); the same holds for timezone.

(BTW: all names are just indications, they are open to discussion; timezone is probably a bad choice, since it was used in some versions of UNIX).
Then, the type of tz_gmtoff need not be `long'; it needs only to be a numeric type capable of holding all the integers in the range [-89999, 89999] -- which would be enough to satisfy POSIX.1, plus another value meaning _LOCALTIME like in CD1. It might be worth adding a type gmtoff_t (or utcoff_t) for this.

This is important if we think about the ways to retrieve the correct offset of the timezone from UTC *after* the call of the function, in particular in the case of time zones with DST changes, as local time is. So an alternative model might be:

   struct tm *zonetime(const time_t *timer,
                       const struct tzinfo *tz,   /* if NULL, use local time */
                       struct tm *timeptr,        /* if NULL, use internal */
                       gmtoff_t *offsetptr);      /* if NULL, do not store */

   time_t timezone(struct tm *timeptr,
                   const struct tzinfo *tz,       /* if NULL, use local time */
                   gmtoff_t *offsetptr);          /* if NULL, do not store */

Another possibility is to add another function that returns the offset of the time zone from a struct tm, like

   gmtoff_t gmtoff(const struct tm *timeptr);

or perhaps

   gmtoff_t gmtoff(const struct tm *timeptr, const struct tzinfo *tz);

(The Committee might consider turning gmt to utc; but that is another story).

Then, to extend this to handle "real" time zones, instead of just their offsets, we need to go a little further. The C standard chose *not* to describe the behavior in detail, and I feel it can be considered too heavy for some implementations (the need to update, for example). Also, specifying point #4 might be very tricky in the context of an International Standard (consider the Israeli or Saudi rules). So I believe point #4 should be left as a QoI issue (but should be available as a natural extension, of course).
OTOH, in the realm of POSIX, there is a quite natural extension to this mechanism which fits the need:

   struct tzinfo *tzalloc(const char *tzspec);   /* if NULL, use local time */

where tzspec has the same format as the contents of the POSIX.1 TZ environment variable (with the usual extensions, such as Olson's : prefix, allowed).

This function yields a null pointer if given an invalid specification; the storage allocated by this function can be freed with `free'.

Having a mechanism to dynamically allocate storage is required, because the most powerful implementations will store historical information in struct tzinfo, which grows with time, and thus requires this struct to be implemented as a VLA.

This design fits pretty well in Olson's code, where it only adds small things. It is also pretty light (when compared to the actual <time.h> stuff), and eases upgrading current implementations that are more limited in functionality (like the "only USA rules are used" found in numerous std libc coming from the USA ;-), or even "I only know localtime" flavors). Vendors who have a different point of view, please write to me.

It also has the property of *not* breaking current binary compatibility, as it does not extend the meaning of anything in struct tm (but merely stores the necessary information in a different place). But please keep reading.

However, it has a major flaw: it does not permit an explicit handling of leap seconds. And I do not know how to solve this: I do not want to add new parameters (too expensive for almost no gain in day-to-day use), and I do not want to introduce a new kind of structure (too complex, I believe).

(Another problem is that I do not know how to express the rules about leap seconds in the context of the library, while allowing an implementation to optionally support them... see the point of Paul Eggert in his PC about mkxtime on +1 day on June 30th, 1997, midnight, which in the current draft doesn't yield July 1st).
For the leap seconds, there is first a very delicate interoperability problem, since POSIX.1 requires time_t values to *not* record any leap second information at all. So I have no practical solution here, outside the one proposed in CD1, i.e. to extend struct tm to have a new field storing this information.

This (and another open problem, how to print the actual time zone name) leads to another question: why did the Committee introduce a new type for the time functions, instead of extending struct tm?

In the absence of an answer to this question, I am left with an open alternative on how to end this proposal:

a) extending struct tm explicitly (adding fields), which perhaps might require using the tm_isdst field as a flag to request/signal C9X behavior (tm_isdst is superseded by the information in the struct tzinfo)

b) indirectly requesting implementations to insert the new fields to support the whole spec, but keeping them invisible to the "conforming" user (as the tz package or BSD do right now)

c) not extending struct tm, and adding some more arguments to the new functions (or additional functions) to collect the other information

and of course

d) CD1's solution, creating a structure for extending (still a correct possibility), replacing the tm_ext stuff with a pointer to a tzinfo struct

Waiting for your welcome and certainly useful comments,

Antoine
On Mon, 15 Jun 1998, Antoine Leca wrote:
OTOH, in the realm of POSIX, there is a quite natural extension to this mechanism which fits the need:
struct tzinfo *tzalloc ( const char *tzspec); /* if NULL, use local time */
where tzspec has the same format as the contents of the POSIX.1 TZ environment variable (with the usual extensions, such as Olson's : prefix, allowed).
This function yields a null pointer if given an invalid specification; the storage allocated by this function can be freed with `free'.
It might make sense to use the same sort of interface as for the POSIX.2 regular expression functions, e.g.

   int tzcomp(timezone_t *zone, const char *tzspec);
   size_t tzerror(int errcode, const timezone_t *zone, char *errbuf, size_t *errbuf_size);
   void tzfree(timezone_t *zone);

tzcomp would return zero on success, or an error code otherwise (e.g. TZ_BADSPEC for an invalid tzspec string, TZ_NOMEM for allocation failure or implementation defined values for other errors, e.g. in a timezone file); tzspec would be a POSIX.1 TZ value or NULL for an implementation defined local timezone. This allows extension through the use of the implementation defined values beginning with ':'.

tzerror would convert an error code to a string (modeled on regerror); tzfree would free any allocated parts of the timezone_t structure. The structure could have some specified elements (e.g. UTC offset) if these are useful and can be sensibly defined (i.e., a timezone_t can represent a complete timezone history covering the past and future - what date's offset should be given?).
For the leap seconds, there is first a very delicate interoperability problem, since POSIX.1 requires time_t values to *not* record any leap second information at all. So I have no practical solution here, outside the one proposed in CD1, i.e. to extend struct tm to have a new field storing this information.
The most compatible solution would be to use Markus Kuhn's CLOCK_UTC with nanosecond values up to 1999999999 during leap seconds (and if the system clock ticks TAI the library handles the conversion using a leap second table). This would however require new interfaces for time conversion that take times with nanoseconds. What is the `correct' time display to give for a leapsecond in a zone with an offset from UTC that is not an integral number of minutes? Have there actually been any such zones since the start of the leapsecond system? -- Joseph S. Myers jsm28@cam.ac.uk
Joseph S. Myers wrote:
First, thanks for your input.
On Mon, 15 Jun 1998, Antoine Leca wrote:
OTOH, in the realm of POSIX, there is a quite natural extension to this mechanism which fits the need: <snip: a mechanism>
It might make sense to use the same sort of interface as for the POSIX.2 regular expression functions, e.g. <snip: another mechanism>
I am not competent enough to discuss this point. However, I shall transmit this to the "liaison", i.e. to the person in charge of keeping the POSIX standard in line with the C standard (which happens to be Keld Simonsen). Anyway, it appears to me that there is no real problem in having *in the POSIX realm* mechanisms that permit fine selection of the time zone, viewed as a part of the world. Have I restated your point correctly?
For the leap seconds, there is first a very delicate interoperability problem, since POSIX.1 requires time_t values to *not* record any leap second information at all. So I have no practical solution here, outside the one proposed in CD1, i.e. to extend struct tm to have a new field storing this information.
The most compatible solution would be to use Markus Kuhn's CLOCK_UTC with nanosecond values up to 1999999999 during leap seconds (and if the system clock ticks TAI the library handles the conversion using a leap second table). This would however require new interfaces for time conversion that take times with nanoseconds.
This seems feasible; I have just one question: when there is a "step" leap second, i.e. when the Earth is turning faster (clocks go from 23:59:58 to 00:00:00), how is it supposed to be handled in this mechanism? However, I see this as requiring a new field, here to hold the nanoseconds, so in effect it returns to the CD1 solution (a new structure), since the C Committee is reluctant to insert new "full-class" fields in struct tm because it breaks binary compatibility.
What is the `correct' time display to give for a leapsecond in a zone with an offset from UTC that is not an integral number of minutes? Have there actually been any such zones since the start of the leapsecond system?
I do not know of any, except of course Saudi Arabia (but their system is entirely different, and is not covered here). I think that introducing leap seconds requires a level of technology that is not compatible with keeping clocks at local mean time instead of using the world time zone system; but I may be wrong.

Antoine
Antoine Leca wrote on 1998-06-15 11:21 UTC:
The most compatible solution would be to use Markus Kuhn's CLOCK_UTC with nanosecond values up to 1999999999 during leap seconds (and if the system clock ticks TAI the library handles the conversion using a leap second table). This would however require new interfaces for time conversion that take times with nanoseconds.
This seems feasible; I have just one question: when there is a "step" leap second, i.e. when the Earth is turning faster (clocks go from 23:59:58 to 00:00:00), how is it supposed to be handled in this mechanism?
You just jump from (time_t) tv_sec to (time_t) tv_sec+1. Remember that (time_t) is just an encoding of a clock display and not necessarily a physical seconds counter.

I have just written a Web page about my suggestions for POSIX clocks that handle leap seconds adequately:

   http://www.cl.cam.ac.uk/~mgk25/posix-clocks.html

At the beginning of this text, you will find references to a number of documents that explain leap seconds and related concepts. For instance, have a look at ITU-R Recommendation TF.460-4, which is the definition of UTC and how its leap seconds work.

TF.460-4 defines what a UTC clock displays during a positive and a negative leap second, and we just encode this straightforwardly into struct timespec. So far there just wasn't a defined way to encode a seconds field containing the value 60, but adding 1_000_000_000 to the nanosecond field seems to be a practical way of extending the set of time displays that struct timespec can encode for the required inserted leap second. There is no need to encode a removed leap second, because you will not be able to produce a time stamp during a removed leap second.

Markus

-- 
Markus G. Kuhn, Security Group, Computer Lab, Cambridge University, UK
email: mkuhn at acm.org, home page: <http://www.cl.cam.ac.uk/~mgk25/>
"Joseph S. Myers" wrote on 1998-06-15 09:29 UTC:
What is the `correct' time display to give for a leapsecond in a zone with an offset from UTC that is not an integral number of minutes? Have there actually been any such zones since the start of the leapsecond system?
There is no official national time zone that is defined as an offset to UTC but that does not have an integral number of minutes difference relative to UTC. Obviously, such time zones would be rather difficult to define, because there would be no obvious notation for the inserted leap second, as it could not be called 60.

Paul Eggert wrote:

The tm_zone member is an integer number of minutes. However, common practice (e.g. SunOS 4.x, BSD/OS, Linux) is to have a member named tm_gmtoff that is a long number of seconds. This is required for proper support of POSIX.1, which lets the user specify UTC offset to the second; it is also required for proper support of historical applications. For example, the UTC offset of Liberia was 44 minutes and 30 seconds until May 1972, and any program running on, say, Linux with the TZ environment variable set to "Africa/Monrovia" cannot operate correctly if the UTC offset is required to be a multiple of 60 seconds.

I think time zone definitions such as Liberia's until May 1972 were obviously either not based on UTC or were an intellectual error of someone who tried to define it based on UTC but didn't understand UTC (which is excusable since at that time it was rather young and, as we all know well, people still have trouble understanding the concept of leap seconds today).

The second offset count would only be useful to allow a good approximation of time zones that can best be described by an integral second offset relative to UT0 or UT0, but nobody interested in precision timestamps would use such a time zone today. People not interested in precision time stamps do not operate computers with clocks that have an accuracy of better than 30 seconds, so all this sounds rather academic to me anyway.

Markus

-- 
Markus G. Kuhn, Security Group, Computer Lab, Cambridge University, UK
email: mkuhn at acm.org, home page: <http://www.cl.cam.ac.uk/~mgk25/>
participants (4): Antoine Leca, Joseph S. Myers, Markus Kuhn, Paul Eggert