Proposal: API for thread-safe time zone functions
One of the major shortcomings of the current time zone API defined by ISO C and POSIX is that it relies on global data to define the current time zone. If you're writing a server program which needs to talk to people all over the world in their local time zones, this can lead to remarkable difficulties.

Therefore, I've written up a proposal (attached) for a thread-safe API for time zone functions. These extend (I think) the current time API in a natural way. There are two forms: one purely re-entrant, and another that specifies thread-local time zone data to be used by the existing time functions.

I've drawn a lot of inspiration for this work from Ulrich Drepper's work on thread-aware POSIX locales (http://www.cygnus.com/~drepper/tllocale.ps.bz2).

* Are people interested in this?
* Do people think that my proposal is a sensible API?
* Would people be interested in seeing the resulting code?
* Would people be interested in helping write the code?
* Would the resulting code (assuming it's written sensibly) be acceptable/appropriate for incorporation into tzcode?

Comments very welcome!

A proposal for thread-safe time zone information.

Three functions manipulate a struct tz:

Name
    newtz -- create a time zone object

Synopsis
    #include <time.h>
    struct tz* newtz(const char *tzname);

Description
    The newtz() function creates a new time zone object, corresponding to the given time zone name.

    tzname is a pointer to a string representing the name of the allocated time zone. It has the same syntax as the "TZ" environment variable, both the POSIX forms and locally-defined symbolic names.

    The allocated struct tz contains all necessary information to represent times in the specified time zone. It has one externally-visible element: tz_name, an array of two ASCII strings containing the time zone abbreviations for standard and daylight time in the current time zone.

        struct tz {
            const char *tz_name[2];
            (private data)
        };

    If tzname is NULL, the returned struct tz describes the local wall-clock time, as best as it is known by the local system.

    If tzname is the empty string "", the returned struct tz describes Coordinated Universal Time (UTC). (The POSIX rules also allow UTC to be represented with the string "GMT0".)

    This interface is designed so that newtz(getenv("TZ")) will return an object describing the default time zone object that non-thread-aware versions of the time functions will use by default, provided TZ (if set) is set to a valid time zone name.

Return Value
    If the function call succeeds, the return value is a pointer to a time zone object, which should be released by a call to freetz(). If it fails, it returns NULL and sets errno.

Errors
    The newtz() function shall fail if:

    ENOMEM  Not enough memory is available to create the time zone object.
    ENOENT  No known time zone corresponds to tzname.

    newtz() may also fail with other errno values if there is a problem with the system time zone database.

Name
    freetz -- free resources allocated for a time zone object

Synopsis
    #include <time.h>
    void freetz(struct tz* tzobj);

Description
    The freetz() function frees the resources allocated for a time zone object returned by a call to newtz() or duptz().

    If the system defines the tm_zone field of struct tm, this function invalidates the strings pointed to by the tm_zone field of all struct tm values created by localtime_z called with this tzobj.

Return Value
    None.

Errors
    None.

Name
    duptz -- duplicate a time zone object

Synopsis
    #include <time.h>
    struct tz* duptz(const struct tz* tzobj);

Description
    The duptz() function can be used to duplicate an existing time zone object. A time zone object can be used at any time in multiple places, but if its lifetime can possibly end before all uses are finished, one has to create a duplicate.

Return Value
    If the function succeeds, it returns a pointer to a time zone object identical to the one represented by the pointer passed in tzobj. If it fails, it returns NULL.

Errors
    The duptz() function shall fail if:

    ENOMEM  Not enough memory is available to create the duplicated time zone object.

    The duptz() function may fail if:

    EINVAL  tzobj does not point to a valid time zone object.

Modified time-zone-aware time manipulation functions:

    struct tm * localtime_z(const time_t *clock, struct tm *result, const struct tz *tz);

Equivalent to localtime_r() or gmtime_r(), in the time zone represented by tz, except that tz->tz_name is not modified. (The return value's tm_zone value is set correctly, if the system has tm_zone.)

    char * ctime_z(const time_t *clock, char *buf, const struct tz *tz);

Equivalent to ctime_r(), in the time zone represented by tz.

    char * asctime_z(const struct tm *tm, char *buf, const struct tz *tz);

Equivalent to asctime_r(), in the time zone represented by tz.

    time_t mktime_z(struct tm* tm, const struct tz *tz);

Equivalent to mktime() or timegm(), in the time zone represented by tz.

    size_t strftime_z(char *buf, size_t maxsize, const char *format, const struct tm* timeptr, const struct tz *tz);

Equivalent to strftime(), in the time zone represented by tz.

    char * strptime_z(const char *buf, const char *format, struct tm *timeptr, const struct tz *tz);

Equivalent to strptime(), in the time zone represented by tz. (This affects only the interpretation of the %Z format specifier.)

Thread-support functions

Name
    tzuse -- use time zone object in current thread

Synopsis
    #include <time.h>
    void tzuse(const struct tz *tz);

Description
    The tzuse() function is similar to the tzset() function, but it does not affect the global time zone. Instead it selects the new time zone only for the current thread. The time zone setting for all other threads remains the same.

    Once tzuse() has been called, all calls to the functions localtime(), localtime_r(), ctime(), ctime_r(), asctime(), asctime_r(), mktime(), strftime(), and strptime() in the current thread will use the current thread's defined time zone.

    If tzuse() is called with a NULL pointer as its argument, the current thread will again use the global time zone object.

-- Jonathan Lennox lennox@cs.columbia.edu
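
For context, the global-state problem described at the top of the proposal can be sketched with the existing POSIX calls. The helper below is hypothetical (localtime_in_zone is not part of the proposal or of any standard); it assumes a POSIX system and shows the kind of workaround a server is left with today: swap the process-wide TZ, convert, swap it back, and serialize every caller behind one lock.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* One lock for every conversion, because TZ and tzset() state are process-wide. */
static pthread_mutex_t tz_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical helper: convert *clock to broken-down time in the zone named by
 * `zone` (TZ syntax) by temporarily replacing the global TZ setting. */
struct tm *localtime_in_zone(const char *zone, const time_t *clock, struct tm *result)
{
    assert(zone != NULL && clock != NULL && result != NULL);

    pthread_mutex_lock(&tz_lock);

    /* Copy the old value: the string returned by getenv() may change under setenv(). */
    const char *old = getenv("TZ");
    char *saved = old ? strdup(old) : NULL;
    if (old != NULL && saved == NULL) {
        pthread_mutex_unlock(&tz_lock);
        return NULL;
    }

    setenv("TZ", zone, 1);
    tzset();
    struct tm *ret = localtime_r(clock, result);

    /* Restore the previous global state before releasing the lock. */
    if (saved != NULL) {
        setenv("TZ", saved, 1);
        free(saved);
    } else {
        unsetenv("TZ");
    }
    tzset();

    pthread_mutex_unlock(&tz_lock);
    return ret;
}
```

Any thread that calls localtime() directly while the lock is held still sees the temporary zone, which is the difficulty a per-object interface such as localtime_z() is meant to remove.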
On Thu, 7 Jun 2001, Jonathan Lennox wrote:
I've drawn a lot of inspiration for this work from Ulrich Drepper's work on thread-aware POSIX locales (http://www.cygnus.com/~drepper/tllocale.ps.bz2).
* Are people interested in this? * Do people think that my proposal is a sensible API? * Would people be interested in seeing the resulting code? * Would people be interested in helping write the code? * Would the resulting code (assuming it's written sensibly) be acceptable/appropriate for incorporation into tzcode?
Comments very welcome!
Have you looked at Markus Kuhn's proposal at http://www.cl.cam.ac.uk/~mgk25/c-time/ and the other ones linked to from there?

My recommendations:

* Design the proposal as an amendment for ISO C rather than POSIX.
* Don't touch how timestamps are represented (any interface can be adapted to use any time_t replacement that gets agreed).
* Provide a struct tm replacement with (a) subsecond resolution and (b) a proper field indicating which repetition of a repeated timestamp is referred to (A/B in German time notation), rather than the inadequate indication of whether the time is in daylight savings.
* Provide four conversion functions: between time_t and broken down times, in either direction, and equivalents of strftime and wcsftime. Don't duplicate other functions such as asctime that can easily be replicated.
* Use a C99 snprintf-style return value (return the length of buffer required if the buffer isn't long enough) rather than what strftime currently does (return 0 if the buffer isn't long enough).
* Provide specified timezone names for both the user's local timezone and the system's local timezone.

-- Joseph S. Myers jsm28@cam.ac.uk
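
The point about repeated timestamps can be illustrated with the existing interface. The sketch below is only an illustration under stated assumptions: the function name repeated_hour_gap is made up, and the POSIX-style TZ string is chosen so that daylight time ends on the last Sunday of October. With that rule, 01:30 local time on 2001-10-28 occurs twice, and today tm_isdst is the only way to tell mktime() which occurrence is meant.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Returns the number of seconds between the two occurrences of 01:30 local
 * time on 2001-10-28. Note that this changes the process-wide TZ setting. */
long repeated_hour_gap(void)
{
    /* Explicit rule: daylight time from the first Sunday of April
     * to the last Sunday of October, switching at 02:00 local time. */
    setenv("TZ", "EST5EDT,M4.1.0,M10.5.0", 1);
    tzset();

    struct tm first;
    memset(&first, 0, sizeof first);
    first.tm_year = 2001 - 1900;
    first.tm_mon = 9;            /* October (months count from 0) */
    first.tm_mday = 28;
    first.tm_hour = 1;
    first.tm_min = 30;
    struct tm second = first;

    first.tm_isdst = 1;          /* the occurrence still on daylight time */
    second.tm_isdst = 0;         /* the occurrence after clocks went back */

    time_t t_first = mktime(&first);
    time_t t_second = mktime(&second);
    assert(t_first != (time_t)-1 && t_second != (time_t)-1);

    return (long)difftime(t_second, t_first);
}
```

This only works because the repetition here is caused by a daylight-saving change. A repetition caused by some other offset change has no tm_isdst value to select it, which is one reading of why a dedicated field is being recommended.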
<<On Thu, 7 Jun 2001 17:30:58 +0100 (BST), "Joseph S. Myers" <jsm28@cam.ac.uk> said:
* Design the proposal as an amendment for ISO C rather than POSIX. * Don't touch how timestamps are represented [...] * Provide a struct tm replacement with [...] * Use a C99 snprintf-style return value [...]
All of these suggestions are essentially orthogonal to the main issue of re-entrant, thread-specific time conversions. I suggest that Mr. Lennox's proposal is much more likely to gain consensus than any changes to the underlying interfaces.
* Provide four conversion functions: between time_t and broken down times, in either direction, and equivalents of strftime and wcsftime.
This seems to be sound advice. This interface should also include the equivalent of strptime() as well, assuming it can be adequately specified. Only the timezone code has adequate access to the localization information needed to parse such times.
* Provide specified timezone names for both the user's local timezone and the system's local timezone.
In the context of POSIX, there is no such distinction. (Of course, the timezone library is able to make a distinction.) -GAWollman
On Thu, 7 Jun 2001, Garrett Wollman wrote:
All of these suggestions are essentially orthogonal to the main issue of re-entrant, thread-specific time conversions. I suggest that Mr. Lennox's proposal is much more likely to gain consensus than any changes to the underlying interfaces.
If designing better timezone interfaces, we should try to get them right rather than needing another change later. As long as we don't try to change time_t, consensus shouldn't be a great problem.

Some more points:

* Better timezone interfaces were extensively discussed on the tz list in September/October 1998, and that discussion should be taken into account. It isn't clear whether this proposal has done so.
* Better time interfaces were listed as one possible item for an amendment to C99 (after the problematic changes in some C9X drafts were backed out). Antoine Leca may know the current ISO status of this.
* A list for discussing these interfaces was then set up. It hasn't had much discussion, but it would be the appropriate place for discussing these interfaces: c-time@list.cr.yp.to, subscription by empty message to c-time-subscribe@list.cr.yp.to.
* Provide specified timezone names for both the user's local timezone and the system's local timezone.
In the context of POSIX, there is no such distinction. (Of course, the timezone library is able to make a distinction.)
The user's time is that in TZ, the system time is that when TZ is unset. In the ISO C context, defined names for these should be provided. The system time is for use of programs controlling access to resources whose cost depends on the system's local time. -- Joseph S. Myers jsm28@cam.ac.uk
Hi folks,

Sorry, I was away from TZ for some days, so I did not notice this discussion until today. I have not yet read the whole thread, but it looks to me like a lot of sensible things have been said, so a really good thing would be to follow someone's advice (Paul's, I believe): a rationale that captures what has been said here would be very welcome, and this is particularly important for the standardization committees.

This post is to answer an easy one: Joseph S. Myers wrote:
* Better time interfaces were listed as one possible item for an amendment to C99 (after the problematic changes in some C9X drafts were backed out). Antoine Leca may know the current ISO status of this.
Yes. Currently, things are exactly as you describe: this is a possible item, but not one on which we are currently working. If some proposal emerges that is acceptable to the vast majority of the specialists, then we could move on: that is, draft a "new work item" for the C standardization committee (and probably, or subsequently, the POSIX committee as well) to design such an amendment, and get it voted in by the national bodies at ISO that are in charge of programming languages (that is, SC22, if you are familiar with the ISO hierarchy). Then we would work as a subgroup, including all of you who are the experts in the field, to issue a formal draft of an amendment to the C standard. In parallel, we would have to build up implementations. After that (and we are in the 2004 time frame as a minimum), the baby will escape us and enter the real process of standardization, with the politics it involves (which can delay things quite a lot).

There have been two major attempts in the (recent) past. One was part of the C99 process, and it failed because a number of experts considered that the proposed changes, while ambitious, failed to solve all the issues. A lot of the material that is still floating around (and that is conveniently linked from Markus' page) dates from this period. A second attempt took place as part of the Austin Group process (the next revision of POSIX); it was mainly motivated by the 2038 problem (time_t), but then drifted to more ambitious "solutions", and the subgroup failed to achieve a minimal level of consensus about the results, so again interest vanished.

While I tried to be as objective as possible, I welcome corrections.

Antoine
Garrett Wollman wrote on 2001-06-07 17:11 UTC:
All of these suggestions are essentially orthogonal to the main issue of re-entrant, thread-specific time conversions. I suggest that Mr. Lennox's proposal is much more likely to gain consensus than any changes to the underlying interfaces.
IMHO, APIs should be designed properly right from the beginning, because they can't be withdrawn in the future. The world is already full of quick-add-on reentrant API fixes that later had to be superseded again by something different, because only the thread-safety and none of the other accumulated problems had been addressed.

Markus

P.S.: I just fixed lots of broken links in the reference section of http://www.cl.cam.ac.uk/~mgk25/c-time/ that had moved around, so if you couldn't find something, please try again.

-- Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>
On Thursday, June 7 2001, "Markus Kuhn" wrote to "tz@elsie.nci.nih.gov" saying:
Garrett Wollman wrote on 2001-06-07 17:11 UTC:
All of these suggestions are essentially orthogonal to the main issue of re-entrant, thread-specific time conversions. I suggest that Mr. Lennox's proposal is much more likely to gain consensus than any changes to the underlying interfaces.
IMHO, APIs should be designed properly right from the beginning, because they can't be withdrawn in the future. The world is already full of quick-add-on reentrant API fixes that later had to be superseded again by something different, because only the thread-safety and none of the other accumulated problems had been addressed.
I think there's a need for APIs that interact with both time_t's and your xtime_t's (or whatever they end up being called) using thread-safe timezones. Since I believe you didn't modify struct tm in your proposal, only the conversion functions (the equivalents of mktime and localtime) need to be different for the two datatypes. Your naming choices, xtime_make() and xtime_breakup(), imply obvious names for time_t-based equivalents (time_make() and time_breakup() respectively). -- Jonathan Lennox lennox@cs.columbia.edu
On Thursday, June 7 2001, "Joseph S. Myers" wrote to "Jonathan Lennox, <tz@elsie.nci.nih.gov>" saying:
Have you looked at Markus Kuhn's proposal at
http://www.cl.cam.ac.uk/~mgk25/c-time/
and the other ones linked to from there?
No, I hadn't. It's interesting -- he has some of the same ideas I had, but trying to solve some problems that are rather larger in scope. I was trying to solve a much smaller problem -- making the minimal changes to existing functions needed in order to get re-entrant timezone support.
My recommendations:
* Design the proposal as an amendment for ISO C rather than POSIX.
Wouldn't it need to be an amendment to both? I don't have either specification handy, but I was under the impression that the TZ environment variable, and tzset(), were POSIX.
* Don't touch how timestamps are represented (any interface can be adapted to use any time_t replacement that gets agreed).
Agreed.
* Provide a struct tm replacement with (a) subsecond resolution and (b) a proper field indicating which repetition of a repeated timestamp is referred to (A/B in German time notation), rather than the inadequate indication of whether the time is in daylight savings.
I think this is orthogonal to what I'm looking at. I agree it's a good idea in principle, but I don't think I have the expertise to do it in practice.
* Provide four conversion functions: between time_t and broken down times, in either direction, and equivalents of strftime and wcsftime. Don't duplicate other functions such as asctime that can easily be replicated.
Fair enough. Also a good point about the wide versions. What's your thought on strptime_z()? (And should there be a wcsptime_z()?)
* Use a C99 snprintf-style return value (return the length of buffer required if the buffer isn't long enough) rather than what strftime currently does (return 0 if the buffer isn't long enough).
Useful enough, I suppose, but is it worth the compatibility break with strftime()?
* Provide specified timezone names for both the user's local timezone and the system's local timezone.
I don't think ISO C distinguishes between these concepts, and for POSIX, getenv("TZ") should be sufficient for the former, no? I defined the NULL string as representing the latter. -- Jonathan Lennox lennox@cs.columbia.edu
On Thu, 7 Jun 2001, Jonathan Lennox wrote:
* Design the proposal as an amendment for ISO C rather than POSIX.
Wouldn't it need to be an amendment to both? I don't have either specification handy, but I was under the impression that the TZ environment variable, and tzset(), were POSIX.
Conceptually, this is at a level which is appropriate and useful for ISO C. POSIX will adopt ISO C amendments in due course. The basic interfaces for thread-safe use of timezones can be specified in ISO C, with additional features (e.g. TZ, tzset, how anything to do with threads is done) left to POSIX. (I think thread-safe interfaces will still be useful to people writing for a non-POSIX ISO C environment with threads.) The POSIX timezone strings could either be carried over to ISO C, or left to POSIX with timezone strings left implementation defined in ISO C.
What's your thought on strptime_z()? (And should there be a wcsptime_z()?)
strptime is an X/Open feature (newly in POSIX as of the Austin Group drafts) so would go in the POSIX side of these amendments. The Austin Group drafts don't seem to have a wcsptime function.
* Use a C99 snprintf-style return value (return the length of buffer required if the buffer isn't long enough) rather than what strftime currently does (return 0 if the buffer isn't long enough).
Useful enough, I suppose, but is it worth the compatibility break with strftime()?
When C99 specified snprintf's return value, this was different from some prior art that did it differently, and from swprintf (added in AMD1) which used strftime-like returns. I think providing the information for a single new allocation of the desired buffer length to suffice (rather than iterating with larger buffer sizes) is sufficiently worthwhile to make the change.
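
The difference between the two return conventions can be sketched as follows. Both function names are invented for illustration, the 4096-byte cap is an arbitrary assumption, and the second function formats a date with snprintf() itself, since no strftime variant with that return convention exists in the standard.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* strftime() convention: a return of 0 does not say how much room is needed,
 * so the caller guesses, grows the buffer, and retries. */
char *format_time_retry(const char *format, const struct tm *tm, int *attempts)
{
    assert(format != NULL && tm != NULL && attempts != NULL);
    *attempts = 0;
    /* First guess: the length of the format string itself. */
    for (size_t size = strlen(format) + 1; size <= 4096; size *= 2) {
        char *buf = malloc(size);
        if (buf == NULL)
            return NULL;
        ++*attempts;
        if (strftime(buf, size, format, tm) != 0)
            return buf;
        free(buf);
    }
    return NULL; /* too long for the cap, or the output is legitimately empty */
}

/* snprintf() convention: a first call with no buffer reports the length,
 * so a single allocation suffices. */
char *format_date_once(const struct tm *tm)
{
    assert(tm != NULL);
    int needed = snprintf(NULL, 0, "%04d-%02d-%02d",
                          tm->tm_year + 1900, tm->tm_mon + 1, tm->tm_mday);
    if (needed < 0)
        return NULL;
    char *buf = malloc((size_t)needed + 1);
    if (buf != NULL)
        snprintf(buf, (size_t)needed + 1, "%04d-%02d-%02d",
                 tm->tm_year + 1900, tm->tm_mon + 1, tm->tm_mday);
    return buf;
}
```

The retry loop also shows a second weakness of the 0 return: it cannot distinguish "buffer too small" from "the formatted result is the empty string".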
* Provide specified timezone names for both the user's local timezone and the system's local timezone.
I don't think ISO C distinguishes between these concepts, and for POSIX, getenv("TZ") should be sufficient for the former, no? I defined the NULL string as representing the latter.
If the weasel wording required to fit this in ISO C (which does rather lack the concepts involved) can be worked out, I think it would be useful to put these in. -- Joseph S. Myers jsm28@cam.ac.uk
Jonathan Lennox wrote:
* Are people interested in this?
I think it's a good idea.
* Do people think that my proposal is a sensible API?
My main problem with it is that I see no point in the struct tz objects. Programs don't typically do zillions of time conversions. I would rather pass the time zone name to each externally visible *_z function and let them look up the name in the currently maintained cache, if any. We certainly want to discourage callers from trying to save the tz structure in a persistent database, since its content may change over time. This would require adding a new function to retrieve the tz_name array for a given time zone name. I also question the utility of a per-thread current time zone being maintained by the library, which then has to know what kind of thread library you have so it can discover the current thread ID. Making something thread-safe at the cost of adding one argument to each tzlib call is not so bad. -- There is / one art || John Cowan <jcowan@reutershealth.com> no more / no less || http://www.reutershealth.com to do / all things || http://www.ccil.org/~cowan with art- / lessness \\ -- Piet Hein
On Thursday, June 7 2001, "John Cowan" wrote to "Jonathan Lennox, tz@elsie.nci.nih.gov" saying:
* Do people think that my proposal is a sensible API?
My main problem with it is that I see no point in the struct tz objects. Programs don't typically do zillions of time conversions. I would rather pass the time zone name to each externally visible *_z function and let them look up the name in the currently maintained cache, if any. We certainly want to discourage callers from trying to save the tz structure in a persistent database, since its content may change over time.
The problem with having a cache is that it exposes the tzcode to all the inherent difficulties of thread-consistency and synchronization. And I think everyone would agree that *not* caching loaded tz information in some way (either in the library or the client code) would be a bad idea. It wouldn't be possible to save the tz structure in a *persistent* database, if by that you mean non-volatile, because it contains private information. Keeping it over the course of the lifetime of a process is perhaps more troubling? Currently, tzcode doesn't notice (I believe) when /etc/localtime or the zoneinfo files are modified out from underneath a running process. You could argue that this is less of a problem for the single-timezone conceptual model, though.
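
A minimal sketch of the name-keyed cache being discussed, assuming POSIX threads. Everything here is hypothetical (struct zone_entry, zone_cache_lookup, zone_cache_loads); a real entry would hold parsed zone data rather than just a name. It shows the synchronization the library has to take on once lookups by name are shared between threads.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cache entry; real code would store transition tables here. */
struct zone_entry {
    char *name;
    struct zone_entry *next;
};

static struct zone_entry *cache_head;
static int load_count;  /* number of zones "loaded" so far, for illustration */
static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;

/* Look up a zone by name, loading it on first use. Entries are never evicted,
 * so a returned pointer stays valid for the life of the process. */
const struct zone_entry *zone_cache_lookup(const char *name)
{
    assert(name != NULL);
    pthread_mutex_lock(&cache_lock);

    struct zone_entry *e;
    for (e = cache_head; e != NULL; e = e->next)
        if (strcmp(e->name, name) == 0)
            break;

    if (e == NULL) {
        e = malloc(sizeof *e);
        if (e != NULL) {
            size_t len = strlen(name) + 1;
            e->name = malloc(len);
            if (e->name == NULL) {
                free(e);
                e = NULL;
            } else {
                memcpy(e->name, name, len);
                e->next = cache_head;
                cache_head = e;
                ++load_count;
            }
        }
    }

    pthread_mutex_unlock(&cache_lock);
    return e;
}

/* Read the load counter under the same lock that protects it. */
int zone_cache_loads(void)
{
    pthread_mutex_lock(&cache_lock);
    int n = load_count;
    pthread_mutex_unlock(&cache_lock);
    return n;
}
```

Never evicting entries sidesteps the harder question raised above, namely what to do when the zoneinfo files change underneath a running process; a cache that refreshes entries would also need a rule for pointers already handed out.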
I also question the utility of a per-thread current time zone being maintained by the library, which then has to know what kind of thread library you have so it can discover the current thread ID. Making something thread-safe at the cost of adding one argument to each tzlib call is not so bad.
The motivation for this section is to allow existing code which uses the time functions to work with thread-safe timezones. This model allows you, e.g., to set a thread timezone and then call a function in an external library to which you don't have source. -- Jonathan Lennox lennox@cs.columbia.edu
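
The per-thread model can be sketched with C11 thread-local storage. The names are hypothetical (struct zone_sketch, tzuse_sketch, current_zone), and the zone object is reduced to a name; the point is only the selection rule from the tzuse() description: use the thread's zone if one is set, otherwise fall back to the global one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a time zone object. */
struct zone_sketch {
    const char *name;
};

/* Stand-in for the process-wide zone that tzset() would establish. */
static const struct zone_sketch global_zone = { "global" };

/* Per-thread override; NULL means "use the global zone". */
static _Thread_local const struct zone_sketch *thread_zone;

/* Counterpart of the proposed tzuse(): affects the calling thread only. */
void tzuse_sketch(const struct zone_sketch *z)
{
    assert(z == NULL || z->name != NULL);
    thread_zone = z;
}

/* What localtime() and friends would consult internally in this model. */
const struct zone_sketch *current_zone(void)
{
    return thread_zone != NULL ? thread_zone : &global_zone;
}
```

Code in an external library that calls the ordinary time functions would pick up the thread's zone through the equivalent of current_zone() without being recompiled, which is the use case described above.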
<<On Thu, 7 Jun 2001 12:09:33 -0400 (EDT), Jonathan Lennox <lennox@cs.columbia.edu> said:
Three functions manipulate a struct tz:
newtz -- create a time zone object
Poor choice of name. I believe that POSIX now requires any new interfaces to use namespace prefixes to avoid taking away more application namespace.
#include <time.h>
...probably a different header file should be used as well.
tzname is a pointer to a string representing the name of the allocated time zone.
Should allow a null pointer to represent whatever the default (as used in tzset()) would be.
It has one externally-visible element: tz_name,
A structure of this sort should be opaque. Define accessor functions, not the members of the structure. Unfortunately, POSIX is rather too fond of defining foo_t typedefs for things (like pointers to opaque structures) which should not be typedef'ed, so `struct tz' would probably get transmogrified by standards committees into `timezone_t' or some similar nonsense.
If tzname is NULL, the returned struct tz describes the local wall-clock time, as best as it is known by the local system.
Oops... Should reorder this description.
If tzname is the empty string "", the returned struct tz describes Coordinated Universal Time (UTC). (The POSIX rules also allow UTC to be represented with the string "GMT0".)
As I think about it, I think a better alternative would be parallel to how setlocale() works: If tzname is a null pointer, the return value shall represent Coordinated Universal Time (UTC). If tzname is a string of length zero, the return value shall represent the same timezone as is chosen by tzset().
This interface is designed so that newtz(getenv("TZ")) will return an object describing the default time zone object that non-thread-aware versions of the time functions will use by default, provided TZ (if set) is set to a valid time zone name.
I would suggest that requiring applications to check the environment -- or even assuming that there is a meaningful environment, which there may not be in some profiles -- is probably a bad idea and potentially prone to error. (Consider: getenv("ZT") would be a common typo which would not be recognized by a compiler.)
#include <time.h> void freetz(struct tz* tzobj);
The same namespace comments apply here as well.
If the system defines the tm_zone field of struct tm, this function invalidates the strings pointed to by the tm_zone field of all struct tm values created by localtime_z called with this tzobj.
The result of accessing any freed memory is undefined, so such language is not necessary and would probably reduce standardizability.
#include <time.h> struct tz* duptz(const struct tz* tzobj);
I'm not sure how really useful this interface is. In many other places in C and POSIX we define opaque structures without any sort of ``duplicate'' mechanism, and leave the application to do reference counting if it so wishes. (Viz., the `FILE *' interfaces.)
struct tm * localtime_z(const time_t *clock, struct tm *result, const struct tz *tz);
Equivalent to localtime_r() or gmtime_r(), in the time zone represented by tz, except that tz->tz_name is not modified.
There doesn't seem to me to be any benefit in this restriction, and a program which is adopting this interface may well need to interact or be linked with libraries developed to the old interface. If you eliminate this restriction, then you no longer need to duplicate the strftime() interface, since the returned `struct tm' contains all the necessary information.
char * ctime_z(const time_t *clock, char *buf, const struct tz *tz);
char * asctime_z(const struct tm *tm, char *buf, const struct tz *tz);
As Joseph Myers pointed out, these interfaces are redundant with strftime() and an appropriate format specifier. Don't forget that for C99 you'll want to add an appropriate `restrict' qualifier or two.
Name tzuse -- use time zone object in current thread.
Synopsis #include <time.h> void tzuse(const struct tz *tz);
Rather than having a specific function, one might instead define a specific pre-instantiated key such that one can call `pthread_setspecific(PTHREAD_DEFAULT_TIMEZONE, tz)' with the consequences you describe. This makes it possible for the application to find out what the current timezone is, simply by calling `pthread_getspecific(PTHREAD_DEFAULT_TIMEZONE)'. You would need to specify what the thread-termination consequences are. -GAWollman
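
The thread-specific-data approach can be sketched with the real POSIX calls pthread_key_create(), pthread_setspecific() and pthread_getspecific(). PTHREAD_DEFAULT_TIMEZONE above is a suggested name, not an existing key, so this sketch creates its own key; struct zone_sketch and the two wrapper functions are likewise hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a time zone object. */
struct zone_sketch {
    const char *name;
};

static pthread_key_t default_zone_key;
static pthread_once_t default_zone_once = PTHREAD_ONCE_INIT;

static void make_key(void)
{
    /* No destructor: in this sketch the thread does not own the zone object,
     * so nothing is released when the thread terminates. */
    int rc = pthread_key_create(&default_zone_key, NULL);
    assert(rc == 0);
    (void)rc;
}

/* Select a zone for the calling thread; NULL clears the selection.
 * Returns 0 on success, like pthread_setspecific(). */
int set_thread_zone(const struct zone_sketch *z)
{
    pthread_once(&default_zone_once, make_key);
    return pthread_setspecific(default_zone_key, z);
}

/* Query the calling thread's zone; NULL means none has been selected. */
const struct zone_sketch *get_thread_zone(void)
{
    pthread_once(&default_zone_once, make_key);
    return pthread_getspecific(default_zone_key);
}
```

The destructor argument of pthread_key_create() is where the thread-termination consequences would be decided: passing a function there would make each thread responsible for releasing its zone object on exit.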
On Thursday, June 7 2001, "Garrett Wollman" wrote to "Jonathan Lennox, tz@elsie.nci.nih.gov" saying:
<<On Thu, 7 Jun 2001 12:09:33 -0400 (EDT), Jonathan Lennox <lennox@cs.columbia.edu> said:
Three functions manipulate a struct tz:
newtz -- create a time zone object
Poor choice of name. I believe that POSIX now requires any new interfaces to use namespace prefixes to avoid taking away more application namespace.
Okay. Markus Kuhn's proposal used tz_*, so I'd be willing to go with that.
#include <time.h>
...probably a different header file should be used as well.
It would be, in practice, at first, clearly. But presumably if this actually got accepted by any standards body they'd want it to go into <time.h>.
It has one externally-visible element: tz_name,
A structure of this sort should be opaque. Define accessor functions, not the members of the structure.
I think, on reflection, that the publicly visible tz_name field was unnecessary, so I'll drop it. It was intended to reflect 'extern char *tzname[2]', but that's really a backward-compatibility hack anyway. The tm_zone and tm_gmtoff fields of struct tm are much more generally applicable.
Unfortunately, POSIX is rather too fond of defining foo_t typedefs for things (like pointers to opaque structures) which should not be typedef'ed, so `struct tz' would probably get transmogrified by standards committees into `timezone_t' or some similar nonsense.
timezone_t is fine.
As I think about it, I think a better alternative would be parallel to how setlocale() works:
If tzname is a null pointer, the return value shall represent Coordinated Universal Time (UTC). If tzname is a string of length zero, the return value shall represent the same timezone as is chosen by tzset().
We want there to be a way of getting the system time -- the equivalent of 'unsetenv("TZ"); tzset();'. That said, I suppose defining the interface such that tzset() doesn't depend on "TZ" is a good idea. (I had thought that tzset() and "TZ" were both POSIX, but is one of them ANSI?)
If the system defines the tm_zone field of struct tm, this function invalidates the strings pointed to by the tm_zone field of all struct tm values created by localtime_z called with this tzobj.
The result of accessing any freed memory is undefined, so such language is not necessary and would probably reduce standardizability.
I think you missed my point here. My point is that mktime_z() fills in the tm.tm_zone field of struct tm, which is a char *. Since there's no function tm_free(), you can't dynamically allocate the tm.tm_zone. Thus, either we need to have tm_zone be a static buffer, or we need to have it point to some memory defined elsewhere.

The natural thing to do -- reflecting most closely, I think, what the current tzcode does -- is to have a private (e.g.) tz._tz_zonenames[] field of struct tz, and then have tm.tm_zone point to the appropriate tz._tz_zonenames[] element. However, you'd want _tz_zonenames to be freed when you call tz_free(), which would invalidate the pointer inside the tm.

The other possibility would be to deprecate tm.tm_zone and leave it always NULL with the new functions, since it's badly designed from the point of view of localization anyway. We'd say that if you want the time zone name, you should call strftime with an appropriate format string.
#include <time.h> struct tz* duptz(const struct tz* tzobj);
I'm not sure how really useful this interface is. In many other places in C and POSIX we define opaque structures without any sort of ``duplicate'' mechanism, and leave the application to do reference counting if it so wishes. (Viz., the `FILE *' interfaces.)
The idea was to make it easy to write a copy constructor for a C++ wrapper to the object, but yes, a reference-counting implementation would be easy enough.
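
A reference-counting alternative to duptz() might look like the sketch below. The type and function names are invented for illustration, and the counter is not protected against concurrent use; a thread-safe version would need a lock or atomic operations.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical handle; the private zone data would follow the counter. */
struct zone_handle {
    unsigned refs;
};

/* Create a handle owned by the caller (reference count 1). */
struct zone_handle *zone_new(void)
{
    struct zone_handle *h = malloc(sizeof *h);
    if (h != NULL)
        h->refs = 1;
    return h;
}

/* Take an additional reference instead of copying the object. */
struct zone_handle *zone_ref(struct zone_handle *h)
{
    assert(h != NULL && h->refs > 0);
    ++h->refs;
    return h;
}

/* Drop one reference; returns 1 if this call released the object. */
int zone_unref(struct zone_handle *h)
{
    assert(h != NULL && h->refs > 0);
    if (--h->refs == 0) {
        free(h);
        return 1;
    }
    return 0;
}
```

A C++ wrapper's copy constructor would call zone_ref() and its destructor zone_unref(), so no separate duplicate operation is needed.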
struct tm * localtime_z(const time_t *clock, struct tm *result, const struct tz *tz);
Equivalent to localtime_r() or gmtime_r(), in the time zone represented by tz, except that tz->tz_name is not modified.
There doesn't seem to me to be any benefit in this restriction, and a program which is adopting this interface may well need to interact or be linked with libraries developed to the old interface. If you eliminate this restriction, then you no longer need to duplicate the strftime() interface, since the returned `struct tm' contains all the necessary information.
I'm not quite sure I follow what your point was, here. The point I was making is that localtime() and localtime_r are defined to modify the appropriate (is_dst'th) field of 'extern char *tzname[2]', but I was saying that localtime_z doesn't do this, or the equivalent of it.
char * ctime_z(const time_t *clock, char *buf, const struct tz *tz);
char * asctime_z(const struct tm *tm, char *buf, const struct tz *tz);
As Joseph Myers pointed out, these interfaces are redundant with strftime() and an appropriate format specifier.
Yes, okay. (Would it be worthwhile having a #define for the asctime/ctime format?)
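For illustration, such a #define might look like the sketch below. The macro name TIME_ASCTIME_FORMAT and the wrapper function are hypothetical. The format string relies on the C99 "%e" conversion, and it reproduces asctime() output only in the "C" locale, since "%a" and "%b" are locale-dependent.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical macro name; the thread only asks whether such a #define
 * would be worthwhile. In the "C" locale this matches asctime() output. */
#define TIME_ASCTIME_FORMAT "%a %b %e %H:%M:%S %Y\n"

/* Formats *tm the way asctime() would, into a caller-supplied buffer.
 * Returns strftime()'s result: bytes written, or 0 if it did not fit. */
size_t format_like_asctime(char *buf, size_t bufsize, const struct tm *tm)
{
    assert(buf != NULL && tm != NULL);
    return strftime(buf, bufsize, TIME_ASCTIME_FORMAT, tm);
}
```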
Don't forget that for C99 you'll want to add an appropriate `restrict' qualifier or two.
Right.
Name tzuse -- use time zone object in current thread.
Synopsis #include <time.h> void tzuse(const struct tz *tz);
Rather than having a specific function, one might instead define a specific pre-instantiated key such that one can call `pthread_setspecific(PTHREAD_DEFAULT_TIMEZONE, tz)' with the consequences you describe. This makes it possible for the application to find out what the current timezone is, simply by calling `pthread_getspecific(PTHREAD_DEFAULT_TIMEZONE)'. You would need to specify what the thread-termination consequences are.
Hm, interesting. I suppose that would work, given that that's the natural way to implement it. -- Jonathan Lennox lennox@cs.columbia.edu
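The get/set pattern discussed above can be sketched as follows. This is only an illustration: it uses C11 _Thread_local as a stand-in for the suggested pthread key, the type and function names are hypothetical, and unlike a pthread key it offers no destructor hook, so the question of thread-termination consequences is left open.

```c
#include <assert.h>
#include <stddef.h>

struct timezone_obj;   /* hypothetical opaque zone type; never completed here */

/* One slot per thread. C11 _Thread_local is a swapped-in stand-in for the
 * pthread key suggested above; each thread starts with a null default. */
static _Thread_local const struct timezone_obj *default_zone;

/* Rough equivalent of tzuse(): install tz as this thread's default zone. */
void tz_set_default(const struct timezone_obj *tz)
{
    default_zone = tz;
}

/* What pthread_getspecific() on the key would give: the current default. */
const struct timezone_obj *tz_get_default(void)
{
    return default_zone;
}
```

The slot stores a borrowed pointer, so the application must keep the zone object alive for as long as any thread has it installed.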
On Thursday, June 7 2001, "Garrett Wollman" wrote to "Jonathan Lennox, tz@elsie.nci.nih.gov" saying:
<<On Thu, 7 Jun 2001 15:04:44 -0400 (EDT), Jonathan Lennox <lennox@cs.columbia.edu> said:
timezone_t is fine.
It may be fine for you, but it's a real PITA for anyone trying to write reasonable header files for an unrelated library.
I'm not sure I follow you...you'd prefer the 'struct timezone' (or whatever) name, but you think it wouldn't get past a standards committee? -- Jonathan Lennox lennox@cs.columbia.edu
<<On Thu, 7 Jun 2001 15:28:52 -0400 (EDT), Jonathan Lennox <lennox@cs.columbia.edu> said:
I'm not sure I follow you...you'd prefer the 'struct timezone' (or whatever) name, but you think it wouldn't get past a standards committee?
Precisely. The reason is that the C language has opaque (incomplete) structures, but for technical reasons it cannot have opaque typedefs, and multiple typedefs for the same name in the same scope are not permitted even if they resolve to the same type. (Unless C99 fixed that bug...?) (Thus, if I am designing a library which exports a function that references a timezone, I can't use the `timezone_t' type without pulling in large amounts of namespace pollution, but I can easily do so with either a plain string, or a `struct timezone *'. Using a `void *' would also work, at the cost of type safety.) -GAWollman
Garrett Wollman said:
Precisely. The reason is that the C language has opaque (incomplete) structures, but for technical reasons it cannot have opaque typedefs,
Huh ?
struct timezone; // Opaque structure
typedef struct timezone timezone_t; // Opaque typedef
and multiple typedefs for the same name in the same scope are not permitted even if they resolve to the same type. (Unless C99 fixed that bug...?)
It's not a bug, it's a feature. Just the same as you can't declare the same variable twice in a block.
(Thus, if I am designing a library which exports a function that references a timezone, I can't use the `timezone_t' type without pulling in large amounts of namespace pollution, but I can easily do so with either a plain string, or a `struct timezone *'.
Um, you mean that you want to be able to refer to the type without including the header ? Then just write both the above lines. But people shouldn't be using your function without using your (public) header anyway. You *are* using public and private headers, aren't you ? That is, users of your library include <timezone.h>, which contains the above lines and any function declarations you like. Your implementation includes that and also includes a <timezone_p.h> or suchlike that redeclares struct timezone with its contents. -- Clive D.W. Feather | Work: <clive@demon.net> | Tel: +44 20 8371 1138 Internet Expert | Home: <clive@davros.org> | Fax: +44 20 8371 1037 Demon Internet | WWW: http://www.davros.org | DFax: +44 20 8371 4037 Thus plc | | Mobile: +44 7973 377646
<<On Fri, 8 Jun 2001 11:02:09 +0100, "Clive D.W. Feather" <clive@demon.net> said:
struct timezone; // Opaque structure
typedef struct timezone timezone_t; // Opaque typedef
Not at all. That typedef is not opaque, it is entirely transparent, and simply resolves to an incomplete structure. Consider the following two-line compilation unit:
struct timezone *foo; // Compiles fine...
timezone_t foo; // Syntax error!
Um, you mean that you want to be able to refer to the type without including the header ? Then just write both the above lines.
Then it is no longer opaque, and will result in errors when a user program which actually uses the `timezone_t' interface includes the header that declares it.
You *are* using public and private headers, aren't you ? That is, users of your library include <timezone.h>
No, users of my library include <foo.h>. It defines some interfaces which take a timezone parameter. It also defines some interfaces which don't. The header for my library should stand alone; only clients which need the timezone interfaces should include <timezone.h> (or any other header). Clients of my library which do not need the timezone interfaces should not suffer the namespace pollution of the timezone header. This is the general case of the `FILE' botch. -GAWollman
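The header arrangement described above can be sketched in a single file. Everything here is hypothetical (the type is called struct timezone_obj to avoid clashing with any system type, and foo_zone_label() is an invented interface). The part a <foo.h> would carry needs only a forward declaration of the structure tag, which may be repeated freely. For what it is worth, C11 later permitted repeating a typedef that names the same type, but C99 and earlier compilers reject that.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* ---- what a hypothetical <foo.h> would contain ---- */
struct timezone_obj;   /* forward declaration only: no other header needed */
size_t foo_zone_label(const struct timezone_obj *tz, char *buf, size_t bufsize);

/* ---- what only the zone implementation would see ---- */
struct timezone_obj {
    const char *label;
};

/* Copies the zone's label into buf if it fits, and returns the length the
 * label needs (excluding the NUL) either way. */
size_t foo_zone_label(const struct timezone_obj *tz, char *buf, size_t bufsize)
{
    size_t len;

    assert(tz != NULL && buf != NULL);
    len = strlen(tz->label);
    if (len < bufsize)
        memcpy(buf, tz->label, len + 1);   /* copy including the NUL */
    return len;
}
```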
Garrett Wollman said:
struct timezone; // Opaque structure
typedef struct timezone timezone_t; // Opaque typedef
Not at all. That typedef is not opaque, it is entirely transparent, and simply resolves to an incomplete structure. Consider the following two-line compilation unit:
struct timezone *foo; // Compiles fine...
timezone_t foo; // Syntax error!
Not a syntax error, a declaration error. Try:

clive@finch-staff-1> cat xx.c
struct timezone;
typedef struct timezone timezone_t;

struct timezone s;
struct timezone *sp;
extern struct timezone es;
extern struct timezone *esp;
timezone_t t;
timezone_t *tp;
extern timezone_t et;
extern timezone_t *etp;
clive@finch-staff-1> cc xx.c
xx.c:4: storage size of `s' isn't known
xx.c:8: storage size of `t' isn't known
clive@finch-staff-1>

Nothing to do with typedef or not, just to do with the rules for declarations.
You *are* using public and private headers, aren't you ? That is, users of your library include <timezone.h>
No, users of my library include <foo.h>. It defines some interfaces which take a timezone parameter.
So in foo.h write the one line: typedef struct timezone timezone_t; End of story. -- Clive D.W. Feather | Work: <clive@demon.net> | Tel: +44 20 8371 1138 Internet Expert | Home: <clive@davros.org> | Fax: +44 20 8371 1037 Demon Internet | WWW: http://www.davros.org | DFax: +44 20 8371 4037 Thus plc | | Mobile: +44 7973 377646
<<On Fri, 8 Jun 2001 16:10:59 +0100, "Clive D.W. Feather" <clive@demon.net> said:
So in foo.h write the one line:
typedef struct timezone timezone_t;
So it's perfectly OK for me to also write: typedef struct __sFILE FILE; ...in my headers as well? You know very well that this is not portable. The implementation (which controls <timezone.h> in the scenario we're discussing) might choose to define timezone_t some other way -- perhaps they make it a `union __timezone *' instead. Even if it were standardized, this would still break when an application actually needed <timezone.h> and my declaration in <foo.h> became redundant. In any case, only a faulty standard would declare that `timezone_t' is always a `struct timezone'. A good standard would not have a `timezone_t' to begin with. (In the FreeBSD Project, and probably many other places, our coding standard is very explicit: don't use typedefs for structures. This deficiency in C is precisely the reason for making that requirement.) -GAWollman
From: Jonathan Lennox <lennox@cs.columbia.edu> Date: Thu, 7 Jun 2001 15:04:44 -0400 (EDT)
As I think about it, I think a better alternative would be parallel to how setlocale() works:
If tzname is a null pointer, the return value shall represent Coordinated Universal Time (UTC). If tzname is a string of length zero, the return value shall represent the same timezone as is chosen by tzset().
We want there to be a way of getting the system time -- the equivalent of 'unsetenv("TZ"); tzset();'.
The null pointer sounds like a good way to do that, as it "feels like" having an unset TZ. We shouldn't use the empty string for that, as TZ='' has a special meaning to the tz code (namely, assume GMT without leap seconds, regardless of whether the underlying system uses leap seconds), and newtz("") should use this interpretation rather than supply its own.
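The convention argued for above (a null pointer for the system's own zone, the empty string for UTC, anything else parsed like TZ) amounts to a three-way dispatch at the top of the constructor. A sketch, with hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of the constructor's string argument. */
enum tz_request {
    TZ_REQ_SYSTEM,   /* NULL: the system's own wall-clock zone, as with TZ unset */
    TZ_REQ_UTC,      /* "":   UTC/GMT, matching what TZ='' means to the tz code */
    TZ_REQ_NAMED     /* anything else: parse as a TZ-style string */
};

enum tz_request tz_classify(const char *tzstring)
{
    if (tzstring == NULL)
        return TZ_REQ_SYSTEM;
    if (tzstring[0] == '\0')
        return TZ_REQ_UTC;
    return TZ_REQ_NAMED;
}
```

With this mapping, passing getenv("TZ") straight through behaves sensibly in all three cases, because getenv() returns a null pointer when TZ is unset.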
That said, I suppose defining the interface such that tzset() doesn't depend on "TZ" is a good idea. (I had thought that tzset() and "TZ" were both POSIX, but is one of them ANSI?)
Neither is in the C standard. I don't know what you mean by "tzset() doesn't depend on 'TZ'" but I don't think we should modify tzset's specification. tzset is an anachronism. If you can assume the current POSIX interface, you don't need tzset at all. The only reason we should mention tzset at all is to specify how it interacts (if at all) with the new features. (Perhaps this is what you were saying....)
The motivation for this section is to allow existing code which uses the time functions to work with thread-safe timezones. This model allows you, e.g., to set a thread timezone and then call a function in an external library to which you don't have source.
Suggestion: create a "rationale" section and put the above comment into that section. (Also, fold all our other comments into the rationale while you're at it. :-)
From: Jonathan Lennox <lennox@cs.columbia.edu> Date: Thu, 7 Jun 2001 12:09:33 -0400 (EDT)
* Are people interested in this?
* Would people be interested in seeing the resulting code?
* Would people be interested in helping write the code?
* Would the resulting code (assuming it's written sensibly) be acceptable/appropriate for incorporation into tzcode?
Yes, several people have proposed something along those lines. (I'm one of them -- see <http://www.twinsun.com/tz/timeapi.html>.) However, nobody has gotten anything running yet. Part of the problem with many of the proposals (mine included) is that they're perhaps too ambitious. There are a lot of problems with the existing interfaces, and it's a pain to solve them all at once.
* Do people think that my proposal is a sensible API?
Here are some thoughts:

* Is this spec compatible with Drepper's proposal? This is a relevant issue, since time zone names ought to be part of the locale. For example, where you say "EDT", a French Canadian would say "HAE". Thus, for example, for full generality it seems to me that localtime_z would need to have a locale parameter or have access to the locale somehow. Admittedly the current time zone database does not support any locale other than English, but we've seen proposals for fixing this and it will probably happen some day.

* Conversely, if you merge Drepper's work with your proposal, then 'struct tz' should be part of his locale object, so that there is no need for a separate 'struct tz', or a separate 'localtime_z', etc. Perhaps this is too drastic (as you may not want to assume Drepper's approach), but treating TZ like we treat LC_CTYPE etc. would have real advantages if we were starting from green fields.

* I see no reason for struct tz to make tz_name visible. If a user wants the time zone name, he can use strftime_z with the "%Z" format. Many time zones have more than two names, so it's incorrect for tz_name to have just two elements anyway.

* I would drop 'tzuse', or at least make it optional and dependent on support for POSIX threads. If you drop it, you don't have to worry about threads at all; you'll simply have thread-safe functions. It's not worth the hassle of interacting with all the different thread implementations out there.

* Frankly, I don't know how POSIX functions like localtime_r are supposed to interact with setting the TZ variable. It seems to me that the POSIX spec is unclear here. This murk is not your fault, but it seems to me that if you want to address the thread problem that this issue should be made crystal clear in any thread-safe spec.

* 'strptime_z' should be marked as being POSIX-only; it needn't be available on ISO-C-only systems.

* What is the use of 'duptz'? Can you give an example that cannot easily be expressed without 'duptz'?

* A nit: you mention "GMT0", but the latest POSIX draft says that "date -u" acts as if TZ is either "UTC0" or "GMT0", with "UTC0" preferred.
Date: Thu, 7 Jun 2001 17:30:58 +0100 (BST) From: "Joseph S. Myers" <jsm28@cam.ac.uk>
* Don't touch how timestamps are represented (any interface can be adapted to use any time_t replacement that gets agreed).
Presumably if we have an xtime_t, then the xtime_t functions will address all the issues Lennox raises, because they'll have merged or subsumed his ideas.
* Provide a struct tm replacement with (a) subsecond resolution and (b) a proper field indicating which repetition of a repeated timestamp is referred to (A/B in German time notation), rather than the inadequate indication of whether the time is in daylight savings.
* Provide four conversion functions: between time_t and broken down times, in either direction, and equivalents of strftime and wcsftime. Don't duplicate other functions such as asctime that can easily be replicated.
* Use a C99 snprintf-style return value (return the length of buffer required if the buffer isn't long enough) rather than what strftime currently does (return 0 if the buffer isn't long enough).
These changes all sound reasonable to me -- though they make the proposal more ambitious. If we go down this route, though, we should definitely look at Kuhn's and my proposals (both of which are too ambitious, in my opinion). I could try to merge the three, or perhaps you'd like to take a crack at it.
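To make the snprintf-style return value concrete, here is a rough sketch layered on ordinary strftime(). The name strftime_len() is hypothetical. Because strftime() cannot report how much space it wanted, the sketch formats into a growing scratch buffer; a real implementation inside the library would not need that detour.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical wrapper: like strftime(), but returns the length the full
 * result needs (excluding the NUL) whether or not it fit, as snprintf() does.
 * Returns (size_t)-1 if memory runs out or the result is unreasonably long. */
size_t strftime_len(char *buf, size_t bufsize, const char *format,
                    const struct tm *tm)
{
    size_t fmtlen, size, n = 0;
    char *fmt, *tmp = NULL;

    assert(format != NULL && tm != NULL);
    fmtlen = strlen(format);
    /* Prefix one literal byte so a 0 return always means "too small",
     * never "legitimately empty output". */
    fmt = malloc(fmtlen + 2);
    if (fmt == NULL)
        return (size_t)-1;
    fmt[0] = 'x';
    memcpy(fmt + 1, format, fmtlen + 1);

    for (size = 64; size <= 65536; size *= 2) {
        char *grown = realloc(tmp, size);
        if (grown == NULL)
            break;
        tmp = grown;
        n = strftime(tmp, size, fmt, tm);
        if (n > 0)
            break;
    }
    free(fmt);
    if (n == 0) {
        free(tmp);
        return (size_t)-1;
    }
    n -= 1;                               /* drop the prefix byte */
    if (buf != NULL && n < bufsize)
        memcpy(buf, tmp + 1, n + 1);      /* result plus its NUL */
    free(tmp);
    return n;
}
```

A caller can pass a small buffer first and, if the return value is not less than the buffer size, allocate that many bytes plus one and call again.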
* Provide specified timezone names for both the user's local timezone and the system's local timezone.
Doesn't his proposed API already do this, with newtz(getenv("TZ")) and newtz(NULL)?
On Thursday, June 7 2001, "Paul Eggert" wrote to "lennox@cs.columbia.edu, tz@elsie.nci.nih.gov" saying:
* Is this spec compatible with Drepper's proposal? This is a relevant issue, since time zone names ought to be part of the locale. For example, where you say "EDT", a French Canadian would say "HAE". Thus, for example, for full generality it seems to me that localtime_z would need to have a locale parameter or have access to the locale somehow. Admittedly the current time zone database does not support any locale other than English, but we've seen proposals for fixing this and it will probably happen some day.
I'm pretty sure that only strftime/wcsftime need to have locale-specific information. (Especially if we drop time zone names from anything but strftime().) Actually, I suppose strptime/wcsptime might need it too, but that's a bit more confusing. I was thinking of something along the lines of strftime_zl(char *buf, size_t maxsize, const char *format, const struct tm* timeptr, const timezone_t tz, const locale_t locale); (merging strftime_z and strftime_l), but maybe that's awful...
* Conversely, if you merge Drepper's work with your proposal, then 'struct tz' should be part of his locale object, so that there is no need for a separate 'struct tz', or a separate 'localtime_z', etc. Perhaps this is too drastic (as you may not want to assume Drepper's approach), but treating TZ like we treat LC_CTYPE etc. would have real advantages if we were starting from green fields.
It seems a bit drastic for my purposes. Also, timezones feel rather different from locales to me.
* I see no reason for struct tz to make tz_name visible. If a user wants the time zone name, he can use strftime_z with the "%Z" format. Many time zones have more than two names, so it's incorrect for tz_name to have just two elements anyway.
Fair enough.
* I would drop 'tzuse', or at least make it optional and dependent on support for POSIX threads. If you drop it, you don't have to worry about threads at all; you'll simply have thread-safe functions. It's not worth the hassle of interacting with all the different thread implementations out there.
Either dependent on POSIX threads, or else a simple (rough) equivalent of tzset() without them.
* Frankly, I don't know how POSIX functions like localtime_r are supposed to interact with setting the TZ variable. It seems to me that the POSIX spec is unclear here. This murk is not your fault, but it seems to me that if you want to address the thread problem that this issue should be made crystal clear in any thread-safe spec.
Hm. What does tzcode do?
* 'strptime_z' should be marked as being POSIX-only; it needn't be available on ISO-C-only systems.
Is strptime actually even POSIX? I thought it was an extension. Or has the Austin group added it?
* What is the use of 'duptz'? Can you give an example that cannot easily be expressed without 'duptz'?
It was following Drepper's duplocale(), which was intended to make it easy to implement copy constructors for C++ locale objects. It can be dropped easily enough.
* A nit: you mention "GMT0", but the latest POSIX draft says that "date -u" acts as if TZ is either "UTC0" or "GMT0", with "UTC0" preferred.
Fair enough. -- Jonathan Lennox lennox@cs.columbia.edu
<<On Thu, 7 Jun 2001 15:15:59 -0400 (EDT), Jonathan Lennox <lennox@cs.columbia.edu> said:
It seems a bit drastic for my purposes. Also, timezones feel rather different from locales to me.
I would agree with this statement. For example, the timezone America/Montreal is appropriate for both en_CA and fr_CA locales, and conversely there are a dozen timezones which are all used concurrently with the en_CA locale. Moreover, if TZ is set to "<FOO>+4<BAR>" then no localization is possible. -GAWollman
<<On Thu, 7 Jun 2001 15:28:20 -0400 (EDT), Garrett Wollman <wollman@khavrinen.lcs.mit.edu> said:
For example, the timezone America/Montreal is appropriate for both en_CA and fr_CA locales
One more thing I forgot: even if my locale is en_US, I might still wish to find out the time in Europe/Paris or Asia/Tokyo and have it displayed in a manner which is appropriate to *my* locale, not the prevailing locale in those places. -GAWollman
Date: Thu, 7 Jun 2001 15:28:20 -0400 (EDT) From: Garrett Wollman <wollman@khavrinen.lcs.mit.edu>
It seems a bit drastic for my purposes. Also, timezones feel rather different from locales to me.
I would agree with this statement. For example, the timezone America/Montreal is appropriate for both en_CA and fr_CA locales,
No, because with TZ='America/Montreal' the 'date' command and the strftime "%Z" format both output EST/EDT, whereas a French Canadian would prefer HNE/HAE. See the Canadian standard CAN/CSA-Z234.4-89.
and conversely there are a dozen timezones which are all used concurrently with the en_CA locale.
Yes, under the POSIX model, the fr_CA locale does not specify the spelling of time zone abbreviations; all it does (as of POSIX 1003.1-200x draft 6) is place certain constraints on the abbreviations. However, we're not talking only about the POSIX model here: we're also talking about the Olson extension to the POSIX model, and we should design an interface that works well for it, too. One of the things about the Olson model is that it's easier to use for English-speakers: you just say TZ='America/Montreal' instead of having to say something like TZ='EST5EDT,M4.1.0,M10.5.0'. Unfortunately, though, the Olson extension currently doesn't work well for French Canadians who want HNE/HAE. They have to fall back on a POSIX setting like TZ='HNE5HAE,M4.1.0,M10.5.0'. But this mishandles timestamps before 1987, and also it's a confusing interface that is hard to get right. I think a French Canadian should get HNE/HAE by setting TZ='America/Montreal' and LC_TIME='fr_CA'. This doesn't happen now, but POSIX allows this behavior, and (from a user's point of view) it is quite desirable. This is on my list of things to add to the Olson extension. Hence there ought to be an interaction between time zones and locales of some sort, and any thread-safe strftime replacement should be able to access a thread-local time zone and locale information, e.g. by having one or two extra arguments.
<<On Thu, 7 Jun 2001 13:29:45 -0700 (PDT), Paul Eggert <eggert@twinsun.com> said:
No, because with TZ='America/Montreal' the 'date' command and the strftime "%Z" format both output EST/EDT, whereas a French Canadian would prefer HNE/HAE. See the Canadian standard CAN/CSA-Z234.4-89.
The abbreviation is localized, but the timezone itself (in the specialized sense we use here) is locale-independent.
I think a French Canadian should get HNE/HAE by setting TZ='America/Montreal' and LC_TIME='fr_CA'.
I think we are in violent agreement.
Hence there ought to be an interaction between time zones and locales of some sort, and any thread-safe strftime replacement should be able to access a thread-local time zone and locale information, e.g. by having one or two extra arguments.
From a maintenance perspective, I don't think that localized abbreviations are properly part of the timezone, but rather part of the LC_TIME locale information used by strftime(). My reason is that most of the timezone abbreviations are unlikely to be localized; each locale will have a set of timezones which are relevant to the culture of the locale, and which will be localized, but the vast majority of the timezone database would not be localized in most locales.
-GAWollman
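The division of labour described above (the zone supplies a default abbreviation, and the locale overrides it only for the few zones it cares about) can be sketched as a lookup with a fallback. The table layout and function name are hypothetical; the HNE/HAE entries are the ones mentioned in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative data only: a locale carries overrides for the few zones that
 * matter to it; everything else falls back to the zone's own abbreviation. */
struct abbrev_override {
    const char *locale;
    const char *zone;
    const char *names[2];   /* [0] standard time, [1] daylight time */
};

static const struct abbrev_override overrides[] = {
    { "fr_CA", "America/Montreal", { "HNE", "HAE" } },
};

/* Returns the locale's abbreviation for the zone if it has one, otherwise
 * the fallback supplied by the caller from the zone data. */
const char *localized_abbrev(const char *locale, const char *zone,
                             int isdst, const char *fallback)
{
    size_t i;

    assert(locale != NULL && zone != NULL);
    for (i = 0; i < sizeof overrides / sizeof overrides[0]; i++) {
        if (strcmp(overrides[i].locale, locale) == 0 &&
            strcmp(overrides[i].zone, zone) == 0)
            return overrides[i].names[isdst > 0];
    }
    return fallback;   /* e.g. "EST"/"EDT" from the zone data itself */
}
```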
Garrett Wollman scripsit:
[E]ach locale will have a set of timezones which are relevant to the culture of the locale, and which will be localized, but the vast majority of the timezone database would not be localized in most locales.
Actually, I suspect that many timezones have no local abbreviations whatever, because there is only one timezone in force in the locale, and it is simply "the time". -- John Cowan cowan@ccil.org One art/there is/no less/no more/All things/to do/with sparks/galore --Douglas Hofstadter
From: Jonathan Lennox <lennox@cs.columbia.edu> Date: Thu, 7 Jun 2001 15:15:59 -0400 (EDT)
* Frankly, I don't know how POSIX functions like localtime_r are supposed to interact with setting the TZ variable. It seems to me that the POSIX spec is unclear here. This murk is not your fault, but it seems to me that if you want to address the thread problem that this issue should be made crystal clear in any thread-safe spec.
Hm. What does tzcode do?
It doesn't worry about threads, so it mishandles this case: two different threads can clobber the same internal structure. I don't offhand know what other systems do.
* 'strptime_z' should be marked as being POSIX-only; it needn't be available on ISO-C-only systems.
Is strptime actually even POSIX? I thought it was an extension. Or has the Austin group added it?
Yes, it's in the latest draft.
<<On Thu, 7 Jun 2001 13:44:48 -0700 (PDT), Paul Eggert <eggert@twinsun.com> said:
It doesn't worry about threads, so it mishandles this case: two different threads can clobber the same internal structure.
I don't offhand know what other systems do.
At least in FreeBSD, these functions are serialized for thread safety. (That's one of the reasons we haven't updated to a recent tzcode in years.) Obviously, functions like localtime() which return static buffers can never be made thread-safe. -GAWollman
On Thursday, June 7 2001, "Garrett Wollman" wrote to "Paul Eggert, tz@elsie.nci.nih.gov" saying:
<<On Thu, 7 Jun 2001 13:44:48 -0700 (PDT), Paul Eggert <eggert@twinsun.com> said:
It doesn't worry about threads, so it mishandles this case: two different threads can clobber the same internal structure.
I don't offhand know what other systems do.
At least in FreeBSD, these functions are serialized for thread safety. (That's one of the reasons we haven't updated to a recent tzcode in years.) Obviously, functions like localtime() which return static buffers can never be made thread-safe.
Actually, localtime() can be made thread-safe -- you just have to return a pthread_getspecific() buffer rather than a static buffer. And in fact, this is exactly what FreeBSD does. What I think you meant is that localtime() can't be made *reentrant*? -- Jonathan Lennox lennox@cs.columbia.edu
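The distinction can be shown with a short sketch. This is not FreeBSD's code: it uses C11 _Thread_local as a stand-in for a pthread_getspecific() buffer, and the function name is hypothetical. Each thread gets its own result buffer, so threads cannot clobber one another, but a second call in the same thread still overwrites the first result.

```c
#define _POSIX_C_SOURCE 200809L   /* for localtime_r() */
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Sketch of a localtime() that is thread-safe but not reentrant: each thread
 * gets its own result buffer. C11 _Thread_local stands in here for the
 * pthread_getspecific() buffer described above. */
struct tm *per_thread_localtime(const time_t *clock)
{
    static _Thread_local struct tm result;

    assert(clock != NULL);
    return localtime_r(clock, &result);
}
```

Code that needs two broken-down times at once therefore still has to copy the first result before making the second call.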
Jonathan Lennox said:
One of the major shortcomings of the current time zone API defined by ISO C [...] Therefore, I've written up a proposal (attached) for a thread-safe API for time zone functions.
Please don't. There are a number of known deficiencies with the C time stuff, and it really needs redesigning from scratch with a solid set of concepts behind it. For various reasons I'm supposed to be organising a group to do that, but it's way way down my list of priorities. Any new design will, of course, need to address these issues, but I don't think there's any point in doing this sort of ad hoc work to solve just one item. -- Clive D.W. Feather | Work: <clive@demon.net> | Tel: +44 20 8371 1138 Internet Expert | Home: <clive@davros.org> | Fax: +44 20 8371 1037 Demon Internet | WWW: http://www.davros.org | DFax: +44 20 8371 4037 Thus plc | | Mobile: +44 7973 377646
On Friday, June 8 2001, "Clive D.W. Feather" wrote to "Jonathan Lennox, tz@elsie.nci.nih.gov" saying:
Jonathan Lennox said:
One of the major shortcomings of the current time zone API defined by ISO C [...] Therefore, I've written up a proposal (attached) for a thread-safe API for time zone functions.
Please don't.
There are a number of known deficiencies with the C time stuff, and it really needs redesigning from scratch with a solid set of concepts behind it. For various reasons I'm supposed to be organising a group to do that, but it's way way down my list of priorities.
Any new design will, of course, need to address these issues, but I don't think there's any point in doing this sort of ad hoc work to solve just one item.
Well, this is the thing. I need C-based thread-safe time zone functions for a project I'm working on. I figured a) I'd base my code on tzcode, b) it'd be sensible to contribute my modifications back to tzcode, and c) it'd be good to have a sensible API for my code. I don't have the time or experience to do a full re-write of the C time stuff, though. I've tried to model my (revised) API (to be sent out to the list shortly) on a subset and simplification of Markus Kuhn's proposed time zone model, to keep upward extensibility to a full, sensible API. Given that I need to write this code anyway, I don't see any benefit (to me or the community) from keeping it private. -- Jonathan Lennox lennox@cs.columbia.edu
After the comments on this list (and several sent by private e-mail) I've revised yesterday's thread-safe time zone API document. I thank everyone who commented; I think this document is a lot stronger and clearer. A list of changes from yesterday's version is given at the end of the document. Once again, comments are very welcome.

A proposal for thread-safe time zone information.
A set of extensions to ISO C (99) and IEEE POSIX (200x).
by Jonathan Lennox, Columbia University.
Version 2.

Summary:
A new data type is defined, 'struct timezone', that represents the time zone information for a particular region. Versions of the ISO C time conversion functions are defined that take this data type as an argument.

Definitions:
Time zone: The set of information necessary to correctly convert bidirectionally between a time_t and a struct tm, for a particular geographic region.
Wall clock time: The system's best approximation of the local time in its physical location, independent of the physical location of any user.

Functions defined in this proposal:

Functions manipulating struct timezone values:

Name
tz_prep -- prepare a time zone object, based on a descriptive string.

Synopsis
#include <time.h>
int tz_prep(struct timezone** tz, const char *tzstring);

Description
The tz_prep() function creates a new time zone object, corresponding to the given time zone name. tzstring is a pointer to a string, or NULL. The two defined values of tzstring are NULL, representing the system's best approximation of its wall clock time, and the zero-length string "", representing the system's best approximation of Coordinated Universal Time (UTC). On successful completion, *tz will contain a pointer to a struct timezone, which contains all necessary information to represent times in the specified time zone. It has no externally-visible elements.
POSIX extension: In a POSIX-based environment, the syntax of tzstring is the same as that of the "TZ" environment variable, including the implementation-defined values beginning with ':'.

Return Value
On successful completion, tz_prep() returns a value of 0 and fills in *tz. On failure, an error number is returned and no resources are allocated.

Errors
The tz_prep() function shall fail if:
ENOMEM Not enough memory is available to create the time zone object.
ENOENT No known time zone corresponds to tzstring.

The tz_prep() function may fail if:
* There is a problem with the system's configuration (such as an on-disk time zone database) which caused retrieval of the time zone information to fail unexpectedly. In this case any appropriate error number may be returned.

Rationale and commentary
Systems which define extensions beyond POSIX (such as the Olson time zone names) for the "TZ" environment variable should support the same extensions for tzstring.

Under the POSIX rules, strings such as "UTC0" can also represent UTC, but the empty string "" is the portable representation. This representation is also correctly locale-independent -- "UTC0" in fr_FR generates times with the incorrect time zone abbreviation "UTC", whereas "" in that locale should use the correct time zone abbreviation "TUC" if the time zone code correctly supports locales.

This interface is designed so that tz_prep(getenv("TZ")) will return an object describing the default time zone object that non-thread-aware versions of the time functions will use by default, provided TZ (if set) is set to a valid time zone name.

Question for discussion: should there be an ISO C way of saying "the timezone which localtime() and mktime() would use by default"? Currently this is only possible for POSIX, using the tz_prep(getenv("TZ")) idiom mentioned above.

Name
tz_free -- Free resources allocated for a time zone object

Synopsis
#include <time.h>
void tz_free(struct timezone* tzobj);

Description
The tz_free() function frees the resources allocated for a time zone object returned by a call to tz_prep().

Return Value
None.

Errors
None.

Rationale and commentary
None needed.

Functions using struct timezone to manipulate time values:

Name
time_make - derive a time_t from a struct tm, in a specified time zone.

Synopsis
#include <time.h>
int time_make(time_t *clock, struct tm *tm, const struct timezone *tz);

Description
This function interprets the broken-down time in *tm as a local time in the timezone specified in *tz, and writes the corresponding value into *clock, using the same encoding as that of the values returned by the 'time' function. The original values of the tm_wday and tm_yday components of *tm are ignored, and the original values of the other components are not restricted to their normal ranges. (A positive or zero value for tm_isdst causes time_make() to presume initially that summer time (for example, Daylight Saving Time) is or is not in effect for the specified time, respectively. A negative value for tm_isdst causes the time_make() function to attempt to divine whether summer time is in effect for the specified time.)

On successful completion, the values of the tm_wday and tm_yday components of *tm are set appropriately, and the other components are set to represent the specified calendar time, but with their values forced to their normal ranges; the final value of tm_mday is not set until tm_mon and tm_year are determined.

Return Value
time_make returns 0 on successful completion, sets *clock, and normalizes *tm. On failure, it returns an error number and leaves *clock and *tm unmodified.

Errors
time_make() shall fail if:
ERANGE *tm does not represent a time representable by a time_t value.

time_make() may fail if:
EINVAL *tm does not represent a possible time in the timezone *tz.
(For instance, a leap-forward interval.)

Rationale and commentary

This function is the generalization of the ISO C function mktime() and the BSD/tzcode function timegm().

The name and calling conventions of this function are inspired by Markus Kuhn's proposed xtime_make(). struct xtime is a much more sophisticated and better-defined representation of time than time_t; this proposal is designed to be less ambitious while still leaving room for these future improvements.

Following Kuhn, this function corrects a flaw of mktime()/timegm(). Those functions use the value (time_t)-1 to represent an error return status. However, this value can also be a correct translation of a struct tm representing the time December 31, 1969, 23:59:59 GMT (assuming POSIX time_t's). This function, instead, uses an out-of-band method to indicate error conditions, leaving the entire time_t space free to represent valid values.

Given an impossible time value for the time zone (e.g. one that occurs during a leap-forward interval), implementations have a choice of failing and returning EINVAL, or normalizing *tm to a nearby valid time.

Name

time_breakup -- derive a struct tm from a time_t, in a specified time zone.

Synopsis

    #include <time.h>

    int time_breakup(struct tm *result, const time_t *clock, const struct timezone *tz);

Description

Fill in *result with the values corresponding to *clock in the time zone *tz.

Return Value

time_breakup() returns 0, and fills in *result, always.

Errors

None.

Rationale and commentary

This function is the generalization of the ISO C functions localtime() and gmtime(), and the POSIX functions localtime_r() and gmtime_r(). It is inspired by Markus Kuhn's xtime_breakup().

The function assumes that a 'struct timezone' will be able to do something reasonable over the whole range of a time_t. This isn't necessarily true; it might be sensible to define an ERANGE error return value?
This function omits the localtime side effect that it sets the value of the global variable char *tzname[2].

It's not clear how best to handle the BSD/tzcode tm_zone field of struct tm. This is defined as a char*, but since there is no tm_free() function, the data it points to cannot be dynamically allocated by time_breakup(). One possibility would be to have tm_zone point into data stored within 'struct timezone', but then it would be invalidated when tz_free() was called. A second possibility would be to define tm_zone as char tm_zone[TZNAME_MAX], but this would break binary compatibility. A third possibility would be to deprecate tm_zone, and have time_breakup() leave it NULL. Zone names can be obtained by calling strftime with a "%Z" format string (assuming POSIX strftime).

Name

strftime_z -- generate a text representation of a time value, in a given time zone.

Synopsis

    #include <time.h>

    size_t strftime_z(char * restrict buf, size_t maxsize, const char * restrict format, const struct tm * restrict timeptr, const struct timezone * restrict tz);

Description

strftime_z() formats the information from 'timeptr' into the buffer 'buf' according to the format specified by 'format' and the system locale. The format specified by 'format' is the same as that of strftime().

POSIX note: All POSIX extensions to strftime() apply to strftime_z() as well.

Return Value

On successful completion, strftime_z() fills in 'buf' and returns the number of bytes converted. On failure, strftime_z() returns the number of bytes of buffer that would be required to fully perform the conversion (including the terminating NUL), and leaves 'buf' in an indeterminate state.

Errors

None.

Rationale and commentary

This function is a generalization of strftime().

The return value of this function has changed from strftime(), however. strftime() returns 0 if the buffer is not large enough.
This function, instead, follows the example of ISO C 99's snprintf(), which allows an appropriately-sized buffer to be allocated in one step after a failure, rather than requiring a binary search. The error status can be easily checked by checking (ret <= maxsize).

The formatting strings of ISO C strftime() are not affected by the time zone, so this function is strictly speaking not necessary in a pure-C environment. POSIX strftime(), however, has the '%z' and '%Z' conversions, which require knowledge of the relevant time zone.

It has been argued that this function is redundant, because time_breakup() could embed time zone information into struct tm (either in extension fields or private fields), and thus the standard POSIX strftime could suffice to convert struct tm's in a thread-safe way. However, this is only true for struct tm's created from time_breakup() or the mktime() family. Creation of struct tm values "by hand" is still fairly common, and a time zone specification is needed to print these.

Name

strftime_zl -- generate a text representation of a time value, in a given time zone and locale.

Synopsis

    #include <time.h>

    size_t strftime_zl(char * restrict buf, size_t maxsize, const char * restrict format, const struct tm * restrict timeptr, const struct timezone * restrict tz, const locale_t * restrict l);

Description

(This is an extension based on Ulrich Drepper's thread-aware locale proposal; see rationale.)

strftime_zl() formats the information from 'timeptr' into the buffer 'buf' according to the format specified by 'format' and the locale specified by 'l'. Other than the source of locale information, this is in all ways identical to strftime_z().

Rationale and commentary

This function follows Ulrich Drepper's thread-safe locale proposal; it is the combination of his strftime_l and strftime_z defined above. See <http://www.cygnus.com/~drepper/tllocale.ps.bz2> for details. This function should only be implemented if the rest of Drepper's proposal is as well.
The _zl suffix is ugly. More natural would be to define a single function which takes all the necessary thread-safe data as arguments, but I don't want to force implementation of this proposal to be blocked on Drepper's.

Thread-support functions

Name

pthread_settz -- set time zone object for the current POSIX thread.

Synopsis

    #include <time.h>
    #include <pthread.h>

    int pthread_settz(const struct timezone *tz);

Description

(This function is only relevant in a POSIX system with the Threads option.)

pthread_settz() sets the time zone object to be used for time functions called from the current POSIX thread. The time zone setting for all other threads remains the same.

Once pthread_settz() has been called, all calls to the functions localtime(), localtime_r(), ctime(), ctime_r(), asctime(), asctime_r(), mktime(), strftime(), and strptime() in the thread from which it was called will use the specified time zone rather than the process-wide time zone.

If pthread_settz() is called with a NULL pointer as its argument, the current thread will again use the global time zone object.

Return Value

Upon successful completion, pthread_settz() returns a value of zero. Otherwise, an error code is returned to indicate an error.

Errors

pthread_settz() may fail if:

ENOMEM  Insufficient memory exists to store the time zone information for the thread.

EAGAIN  The system lacked the necessary resources to associate the time zone information with the thread.

Rationale and commentary

This function allows existing code which uses the ISO C and POSIX APIs to work correctly, unmodified, in a threaded environment. It would typically be implemented as a wrapper around pthread_setspecific().

It's not clear if a parallel pthread_gettz() is also needed.

General rationale and commentary

Thanks to many comments from the people of the tz@elsie.nci.nih.gov mailing list.
Much inspiration for this work was drawn from Markus Kuhn's proposed extended time APIs for ISO C 200x, and from Ulrich Drepper's proposed thread-aware locale functions.

Markus Kuhn's proposed time zone API defined an additional time zone manipulation function tz_jump(), which gets the next or previous discontinuity in a given time zone. This is a potentially useful function, but it was omitted here so as to keep the API definitions modest. (Additionally, it should probably wait on the proper definition of an extended time type, to avoid unnecessary redundancy.)

Time zone specific versions of asctime[_r] and ctime[_r] were omitted, as they can be constructed trivially out of the functions defined here.

A time zone specific version of strptime is not necessary. The only zone-aware function of strptime that I know of is the BSD extension to scan "%Z"; POSIX 200x does not define this, and it generally doesn't seem to work very well as zone abbreviations are ambiguous. (BSD strptime only recognizes the current local timezone, in standard and summer forms, and "GMT". Even this is not necessarily unambiguous: consider for example the Australian "EST" meaning both "Eastern Standard Time" and "Eastern Summer Time".)

wcsftime_z and wcsftime_zl functions are needed, analogous to strftime_z and strftime_zl for wide strings. Their definitions are obvious analogies to the existing functions.

Changes from Version 1:

The time-zone creation and destruction functions were renamed from newtz() and freetz() to tz_prep() and tz_free(). This puts them in a name space, and aligns them with Markus Kuhn's functions. (These functions are slightly different from Kuhn's, though, in that they are defined to return errno values on errors.)

duptz() was dropped as not useful.

mktime_z() and localtime_z() were renamed to time_make() and time_breakup(), by analogy with Kuhn's xtime_* functions.

Zone-specific versions of asctime(), ctime(), and strptime() were dropped as unnecessary.
The text was revised to make clear the distinctions between ISO C extensions, POSIX extensions, and rationale and commentary.

The externally-visible tz_name elements of struct tz were dropped.

The strftime_zl function was defined.

--
Jonathan Lennox
lennox@cs.columbia.edu