
Nathan Myers wrote on 1998-09-16 19:10 UTC:
While this new proposal is much, much better than tmx, it still has some problems. They look fixable to me.
First, it doesn't entirely solve the re-entrancy problem. If an error state and error message are to be carried around in the timezone_t object, then a "bad" timezone_t cannot be shared across threads which might have different locales. This part of the interface needs some rework. Given a bad timezone_t value, I don't see how strfxtime should indicate failure for those formats which use the time zone. An alternative interface, in place of tz_prep and tz_error, might be:
timezone_t* tz_construct(const char *restrict tzstring, char *msg, int maxsize)
which returns 0 for failure, and then, if msg is non-null, stores a message into it of up to maxsize characters in length. This way there is never a bad timezone value to handle. (A null timezone is already specified to be treated like UTC.)
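For illustration, a caller might then look like the following sketch (hypothetical usage only, based on nothing more than the signature above; the header is assumed):

    #include <stdio.h>
    #include <time.h>     /* assumed home of timezone_t and tz_construct */

    /* A failed construction never produces a "bad" timezone_t that could
       leak across threads; it yields only a null pointer plus a message
       in the caller's own buffer.                                        */
    timezone_t *open_zone(const char *tzstring)
    {
        char msg[256];
        timezone_t *tz = tz_construct(tzstring, msg, sizeof msg);
        if (tz == NULL)
            fprintf(stderr, "tz_construct(\"%s\") failed: %s\n", tzstring, msg);
        return tz;
    }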
I expect tz_error usually to be called immediately after tz_prep has signalled a problem. If programmers share a bad timezone_t across threads, then honestly, that is their problem. Your concern sounds a bit far-fetched to me, and there are many obvious programming techniques to avoid the problem. There is no way in C for an API designer to enforce multi-threading safety; all we can do is provide an API that enables multi-threading-safe use of the functions, and that is IMHO good enough.

I have indeed thought about a user-provided finite buffer, as well as about tz_error doing a malloc. The main reason I dislike both approaches is my recently gained experience with writing bindings from C libraries to other languages. Take Ada, for example: all this mess with C returning variable-length strings in a multi-threading-safe way is a non-problem in Ada. Ada allows functions to return variable-length arrays. The way most compilers (e.g., GNU Ada) do this is as follows: there is a secondary stack managed by the run-time library. Before an expression that returns variable-length arrays is evaluated, the secondary stack pointer is saved. The space for the variable-length array to be returned is allocated on the secondary stack and can be used from there by other functions in the expression which use the returned result. The secondary stack pointer is restored after the expression has been fully evaluated. If the returned variable-length array has to be preserved for further use beyond the expression, it usually has to be copied (or relinked, if the secondary stack uses the same reference-counted storage pool as the normal variable-length string library).

If I have an Ada function that returns the tz_error value, then I first have to call the C tz_error, then I have to find out how long the resulting string is (e.g., with strlen(); perhaps the length should also be returned), then I copy the string onto the secondary stack of the Ada run-time library and return from the function. In your proposal, I would have to introduce arbitrary limits on the length of the returnable string, which are difficult to justify to users of the API in other programming languages where such restrictions have no justification. Or I would have to iterate over the function that accepts a user-provided buffer to find out how large this buffer has to be. In the end, considering interfacing to programming languages like Ada or Python which can comfortably return strings, I see my approach as conceptually preferable. In practice, error messages will hardly ever be longer than 80 characters, so a 256-character limit on the maximum length should never hurt.
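In C terms, what such a binding has to do with the tz_error approach is roughly the following (the signature of tz_error is assumed here, as implied by the discussion above: a pointer to a NUL-terminated message owned by the timezone_t object):

    #include <stdlib.h>
    #include <string.h>

    char *copy_error_message(timezone_t *tz)
    {
        const char *msg = tz_error(tz);   /* assumed signature              */
        size_t len = strlen(msg) + 1;     /* length has to be discovered    */
        char *copy = malloc(len);         /* or Ada's secondary stack, etc. */
        if (copy != NULL)
            memcpy(copy, msg, len);
        return copy;
    }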
Another problem I foresee is that there is no way, given a timezone_t object, to retrieve the string used to construct it. This might best be another strfxtime directive.
Which would mean that we have to force the implementor to store the original string. Is this really necessary? The user has provided the string himself, so why would he depend on getting it back later? We certainly could easily add another strfxtime conversion specifier, but I wonder whether this is necessary at all.
I don't like to see the %H, %M, and %S formats restricted to only '.' and ',' decimal separators. Separate directives (perhaps %.nH et al) that format only the fractional part would allow users to supply any decimal separator they choose, as in "%H:%M!%.3M" for "03:20!666". (I have seen satellite navigation systems with stranger choices of syntax.)
Thanks, that is a very good suggestion. I have also frequently seen h, m, or s used as the separator in astronomical software, instead of dot and comma, so the decimal separator should be user-provided. Related problem: what is the semantics of decimal fractions of minutes and hours during a leap second? These are obviously ill-defined and a neat solution is not possible (applications expecting leap seconds should never use decimal fractions of minutes, hours, and days). The best semantics I can think of is to use min(nsec, 999_999_999) instead of nsec directly when calculating these decimal fractions. For decimal fractions of seconds there is no problem, as long as a leap second takes us beyond 59 (which should be guaranteed unless someone introduces a UTC offset that is not an integral multiple of minutes).
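A sketch of that clamping rule, assuming the representation in which nsec exceeds 999_999_999 while an inserted leap second is in progress (types and names here are illustrative only, not the proposal's):

    /* Fraction of the current minute in [0, 1); nsec is clamped so that an
       inserted leap second cannot push the result to 1.0 or beyond.       */
    double minute_fraction(long long sec, long long nsec)
    {
        if (nsec > 999999999)        /* leap second in progress  */
            nsec = 999999999;        /* min(nsec, 999_999_999)   */
        return ((sec % 60) + nsec / 1e9) / 60.0;
    }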
Given that the format already specifies 64-bit operations on the more commonly-used component of the time, is there any reason to restrict the resolution of the fractional part to nanoseconds? Clock speeds greater than 1e9 Hz will be common before this interface comes into wide use. It may as well use (say) attoseconds, as in Bernstein's library.
I think attoseconds are horrible overkill.

Considering precision: the best atomic clock on this planet (CS1 by PTB in Braunschweig) can barely do UTC with a real-time precision of one nanosecond. GPS provides UTC to civilian users with around 340 ns root-mean-square error; military users get down to perhaps tens of nanoseconds. Radio clocks like WWV or DCF77 provide UTC with around a millisecond precision, and atmospheric path delays are often worse. NTP also works with millisecond precision under good conditions. So considering phase precision, nanoseconds are many orders of magnitude better than what is practically required.

Considering resolution and uniqueness of local timestamps: it is hard to imagine that mass-market silicon microprocessors will leave UHF and break through the 10 GHz barrier for internal clock speed during my lifetime. Reading out an internal monotonic clock counter, converting it to a portable UTC representation, and returning it via a system call interface will certainly take much longer than a few tens of instruction cycles (unless we see full hardware implementations of xtime_get(), for which I see no market justification); therefore, processors that can do more than 10**9 calls to xtime_get() per second sound to me very much like science fiction at the moment. Nanoseconds sound quite sufficient to guarantee unique timestamps with a comfortable safety margin.

Considering frequency resolution: a nanosecond is also a nice representation for the phase of a kernel clock, and nanoseconds per second is a nice representation of its frequency. If you add an adjustable real second every second to your phase base, then you can adjust the frequency with which your phase base progresses in nanoseconds per second, also known as parts per billion. This is already significantly better than the frequency change in your PC if you open the window and the temperature in the room drops a few kelvin. Between these per-second adjustments, you do a linear extrapolation using a bus cycle counter and precalculated compensation factors. See the Linux kernel clock PLL for an implementation example, or Mills' papers that I quoted on my page.
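To make the nanoseconds-per-second adjustment concrete, here is a rough sketch of such a kernel clock (all names are illustrative and not taken from the proposal or from Linux; a real implementation uses precalculated fixed-point compensation factors instead of the division in the read path):

    #include <stdint.h>

    struct kclock {
        int64_t  phase_ns;         /* clock value at the last update       */
        int32_t  freq_adj_ppb;     /* adjustment in ns per second (ppb)    */
        uint64_t last_cycles;      /* bus cycle counter at the last update */
        uint64_t cycles_per_sec;   /* nominal counter frequency            */
    };

    /* Called once per second: advance the phase by one adjusted second.   */
    void kclock_tick(struct kclock *c, uint64_t now_cycles)
    {
        c->phase_ns   += 1000000000 + c->freq_adj_ppb;
        c->last_cycles = now_cycles;
    }

    /* Read between ticks: linear extrapolation from the cycle counter.    */
    int64_t kclock_read(const struct kclock *c, uint64_t now_cycles)
    {
        uint64_t dc = now_cycles - c->last_cycles;
        return c->phase_ns + (int64_t)(dc * 1000000000u / c->cycles_per_sec);
    }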
The library might also define constants corresponding to one nanosecond, microsecond, and millisecond in whatever unit is used for the fractional part, to minimize user errors.
This is one solution. I would prefer another, more general solution to minimize user error here: allow underscores in numeric literals, like Ada does. I think 1_000_000_000 is much more readable than 1000000000 (is it really one billion?).
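If digit separators are not an option, the constants Nathan suggests could look like this (names purely illustrative, assuming the fractional part stays in nanoseconds):

    #define XTIME_NANOSECOND        1L
    #define XTIME_MICROSECOND    1000L
    #define XTIME_MILLISECOND 1000000L
    /* e.g. xt.nsec = 250 * XTIME_MILLISECOND;   a quarter of a second */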
Typos: I believe the first paragraph describing representation of leap seconds refers to the member "sec" in one place where it should say "nsec".
I couldn't find this.
Also, the "note" text shows up in my browser in a microscopic font.
I only used the HTML <SMALL> tag and did not specify a specific font size. It is the responsibility of your browser and its local configuration to select an adequate font size. If you use Netscape under X11, I can probably tell you how to fully configure all font sizes (look into Netscape.ad).
I'm interested in what can be done to improve its suitability for incorporation into a future C++ standard. If the C and C++ bindings could be described simultaneously this would save a lot of trouble in the future.
A C++ binding (and also an Ada95 binding) could use exceptions to signal the unavailability of a clock in xtime_get. This would leave the return value free for the actual clock value. xtime would become a class under C++ and a private record under Ada, and the arithmetic functions and conversion functions to other existing time types would be appropriately overloaded. As I pointed out above, the tz_error message can be returned directly as an array under Ada (I don't think C++ has a comparable allocator-free mechanism). I don't see any immediate improvements that I could make to the C API to make it more suitable for bindings to more modern programming languages. I think though that this is a general design criterion, as the run-time libraries of most modern languages sit on top of some underlying C API. Therefore the underlying C API has to be as robust and flexible as possible, in order not to hinder proper implementations of higher-level language bindings.

Markus

--
Markus G. Kuhn, Security Group, Computer Lab, Cambridge University, UK
email: mkuhn at acm.org, home page: <http://www.cl.cam.ac.uk/~mgk25/>