
(I'm not on the mailing list. Please CC me on replies.)

The `abbreviation' for the Factory timezone isn't valid for a TZ variable specification. I have discovered this because I have a tzfile parser in Perl that follows POSIX rules strictly for the TZ-format rule part, and I've started applying it systematically to the Olson database. This is the only zone that it has a problem with.

The POSIX rules for TZ, as described in the Theory file and available online at <http://www.opengroup.org/onlinepubs/000095399/basedefs/xbd_chap08.html> among other places, prohibit spaces in the abbreviations. Factory's abbreviation is a sentence with words separated by spaces. I see it's grandfathered past zic's test for POSIX validity, but the intent behind that seems to be more about the length than about the spaces. I don't have a problem with the length: my code doesn't impose any fixed length limit. (POSIX allows implementations to limit the length to six bytes or higher, but doesn't oblige them to.)

You could fix this by changing the spaces to dashes. The result is POSIX-valid, and also avoids violating the expectations of naive parsers of date(1) output that expect the timezone abbreviation to be just one word. The downside is that it slightly impairs readability to humans. POSIX systems are also still allowed to object to it if they do impose a small TZNAME_MAX.

What are your thoughts?

-zefram
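The character rule at issue can be sketched in a few lines. This is an illustration only, not the Perl parser mentioned above: the function names are made up for the example, the sample string is a placeholder rather than the real Factory abbreviation, and the rule encoded is one reading of the std/dst description in the POSIX chapter cited above (unquoted designations are letters only; designations containing digits, '+' or '-' go inside angle brackets). Like the parser described above, the sketch imposes no upper length limit.

```python
import re

# Assumption: unquoted designations are three or more alphabetic characters.
_UNQUOTED = re.compile(r"[A-Za-z]{3,}")
# Assumption: quoted designations (<...> in a TZ string) may also contain
# digits, '+' and '-', but never a space.
_QUOTABLE = re.compile(r"[A-Za-z0-9+-]{3,}")


def is_posix_abbreviation(abbr: str) -> bool:
    """True if abbr could serve as std/dst in a TZ string, quoted if needed."""
    return _QUOTABLE.fullmatch(abbr) is not None


def tz_designation(abbr: str) -> str:
    """Render abbr as it would be written in a TZ string."""
    if _UNQUOTED.fullmatch(abbr):
        return abbr
    if _QUOTABLE.fullmatch(abbr):
        return f"<{abbr}>"
    raise ValueError(f"not usable in a POSIX TZ string: {abbr!r}")


def spaces_to_dashes(abbr: str) -> str:
    """The proposed fix: replace each space with a dash."""
    return abbr.replace(" ", "-")


if __name__ == "__main__":
    placeholder = "zone must be set"  # hypothetical, not the Factory text
    print(is_posix_abbreviation(placeholder))
    print(tz_designation(spaces_to_dashes(placeholder)))
```

Under this reading, the dashed form is only valid in the quoted style, so a TZ string built from it would carry the angle brackets.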

I'm forwarding this message from Zefram, who is not on the time zone mailing list. Those of you who are on the list, please direct replies appropriately.

--ado

________________________________________
From: Zefram [zefram@fysh.org]
Sent: Wednesday, August 25, 2010 4:48 PM
To: tz@lecserver.nci.nih.gov
Subject: factory zone abbreviation

On Wed, Aug 25, 2010 at 05:19:19PM -0400, "Olson, Arthur David (NIH/NCI) [E]" <olsona@dc37a.nci.nih.gov> wrote:
impose any fixed length limit. (POSIX allows implementations to limit the length to six bytes or higher, but doesn't oblige them to.)
Maybe I am dense, but where does POSIX actually oblige implementations to only accept the POSIX forms for TZ? I see no requirement for implementations to handle this in any specific way, but maybe I overlooked it?

(The lack of requirements might not invalidate reasons to change anything, of course).

-- 
Marc Lehmann <schmorp@schmorp.de>
Deliantra, the free code+content MORPG: http://www.deliantra.net

Marc Lehmann wrote:
Maybe I am dense, but where does POSIX actually oblige implementations to only accept the POSIX forms for TZ?
In one sense it doesn't, because it has an explicit escape hatch, that any TZ value beginning with a colon has implementation-defined meaning.

In another sense, accepting the colon form as a "POSIX form", the standard that I pointed at contains an explicit obligation. It says "The contents of the environment variable named TZ shall be used ... by various utilities, to override the default timezone. The value of TZ has one of the two forms ...". So it obliges certain functions and utilities to use TZ, and says what values in TZ (when they don't start with a colon) mean.

And in another sense no again, because the description of the two forms of TZ can be read as an obligation on the *setter* of TZ. Setting TZ to a non-conforming value would be a violation of that obligation, presumably invoking undefined behaviour from the various utilities that are obliged to pay attention to TZ.

But this is slightly beside the point. We're not actually talking about a value of a TZ variable. We're talking about a field in tzfiles, which is a format not defined by POSIX but by tzfile(5). That says "After the second header and data comes a newline-enclosed, POSIX-TZ-environment-variable-style string ...". It explicitly incorporates by reference the POSIX definition of the meaning of TZ values. I based my implementation on that.

Actually I've now modified my implementation of the tzfile format to grandfather that specific space-containing value for this final field. If the last local time type in the file is UT with the funky abbreviation, and the `POSIX-TZ' field has the funky non-POSIX value, then it uses the last local time type, and refrains from passing the non-POSIX `POSIX-TZ' value on to the POSIX-TZ parsing code. This hack is in the same vein as one I already had, where a local time type of UT with abbreviation "zzz" is treated as not defining local time (which for this code *is* distinct from being defined as UT with abbreviation "zzz"). I've made both of these deviations from tzfile(5) in order to better handle the tzfiles that will be de facto found in the wild.

So, erm, attempting to get back to some kind of point, I see two main reasons to pay attention to POSIX's rules here.

Firstly, because tzfile(5) says that that field is for a POSIX TZ value, so it's a good idea to make sure that what's put in there really is a POSIX TZ value, so that you can hand it off to any code (not necessarily your own) that parses POSIX TZ values. You could of course change tzfile(5) so that it no longer claims that non-empty values in that field are always POSIX TZ values, but this would dilute the value of the field. It's *useful* to conform to a standard protocol in this field.

Secondly, a direct implication of the POSIX rules on TZ is that timezone abbreviations are expected to be composed from a very limited set of characters, in particular not including space. This too is a protocol, and a useful one, as I alluded to in my previous message. Even if you're not handling TZ values, you may be handling local time abbreviations, and knowing what they'll look like is useful. So even if tzfile(5) didn't invoke the POSIX TZ rules at all, it would still be a good idea for all the abbreviations in a tzfile to satisfy POSIX's rules for local time abbreviations.

-zefram
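The grandfathering described in this message might look roughly like the following. It is a sketch under stated assumptions, not the actual Perl implementation: `LocalTimeType`, `future_rule`, `FUNKY_ABBR` and `FUNKY_FOOTER` are hypothetical names, the two constants are placeholders because the thread never quotes the real strings, and the strict POSIX-TZ parser is passed in by the caller rather than implemented here.

```python
from typing import Callable, NamedTuple, Optional, TypeVar

Rule = TypeVar("Rule")


class LocalTimeType(NamedTuple):
    utoff: int    # offset from UT in seconds
    is_dst: bool
    abbr: str


# Placeholders only: the real Factory strings are not quoted in the thread.
FUNKY_ABBR = "placeholder abbreviation with spaces"
FUNKY_FOOTER = FUNKY_ABBR + "0"


def future_rule(
    last_type: LocalTimeType,
    footer: str,
    parse_posix_tz: Callable[[str], Rule],
):
    """Pick the rule for times after the last transition in a tzfile.

    footer is the newline-enclosed TZ-style string from the end of the
    file, with the enclosing newlines already stripped.
    """
    if footer == "":
        # Empty field: the file offers no rule for the future.
        return None
    grandfathered = (
        footer == FUNKY_FOOTER
        and last_type.utoff == 0
        and not last_type.is_dst
        and last_type.abbr == FUNKY_ABBR
    )
    if grandfathered:
        # Keep the non-POSIX value away from the strict parser and
        # reuse the last local time type instead.
        return last_type
    # Every other non-empty value is handed to the strict parser.
    return parse_posix_tz(footer)
```

Matching on both the footer and the last local time type keeps the exception as narrow as the one described: any other space-containing value still reaches the strict parser and is rejected there.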
participants (3)
- Marc Lehmann
- Olson, Arthur David (NIH/NCI) [E]
- Zefram