proposed changes for Win32 and a improved mktime() algorithm
Hello, I've a patch (diff/patch format) attached, which is the result of a longer period of testing (and delays due to other duties). The patch has been created against 2017B code. And I've tried to limit my changes to a minimum. The patch comprises changes for 5 files, where most of the changes are in private.h and localtime.c. The changes are (summarized list): 1. Win32/Visual Studio (2015) port. Related to including header files + handling of the Windows style backslash (by adding some macros) 2. Adding a faster (and more efficient) algorithm for mktime(). Instead of the binary search another approach is used, but also results in different ambiguous boundaries for switching from DST to non-DST. 3. Adding a possibility not to use putenv(). Putenv() may result in leaking memory, as it is unsure putenv() behaves on different platforms. As long as set & get of a TZ variable is within the TZ library, there is no need to use putenv()/getenv() at all. Also note that putenv/getenv on Windows are thread-unsafe (and I did not like to use the GetEnvironmentString/SetEnvironmentString() counterparts). 4. Solved some platform (e.g. HPUX) specific errors/warnings (initializing + some fixed char buffers). Indeed, making some compilers happy (there is more than gcc only). I'm not sure whether the way how I provide this patch is the recommended way. If another way is preferred, please let me know. If more information is needed, please ask. I added as much as possible the comments in the affected source files itself, but I can imagine that still some questions left. It is not my intention to see 'good luck with this patch'. If there is another preferred method to chop this patch in chunks, please let me know. Regards, Kees
Date: Tue, 9 May 2017 14:44:33 +0000 From: Kees Dekker <Kees.Dekker@infor.com> Message-ID: <DM5PR02MB2809951912FF7B2DE1A24AD5E9EF0@DM5PR02MB2809.namprd02.prod.outlook.com> | 2. Adding a faster (and more efficient) algorithm for mktime(). | Instead of the binary search another approach is used, I think that kind of thing is fine for a system's implementation of mktime(), but I don't think it is best for the reference implementation. The big advantage of the binary search algorithm, is that it moves *all* assumptions about the rules about the conversions into localtime, so they exist only in one place, and need to be updated only in one place. For example, ... + lDays += 1; /* year 0 is a leap year */ Is it indeed? Is there even a year 0? For sure, in the current implementation, we extend the Gregorian calendar way back into pre-history, and assume that years all had 365.25 days, and days all had 86400 seconds, even when we know that neither was true ... we extend that assumption back before the earth was formed, when there was nothing rotating and revolving around the sun, back even before there was a sun to rotate around, as if that all makes some kind of sense - which it doesn't. While fixing this in a way that would have made contemporary sense is hard, if not impossible, we could at least do better with pre-Gregorian human era dates - we could have the timezone data include the date/time at which the Gregorian calendar came into use in a particular zone, and some indication of what algorithm preceded it, and when that one came into being, etc, and for times before we have any idea what times were called we could just adopt stardates ... If we do any of that, it would be nice to have to write the code just once, in localtime(), and not have to also invent its inverse in mktime(). | 3. Adding a possibility not to use putenv(). That is certainly a worthwhile goal, though I would prefer to solve it by introducing a variant of tzset() which takes a timezone spec (anything which can be a value of getenv("TZ") as an arg, and returns a void * (pointer to an internal tzlib data struct) (it could be given some other type name, as long as internal details of its representation are not available) and then a parallel set of functions to localtime, mktime, etc which take this additional pointer as an extra arg, and use that zone. This allows parallel use of many different zones, without needing to be constantly switching between them, with all the costs that involves. We could perhaps call the tzset() variant, tzalloc() the data type it returns (rather than void *) a timezone_t, and add tzfree() to go with it of course, and the parallel functions could be localtime_rz() ctime_rz() mktime_z() ... And just maybe, all of this is already implemented and available... If the reason for doing it the other way is to avoid needing to change some application's localtime() (etc) calls, that can easily be handled by macros... #define tzset() (_my_tz = tzalloc(getenv("TZ"))) #define localtime(t) (localtime_rz(_my_tz, (t), &_my_static_tm)) /* ... */ | 4. Solved some platform (e.g. HPUX) specific errors/warnings | (initializing + some fixed char buffers). Assuming that this is the reason for ... - static const char wday_name[][3] = { + static const char wday_name[][4] = { Then IMO that system is just broken, it has always been legal to provide a string where the non-null chars exactly fill the space provided, in which case the terminating \0 is simply omitted from the initialisation. If some system has broken that, it should be made to suffer, we should not be working around it. | I'm not sure whether the way how I provide this patch is the | recommended way. It is not me who has to deal with it, but the patch format itself looked fine to me. I would suggest re-formatting the code into the style of the existing reference implementation, even if it is not your (or yor system's) preferred coding (and naming) style, if you expect the patches to be adopted though. A couple of the changes perplex me though (minor issues, but ...) Why: -# ifdef USG_COMPAT +# if defined(USG_COMPAT) When I first saw that, I assumed perhaps some system had deleted support for the (very old now, and no longer really needed) #ifdef, in which case that change would make sense, but then just after there is ... +#ifdef _MSC_VER where a new #ifdef is added, so that leaves the reason for the earlier change unexplained? +/* a lot of stuff does not exist on Windows/Visual Studio */ +#define DEFAULT_FOR_UNIX (!_MSC_VER) + #ifndef HAVE_LINK -#define HAVE_LINK 1 +#define HAVE_LINK DEFAULT_FOR_UNIX This kind of change should just be handled by pre-defining the HAVE_XXX symbols (with 0 values), that's why the #ifndef is there. kre
Robert Elz wrote:
| 2. Adding a faster (and more efficient) algorithm for mktime(). | Instead of the binary search another approach is used,
I think that kind of thing is fine for a system's implementation of mktime(), but I don't think it is best for the reference implementation.
Normally I'd agree. However, there is an alternative that would let us do both. We could use the glibc implementation of mktime. Like tzcode's mktime, it's independent of the rest of the implementation and is quite portable. And like the recently-proposed code, it's waay faster than binary search. A major advantage is that glibc is well-tested on many platforms to handle corner cases such as the ones that you mention. An advantage over the proposed code is that its copyright status is clear. The only downside that I can see is that it's LGPLed, which some of our downstream users might not like. If that's a showstopper, perhaps we could make the glibc version available as an option, so that LGPL-averse users can stick with what we've got. I'd rather not have to debug a whole new set of code for what should be a solved problem, though. (Disclaimer: I wrote both the tzcode mktime and the glibc mktime.)
Assuming that this is the reason for ...
- static const char wday_name[][3] = { + static const char wday_name[][4] = {
Then IMO that system is just broken, it has always been legal to provide a string where the non-null chars exactly fill the space provided, in which case the terminating \0 is simply omitted from the initialisation.
If some system has broken that, it should be made to suffer, we should not be working around it.
All true. Still, the C code is a little clearer when written with null-terminated strings, so I'm mildly inclined to change the 3 to a 4 anyway. I agree with your other comments. I'll follow up separately to Kees.
Date: Tue, 9 May 2017 21:02:01 -0700 From: Paul Eggert <eggert@cs.ucla.edu> Message-ID: <1c7aabd1-ae3d-e200-16d4-68d9ad50c2bc@cs.ucla.edu> | there is an alternative that would let us do both. | We could use the glibc implementation of mktime. | The only downside that I can see is that it's LGPLed, That is a very big downside, but ... | (Disclaimer: I wrote both the tzcode mktime and the glibc mktime.) In that case you, or UCLA, own the copyright, and you can release it on any licence(s) that you like... Just because the gnusers insist on everything being *GPL'd doesn't mean you cannot also release the code under the MIT or BSD licence (or something similar.) I haven't ever seen that version - I make it a point not to look at *GPL'd code if I can possibly avoid it, but if manages to avoid all knowledge of calendaring in its implementation, then given a suitable licence it should be fine. (Disclaimer; I wrote the original version of the BSD mktime() which was the first I'm aware of to use the binary search method - though the idea to do it that way was not mine, that came, I was told, from Bob Kridle.) kre
Robert Elz wrote:
In that case you, or UCLA, own the copyright, and you can release it on any licence(s) that you like.
I long ago assigned my glibc contributions' copyright to the Free Software Foundation and would rather that they stay LGPLed. Again, if this is a showstopper I won't insist on switching the implementations. I'd rather not reinvent the mktime wheel, though.
On Tue, May 9, 2017 at 9:45 PM, Paul Eggert <eggert@cs.ucla.edu> wrote:
Robert Elz wrote:
In that case you, or UCLA, own the copyright, and you can release it on any licence(s) that you like.
I long ago assigned my glibc contributions' copyright to the Free Software Foundation and would rather that they stay LGPLed.
Again, if this is a showstopper I won't insist on switching the implementations. I'd rather not reinvent the mktime wheel, though.
we wouldn't use LGPL code in Android's libc.
On Tue, May 9, 2017 at 9:48 PM, Paul Eggert <eggert@cs.ucla.edu> wrote:
enh wrote:
we wouldn't use LGPL code in Android's libc.
What does Android use for mktime? Something slow? (If so, it's in good company. :-)
afaik we're the only people shipping a mostly unmolested copy of the tzcode implementation :-) (the only behavioral difference i can remember is: // http://b/31339449: POSIX says localtime(3) acts as if it called tzset(3), but upstream // and glibc both think it's okay for localtime_r(3) to not do so (presumably because of // the "not required to set tzname" clause). It's unclear that POSIX actually intended this, // the BSDs disagree with glibc, and it's confusing to developers to have localtime_r(3) // behave differently than other time zone-sensitive functions in <time.h>. which i think i mentioned on the list at the beginning of the year.)
Date: Tue, 9 May 2017 21:48:23 -0700 From: Paul Eggert <eggert@cs.ucla.edu> Message-ID: <9b1d76e4-d0d9-09a1-73df-822f3aa85e65@cs.ucla.edu> | What does Android use for mktime? Something slow? Why does it matter? If you have an application where the speed of mktime() is a critical factor, it is probably time to redesign the algorithm. kre
Date: Tue, 9 May 2017 21:46:42 -0700 From: enh via tz <tz@iana.org> Message-ID: <CAJgzZooAg8ArTiOk3D4-96t2Yu=T-BPbUYZMWHiH27Y2-FU6ng@mail.gmail.com> | we wouldn't use LGPL code in Android's libc. And while I have no particular status that allows me to definitive, I am confident that NetBSD wouldn't use it either. kre
Date: Tue, 9 May 2017 21:45:08 -0700 From: Paul Eggert <eggert@cs.ucla.edu> Message-ID: <24fdd1d2-0152-e8e4-8a25-9f5dc47985c9@cs.ucla.edu> | I long ago assigned my glibc contributions' copyright to the Free Software | Foundation and would rather that they stay LGPLed. You could describe the algorithm you used, and someone else could do an implementation with a rational licence - ideas have no copyright... kre
On May 9, 9:45pm, eggert@cs.ucla.edu (Paul Eggert) wrote: -- Subject: Re: [tz] proposed changes for Win32 and a improved mktime() algor | Robert Elz wrote: | > In that case you, or UCLA, own the copyright, and you can release it | > on any licence(s) that you like. | | I long ago assigned my glibc contributions' copyright to the Free Software | Foundation and would rather that they stay LGPLed. Please consider releasing your code under the BSD license. | Again, if this is a showstopper I won't insist on switching the | implementations. I'd rather not reinvent the mktime wheel, though. The two licenses (BSD and LGPL) are very different and we (NetBSD) would like to keep *GPL code out of libc. The two routines also behave differently with respect to the DST gap (though this was easy to change). Best, christos
On 10/05/17 12:22, Christos Zoulas wrote:
On May 9, 9:45pm, eggert@cs.ucla.edu (Paul Eggert) wrote: -- Subject: Re: [tz] proposed changes for Win32 and a improved mktime() algor
| Robert Elz wrote: | > In that case you, or UCLA, own the copyright, and you can release it | > on any licence(s) that you like. | | I long ago assigned my glibc contributions' copyright to the Free Software | Foundation and would rather that they stay LGPLed.
Please consider releasing your code under the BSD license.
I think once you've assigned copyright to the FSF (and got your US one dollar bill from the FSF), you no longer retain any copyright, although I might be wrong about that (IANAL). (I folded an origami tank from my FSF one dollar bill, although it unfortunately got cremated in an office fire.) -- -=( Ian Abbott @ MEV Ltd. E-mail: <abbotti@mev.co.uk> )=- -=( Web: http://www.mev.co.uk/ )=-
Thanks for writing up that patch. I installed the attached patches, which I hope addresses a few of the issues you raised. I'll comment below on the issues that the attached patches don't address. Kees Dekker wrote:
1. Win32/Visual Studio (2015) port. Related to including header files + handling of the Windows style backslash (by adding some macros)
In thinking about the problem with MS-Windows file name syntax, I would like to avoid complicating tzcode in areas where the complication isn't really needed. The zic man page already says that portable time zone names should not use constructs like leading "C:" and/or backslash. It shouldn't be tzcode's job to worry about the limitations of DOS file names, any more than it's tzcode's job to worry about file name restrictions imposed by VMS, z/OS, ISO 9660, etc. zic's -v option already warns about drive letters and backslashes in time zone names, and such names are so rare in practice that I don't see us as needing to do anything further here. So, instead of adding a bunch of code to deal with backslashes and drive letters, I'd like to remove the couple of stray lines of code that attempt to deal with this issue and which (obviously) do not suffice to solve the problem. For the other proposed changes: * My previous messages discussed the proposed mktime redo. * I think Robert Elz covered the putenv issue fairly well: that is, the problem has already been solved, in a different way. * I would rather not add pragmas to pacify a compiler that is generating false alarms. We don't have the resources to pacify the dozens of C compilers out there, and pacifying them all would contort the code. Keeping GCC happy is as much as I want to take on. Similarly, I'd rather not add INITIALIZE calls to pacify older versions of GCC, or to pacify other compilers that can't keep track of locals well.
-#define HAVE_SYMLINK 1 +#define HAVE_SYMLINK DEFAULT_FOR_UNIX
For changes like this you should be able to invoke 'make' with arguments suitable for your platform, and that way we don't need to modify the source code. E.g., you can run this: make CFLAGS='-DHAVE_LINK=0 -DHAVE_SYMLINK=0 -Dssize_t=long -Dlint' and similarly for the other flags you need to set for your platform.
+#ifndef TYPE_BIT
I didn't see the need for this #ifndef, and others like it. How could TYPE_BIT already be defined?
+#ifdef _MSC_VER + /* Windows does not have shell scripts */ + const char *getoptopt = "d:l:p:L:vs"; +#else + const char *getoptopt = "d:l:p:L:vsy:"; +#endif
If MS-Windows does not have shell scripts, then calling 'system' always fails, right? That should be good enough. (While we're on the topic, I wouldn't mind deprecating the 'y' option on all platforms, as it hasn't been needed for years and it has security implications.)
+#else + /* for systems that do not support symlinks, there is no need to call relname() (=alloc) + free, + but the to directory need to be created */ + symlink_errno = ENOTSUP;
As far as I can tell, this merely improves zic's performance a bit on platforms lacking the symlink system call. I wouldn't worry about this. It's better to keep the code simpler and easier to maintain, than to worry about barely-significant tuning of a rarely-used and plenty-fast program, a tuning that helps only nonstandard platforms.
Thanks for writing up that patch. I installed the attached patches, which I hope addresses a few of the issues you raised. I'll comment below on the issues that the attached patches don't address.
I will check the patches at end of this week, or next week. So please give me some time. I will also need some more time to read all feedback. Thanks for providing this. On beforehand: telling that a compiler is wrong is probably true/valid from developers or clean-coding perspective, but it is a very cumbersome job to change compiler behavior. I understand your feedback, but I can't do much with it. In most cases, I'm using the pragmatic approach, and understand that I've to live with the (probably) odd behavior of some compilers. It makes life too complicated to argue to all vendors that they must change their compiler. E.g. HP does not spend much time anymore to their compiler (because HPUX is almost dead?) and arguing with Microsoft about standards is certainly something that will never succeed. We have no other choice than continue using the existent compilers (and in most cases, Visual Studio is not that bad). Even if I can convince vendors to adjust their compiler, it takes months or years before these changes become available (I'm ignoring for case of simplicity the fact that standards are not always 100% clear, and leave some room for vendor's own thought about an implementation). Regards, Kees
Kees Dekker wrote:
HP does not spend much time anymore to their compiler (because HPUX is almost dead?) and arguing with Microsoft about standards is certainly something that will never succeed.
The general rule is that if the platform isn't supported by its supplier any more then we don't need to support it either. HP says they will support HP-UX on Itanium until at least 2025, so it's still viable. However, some old HP-UX compilers are not supported by HP any more, so just saying "I need this for HP-UX" is not enough: the old unbundled C compiler is long obsolete, for example, and the PA-RISC platform is obsolete too. And even if it's a viable compiler, if the compiler is merely generating false-alarm warnings then we typically do not bother "fixing" the code. Such "fixes" are often more trouble than they're worth. Fixing real bugs takes precedence over pacifying foolish compilers. You can ignore the false alarms with "grep -v". Microsoft is another animal: they don't care about whether native MS-Windows conforms to POSIX, and we don't have the resources to maintain a port to their idiosyncratic system. Luckily, we don't need to, as POSIX-like environments such as Cygwin and WSL are readily available for MS-Windows, and though not perfect they're good enough.
The general rule is that if the platform isn't supported by its supplier any more then we don't need to support it either. HP says they will support HP-UX on Itanium until at least 2025, so it's still viable. However, some old HP-UX compilers are not supported by HP any more, so just saying "I need this for HP-UX" is not enough: the old unbundled C compiler is long obsolete, for example, and the PA-RISC platform is obsolete too.
HP PA-RISC is a different story, but also for HP UX Itanium, the number of fixes + conformance to e.g. C++11 (or later) is quite low. We are using aCC 06.28.02 (=latest one AFAIK), but it is not updated since past 1.5 year. I guess, but can't prove, that HP will limit releasing patches to a very low level, long before 2025 is reached. Their 2025 boundary is also a little bit theoretical, as the most recent Itanium hardware was launched 5+ years ago, and HW has its own life-cycle policy. The 2025 date is related to the OS, but is useless if there is no supported hardware anymore (see also: https://www.hpe.com/h20195/V2/getpdf.aspx/4AA4-7673ENW.pdf?ver=Rev.%205). The end of support date for HPUX11.31 is actually 31-Dec-2020 (IA64). As PA-RISC is not sold for years, the 2025 date is imaginary, because none of the PA-RISC hardware is still supported.
Microsoft is another animal: they don't care about whether native MS-Windows conforms to POSIX, and we don't have the resources to maintain a port to their idiosyncratic system. Luckily, we don't need to, as POSIX-like environments such as Cygwin and WSL are readily available for MS-Windows, and though not perfect they're good enough.
I know, but Cygwin is no option for us. As you could see with my patch, I got all working :-).
And even if it's a viable compiler, if the compiler is merely generating false-alarm warnings then we typically do not bother "fixing" the code. Such "fixes" are often more trouble than they're worth. Fixing real bugs takes precedence over pacifying foolish compilers. You can ignore the false alarms with "grep -v".
I forgot something: warnings about not initialized variables are almost always triggered because there is a If else[/if else].. codepath, where one of the codepaths uses a variable that is not initialized somewhere. Although This situation may not happen in practice; such warnings point to (IMO) imperfect code. Based on my experience, such warnings are usually worth to be solved. Maintaining code will not suffer from one initialization at declaration level, and performance will (usually) also not suffer, as most initialization code just takes one CPU instruction. Anyhow, I will try to check your patches Friday or otherwise hopefully next week.
On 05/10/2017 04:34 AM, Kees Dekker wrote:
I forgot something: warnings about not initialized variables are almost always triggered because there is a If else[/if else].. codepath, where one of the codepaths uses a variable that is not initialized somewhere. Yes, here's an example of the problem:
bool ok = complicated condition; int value; if (ok) value = complicated expression; complicated actions; if (ok) return value; An inferior compiler might complain that 'value' might be used uninitialized. One can pacify such a compiler by changing line 2 to 'int value = 0;', and you're correct that this typically won't slow the code down significantly. However, such a change obfuscates the source code, as the reader is left wondering why the unnecessary initialization is present. Instead, it's better to disable or ignore the bogus warnings, or get a better compiler. After all, there's nothing wrong with this code, and your compiler is supposed to be your servant not your master.
Yes, here's an example of the problem:
bool ok = complicated condition; int value; if (ok) value = complicated expression; complicated actions; if (ok) return value;
An inferior compiler might complain that 'value' might be used uninitialized. One can pacify such a compiler by changing line 2 to 'int value = 0;', and you're correct that this typically won't slow the code down significantly. However, such a change obfuscates the source code, as the reader is left wondering why the unnecessary initialization is present. Instead, it's better to disable or ignore the bogus warnings, or get a better compiler. After all, there's nothing wrong with this code, and your compiler is supposed to be your servant not your master.
I agree with this example, but also disagree to a little extend: 1. although a compiler is indeed inferior, it is not easy (or even impossible) to switch to another compiler. You are IMO focusing too much on gcc. 2. Adding int value = 0 (or any initializing value what is wanted) does not reduce the readability of the code. If it does, I can’t imagine how. 3. Initialization gives an indication about how the developer things. E.g. if the initial value is an error, then you can easily see 'this function returns x upon error'. Initialization then does not obfuscate code, but improve readability. Note that much other languages (Java/C++) always initialize types to defaults. 4. I've seen too many issues with uninitialized code in C, that I lean towards 'better safe than sorry' and always initialize any value. Figuring out errors that are related to uninitialized code are very hard to find (debugging problems is almost always a hard job). 5. The compiler acts as servant to notify (probably sometimes with false alarms) potential issues. Indeed, dome compilers are too pedantic. But that Is the freedom (isn't it?) of the owner of the compiler. It is usually out of the scope of a developer to define that (so many man, so many minds). 6. In most cases, real code is (much) more complex. In terms of costs and time is adding a good value for initialization is cheaper, compared to the single CPU needed for init. Anyhow, as you are the maintainer of the TZ code, it is up to you, but is saves us time if we can use the unmodified code... I guess that most users are not using visual studio (or something else than gcc), so it may also help others if these changes are accepted. (I still did not check the patches, so I don't yet know what part was accepted. Have a nice day, I will leave for today).
On Wed, 10 May 2017, Kees Dekker wrote:
2. Adding int value = 0 (or any initializing value what is wanted) does not reduce the readability of the code. If it does, I can’t imagine how.
Killing warnings this way - kills warnings from all compilers, good and bad. Warnings are meant to expose problems in code, and if they are killed, bugs introduced later might not be detected. In code that will be maintained for thousands of years (I hope) this counts. -- Foreca Ltd Jaakko.Hyvatti@foreca.com Keilaranta 1, FI-02150 Espoo, Finland http://www.foreca.com
Date: Thu, 11 May 2017 08:17:28 +0300 (EEST) From: =?ISO-8859-15?Q?Jaakko_Hyv=E4tti?= <jaakko.hyvatti@foreca.com> Message-ID: <alpine.LFD.2.03.1705110810140.3744@gyre.foreca.com> | In code that will be maintained for thousands of years [...] Since a hundred years ago there was no such thing as computer code (or computers) postulating what might be true even 1000 years (or 200) into the future (other than Paul's geological/astronomical/... calculation where things change slowly) is kind of brave. Personally I'd expect that anything computer related from today would be about as relevant then as a fletcher is today (ie: there are a few enthusiasts for whom there is still a demand, but for 99% of the world, we'd never notice if they all just vanished...) kre
On Wed, 10 May 2017, Kees Dekker wrote:
2. Adding int value = 0 (or any initializing value what is wanted) does not reduce the readability of the code. If it does, I canʼt imagine how.
Killing warnings this way - kills warnings from all compilers, good and bad. Warnings are meant to expose problems in code, and if they are killed, bugs introduced later might not be detected. In code that will be maintained for thousands of years (I hope) this counts.
It is a good practice to write code like: int f () { // error return at default int ret = -1; if (all-ok) { // success ret = 0; } return ret; } This has nothing to do with suppressing potential bugs. Because the 'possibly not initialized' warnings are very compiler dependent, Including compilers that never warn/check these issues, you can't rely on your compiler whether it will tell that something is not Initialized. If a compiler never complains about uninitialized code, a first initialization will prevent you from making bugs. Regards, Kees
On May 11, 2017, at 12:26 AM, Kees Dekker <Kees.Dekker@infor.com> wrote:
On Wed, 10 May 2017, Kees Dekker wrote:
2. Adding int value = 0 (or any initializing value what is wanted) does not reduce the readability of the code. If it does, I canʼt imagine how.
Killing warnings this way - kills warnings from all compilers, good and bad. Warnings are meant to expose problems in code, and if they are killed, bugs introduced later might not be detected. In code that will be maintained for thousands of years (I hope) this counts.
It is a good practice to write code like:
int f () { // error return at default int ret = -1; if (all-ok) { // success ret = 0; } return ret; }
It is not necessarily a good practice to write code such as int foo = 0; ... switch (bar) { case ORIGINAL: ... foo = original_foo(...); break; case NEW: ... foo = new_foo(...); break; } If the code is later modified to have a case "NEWER", and that case doesn't do foo = newer_foo(...); then, with a compiler that does good enough data flow analysis to detect foo not being set, then, if foo *weren't* initialized at the beginning of the function, the compiler would catch that; the initialization prevents that. This is, BTW, not a hypothetical example, although, for some reason, my compiler (clang Apple LLVM version 8.1.0 (clang-802.0.42)) didn't seem to catch that; somebody else's compiler (probably some version of GCC) *did* catch that. Their fix was to initialize the variable in its declaration; my fix, after checking the code, was to add code to set it in the case that was missing it, and then adding a default case that set the variable (the variable switched on came from a file, and it *should* have only one of a limited number of values, but nothing other than the code writing the file would prevent it having another value, which the code should *somehow* handle). So it's a tradeoff between "compiler with weaker data flow analysis won't catch the use of an uninitialized variable if you *don't* do a forcible initialization" and "compiler with better data flow analysis won't catch the failure to set the variable if you *do* a forcible initialization". (And I need to figure out how to beat clang into doing the right data flow analysis here - I won't settle for "see, your compiler sux0rs, you need to initialize the variable".)
On May 10, 2017, at 10:51 AM, Paul Eggert <eggert@CS.UCLA.EDU> wrote:
On 05/10/2017 04:34 AM, Kees Dekker wrote:
I forgot something: warnings about not initialized variables are almost always triggered because there is a If else[/if else].. codepath, where one of the codepaths uses a variable that is not initialized somewhere. Yes, here's an example of the problem:
bool ok = complicated condition; int value; if (ok) value = complicated expression; complicated actions; if (ok) return value;
An inferior compiler might complain that 'value' might be used uninitialized. One can pacify such a compiler by changing line 2 to 'int value = 0;', and you're correct that this typically won't slow the code down significantly. However, such a change obfuscates the source code, as the reader is left wondering why the unnecessary initialization is present. Instead, it's better to disable or ignore the bogus warnings, or get a better compiler. After all, there's nothing wrong with this code, and your compiler is supposed to be your servant not your master.
In GCC at least, you can suppress the warning with an extension: "int foo = foo;" which means "don't initialize but don't bug me". That has the nice property of expressing exactly what you intended, as opposed to initializing with an actual value. BTW, because of the halting problem, it's not valid to say that a warning is a sign of an inferior compiler. For some values of "complicated condition" it might be, although even then it would be a matter of opinion (to wit, is it reasonable for a compiler to limit compile time evaluation). paul
On 05/10/2017 08:35 AM, Paul.Koning@dell.com wrote:
In GCC at least, you can suppress the warning with an extension: "int foo = foo;" which means "don't initialize but don't bug me".
Thanks, I didn't know about this extension. However, it has undefined behavior in standard C, so we can't use it in code intended to be portable everywhere. And it'd be error-prone even to use it in conditionally-compiled code. Suppose we added this macro: #ifdef __GNUC__ # define INITIALIZED(x) x = x #else # define INITIALIZED(x) x #endif And suppose we remove -Winit-self from GCC_DEBUG_FLAGS, and replace existing code "int local; INITIALIZE(local);" with "int INITIALIZED(local);" to pacify GCC when we know that "local" cannot be used uninitialized but GCC doesn't deduce this. The problem with this approach is that if there is buggy code somewhere else that says "int local2 = local2;", GCC won't warn about it.
because of the halting problem, it's not valid to say that a warning is a sign of an inferior compiler
It's a quality-of-implementation issue, and it's fair to say that some compilers are better than others in this area. It's true that because of the halting problem, no compiler can do a perfect job: for any compiler X that you can build, someone else can build a compiler Y that is better than X in this area. When I wrote "an inferior compiler", I meant "a compiler inferior to recent GCC", not "a compiler inferior to a perfect compiler".
Paul Eggert wrote:
Thanks, I didn't know about this extension. However, it has undefined behavior in standard C, so we can't use it in code intended to be portable everywhere.
I wondered about that, because it seems like the kind of thing that would have undefined behaviour, but I looked at the C standard, and on my reading it actually doesn't. The relevant wording is # An object whose identifier is declared with no linkage and without the # storage-class specifier "static" has /automatic storage duration/. # [...] storage is guaranteed to be reserved for a new instance of # the object on each entry into the block with which it is associated; # the initial value of the object is indeterminate. To have an indeterminate value does not amount to undefined behaviour. When we have no initialising clause at all, clearly the indeterminate value is not a problem if we don't rely on the value; that is how the code is currently correct. An "x = x" initialiser amounts to an assignment using that initial indeterminate value. Crucially, there's nothing saying that using an indeterminate value per se invokes undefined behaviour. My reading is that an indeterminate value is some actual value, though we don't know which value it is, and that copying it by assignment or initialisation just copies whatever value it is without trouble. Thus "x = x" is truly a no-op. So I think we'd be fine to adopt the "x = x" idiom without any conditionality. Of course, the idiom might elicit *more* warnings from compilers other than gcc, and that could be a reason to use it conditionally even if correctness isn't on the line. One also might want to hide it behind a macro even if used unconditionally, on the grounds that the macro name makes clearer what's going on. -zefram
On 05/18/2017 10:08 AM, Zefram wrote:
it seems like the kind of thing that would have undefined behaviour, but I looked at the C standard, and on my reading it actually doesn't.
I think you need to look at other parts of the standard. C11 6.3.2.1 paragraph 2 says, "If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined." Also see C11 6.2.6.1 paragraph 5 for non-character accesses.
Paul Eggert wrote:
I think you need to look at other parts of the standard. C11 6.3.2.1 paragraph 2
That does change things, thanks. I concur with you that that paragraph (on lvalue conversion) does make the idiom in question undefined behaviour. Interestingly, that sentence is new in C11; C99 doesn't have it. (I had been looking at C99.) After a bit more looking around, I think my earlier analysis still holds for C99. This change between versions is a bit surprising. -zefram
On 18/05/17 18:08, Zefram wrote:
Paul Eggert wrote:
Thanks, I didn't know about this extension. However, it has undefined behavior in standard C, so we can't use it in code intended to be portable everywhere.
I wondered about that, because it seems like the kind of thing that would have undefined behaviour, but I looked at the C standard, and on my reading it actually doesn't. The relevant wording is
# An object whose identifier is declared with no linkage and without the # storage-class specifier "static" has /automatic storage duration/. # [...] storage is guaranteed to be reserved for a new instance of # the object on each entry into the block with which it is associated; # the initial value of the object is indeterminate.
To have an indeterminate value does not amount to undefined behaviour. When we have no initialising clause at all, clearly the indeterminate value is not a problem if we don't rely on the value; that is how the code is currently correct. An "x = x" initialiser amounts to an assignment using that initial indeterminate value. Crucially, there's nothing saying that using an indeterminate value per se invokes undefined behaviour. My reading is that an indeterminate value is some actual value, though we don't know which value it is, and that copying it by assignment or initialisation just copies whatever value it is without trouble. Thus "x = x" is truly a no-op.
So I think we'd be fine to adopt the "x = x" idiom without any conditionality. Of course, the idiom might elicit *more* warnings from compilers other than gcc, and that could be a reason to use it conditionally even if correctness isn't on the line. One also might want to hide it behind a macro even if used unconditionally, on the grounds that the macro name makes clearer what's going on.
I'm not sure about that. Although it is not normative, Section "J.2 Undefined behavior" includes: # - The value of an object with automatic storage duration is # used while it is indeterminate (6.2.4, 6.7.9, 6.8). -- -=( Ian Abbott @ MEV Ltd. E-mail: <abbotti@mev.co.uk> )=- -=( Web: http://www.mev.co.uk/ )=-
Ian Abbott wrote:
I'm not sure about that. Although it is not normative, Section "J.2 Undefined behavior" includes:
# - The value of an object with automatic storage duration is # used while it is indeterminate (6.2.4, 6.7.9, 6.8).
Interesting. This exists in both C99 and C11. However, the list item is not supported by the normative sections of the standard to which it refers, in either version. The first referenced section, 6.2.4, in paragraph 2, makes the narrower statement that "if an object is referred to outside of its lifetime, the behaviour is undefined". It also states that the lifetime of an auto variable (other than a variable length array) starts at the start of its enclosing block. Neither of the other referenced sections, or all three taken together, say anything closer to what the list item claims. So I think the list item is actually erroneous. It might possibly have be intended to refer to the lifetime rule in 6.2.4, but that appears as another list item in J.2. Perhaps it referred to other wording in an earlier draft of C99 that later got edited out. As Paul Eggert has pointed out in another message, C11 section 6.3.2.1 has wording that makes the situation described by the list item *usually* undefined behaviour. C99 doesn't have it. Since the wording of the list item didn't change between versions (except for updating a section number), the fact that the item is now nearly correct seems to be accidental. -zefram
Zefram said:
Thanks, I didn't know about this extension. However, it has undefined behavior in standard C, so we can't use it in code intended to be portable everywhere.
# the initial value of the object is indeterminate.
Right.
To have an indeterminate value does not amount to undefined behaviour.
No. But an indeterminate value doesn't have to be a valid value; it can be a trap representation [C99 3.17.2].
When we have no initialising clause at all, clearly the indeterminate value is not a problem if we don't rely on the value; that is how the code is currently correct. An "x = x" initialiser amounts to an assignment using that initial indeterminate value.
Which could be a trap representation.
Crucially, there's nothing saying that using an indeterminate value per se invokes undefined behaviour.
Yes, there is. Reading, let alone writing, a trap representation is undefined behaviour [C99 6.2.6.1 paragraph 5].
My reading is that an indeterminate value is some actual value, though we don't know which value it is, and that copying it by assignment or initialisation just copies whatever value it is without trouble.
That, I'm afraid, is wrong [C99 footnote 41]. (I am the person who wrote 6.2.6.) -- Clive D.W. Feather | If you lie to the compiler, Email: clive@davros.org | it will get its revenge. Web: http://www.davros.org | - Henry Spencer Mobile: +44 7973 377646
In thinking about the problem with MS-Windows file name syntax, I would like to avoid complicating tzcode in areas where the complication isn't really needed. The zic man page already says that portable time zone names should not use constructs like leading "C:" and/or backslash. It shouldn't be tzcode's job to worry about the limitations of DOS file names, any more than it's tzcode's job to worry about file name restrictions imposed by VMS, z/OS, ISO 9660, etc. ... So, instead of adding a bunch of code to deal with backslashes and drive letters, I'd like to remove the couple of stray lines of code that attempt to deal with this issue and which (obviously) do not suffice to solve the problem.
I agree with you regarding slashes, because (most) Windows API functions allow to use UNIX-style slashes. But support for drive letters is a must have on Windows, as the root (in UNIX terms) of a process is relative to a drive. If a process was started from drive D:, Then '/' points to D:, but if the zonefiles exist on drive C:, they can't be opened. I don't know how to work around this. To fix this, the TZ library needs support for Windows drive-letters or if anyone else can provide suggestions for another approach.
* I think Robert Elz covered the putenv issue fairly well: that is, the problem has already been solved, in a different way.
I've checked latest version, but do not understand how it is fixed. Maybe I missed something in the localtime.c code? I'm afraid I don't fully understand why Robert Elz solution is a fix. Please keep in mind that I'm not living only In the GNU/Linux world. I've to work with HPUX, Solaris, AIX, Windows as well (= not (!) using GNU libs).
If MS-Windows does not have shell scripts, then calling 'system' always fails, right? That should be good enough. (While we're on the topic, I wouldn't mind deprecating the 'y' option on all platforms, as it hasn't been needed for years and it has security implications.) Yes, but yearistype is always needed, and if nobody changes the shell script, why not keep this code into C? It is *much* faster than create a child process to execute the shell script. Moving away from shell scripts may also solves security implications?
As far as I can tell, this merely improves zic's performance a bit on platforms lacking the symlink system call. I wouldn't worry about this. It's better to keep the code simpler and easier to maintain, than to worry about barely-significant tuning of a rarely-used and plenty-fast program, a tuning that helps only nonstandard platforms.
Our own software heavily relies on switching time zones, so any improvement is welcome. Also, that's why we have improved mkime (with the time_succesive() code), as the current approach is pretty inefficient. We are considering to make a LRU cache, to cache the last 2 time zonefiles loaded. Regards, Kees
Date: Wed, 24 May 2017 15:14:58 +0000 From: Kees Dekker <Kees.Dekker@infor.com> Message-ID: <DM5PR02MB2809298A95EC38CE6C2FB046E9FE0@DM5PR02MB2809.namprd02.prod.outlook.com> | | >* I think Robert Elz covered the putenv issue fairly well: | | I've checked latest version, but do not understand how it is fixed. The point (about which I was not all that clear really) is that using putenv is the wrong way (normally) to deal with time zones which the application wants to use (and to use a putenv solution, it has to know the zone - get it from somewhere.) The better solution (which already exists) is to use parallel functions which are told which zone to use explicitly. That way you can operate in as many different zones as you like (well, up to the number that exist) in parallel, with no overhead at all (except the ram they use.) | Yes, but yearistype is always needed, By what? While I am sorry to see it go (in a way) as it provides a flexibility that isn't available any other way, I cannot pretend that it is actually useful anywhere currently, or any time in the recent past - and for the older past, when it was needed, the zones have just been explicitly coded with the results yearistype used to return. What's more, these days there are so many unpredictable zone changes every year, that there are constant new releases, which allows the problems that yearistype used to solve to be handled by just adding more rule lines - in the early days, it was naively assumed that the rules would remain mostly constant for years, and all we needed to be able to do was write down the rule (however complicated.) That was a nice theory... | and if nobody changes the shell script, why not keep this code into C? Because, if it were useful at all, that would defeat the point. The whole idea of yearistype was to be able to easily express relationships between years, and labels, that were unknown when the code was written. Anyone can add a line (or several) to a script to work out which years are "the year of the dog", but the code writers cannot, as they have no idea that this particular information is considered (by someone) to be useful. | Our own software heavily relies on switching time zones, so any | improvement is welcome. The versions of the functions with the _z suffix (including _rz) which take an explicit zone (the result of tzalloc(zone_name) as a parameter should make things run much faster (and more reliably, especially if your applications are multi-threaded) than anything using putenv. | We are considering to make a LRU cache, to cache the last 2 time | zonefiles loaded. You don't need that with the _z functions, they are *all* cached (all the ones that you are using anyway - and the application can decide whether it is better to retain a loaded zone, or save space and discard it, and then load it again later.) kre
On 05/24/2017 10:18 AM, Robert Elz wrote:
While I am sorry to see it go (in a way) as it provides a flexibility that isn't available any other way, I cannot pretend that it is actually useful anywhere currently
Yes, and it's time to take the next step in deprecating yearistype, by issuing warnings when people try to use it, rather than merely documenting that it's deprecated. A proposed patch is attached. Also, due to this 2016-09-06 commit: https://github.com/eggert/tz/commit/6d70edaaf149e35ebb6d9107b216519debcc7f11 zic now chdirs to its destination directory, which means 'zic -y ./yearistype' no longer works unless yearistype is in the destination directory. I expect that nobody has reported the issue because nobody uses yearistype any more.
Yes, and it's time to take the next step in deprecating yearistype, by issuing warnings when people try to use it, rather than merely documenting that it's deprecated. A proposed patch is attached.
Also, due to this 2016-09-06 commit:
https://github.com/eggert/tz/commit/6d70edaaf149e35ebb6d9107b216519debcc7f11
zic now chdirs to its destination directory, which means 'zic -y ./yearistype' no longer works unless yearistype is in the destination directory. I expect that nobody has reported the issue because nobody uses yearistype any more.
Thanks. Kees
The point (about which I was not all that clear really) is that using putenv is the wrong way (normally) to deal with time zones which the application wants to use (and to use a putenv solution, it has to know the zone - get it from somewhere.)
The better solution (which already exists) is to use parallel functions which are told which zone to use explicitly. That way you can operate in as many different zones as you like (well, up to the number that exist) in parallel, with no overhead at all (except the ram they use.)
Please forgive my innocence. It would be great if you provide function names. I assume you are pointing to function names in localtime.c? Do you mean the _rz functions? But then NETBSD_INSPIRED need to be defined (no problem, but may result in another porting issues for me on MS-Windows). ...
Because, if it were useful at all, that would defeat the point. The whole idea of yearistype was to be able to easily express relationships between years, and labels, that were unknown when the code was written. Anyone can add a line (or several) to a script to work out which years are "the year of the dog", but the code writers cannot, as they have no idea that this particular information is considered (by someone) to be useful.
I saw that Paul has started to make the -y option obsolete, thanks. This will solve another non-portable thing for MS-Windows. ...
The versions of the functions with the _z suffix (including _rz) which take an explicit zone (the result of tzalloc(zone_name) as a parameter should make things run much faster (and more reliably, especially if your applications are multi-threaded) than anything using putenv.
I will check and try to use the TZ code with as less modifications as possible.
You don't need that with the _z functions, they are *all* cached (all the ones that you are using anyway - and the application can decide whether it is better to retain a loaded zone, or save space and discard it, and then load it again later.)
Will check. Kees
Date: Mon, 29 May 2017 07:15:49 +0000 From: Kees Dekker <Kees.Dekker@infor.com> Message-ID: <DM5PR02MB280930B966DE287CB603680AE9F30@DM5PR02MB2809.namprd02.prod.outlook.com> | Please forgive my innocence. It would be great if you provide function names. | I assume you are pointing to function names in localtime.c? Yes, those. | Do you mean the _rz functions? Yes, and tzalloc() and one or two which are just _z ... if it has a 'z' suffix in there somewhere, then it is relevant. The 'z' implies that it has a timezone_t as a parameter (which tzalloc() makes from a TZ string). Oh, there's also tzfree() of course - the inverse of tzalloc(). | But then NETBSD_INSPIRED need to be defined Notice just "inspired" not required... I suspect (without actually going to check) that essentially all you get with thet symbol defined will be the ones that you need. | (no problem, but may result in another porting issues for me on MS-Windows). There should not be a lot, and I cannot think of anything of a different nature than you have seen already. One issue may be the 'r' suffix part (thread safe) - but I assume windows has some way to make threaded programs, and hence to make thread safe library functions. kre
Yes, and tzalloc() and one or two which are just _z ... if it has a 'z' suffix in there somewhere, then it is relevant. The 'z' implies that it has a timezone_t as a parameter (which tzalloc() makes from a TZ string). Oh, there's also tzfree() of course - the inverse of tzalloc().
Thanks... ...
There should not be a lot, and I cannot think of anything of a different nature than you have seen already.
One issue may be the 'r' suffix part (thread safe) - but I assume windows has some way to make threaded programs, and hence to make thread safe library functions.
Windows at default (although with some exceptions, especially to POSIX like functions) is multi-threaded aware. But as our application is not, it is not a problem. I will check how I can use these _z and/or _rz functions. I will first try to make an isolated patch proposal for time_succesive(), as asked by Paul (for mktime() performance).
On 05/24/2017 08:14 AM, Kees Dekker wrote:
I agree with you regarding slashes, because (most) Windows API functions allow to use UNIX-style slashes. But support for drive letters is a must have on Windows, as the root (in UNIX terms) of a process is relative to a drive. If a process was started from drive D:, Then '/' points to D:, but if the zonefiles exist on drive C:, they can't be opened.
I don't see why they can't be opened. If the value of the TZDIR macro is "C:/zoneinfo", then calling tzalloc("America/Los_Angeles") opens the file "C:/zoneinfo/America/Los_Angeles" even when the process's working directory is D:/foobar. Although it's true that tzalloc("C:/zoneinfo/America/Los_Angeles") won't access the file in question, we shouldn't need to worry about supporting this unusual use case on MS-Windows. Even on POSIX platforms, using absolute filenames for time zones is a questionable practice and some systems disable it for security reasons. I think Robert covered the other points of your email. (In particular, look for tzalloc, localtime_rz, and mktime_z.)
Also, that's why we have improved mkime (with the time_succesive() code), as the current approach is pretty inefficient. If this change can be separated out and made portable and reliable in the presence of integer overflow and unsigned time_t and so forth, it should be OK. As I recall the proposed change is a long way from that, though. From my point of view the simplest thing to do would be change the code to use GNU mktime, as a build-time option (default off) for people who want speed and don't mind the GPL.
Also, that's why we have improved mkime (with the time_succesive() code), as the current approach is pretty inefficient. If this change can be separated out and made portable and reliable in the presence of integer overflow and unsigned time_t and so forth, it should be OK. As I recall the proposed change is a long way from that, though. From my point of view the simplest thing to do would be change the code to use GNU mktime, as a build-time option (default off) for people who want speed and don't mind the GPL.
Attached a new patch, only covering the time_succesive() part. Inconsistencies in using spaces or tabs in idents (e.g. in time(), init_ttinfo()) are ignored. I'm using the same (inconsistent) mix of idents. These were (partially) fixed in our own code, but omitted in the patch. The code is provided as is. Free to use, no restrictions, under same conditions as IANA provides its software. You are free to adopt, change the code, including rewrite, removal of comments (and the name of my colleague). The core part of time_succesive() is explained in the comments in time_successive(): /* As an initial guess for the correct time_t value, take the utc value of 00:00:00 at the 3rd day of the specified month and year. When calculating that value back to a struct tm, at least the year and the month will be correct. */ The patch has been created on top of latest tz changes (as of today). As said before: the ambiguity for DST changes is different, compared to the original/IANA's implementation. The code does not (yet) check for integer overflows. But hopefully, the patch will already give you an impression about its mindset. Just compile with -DUSE_TIME_SUCCESSIVE to make the new code active. Kees
Kees Dekker wrote:
the patch will already give you an impression about its mindset.
Unfortunately the patch doesn't work well in boundary cases like overflow or leap seconds, and (more generally) it is duplicating functionality that is already present in tzcode. If the patch works for you in your situation that's fine, but it's not generally suitable for tzcode.
Unfortunately the patch doesn't work well in boundary cases like overflow or leap seconds, and (more generally) it is duplicating functionality that is already present in tzcode. If the patch works for you in your situation that's fine, but it's not generally suitable for tzcode.
Ok, no problem. Feel free to use its ideas, as the binary search is quite inefficient. Overflow detection can be added, which will be done, if we have time. I'm not sure whether leap second detection is absent. Anyhow, if you say on beforehand that you are not interested in a fixed version, I don't spend much hurry to it. Kees
participants (11)
-
christos@zoulas.com -
Clive D.W. Feather -
enh -
Guy Harris -
Ian Abbott -
Jaakko Hyvätti -
Kees Dekker -
Paul Eggert -
Paul.Koning@dell.com -
Robert Elz -
Zefram