zic tweak to warn about non-ASCII in filenames
To help ensure that non-ASCII characters don't appear in distribution filenames, changes to zic.c so that the "-v" option warns about them. Both attached and tab-mangled below.

--ado

*** /tmp/,azic.c 2014-06-25 19:32:44.803874900 -0400
--- /tmp/,bzic.c 2014-06-25 19:32:44.906880800 -0400
***************
*** 134,139 ****
--- 134,140 ----
  static int itsdir(const char * name);
  static int lowerit(int c);
  static int mkdirs(char * filename);
+ static void namecheck(const char * name);
  static void newabbr(const char * abbr);
  static zic_t oadd(zic_t t1, zic_t t2);
  static void outzone(const struct zone * zp, int ntzones);
***************
*** 621,632 ****
--- 622,652 ----
      return (errors == 0) ? EXIT_SUCCESS : EXIT_FAILURE;
  }

+ #define BENIGN "+-_/"
+
+ static void
+ namecheck(const char * const name)
+ {
+     register const char * cp;
+
+     if (!noise)
+         return;
+     for (cp = name; *cp != '\0'; ++cp)
+         if (!isascii(*cp) ||
+             (!isalnum(*cp) && strchr(BENIGN, *cp) == NULL)) {
+                 warning(_("file name %s has non-ASCII-alphanumeric character other than %s"),
+                     name, BENIGN);
+                 return;
+         }
+ }
+
  static void
  dolink(const char *const fromfield, const char *const tofield)
  {
      register char * fromname;
      register char * toname;

+     namecheck(tofield);
      if (fromfield[0] == '/')
          fromname = ecpyalloc(fromfield);
      else {
***************
*** 1495,1500 ****
--- 1515,1521 ----
      void *typesptr = ats + timecnt;
      unsigned char *types = typesptr;

+     namecheck(name);
      /*
      ** Sort.
      */
On June 25, 2014 7:37:28 PM EDT, Arthur David Olson <arthurdavidolson@gmail.com> wrote:
To help ensure that non-ASCII characters don't appear in distribution filenames, changes to zic.c so that the "-v" option warns about them. Both attached and tab-mangled below.
It would be better to explicitly check against the Portable File Name Character Set. -GAWollman
Dot is pretty benign in a file name, isn't it?

POSIX defines the portable file name character set as:

3.278 Portable Filename Character Set

The set of characters from which portable filenames are constructed.

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
a b c d e f g h i j k l m n o p q r s t u v w x y z
0 1 2 3 4 5 6 7 8 9 . _ -

( http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_... ) or (http://pubs.opengroup.org/onlinepubs/9699919799/toc.htm and under Base Definitions, section 3 Definitions, and thence to 3.278).

Your list omits . <dot> and adds + <plus> (and includes / <slash> the path separator).
-- Jonathan Leffler <jonathan.leffler@gmail.com> #include <disclaimer.h> Guardian of DBD::Informix - v2013.0521 - http://dbi.perl.org "Blessed are we who can laugh at ourselves, for we shall never cease to be amused."
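For comparison with the BENIGN list in the posted patch, here is a minimal sketch of a check against the Portable Filename Character Set quoted above. The helper name first_nonportable is hypothetical and is not part of zic.c; treating '/' as an allowed separator is an assumption made so that zone names such as America/New_York pass.

```c
#include <stddef.h>
#include <string.h>

/* The POSIX Portable Filename Character Set, as quoted in the message above. */
static const char portable_chars[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789._-";

/*
** Hypothetical helper, not part of zic.c.
** Return a pointer to the first character of NAME that is neither in the
** portable set nor a '/' component separator, or NULL if there is none.
*/
static const char *
first_nonportable(const char *name)
{
    const char *cp;

    for (cp = name; *cp != '\0'; ++cp)
        if (*cp != '/' && strchr(portable_chars, *cp) == NULL)
            return cp;
    return NULL;
}
```

Under this sketch "America/New_York" passes, while "Etc/GMT+10" is flagged at the plus sign, which is the difference from the BENIGN list that the message points out.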
I limited the BENIGN list to characters other than [a-zA-Z0-9] currently used in distribution file names; <dot> could be added. (The "etcetera" file has lines such as "Zone Etc/GMT+10...")

On an unrelated note: I checked; "zic -v" already checks abbreviations and issues warnings.

--ado
If you allow dot, you'll want to add checks that "." and ".." are not used as file name components.

Our guidelines in Theory prevent us from doing this to ourselves. Theory states, "Within a file name component, use only ASCII letters, [dot, hyphen, and underscore]." (Slash delimits components, and plus is used in some Etc zones.) However, Theory also says to "[o]mit [dot] from abbreviations in names".

So the question is: Could we imagine some use case where someone (likely not us) might want to use a dot that makes this worthwhile?

A related question that arose along those lines: Are we already checking that components do not start with a hyphen? Theory specifies that as well.

-- Tim Parenti
Tim Parenti <tim@timtimeonline.com> writes:
If you allow dot, you'll want to add checks that "." and ".." are not used as file name components.
And it shouldn't start with a dot. Andreas. -- Andreas Schwab, SUSE Labs, schwab@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 "And now for something completely different."
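The component-level checks raised in the last two messages (no "." or ".." components, no leading dot, no leading hyphen) could be sketched roughly as follows. This is only an illustration under those stated rules; has_suspect_component is a hypothetical name, not a function in zic.c.

```c
#include <stdbool.h>
#include <string.h>

/*
** Hypothetical helper, not part of zic.c.
** Return true if any '/'-delimited component of NAME begins with '.'
** (which also covers the components "." and "..") or with '-'.
*/
static bool
has_suspect_component(const char *name)
{
    const char *component = name;

    for (;;) {
        /* Length of the current component, up to the next '/' or the end. */
        size_t len = strcspn(component, "/");

        if (len > 0 && (component[0] == '.' || component[0] == '-'))
            return true;
        if (component[len] == '\0')
            return false;
        component += len + 1;   /* Step past the '/'. */
    }
}
```

Checking only the first character of each component keeps the test simple: a hyphen or dot elsewhere in a component, as in "Etc/GMT-10", is not flagged by this sketch.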
Given its use in "filename.ext" names, dot pretty much must be included in any list of file name characters. It isn't used in any current distribution zone names, and its presence might cause Windows shenanigans by making it look as if a name has an extension. Since we're in warning land, I'll advocate for warning about any dots in file names; this eliminates the need for more specific testing.

--ado
Arthur David Olson wrote:
I'll advocate for warning about any dots in file names
That's easy enough, and simplifies the code and documentation; I pushed the attached patches. The second patch documents four other exceptional names I found when I ran the new 'zic -v' against the tz database.
I've not looked at the complete code, but the patch '0001-zic-v-...' contains:

- static char const benign[] = ("-./_"
+ static char const benign[] = ("-/_"
  "abcdefghijklmnopqrstuvwxyz"
  "ABCDEFGHIJKLMNOPQRSTUVWXYZ");
  register char const *component = name;

I don't see the digits in there; are they handled separately? Also, what is the benefit of the parentheses around the string?

-- Jonathan Leffler <jonathan.leffler@gmail.com>
Jonathan Leffler wrote:
I don't see the digits in there; are they handled separately?
No, because the Theory file does not consider digits to be benign. Because the behavior of a TZ setting like "PST8" is specified by POSIX, it's problematic to use 'zic' to define a zone name like that.
What is the benefit of the parentheses around the string?
I thought it'd help GNU Emacs indent it. But it can be reworded to avoid that; I plan to do that as part of my next proposed patch.
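To make the effect of leaving digits out of the benign set concrete, here is a small sketch built on the character list from the patch excerpt quoted earlier in the thread. It is not the code that was pushed; all_benign is a hypothetical helper, and the parentheses around the string are omitted.

```c
#include <stdbool.h>
#include <string.h>

/* Characters taken from the quoted patch excerpt: no digits, no dot, no plus. */
static char const benign[] =
    "-/_"
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

/*
** Hypothetical helper, not part of zic.c.
** Return true if every character of NAME is in the benign set.
*/
static bool
all_benign(char const *name)
{
    /* strspn gives the length of the leading run of benign characters. */
    return name[strspn(name, benign)] == '\0';
}
```

With this set, "America/New_York" is accepted, while a name like "PST8" or "Etc/GMT+10" would draw a warning because of the digit or the plus sign.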
Shouldn't zic *error* if a component is equal to "." or "..", and warn on all other components containing dot?

-- Tim Parenti
In my tests, adding a new Zone line for "America/./New_York" at the end of northamerica, for example, causes the second definition of America/New_York to silently overwrite the first. Disallowing "." alongside ".." effectively requires that all paths be explicit, which allows the "duplicate zone name" error to handle the rest. -- Tim Parenti On 26 June 2014 18:55, Paul Eggert <eggert@cs.ucla.edu> wrote:
Tim Parenti wrote:
Shouldn't zic *error* if a component is equal to "." or ".."
I can see an argument for an error if a component is "..", but why is "." a problem? Can you give a problem scenario involving "."?
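The "America/./New_York" scenario can be illustrated with a short sketch: once "." components are dropped, as the file system effectively does when resolving the path, the two names refer to the same file even though they differ as strings. The helper drop_dot_components is hypothetical, not part of zic.c, and assumes relative names such as zone names.

```c
#include <stddef.h>
#include <string.h>

/*
** Hypothetical helper, not part of zic.c.
** Copy the relative name NAME into OUT, omitting every "." component.
** OUT must have room for strlen(NAME) + 1 bytes.  Empty components and
** a leading '/' are not preserved, so this is for relative names only.
*/
static void
drop_dot_components(const char *name, char *out)
{
    char *op = out;
    const char *component = name;

    for (;;) {
        size_t len = strcspn(component, "/");

        if (!(len == 1 && component[0] == '.')) {
            if (op != out)
                *op++ = '/';    /* Separator before every kept component but the first. */
            memcpy(op, component, len);
            op += len;
        }
        if (component[len] == '\0')
            break;
        component += len + 1;   /* Step past the '/'. */
    }
    *op = '\0';
}
```

A string comparison of "America/./New_York" against "America/New_York" sees two different names, which would explain why a duplicate-name check based on the literal strings does not fire; after this normalization both yield "America/New_York".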
On 2014/06/26 08:58 AM, Andreas Schwab wrote:
Tim Parenti <tim@timtimeonline.com> writes:
If you allow dot, you'll want to add checks that "." and ".." are not used as file name components.
And it shouldn't start with a dot.
Andreas.
Trailing dots are a bit iffy as well, at least on Windows, where any trailing sequence of dots and spaces will get stripped (except when UNC paths are used). -- -=( Ian Abbott @ MEV Ltd. E-mail: <abbotti@mev.co.uk> )=- -=( Tel: +44 (0)161 477 1898 FAX: +44 (0)161 718 3587 )=-
Thanks. I pushed the attached patch, which started with your patch and incorporated suggestions from the followup comments. This patch also modifies 'Theory' to match practice better.
participants (7)
- Andreas Schwab
- Arthur David Olson
- Garrett Wollman
- Ian Abbott
- Jonathan Leffler
- Paul Eggert
- Tim Parenti