Idea for internationalized time point unique time zone abbreviations
== INTRODUCTION ==
If time zone abbreviations are unique at any given point in time, they can be used to find IANA time zones faster. The current abbreviations are not unique; e.g. IST could stand for Indian Standard Time or for Israel Standard Time.
Since the IANA time zone database cutoff point is 1970-01-01T00:00:00Z and the first publication of ISO 3166 alpha-2 country codes occurred in the early 1970s, one could easily use the country codes to separate the sets of abbreviations of different countries.
One rule sometimes used in the IANA time zone database when creating new abbreviations is to use the ISO alpha-2 country codes. The idea presented here applies such a rule rigidly.
Apart from that, the idea also applies rigid rules for marking offset changes such as daylight saving time.
== Features ==
One can derive the country of the time zone to which an abbreviation belongs from the abbreviation itself plus knowledge of that country's current ISO country code. This is not possible with current IANA time zone database abbreviations like CET, IST, EDT, GALT, EAST, CT (e.g. Cuba Time), CUT (1924 Central Ukraine Time).
One can group the abbreviations by country through alphabetic sorting alone. This is not possible with current IANA time zone database abbreviations like CST (US), CT (Cuba), PST (US).
If the year is known, one can identify the time zone for any given abbreviation. This is not possible with current IANA time zone database abbreviations like IST.
It reduces the number of IANA time zones for some abbreviations; e.g. IST is used for Asia/Jerusalem and Asia/Kolkata, while INCT/INZT would refer only to Asia/Kolkata.
== DEFINITIONS ==
=== D0) General ===
D0.1) DST - Daylight saving time
D0.2) Format: "<CC><LC><(OC)>T"; OC is optional, so abbreviations would be four or five letters long. Details in D1 to D4.
=== D1) CC - country code and similar ===
- ISO alpha-2 country code, or
- a special code like "EU", or
- a code from the private use area to define larger regions, e.g.:
-- XA - ASEAN
-- XE - "East", for UTC offset zones having a positive offset
-- XW - "West", for UTC offset zones having a negative offset
-- ZZ - the whole world, used for UTC and UTC offset zones; for details see the examples.
=== D2) LC - location code and similar ===
D2.1) A character from the set [A-Z], unique for each real time zone within the country. Preferably not from the set [SDF] or any letter agreed on in D3 for offset changes.
D2.2) A character from the set [0-9] for numbered zones, e.g. in Russia.
D2.3) The letter C (Common Time) for the most common time; maybe also N, derived from "National Time".
D2.4) The letter Z could be used if there is only one time zone. It could be dropped for countries that have only one zone, but D0.2 makes it mandatory.
D2.5) For UTC offset zones that start with ZZ and use date-time group (DTG) letters: the letter G (not L, to make it harder to change it into E or vice versa). Otherwise either E for East or W for West, as defined in D1.
=== D3) OC - seasonal offset code and similar ===
D3.1) For DST, a letter from the set [SD], preferably "D", since "S" in some contexts stands for "standard time". So even though the term "summer time" is more common in Europe, the preferred presentation uses the unambiguous "D".
D3.2) For wartime, maybe the letter W, or F as in "forward time" (note that W in D2 stands for West).
D3.3) For double summer time: to be defined, maybe M for "midsummer time".
D3.4) In the absence of any extra rules ("standard time"), the letter Z is optional.
D3.5) For UTC offset zones: a digit.
=== D4) T ===
As currently done in English, to indicate "Time".
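As a rough illustration of D0.2 (not part of the original proposal), the following bash sketch splits a candidate abbreviation into CC, LC and OC. The function name parse_tz_abbrev is hypothetical, and it assumes that CC is always two capital letters, that LC and the optional OC are one capital letter or digit each, and that the string ends in T.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split an abbreviation of the form <CC><LC>[<OC>]T
# (D0.2) into its parts. Assumes CC is two capital letters, LC and the
# optional OC are one capital letter or digit each, and the string ends in T.
parse_tz_abbrev() {
  local abbr=$1
  local re='^([A-Z]{2})([A-Z0-9])([A-Z0-9]?)T$'
  if [[ $abbr =~ $re ]]; then
    # A missing OC is shown as "-"
    printf 'CC=%s LC=%s OC=%s\n' \
      "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]:--}"
  else
    printf 'not a D0.2 abbreviation: %s\n' "$abbr" >&2
    return 1
  fi
}
```

For example, parse_tz_abbrev EUCDT prints CC=EU LC=C OC=D, while a current abbreviation such as IST is rejected. The regex only checks the shape; whether a given CC or LC is actually allocated would still need lookup tables built from D1 and D2.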
== EXAMPLES ==
=== E1) No DST ===
CUCT - Cuba (Common) Time #in IANA tzdb northamerica 8.54 is CT
THCT - Thailand (Common) Time
USET - US Eastern Time
USCT - US Central Time
USPT - US Pacific Time
CAET - Canada Eastern Time
CLCT - Chile (Common) Time #Continental Time
CLET - Chile Eastern Time #Easter Island, which in the IANA tzdb is EAS%sT
ECCT - Ecuador (Common) Time #Continental Time
ECGT - Galapagos Time #which in the IANA tzdb is GALT
RUOT - Omsk Time #in IANA tzdb is OMST, which could be read as Oman Summer/Standard Time
RUMT - Moscow Time #maybe RUCT - Russia Central/Common Time, or RUNT - Russia National Time
RUKT - Kaliningrad Time
RUIT - Irkutsk Time
RUVT - Volgograd Time #in IANA tzdb was VOLT
Optional: RU1T - Russia First Time Zone
Optional: RU2T - Russia Second Time Zone
EUCT - Central European Time #Some countries that use this time are not in the EU.
EUWT - Western European Time #See comment for EUCT
EUET - Eastern European Time #See comment for EUCT
XACT - ASEAN Common Time
XE01T - UTC+01
XE08T - UTC+08
XE13T - UTC+13
XW06T - UTC-06
XW11T - UTC-11
ZZZT - UTC
ZZE1T - UTC+01
ZZE8T - UTC+08
ZZEAT - UTC+10 # A = hexadecimal for 10
ZZEDT - UTC+13 # D = hexadecimal for 13
ZZW6T - UTC-06
ZZWBT - UTC-11 # B = hexadecimal for 11
#Letters from the NATO(?) date-time group
#taken from http://de.wikipedia.org/wiki/Date_Time_Group
ZZGZT - UTC±00
ZZGAT - UTC+01
ZZGHT - UTC+08
ZZGKT - UTC+10
ZZGST - UTC-06
ZZGXT - UTC-11
=== E2) With DST ===
#for the meaning of the fourth letter see D3
USEDT
USCDT
USPDT
CAEDT
RUMDT # Daylight saving time or "decree time"
EUCDT #preferred as defined in D3.1
--
Tobias Conradi
Rheinsberger Str. 18
10115 Berlin
Germany
http://tobiasconradi.com/
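The ZZ offset examples in E1 can be decoded mechanically. A minimal bash sketch, assuming only the ZZ forms listed there (E or W followed by one hexadecimal digit, or G followed by a DTG letter); the function name zz_offset is hypothetical and the XE/XW forms are not handled.

```bash
#!/usr/bin/env bash
# Hypothetical decoder for the ZZ examples: prints the UTC offset in hours.
zz_offset() {
  local abbr=$1 re='^ZZ([EWG])([0-9A-Z])T$' lc oc hours rest
  local east=ABCDEFGHIKLM   # DTG letters for +1 .. +12 (J is not used)
  local west=NOPQRSTUVWXY   # DTG letters for -1 .. -12
  if [[ $abbr == ZZZT ]]; then
    echo 0
    return 0
  fi
  if ! [[ $abbr =~ $re ]]; then
    echo "unrecognised: $abbr" >&2
    return 1
  fi
  lc=${BASH_REMATCH[1]}
  oc=${BASH_REMATCH[2]}
  case $lc in
    E | W)
      # OC is one hexadecimal digit, e.g. A = 10, B = 11, D = 13
      if ! [[ $oc == [0-9A-F] ]]; then
        echo "not a hex digit: $oc" >&2
        return 1
      fi
      hours=$((16#$oc))
      [[ $lc == W ]] && hours=$((-hours))
      echo "$hours"
      ;;
    G)
      if [[ $oc == Z ]]; then
        echo 0
      elif [[ $east == *"$oc"* ]]; then
        rest=${east%%"$oc"*}   # letters that precede $oc
        echo $((${#rest} + 1))
      elif [[ $west == *"$oc"* ]]; then
        rest=${west%%"$oc"*}
        echo $((-(${#rest} + 1)))
      else
        echo "not a DTG zone letter: $oc" >&2
        return 1
      fi
      ;;
  esac
}
```

Note that ZZEDT (UTC+13) ends in "DT" just like the DST forms in E2, so a decoder has to branch on CC before interpreting OC, as this sketch does.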
Good idea, but no one has the right to impose a system like this on any country. Time zones and their names are decided at national, regional, and even local government levels, in some cases. The tz database simply tries to capture current tz usage - not apply or impose a standard.
On 2012-06-05 18:42, Tobias Conradi wrote:
[snip: full proposal quoted]
On Wed, Jun 6, 2012 at 2:09 AM, David Patte <dpatte@relativedata.com> wrote:
Good idea, but no one has the right to impose a system like this on any country.
They would be no more imposed than the current abbreviations are. E.g. FET; see the thread at http://mm.icann.org/pipermail/tz/2011-September/008846.html
or, likely, EAST (Chile), GALT (Ecuador) and CT (Cuba) in Spanish-speaking Latin America.
Time zones and their names are decided at national, regional, and even local government levels, in some cases.
And abbreviations are, in some cases, decided by Paul Eggert (proposer) and Arthur David Olson (decider): http://mm.icann.org/pipermail/tz/2011-September/008838.html says: "FET" used as abbreviation for Belarus, Ukraine, and western Russia (thanks to Paul Eggert). And the first thread above shows several users opposing it: Yury, Clive, Tobias.
The tz database simply tries to capture current tz usage - not apply or impose a standard.
At least there is a rule in the Theory file to use ISO 3166-1 under certain conditions.
The idea I presented may be a little extreme if applied to all countries, but abbreviations that have no local usage could perhaps be revised along the lines of the idea. Or people who fork the database could make use of it.
On Wed, Jun 6, 2012 at 2:50 AM, Tobias Conradi <tobias.conradi@gmail.com> wrote:
or likely EAST (Chile), GALT (Ecuador), CT (Cuba) in Spanish-speaking Latin America.
According to ftp://ftp.iana.org/tz/data/northamerica 8.54, it is never CT, as Tobias Conradi claimed, but one of CST/CDT.
EAST is correct, and for daylight saving time it is EASST.
For what it's worth, I'll second David Patte's post that Tobias's idea is a good one (one that would face the usual hurdles of adopting a new international standard). I do have a criticism regarding localizations of the abbreviations (see below), but that doesn't detract in any way from the merit I see in having a unique international standard for the abbreviations.
From my perspective as an app developer, Tobias's proposal would (obviously) make it easier to work with users' identification of timezones.
Here's a tweak to Tobias's idea that, while linguistically inelegant, might be a functional improvement: have the DST code be the final character of the abbreviation string. This would allow computer programs to easily truncate and ignore the DST code, associate the remainder with a timezone name, and get the relevant and correct time data either directly from the tzif file or from a library function. Thus (as in John Haxby's recent example in his post), a self-described infallible user could refer to her zone using the 'winter' abbreviation for a 'summer' date, without any negative repercussions to her self-esteem or to the scheduling of an international conference call between PST and GMT.
CRITICISM: As Andy Lipscomb, Brian Inglis and Peter Machata all recently posted, localizations for all timezone names and abbreviations can (and should) be expected to be in the local language and character set.
Who is responsible for maintaining each locality's localization lists, for example in order to guarantee uniqueness?
The GLIBC library of GNU/Linux currently has a function nl_langinfo, which gives programmers an easy way to localize pretty much all time and date related data elements... but not timezone names or abbreviations. The GNU/Linux coreutils command 'date +%Z' will return a timezone abbreviation, but seemingly never a localized one. I see no option in the coreutils 'date' command to return a timezone name.
So how do locales represent timezone strings in their own localization?
I recently started testing my libhdate collection under a variety of locales, so I had a small script all ready for this test:
=====================================================================
#!/bin/bash
# locale_test.sh
# Execute an arbitrary command under each locale in a list.
# Note: the locales must first be defined using localedef(1). This usually
# requires administrator privileges.
# Each entry takes three array slots: locale, language, country.
locales=(
  he_IL Hebrew Israel
  de_DE German Germany
  pt_BR Portuguese Brazil
  ru_RU Russian Russia
  ar_MA Arabic Morocco
  es_AR Spanish Argentina
  es_MX Spanish Mexico
  fr_FR French France
  fr_CA French Canada
  hi_IN Hindi India
  fa_IR Farsi Iran
  ja_JP Japanese Japan
  ko_KR Korean Korea
  hy_AM Armenian Armenia
  ka_GE Georgian Georgia
  am_ET Amharic Ethiopia
)
for ((i = 0; i < ${#locales[@]}; i += 3)); do
  # Print a highlighted header line, then run the command in that locale.
  printf '\e[48;5;234mcommand: %s, locale: %s %s %s\e[0m\n' \
    "$*" "${locales[i]}" "${locales[i+1]}" "${locales[i+2]}"
  env LC_ALL="${locales[i]}.UTF-8" "$@"
done
=====================================================================
$ ./locale_test.sh TZ='Asia/Tehran' date +"%A_%B_%Z"
command: TZ=Asia/Tehran date +%A_%B_%Z, locale: he_IL Hebrew Israel
רביעי_יוני_IRDT
command: TZ=Asia/Tehran date +%A_%B_%Z, locale: de_DE German Germany
Mittwoch_Juni_IRDT
command: TZ=Asia/Tehran date +%A_%B_%Z, locale: pt_BR Portugese Brazil
quarta_junho_IRDT
command: TZ=Asia/Tehran date +%A_%B_%Z, locale: ru_RU Russian Russia
Среда_Июнь_IRDT
command: TZ=Asia/Tehran date +%A_%B_%Z, locale: ar_MA Arabic Morocco
الأربعاء_يونيو_IRDT
command: TZ=Asia/Tehran date +%A_%B_%Z, locale: es_AR Spanish Argentina
miércoles_junio_IRDT
command: TZ=Asia/Tehran date +%A_%B_%Z, locale: es_MX Spanish Mexico
miércoles_junio_IRDT
command: TZ=Asia/Tehran date +%A_%B_%Z, locale: fr_FR French France
mercredi_juin_IRDT
command: TZ=Asia/Tehran date +%A_%B_%Z, locale: fr_CA French Canada
mercredi_juin_IRDT
command: TZ=Asia/Tehran date +%A_%B_%Z, locale: hi_IN Hindi India
बुधवार _जून_IRDT
command: TZ=Asia/Tehran date +%A_%B_%Z, locale: fa_IR Farsi Iran
چهارشنبه_ژوئن_IRDT
command: TZ=Asia/Tehran date +%A_%B_%Z, locale: ja_JP Japanese Japan
水曜日_6月_IRDT
command: TZ=Asia/Tehran date +%A_%B_%Z, locale: ko_KR Korean Korea
수요일_6월_IRDT
command: TZ=Asia/Tehran date +%A_%B_%Z, locale: hy_AM Armenian Armenia
Չորեքշաբթի_Հունիսի_IRDT
command: TZ=Asia/Tehran date +%A_%B_%Z, locale: ka_GE Georgian Georgia
ოთხშაბათი_ივნისი_IRDT
command: TZ=Asia/Tehran date +%A_%B_%Z, locale: am_ET Amharic Ethiopia
ረቡዕ_ጁን_IRDT
=====================================================================
From my perspective, it would be best if nl_langinfo were extended to include timezone names and abbreviations. That would make it trivial to write code that would be portable across all locales, and that would be able to auto-magically accept timezone names and abbreviations in the user's selected locale.
Of course, nl_langinfo just picks up data from locale definition files, so this would still mean each locale having a complete set of entries for each timezone name and abbreviation (just as each now does for month names and days of the week).
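Boruch's tweak (DST code as the final character, so it can be dropped before lookup) can be sketched in a few lines of bash. Everything here is hypothetical: the tweaked spelling (e.g. USETD, with the DST code after the T), the function name zone_for_abbrev, and the abbreviation-to-zone pairings, which are illustrative only.

```bash
#!/usr/bin/env bash
# Hypothetical table from base abbreviations (DST code removed) to tz zone
# names. The pairings are illustrative only; a real table would need a
# policy for the many zones that share a base abbreviation.
declare -A base_to_zone=(
  [USET]=America/New_York
  [USPT]=America/Los_Angeles
  [RUMT]=Europe/Moscow
)

# Assumes the tweaked layout <CC><LC>T[<DST code>]: a 5-character string
# carries a trailing DST code, which is simply discarded.
zone_for_abbrev() {
  local abbr=$1
  [[ ${#abbr} -eq 5 ]] && abbr=${abbr%?}   # ignore the DST code
  if [[ -z $abbr || -z ${base_to_zone[$abbr]+set} ]]; then
    echo "unknown abbreviation: $1" >&2
    return 1
  fi
  echo "${base_to_zone[$abbr]}"
}
```

Used as TZ=$(zone_for_abbrev USPTD) date, the tzif file rather than the user's choice of letter decides whether daylight saving time applies, which is what makes a "wrong season" abbreviation harmless.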
On 2012-06-06 16:35, Boruch Baum wrote:
From my perspective, it would be best if nl_langinfo would be extended to include timezone names and abbreviations. That would make it trivial to write code that would be portable across all locales, and would be able to auto-magically accept timezone names and abbreviations in the user's selected locale.
Of course, nl_langinfo just picks up data from locale definition files, so this still would mean each locale having a complete set of entries for each timezone name and abbreviation (just as each does now for month names and days of the week).
I guess the main problem here is that the week-day names and month names (in the Gregorian/Julian calendar system) are known internally by glibc, whereas timezone names and abbreviations are external to glibc (and are also a bit of a moving target).
--
-=( Ian Abbott @ MEV Ltd. E-mail: <abbotti@mev.co.uk> )=-
-=( Tel: +44 (0)161 477 1898 FAX: +44 (0)161 718 3587 )=-
On 2012-06-06 16:51, Ian Abbott wrote:
On 2012-06-06 16:35, Boruch Baum wrote:
From my perspective, it would be best if nl_langinfo would be extended to include timezone names and abbreviations. That would make it trivial to write code that would be portable across all locales, and would be able to auto-magically accept timezone names and abbreviations in the user's selected locale.
Of course, nl_langinfo just picks up data from locale definition files, so this still would mean each locale having a complete set of entries for each timezone name and abbreviation (just as each does now for month names and days of the week).
I guess the main problem here is that the week-day names and month names (in the Gregorian/Julian calendar system) are known internally by glibc, whereas timezone names and abbreviations are external to glibc (and are also a bit of a moving target).
Even worse, the user can change the abbreviations to anything she likes by setting the TZ environment variable and calling tzset(). How would _that_ get localized?
On 06/06/2012 11:56 AM, Ian Abbott wrote:
Even worse, the user can change the abbreviations to anything she likes by setting the TZ environment variable and calling tzset(). How would _that_ get localized?
If I understand your case correctly, that wouldn't need to be localized in any special way. Should the infallible user set TZ to a zone for which no tzif exists, the system should functionally ignore it, because it has no tzif to turn to for data.
If you mean that the user was sufficiently sophisticated to first go to the trouble of compiling a custom timezone using 'zic', and then set the TZ, then I'd feel comfortable imposing upon that user the burden of likewise modifying and recompiling the locale definition file.
Am I on track?
On 2012-06-06 17:26, Boruch Baum wrote:
On 06/06/2012 11:56 AM, Ian Abbott wrote:
Even worse, the user can change the abbreviations to anything she likes by setting the TZ environment variable and calling tzset(). How would _that_ get localized?
If I understand your case correctly, that wouldn't need to be localized in any special way. Should the infallible user set TZ to a zone for which no tzif exists, the system should functionally ignore it, because it has no tzif to turn to for data.
If you mean that the user was sufficiently sophisticated to first go to the trouble of compiling a custom timezone using 'zic', and then set the TZ, then I'd feel comfortable imposing upon that user the burden of likewise modifying and recompiling the locale definition file.
Am I on track?
Well, she might expect the abbreviations in the TZ environment variable to be used without localization, in which case there would need to be a hidden flag in the global variables (and in the struct tm for the re-entrant variants of various time-related functions) to say whether the time-zone abbreviations are allowed to be localized.
On 06/06/2012 01:12 PM, Ian Abbott wrote:
Well she might expect the abbreviations in the TZ environment variable to be used without localization, in which case there would need to be a hidden flag in the global variables (and the struct tm for the re-entrant variants of various time-related functions) to say if the time-zone abbreviations are allowed to be localized.
I don't understand any of this. Could you please 'dumb-down' the description of the case and your solution?
On 2012-06-06 18:51, Boruch Baum wrote:
On 06/06/2012 01:12 PM, Ian Abbott wrote:
Well she might expect the abbreviations in the TZ environment variable to be used without localization, in which case there would need to be a hidden flag in the global variables (and the struct tm for the re-entrant variants of various time-related functions) to say if the time-zone abbreviations are allowed to be localized.
I don't understand any of this. Could you please 'dumb-down' the description of the case and your solution?
Well, say she set the TZ environment variable to "AST5" and expected "AST" to be displayed as the timezone abbreviation rather than some translation based on the user's current locale. Actually, "AST" is used as the abbreviation for several real timezones, so even if localization were allowed, the localization would have to depend on the specific timezone it came from as well as on the user's current locale. Of course, Herr Conradi's proposal for unique timezone abbreviations would help in that case.
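Ian's AST5 case is easy to reproduce from the shell. A minimal sketch (the wrapper name show_posix_tz is hypothetical; TZ and date are standard): in a POSIX TZ string the abbreviation is simply whatever the user wrote, followed by the offset in hours west of UTC.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: print the abbreviation and numeric UTC offset that
# date reports for a given TZ value. In a POSIX TZ string such as "AST5",
# "AST" is the user-chosen name and 5 means five hours behind UTC.
show_posix_tz() {
  TZ=$1 date +'%Z %z'
}
```

show_posix_tz AST5 prints AST -0500, and show_posix_tz XYZ-3 prints XYZ +0300; the invented XYZ is echoed just as faithfully as AST, regardless of LC_ALL, so there is no zone identity a localization table could key on.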
On 06/06/2012 11:51 AM, Ian Abbott wrote:
I guess the main problem here is that the week-day names and month names (in the Gregorian/Julian calendar system) are known internally by glibc, whereas timezone names and abbreviations are external to glibc (and are also a bit of a moving target).
Yup. Not trivial. However, GLIBC and coreutils are already picking up timezone information and abbreviations (e.g. date +%z_%Z) from the tzif files, and they already pick up localizations from localedef files, so it's not too far a jump to suggest extending this to an arbitrarily sized list of unique zone names (and, with Tobias's suggestion, unique zone abbreviations).
BTW, a look at the Wikipedia list of tz abbreviations shows a case that would confound some translators of unique zone descriptions, because I suspect that some languages use non-unique words for Arab, Arabian and Arabic:
AST Arab Standard Time (Kuwait, Riyadh) UTC+03
AST Arabian Standard Time (Abu Dhabi, Muscat) UTC+04
AST Arabic Standard Time (Baghdad) UTC+03
I thought that the abbreviations were intended to be the local abbreviations used for the display of a time value within particular timezones and reflect local practice. I didn't think that they were supposed to be unique. The Zone name is unique. Why does a timezone need two unique identifiers? If this proposal is adopted, is the intent to promote the use of the new unique abbreviations as the new local practice? alan On 6/6/12 8:35 AM, Boruch Baum wrote:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
For what it's worth, I'll second David Patte's post that Tobias's idea is a good one (that would face the usual hurdles of adoption of a new international standard). I do have a criticism regarding localizations of the abbreviations (see below), but that doesn't detract in any way from the merit I see in having a unique international standard for the abbreviations.
- From my perspective as an app developer, Tobias's proposal would (obviously) make it easier to work with users' identification of timezones.
Here's a tweak to Tobias's idea that, while linguistically inelegant, might be a functional improvement: Have the DST code be the final character of the abbreviation string. This would allow computer programs to easily truncate and ignore the the DST code, associate the remainder with a timezone name, and get the relevant and correct time data either directly from the tzif file or from a library function. Thus (as in John Haxby's recent example in his post), a self-described infallible user could refer to her zone using the 'winter' abbreviation for a 'summer' date, without any negative repercussions to her self-esteem or to the scheduling of an international conference call between PST and GMT.
CRITICISM: As Andy Lipscomb, Brian Inglis and Peter Machata all recently posted, localizations for all timezone names and abbreviations can (and should) be expected to be in the local language and character set.
Who is/are responsible for maintaining each locality's localization lists, in order, for example, to guarantee uniqueness?
The GLIBC library of GNU/Linux currently has a function nl_langinfo, which gives programmers an easy way to localize pretty much all time and date related data elements... but not timezone names or abbreviations. The GNU/Linux coreutil commandline 'date +%Z' command will return a timezone abbreviation, but seemingly never localized. I see no option in the GNU/Linux coreutil commandline 'date' command to return a timezone name.
So how do locales represent timezone strings in their own localization?
I recently started testing my libhdate collection under a variety of locales, so I had a small script all ready for this test:
===================================================================== #!/bin/bash # locale_test.sh # execute an arbitrary command from a list of locales # note: the locales must first be defined using localedef(1). This usually requires administrator privileges
locales=( \ he_IL Hebrew Israel \ de_DE German Germany \ pt_BR Portuguese Brazil \ ru_RU Russian Russia \ ar_MA Arabic Morocco \ es_AR Spanish Argentina \ es_MX Spanish Mexico \ fr_FR French France \ fr_CA French Canada \ hi_IN Hindi India \ fa_IR Farsi Iran \ ja_JP Japanese Japan \ ko_KR Korean Korea \ hy_AM Armenian Armenia \ ka_GE Georgian Georgia \ am_ET Amharic Ethiopia \ zz )
i=0 while [[ "${locales[$i]}" != "zz" ]]; do echo -e "\e[48;5;234mcommand: $*, locale: ${locales[$i]} ${locales[$i+1]} ${locales[$i+2]}\e[01;00m" env LC_ALL=${locales[$i]}.UTF-8 $* i=$i+3 done
===================================================================== $ ./locale_test.sh TZ='Asia/Tehran' date +"%A_%B_%Z" command: TZ=Asia/Tehran date +%A_%B_%Z, locale: he_IL Hebrew Israel רביעי_יוני_IRDT command: TZ=Asia/Tehran date +%A_%B_%Z, locale: de_DE German Germany Mittwoch_Juni_IRDT command: TZ=Asia/Tehran date +%A_%B_%Z, locale: pt_BR Portugese Brazil quarta_junho_IRDT command: TZ=Asia/Tehran date +%A_%B_%Z, locale: ru_RU Russian Russia Среда_Июнь_IRDT command: TZ=Asia/Tehran date +%A_%B_%Z, locale: ar_MA Arabic Morocco الأربعاء_يونيو_IRDT command: TZ=Asia/Tehran date +%A_%B_%Z, locale: es_AR Spanish Argentina miércoles_junio_IRDT command: TZ=Asia/Tehran date +%A_%B_%Z, locale: es_MX Spanish Mexico miércoles_junio_IRDT command: TZ=Asia/Tehran date +%A_%B_%Z, locale: fr_FR French France mercredi_juin_IRDT command: TZ=Asia/Tehran date +%A_%B_%Z, locale: fr_CA French Canada mercredi_juin_IRDT command: TZ=Asia/Tehran date +%A_%B_%Z, locale: hi_IN Hindi India बुधवार _जून_IRDT command: TZ=Asia/Tehran date +%A_%B_%Z, locale: fa_IR Farsi Iran چهارشنبه_ژوئن_IRDT command: TZ=Asia/Tehran date +%A_%B_%Z, locale: ja_JP Japanese Japan 水曜日_6月_IRDT command: TZ=Asia/Tehran date +%A_%B_%Z, locale: ko_KR Korean Korea 수요일_6월_IRDT command: TZ=Asia/Tehran date +%A_%B_%Z, locale: hy_AM Armenian Armenia Չորեքշաբթի_Հունիսի_IRDT command: TZ=Asia/Tehran date +%A_%B_%Z, locale: ka_GE Georgian Georgia ოთხშაბათი_ივნისი_IRDT command: TZ=Asia/Tehran date +%A_%B_%Z, locale: am_ET Amharic Ethiopia ረቡዕ_ጁን_IRDT =====================================================================
From my perspective, it would be best if nl_langinfo were extended to include timezone names and abbreviations. That would make it trivial to write code that would be portable across all locales, and would be able to auto-magically accept timezone names and abbreviations in the user's selected locale.
Of course, nl_langinfo just picks up data from locale definition files, so this still would mean each locale having a complete set of entries for each timezone name and abbreviation (just as each does now for month names and days of the week).
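The locale data that nl_langinfo() draws on can be inspected from the shell with glibc's locale(1), which prints the same LC_TIME items (here the month names MON_1..MON_12 and weekday names DAY_1..DAY_7). This is a minimal sketch, assuming glibc's locale(1) and that any non-C locale passed in is installed; show_lc_time is a hypothetical helper. The %Z abbreviation in the results above does not come from this data, which is why IRDT is identical in every locale.

```bash
#!/bin/bash
# show_lc_time LOCALE: print the month and weekday names a locale provides.
# locale(1) reads the same LC_TIME data that nl_langinfo() returns.
# Assumes glibc's locale(1); LOCALE must be installed (C always is).
show_lc_time() {
  local loc=$1
  printf 'months: %s\n' "$(LC_ALL=$loc locale mon)"
  printf 'days:   %s\n' "$(LC_ALL=$loc locale day)"
}

show_lc_time C
# show_lc_time de_DE.UTF-8   # only if that locale has been generated
```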
Alan Perry said:
The Zone name is unique. Why does a timezone need two unique identifiers?
I agree: one unique identifier seems sufficient, and is consistent with general database principles. I believe abbreviations are a UI problem and not directly related to calculating wall-clock time in a particular location. I think we should keep them, for convenience, but ultimately it's up to implementers to display meaningful, localized, user-consumable strings on a UI. It's not up to a lower-level system dealing with time zone calculations. David Braverman http://www.inner-drive.com/Demo/TimeZones.aspx
On 06/06/2012 12:16 PM, Alan Perry wrote:
I thought that the abbreviations were intended to be the local abbreviations used for the display of a time value within particular timezones and reflect local practice. I didn't think that they were supposed to be unique. The Zone name is unique. Why does a timezone need two unique identifiers?
I can't speak for the OP, but ... The attraction of the proposal to me is that there be unique identifiers that are easy to type, easy to work with, and that will be sufficiently attractive to people and mass media that they actually use the identifiers to the extent that what they input into a computer program / mobile app corresponds to something that can be parsed.
If this proposal is adopted, is the intent to promote the use of the new unique abbreviations as the new local practice?
I don't think that would ever work; the global trend is to accept diversity and multiplicity of locales. People would continue to refer to timezones using abbreviations and names in their own languages and character sets (see recent posts by Andy Lipscomb re French Canadians, and Peter Machata re the Czech Republic). Rather, the idea would be to start with a baseline international standard of unique abbreviations (and their text descriptions); each locale would then have localization strings, unique within that locale, corresponding to the baseline. The result could allow international portability of computer programs for timezone information similar to that which already exists for Gregorian month names and days of the week (using, in the case of GNU/Linux, the GLIBC nl_langinfo() function). My taking it for granted that the baseline will be in Latin characters and English is just a consequence of that language's stature at this point in history. Two thousand years ago, we would be e-mailing each other in Greek.
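The baseline-plus-localization idea amounts to a two-level lookup: the locale's own string where one exists, otherwise the unique baseline abbreviation. This is a minimal sketch under stated assumptions: bash 4+ associative arrays stand in for locale data, localize_baseline_abbr and its table are hypothetical, and the baseline abbreviations CZZT, CZZDT and IRZDT were built by hand from the proposal's <CC><LC><(OC)>T pattern, not taken from any agreed list.

```bash
#!/bin/bash
# Hypothetical table: "<locale>|<baseline abbreviation>" -> localized string.
# Requires bash 4+ (associative arrays).
declare -A tz_abbr_l10n=(
  ["cs_CZ|CZZT"]="SEČ"    # Central European time, Czech abbreviation
  ["cs_CZ|CZZDT"]="SELČ"  # Central European summer time, Czech abbreviation
)

# localize_baseline_abbr LOCALE ABBR
# Print the locale's string for ABBR, falling back to the baseline itself.
localize_baseline_abbr() {
  local loc=${1%%.*} abbr=$2   # drop a codeset suffix such as ".UTF-8"
  printf '%s\n' "${tz_abbr_l10n["$loc|$abbr"]:-$abbr}"
}

localize_baseline_abbr cs_CZ.UTF-8 CZZDT   # SELČ
localize_baseline_abbr fa_IR.UTF-8 IRZDT   # IRZDT (no entry, baseline shown)
```

In the nl_langinfo approach discussed in this thread, such a table would live in each locale's definition files rather than in the program.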
I think it is a waste of time to worry about the timezone abbreviations in the tzdatabase. Despite being around for years, the vast majority of the ones that are in the files are not familiar to people in the timezones in question, and are thus essentially useless. Moreover, they are only in a single language (English) and script (Latin). Time spent talking about them would be far better used doing something productive.
"I think it is a waste of time to worry about the timezone abbreviations in the tzdatabase. Despite being around for years, the vast majority of the ones that are in the files are not familiar to people in the timezones in question, and are thus essentially useless. Moreover, they are only in a single language (English) and script (Latin). Time spent talking about them would be far better used doing something productive."
Not to mention, there's already an effort to gather localized names for zones in the CLDR.
On 06/06/2012 01:42 PM, Andy Lipscomb wrote:
Not to mention, there's already an effort to gather localized names for zones in the CLDR.
Thank you for the reference, Andy.
On the contrary, I think people are far more familiar with their timezone abbreviations than they are with the Olson timezone region names.
On 2012-06-06 13:36, Mark Davis ☕ wrote:
I think it is a waste of time to worry about the timezone abbreviations in the tzdatabase.
Despite being around for years, the vast majority of the ones that are in the files are not familiar to people in the timezones in question, and are thus essentially useless. Moreover, they are only in a single language (English) and script (Latin). Time spent talking about them would be far better used doing something productive.
--
on the contrary
First, I wasn't saying that the TZDB identifiers were commonly used externally; they are not. Nobody writes "I'll meet you at 12:30 Europe/Paris". Secondly, I said the "vast majority". Some of the abbreviations are quite familiar, at least to those familiar with the timezones in question, because they match what is in common use. People living in the US, or English speakers outside it who have a lot to do with the US, would recognize "CST". Tell a Japanese person "I'll send the message at 12:30 AMST" and see what he or she understands. In CLDR we ended up dropping almost all abbreviations, because people (whether familiar with the timezones in question or not) didn't recognize them. It's not as if the TZDB is using the wrong abbreviation (at least for English); most often there just isn't an accepted abbreviation in a given language for a particular timezone.
Mark <https://plus.google.com/114199149796022210033>
"Il meglio è l’inimico del bene" (The best is the enemy of the good.)
On Thu, Jun 7, 2012 at 5:09 PM, Mark Davis ☕ <mark@macchiato.com> wrote:
Secondly, I said the "vast majority". Some of the abbreviations are quite familiar, at least for those familiar with the timezones in question, because they match what is in common use. People living in the US, or English speakers outside that have a lot to do with the US would recognize "CST". Tell a Japanese "I'll send the message at 12:30 AMST" and see what he or she understands. In CLDR we ended up dropping almost all abbreviations, because people (whether familiar with the timezones in question or not) didn't recognize them. It's not as if the TZDB is using the wrong abbreviation (at least for English), it's most often that there just isn't an accepted abbreviation in a given language for a particular timezone.
I live in a country that lies entirely within one time zone (Italy), so sensitivity to the issue is lower than in a country spanning several zones. The only way I see other time zones mentioned in the media or in everyday speech is the equivalent of "time of a well-known big city", like "London time", "New York time", "Tokyo time". FWIW. P.
On 06/07/2012 11:38 AM, Pierpaolo Bernardi wrote:
I live in a country comprised entirely in one time zone (Italy), so the sensibility to the issue is lower than in a country spanning more zones. The only way I see other time zones mentioned in media/common people speech is the equivalent of "time of a well known big city" like, "london time", "new york time", "tokyo time".
How would you prefer to see (and be able to program) that information in a computer interface?
For example, currently the coreutils command-line utility 'date' gives you only two kinds of format option for presenting a timezone: 1] %z - a numeric UTC offset, +hhmm (GNU date also has %:z for +hh:mm); 2] %Z - a tz abbreviation, available only in Latin characters and derived only from the English (and non-IANA) timezone name (e.g. CST for US Central Standard Time, not Australian Central Standard Time).
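The two kinds of output can be compared without any tzdata installed by using a POSIX TZ rule string, in which the abbreviation is simply whatever label the string supplies. This is a minimal sketch assuming GNU date (for -d @SECONDS and the %:z extension); XYZ is an arbitrary placeholder abbreviation and show_tz_formats is a hypothetical helper.

```bash
#!/bin/bash
# show_tz_formats TZSTRING: print the numeric offset forms and the abbreviation
# at the Unix epoch (-d @0). Requires GNU date (%:z is a GNU extension).
show_tz_formats() {
  TZ=$1 date -d @0 +'%z %:z %Z'
}

# "XYZ-3:30" is a POSIX rule: abbreviation XYZ, 3h30m east of UTC
# (POSIX inverts the sign). No zoneinfo file is consulted.
show_tz_formats 'XYZ-3:30'    # +0330 +03:30 XYZ
```

Whatever label the TZ value carries is printed unchanged in every locale, which is the behaviour seen with IRDT earlier in the thread.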
On Fri, Jun 8, 2012 at 2:38 PM, Boruch Baum <boruch_baum@gmx.com> wrote:
On 06/07/2012 11:38 AM, Pierpaolo Bernardi wrote:
I live in a country comprised entirely in one time zone (Italy), so the sensibility to the issue is lower than in a country spanning more zones. The only way I see other time zones mentioned in media/common people speech is the equivalent of "time of a well known big city" like, "london time", "new york time", "tokyo time".
How would you prefer to see (and be able to program) that information in a computer interface?
The proposal that started this thread, or a refinement of it, looks sensible to me. If widely adopted, it would simplify this big mess of local times a bit. A big IF this is. 8^) P.
Just FYI. In Unicode CLDR we needed to define a set of abbreviations for timezone IDs. They have a different purpose than what's been discussed here; they are purely internal ids, and only required because of restrictions in BCP47 (so they all needed to be sequences of 3 to 8 ASCII alphanumerics - case not significant). What we did was use the United Nations LOCODE values whenever available, which are all 5 characters long and start with the country code. When there wasn't one available, we used values that were not of length 5 so that they wouldn't collide with future values. So America/Los_Angeles gets "uslax", while Etc/GMT-1 gets "utce01". http://unicode.org/repos/cldr/tags/release-21-0-2/common/bcp47/timezone.xml
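The length-5 convention makes the two kinds of CLDR short id easy to tell apart mechanically. This is a minimal sketch; classify_bcp47_tz_id is a hypothetical helper that only checks the syntax described above (3 to 8 ASCII alphanumerics, case not significant) and does not consult the CLDR timezone.xml data.

```bash
#!/bin/bash
# classify_bcp47_tz_id ID -> "locode <cc>", "other", or "invalid".
# Length 5 is reserved for UN/LOCODE-derived ids, whose first two letters
# are the country code; ids of any other length were assigned by CLDR.
classify_bcp47_tz_id() {
  local id=${1,,}   # case is not significant (bash 4+ lowercasing)
  if [[ ! $id =~ ^[a-z0-9]{3,8}$ ]]; then
    echo invalid
  elif (( ${#id} == 5 )); then
    echo "locode ${id:0:2}"
  else
    echo other
  fi
}

classify_bcp47_tz_id USLAX                 # locode us
classify_bcp47_tz_id utce01                # other
classify_bcp47_tz_id America/Los_Angeles   # invalid
```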
On 2012-06-08 18:10, Mark Davis ☕ wrote:
Just FYI.
In Unicode CLDR we needed to define a set of abbreviations for timezone IDs. They have a different purpose than what's been discussed here; they are purely internal ids, and only required because of restrictions in BCP47 (so they all needed to be sequences of 3 to 8 ASCII alphanumerics - case not significant).
What we did was use the United Nations LOCODE values whenever available, which are all 5 characters long and start with the country code. When there wasn't one available, we used values that were not of length 5 so that they wouldn't collide with future values. So America/Los_Angeles gets "uslax", while Etc/GMT-1 gets "utce01".
http://unicode.org/repos/cldr/tags/release-21-0-2/common/bcp47/timezone.xml
Those are abbreviations for the zone names, but a particular zone may need different abbreviations at different times, depending on daylight savings. We probably don't want more than 6 characters for the abbreviations, which is the SUSv3 value of {_POSIX_TZNAME_MAX}. Incidentally, Microsoft use the older value 3 for _POSIX_TZNAME_MAX and the value 10 for TZNAME_MAX, but their tzname[] values are typically longer than that, e.g. "GMT Standard Time" and "GMT Daylight Time". They don't make for easy parsing either! -- -=( Ian Abbott @ MEV Ltd. E-mail: <abbotti@mev.co.uk> )=- -=( Tel: +44 (0)161 477 1898 FAX: +44 (0)161 718 3587 )=-
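Whether a proposed abbreviation can be written in a POSIX TZ string at all can be checked mechanically. This is a minimal sketch assuming the POSIX TZ syntax: a name is at least 3 bytes long, the unquoted form is alphabetic only, and the quoted <...> form additionally allows digits, '+' and '-'. The default limit of 6 is the {_POSIX_TZNAME_MAX} value Ian cites; a system's actual TZNAME_MAX may be larger. posix_tzname_form is a hypothetical helper.

```bash
#!/bin/bash
# posix_tzname_form NAME [MAX] -> "unquoted", "quoted" (must be written <NAME>), or "invalid".
# MAX defaults to 6, i.e. {_POSIX_TZNAME_MAX}.
posix_tzname_form() {
  local name=$1 max=${2:-6}
  # Spell the character sets out so matching does not depend on the locale.
  local alpha=ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz digits=0123456789
  if (( ${#name} < 3 || ${#name} > max )); then
    echo invalid
  elif [[ $name =~ ^[$alpha]+$ ]]; then
    echo unquoted
  elif [[ $name =~ ^[$alpha$digits+-]+$ ]]; then
    echo quoted
  else
    echo invalid
  fi
}

posix_tzname_form IRZDT                 # unquoted
posix_tzname_form +0330                 # quoted, i.e. written as <+0330>
posix_tzname_form 'GMT Standard Time'   # invalid (too long, contains spaces)
```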
"They have a different purpose than what's been discussed here." ------------------------------ Mark <https://plus.google.com/114199149796022210033> * * *— Il meglio è l’inimico del bene —* ** On Fri, Jun 8, 2012 at 12:31 PM, Ian Abbott <abbotti@mev.co.uk> wrote:
On 2012-06-08 18:10, Mark Davis ☕ wrote:
Just FYI.
In Unicode CLDR we needed to define a set of abbreviations for timezone IDs. They have a different purpose than what's been discussed here; they are purely internal ids, and only required because of restrictions in BCP47 (so they all needed to be sequences of 3 to 8 ASCII alphanumerics - case not significant).
What we did was use the United Nations LOCODE values whenever available, which are all 5 characters long and start with the country code. When there wasn't one available, we used values that were not of length 5 so that they wouldn't collide with future values. So America/Los_Angeles gets "uslax", while Etc/GMT-1 gets "utce01".
http://unicode.org/repos/cldr/**tags/release-21-0-2/common/** bcp47/timezone.xml<http://unicode.org/repos/cldr/tags/release-21-0-2/common/bcp47/timezone.xml>
Those are abbreviations for the zone names, but a particular zone may need different abbreviations at different times, depending on daylight savings. We probably don't want more than 6 characters for the abbreviations, which is the SUSv3 value of {_POSIX_TZNAME_MAX}.
Incidentally, Microsoft use the older value 3 for _POSIX_TZNAME_MAX and the value 10 for TZNAME_MAX, but their tzname[] values are typically longer than that, e.g. "GMT Standard Time" and "GMT Daylight Time". They don't make for easy parsing either!
-- -=( Ian Abbott @ MEV Ltd. E-mail: <abbotti@mev.co.uk> )=- -=( Tel: +44 (0)161 477 1898 FAX: +44 (0)161 718 3587 )=-
On 6/6/12 10:16 AM, Boruch Baum wrote:
On 06/06/2012 12:16 PM, Alan Perry wrote:
I thought that the abbreviations were intended to be the local abbreviations used for the display of a time value within particular timezones and reflect local practice. I didn't think that they were supposed to be unique. The Zone name is unique. Why does a timezone need two unique identifiers?
I can't speak for the OP, but ...
The attraction of the proposal to me is that there be unique identifiers that are easy to type, easy to work with, and that will be sufficiently attractive to people and mass media that they actually use the identifiers to the extent that what they input into a computer program / mobile app corresponds to something that can be parsed.
I understand why the idea is attractive.
Is the proposal to add the unique abbreviation as an additional attribute of the Zone or replace the existing abbreviation (the Format attribute)? alan
Boruch Baum <boruch_baum@gmx.com> wrote:
|CRITICISM: As Andy Lipscomb, Brian Inglis and Peter Machata all
|recently posted, localizations for all timezone names and
|abbreviations can (and should) be expected to be in the local language
|and character set.
Hmm.
|Who is/are responsible for maintaining each locality's localization
|lists, in order, for example, to guarantee uniqueness?
I would appreciate it if there were a mapping table from the zone name to a "real" and "official" English name of the zone, also to compensate a bit for the abbreviations which are used. But localization of that (the table and the timezone strings)? This is an immense amount of work, and who is going to maintain it?
|The GLIBC library of GNU/Linux currently has a function nl_langinfo,
|which gives programmers an easy way to localize pretty much all time
|and date related data elements... but not timezone names or
|abbreviations.
This function is part of POSIX ([1] recent standard). And it's a dust-dry interface, just like many other localization efforts. Is localization at all possible in such a simple-minded way, through variables which represent format strings and things like "the ante-meridiem affix"? There are people out there who don't even use words to communicate, numerical systems which don't know about '0' and use special symbols for values like '1000' or '10000', very different calendars ...
[.]
|So how do locales represent timezone strings in their own localization?
[.]
|From my perspective, it would be best if nl_langinfo were extended
|to include timezone names and abbreviations. That would make it
|trivial to write code that would be portable across all locales, and
|would be able to auto-magically accept timezone names and
|abbreviations in the user's selected locale.
|
|Of course, nl_langinfo just picks up data from locale definition
|files, so this still would mean each locale having a complete set of
|entries for each timezone name and abbreviation (just as each does now
|for month names and days of the week).
I guess the only really satisfying solution would be something like the Berkeley packet filter or similar -- i.e., pseudo-ops that will be executed and which produce some string as output. That is to say, language/locale-specific but programming-language-independent code, which can compute and reorder things as necessary. I fail to see how you could gain real internationalization with the current tools otherwise. But the TZ database gives you data and tools to compute the correct time -- isn't that just fantastic.
Thanks,
--steffen
[1] http://pubs.opengroup.org/onlinepubs/9699919799/
On 2012-06-06 09:35, Boruch Baum wrote:
CRITICISM: As Andy Lipscomb, Brian Inglis and Peter Machata all recently posted, localizations for all timezone names and abbreviations can (and should) be expected to be in the local language and character set.
Who is/are responsible for maintaining each locality's localization lists, in order, for example, to guarantee uniqueness?
The GLIBC library of GNU/Linux currently has a function nl_langinfo, which gives programmers an easy way to localize pretty much all time and date related data elements... but not timezone names or abbreviations. The GNU/Linux coreutil commandline 'date +%Z' command will return a timezone abbreviation, but seemingly never localized. I see no option in the GNU/Linux coreutil commandline 'date' command to return a timezone name.
So how do locales represent timezone strings in their own localization?
From my perspective, it would be best if nl_langinfo were extended to include timezone names and abbreviations. That would make it trivial to write code that would be portable across all locales, and would be able to auto-magically accept timezone names and abbreviations in the user's selected locale.
Of course, nl_langinfo just picks up data from locale definition files, so this still would mean each locale having a complete set of entries for each timezone name and abbreviation (just as each does now for month names and days of the week).
CLDR would need to collect timezone names and abbreviations for each combination of Olson/IANA timezone (which specifies an associated territory/country code in the zone.tab file), standard/daylight state, and language used in that territory. So for any timezone and territory, localization is likely to be possible only for the languages used in that territory for that timezone and state (or the subset provided in the CLDR, which may be only the officially supported languages). For Canada, English and French names and abbreviations are provided, but does how those are rendered into, say, Chinese, Arabic, or Hebrew depend on whether the populations using those languages are in North American English, French, or their native locales? So limited localization may be supported by CLDR, but it is likely to take a large effort and a long time before internationalization of timezone names and abbreviations could be supported, as the locales required could be a large subset of the product of all languages and territories.
Boruch Baum <boruch_baum@gmx.com> writes:
CRITICISM: As Andy Lipscomb, Brian Inglis and Peter Machata all recently posted, localizations for all timezone names and abbreviations can (and should) be expected to be in the local language and character set.
Who is/are responsible for maintaining each locality's localization lists, in order, for example, to guarantee uniqueness?
In Fedora Linux in particular, zone identifier translations are stuffed into system-config-date. I recall that Debian has a dedicated package for that. In both these cases, the translation is done by calling gettext on a time zone identifier. There are some improvements to be made. In particular, the zone ID prefixes (e.g. "America") get translated each time a new zone appears. Typos are not unheard of. This could be solved by translating each fragment separately (providing context to disambiguate namespace collisions) [1]. Nils Philippsen and I have been about to publish a zoneinfo translation repository for some time now, but can't seem to get around to it. As a bonus you would then be able to partially translate even new zones. E.g., you could translate America/Indiana/NewZone into a hybrid Америка/Индиана/NewZone, which is probably better than the bare thing. (At least it sorts correctly.) So this can be solved, and is, away from the zoneinfo project, and there are existing Open Source translation efforts for time zone strings in particular. Of course this is only for presentation purposes. You don't translate the contents of $TZ, much as you don't translate program option names or library function names. gettext might be kinda sorta used for translating abbreviations as well. One would need to mangle the abbreviation, e.g. "Europe/Prague//CET", to construct a unique string, which is admittedly quite awkward. (Also, glibc wouldn't know to do it when formatting localized date strings.) Then you could shove this into gettext and get back "SEČ" as you expect. Scripts could be written to sort the abbreviations into pools of equality, so that "Europe/Prague//CET" doesn't have to be translated separately from "Europe/Bratislava//CET" etc. [1] Scripts for this are available: https://github.com/pmachata/zoneinfo-localization
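The fragment-by-fragment approach can be sketched as follows. This is not taken from the zoneinfo-localization scripts; it is a minimal bash 4.3+ sketch with a hypothetical one-language catalogue (zone_fragments_ru) and a hypothetical helper translate_zone_id, and it omits the context strings needed to disambiguate identical fragments.

```bash
#!/bin/bash
# Hypothetical Russian catalogue of zone ID fragments (bash 4.3+ for namerefs).
declare -A zone_fragments_ru=(
  [America]="Америка"
  [Indiana]="Индиана"
)

# translate_zone_id CATALOGUE ID
# Translate each "/"-separated fragment of ID; fragments without a catalogue
# entry (such as a brand-new zone) are kept as-is.
translate_zone_id() {
  local -n catalogue=$1
  local -a parts out=()
  local part
  IFS=/ read -ra parts <<< "$2"
  for part in "${parts[@]}"; do
    out+=("${catalogue[$part]:-$part}")
  done
  local IFS=/
  printf '%s\n' "${out[*]}"
}

translate_zone_id zone_fragments_ru America/Indiana/NewZone   # Америка/Индиана/NewZone
```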
The GLIBC library of GNU/Linux currently has a function nl_langinfo, which gives programmers an easy way to localize pretty much all time and date related data elements... but not timezone names or abbreviations. The GNU/Linux coreutil commandline 'date +%Z' command will return a timezone abbreviation, but seemingly never localized. I see no option in the GNU/Linux coreutil commandline 'date' command to return a timezone name.
That's because the TZ identifier is not stored in zoneinfo file. Doing so would prevent us from hardlinking equal zones to save disk space. (Though I don't know if this is the reason the TZID is absent from the file, or the hardlinking trick is the consequence of this.) Furthermore, the time and date functions just look into /etc/localtime, which is a _copy_ of the zoneinfo file for your time zone. It's a copy for historical reasons: zoneinfo would be installed into /usr/share, and /usr could be separately mounted, and you need correct time even if the mount fails, or before it happens. I'm not sure how relevant this is these days. In Fedora 17 in particular, /usr is now the canonical location for system files. /bin /lib* etc. are just symlinks to /usr. But in any case, "date" and glibc simply don't know, in general, what the zone ID is. It's just /etc/localtime. Thanks, PM
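Because the compiled file carries no identifier, a program can at best guess the zone ID from its surroundings. This is a best-effort sketch assuming common GNU/Linux conventions: either TZ names a file under the zoneinfo directory, or /etc/localtime is a symlink whose target contains "zoneinfo/". guess_zone_id is hypothetical; when /etc/localtime is a plain copy, as described above, it can only report failure.

```bash
#!/bin/bash
# guess_zone_id [LOCALTIME] [ZONEINFO_DIR]
# Print a best-effort tz identifier, or return 1 if none can be determined.
# Requires bash 4.2+ for [[ -v ]].
guess_zone_id() {
  local localtime=${1:-/etc/localtime} zoneinfo=${2:-/usr/share/zoneinfo}
  local tz target
  if [[ -v TZ ]]; then                  # TZ is set (even to ""), so it wins
    tz=${TZ#:}                          # accept the ":Europe/Prague" form
    if [[ -n $tz && -f $zoneinfo/$tz ]]; then
      printf '%s\n' "$tz"
      return 0
    fi
    return 1                            # empty, or a rule such as "XYZ-3:30"
  fi
  if [[ -L $localtime ]]; then
    target=$(readlink "$localtime")     # one level, e.g. ../usr/share/zoneinfo/Europe/Prague
    if [[ $target == *zoneinfo/* ]]; then
      printf '%s\n' "${target#*zoneinfo/}"
      return 0
    fi
  fi
  return 1                              # plain copy: the zone ID is not recorded
}
```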
<<On Thu, 07 Jun 2012 18:53:20 +0200, Petr Machata <pmachata@redhat.com> said:
In Fedora Linux in particular zone identifier translations are stuffed at system-config-date. I recall that Debian has a dedicated package for that. In both these cases, the translation is done by calling gettext on a time zone identifier.
That seems like a really odd thing to do. Why not translate zone.tab? That's what user interfaces should be using to select zones anyway. The TZid is just a token you can hand to the C library (the fact that it's also a pathname is just an implementation detail).
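Working from zone.tab, a user interface can start from a country and offer only that country's identifiers. This is a minimal sketch; the tab-separated layout (country code, coordinates, TZ, optional comments; '#' lines are comments) is that of zone.tab, while the helper name zones_for_country and the default path are assumptions that vary by system.

```bash
#!/bin/bash
# zones_for_country CC [ZONE_TAB]
# Print the TZ column of every zone.tab row whose country code is CC.
zones_for_country() {
  local cc=$1 tab=${2:-/usr/share/zoneinfo/zone.tab}
  awk -F '\t' -v cc="$cc" '!/^#/ && $1 == cc { print $3 }' "$tab"
}

# Usage: zones_for_country IL
```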
That's because the TZ identifier is not stored in zoneinfo file. Doing so would prevent us from hardlinking equal zones to save disk space. (Though I don't know if this is the reason the TZID is absent from the file, or the hardlinking trick is the consequence of this.)
In recent releases of FreeBSD the tzsetup utility automatically stores the installed zone's TZid in /var/cache/zoneinfo so that it can be reinstalled without user intervention when the tzdata is updated using "tzsetup -r". Many FreeBSD machines still have a separate /usr partition because that is the traditional partitioning setup. -GAWollman
Garrett Wollman <wollman@csail.mit.edu> writes:
That seems like a really odd thing to do. Why not translate zone.tab? That's what user interfaces should be using to select zones anyway. The TZid is just a token you can hand to the C library (the fact that it's also a pathname is just an implementation detail).
We translate some of the strings in zone.tab, yes. system-config-date is a GUI application for clock management that presents a map and a list of zones. The zone IDs and descriptions need to be localized; that's why we translate them.
That's because the TZ identifier is not stored in zoneinfo file. Doing so would prevent us from hardlinking equal zones to save disk space. (Though I don't know if this is the reason the TZID is absent from the file, or the hardlinking trick is the consequence of this.)
In recent releases of FreeBSD the tzsetup utility automatically stores the installed zone's TZid in /var/cache/zoneinfo so that it can be reinstalled without user intervention when the tzdata is updated using "tzsetup -r". Many FreeBSD machines still have a separate /usr partition because that is the traditional partitioning setup.
It's similar in Linux as well, the file is /etc/sysconfig/clock in Fedora. Then we have a trigger on tzdata package that copies the new zone over when tzdata is updated. But nothing prevents a user from copying stuff over by hand, at which point the information in /etc/sysconfig/clock is wrong. Thanks, PM
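The mismatch described here can at least be detected by comparing the recorded zone's file with /etc/localtime. This is a minimal sketch; it assumes the record is a shell-style ZONE=... line (treat that layout of /etc/sysconfig/clock as an assumption), and read_recorded_zone and recorded_zone_matches are hypothetical helpers. Because equal zones may be hardlinked, a byte-for-byte match shows only that the rules agree, not that the recorded name is the one the user chose.

```bash
#!/bin/bash
# read_recorded_zone FILE: print the value of the first ZONE=... line (quotes optional).
read_recorded_zone() {
  local line re='^ZONE="?([^"]*)"?$'
  while IFS= read -r line || [[ -n $line ]]; do
    if [[ $line =~ $re ]]; then
      printf '%s\n' "${BASH_REMATCH[1]}"
      return 0
    fi
  done < "$1"
  return 1
}

# recorded_zone_matches ZONE [LOCALTIME] [ZONEINFO_DIR]
# Succeed if LOCALTIME is byte-identical to the recorded zone's compiled file.
recorded_zone_matches() {
  local zone=$1 localtime=${2:-/etc/localtime} zoneinfo=${3:-/usr/share/zoneinfo}
  [[ -n $zone && -f $zoneinfo/$zone ]] && cmp -s "$zoneinfo/$zone" "$localtime"
}
```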
<<On Thu, 07 Jun 2012 20:11:53 +0200, Petr Machata <pmachata@redhat.com> said:
We translate some of the strings in zone.tab, yes. system-config-date is a GUI application for clock management that presents a map and a list of zones. The zone IDs and descriptions need to be localized; that's why we translate them.
Why does the user even see the TZid? I'm curious as to the reasoning behind this design decision. In FreeBSD, the only reason a user has to know about TZids is if they want to explicitly invoke a timezone other than the system default from the shell. -GAWollman
Garrett Wollman <wollman@csail.mit.edu> writes:
Why does the user even see the TZid? I'm curious as to the reasoning behind this design decision. In FreeBSD, the only reason a user has to know about TZids is if they want to explicitly invoke a timezone other than the system default from the shell.
To set up that default zone, you would use system-config-date. Thanks, PM
On 2012/06/07 05:53 PM, Petr Machata wrote:
gettext might be kinda sorta used for translating abbreviations as well. One would need to mangle the abbreviation, e.g. "Europe/Prague//CET", to construct a unique string, which is admittedly quite awkward. (Also, glibc wouldn't know to do it when formatting localized date strings.) Then you could shove this to gettext and get back "SEČ" as you expect. Scripts could be written to sort the abbreviations into pools of equality, so that "Europe/Prague//CET" doesn't have to be translated separately from "Europe/Bratislava//CET" etc.
But it would have to get the "Europe/Prague//" bit from somewhere in order to construct this string. TZ might not be set or might not be a tzdata zone name. The current zone may be set from the /etc/localtime file or wherever. There's nothing in the contents of a compiled zone file to give you the actual tzdata zone name that it originated from. -- -=( Ian Abbott @ MEV Ltd. E-mail: <abbotti@mev.co.uk> )=- -=( Tel: +44 (0)161 477 1898 FAX: +44 (0)161 718 3587 )=-
Ah, true. I was still thinking about the system-config-date use case, where this information is at hand. Thanks, PM
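Where the zone name is at hand, as in system-config-date, the mangled-key lookup described above could be sketched with Python's `gettext` module. The `Zone//ABBR` key format follows the thread; the text domain `tz-abbrevs`, the `POOLS` table standing in for the "pools of equality", and `localized_abbreviation` are hypothetical. When no catalogue or entry exists, `gettext` returns the msgid unchanged, so the sketch falls back to the untranslated abbreviation.

```python
import gettext
from typing import Optional

# Hypothetical pools: keys that share one translation, so only the
# canonical key needs an entry in the catalogue.
POOLS = {
    "Europe/Bratislava//CET": "Europe/Prague//CET",
}


def localized_abbreviation(zone: str, abbr: str,
                           translations: Optional[gettext.NullTranslations] = None) -> str:
    """Translate `abbr` for `zone` via a mangled 'Zone//ABBR' msgid."""
    key = f"{zone}//{abbr}"
    key = POOLS.get(key, key)  # map pool members onto the shared msgid
    if translations is None:
        # Locale comes from the environment; fallback=True yields
        # NullTranslations when no .mo file for the domain is found.
        translations = gettext.translation("tz-abbrevs", fallback=True)
    result = translations.gettext(key)
    # gettext returns the msgid itself when untranslated; drop the mangling.
    return abbr if result == key else result
```

As Ian points out, this only works when the caller already knows the zone name; code that has nothing but /etc/localtime cannot build the key.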
participants (13)
Alan Perry
Andy Lipscomb
Boruch Baum
Brian Inglis
David Braverman
David Patte
Garrett Wollman
Ian Abbott
Mark Davis ☕
Petr Machata
Pierpaolo Bernardi
Steffen Daode Nurpmeso
Tobias Conradi