Proposal for a modern 'Collapsed' Namespace

Walter

Oct. 12, 2012

1:59 a.m.

Hi there,

I would like to propose solving the following issues for many users by adding a new file in the TZ database:

- The much referenced issue of 1.4 billion+ people on Beijing time being semantically mismatched to their timezone entry, which for lack of meaningful alternative timezone identification strings winds up getting displayed to users.
- The lack of a functional entry for the widely used but unofficial Xinjiang or Wulumuqi time of western China.
- General accrual of crufty old timezones.
- The need for a shared database of decent multilingual timezone names and descriptions.
- The problem of how to display a timezone to the user that is detected via zone.tab and GeoIP, or similar methods.

Please find attached my proposal, which is born of developer requirements and primarily the consideration for end user experience rather than that of historical correctness in edge cases.

Kind regards,
Walter Stanish

Attachments:

collapsed.tab (application/octet-stream — 5.3 KB)


Guy Harris

October 2012

2:09 a.m.

On Oct 11, 2012, at 6:59 PM, Walter <walter.stanish@gmail.com> wrote:

...

Please find attached my proposal

...which has a suffix of .tab and a media type of application/octet-stream, neither of which make it very readable by many mail user agents. It's actually just plain text (with an unnecessary <pre> before it), so here it is:

    # This file is in the public domain.
    # Author: Walter Stanish <walter@stani.sh> (2012-10-12)
    #
    # This file collapses multiple time zones into a reduced
    # number of modern "time zones" (in the natural language
    # semantic sense) within the 'Collapsed/*' namespace for
    # modern user interface display purposes.
    #
    # The sole focus here is on achieving a pragmatic solution
    # for end users in normal modern use cases, in a manner
    # that is compatible with the existing format and
    # maintenance of the time zone database. Any and all
    # other considerations are implicitly dismissed.
    #
    # Users MAY be presented with items from the collapsed
    # list as the primary means of timezone selection or
    # display, though given the option to select from the
    # complete database (including the component entries
    # belonging to each collapsed zone) should they wish.
    #
    # ------------------------------------------------------
    # NOTE: THE 'Collapsed/*' NAMESPACE PURPOSEFULLY DOES
    #       NOT CONSIDER HISTORIC DIFFERENCES BETWEEN
    #       COMPONENT TIME ZONES AND AS SUCH IS NOT
    #       RECOMMENDED FOR SYSTEMS INTENDING TO CALCULATE
    #       20TH CENTURY HISTORIC DATES AND TIMES IN A
    #       TIMEZONE DEPENDENT FASHION.
    # ------------------------------------------------------
    #
    # Rationale:
    #  - Users often need to select the name of a timezone,
    #    for example when installing an operating system or
    #    working with global event data.
    #  - To date (2012), the time zone database has supplied
    #    only (A) a list of 'Region/Name' style identifiers
    #    that are not appropriate for user presentation for
    #    various reasons; and (B) zone.tab, a list of
    #    single geographical coordinate pairs for each
    #    of the above, with concise descriptions.
    #  - Even if the zone.tab file may assist autoselection
    #    of an appropriate user timezone in some cases, it
    #    still does not solve the display problem for the
    #    resulting timezone, once it has been selected.
    #  - None of the above provide user interface oriented
    #    human language descriptions of modern "timezones"
    #    with which the user may be familiar.
    #  - For example, the "Beijing time" that is considered
    #    standard in mainland China is presently split into
    #    no fewer than five regions that, whilst historically
    #    unique, in any modern use case remain completely
    #    irrelevant. Furthermore, the most common use
    #    case for non-standard time within China (the
    #    informal "Xinjiang time"; actively used and
    #    referred to as 'Wulumuqi time' by Chinese press
    #    and government) is not adequately presented AT ALL
    #    as an alternative to standard Beijing time in
    #    the current database (though it is discussed under
    #    both the 'Asia/Kashgar' and 'Asia/Urumqi' entries).
    #  - China represents some huge percentage of today's
    #    internet and the time zone database has failed
    #    to provide an appropriate solution for those users.
    #  - There are sure to be other regions of the world
    #    with similar time zone nomenclature and/or
    #    legacy data situations to that of China.
    #  - There is no reasonable basis for translation of
    #    time zone descriptions at this time (zone.tab is
    #    certainly inadequate).
    #
    # Format:
    #
    #  Collapsed <CollapsedZoneName> <TargetZone>
    #    Define a new zone within the modern 'Collapsed/*' namespace.
    #    The <CollapsedZoneName> may be referenced as
    #    'Collapsed/<CollapsedZoneName>' and will assume the properties
    #    of the pre-existing time zone <TargetZone>.
    #
    #  CollapsedName <CollapsedZoneName> <language> "<Title>"
    #    Define the title of a collapsed zone within the modern
    #    'Collapsed/*' namespace for a given IANA language tag.
    #
    #  Collapse <SourceZone> <CollapsedZoneName>
    #    Provide a 1:1 mapping from a traditional time zone through to
    #    the modern 'Collapsed/*' namespace.
    #
    #  CanCollapse <SourceZone> <CollapsedZoneName> [<Priority>]
    #    Provide a potential option within a 1:n mapping from a
    #    traditional time zone through to the modern 'Collapsed/*'
    #    namespace. Optionally, specify a <Priority> ranging from
    #    0-255 that should be used to order the options presented
    #    to a user in order of most to least likely mapping.
    #    Higher priorities mean a greater likelihood that an option
    #    is the correct assumption.

    # Collapsed/BeijingTime - Beijing Time (all of mainland China)
    Collapsed       BeijingTime     Asia/Shanghai
    CollapsedName   BeijingTime     en       "Beijing time"
    CollapsedName   BeijingTime     zh-hans  "北京时间"
    CollapsedName   BeijingTime     zh-hant  "北京時間"
    CollapsedDesc   BeijingTime     en       "As used officially across mainland China."
    CollapsedDesc   BeijingTime     zh-hans  "全国正式使用时区"
    CollapsedDesc   BeijingTime     zh-hant  "全國正式使用時區"
    Collapse        Asia/Harbin     BeijingTime
    Collapse        Asia/Shanghai   BeijingTime
    Collapse        Asia/Chongqing  BeijingTime
    CanCollapse     Asia/Kashgar    BeijingTime   100
    CanCollapse     Asia/Urumqi     BeijingTime   100

    # Collapsed/XinjiangTime - Xinjiang Time (aka 'Wulumuqi time')
    Collapsed       XinjiangTime    UTC+06:00
    CollapsedName   XinjiangTime    en       "Xinjiang time"
    CollapsedName   XinjiangTime    zh-hans  "新疆时间"
    CollapsedName   XinjiangTime    zh-hant  "新疆時間"
    CollapsedDesc   XinjiangTime    en       "As used unofficially throughout Xinjiang."
    CollapsedDesc   XinjiangTime    zh-hans  "新疆非正式使用时区"
    CollapsedDesc   XinjiangTime    zh-hant  "新疆非正式使用時區"
    CanCollapse     Asia/Kashgar    XinjiangTime  50
    CanCollapse     Asia/Urumqi     XinjiangTime  50
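The directive grammar in the "Format:" section is regular enough to read mechanically. The following is a minimal Python sketch, not part of the proposal or of any tz tooling, that parses a trimmed copy of the data lines above. Assumptions: fields are separated by tabs or spaces, with titles in double quotes; a missing <Priority> defaults to 0 (the proposal does not say); `Collapse` is kept as a definitive 1:1 mapping while `CanCollapse` options are ranked highest priority first; and `CollapsedDesc`, which the data uses but the Format section does not document, is skipped. `SAMPLE` and `parse_collapsed` are hypothetical names.

```python
import shlex
from collections import defaultdict

# Trimmed copy of the attachment's data lines (tab separated, "#" comments).
SAMPLE = """\
# Collapsed/BeijingTime - Beijing Time (all of mainland China)
Collapsed\tBeijingTime\tAsia/Shanghai
CollapsedName\tBeijingTime\ten\t"Beijing time"
CollapsedName\tBeijingTime\tzh-hans\t"北京时间"
CollapsedDesc\tBeijingTime\ten\t"As used officially across mainland China."
Collapse\tAsia/Harbin\tBeijingTime
Collapse\tAsia/Shanghai\tBeijingTime
Collapse\tAsia/Chongqing\tBeijingTime
CanCollapse\tAsia/Urumqi\tBeijingTime\t100
# Collapsed/XinjiangTime - Xinjiang Time (aka 'Wulumuqi time')
Collapsed\tXinjiangTime\tUTC+06:00
CollapsedName\tXinjiangTime\ten\t"Xinjiang time"
CanCollapse\tAsia/Urumqi\tXinjiangTime\t50
"""


def parse_collapsed(text):
    targets = {}                    # collapsed name -> zone whose properties it assumes
    titles = defaultdict(dict)      # collapsed name -> {language tag: title}
    collapse = {}                   # source zone -> collapsed name (1:1)
    options = defaultdict(list)     # source zone -> [(priority, collapsed name)] (1:n)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        kind, *args = shlex.split(line)  # keeps "Beijing time" as one field
        if kind == "Collapsed":
            targets[args[0]] = args[1]
        elif kind == "CollapsedName":
            titles[args[0]][args[1]] = args[2]
        elif kind == "Collapse":
            collapse[args[0]] = args[1]
        elif kind == "CanCollapse":
            priority = int(args[2]) if len(args) > 2 else 0  # assumed default
            options[args[0]].append((priority, args[1]))
        # Other directives (e.g. CollapsedDesc) are ignored in this sketch.
    # Highest priority first; sorted() is stable, so ties keep file order.
    ranked = {src: [name for _, name in sorted(opts, key=lambda o: -o[0])]
              for src, opts in options.items()}
    return targets, dict(titles), collapse, ranked


targets, titles, collapse, ranked = parse_collapsed(SAMPLE)
print(collapse["Asia/Harbin"])   # BeijingTime
print(ranked["Asia/Urumqi"])     # ['BeijingTime', 'XinjiangTime']
```

Keeping the 1:1 and 1:n mappings apart means a UI could show "Beijing time" directly for Asia/Harbin, but would have to offer a ranked choice for Asia/Urumqi.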

Walter

2:13 a.m.

...

...which has a suffix of .tab and a media type of application/octet-stream, neither of which make it very readable by many mail user agents. It's actually just plain text (with an unnecessary <pre> before it), so here it is:

Each to their own. <tab> and file name were both copied from an existing file within the distribution. - Walter

Guy Harris

3:03 a.m.

On Oct 11, 2012, at 7:13 PM, Walter <walter.stanish@gmail.com> wrote:

...

...
...which has a suffix of .tab and a media type of application/octet-stream, neither of which make it very readable by many mail user agents. It's actually just plain text (with an unnecessary <pre> before it), so here it is:

Each to their own.

Unfortunately, in this case, that amounts to "to each member of the mailing list perhaps some time spent trying to open the document", which is Not A Good Thing if you want people to discuss the document, which I presume you do.

...

<tab> and file name were both copied from an existing file within the distribution.

I'm not sure why those files need

# <pre>

at the beginning, or why they have .tab as a suffix, as I don't think that's a commonly-known file suffix.

Walter

3:09 a.m.

...

...
<tab> and file name were both copied from an existing file within the distribution.

I'm not sure why those files need

# <pre>

at the beginning, or why they have .tab as a suffix, as I don't think that's a commonly-known file suffix.

I don't know either, though I suspect the .tab suffix is to distinguish the table, perhaps in a nominally 8.3 backwards-compatible format (for embedded, etc.), from the main zone listings. When in Rome! - Walter

Paul Eggert

4:57 a.m.

On 10/11/2012 08:03 PM, Guy Harris wrote:

...

I'm not sure why those files need

# <pre>

at the beginning, or why they have .tab as a suffix

I used .tab originally, to mark them as files with tab-separated columns (except '#' for comments), a slightly different format from the main DB. It was a spur-of-the-moment thing, nothing deep there. I think Arthur put in the <pre> stuff. I'm not such a big fan of that myself; XML hurts my eyes.
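As an illustration of the tab-separated layout Paul describes, here is a small Python sketch that reads one zone.tab-style line. It assumes the column order documented in zone.tab's own header comments (country code, ISO 6709 coordinates of the zone's principal location, zone name, optional comment); the sample line and the names `COORD`, `_degrees` and `parse_zone_tab_line` are illustrative, not taken from the distribution.

```python
import re

# zone.tab coordinates are ISO 6709: +DDMM+DDDMM or +DDMMSS+DDDMMSS.
COORD = re.compile(r"([+-])(\d{2})(\d{2})(\d{2})?([+-])(\d{3})(\d{2})(\d{2})?$")


def _degrees(sign, deg, mins, secs):
    """Convert sign/degrees/minutes/optional seconds to decimal degrees."""
    value = int(deg) + int(mins) / 60 + (int(secs) if secs else 0) / 3600
    return -value if sign == "-" else value


def parse_zone_tab_line(line):
    """Return (country, lat, lon, zone, comment), or None for comment/blank lines."""
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    country, coords, zone = fields[0], fields[1], fields[2]
    comment = fields[3] if len(fields) > 3 else ""
    match = COORD.match(coords)
    if match is None:
        raise ValueError(f"unrecognised coordinates: {coords!r}")
    lat = _degrees(*match.group(1, 2, 3, 4))
    lon = _degrees(*match.group(5, 6, 7, 8))
    return country, lat, lon, zone, comment


row = parse_zone_tab_line("CN\t+3114+12128\tAsia/Shanghai\teast China")
print(row[0], row[3], round(row[1], 4), round(row[2], 4))
# CN Asia/Shanghai 31.2333 121.4667
```

Each zone gets exactly one point, which is the limitation raised earlier in the thread: the data can help autoselect a zone, but it says nothing about what to call that zone.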

Derick Rethans

10:03 a.m.

On Thu, 11 Oct 2012, Guy Harris wrote:

...

On Oct 11, 2012, at 6:59 PM, Walter <walter.stanish@gmail.com> wrote:

...
Please find attached my proposal

It's actually just plain text (with an unnecessary <pre> before it), so here it is:

    # This file is in the public domain.
    # Author: Walter Stanish <walter@stani.sh> (2012-10-12)
    #
    # This file collapses multiple time zones into a reduced
    # number of modern "time zones" (in the natural language
    # semantic sense) within the 'Collapsed/*' namespace for
    # modern user interface display purposes.

In my opinion, this does not belong in the timezone database, but rather in a convenience layer within your project - or in the Unicode Consortium's CLDR (http://cldr.unicode.org/). The CLDR already has a large number of those user-friendly names in many languages, too. For example: http://unicode.org/repos/cldr-tmp/trunk/charts/supplemental/zone_tzid.html

And in the common/main/<country>.xml files that you can download off http://unicode.org/Public/cldr/22/

cheers,
Derick

Walter

11:39 a.m.

...

In my opinion, this does not belong in the timezone database, but rather in a convenience layer within your project - or in the Unicode Consortium's CLDR (http://cldr.unicode.org/). The CLDR already has a large number of those user-friendly names in many languages, too. For example: http://unicode.org/repos/cldr-tmp/trunk/charts/supplemental/zone_tzid.html

First, thanks for your response, which is the first that has actually dealt with the contents of the proposal. I was not aware of the page you mentioned, and indeed this is an interesting dataset. It is apparently a mapping between the Windows timezone database (copyright status?) and the tz data set. It is particularly interesting because it appears to promote the effective grouping of the historical China-related timezones present in the tz data into a single modern entity, precisely what I proposed as appropriate for normal, modern use.

Unfortunately, it does not appear to resolve some of the issues mentioned, in particular:

- It appears to simply relate Windows timezones to the tz database, and not to provide any additional information.
- It appears to lack Xinjiang time.
- It appears to lack any human language string featuring the term 'Beijing time', the dominant verbal timezone identifier for 1.4 billion+ mainland Chinese people.
- It appears to be provided only in English.

The update process seems unclear, but it seems that the table in question is not intended to be extended and/or maintained separately from its apparent purpose of providing direct Windows/tz timezone record translation.

Owing to the above, I do not believe that this resource either (1) addresses the problems identified, or (2) is the appropriate vehicle with which to do so. However, thank you very much for pointing it out as an interesting and potentially relevant resource.

...

And in the common/main/<country>.xml files that you can download off http://unicode.org/Public/cldr/22/

The 'core.zip' file at that location (the only one that appeared to be relevant) seems to include a repeat of the above information in 'supplemental/windowsZones.xml' -- the only place where the word 'Beijing' occurs in the entire archive. The word 'Shanghai' appears sporadically in some, but not all, of the language files in the 'main' subdirectory; however, the contents of the appropriate sections appear to be simply translations of the geographic name of the 'exemplar city' (most populous city, from what I understand) that is used to represent the lat/long coordinates for a tz database time zone within the 'zone.tab' file. Geographic nomenclature translation is not the issue here.

Critically, no file within the archive appears to address the establishment of either timezones themselves (Xinjiang time) or common names for modern time zones that span multiple tz entries (Beijing time). Instead, it appears that the only relevant data is a repeat of the above URL, i.e. a table intended to equate Windows and tz timezone records.

Regards,
Walter Stanish
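To make the Windows-to-tz relationship concrete, the following Python sketch groups tz identifiers under the Windows zone ID that maps to them, in the way the `mapZone` elements of CLDR's windowsZones.xml do (the `type` attribute may list several space-separated tz IDs). The XML excerpt is illustrative only and is not copied from any particular CLDR release; `EXCERPT` and `tz_ids_by_windows_zone` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Illustrative excerpt in the shape of CLDR's supplemental/windowsZones.xml
# (mapZone elements); not copied from any particular CLDR release.
EXCERPT = """\
<supplementalData>
  <windowsZones>
    <mapTimezones>
      <mapZone other="China Standard Time" territory="001" type="Asia/Shanghai"/>
      <mapZone other="China Standard Time" territory="CN" type="Asia/Shanghai Asia/Harbin Asia/Chongqing"/>
    </mapTimezones>
  </windowsZones>
</supplementalData>
"""


def tz_ids_by_windows_zone(xml_text):
    """Group tz identifiers under the Windows zone ID that maps to them."""
    groups = defaultdict(list)
    for node in ET.fromstring(xml_text).iter("mapZone"):
        windows_id = node.get("other")
        for tz_id in node.get("type").split():  # "type" may hold several IDs
            if tz_id not in groups[windows_id]:
                groups[windows_id].append(tz_id)
    return dict(groups)


print(tz_ids_by_windows_zone(EXCERPT))
# {'China Standard Time': ['Asia/Shanghai', 'Asia/Harbin', 'Asia/Chongqing']}
```

The grouping is there, but the only label attached to the group is the Windows ID "China Standard Time", not a localised title such as "Beijing time", which is the gap Walter points out.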

Robert Elz

11:33 a.m.

Date: Fri, 12 Oct 2012 12:59:11 +1100
From: Walter <walter.stanish@gmail.com>
Message-ID: <CACwuEiMncemY8tf+aWuDoDumgaSzgAaQCBn5v5+kQp=QaDQbDw@mail.gmail.com>

| - The much referenced issue of 1.4 billion+ people on Beijing time
| being semantically mismatched to their timezone entry, which for lack
| of meaningful alternative timezone identification strings winds up
| getting displayed to users.

The problem of user selection (and localised identification) of time zones is a real one, and worth working on - though personally I'd prefer that you set up a new project for this, and get the involvement of people experienced in international UI issues, rather than people who know about time and (perhaps) care about little else...

That is, this group might not be the best place to achieve a good result for that worthwhile aim.

| - The lack of a functional entry for the widely used but unofficial
| Xinjiang or Wulumuqi time of western China.

This one has been discussed before - my memory is poor, and I haven't gone back through the archives to check, but I think the only real issue was some doubt as to just how much those timezones are actually used.

If we get any good information that there's a timezone that is in use, but we don't have, there's essentially never any problem adding it.

| - General accrual of crufty old timezones.

That's a mistake. You're making assumptions about the way people use the data that are not always correct. Sure, if all the users ever care about is "what is the time now" then zones that are different only wrt times in the past seem superfluous - but when you need to look at a historical timestamp, which people sometimes do, having the wrong zone causes errors.
What needs to be done is for the UI to better educate people and guide them to the correct timezone selection for their needs, which is all part of the UI issue, which I don't believe belongs here (let this project collect the data, and someone else figure out how to present it, each needs experts from their own fields, which are quite distinct.)

| - The need for a shared database of decent multilingual timezone
| names and descriptions.

An aspect of the first issue I believe (or at least, should be considered as part of the same project).

| - The problem of how to display a timezone to the user that is
| detected via zone.tab and GeoIP, or similar methods.

Same.

Walter

12:19 p.m.

...

| - The much referenced issue of 1.4 billion+ people on Beijing time
| being semantically mismatched to their timezone entry, which for lack
| of meaningful alternative timezone identification strings winds up
| getting displayed to users.

The problem of user selection (and localised identification) of time zones is a real one, and worth working on - though personally I'd prefer that you set up a new project for this, and get the involvement of people experienced in international UI issues, rather than people who know about time and (perhaps) care about little else...

That's an understandable take, however consider the other side:

- ICANN's mandate is global, and whilst its operating language may very well be English, and tz has hobbled along on a broken identifier scheme for some time, it seems somewhat difficult to dismiss i18n requirements, especially those affecting some huge percentage of humanity.
- Because this is the de facto library for so many systems, it seems a really good place to solve common issues encountered when building real systems dealing with this data. Some of those issues that tz is failing to solve right now are:
  - Grouping of historical timezones into single logical entities
  - Timezone (or timezone group) names
  - i18n of the above
- The tz data set has already overstepped the raw tz data purpose and branched out into providing useful, (arguably less) closely related information such as associated lat/long and city names. In doing so, it has broadened its scope beyond raw tz data to closely tz-related data that is useful in implementing tz data based systems.

Basically, tz is a database, and the names of the time zones themselves should be a core feature of that database.

...

That is, this group might not be the best place to achieve a good result for that worthwhile aim.

If a result were achieved here, however, it would be helpful to many more people in the sense that it would be more likely to become available to all related libraries and systems, and provide a common point for internet-wide maintenance in the public interest. Is this not in line with ICANN's "mission of technical coordination"?*

* http://www.icann.org/en/about/welcome

...

| - The lack of a functional entry for the widely used but unofficial
| Xinjiang or Wulumuqi time of western China.

This one has been discussed before - my memory is poor, and I haven't gone back through the archives to check, but I think the only real issue was some doubt as to just how much those timezones are actually used.

If we get any good information that there's a timezone that is in use, but we don't have, there's essentially never any problem adding it.

OK. I would be worried however that this would cause issues with existing systems utilizing the database -- because of the fact that the tz database has apparently not provided enough structure within the zone data to clearly delineate between different time zones simultaneously in use within the same geographic region. It seems to me that there is some kind of breakdown between cities as geographic entities as principals for time zone affected regions (unsuitable for presentation to the end user, but apparently sometimes used for want of an alternative), the zone identifiers themselves (unsuitable for presentation to the end user, but often used for want of an alternative), and the actual time zone names as used by normal people, which are apparently almost entirely missing!

...

| - General accrual of crufty old timezones.

That's a mistake. You're making assumptions about the way people use the data that are not always correct. Sure, if all the users ever care about is "what is the time now" then zones that are different only wrt times in the past seem superfluous - but when you need to look at a historical timestamp, which people sometimes do, having the wrong zone causes errors.

I see the use case and certainly don't mean to devalue in any way the tremendous work that's gone into compiling the tz resource. I just think that on the weight of it, historic timezones that few people have even heard of are a virtually academic edge-case with regards to the 1.399 billion people that use tz data for normal computing purposes in China and couldn't care less about 20th century regulatory hiccups. They don't have something that says "Beijing time", nor is there even a means to link the five (!) disparate historic timezones that may be useful for academics and specialists into a single timezone, which is the modern reality for 1.4 billion people. They simply can't be presented with an effective user interface based upon the tz data. That's clearly a bug, any way you look at it. As seen in an earlier post on this thread, other zone lists have apparently taken some initiative here. Why can't tz?

(In addition to China, it may be safe to assume that there are many other areas of the world with now-unified timezones of purely historic interest, presenting both translation overheads and a UI impediment to non-academic developers and end users.)

...

What needs to be done is for the UI to better educate people and guide them to the correct timezone selection for their needs, which is all part of the UI issue, which I don't believe belongs here (let this project collect the data, and someone else figure out how to present it, each needs experts from their own fields, which are quite distinct.)

I'm not advocating the tz database care too much about UI. I am merely advocating that it provides the fundamental requirement for any timezone related program - a human readable name for the time zone in question. Where the human readable name crosses multiple historic timezones, some form of grouping such as that proposed (and apparently adopted elsewhere) should also, quite necessarily, be provided.

Right now, people use the identifier (e.g. Asia/Shanghai) despite problems with its use for this purpose. That's because there's no alternative provided except for the zone.tab comments, which are less than uniformly suitable for presentation to (and translation for) end users. There should at least be a name. And if there's a name, in this day and age, it should be multilingual.

Right now the tz dataset, whilst successful, apparently remains a database of identifiers for entities that cannot be presented to end users, for want of human readable names.

Regards,
Walter
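What Walter asks for here (a name, preferably multilingual, with the raw identifier only as a last resort) can be sketched as a lookup chain. This Python sketch assumes data shaped like the proposal's `Collapse` and `CollapsedName` lines; `TITLES`, `COLLAPSE`, `lookup_chain` and `display_name` are hypothetical. The language fallback is a simplified form of the "Lookup" scheme from RFC 4647 (truncate subtags from the right), swapped in for illustration; the proposal itself does not specify any fallback.

```python
# Hypothetical data in the shape of the proposal's CollapsedName and Collapse lines.
TITLES = {
    "BeijingTime": {"en": "Beijing time", "zh-hans": "北京时间", "zh-hant": "北京時間"},
}
COLLAPSE = {
    "Asia/Harbin": "BeijingTime",
    "Asia/Shanghai": "BeijingTime",
    "Asia/Chongqing": "BeijingTime",
}


def lookup_chain(tag):
    """Simplified RFC 4647 lookup: 'zh-Hans-CN' -> zh-hans-cn, zh-hans, zh."""
    parts = tag.lower().split("-")
    return ["-".join(parts[:i]) for i in range(len(parts), 0, -1)]


def display_name(zone, tag, default_lang="en"):
    """Localised collapsed-zone title, else the default language, else the tz ID."""
    titles = TITLES.get(COLLAPSE.get(zone), {})
    for candidate in lookup_chain(tag) + [default_lang]:
        if candidate in titles:
            return titles[candidate]
    return zone  # today's situation: the raw identifier is all there is


print(display_name("Asia/Harbin", "zh-Hans-CN"))  # 北京时间
print(display_name("Asia/Chongqing", "fr"))       # Beijing time
print(display_name("Europe/London", "en"))        # Europe/London
```

The last line shows the status quo Walter describes: with no name on record, the identifier is what reaches the user.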

Robert Elz

4:08 p.m.

Date: Fri, 12 Oct 2012 23:19:19 +1100
From: Walter <walter.stanish@gmail.com>
Message-ID: <CACwuEiM8=UeXjBWBrMr10vLswWFYZdwzr1bYxdX7MHJxi-AfOA@mail.gmail.com>

| That's an understandable take, however consider the other side:
| - ICANN's mandate is global, and whilst its operating language may
| very well be English,

You're preaching to the converted, I have no doubt that the work needs to be done, it is just that I am not qualified to do it, and the mandate for this group does not necessarily attract those who do.

| and tz has hobbled along on a broken identifier scheme for some time,

No, its identifier scheme is just fine. You're just trying to use it for something it wasn't designed to do, and then claiming it is broken because it doesn't meet your need. That's unreasonable - go design the identifier scheme that is needed for your purpose (or join the people already doing that), and we can continue just making sure that the data is correct, which is our role.

| it seems somewhat difficult to dismiss i18n requirements,
| especially those affecting some huge percentage of humanity

No-one is dismissing the requirements, but we cannot do everything (there are lots of projects more important to even bigger percentages of humanity that we're not working on either.)

| Some of those issues that tz is failing to solve right now are:
| - Grouping of historical timezones into single logical entities

Because that would devalue the project. If anything, we'd more likely move the cut-off point beyond which we currently ignore differences (somewhat arbitrarily set at 1970 - the unix time epoch) backwards, causing some of the zones we currently have to split. The chances of ever combining zones (in any way) (with the sole exception of a split made based on what turns out to be incorrect information) are close to nil. Give up on that one.

| - Timezone (or timezone group) names

We have names. The names we have work for their purpose.

| - i18n of the above

Someone else's problem.
We understand time, not international naming. Others do the latter.

| - The tz data set has already overstepped the raw tz data purpose and
| branched out into providing useful, (arguably less) closely related
| information such as associated lat/long and city names.

We deal with lat/long because that gives us local mean times, as used before standardised regional times were in use. Note that the lat/long we deal with are for points, not regions (some others have worked on the latter, we don't). The city names are just our naming convention.

| In doing so,
| it has broadened its scope beyond raw tz data to closely tz-related
| data that is useful in implementing tz data based systems. Basically,
| tz is a database, and the names of the time zones themselves should be a core
| feature of that database.

Absolutely, but the names for this purpose are internal database names, used to unambiguously select a particular set of data - that's all they're intended for, they are not supposed to be the user interface.

I sometimes believe we should delete all the Australia/Melbourne Europe/London America/New_York type names, and rename the zones as TZ1 TZ2 TZ3 ... Those would work just as well (though be harder to remember, and perhaps cause confusion on occasion) and would make it much clearer that people who do UIs should be providing a better interface than "select one of these names" for timezone selection (some already do, but they're a minority).

| If a result were achieved here, however, it would be helpful to many
| more people in the sense that it would be more likely to become
| available to all related libraries and systems,

That's less likely than you'd think. And certainly no more likely (perhaps even less likely) coming from a group with little expertise in the area.
| Is this not in line with ICANN's "mission of technical coordination"?*

Note that this group existed long before any involvement with ICANN - they just offered to be a host, but if you read the RFC that was created to enable the IANA involvement, you'll see that we remain almost completely independent.

| OK. I would be worried however that this would cause issues with
| existing systems utilizing the database -- because of the fact that
| the tz database has apparently not provided enough structure within
| the zone data to clearly delineate between different time zones
| simultaneously in use within the same geographic region.

We just define the zones, it is not, and never has been, our role to attempt to work out what region uses any particular timezone. If there's a different timezone somewhere, and we can determine its parameters, we will define it, so it is available for use. Who uses it, and how they use it, is someone else's problem...

| It seems to me that there is some kind of breakdown between cities as
| geographic entities as principals for time zone affected regions

The city names are just labels - they were chosen as we had (and I think still have) the belief that a city attempting to simultaneously use two different time settings (for common everyday use, specialised uses of different times are fine) would be an unworkable mess, and so that's so unlikely that using a city that happens to use a particular timezone as the name of that timezone seemed like a reasonable choice (and slightly more human friendly than TZ1 TZ2 TZ3 ...)

| I just think that on the weight of it, historic timezones that few people
| have even heard of are a virtually academic edge-case with regards to
| the 1.399 billion people that use tz data for normal computing
| purposes in China and couldn't care less about 20th century regulatory
| hiccups.

You mean they want it easy, rather than correct. That's not an uncommon desire, but it always always turns out to be a mistake.
Do you plan on accepting responsibility for anything that goes wrong because of this "flexible" attitude to what is correct?

| They don't have something that says "Beijing time", nor is
| there even a means to link the five (!) disparate historic timezones
| that may be useful for academics and specialists into a single
| timezone, which is the modern reality for 1.4 billion people.

You mean they all see the same time when they look at their clocks, today. That's not a timezone, that's a clock setting. The timezone includes all the historical data, which isn't just of academic interest, it is reality.

| That's clearly a bug, any way you look at it. As seen in an earlier
| post on this thread, other zone lists have apparently taken some
| initiative here. Why can't tz?

I have no idea what bug you think it is, if you believe being correct is a bug, you have a vastly different view than I do. What other projects do depends upon their needs, if one of those is doing work more closely associated with your needs, perhaps you should join them. Their responsibility isn't to get the timezone data correct, ours is. That's what we will continue to do.

| I'm not advocating the tz database care too much about UI. I am
| merely advocating that it provides the fundamental requirement for any
| timezone related program - a human readable name for the time zone in
| question.

We have that. TZ1 TZ2 ... would also qualify. What they're not is human friendly. That's something that we don't need. Others probably do. Once again, someone else's problem.

| Right now, people use the identifier (e.g. Asia/Shanghai) despite
| problems with its use for this purpose.

Asia/Shanghai is the tz database name for one of the zones that applies in China. If people want to select that zone when some other one really applies to them, I'd treat that as a failure of the UI (and someone else's problem).
If the zone that they should select doesn't exist (which perhaps the ones you requested don't, and maybe should) then that is our problem.

| That's because there's no
| alternative provided except for the zone.tab comments, which are less
| than uniformly suitable for presentation to (and translation for) end
| users.

zone.tab is an addendum to our primary purpose, personally I wouldn't mind if it simply went away, or perhaps responsibility for its maintenance shifted to some other group.

| There should at least be a name. And if there's a name, in this day
| and age, it should be multilingual.

Fine, no argument, but we don't need that kind of name for our purposes. Others do. Once again, someone else's problem.

| Right now the tz dataset, whilst successful, apparently remains a
| database of identifiers for entities that cannot be presented to end
| users, for want of human readable names.

No argument. So, go join the CLDR effort (or whatever it is) who apparently are working on, or have worked on, this issue (whoever they are). Our job is to make sure that all the data is present, available, and correct.

We can't do everything, we have one particular role. It is quite specialised. In no way does it cover (even) everything related to times (we don't work on earth synchronisation, or leap seconds, or time scales, or ...) We just collect and publish timezone data - lists of discontinuities in the passage of time, as used locally on local wall clocks. Everything more than that is someone else's problem.

kre
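kre's point about the 1970 cut-off can be made concrete. UTC offsets are piecewise constant, so two zones agree from a cut-off onward exactly when they agree at the cut-off and at every later transition of either zone. The Python sketch below checks that; the zones are hypothetical, and each is reduced to (UTC start, UTC offset) pairs, ignoring the abbreviations and DST flags the real database also records. `ts`, `offset_at` and `same_since` are illustrative names.

```python
from bisect import bisect_right
from datetime import datetime, timezone


def ts(year, month=1, day=1):
    """Seconds since the Unix epoch for midnight UTC on the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


FIRST = float("-inf")
# Hypothetical zones: sorted (UTC start in seconds, UTC offset in seconds) pairs.
ZONE_A = [(FIRST, 8 * 3600)]
ZONE_B = [(FIRST, 8 * 3600 + 20 * 60), (ts(1940), 8 * 3600)]  # differs only before 1940
ZONE_C = [(FIRST, 8 * 3600), (ts(1990, 4, 15), 9 * 3600), (ts(1990, 9, 16), 8 * 3600)]


def offset_at(zone, t):
    """Offset of the last transition at or before instant t."""
    starts = [start for start, _ in zone]
    return zone[bisect_right(starts, t) - 1][1]


def same_since(z1, z2, cutoff):
    """True if both zones give the same UTC offset at every instant >= cutoff."""
    points = {cutoff} | {start for start, _ in z1 + z2 if start >= cutoff}
    return all(offset_at(z1, p) == offset_at(z2, p) for p in points)


print(same_since(ZONE_A, ZONE_B, ts(1970)))  # True: indistinguishable since 1970
print(same_since(ZONE_A, ZONE_B, ts(1900)))  # False: an earlier cut-off splits them
print(same_since(ZONE_A, ZONE_C, ts(1970)))  # False: they differ in 1990
```

Moving the cut-off earlier only enlarges the span being compared, so it can turn equal zones into distinct ones but never the reverse, which is why kre expects splits rather than merges.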

Philip Newton

4:21 p.m.

On 12 October 2012 14:19, Walter <walter.stanish@gmail.com> wrote:

...

There should at least be a name. And if there's a name, in this day and age, it should be multilingual.

How many languages should there be? Should each time zone have the same number of translations? Who will provide the language data? (Both now, and for any new time zones that might arise in the future?)

Cheers,
Philip
--
Philip Newton <philip.newton@gmail.com>

Paul_Koning＠Dell.com

1:57 p.m.

On Oct 12, 2012, at 7:33 AM, Robert Elz wrote:

...

Date: Fri, 12 Oct 2012 12:59:11 +1100
From: Walter <walter.stanish@gmail.com>
Message-ID: <CACwuEiMncemY8tf+aWuDoDumgaSzgAaQCBn5v5+kQp=QaDQbDw@mail.gmail.com>
...
| - General accrual of crufty old timezones.

That's a mistake. You're making assumptions about the way people use the data that are not always correct. Sure, if all the users ever care about is "what is the time now" then zones that are different only wrt times in the past seem superfluous - but when you need to look at a historical timestamp, which people sometimes do, having the wrong zone causes errors.

You almost always need history, because you have stored objects with timestamps on them -- files in directories, for example. And those timestamps need to be displayed by the rules applicable to that time, not to the rule applicable today. Otherwise you can't answer questions like "did you create file foo.txt before the noon on April 1, 2001 deadline". How far back depends on the system. I work on one where the horizon is 2001; others have horizons that are either closer or farther away. But many of us -- perhaps all of us -- need some amount of history. paul
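Paul's point can be made concrete with a small lookup over a list of transitions. This is a minimal sketch, assuming a hypothetical zone whose UTC offset changed once, on 2000-01-01 UTC; the zone, its offsets and the helper names (`offset_at`, `to_local`, `to_local_using_todays_rule`) are invented for illustration and are neither tz data nor any library's API. The stored timestamp is UTC, and rendering it with today's offset gives a different wall-clock time than rendering it with the offset in force back then.

```python
from bisect import bisect_right
from datetime import datetime, timedelta, timezone

# Hypothetical zone: these transition instants and offsets are invented
# for illustration and are not taken from the tz database.
TRANSITIONS = [
    (datetime(1970, 1, 1, tzinfo=timezone.utc), timedelta(hours=1)),
    (datetime(2000, 1, 1, tzinfo=timezone.utc), timedelta(hours=2)),
]


def offset_at(instant):
    """Return the UTC offset in force at `instant` (an aware UTC datetime)."""
    starts = [start for start, _ in TRANSITIONS]
    i = bisect_right(starts, instant) - 1
    if i < 0:
        raise ValueError("instant precedes the first known transition")
    return TRANSITIONS[i][1]


def to_local(instant):
    """Local wall-clock time, using the rule that applied at that instant."""
    return (instant + offset_at(instant)).replace(tzinfo=None)


def to_local_using_todays_rule(instant):
    """The shortcut warned against above: always apply the latest offset."""
    return (instant + TRANSITIONS[-1][1]).replace(tzinfo=None)


file_mtime = datetime(1999, 6, 1, 10, 30, tzinfo=timezone.utc)
print(to_local(file_mtime))                    # 1999-06-01 11:30:00
print(to_local_using_todays_rule(file_mtime))  # 1999-06-01 12:30:00
```

A real system would take the transitions from compiled tz data rather than a hand-written list; the point is only that the lookup is keyed on the timestamp being displayed, not on the current date.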

Tony Finch

2:16 p.m.

Walter <walter.stanish@gmail.com> wrote:

...

- The much referenced issue of 1.4 billion+ people on Beijing time being semantically mismatched to their timezone entry, which for lack of meaningful alternative timezone identification strings winds up getting displayed to users. - The need for a shared database of decent multilingual timezone names and descriptions. - The problem of how to display a timezone to the user that is detected via zone.tab and GeoIP, or similar methods.

Aren't these requirements addressed by the Unicode common locale data repository? http://cldr.unicode.org/ Tony. -- f.anthony.n.finch <dot@dotat.at> http://dotat.at/ Forties, Cromarty: East, veering southeast, 4 or 5, occasionally 6 at first. Rough, becoming slight or moderate. Showers, rain at first. Moderate or good, occasionally poor at first.

Paul_Koning＠Dell.com

2:43 p.m.

On Oct 12, 2012, at 10:16 AM, Tony Finch wrote:

...

Walter <walter.stanish@gmail.com> wrote:

...
- The much referenced issue of 1.4 billion+ people on Beijing time being semantically mismatched to their timezone entry, which for lack of meaningful alternative timezone identification strings winds up getting displayed to users. - The need for a shared database of decent multilingual timezone names and descriptions. - The problem of how to display a timezone to the user that is detected via zone.tab and GeoIP, or similar methods.

Aren't these requirements addressed by the Unicode common locale data repository? http://cldr.unicode.org/

Tony.

I believe so, though the data is not that easy to find. Cross-references might be helpful. paul

enh

3:50 p.m.

Yes, CLDR has what you need. That's what we use on Android to localize time zone names, starting from Olson ids. All that code is open source in AOSP, and icu4c does most of the work.

What is commonly requested but AFAIK not available anywhere is a localized list of world cities with their corresponding Olson id. CLDR has exemplar cities, so you can get "Los Angeles" from America/Los_Angeles, say, but there's nothing that would go from "San Jose" or "Portland" or "Seattle" to America/Los_Angeles. I think Paul Koning from Dell was saying a while ago that Dell would use something like that too.

I think when it comes to humans picking time zones, city names are what they want. Which is why you're mistakenly obsessed with localizing Asia/Shanghai as "Beijing Time". If there were a canonical source for this mapping, with an appropriate license, I'd use it in Android. AFAIK what lists there are aren't localized.

Another potentially useful thing would be GPS coordinates of the boundaries of time zones, rather than the center of population the zone is named after. Again, the "human choosing a time zone" problem.

Derick Rethans

5:03 p.m.

On Fri, 12 Oct 2012, enh wrote:

...

Yes, CLDR has what you need. That's what we use on Android to localize time zone names, starting from Olson ids. All that code is open source in AOSP, and icu4c does most of the work.

What is commonly requested but AFAIK not available anywhere is a localized list of world cities with their corresponding Olson id. CLDR has exemplar cities, so you can get "Los Angeles" from America/Los_Angeles, say, but there's nothing that would go from "San Jose" or "Portland" or "Seattle" to America/Los_Angeles.

Actually, such a list is there at http://download.geonames.org/export/dump/. There is no metadata for the alternative names of a city, but it does map city to tzid.

cheers, Derick

-- 
http://derickrethans.nl | http://xdebug.org
Like Xdebug? Consider a donation: http://xdebug.org/donate.php
twitter: @derickr and @xdebug
Posted with an email client that doesn't mangle email: alpine
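A city-to-tzid index can be built from such a dump with a few lines of code. This is a minimal sketch, assuming the tab-separated "geoname" layout described in the dump's readme.txt (19 fields, with the country code in the 9th and the timezone in the 18th); those column positions are my reading of that layout and should be checked against the file. The rows below are synthetic (placeholder values, only the used fields filled in), and `city_index` and `make_row` are hypothetical names.

```python
import csv
import io

# Column positions assumed from the geonames "geoname" dump layout
# (cities15000.txt etc.): 19 tab-separated fields, timezone is the 18th.
NAME, ASCIINAME, ALTERNATENAMES, COUNTRY_CODE, TIMEZONE = 1, 2, 3, 8, 17
FIELD_COUNT = 19


def city_index(lines):
    """Map a lower-cased city name to every (country, tz ID) it may denote."""
    index = {}
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) != FIELD_COUNT or not row[TIMEZONE]:
            continue  # skip malformed rows and rows without a tz ID
        names = {row[NAME], row[ASCIINAME]}
        # alternatenames is a plain comma-separated list with no language tags
        names.update(n for n in row[ALTERNATENAMES].split(",") if n)
        entry = (row[COUNTRY_CODE], row[TIMEZONE])
        for name in names:
            entries = index.setdefault(name.lower(), [])
            if entry not in entries:
                entries.append(entry)
    return index


def make_row(name, country, tzid):
    """Build a synthetic row; only the fields used above are filled in."""
    row = [""] * FIELD_COUNT
    row[NAME] = row[ASCIINAME] = name
    row[COUNTRY_CODE] = country
    row[TIMEZONE] = tzid
    return "\t".join(row)


sample = io.StringIO("\n".join([
    make_row("Seattle", "US", "America/Los_Angeles"),
    make_row("Portland", "US", "America/Los_Angeles"),  # Portland, Oregon
    make_row("Portland", "US", "America/New_York"),     # Portland, Maine
]))
index = city_index(sample)
print(index["seattle"])   # [('US', 'America/Los_Angeles')]
print(index["portland"])  # two candidates: the name alone is ambiguous
```

As the two Portlands show, a bare name is ambiguous even within one country, so a picker built this way would also need the admin1 code, coordinates or population to disambiguate; localized city names remain a separate problem.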

enh

5:18 p.m.

but where are, say, the Greek localizations in that?

-- Elliott Hughes - http://who/enh - http://jessies.org/~enh/ NIO, JNI, or bionic questions? Mail me/drop by/add me as a reviewer.

Paul_Koning＠Dell.com

6:28 p.m.

On Oct 12, 2012, at 1:18 PM, enh wrote:

...

but where are, say, the Greek localizations in that?

Right here: http://unicode.org/cldr/trac/browser/trunk/common/main/el.xml There are *lots* of languages there, including some not very well known ones, like http://unicode.org/cldr/trac/browser/trunk/common/main/cy.xml or http://unicode.org/cldr/trac/browser/trunk/common/main/chr.xml . paul

enh

6:43 p.m.

no, that's the data i told you Android uses several posts ago! those are localized names for the time zones. we're already all in agreement that that's a solved problem, solved by CLDR. i was saying that the useful thing i'm missing is _city_ names. (not exemplar names, either. re-read my original mail.)

Walter

8:19 p.m.

...

no, that's the data i told you Android uses several posts ago! those are localized names for the time zones. we're already all in agreement that that's a solved problem, solved by CLDR.

We are not in agreement on this point. CLDR translates principal city names affiliated with timezones, and provides a mapping to Windows timezones. That is all. (It doesn't define or translate timezone names, or allow for the grouping of historical timezones into a reduced set of entities suitable for presentation to end users.)

...

i was saying that the useful thing i'm missing is _city_ names. (not exemplar names, either. re-read my original mail.)

It's easy to find internationalised geoname data... one approach might be to look at language equivalence across the Wikipedia dataset, another at OpenStreetMap, etc. - Walter

enh

8:24 p.m.

On Fri, Oct 12, 2012 at 1:19 PM, Walter <walter.stanish@gmail.com> wrote:

...

...
no, that's the data i told you Android uses several posts ago! those are localized names for the time zones. we're already all in agreement that that's a solved problem, solved by CLDR.

We are not in agreement on this point.

CLDR translates principal city names affiliated with timezones, and provides a mapping to Windows timezones. That is all.

(It doesn't define or translate timezone names,

nonsense. this is exactly what Android uses.

...

or allow for the grouping of historical timezones into a reduced set of entities suitable for presentation to end users.)

...
i was saying that the useful thing i'm missing is _city_ names. (not exemplar names, either. re-read my original mail.)

It's easy to find internationalised geoname data... one approach might be to look at language equivalence across the Wikipedia dataset, another at OpenStreetMap, etc.

- Walter


Mark Davis ☕

12:39 a.m.

That is incorrect. The metazone mechanism that you found in ICU comes from CLDR. For more about it (and timezone names), you can also look at the LDML spec.

Mark <https://plus.google.com/114199149796022210033>
Il meglio è l’inimico del bene (The best is the enemy of the good.)

Random832

4:36 a.m.

On 10/12/2012 04:19 PM, Walter wrote:

...

CLDR translates principal city names affiliated with timezones, and provides a mapping to Windows timezones. That is all.

(It doesn't define or translate timezone names, or allow for the grouping of historical timezones into a reduced set of entities suitable for presentation to end users.)

It actually does both: it defines an entity called a "metazone", for example "America_Eastern", provides mappings (both present _and_ historical) of all the various Olson zones that map to it (for example, America/New_York and America/Indianapolis), and each language file may contain translations for its name (for example, es.xml has an entry with <generic>Hora oriental</generic><standard>Hora estándar oriental</standard><daylight>Hora de verano oriental</daylight>). I don't know how you came to believe that it contained none of these things.
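The two-step lookup described here (tz ID to metazone, then metazone plus language to a localized name) can be sketched with plain dictionaries. This is a simplified, hypothetical in-memory model, not CLDR's XML format or ICU's API: the only data in it are the examples quoted above, `display_name` is an invented name, and the date ranges that CLDR attaches to each ID-to-metazone mapping are ignored.

```python
# Simplified, hypothetical model of the CLDR data described above. The real
# data lives in supplemental/metaZones.xml and common/main/<language>.xml.
ID_TO_METAZONE = {
    "America/New_York": "America_Eastern",
    "America/Indianapolis": "America_Eastern",
}

METAZONE_NAMES = {
    ("es", "America_Eastern"): {
        "generic": "Hora oriental",
        "standard": "Hora estándar oriental",
        "daylight": "Hora de verano oriental",
    },
}


def display_name(tzid, lang, kind="generic"):
    """Localized name for a tz ID via its metazone; the raw ID if none known."""
    metazone = ID_TO_METAZONE.get(tzid)
    names = METAZONE_NAMES.get((lang, metazone), {})
    return names.get(kind, tzid)


print(display_name("America/Indianapolis", "es"))          # Hora oriental
print(display_name("America/New_York", "es", "daylight"))  # Hora de verano oriental
print(display_name("Asia/Shanghai", "es"))                 # Asia/Shanghai
```

Falling back to the raw ID keeps the sketch short; CLDR itself defines fallback formats for zones without a metazone name (see the LDML spec), so a real implementation would not show the ID.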

Walter

12:04 p.m.

...

...
CLDR translates principal city names affiliated with timezones, and provides a mapping to Windows timezones. That is all.

(It doesn't define or translate timezone names, or allow for the grouping of historical timezones into a reduced set of entities suitable for presentation to end users.)

It actually does both - it defines an entity called a "metazone" that is for example "America_Eastern", provides mappings (both present _and_ historical) of all the various Olson zones that map to it (for example, America/New_York and America/Indianapolis), and each language file may contain translations for the name of it (for example, es.xml has an entry with <generic>Hora oriental</generic><standard>Hora estándar oriental</standard><daylight>Hora de verano oriental</daylight>). I don't know how you came to believe that it contained none of these things.

You are right. It seems I had somehow missed 'supplemental/metaZones.xml' within core.zip of the CLDR distribution which does indeed seem to do this. (i18n names live in 'common/main/<language>.xml' under timeZoneNames.) Thanks for taking the time to point this out, and I hope this thread is useful to people searching in future. - Walter

Walter

8:20 p.m.

...

...
but where are, say, the Greek localizations in that?

Right here: http://unicode.org/cldr/trac/browser/trunk/common/main/el.xml

You will note that there are no timezone names. Merely city names. - Walter

Paul_Koning＠Dell.com

8:43 p.m.

On Oct 12, 2012, at 4:20 PM, Walter wrote:

...

...
...
but where are, say, the Greek localizations in that?

Right here: http://unicode.org/cldr/trac/browser/trunk/common/main/el.xml

You will note that there are no timezone names. Merely city names.

- Walter

Huh? I'm not sure what you think a timezone name looks like. "America/New_York" is a timezone name, and those CLDR files provide translations of those. paul

Mark Davis ☕

12:42 a.m.

I think that is a matter of terminology. In CLDR we call "America/New_York" an ID (not a name), since it is an internal identifier, and shouldn't be shown to users. A "name" is something you'd show to a user, like "New York Zeit" or "Nordamerikanische Ostküstenzeit" for a German user, or "Восточно-американское время" for a Russian user.

Mark

Guy Harris

1:19 a.m.

On Oct 12, 2012, at 5:42 PM, Mark Davis ☕ <mark@macchiato.com> wrote:

...

I think that is a matter of terminology. In CLDR we call "America/New_York" an ID (not a name), since it is an internal identifier, and shouldn't be shown to users. A "name" is something you'd show to a user, like "New York Zeit" or "Nordamerikanische Ostküstenzeit" for a German user, or "Восточно-американское время" for a Russian user.

...or "Eastern Standard Time" or "US Eastern Standard Time" for a US user, and perhaps the latter even for other English-speaking users; i.e., "America/New_York" is not even intended to be the Official Display Name For Users for *English-speaking* users.

An ID can even refer to more than one "time zone" in the sense of, for example, "Eastern Standard Time", if a given location moved between "time zones" in that sense. To quote a comment in the northamerica file:

# Daviess, Dubois, Knox, and Martin Counties, Indiana,
# switched from eastern to central time in April 2006, then switched back
# in November 2007.

so the ID America/Indiana/Vincennes does *NOT* map to any single "time zone" in the sense of "Eastern Standard Time" or "Central Standard Time"; it isn't intended to do so, and it should not do so, as the idea is that a machine in those counties running an OS using the tz database should be configured for America/Indiana/Vincennes and, if the tz database is kept up-to-date on it, the configuration would *not* have to have been changed in April 2006 or November 2007.
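The Vincennes case is why CLDR's metazone mappings carry date ranges: one tz ID maps to different metazones at different times, while the ID configured on the machine never changes. A minimal sketch, assuming the switches took effect at 2006-04-02 07:00 UTC and 2007-11-04 07:00 UTC (02:00 local on the US daylight-saving change dates in the months the comment names); those instants and the helper name `metazone_at` are assumptions for illustration, and CLDR's supplemental/metaZones.xml is the authoritative source for the ranges.

```python
from bisect import bisect_right
from datetime import datetime, timezone

UTC = timezone.utc

# (start instant, metazone) pairs for America/Indiana/Vincennes, oldest first.
# The instants are assumptions; check supplemental/metaZones.xml.
VINCENNES_HISTORY = [
    (datetime.min.replace(tzinfo=UTC), "America_Eastern"),
    (datetime(2006, 4, 2, 7, 0, tzinfo=UTC), "America_Central"),
    (datetime(2007, 11, 4, 7, 0, tzinfo=UTC), "America_Eastern"),
]


def metazone_at(history, instant):
    """Return the metazone in effect at `instant` (an aware datetime)."""
    starts = [start for start, _ in history]
    return history[bisect_right(starts, instant) - 1][1]


for when in (datetime(2005, 7, 1, tzinfo=UTC),
             datetime(2007, 1, 15, tzinfo=UTC),
             datetime(2012, 10, 12, tzinfo=UTC)):
    print(when.date(), metazone_at(VINCENNES_HISTORY, when))
# 2005-07-01 America_Eastern
# 2007-01-15 America_Central
# 2012-10-12 America_Eastern
```

The display name therefore depends on the instant being displayed as well as on the configured ID, which is the distinction between an ID and a name drawn above.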

Walter

9:06 p.m.

...

Yes, CLDR has what you need.

It doesn't. It doesn't have timezone names, nor the capacity to collapse historically accrued timezones (with near-zero interest to modern users) into reduced sets for display.

...

That's what we use on Android to localize time zone names, starting from Olson ids. All that code is open source in AOSP, and icu4c does most of the work.

Hrrm. Seems I was looking at the wrong bit of code. OK, ICU does seem to have the goods here:

(1) establishing a more meaningful ("meta") namespace for modern use that is separate from the Olson IDs;
(2) a translated library of human-meaningful timezone names (these are in the "source/data/zone" directory, visible online at http://www.opensource.apple.com/source/ICU/ICU-461.12/icuSources/data/zone/ probably amongst other places).

Anyway, it appears that we have a solution. Hooray. (Though does anyone else find it interesting that the purview of fixing timezone data to something human-accessible falls to a project like this: http://en.wikipedia.org/wiki/International_Components_for_Unicode ?)

Onwards, and thanks for the educational discussion,
Walter

Paul_Koning＠Dell.com

9:17 p.m.

On Oct 12, 2012, at 5:06 PM, Walter wrote:

...

...
Yes, CLDR has what you need.

It doesn't. It doesn't have timezone names, nor the capacity to collapse historically accrued timezones (with near-zero interest to modern users) ...

You're making an unwarranted and unsupported generalization. The fact that this information is not interesting to you doesn't mean it isn't interesting to others. If you want to filter timezone data to compress out the parts you don't care about, go right ahead. It's a small matter of programming once you know what you need. I did exactly that to remove pre-2001 data which in my application was not necessary. I didn't have to get help from this list for that; it's just a couple of lines changed in zic to do what was needed. paul
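Paul trimmed the data by changing zic; the same effect can also be had by post-processing a transition list. The sketch below is not his change, only an illustration of the idea under stated assumptions: transitions are `(start_utc, utc_offset, abbreviation)` tuples sorted by start, the sample list is hypothetical and abbreviated (not real tz data), and `truncate_history` is an invented name. The one subtlety is that the rule in force at the cutoff must survive, re-anchored to the cutoff, or lookups just after it would have nothing to match.

```python
from bisect import bisect_right
from datetime import datetime, timedelta, timezone

UTC = timezone.utc


def truncate_history(transitions, cutoff):
    """Drop transitions before `cutoff`, keeping the rule in force at it.

    The entry in effect at `cutoff` is re-anchored to start at `cutoff`, so
    lookups at or after `cutoff` return the same answer as before trimming.
    """
    starts = [t[0] for t in transitions]
    i = bisect_right(starts, cutoff) - 1
    if i < 0:
        return list(transitions)  # every transition is after the cutoff
    _, offset, abbr = transitions[i]
    return [(cutoff, offset, abbr)] + list(transitions[i + 1:])


# Hypothetical, abbreviated history (not real tz data).
history = [
    (datetime(1970, 1, 1, tzinfo=UTC), timedelta(hours=-5), "EST"),
    (datetime(1995, 4, 2, 7, tzinfo=UTC), timedelta(hours=-4), "EDT"),
    (datetime(2000, 10, 29, 6, tzinfo=UTC), timedelta(hours=-5), "EST"),
    (datetime(2001, 4, 1, 7, tzinfo=UTC), timedelta(hours=-4), "EDT"),
]
cutoff = datetime(2001, 1, 1, tzinfo=UTC)
trimmed = truncate_history(history, cutoff)
for start, offset, abbr in trimmed:
    print(start.isoformat(), offset, abbr)
```

Anything older than the cutoff can no longer be rendered correctly from the trimmed list, which is exactly the trade-off Paul accepted for his application's 2001 horizon.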

Walter

9:52 p.m.

...

...
...
Yes, CLDR has what you need.

It doesn't. It doesn't have timezone names, nor the capacity to collapse historically accrued timezones (with near-zero interest to modern users) ...

You're making an unwarranted and unsupported generalization. The fact that this information is not interesting to you doesn't mean it isn't interesting to others.

If you want to filter timezone data to compress out the parts you don't care about, go right ahead. It's a small matter of programming once you know what you need.

True, but for maintenance purposes I try not to go too far out on a limb for such basic components... I had assumed tz would solve the 'reasonable list of timezones in current use' issue. Since tz apparently considers it out of scope, the only real solution for our project's requirements appears to be ICU. This seems slightly obtuse, but is perfectly workable.

Interestingly, though, perhaps because ICU's focus, unlike that of the tz database, is not providing a timezone database, at least some of the ICU bindings in other languages do not appear to provide any way to list available timezones (apparently provided by 'enumerate' functionality within ICU). *sigh* That one can be worked around, but in this day and age, so many hoops for such a fundamental dataset seem a little saddening.

2-bit notion of the email: perhaps the tz maintainers could recognize the problems with the current Olson IDs and consider a policy of importing aliases from the 'meta:' namespace within ICU without the associated multilingual names baggage?

- Walter
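The "reasonable list of timezones in current use" that Walter's original collapsed.tab aimed at amounts to grouping tz IDs by the metazone they use today and showing one entry per group. A minimal sketch, assuming a current ID-to-metazone mapping has already been extracted (for example from CLDR's metaZones.xml); the sample mapping, the `collapse` function and the choice of representative ID are illustrative assumptions, not an ICU or tz interface.

```python
from collections import defaultdict


def collapse(current_metazone, preferred=None):
    """Group tz IDs by current metazone and pick one representative per group.

    `current_metazone` maps tz ID -> metazone in effect today. `preferred`
    optionally maps metazone -> representative ID; otherwise the
    alphabetically first ID in the group is used.
    """
    groups = defaultdict(list)
    for tzid, metazone in current_metazone.items():
        groups[metazone].append(tzid)
    preferred = preferred or {}
    result = {}
    for metazone, ids in groups.items():
        ids.sort()
        result[metazone] = (preferred.get(metazone, ids[0]), ids)
    return result


# Hypothetical sample; a real mapping would come from CLDR data.
current = {
    "America/New_York": "America_Eastern",
    "America/Indianapolis": "America_Eastern",
    "America/Indiana/Vincennes": "America_Eastern",
    "America/Chicago": "America_Central",
}
collapsed = collapse(current, preferred={"America_Eastern": "America/New_York"})
for metazone, (representative, members) in sorted(collapsed.items()):
    print(metazone, representative, members)
```

Because the grouping discards history, it is only suitable for choosing or displaying a zone; what gets stored should still be the full tz ID the user's location maps to.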

31 comments

participants (11)

Derick Rethans
enh
Guy Harris
Mark Davis ☕
Paul Eggert
Paul_Koning＠Dell.com
Philip Newton
Random832
Robert Elz
Tony Finch
Walter