Over a month ago Mark Davis posted an email that mentioned issues related to translations of time zone IDs provided by the Common Locale Data Repository project (CLDR). I'm interested in finding out about the status of this localization and in finding a way to keep informed of the progress. The initial email discussion was somewhat lengthy and I'm sure that I did not digest all the points mentioned.

The beginning of the CLDR proposal explains the current mechanism for localizing Olson Time Zone Identifiers (Olson TZIDs). In the tz code directory, the rules for naming Olson TZIDs are in the Theory file.

Many Olson TZIDs simply do not localize well. I believe that the Olson TZIDs were not designed with localization in mind. I suggest a zoneMeta.tab file (very similar to zone.tab) be created for the purpose of facilitating localization. Such a file could include the following columns:

- ISO 3166-1
- latitude/longitude (from zone.tab)
- Olson TZID
- city or place name
- sub-division (i.e. state, province, prefecture, or kingdom)
- time zone name, general (e.g. Eastern Time)
- time zone name, specific if it exists (e.g. Eastern Standard Time - Indiana - most locations)
- time zone name, std (e.g. Eastern Standard Time)
- time zone name, dst (e.g. Eastern Daylight Saving Time)
- historic (yes/no)
- attributes (major, minor, etc.) Is Center, North Dakota minor? (perhaps historic and attributes could be combined?)
- geographical comments (entire country or ISO subdivision codes?)
- revision notes (always the last column)

Additionally, there could be columns to track whether a particular time zone is within another time zone:

- tz level (e.g. Denver == 0, Arizona == 1)
- parent Olson TZID (e.g. Arizona's parent is Denver)

***

It's important that this information is maintained in the time zone database. The maintainers of the database should be the ones to determine or declare whether a particular time zone is modern, historical, major, minor, or has other attributes.
I also think that a meta data file for time zone abbreviations would be useful. Many abbreviations are important and widely used while others are invented by the maintainers for convenience. Currently, time zone abbreviations are not unique (e.g. EST) and need to be associated with both an ISO 3166-1 code and an Olson TZID. An abbrMeta.tab file could contain the following columns:

- ISO 3166-1
- Olson TZID
- time zone abbr - regular (e.g. PT for Pacific Time)
- time zone abbr - std (e.g. PST for Pacific Standard Time)
- time zone abbr - dst (e.g. PDT for Pacific Daylight Saving Time)
- time zone name - regular (e.g. Pacific Time for PT)
- time zone name - std (e.g. Pacific Standard Time for PST)
- time zone name - dst (e.g. Pacific Daylight Saving Time for PDT)
- historic (yes/no)
- abbr valid or in use (yes/no)
- comments
- revision notes

A unique time zone abbreviation ID could eliminate the need to have time zone names in both the abbrMeta.tab and zoneMeta.tab files. The zoneMeta.tab file could contain unique time zone abbreviation IDs that refer to the abbrMeta.tab file. Perhaps the ID could be formed by simply appending the abbreviation to the ISO 3166-1 code: Pacific Standard Time for the US could be US_PST.

The purpose of these meta data files would be to provide English-readable time zone information (more comprehensive than zone.tab) and to facilitate localization. These files could be added to the data directory and clearly noted as a work in progress. If this or a similar approach is taken, the validity of these data will evolve over time.

Questions:

- Should the tz database attempt to maintain or track ISO 3166-1 codes that are either obsolete or do not have an associated time zone?
- Should XML or XLIFF be considered for meta data only? This would allow future support for local city, sub-division, and time zone names utilizing Unicode.
A local name would be in the country's own language (and there might be secondary local names for countries with more than one language).

I expect to be very busy for the next couple of weeks, but I'm interested to find out if my comments are useful.

Chuck
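The zoneMeta.tab proposal in the message above can be made concrete with a minimal sketch. No such file exists in the tz database; the column order, the field names, and the sample row below are all hypothetical, and only a subset of the suggested columns is modelled.

```python
from dataclasses import dataclass

# Hypothetical column order for the proposed zoneMeta.tab. The file is only a
# suggestion in this thread, so every name here is an illustrative assumption.
COLUMNS = (
    "country_code", "coordinates", "tzid", "place_name", "subdivision",
    "name_general", "name_std", "name_dst", "historic",
)


@dataclass(frozen=True)
class ZoneMetaRow:
    country_code: str   # ISO 3166-1 code
    coordinates: str    # latitude/longitude, copied from zone.tab
    tzid: str           # Olson TZID
    place_name: str     # city or place name
    subdivision: str    # state, province, etc. (may be empty)
    name_general: str   # e.g. "Pacific Time"
    name_std: str       # e.g. "Pacific Standard Time"
    name_dst: str       # e.g. "Pacific Daylight Saving Time"
    historic: bool      # parsed from "yes"/"no"


def parse_zone_meta_line(line: str) -> ZoneMetaRow:
    """Parse one tab-separated row of the hypothetical zoneMeta.tab."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}")
    record = dict(zip(COLUMNS, fields))
    record["historic"] = record["historic"].lower() == "yes"
    return ZoneMetaRow(**record)


# Made-up sample row, written in the style the proposal describes.
SAMPLE = "\t".join([
    "US", "+340308-1181434", "America/Los_Angeles", "Los Angeles",
    "California", "Pacific Time", "Pacific Standard Time",
    "Pacific Daylight Saving Time", "no",
])
row = parse_zone_meta_line(SAMPLE)
print(row.place_name, "/", row.subdivision, "/", row.name_general)
```

Keeping the sub-division as a separate field, rather than folding it into the place name, would let a consumer choose between "Los Angeles" and "Los Angeles, California" at display time.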
Chuck Soper <chucks@lmi.net> writes:
Many Olson TZIDs simply do not localize well. I believe that the Olson TZIDs were not designed with localization in mind.
I'm not sure what you mean here. TZIDs are merely identifiers for regions of the globe.
I suggest a zoneMeta.tab file (very similar to zone.tab) be created for the purpose of facilitating localization. Such a file could include the following columns:
- ISO 3166-1 - latitude/longitude (from zone.tab) - Olson TZID - city or place name
This is all in zone.tab now, for English only. The city or place name should be localized of course.
- sub-division (i.e. state, province, prefecture, or kingdom)
Often the TZID subdivisions are ad hoc; this reflects the ad hoc nature of time zone and DST political decisions. For the best TZ subdivision data I know of, please see Gwillim Law's Statoids <http://www.statoids.com/statoids.html>. Another source is Oscar van Vlijmen's Time zone boundaries for multizone countries <http://home-4.tiscali.nl/~t876506/Multizones.html>.
- time zone name, general (e.g. Eastern Time) - time zone name, specific if it exists (e.g. Eastern Standard Time - Indiana - most locations) - time zone name, std (e.g. Eastern Standard Time) - time zone name, dst (e.g. Eastern Daylight Saving Time)
This is a different table, one that is not currently addressed explicitly by the Olson database. There is not a 1-1 correspondence between time zone IDs and these names; it's a different relation. Also, there is widespread disagreement over what the names are, in any particular locale. Different Australians disagree about what the Australian English names are for their time zones, for example. I don't envy the job of people who would have to compile this information. (This is a polite way of saying that I'm skeptical about whether anybody's going to do it. :-)
- historic (yes/no)
An Olson Time zone identifier corresponds to a set of clocks that have used the same civil time since 1970, so I don't know what you mean by this attribute. Are you referring to discontinued identifiers?
additionally there could be columns to track if a particular time zone is within another time zone.
Olson TZIDs are either disjoint, or aliases. There are no parents. Perhaps this should be changed, but it would have to be designed carefully.
Currently, time zone abbreviations are not unique (e.g. EST) and need to be associated with both an ISO 3166-1 code and an Olson TZID.
Currently, to generate a time zone abbreviation, you need (1) an Olson TZID and (2) the UTC time stamp that you want the time zone abbreviation for. A single Olson TZID can map to many time zone abbreviations.
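The point above, that an abbreviation depends on both the TZID and the instant, can be illustrated with a short sketch. It assumes Python 3.9 or later and an installed copy of the tz database for the standard zoneinfo module; the helper name abbreviation_at is made up for this example.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def abbreviation_at(tzid: str, utc_instant: datetime) -> str:
    """Return the abbreviation in effect for `tzid` at a given UTC instant."""
    if utc_instant.tzinfo is None:
        raise ValueError("utc_instant must be timezone-aware")
    # Convert the UTC instant into the zone's local time, then ask for the
    # abbreviation that applies at that moment.
    local = utc_instant.astimezone(ZoneInfo(tzid))
    return local.tzname()


# One TZID, two instants, two different abbreviations.
winter = datetime(2004, 1, 15, 12, 0, tzinfo=timezone.utc)
summer = datetime(2004, 7, 15, 12, 0, tzinfo=timezone.utc)
print(abbreviation_at("America/New_York", winter))  # EST
print(abbreviation_at("America/New_York", summer))  # EDT
```

The lookup takes no ISO 3166-1 code at all, which is why a table keyed on country code plus abbreviation describes a different relation from the one the database stores.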
- Should the tz database attempt to maintain or track ISO 3166-1 codes that are either obsolete or do not have an associated time zone?
iso3166.tab tracks all the current ISO 3166-1 identifiers. We haven't seen any need to worry about obsolete codes.
- Should XML or XLIFF be considered for meta data only?
That's an interchange issue, not a schema issue. The schema is more important. The source data can be stored in a simple text file, as now.
I'm interested in finding out about the status of this localization and in finding a way to keep informed of the progress.
I suggest contacting Mark Davis.
Thank you for responding to my long email and I'm sorry for my delayed response. I think that my original post went too far towards addressing localization issues. I will scale back that suggestion and try to explain my rationale.

When an operating system organization, a software developer, or perhaps the Common Locale Data Repository project (CLDR) needs to display time zone location names (based on zone info files) in English or localized to another language, zone.tab and the other files in tzdata are generally the starting point. Many TZIDs do not (nor should they) express the city and state/province for the TZID. I know of at least 48 such TZIDs.

I suggest that two new columns be added to zone.tab: location (city) and sub-division (state/province). The sub-division would be for the city, not the entire time zone region. For example, Illinois for America/Chicago. For many or most TZIDs the sub-division field may be blank. I'm fairly sure that most or all of the information to do this is already in the tzdata files.

I believe that the task of assembling English-readable tz location names that correspond with TZIDs for the purpose of display (English or localized) has been done many times. Each time this task is done it would be reasonable to expect slightly different results. Having these two new columns, location and sub-division, would add consistency and improve maintenance of tz location names. Also, having the columns would prevent the task from continually being repeated by different people and organizations.

On my Unix-based system the tz location name for Chicago is "Chicago", not "Chicago, Illinois". I think this is because the state/province information is not easily accessible from the zone.tab file.

This is my suggestion. I'm not sure there is interest in this suggestion, but I at least wanted to convey what I was thinking.

Chuck

I have a few additional comments below.

At 10:05 PM -0700 9/2/04, Paul Eggert wrote:
Chuck Soper <chucks@lmi.net> writes:
Many Olson TZIDs simply do not localize well. I believe that the Olson TZIDs were not designed with localization in mind.
I'm not sure what you mean here. TZIDs are merely identifiers for regions of the globe.
I agree.
I suggest a zoneMeta.tab file (very similar to zone.tab) be created for the purpose of facilitating localization. Such a file could include the following columns:
- ISO 3166-1 - latitude/longitude (from zone.tab) - Olson TZID - city or place name
This is all in zone.tab now, for English only. The city or place name should be localized of course.
- sub-division (i.e. state, province, prefecture, or kingdom)
Often the TZID subdivisions are ad hoc; this reflects the ad hoc nature of time zone and DST political decisions. For the best TZ subdivision data I know of, please see Gwillim Law's Statoids <http://www.statoids.com/statoids.html>. Another source is Oscar van Vlijmen's Time zone boundaries for multizone countries <http://home-4.tiscali.nl/~t876506/Multizones.html>.
By sub-division, I meant for the tz location city not the entire time zone region. I am also interested in time zone regions and I appreciate these references.
- time zone name, general (e.g. Eastern Time) - time zone name, specific if it exists (e.g. Eastern Standard Time - Indiana - most locations) - time zone name, std (e.g. Eastern Standard Time) - time zone name, dst (e.g. Eastern Daylight Saving Time)
This is a different table, one that is not currently addressed explicitly by the Olson database. There is not a 1-1 correspondence between time zone IDs and these names; it's a different relation.
Also, there is widespread disagreement over what the names are, in any particular locale. Different Australians disagree about what the Australian English names are for their time zones, for example. I don't envy the job of people who would have to compile this information. (This is a polite way of saying that I'm skeptical about whether anybody's going to do it. :-)
I would like to try this for a project that I am working on, but I suspect it will be difficult.
- historic (yes/no)
An Olson Time zone identifier corresponds to a set of clocks that have used the same civil time since 1970, so I don't know what you mean by this attribute. Are you referring to discontinued identifiers?
My idea of labeling historic (and modern) time zones comes from the following text I read in the CLDR Time Zone Localization proposal: "A lot of people just don't care about historic differences." While I do agree with the statement, I am not sure if there are clear definitions for historic and modern time zones. Perhaps, the entire country of Argentina is a historic time zone. :-) In any case, my comments about historic/modern time zones should be directed to the CLDR.
additionally there could be columns to track if a particular time zone is within another time zone.
Olson TZIDs are either disjoint, or aliases. There are no parents. Perhaps this should be changed, but it would have to be designed carefully.
I like the idea of tracking parent/child relationship for time zones, yet I'm not sure how important it is.
Currently, time zone abbreviations are not unique (e.g. EST) and need to be associated with both an ISO 3166-1 code and an Olson TZID.
Currently, to generate a time zone abbreviation, you need (1) an Olson TZID and (2) the UTC time stamp that you want the time zone abbreviation for. A single Olson TZID can map to many time zone abbreviations.
Thanks for the clarification. I need to look at this more closely.
- Should the tz database attempt to maintain or track ISO 3166-1 codes that are either obsolete or do not have an associated time zone?
iso3166.tab tracks all the current ISO 3166-1 identifiers. We haven't seen any need to worry about obsolete codes.
I agree. I think I suggested this because YU and BV were mentioned in the original Time Zone Localization discussion related to CLDR.
- Should XML or XLIFF be considered for meta data only?
That's an interchange issue, not a schema issue. The schema is more important. The source data can be stored in a simple text file, as now.
This makes sense to me. (I was in a localization mindset.)
I'm interested in finding out about the status of this localization and in finding a way to keep informed of the progress.
I suggest contacting Mark Davis.
Chuck Soper <chucks@lmi.net> writes:
I suggest that two new columns be added to zone.tab, location (city) and sub-division (state/province). The sub-division would be for the city not the entire time zone region. For example, Illinois for America/Chicago.
But America/Chicago identifies a fairly large chunk of the United States, including Iowa, Missouri, most of (but not all of) Kansas, etc. The main idea behind the "America/Chicago" and the current latitude/longitude is to identify a single point in the region, a point that will continue to be identified if the region splits (an event that occurs from time to time). The latitude/longitude is a quite-inadequate substitute for what is really needed (namely, the entire region boundary), but it's the best we've got right now. I worry that adding a column with data like "Illinois" would be a step in the wrong direction, and would cause more confusion than it would cure, since "Illinois" is an attribute of Chicago, and is not a direct attribute of the America/Chicago TZID. What we really need are the region boundaries (ideally hooked up to GPS :-), or some data that will let us derive the region boundaries from other databases. The current "comments" column is an informal attempt in that direction, and I'd rather focus our efforts there.
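To show what zone.tab offers for a TZID, here is a minimal sketch of reading one of its lines: a country code, an ISO 6709 coordinate for a single representative point, the TZID, and an optional free-form comment. The sample line is illustrative rather than copied from a particular tzdata release, and the helper names are assumptions made for this example.

```python
import re
from typing import NamedTuple


class ZoneTabEntry(NamedTuple):
    country_code: str
    latitude: float    # decimal degrees, north positive
    longitude: float   # decimal degrees, east positive
    tzid: str
    comments: str


# ISO 6709 point as used in zone.tab: +-DDMM+-DDDMM or +-DDMMSS+-DDDMMSS.
_COORD = re.compile(r"^([+-])(\d{2})(\d{2})(\d{2})?([+-])(\d{3})(\d{2})(\d{2})?$")


def _to_degrees(sign, degrees, minutes, seconds):
    value = int(degrees) + int(minutes) / 60 + int(seconds or 0) / 3600
    return -value if sign == "-" else value


def parse_zone_tab_line(line: str) -> ZoneTabEntry:
    """Parse one non-comment line of zone.tab (tab-separated)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("expected at least 3 tab-separated columns")
    country, coords, tzid = fields[:3]
    comments = fields[3] if len(fields) > 3 else ""
    match = _COORD.match(coords)
    if match is None:
        raise ValueError(f"unrecognised coordinates: {coords!r}")
    latitude = _to_degrees(*match.group(1, 2, 3, 4))
    longitude = _to_degrees(*match.group(5, 6, 7, 8))
    return ZoneTabEntry(country, latitude, longitude, tzid, comments)


def place_name(tzid: str) -> str:
    """Derive an English display name from the last TZID component."""
    return tzid.rsplit("/", 1)[-1].replace("_", " ")


entry = parse_zone_tab_line("US\t+415100-0873900\tAmerica/Chicago\tCentral Time")
print(place_name(entry.tzid), entry.latitude, entry.longitude)
```

Nothing in the parsed entry says "Illinois" or describes the extent of the region; the only geography available is the single point and whatever the comment column happens to mention.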
At 1:33 PM -0700 9/9/04, Paul Eggert wrote:
Chuck Soper <chucks@lmi.net> writes:
I suggest that two new columns be added to zone.tab, location (city) and sub-division (state/province). The sub-division would be for the city not the entire time zone region. For example, Illinois for America/Chicago.
But America/Chicago identifies a fairly large chunk of the United States, including Iowa, Missouri, most of (but not all of) Kansas, etc.
The main idea behind the "America/Chicago" and the current latitude/longitude is to identify a single point in the region, a point that will continue to be identified if the region splits (an event that occurs from time to time). The latitude/longitude is a quite-inadequate substitute for what is really needed (namely, the entire region boundary), but it's the best we've got right now. I worry that adding a column with data like "Illinois" would be a step in the wrong direction, and would cause more confusion than it would cure, since "Illinois" is an attribute of Chicago, and is not a direct attribute of the America/Chicago TZID.
I tend to think that:

- a populous geographic point within a time zone region, and
- a description of the entire time zone region

are both useful. When I want to know what time it is in Sydney, the display name "Sydney, Australia" is more understandable to me than "Eastern Standard Time" or "Eastern Summer Time". Yet, if I lived in Australia then Eastern Time would be more understandable. I live near San Francisco, California. If someone talks about the time zone for Los Angeles it sounds strange, but if they say "Pacific Time", then it makes sense. I believe that the understandability of a time zone display name (Los Angeles or Pacific Time) is dependent on where you are from.

Even if the time zone boundaries were available there would still be a need for a display name such as "Chicago", "Chicago, Illinois", "Central Daylight Time", or "Chicago - Central Time". I don't think that adding a column with data like "Illinois" would cause more confusion; it would just maintain the same level of confusion. :-)

On the other hand, adding a couple of columns to zone.tab would probably not provide any immediate benefit to anyone. And it is work that would take time. Thanks for listening to my feedback.
What we really need are the region boundaries (ideally hooked up to GPS :-), or some data that will let us derive the region boundaries from other databases. The current "comments" column is an informal attempt in that direction, and I'd rather focus our efforts there.
I like the idea of heading in the direction of having the actual region boundaries. About a year ago, using ESRI software, I looked thoroughly at manifold.net's World Time Zones Map. At that time it was a little out of date. It looked like it would be difficult to maintain. I'm glad that Manifold made it available.

The current "comments" column contains various data. I think it would help if this were a little more formal. A "region" column with a specific format might be an answer, yet I think there would probably be a lot of issues (standardization of codes such as ISO or FIPS, etc.).

Perhaps the Open Geospatial Consortium would be interested in helping add region boundaries to the tz database?
http://www.opengeospatial.org/about/?page=vision
They clearly state: "Our core mission is to deliver spatial interface specifications that are openly available for global use."

Chuck
Participants (2): Chuck Soper, Paul Eggert