How can this project successfully handle pre-1970 data? Why have I focussed so much on IDs and their value to tzdb?
Intro
-------
In previous threads I've focussed on IDs, and laid out different ways to describe the groups of IDs we have. For now, let's focus solely on the two main groups:
- region IDs, which represent abstract regions where clocks have been the same since 1970
- non-region IDs, which represent locations where tzdb has, at some point, added an ID
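To make "clocks have been the same since 1970" concrete, here is a minimal Python sketch. It assumes each zone is modelled as a sorted list of (UTC timestamp, UTC offset in seconds) pairs; that representation, the function names and the toy numbers are illustrative assumptions, not tzdb's own data format, and abbreviations and DST flags are ignored.

```python
from bisect import bisect_right

EPOCH_1970 = 0  # Unix timestamp of 1970-01-01T00:00:00Z


def offset_at(transitions, ts):
    """Return the UTC offset in effect at ts.

    transitions is a sorted list of (utc_timestamp, utc_offset_seconds);
    the first entry is treated as the initial offset.
    """
    times = [t for t, _ in transitions]
    idx = bisect_right(times, ts) - 1
    return transitions[max(idx, 0)][1]


def post_1970_view(transitions):
    """Keep only what is observable from 1970 onwards."""
    later = [(t, off) for t, off in transitions if t > EPOCH_1970]
    return [(EPOCH_1970, offset_at(transitions, EPOCH_1970))] + later


def same_since_1970(a, b):
    """True if two zones are indistinguishable from 1970 onwards."""
    return post_1970_view(a) == post_1970_view(b)


# Toy data (made up): the two zones differ before 1970 only.
zone_a = [(-2_000_000_000, 3600), (-1_000_000_000, 7200),
          (-900_000_000, 3600), (300_000_000, 7200)]
zone_b = [(-2_000_000_000, 0), (-500_000_000, 3600), (300_000_000, 7200)]

print(same_since_1970(zone_a, zone_b))
```

Under this model, two locations fall in the same region exactly when `same_since_1970` holds for them, even though their full histories differ.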
The key observation is that the segregation between these two groups of IDs *did not exist* until around 2014. It was only in 2018 that the ISO country rule was removed. It has only been since 2014 or so that IDs have been merged. What tzdb previously offered was a set of IDs, based on a simple rule - "ID as needed for post-1970 data, with at least one per ISO country". Full history was available for each of these (whether accurate or not). What has happened since is a split, where no split previously existed. A split which favours some locations over others.
The recent mailing list debates merely represent the culmination of this favouritism. The irony is that the merges are being done in the name of equity and fairness, when the outcome has actually been the exact opposite - picking favourites, and denigrating everywhere else. That the approach to picking favourites is according to a standardised largest city rule isn't really that relevant here - it is the outcome that is unfair, not the process.
If region IDs were of different appearance (eg. numeric or textually different) then the issues would not have arisen. The mistake was taking a fully functional and fully integral set of IDs, and bifurcating it into two groups. The split was actually a huge change in the policy of tzdb, which has been added drip by drip, rather than something that was ever fully appreciated up front.
FWIW, it is clear to me that there is an aspect of imposing a US-centric timezone system on other parts of the world. The recent tzdb approach of focussing entirely on timezone regions makes perfect sense for the US, where region boundaries do not follow state lines, and ordinary members of the public need to be aware of whether they are in US/Mountain or US/Central. This simply isn't the timezone model in many other parts of the world. In places like Europe and Asia, the timezone is driven primarily by the country you live in - an ordinary member of the public in Iceland is never going to associate with some abstract timezone region stretching down the Atlantic that is not named, not legally defined and is little more than a random outcome based on tzdb's choice of 1970. Even in somewhere like Norway, an ordinary member of the public will understand that although they follow CET, their timezone is actually driven by their Government in Oslo. The brilliance of the original rule - "ID as needed for post-1970 data, with at least one per ISO country" - was that it seamlessly handled *both* models of timezone in one unified set of IDs. Removal of the ISO country part has completely destabilised that balance.
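As a rough illustration of how the rule "ID as needed for post-1970 data, with at least one per ISO country" differs from a regions-only policy, here is a minimal Python sketch. The function name, the tuple layout and the signature labels ("cet-eu", "gmt") are hypothetical stand-ins for real post-1970 behaviour, and the representative for each region is simply the first entry listed, not the largest-city rule.

```python
def select_ids(locations, per_country=True):
    """Pick the IDs to publish.

    locations is a list of (zone_id, iso_country, post_1970_signature)
    tuples in priority order. Locations sharing a signature have had the
    same clocks since 1970.
    """
    selected = []
    seen_signatures = set()
    covered_countries = set()

    # Pass 1: one ID per distinct post-1970 behaviour (the "region" IDs).
    for zone_id, country, signature in locations:
        if signature not in seen_signatures:
            seen_signatures.add(signature)
            covered_countries.add(country)
            selected.append(zone_id)

    # Pass 2: guarantee at least one ID per ISO country.
    if per_country:
        for zone_id, country, _ in locations:
            if country not in covered_countries:
                covered_countries.add(country)
                selected.append(zone_id)

    return selected


# Toy input; the signature labels are placeholders, not real tzdb data.
locations = [
    ("Europe/Berlin", "DE", "cet-eu"),
    ("Europe/Oslo", "NO", "cet-eu"),
    ("Africa/Abidjan", "CI", "gmt"),
    ("Atlantic/Reykjavik", "IS", "gmt"),
]

print(select_ids(locations))                     # original rule
print(select_ids(locations, per_country=False))  # regions only
```

With the country pass enabled, Norway and Iceland keep their own IDs; with it disabled, only the two region representatives remain.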
As a constraint to this thread, tzdb really needs to offer one standard view of data, not command line flags that allow different views. If downstream projects end up with different views of the data, it makes tzdb a much less reliable source. (tzdb can be packaged in different ways on the same machine, for example it is undesirable for Postgres' internal tzdb and the OS tzdb to diverge for the same version). Given this, what needs to be nailed down is what is the default data set that tzdb publishes - there isn't really much point in talking about compile time flags, or that the contents of backzone could be used by someone.
Let's look at seven options for pre-1970 data:
Pre-1970 data for regions only
-----------------------------------------
1) Pre-1970 data for regions only
- Despite looking identical to other IDs, region IDs are treated as special/favoured
- Pre-1970 data for non-region IDs is of no importance whatsoever, thus most get pre-1970 data from another country/continent
- The split between region and non-region locations is fully completed - the US-centric timezone model is dominant
- An end user in Iceland is supposed to use Africa/Abidjan; Atlantic/Reykjavik is treated as a historical mistake of tzdb kept around only for backwards compatibility
Pre-1970 data for most IDs
------------------------------------
2) Pre-1970 data for each ID meeting the rule "ID as needed for post-1970 data, with at least one per ISO country"
- The split between region and non-region locations is healed
- High quality data from places like Iceland and Norway is retained, but low quality data from elsewhere is restored
- The pre-1970 data is simply viewed as the best available data for each location, which can be improved over time
- Most end-users in Iceland would expect tzdb to provide pre-1970 data from Iceland, not the Ivory Coast
3) Pre-1970 data for all IDs that currently exist except true aliases
- This is very similar to #2, but would include something like Montreal which was effectively mistakenly added to tzdb
- This doesn't seem as desirable as #2
4) Pre-1970 data for any ID where the pre-1970 data is high quality
- Subjective on quality, which doesn't seem like a great idea
- It does avoid bringing bad quality data back into the main tzdb distribution
- On balance, #2 has fewer places for debate to arise
5) Pre-1970 data for all IDs that currently exist except true aliases, plus a *new* set of IDs representing regions
- As per #2, but adding IDs like "Region/12345" or "Region/Berlin"
- Regions should contain post-1970 data only, as the region is by definition only meaningful post-1970
- It is not entirely clear what this solves over just going with #2
- This option is connected to Russ' proposals [1], although he suggests a bigger split between timekeeping data and ID naming
- Perhaps it makes sense if the new region IDs were internal and not normally seen by end users?
Remove pre-1970 data from general use
-------------------------------------------------------
6) No pre-1970 data whatsoever, all IDs are post-1970 only
- Project policy is that tzdb is focussed on post-1970 data only
- Paul repeatedly tells us that pre-1970 data is unreliable, and people shouldn't use it
- Pre-1970 data would not be deleted, but would not be available in most downstreams
- The split between region and non-region locations is effectively healed
- End users lose access to pre-1970 data, which is particularly notable in some locations where that data is reliable, eg London
- It is unknown at this point what the user impact is of removing pre-1970 data from major financial/business centres (no major location has yet been merged)
7) No existing IDs get pre-1970 data, but a *new* set of IDs are created containing it
- As per #6, the existing IDs get no pre-1970 data and the split between region and non-region locations is healed
- New IDs, such as "Historic/Europe/London" or "Historic/Africa/Abidjan" get created for each of the main IDs
- The historic IDs include pre-1970 data, the standard IDs do not
- There is a mechanical transformation between the historic and non-historic IDs
- Whether downstreams do or do not include the historic IDs is their choice, potentially based on available space
- The data provided by the standard non-historic IDs remains the same whether the downstream includes the historic IDs or not (a Good Thing)
- Users have to deliberately opt-in to get pre-1970 data, which might make them think about the accuracy issue
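The "mechanical transformation" in option 7 could be as simple as adding or stripping a fixed prefix. Here is a minimal Python sketch, assuming the "Historic/" prefix used in the examples above; the function names are hypothetical and nothing like this exists in tzdb today.

```python
HISTORIC_PREFIX = "Historic/"  # taken from the examples in option 7


def to_historic(zone_id):
    """Map a standard ID to its historic counterpart."""
    if zone_id.startswith(HISTORIC_PREFIX):
        return zone_id  # already a historic ID
    return HISTORIC_PREFIX + zone_id


def to_standard(zone_id):
    """Map a historic ID back to the standard (post-1970 only) ID."""
    if zone_id.startswith(HISTORIC_PREFIX):
        return zone_id[len(HISTORIC_PREFIX):]
    return zone_id


print(to_historic("Europe/London"))
print(to_standard("Historic/Africa/Abidjan"))
```

Because the mapping is purely textual, a downstream that ships only the standard IDs can still recognise a historic ID and decide how to handle it.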
Notes
--------
- I'm not discussing adding new IDs simply to represent locations whose clocks differ only before 1970. I don't personally think that is a job for tzdb, and even if it is, it is a job for a different thread.
- I'm not discussing what value is returned prior to 1970 when pre-1970 data is removed. That would be a job for a different thread.
- Link vs Zone is not important for this discussion.
Summary
-------------
After a few weeks of thinking, these are the options I've come up with. Feel free to suggest another option or variant if you think I've missed anything obvious.
I believe it was a disaster that the brilliant "ID as needed for post-1970 data, with at least one per ISO country" rule was removed. It has created needless division, bifurcating a unified set of IDs into region and non-region IDs and creating backwards compatibility issues in the data of many locations. IMO, there are two basic models of timezones in the world, and moving tzdb from one that supported both to one that only supports the US-centric model is simply a mistake that needs correcting.
As such, my preference would be to adopt option 2. Option 6 or 7 could work, and might be the best choices if we had a clean slate or if there was some hidden pressure that the list is unaware of to remove pre-1970 data, but they are risky options given we do not know the impact on end-users in major financial/business centres.
Stephen
[1] https://mm.icann.org/pipermail/tz/2021-September/030518.html
Thanks Brian and Adhemar for your thoughts. Does anyone else want to chime in on the best way to move forward?

thanks
Stephen

On Fri, 22 Oct 2021 at 00:59, Brian Park <brian@xparks.net> wrote:
This discussion seems to have settled down, so here are my thoughts:
1) I would like to commend Paul Eggert for handling this debate as graciously as possible under the circumstances.
2) I would also like to commend Stephen Colebourne for his persistence in raising the problem of losing the pre-1970 data, and the problem of merging zone IDs from seemingly unrelated regions and political entities. (Although I think the discussion about replacing the TZDB Coordinator was perhaps not helpful, since no one else seemed to want the job.)
3) In an ideal world, I think the decision to use or not use the pre-1970 data ought to be made by the end-user or the end-developer, not by the timezone library maintainer or the OS maintainer. Even if some of the pre-1970 data is "low quality", if that information is not available anywhere else, TZ DB should make it readily accessible to the end-users.
4) As I recall, Paul gave 2 main reasons for moving the pre-1970 data to backzone and combining post-1970 IDs: (a) fairness, and (b) the maintenance burden.
(a) The fairness argument has not made sense to me, even after seeing it multiple times. Personally, I think it is ok that some countries have better data than others, as long as it was not caused by malicious intent.
(b) The maintenance burden argument is more compelling. If Paul Eggert is the only one willing to maintain this data, and if he finds it burdensome, then it is what it is.
5) With regard to the specific options listed in Stephen's email, I find Option 2 to be compelling ("ID as needed for post-1970 data, with at least one per ISO country"), because I think most end-users and end-developers understand timezones in this way --- their country's political system determines the rules for the timezone(s) in their country. It seems to me that the concept of timezones is inherently a political creation, not a technical one. Various posts on this list about how we should "avoid politics" have not made sense to me.
6) I also find Option 7 to be interesting ("No existing IDs get pre-1970 data, but a *new* set of IDs are created containing it"). I offer a slight variation: What if we placed the "Historic" part at the end of the ID path, such as "Europe/London/Historic" and "Africa/Abidjan/Historic"? Then the timezone library can choose to use the closest matching timezone if it does not have a Historic pre-1970 database installed, so it can default to "Europe/London" and "Africa/Abidjan" instead.
7) As a maintainer of an independent timezone library, I would like to request that the "API" into the TZDB project be the raw files themselves (e.g. africa, europe, northamerica, etc), instead of the TZif files or the Makefile. My library uses its own TZDB parser, and its own binary representation instead of TZif, and does not use zic, zdump, or the provided Makefile. I believe there are other major 3rd party libraries which have their own parsers and binary representation formats: Joda-Time, Java java.time, C++20/Hinnant date, and Noda Time.
8) If the only way for end-users to have access to the pre-1970 data is through a fork of TZDB, then it is not ideal, but I don't think it's the end of the world. Different libraries may choose to use different databases, and users will have to deal with mismatching timezone identifiers and differing DST transition rules. But it seems that end-users and end-developers are forced to deal with those issues right now anyway, since different libraries are packaged with different versions of the TZDB, and different OSes have different update schedules.
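The fallback described in point 6 above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "/Historic" suffix comes from the examples in that point, while the function name and the idea of passing the installed IDs as a set are hypothetical and do not describe any existing library.

```python
HISTORIC_SUFFIX = "/Historic"  # suffix form suggested in point 6


def resolve_zone_id(requested, installed_ids):
    """Return the ID to load for a requested zone.

    If a historic ID is requested but the historic data is not
    installed, fall back to the closest match: the same ID without
    the suffix.
    """
    if requested in installed_ids:
        return requested
    if requested.endswith(HISTORIC_SUFFIX):
        base = requested[:-len(HISTORIC_SUFFIX)]
        if base in installed_ids:
            return base
    raise KeyError(f"unknown time zone ID: {requested}")


# A library that ships only the standard (post-1970) data set.
standard_only = {"Europe/London", "Africa/Abidjan"}
print(resolve_zone_id("Europe/London/Historic", standard_only))

# A library that also ships the historic data set.
with_historic = standard_only | {"Europe/London/Historic"}
print(resolve_zone_id("Europe/London/Historic", with_historic))
```

A suffix makes this fallback a simple truncation of the ID path; a "Historic/" prefix would work as well, but the library would have to strip the front of the ID instead.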
Brian
On Mon, Oct 18, 2021 at 6:08 AM Stephen Colebourne via tz <tz@iana.org> wrote:
How can this project successfully handle pre-1970 data? Why have I focussed so much on IDs and their value to tzdb?
Intro ------- In previous threads I've focussed on IDs, and laid out different ways to describe the groups of IDs we have. For now, lets focus solely on the two main groups - region IDs, which represent abstract regions where clocks have been the same since 1970 - non-region IDs, which represent locations where tzdb has, at some point, added an ID
The key observation is that the segregation between these two groups of IDs *did not exist* until around 2014. It was only in 2018 that the ISO country rule was removed. It has only been since 2014 or so that IDs have been merged. What tzdb previously offered was a set of IDs, based on a simple rule - "ID as needed for post-1970 data, with at least one per ISO country". Full history was available for each of these (whether accurate or not). What has happened since is a split, where no split previously existed. A split which favours some locations over others.
The recent mailing list debates merely represent the cumulation of this favouritism. The irony is that the merges are being done in the name of equity and fairness, when the outcome has actually been the exact opposite - picking favourites, and denigrating everywhere else. That the approach to picking favourites is according to a standardised largest city rule isn't really that relevant here - it is the outcome that is unfair, not the process.
If region IDs were of different appearance (eg. numeric or textually different) then the issues would not have arisen. The mistake was taking a fully functional and fully integral set of IDs, and bifurcating it into two groups. The split was actually a huge change in the policy of tzdb, which has been added drip by drip, rather than something that was ever fully appreciated up front.
FWIW, it is clear to me that there is an aspect of imposing a US-centric timezone system on other parts of the world. The recent tzdb approach of focussing entirely on timezone regions makes perfect sense for the US, where region boundaries do not follow state lines, and ordinary members of the public need to be aware of whether they are in US/Mountain or US/Central. This simply isn't the timezone model in many other parts of the world. In places like Europe and Asia, the timezone is driven primarily by the country you live in - an ordinary member of the public in Iceland is never going to associate with some abstract timezone region stretching down the Atlantic that is not named, not legally defined and is little more than a random outcome based on tzdb's choice of 1970. Even in somewhere like Norway, an ordinary member of the public will understand that although they follow CET, their timezone is actually driven by their Government in Oslo. The brilliance of the original rule - "ID as needed for post-1970 data, with at least one per ISO country" - was that it seamlessly handled *both* models of timezone in one unified set of IDs. Removal of the ISO country part has completely destabilised that balance.
As a constraint to this thread, tzdb really needs to offer one standard view of data, not command line flags that allow different views. If downstream projects end up with different views of the data, it makes tzdb a much less reliable source. (tzdb can be packaged in different ways on the same machine, for example it is undesirable for Postgres' internal tzdb and the OS tzdb to diverge for the same version). Given this, what needs to be nailed down is what is the default data set that tzdb publishes - there isn't really much point in talking about compile time flags, or that the contents of backzone could be used by someone.
Lets look at seven options for pre-1970 data:
Pre-1970 data for regions only ----------------------------------------- 1) Pre-1970 data for regions only - Despite looking identical to other IDs, region IDs are treated as special/favoured - Pre-1970 data for non-region IDs is of no importance whatsoever, thus most get pre-1970 data from another country/continent - The split between region and non-region locations is fully completed - the US-centric timezone model is dominant - An end user in Iceland is supposed to use Africa/Abidjan, Europe/Reykjavik is treated as a historical mistake of tzdb kept around only for backwards compatibility
Pre-1970 data for most IDs ------------------------------------ 2) Pre-1970 data for each ID meeting the rule "ID as needed for post-1970 data, with at least one per ISO country" - The split between region and non-region locations is healed - High quality data from places like Iceland and Norway is retained, but low quality data from elsewhere is restored - The pre-1970 data is simply viewed as the best available data for the each location which can be improved over time - Most end-users in Iceland would expect tzdb to provide pre-1970 data from Iceland, not the Ivory Coast
3) Pre-1970 data for for all IDs that currently exist except true aliases - This is very similar to #2, but would include something like Montreal which was effectively mistakenly added to tzdb - This doesn't seem as desirable as #2
4) Pre-1970 data for any ID where the pre-1970 data is high quality - Subjective on quality, which doesn't seem like a great idea - It does avoid bringing bad quality data back into the main tzdb distribution - On balance, #2 has fewer places for debate to arise
5) Pre-1970 data for all IDs that currently exist except true aliases, plus a *new* set of IDs representing regions - As per #2, but adding IDs like "Region/12345" or "Region/Berlin" - Regions should contain post-1970 data only, as the region is by definition only meaningful post-1970 - It is not entirely clear what this solves over just going with #2 - This option is connected to Russ' proposals [1], although he suggests a bigger split between timekeeping data and ID naming - Perhaps it makes sense if the new region IDs were internal and not normally seen by end users?
Remove pre-1970 data from general use ------------------------------------------------------- 6) No pre-1970 data whatsoever, all IDs are post-1970 only - Project policy is that tzdb is focussed on post-1970 data only - Paul repeatedly tells us that pre-1970 data is unreliable, and people shouldn't use it - Pre-1970 data would not be deleted, but would not be available in most downstreams - The split between region and non-region locations is effectively healed - End users lose access to pre-1970 data, which is particularly notable in some locations where that data is reliable, eg London - It is unknown at this point what the user impact is of removing pre-1970 data from major financial/business centres (no major location has yet been merged)
7) No existing IDs get pre-1970 data, but a *new* set of IDs are created containing it
- As per #6, the existing IDs get no pre-1970 data and the split between region and non-region locations is healed
- New IDs, such as "Historic/Europe/London" or "Historic/Africa/Abidjan" get created for each of the main IDs
- The historic IDs include pre-1970 data, the standard IDs do not
- There is a mechanical transformation between the historic and non-historic IDs
- Whether downstreams do or do not include the historic IDs is their choice, potentially based on available space
- The data provided by the standard non-historic IDs remains the same whether the downstream includes the historic IDs or not (a Good Thing)
- Users have to deliberately opt-in to get pre-1970 data, which might make them think about the accuracy issue
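The "mechanical transformation" in option 7 could be as simple as adding or stripping a fixed prefix. The following is only a sketch under that assumption. The function names are hypothetical, and the "Historic/" prefix is taken from the examples above rather than from anything tzdb defines today.

```python
HISTORIC_PREFIX = "Historic/"  # assumed from the example IDs in option 7


def to_historic(zone_id: str) -> str:
    """Map a standard ID to its historic counterpart (idempotent)."""
    if zone_id.startswith(HISTORIC_PREFIX):
        return zone_id
    return HISTORIC_PREFIX + zone_id


def to_standard(zone_id: str) -> str:
    """Map a historic ID back to the standard ID (idempotent)."""
    if zone_id.startswith(HISTORIC_PREFIX):
        return zone_id[len(HISTORIC_PREFIX):]
    return zone_id
```

Because the mapping is purely textual, a downstream that does not ship the historic IDs can still recognise one and decide what to do with it.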
Notes
--------
- I'm not discussing adding new IDs simply to represent locations whose clocks differ only before 1970. I don't personally think that is a job for tzdb, and even if it is, it is a job for a different thread.
- I'm not discussing what value is returned prior to 1970 when pre-1970 data is removed. That would be a job for a different thread.
- Link vs Zone is not important for this discussion.
Summary
-------------
After a few weeks of thinking, these are the options I've come up with. Feel free to suggest another option or variant if you think I've missed anything obvious.
I believe that it was a disaster that the brilliant "ID as needed for post-1970 data, with at least one per ISO country" rule was removed. It has created needless division, bifurcating a unified set of IDs into region and non-region IDs and creating backwards compatibility issues in the data of many locations. IMO, there are two basic models of timezones in the world, and moving tzdb from one that supported both to one that only supports the US-centric model is simply a mistake that needs correcting.
As such, my preference would be to adopt option 2. Option 6 or 7 could work, and might be the best choices if we had a clean slate or if there was some hidden pressure that the list is unaware of to remove pre-1970 data, but they are risky options given we do not know the impact on end-users in major financial/business centres.
Stephen
[1] https://mm.icann.org/pipermail/tz/2021-September/030518.html
I strongly support option 2.

Howard

On Nov 2, 2021, at 3:09 AM, Stephen Colebourne via tz <tz@iana.org> wrote:
Thanks Brian and Adhemar for your thoughts. Does anyone else want to chime in on the best way to move forward? thanks Stephen
On Fri, 22 Oct 2021 at 00:59, Brian Park <brian@xparks.net> wrote:
This discussion seems to have settled down, so here are my thoughts:
1) I would like to commend Paul Eggert for handling this debate as graciously as possible under the circumstances.
2) I would also like to commend Stephen Colebourne for his persistence in raising the problem of losing the pre-1970 data, and the problem of merging zone IDs from seemingly unrelated regions and political entities. (Although I think the discussion about replacing the TZDB Coordinator was perhaps not helpful, since no one else seemed to want the job.)
3) In an ideal world, I think the decision to use or not use the pre-1970 data ought to be made by the end-user or the end-developer, not by the timezone library maintainer or the OS maintainer. Even if some of the pre-1970 data is "low quality", if that information is not available anywhere else, TZ DB should make it readily accessible to the end-users.
4) As I recall, Paul gave 2 main reasons for moving the pre-1970 data to backzone and combining post-1970 IDs: (a) fairness, and (b) the maintenance burden.
(a) The fairness argument has not made sense to me, even after seeing it multiple times. Personally, I think it is ok that some countries have better data than others, as long as it was not caused by malicious intent.
(b) The maintenance burden argument is more compelling. If Paul Eggert is the only one willing to maintain this data, and if he finds it burdensome, then it is what it is.
5) With regard to the specific options listed in Stephen's email, I find Option 2 to be compelling ("ID as needed for post-1970 data, with at least one per ISO country"), because I think most end-users and end-developers understand timezones in this way --- their country's political system determines the rules for the timezone(s) in their country. It seems to me that the concept of timezones is inherently a political creation, not a technical one. Various posts on this list about how we should "avoid politics" have not made sense to me.
6) I also find Option 7 to be interesting ("No existing IDs get pre-1970 data, but a *new* set of IDs are created containing it"). I offer a slight variation: What if we placed the "Historic" part at the end of the ID path, such as "Europe/London/Historic" and "Africa/Abidjan/Historic"? Then the timezone library can choose to use the closest matching timezone if it does not have a Historic pre-1970 database installed, so it can default to "Europe/London" and "Africa/Abidjan" instead.
7) As a maintainer of an independent timezone library, I would like to request that the "API" into the TZDB project be the raw files themselves (e.g. africa, europe, northamerica, etc), instead of the TZif files or the Makefile. My library uses its own TZDB parser, and its own binary representation instead of TZif, and does not use zic, zdump, or the provided Makefile. I believe there are other major 3rd party libraries which have their own parsers and binary representation formats: Joda-Time, Java java.time, C++20/Hinnant date, and Noda Time.
8) If the only way for end-users to have access to the pre-1970 data is through a fork of TZDB, then it is not ideal, but I don't think it's the end of the world. Different libraries may choose to use different databases, and users will have to deal with mismatching timezone identifiers and differing DST transition rules. But it seems that end-users and end-developers are forced to deal with those issues right now anyway, since different libraries are packaged with different versions of the TZDB, and different OS's have different update schedules.
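The fallback in point 6 (a trailing "Historic" component, with the library dropping back to the plain ID when no historic database is installed) could look roughly like this. It is a sketch only: `resolve_zone_id`, the `installed` set, and the "/Historic" suffix handling are assumptions for illustration, not an existing API in any timezone library.

```python
from typing import AbstractSet

HISTORIC_SUFFIX = "/Historic"  # assumed from the example IDs in point 6


def resolve_zone_id(requested: str, installed: AbstractSet[str]) -> str:
    """Return the ID to load for `requested`.

    Prefer an exact match. If a historic ID is requested but not
    installed, fall back to the closest match, i.e. the same ID
    without the trailing "/Historic" component.
    """
    if requested in installed:
        return requested
    if requested.endswith(HISTORIC_SUFFIX):
        base = requested[: -len(HISTORIC_SUFFIX)]
        if base in installed:
            return base
    raise KeyError(f"unknown time zone ID: {requested}")
```

One consequence of this design is that the fallback is silent: a caller asking for "Europe/London/Historic" on a system without the historic data gets post-1970-only rules unless the library also reports which ID was actually loaded.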
Brian
Hi Stephen,

I don't really have much to add to what you said. I think option #6 or #7 would be okay if this had been the situation from the beginning, but as we're now many decades later, I believe that the establishment of new "Historic" timezones isn't particularly useful. TZIDs have been so widely used, with an understanding of the "current" (pre-split) situation. Whether we like it or not, downstream users have made use of pre-1970 data, even for the regions with dubious accuracy, but of course also for the regions with high accuracy. Removing the data for the high-accuracy areas seems a little odd to do.

I also believe that removing the "ID as needed for post-1970 data, with at least one per ISO country" rule was a mistake, and I would like to see it reinstated, which brings me to strongly support your #2.

cheers,
Derick

On Tue, 2 Nov 2021, Stephen Colebourne via tz wrote:
Thanks Brian and Adhemar for your thoughts. Does anyone else want to chime in on the best way to move forward? thanks Stephen
I contribute from the standpoint of a developer and maintainer of astrology software and astrological web services.

Astrologers need pre-1970 data for correct astrological charts.

I accept that it is the responsibility of the 'astrological community' to collect and maintain historical time zone information. Thomas Shanks has been an important researcher in this field, and his data were published in book format. Some people I do not need to mention now made wrong business decisions by trying to make these data their private intellectual property, but these attempts have been successfully challenged and were rejected.

We can collectively consider time zone history information published in books as free public domain data.

Of course I vote for the continued maintenance of pre-1970 data in TZDB.

Among Stephen Colebourne's proposals, I think 2) makes the most sense.
2) Pre-1970 data for each ID meeting the rule "ID as needed for post-1970 data, with at least one per ISO country"
Considering all the existing Link records and backzone data, the condition 'at least one ID zone per ISO-country' is mostly fulfilled. If there is something missing, it can be added over a period of time without any urgency.

On 02.11.21 08:09, Stephen Colebourne via tz wrote:
Thanks Brian and Adhemar for your thoughts. Does anyone else want to chime in on the best way to move forward? thanks Stephen
On Nov 5, 2021, at 4:49 AM, Alois Treindl via tz <tz@iana.org> wrote:
Astrologers need pre-1970 data for correct astrological charts.
Do they need *correct*, *complete* pre-1970 data? If so, then they either need to use a source other than tzdb or need to submit changes to the tzdb to provide that data, as the tzdb does *not* currently have it, and didn't necessarily even have it before Paul's merging of zones.
I accept that it is the responsibility of the 'astrological community' to collect and maintain historical time zone information. Thomas Shanks has been an important researcher in this field,
...some of whose reports for both pre-1970 *and* post-1970 data have been contradicted by other sources; see a number of comments in the northamerica file, for example.
and his data were published in book format. Some people I do not need to mention now made wrong business decisions by trying to make these data their private intellectual property, but these attempts have been successfully challenged and were rejected.
We can collectively consider time zone history information published in books as free public domain data.
Of course I vote for the continued maintenance of pre-1970 data in TZDB.
...which will require continued work on the part of the community to correct errors and fix omissions in that data.
On 10/18/21 06:07, Stephen Colebourne via tz wrote:
What tzdb previously offered was a set of IDs, based on a simple rule - "ID as needed for post-1970 data, with at least one per ISO country". Full history was available for each of these (whether accurate or not).
That wasn't ever the case. For example, there was never full history (accurate or not) for San Marino. We shouldn't base our analysis on the idea that we formerly had at least one Zone per ISO country, as we never had an ironclad rule like that and we did just fine without any such rule. There's no *timekeeping* reason to require a Zone for every ISO country. Adding such a requirement would complicate maintenance. It would add a significant amount of likely-bogus data, as witness the recent discussion about the likely-bogus data for Bamako that's in 'backzone'. And it would increase the role of politics particularly as new countries emerge, and politics is something we should avoid as much as possible. These downsides of a one-Zone-per-country rule may not appear to be all that serious to people who are not actively maintaining the database, but as the primary maintainer of a database that I would like to be as accurate as possible, I would object to adding distracting and error-prone makework like that to my volunteer workload.
tzdb really needs to offer one standard view of data, not command line flags that allow different views. If downstream projects end up with different views of the data, it makes tzdb a much less reliable source
Unfortunately it's impossible in any voluntary project to assure complete uniformity on all platforms. It's been reasonably common for various downstream platforms to have their own wrinkles (their own names or zones, or they truncate data to certain years, or they have only a subset of the names, or they don't support some tzdata feature, or whatever) and the sky has not fallen as a result. Although we can attempt to lessen unnecessary differences, there are sometimes good reasons to give users different views of available data and I doubt whether it would be a good idea to prohibit variances like these, even in the long run. What I am sensing from your proposal, as well as from some of the followup comments, is a need to further clarify exactly what the tzdb project's interfaces are. Some downstream uses have relied on internal details of the database that have never been promised to be stable, and this has caused friction when these details change. It would be helpful, I think, to make the boundaries clearer.
On Wed, 3 Nov 2021 at 22:40, Paul Eggert <eggert@cs.ucla.edu> wrote:
On 10/18/21 06:07, Stephen Colebourne via tz wrote:
What tzdb previously offered was a set of IDs, based on a simple rule - "ID as needed for post-1970 data, with at least one per ISO country". Full history was available for each of these (whether accurate or not).
That wasn't ever the case. For example, there was never full history (accurate or not) for San Marino. We shouldn't base our analysis on the idea that we formerly had at least one Zone per ISO country, as we never had an ironclad rule like that and we did just fine without any such rule.
Let's unpack this for a minute. Looking at the state of tzdb in mid 2012:
- Europe/San_Marino existed as an ID
- it was an alias for Europe/Rome
https://github.com/eggert/tz/blob/dccd5a16af62c52f2b49a2fe56270a710617cbbd/e...
In practical terms as a user:
- you could query it for full history
- the data you got back was accurate post-1970
- the data you got back pre-1970 was of unknown accuracy (except LMT which was definitely inaccurate)
- the data was the best researched data for San Marino available
As such, I don't think it is correct to say that "there was never full history" for San Marino. The ID existed and history could be queried. The data that was available was good enough because San Marino shares enough geopolitical history with Rome that users can overlook the distinction. And no-one has ever been motivated to do better. This is a hugely different scenario to Reykjavik returning data from Abidjan where you are intending to knowingly make the data worse for end-users. The ironclad rule (AFAICT) is that there was always an *ID* for each ISO country, and that the data it returned was acceptably accurate, not outrageously wrong.
There's no *timekeeping* reason to require a Zone for every ISO country. Adding such a requirement would complicate maintenance.
I think someone born in Iceland before 1970 might well disagree that there is no timekeeping reason at work here. I think the real problem here is that you are trying to fundamentally change what tzdb offers. I'm here communicating as clearly as I can that end-users expect one zone per country as a minimum because that is what they have had for 15 or 20 years. Retaining backwards compatibility for IDs is great, but meaningless if those IDs return backwards incompatible data. Ultimately, you haven't addressed my key point that a perfectly rational unified set of IDs has been bifurcated into ones that are deemed important and ones that are not. That is quite specifically something *new*, a change from what the project previously provided. And I think most would objectively judge it as being a degradation of what is offered by tzdb.
These downsides of a one-Zone-per-country rule may not appear to be all that serious to people who are not actively maintaining the database, but as the primary maintainer of a database that I would like to be as accurate as possible, I would object to adding distracting and error-prone makework like that to my volunteer workload.
To be clear, I think this is exactly why tzdb should move beyond being a volunteer-led project. In practical terms, the only realistic financially supported option I'm aware of is CLDR. But it is up to those funding CLDR to decide if they are willing to pay to expand its mandate. In reality I don't think there actually is any extra work, as you have already separately committed to including any historical data people provide, and new ISO codes are an extremely rare occurrence. The real work in recent years has been the fallout from your choice to degrade what tzdb offers. If you genuinely do want to reduce your volunteer work to only be the abstract post-1970 regions and not to maintain any data pre-1970, then you really should be clear about that. You could then look for an alternate maintainer of tzdb itself as you would be maintaining what amounts to a new database, which would best sit in a different git repo. That data could then be an input to tzdb itself. Stephen
I get the impression that this debate is caused by the existence of 2 different schools of thought:
* Descriptive: Paul wants to describe the timezones of the world without regard to how those time zones were created, and merge them into the smallest set that can generate the timekeeping rules. I can see that in this view, merging timezones from different countries into the same equivalence class is reasonable.
* Prescriptive: I think Stephen and others start with the fact that time zones are the creations of political organizations which write the regulations that define the timezones. Those governing bodies are predominantly organized by country in a hierarchical structure. In this view, it does *not* make sense to merge timezones from different countries. This view also implies that the TZ identifiers should reflect the political organizational structure of the world.
I want to suggest that it may be possible for these 2 views to coexist. We could create a new file, e.g. call it 'countryzone', which contains a set of Links organized in a hierarchical tree by country, pointing to the Core zones. Paul can maintain the Core files as before, and 'countryzone' would be maintained by a different set of people. Assuming the Core timezones is a complete set that covers all unique timezones in the world, then all other ISO-country based timezones can be mapped to one of the Core timezones.
For this to work, I think we need to clarify the semantics of the 'Link' records in the TZ database. As far as I can tell, there are at least 3 different meanings of the Link record:
1) Link Canonical Deprecated
* Deprecated is an old zone which should no longer be used
2) Link Canonical Alternate
* alternate spelling or alias, but not deprecated
3) Link Canonical Merged
* zones which were merged because they have the same rules by chance, but there is no semantic relationship to each other
I propose that we replace the 'Link' keyword with 3 new keywords that identify the precise meaning: LinkOld, LinkAlt, and LinkMerged. (My hope is that keeping the 'Link' prefix will make it easy to update existing TZDB parsers to preserve their previous behavior.)
Slight aside: I learned that some 3rd party timezone libraries do not preserve round-trip zone Id for Links. In other words, (pseudo-code) `TimeZone(linkName).getName() != linkName`. I wonder if it is worth defining the expected behavior of each type of Links for downstream libraries.
For the pre-1970 data, it is my understanding that the 'backzone' file contains Zone records which should replace ONLY the LinkMerged records found in the other files. I propose that all LinkMerged records be extracted into a separate file (let's call it 'mergedzone') so that there is a clear symmetry between 'backzone' and 'mergedzone', which allows them to be substituted for each other. The dependency diagram looks something like this:
countryzone
    |
    v
Core (africa, asia, etc...)
    +-- backzone
    +-- mergedzone
Downstream libraries which want only post-1970 can use: countryzone, Core, mergedzone
Downstream libraries which want to include pre-1970 can use: countryzone, Core, backzone
@Stephen: We may be at a point where further debate is not productive. Perhaps we should create an exploratory fork of the TZDB to evaluate these ideas explicitly. It is easier to get feedback from a concrete implementation than to continue discussing ideas and options in a vacuum. I propose a GitHub project with an initial seed of the 10 raw TZDB files.
And let's use the usual GitHub PR, Issues, and Discussions workflow, so that proposals can be reviewed and discussed before being committed into the repo. If there is any chance that this will result in being able to type "Canada/Toronto" instead of "America/Toronto", that would resolve an annoyance that has lasted some 30-35 years. Brian On Thu, Nov 4, 2021 at 4:04 PM Stephen Colebourne via tz <tz@iana.org> wrote:
On Wed, 3 Nov 2021 at 22:40, Paul Eggert <eggert@cs.ucla.edu> wrote:
On 10/18/21 06:07, Stephen Colebourne via tz wrote:
What tzdb previously offered was a set of IDs, based on a simple rule - "ID as needed for post-1970 data, with at least one per ISO country". Full history was available for each of these (whether accurate or not).
That wasn't ever the case. For example, there was never full history (accurate or not) for San Marino. We shouldn't base our analysis on the idea that we formerly had at least one Zone per ISO country, as we never had an ironclad rule like that and we did just fine without any such rule.
Let's unpack this for a minute.
Looking at the state of tzdb in mid 2012: - Europe/San_Marino existed as an ID - it was an alias for Europe/Rome
https://github.com/eggert/tz/blob/dccd5a16af62c52f2b49a2fe56270a710617cbbd/e...
In practical terms as a user: - you could query it for full history - the data you got back was accurate post-1970 - the data you got back pre-1970 was of unknown accuracy (except LMT which was definitely inaccurate) - the data was the best researched data for San Marino available
As such, I don't think it is correct to say that "there was never full history" for San Marino. The ID existed and history could be queried. The data that was available was good enough because San Marino shares enough geopolitical history with Rome that users can overlook the distinction. And no-one has ever been motivated to do better. This is a hugely different scenario to Reykjavik returning data from Abidjan where you are intending to knowingly make the data worse for end-users.
The ironclad rule (AFAICT) is that there was always an *ID* for each ISO country, and that the data it returned was acceptably accurate, not outrageously wrong.
There's no *timekeeping* reason to require a Zone for every ISO country. Adding such a requirement would complicate maintenance.
I think someone born in Iceland before 1970 might well disagree that there is no timekeeping reason at work here.
I think the real problem here is that you are trying to fundamentally change what tzdb offers. I'm here communicating as clearly as I can that end-users expect one zone per country as a minimum because that is what they have had for 15 or 20 years. Retaining backwards compatibility for IDs is great, but meaningless if those IDs return backwards incompatible data.
Ultimately, you haven't addressed my key point that a perfectly rational unified set of IDs has been bifurcated into ones that are deemed important and ones that are not. That is quite specifically something *new*, a change from what the project previously provided. And I think most would objectively judge it as being a degradation of what is offered by tzdb.
These downsides of a one-Zone-per-country rule may not appear to be all that serious to people who are not actively maintaining the database, but as the primary maintainer of a database that I would like to be as accurate as possible, I would object to adding distracting and error-prone makework like that to my volunteer workload.
To be clear, I think this is exactly why tzdb should move beyond being a volunteer-led project. In practical terms, the only realistic financially supported option I'm aware of is CLDR. But it is up to those funding CLDR to decide if they are willing to pay to expand its mandate.
In reality I don't think there actually is any extra work, as you have already separately committed to including any historical data people provide, and new ISO codes are an extremely rare occurrence. The real work in recent years has been the fallout from your choice to degrade what tzdb offers.
If you genuinely do want to reduce your volunteer work to only be the abstract post-1970 regions and not to maintain any data pre-1970, then you really should be clear about that. You could then look for an alternate maintainer of tzdb itself as you would be maintaining what amounts to a new database, which would best sit in a different git repo. That data could then be an input to tzdb itself.
Stephen
On 2021-11-05 12:17:34 (+0800), Brian Park via tz wrote:
I get the impression that this debate is caused by the existence of 2 different schools of thought: [...]
I want to suggest that it may be possible for these 2 views to coexist.
They de facto coexist right now. The overwhelming majority of the data are descriptive. Only recent efforts have made some of the post-1970 data appear more prescriptive.
We could create a new file, e.g. call it 'countryzone', which contains a set of Links organized in a hierarchical tree by country, pointing to the Core zones.
I strongly believe we should continue to carefully avoid attempting to group data by country. [I would even avoid using the word "country" wherever possible.]
For the pre-1970 data, it is my understanding that the 'backzone' file contains Zone records which should replace ONLY the LinkMerged records found in the other files. I propose that all LinkMerged records be extracted into a separate file (let's call it 'mergedzone') so that there is a clear symmetry between 'backzone' and 'mergedzone', which allows them to be substituted for each other. The dependency diagram looks something like this:
As I've suggested before in another thread, I think we should consider undoing the split into backzone. I really liked Stephen's phrasing earlier in this thread: acceptably accurate, not outrageously wrong. We started moving data to backzone to limit the scope of 'active' maintenance to post-1970 data. That artificial split led us towards a more prescriptive worldview. It seems clear that prescriptive simply does not work for a real world with people on it.
If there is any chance that this will result in being able to type "Canada/Toronto" instead of "America/Toronto", that would resolve an annoyance that has lasted some 30-35 years.
In this context, America refers to the landmass, not to the political entity occupying a large chunk of it. [Canada/Eastern etc moved to backward around 1993, as far as I can tell.] Trying to group by country would only lead to (more) divisive arguments. I do support the notion that we should have at least one time zone identifier per ISO code though (whether or not that code refers to a country). Philip -- Philip Paeps Senior Reality Engineer Alternative Enterprises
On Thu, Nov 4, 2021 at 10:11 PM Philip Paeps <philip@trouble.is> wrote:
On 2021-11-05 12:17:34 (+0800), Brian Park via tz wrote:
I get the impression that this debate is caused by the existence of 2 different schools of thought: [...]
I want to suggest that it may be possible for these 2 views to coexist.
They de facto coexist right now. The overwhelming majority of the data are descriptive. Only recent efforts have made some of the post-1970 data appear more prescriptive.
They coexist in an ad hoc manner right now, and that seems to be one of the causes for the contention. I am suggesting that we formalize the separation, so that both groups are happier.
We could create a new file, e.g. call it 'countryzone', which contains a set of Links organized in a hierarchical tree by country, pointing to the Core zones.
I strongly believe we should continue to carefully avoid attempting to group data by country. [I would even avoid using the word "country" wherever possible.]
Can you explain why? Because it will cause arguments about disputed places? I think only a small minority of places around the world are disputed. By separating these ISO-country timezones into a 'countryzone' file, perhaps we can confine the debate into a smaller section of the TZDB. We could create duplicate entries (i.e. Country1/City, Country2/City), or create a pseudo-country called "Disputed" (i.e. Disputed/City). The point is, we can create policies that govern these disputed regions. Could we move 'countryzone' into a separate project? Probably, but some amount of initial coordination and refactoring would be required to resolve conflicting zone identifiers. Overall, I feel like the TZDB data should lean a bit more towards matching how end-users think about timezones in the real world (Prescriptive), and lean slightly less on treating timezones as a clustering problem (Descriptive). But I can see pros and cons of both approaches. Which is why I am suggesting ways to make the 2 approaches interoperate better.
For the pre-1970 data, it is my understanding that the 'backzone' file contains Zone records which should replace ONLY the LinkMerged records found in the other files. I propose that all LinkMerged records be extracted into a separate file (let's call it 'mergedzone') so that there is a clear symmetry between 'backzone' and 'mergedzone', which allows them to be substituted for each other. The dependency diagram looks something like this:
As I've suggested before in another thread, I think we should consider undoing the split into backzone. I really liked Stephen's phrasing earlier in this thread: acceptably accurate, not outrageously wrong. We started moving data to backzone to limit the scope of 'active' maintenance to post-1970 data. That artificial split led us towards a more prescriptive worldview. It seems clear that prescriptive simply does not work for a real world with people on it.
I think Paul Eggert has made it clear that he does not want to maintain this data. My proposed refactoring of this info into the 'backzone' / 'mergedzone' pair makes it easy for downstream libraries to add back the 'backzone' data if they want. The 'make PACKRATDATA=backzone' hack does not help downstream libraries which do not use TZif or the Makefile.
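To make the 'backzone' / 'mergedzone' idea a little more concrete, here is a minimal Python sketch of how a downstream library might read the proposed records and choose its file set. Everything in it is an assumption for illustration: LinkOld, LinkAlt and LinkMerged are the keywords proposed in this thread and do not exist in tzdb today, the 'countryzone' and 'mergedzone' files are likewise hypothetical, the Core list is abbreviated to the two files named above, and the classification of the sample records is a guess rather than an agreed policy.

```python
# Hypothetical keywords from the proposal; tzdb itself only defines "Link".
LINK_KINDS = {
    "LinkOld": "deprecated",
    "LinkAlt": "alternate",
    "LinkMerged": "merged",
}


def parse_links(text):
    """Return {link_name: (target, kind)} for each Link-like record."""
    links = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        fields = line.split()
        # Same field order as an existing Link line: KEYWORD TARGET LINK-NAME
        if len(fields) == 3 and fields[0] in LINK_KINDS:
            keyword, target, name = fields
            links[name] = (target, LINK_KINDS[keyword])
    return links


def files_for(want_pre1970):
    """Pick the file set; 'backzone' and 'mergedzone' substitute for each other."""
    core = ["countryzone", "africa", "asia"]  # Core list abbreviated
    return core + (["backzone"] if want_pre1970 else ["mergedzone"])


# Sample records; which keyword applies to each is an illustrative guess.
SAMPLE = """\
LinkOld     America/Toronto  Canada/Eastern      # old name
LinkMerged  Africa/Abidjan   Atlantic/Reykjavik  # same rules since 1970
"""

if __name__ == "__main__":
    for name, (target, kind) in parse_links(SAMPLE).items():
        print(f"{name} -> {target} ({kind})")
    print(files_for(want_pre1970=True))
```

Because the three keywords keep the 'Link' prefix and the same field order, a parser that does not care about the distinction could treat all of them as plain links, which is the backward-compatibility property the proposal is hoping for.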
If there is any chance that this will result in being able to type "Canada/Toronto" instead of "America/Toronto", that would resolve an annoyance that has lasted some 30-35 years.
In this context, America refers to the landmass, not to the political entity occupying a large chunk of it. [Canada/Eastern etc moved to backward around 1993, as far as I can tell.]
Virtually no one in the world thinks of "America" as referring to all of "North America" and "South America". Brian
On Fri, 5 Nov 2021, Brian Park via tz wrote:
Virtually no one in the world thinks of "America" as referring to all of "North America" and "South America".
Arguably, it should have been Americas (with trailing 's') rather than America. But it's likely way past the point of being able to change now. +--------------------+--------------------------+----------------------+ | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | (Retired) | FA29 0E3B 35AF E8AE 6651 | paul@whooppee.com | | Software Developer | 0786 F758 55DE 53BA 7731 | pgoyette@netbsd.org | | & Network Engineer | | pgoyette99@gmail.com | +--------------------+--------------------------+----------------------+
On 05.11.21 16:26, Brian Park via tz wrote:
Can you explain why [no ISO countries]? Because it will cause arguments about disputed places? I think only a small minority of places around the world are disputed.
Over the time I have been following this group:
* YAR, South Yemen -> Yemen
* Zaire -> DRC
* East Germany, West Germany -> Germany
* Yugoslavia -> Croatia, Serbia & Montenegro, Bosnia Herzegovina, Slovenia, Macedonia
* Serbia & Montenegro -> Serbia, Montenegro
* Serbia -> Serbia & Kosovo
* Macedonia -> Northern Macedonia
* Czechoslovakia, The Czech Republic, Slovakia
* Czech Republic -> Czechia
* Sudan -> Sudan, South Sudan
* And then there's Russia and the Ukraine
* Israel & Palestine
* And South Africa and Namibia
I'm sure I'm missing a few. You may say that these are a minority of nations, but these changes have NOT by themselves necessitated ANY work on the part of this project. That is a feature. To be clear, that work would involve someone taking a political stance, even if that means supporting UN decisions (that's a political decision). Better to stick with what we have: observe what people on the ground think the time is. Eliot
On Fri, Nov 5, 2021 at 8:47 AM Eliot Lear <lear@lear.ch> wrote:
On 05.11.21 16:26, Brian Park via tz wrote:
Can you explain why [no ISO countries]? Because it will cause arguments about disputed places? I think only a small minority of places around the world are disputed.
Over the time I have been following this group:
- YAR, South Yemen -> Yemen
- Zaire -> DRC
- East Germany, West Germany -> Germany
- Yugoslavia -> Croatia, Serbia & Montenegro, Bosnia Herzegovina, Slovenia, Macedonia
- Serbia & Montenegro -> Serbia, Montenegro
- Serbia -> Serbia & Kosovo
- Macedonia -> Northern Macedonia
- Czechoslovakia, The Czech Republic, Slovakia
- Czech Republic -> Czechia
- Sudan -> Sudan, South Sudan
- And then there's Russia and the Ukraine
- Israel & Palestine
- And South Africa and Namibia
I'm sure I'm missing a few.
You may say that these are a minority of nations, but these changes have NOT by themselves necessitated ANY work on the part of this project. That is a feature. To be clear, that work would involve someone taking a political stance, even if that means supporting UN decisions (that's a political decision).
Thanks for the historical context, this is a good list to have. It looks like some of those are name changes, and some of those are disputed regions. I think we would be able to create appropriate policies to govern the various situations. With my proposal of refactoring the ISO-country timezones into a separate 'countryzone' file, the churn would be isolated to that file. Perhaps there is a difference in perspective as well. As a downstream library maintainer, I almost always try to be an advocate of the end-users. I try to ask myself, "How can I make things easier for my users?", instead of "How can I make things easier for me, or the TZDB maintainers?" I understand the advantages of an abstract organization of timezones to prevent churn. But the lack of ISO-country based timezones causes a suboptimal experience for the end-users. We can solve that problem using a thin mapping layer on top of the more abstract timezone identifiers.
Better to stick with what we have: observe what people on the ground think the time is.
I've seen this a few times, but I don't understand it. No normal person on the ground thinks their time is "America/Los_Angeles". It's "US/Pacific". No normal person in Toronto thinks their time is "America/Toronto". Their country is not even America. They think their timezone is "Canada/Eastern". People are forced to use "America/Los_Angeles" or "America/Toronto" because the TZDB forced that nomenclature upon our users. It seems a mapping layer, like the 'countryzone' file containing ISO-countries, would be the one that provides the timezones that people use on the ground.
Just on this point: On 05.11.21 17:42, Brian Park wrote:
Better to stick with what we have: observe what people on the ground think the time is.
I've seen this a few times, but I don't understand it. No normal person on the ground thinks their time is "America/Los_Angeles". It's "US/Pacific". No normal person in Toronto thinks their time is "America/Toronto". Their country is not even America. They think their timezone is "Canada/Eastern". People are forced to use "America/Los_Angeles" or "America/Toronto" because the TZDB forced that nomenclature upon our users. It seems a mapping layer, like the 'countryzone' file containing ISO-countries, would be the one that provides the timezones that people use on the ground.
Second verse, same as the first: these are database keys, not user interface presentation. Nobody is forced to present any database key to a user. If you have locale awareness, as most modern user-facing systems have, you're going to be far more granular anyway. Eliot
On 11/5/21 2:18 PM, Eliot Lear via tz wrote:
Just on this point:
On 05.11.21 17:42, Brian Park wrote:
Better to stick with what we have: observe what people on the ground think the time is.
I've seen this a few times, but I don't understand it. No normal person on the ground thinks their time is "America/Los_Angeles". It's "US/Pacific". No normal person in Toronto thinks their time is "America/Toronto". Their country is not even America. They think their timezone is "Canada/Eastern". People are forced to use "America/Los_Angeles" or "America/Toronto" because the TZDB forced that nomenclature upon our users. It seems a mapping layer, like the 'countryzone' file containing ISO-countries, would be the one that provides the timezones that people use on the ground.
Second verse, same as the first: these are database keys, not user interface presentation. Nobody is forced to present any database key to a user. If you have locale awareness, as most modern user-facing systems have, you're going to be far more granular anyway.
Couldn't agree more with Eliot. -- Kenneth Murchison Senior Software Developer Fastmail US LLC
On Fri, Nov 5, 2021 at 11:23 AM Ken Murchison via tz <tz@iana.org> wrote:
On 11/5/21 2:18 PM, Eliot Lear via tz wrote:
Just on this point: On 05.11.21 17:42, Brian Park wrote:
Better to stick with what we have: observe what people on the ground think the time is.
I've seen this a few times, but I don't understand it. No normal person on the ground thinks their time is "America/Los_Angeles". It's "US/Pacific". No normal person in Toronto thinks their time is "America/Toronto". Their country is not even America. They think their timezone is "Canada/Eastern". People are forced to use "America/Los_Angeles" or "America/Toronto" because the TZDB forced that nomenclature upon our users. It seems a mapping layer, like the 'countryzone' file containing ISO-countries, would be the one that provides the timezones that people use on the ground.
Second verse, same as the first: these are database keys, not user interface presentation. Nobody is forced to present any database key to a user. If you have locale awareness, as most modern user-facing systems have, you're going to be far more granular anyway.
Couldn't agree more with Eliot.
The practical reality is that the TZDB identifiers are externally visible identifiers to end-users. The Unix system forces the TZDB identifiers on to the user when I have to type this: $ TZ=America/Toronto date I agree that it is conceptually cleaner if the Core TZDB identifiers were internal only. But I understand that some people would consider ISO-country identifiers to be out of scope of this project, although there are many ad hoc ones currently in the database. I think a file like 'countryzone' should be added only if there are people willing to maintain such a list. It may need to be a separate project, to avoid forcing the TZ Coordinator to pick up the slack if those maintainers drop off. Brian
My two cents: Many of us actually work in IT and we cannot hide from the time zone names. The use of the string "America" in Canadian time zone names has always been a sore point for me. I will always use Canada/Eastern over America/Toronto wherever it is possible to do so. It is all about National Identity and it is something that seems to be lost on Paul and a few others. I am not asking to get rid of the "America" string in the core North and South American time zones because it is far too late to do that. It is not, however, too late to stop the senseless merging of time zones across ISO boundaries. I view it as a political attack on the National Identities of countries with small populations. So far, Africa, the Caribbean and Canada are getting the short end of the stick. I vote for #3 on Stephen's list but I would gladly settle for #2. Is somebody keeping track of the vote? -chris
On Fri, Nov 5, 2021 at 3:02 PM Brian Park via tz <tz@iana.org> wrote:
On Fri, Nov 5, 2021 at 11:23 AM Ken Murchison via tz <tz@iana.org> wrote:
On 11/5/21 2:18 PM, Eliot Lear via tz wrote:
Just on this point:
On 05.11.21 17:42, Brian Park wrote:
Better to stick with what we have: observe what people on the ground think the time is.
I've seen this a few times, but I don't understand it. No normal person on the ground thinks their time is "America/Los_Angeles". It's "US/Pacific". No normal person in Toronto thinks their time is "America/Toronto". Their country is not even America. They think their timezone is "Canada/Eastern". People are forced to use "America/Los_Angeles" or "America/Toronto" because the TZDB forced that nomenclature upon our users. It seems a mapping layer, like the 'countryzone' file containing ISO-countries, would be the one that provides the timezones that people use on the ground.
Second verse, same as the first: these are database keys, not user interface presentation. Nobody is forced to present any database key to a user. If you have locale awareness, as most modern user-facing systems have, you're going to be far more granular anyway.
Couldn't agree more with Eliot.
The practical reality is that the TZDB identifiers are externally visible identifiers to end-users. The Unix system forces the TZDB identifiers on to the user when I have to type this: $ TZ=America/Toronto date
I agree that it is conceptually cleaner if the Core TZDB identifiers were internal only. But I understand that some people would consider ISO-country identifiers to be out of scope of this project, although there are many ad hoc ones currently in the database. I think a file like 'countryzone' should be added only if there are people willing to maintain such a list. It may need to be a separate project, to avoid forcing the TZ Coordinator to pick up the slack if those maintainers drop off.
Brian
On Nov 5, 2021, at 12:01 PM, Brian Park via tz <tz@iana.org> wrote:
The practical reality is that the TZDB identifiers are externally visible identifiers to end-users. The Unix system forces the TZDB identifiers on to the user when I have to type this: $ TZ=America/Toronto date
1) The practical reality is that the TZDB identifiers are externally visible identifiers to *those end users who switch time zones on the command line*. This is a subset of the set of end users. 2) Perhaps the problem there is that you can't, for example, do $ TZ=`tzid Ottawa` date or something such as that - not everybody in the Canadian Eastern time zone is in Toronto. ("tzid" could also, for example, allow "Ottawa, ON, CA" or "Ottawa, IL, US" or "Ottawa, KS, US" or "Ottawa, CI"; *defaulting* to Ottawa, ON, CA probably makes sense.) 3) If the problem is that "America" often refers to the US in English, then, as has been noted, the choice of "America" rather than "Americas" was the cause of the problem. Then *nobody* in the Americas would have their country's name in the tzdb ID for the region they're in.
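As a rough illustration of the "tzid" idea in point 2, here is a minimal Python sketch. No such tool exists in tzdb; the function name, the lookup table and the default rule are all assumptions made up for this example, and a real tool would need a proper gazetteer behind it.

```python
# Hypothetical place table: (city, subdivision, country) -> TZDB identifier.
# Entries are illustrative only, not an authoritative gazetteer.
PLACES = {
    ("Ottawa", "ON", "CA"): "America/Toronto",
    ("Ottawa", "IL", "US"): "America/Chicago",
    ("Ottawa", "KS", "US"): "America/Chicago",
    ("Ottawa", "CI"): "Africa/Abidjan",
}

# Hypothetical default used when only a bare city name is given.
DEFAULTS = {"Ottawa": ("Ottawa", "ON", "CA")}


def tzid(query: str) -> str:
    """Map a place description such as 'Ottawa, KS, US' to a TZDB ID."""
    parts = tuple(p.strip() for p in query.split(","))
    if len(parts) == 1:
        # Bare city name: fall back to the preferred match, if any.
        if parts[0] not in DEFAULTS:
            raise KeyError(f"no default place for {query!r}")
        parts = DEFAULTS[parts[0]]
    if parts not in PLACES:
        raise KeyError(f"unknown place {query!r}")
    return PLACES[parts]


if __name__ == "__main__":
    for q in ("Ottawa", "Ottawa, KS, US", "Ottawa, CI"):
        print(q, "->", tzid(q))
```

The point of the sketch is only that the identifier stays an internal key: the user types a place, and the mapping layer decides which TZDB ID that place resolves to.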
On November 5, 2021 3:47:03 PM EDT, Guy Harris via tz <tz@iana.org> wrote:
On Nov 5, 2021, at 12:01 PM, Brian Park via tz <tz@iana.org> wrote:
The practical reality is that the TZDB identifiers are externally visible identifiers to end-users. The Unix system forces the TZDB identifiers on to the user when I have to type this: $ TZ=America/Toronto date
1) The practical reality is that the TZDB identifiers are externally visible identifiers to *those end users who switch time zones on the command line*. This is a subset of the set of end users.
If only that were so. I took note of the following post from ISO New England, the electric grid operator for most of New England, reminding energy market participants using one of their applications (NEXTT) that times are displayed in "America/New_York" time: <https://isonewswire.com/2021/11/01/daylight-saving-time-ends-sunday-fall-bac...> -GAWollman
On Nov 5, 2021, at 1:31 PM, Garrett Wollman <wollman@csail.mit.edu> wrote:
If only that were so. I took note of the following post from ISO New England, the electric grid operator for most of New England, reminding energy market participants using one of their applications (NEXTT) that times are displayed in "America/New_York" time: <https://isonewswire.com/2021/11/01/daylight-saving-time-ends-sunday-fall-bac...>
The set of users confronted with Web sites doing a lazy form of time zone selection (having signed up for COVID-19 vaccinations at Safeway, I'm a member of that set) is still a subset of the set of end users. But software should do better. macOS and Ubuntu do; OpenStreetMap may enable doing better in a number of situations.
On Fri, Nov 5, 2021 at 12:01 PM Brian Park <brian@xparks.net> wrote:
I agree that it is conceptually cleaner if the Core TZDB identifiers were internal only. But I understand that some people would consider ISO-country identifiers to be out of scope of this project, although there are many ad hoc ones currently in the database. I think a file like 'countryzone' should be added only if there are people willing to maintain such a list. It may need to be a separate project, to avoid forcing the TZ Coordinator to pick up the slack if those maintainers drop off.
Following up my own post, I took an initial stab at what this 'countryzone' file would look like, and immediately ran into problems that convince me that this does *not* belong in the TZDB project. The scope seems too large, so it seems better as a separate project. I started from an ISO-3166 CSV file (see https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes for a human readable version), and I found:
1) Many country names are too long to fit into 14 characters. Let's say we relax that constraint because we deprecate support for any old Unix system that cannot support these longer file names. But there are countries like "Heard Island and McDonald Islands", "South Georgia and the South Sandwich Islands", "United States Minor Outlying Islands", and "British Indian Ocean Territory". Just from an ergonomics perspective, we should find a way to shorten these very long names.
2) If we shorten some countries, like "Bosnia and Herzegovina" to just "Bosnia" for convenience, are we going to offend people? I don't know anyone from Bosnia and Herzegovina, so I have no idea. Each country that we shorten needs to be researched carefully.
3) At least 5 countries have non-ASCII characters in their ISO names: "Côte d'Ivoire", "Curaçao", "Åland Islands", "Saint Barthélemy", "Réunion". Personally, I would like to use only ASCII characters because they are the lowest common denominator that is guaranteed to work, outside of mainframes using EBCDIC. If we remove these non-ASCII characters, are we going to offend the people of those countries, even though these are supposed to be English versions of their country names?
4) So maybe the solution is to use 2-letter or 3-letter ISO codes, instead of the shortened, quasi-English versions of the country names. So we get things like "CA/Eastern" or "CAN/Eastern", instead of "Canada/Eastern". Not very satisfying for Canadians or many other countries (except for Americans whose ISO codes "US" and "USA" match their colloquial usage perfectly).
All this before we even get to the work of mapping various ISO countries (and their subregions if needed) to their corresponding canonical TZDB identifiers.
With regard to pre-1970 data in 'backzone', I'll see if I can do some exploratory work on the 'backzone'/'mergedzone' pairing next week, and determine if there are any major problems with the idea.
Brian
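The first and third problems in the list above are easy to check mechanically. The following Python sketch is an assumption-laden illustration, not Brian's actual script: the name list is just the handful of names quoted in the message plus "Canada", the 14-character limit is counted in characters after replacing spaces with underscores, and the helper name `problems` is invented here.

```python
# Names quoted in the message above, plus "Canada" as a control case.
NAMES = [
    "Canada",
    "Bosnia and Herzegovina",
    "Heard Island and McDonald Islands",
    "South Georgia and the South Sandwich Islands",
    "United States Minor Outlying Islands",
    "British Indian Ocean Territory",
    "Côte d'Ivoire",
    "Curaçao",
    "Åland Islands",
    "Saint Barthélemy",
    "Réunion",
]

# Limit mentioned in the message; counted here in characters (an assumption).
MAX_COMPONENT_LEN = 14


def problems(name: str) -> list[str]:
    """Return the naming problems a country name would have as an ID component."""
    issues = []
    component = name.replace(" ", "_")  # spaces become underscores, as in existing IDs
    if len(component) > MAX_COMPONENT_LEN:
        issues.append("too long")
    if not name.isascii():
        issues.append("non-ASCII")
    return issues


if __name__ == "__main__":
    for name in NAMES:
        print(f"{name}: {', '.join(problems(name)) or 'ok'}")
```

Problems 2 and 4 (whether a shortened name or a bare code is acceptable to the people it names) are not something a script can answer, which is the larger point of the message.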
On Nov 5, 2021, at 2:53 PM, Brian Park via tz <tz@iana.org> wrote:
4) So maybe the solution is to use 2-letter or 3-letter ISO codes, instead of the shortened, quasi-English versions of the country names. So we get things like "CA/Eastern" or "CAN/Eastern", instead of "Canada/Eastern". Not very satisfying for Canadians or many other countries (except for Americans whose ISO codes "US" and "USA" match their colloquial usage perfectly).
All this before we even get to the work of mapping various ISO countries (and their subregions if needed) to their corresponding canonical TZDB identifiers.
And "subregions" means more than just "time zone names" - Arizona isn't in the same tzdb region as Utah, so "US/Mountain" isn't sufficient here. I.e., even picking names for country/subregion combinations is going to involve some work and some decision-making. (And, while "US/Arizona" might work, "US/Indiana" won't - look for "Indiana" in the northamerica file. Then look at the whole Canada section - CA/{zone name} isn't going to suffice, either.)
Two different issues are being confused.
1. Don't have timezone IDs span countries (specifically, *country codes*).
2. Use a short English name of countries in the timezone IDs.
#1 I fully agree with. The https://www.ietf.org/timezones/tzdb-2021e/zone1970.tab file goes a long way in this direction, and it wouldn't be hard to add some timezone IDs with Links to complete the set. Countries are important entities (denying that feels like https://www.eff.org/cyberspace-independence). While it may seem pointless to add timezone IDs for some rocks like Heard and McDonald Islands, there are only a few of those, and for testing purposes it is better to be complete.
Note: I like the previous proposal to cleanly distinguish between those Links that are between IDs that really refer to the same place (just different spelling or name), and those that refer to different places. Alternatively, a step that would also serve to disambiguate would be to have approximate coordinates for *each* timezone ID (not just the ones in the zone table); IDs with the same coordinates would have the 'same place' links.
#2 is disruptive, and serves no purpose. The timezone IDs are no more than internal IDs, with a bit of mnemonic flavor just for internal recognition. No real implementation should present them to users, especially outside of the Anglosphere.
Mark
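Mark's first point (no timezone ID spanning more than one country code) can be checked against a zone1970.tab-style table, where the first tab-separated column is a comma-separated list of country codes, followed by coordinates, the TZ name and optional comments. The Python sketch below assumes that layout; the sample rows are hand-written for illustration and are not copied from any tzdb release, and the function names are invented here.

```python
# Hand-written rows in the zone1970.tab layout (illustrative, not real release data):
# country-codes <TAB> coordinates <TAB> TZ <TAB> optional comments
SAMPLE = "\n".join([
    "# codes\tcoordinates\tTZ\tcomments",
    "CA\t+4339-07923\tAmerica/Toronto\tEastern - ON, QC (most areas)",
    "NO,SJ\t+5955+01045\tEurope/Oslo",
    "IS\t+6409-02151\tAtlantic/Reykjavik",
])


def parse(text: str):
    """Yield (tz, country_codes, coordinates) for each data row."""
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        yield fields[2], fields[0].split(","), fields[1]


def multi_country(text: str) -> dict[str, list[str]]:
    """Return the TZ names whose row lists more than one country code."""
    return {tz: codes for tz, codes, _ in parse(text) if len(codes) > 1}


if __name__ == "__main__":
    for tz, codes in multi_country(SAMPLE).items():
        print(f"{tz} spans {', '.join(codes)}")
```

Each ID reported this way is a candidate for the extra per-country Links Mark describes; whether such a Link means "same place" or "different place" is exactly the distinction his note asks to record.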
Brian Park via tz said:
4) So maybe the solution is to use 2-letter or 3-letter ISO codes, instead of the shortened, quasi-English versions of the country names. So we get things like "CA/Eastern" or "CAN/Eastern", instead of "Canada/Eastern". Not very satisfying for Canadians or many other countries (except for Americans whose ISO codes "US" and "USA" match their colloquial usage perfectly).
Are you ready to jump into the political quagmire that is "GB" versus "UK"? Something that people are - literally - ready to kill over. -- Clive D.W. Feather | If you lie to the compiler, Email: clive@davros.org | it will get its revenge. Web: http://www.davros.org | - Henry Spencer Mobile: +44 7973 377646
Whatever the ISO decides is THEIR problem, not ours, if we defer to world standard.
I second ISO country codes.
On Fri, Nov 5, 2021 at 3:28 PM Clive D.W. Feather <clive@davros.org> wrote:
Brian Park via tz said:
4) So maybe the solution is to use 2-letter or 3-letter ISO codes, instead of the shortened, quasi-English versions of the country names. So we get things like "CA/Eastern" or "CAN/Eastern", instead of "Canada/Eastern". Not very satisfying for Canadians or many other countries (except for Americans whose ISO codes "US" and "USA" match their colloquial usage perfectly).
Are you ready to jump into the political quagmire that is "GB" versus "UK"? Something that people are - literally - ready to kill over.
The ISO code for the UK is listed as "GB". Not our fault. But I think what you are saying is, if Northern Ireland decides to have its own timezone, we can't do GB/Northern_Ireland, so we'd need to create a UK/Northern_Ireland (that will point to the Zone Europe/Belfast that will have to be created in the canonical TZDB) and that would be an exception. I guess anything that describes the real world will have exceptions like this. The question I have for people who have expressed support for creating a TZ database organized by ISO country, are the benefits worth the inevitable controversies that will arise? If so, then we should create a new project and work out the details. Brian
Brian Park said:
Are you ready to jump into the political quagmire that is "GB" versus "UK"? Something that people are - literally - ready to kill over.
The ISO code for the UK is listed as "GB". Not our fault. But I think what you are saying is, if Northern Ireland decides to have its own timezone, we can't do GB/Northern_Ireland, so we'd need to create a UK/Northern_Ireland (that will point to the Zone Europe/Belfast that will have to be created in the canonical TZDB) and that would be an exception.
Actually, I wasn't even going that far: there are plenty of people in Northern Ireland who would object to GB/London right now. Note that having "GB" on the back of your car is no longer valid for driving in other countries; you need to have "UK" instead. So all those with "GB" fixed in metal or with number plates that have GB in a ring of stars are now stuffed.
The question I have for people who have expressed support for creating a TZ database organized by ISO country, are the benefits worth the inevitable controversies that will arise?
I've yet to see any benefits. -- Clive D.W. Feather
On 2021-11-06 05:53:23 (+0800), Brian Park via tz wrote:
On Fri, Nov 5, 2021 at 12:01 PM Brian Park <brian@xparks.net> wrote:
I agree that it is conceptually cleaner if the Core TZDB identifiers were internal only. But I understand that some people would consider ISO-country identifiers to be out of scope of this project, although there are many ad hoc ones currently in the database. I think a file like 'countryzone' should be added only if there are people willing to maintain such a list. It may need to be a separate project, to avoid forcing the TZ Coordinator to pick up the slack if those maintainers drop off.
Following up my own post, I took an initial stab at what this 'countryzone' file would look like, and immediately ran into problems that convince me that this does *not* belong in the TZDB project. The scope seems too large, so it seems better as a separate project.
I'm glad we can agree on this. :)
2) If we shorten some countries, like "Bosnia and Herzegovina" to just "Bosnia" for convenience, are we going to offend people? I don't know anyone from Bosnia and Herzegovina, so I have no idea. Each country that we shorten needs to be researched carefully.
That's just a subset of the problems you'll encounter. Referring to certain regions assigned ISO two-letter codes as countries will cause considerable awkwardness too. Even if you wisely avoid using the word country, your suggestion to abbreviate (or not) will be deeply controversial. You'll face an uphill struggle defending each decision. You might reasonably (from a technical perspective) suggest mechanically abbreviating (truncating) to the first whitespace or punctuation mark. That'll give you seven regions named "Saint" and that will probably be the least of your problems.
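The "seven regions named Saint" observation is easy to reproduce. This Python sketch applies the mechanical truncation rule described above (cut at the first whitespace or punctuation character) to a short list of names; the seven "Saint" entries are assumed to match the ISO 3166 English short names, and the other entries are added only for contrast.

```python
import re
from collections import Counter

# Assumed ISO 3166 English short names beginning with "Saint",
# plus a few other names for contrast.
NAMES = [
    "Saint Barthélemy",
    "Saint Helena, Ascension and Tristan da Cunha",
    "Saint Kitts and Nevis",
    "Saint Lucia",
    "Saint Martin (French part)",
    "Saint Pierre and Miquelon",
    "Saint Vincent and the Grenadines",
    "Bosnia and Herzegovina",
    "Canada",
    "Côte d'Ivoire",
]


def truncate(name: str) -> str:
    """Cut the name at the first character that is not a letter, digit or underscore."""
    return re.split(r"\W", name, maxsplit=1)[0]


# Count how many full names collapse onto each truncated form.
counts = Counter(truncate(n) for n in NAMES)
collisions = {short: n for short, n in counts.items() if n > 1}

if __name__ == "__main__":
    print(collisions)
```

Any purely mechanical shortening rule will have collisions of this kind, so each one ends up needing a hand-made decision, which is where the controversy comes in.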
3) At least 5 countries have non-ASCII characters in their ISO names: "Côte d'Ivoire", "Curaçao", "Åland Islands", "Saint Barthélemy", "Réunion". Personally, I would like to use only ASCII characters because they are the lowest common denominator that is guaranteed to work, outside of mainframes using EBCDIC. If we remove these non-ASCII characters, are we going to offend the people of those countries, even though these are supposed to be English versions of their country names?
I don't believe the spelling will be nearly as controversial as referring to most of the regions in that list as countries. There is prior art in the tzdb for ASCII-fying accented letters.
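For the five names quoted above, ASCII-fying can be done by decomposing each accented letter and dropping the combining marks. The Python sketch below shows that one technique using the standard `unicodedata` module; it is not a description of how tzdb itself chooses its spellings, and the function name `asciify` is invented for this example.

```python
import unicodedata


def asciify(name: str) -> str:
    """Strip accents by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    for name in ("Côte d'Ivoire", "Curaçao", "Åland Islands",
                 "Saint Barthélemy", "Réunion"):
        print(f"{name} -> {asciify(name)}")
```

This only works for letters that have a Unicode decomposition; letters such as "ø" or "ß" do not, so a complete scheme would still need a hand-written transliteration table.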
4) So maybe the solution is to use 2-letter or 3-letter ISO codes, instead of the shortened, quasi-English versions of the country names. So we get things like "CA/Eastern" or "CAN/Eastern", instead of "Canada/Eastern". Not very satisfying for Canadians or many other countries (except for Americans whose ISO codes "US" and "USA" match their colloquial usage perfectly).
That would be somewhat less controversial. Though note Clive's observation about GB/UK. Philip -- Philip Paeps Senior Reality Engineer Alternative Enterprises
On Fri, Nov 05, 2021 at 02:53:23PM -0700, Brian Park via tz wrote:
On Fri, Nov 5, 2021 at 12:01 PM Brian Park <brian@xparks.net> wrote:
I agree that it is conceptually cleaner if the Core TZDB identifiers were internal only. But I understand that some people would consider ISO-country identifiers to be out of scope of this project, although there are many ad hoc ones currently in the database. I think a file like 'countryzone' should be added only if there are people willing to maintain such a list. It may need to be a separate project, to avoid forcing the TZ Coordinator to pick up the slack if those maintainers drop off.
Following up my own post, I took an initial stab at what this 'countryzone' file would look like, and immediately ran into problems that convince me that this does *not* belong in the TZDB project. The scope seems too large, so it seems better as a separate project.
I started from an ISO-3166 CSV file (see https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes for a human readable version), and I found:
1) Many country names are too long to fit into 14 characters. Let's say we relax that constraint because we deprecate support for any old Unix system that cannot support these longer file names. But there are countries like "Heard Island and McDonald Islands", "South Georgia and the South Sandwich Islands", and "United States Minor Outlying Islands", and "British Indian Ocean Territory". Just from an ergonomics perspective, we should find a way to shorten these very long names.
2) If we shorten some countries, like "Bosnia and Herzegovina" to just "Bosnia" for convenience, are we going to offend people? I don't know anyone from Bosnia and Herzegovina, so I have no idea. Each country that we shorten needs to be researched carefully.
3) At least 5 countries have non-ASCII characters in their ISO names: "Côte d'Ivoire", "Curaçao", "Åland Islands", "Saint Barthélemy", "Réunion". Personally, I would like to use only ASCII characters because they are the lowest common denominator that is guaranteed to work, outside of mainframes using EBCDIC. If we remove these non-ASCII characters, are we going to offend the people of those countries, even though these are supposed to be English versions of their country names?
This also brings up the question about why any of the subregion identifiers should be included? They are not countries and I find it hard to defend that Jan Mayen (population: 4 (scientists on the weather station)) should have its own time zone when the US state of e.g. Texas shouldn't. /MF
On Sat, Nov 6, 2021 at 12:55 AM Magnus Fromreide via tz <tz@iana.org> wrote:
This also brings up the question about why any of the subregion identifiers should be included? They are not countries and I find it hard to defend that Jan Mayen (population: 4 (scientists on the weather station)) should have its own time zone when the US state of e.g. Texas shouldn't.
Because Texas has never had a different time from other entries in the DB, while Jan Mayen has.
/MF
-- Astra mortemque praestare gradatim
On Sat, Nov 06, 2021 at 11:19:28PM -0700, Watson Ladd wrote:
On Sat, Nov 6, 2021 at 12:55 AM Magnus Fromreide via tz <tz@iana.org> wrote:
This also brings up the question about why any of the subregion identifiers should be included? They are not countries and I find it hard to defend that Jan Mayen (population: 4 (scientists on the weather station)) should have its own time zone when the US state of e.g. Texas shouldn't.
Because Texas has never had a different time from other entries in the DB, while Jan Mayen has.
I will happily admit to not having done any deeper research than reading the comments in europe and backzone, but both Arctic/Longyearbyen and Atlantic/Jan_Mayen seem to be links to Europe/Oslo and that is all the history they have. What I was after, though, was that Svalbard & Jan Mayen do have an ISO country code - SJ - but that country code is rightfully marked as a region code in ISO 3166. I do not know why some places get their own 3166 region codes but I suppose it is a thing for autonomous regions. /MF
Magnus Fromreide via tz said:
I do not know why some places get their own 3166 region codes but I suppose it is a thing for autonomous regions.
Originally it was something to do with trade statistics. Codes were given to what the UN recognized as countries *or* places that had international trade that was tracked separately. So it's not so much autonomous regions as physically separate regions. So Corsica doesn't have a code because it's just (for these purposes) an island off the coast of France, and Northern Ireland doesn't have a code for much the same reason, while SJ (or Greenland, as part of Denmark) is tracked separately by whichever body was doing the tracking. (Even before the Soviet Union dissolved, Belarus and Ukraine had their own codes because the UN recognized them as separate countries - even though part of the USSR - as a fudge to give the USSR more power in the UN.) -- Clive D.W. Feather
On Nov 5, 2021, at 17:53, Brian Park via tz <tz@iana.org> wrote:
1) Many country names are too long to fit into 14 characters. Let's say we relax that constraint because we deprecate support for any old Unix system that cannot support these longer file names. But there are countries like "Heard Island and McDonald Islands", "South Georgia and the South Sandwich Islands", and "United States Minor Outlying Islands", and "British Indian Ocean Territory". Just from an ergonomics perspective, we should find a way to shorten these very long names.
Given that some of these names are themselves the result of long and sometimes contentious debate within their own constituencies, I suspect that *any* shortening of these names is going to ignite controversies. Even the mere ‘Englishification’ of them will be controversial. (Witness the ongoing Kyiv/Kiev controversy.)
Cheers!
Frederick F. Gleason, Jr. | Chief Developer | Paravel Systems
A room without books is like a body without a soul. -- Cicero
Actually this Canadian is quite ok with CA for the country code. We are familiar with it as our domain name extension. CA/ON/Ottawa works.
That is not correct. It is perfectly normal in German to say 'in Amerika' or in French 'en Amérique'. That usually means the complete continent, unless there is already a context which, for example, narrows it to a specific part, like the US. If one wants to know more, one has to ask back 'In North, South or Middle America?' or 'Where in America?'.
On 05.11.21 16:26, Brian Park via tz wrote:
Virtually no one in the world thinks of "America" as referring to all of "North America" and "South America".
Brian Park via tz said:
Can you explain why? Because it will cause arguments about disputed places? I think only a small minority of places around the world are disputed.
But they're the ones that create all the noise.
By separating these ISO-country timezones into a 'countryzone' file, perhaps we can confine the debate into a smaller section of the TZDB. We could create duplicate entries (i.e. Country1/City, Country2/City), or create a pseudo-country called "Disputed" (i.e. Disputed/City). The point is, we can create policies that govern these disputed regions.
How do you handle the meta-issue that it's disputed whether some of these issues are even disputed? The government in Beijing will tell you that there is no dispute about Taiwan: it's part of China. Other people will tell you that there is a dispute. Similarly for Crimea. And I'm sure many others. I can already imagine all the complaints that something "obvious" is being falsely labelled as disputed. Equally, someone can raise a dispute about almost anything: which disputes are you going to tag as disputed?
Could we move 'countryzone' into a separate project?
Of course you could. As a simple starting point, you could try creating the list you want *ignoring the pre-1970 data issue* and see how it maps to TZDB.
Virtually no one in the world thinks of "America" as referring to all of "North America" and "South America".
Have you asked them all?

--
Clive D.W. Feather          | If you lie to the compiler,
Email: clive@davros.org     | it will get its revenge.
Web: http://www.davros.org  |    - Henry Spencer
Mobile: +44 7973 377646
On 6 Nov 2021, at 02:26, Brian Park via tz <tz@iana.org> wrote:
Virtually no one in the world thinks of "America" as referring to all of "North America" and "South America".
Building upon what Alois and Clive have said, this is a dubious statement that’s disproven with only a cursory search. This is what Wikipedia[1] currently has to say on the matter, referring to an English-language shift in terminology towards separate “North” and “South” continents:
This shift did not seem to happen in most other cultural hemispheres on Earth, such as Romance-speaking (including France, Belgium, Luxembourg, Italy, Portugal, Spain, Romania, Switzerland, and the postcolonial Romance-speaking countries of Latin America and Africa), Germanic (but excluding English) speaking (including Germany, Austria, Switzerland, Belgium, The Netherlands, Luxembourg, Denmark, Norway, Sweden, Icelands, Faroe Islands), Baltic-Slavic languages (including Czechia, Slovakia, Poland, Ukraine, Belarus, Lithuania, Latvia, Russia, Slovenia, Croatia, Bosnia and Herzegovina, Serbia, Montenegro, Bulgaria) and in many other hemispheres, where America is still considered a continent encompassing the North America and South America subcontinents, as well as Central America.
Would it have been less confusing overall to have used the prefix “Americas/” instead of “America/”? Yes. But is it completely wrong to use “America” to refer to the entire continental mass? Not at all. (And should the Wikipedia entry be rewritten to not have a single sentence that's 102 words long? Undoubtedly.)

On a broader note, I’ve seen a few “my opinion is fact” statements fly around on the mailing list lately, from various parties. (I know, I know, “welcome to the internet”.) Normally I let them slide by; I’m far more a casual peruser of this list than an active contributor. But I realised that bold assertions about what people in “other countries” do or do not care about are partly what caused this pre-1970 dilemma in the first place.

Obviously we can’t do rigorous information-gathering for every decision about the tzdb structure; the scope is so broad that nothing would ever get resolved. But on the other extreme, if everyone only puts forward unresearched opinions masquerading as facts, then still nothing gets resolved because there’s no compromise.

[1]: https://en.wikipedia.org/wiki/Naming_of_the_Americas

Cheers, Gil
On Tue, Nov 9, 2021 at 2:58 AM Gilmore Davidson via tz <tz@iana.org> wrote:
On 6 Nov 2021, at 02:26, Brian Park via tz <tz@iana.org> wrote:
Virtually no one in the world thinks of "America" as referring to all of "North America" and "South America".
Building upon what Alois and Clive have said, this is a dubious statement that’s disproven with only a cursory search. This is what Wikipedia[1] currently has to say on the matter, referring to an English-language shift in terminology towards separate “North” and “South” continents:
This shift did not seem to happen in most other cultural hemispheres on Earth, such as Romance-speaking (including France, Belgium, Luxembourg, Italy, Portugal, Spain, Romania, Switzerland, and the postcolonial Romance-speaking countries of Latin America and Africa), Germanic (but excluding English) speaking (including Germany, Austria, Switzerland, Belgium, The Netherlands, Luxembourg, Denmark, Norway, Sweden, Icelands, Faroe Islands), Baltic-Slavic languages (including Czechia, Slovakia, Poland, Ukraine, Belarus, Lithuania, Latvia, Russia, Slovenia, Croatia, Bosnia and Herzegovina, Serbia, Montenegro, Bulgaria) and in many other hemispheres, where America is still considered a continent encompassing the North America and South America subcontinents, as well as Central America.
Would it have been less confusing overall to have used the prefix “Americas/” instead of “America/”? Yes. But is it completely wrong to use “America” to refer to the entire continental mass? Not at all. (And should the Wikipedia entry be rewritten to not have a single sentence that's 102 words long? Undoubtedly.)
On a broader note, I’ve seen a few “my opinion is fact” statements fly around on the mailing list lately, from various parties. (I know, I know, “welcome to the internet”.) Normally I let them slide by—I’m far more a casual peruser of this list than an active contributor. But I realised that bold assertions about what people in “other countries” do or do not care about is partly what’s caused this pre-1970 dilemma in the first place.
Obviously we can’t do rigorous information-gathering for every decision about the tzdb structure—the scope is so broad that nothing would ever get resolved. But on the other extreme, if everyone only puts forward unresearched opinions masquerading as facts, then still nothing gets resolved because there’s no compromise.
[1]: https://en.wikipedia.org/wiki/Naming_of_the_Americas
Cheers, Gil
That's a fair point, and reminds me that I should avoid making overly broad statements on the internet. Although in the multiple decades that I have been on Earth, I have never heard of the English word "America" being used to refer to both continents. We agree that the better term is "the Americas". Brian
On Fri, 5 Nov 2021 at 04:17, Brian Park <brian@xparks.net> wrote:
* Descriptive: Paul wants to describe the timezones of the world without regard to how those time zones were created, and merge them into the smallest set that can generate the timekeeping rules. I can see that in this view, merging timezones from different countries into the same equivalence class is reasonable.
The minimalist view espoused by Paul is, IMO, perfectly rational if, and only if:

a) there is no pre-1970 history associated with each abstract region
b) the abstract regions have names that don't imply national boundaries/limits on their scope

Merging without these two things being true merely results in a horrible mess.

There is a perfectly viable solution to the problem if maintenance of post-1970 timezones was moved to a different repo along the lines above. This repo could then import the post-1970 definitions and unite them with the pre-1970 ones (with a different volunteer doing that work).
* Prescriptive: I think Stephen and others start with the fact that time zones are the creations of political organizations which write the regulations that define the timezones. Those governing bodies are predominantly organized by country in a hierarchical structure. In this view, it does *not* make sense to merge timezones from different countries. This view also implies that the TZ identifiers should reflect the political organizational structure of the world.
I don't think this is really a good summary. All I'm asking for is a return to the system that was in use up to 2014 or so. I'm asking because it is what end users expect, it works well, it's backwards compatible, and it models how timezone rules actually work in the real world, i.e. sometimes based on countries and sometimes on real entities the general public are aware of.

I do not want to see a hierarchical structure of country code and location, like FR/Paris. Examples like Crimea demonstrate why this is a bad idea, plus ISO codes get reused eventually, making them bad identifiers. The connection that tzdb currently has between Europe/Paris and ISO code FR is IMO at a completely different level, and far less politically contentious.
For this to work, I think we need to clarify the semantics of the 'Link' records in the TZ database. As far as I can tell, there are at least 3 different meanings of the Link record:
1) Link Canonical Deprecated
   * Deprecated is an old zone which should no longer be used
2) Link Canonical Alternate
   * alternate spelling or alias, but not deprecated
3) Link Canonical Merged
   * zones which were merged because they have the same rules by chance, but there is no semantic relationship to each other
It is undoubtedly true that there are different meanings of Link. Stephen
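To make the ambiguity concrete: in the source files every Link record has the same shape (`Link TARGET LINK-NAME`), so nothing in the syntax says which of the three meanings applies. The following is a minimal Python sketch of how a downstream parser might attach that distinction. It assumes a hypothetical side table (`LINK_KIND`) maintained by hand; no such annotation exists in tzdb, and the sample records and their labels are illustrative only.

```python
from typing import Dict, List, NamedTuple


class Link(NamedTuple):
    target: str  # existing name the link points at
    name: str    # name being defined by the Link record
    kind: str    # "deprecated", "alternate", "merged" or "unknown"


def parse_links(source: str, kinds: Dict[str, str]) -> List[Link]:
    """Collect unabbreviated 'Link TARGET LINK-NAME' records from zone source text."""
    links = []
    for raw in source.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        fields = line.split()
        if len(fields) == 3 and fields[0] == "Link":
            _, target, name = fields
            # The source syntax carries no semantics, so look them up elsewhere.
            links.append(Link(target, name, kinds.get(name, "unknown")))
    return links


# Illustrative records; the kind labels are hypothetical, not tzdb policy.
SAMPLE = """\
# sample Link records
Link America/New_York US/Eastern
Link Asia/Kolkata Asia/Calcutta
Link Europe/Berlin Europe/Oslo
"""
LINK_KIND = {
    "US/Eastern": "alternate",
    "Asia/Calcutta": "deprecated",
    "Europe/Oslo": "merged",
}

for link in parse_links(SAMPLE, LINK_KIND):
    print(f"{link.name} -> {link.target} ({link.kind})")
```

The sketch mainly shows that the classification has to live outside the Link record as it stands today; where it should live, if anywhere, is the open question in this thread.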
On 2021-11-05 07:03:33 (+0800), Stephen Colebourne via tz wrote:
On Wed, 3 Nov 2021 at 22:40, Paul Eggert <eggert@cs.ucla.edu> wrote:
On 10/18/21 06:07, Stephen Colebourne via tz wrote:
The ironclad rule (AFAICT) is that there was always an *ID* for each ISO country, and that the data it returned was acceptably accurate, not outrageously wrong.
While we could quibble about how "ironclad" the rule was, the tzdb certainly leaned in that direction until comparatively recently. That aside though, I completely support the second half of this sentence as a guiding principle of what the tzdb ought to provide. Acceptably accurate, not outrageously wrong.
If you genuinely do want to reduce your volunteer work to only be the abstract post-1970 regions and not to maintain any data pre-1970, then you really should be clear about that. You could then look for an alternate maintainer of tzdb itself as you would be maintaining what amounts to a new database, which would best sit in a different git repo. That data could then be an input to tzdb itself.
In all the heated discussions we've had on this mailing list in recent months, I don't believe we actually discussed the way the tzdb is maintained and whether we can improve it. There was a suggestion to replace Paul as coordinator. I do not support this.

As far as I can tell, we never discussed the maintenance process. We seem to be taking it for granted that the coordinators (Paul and Tim) are doing all the work maintaining the tzdb. Other contributions are essentially updates to historical information and upcoming changes to transitions, which the coordinators merely merge and credit.

This way of working places a huge burden on the coordinators. I cannot find any reason to criticise Paul for wanting to reduce his maintenance workload. While I do not agree with the consequences of merging time zone regions, under the circumstances the premise is sound.

Maybe we can spread the burden of maintenance over more volunteers. Note that I am not suggesting we replace the coordinators or abolish their role. Instead of both coordinating and doing all the work, others could step up and volunteer to help maintain the data.

Stephen, you have been very vocal about supporting one identifier per ISO code. Would you volunteer to put in the work of maintaining this? Are others on this list willing to help share this work?

Paul/Tim, would you support coordinating the efforts of additional maintainers and ensuring that what ends up in the repository continues to meet the high standards of quality the community expects?

Philip

--
Philip Paeps
Senior Reality Engineer
Alternative Enterprises
Philip Paeps via tz <tz@iana.org> wrote:
On 2021-11-05 07:03:33 (+0800), Stephen Colebourne via tz wrote:
The ironclad rule (AFAICT) is that there was always an *ID* for each ISO country, and that the data it returned was acceptably accurate, not outrageously wrong.
While we could quibble about how "ironclad" the rule was, the tzdb certainly leaned in that direction until comparatively recently. That aside though, I completely support the second half of this sentence as a guiding principle of what the tzdb ought to provide.
The tz maintenance rules as written clearly said that each country should have at least one zone until 2019. That rule started being broken in 2013 when zones started being merged and pre-1970 data was dropped for some countries that lacked representation on this list.

Tony.
--
f.anthony.n.finch <dot@dotat.at> https://dotat.at/
Mull of Kintyre to Ardnamurchan Point: Southwest 5 to 7, veering west 4 or 5 for a time. Moderate or rough, occasionally very rough later. Occasional rain or drizzle, showers later. Moderate or good, occasionally poor.
On 11/8/21 13:30, Tony Finch via tz wrote:
The tz maintenance rules as written clearly said that each country should have at least one zone until 2019. That rule started being broken in 2013
If I'm reading the tzdb history correctly, from 2013–2019 the guidelines said only that there should typically be at least one name (not Zone) for each inhabited ISO country or territory. From 1997–2013 the guidelines also talked about names, not Zones. (There were no guidelines in this area before 1997.)

So it appears that there has never been a guideline saying that each country should have at least one zone, and this means no such rule was ever broken.
On Mon, 8 Nov 2021 at 23:05, Paul Eggert via tz <tz@iana.org> wrote:
On 11/8/21 13:30, Tony Finch via tz wrote:
The tz maintenance rules as written clearly said that each country should have at least one zone until 2019. That rule started being broken in 2013
If I'm reading the tzdb history correctly, from 2013–2019 the guidelines said only that there should typically be at least one name (not Zone) for each inhabited ISO country or territory. From 1997–2013 the guidelines also talked about names, not Zones. (There were no guidelines in this area before 1997.)
So it appears that there has never been a guideline saying that each country should have at least one zone, and this means no such rule was ever broken.
Here is what it said in 2012:

"Include at least one location per time zone rule set per country. One such location is enough. Use ISO 3166 (see the file iso3166.tab) to help decide whether something is a country. However, uninhabited ISO 3166 regions like Bouvet Island do not need locations, since local time is not defined there."

It was removed by this commit in 2013:
https://github.com/eggert/tz/commit/d3b025adb25554ee10b986850371e573df92733e
and re-added in the weaker form of "name" after I objected:
https://github.com/eggert/tz/commit/3d046bc0e4351c658d333d1dcc9c69ab15dfb743

IMO the original definition referred to Zone and not just name. That is no surprise, because it is a very rational way to model timezones.

Stephen
Paul Eggert via tz said:
On 10/18/21 06:07, Stephen Colebourne via tz wrote:
What tzdb previously offered was a set of IDs, based on a simple rule - "ID as needed for post-1970 data, with at least one per ISO country".
But there's an even simpler rule: "ID as needed for post-1970 data".
Full history was available for each of these (whether accurate or not).
That wasn't ever the case. For example, there was never full history (accurate or not) for San Marino. We shouldn't base our analysis on the idea that we formerly had at least one Zone per ISO country, as we never had an ironclad rule like that and we did just fine without any such rule. There's no *timekeeping* reason to require a Zone for every ISO country.
I agree with this.
Adding such a requirement would complicate maintenance. It would add a significant amount of likely-bogus data, as witness the recent discussion about the likely-bogus data for Bamako that's in 'backzone'. And it would increase the role of politics particularly as new countries emerge, and politics is something we should avoid as much as possible.
Suppose that a new "country" (quotes because not all ISO 3166-1 entries are countries and I *REALLY* don't want to start arguing which ones are) is formed. We now have to split a zone into two with identical post-1970 data. But what do we use for the pre-1970 data? One of them will have to take data for a city in a different country. Precisely what Stephen has been arguing against! We need to decide what TZDB's policies are in relation to pre-1970 data, including when and where the data is included, how to indicate the reliability of the data, and how to handle zone splits and merges.
What I am sensing from your proposal, as well as from some of the followup comments, is a need to further clarify exactly what the tzdb project's interfaces are.
I agree with this. I think it should have priority over the rest of this discussion.

--
Clive D.W. Feather
On 11/8/21 00:27, Clive D.W. Feather wrote:
Suppose that a new "country" (quotes because not all ISO 3166-1 entries are countries and I *REALLY* don't want to start arguing which ones are) is formed. We now have to split a zone into two with identical post-1970 data.
But what do we use for the pre-1970 data? One of them will have to take data for a city in a different country. Precisely what Stephen has been arguing against!
Years ago when that happened and the old guidelines suggested a new Zone, I filled in the blanks with pre-1970 data mostly from Shanks. After some experience of the problems with this approach I changed the guidelines: with the old approach, Shanks's unreliability caused me to put more bogus data into tzdb, and this created more work for me, for the scarce people who help maintain the data, and for downstream users who had to deal with the unnecessary proliferation of Zones. It was a mess better avoided if we don't need to do it, which we don't if we're creating a Zone purely for political reasons.
What I am sensing from your proposal, as well as from some of the followup comments, is a need to further clarify exactly what the tzdb project's interfaces are.
I agree with this. I think it should have priority over the rest of this discussion.
For starters we could include something like the above paragraph, as it describes what I've done in the past. Another clarification is over the role of the names. In the reference implementation if you have "Zone A ..." and "Link A B", there's no difference between the two names A and B: they're both hard links to the same TZif file, neither is "the" name for the file, and nothing in the file's contents tells you what its name is. Much of tzdb database maintenance has assumed this, and it'd be helpful to document this assumption. I'm sure there are other things that need to be documented/clarified. (There always are. :-)
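The hard-link point can be illustrated outside zic. The Python sketch below does not run zic or produce real TZif data; it writes placeholder bytes under a name A, hard-links B to it in the way the message describes, and then shows that the filesystem treats the two names symmetrically. The helper name `install_zone_with_link` is hypothetical.

```python
import os
import tempfile


def install_zone_with_link(root, zone, link, data):
    """Write `data` under the name `zone`, then add `link` as a second hard link."""
    zone_path = os.path.join(root, zone)
    link_path = os.path.join(root, link)
    with open(zone_path, "wb") as f:
        f.write(data)
    os.link(zone_path, link_path)  # second directory entry for the same file


with tempfile.TemporaryDirectory() as root:
    # Placeholder bytes stand in for a compiled TZif file.
    install_zone_with_link(root, "A", "B", b"placeholder, not real TZif")
    path_a = os.path.join(root, "A")
    path_b = os.path.join(root, "B")
    # Both names resolve to one file; neither is marked as the primary name.
    print("same file:", os.path.samefile(path_a, path_b))
    print("link count:", os.stat(path_a).st_nlink)
```

Once the link exists, a reader that opens B has no way to learn from the file that A was the Zone and B the Link, which is the assumption the message suggests documenting.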
Hi, On 08.11.21 09:27, Clive D.W. Feather via tz wrote:
What I am sensing from your proposal, as well as from some of the followup comments, is a need to further clarify exactly what the tzdb project's interfaces are. I agree with this. I think it should have priority over the rest of this discussion.
Which are we talking about? There are already the POSIX interfaces. Are we talking about the binary files? Is this something on which we are likely to gain consensus? Eliot
Eliot Lear said:
What I am sensing from your proposal, as well as from some of the followup comments, is a need to further clarify exactly what the tzdb project's interfaces are.
I agree with this. I think it should have priority over the rest of this discussion.
Which are we talking about? There are already the POSIX interfaces. Are we talking about the binary files? Is this something on which we are likely to gain consensus?
We're talking about a whole load of stuff and I'm not even sure what it all is. But documented formats for the various files, *including* saying what stuff may change without warning ("unspecified behaviour" in C terms) versus what can be relied on, is an obvious start. Defining what the legal values of fields are and what they mean (see Irish winter time for an example). What happens if you provide a value that's out of range? Et seq ad nauseam.

--
Clive D.W. Feather
On 9 Nov 2021, at 21:27, Clive D.W. Feather via tz <tz@iana.org> wrote:
Defining what the legal values of fields are and what they mean (see Irish winter time for an example). What happens if you provide a value that's out of range?
This is partly why I tried to compile a list of use cases for direct consumption of the source text files:

https://mm.icann.org/pipermail/tz/2021-October/030950.html

Knowing how and why people are using the files helps define what gets “blessed” as a supported interface.

Cheers, Gil
Eliot Lear via tz <tz@iana.org> writes:
Which are we talking about? There are already the POSIX interfaces. Are we talking about the binary files? Is this something on which we are likely to gain consensus?
AFAIK, the only way to enumerate the available zones in a typical installation is to search the directory tree. This isn't great, not least because it means that non-tzcode code has to know where that directory tree is. It'd be good to support that functionality in a more direct fashion. regards, tom lane
On 09.11.21 15:52, Tom Lane via tz wrote:
AFAIK, the only way to enumerate the available zones in a typical installation is to search the directory tree. This isn't great, not least because it means that non-tzcode code has to know where that directory tree is. It'd be good to support that functionality in a more direct fashion.
That's an easy one to solve (finally? An easy problem?). That is the C equivalent of:

    (
      export TDIR=/usr/share/zoneinfo
      cd $TDIR; find * -type f -print | egrep -v 'VERSION|leapseconds|posixrules|zone.tab'
    )

where TDIR is defined at build. Is that an important interface to define and standardize?

Eliot
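For readers who want the same enumeration without a shell, here is a rough Python approximation of that pipeline. It is a sketch, not an interface tzdb defines: the function name `list_zone_names` is made up, the exclusion list is copied from the command above and applied as a plain substring match (slightly stricter than the egrep pattern, where `.` matches any character), and the demonstration runs on a throwaway directory rather than on /usr/share/zoneinfo.

```python
import os
import tempfile

# Exclusions copied from the egrep -v pattern in the message above.
EXCLUDED = ("VERSION", "leapseconds", "posixrules", "zone.tab")


def list_zone_names(tdir):
    """Return names of regular files under tdir, relative to tdir, sorted."""
    names = []
    for dirpath, _dirnames, filenames in os.walk(tdir):
        for fname in filenames:
            full = os.path.join(dirpath, fname)
            if os.path.islink(full):
                continue  # 'find -type f' does not report symbolic links
            rel = os.path.relpath(full, tdir).replace(os.sep, "/")
            if any(pattern in rel for pattern in EXCLUDED):
                continue
            names.append(rel)
    return sorted(names)


def make_demo_tree(root):
    """Create a tiny fake zoneinfo tree containing placeholder files."""
    for rel in ("Europe/Paris", "UTC", "zone.tab", "leapseconds"):
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(b"placeholder")


with tempfile.TemporaryDirectory() as demo_root:
    make_demo_tree(demo_root)
    print(list_zone_names(demo_root))
```

Like the shell version, this only works if the caller already knows where the tree lives, which is exactly the objection raised earlier in the thread. Python's own standard library offers `zoneinfo.available_timezones()` (3.9 and later) as one example of a runtime hiding that lookup from callers.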
On 11/9/21 06:52, Tom Lane wrote:
AFAIK, the only way to enumerate the available zones in a typical installation is to search the directory tree.
Although that was true years ago, since 2017c tzdb has distributed a file tzdata.zi that also works. For example, on my Fedora 35 host, the shell command:

    awk '/^Z/ {print $2}' /usr/share/zoneinfo/tzdata.zi

lists a name for each distinct TZif file.
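The awk one-liner translates directly into other languages. As a sketch only, the Python below applies the same rule (take the second field of every line that starts with `Z`) to an in-memory string. The sample text is a made-up fragment in the general shape of tzdata.zi, not real data, and `zone_names_from_zi` is a hypothetical name.

```python
def zone_names_from_zi(text):
    """Mimic awk '/^Z/ {print $2}': second field of each line starting with Z."""
    names = []
    for line in text.splitlines():
        if line.startswith("Z"):
            fields = line.split()
            if len(fields) >= 2:
                names.append(fields[1])
    return names


# Made-up fragment for illustration; real tzdata.zi content will differ.
SAMPLE_ZI = """\
Z Africa/Abidjan -0:16:8 - LMT 1912
0 - GMT
Z Asia/Tokyo 9:18:59 - LMT 1887 D 31 15u
9 J J%sT
L Africa/Abidjan Africa/Accra
"""

print(zone_names_from_zi(SAMPLE_ZI))
```

Against an installed copy, the text would come from something like `open("/usr/share/zoneinfo/tzdata.zi").read()`, with the path varying by system. Names defined only by link lines (those beginning with `L` in the sample) are skipped, which matches the "one name for each distinct TZif file" behaviour described above.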
Paul Eggert <eggert@cs.ucla.edu> writes:
On 11/9/21 06:52, Tom Lane wrote:
AFAIK, the only way to enumerate the available zones in a typical installation is to search the directory tree.
Although that was true years ago, since 2017c tzdb has distributed a file tzdata.zi that also works.
Is that present in every downstream distribution? Is it specified in the TZif RFC?

More to the point, I thought this thread was about whether tzcode's APIs are sufficient for everyone's use cases. A discussion back at the Postgres project [1] reminded me of another requirement that is only poorly satisfied by the existing APIs: how can programs tell when zone data has changed? Right now you can sort of tell by examining the mod dates in the tzdata file tree, but that gives you a ton of false positives, even granted that examining the file tree is something that tzcode users ought to be doing. It'd sure be nice if individual zone data files had a version number or "this data was last changed on <date>" sort of label, and tzcode provided a way to get that for any particular zone ID.

An overall tzdata version number ("2021e" etc) would be a second-best answer, but AFAIK that's not available in any standardized way either.

regards, tom lane

[1] https://www.postgresql.org/message-id/flat/CADT4RqDVBbqSbQVH_v_vS5_9DPhjsfmQ...
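One workaround a downstream program could use today, without any new tzcode API, is to compare file contents rather than modification dates. The Python sketch below does that with a SHA-256 digest per zone file. This is content hashing, a technique swapped in for illustration; it is not a versioning scheme that tzdb or tzcode provides, and the function names are hypothetical.

```python
import hashlib
import os
import tempfile


def zone_fingerprint(tdir, zone_id):
    """Return a hex digest of the compiled data file for one zone ID."""
    path = os.path.join(tdir, *zone_id.split("/"))
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def changed_zones(tdir, zone_ids, previous):
    """Return the zone IDs whose contents differ from a saved snapshot."""
    return [
        zone_id
        for zone_id in zone_ids
        if previous.get(zone_id) != zone_fingerprint(tdir, zone_id)
    ]


with tempfile.TemporaryDirectory() as demo_root:
    demo_path = os.path.join(demo_root, "Europe", "Paris")
    os.makedirs(os.path.dirname(demo_path))
    with open(demo_path, "wb") as f:
        f.write(b"placeholder v1")  # stands in for TZif data
    snapshot = {"Europe/Paris": zone_fingerprint(demo_root, "Europe/Paris")}

    os.utime(demo_path, (0, 0))  # new timestamp, same bytes
    print("after touch:", changed_zones(demo_root, ["Europe/Paris"], snapshot))

    with open(demo_path, "wb") as f:
        f.write(b"placeholder v2")  # contents really changed
    print("after edit:", changed_zones(demo_root, ["Europe/Paris"], snapshot))
```

This removes the false positives caused by reinstalling unchanged files, but it still requires the program to find and read the file tree itself, so it does not address the wish for a label that tzcode could report per zone ID.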
On 11/11/21 10:51, Tom Lane wrote:
Although that was true years ago, since 2017c tzdb has distributed a file tzdata.zi that also works.
Is that present in every downstream distribution?
It's present in every tzdb distribution. Nothing (not even TZif files) is present in every downstream redistribution of tzdb data.
Is it specified in the TZif RFC?
No.
More to the point, I thought this thread was about whether tzcode's APIs are sufficient for everyone's use cases.
No, primarily it's about the "API" for tzdb, not about tzcode directly. That is, the main issue here comes from downstream uses that reach into the tzdb source code directly without going through the tzcode API. In effect there appears to be a not-formalized "API" (it's not really an API - it's not written down so in some sense it doesn't even exist!) that is the source of these issues.
It'd sure be nice if individual zone data files had a version number
I'm a bit skeptical of the cost/benefit of that, for the same reason I'm skeptical of the utility of putting version numbers in files generally - the version numbers are not generally trustworthy and they are often more trouble than they're worth. Anyway, there isn't really room in the current TZif file format for a general-enough version number, though of course we could extend the format in a future RFC to make room for it.
On 11/9/21 02:23, Eliot Lear wrote:
Which are we talking about? There are already the POSIX interfaces. Are we talking about the binary files? Is this something on which we are likely to gain consensus?
For binary files (the TZif files) we have Internet RFC 8536 and its draft successor RFC 8536bis <https://datatracker.ietf.org/doc/draft-murchison-rfc8536bis/>. Arthur, Ken and I drafted this mostly from tzdb's tzfile.5, and although the RFC has been mentioned occasionally this mailing list has not focused on that effort and hasn't helped its progress. (Occasionally people here have suggested using up the spare space in the TZif header for some extension, but none of these proposals have seemed to be worth the cost.)

The issues causing controversy on this list have been in areas not covered by RFC 8536 (or RFC 6557, for that matter). If we wanted another RFC to address these other issues, I suppose one way to do it would be to draft one starting with material already in tzdb. It'd take some work to draft any such RFC, though, and any drafters would need to be thick-skinned.
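For context on what RFC 8536 does pin down, here is a minimal Python sketch that decodes the fixed 44-byte TZif header: the magic `TZif`, a one-byte version, 15 reserved bytes, and six big-endian 32-bit counts. It is run on a synthetic header built in the script, not on a real file. The function name and the returned dictionary keys are choices made for this illustration, and the sketch reads only the first header, not the data blocks or the footer.

```python
import struct

# magic (4 bytes), version (1 byte), 15 reserved bytes, six 32-bit counts
HEADER = struct.Struct(">4sc15x6I")


def parse_tzif_header(data):
    """Decode the leading TZif header from a bytes object."""
    if len(data) < HEADER.size:
        raise ValueError("data too short for a TZif header")
    (magic, version, isutcnt, isstdcnt,
     leapcnt, timecnt, typecnt, charcnt) = HEADER.unpack_from(data)
    if magic != b"TZif":
        raise ValueError("missing TZif magic")
    return {
        # A NUL version byte means version 1; otherwise it is an ASCII digit.
        "version": 1 if version == b"\x00" else int(version.decode("ascii")),
        "isutcnt": isutcnt,
        "isstdcnt": isstdcnt,
        "leapcnt": leapcnt,
        "timecnt": timecnt,
        "typecnt": typecnt,
        "charcnt": charcnt,
    }


# Synthetic header for demonstration: version 2, one transition, two types.
SAMPLE_HEADER = HEADER.pack(b"TZif", b"2", 0, 0, 0, 1, 2, 8)

print(HEADER.size, parse_tzif_header(SAMPLE_HEADER))
```

The 15 skipped bytes are presumably the "spare space" mentioned above. Nothing in this header identifies the zone name or the tzdb release, which is consistent with the earlier remarks that the file contents do not say what the file is called.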
On Tue, Nov 9, 2021 at 8:29 AM Paul Eggert via tz <tz@iana.org> wrote:
On 11/9/21 02:23, Eliot Lear wrote:
Which are we talking about? There are already the POSIX interfaces. Are we talking about the binary files? Is this something on which we are likely to gain consensus?
For binary files (the TZif files) we have Internet RFC 8536 and its draft successor RFC 8536bis <https://datatracker.ietf.org/doc/draft-murchison-rfc8536bis/>. Arthur, Ken and I drafted this mostly from tzdb's tzfile.5, and although the RFC has been mentioned occasionally this mailing list has not focused on that effort and hasn't helped its progress. (Occasionally people here have suggested using up the spare space in the TZif header for some extension, but none of these proposals have seemed to be worth the cost.)
The issues causing controversy on this list have been in areas not covered by RFC 8536 (or RFC 6557, for that matter). If we wanted another RFC to address these other issues, I suppose one way to do it would be to draft one starting with material already in tzdb. It'd take some work to draft any such RFC, though, and any drafters would need to be thick-skinned.
Is the TZDB project intended to support only POSIX systems? I had assumed that the C library was the reference implementation, and the TZif was the file format supporting the reference implementation, not the API into the TZDB. For my downstream library, I parse the raw zone files directly from GitHub. That makes it easy to integrate into the GitHub Actions continuous build system, and I can test against the most recent commits. My downstream usage does not have a file system, or even an operating system. For me, the API into the TZDB project are the raw files, but I understand that my usage is unusual. Brian
On Tue, Nov 9, 2021 at 2:39 PM Brian Park via tz <tz@iana.org> wrote:
Is the TZDB project intended to support only POSIX systems? I had assumed that the C library was the reference implementation, and the TZif was the file format supporting the reference implementation, not the API into the TZDB. For my downstream library, I parse the raw zone files directly from GitHub. That makes it easy to integrate into the GitHub Actions continuous build system, and I can test against the most recent commits. My downstream usage does not have a file system, or even an operating system. For me, the API into the TZDB project are the raw files, but I understand that my usage is unusual.
Unusual, but surely not aberrant. The Tcl programming language also parses the raw files into its own format. It uses zoneinfo and excludes the Tcl files on platforms where zoneinfo is expected by default (e.g., most Unixes), but includes the information in its own format on Windows (and iOS and Android, if memory serves). The inclusion on Windows is partly because many releases of Windows didn't have time zone logic that could handle historical rule changes (and I'm talking post-1970: basically, the logic allowed only one set of DST transition rules per zone). That may have been fixed in current APIs (I haven't tracked it recently; I know that Tcl's method works).

--
73 de ke9tv/2, Kevin
On 11/9/21 2:38 PM, Brian Park via tz wrote:
Is the TZDB project intended to support only POSIX systems? I had assumed that the C library was the reference implementation, and the TZif was the file format supporting the reference implementation, not the API into the TZDB. For my downstream library, I parse the raw zone files directly from GitHub. That makes it easy to integrate into the GitHub Actions continuous build system, and I can test against the most recent commits. My downstream usage does not have a file system, or even an operating system. For me, the API into the TZDB project are the raw files, but I understand that my usage is unusual.
FWIW, vzic (I have been maintaining a fork of the original in libical) also uses the raw files to create iCalendar VTIMEZONE components. -- Kenneth Murchison Senior Software Developer Fastmail US LLC
participants (23)
- Alois Treindl
- Brian Park
- Chris Walton
- Clive D.W. Feather
- Derick Rethans
- dpatte
- Eliot Lear
- Fred Gleason
- Garrett Wollman
- Gilmore Davidson
- Guy Harris
- Howard Hinnant
- Ken Murchison
- Kevin Kenny
- Magnus Fromreide
- Mark Davis ☕️
- Paul Eggert
- Paul Goyette
- Philip Paeps
- Stephen Colebourne
- Tom Lane
- Tony Finch
- Watson Ladd