An alternate framing of timezone maintenance
Over the past few days, I've felt like the framing of the discussion hasn't taken into account Paul's clearly expressed desire for the part of maintenance he wants to focus on, and has not attempted to incorporate that into a design that would preserve other properties that other mailing list participants are interested in. I've also wondered if all parties are making unnecessarily strong assumptions about the nature of tz maintenance that exclude potentially useful designs.
In the hope of applying the maxim that all problems in computer science can be solved by adding a level of indirection, here's a wild proposal that, even if not workable as-is, might help in looking at the discussion from a different angle.
One can think of the tz database as two layers. The first is a collection of rulesets that represent rules for clock changes in particular regions. Call that the timekeeping data set. The second is a many-to-one assignment of names to those rulesets. Call that the naming layer.
The scheme used for the naming layer attempted to avoid politicization of that layer by using the continent and largest city approach. This was largely successful, particularly by the standards of attempts of this sort, but not entirely so.
For years now, the tz project has in essence asked people to treat the zone names as opaque identifiers and not imbue them with political meaning. Unfortunately, because those identifiers embed real-world names with other meanings in other contexts, I believe this effort is doomed to never fully succeed. The names and spellings of cities are political. The choice of continent to which to assign a city can be political. Population counts are political. Readers of the mailing list can fill in more examples.
However, the timekeeping data set, divorced from the naming layer, is as close to apolitical as anything involving laws and human practice could be. Putting aside timezone abbreviations, nearly all of the political conflict is over the naming layer, not the timekeeping data set.
I believe Paul has clearly indicated that the part of the work that he wants to focus on is maintenance of the timekeeping data set. I would characterize his recent proposed changes as attempts to make the naming layer less political to reduce political arguments and thus allow more time and attention to be spent on the timekeeping data set, which is where the primary value of the project lies. The stability concerns that have prompted most of the recent discussion are almost entirely about the naming layer.
Suppose we resurrect the idea of opaque timezone identifiers. Specifically, suppose that we *add* a new, random identifier, something like TZ0045 with random digits, to all existing rulesets in either the main database or backzone. These identifiers would be unique identifiers for the dataset itself, independent of any other names. These identifiers would immediately have some useful properties:
1. Historic times for a given identifier would change only if we discovered that the previous times were clearly erroneous. Apart from fixing discovered errors, historic times would be stable for any given identifier.
2. Looking forward, new identifiers may be added if portions of an existing region diverge in their timekeeping practices or if someone gathers new historical information that would prompt the creation of a new backzone ruleset, but that's the only possible change. Identifiers will never change or be retired.
3. These identifiers carry absolutely no additional political content on top of the rules themselves. In other words, they add no new political problems not inherent and unavoidable in the data itself.
Adding these identifiers would nearly double the number of names in the current tz database, which is unfortunate, but certainly far less disruptive than the sorts of changes that have recently been considered.
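The two layers and the three properties above can be made concrete with a small Python sketch. Everything in it is hypothetical: the class and method names, the `TZ5828` identifier (the made-up example Tom Lane uses later in this thread), and the simplification of a ruleset to a string are illustrative assumptions, not part of tzdb or its tooling. The registry accepts new identifiers and refuses to retire one (property 2), while a correction replaces a ruleset's contents under the same identifier (property 1).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Ruleset:
    """One entry in the timekeeping data set (hypothetical model).

    `rules` stands in for the rule lines, references, and commentary
    that a real tzdb Zone entry carries.
    """
    ident: str  # opaque identifier such as "TZ0045"; carries no place name
    rules: str


@dataclass
class TimekeepingDataSet:
    """Append-only registry of rulesets keyed by opaque identifier."""
    rulesets: dict[str, Ruleset] = field(default_factory=dict)

    def add(self, ruleset: Ruleset) -> None:
        # Property 2: new identifiers may be added (e.g. a region diverges).
        if ruleset.ident in self.rulesets:
            raise ValueError(f"identifier {ruleset.ident} already exists")
        self.rulesets[ruleset.ident] = ruleset

    def correct(self, ident: str, rules: str) -> None:
        # Property 1: history changes only to fix errors; the identifier stays.
        if ident not in self.rulesets:
            raise KeyError(ident)
        self.rulesets[ident] = Ruleset(ident, rules)

    def retire(self, ident: str) -> None:
        # Property 2: identifiers never change or go away.
        raise PermissionError(f"identifier {ident} can never be retired")


@dataclass
class NamingLayer:
    """Many-to-one assignment of human-readable names to identifiers."""
    data: TimekeepingDataSet
    names: dict[str, str] = field(default_factory=dict)

    def assign(self, name: str, ident: str) -> None:
        if ident not in self.data.rulesets:
            raise KeyError(f"no ruleset with identifier {ident}")
        self.names[name] = ident

    def resolve(self, name: str) -> Ruleset:
        return self.data.rulesets[self.names[name]]


# Usage: two names share one ruleset; the names carry all the politics.
data = TimekeepingDataSet()
data.add(Ruleset("TZ5828", "New York rules"))
layer = NamingLayer(data)
layer.assign("America/New_York", "TZ5828")
layer.assign("US/Eastern", "TZ5828")
```

Because the naming layer only stores strings pointing into the registry, renaming or re-pointing a name never touches a ruleset.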
Once these identifiers exist, the combination of those identifiers and the timekeeping data set form a nearly apolitical collection of data to which a naming layer can be cleanly applied. One can, for example, define a naming layer that exactly corresponds to the naming in use in the previous release of the tz database. With the exception of the implementation detail that the previous names become links to a new canonical identifier, the combination of that naming layer and that conception of the timekeeping data set is functionally identical to the previous tz release (except for the normal sorts of modifications for on-the-ground timekeeping changes).
This may sound like a lot of work just to get back to where we already are, but with a pile of new, ugly names. But the point of such a change is that it now permits a separation of concerns and even potentially a separation of maintenance.
The timekeeping data set is now a separate artifact that those whose primary interest is in timekeeping data can focus on without having to get involved in political naming discussions. It achieves the goal that Paul has been working towards (but which is impossible to fully achieve with the current naming) of separating the data from political and historical decisions about who got a timezone name and who didn't. And (very slowly, of course) there is now the possibility for consumers of the tz database to opt out of the naming conventions. One could, for instance, choose a timezone based on selection from a map and have that correspond to the unique, permanent timezone identifier.
Meanwhile, clearly there is a strong interest in the naming layer and a strong desire to continue to maintain it along lines that Paul is not entirely comfortable with. Recently, that discussion has focused on naming stability, but other parties have expressed other interests in the past (adding new spellings of cities, ensuring a name exists for every ISO-recognized country, ensuring a name exists for regional capitals that are commonly referenced locally as the name for a timezone, etc.). Nothing is going to make those discussions go away, as the past many years of discussions here have shown, but now they are separable from the timekeeping data set and participants can decide which part of the maintenance they're interested in.
If Paul (or any other contributor) wished, he could choose to focus on the part of the project that he finds the most interesting and leave maintenance of the naming layer largely to other parties. Given recent mailing list traffic, there is obviously substantial interest in that naming layer and thus I'm sure there will be no shortage of volunteers to help maintain it. And those who make decisions about the naming layer can then also absorb the consequences of those decisions, such as handling arguments over the spelling of cities. It would even be possible (although not necessary) to move discussion of the naming layer to a separate mailing list to more clearly separate political discussion from ruleset maintenance and technical work on the associated code libraries. The naming layer, which is now nearly devoid of technical decisions, could even be delegated to a more political body that deals with these sorts of conflicts constantly and is thus better equipped to handle them than the tz mailing list. Numerous options like that become possible.
Even if a maintenance split doesn't happen, I think everyone may benefit from cleanly separating the spectacularly high-quality resource of rulesets and their accompanying exhaustive references, discussion, and human-readable descriptions of applicable regions from the politically fraught but technically quite small and simple naming layer.
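As a rough illustration of how small the naming layer is, and of the earlier point that a naming layer can reproduce the previous release exactly with the old names becoming links to canonical identifiers, here is a Python sketch that renders such a layer in the `Link TARGET LINK-NAME` layout used by tzdb source files. It assumes, hypothetically, that Zone entries would themselves be named by opaque identifiers such as TZ5828; the identifiers and the `link_lines` function are made up, and it is an assumption that zic would accept such identifiers as zone names.

```python
# Hypothetical naming layer reproducing the previous release: every old name
# is kept, but each now points at an opaque canonical identifier.
PREVIOUS_RELEASE_NAMING = {
    "America/New_York": "TZ5828",
    "US/Eastern": "TZ5828",
    "Europe/Berlin": "TZ1490",
    "Europe/Oslo": "TZ1386",
}


def link_lines(naming: dict[str, str]) -> list[str]:
    """Render a naming layer as tab-separated Link lines
    ("Link TARGET LINK-NAME"), sorted by name."""
    return [f"Link\t{ident}\t{name}" for name, ident in sorted(naming.items())]


for line in link_lines(PREVIOUS_RELEASE_NAMING):
    print(line)
```

The whole layer is a flat mapping; all the judgment calls (which names exist, how they are spelled, which ruleset each points to) live in that one table.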
This idea may not be workable for reasons that aren't obvious to me at nearly 4:00am, but hopefully it will at least provide a different angle from which to look at the current arguments and possibly achieve some clarity about which portions of the overall tz project people are interested in working on and where the exact controversy lies.
-- Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>
I was in the process of starting another thread, but will add onto this as it is similar in spirit.
Is this an opportunity to take advantage of some GitHub features to differentiate the type of work being done? Off the bat, I am *not* suggesting a pull-request model, but I think that there is an opportunity to use branches in a strategic manner to differentiate the type of work.
Currently, the public repository at https://github.com/eggert/tz has a single branch where work is done. Patches, whether applied/committed or proposed, are posted here for comment. When a release is made, a git tag is created on main to represent that point in time.
Could a next step be splitting out the work into separate branches by type of work? This isn't fully baked (and the names are descriptive rather than proposed ones), but it could look like:
    main - no work is done directly on this branch; it is only tagged for releases
    post-1970 - use this branch for future changes or corrections to 1970+ timestamps, i.e., normal data maintenance
    pre-1970 - use this branch for adding or correcting historical data, i.e., normal historical updates
    maintenance - use this branch for code updates and bugfixes; no data changes
    proposed - use this branch for potentially disruptive changes
And then, instead of rolling commits to main, we define a pre-release window, and agreed-upon changes are merged into main, tagged, and a proper release is cut, built, and formally announced.
There are likely nuances that I am not thinking about, but this is a thought starter, and I think it could allow necessary releases to happen as needed (like the short-notice data changes) while not holding up other work.
--Matthew Donadio (matt@mxd120.com)
This is pretty much what I've been suggesting (though not in as much detail).
I would also suggest we (re)consider distribution of the resulting data. Asking people to discover and run code to produce the format they desire seems unnecessary. Providing the data in a number of (popular) downloadable formats would seem reasonable. Timezone distribution service?
On 9/22/21 06:52, Russ Allbery via tz wrote:
[quoted original message snipped]
I wanted to thank you for this detailed write-up.
In response, I'll say that I don't believe that naming is the root of the problem. There has been a lot of distracting discussion around ISO countries, city names and more, but the fundamental issue actually is with the timekeeping data set, not the naming. Whatever a region is called, it still represents *somewhere*. Changing the history of that somewhere is a big deal, whatever name you give the region.
Don't get me wrong, I do understand the attraction of arbitrary names. But the existing names will never disappear as they are so widely used, and so useful. To me, adding new names is net negative at this stage of tzdb.
Stephen
On Wed, 22 Sept 2021 at 11:52, Russ Allbery via tz <tz@iana.org> wrote:
[quoted original message snipped]
Stephen Colebourne via tz <tz@iana.org> writes:
In response, I'll say that I don't believe that naming is the root of the problem. There has been a lot of distracting discussion around ISO countries, city names and more, but the fundamental issue actually is with the timekeeping data set, not the naming. Whatever a region is called, it still represents *somewhere*. Changing the history of that somewhere is a big deal, whatever name you give the region.
I think you have not understood the post to which you are replying.
Your objection that started this recent discussion is solely contained in the naming layer. You are objecting to the change to where Europe/Oslo points (and similar changes). Viewed through the separation of the timekeeping data set and the naming layer, your objection is that Europe/Oslo used to point to TZ1386 (or whatever), which contains historical data (of whatever quality) for Oslo, and now points to TZ1490 (or whatever), which contains historical data for Berlin.
Nothing has changed about the rulesets. Nothing has changed about the recorded history. What has changed is where the *name* Europe/Oslo points, since it becomes an alias to Europe/Berlin instead of pointing to a separate ruleset (which still exists).
Your concern can therefore be completely addressed in the naming layer by pointing the name Europe/Oslo back at TZ1386.
All of the various build options that incorporate backzone or portions of backzone are, in this model, accomplishing two things: choosing which data sets to build (so, for instance, whether to build the TZ1386 ruleset at all), and modifying the many-to-one mapping between the naming layer and the timekeeping data set. The stability that you are asking for is addressed in this model by (a) always building the backzone rulesets so that their permanent identifiers are available, and (b) maintaining the mapping from Europe/Oslo to that identifier.
The point of my message is that focusing on the effects that you see is misleading because the nature of the data is not what one might think it is when one is only looking at the downstream effects. You seem to think that the data has changed, but the data has not. Only the names assigned to it have changed.
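The model described above can be sketched in a few lines of Python. This is a sketch under stated assumptions: TZ1386 and TZ1490 are the placeholder identifiers from the message, each ruleset is reduced to a string, and `make_build` is a made-up function standing in for the real build options (such as those that pull in backzone); it is not tzdb's actual build interface. What it shows is that both builds draw on identical, unmodified ruleset contents and differ only in which rulesets are compiled and where the name Europe/Oslo points.

```python
from dataclasses import dataclass

# Placeholder identifiers from the discussion ("TZ1386 (or whatever)").
# The timekeeping data set holds both rulesets; nothing below edits them.
ALL_RULESETS = {
    "TZ1386": "historical data for Oslo",    # kept in backzone after the change
    "TZ1490": "historical data for Berlin",
}


@dataclass
class Build:
    rulesets: dict[str, str]  # identifier -> ruleset actually compiled
    names: dict[str, str]     # zone name -> identifier


def make_build(compile_ids: set[str], oslo_target: str) -> Build:
    """Model a build option: it picks which rulesets to compile and where
    each name points, and nothing else."""
    rulesets = {i: ALL_RULESETS[i] for i in sorted(compile_ids)}
    names = {"Europe/Berlin": "TZ1490", "Europe/Oslo": oslo_target}
    for name, ident in names.items():
        if ident not in rulesets:
            raise ValueError(f"{name} points at {ident}, which was not built")
    return Build(rulesets, names)


def oslo_history(build: Build) -> str:
    return build.rulesets[build.names["Europe/Oslo"]]


# Default build after the change: Europe/Oslo becomes an alias for Berlin.
default = make_build({"TZ1490"}, oslo_target="TZ1490")
# Stable build: (a) also compile the backzone ruleset, (b) keep the mapping.
stable = make_build({"TZ1386", "TZ1490"}, oslo_target="TZ1386")

print(oslo_history(default))  # historical data for Berlin
print(oslo_history(stable))   # historical data for Oslo
```

The `ValueError` check mirrors requirement (a): a name can only keep pointing at TZ1386 if that ruleset is actually built.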
Don't get me wrong, I do understand the attraction of arbitrary names. But the existing names will never disappear as they are so widely used, and so useful.
Oddly, that was exactly the point I was making in the message to which this theoretically was a reply.
-- Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>
Russ Allbery via tz <tz@iana.org> writes:
I think you have not understood the post to which you are replying.
Your objection that started this recent discussion is solely contained in the naming layer. You are objecting to the change to where Europe/Oslo points (and similar changes). Viewed through the separation of the timekeeping data set and the naming layer, your objection is that Europe/Oslo used to point to TZ1386 (or whatever), which contains historical data (of whatever quality) for Oslo, and now points to TZ1490 (or whatever), which contains historical data for Berlin.
Nothing has changed about the rulesets. Nothing has changed about the recorded history. What has changed is where the *name* Europe/Oslo points, since it becomes an alias to Europe/Berlin instead of pointing to a separate ruleset (which still exists).
Your concern can therefore be completely addressed in the naming layer by pointing the name Europe/Oslo back at TZ1386.
As a theoretical argument, that's great. Given a few months or a year, maybe we could even implement such a model. The problem at hand is what are we going to ship *tomorrow*. There's no time to make such a thing happen.
A secondary problem is that with or without an additional layer of indirection, what end users in Norway are going to care about is whether "Europe/Oslo" gives the same results it used to in a default build of tzdb. No amount of mechanism is going to let us escape making that decision. Nor does it seem like having multiple popular variants of tzdb will be a great outcome.
regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes:
As a theoretical argument, that's great. Given a few months or a year, maybe we could even implement such a model. The problem at hand is what are we going to ship *tomorrow*. There's no time to make such a thing happen.
I completely agree that this thread is about the long-term direction and an attempt to provide a way of reframing the problem so that various parties don't feel backed into a corner with no path forward other than a fork. None of this discussion addresses the immediate concern of what is in the 2021b release. I stated my opinion on that topic in a different thread earlier today.
-- Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>
Russ Allbery via tz <tz@iana.org> writes:
I completely agree that this thread is about the long-term direction and an attempt to provide a way of reframing the problem so that various parties don't feel backed into a corner with no path forward other than a fork. None of this discussion addresses the immediate concern of what is in the 2021b release.
So after thinking about this for awhile, I don't see how adding a layer of indirection would help us as far as the contentious questions go. That's because the contentious questions are about what view of the data will be seen by the average end user using a default installation of tzdb. We can't abdicate making choices about that, and we shouldn't want to, because if every platform vendor starts making their own choices about it we'll have chaos.
To take an example, I think people would agree that best practice right now for letting a user select their time zone is to provide a world map that can be clicked on. (macOS does it that way for instance.) So the user clicks somewhere within Norway ... what next? Should the machine now pop up a dialog box saying "I see you are in Norway. Would you prefer probably-accurate pre-1970 data for Norway, or certainly-wrong pre-1970 data for Norway?" The idea is laughable --- nobody's going to take the second choice if presented with the option.
Lower-level APIs for selecting time zone, such as setting a TZ environment variable or filling in /etc/localtime, typically involve time zone names these days. For instance on my Linux machine I've got
    $ ls -l /etc/localtime
    lrwxrwxrwx. 1 root root 38 May 30 2020 /etc/localtime -> ../usr/share/zoneinfo/America/New_York
You can quibble about whether that's the most ideal representation, but it's what we've arrived at, and I think there's not much chance of changing it.
If tzdb were to say "You can't write America/New_York any more, you have to write TZ5828", the universal response would be "Up with this nonsense I shall not put". There would instantly spring up a cottage industry making APIs to map intelligible identifiers to TZ IDs, and the best outcome we could hope for is that all those APIs chose to just duplicate the old zone names. If every system decided to invent its own names for zones, again we've got chaos that serves nobody.
So really the responsibility for choosing those names has to stay with tzdb. In short, I think that opaque IDs that are hidden within tzdb won't add anything, while exposing them as external identifiers is just a non-starter. That's probably why the previous discussions of the idea haven't gone anywhere. regards, tom lane
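The /etc/localtime symlink in the message above shows why the zone name, rather than any internal identifier, is what programs actually see. Here is a sketch of how a program might recover "America/New_York" from that link. It assumes the common Linux layout, where compiled zone files live under /usr/share/zoneinfo; other systems differ. The `zone_name_from_link` helper is hypothetical.

```python
import posixpath
from typing import Optional

# Common location of compiled tzdb files on Linux; an assumption, since
# some systems install them elsewhere.
ZONEINFO_ROOT = "/usr/share/zoneinfo"


def zone_name_from_link(link_target: str, link_dir: str = "/etc",
                        root: str = ZONEINFO_ROOT) -> Optional[str]:
    """Recover a zone name such as "America/New_York" from the target of
    the /etc/localtime symlink, or return None if it points elsewhere."""
    # A relative target like "../usr/share/..." is relative to the link's
    # directory; an absolute target replaces link_dir entirely in join().
    resolved = posixpath.normpath(posixpath.join(link_dir, link_target))
    if not resolved.startswith(root + "/"):
        return None
    return posixpath.relpath(resolved, root)
```

On a live system the target would come from `os.readlink("/etc/localtime")`, and the resulting name could be passed to `zoneinfo.ZoneInfo` (Python 3.9+). Either way, the string a program ends up holding is the tzdb name, which is the point being made about where naming decisions land.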
Tom Lane <tgl@sss.pgh.pa.us> writes:
Russ Allbery via tz <tz@iana.org> writes:
I completely agree that this thread is about the long-term direction and an attempt to provide a way of reframing the problem so that various parties don't feel backed into a corner with no path forward other than a fork. None of this discussion addresses the immediate concern of what is in the 2021b release.
So after thinking about this for awhile, I don't see how adding a layer of indirection would help us as far as the contentious questions go. That's because the contentious questions are about what view of the data will be seen by the average end user using a default installation of tzdb. We can't abdicate making choices about that, and we shouldn't want to, because if every platform vendor starts making their own choices about it we'll have chaos.
I agree with almost all of that except the critical place where I think it might help: separating the data from the naming means that people who primarily want to work on the data and find the naming frustrating and political *can* abdicate making choices about that. Specifically, the naming and the data can be maintained by *different people*. My outsider perspective is that a large source of the conflict here is different goals. From Paul's previous statements, I think it's clear that he wants his work to be as apolitical as possible and is uncomfortable with the places where the tzdata database necessarily makes political decisions involving naming. Separating the naming from the data creates a set of data that one can work on apolitically. I completely agree with you that someone still has to maintain the naming and make those political decisions, but that work is much less *technical* and much more obviously political. Perhaps, with a level of indirection, the people who do not want to do that political work (or are not well-equipped to do it, which I would argue this mailing list probably is not) will see a clear path to delegating that work to another group that is willing to take it on. Then everyone is working on things that are near and dear to their heart without being forced to consider things that they find unpleasant. In other words, this isn't a *technical* fix (although it requires a technical component). It's a *social* approach, which to me felt like the piece that was missing in previous discussions.
To take an example, I think people would agree that best practice right now for letting a user select their time zone is to provide a world map that can be clicked on. (macOS does it that way for instance.) So the user clicks somewhere within Norway ... what next? Should the machine now pop up a dialog box saying "I see you are in Norway. Would you prefer probably-accurate pre-1970 data for Norway, or certainly-wrong pre-1970 data for Norway?"
Obviously not; obviously the author of the time zone selection tool should be opinionated about that choice and make a choice for the user. And obviously those opinions should be pooled and hashed out somewhere. The key point is that somewhere *doesn't need to be here*. This is less obvious for something like Norway and becomes more obvious if the user clicks on Hebron or Crimea or Xinjiang. The tz project already does not take a position on the mapping of geographical coordinates to time zones and leaves that to other projects. Obviously there is input that needs to flow in both directions. For instance, clearly indicating the quality and believed accuracy of a given data set would be useful. But my point is that so much of this conflict currently is between folks who are trying to remove politics from the work and folks who care very deeply about the outcome of those political decisions (or about stability decisions which, while not directly about politics, have significant impact on the political layer and very little impact on the data collection layer). We can't stop doing both parts of that work, but we *can* separate them so that everyone gets some distance and can retreat to the part of the work that they care about the most without impacting the other side of it.
If tzdb were to say "You can't write America/New_York any more, you have to write TZ5828", the universal response would be "Up with this nonsense I shall not put".
No, this is definitely not what I'm saying. It's obvious that we cannot discard the current naming layer, and indeed a partial attempt to do that is exactly what set off the current disagreement. -- Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>
participants (5)
- Matthew Donadio
- Michael Douglass
- Russ Allbery
- Stephen Colebourne
- Tom Lane