In this thread I want to try to capture the *requirements* for the project from the perspective of a downstream user. With those in hand, we may then be able to analyse whether the project needs a change of rules.

NOTE: The discussion below is not a proposal, nor does it seek to explore which locations get pre-1970 data. Let's keep that for another thread.

--------

1) "What is the time now?"

To answer this question you need a dataset that records the current timezone rules for each region in the world. Over time that data naturally builds up to form a historical record.

To aid the project, a 1970 rule has been adopted for data management. In some parts of the world, clocks have agreed since 1970 across large parts of a continent. In other parts of the world, different parts of the same country have had different rules since 1970.

Viewing the data through a post-1970 lens splits the planet into regions where clocks have agreed since 1970. Some of these regions are quite abstract, for example the region including Iceland, Ivory Coast, and St Helena.

An abstract region would need to be split if one of its parts changed its timezone rules. If a user had previously chosen to follow one of these abstract regions, they might then have to change their timezone settings. For example, if a user in Iceland selected their abstract region by the recommended ID based on the largest city, they would get `Africa/Abidjan`; if the Ivory Coast later changed its rules, users in Iceland would have to change their selected timezone.

The minimum set of IDs is one ID for each abstract region. The current ID scheme uses the largest city in the abstract region.

My interpretation:
* An abstract region should theoretically have an ID separate from the IDs of the cities/countries in the region
* Tying the ID of the abstract region to the largest city in the region is prone to failure if that city changes timezone rules but the rest of the abstract region stays the same.
* An abstract region ID would naturally have no pre-1970 data, as it doesn't define data for a single city/location.
* The alternative is to encourage users to use their local city, which means that IDs like Europe/Amsterdam and Europe/Oslo cannot be deprecated.

--------

2) "What was the time at some instant in the past?"

The project can answer this question pretty reliably for dates from 1970 onwards because of the dataset in #1.

The project can never realistically answer the question reliably or accurately before 1970 for *all* locations. For example, timezone rules were entirely local in the US for about 20 years, meaning that hundreds of entries would be needed to capture the data correctly. In addition, much of the historical data globally is unknown and not viable to research, even if someone wanted to do so.

It is possible to research and record historical timezone rules for *some* locations, Europe/London being one example. The project generally operates on the basis that if an ID exists, historical data can be recorded against it, although which file that data lives in has varied over time. IDs are generally not created solely to document historical data.

There is no minimum set of IDs for this use case; the set of possible IDs is unbounded.

My interpretation:
* The majority of pre-1970 data isn't much use
* Some pre-1970 data is high quality, and almost certainly relied upon (e.g. Europe/London)
* Without a massive increase in scope, the best that can be done pre-1970 is to record data against IDs created for some other reason.

--------

3) "An event will happen at this time in the future"

If an end user wants to store a future event, e.g. a one-hour meeting next month, the correct approach is to store the date, time and zone ID, not the offset (which is not yet known for certain). Even if the local timezone authority changes the rules between now and the event, the stored local time of the event is still correct; only the instant computed from it changes.
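A minimal Python sketch of this approach, assuming the standard-library `zoneinfo` module and an installed tz database. The `FutureEvent` class and its field names are hypothetical; the point is that only the local date-time and the zone ID are persisted, and the instant is derived from whatever rules are installed at the moment it is needed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo


@dataclass(frozen=True)
class FutureEvent:
    """Hypothetical record for a future event: local wall time plus zone ID, no offset."""
    local_start: datetime  # naive datetime: the wall-clock time the participants agreed
    duration: timedelta    # elapsed length of the event
    zone_id: str           # e.g. "Europe/Paris"; should match the relevant timezone authority

    def start_instant(self) -> datetime:
        """Resolve to a UTC instant using the tz rules installed *now*, not when stored."""
        aware = self.local_start.replace(tzinfo=ZoneInfo(self.zone_id))
        return aware.astimezone(timezone.utc)

    def end_instant(self) -> datetime:
        """End of the event as a UTC instant (start plus elapsed duration)."""
        return self.start_instant() + self.duration


# A one-hour meeting at 14:00 Paris time; no offset is stored anywhere.
meeting = FutureEvent(datetime(2030, 7, 1, 14, 0), timedelta(hours=1), "Europe/Paris")
print(meeting.start_instant())  # computed from the current rules at call time
```

Storing an offset such as `+02:00` instead would freeze today's rules into the record: if the authority abolished DST before the event and stayed on `+01:00`, the stored instant would be an hour away from the wall time people actually turn up at.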
https://stackoverflow.com/questions/19166995/java-calendar-date-and-time-man...

For this approach to work, the ID that is stored must be aligned with the timezone authority. Business contracts are frequently written to describe future events, for example that payment must be made by 2pm Paris time on a given date.

The minimum set of IDs for this use case is one ID per timezone authority. This is typically the country, but not always; for example, decisions may be devolved to states or nations.

My interpretation:
* Using TZDB IDs for future events/contracts is very widespread
* In most parts of the world this requires one ID per country; in some parts of the world additional IDs would ideally be needed, such as one per state in the US
* One ID for each ISO country (as was the guideline in the past) is a reasonable minimum expectation of end users
* While one might argue that IDs for future events are not TZDB's problem, I believe the reality is that they are, because usage in this way is already very widespread

--------

4) "An event will happen at this time in the future relative to a shared/common definition"

Some future events are declared relative to a definition that is supra-national or federal, for example US Mountain, US Pacific, EU Central etc. A TV show might be defined to air at 8pm Mountain Time in one month's time. It will air at that time regardless of whether any of the states currently on Mountain Time change to Pacific Time. For example, even if Denver (and Colorado) moved to Pacific Time or stopped following DST, Mountain Time would still exist and the TV show would still air at 8pm Mountain.

The minimum set of IDs for this use case is one ID per shared/common definition. These IDs are not full aliases for a country or city ID, although they may currently share data. To support this use case, these IDs cannot be deprecated.
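To make the distinction concrete, here is a toy Python model. It is not the real TZDB or zic data model: the `Rules` class, the dictionaries and the Colorado change are all hypothetical. It contrasts resolving US/Mountain as a pure alias of America/Denver with treating it as a first-class ID that merely shares Denver's data today.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rules:
    """Hypothetical, heavily simplified rule set (not the real tz data model)."""
    std_offset_hours: int
    observes_dst: bool


MOUNTAIN = Rules(std_offset_hours=-7, observes_dst=True)
PACIFIC = Rules(std_offset_hours=-8, observes_dst=True)

# Model A: US/Mountain is a pure alias, resolved by following the link to America/Denver.
alias_links = {"US/Mountain": "America/Denver"}
alias_zones = {"America/Denver": MOUNTAIN}


def resolve_alias(zone_id: str) -> Rules:
    return alias_zones[alias_links.get(zone_id, zone_id)]


# Model B: US/Mountain is a first-class ID that currently shares data with Denver.
shared_zones = {"America/Denver": MOUNTAIN, "US/Mountain": MOUNTAIN}


def resolve_shared(zone_id: str) -> Rules:
    return shared_zones[zone_id]


# Hypothetical future: Colorado moves to Pacific Time.
alias_zones["America/Denver"] = PACIFIC
shared_zones["America/Denver"] = PACIFIC

# A TV show stored against "US/Mountain" should still follow Mountain Time.
print(resolve_alias("US/Mountain"))   # follows Denver to Pacific: wrong for this use case
print(resolve_shared("US/Mountain"))  # still Mountain
```

In the shared-data model the data for US/Mountain can still be produced by a Link today; what differs is only the guarantee that the ID will not be deprecated or silently tied to a city's future rules.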
My interpretation:
* Some shared zone IDs are distinct and separate enough to warrant a full ID
* US Mountain Time is not the same as Denver time when considering the future; the IDs could diverge
* In TZDB terms, the Link from US/Mountain to America/Denver is fine for defining the data, but US/Mountain should not be considered deprecated, as it has a real and distinct meaning separate from America/Denver
* It would be useful to agree on the criteria for including these shared/federal rules.

--------

5) "Data should be backwards compatible"

The project has various associated downstream projects, many of which expect backwards compatibility. This compatibility extends to the binary format, the source format, and the data contained within. This use case wants the project simply to continue with the data it has, enhancing it over time.

The project has a pretty good record in terms of not removing IDs. For example, IDs exist for "Portugal" and "Poland" but not for "Italy" or "Spain", simply because the former were defined once upon a time and have been kept.

My understanding is that IDs in the `backward` file represent deprecated IDs that should not be used going forward. Yet downstream projects focussed on backwards compatibility will generally use the `backward` file in full.

The minimum set of IDs for this use case is the set of IDs that existed in the previous release.

My interpretation:
* Many users care about compatibility
* The current IDs are not going away
* Inconsistency has existed in TZDB for many years (see "Portugal" vs "Spain"), and no one wants to fix that
* It is hard at present to distinguish IDs that are deprecated because of proper aliasing, such as spelling changes, from those where the ID is a real location but its TZDB status has been reduced.
* Change may be possible with a long notice period

---------

Please note that the above is not a proposal, but an exploration of the design space. If you have any use cases I've missed, feel free to write them up.
thanks
Stephen