On 03.06.20 12:17, Clive D.W. Feather wrote:
Alois Treindl said:
   I have analyzed to what extent TZ database represents the time zone
   information correctly, for these data.

   I think the results are of some interest for the TZ community.

   The overall sums are:

   tz_count    24'674'767  data records, 100%
   tz_irange   23'675'636  in time range and region which TZ covers correctly, 96%
   tz_good        736'769  in time range and region where TZ is unreliable, but correct, 3%
   tz_bad         262'362  in time range and region where TZ database gives false result, 1%
Can you please explain how you've done this analysis? In particular, how do
you know that TZ is "unreliable, but correct" or gives a bad result?

As I said, we have other sources which we use besides TZ data. These sources are used for the pre-1970 cases where we know that TZ data is incomplete.

I give you an example:

For Germany, TZ data uses Berlin. But Berlin had double summer time in 1945, which the western occupation zones (which later became the Federal Republic of Germany) did not have.
So Europe/Berlin does not represent the whole of Germany well before 1946.
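
What TZ itself records for Berlin in summer 1945 can be checked with a short sketch like the following. It assumes Python 3.9 or later with the standard zoneinfo module and tz data installed on the machine; the helper name utc_offset is made up for this illustration.

```python
from datetime import datetime, timedelta
from typing import Optional
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


def utc_offset(zone_name: str, local: datetime) -> Optional[timedelta]:
    """Return the UTC offset the installed tz data assigns to a naive
    local wall-clock time, or None if the zone data is not installed."""
    try:
        zone = ZoneInfo(zone_name)
    except ZoneInfoNotFoundError:
        return None
    return local.replace(tzinfo=zone).utcoffset()


# Noon on 1 July 1945, a date inside Berlin's double summer time period.
berlin_1945 = utc_offset("Europe/Berlin", datetime(1945, 7, 1, 12, 0))
print(berlin_1945)
```

An offset of three hours here means double summer time (UT+3) is applied. That is right for Berlin itself, but it is the wrong answer for a place in the western zones on the same date.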

There are also some minor differences in the late 19th century, when different provinces switched from local mean time to Central European Time.
There are other complications not represented by the TZ database.
For example, in the 1920s France occupied the Rhineland and the Ruhr area, and west of the Rhine French time (UT+0) was used.
After the First World War, parts of Prussia fell to Poland.
After the Second World War, more parts of Prussia fell to Poland, and the Königsberg province (Kaliningrad) fell to Russia.
These all have consequences for the correct pre-1970 time zone history.

In our system this means that for West Germany TZ data is treated as unreliable before 1946; we do not use it there, but use other tables instead.

This incorrectness of the TZ database for Germany applies only to those few months in 1945. But our rule is simply 'West Germany before 1946: do not use TZ database'.

The results for Germany are:

GER       1202363         95.4     4.4     0.2         Germany

This means: we have 1'202'363 cases.

95.4% of these fall either after 1 Jan 1946 or in an area described correctly by Europe/Berlin. For these we use the TZ database.
The remaining 4.6% of cases do not fall into this category, and we do not use the TZ database for them.

If we used the TZ database for these 4.6% of cases, 4.4% would come out correct anyway, but 0.2% would come out bad (both figures as a share of all cases).

These are 0.2% of the data records which our users have entered into our database using our 'automatic time zone' setting. They could have used 'manual time zone' instead, and I have ignored those records for the statistics.
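
The three categories can be illustrated with a minimal sketch. It is only an illustration under assumptions, not the actual analysis code: the function names classify and percentages, the record layout (a reliability flag plus two UTC offsets in seconds), and the 1000 synthetic records are all invented here. The synthetic counts are chosen only so that they mirror the Germany row above.

```python
from collections import Counter

CATEGORIES = ("tz_irange", "tz_good", "tz_bad")


def classify(tz_reliable: bool, tz_offset: int, reference_offset: int) -> str:
    """Put one record into one of the three categories.

    tz_reliable      -- True if date and place lie where TZ is trusted
    tz_offset        -- UTC offset in seconds according to TZ
    reference_offset -- UTC offset in seconds according to the other tables
    """
    if tz_reliable:
        return "tz_irange"
    # Outside the trusted range: compare TZ against the reference tables.
    return "tz_good" if tz_offset == reference_offset else "tz_bad"


def percentages(records):
    """Return each category's share of all records, in percent."""
    counts = Counter(classify(*record) for record in records)
    total = sum(counts.values())
    return {name: round(100 * counts[name] / total, 1) for name in CATEGORIES}


# Synthetic data (not real records): 954 trusted, 44 untrusted but
# matching the reference, 2 untrusted and off by one hour.
records = (
    [(True, 3600, 3600)] * 954
    + [(False, 7200, 7200)] * 44
    + [(False, 10800, 7200)] * 2
)

shares = percentages(records)
print(shares)  # {'tz_irange': 95.4, 'tz_good': 4.4, 'tz_bad': 0.2}
```

Note that tz_good and tz_bad are only evaluated for records outside the trusted range, which is why the two together add up to the 4.6% for which TZ is not used.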