From 3b5f9871e8befa3b105c25ffaac28f1085a36526 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Thu, 30 Jul 2015 00:15:25 -0700
Subject: [PROPOSED PATCH] * Theory: Reorder to put naming issues earlier.

This entails some rewording to avoid use-before-define problems.
Add missing "Accuracy" entry to the outline.
Move public-domain notice to the end.
* NEWS: Document this.
2015-07-29  Paul Eggert  <eggert@cs.ucla.edu>
---
 NEWS   |   3 +
 Theory | 708 +++++++++++++++++++++++++++++++++--------------------------------
 2 files changed, 361 insertions(+), 350 deletions(-)

diff --git a/NEWS b/NEWS
index 2c45231..d6c0eae 100644
--- a/NEWS
+++ b/NEWS
@@ -56,6 +56,9 @@ Unreleased, experimental changes
 
   Changes affecting documentation
 
+    The Theory file mentions naming issues earlier, as these seem to be
+    poorly publicized (thanks to Gilmore Davidson for reporting the problem).
+
     tz-link.htm mentions Time Zone Database Parser (thanks to Howard Hinnant).
 
 
diff --git a/Theory b/Theory
index 9861a4f..ca4ff19 100644
--- a/Theory
+++ b/Theory
@@ -1,233 +1,233 @@
-This file is in the public domain, so clarified as of
-2009-05-17 by Arthur David Olson.
+Theory and pragmatics of the tz code and data
+
 
 ----- Outline -----
 
-	Time and date functions
 	Scope of the tz database
-	Names of time zone rule files
+	Names of time zone rules
 	Time zone abbreviations
+	Accuracy of the tz database
+	Time and date functions
 	Calendrical issues
 	Time and time zones on Mars
 
------ Time and date functions -----
 
-These time and date functions are upwards compatible with those of POSIX,
-an international standard for UNIX-like systems.
-As of this writing, the current edition of POSIX is:
+----- Scope of the tz database -----
+
+The tz database attempts to record the history and predicted future of
+all computer-based clocks that track civil time.  To represent this
+data, the world is partitioned into regions whose clocks all agree
+about time stamps that occur after the somewhat-arbitrary cutoff point
+of the POSIX Epoch (1970-01-01 00:00:00 UTC).  For each such region,
+the database records all known clock transitions, and labels the region
+with a notable location.  Although 1970 is a somewhat-arbitrary
+cutoff, there are significant challenges to moving the cutoff earlier
+even by a decade or two, due to the wide variety of local practices
+before computer timekeeping became prevalent.
+
+Clock transitions before 1970 are recorded for each such location,
+because most systems support time stamps before 1970 and could
+misbehave if data entries were omitted for pre-1970 transitions.
+However, the database is not designed for and does not suffice for
+applications requiring accurate handling of all past times everywhere,
+as it would take far too much effort and guesswork to record all
+details of pre-1970 civil timekeeping.
+
+As described below, reference source code for using the tz database is
+also available.  The tz code is upwards compatible with POSIX, an
+international standard for UNIX-like systems.  As of this writing, the
+current edition of POSIX is:
 
   The Open Group Base Specifications Issue 7
   IEEE Std 1003.1, 2013 Edition
   <http://pubs.opengroup.org/onlinepubs/9699919799/>
 
-POSIX has the following properties and limitations.
 
-*	In POSIX, time display in a process is controlled by the
-	environment variable TZ.  Unfortunately, the POSIX TZ string takes
-	a form that is hard to describe and is error-prone in practice.
-	Also, POSIX TZ strings can't deal with other (for example, Israeli)
-	daylight saving time rules, or situations where more than two
-	time zone abbreviations are used in an area.
-
-	The POSIX TZ string takes the following form:
 
-		stdoffset[dst[offset][,date[/time],date[/time]]]
+----- Names of time zone rules -----
 
-	where:
+Each of the database's time zone rules has a unique name.
+Inexperienced users are not expected to select these names unaided.
+Distributors should provide documentation and/or a simple selection
+interface that explains the names; for one example, see the 'tzselect'
+program in the tz code.
 
-	std and dst
-		are 3 or more characters specifying the standard
-		and daylight saving time (DST) zone names.
-		Starting with POSIX.1-2001, std and dst may also be
-		in a quoted form like "<UTC+10>"; this allows
-		"+" and "-" in the names.
-	offset
-		is of the form '[+-]hh:[mm[:ss]]' and specifies the
-		offset west of UT.  'hh' may be a single digit; 0<=hh<=24.
-		The default DST offset is one hour ahead of standard time.
-	date[/time],date[/time]
-		specifies the beginning and end of DST.  If this is absent,
-		the system supplies its own rules for DST, and these can
-		differ from year to year; typically US DST rules are used.
-	time
-		takes the form 'hh:[mm[:ss]]' and defaults to 02:00.
-		This is the same format as the offset, except that a
-		leading '+' or '-' is not allowed.
-	date
-		takes one of the following forms:
-		Jn (1<=n<=365)
-			origin-1 day number not counting February 29
-		n (0<=n<=365)
-			origin-0 day number counting February 29 if present
-		Mm.n.d (0[Sunday]<=d<=6[Saturday], 1<=n<=5, 1<=m<=12)
-			for the dth day of week n of month m of the year,
-			where week 1 is the first week in which day d appears,
-			and '5' stands for the last week in which day d appears
-			(which may be either the 4th or 5th week).
-			Typically, this is the only useful form;
-			the n and Jn forms are rarely used.
-
-	Here is an example POSIX TZ string, for US Pacific time using rules
-	appropriate from 1987 through 2006:
-
-		TZ='PST8PDT,M4.1.0/02:00,M10.5.0/02:00'
-
-	This POSIX TZ string is hard to remember, and mishandles time stamps
-	before 1987 and after 2006.  With this package you can use this
-	instead:
-
-		TZ='America/Los_Angeles'
+The time zone rule naming conventions attempt to strike a balance
+among the following goals:
 
-*	POSIX does not define the exact meaning of TZ values like "EST5EDT".
-	Typically the current US DST rules are used to interpret such values,
-	but this means that the US DST rules are compiled into each program
-	that does time conversion.  This means that when US time conversion
-	rules change (as in the United States in 1987), all programs that
-	do time conversion must be recompiled to ensure proper results.
+ * Uniquely identify every region where clocks have agreed since 1970.
+   This is essential for the intended use: static clocks keeping local
+   civil time.
 
-*	In POSIX, there's no tamper-proof way for a process to learn the
-	system's best idea of local wall clock.  (This is important for
-	applications that an administrator wants used only at certain times -
-	without regard to whether the user has fiddled the "TZ" environment
-	variable.  While an administrator can "do everything in UTC" to get
-	around the problem, doing so is inconvenient and precludes handling
-	daylight saving time shifts - as might be required to limit phone
-	calls to off-peak hours.)
+ * Indicate to experts where that region is.
 
-*	POSIX requires that systems ignore leap seconds.
+ * Be robust in the presence of political changes.  For example, names
+   of countries are ordinarily not used, to avoid incompatibilities
+   when countries change their name (e.g. Zaire->Congo) or when
+   locations change countries (e.g. Hong Kong from UK colony to
+   China).
 
-*	The tz code attempts to support all the time_t implementations
-	allowed by POSIX.  The time_t type represents a nonnegative count of
-	seconds since 1970-01-01 00:00:00 UTC, ignoring leap seconds.
-	In practice, time_t is usually a signed 64- or 32-bit integer; 32-bit
-	signed time_t values stop working after 2038-01-19 03:14:07 UTC, so
-	new implementations these days typically use a signed 64-bit integer.
-	Unsigned 32-bit integers are used on one or two platforms,
-	and 36-bit and 40-bit integers are also used occasionally.
-	Although earlier POSIX versions allowed time_t to be a
-	floating-point type, this was not supported by any practical
-	systems, and POSIX.1-2013 and the tz code both require time_t
-	to be an integer type.
+ * Be portable to a wide variety of implementations.
 
-These are the extensions that have been made to the POSIX functions:
+ * Use a consistent naming convention over the entire world.
 
-*	The "TZ" environment variable is used in generating the name of a file
-	from which time zone information is read (or is interpreted a la
-	POSIX); "TZ" is no longer constrained to be a three-letter time zone
-	name followed by a number of hours and an optional three-letter
-	daylight time zone name.  The daylight saving time rules to be used
-	for a particular time zone are encoded in the time zone file;
-	the format of the file allows U.S., Australian, and other rules to be
-	encoded, and allows for situations where more than two time zone
-	abbreviations are used.
+Names normally have the form AREA/LOCATION, where AREA is the name
+of a continent or ocean, and LOCATION is the name of a specific
+location within that region.  North and South America share the same
+area, 'America'.  Typical names are 'Africa/Cairo', 'America/New_York',
+and 'Pacific/Honolulu'.
 
-	It was recognized that allowing the "TZ" environment variable to
-	take on values such as "America/New_York" might cause "old" programs
-	(that expect "TZ" to have a certain form) to operate incorrectly;
-	consideration was given to using some other environment variable
-	(for example, "TIMEZONE") to hold the string used to generate the
-	time zone information file name.  In the end, however, it was decided
-	to continue using "TZ": it is widely used for time zone purposes;
-	separately maintaining both "TZ" and "TIMEZONE" seemed a nuisance;
-	and systems where "new" forms of "TZ" might cause problems can simply
-	use TZ values such as "EST5EDT" which can be used both by
-	"new" programs (a la POSIX) and "old" programs (as zone names and
-	offsets).
+Here are the general rules used for choosing location names,
+in decreasing order of importance:
 
-*	To handle places where more than two time zone abbreviations are used,
-	the functions "localtime" and "gmtime" set tzname[tmp->tm_isdst]
-	(where "tmp" is the value the function returns) to the time zone
-	abbreviation to be used.  This differs from POSIX, where the elements
-	of tzname are only changed as a result of calls to tzset.
+	Use only valid POSIX file name components (i.e., the parts of
+		names other than '/').  Do not use the file name
+		components '.' and '..'.  Within a file name component,
+		use only ASCII letters, '.', '-' and '_'.  Do not use
+		digits, as that might create an ambiguity with POSIX
+		TZ strings.  A file name component must not exceed 14
+		characters or start with '-'.  E.g., prefer 'Brunei'
+		to 'Bandar_Seri_Begawan'.  Exceptions: see the discussion
+		of legacy names below.
+	A name must not be empty, or contain '//', or start or end with '/'.
+	Do not use names that differ only in case.  Although the reference
+		implementation is case-sensitive, some other implementations
+		are not, and they would mishandle names differing only in case.
+	If one name A is an initial prefix of another name AB (ignoring case),
+		then B must not start with '/', as a regular file cannot have
+		the same name as a directory in POSIX.  For example,
+		'America/New_York' precludes 'America/New_York/Bronx'.
+	Uninhabited regions like the North Pole and Bouvet Island
+		do not need locations, since local time is not defined there.
+	There should typically be at least one name for each ISO 3166-1
+		officially assigned two-letter code for an inhabited country
+		or territory.
+	If all the clocks in a region have agreed since 1970,
+		don't bother to include more than one location
+		even if subregions' clocks disagreed before 1970.
+		Otherwise these tables would become annoyingly large.
+	If a name is ambiguous, use a less ambiguous alternative;
+		e.g. many cities are named San José and Georgetown, so
+		prefer 'Costa_Rica' to 'San_Jose' and 'Guyana' to 'Georgetown'.
+	Keep locations compact.  Use cities or small islands, not countries
+		or regions, so that any future time zone changes do not split
+		locations into different time zones.  E.g. prefer 'Paris'
+		to 'France', since France has had multiple time zones.
+	Use mainstream English spelling, e.g. prefer 'Rome' to 'Roma', and
+		prefer 'Athens' to the Greek 'Αθήνα' or the Romanized 'Athína'.
+		The POSIX file name restrictions encourage this rule.
+	Use the most populous among locations in a zone,
+		e.g. prefer 'Shanghai' to 'Beijing'.  Among locations with
+		similar populations, pick the best-known location,
+		e.g. prefer 'Rome' to 'Milan'.
+	Use the singular form, e.g. prefer 'Canary' to 'Canaries'.
+	Omit common suffixes like '_Islands' and '_City', unless that
+		would lead to ambiguity.  E.g. prefer 'Cayman' to
+		'Cayman_Islands' and 'Guatemala' to 'Guatemala_City',
+		but prefer 'Mexico_City' to 'Mexico' because the country
+		of Mexico has several time zones.
+	Use '_' to represent a space.
+	Omit '.' from abbreviations in names, e.g. prefer 'St_Helena'
+		to 'St._Helena'.
+	Do not change established names if they only marginally
+		violate the above rules.  For example, don't change
+		the existing name 'Rome' to 'Milan' merely because
+		Milan's population has grown to be somewhat greater
+		than Rome's.
+	If a name is changed, put its old spelling in the 'backward' file.
+		This means old spellings will continue to work.
 
-*	Since the "TZ" environment variable can now be used to control time
-	conversion, the "daylight" and "timezone" variables are no longer
-	needed.  (These variables are defined and set by "tzset"; however, their
-	values will not be used by "localtime.")
+The file 'zone1970.tab' lists geographical locations used to name time
+zone rules.  It is intended to be an exhaustive list of names for
+geographic regions as described above; this is a subset of the names
+in the data.  Although a 'zone1970.tab' location's longitude
+corresponds to its LMT offset with one hour for every 15 degrees east
+longitude, this relationship is not exact.
 
-*	The "localtime" function has been set up to deliver correct results
-	for near-minimum or near-maximum time_t values.  (A comment in the
-	source code tells how to get compatibly wrong results).
+Older versions of this package used a different naming scheme,
+and these older names are still supported.
+See the file 'backward' for most of these older names
+(e.g., 'US/Eastern' instead of 'America/New_York').
+The other old-fashioned names still supported are
+'WET', 'CET', 'MET', and 'EET' (see the file 'europe').
 
-*	A function "tzsetwall" has been added to arrange for the system's
-	best approximation to local wall clock time to be delivered by
-	subsequent calls to "localtime."  Source code for portable
-	applications that "must" run on local wall clock time should call
-	"tzsetwall();" if such code is moved to "old" systems that don't
-	provide tzsetwall, you won't be able to generate an executable program.
-	(These time zone functions also arrange for local wall clock time to be
-	used if tzset is called - directly or indirectly - and there's no "TZ"
-	environment variable; portable applications should not, however, rely
-	on this behavior since it's not the way SVR2 systems behave.)
+Older versions of this package defined legacy names that are
+incompatible with the first rule of location names, but which are
+still supported.  These legacy names are mostly defined in the file
+'etcetera'.  Also, the file 'backward' defines the legacy names
+'GMT0', 'GMT-0', 'GMT+0' and 'Canada/East-Saskatchewan', and the file
+'northamerica' defines the legacy names 'EST5EDT', 'CST6CDT',
+'MST7MDT', and 'PST8PDT'.
 
-*	Negative time_t values are supported, on systems where time_t is signed.
+Excluding 'backward' should not affect the other data.  If
+'backward' is excluded, excluding 'etcetera' should not affect the
+remaining data.
 
-*	These functions can account for leap seconds, thanks to Bradley White.
 
-Points of interest to folks with other systems:
+----- Time zone abbreviations -----
 
-*	This package is already part of many POSIX-compliant hosts,
-	including BSD, HP, Linux, Network Appliance, SCO, SGI, and Sun.
-	On such hosts, the primary use of this package
-	is to update obsolete time zone rule tables.
-	To do this, you may need to compile the time zone compiler
-	'zic' supplied with this package instead of using the system 'zic',
-	since the format of zic's input changed slightly in late 1994,
-	and many vendors still do not support the new input format.
+When this package is installed, it generates time zone abbreviations
+like 'EST' to be compatible with human tradition and POSIX.
+Here are the general rules used for choosing time zone abbreviations,
+in decreasing order of importance:
 
-*	The UNIX Version 7 "timezone" function is not present in this package;
-	it's impossible to reliably map timezone's arguments (a "minutes west
-	of GMT" value and a "daylight saving time in effect" flag) to a
-	time zone abbreviation, and we refuse to guess.
-	Programs that in the past used the timezone function may now examine
-	tzname[localtime(&clock)->tm_isdst] to learn the correct time
-	zone abbreviation to use.  Alternatively, use
-	localtime(&clock)->tm_zone if this has been enabled.
+	Use abbreviations that consist of three or more ASCII letters.
+		Previous editions of this database also used characters like
+		' ' and '?', but these characters have a special meaning to
+		the shell and cause commands like
+			set `date`
+		to have unexpected effects.
+		Previous editions of this rule required upper-case letters,
+		but the Congressman who introduced Chamorro Standard Time
+		preferred "ChST", so the rule has been relaxed.
 
-*	The 4.2BSD gettimeofday function is not used in this package.
-	This formerly let users obtain the current UTC offset and DST flag,
-	but this functionality was removed in later versions of BSD.
+		This rule guarantees that all abbreviations could have
+		been specified by a POSIX TZ string.  POSIX
+		requires at least three characters for an
+		abbreviation.  POSIX through 2000 says that an abbreviation
+		cannot start with ':', and cannot contain ',', '-',
+		'+', NUL, or a digit.  POSIX from 2001 on changes this
+		rule to say that an abbreviation can contain only '-', '+',
+		and alphanumeric characters from the portable character set
+		in the current locale.  To be portable to both sets of
+		rules, an abbreviation must therefore use only ASCII
+		letters.
 
-*	In SVR2, time conversion fails for near-minimum or near-maximum
-	time_t values when doing conversions for places that don't use UT.
-	This package takes care to do these conversions correctly.
+	Use abbreviations that are in common use among English-speakers,
+		e.g. 'EST' for Eastern Standard Time in North America.
+		We assume that applications translate them to other languages
+		as part of the normal localization process; for example,
+		a French application might translate 'EST' to 'HNE'.
 
-The functions that are conditionally compiled if STD_INSPIRED is defined
-should, at this point, be looked on primarily as food for thought.  They are
-not in any sense "standard compatible" - some are not, in fact, specified in
-*any* standard.  They do, however, represent responses of various authors to
-standardization proposals.
+	For zones whose times are taken from a city's longitude, use the
+		traditional xMT notation, e.g. 'PMT' for Paris Mean Time.
+		The only name like this in current use is 'GMT'.
 
-Other time conversion proposals, in particular the one developed by folks at
-Hewlett Packard, offer a wider selection of functions that provide capabilities
-beyond those provided here.  The absence of such functions from this package
-is not meant to discourage the development, standardization, or use of such
-functions.  Rather, their absence reflects the decision to make this package
-contain valid extensions to POSIX, to ensure its broad acceptability.  If
-more powerful time conversion functions can be standardized, so much the
-better.
+	If there is no common English abbreviation, abbreviate the English
+		translation of the usual phrase used by native speakers.
+		If this is not available or is a phrase mentioning the country
+		(e.g. "Cape Verde Time"), then:
 
+		When a country is identified with a single or principal zone,
+			append 'T' to the country's ISO code, e.g. 'CVT' for
+			Cape Verde Time.  For summer time append 'ST';
+			for double summer time append 'DST'; etc.
+		Otherwise, take the first three letters of an English place
+			name identifying each zone and append 'T', 'ST', etc.
+			as before; e.g. 'VLAST' for VLAdivostok Summer Time.
 
------ Scope of the tz database -----
+	Use 'LMT' for local mean time of locations before the introduction
+		of standard time; see "Scope of the tz database".
 
-The tz database attempts to record the history and predicted future of
-all computer-based clocks that track civil time.  To represent this
-data, the world is partitioned into regions whose clocks all agree
-about time stamps that occur after the somewhat-arbitrary cutoff point
-of the POSIX Epoch (1970-01-01 00:00:00 UTC).  For each such region,
-the database records all known clock transitions, and labels the region
-with a notable location.  Although 1970 is a somewhat-arbitrary
-cutoff, there are significant challenges to moving the cutoff earlier
-even by a decade or two, due to the wide variety of local practices
-before computer timekeeping became prevalent.
+	Use UT (with time zone abbreviation 'zzz') for locations while
+		uninhabited.  The 'zzz' mnemonic is that these locations are,
+		in some sense, asleep.
 
-Clock transitions before 1970 are recorded for each such location,
-because most POSIX-compatible systems support negative time stamps and
-could misbehave if data entries were omitted for pre-1970 transitions.
-However, the database is not designed for and does not suffice for
-applications requiring accurate handling of all past times everywhere,
-as it would take far too much effort and guesswork to record all
-details of pre-1970 civil timekeeping.
+Application writers should note that these abbreviations are ambiguous
+in practice: e.g. 'CST' has a different meaning in China than
+it does in the United States.  In new applications, it's often better
+to use numeric UT offsets like '-0600' instead of time zone
+abbreviations like 'CST'; this avoids the ambiguity.
 
 
 ----- Accuracy of the tz database -----
@@ -358,194 +358,197 @@ creation of zones merely because two locations differ in LMT or
 transitioned to standard time at different dates.
 
 
------ Names of time zone rule files -----
+----- Time and date functions -----
 
-The time zone rule file naming conventions attempt to strike a balance
-among the following goals:
+The tz code contains time and date functions that are upwards
+compatible with those of POSIX.
+
+POSIX has the following properties and limitations.
+
+*	In POSIX, time display in a process is controlled by the
+	environment variable TZ.  Unfortunately, the POSIX TZ string takes
+	a form that is hard to describe and is error-prone in practice.
+	Also, POSIX TZ strings can't deal with other (for example, Israeli)
+	daylight saving time rules, or situations where more than two
+	time zone abbreviations are used in an area.
+
+	The POSIX TZ string takes the following form:
+
+		stdoffset[dst[offset][,date[/time],date[/time]]]
 
- * Uniquely identify every national region where clocks have all
-   agreed since 1970.  This is essential for the intended use: static
-   clocks keeping local civil time.
+	where:
+
+	std and dst
+		are 3 or more characters specifying the standard
+		and daylight saving time (DST) zone names.
+		Starting with POSIX.1-2001, std and dst may also be
+		in a quoted form like "<UTC+10>"; this allows
+		"+" and "-" in the names.
+	offset
+		is of the form '[+-]hh:[mm[:ss]]' and specifies the
+		offset west of UT.  'hh' may be a single digit; 0<=hh<=24.
+		The default DST offset is one hour ahead of standard time.
+	date[/time],date[/time]
+		specifies the beginning and end of DST.  If this is absent,
+		the system supplies its own rules for DST, and these can
+		differ from year to year; typically US DST rules are used.
+	time
+		takes the form 'hh:[mm[:ss]]' and defaults to 02:00.
+		This is the same format as the offset, except that a
+		leading '+' or '-' is not allowed.
+	date
+		takes one of the following forms:
+		Jn (1<=n<=365)
+			origin-1 day number not counting February 29
+		n (0<=n<=365)
+			origin-0 day number counting February 29 if present
+		Mm.n.d (0[Sunday]<=d<=6[Saturday], 1<=n<=5, 1<=m<=12)
+			for the dth day of week n of month m of the year,
+			where week 1 is the first week in which day d appears,
+			and '5' stands for the last week in which day d appears
+			(which may be either the 4th or 5th week).
+			Typically, this is the only useful form;
+			the n and Jn forms are rarely used.
 
- * Indicate to humans as to where that region is.  This simplifies use.
+	Here is an example POSIX TZ string, for US Pacific time using rules
+	appropriate from 1987 through 2006:
 
- * Be robust in the presence of political changes.  This reduces the
-   number of updates and backward-compatibility hacks.  For example,
-   names of countries are ordinarily not used, to avoid
-   incompatibilities when countries change their name
-   (e.g. Zaire->Congo) or when locations change countries
-   (e.g. Hong Kong from UK colony to China).
+		TZ='PST8PDT,M4.1.0/02:00,M10.5.0/02:00'
 
- * Be portable to a wide variety of implementations.
-   This promotes use of the technology.
+	This POSIX TZ string is hard to remember, and mishandles time stamps
+	before 1987 and after 2006.  With this package you can use this
+	instead:
 
- * Use a consistent naming convention over the entire world.
-   This simplifies both use and maintenance.
+		TZ='America/Los_Angeles'
 
-This naming convention is not intended for use by inexperienced users
-to select TZ values by themselves (though they can of course examine
-and reuse existing settings).  Distributors should provide
-documentation and/or a simple selection interface that explains the
-names; see the 'tzselect' program supplied with this distribution for
-one example.
+*	POSIX does not define the exact meaning of TZ values like "EST5EDT".
+	Typically the current US DST rules are used to interpret such values,
+	but this means that the US DST rules are compiled into each program
+	that does time conversion.  This means that when US time conversion
+	rules change (as in the United States in 1987), all programs that
+	do time conversion must be recompiled to ensure proper results.
 
-Names normally have the form AREA/LOCATION, where AREA is the name
-of a continent or ocean, and LOCATION is the name of a specific
-location within that region.  North and South America share the same
-area, 'America'.  Typical names are 'Africa/Cairo', 'America/New_York',
-and 'Pacific/Honolulu'.
+*	In POSIX, there's no tamper-proof way for a process to learn the
+	system's best idea of local wall clock.  (This is important for
+	applications that an administrator wants used only at certain times -
+	without regard to whether the user has fiddled the "TZ" environment
+	variable.  While an administrator can "do everything in UTC" to get
+	around the problem, doing so is inconvenient and precludes handling
+	daylight saving time shifts - as might be required to limit phone
+	calls to off-peak hours.)
 
-Here are the general rules used for choosing location names,
-in decreasing order of importance:
+*	POSIX requires that systems ignore leap seconds.
 
-	Use only valid POSIX file name components (i.e., the parts of
-		names other than '/').  Do not use the file name
-		components '.' and '..'.  Within a file name component,
-		use only ASCII letters, '.', '-' and '_'.  Do not use
-		digits, as that might create an ambiguity with POSIX
-		TZ strings.  A file name component must not exceed 14
-		characters or start with '-'.  E.g., prefer 'Brunei'
-		to 'Bandar_Seri_Begawan'.  Exceptions: see the discussion
-		of legacy names below.
-	A name must not be empty, or contain '//', or start or end with '/'.
-	Do not use names that differ only in case.  Although the reference
-		implementation is case-sensitive, some other implementations
-		are not, and they would mishandle names differing only in case.
-	If one name A is an initial prefix of another name AB (ignoring case),
-		then B must not start with '/', as a regular file cannot have
-		the same name as a directory in POSIX.  For example,
-		'America/New_York' precludes 'America/New_York/Bronx'.
-	Uninhabited regions like the North Pole and Bouvet Island
-		do not need locations, since local time is not defined there.
-	There should typically be at least one name for each ISO 3166-1
-		officially assigned two-letter code for an inhabited country
-		or territory.
-	If all the clocks in a region have agreed since 1970,
-		don't bother to include more than one location
-		even if subregions' clocks disagreed before 1970.
-		Otherwise these tables would become annoyingly large.
-	If a name is ambiguous, use a less ambiguous alternative;
-		e.g. many cities are named San José and Georgetown, so
-		prefer 'Costa_Rica' to 'San_Jose' and 'Guyana' to 'Georgetown'.
-	Keep locations compact.  Use cities or small islands, not countries
-		or regions, so that any future time zone changes do not split
-		locations into different time zones.  E.g. prefer 'Paris'
-		to 'France', since France has had multiple time zones.
-	Use mainstream English spelling, e.g. prefer 'Rome' to 'Roma', and
-		prefer 'Athens' to the Greek 'Αθήνα' or the Romanized 'Athína'.
-		The POSIX file name restrictions encourage this rule.
-	Use the most populous among locations in a zone,
-		e.g. prefer 'Shanghai' to 'Beijing'.  Among locations with
-		similar populations, pick the best-known location,
-		e.g. prefer 'Rome' to 'Milan'.
-	Use the singular form, e.g. prefer 'Canary' to 'Canaries'.
-	Omit common suffixes like '_Islands' and '_City', unless that
-		would lead to ambiguity.  E.g. prefer 'Cayman' to
-		'Cayman_Islands' and 'Guatemala' to 'Guatemala_City',
-		but prefer 'Mexico_City' to 'Mexico' because the country
-		of Mexico has several time zones.
-	Use '_' to represent a space.
-	Omit '.' from abbreviations in names, e.g. prefer 'St_Helena'
-		to 'St._Helena'.
-	Do not change established names if they only marginally
-		violate the above rules.  For example, don't change
-		the existing name 'Rome' to 'Milan' merely because
-		Milan's population has grown to be somewhat greater
-		than Rome's.
-	If a name is changed, put its old spelling in the 'backward' file.
-		This means old spellings will continue to work.
+*	The tz code attempts to support all the time_t implementations
+	allowed by POSIX.  The time_t type represents a nonnegative count of
+	seconds since 1970-01-01 00:00:00 UTC, ignoring leap seconds.
+	In practice, time_t is usually a signed 64- or 32-bit integer; 32-bit
+	signed time_t values stop working after 2038-01-19 03:14:07 UTC, so
+	new implementations these days typically use a signed 64-bit integer.
+	Unsigned 32-bit integers are used on one or two platforms,
+	and 36-bit and 40-bit integers are also used occasionally.
+	Although earlier POSIX versions allowed time_t to be a
+	floating-point type, this was not supported by any practical
+	systems, and POSIX.1-2013 and the tz code both require time_t
+	to be an integer type.
 
-The file 'zone1970.tab' lists geographical locations used to name time
-zone rule files.  It is intended to be an exhaustive list of names
-for geographic regions as described above; this is a subset of the
-names in the data.  Although a 'zone1970.tab' location's longitude
-corresponds to its LMT offset with one hour for every 15 degrees east
-longitude, this relationship is not exact.
+These are the extensions that have been made to the POSIX functions:
 
-Older versions of this package used a different naming scheme,
-and these older names are still supported.
-See the file 'backward' for most of these older names
-(e.g., 'US/Eastern' instead of 'America/New_York').
-The other old-fashioned names still supported are
-'WET', 'CET', 'MET', and 'EET' (see the file 'europe').
+*	The "TZ" environment variable is used in generating the name of a file
+	from which time zone information is read (or is interpreted a la
+	POSIX); "TZ" is no longer constrained to be a three-letter time zone
+	name followed by a number of hours and an optional three-letter
+	daylight time zone name.  The daylight saving time rules to be used
+	for a particular time zone are encoded in the time zone file;
+	the format of the file allows U.S., Australian, and other rules to be
+	encoded, and allows for situations where more than two time zone
+	abbreviations are used.
 
-Older versions of this package defined legacy names that are
-incompatible with the first rule of location names, but which are
-still supported.  These legacy names are mostly defined in the file
-'etcetera'.  Also, the file 'backward' defines the legacy names
-'GMT0', 'GMT-0', 'GMT+0' and 'Canada/East-Saskatchewan', and the file
-'northamerica' defines the legacy names 'EST5EDT', 'CST6CDT',
-'MST7MDT', and 'PST8PDT'.
+	It was recognized that allowing the "TZ" environment variable to
+	take on values such as "America/New_York" might cause "old" programs
+	(that expect "TZ" to have a certain form) to operate incorrectly;
+	consideration was given to using some other environment variable
+	(for example, "TIMEZONE") to hold the string used to generate the
+	time zone information file name.  In the end, however, it was decided
+	to continue using "TZ": it is widely used for time zone purposes;
+	separately maintaining both "TZ" and "TIMEZONE" seemed a nuisance;
+	and systems where "new" forms of "TZ" might cause problems can simply
+	use TZ values such as "EST5EDT" which can be used both by
+	"new" programs (a la POSIX) and "old" programs (as zone names and
+	offsets).
 
-Excluding 'backward' should not affect the other data.  If
-'backward' is excluded, excluding 'etcetera' should not affect the
-remaining data.
+*	To handle places where more than two time zone abbreviations are used,
+	the functions "localtime" and "gmtime" set tzname[tmp->tm_isdst]
+	(where "tmp" is the value the function returns) to the time zone
+	abbreviation to be used.  This differs from POSIX, where the elements
+	of tzname are only changed as a result of calls to tzset.
 
+*	Since the "TZ" environment variable can now be used to control time
+	conversion, the "daylight" and "timezone" variables are no longer
+	needed.  (These variables are defined and set by "tzset"; however, their
+	values will not be used by "localtime".)
 
------ Time zone abbreviations -----
+*	The "localtime" function has been set up to deliver correct results
+	for near-minimum or near-maximum time_t values.  (A comment in the
+	source code tells how to get compatibly wrong results.)
 
-When this package is installed, it generates time zone abbreviations
-like 'EST' to be compatible with human tradition and POSIX.
-Here are the general rules used for choosing time zone abbreviations,
-in decreasing order of importance:
+*	A function "tzsetwall" has been added to arrange for the system's
+	best approximation to local wall clock time to be delivered by
+	subsequent calls to "localtime".  Source code for portable
+	applications that "must" run on local wall clock time should call
+	"tzsetwall()".  If such code is moved to "old" systems that don't
+	provide tzsetwall, you won't be able to generate an executable program.
+	(These time zone functions also arrange for local wall clock time to be
+	used if tzset is called - directly or indirectly - and there's no "TZ"
+	environment variable; portable applications should not, however, rely
+	on this behavior since it's not the way SVR2 systems behave.)
 
-	Use abbreviations that consist of three or more ASCII letters.
-		Previous editions of this database also used characters like
-		' ' and '?', but these characters have a special meaning to
-		the shell and cause commands like
-			set `date`
-		to have unexpected effects.
-		Previous editions of this rule required upper-case letters,
-		but the Congressman who introduced Chamorro Standard Time
-		preferred "ChST", so the rule has been relaxed.
+*	Negative time_t values are supported, on systems where time_t is signed.
 
-		This rule guarantees that all abbreviations could have
-		been specified by a POSIX TZ string.  POSIX
-		requires at least three characters for an
-		abbreviation.  POSIX through 2000 says that an abbreviation
-		cannot start with ':', and cannot contain ',', '-',
-		'+', NUL, or a digit.  POSIX from 2001 on changes this
-		rule to say that an abbreviation can contain only '-', '+',
-		and alphanumeric characters from the portable character set
-		in the current locale.  To be portable to both sets of
-		rules, an abbreviation must therefore use only ASCII
-		letters.
+*	These functions can account for leap seconds, thanks to Bradley White.
 
-	Use abbreviations that are in common use among English-speakers,
-		e.g. 'EST' for Eastern Standard Time in North America.
-		We assume that applications translate them to other languages
-		as part of the normal localization process; for example,
-		a French application might translate 'EST' to 'HNE'.
+Points of interest to folks with other systems:
 
-	For zones whose times are taken from a city's longitude, use the
-		traditional xMT notation, e.g. 'PMT' for Paris Mean Time.
-		The only name like this in current use is 'GMT'.
+*	This package is already part of many POSIX-compliant hosts,
+	including BSD, HP, Linux, Network Appliance, SCO, SGI, and Sun.
+	On such hosts, the primary use of this package
+	is to update obsolete time zone rule tables.
+	To do this, you may need to compile the time zone compiler
+	'zic' supplied with this package instead of using the system 'zic',
+	since the format of zic's input changed slightly in late 1994,
+	and many vendors still do not support the new input format.
 
-	If there is no common English abbreviation, abbreviate the English
-		translation of the usual phrase used by native speakers.
-		If this is not available or is a phrase mentioning the country
-		(e.g. "Cape Verde Time"), then:
+*	The UNIX Version 7 "timezone" function is not present in this package;
+	it's impossible to reliably map timezone's arguments (a "minutes west
+	of GMT" value and a "daylight saving time in effect" flag) to a
+	time zone abbreviation, and we refuse to guess.
+	Programs that in the past used the timezone function may now examine
+	tzname[localtime(&clock)->tm_isdst] to learn the correct time
+	zone abbreviation to use.  Alternatively, use
+	localtime(&clock)->tm_zone if this has been enabled.
 
-		When a country is identified with a single or principal zone,
-			append 'T' to the country's ISO	code, e.g. 'CVT' for
-			Cape Verde Time.  For summer time append 'ST';
-			for double summer time append 'DST'; etc.
-		Otherwise, take the first three letters of an English place
-			name identifying each zone and append 'T', 'ST', etc.
-			as before; e.g. 'VLAST' for VLAdivostok Summer Time.
+*	The 4.2BSD gettimeofday function is not used in this package.
+	That function formerly let users obtain the current UTC offset and
+	DST flag, but this functionality was removed in later versions of BSD.
 
-	Use 'LMT' for local mean time of locations before the introduction
-		of standard time; see "Scope of the tz database".
+*	In SVR2, time conversion fails for near-minimum or near-maximum
+	time_t values when doing conversions for places that don't use UT.
+	This package takes care to do these conversions correctly.
 
-	Use UT (with time zone abbreviation 'zzz') for locations while
-		uninhabited.  The 'zzz' mnemonic is that these locations are,
-		in some sense, asleep.
+The functions that are conditionally compiled if STD_INSPIRED is defined
+should, at this point, be looked on primarily as food for thought.  They are
+not in any sense "standard compatible" - some are not, in fact, specified in
+*any* standard.  They do, however, represent responses of various authors to
+standardization proposals.
 
-Application writers should note that these abbreviations are ambiguous
-in practice: e.g. 'CST' has a different meaning in China than
-it does in the United States.  In new applications, it's often better
-to use numeric UT offsets like '-0600' instead of time zone
-abbreviations like 'CST'; this avoids the ambiguity.
+Other time conversion proposals, in particular the one developed by folks at
+Hewlett Packard, offer a wider selection of functions that provide capabilities
+beyond those provided here.  The absence of such functions from this package
+is not meant to discourage the development, standardization, or use of such
+functions.  Rather, their absence reflects the decision to make this package
+contain valid extensions to POSIX, to ensure its broad acceptability.  If
+more powerful time conversion functions can be standardized, so much the
+better.
 
 
 ----- Calendrical issues -----
@@ -766,6 +769,11 @@ Tom Chmielewski, "Jet Lag Is Worse on Mars", The Atlantic (2015-02-26)
 <http://www.theatlantic.com/technology/archive/2015/02/jet-lag-is-worse-on-mars/386033/>
 
 -----
+
+This file is in the public domain, so clarified as of 2009-05-17 by
+Arthur David Olson.
+
+-----
 Local Variables:
 coding: utf-8
 End:
-- 
2.1.4

