Below is a complete mapping of the identifiers in timezone.xml to a set of 5-character identifiers that are distinct from UN/LOCODEs. This follows section 3.2.1 of the UN/LOCODE manual (http://www.unece.org/fileadmin/DAM/cefact/locode/unlocode_manual.pdf): "However, where all permutations available for a country have been exhausted, the numerals 2-9 may also be used."

On Sat, May 26, 2012 at 9:59 PM, Tobias Conradi <tobias.conradi@gmail.com> wrote:
Steven, Mark,
I checked the latest timezone.xml contained in core.zip linked from http://cldr.unicode.org/index/bcp47-extension ... Since UN/LOCODE doesn't use the numerals 0 and 1, I created private codes with "1" in the third position, so for Santa Isabel I would use
MX1SI, or in lower case mx1si,
and for Hebron PS1HB and for Gaza PS1GZ.
That way all the codes can have the same length, namely 5 characters.
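The collision argument can be checked mechanically. The following is a minimal sketch that assumes, per section 3.2.1 of the UN/LOCODE manual, that a UN/LOCODE is a 2-letter country code followed by 3 characters drawn only from A-Z and the numerals 2-9. The helper names `may_be_unlocode` and `private_code` are hypothetical. The check only tests the shape of a code and does not consult the actual UN/LOCODE list.

```python
import string

# Assumption: per UN/LOCODE manual 3.2.1, the 3-character location part
# uses only the letters A-Z and the numerals 2-9 (never 0 or 1).
LOCATION_CHARS = set(string.ascii_uppercase) | set("23456789")


def may_be_unlocode(code: str) -> bool:
    """Return True if `code` has the shape of a UN/LOCODE (case-insensitive)."""
    code = code.upper()
    return (
        len(code) == 5
        and all(c in string.ascii_uppercase for c in code[:2])
        and all(c in LOCATION_CHARS for c in code[2:])
    )


def private_code(country: str, abbrev: str) -> str:
    """Hypothetical helper: country code + '1' + 2-letter abbreviation, lowercased."""
    if len(country) != 2 or len(abbrev) != 2:
        raise ValueError("expected a 2-letter country and a 2-letter abbreviation")
    return (country + "1" + abbrev).lower()


# The '1' in the third position means the result never has UN/LOCODE shape.
for country, abbrev in [("MX", "SI"), ("PS", "HB"), ("PS", "GZ")]:
    code = private_code(country, abbrev)
    print(code, may_be_unlocode(code))
```

A code such as USZT8 still passes `may_be_unlocode`, because it uses only letters and 2-9. Any code containing 0 or 1 after the country prefix is rejected.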
The UTC-based codes could also be converted to 5 characters by replacing "utc" with "zz":
  utce01 -> zze01
  utcw12 -> zzw12
UTC itself could be:
  utc -> zz000
Unknown could be:
  unk -> zzunk or zz1un
The use of 0 and 1 ensures there is no clash with UN/LOCODEs.

Here are some more possible mappings for identifiers that are not 5 characters long:
  usndnsl -> usnqy (UN/LOCODE USNQY)
  usndcnt -> uszt8 (UN/LOCODE USZT8)

Handmade codes using "1", each with the ISO 3166-1 alpha-2 code that is correct as of assignment:
  gaza -> ps1gz
  gldkshvn -> gl1dm
  hebron -> ps1hb
  jeruslm -> il1jr
  mxstis -> mx1si
  usnavajo -> us1nv
  usinvev -> us1vv

That would leave only four US-specific codes that are not 5 characters long: cst6cdt, est5edt, mst7mdt, pst8pdt.
If they can be changed, they could become: us1c6, us1e5, us1m7, us1p8.

--
Tobias Conradi
Rheinsberger Str. 18
10115 Berlin
Germany
http://tobiasconradi.com/
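The proposed mappings can be collected and checked in a few lines. This is a sketch that assumes the identifiers are the lowercase timezone types from timezone.xml as quoted in the message. The dictionary holds only the examples given in the message, not the complete mapping. For "unk" it picks zz1un out of the two proposed alternatives. `utc_to_5char` and `convert` are hypothetical helpers. The sketch also assumes that identifiers that are already 5 characters long stay unchanged.

```python
import re

# Example mappings proposed in the message (not the complete list).
PROPOSED = {
    "utc": "zz000",
    "unk": "zz1un",        # the alternative "zzunk" was also proposed
    "usndnsl": "usnqy",    # UN/LOCODE USNQY
    "usndcnt": "uszt8",    # UN/LOCODE USZT8
    "gaza": "ps1gz",
    "gldkshvn": "gl1dm",
    "hebron": "ps1hb",
    "jeruslm": "il1jr",
    "mxstis": "mx1si",
    "usnavajo": "us1nv",
    "usinvev": "us1vv",
    "cst6cdt": "us1c6",
    "est5edt": "us1e5",
    "mst7mdt": "us1m7",
    "pst8pdt": "us1p8",
}

# Matches UTC offset identifiers such as utce01 or utcw12.
UTC_OFFSET = re.compile(r"utc([ew]\d\d)")


def utc_to_5char(identifier: str) -> str:
    """Replace the 'utc' prefix of an offset identifier with 'zz'."""
    match = UTC_OFFSET.fullmatch(identifier)
    if match is None:
        raise ValueError(f"not a UTC offset identifier: {identifier}")
    return "zz" + match.group(1)


def convert(identifier: str) -> str:
    """Map an identifier to 5 characters.

    Assumption: identifiers that are already 5 characters long are kept as-is.
    """
    if identifier in PROPOSED:
        return PROPOSED[identifier]
    if UTC_OFFSET.fullmatch(identifier):
        return utc_to_5char(identifier)
    if len(identifier) == 5:
        return identifier
    raise KeyError(f"no 5-character mapping for {identifier}")


for ident in ["utce01", "utcw12", "utc", "mxstis", "pst8pdt"]:
    print(ident, "->", convert(ident))

# Every proposed target is exactly 5 characters long, and no two targets collide.
assert all(len(v) == 5 for v in PROPOSED.values())
assert len(set(PROPOSED.values())) == len(PROPOSED)
```

The two final assertions capture the properties the proposal relies on: a uniform length of 5, and no accidental reuse of a code across identifiers.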