Tom Lane wrote:
I doubt we'd do better with a different hash.
We can do a bit better: the attached patches use a hash that shrinks tzdata.zi by about 0.5% compared to the method used in 2018e. This hash should also avoid needless churn during updates.
if the ruleset syntax is ever expanded to make punctuation have some other meaning, the existing compression rule is going to cause forward-compatibility problems.
Good point. The data entries already use some punctuation characters in Rule names, so those characters are fair game, but we should reserve some of the never-used characters. The attached proposed patches reserve the characters in "!$%&'()*,/:;<=>?@[\]^`{|}~", unless quoted. (However, zic does not enforce this restriction in the attached patches.) The attached patches also require Rule names to begin with a character that is not a digit, -, +, or white space. zic already rejected the empty string, though this was not documented, and a Rule name starting with one of these characters was ambiguous, so I added a check for this to zic. If anybody uses unusual Rule names, now's a good time to speak up.
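
For anyone who wants to audit their own Rule names before the patches land, here is a rough sketch of the checks described above. It is not the zic code: the function name check_rule_name and its quoted parameter are made up for illustration, and I am assuming "white space" means what Python's str.isspace reports. Unlike the patched zic, it also reports the reserved characters.

```python
# Illustrative sketch only; not taken from zic.c.
# Reserved characters as listed in the message above.
RESERVED = set("!$%&'()*,/:;<=>?@[\\]^`{|}~")


def check_rule_name(name, quoted=False):
    """Return a list of problems found in a Rule name (empty list if none).

    `quoted` is a hypothetical flag saying the name appeared in quotes
    in the source file, in which case reserved characters are allowed.
    """
    problems = []
    if name == "":
        problems.append("empty Rule name")
        return problems
    first = name[0]
    # A leading digit, '-', '+', or white space is rejected.
    if first in "0123456789-+" or first.isspace():
        problems.append("Rule name starts with %r" % first)
    # Reserved characters are reported here even though the patched
    # zic reportedly does not enforce this restriction.
    if not quoted:
        bad = sorted(set(name) & RESERVED)
        if bad:
            problems.append("reserved characters: " + "".join(bad))
    return problems


if __name__ == "__main__":
    for candidate in ["US", "1abc", "-x", "a|b", ""]:
        print(repr(candidate), check_rule_name(candidate))
```

Running it over the NAME column of your own Rule lines should flag anything the new restrictions would affect.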