The current tz parsers seem to go quite fast, so it doesn't seem worth the trouble to optimize the format. Also keep in mind that the O(logN) searches don't need to exist, as hashtables are often used instead to manage symbols. This optimization is used by one and two pass parsers. Since the full set of tz rules is fairly small, the parser can read all the rules in memory and process them there. The biggest cost is disk I/O for the first pass, and the second pass is just in memory and costs nothing in comparison. I'm not too sure how zic works, but the tz parser I wrote does everything I described above. Liviu Nicoara wrote:
Allow me to offer a performance reason which would probably show where I am heading; a two-step parser would have to deal with:
1. Failure to resolve an undefined reference at a cost of O(logN) per search in a set of N rules. 2. Creating a set of unresolved references - at a total cost of O(M) for M unresolved references if the set is unordered (plus a constant if necessary adornment of the dangling reference is needed, etc.). 3. Search the [smaller] set of unresolved references and for each one (M) perform a search at a minimum cost of O(logN) for a set of N defined rules.
(The above with fairly basic data structures and algorithms and further optimization may be possible.)
A one-step parser would only incur the penalty of step 1 because it would always find the [already] defined rule.
All of the above may seem academic, and indeed the sets are small enough that we could ignore, e.g., the differences between linear and binary searches, but it just does not seem right. If all we are challenging is convenience, then I believe it is worth a second look regardless of the capabilities of existing parsers.
I hope I have not offended anyone by insisting on this relatively unimportant issue.