Re: Position of rules definitions

May 3, 2007

      The current tz parsers seem to go quite fast, so it doesn't seem worth 
the trouble to optimize the format. Also keep in mind that the O(logN) 
searches don't need to exist, as hashtables are often used instead to 
manage symbols. This optimization is used by one and two pass parsers.

Since the full set of tz rules is fairly small, the parser can read all 
the rules in memory and process them there. The biggest cost is disk I/O 
for the first pass, and the second pass is just in memory and costs 
nothing in comparison.

I'm not too sure how zic works, but the tz parser I wrote does 
everything I described above.

Liviu Nicoara wrote:
...
Allow me to offer a performance reason which would probably show where 
I am heading; a two-step parser would have to deal with:
1. Failure to resolve an undefined reference at a cost of O(logN) per 
search in a set of N rules.
2. Creating a set of unresolved references - at a total cost of O(M) 
for M unresolved references if the set is unordered (plus a constant 
if necessary adornment of the dangling reference is needed, etc.).
3. Search the [smaller] set of unresolved references and for each one 
(M) perform a search at a minimum cost of O(logN) for a set of N 
defined rules.
(The above with fairly basic data structures and algorithms and 
further optimization may be possible.)
A one-step parser would only incur the penalty of step 1 because it 
would always find the [already] defined rule.
All of the above may seem academic, and indeed the sets are small 
enough that we could ignore, e.g., the differences between linear and 
binary searches, but it just does not seem right. If all we are 
challenging is convenience, then I believe it is worth a second look 
regardless of the capabilities of existing parsers.
I hope I have not offended anyone by insisting on this relatively 
unimportant issue.

Re: Position of rules definitions

Brian S O'Neill