Hi all, I noticed that a couple of rules (e.g. EU, Romania) in the Europe file are being referenced in zone definitions before they are defined. Is there a strong reason why their definitions should not precede their usage? Regards, Liviu
Liviu Nicoara <nicoara@roguewave.com> writes:
I noticed that a couple of rules (e.g. EU, Romania) in the Europe file are being referenced in zone definitions before they are defined. Is there a strong reason why their definitions should not precede their usage?
It's listed that way for convenience, which I suppose is not a strong reason. Is there some reason to change things, and put definitions before use?
Paul Eggert wrote:
Liviu Nicoara <nicoara@roguewave.com> writes:
I noticed that a couple of rules (e.g. EU, Romania) in the Europe file are being referenced in zone definitions before they are defined. Is there a strong reason why their definitions should not precede their usage?
It's listed that way for convenience, which I suppose is not a strong reason. Is there some reason to change things, and put definitions before use?
I understand that convenience; the only reason I have for reordering the rules is to favor a one-pass parser. I find it more logical to have the definition of an entity before its first use. Thanks, Liviu
Date: Wed, 02 May 2007 18:12:16 -0400 From: Liviu Nicoara <nicoara@roguewave.com> Message-ID: <46390CC0.5020605@roguewave.com> | I find it more logical to have the | definition of an entity before its first use. It isn't really - in the computer world we've gotten used to it that way because of the demands of early resource scarce parsers (and these days, lazy ones, plus those obeying established rules of course). In other fields, it's normal to see a reference to an object before you seek its definition, or you never have any idea why this object is being defined, and hence have no idea why it exists or why you'd bother caring about its definition - after you've seen a use for the object, you know why you need to know what it is. Definition after use is more logical, it's just harder for the parser to cope with. kre
Robert Elz wrote:
Date: Wed, 02 May 2007 18:12:16 -0400 From: Liviu Nicoara <nicoara@roguewave.com> Message-ID: <46390CC0.5020605@roguewave.com>
| I find it more logical to have the | definition of an entity before its first use.
It isn't really - in the computer world we've gotten used to it that way because of the demands of early resource scarce parsers (and these days, lazy ones, plus those obeying established rules of course).
In other fields, it's normal to see a reference to an object before you seek its definition, or you never have any idea why this object is being defined, and hence have no idea why it exists or why you'd bother caring about its definition - after you've seen a use for the object, you know why you need to know what it is. Definition after use is more logical, it's just harder for the parser to cope with.
I must admit I am agnostic of such fields and, while they might be attractive to others, I prefer to think in the "C/C++" set of mind. But so far it all seems to me a matter of personal preferences with no strong arguments on either side. At least I haven't offered any, because I assumed it would be obvious. Allow me to offer a performance reason which would probably show where I am heading; a two-step parser would have to deal with: 1. Failure to resolve an undefined reference at a cost of O(logN) per search in a set of N rules. 2. Creating a set of unresolved references - at a total cost of O(M) for M unresolved references if the set is unordered (plus a constant if necessary adornment of the dangling reference is needed, etc.). 3. Search the [smaller] set of unresolved references and for each one (M) perform a search at a minimum cost of O(logN) for a set of N defined rules. (The above with fairly basic data structures and algorithms and further optimization may be possible.) A one-step parser would only incur the penalty of step 1 because it would always find the [already] defined rule. All of the above may seem academic, and indeed the sets are small enough that we could ignore, e.g., the differences between linear and binary searches, but it just does not seem right. If all we are challenging is convenience, then I believe it is worth a second look regardless of the capabilities of existing parsers. I hope I have not offended anyone by insisting on this relatively unimportant issue. Thanks, Liviu
At this point there are time zone source files out in the wild in which "Rule" lines appear after "Zone" lines that reference the associated rules. That being true, time zone source parsers should be prepared to cope with such files. And THAT being true, we should probably leave some "out-of-order" stuff in the canonical files to ensure that folks creating alternate parsers will include the capability to handle "out-of-order" stuff. --ado
The current tz parsers seem to go quite fast, so it doesn't seem worth the trouble to optimize the format. Also keep in mind that the O(logN) searches don't need to exist, as hashtables are often used instead to manage symbols. This optimization is used by one and two pass parsers. Since the full set of tz rules is fairly small, the parser can read all the rules in memory and process them there. The biggest cost is disk I/O for the first pass, and the second pass is just in memory and costs nothing in comparison. I'm not too sure how zic works, but the tz parser I wrote does everything I described above. Liviu Nicoara wrote:
Allow me to offer a performance reason which would probably show where I am heading; a two-step parser would have to deal with:
1. Failure to resolve an undefined reference at a cost of O(logN) per search in a set of N rules. 2. Creating a set of unresolved references - at a total cost of O(M) for M unresolved references if the set is unordered (plus a constant if necessary adornment of the dangling reference is needed, etc.). 3. Search the [smaller] set of unresolved references and for each one (M) perform a search at a minimum cost of O(logN) for a set of N defined rules.
(The above with fairly basic data structures and algorithms and further optimization may be possible.)
A one-step parser would only incur the penalty of step 1 because it would always find the [already] defined rule.
All of the above may seem academic, and indeed the sets are small enough that we could ignore, e.g., the differences between linear and binary searches, but it just does not seem right. If all we are challenging is convenience, then I believe it is worth a second look regardless of the capabilities of existing parsers.
I hope I have not offended anyone by insisting on this relatively unimportant issue.
Brian S O'Neill wrote:
The current tz parsers seem to go quite fast, so it doesn't seem worth the trouble to optimize the format. Also keep in mind that the O(logN) searches don't need to exist, as hashtables are often used instead to manage symbols. This optimization is used by one and two pass parsers.
Since the full set of tz rules is fairly small, the parser can read all the rules in memory and process them there. The biggest cost is disk I/O for the first pass, and the second pass is just in memory and costs nothing in comparison.
I'm not too sure how zic works, but the tz parser I wrote does everything I described above.
I guess Arthur previously settled it on the grounds of compatibility with user-defined data. :-) Works for me. Thanks, Liviu
participants (5)
-
Brian S O'Neill -
Liviu Nicoara -
Olson, Arthur David (NIH/NCI) [E] -
Paul Eggert -
Robert Elz