Pulling the data apart ...
Paul ... I've started on phase two, which is extracting the data, but I'm falling at the first hurdle. Should there be a tab character between 'Zone' and the name? I've found many entries in the africa file which don't have one, and it is messing up my 'csv' splitter. I just need a special rule when 'Zone' is found, but it looks a little inconsistent with respect to the following lines?
--
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
On 09/13/2013 11:38 AM, Lester Caine wrote:
Should there be a tab character between 'Zone' and the name?
No, just white space, i.e., a sequence of one or more white space characters. These are spaces, tabs, carriage-returns, form-feeds, or vertical tabs. In practice only spaces and tabs are used, and perhaps we should tighten up the spec along those lines.
Paul Eggert wrote:
On 09/13/2013 11:38 AM, Lester Caine wrote:
Should there be a tab character between 'Zone' and the name?
No, just white space, i.e., a sequence of one or more white space characters. These are spaces, tabs, carriage-returns, form-feeds, or vertical tabs. In practice only spaces and tabs are used, and perhaps we should tighten up the spec along those lines.
The bulk of the data loaded cleanly into a spreadsheet using tabs, and it would not take long to clean up, so I was hoping to get away with the csv library to process records. Can I at least assume that there are no plans to split records across line boundaries? So each one is simply a new line ...
-- Lester Caine - G8HFL
You could also run it through a regex (search for \s+ or [[:space:]]+ depending on your syntax, replace it with \t) and feed the result to your csv library. One thing to be aware of: the fields in zone-continuation lines are offset by one column from those in the zone-start lines. That can also be fixed with a regex: replace \n\t with \n\t\t after doing the previous operation.
Andy Lipscomb, CPA • ABV, ASA
Senior Financial Analyst
Decosimo Advisory Services
Direct: 423-266-0292
Phone: 423-756-7100
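The regex suggestion can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the sample text and the names `SAMPLE`, `to_tsv`, and `read_rows` are hypothetical, comments are assumed to start at any '#', and continuation lines are assumed to be the ones that begin with white space. It collapses runs of spaces and tabs only (not \s+), because applying \s+ to a whole file would also swallow the newlines, and it pads continuation lines so their fields line up under those of the 'Zone' line.

```python
import csv
import io
import re

# Hypothetical sample in the style of a tz source file (not copied from one).
SAMPLE = (
    "# Zone\tNAME\t\tGMTOFF\tRULES\tFORMAT\t[UNTIL]\n"
    "Zone Africa/Example  1:00\t-\tLMT\t1912\n"
    "\t\t\t2:00\t-\tEET\n"
)

def to_tsv(text):
    """Collapse runs of spaces/tabs to single tabs, one record per line."""
    out = []
    for line in text.splitlines():
        # Assumption: everything from '#' onward is a comment.
        line = line.split("#", 1)[0].rstrip()
        if not line:
            continue
        # Assumption: continuation lines start with white space.
        indented = line[0] in " \t"
        line = re.sub(r"[ \t]+", "\t", line.strip())
        if indented:
            # Two empty columns stand in for 'Zone' and the zone name.
            line = "\t\t" + line
        out.append(line)
    return "\n".join(out) + "\n"

def read_rows(text):
    """Feed the normalised text to the csv module as tab-separated data."""
    reader = csv.reader(io.StringIO(to_tsv(text)),
                        delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)

for row in read_rows(SAMPLE):
    print(row)
```

Note that an UNTIL value such as a year followed by a month and day contains spaces, so with this approach it would be spread over several columns.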
On Fri, 13 Sep 2013, Lester Caine wrote:
I've started on phase two which is extracting the data, but I'm falling at the first hurdle. Should there be a tab character between 'Zone' and the name?
The input format is documented in the zic(8) man page. Here's the relevant part: Input lines are made up of fields. Fields are separated from one another by any number of white space characters. White space within a line is unlikely to be anything other than spaces and tabs, but strictly speaking form feed, vertical tab, carriage return, and line feed (a.k.a. newline), are also white space characters, and any of these except newline may appear within a line. Realistically, you can expect any combination of one or more spaces and tabs to appear between 'Zone' and the name, or between any other fields. --apb (Alan Barrett)
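For what it is worth, "fields separated by white space" maps directly onto Python's `str.split()` with no arguments, which splits on runs of one or more white space characters and ignores leading and trailing white space. The sketch below is a rough illustration, not a reimplementation of zic: the function name `parse_zones` and the sample lines are hypothetical, comments are assumed to start at any '#', and a continuation line is assumed to be an indented line that follows a 'Zone' line.

```python
def parse_zones(lines):
    """Group each Zone line and its continuation lines under the zone name."""
    zones = {}
    current = None
    for raw in lines:
        # Assumption: '#' starts a comment; split() handles any white space run.
        fields = raw.split("#", 1)[0].split()
        if not fields:
            continue
        if fields[0] == "Zone":
            current = fields[1]
            # Keep everything after 'Zone' and the name.
            zones[current] = [fields[2:]]
        elif raw[0] in " \t" and current is not None:
            # Assumption: an indented line continues the current zone.
            zones[current].append(fields)
        else:
            # Any other line type (for example Rule or Link) ends the zone block.
            current = None
    return zones

# Hypothetical input lines, mixing spaces and tabs between fields.
sample = [
    "# comment",
    "Zone\tAfrica/Example  1:00 -\tLMT 1912 Jan 1",
    "\t\t\t2:00\t-\tEET",
    "",
    "Link A B",
]
print(parse_zones(sample))
```

Because no csv library is involved, it makes no difference whether a given gap is a tab, several spaces, or a mixture.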
On 2013-09-13 18:32, Alan Barrett wrote:
The input format is documented in the zic(8) man page. Here's the relevant part: Input lines are made up of fields. Fields are separated from one another by any number of white space characters.
Actually, "one or more" seems to be meant rather than "any number" (which would include zero). Michael Deckers.
On Sun, Sep 15, 2013, at 12:23, Michael Deckers wrote:
On 2013-09-13 18:32, Alan Barrett wrote:
The input format is documented in the zic(8) man page. Here's the relevant part: Input lines are made up of fields. Fields are separated from one another by any number of white space characters.
Actually, "one or more" seems to be meant rather than "any number" (which would include zero).
It would also include negative numbers, and nonintegers, and complex numbers. The fact that they're _separated_ implies there is at least one character, because if not then there is no separation.
participants (6)
- Alan Barrett
- Andy Lipscomb
- Lester Caine
- Michael Deckers
- Paul Eggert
- random832@fastmail.us