Re: [tz] Simplification and unification of scheme:// anchors

Jan. 30, 2013

      On 2013-01-30 14:02, Steffen Daode Nurpmeso wrote:
...
Ian Abbott <abbotti@mev.co.uk> wrote:
  |On 2013-01-30 11:28, Steffen Daode Nurpmeso wrote:
  |> Ian Abbott <abbotti@mev.co.uk> wrote:
  |>|While on the subject, the backslash escapes at the ends of the lines
  |>|with a <URL> with a parenthesised comment on the following line is kind
  |>|of ugly.  I'm sure it must be possible to re-work your script to avoid
  |>|the need for that.  (I.e. if a line ends with a <URL> plus optional
  |>|whitespace, check if the following line starts with optional whitespace
  |>|plus parenthesised link text.)
  |>
  |> Hmm.
  |> So i've reworked the (Pod-less) script to support multiple follow
  |> lines in the middle of nowhere, and changed the two links from
  |> which i remembered that it did matter.
  |>
  |> This updated version also fixes the "trailing empty line after
  |> rules are included in data boxes" issue.
  |> And it uses normal text paragraphs for the comment text, forcing
  |> newline breaks via <br />, instead of using preformatted text for
  |> that, which makes it even nicer, since some of the dramatically
  |> long links will now be wrapped by browsers.
  |
  |Self closing tags such as <br /> are only legal in xhtml, not plain
  |html, so you'll need to output an XML declaration and a DOCTYPE in your
  |script.
That is indeed a good point, it must be '<br>'.
That depends what DOCTYPE you decide to use.

There are various other things wrong with the output, such as '&', '<'
and '>' not being turned into the entities '&amp;', '&lt;' and '&gt;'.
Note that if doing that, you'd need to make sure not to convert the
existing entities such as '&aacute;' into '&amp;aacute;'.  That would be
easier if the existing HTML entities were converted to UTF-8 sequences
first!
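
A minimal sketch of that order of operations (decode existing entities
first, then escape), written in Python purely for illustration.  The
function name escape_comment is hypothetical and is not taken from the
script under discussion:

```python
import html

def escape_comment(text):
    """Decode existing entities to Unicode, then escape markup characters."""
    # '&aacute;' becomes the single character 'á', so it cannot be
    # double-escaped into '&amp;aacute;' in the next step.
    decoded = html.unescape(text)
    # Escape only '&', '<' and '>'; quote=False leaves quote characters alone.
    return html.escape(decoded, quote=False)
```

The result would then need to be written out as UTF-8, with the
encoding declared in the generated page.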

(There are also a few odd-ball bits of mark-up in the original text, 
such as <e'> which need to be dealt with by a separate patch to the data 
files, e.g. to replace <e'> with the HTML entity &eacute; or by the
UTF-8 sequence é if going down the UTF-8 road.)

Also, validator.w3.org is your friend!
...
  |>   # For more about the first ten years of DST in the United States, see
  |>   # Robert Garland's <http://www.clpgh.org/exhibit/dst.html> \
  |> -# (``Ten years of daylight saving from the Pittsburgh standpoint'', \
  |> -# Carnegie Library of Pittsburgh, 1927).
  |> +# (``Ten years of daylight saving from the Pittsburgh standpoint'', \
  |> +# Carnegie Library of Pittsburgh, 1927).
  |
  |It would still be great to get rid of the backslash line continuations
  |and modify the script to work without them.
:)
I personally like it explicit and would definitely go for the L<><>
syntax i've used first, since it is completely unambiguous.
There's also the MediaWiki style for external links, e.g.:

[http://www.foobar.org/baz.html Meaningful link text]

which is not too unreadable, but less readable than having the 
Meaningful link text in parentheses.  For long URLs, it might be split 
like this:

[http://www.foobar.org/baz.html
Meaningful link text]

or even:

[http://www.foobar.org/baz.html Meaningful
link
text]

which should be fine as long as the Meaningful link text contains no ']' 
characters (or at least no unmatched ']' characters if matched pairs of 
'[' and ']' are to be allowed).
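
As a rough illustration of how little a converter would need for this
style, here is a Python sketch that handles the simple case only (no
']' anywhere in the link text).  The names LINK_RE, link_to_html and
convert_links are made up for this example:

```python
import html
import re

# '[' + URL + whitespace (which may include newlines) + link text + ']'.
# The link text may span lines but must not contain ']'.
LINK_RE = re.compile(r"\[(https?://[^\s\]]+)\s+([^\]]+)\]")

def link_to_html(match):
    url, text = match.group(1), match.group(2)
    text = " ".join(text.split())  # collapse line breaks inside the link text
    return '<a href="{}">{}</a>'.format(
        html.escape(url), html.escape(text, quote=False))

def convert_links(comment):
    """Replace every [URL text] occurrence in comment with an HTML anchor."""
    return LINK_RE.sub(link_to_html, comment)
```

Allowing matched pairs of '[' and ']' inside the link text would need
a small bracket counter rather than a single regular expression.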
...
I would also spend some more time and convert the many "headlines" that
yet exist in the comments to enough markup to get to something
real; in fact with not that much effort, maybe a weekend, it would
be possible to adjust the comments so that the script could use
indents, lists and normal paragraphs without any <br> at all;
then the Pod-way (and many others, too) could be pursued,
also leading to cross-referenced PDF output -- and that is
something that would surely be interesting for some people, as
i suppose.
It depends how much mark-up people are willing to put up with in the 
tzdata files, but I suspect not very much, if any!  The primary method 
for viewing the tzdata files should be the plain text originals, not the 
output from some fancy converter.
...
But the idea you proposed won't work with git(1), since trailing
whitespace is a no-go; right?
You shouldn't need trailing anything, right?  If the line ends with URL, 
see if the next line(s) contains the start of the link text before you 
decide to output the <br /> or whatever.
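
Something along these lines, sketched in Python only to show the
look-ahead.  The helper find_link_text and both patterns are
hypothetical, and it assumes the comment lines arrive as a list with
their leading '#' still attached:

```python
import re

URL_AT_END = re.compile(r"<(\w+://[^>\s]+)>\s*$")  # line ends with <URL>
TEXT_START = re.compile(r"^#?\s*\(")               # next line opens with '('

def find_link_text(lines, i):
    """If lines[i] ends with <URL> and the following line(s) hold
    parenthesised link text, return (url, text, next_index), else None."""
    m = URL_AT_END.search(lines[i])
    if not m or i + 1 >= len(lines) or not TEXT_START.match(lines[i + 1]):
        return None
    parts = []
    depth = 0
    for j in range(i + 1, len(lines)):
        body = lines[j].lstrip("#").strip()
        parts.append(body)
        depth += body.count("(") - body.count(")")
        if depth <= 0:  # the opening parenthesis has been closed
            text = " ".join(parts)
            return m.group(1), text[1:text.rindex(")")], j + 1
    return None  # unbalanced parentheses: treat as no link text
```

No backslashes and no trailing whitespace are needed; the next line is
simply inspected before the current one is finished off.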

-- 
-=( Ian Abbott @ MEV Ltd.    E-mail: <abbotti@mev.co.uk>        )=-
-=( Tel: +44 (0)161 477 1898   FAX: +44 (0)161 718 3587         )=-