The southamerica data file contains several Latin-1 encoded non-ASCII characters. The other data files are 100% ASCII.

Here are the affected lines:

$ grep -nP "[\\x80-\\xff]" *
southamerica:384:# There's also a note in only one of the major national papers (La Nación) at
southamerica:390:#  (...) anunció que el próximo domingo a las 00:00 los puntanos deberán
southamerica:393:# A partir de entonces, San Luis establecerá el huso horario propio de
southamerica:395:# 2009, el cambio horario quedará comprendido entre las 00:00 del tercer
southamerica:396:# domingo de marzo y las 24:00 del segundo sábado de octubre.
southamerica:815:# I just send a e-mail to Zulmira Brandão at

I think it's fine to allow non-ASCII in comments, but would strongly request that the files be UTF-8 encoded. Anything else leads to immense confusion over what charset is in use.
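
For what it's worth, here is a minimal Python sketch of the check and the conversion. It assumes the offending bytes really are Latin-1 (ISO-8859-1); the helper names find_non_ascii_lines and latin1_to_utf8 are made up for illustration and the sample bytes are a shortened stand-in for one of the lines above, not anything from the tz tooling.

```python
def find_non_ascii_lines(data: bytes):
    """Return (line_number, raw_line) pairs for lines with any byte >= 0x80."""
    hits = []
    for number, line in enumerate(data.split(b"\n"), start=1):
        if any(byte >= 0x80 for byte in line):
            hits.append((number, line))
    return hits


def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode bytes as UTF-8, ASSUMING the input is Latin-1."""
    return data.decode("latin-1").encode("utf-8")


# Illustrative sample only: 0xF3 is the Latin-1 byte for "ó".
sample = b"# La Naci\xf3n\n# plain ASCII line\n"
converted = latin1_to_utf8(sample)
```

After conversion the result still contains non-ASCII bytes, but it decodes cleanly as UTF-8, so a strict UTF-8 decode of each data file would be a simple way to catch stray Latin-1 in future.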

  -- Andy