In the southamerica data file we have several Latin-1 encoded non-ASCII characters; the other data files are 100% ASCII. Here are the affected lines:

$ grep -nP "[\\x80-\\xff]" *
southamerica:384:# There's also a note in only one of the major national papers (La Nación) at
southamerica:390:# (...) anunció que el próximo domingo a las 00:00 los puntanos deberán
southamerica:393:# A partir de entonces, San Luis establecerá el huso horario propio de
southamerica:395:# 2009, el cambio horario quedará comprendido entre las 00:00 del tercer
southamerica:396:# domingo de marzo y las 24:00 del segundo sábado de octubre.
southamerica:815:# I just send a e-mail to Zulmira Brandão at

I think it's fine to allow non-ASCII in comments, but I would strongly request that the files be UTF-8 encoded. Anything else leads to immense confusion over which charset is in use.

-- Andy
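One way to act on the UTF-8 request is to re-encode the affected files with iconv. The function name `latin1_to_utf8` below is hypothetical. The sketch assumes that any file that is not already valid UTF-8 is Latin-1 (ISO-8859-1), as the southamerica file appears to be. It does not detect encodings in general. Pure-ASCII files are already valid UTF-8, so they are left untouched.

```shell
#!/bin/sh
# Hypothetical helper: re-encode Latin-1 text files as UTF-8, in place.
# Assumption: a file that is not valid UTF-8 is ISO-8859-1. This is not
# checked, because every byte sequence is valid Latin-1.

latin1_to_utf8() {
    for f in "$@"; do
        # A UTF-8 -> UTF-8 conversion exits non-zero on invalid input,
        # so success means the file is already UTF-8 (or plain ASCII).
        if iconv -f UTF-8 -t UTF-8 "$f" >/dev/null 2>&1; then
            continue
        fi
        # Decode as Latin-1, write UTF-8 to a temporary file, then replace
        # the original only if the conversion succeeded.
        iconv -f ISO-8859-1 -t UTF-8 "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    done
}
```

For example, `latin1_to_utf8 southamerica` would rewrite that file in place. Running it a second time is a no-op, because the output is then valid UTF-8.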