Data loss on FTP Server
Hello, I've been using the FTP server to access Timezone Info for a while now and it seems the directory I was using (/tz/data) has lost all of its files. Additionally, I'm unable to view the data for the most recent release (tzdb-2017c/ <ftp://ftp.iana.org/tz/tzdb-2017c/>). Has anything changed recently? Thanks, Nate
Hello,
"tz-bounces@iana.org on behalf of Nathan Winters" <tz-bounces@iana.org on behalf of nate.winters13@gmail.com> wrote:
I've been using the FTP server to access Timezone Info for a while now and it seems the directory I was using (/tz/data) has lost all of its files. Additionally, I'm unable to view the data for the most recent release (tzdb-2017c/ <ftp://ftp.iana.org/tz/tzdb-2017c/>). Has anything changed recently?
Please try again now, it should be resolved.
kim
On 10/24/2017 12:38 PM, Kim Davies wrote:
Please try again now, it should be resolved.
Thanks for fixing that.
I no longer recommend FTP to get tzdb. FTP has trouble with firewalls, does not support caching or accelerators, has an aging software base, and has real problems with man-in-the-middle attacks. Although we don't have immediate plans to remove the FTP servers, anybody who's currently using them should put "switch to HTTPS for tzdb access" on their to-do list, as I expect the FTP servers will be on their way out sooner or later. For more about FTP and its problems, please see: Springall D, Durumeric Z, Halderman JA. FTP: the forgotten cloud. 46th DSN. 2016;503-13. https://dx.doi.org/10.1109/DSN.2016.52
Paul Eggert wrote:
I no longer recommend FTP to get tzdb.
HTTPS is fine for retrieving a specific release, but FTP offers a couple of other facilities that AFAICS HTTPS doesn't provide. Specifically, by FTP I can enumerate old releases (NLST on the /tz/releases directory), and I can identify the latest release (by processing the directory listing). I've automated those jobs. Is there a recommended way to do these things through the HTTPS interface? (Scraping a human-oriented web page isn't an attractive approach.) -zefram
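Zefram's two FTP jobs (enumerating releases and picking the latest one from an NLST listing) can be sketched in Python roughly as follows. This is a minimal sketch, not his code: it assumes release names of the form tzcode2017c.tar.gz, tzdata2017c.tar.gz or tzdb-2017c.tar.lz with a four-digit year, and it simply skips names with a two-digit year such as tzcode93.tar.gz.

```python
import ftplib
import posixpath
import re

# Assumed naming scheme: tzcode2017c.tar.gz, tzdata2017c.tar.gz,
# tzdb-2017c.tar.lz. Signature files (*.asc), "-latest" links and
# two-digit-year names such as tzcode93.tar.gz deliberately do not match.
RELEASE_RE = re.compile(r"^tz(?:code|data|db-)(\d{4}[a-z]+)\.tar\.(?:gz|lz)$")


def releases_from_listing(names):
    """Return the sorted, de-duplicated version strings in an NLST listing."""
    versions = set()
    for name in names:
        # Some servers return full paths from NLST; keep only the basename.
        m = RELEASE_RE.match(posixpath.basename(name))
        if m:
            versions.add(m.group(1))
    # With a four-digit year prefix and single-letter suffixes, plain
    # string order matches release order ("2017b" < "2017c" < "2018a").
    return sorted(versions)


def latest_release(names):
    """Return the newest version in the listing, or None if there is none."""
    versions = releases_from_listing(names)
    return versions[-1] if versions else None


def fetch_release_listing(host="ftp.iana.org", path="/tz/releases"):
    """Anonymous FTP login followed by NLST of the releases directory."""
    with ftplib.FTP(host) as ftp:
        ftp.login()
        return ftp.nlst(path)
```

Offline, latest_release(["tzdata2017b.tar.gz", "tzdata2017c.tar.gz"]) returns "2017c"; latest_release(fetch_release_listing()) does the same against the live server.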
On 2017-10-24 16:26, Zefram wrote:
Paul Eggert wrote:
I no longer recommend FTP to get tzdb.
HTTPS is fine for retrieving a specific release, but FTP offers a couple of other facilities that AFAICS HTTPS doesn't provide. Specifically, by FTP I can enumerate old releases (NLST on the /tz/releases directory), and I can identify the latest release (by processing the directory listing). I've automated those jobs. Is there a recommended way to do these things through the HTTPS interface? (Scraping a human-oriented web page isn't an attractive approach.)
I've used the FTP site to date because I can download the -latest symlinks as files using wget -N to a cache directory, so that when nothing has changed it only checks the modification date. I then run readlink on the symlink files to get the actual version filenames, and quit if those are the same as the last ones downloaded.
If the HTTPS repository/releases/ were browsable and/or had public static -latest URIs provided by either server or HTML redirection, I could do something similar using curl without -L to get the latest version URIs, compare those to the last downloaded, and quit if unchanged.
I do prefer HTTPS where available, preferably also backed by directories with decent sha###sums, and/or gpg/pgp .sigs, and/or .ascs, for downloaded file validation, in case of server, storage, or site problems.
-- Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada
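The "quit early if unchanged" step Brian describes can be sketched in Python as below. It assumes the cache directory already holds a local copy of a -latest symlink (for example tzdata-latest.tar.gz, however it got there) and records the last-seen link target in a small state file; the state-file name is a hypothetical choice, not part of any tz tooling.

```python
import os


def new_target_or_none(symlink_path, state_path):
    """Compare a cached "-latest" symlink with the target seen last time.

    Returns the new target (and records it in state_path) if it changed,
    or None if the link still points at the same release as before.
    """
    target = os.readlink(symlink_path)  # the release file the link names
    try:
        with open(state_path, encoding="utf-8") as f:
            previous = f.read().strip()
    except FileNotFoundError:
        previous = None  # first run: nothing recorded yet
    if target == previous:
        return None  # unchanged: quit early, as in the FTP workflow
    with open(state_path, "w", encoding="utf-8") as f:
        f.write(target + "\n")
    return target
```

A cron job would call this after refreshing the cache and fetch the versioned file only when a non-None target comes back.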
You can also do that with rsync. Here’s my script:
    rsync --links rsync://rsync.iana.org/tz/tzdata-latest.tar.gz /tmp/tzdata-latest.tar.gz
    rsync --links rsync://rsync.iana.org/tz/`readlink /tmp/tzdata-latest.tar.gz` ./
    rsync --links rsync://rsync.iana.org/tz/tzcode-latest.tar.gz /tmp/tzcode-latest.tar.gz
    rsync --links rsync://rsync.iana.org/tz/`readlink /tmp/tzcode-latest.tar.gz` ./
Debbie
On Oct 24, 2017, at 4:12 PM, Brian Inglis <Brian.Inglis@SystematicSW.ab.ca> wrote:
[...]
Deborah Goldsmith <goldsmit@apple.com> wrote:
|You can also do that with rsync. Here’s my script:
I guess the usual way today is to have a -latest file, which would need to be a hard link if symbolic links are not allowed, and then to check only that file's timestamp over HTTP, using the HTTP HEAD method. But I too miss the ability to get an unformatted directory listing over HTTP, though I have not read up on HTTP/2 yet.
--steffen
|
|Der Kragenbaer, The moon bear,
|der holt sich munter he cheerfully and one by one
|einen nach dem anderen runter wa.ks himself off
|(By Robert Gernhardt)
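Steffen's HEAD-only check might look like this in Python: send HEAD for a -latest URL and compare its Last-Modified header with the time of the copy already held. The tzdb-latest.tar.lz URL is the one Steffen queries with curl -I later in this thread; how the cached time is stored is left to the caller, and a missing header is treated (by assumption) as "may have changed".

```python
import urllib.request
from email.utils import parsedate_to_datetime


def head_last_modified(url):
    """Send an HTTP HEAD request and return its Last-Modified header, if any."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        return resp.headers.get("Last-Modified")


def changed_since(last_modified, cached):
    """True if the Last-Modified value is later than `cached`.

    `cached` must be a timezone-aware datetime. A missing header counts as
    "may have changed", so the caller falls back to downloading.
    """
    if last_modified is None:
        return True
    return parsedate_to_datetime(last_modified) > cached
```

Usage: changed_since(head_last_modified("https://www.iana.org/time-zones/repository/tzdb-latest.tar.lz"), cached_time), where cached_time is the aware datetime of the last download.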
Quoting Zefram on Tuesday October 24, 2017:
HTTPS is fine for retrieving a specific release, but FTP offers a couple of other facilities that AFAICS HTTPS doesn't provide. Specifically, by FTP I can enumerate old releases (NLST on the /tz/releases directory), and I can identify the latest release (by processing the directory listing). I've automated those jobs. Is there a recommended way to do these things through the HTTPS interface? (Scraping a human-oriented web page isn't an attractive approach.)
I'm happy to explore providing additional metadata in a structured way if that is useful. There is currently some metadata on the current version that is not currently exposed but could be. Is retrieving old versions of the tzdb a common use case? kim
On Oct 24, 2017, at 8:38 PM, Kim Davies <kim.davies@iana.org> wrote:
[...]
Is retrieving old versions of the tzdb a common use case?
Not for me. The reason I use FTP is that I can find the latest file name, so I can track which revision it is. That information exists, I suppose, in the NEWS file but that's in human form, not program-friendly form. A metadata file would be great. XML might be a good format. The minimal content would be, I think, version string (e.g. "2017c") and release date. I can imagine other information that might be attractive, for example the names of countries affected (separate for "future" and "past" timestamps), and perhaps the dates affected (for future, that would be the effective date of the new or changed rule). paul
Paul.Koning@dell.com <Paul.Koning@dell.com> wrote on Wed, 25 Oct 2017 at 00:52:37 +0000 in <440C015B-36CC-44FD-A657-83EF6AAEC81A@dell.com>:
A metadata file would be great. XML might be a good format. The
XML is almost never a good format. This is not a space tzdb should innovate in. (OK, sure, an additional XML file would be OK, but not to rely on...) The obvious answer, since www.iana.org runs Apache, is to turn on Apache's mod_autoindex (https://httpd.apache.org/docs/2.4/mod/mod_autoindex.html) of the directory in question. It's not a standards-track protocol but it's definitely a de facto standard for http directory listings and better than most other choices... --jhawk@mit.edu John Hawkinson
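If mod_autoindex were enabled as John suggests, a client could pull release names out of the generated page with Python's standard HTML parser. This is still scraping, which Zefram wanted to avoid and which Paul Koning doubts can be done sanely; the sketch assumes only that each file appears as a relative href in an <a> element, and it looks only for tzdb-*.tar.lz names (column-sort links and signature files are ignored).

```python
import re
from html.parser import HTMLParser

# Assumed release naming in the listed directory, e.g. "tzdb-2017c.tar.lz".
TZDB_RE = re.compile(r"^tzdb-(\d{4}[a-z]+)\.tar\.lz$")


class HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> element."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def tzdb_versions_from_index(html_text):
    """Return sorted tzdb versions linked from a directory-index page."""
    parser = HrefCollector()
    parser.feed(html_text)
    parser.close()
    versions = set()
    for href in parser.hrefs:
        m = TZDB_RE.match(href)
        if m:
            versions.add(m.group(1))
    return sorted(versions)
```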
On 24 October 2017 at 20:52, <Paul.Koning@dell.com> wrote:
A metadata file would be great. XML might be a good format. The minimal content would be, I think, version string (e.g. "2017c") and release date. I can imagine other information that might be attractive, for example the names of countries affected (separate for "future" and "past" timestamps), and perhaps the dates affected (for future, that would be the effective date of the new or changed rule).
If only we had some tzdist <https://tools.ietf.org/html/rfc7808> implementations… metadata geared toward answering questions like "what's being affected and when?" was definitely one of the goals there. -- Tim Parenti
Kim wrote:
Quoting Zefram on Tuesday October 24, 2017:
HTTPS is fine for retrieving a specific release, but FTP offers a couple of other facilities that AFAICS HTTPS doesn't provide. Specifically, by FTP I can enumerate old releases [...] Is there a recommended way to do these things through the HTTPS interface? (Scraping a human-oriented web page isn't an attractive approach.)
I'm happy to explore providing additional metadata in a structured way if that is useful. There is currently some metadata on the current version that is not currently exposed but could be.
Is retrieving old versions of the tzdb a common use case?
Depends on what you mean by "common", I suppose. It's certainly a vital one. Just last week I was researching something, and went to the web distribution page to find a copy of tzcode93 and couldn't find it, and was very glad to discover that it was still available on the ftp site. I'm troubled by the number of times I still find myself using ftp (and not just for accessing the tz db, that is!), because it's such a clumsy, old, insecure protocol. But there are a number of things it still does quite well. So I'm with Zefram in lobbying for something comprehensive and machine- as well as human-readable. Steve Summit
On 10/24/2017 05:38 PM, Kim Davies wrote:
Is retrieving old versions of the tzdb a common use case?
Reasonably common, at least for software archaeologists. I would do it all the time if I didn't have the old versions already cached. Though I'm not a typical user.... If the 2017c release is here: https://www.iana.org/time-zones/repository/releases/tzdb-2017c.tar.lz then one might expect this URL to list all the releases: https://www.iana.org/time-zones/repository/releases/ Could we get this to work?
On Oct 25, 2017, at 2:52 PM, Paul Eggert <eggert@CS.UCLA.EDU> wrote:
[...]
Could we get this to work?
That would be good, but it wouldn't solve what is needed for automated tools that do these things, since HTTP index listings are not (sanely) program-parseable. paul
On 2017-10-25 12:52, Paul Eggert wrote:
[...]
Could we get this to work?
That would be a good start, and would make the following part of the web page useful: "Distribution We provide access to the Time Zone Database via three methods: HTTP: http://www.iana.org/time-zones", which is currently a self-reference to the explicit links above it on the web page.
If the web site also had HTTP server or HTML redirections from tz{code,data,db}-latest.tar.?z to tz{code,data,db}-yyyyr.tar.?z that returned 304 Not Modified or 307 Temporary Redirect (keep using the original URI) status codes, those who want to get and check explicit versions could quickly quit on most days of the year, as we currently do with FTP.
-- Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada
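The 304 Not Modified half of Brian's suggestion corresponds to an ordinary conditional GET: the client sends If-Modified-Since with the time of its copy and treats 304 as "nothing new". A minimal Python sketch, assuming the server honours If-Modified-Since for the URL in question. urllib.request reports a 304 as an HTTPError, which is why the sketch catches it.

```python
import urllib.error
import urllib.request
from datetime import timezone
from email.utils import format_datetime


def if_modified_since_header(cached):
    """Format an aware datetime as an HTTP date (always expressed in GMT)."""
    return format_datetime(cached.astimezone(timezone.utc), usegmt=True)


def fetch_if_newer(url, cached):
    """Return the body if the server has a newer copy, or None on 304."""
    req = urllib.request.Request(
        url, headers={"If-Modified-Since": if_modified_since_header(cached)})
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: quit early for today
            return None
        raise
```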
Brian Inglis <Brian.Inglis@SystematicSw.ab.ca> wrote:
|On 2017-10-25 12:52, Paul Eggert wrote:
|> On 10/24/2017 05:38 PM, Kim Davies wrote:
...
|> If the 2017c release is here:
|>
|> https://www.iana.org/time-zones/repository/releases/tzdb-2017c.tar.lz
...
|If the web site also had HTTP server or HTML redirections from
|tz{code,data,db}-latest.tar.?z to tz{code,data,db}-yyyyr.tar.?z
|which returned 304 Not Modified or 307 Temporary Redirect (keep using the
|original URI) status codes that would allow those who want to get and check
|explicit versions to quickly quit most days of the year, as we do currently with
|FTP.
I did not know it yesterday, but such a thing exists, actually:
  ?0[steffen@essex tmp]$ s-curl -I https://www.iana.org/time-zones/repository/tzdb-latest.tar.lz
  HTTP/1.1 200 OK
  Date: Thu, 26 Oct 2017 13:02:46 GMT
  X-Frame-Options: SAMEORIGIN
  Content-Security-Policy: upgrade-insecure-requests
  Last-Modified: Mon, 23 Oct 2017 15:19:20 GMT
  Vary: Accept-Encoding
  Cache-control: public, s-maxage=600, max-age=3600
  Expires: Thu, 26 Oct 2017 14:02:46 GMT
  Content-Type: application/x-lzip
  Server: Apache
  Strict-Transport-Security: max-age=47304003; preload
  X-Cache-Hits: 0
  Accept-Ranges: bytes
  Connection: keep-alive
--steffen
Kim Davies wrote:
Is retrieving old versions of the tzdb a common use case?
I frequently look at old versions, to find out what changed and when, to find the origins of ideas, and so on. I'm not a typical user, of course, and my uses for the history can now to some extent be satisfied by git. But the sequence of tzdb releases is a well-defined historical record, and ought to be accessible.
I'm happy to explore providing additional metadata in a structured way if that is useful.
With respect to the current database, I'd like to be able
* to determine whether I already have the current version; and
* to download the current database while being aware of what version it is.
Merely downloading the current version can be accomplished by the use of "-latest" links. Those links used to be insufficient to determine what version the latest is, because until recently the distribution didn't contain anything saying (in a reasonably machine-readable manner) what version it is. However, the "version" file in recent distributions is a solution to this, if it can be relied upon.
To determine whether I have the current version, I have been accustomed to looking at the list of files available to determine what the current version is. Currently this means looking at the directory containing all the releases, which is quite a lot of names, so actually not an ideal process. Back in the elsie days, the main directory normally contained only the current version (and had it under versioned filenames, unlike the present "-latest" links), so the list of filenames to look at was quite short.
It would be nicer to have some kind of retrievable thing that just directly tells me what the latest version is. With the "version" file in the distribution, I could theoretically just download a "-latest" tarball and read its "version" file, but that's a disproportionately large amount of material to download just to get a version number. (My automation checks for the latest version about a thousand times for every occasion when there's actually a new version to download.)
The ideal would be to have an HTTP(S) URL that gives just the latest version number, as a plain text file in the same format as the "version" file in the distribution.
There may be an additional wrinkle due to code and data portions of the distribution having separate version numbers.
Historically, and as recently as 2012, a tzdata or tzcode release would be made without the other half, the new version of the database being composed of tzdata and tzcode releases that have different version numbers. Is that still a possibility? The "version" file and the tzdb-*.tar.lz release files give some impression that the two halves are now more tightly tied, and that neither can be released without a matching release of the other. If tzcode and tzdata can still have non-matching version numbers, then rather than acquiring a single "latest version number" I need separate latest version numbers for tzcode and tzdata.
With respect to historical versions of the database, I'd like to be able
* to determine, for a hypothetical historical version number, whether there is in fact a release with that version number;
* to determine, for a genuine historical version number, the separate version numbers of its constituent tzcode and tzdata portions;
* to download the tzcode and tzdata tarballs for a specified version number, with signatures where applicable.
I can cope with there being a bounded number of historical exceptions to whatever mechanisms we establish, though the tarballs ought to be available in some form right back to the beginning. The important thing is that there should be a reliable mechanism that applies to all new versions as they get added to the historical record. If tzcode and tzdata are henceforth to be released in lockstep, then the whole question of separate version numbers falls under the "bounded number of historical exceptions" rubric.
-zefram
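Zefram's "ideal" endpoint would let a client compare version numbers without downloading a tarball. Assuming such a URL existed and served text in the same format as the distribution's version file (a single line such as 2017c), a client could look like the sketch below. The URL passed to fetch_latest_version is hypothetical, and the parser accepts only plain release numbers of the form year-plus-letters.

```python
import re
import urllib.request

# Release numbers look like "2017c"; anything else is rejected.
VERSION_RE = re.compile(r"^\d{4}[a-z]+$")


def parse_version_text(text):
    """Return the release number from text in "version" file format."""
    lines = text.splitlines()
    first = lines[0].strip() if lines else ""
    if not VERSION_RE.match(first):
        raise ValueError(f"unexpected version text: {text!r}")
    return first


def fetch_latest_version(url):
    """Fetch a (hypothetical) plain-text "latest version" URL."""
    with urllib.request.urlopen(url) as resp:
        return parse_version_text(resp.read().decode("ascii"))
```

Comparing the result with the locally stored version string then costs a few bytes per check instead of a full tarball download.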
I wrote:
There may be an additional wrinkle due to code and data portions of the distribution having separate version numbers.
Having reviewed the mailing list discussion around tzdb.tar.lz, I now understand that the era of desynchronised tzcode and tzdata releases is over, so this wrinkle doesn't apply to downloading current versions. For historical versions, I'm managing the split versioning by baking into my downloading code a table of all the historical releases since 1993 (including a couple that weren't announced on the mailing list; I think I got them all). So of the issues I discussed, the only one getting in the way of me using HTTP(S) is the lack of a machine-friendly way to determine the version number of the latest release. To reiterate:
The ideal would be to have an HTTP(S) URL that gives just the latest version number, as a plain text file in the same format as the "version" file in the distribution.
-zefram
Is retrieving old versions of the tzdb a common use case?
Probably not a _common_ use case, but I know of multiple (separate) teams who are working on embedded systems that (will) make use of the tzdb. When they put together the support systems that'll be used for updating their devices in the field, they'll each need to download a number of older versions, so that they can test the update process against real data.
Regards, Stephen Goudge
Senior Software Engineer, Petards Joyce-Loebl Limited
On Oct 24, 2017, at 4:38 PM, Paul Eggert <eggert@cs.ucla.edu> wrote:
On 10/24/2017 12:38 PM, Kim Davies wrote:
Please try again now, it should be resolved. Thanks for fixing that.
I no longer recommend FTP to get tzdb. FTP has trouble with firewalls, does not support caching or accelerators, has an aging software base, and has real problems with man-in-the-middle attacks. Although we don't have immediate plans to remove the FTP servers, anybody who's currently using them should put "switch to HTTPS for tzdb access" on their to-do list, as I expect the FTP servers will be on their way out sooner or later.
FTP works fine with firewalls, if the firewall software is any good. And as others pointed out, FTP does more than simply retrieve files the way HTTP does. If you want to consider something else that is functionally comparable with FTP, that's fine. Does SFTP? I don't really know it. But HTTP isn't a functional replacement for FTP at all. As for man in the middle attacks, that's what data signatures are for. No worries about the protocols if the data is authenticated. paul
<Paul.Koning@dell.com> writes:
FTP works fine with firewalls, if the firewall software is any good.
Yeah, but it's a real pain in the ass to make work in the firewall software since you have to track the PASV connection pair and tie it back to the original connection. The FTP wire protocol is kind of awful. It was designed for a much different era of the Internet.
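Russ's point about tracking the PASV connection is that the data-connection address appears only inside the control-channel text, so a firewall or NAT helper has to parse replies like the one below and open the announced port. A Python sketch of just that parsing step: RFC 959 encodes the port as two decimal bytes, p1 * 256 + p2. The sample address used in the usage note is from the documentation range.

```python
import re

# A PASV reply looks like: 227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)
PASV_RE = re.compile(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)")


def parse_pasv_reply(reply):
    """Return (host, port) of the data connection announced in a 227 reply."""
    if not reply.startswith("227"):
        raise ValueError("not a PASV reply: " + reply)
    m = PASV_RE.search(reply)
    if m is None:
        raise ValueError("malformed PASV reply: " + reply)
    h1, h2, h3, h4, p1, p2 = (int(x) for x in m.groups())
    return f"{h1}.{h2}.{h3}.{h4}", p1 * 256 + p2
```

For example, parse_pasv_reply("227 Entering Passive Mode (192,0,2,10,195,80).") returns ("192.0.2.10", 50000), the second connection a firewall must then associate with the original control connection.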
If you want to consider something else that is functionally comparable with FTP, that's fine. Does SFTP? I don't really know it. But HTTP isn't a functional replacement for FTP at all.
Anonymous rsync doesn't give you the full capabilities of FTP, but it does give you the file listing part, and I suspect would be enough to do the things you're doing with FTP in a standardized way. -- Russ Allbery (eagle@eyrie.org) <http://www.eyrie.org/~eagle/>
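As an illustration of Russ's suggestion: running rsync with a source but no destination lists the remote directory instead of copying it. The parser below assumes rsync's usual listing layout (permissions, size, date, time, name) and that, with --links as in Deborah's script, a symlink such as tzdata-latest.tar.gz is shown as "name -> target". That layout is not a formal interface, so treat this as a sketch.

```python
import re
import subprocess

# One listing line: permissions, size (maybe with digit-grouping commas),
# date, time, then the name (plus " -> target" for symlinks).
LIST_RE = re.compile(
    r"^(?P<perms>\S{10})\s+(?P<size>[\d,]+)\s+"
    r"(?P<date>\d{4}/\d{2}/\d{2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+(?P<name>.+)$"
)


def parse_rsync_listing(text):
    """Map each listed name to its symlink target, or None for non-links."""
    entries = {}
    for line in text.splitlines():
        m = LIST_RE.match(line)
        if not m:
            continue  # e.g. a server MOTD line
        name, target = m.group("name"), None
        if m.group("perms").startswith("l") and " -> " in name:
            name, target = name.split(" -> ", 1)
        entries[name] = target
    return entries


def list_rsync_dir(url="rsync://rsync.iana.org/tz/"):
    """Run rsync in listing mode (no destination) and parse its output."""
    result = subprocess.run(["rsync", "--links", url],
                            capture_output=True, text=True, check=True)
    return parse_rsync_listing(result.stdout)
```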
Paul.Koning@dell.com wrote:
As for man in the middle attacks, that's what data signatures are for. No worries about the protocols if the data is authenticated.
This is true as far as it goes, but one has to be careful about what's covered by the signature. What's covered by the present signatures is only the content of the released file. So you know you've got an authentic Eggert release file, and can be confident that the code in it won't be malicious. But you can't so easily be sure that the authentic file you've got is the same one that you asked for: the association between the filename and the content is not covered. A MitM could give you tzcode2013e.tar.gz, with matching signature, when you were expecting tzdb-2017c.tar.lz (different tarball layout and scope, different version). In my downloading code I've now implemented a check that the file we got is the one we're expecting, based entirely on the content of the file. For tzdb-*.tar.lz this is pretty easy: extract tzdb-$version/version and check that it contains the version number. The existence of the file in the tarball under this directoryful name confirms that it's a tzdb tarball, rather than tzcode or tzdata. The content of the version file confirms the version number, but actually the versioned directory name in the tar also confirms that. For current tzcode and tzdata files one can similarly check the version file for the version number, and check some obvious filenames to determine which kind of tarball it is. Older tzcode and tzdata are a bit more difficult, with the version number not being in a separate file, but being present in the Makefile. But actually I'm not using those trickier checks: for everything preceding the tzdb.tar.lz era I'm baking SHA-512 hashes into my code, indexed by filename. That also makes a signature check redundant for these files. Of course, that system can't cover new releases. FWIW, I would not regard HTTPS in the absence of the PGP signature files as being secure against MitM attacks for this purpose. The key trust management is a problem in several respects that I don't want to digress into. 
The PGP signatures, applied to the files per se and made using a key that's used for little else, are the right tool for the job. -zefram
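Zefram's content-based check can be sketched in Python as below. Python's tarfile module cannot read lzip, so the sketch assumes the .tar.lz has already been decompressed (for example with lzip -d) and is passed in as plain tar bytes. The SHA-512 table mirrors his approach for files predating tzdb-*.tar.lz, but it is a hypothetical placeholder here and contains no real digests.

```python
import hashlib
import io
import tarfile

# Hypothetical table: filename -> expected SHA-512 hex digest for older
# releases, baked into the code. Real digests are not reproduced here.
KNOWN_SHA512 = {}


def matches_expected_release(tar_bytes, expected_version):
    """Check, from contents alone, that a decompressed tzdb tarball is the
    release we asked for: tzdb-<version>/version must exist and match."""
    member = f"tzdb-{expected_version}/version"
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as tar:
        try:
            f = tar.extractfile(member)
        except KeyError:
            return False  # wrong kind of tarball, or a different version
        if f is None:
            return False  # member exists but is not a regular file
        return f.read().decode("ascii").strip() == expected_version


def check_known_hash(filename, data, table=KNOWN_SHA512):
    """True/False if the file is pinned in the table, None if it is not."""
    expected = table.get(filename)
    if expected is None:
        return None  # not pinned: fall back to content and signature checks
    return hashlib.sha512(data).hexdigest() == expected
```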
Zefram wrote:
FWIW, I would not regard HTTPS in the absence of the PGP signature files as being secure against MitM attacks for this purpose.
Yes, quite right. HTTPS is not a cure-all. However, it is a significant security improvement over FTP, which is why I'm recommending it.
As for shortcomings of the IANA tz HTTPS server, how about if we did the following:
1. Set up https://ftp.iana.org/tz so that it contains the same files that ftp://ftp.iana.org/tz does, in the same locations. Where the FTP server contains directories, the HTTPS server can contain directory listings in the usual Apache format.
2. Where the FTP server has a symbolic link, have the HTTPS server redirect via an HTTP status code 307 (Temporary Redirect) response that points to the symlink target, instead of simply being another name for the target. That way, it should be easy to find out programmatically what the current version is: just retrieve the "latest" URL and see what it redirects to.
(2) can be done with something like the recipe mentioned here: https://stackoverflow.com/questions/16351271/apache-redirects-based-on-symli...
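Under proposal (2), finding the current version reduces to one request that does not follow redirects. A Python sketch, assuming the proposed https://ftp.iana.org/tz layout in which a -latest name answers with a 307 pointing at a versioned file such as tzdb-2017c.tar.lz; both the layout and the redirect are proposals in this thread, not a description of the server at the time. http.client is used because it never follows redirects on its own.

```python
import http.client
import posixpath
import re
from urllib.parse import urljoin, urlsplit

# Version embedded in a release file name, e.g. "tzdb-2017c.tar.lz".
VERSION_IN_NAME = re.compile(r"(\d{4}[a-z]+)\.tar\.[a-z]+$")


def redirect_target(url):
    """HEAD `url` without following redirects; return the absolute
    Location of a 3xx answer, or None for any other answer."""
    parts = urlsplit(url)
    conn = http.client.HTTPSConnection(parts.netloc)
    try:
        conn.request("HEAD", parts.path or "/")
        resp = conn.getresponse()
        location = resp.getheader("Location")
        if 300 <= resp.status < 400 and location:
            return urljoin(url, location)
        return None
    finally:
        conn.close()


def version_from_target(target_url):
    """Extract "2017c" from a URL ending in a versioned release file name."""
    name = posixpath.basename(urlsplit(target_url).path)
    m = VERSION_IN_NAME.search(name)
    return m.group(1) if m else None
```

Usage: target = redirect_target("https://ftp.iana.org/tz/tzdata-latest.tar.gz"), then version_from_target(target) if target is not None.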
Paul Eggert <eggert@cs.ucla.edu> wrote on Tue, 31 Oct 2017 at 23:39:55 -0700 in <7019cc6f-34fb-d372-b14a-2dfd41bfc86a@cs.ucla.edu>:
1. Set up https://ftp.iana.org/tz so that it contains the same files that ftp://ftp.iana.org/tz does, in the same locations.
While there's nothing wrong with this, and arguably it makes a certain amount of sense, why not do so with https://www.iana.org/time-zones/repository/releases/ which is where you'd expect to find them by lopping off the tail of the URL? (given https://www.iana.org/time-zones/repository/releases/tzdb-2017c.tar.lz) Of course they are not mutually exclusive -- perhaps they should do the same thing or one should redirect to the other. --jhawk@mit.edu John Hawkinson
Paul Eggert wrote:
2. Where the FTP server has a symbolic link, have the HTTPS server redirect via an HTTP status code 307 (Temporary Redirect) response that points to the symlink target,
This sounds fine for the "-latest" URIs, and would satisfy my use case. Whether this type of redirect is appropriate for other symlinks should be judged case by case. -zefram
participants (13)
- Brian Inglis
- Deborah Goldsmith
- Goudge, Stephen
- John Hawkinson
- Kim Davies
- Nathan Winters
- Paul Eggert
- Paul.Koning@dell.com
- Russ Allbery
- scs@eskimo.com
- Steffen Nurpmeso
- Tim Parenti
- Zefram