Steven Abner wrote:
> I am not sure what function Perl, or your application, would use if you wished to translate a string sent by a server in Japan, such as "Maintenance will begin at 12:00 JST", and display it as a local time string.
That sort of thing is properly tackled in the opposite direction. The moment at which maintenance will begin is a point in time, independent of timezone, and so should be represented in UT or an equivalent. It can then be trivially converted to each user's preferred timezone for display. For example:

    $ maintdate='2011-12-09 00:00 UT'
    $ for zone in America/New_York America/Sao_Paulo Asia/Tokyo; do TZ=$zone date -d "$maintdate" +'Maintenance will begin at %H:%M %Z on %A.'; done
    Maintenance will begin at 19:00 EST on Thursday.
    Maintenance will begin at 22:00 BRST on Thursday.
    Maintenance will begin at 09:00 JST on Friday.

If you really need to go the other way... well, that's pretty bizarre. You're scraping a human-oriented announcement, "Maintenance will begin at 12:00 JST", from someone's server; the format is rigid enough for you to reliably pick out "12:00 JST" as a time indicator; and yet the application isn't specific enough for you to statically configure that this server uses Tokyo time? I don't believe it as stated.

There are more plausible scenarios, though, where you could get "12:00 JST" as a time indicator and have to interpret it automatically. In that case, firstly, you don't want to go to the Asia/Tokyo timezone per se. The job isn't to guess which geographical zone applies; it's to guess what "JST" means. That's not much of an issue with "JST", but consider "MST": you don't really care whether it comes from America/Denver or America/Phoenix, because it could be either and it means the same thing (UTC-7) either way. If you get "MST" for a date in July then it's probably from America/Phoenix, and you'd go quite wrong if you were to interpret it as America/Denver and use the offset that Denver uses in July. (Denver uses the abbreviation "MDT" in July.)

Secondly, you need to accept that you're guessing. There are ambiguities, and if you can't get any user input to disambiguate, you're going to go wrong sometimes.
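To make the "guess what JST means" point concrete, here is a minimal sketch that maps an abbreviation to the UTC offset it stands for and converts the time to UT, never touching a geographical zone. It assumes GNU date (which accepts a numeric zone such as "+0900" after the time of day). The function names abbr_offset and to_ut are hypothetical, and the three-entry table is an illustrative assumption; a real table would be derived from the tz data.

```shell
#!/bin/sh
# Sketch: interpret "12:00 MST"-style times by the offset the abbreviation
# denotes, not by picking a geographical zone. Assumes GNU date.

# abbr_offset ABBR: print the UTC offset for ABBR (hypothetical, tiny,
# hand-written table; fails for anything it doesn't know).
abbr_offset() {
  case $1 in
    JST) echo '+0900' ;;
    MST) echo '-0700' ;;  # same meaning in America/Denver and America/Phoenix
    MDT) echo '-0600' ;;  # what Denver uses in summer
    *) return 1 ;;
  esac
}

# to_ut DATE TIME ABBR: print the instant in UT, suitable for storing and
# later converting to each reader's own zone for display.
to_ut() {
  off=$(abbr_offset "$3") || { echo "unknown abbreviation: $3" >&2; return 1; }
  TZ=UTC0 date -d "$1 $2 $off" +'%Y-%m-%d %H:%M UT'
}
```

For instance, "12:00 MST" on 2011-07-15 comes out as 19:00 UT, whereas reading it with Denver's July rules would give 18:00 UT, the error described above. The resulting UT string can be fed straight into the display loop shown earlier.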
> Would you scan all the files? Or search the internet for how to interpret JST?
Once you've got the above issues clear, actually performing the guess is fairly easy. As you point out, it's a bit much to search through every zone file each time, at least if you're doing this regularly, so it's sensible to build an abbreviation-to-whatever index. Generating the index is trivial: a single pass through every zone file. Depending on the application, you might want to limit the indexing to abbreviations used in the last N years (so that outliers such as "LMT", local mean time from before standard time was adopted, are excluded). Looking up an abbreviation in the index will give you a small list of candidate offsets. If you want to be clever, you could try narrowing down the list further by checking the slightly larger list of candidate geographical timezones, to see whether the abbreviation is meant to be in use at the time of year that the time expression appears to depict.
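A rough sketch of the index and the narrowing check, assuming GNU date (for -f and -d) and a zone.tab under /usr/share/zoneinfo (override with ZONEINFO). Note that it swaps in a cruder technique than a pass over the zone files: instead of parsing the tzfiles, it samples each zone listed in zone.tab at mid-January and mid-July of every year in a chosen range, so an abbreviation used only briefly between samples will be missed. All function names are hypothetical.

```shell
#!/bin/sh
# Sketch: abbreviation index built by *sampling* zones with GNU date,
# not by parsing tzfiles. Assumes GNU date and tz data under $ZONEINFO.
ZONEINFO=${ZONEINFO:-/usr/share/zoneinfo}

# sample_dates FIRST LAST: two UT instants per year (January and July), so
# both hemispheres' summer and winter abbreviations get sampled.
sample_dates() {
  y=$1
  while [ "$y" -le "$2" ]; do
    printf '%s-01-15 12:00 UTC\n%s-07-15 12:00 UTC\n' "$y" "$y"
    y=$((y + 1))
  done
}

# build_index FIRST LAST: print deduplicated "ABBR OFFSET ZONE" lines.
# Keeping FIRST..LAST recent keeps outliers such as LMT out of the index.
build_index() {
  dates=$(sample_dates "$1" "$2")
  awk '!/^#/ { print $3 }' "$ZONEINFO/zone.tab" |
  while read -r zone; do
    # One date process per zone: -f - reads every sample date from stdin.
    printf '%s\n' "$dates" | TZ=$zone date -f - +"%Z %z $zone"
  done | sort -u
}

# lookup ABBR < INDEX: one line per candidate offset, listing its zones.
lookup() {
  awk -v a="$1" '$1 == a { z[$2] = z[$2] " " $3 }
                 END { for (o in z) print o ":" z[o] }'
}

# plausible ZONE DATE TIME ABBR: succeed if ZONE really uses ABBR at that
# wall-clock time (e.g. rejects America/Denver for "12:00 MST" in July).
plausible() {
  [ "$(TZ=$1 date -d "$2 $3" +%Z)" = "$4" ]
}
```

Typical use is `build_index 2007 2011 > abbr.idx` once, then `lookup MST < abbr.idx` for each query, followed by `plausible` on each zone listed under the chosen offset. The check in `plausible` reads the time as local to the candidate zone, so it can misjudge wall-clock times that fall inside a DST transition.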
> How would one even start querying a user?
If you've got a user to query, you can do this properly. You usually want to know a user's timezone for output purposes anyway. That is usually a geographical civil timezone, and here you *do* want to distinguish America/Denver from America/Phoenix. The zone.tab file provides a convenient structure for this, based on contemporary political geography. You end up with a dialogue going something like "what continent are you in?" "Asia" "which of these countries?" "Japan" "right, you'll be wanting Asia/Tokyo". Countries with multiple timezones get an additional question, such as "is that east or west Uzbekistan?". tzselect.ksh in the tz distribution implements this system in a simple way. Some refinements are possible, such as displaying the current time (and abbreviation) in each candidate zone.

If, bizarrely, you've got "JST" from a user and then want to ask them to disambiguate (or confirm) it, that'll be a much shorter line of questioning. Use the index discussed above to get a list of candidate geographical zones, then show them to the user (possibly with the descriptions from zone.tab, though those don't cover every zone in the database, since that isn't what they're for), and ask the user to pick one. Here's a very crude version of this type of search, in zsh. The first list is every compiled zone file containing the string JST; comm -23 then removes the names that the tz source files (here unpacked in ~/tmp/tz) declare as Link aliases:

    $ for zone in $(comm -23 \
          =(grep -wl JST /usr/share/zoneinfo/posix/**/*(.) | \
            sed 's,.*posix/,,' | sort) \
          =(grep '^Link' ~/tmp/tz/* | awk '{print $3}' | sort)); do
          TZ=$zone date +'%a %H:%M %Z%t'$zone; done
    Thu 20:31 TLT   Asia/Dili
    Thu 19:31 HKT   Asia/Hong_Kong
    Thu 18:31 WIT   Asia/Jakarta
    Thu 19:31 MYT   Asia/Kuala_Lumpur
    Thu 19:31 MYT   Asia/Kuching
    Thu 19:31 CIT   Asia/Makassar
    Thu 19:31 PHT   Asia/Manila
    Thu 18:31 WIT   Asia/Pontianak
    Thu 18:01 MMT   Asia/Rangoon
    Thu 22:31 SAKT  Asia/Sakhalin
    Thu 19:31 SGT   Asia/Singapore
    Thu 20:31 JST   Asia/Tokyo
    Thu 23:31 NRT   Pacific/Nauru

Obviously, grep is just turning up every zone that has ever used "JST" at all, possibly with some false positives. You can do better by properly parsing the tzfiles, and you can then limit the results to zones that have used the abbreviation recently, and so on.
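The zone.tab-driven dialogue described above (continent, then country, then zone, showing the current time and abbreviation in each candidate) can be sketched in a few shell functions. This is an illustration in the spirit of tzselect, not its actual code; it assumes zone.tab at /usr/share/zoneinfo/zone.tab in its usual tab-separated layout (country code, coordinates, TZ, optional comment), and all function names are hypothetical. For brevity the menus show ISO 3166 country codes; mapping them to names via iso3166.tab from the tz distribution is left out, as is any input validation.

```shell
#!/bin/sh
# Sketch of a continent -> country -> zone dialogue driven by zone.tab.
ZONEINFO=${ZONEINFO:-/usr/share/zoneinfo}
TAB=$ZONEINFO/zone.tab
tab=$(printf '\t')

# continents: distinct first components of the TZ column.
continents() {
  awk 'BEGIN { FS = "\t" } !/^#/ { split($3, p, "/"); print p[1] }' "$TAB" |
    sort -u
}

# countries CONTINENT: country codes with at least one zone there.
countries() {
  awk -v c="$1/" 'BEGIN { FS = "\t" }
    !/^#/ && index($3, c) == 1 { print $1 }' "$TAB" | sort -u
}

# zones CONTINENT CODE: "ZONE<tab>HH:MM ABBR<tab>COMMENT" per candidate,
# showing the current time and abbreviation in each zone.
zones() {
  awk -v c="$1/" -v cc="$2" 'BEGIN { FS = "\t" }
    !/^#/ && $1 == cc && index($3, c) == 1 { print $3 "\t" $4 }' "$TAB" |
  while IFS=$tab read -r zone comment; do
    printf '%s\t%s\t%s\n' "$zone" "$(TZ=$zone date +'%H:%M %Z')" "$comment"
  done
}

# choose PROMPT: number the lines on stdin, ask on the terminal, and print
# the chosen line (nothing if the answer isn't a valid number).
choose() {
  list=$(cat)
  printf '%s\n' "$list" | awk '{ printf "%2d) %s\n", NR, $0 }' >&2
  printf '%s ' "$1" >&2
  read -r n < /dev/tty
  printf '%s\n' "$list" | awk -v n="$n" 'NR == n + 0'
}
```

Chaining the steps gives the dialogue: `continent=$(continents | choose 'Which continent?')`, then `country=$(countries "$continent" | choose 'Which country?')`, then `zones "$continent" "$country" | choose 'Which zone?' | cut -f1`. A fuller version would skip the last question for single-zone countries.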
> What if your application shouldn't interact with the user when converting to a local time display?
It'll still need some kind of input to determine which timezone to use. A zone abbreviation is a rather unlikely form of such input. If you can't get an explicit zone name as input then you'll have to guess to some extent, of course. GeoIP (looking up the client's IP address in a geolocation database) will give you a decent guess.

-zefram