Back-of-the-envelope cost of extra data :-)
This output...

    Script started on Wed 04 May 2005 10:15:00 AM EDT
    lecserver$ du -s tz*/tmp/*/zoneinfo
    489     tz/tmp/etc/zoneinfo
    1709    tzexp/tmp/etc/zoneinfo
    lecserver$ exit
    script done on Wed 04 May 2005 10:15:08 AM EDT

...indicates that "old-format" data eats up about half a megabyte of disk space total while "new-format" data eats up about 2 megabytes. (Some individual files grow by a factor of 9; others don't grow at all; the final result lies between these extremes.)

At a dollar per gigabyte of disk space, the old stuff costs one twentieth of a cent per computer while the new stuff costs four twentieths of a cent per computer, for a difference of three twentieths of a cent.

Making a wild estimate of one computer per person, there are about 6.5 billion computers in the world. In this case I'm happy if 90% are running Windows, since Microsoft doesn't ship time zone files. That leaves at most 650 million computers where the time zone files might show up.

Maximum total cost: 650 million computers * three twentieths of a cent: $975,000 (ulp!)

Frightening corollary: 1024 bytes of Microsoft code bloat costs the world economy $5850.

--ado
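The arithmetic above can be checked in a few lines. This is a sketch that takes the message's own round numbers as inputs ($1 per gigabyte, about 0.5 MB vs. about 2 MB per machine, 6.5 billion computers, 90% running Windows) and assumes decimal units (1 GB = 1000 MB); the variable and function names are hypothetical.

```python
# Back-of-the-envelope disk cost, using the round numbers from the message.
# Assumption: decimal units, so 1 GB = 1000 MB.

DOLLARS_PER_GB = 1.0
OLD_MB, NEW_MB = 0.5, 2.0          # approximate zoneinfo size per computer
COMPUTERS = 6.5e9                  # "one computer per person"
WINDOWS_SHARE = 0.90               # machines assumed to carry no zoneinfo files

def cost_cents(megabytes: float) -> float:
    """Cost of storing `megabytes` on one computer, in cents."""
    return megabytes / 1000 * DOLLARS_PER_GB * 100

extra_cents = cost_cents(NEW_MB) - cost_cents(OLD_MB)  # 3/20 of a cent
affected = COMPUTERS * (1 - WINDOWS_SHARE)             # about 650 million
total_dollars = affected * extra_cents / 100

# The corollary treats 1024 bytes as one millionth of a gigabyte and applies
# it to the 90% of machines running Windows.
bloat_dollars = COMPUTERS * WINDOWS_SHARE * 1e-6 * DOLLARS_PER_GB

print(f"extra per computer: {extra_cents:.2f} cents")  # 0.15 cents
print(f"affected computers: {affected:,.0f}")          # 650,000,000
print(f"total: ${total_dollars:,.0f}")                 # $975,000
print(f"1 KiB of bloat: ${bloat_dollars:,.0f}")        # $5,850
```

The $5850 figure only comes out if 1024 bytes is rounded to a millionth of a gigabyte; the estimate is linear in every input, so a different price per gigabyte scales the total directly.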
"Olson, Arthur David (NIH/NCI)" <olsona@dc37a.nci.nih.gov> writes:
> This output...
>
>     Script started on Wed 04 May 2005 10:15:00 AM EDT
>     lecserver$ du -s tz*/tmp/*/zoneinfo
>     489     tz/tmp/etc/zoneinfo
>     1709    tzexp/tmp/etc/zoneinfo
>     lecserver$ exit
>     script done on Wed 04 May 2005 10:15:08 AM EDT
>
> ...indicates that "old-format" data eats up about half a megabyte of disk space total while "new-format" data eats up about 2 megabytes.
This depends on filesystem blocking. I calculate the actual data as growing from 301,265 to 2,128,478 bytes, a factor of 7. The old-format data produces tiny files that waste a lot of disk space due to internal fragmentation; the new-format data produces larger files, with less internal fragmentation. With Solaris 9 UFS, using 1 KiB units, I get:

    $ du -sk tz*/etc/zoneinfo
    660     tz-0/etc/zoneinfo
    2445    tz-1/etc/zoneinfo

or a 3.7x growth on that file system.

I don't know what units your "du" was generating output for, but I'm a bit surprised if the units are 1 KiB, as that would mean you're shoehorning 2,128,478 bytes into 1709 KiB (1,750,016 bytes). Are you using a compressed file system?
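Internal fragmentation is easy to model. A minimal sketch, assuming every file is rounded up to whole allocation blocks of one fixed size (real file systems add details such as UFS fragments and metadata blocks, which this ignores). The block size and file sizes are hypothetical, chosen only to show how roughly 7x data growth can turn into under 4x on disk; the last lines recompute the ratios from the figures quoted in this thread.

```python
# Sketch: on-disk size when each file is rounded up to whole blocks.
# BLOCK and the file sizes are hypothetical, not measured zoneinfo data.

BLOCK = 1024  # assumed allocation unit, in bytes

def on_disk_bytes(size: int, block: int = BLOCK) -> int:
    """Bytes allocated for a `size`-byte file (ceiling division to whole blocks)."""
    return -(-size // block) * block

def totals(sizes: list[int]) -> tuple[int, int]:
    """Return (bytes of data, bytes allocated on disk) for a set of files."""
    return sum(sizes), sum(on_disk_bytes(s) for s in sizes)

old_sizes = [150, 300, 700, 1200]     # hypothetical small old-format files
new_sizes = [1100, 2100, 4900, 8400]  # hypothetical larger new-format files

old_data, old_disk = totals(old_sizes)   # 2350 data bytes, 5120 on disk
new_data, new_disk = totals(new_sizes)   # 16500 data bytes, 19456 on disk
print(f"data growth: {new_data / old_data:.1f}x")   # 7.0x
print(f"disk growth: {new_disk / old_disk:.1f}x")   # 3.8x

# Ratios from the figures quoted in this thread.
print(f"actual data: {2_128_478 / 301_265:.2f}x")   # 7.07x
print(f"Solaris 9 UFS du -sk: {2445 / 660:.2f}x")   # 3.70x
print(f"ado's du -s: {1709 / 489:.2f}x")            # 3.49x
# 1709 KiB holds fewer bytes than the new data, hence the compression question.
print(1709 * 1024 < 2_128_478)                      # True
```

Small files pay the largest rounding penalty relative to their size, which is why the old format's on-disk footprint looks bigger than its data and the growth ratio shrinks once measured with `du`.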
> Maximum total cost: 650 million computers * three twentieths of a cent: $975,000 (ulp!)
Yup. It'd be nice to shrink this a bit, if it's feasible. Perhaps go to a varying-width format?
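The thread doesn't say what a "varying-width format" would look like. As one illustration (an assumption, not the tz project's file layout), sorted transition times can be stored as differences from the previous time, each written as a LEB128-style varint with zigzag mapping for negative values, instead of one fixed 64-bit field per time. The transition times below are hypothetical: 400 years of two switches per year, starting in late 2004.

```python
import struct

def zigzag(n: int) -> int:
    """Map signed to unsigned so small magnitudes stay small: 0,-1,1,-2 -> 0,1,2,3."""
    return n << 1 if n >= 0 else (-n << 1) - 1

def unzigzag(z: int) -> int:
    """Inverse of zigzag()."""
    return z >> 1 if z % 2 == 0 else -((z + 1) >> 1)

def write_varint(out: bytearray, u: int) -> None:
    """Append unsigned `u` in LEB128 form: 7 data bits per byte, high bit = more follow."""
    while u >= 0x80:
        out.append((u & 0x7F) | 0x80)
        u >>= 7
    out.append(u)

def read_varint(data: bytes, pos: int) -> tuple[int, int]:
    """Decode one varint starting at `pos`; return (value, next position)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if byte < 0x80:
            return result, pos

def encode_times(times: list[int]) -> bytes:
    """Count, then each time as a zigzag varint delta from the previous one."""
    out = bytearray()
    write_varint(out, len(times))
    prev = 0
    for t in times:
        write_varint(out, zigzag(t - prev))
        prev = t
    return bytes(out)

def decode_times(data: bytes) -> list[int]:
    """Inverse of encode_times()."""
    count, pos = read_varint(data, 0)
    times, prev = [], 0
    for _ in range(count):
        z, pos = read_varint(data, pos)
        prev += unzigzag(z)
        times.append(prev)
    return times

# Hypothetical transitions: alternating gaps of 210 and 155 days.
times, t = [], 1_100_000_000
for i in range(800):
    times.append(t)
    t += 18_144_000 if i % 2 == 0 else 13_392_000

fixed = struct.pack(f">{len(times)}q", *times)  # one 64-bit field per time
packed = encode_times(times)

assert decode_times(packed) == times            # round trip is lossless
print(len(fixed), len(packed))                  # 6400 3203
```

Here each half-year gap fits in 4 bytes instead of 8, so the times take about half the space; the saving depends entirely on how closely spaced the transitions are, and a real format would also have to carry the rest of the data a zone file stores.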
Date: Wed, 4 May 2005 12:09:51 -0400
From: "Olson, Arthur David (NIH/NCI)" <olsona@dc37a.nci.nih.gov>
Message-ID: <75DDD376F2B6B546B722398AC161106C7403FA@nihexchange2.nih.gov>

| Maximum total cost: 650 million computers * three twentieths of a cent:
| $975,000 (ulp!)

Why do we need to spend that much?

I'm confused. At first I thought the 400 years in advance stuff was as far as we wanted to predict the future (and shouldn't the code be generating "stardate 1603.8" by then anyway?). But then I saw some reference to what happens after the 400 years, where the rules are to be condensed into an algorithm (kind of like the POSIX string) and then any future time constructed from that.

The conversion generated is fantasy, we all know that, but that isn't the point of this mail.

If at some future point the code is going to switch from the tables to an algorithmic conversion, why bloat the tables by including so much speculative future data? We know that none of us is able to predict the rules with any accuracy more than a year or two in advance in the best case; for many, we can't even keep in advance of the changes.

If that algorithmic conversion is to happen, why not have it kick in much closer to now, say 20 years in advance instead of 400 (and yes, even earlier than the old-format zone files ran out of table data)? The conversions are all a combination of smoke, mirrors, and superstition out there anyway.

If the rules change (or rather, when the rules change), people need new zone files anyway; if they don't change, there's no reason the algorithmic conversion can't keep on being used, way out into the time when it is being used for current and even past time conversions (it may be slower, but anyone who cares can just update their zone tables).

kre
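kre's proposal amounts to a two-tier lookup: explicit transitions up to a cutoff, then a rule evaluated on the fly. A minimal sketch under stated assumptions: a hypothetical zone with UTC-5 standard time and UTC-4 daylight time, switching at 02:00 local time on the first Sunday in April and the last Sunday in October (roughly what the POSIX TZ string `EST5EDT,M4.1.0,M10.5.0` expresses), a table covering 2005 through 2025 (about the 20 years kre suggests), and the rule for everything after. All names are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime, timedelta, timezone

# Hypothetical zone: UTC-5 standard time, UTC-4 daylight time (offsets in seconds).
STD, DST = -5 * 3600, -4 * 3600

def first_sunday(year: int, month: int) -> datetime:
    d = datetime(year, month, 1, tzinfo=timezone.utc)
    return d + timedelta(days=(6 - d.weekday()) % 7)   # weekday(): Monday=0, Sunday=6

def last_sunday(year: int, month: int) -> datetime:
    nxt = datetime(year + 1, 1, 1) if month == 12 else datetime(year, month + 1, 1)
    d = nxt.replace(tzinfo=timezone.utc) - timedelta(days=1)   # last day of `month`
    return d - timedelta(days=(d.weekday() - 6) % 7)

def rule_transitions(year: int) -> tuple[datetime, datetime]:
    """DST start/end instants in UTC: 02:00 local on each Sunday (07:00 and 06:00 UTC)."""
    return (first_sunday(year, 4) + timedelta(hours=7),
            last_sunday(year, 10) + timedelta(hours=6))

def rule_offset(t: datetime) -> int:
    """Algorithmic conversion: evaluate the rule for t's year."""
    start, end = rule_transitions(t.year)
    return DST if start <= t < end else STD

# Explicit table for 2005-2025, as compiled zone data would hold it.
TABLE_LAST_YEAR = 2025
TABLE = []
for year in range(2005, TABLE_LAST_YEAR + 1):
    start, end = rule_transitions(year)
    TABLE += [(start, DST), (end, STD)]
INSTANTS = [when for when, _ in TABLE]
CUTOFF = datetime(TABLE_LAST_YEAR + 1, 1, 1, tzinfo=timezone.utc)

def offset_at(t: datetime) -> int:
    """UTC offset in seconds for aware datetime t: table before CUTOFF, rule after."""
    if t >= CUTOFF:
        return rule_offset(t)
    i = bisect_right(INSTANTS, t)
    return TABLE[i - 1][1] if i else STD   # assume standard time before the table

print(offset_at(datetime(2005, 7, 1, tzinfo=timezone.utc)))   # -14400 (from the table)
print(offset_at(datetime(2300, 7, 1, tzinfo=timezone.utc)))   # -14400 (from the rule)
print(offset_at(datetime(2300, 12, 1, tzinfo=timezone.utc)))  # -18000 (from the rule)
```

Because this table is generated from the same rule, both tiers agree; in practice the table would carry the known history and the near future, and the rule would cover only the speculative years beyond the cutoff, which is exactly the size trade-off being argued about.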
participants (3)
- Olson, Arthur David (NIH/NCI)
- Paul Eggert
- Robert Elz