Looking for a way to cut down binary zoneinfo size. Does anyone have one?
Hello,
I'm developing for a pretty small-sized embedded system, and 2.9M of binaries is a bit too much for us. The system never uses a date before 2007.
I estimate I can slim it down to 500K by cutting down the old data.
1. I made a script to cut down the source data: all the rules and all the zonelines before 2001. Compiling this data, I get a lot of errors like:
"northamerica", line 2537: %s in ruleless zone
"northamerica", line 2618: unruly zone
because some zones are left without rules, but with %s format in them.
2. I tried to modify zic itself, but apparently understanding it will take me days. Just setting min_year to 2001 results in a bunch of
can't determine time zone abbreviation to use just after until time
Does anyone have a ready solution? Thanks!
Android uses its own format, which is smaller; you might look into that. If memory serves, the default Android tzdata binary file doesn't worry about dates after 2038, which can save space if you know your devices cannot possibly be long-lived. There is an option to go past 2038, though, and I'd likely use that myself.
Thanks Paul!
I don't have Android, I have a much smaller vanilla Linux. We do use zic in a toolchain.
Update on my progress: immediately after sending the email, I got another idea: apparently, Rules need to stay in the source data, they shouldn't affect the output size, as long as no zoneline uses them. (hope it's correct) I've tried cutting zonelines only, but now I get several of:
can't determine time zone abbreviation to use just after until time
It also appears to me that some zones will be completely empty, and thus should turn into Links (question is, to what?) The project goes on.
On Thu, May 18, 2017 at 9:44 AM, Paul Eggert <eggert@cs.ucla.edu> wrote:
Android uses its own format, which is smaller; you might look into that. If memory serves, the default Android tzdata binary file doesn't worry about dates after 2038, which can save space if you know your devices cannot possibly be long-lived. There is an option to go past 2038, though, and I'd likely use that myself.
Suggestion: Forget about modifying the source data or zic, and just write a simple zoneinfo->zoneinfo filter that removes any transitions you don't want.
On Thu, May 18, 2017 at 12:50 PM, Viktor Sergiienko <singalen@gmail.com> wrote:
Thanks Paul!
I don't have Android, I have a much smaller vanilla Linux. We do use zic in a toolchain.
Update on my progress: immediately after sending the email, I got another idea: apparently, Rules need to stay in the source data, they shouldn't affect the output size, as long as no zoneline uses them. (hope it's correct) I've tried cutting zonelines only, but now I get several of:
can't determine time zone abbreviation to use just after until time
It also appears to me that some zones will be completely empty, and thus should turn into Links (question is, to what?) The project goes on.
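A zoneinfo-to-zoneinfo filter of the kind suggested above might look roughly like the following. This is only a sketch, assuming the TZif layout described in tzfile(5); the names `trim_tzif` and `_trim_block` are hypothetical. It keeps the last transition at or before the cutoff (so the offset in force at the cutoff stays correct) and everything after it. It does not prune local-time types that become unused, and timestamps earlier than the first kept transition will resolve to whatever the reading library picks by default.

```python
import struct

# TZif header: magic, version byte, 15 reserved bytes, then six big-endian
# 32-bit counts: isutcnt, isstdcnt, leapcnt, timecnt, typecnt, charcnt.
HEADER = struct.Struct(">4sc15x6I")


def _trim_block(data, offset, time_size, cutoff):
    """Rewrite one header+data block; return (new_block, end_offset)."""
    (magic, version, isutcnt, isstdcnt,
     leapcnt, timecnt, typecnt, charcnt) = HEADER.unpack_from(data, offset)
    if magic != b"TZif":
        raise ValueError("not a TZif block")
    code = "q" if time_size == 8 else "l"
    pos = offset + HEADER.size
    times = struct.unpack_from(">%d%s" % (timecnt, code), data, pos)
    pos += timecnt * time_size
    types = data[pos:pos + timecnt]          # one type index per transition
    pos += timecnt
    # Everything after the transition tables is copied through unchanged:
    # ttinfo records, abbreviations, leap records, std/wall and UT/local flags.
    rest_len = (typecnt * 6 + charcnt + leapcnt * (time_size + 4)
                + isstdcnt + isutcnt)
    rest = data[pos:pos + rest_len]
    # Index of the last transition at or before the cutoff (0 if none).
    first = 0
    for i, t in enumerate(times):
        if t <= cutoff:
            first = i
    kept = times[first:]
    block = (HEADER.pack(magic, version, isutcnt, isstdcnt,
                         leapcnt, len(kept), typecnt, charcnt)
             + struct.pack(">%d%s" % (len(kept), code), *kept)
             + types[first:] + rest)
    return block, pos + rest_len


def trim_tzif(data, cutoff):
    """Return a copy of TZif bytes without transitions well before cutoff."""
    block, pos = _trim_block(data, 0, 4, cutoff)       # 32-bit block
    out = [block]
    if data[4:5] >= b"2":                              # version 2 or later
        block, pos = _trim_block(data, pos, 8, cutoff)  # 64-bit block
        out.append(block)
        out.append(data[pos:])                         # footer (TZ string)
    return b"".join(out)
```

Reading a compiled zone file as bytes, passing it through `trim_tzif(data, cutoff)` with the cutoff in seconds since the epoch, and writing the result back would be the whole filter; the output should be checked with a tool such as zdump before it is trusted.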
On May 18, 2017, at 12:16 PM, Viktor Sergiienko <singalen@gmail.com> wrote:
Hello,
I'm developing for a pretty small-sized embedded system, and 2.9M of binaries is a bit too much for us. The system never uses a date before 2007.
I estimate I can slim it down to 500K by cutting down the old data.
1. I made a script to cut down the source data: all the rules and all the zonelines before 2001. Compiling this data, I get a lot of errors like:
"northamerica", line 2537: %s in ruleless zone
"northamerica", line 2618: unruly zone
because some zones are left without rules, but with %s format in them.
2. I tried to modify zic itself, but apparently understanding it will take me days. Just setting min_year to 2001 results in a bunch of
can't determine time zone abbreviation to use just after until time
Does anyone have a ready solution?
Not completely ready, but...
I had the same need some time ago. I made a small change to zic to add a switch that says "omit data from before Y" (with, in our case, Y=2001). That produces files that are much shorter, but also a lot of duplicates -- because often two zones differ only in early rules. So I create hardlinks for any duplicate files. The result is about 100kB of actual data.
I don't remember if I tried it at the source level, as you did. I think that's a bit messy for cases where the most recent rule is from before your cutoff. The change in zic was pretty easy.
paul
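The hardlink step described above can be sketched as below. This is not the script actually used in that setup, only a minimal illustration; the function name `link_duplicates` is hypothetical, and it assumes the output tree lives on a single filesystem that supports hard links.

```python
import hashlib
import os


def link_duplicates(root):
    """Replace byte-identical regular files under root with hard links.

    Returns the number of files that were replaced by a link.
    """
    first_seen = {}          # content digest -> path of the copy we keep
    replaced = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).digest()
            keeper = first_seen.setdefault(digest, path)
            # Skip the kept copy itself and files that already share its inode.
            if keeper == path or os.path.samefile(keeper, path):
                continue
            tmp = path + ".tmp"
            os.link(keeper, tmp)     # create the link under a temporary name
            os.replace(tmp, path)    # then atomically swap it into place
            replaced += 1
    return replaced
```

Running it a second time on the same tree should report zero replacements, since the duplicates already share an inode with the copy that was kept.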
Cutting out old data may not buy as much as you'd like. Putting a "timecnt = 0" at the top of "writezone" in "zic.c" and then comparing the produced directories of the original ("tzraw") and modified ("tzcooked") yields:
Script started on Thu, May 18, 2017 1:12:37 PM
$ du -s tzraw/tmp/etc tzcooked/tmp/etc
3103 tzraw/tmp/etc
2356 tzcooked/tmp/etc
$ exit
exit
Script done on Thu, May 18, 2017 1:12:49 PM
Limited savings is due to disk sector size; as an example, the "America/New_York" produced by an unmodified zic weighs in at 3545 bytes; on a 4096-byte-sector system, the one sector it takes can't be reduced.
(Your correspondent is enough of a fossil to have lived in the age of 512-byte sectors.)
If space is at a premium, be sure to "make REDO=posix_only ..." when building.
@dashdashado
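The sector-size point can be checked with a little arithmetic: a file occupies a whole number of allocation units, so its on-disk footprint is its size rounded up to the next multiple of the unit. A minimal sketch follows; the helper name `allocated` is hypothetical, and it ignores filesystem metadata and any tail-packing the filesystem may do.

```python
def allocated(size, block):
    """Bytes occupied on disk by a file of `size` bytes with `block`-byte units."""
    return -(-size // block) * block   # ceiling division, then scale back up


# The 3545-byte America/New_York file from the message above:
for block in (512, 1024, 4096):
    print(block, allocated(3545, block))
```

For the 3545-byte file this gives 3584 bytes with 512-byte units and 4096 bytes with either 1024-byte or 4096-byte units, so trimming only pays off once a file drops below an allocation-unit boundary.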
On May 18, 2017, at 1:29 PM, Arthur David Olson <arthurdavidolson@gmail.com> wrote:
Cutting out old data may not buy as much as you'd like. Putting a "timecnt = 0" at the top of "writezone" in "zic.c" and then comparing the produced directories of the original ("tzraw") and modified ("tzcooked") yields:
Script started on Thu, May 18, 2017 1:12:37 PM
$ du -s tzraw/tmp/etc tzcooked/tmp/etc
3103 tzraw/tmp/etc
2356 tzcooked/tmp/etc
$ exit
exit
Script done on Thu, May 18, 2017 1:12:49 PM
Limited savings is due to disk sector size; as an example, the "America/New_York" produced by an unmodified zic weighs in at 3545 bytes; on a 4096-byte-sector system, the one sector it takes can't be reduced.
(Your correspondent is enough of a fossil to have lived in the age of 512-byte sectors.)
If space is at a premium, be sure to "make REDO=posix_only ..." when building.
@dashdashado
That's a point. Note though that removing old information will also make for a bunch of duplicates, which reduces the total storage needed. Also, if you can use a storage system that packs the data, the sector issue may not be there. I could imagine, for example, storing the zone data in a zip file and extracting the desired file when the user says "I want to use zone America/Thule". In our own case, I used a dense file system similar to Linux's "romfs".
paul
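The zip-file idea above could be prototyped along these lines. This is a sketch using Python's standard `zipfile` module, not what was deployed in the romfs-based setup; the helper names `pack_zones` and `read_zone` are hypothetical, and the archive is held in memory only to keep the example self-contained.

```python
import io
import zipfile


def pack_zones(files):
    """Build a deflate-compressed zip from a mapping of zone name -> bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in sorted(files.items()):
            zf.writestr(name, data)
    return buf.getvalue()


def read_zone(archive, name):
    """Extract a single zone from the archive without unpacking the rest."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return zf.read(name)
```

Because a zip archive carries a central directory, `read_zone(archive, "America/Thule")` only has to decompress that one member, which is what makes the format suitable for on-demand lookup.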
On Thu, May 18, 2017 at 10:39 AM, <Paul.Koning@dell.com> wrote:
Limited savings is due to disk sector size; as an example, the "America/New_York" produced by an unmodified zic weighs in at 3545 bytes; on a 4096-byte-sector system, the one sector it takes can't be reduced.
That's a point. Note though that removing old information will also make for a bunch of duplicates, which reduces the total storage needed. Also, if you can use a storage system that packs the data, the sector issue may not be there. I could imagine, for example, storing the zone data in a zip file and extracting the desired file when the user says "I want to use zone America/Thule". In our own case, I used a dense file system similar to Linux's "romfs".
Yep, or we can mount an archive as is, with archivemount or something. The file slack can be dealt with, as long as the data is slimmed. Thanks!
One quick-and-dirty possibility: in zic.c, set "early_time" to one billion (and downgrade the error about leap seconds before the big bang to a warning).
(While quick and dirty, this is considerably more refined than setting timecnt to zero.)
@dashdashado
On Thu, May 18, 2017 at 1:59 PM, Viktor Sergiienko <singalen@gmail.com> wrote:
Yep, or we can mount an archive as is, with archivemount or something. The file slack can be dealt with, as long as the data is slimmed.
Thanks!
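For reference, "one billion" read as seconds since the Unix epoch falls in September 2001, which is why it lines up with the 2001 cutoff discussed in this thread. A quick check (the helper name `utc_from_epoch` is hypothetical):

```python
from datetime import datetime, timezone


def utc_from_epoch(seconds):
    """Convert seconds since 1970-01-01T00:00:00Z to an aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


print(utc_from_epoch(1_000_000_000).isoformat())  # 2001-09-09T01:46:40+00:00
```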
Thanks a lot, Arthur!
We have a 1k sector on older systems. I based my estimation on that, and the total count of files. Plus, our filesystem keeps symlinks in directory, not in a separate sector. If I turn more timezones into symlinks, that must save some space too.
<<On Thu, 18 May 2017 09:16:03 -0700, Viktor Sergiienko <singalen@gmail.com> said:
I'm developing for a pretty small-sized embedded system, and 2.9M of binaries is a bit too much for us. The system never uses a date before 2007.
I estimate I can slim it down to 500K by cutting down the old data.
If you already have some sort of compression library on this system, you can just store up a compressed copy of the tzdata files and save far more.
Results with various compression formats:
-rw-r--r-- 1 wollman users  86292 May 18 14:29 foo.cpio.xz
-rw-r--r-- 1 wollman users 317317 May 18 14:28 foo.tar.Z
-rw-r--r-- 1 wollman users 126190 May 18 14:27 foo.tar.bz2
-rw-r--r-- 1 wollman users 182065 May 18 14:27 foo.tar.gz
-rw-r--r-- 1 wollman users  89776 May 18 14:27 foo.tar.xz
-rw-r--r-- 1 wollman users 379700 May 18 14:26 foo.zip
ZIP, while the largest, supports random access.
-GAWollman
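A comparison like the one above can be reproduced for any set of zone files with Python's standard `tarfile` module. This is a sketch with a hypothetical helper name, `tar_sizes`; it covers only the gzip, bzip2 and xz variants (no compress(1), cpio or zip), and the resulting sizes depend on the tzdata version and compressor settings, so they will not match the figures quoted in the message.

```python
import io
import tarfile


def tar_sizes(files):
    """Return archive size in bytes per compression mode.

    `files` maps member name -> bytes; archives are built in memory.
    """
    sizes = {}
    for mode in ("", "gz", "bz2", "xz"):
        buf = io.BytesIO()
        with tarfile.open(fileobj=buf, mode="w:" + mode) as tar:
            for name, data in sorted(files.items()):
                info = tarfile.TarInfo(name)
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))
        sizes[mode or "none"] = len(buf.getvalue())
    return sizes
```

Passing in a dictionary built by reading every file under the compiled zoneinfo directory gives a quick idea of which compressor is worth carrying on the target.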
On 18/05/17 19:30, Garrett Wollman wrote:
If you already have some sort of compression library on this system, you can just store up a compressed copy of the tzdata files and save far more.
Results with various compression formats:
-rw-r--r-- 1 wollman users  86292 May 18 14:29 foo.cpio.xz
-rw-r--r-- 1 wollman users 317317 May 18 14:28 foo.tar.Z
-rw-r--r-- 1 wollman users 126190 May 18 14:27 foo.tar.bz2
-rw-r--r-- 1 wollman users 182065 May 18 14:27 foo.tar.gz
-rw-r--r-- 1 wollman users  89776 May 18 14:27 foo.tar.xz
-rw-r--r-- 1 wollman users 379700 May 18 14:26 foo.zip
ZIP, while the largest, supports random access.
An alternative to ZIP is 7z (or p7zip), which also supports random access and seems to produce sizes similar to your foo.cpio.xz and foo.tar.xz.
--
-=( Ian Abbott @ MEV Ltd. E-mail: <abbotti@mev.co.uk> )=-
-=( Web: http://www.mev.co.uk/ )=-
participants (7)
- Arthur David Olson
- Bradley White
- Garrett Wollman
- Ian Abbott
- Paul Eggert
- Paul.Koning@dell.com
- Viktor Sergiienko