Could there be a "Version 4" of the compiled TZif files in which the ints and time_ts have the correct endianness for the platform on which they're installed?
I imagine a zic command-line argument that selects:
- always big-endian (the default for compatibility),
- always little-endian, or
- the correct endianness for the platform on which zic is running.
This wouldn't let users rip out all their endianness code since not all TZif files would be version 4; but it might reduce running time in programs that read lots of TZif files.
Does this make sense, or is it just premature optimization?
--Bill Seymour
Hi Bill,
I don’t have any strong feelings either way on your idea, but it would have to be a version 5 because v4 is already baked:
https://datatracker.ietf.org/doc/draft-murchison-rfc8536bis/
On Fri, Mar 8, 2024, at 10:52 AM, Bill Seymour via tz wrote:
Could there be a "Version 4" of the compiled TZif files in which the ints and time_ts have the correct endianness for the platform on which they're installed?
I imagine a zic command-line argument that selects:
- always big-endian (the default for compatibility),
- always little-endian, or
- the correct endianness for the platform on which zic is running.
This wouldn't let users rip out all their endianness code since not all TZif files would be version 4; but it might reduce running time in programs that read lots of TZif files.
Does this make sense, or is it just premature optimization?
--Bill Seymour
--
Kenneth Murchison
Senior Software Developer
Fastmail US LLC
murch@fastmailteam.com
On Fri, Mar 8, 2024 at 10:54 AM Bill Seymour via tz <tz@iana.org> wrote:
it might reduce running time in programs that read lots of TZif files.
It might. A convincing proposal would include an experiment to show that it does.
At first blush, I imagine that reading a file dominates swapping bytes by orders of magnitude.
On 2024-03-08 08:16, Bradley White via tz wrote:
It might. A convincing proposal would include an experiment to show that it does.
At first blush, I imagine that reading a file dominates swapping bytes by orders of magnitude.
If someone goes this route, they should also try aligning the data. The cost of unaligned loads could dominate the cost of swapping bytes.
That being said, any performance gain is likely not worth the compatibility hassle.
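For readers unfamiliar with what "swapping bytes" amounts to in a TZif reader, here is a minimal Python sketch; it is not taken from the tz code. It assumes the header layout described in RFC 8536 (4-byte magic, 1-byte version, 15 reserved bytes, six 32-bit big-endian counts). The helper name `parse_tzif_header` and the sample counts are made up for illustration.

```python
import struct

# RFC 8536 header layout: 4-byte magic "TZif", 1-byte version,
# 15 reserved bytes, then six 32-bit unsigned counts in big-endian order.
# The ">" prefix selects big-endian with no alignment padding.
HEADER = struct.Struct(">4sc15x6I")
COUNT_NAMES = ("isutcnt", "isstdcnt", "leapcnt", "timecnt", "typecnt", "charcnt")


def parse_tzif_header(data: bytes) -> dict:
    """Decode the fixed-size TZif header (hypothetical helper)."""
    magic, version, *counts = HEADER.unpack_from(data)
    if magic != b"TZif":
        raise ValueError("not a TZif file")
    return {"version": version, **dict(zip(COUNT_NAMES, counts))}


# Synthetic header with made-up counts, for illustration only.
sample = HEADER.pack(b"TZif", b"2", 0, 0, 0, 3, 2, 8)
header = parse_tzif_header(sample)
print(HEADER.size, header["timecnt"])  # 44 3
```

On a little-endian host each of those six counts is byte-swapped on decode; a native-order format would skip that step but would not by itself change how the data that follows is aligned.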
Bill Seymour via tz <tz@iana.org> writes:
Could there be a "Version 4" of the compiled TZif files in which the ints and time_ts have the correct endianness for the platform on which they're installed?
This'd break the fairly widespread habit of installing the files under /usr/share (which is supposed to contain only architecture-independent files). I think you'd need a *very* convincing performance argument to support changing that.
regards, tom lane
Architecture independence could be maintained by adding an "endian" element to the header. But readers currently in the field wouldn't use the information, so they'd mishandle files with the new byte ordering.
A file naming convention (for example, suffixing "-bigendian" or "-littleendian") is another path.
Given the small effect on run time, sticking with the current ordering seems best.
@dashdashado
On Fri, Mar 8, 2024, 12:21 PM Tom Lane via tz <tz@iana.org> wrote:
Bill Seymour via tz <tz@iana.org> writes:
Could there be a "Version 4" of the compiled TZif files in which the ints and time_ts have the correct endianness for the platform on which they're installed?
This'd break the fairly widespread habit of installing the files under /usr/share (which is supposed to contain only architecture-independent files). I think you'd need a *very* convincing performance argument to support changing that.
regards, tom lane
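The compatibility problem with an "endian" header element can be illustrated with a small Python sketch. Everything about the flag here is hypothetical: TZif defines no such field, and the choice of the first reserved byte, the value `b"L"`, and the version byte `b"5"` are assumptions made only for this example. The sketch compares a reader that honors the flag with a reader already in the field that always assumes big-endian.

```python
import struct

TIMECNT_OFFSET = 32  # 20-byte preamble + isutcnt, isstdcnt, leapcnt (4 bytes each)


def make_header(timecnt: int, little: bool) -> bytes:
    """Build a synthetic header carrying a HYPOTHETICAL endian flag.

    Real TZif has no such flag; here the first reserved byte (offset 5)
    holds b"L" for little-endian data, and NUL means big-endian.
    """
    order, flag = ("<", b"L") if little else (">", b"\0")
    # b"5" is a hypothetical future version number, not a real one.
    return struct.pack(order + "4scc14x6I", b"TZif", b"5", flag,
                       0, 0, 0, timecnt, 0, 0)


def timecnt_new_reader(data: bytes) -> int:
    """Reader that knows about the hypothetical flag."""
    order = "<" if data[5:6] == b"L" else ">"
    return struct.unpack_from(order + "I", data, TIMECNT_OFFSET)[0]


def timecnt_old_reader(data: bytes) -> int:
    """Reader already in the field: always big-endian, ignores the flag."""
    return struct.unpack_from(">I", data, TIMECNT_OFFSET)[0]


le_file = make_header(timecnt=3, little=True)
print(timecnt_new_reader(le_file))  # 3
print(timecnt_old_reader(le_file))  # 50331648 (0x03000000), silently wrong
```

The old reader does not fail cleanly; it simply sees a huge transition count, which is the mishandling described above.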
One ugly option for maintaining device independence: compiled files with the data in one endian order followed by a copy in the other order.-S
@dashdashado
On Fri, Mar 8, 2024 at 12:31 PM Arthur David Olson <arthurdavidolson@gmail.com> wrote:
Architecture independence could be maintained by adding an "endian" element to the header. But readers currently in the field wouldn't use the information, so they'd mishandle files with the new byte ordering.
A file naming convention (for example, suffixing "-bigendian" or "-littleendian") is another path.
Given the small effect on run time, sticking with the current ordering seems best.
@dashdashado
On Fri, Mar 8, 2024, 12:21 PM Tom Lane via tz <tz@iana.org> wrote:
Bill Seymour via tz <tz@iana.org> writes:
Could there be a "Version 4" of the compiled TZif files in which the ints and time_ts have the correct endianness for the platform on which they're installed?
This'd break the fairly widespread habit of installing the files under /usr/share (which is supposed to contain only architecture-independent files). I think you'd need a *very* convincing performance argument to support changing that.
regards, tom lane
On Mar 8, 2024, at 7:52 AM, Bill Seymour via tz <tz@iana.org> wrote:
Could there be a "Version 4" of the compiled TZif files in which the ints and time_ts have the correct endianness for the platform on which they're installed?
In the very early days of the project, the files were in the byte order of the host on which zic ran.
I was in the OS group at Sun at that time; when I discovered the project, I decided to use it in SunOS 4.0, which was under development at that time. SunOS 4.0 removed Sun's old ND (network disk) protocol, which was used for the root file system for diskless workstations, replacing it with NFS. At the time, Sun also decided to reorganize the directory layout of the system; many of the conventions used in most UN*Xes at the time, such as:
- /sbin and /usr/sbin;
- /usr/share;
- /var;
were introduced in that reorganization.
/usr/share was introduced for files that were platform-independent, so that diskless workstations with different instruction sets could all use the same versions of those files on a file server. I decided to store the tzdb files under /usr/share.
The machines Sun sold *at that time* were all big-endian, so byte order would not be a problem for them. *However*, Sun was also developing their 80386-based Sun386i line of workstations; x86 processors are little-endian, so that would *make* byte order a problem. I decided to change the file format to store multi-byte integral values in network byte order, i.e. big-endian format, and changed the code to support that. I submitted that patch to Arthur, and it was accepted.
I.e., at the time, there were cases where the platform on which the files were installed, in the sense of "the machine to which the disks on which the files are stored are attached", was not the platform running the code that reads the files, and might not have the same byte order as the machine on which the files are stored.
See https://mm.icann.org/pipermail/tz/1986-November/000422.html for the message announcing that:
The important differences:
* There's a new format for the binary versions of time zone information files, designed to allow the files to be used by both big-endian and little-endian machines in shared file environments.
...
Diskless workstation support using NFS isn't much of a thing these days, so that rationale might be less important, although going with native-byte-order tzdb files *would* mean that they should be moved out of /usr/share on systems that store them there - but my UN*X box, from some obscure UN*X-box company in Cupertino, stores them under /var/db/timezone/zoneinfo/ anyway.
This wouldn't let users rip out all their endianness code since not all TZif files would be version 4; but it might reduce running time in programs that read lots of TZif files.
Does this make sense, or is it just premature optimization?
I'd call it premature to the extent that we don't know whether it'd significantly reduce running time in programs of that sort. Somebody should probably do some tests with both host-native and big-endian files to see what performance difference it makes.
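A rough starting point for such a test, sketched in Python under stated assumptions: the buffer size `N`, the helper `time_decode`, and the repeat counts are all made up, and the data is synthetic rather than read from real TZif files. It times only the decode step, so it measures interpreter overhead as much as byte swapping; a convincing experiment would use a C reader and include the cost of opening and reading the files.

```python
import struct
import timeit

N = 2000  # made-up number of 64-bit transition times
values = list(range(N))
big = struct.pack(f">{N}q", *values)     # TZif's on-disk order
native = struct.pack(f"={N}q", *values)  # host byte order, standard sizes

decode_big = struct.Struct(f">{N}q").unpack
decode_native = struct.Struct(f"={N}q").unpack


def time_decode(fn, buf, repeats=200):
    """Best-of-5 average seconds per call for fn(buf) (hypothetical helper)."""
    best = min(timeit.repeat(lambda: fn(buf), number=repeats, repeat=5))
    return best / repeats


t_big = time_decode(decode_big, big)
t_native = time_decode(decode_native, native)
print(f"big-endian decode: {t_big:.2e} s, native decode: {t_native:.2e} s")
```

Comparing either figure with the time taken to open and read a file of the same size would show whether decoding is a meaningful share of the total.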
participants (7)
- Arthur David Olson
- Bill Seymour
- Bradley White
- Guy Harris
- Ken Murchison
- Paul Eggert
- Tom Lane