we're still having trouble with tzcode using unreasonable amounts of stack. things were somewhat ameliorated by defining the oddly-named ALL_STATE. that causes tzcode to allocate tzload's 'u_t' on the heap instead of the stack. 'u_t' was by far the biggest offender, so that takes about 40KiB off tzload's stack requirements.
unfortunately, later on in tzload there's a 'struct state' allocated on the stack. on LP64, that's about 18KiB, which -- even if it's not 40KiB -- is still unreasonable in one frame! LP32 is about 9KiB less, because our LP32 time_t was only 32-bit, so we've been getting away with it there --- threads with 16KiB stacks could safely call tzcode functions. but this isn't true for LP64, where that one 18KiB frame is already bigger than a 16KiB stack.
(note that these rough figures are with TZ_MAX_TIMES at 1200 --- we haven't yet bumped that to the 2000 it seems to be at head.)
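for illustration, here's a minimal sketch of the stack-versus-heap difference described above. it is not tzcode: 'struct big_state', BIG_TIMES, count_on_stack and count_on_heap are hypothetical names, and the layout is only a stand-in for a large structure sized by a TZ_MAX_TIMES-like constant (assumed here to be 1200, with 64-bit timestamps).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* hypothetical stand-in for a large structure; NOT the real tzcode layout. */
#define BIG_TIMES 1200

struct big_state {
    int timecnt;
    int64_t ats[BIG_TIMES];          /* 1200 * 8 bytes = 9600 bytes */
    unsigned char types[BIG_TIMES];  /* 1200 bytes */
};

/* stack version: the whole structure lives in this function's frame. */
int count_on_stack(int n) {
    assert(n >= 0);
    struct big_state st;
    memset(&st, 0, sizeof(st));
    st.timecnt = n;
    return st.timecnt;
}

/*
 * heap version: the frame holds only a pointer.
 * returns -1 if the allocation fails.
 */
int count_on_heap(int n) {
    assert(n >= 0);
    struct big_state *sp = malloc(sizeof(*sp));
    if (sp == NULL) {
        return -1;
    }
    memset(sp, 0, sizeof(*sp));
    sp->timecnt = n;
    int result = sp->timecnt;
    free(sp);
    return result;
}
```

the trade-off in the heap version is an extra failure path (malloc can return NULL) in exchange for a frame that stays small no matter how large the structure grows.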
--elliott