On Jul 10, 2014, at 12:35 PM, David Patte ₯ <dpatte@relativedata.com> wrote:
> I also agree with Stephen and Derick on this.
> Database records, in any database, should not be merged just because they contain the same data. My feeling is that the relationship of the records in question is coincidental, and therefore the records should remain separate.
Agreed. Optimization should be done as a post-processing step. Keep all the source records distinct; only after they have been turned into their final internal representation for the system that uses the data (for example, "zic" output files on Unix systems) should one look for identical bits and turn them into links. In our system I've done exactly that. This is the right place partly because it's automatic: it depends on data comparison rather than on human effort to keep explicit links up to date. It is also the right place because we use abbreviated data (we keep only zone data from 2001 on), so we end up with a very large number of matches that would not exist if you looked only at the source records.

paul