A deeper dive into the technical details of ZFS dedup. It's not as simple as "one MB of disk, one KB of RAM", though that may be a decent heuristic for some data:
> According to the ZFS dedup FAQ, each entry in the dedup table costs about 320 Bytes of memory per block. To estimate the size of the dedup table, we need to know how many blocks ZFS will need to store our data. This question can be tricky: ZFS uses a variable block size between 512 bytes and 128K, depending on the size of the files it stores. So we can't really know in advance how many blocks ZFS will use for storing our data.
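To make the arithmetic concrete, here's a rough Python sketch of that estimate. The 320 bytes per entry comes from the FAQ quote above; the 64K average block size is purely my assumption, since the real number depends on your data (running `zdb -S` against the pool is how you'd get actual block counts).

```python
# Rough sketch of the dedup-table RAM estimate described above.
# The 320 bytes/entry figure is from the ZFS dedup FAQ; the average
# block size is an assumption, since ZFS uses variable blocks
# between 512 bytes and 128K.

def estimate_ddt_ram(data_bytes, avg_block_size=64 * 1024, bytes_per_entry=320):
    """Return an estimated dedup table size in bytes for the given data size."""
    blocks = data_bytes / avg_block_size
    return blocks * bytes_per_entry

# Example: 1 TiB of data at an assumed 64K average block size
tib = 1024 ** 4
print(f"~{estimate_ddt_ram(tib) / 1024 ** 3:.1f} GiB of DDT for 1 TiB of data")
# -> ~5.0 GiB, i.e. the estimate is very sensitive to the block-size guess
```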
It looks to me from my guix+nix experiment that ZFS doesn't deduplicate across datasets in the same pool. But these sources say it actually does. I will have to experiment further.
I did experiment more, and it seems the deduplication works as advertised. It's just that this particular machine's guix+nix is less redundant than I expected, apparently. :-)
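For anyone wanting to repeat the check, here's a minimal sketch of what I looked at, assuming a pool named "tank" (the name is just a placeholder): the dedupratio property is pool-wide, so identical blocks stored in two different datasets of the same pool should show up in it.

```python
# Minimal sketch: read the pool-wide dedup ratio via the zpool CLI.
# Assumes a pool named "tank"; dedupratio covers the whole pool, so
# duplicate blocks across datasets in that pool count toward it.
import subprocess

def dedup_ratio(pool="tank"):
    out = subprocess.run(
        ["zpool", "get", "-H", "-o", "value", "dedupratio", pool],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()  # e.g. "1.35x"

if __name__ == "__main__":
    print(f"dedup ratio for tank: {dedup_ratio()}")
```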
@clacke @hattiecat @bob @samis @h @gemlog @cstanhope Just caught up - thanks for the info. I run ZFS on a FreeBSD box but to be honest am only using a fraction of its potential.
I also want to try DragonFly BSD's HAMMER for file versioning (something I've missed since VMS) but can't get it to run in a VM and have no spare hardware to try it with :-(