After migrating to a new laptop I found that I'd made several duplicates of some fairly large (but important) data during the migration, to make sure I did not lose it. But now that everything is copied across, and I've been using the new laptop for a couple of months, the drive is looking rather full (90% full is a steady state for hard drives!). So I wanted to tidy up some of the duplication and recover some space.

An obvious thing to do with "similar, but not quite identical" directory trees is to turn all the common files into hardlinks to each other. There are specialised tools (like dupfiles) for doing this, and I've written my own in the past to deal with various file renames (like iPhone/iPad backups, which use hashed filenames). But for the simple case of identical filenames in an identical structure, there is an easy solution: rsync --link-dest=... (which can be used to make a Poor Man's Time Machine; another backup example).

An example (using something similar to my actual case):

cd /bkup/
mv photos photos.old
mkdir photos
rsync -av --link-dest=/photos photos.old/ photos/

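What this does is, for every file under /bkup/photos.old that has an identical counterpart at the same relative path under /photos (same content and same preserved attributes, such as modification time and permissions), hard link in the file from /photos. Every other file is copied from /bkup/photos.old. Since hard links cannot span filesystems, /bkup and /photos need to be on the same volume for this to work. If the two are mostly common (eg, one is an older copy of the other, as was my case), most of the files will end up hard linked. You can check the quantity of hard links with: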

cd /bkup
find photos -type f -links 1 | wc -l
find photos -type f | wc -l

Where the first number tells you the files that are "one of a kind" (ie not hard linked), and the second tells you the total number of files. Ideally you'll find that, eg, 95% of the files are now hard links.
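The two counts can be combined into a single percentage. This is only a minimal sketch, assuming a POSIX shell and find; the function name linked_percent is made up for illustration, and filenames containing newlines would throw the count off:

```shell
# Hypothetical helper: report what share of the regular files under a
# directory have more than one hard link.
linked_percent() {
    dir=$1
    # $(( ... )) also strips the leading whitespace some wc versions print
    total=$(( $(find "$dir" -type f | wc -l) ))
    single=$(( $(find "$dir" -type f -links 1 | wc -l) ))
    if [ "$total" -eq 0 ]; then
        echo "no files under $dir"
        return 1
    fi
    # Integer arithmetic, so the percentage is rounded down
    echo "$(( (total - single) * 100 / total ))% of $total files are hard linked"
}
```

In my case it would be run as: linked_percent /bkup/photos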

When you're happy the hard links are in place, you can then double check that the same files are found in the old version and the new version:

cd /bkup
diff -r photos/ photos.old/

and if it doesn't show any differences, you should be fine to remove the old version:

cd /bkup
rm -r photos.old

At this point you'd expect to recover the space used by the one off copies that were in /bkup/photos (which became /bkup/photos.old), since they are no longer separate copies -- just references to files that already existed on the hard drive.
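A link count above 1 only says that a file has some other name somewhere. To see exactly which files under /bkup/photos are not shared with /photos (and so still take up their own space), each pair can be compared with the shell's -ef test, which is true when two paths refer to the same device and inode. A rough sketch, assuming a shell whose test supports -ef (bash, zsh and dash do; it is not strictly POSIX); the function name unlinked_files is hypothetical:

```shell
# Hypothetical helper: list files under the first tree that are NOT hard
# links to the file at the same relative path in the second tree.
unlinked_files() {
    src=$1   # tree expected to contain the hard links, e.g. /bkup/photos
    ref=$2   # reference tree, e.g. /photos
    ( cd "$src" && find . -type f ) | while IFS= read -r f; do
        # -ef fails if the paths are different inodes, or if the
        # counterpart in the reference tree does not exist at all
        [ "$src/$f" -ef "$ref/$f" ] || printf '%s\n' "$f"
    done
}
```

For the example here that would be: unlinked_files /bkup/photos /photos
An empty result would mean everything in the backup is a hard link to the original.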

But if you're running OS X 10.9 (Mavericks; or it seems 10.7 -- Lion -- or later) then the magic space recovery does not happen as immediately as expected on other systems. Emptying the trash, logging out, or rebooting, to try to free up references to the now deleted files, do not meaningfully help. Which is surprising.

The answer to this surprise seems to be Time Machine Local Snapshots (more detail; they show as "white" ticks in the Time Machine view, as opposed to the "purple" ticks for backups on an external drive). These Local Snapshots are references to files held on the local hard drive (and apparently mounted as a loopback mount on /Volumes/MobileBackups). The effect is as if there are still copies of the files that you have deleted -- so they keep taking up space on the hard drive. (Other lost storage space possibilities.)

It is possible to see how much space is being consumed by these local backups by going to: Apple -> About This Mac -> More Info... -> Storage. That is the Storage tab of the more detailed "About This Mac" window. (Do not go into "System Report..." on the Overview tab, no matter how tempting it looks, as that is only a hardware breakdown -- not a usage breakdown.)

In the resulting graphic, the "Backups" section on the main hard drive (usually "Macintosh HD", and/or "Flash Storage") is the Local Backups.

In my case these "Local Snapshots" are currently taking up about 20% of the drive! I assume that if I had looked just before deleting the /bkup/photos.old, it would have been taking up much less, and after deleting /bkup/photos.old it would have jumped up to almost the size of /bkup/photos.old.

Time Machine will keep these local snapshots while there is at least 20% free space on the local drive. Once the free space drops below 20%, it will start removing older snapshots. And if it falls below 10% free, then removing snapshots will be given a higher priority. In addition the local snapshots are regularly consolidated down to one per day (after a day or so), and then to one per week. In my case there are backups held back a full month on the local drive. (In /Volumes/MobileBackups/Backups.backupdb/${HOSTNAME} there are directories named YYYY-MM-DD-HHMMSS for when each snapshot was taken, which helps identify how new or old they are.)

Eventually the last reference to a deleted file will be removed, at which point the free space will magically go up. But it is most likely that this will happen either after about a month, or when the space is needed by something else (the local snapshots act like a weak reference) -- in which case there may not be much change in free space.

So it turns out the steady state of an OS X Mavericks system drive really is 80-90% full!

(It is possible to turn off local snapshots manually from the command line, using tmutil, but on a laptop they're actually quite useful so I've left them on. I just need to remember to check the disk usage of snapshots before wondering why deleting lots of files has not freed up much space :-) )