SVN vs. Mercurial vs. Git For Managing Your Home Directory
For several years I've kept the bulk of my home directory in a revision control system. This allows me to synchronize my files across the two machines I use commonly, keep a backup on my home NAS box, and have complete revision history of files.
There's a price, however: the SCM keeps metadata on my machines, and this can add up. Plus there's the time needed to commit files. When it became clear I needed to switch away from Subversion because it doesn't cooperate with iWork files, I decided to look into alternatives.
Mercurial and Git appeared to be the best solutions, but there's quite the holy war going on between the two: Git's confusing, Mercurial is slow, and so on. I decided to run some of my own tests and let the data speak for itself.
Update 2008.04.25: Adding results for Bazaar.
Home Directories vs. Source Code
Keep in mind that managing a home directory is different from managing source code. I consider source code management an entirely different problem: company processes, branching/merging, platform compatibility, etc. are just as important as commit time and repository size. A home directory, on the other hand, is all about me. I sync my machines but that's about it, and I rarely need to branch.
The distribution of files is different, too. Source code tends to have many small, easily compressible text files. Home directories tend toward fewer and larger files — in my case, my photo library and design projects are a real problem. When I'm working on a large Photoshop file and committing several revisions, I need the SCM's binary storage/differencing engine to handle that efficiently.
Last, change tracking is a bit more lenient with home directories. I may shuffle some stuff around, and I don't need to explain the changes to anyone else. I'd like to tell the SCM "just make the current version look like this." Some GUIs do this well, as does Mercurial's "addremove" command.
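To make the "just make the current version look like this" idea concrete, here is a minimal sketch of the bookkeeping an addremove-style command has to do: compare the set of tracked paths with what is on disk, then schedule the difference. The function name `plan_addremove` and its parameters are hypothetical and are not Mercurial's API. A real tool would also skip its own metadata directory and honor ignore rules, which this sketch does not.

```python
from pathlib import Path


def plan_addremove(root, tracked):
    """Return (to_add, to_remove) so the tracked set matches the files on disk.

    `root` is the directory being managed and `tracked` is a collection of
    paths relative to it. Both names are illustrative assumptions.
    """
    root = Path(root)
    # Every regular file currently present under root, as a relative POSIX path.
    on_disk = {
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file()
    }
    tracked = set(tracked)
    to_add = sorted(on_disk - tracked)      # present on disk, unknown to the SCM
    to_remove = sorted(tracked - on_disk)   # known to the SCM, gone from disk
    return to_add, to_remove
```

Nothing has to be explained to the tool file by file: whatever the directory looks like now becomes the next revision.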
Testing Add Time and Repository Growth
My first test is adding piles of files and watching repository growth. I used three sets of files for this test, with the intent of covering large binary files down to smaller text files:
Digital Negative (DNG) files: 501MB for 134 files. Median file size 3.4MB, mean 3.8MB.
JPEG files: 500MB for 1301 files. Median file size 228KB, mean 405KB.
Pile of PDFs, source code, office docs: 500MB for 6069 files. Median file size 4KB, mean 24KB.
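For anyone who wants to characterize their own file sets the same way, a summary like the one above (file count, total size, median, mean) can be computed with a few lines of Python. This is a sketch and not the script used for these tests. The function name `size_summary` is made up for illustration.

```python
import statistics
from pathlib import Path


def size_summary(root):
    """Summarize the sizes of all regular files under `root` (hypothetical helper)."""
    sizes = [p.stat().st_size for p in Path(root).rglob("*") if p.is_file()]
    if not sizes:
        raise ValueError(f"no files found under {root}")
    return {
        "files": len(sizes),
        "total_bytes": sum(sizes),
        "median_bytes": statistics.median(sizes),
        "mean_bytes": statistics.mean(sizes),
    }
```

A median far below the mean, as with the document set, means a handful of large files account for much of the total.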
I added the files one set at a time. In all configurations the repository is on local disk. The full test protocol and output are at the bottom of this page for the curious.
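Timing each add and commit needs nothing more than a wall-clock wrapper around the command. The sketch below is one way to do it, not the protocol used here. `time_command` and `format_elapsed` are hypothetical helper names, and the command you pass in is up to you.

```python
import subprocess
import sys
import time


def time_command(argv, cwd=None):
    """Run a command to completion and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(
        argv,
        cwd=cwd,
        check=True,                      # raise if the command fails
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return time.perf_counter() - start


def format_elapsed(seconds):
    """Format seconds in the 'Xm Ys' style used in the timing tables."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes}m {secs}s"


if __name__ == "__main__":
    # Harmless stand-in command; substitute the SCM add/commit you want to time.
    elapsed = time_command([sys.executable, "-c", "pass"])
    print(format_elapsed(elapsed))
```

Wall-clock time is the right measure here, since what matters for a home directory is how long you sit waiting.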
Time Required To Add + Commit
The purpose of this test is simply to look at time required to add each data set to the repository.
|SCM Tool||DNG files||JPEG files||Document files||Repack (Git only)|
|Subversion||2m 30s||4m 54s||20m 13s||—|
|Mercurial||1m 33s||1m 54s||1m 59s||—|
|Git||1m 6s||1m 30s||1m 29s||9m 0s|
|Bazaar||1m 25s||1m 38s||1m 35s||—|
You can see Mercurial and Git are noticeably faster than Subversion, and scale much better for large quantities of small files. I've seen arguments that Git is faster than Mercurial; this data indicates that it's faster at adds but not hugely so. If you count repacking Git's repository, however, the argument goes the other way, with Mercurial the clear leader.
Update 2008.04.25: Bazaar looks very good here, too.
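To compare the tools across all three file sets, the table entries can be converted to seconds and summed. The snippet below only restates the figures from the table above. The variable and function names are illustrative.

```python
import re

# Add + commit times copied from the table above.
ADD_TIMES = {
    "Subversion": ["2m 30s", "4m 54s", "20m 13s"],
    "Mercurial": ["1m 33s", "1m 54s", "1m 59s"],
    "Git": ["1m 6s", "1m 30s", "1m 29s"],
    "Bazaar": ["1m 25s", "1m 38s", "1m 35s"],
}
GIT_REPACK = "9m 0s"


def to_seconds(text):
    """Convert a string such as '4m 54s' to a number of seconds."""
    match = re.fullmatch(r"(\d+)m (\d+)s", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    return int(match.group(1)) * 60 + int(match.group(2))


totals = {tool: sum(to_seconds(t) for t in times) for tool, times in ADD_TIMES.items()}
totals_with_repack = dict(totals)
totals_with_repack["Git"] += to_seconds(GIT_REPACK)

for tool in ADD_TIMES:
    print(f"{tool}: {totals[tool]} s adds, {totals_with_repack[tool]} s including repack")
```

The totals come to 1657 s for Subversion, 326 s for Mercurial, 278 s for Bazaar, and 245 s for Git, or 785 s for Git once the repack is counted.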
Repository Expansion for 500MB Add
The purpose of this test is to see how much the repository grows as each file set is added. In the case of Subversion, note that the working copy is always 2x the size of the working files; all files are duplicated in the .svn directory. The numbers below show the repository size only. So, for each 500MB added, the working copy grows an additional 500MB and the repository grows by the amount shown below.
In the case of Git, I show the incremental size at each add (as expected), and after the last add I also ran git gc to repack the repository.
|SCM Tool||DNG files||JPEG files||Document files||Total size (after repack w/ Git)|
There's little difference between these SCMs in how efficiently they store already-compressed images in the repository. Git is noticeably more efficient with the small document-type files.
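Repository growth of the kind measured in this section can be tracked by recording the on-disk size of the repository directory after each commit and differencing the readings. The following is a sketch under that assumption. `dir_size_bytes` and `growth` are hypothetical helper names, and the path you measure depends on the tool (for Subversion it would be the repository, not the working copy).

```python
import os


def dir_size_bytes(path):
    """Total size in bytes of all regular files under `path`, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def growth(sizes):
    """Turn a list of cumulative sizes into per-step increments."""
    return [later - earlier for earlier, later in zip(sizes, sizes[1:])]
```

Take one reading before the first add and one after each commit. `growth` then yields one increment per file set.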
Testing File Modification Time and Repository Growth
For my use, I'm not as concerned about making changes to lots of small files. My problem is with large image files. If I'm working on a big Photoshop file and want to commit changes often, I want those changes to take minimal space in my local repository.
To test this, I created a reasonably large Photoshop file (starting size 56MB) with several layers, then made several rounds of edits (ending size 71MB), committing the changes between each edit. The same sets of edits were applied for each SCM.
Time Required to Commit Modified File
|SCM Tool||Initial Add||First Change||Second Change||Third Change||Repack|
All SCMs posted respectable numbers here, with Mercurial out in front on committing changes.
Update 2008.04.25: Bazaar does a great job here, too.
Repository Growth With Modified File
The purpose of this test is to look at how much the repository grows each time the Photoshop file is modified. Note that with Subversion the working copy is fixed at 2x the size of the working files; all version history is stored in the repository. This can be a substantial advantage when your repository is on a separate server because you don't have to worry about your local copy growing out of control with many revisions. These Subversion numbers show growth of the repository, not the local copy. Also note that I forgot to take a data point, leading to two "not available" entries in the table below.
|SCM Tool||Initial Add||First Change||Second Change||Third Change||Total size (after repack w/ Git)|
Git turns in some very interesting numbers here. The repository grows significantly with each change (topping out at 197MB before repack) but packs down very tight on repack. The ending repository size is significantly smaller than the ending Photoshop file size (55MB repo, 71MB image).
Update 2008.04.25: Curiously, the Bazaar results track Mercurial's almost exactly. Are they using the same repository format? I did a few web searches but couldn't turn up an answer.
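The effect of Git's repack can be put in proportion using only the three figures quoted above (197MB before repack, 55MB after, 71MB final image). The variable names below are illustrative.

```python
# Figures quoted in the text above, in megabytes.
before_repack_mb = 197
after_repack_mb = 55
final_image_mb = 71

# Share of the unpacked repository that the repack reclaimed.
reclaimed = 1 - after_repack_mb / before_repack_mb

# Packed repository size relative to the final Photoshop file.
repo_vs_image = after_repack_mb / final_image_mb

print(f"repack reclaimed {reclaimed:.0%} of the repository")
print(f"packed repository is {repo_vs_image:.0%} of the final image size")
```

That works out to roughly 72% reclaimed, leaving a repository about 77% the size of the final image even though it holds every revision.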
First, keep in mind that this is testing SCM systems for the purpose of managing a home directory, and the data used in the test is representative of my home directory. Your mileage will vary. I have specifically not focused on managing source code because the bulk of my source code is managed separately with my company's chosen SCM (mostly Subversion).
Looking at these numbers, Subversion finished worse than I expected. The working copy is always 2x the size of the files being managed, which can be a blessing for large binary files with many revisions but a curse for everything else. The repository growth is reasonable; I'm not as concerned about that. Speed-wise, adding files is slow to terrible. Updating large binary files is reasonable.
Git and Mercurial both turn in good numbers but make an interesting trade-off between speed and repository size. Mercurial is fast with both adds and modifications, and keeps repository growth under control at the same time. Git is also fast, but its repository grows very quickly with modified files until you repack — and those repacks can be very slow. But the packed repository is much smaller than Mercurial's.
So there's really no "Git rules, Mercurial sucks" argument or vice-versa. It's more a question of workflow and priorities. In my opinion, Mercurial is easier to set up and use day-to-day. Its "addremove" command, in particular, is a great time-saver. But Git can really squash its repository down with repacking, to a much smaller size than Mercurial's.
Honestly, I was expecting the numbers to reveal a clear-cut answer to which tool I should use. They didn't. So, my recommendation is to look at your workflow, evaluate how Git or Mercurial would fit into it, and pick based on that. Or flip a coin, whichever.
Update 2008.04.24: I've been using Mercurial for several months to manage my home directory and I'm quite happy with it. I may switch to Git, however, for its more compact repository — I haven't decided if it's worth the trouble.
Update 2008.04.25: I added results for Bazaar by popular request. Its results track Mercurial's very closely, so for the basic use I've tested here, I don't see a compelling reason to use one vs. the other.
- MacBook Pro dual 2.2 GHz, 2GB RAM
- Mac OS X 10.5.2
- Mercurial version 0.9.5
- Subversion version 1.4.4 (r25188)
- Git version 126.96.36.199
- Bazaar version 1.3