New Directions in Version Control
Two trends have emerged over the last few years that have had a significant impact on the version control tool space. In the past we’ve seen major shifts as version control tools went from small file-oriented versioning systems like SCCS and RCS to tools like CVS that more easily managed a complete source tree. The move to handling complete source trees brought with it a move to handle networks better, which gave us the ability to use CVS over a network and, later, its successor Subversion. This was pretty much the state of the revision control space for some time.
Direction #1 - Distributed Revision Control
BitKeeper introduced a major innovation whose impact is still playing out, namely a highly distributed repository with local commit capability. This innovation gained major street cred because BitKeeper was used to manage the Linux kernel source for some time. When that relationship ended, a whole set of tools rose to fill the void, including Monotone, Mercurial, Bazaar and Darcs, some of them new and some already in existence. The primary feature of all these tools is their use of local repositories, which lets developers commit locally while also providing a means to share changes between repositories. Of course, the distributed side of these tools is optional, so a project can remain centralized if it chooses.
This architectural direction is likely here to stay. Software organizations are becoming more distributed, not less. Additionally, many young software engineers who grew up participating in open source projects that use these tools chafe at the restrictions imposed by centralized version control tools.
Direction #2 - Whole Tree Version Control
This is likely more controversial. The summary of distributed revision control systems above purposely ignored one of the largest such tools, namely Git, the revision control system written by Linus Torvalds. Git is the tool of choice for managing the Linux kernel and serves as the replacement for BitKeeper. Git is a distributed revision control system at its heart, so it could easily be placed in the distributed category.
What distinguishes Git from the other distributed revision control systems is its unique repository structure and, consequently, how it records changes. All of the above tools, as well as predecessors like CVS, Subversion, RCS and SCCS, think about a software tree as a set of versioned files, and their repositories reflect this. Versioning is conducted at the file level. Thus, if you interrogate the system for the history of changes to the whole tree, it derives that history from the set of changes recorded in each file archive. If you need to extract a “snapshot” or label of the software, the versioning tool has to traverse each file archive and extract the correct version. In this sense, the traditional repository structure stores what is necessary to derive the contents of the whole tree.
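To make the file-level model concrete, here is a minimal sketch in Python. It is not how any particular tool is implemented; the names FileArchive and PerFileRepo are hypothetical, and file contents are kept as whole strings rather than deltas. The point is only that a labeled tree is never stored as such and has to be derived by walking every file archive.

```python
from dataclasses import dataclass, field


@dataclass
class FileArchive:
    """Hypothetical per-file archive: one independent revision list per file."""
    revisions: list[str] = field(default_factory=list)
    labels: dict[str, int] = field(default_factory=dict)  # label -> revision index

    def commit(self, content: str) -> int:
        # Each file gets its own revision numbers, unrelated to other files.
        self.revisions.append(content)
        return len(self.revisions) - 1

    def tag(self, label: str) -> None:
        # A label just remembers which revision of THIS file it points to.
        self.labels[label] = len(self.revisions) - 1


class PerFileRepo:
    """Hypothetical repository made of nothing but per-file archives."""

    def __init__(self) -> None:
        self.archives: dict[str, FileArchive] = {}

    def commit(self, path: str, content: str) -> int:
        return self.archives.setdefault(path, FileArchive()).commit(content)

    def tag(self, label: str) -> None:
        # Labeling the tree means touching every archive.
        for archive in self.archives.values():
            archive.tag(label)

    def checkout(self, label: str) -> dict[str, str]:
        # The snapshot is derived: traverse each archive and pull one revision.
        tree = {}
        for path, archive in self.archives.items():
            if label in archive.labels:
                tree[path] = archive.revisions[archive.labels[label]]
        return tree


repo = PerFileRepo()
repo.commit("README", "v1")
repo.commit("main.c", "int main() {}")
repo.tag("release-1")
repo.commit("README", "v2")
snapshot = repo.checkout("release-1")
```

Note that both tagging and checking out cost one visit per file archive, and that a damaged archive silently changes what every label touching that file resolves to.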
Git completely abandons this model, to its great profit. A change set in Git is not a collection of revisions to files, but rather a snapshot of the whole software tree, with new objects created inside the snapshot to represent changed files. This makes whole-tree operations much faster. It also makes advanced merge capabilities much easier to deal with, because the context for the change (the whole tree) is always in view. Most importantly, though, this is a much safer repository structure, since the actual contents of a tree at a point in time are not derived from the revisions of individual files, which can be corrupt. The repository simply stores what the tree looks like. This is an extremely innovative departure from how version control has been performed.
My bet is that these two trends will continue to impact how version control systems are architected for some time to come.
Relevant Links:
Linus Torvalds on GIT at the Googleplex
A look back: Bram Cohen vs. Linus Torvalds
This post has 8 comments
July 24th, 2007
GIT and Mercurial share the same conceptual model, based on the one from Monotone.
The difference you point out is an implementation difference: storing a whole changeset in the same file will get you faster checkout but slower annotate. Storing history per file, as Mercurial does, will give you the mirror-image results: slower checkout but faster annotate.
The main benefit of the Mercurial layout, from my point of view, is automatic compression, while with Git you have to manually repack once in a while. The real reason the Mercurial developers chose this layout is the reduced number of seeks needed for all common operations, and the fact that the Git layout is nearly worst-case when you do a cp on a repo.
July 24th, 2007
I thought the same, but the Git structure seemed to handle the safety concerns related to a repository becoming corrupted better. Combine that with digitally signing a complete tree in Git and you end up with a pretty resilient repository. Perhaps the structure of a Git repository favors that side of the equation, while Mercurial optimizes for per-file operations. Is that a fair characterization? I think I have heard Linus express the reasons behind Git’s repository structure along similar lines.
July 25th, 2007
Whole tree versioning isn’t unique to Git: Subversion uses it too.
http://svnbook.red-bean.com/nightly/en/svn.basic.in-action.html#svn.basic.in-action.revs
http://svnbook.red-bean.com/nightly/en/svn.forcvs.revnums.html
July 25th, 2007
Thanks for that correction; it looks like Subversion beat Git to the punch. Does Subversion offer any way to sign a change, in a vein similar to Git, to catch possible repository tampering, corruption, etc.?
July 25th, 2007
Found this through Thoof, nice to see some more technical content on there, please keep submitting!
I’ve used both CVS and Subversion, and I must confess I’m having a hard time understanding what the big deal is with Git. Perhaps if someone could cite one or more real-world things that are hard or impossible to achieve with Subversion but easy or easier with Git, it would be very helpful.
July 25th, 2007
I don’t know if Subversion offers signing or not. However, attack-resistance aside, the fact that Subversion uses a whole-tree approach despite being vastly inferior to Git in its capabilities leads me to believe that whole-tree versioning isn’t what makes Git innovative.
In fact, whole-tree versioning seems to be a prerequisite for having atomic commits, which no modern VCS could do without. Even if you’re dealing with a system that stores revisions for individual files (like Mercurial), there has to be a top-level manifest that refers to a set of file revisions, and so you still effectively have whole-tree versioning.
Whole-tree versioning is not a major innovation. All modern VCS’s do it, because it is obvious to anyone working in this space that it is the only sane choice.
Git’s main innovation (and Linus’s main design criterion) seems to be performance. It and Mercurial simultaneously became the only two open-source dVCS’s that could handle huge trees and large histories with good performance.
July 26th, 2007
Paul, thanks for your comment. I just recently ran across Thoof myself. It looks to be an interesting way to find good content. I’ll be sure to post things there in the future.
July 26th, 2007
Hey Josh, I think atomic commits can be implemented on top of a number of different repository structures, such as whole-tree versioning or file-by-file versioning. The key to atomic commits is to avoid “partial change checkins”, whether caused by failures during the commit or by parts of a checkin sneaking into updates while the commit is still in progress. There are lots of implementations of atomic commits out there that don’t do whole-tree versioning in each changeset, especially in the commercial tool space. This is normally done via some kind of transaction technique.
Perhaps part of what makes Git interesting is its combination of distributed revision control with a wholesale belief in whole-tree versioning in each changeset, and its use of this in its merging approach, security via signing, etc.