Preface

Why revision control? Why Mercurial?

Revision control is the process of managing multiple versions of a piece of information. In its simplest form, this is something that many people do by hand: every time you modify a file, save it under a new name that contains a number, each one higher than the number of the preceding version.

Manually managing multiple versions of even a single file is an error-prone task, though, so software tools to help automate this process have long been available. The earliest automated revision control tools were intended to help a single user to manage revisions of a single file. Over the past few decades, the scope of revision control tools has expanded greatly; they now manage multiple files, and help multiple people to work together. The best modern revision control tools have no problem coping with thousands of people working together on projects that consist of hundreds of thousands of files.

The arrival of distributed revision control is relatively recent, and so far this new field has grown due to people's willingness to explore ill-charted territory.

I am writing a book about distributed revision control because I believe that it is an important subject that deserves a field guide. I chose to write about Mercurial because it is the easiest tool to learn the terrain with, and yet it scales to the demands of real, challenging environments where many other revision control tools buckle.

Why use revision control?

There are a number of reasons why you or your team might want to use an automated revision control tool for a project.

- It will track the history and evolution of your project, so you don't have to. For every change, you'll have a log of who made it; why they made it; when they made it; and what the change was.
- When you're working with other people, revision control software makes it easier for you to collaborate. For example, when people more or less simultaneously make potentially incompatible changes, the software will help you to identify and resolve those conflicts.
- It can help you to recover from mistakes. If you make a change that later turns out to be in error, you can revert to an earlier version of one or more files. In fact, a really good revision control tool will even help you to efficiently figure out exactly when a problem was introduced (see section for details).
- It will help you to work simultaneously on, and manage the drift between, multiple versions of your project.

Most of these reasons are equally valid, at least in theory, whether you're working on a project by yourself or with a hundred other people.

A key question about the practicality of revision control at these two different scales (lone hacker and huge team) is how its benefits compare to its costs.
A revision control tool that's difficult to understand or use is going to impose a high cost.

A five-hundred-person project is likely to collapse under its own weight almost immediately without a revision control tool and process. In this case, the cost of using revision control might hardly seem worth considering, since without it, failure is almost guaranteed.

On the other hand, a one-person quick hack might seem like a poor place to use a revision control tool, because surely the cost of using one must be close to the overall cost of the project. Right?

Mercurial uniquely supports both of these scales of development. You can learn the basics in just a few minutes, and due to its low overhead, you can apply revision control to the smallest of projects with ease. Its simplicity means you won't have a lot of abstruse concepts or command sequences competing for mental space with whatever you're really trying to do. At the same time, Mercurial's high performance and peer-to-peer nature let you scale painlessly to handle large projects.

No revision control tool can rescue a poorly run project, but a good choice of tools can make a huge difference to the fluidity with which you can work on a project.

The many names of revision control

Revision control is a diverse field, so much so that it is referred to by many names and acronyms.
Here are a few of the more common variations you'll encounter:

- Revision control (RCS)
- Software configuration management (SCM), or configuration management
- Source code management
- Source code control, or source control
- Version control (VCS)

Some people claim that these terms actually have different meanings, but in practice they overlap so much that there's no agreed or even useful way to tease them apart.

This book is a work in progress

I am releasing this book while I am still writing it, in the hope that it will prove useful to others. I am writing under an open license in the hope that you, my readers, will contribute feedback and perhaps content of your own.

About the examples in this book

This book takes an unusual approach to code samples. Every example is live: each one is actually the result of a shell script that executes the Mercurial commands you see. Every time an image of the book is built from its sources, all the example scripts are automatically run, and their current results compared against their expected results.

The advantage of this approach is that the examples are always accurate; they describe exactly the behaviour of the version of Mercurial that's mentioned at the front of the book. If I update the version of Mercurial that I'm documenting, and the output of some command changes, the build fails.
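
The idea of checking live examples against recorded output can be illustrated with a small sketch. This is not the book's actual build machinery; the function names `run_example` and `check_example` are hypothetical, and a harmless Python one-liner stands in for a Mercurial command so that the sketch runs without Mercurial installed.

```python
import subprocess
import sys


def run_example(command):
    """Run one example command and return everything it printed, as text."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold error output into the same stream
        text=True,
        check=False,
    )
    return result.stdout


def check_example(command, expected_output):
    """Fail loudly if the command's current output differs from the recorded one."""
    actual = run_example(command)
    if actual != expected_output:
        raise AssertionError(
            f"output of {command!r} changed:\n"
            f"expected: {expected_output!r}\n"
            f"actual:   {actual!r}"
        )
    return actual


# Stand-in for a real example: a command whose output is recorded in advance.
stand_in_command = [sys.executable, "-c", "print('hello')"]
check_example(stand_in_command, "hello\n")
```

A build script along these lines would call `check_example` once per example and stop at the first mismatch, which is the behaviour described above: a change in command output breaks the build instead of silently going stale.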

There is a small disadvantage to this approach, which is that the dates and times you'll see in examples tend to be squashed together in a way that they wouldn't be if the same commands were being typed by a human. Where a human can issue no more than one command every few seconds, with any resulting timestamps correspondingly spread out, my automated example scripts run many commands in one second.

As an instance of this, several consecutive commits in an example can show up as having occurred during the same second. You can see this occur in the bisect example in section , for instance.

So when you're reading examples, don't place too much weight on the dates or times you see in the output of commands. But do be confident that the behaviour you're seeing is consistent and reproducible.

Trends in the field

There has been an unmistakable trend in the development and use of revision control tools over the past four decades, as people have become familiar with the capabilities of their tools and constrained by their limitations.

The first generation began by managing single files on individual computers. Although these tools represented a huge advance over ad-hoc manual revision control, their locking model and reliance on a single computer limited them to small, tightly-knit teams.

The second generation loosened these constraints by moving to network-centered architectures, and managing entire projects at a time. As projects grew larger, they ran into new problems.
With clients needing to talk to servers very frequently, server scaling became an issue for large projects. An unreliable network connection could prevent remote users from being able to talk to the server at all. As open source projects started making read-only access available anonymously to anyone, people without commit privileges found that they could not use the tools to interact with a project in a natural way, as they could not record their changes.

The current generation of revision control tools is peer-to-peer in nature. All of these systems have dropped the dependency on a single central server, and allow people to distribute their revision control data to where it's actually needed. Collaboration over the Internet has moved from being constrained by technology to being a matter of choice and consensus. Modern tools can operate offline indefinitely and autonomously, with a network connection only needed when syncing changes with another repository.

A few of the advantages of distributed revision control

Even though distributed revision control tools have for several years been as robust and usable as their previous-generation counterparts, people using older tools have not yet necessarily woken up to their advantages. There are a number of ways in which distributed tools shine relative to centralised ones.

For an individual developer, distributed tools are almost always much faster than centralised tools.
This is for a simple reason: a centralised tool needs to talk over the network for many common operations, because most metadata is stored in a single copy on the central server. A distributed tool stores all of its metadata locally. All else being equal, talking over the network adds overhead to a centralised tool. Don't underestimate the value of a snappy, responsive tool: you're going to spend a lot of time interacting with your revision control software.

Distributed tools are indifferent to the vagaries of your server infrastructure, again because they replicate metadata to so many locations. If you use a centralised system and your server catches fire, you'd better hope that your backup media are reliable, and that your last backup was recent and actually worked. With a distributed tool, you have many backups available on every contributor's computer.

The reliability of your network will affect distributed tools far less than it will centralised tools. You can't even use a centralised tool without a network connection, except for a few highly constrained commands. With a distributed tool, if your network connection goes down while you're working, you may not even notice. The only thing you won't be able to do is talk to repositories on other computers, something that is relatively rare compared with local operations. If you have a far-flung team of collaborators, this may be significant.

Advantages for open source projects

If you take a shine to an open source project and decide that you would like to start hacking on it, and that project uses a distributed revision control tool, you are at once a peer with the people who consider themselves the core of that project. If they publish their repositories, you can immediately copy their project history, start making changes, and record your work, using the same tools in the same ways as insiders. By contrast, with a centralised tool, you must use the software in a read-only mode unless someone grants you permission to commit changes to their central server. Until then, you won't be able to record changes, and your local modifications will be at risk of corruption any time you try to update your client's view of the repository.

The forking non-problem

It has been suggested that distributed revision control tools pose some sort of risk to open source projects because they make it easy to fork the development of a project. A fork happens when there are differences in opinion or attitude between groups of developers that cause them to decide that they can't work together any longer. Each side takes a more or less complete copy of the project's source code, and goes off in its own direction.

Sometimes the camps in a fork decide to reconcile their differences.
With a centralised revision control system, the technical process of reconciliation is painful, and has to be performed largely by hand. You have to decide whose revision history is going to win, and graft the other team's changes into the tree somehow. This usually loses some or all of one side's revision history.

What distributed tools do with respect to forking is they make forking the only way to develop a project. Every single change that you make is potentially a fork point. The great strength of this approach is that a distributed revision control tool has to be really good at merging forks, because forks are absolutely fundamental: they happen all the time.

If every piece of work that everybody does, all the time, is framed in terms of forking and merging, then what the open source world refers to as a fork becomes purely a social issue. If anything, distributed tools lower the likelihood of a fork:

- They eliminate the social distinction that centralised tools impose: that between insiders (people with commit access) and outsiders (people without).
- They make it easier to reconcile after a social fork, because all that's involved from the perspective of the revision control software is just another merge.

Some people resist distributed tools because they want to retain tight control over their projects, and they believe that centralised tools give them this control.
However, if you're of this belief, and you publish your CVS or Subversion repositories publicly, there are plenty of tools available that can pull out your entire project's history (albeit slowly) and recreate it somewhere that you don't control. So while your control in this case is illusory, you are forgoing the ability to fluidly collaborate with whatever people feel compelled to mirror and fork your history.

Advantages for commercial projects

Many commercial projects are undertaken by teams that are scattered across the globe. Contributors who are far from a central server will see slower command execution and perhaps less reliability. Commercial revision control systems attempt to ameliorate these problems with remote-site replication add-ons that are typically expensive to buy and cantankerous to administer. A distributed system doesn't suffer from these problems in the first place. Better yet, you can easily set up multiple authoritative servers, say one per site, so that there's no redundant communication between repositories over expensive long-haul network links.

Centralised revision control systems tend to have relatively low scalability. It's not unusual for an expensive centralised system to fall over under the combined load of just a few dozen concurrent users. Once again, the typical response tends to be an expensive and clunky replication facility.
Since the load on a central server, if you have one at all, is many times lower with a distributed tool (because all of the data is replicated everywhere), a single cheap server can handle the needs of a much larger team, and replication to balance load becomes a simple matter of scripting.

If you have an employee in the field, troubleshooting a problem at a customer's site, they'll benefit from distributed revision control. The tool will let them generate custom builds, try different fixes in isolation from each other, and search efficiently through history for the sources of bugs and regressions in the customer's environment, all without needing to connect to your company's network.

Why choose Mercurial?

Mercurial has a unique set of properties that make it a particularly good choice as a revision control system.

- It is easy to learn and use.
- It is lightweight.
- It scales excellently.
- It is easy to customise.

If you are at all familiar with revision control systems, you should be able to get up and running with Mercurial in less than five minutes. Even if not, it will take no more than a few minutes longer. Mercurial's command and feature sets are generally uniform and consistent, so you can keep track of a few general rules instead of a host of exceptions.

On a small project, you can start working with Mercurial in moments.
Creating new changes and branches; transferring changes around (whether locally or over a network); and history and status operations are all fast. Mercurial attempts to stay nimble and largely out of your way by combining low cognitive overhead with blazingly fast operations.

The usefulness of Mercurial is not limited to small projects: it is used by projects with hundreds to thousands of contributors, each containing tens of thousands of files and hundreds of megabytes of source code.

If the core functionality of Mercurial is not enough for you, it's easy to build on. Mercurial is well suited to scripting tasks, and its clean internals and implementation in Python make it easy to add features in the form of extensions. There are a number of popular and useful extensions already available, ranging from helping to identify bugs to improving performance.

Mercurial compared with other tools

Before you read on, please understand that this section necessarily reflects my own experiences, interests, and (dare I say it) biases. I have used every one of the revision control tools listed below, in most cases for several years at a time.

Subversion

Subversion is a popular revision control tool, developed to replace CVS. It has a centralised client/server architecture.

Subversion and Mercurial have similarly named commands for performing the same operations, so if you're familiar with one, it is easy to learn to use the other. Both tools are portable to all popular operating systems.

Prior to version 1.5, Subversion had no useful support for merges. At the time of writing, its merge tracking capability is new, and known to be complicated and buggy.

Mercurial has a substantial performance advantage over Subversion on every revision control operation I have benchmarked. I have measured its advantage as ranging from a factor of two to a factor of six when compared with Subversion 1.4.3's ra_local file store, which is the fastest access method available. In more realistic deployments involving a network-based store, Subversion will be at a substantially larger disadvantage. Because many Subversion commands must talk to the server and Subversion does not have useful replication facilities, server capacity and network bandwidth become bottlenecks for modestly large projects.

Additionally, Subversion incurs substantial storage overhead to avoid network transactions for a few common operations, such as finding modified files (status) and displaying modifications against the current revision (diff). As a result, a Subversion working copy is often the same size as, or larger than, a Mercurial repository and working directory, even though the Mercurial repository contains a complete history of the project.

Subversion is widely supported by third party tools. Mercurial currently lags considerably in this area. This gap is closing, however, and indeed some of Mercurial's GUI tools now outshine their Subversion equivalents. Like Mercurial, Subversion has an excellent user manual.

Because Subversion doesn't store revision history on the client, it is well suited to managing projects that deal with lots of large, opaque binary files. If you check in fifty revisions to an incompressible 10MB file, Subversion's client-side space usage stays constant. The space used by any distributed SCM will grow rapidly in proportion to the number of revisions, because the differences between each revision are large.

In addition, it's often difficult or, more usually, impossible to merge different versions of a binary file. Subversion's ability to let a user lock a file, so that they temporarily have the exclusive right to commit changes to it, can be a significant advantage to a project where binary files are widely used.

Mercurial can import revision history from a Subversion repository. It can also export revision history to a Subversion repository. This makes it easy to test the waters and use Mercurial and Subversion in parallel before deciding to switch. History conversion is incremental, so you can perform an initial conversion, then small additional conversions afterwards to bring in new changes.
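
As a rough illustration of such an incremental import, the sketch below assembles an invocation of `hg convert`, the command provided by the convert extension that ships with Mercurial. The helper names and the repository paths `svn-project` and `hg-project` are hypothetical, and the `--config extensions.convert=` option is included on the assumption that the extension is not already enabled in your configuration.

```python
import shutil
import subprocess


def build_convert_command(source, destination=None):
    """Build the argument list for `hg convert SOURCE [DEST]`.

    The --config option enables the bundled convert extension for this one
    invocation, in case it is not switched on in the user's hgrc.
    """
    command = ["hg", "--config", "extensions.convert=", "convert", source]
    if destination is not None:
        command.append(destination)
    return command


def run_convert(source, destination=None):
    """Run the conversion; running it again later imports only new changes."""
    if shutil.which("hg") is None:
        raise RuntimeError("Mercurial (hg) was not found on PATH")
    subprocess.run(build_convert_command(source, destination), check=True)


# Hypothetical paths, shown only to make the command line visible.
print(" ".join(build_convert_command("svn-project", "hg-project")))
# prints: hg --config extensions.convert= convert svn-project hg-project
```

Calling `run_convert("svn-project", "hg-project")` a second time with the same arguments is what makes the import incremental: the same command brings in whatever was committed to the source since the previous run.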
dongsheng@650: dongsheng@650: dongsheng@650: dongsheng@650: dongsheng@650: Git dongsheng@650: dongsheng@650: Git is a distributed revision control tool that was dongsheng@650: developed for managing the Linux kernel source tree. Like dongsheng@650: Mercurial, its early design was somewhat influenced by dongsheng@650: Monotone. dongsheng@650: dongsheng@650: Git has a very large command set, with version 1.5.0 dongsheng@650: providing 139 individual commands. It has something of a dongsheng@650: reputation for being difficult to learn. Compared to Git, dongsheng@650: Mercurial has a strong focus on simplicity. dongsheng@650: dongsheng@650: In terms of performance, Git is extremely fast. In dongsheng@650: several cases, it is faster than Mercurial, at least on Linux, dongsheng@650: while Mercurial performs better on other operations. However, dongsheng@650: on Windows, the performance and general level of support that dongsheng@650: Git provides is, at the time of writing, far behind that of dongsheng@650: Mercurial. dongsheng@650: dongsheng@650: While a Mercurial repository needs no maintenance, a Git dongsheng@650: repository requires frequent manual repacks of dongsheng@650: its metadata. Without these, performance degrades, while dongsheng@650: space usage grows rapidly. A server that contains many Git dongsheng@650: repositories that are not rigorously and frequently repacked dongsheng@650: will become heavily disk-bound during backups, and there have dongsheng@650: been instances of daily backups taking far longer than 24 dongsheng@650: hours as a result. A freshly packed Git repository is dongsheng@650: slightly smaller than a Mercurial repository, but an unpacked dongsheng@650: repository is several orders of magnitude larger. dongsheng@650: dongsheng@650: The core of Git is written in C. Many Git commands are dongsheng@650: implemented as shell or Perl scripts, and the quality of these dongsheng@650: scripts varies widely. 
I have encountered several instances where scripts charged along blindly in the presence of errors that should have been fatal.

Mercurial can import revision history from a Git repository.

CVS

CVS is probably the most widely used revision control tool in the world. Due to its age and internal untidiness, it has been only lightly maintained for many years.

It has a centralised client/server architecture. It does not group related file changes into atomic commits, making it easy for people to break the build: one person can successfully commit part of a change and then be blocked by the need for a merge, causing other people to see only a portion of the work they intended to do. This also affects how you work with project history. If you want to see all of the modifications someone made as part of a task, you will need to manually inspect the descriptions and timestamps of the changes made to each file involved (if you even know what those files were).

CVS has a muddled notion of tags and branches that I will not attempt to even describe. It does not support renaming of files or directories well, making it easy to corrupt a repository. It has almost no internal consistency checking capabilities, so it is usually not even possible to tell whether or how a repository is corrupt. I would not recommend CVS for any project, existing or new.

Mercurial can import CVS revision history.
However, there are a few caveats that apply; these are true of every other revision control tool's CVS importer, too. Due to CVS's lack of atomic changes and unversioned filesystem hierarchy, it is not possible to reconstruct CVS history completely accurately; some guesswork is involved, and renames will usually not show up. Because a lot of advanced CVS administration has to be done by hand and is hence error-prone, it's common for CVS importers to run into multiple problems with corrupted repositories (completely bogus revision timestamps and files that have remained locked for over a decade are just two of the less interesting problems I can recall from personal experience).

Mercurial can import revision history from a CVS repository.

Commercial tools

Perforce has a centralised client/server architecture, with no client-side caching of any data. Unlike modern revision control tools, Perforce requires that a user run a command to inform the server about every file they intend to edit.

The performance of Perforce is quite good for small teams, but it falls off rapidly as the number of users grows beyond a few dozen. Modestly large Perforce installations require the deployment of proxies to cope with the load their users generate.

Choosing a revision control tool

With the exception of CVS, all of the tools listed above have unique strengths that suit them to particular styles of work. There is no single revision control tool that is best in all situations.

As an example, Subversion is a good choice for working with frequently edited binary files, due to its centralised nature and support for file locking.

I personally find Mercurial's properties of simplicity, performance, and good merge support to be a compelling combination that has served me well for several years.

Switching from another tool to Mercurial

Mercurial is bundled with an extension named convert, which can incrementally import revision history from several other revision control tools. By "incremental", I mean that you can convert all of a project's history to date in one go, then rerun the conversion later to obtain new changes that happened after the initial conversion.

The revision control tools supported by convert are as follows:

Subversion
CVS
Git
Darcs

In addition, convert can export changes from Mercurial to Subversion. This makes it possible to try Subversion and Mercurial in parallel before committing to a switchover, without risking the loss of any work.

The convert command is easy to use.
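A conversion run, and the later rerun that picks up new changes, might be scripted along these lines. This is only a sketch under stated assumptions: the helper names convert_command and run_convert are hypothetical, the paths in the comments are placeholders, and the bundled extension is switched on for a single run with --config extensions.convert= instead of through a configuration file.

```python
import shutil
import subprocess


def convert_command(source, dest=None):
    """Build the argument list for one `hg convert` run (hypothetical helper).

    `--config extensions.convert=` enables the bundled convert extension
    for this invocation only.
    """
    argv = ["hg", "--config", "extensions.convert=", "convert", source]
    if dest is not None:
        # The destination repository name is optional.
        argv.append(dest)
    return argv


def run_convert(source, dest=None):
    """Run the conversion (hypothetical helper).

    Calling this again later with the same arguments is how new changes
    from the source repository would be imported.
    """
    if shutil.which("hg") is None:
        raise RuntimeError("Mercurial (hg) was not found on PATH")
    subprocess.run(convert_command(source, dest), check=True)


# Example calls (placeholder paths, not run here):
#   run_convert("/path/to/source-repo", "project-hg")  # initial import
#   run_convert("/path/to/source-repo", "project-hg")  # later: new changes
```

Keeping the argument list in its own function makes it easy to print or log the exact command before anything touches a repository.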
Simply point it at the path or URL of the source repository, optionally give it the name of the destination repository, and it will start working. After the initial conversion, just run the same command again to import new changes.

A short history of revision control

The best known of the old-time revision control tools is SCCS (Source Code Control System), which Marc Rochkind wrote at Bell Labs in the early 1970s. SCCS operated on individual files, and required every person working on a project to have access to a shared workspace on a single system. Only one person could modify a file at any time; arbitration for access to files was via locks. It was common for people to lock files and later forget to unlock them, preventing anyone else from modifying those files without the help of an administrator.

Walter Tichy developed a free alternative to SCCS in the early 1980s; he called his program RCS (Revision Control System). Like SCCS, RCS required developers to work in a single shared workspace, and to lock files to prevent multiple people from modifying them simultaneously.

Later in the 1980s, Dick Grune used RCS as a building block for a set of shell scripts he initially called cmt, but then renamed to CVS (Concurrent Versions System). The big innovation of CVS was that it let developers work simultaneously and somewhat independently in their own personal workspaces.
The personal workspaces prevented developers from stepping on each other's toes all the time, as was common with SCCS and RCS. Each developer had a copy of every project file, and could modify their copies independently. They had to merge their edits before committing changes to the central repository.

Brian Berliner took Grune's original scripts and rewrote them in C, releasing in 1989 the code that has since developed into the modern version of CVS. CVS subsequently acquired the ability to operate over a network connection, giving it a client/server architecture. CVS's architecture is centralised; only the server has a copy of the history of the project. Client workspaces just contain copies of recent versions of the project's files, and a little metadata to tell them where the server is. CVS has been enormously successful; it is probably the world's most widely used revision control system.

In the early 1990s, Sun Microsystems developed an early distributed revision control system, called TeamWare. A TeamWare workspace contains a complete copy of the project's history. TeamWare has no notion of a central repository. (CVS relied upon RCS for its history storage; TeamWare used SCCS.)

As the 1990s progressed, awareness grew of a number of problems with CVS. It records simultaneous changes to multiple files individually, instead of grouping them together as a single logically atomic operation. It does not manage its file hierarchy well; it is easy to make a mess of a repository by renaming files and directories.
Worse, its source code is difficult to read and maintain, which made fixing these architectural problems prohibitively painful.

In 2000, Jim Blandy and Karl Fogel, two developers who had worked on CVS, started a project to replace it with a tool that would have a better architecture and cleaner code. The result, Subversion, does not stray from CVS's centralised client/server model, but it adds multi-file atomic commits, better namespace management, and a number of other features that make it a generally better tool than CVS. Since its initial release, it has rapidly grown in popularity.

More or less simultaneously, Graydon Hoare began working on an ambitious distributed revision control system that he named Monotone. Monotone addresses many of CVS's design flaws and has a peer-to-peer architecture, and it also goes beyond earlier (and subsequent) revision control tools in a number of innovative ways. It uses cryptographic hashes as identifiers, and has an integral notion of trust for code from different sources.

Mercurial began life in 2005. While a few aspects of its design are influenced by Monotone, Mercurial focuses on ease of use, high performance, and scalability to very large projects.

Colophon: this book is Free

This book is licensed under the Open Publication License, and is produced entirely using Free Software tools. It is typeset with DocBook XML. Illustrations are drawn and rendered with Inkscape.

The complete source code for this book is published as a Mercurial repository, at http://hg.serpentine.com/mercurial/book.