hgbook
diff en/ch00-preface.xml @ 583:28b5a5befb08
Fold preface and intro into one
author: Bryan O'Sullivan <bos@serpentine.com>
date: Thu Mar 19 20:54:12 2009 -0700 (2009-03-19)
parents: 8366882f67f2
children: c838b3975bc6
1.1 --- a/en/ch00-preface.xml Wed Mar 18 00:00:58 2009 -0700 1.2 +++ b/en/ch00-preface.xml Thu Mar 19 20:54:12 2009 -0700 1.3 @@ -3,23 +3,139 @@ 1.4 <preface id="chap:preface"> 1.5 <title>Preface</title> 1.6 1.7 - <para>Distributed revision control is a relatively new territory, 1.8 - and has thus far grown due to people's willingness to strike out 1.9 - into ill-charted territory.</para> 1.10 - 1.11 - <para>I am writing a book about distributed revision control because 1.12 - I believe that it is an important subject that deserves a field 1.13 - guide. I chose to write about Mercurial because it is the easiest 1.14 - tool to learn the terrain with, and yet it scales to the demands 1.15 - of real, challenging environments where many other revision 1.16 - control tools fail.</para> 1.17 + <sect1> 1.18 + <title>Why revision control? Why Mercurial?</title> 1.19 + 1.20 + <para>Revision control is the process of managing multiple 1.21 + versions of a piece of information. In its simplest form, this 1.22 + is something that many people do by hand: every time you modify 1.23 + a file, save it under a new name that contains a number, each 1.24 + one higher than the number of the preceding version.</para> 1.25 + 1.26 + <para>Manually managing multiple versions of even a single file is 1.27 + an error-prone task, though, so software tools to help automate 1.28 + this process have long been available. The earliest automated 1.29 + revision control tools were intended to help a single user to 1.30 + manage revisions of a single file. Over the past few decades, 1.31 + the scope of revision control tools has expanded greatly; they 1.32 + now manage multiple files, and help multiple people to work 1.33 + together. 
The best modern revision control tools have no 1.34 + problem coping with thousands of people working together on 1.35 + projects that consist of hundreds of thousands of files.</para> 1.36 + 1.37 + <para>The arrival of distributed revision control is relatively 1.38 + recent, and so far this new field has grown due to people's 1.39 + willingness to explore ill-charted territory.</para> 1.40 + 1.41 + <para>I am writing a book about distributed revision control 1.42 + because I believe that it is an important subject that deserves 1.43 + a field guide. I chose to write about Mercurial because it is 1.44 + the easiest tool to learn the terrain with, and yet it scales to 1.45 + the demands of real, challenging environments where many other 1.46 + revision control tools buckle.</para> 1.47 + 1.48 + <sect2> 1.49 + <title>Why use revision control?</title> 1.50 + 1.51 + <para>There are a number of reasons why you or your team might 1.52 + want to use an automated revision control tool for a 1.53 + project.</para> 1.54 + 1.55 + <itemizedlist> 1.56 + <listitem><para>It will track the history and evolution of 1.57 + your project, so you don't have to. For every change, 1.58 + you'll have a log of <emphasis>who</emphasis> made it; 1.59 + <emphasis>why</emphasis> they made it; 1.60 + <emphasis>when</emphasis> they made it; and 1.61 + <emphasis>what</emphasis> the change 1.62 + was.</para></listitem> 1.63 + <listitem><para>When you're working with other people, 1.64 + revision control software makes it easier for you to 1.65 + collaborate. For example, when people more or less 1.66 + simultaneously make potentially incompatible changes, the 1.67 + software will help you to identify and resolve those 1.68 + conflicts.</para></listitem> 1.69 + <listitem><para>It can help you to recover from mistakes. If 1.70 + you make a change that later turns out to be in error, you 1.71 + can revert to an earlier version of one or more files. 
In 1.72 + fact, a <emphasis>really</emphasis> good revision control 1.73 + tool will even help you to efficiently figure out exactly 1.74 + when a problem was introduced (see section <xref 1.75 + linkend="sec:undo:bisect"/> for details).</para></listitem> 1.76 + <listitem><para>It will help you to work simultaneously on, 1.77 + and manage the drift between, multiple versions of your 1.78 + project.</para></listitem> 1.79 + </itemizedlist> 1.80 + 1.81 + <para>Most of these reasons are equally valid---at least in 1.82 + theory---whether you're working on a project by yourself, or 1.83 + with a hundred other people.</para> 1.84 + 1.85 + <para>A key question about the practicality of revision control 1.86 + at these two different scales (<quote>lone hacker</quote> and 1.87 + <quote>huge team</quote>) is how its 1.88 + <emphasis>benefits</emphasis> compare to its 1.89 + <emphasis>costs</emphasis>. A revision control tool that's 1.90 + difficult to understand or use is going to impose a high 1.91 + cost.</para> 1.92 + 1.93 + <para>A five-hundred-person project is likely to collapse under 1.94 + its own weight almost immediately without a revision control 1.95 + tool and process. In this case, the cost of using revision 1.96 + control might hardly seem worth considering, since 1.97 + <emphasis>without</emphasis> it, failure is almost 1.98 + guaranteed.</para> 1.99 + 1.100 + <para>On the other hand, a one-person <quote>quick hack</quote> 1.101 + might seem like a poor place to use a revision control tool, 1.102 + because surely the cost of using one must be close to the 1.103 + overall cost of the project. Right?</para> 1.104 + 1.105 + <para>Mercurial uniquely supports <emphasis>both</emphasis> of 1.106 + these scales of development. You can learn the basics in just 1.107 + a few minutes, and due to its low overhead, you can apply 1.108 + revision control to the smallest of projects with ease. 
Its 1.109 + simplicity means you won't have a lot of abstruse concepts or 1.110 + command sequences competing for mental space with whatever 1.111 + you're <emphasis>really</emphasis> trying to do. At the same 1.112 + time, Mercurial's high performance and peer-to-peer nature let 1.113 + you scale painlessly to handle large projects.</para> 1.114 + 1.115 + <para>No revision control tool can rescue a poorly run project, 1.116 + but a good choice of tools can make a huge difference to the 1.117 + fluidity with which you can work on a project.</para> 1.118 + 1.119 + </sect2> 1.120 + 1.121 + <sect2> 1.122 + <title>The many names of revision control</title> 1.123 + 1.124 + <para>Revision control is a diverse field, so much so that it is 1.125 + referred to by many names and acronyms. Here are a few of the 1.126 + more common variations you'll encounter:</para> 1.127 + <itemizedlist> 1.128 + <listitem><para>Revision control (RCS)</para></listitem> 1.129 + <listitem><para>Software configuration management (SCM), or 1.130 + configuration management</para></listitem> 1.131 + <listitem><para>Source code management</para></listitem> 1.132 + <listitem><para>Source code control, or source 1.133 + control</para></listitem> 1.134 + <listitem><para>Version control 1.135 + (VCS)</para></listitem></itemizedlist> 1.136 + <para>Some people claim that these terms actually have different 1.137 + meanings, but in practice they overlap so much that there's no 1.138 + agreed or even useful way to tease them apart.</para> 1.139 + 1.140 + </sect2> 1.141 + </sect1> 1.142 1.143 <sect1> 1.144 <title>This book is a work in progress</title> 1.145 1.146 <para>I am releasing this book while I am still writing it, in the 1.147 - hope that it will prove useful to others. I also hope that 1.148 - readers will contribute as they see fit.</para> 1.149 + hope that it will prove useful to others. 
I am writing under an 1.150 + open license in the hope that you, my readers, will contribute 1.151 + feedback and perhaps content of your own.</para> 1.152 1.153 </sect1> 1.154 <sect1> 1.155 @@ -59,8 +175,567 @@ 1.156 seeing is consistent and reproducible.</para> 1.157 1.158 </sect1> 1.159 - <sect1> 1.160 - <title>Colophon---this book is Free</title> 1.161 + 1.162 + <sect1> 1.163 + <title>Trends in the field</title> 1.164 + 1.165 + <para>There has been an unmistakable trend in the development and 1.166 + use of revision control tools over the past four decades, as 1.167 + people have become familiar with the capabilities of their tools 1.168 + and constrained by their limitations.</para> 1.169 + 1.170 + <para>The first generation began by managing single files on 1.171 + individual computers. Although these tools represented a huge 1.172 + advance over ad-hoc manual revision control, their locking model 1.173 + and reliance on a single computer limited them to small, 1.174 + tightly-knit teams.</para> 1.175 + 1.176 + <para>The second generation loosened these constraints by moving 1.177 + to network-centered architectures, and managing entire projects 1.178 + at a time. As projects grew larger, they ran into new problems. 1.179 + With clients needing to talk to servers very frequently, server 1.180 + scaling became an issue for large projects. An unreliable 1.181 + network connection could prevent remote users from being able to 1.182 + talk to the server at all. As open source projects started 1.183 + making read-only access available anonymously to anyone, people 1.184 + without commit privileges found that they could not use the 1.185 + tools to interact with a project in a natural way, as they could 1.186 + not record their changes.</para> 1.187 + 1.188 + <para>The current generation of revision control tools is 1.189 + peer-to-peer in nature. 
All of these systems have dropped the 1.190 + dependency on a single central server, and allow people to 1.191 + distribute their revision control data to where it's actually 1.192 + needed. Collaboration over the Internet has moved from 1.193 + constrained by technology to a matter of choice and consensus. 1.194 + Modern tools can operate offline indefinitely and autonomously, 1.195 + with a network connection only needed when syncing changes with 1.196 + another repository.</para> 1.197 + 1.198 + </sect1> 1.199 + <sect1> 1.200 + <title>A few of the advantages of distributed revision 1.201 + control</title> 1.202 + 1.203 + <para>Even though distributed revision control tools have for 1.204 + several years been as robust and usable as their 1.205 + previous-generation counterparts, people using older tools have 1.206 + not yet necessarily woken up to their advantages. There are a 1.207 + number of ways in which distributed tools shine relative to 1.208 + centralised ones.</para> 1.209 + 1.210 + <para>For an individual developer, distributed tools are almost 1.211 + always much faster than centralised tools. This is for a simple 1.212 + reason: a centralised tool needs to talk over the network for 1.213 + many common operations, because most metadata is stored in a 1.214 + single copy on the central server. A distributed tool stores 1.215 + all of its metadata locally. All else being equal, talking over 1.216 + the network adds overhead to a centralised tool. Don't 1.217 + underestimate the value of a snappy, responsive tool: you're 1.218 + going to spend a lot of time interacting with your revision 1.219 + control software.</para> 1.220 + 1.221 + <para>Distributed tools are indifferent to the vagaries of your 1.222 + server infrastructure, again because they replicate metadata to 1.223 + so many locations. 
If you use a centralised system and your 1.224 + server catches fire, you'd better hope that your backup media 1.225 + are reliable, and that your last backup was recent and actually 1.226 + worked. With a distributed tool, you have many backups 1.227 + available on every contributor's computer.</para> 1.228 + 1.229 + <para>The reliability of your network will affect distributed 1.230 + tools far less than it will centralised tools. You can't even 1.231 + use a centralised tool without a network connection, except for 1.232 + a few highly constrained commands. With a distributed tool, if 1.233 + your network connection goes down while you're working, you may 1.234 + not even notice. The only thing you won't be able to do is talk 1.235 + to repositories on other computers, something that is relatively 1.236 + rare compared with local operations. If you have a far-flung 1.237 + team of collaborators, this may be significant.</para> 1.238 + 1.239 + <sect2> 1.240 + <title>Advantages for open source projects</title> 1.241 + 1.242 + <para>If you take a shine to an open source project and decide 1.243 + that you would like to start hacking on it, and that project 1.244 + uses a distributed revision control tool, you are at once a 1.245 + peer with the people who consider themselves the 1.246 + <quote>core</quote> of that project. If they publish their 1.247 + repositories, you can immediately copy their project history, 1.248 + start making changes, and record your work, using the same 1.249 + tools in the same ways as insiders. By contrast, with a 1.250 + centralised tool, you must use the software in a <quote>read 1.251 + only</quote> mode unless someone grants you permission to 1.252 + commit changes to their central server. 
Until then, you won't 1.253 + be able to record changes, and your local modifications will 1.254 + be at risk of corruption any time you try to update your 1.255 + client's view of the repository.</para> 1.256 + 1.257 + <sect3> 1.258 + <title>The forking non-problem</title> 1.259 + 1.260 + <para>It has been suggested that distributed revision control 1.261 + tools pose some sort of risk to open source projects because 1.262 + they make it easy to <quote>fork</quote> the development of 1.263 + a project. A fork happens when there are differences in 1.264 + opinion or attitude between groups of developers that cause 1.265 + them to decide that they can't work together any longer. 1.266 + Each side takes a more or less complete copy of the 1.267 + project's source code, and goes off in its own 1.268 + direction.</para> 1.269 + 1.270 + <para>Sometimes the camps in a fork decide to reconcile their 1.271 + differences. With a centralised revision control system, the 1.272 + <emphasis>technical</emphasis> process of reconciliation is 1.273 + painful, and has to be performed largely by hand. You have 1.274 + to decide whose revision history is going to 1.275 + <quote>win</quote>, and graft the other team's changes into 1.276 + the tree somehow. This usually loses some or all of one 1.277 + side's revision history.</para> 1.278 + 1.279 + <para>What distributed tools do with respect to forking is 1.280 + they make forking the <emphasis>only</emphasis> way to 1.281 + develop a project. Every single change that you make is 1.282 + potentially a fork point. 
The great strength of this 1.283 + approach is that a distributed revision control tool has to 1.284 + be really good at <emphasis>merging</emphasis> forks, 1.285 + because forks are absolutely fundamental: they happen all 1.286 + the time.</para> 1.287 + 1.288 + <para>If every piece of work that everybody does, all the 1.289 + time, is framed in terms of forking and merging, then what 1.290 + the open source world refers to as a <quote>fork</quote> 1.291 + becomes <emphasis>purely</emphasis> a social issue. If 1.292 + anything, distributed tools <emphasis>lower</emphasis> the 1.293 + likelihood of a fork:</para> 1.294 + <itemizedlist> 1.295 + <listitem><para>They eliminate the social distinction that 1.296 + centralised tools impose: that between insiders (people 1.297 + with commit access) and outsiders (people 1.298 + without).</para></listitem> 1.299 + <listitem><para>They make it easier to reconcile after a 1.300 + social fork, because all that's involved from the 1.301 + perspective of the revision control software is just 1.302 + another merge.</para></listitem></itemizedlist> 1.303 + 1.304 + <para>Some people resist distributed tools because they want 1.305 + to retain tight control over their projects, and they 1.306 + believe that centralised tools give them this control. 1.307 + However, if you're of this belief, and you publish your CVS 1.308 + or Subversion repositories publicly, there are plenty of 1.309 + tools available that can pull out your entire project's 1.310 + history (albeit slowly) and recreate it somewhere that you 1.311 + don't control. 
So while your control in this case is 1.312 + illusory, you are forgoing the ability to fluidly 1.313 + collaborate with whatever people feel compelled to mirror 1.314 + and fork your history.</para> 1.315 + 1.316 + </sect3> 1.317 + </sect2> 1.318 + <sect2> 1.319 + <title>Advantages for commercial projects</title> 1.320 + 1.321 + <para>Many commercial projects are undertaken by teams that are 1.322 + scattered across the globe. Contributors who are far from a 1.323 + central server will see slower command execution and perhaps 1.324 + less reliability. Commercial revision control systems attempt 1.325 + to ameliorate these problems with remote-site replication 1.326 + add-ons that are typically expensive to buy and cantankerous 1.327 + to administer. A distributed system doesn't suffer from these 1.328 + problems in the first place. Better yet, you can easily set 1.329 + up multiple authoritative servers, say one per site, so that 1.330 + there's no redundant communication between repositories over 1.331 + expensive long-haul network links.</para> 1.332 + 1.333 + <para>Centralised revision control systems tend to have 1.334 + relatively low scalability. It's not unusual for an expensive 1.335 + centralised system to fall over under the combined load of 1.336 + just a few dozen concurrent users. Once again, the typical 1.337 + response tends to be an expensive and clunky replication 1.338 + facility. Since the load on a central server---if you have 1.339 + one at all---is many times lower with a distributed tool 1.340 + (because all of the data is replicated everywhere), a single 1.341 + cheap server can handle the needs of a much larger team, and 1.342 + replication to balance load becomes a simple matter of 1.343 + scripting.</para> 1.344 + 1.345 + <para>If you have an employee in the field, troubleshooting a 1.346 + problem at a customer's site, they'll benefit from distributed 1.347 + revision control. 
The tool will let them generate custom 1.348 + builds, try different fixes in isolation from each other, and 1.349 + search efficiently through history for the sources of bugs and 1.350 + regressions in the customer's environment, all without needing 1.351 + to connect to your company's network.</para> 1.352 + 1.353 + </sect2> 1.354 + </sect1> 1.355 + <sect1> 1.356 + <title>Why choose Mercurial?</title> 1.357 + 1.358 + <para>Mercurial has a unique set of properties that make it a 1.359 + particularly good choice as a revision control system.</para> 1.360 + <itemizedlist> 1.361 + <listitem><para>It is easy to learn and use.</para></listitem> 1.362 + <listitem><para>It is lightweight.</para></listitem> 1.363 + <listitem><para>It scales excellently.</para></listitem> 1.364 + <listitem><para>It is easy to 1.365 + customise.</para></listitem></itemizedlist> 1.366 + 1.367 + <para>If you are at all familiar with revision control systems, 1.368 + you should be able to get up and running with Mercurial in less 1.369 + than five minutes. Even if not, it will take no more than a few 1.370 + minutes longer. Mercurial's command and feature sets are 1.371 + generally uniform and consistent, so you can keep track of a few 1.372 + general rules instead of a host of exceptions.</para> 1.373 + 1.374 + <para>On a small project, you can start working with Mercurial in 1.375 + moments. Creating new changes and branches; transferring changes 1.376 + around (whether locally or over a network); and history and 1.377 + status operations are all fast. 
Mercurial attempts to stay 1.378 + nimble and largely out of your way by combining low cognitive 1.379 + overhead with blazingly fast operations.</para> 1.380 + 1.381 + <para>The usefulness of Mercurial is not limited to small 1.382 + projects: it is used by projects with hundreds to thousands of 1.383 + contributors, each containing tens of thousands of files and 1.384 + hundreds of megabytes of source code.</para> 1.385 + 1.386 + <para>If the core functionality of Mercurial is not enough for 1.387 + you, it's easy to build on. Mercurial is well suited to 1.388 + scripting tasks, and its clean internals and implementation in 1.389 + Python make it easy to add features in the form of extensions. 1.390 + There are a number of popular and useful extensions already 1.391 + available, ranging from helping to identify bugs to improving 1.392 + performance.</para> 1.393 + 1.394 + </sect1> 1.395 + <sect1> 1.396 + <title>Mercurial compared with other tools</title> 1.397 + 1.398 + <para>Before you read on, please understand that this section 1.399 + necessarily reflects my own experiences, interests, and (dare I 1.400 + say it) biases. I have used every one of the revision control 1.401 + tools listed below, in most cases for several years at a 1.402 + time.</para> 1.403 + 1.404 + 1.405 + <sect2> 1.406 + <title>Subversion</title> 1.407 + 1.408 + <para>Subversion is a popular revision control tool, developed 1.409 + to replace CVS. It has a centralised client/server 1.410 + architecture.</para> 1.411 + 1.412 + <para>Subversion and Mercurial have similarly named commands for 1.413 + performing the same operations, so if you're familiar with 1.414 + one, it is easy to learn to use the other. Both tools are 1.415 + portable to all popular operating systems.</para> 1.416 + 1.417 + <para>Prior to version 1.5, Subversion had no useful support for 1.418 + merges. 
At the time of writing, its merge tracking capability 1.419 + is new, and known to be <ulink 1.420 + url="http://svnbook.red-bean.com/nightly/en/svn.branchmerge.advanced.html#svn.branchmerge.advanced.finalword">complicated 1.421 + and buggy</ulink>.</para> 1.422 + 1.423 + <para>Mercurial has a substantial performance advantage over 1.424 + Subversion on every revision control operation I have 1.425 + benchmarked. I have measured its advantage as ranging from a 1.426 + factor of two to a factor of six when compared with Subversion 1.427 + 1.4.3's <emphasis>ra_local</emphasis> file store, which is the 1.428 + fastest access method available. In more realistic 1.429 + deployments involving a network-based store, Subversion will 1.430 + be at a substantially larger disadvantage. Because many 1.431 + Subversion commands must talk to the server and Subversion 1.432 + does not have useful replication facilities, server capacity 1.433 + and network bandwidth become bottlenecks for modestly large 1.434 + projects.</para> 1.435 + 1.436 + <para>Additionally, Subversion incurs substantial storage 1.437 + overhead to avoid network transactions for a few common 1.438 + operations, such as finding modified files 1.439 + (<literal>status</literal>) and displaying modifications 1.440 + against the current revision (<literal>diff</literal>). As a 1.441 + result, a Subversion working copy is often the same size as, 1.442 + or larger than, a Mercurial repository and working directory, 1.443 + even though the Mercurial repository contains a complete 1.444 + history of the project.</para> 1.445 + 1.446 + <para>Subversion is widely supported by third party tools. 1.447 + Mercurial currently lags considerably in this area. This gap 1.448 + is closing, however, and indeed some of Mercurial's GUI tools 1.449 + now outshine their Subversion equivalents. 
Like Mercurial, 1.450 + Subversion has an excellent user manual.</para> 1.451 + 1.452 + <para>Because Subversion doesn't store revision history on the 1.453 + client, it is well suited to managing projects that deal with 1.454 + lots of large, opaque binary files. If you check in fifty 1.455 + revisions to an incompressible 10MB file, Subversion's 1.456 + client-side space usage stays constant. The space used by any 1.457 + distributed SCM will grow rapidly in proportion to the number 1.458 + of revisions, because the differences between each revision 1.459 + are large.</para> 1.460 + 1.461 + <para>In addition, it's often difficult or, more usually, 1.462 + impossible to merge different versions of a binary file. 1.463 + Subversion's ability to let a user lock a file, so that they 1.464 + temporarily have the exclusive right to commit changes to it, 1.465 + can be a significant advantage to a project where binary files 1.466 + are widely used.</para> 1.467 + 1.468 + <para>Mercurial can import revision history from a Subversion 1.469 + repository. It can also export revision history to a 1.470 + Subversion repository. This makes it easy to <quote>test the 1.471 + waters</quote> and use Mercurial and Subversion in parallel 1.472 + before deciding to switch. History conversion is incremental, 1.473 + so you can perform an initial conversion, then small 1.474 + additional conversions afterwards to bring in new 1.475 + changes.</para> 1.476 + 1.477 + 1.478 + </sect2> 1.479 + <sect2> 1.480 + <title>Git</title> 1.481 + 1.482 + <para>Git is a distributed revision control tool that was 1.483 + developed for managing the Linux kernel source tree. Like 1.484 + Mercurial, its early design was somewhat influenced by 1.485 + Monotone.</para> 1.486 + 1.487 + <para>Git has a very large command set, with version 1.5.0 1.488 + providing 139 individual commands. It has something of a 1.489 + reputation for being difficult to learn. 
Compared to Git, 1.490 + Mercurial has a strong focus on simplicity.</para> 1.491 + 1.492 + <para>In terms of performance, Git is extremely fast. In 1.493 + several cases, it is faster than Mercurial, at least on Linux, 1.494 + while Mercurial performs better on other operations. However, 1.495 + on Windows, the performance and general level of support that 1.496 + Git provides is, at the time of writing, far behind that of 1.497 + Mercurial.</para> 1.498 + 1.499 + <para>While a Mercurial repository needs no maintenance, a Git 1.500 + repository requires frequent manual <quote>repacks</quote> of 1.501 + its metadata. Without these, performance degrades, while 1.502 + space usage grows rapidly. A server that contains many Git 1.503 + repositories that are not rigorously and frequently repacked 1.504 + will become heavily disk-bound during backups, and there have 1.505 + been instances of daily backups taking far longer than 24 1.506 + hours as a result. A freshly packed Git repository is 1.507 + slightly smaller than a Mercurial repository, but an unpacked 1.508 + repository is several orders of magnitude larger.</para> 1.509 + 1.510 + <para>The core of Git is written in C. Many Git commands are 1.511 + implemented as shell or Perl scripts, and the quality of these 1.512 + scripts varies widely. I have encountered several instances 1.513 + where scripts charged along blindly in the presence of errors 1.514 + that should have been fatal.</para> 1.515 + 1.516 + <para>Mercurial can import revision history from a Git 1.517 + repository.</para> 1.518 + 1.519 + 1.520 + </sect2> 1.521 + <sect2> 1.522 + <title>CVS</title> 1.523 + 1.524 + <para>CVS is probably the most widely used revision control tool 1.525 + in the world. Due to its age and internal untidiness, it has 1.526 + been only lightly maintained for many years.</para> 1.527 + 1.528 + <para>It has a centralised client/server architecture. 
It does 1.529 + not group related file changes into atomic commits, making it 1.530 + easy for people to <quote>break the build</quote>: one person 1.531 + can successfully commit part of a change and then be blocked 1.532 + by the need for a merge, causing other people to see only a 1.533 + portion of the work they intended to do. This also affects 1.534 + how you work with project history. If you want to see all of 1.535 + the modifications someone made as part of a task, you will 1.536 + need to manually inspect the descriptions and timestamps of 1.537 + the changes made to each file involved (if you even know what 1.538 + those files were).</para> 1.539 + 1.540 + <para>CVS has a muddled notion of tags and branches that I will 1.541 + not attempt to even describe. It does not support renaming of 1.542 + files or directories well, making it easy to corrupt a 1.543 + repository. It has almost no internal consistency checking 1.544 + capabilities, so it is usually not even possible to tell 1.545 + whether or how a repository is corrupt. I would not recommend 1.546 + CVS for any project, existing or new.</para> 1.547 + 1.548 + <para>Mercurial can import CVS revision history. However, there 1.549 + are a few caveats that apply; these are true of every other 1.550 + revision control tool's CVS importer, too. Due to CVS's lack 1.551 + of atomic changes and unversioned filesystem hierarchy, it is 1.552 + not possible to reconstruct CVS history completely accurately; 1.553 + some guesswork is involved, and renames will usually not show 1.554 + up. 
Because a lot of advanced CVS administration has to be 1.555 + done by hand and is hence error-prone, it's common for CVS 1.556 + importers to run into multiple problems with corrupted 1.557 + repositories (completely bogus revision timestamps and files 1.558 + that have remained locked for over a decade are just two of 1.559 + the less interesting problems I can recall from personal 1.560 + experience).</para> 1.561 + 1.562 + <para>Mercurial can import revision history from a CVS 1.563 + repository.</para> 1.564 + 1.565 + 1.566 + </sect2> 1.567 + <sect2> 1.568 + <title>Commercial tools</title> 1.569 + 1.570 + <para>Perforce has a centralised client/server architecture, 1.571 + with no client-side caching of any data. Unlike modern 1.572 + revision control tools, Perforce requires that a user run a 1.573 + command to inform the server about every file they intend to 1.574 + edit.</para> 1.575 + 1.576 + <para>The performance of Perforce is quite good for small teams, 1.577 + but it falls off rapidly as the number of users grows beyond a 1.578 + few dozen. Modestly large Perforce installations require the 1.579 + deployment of proxies to cope with the load their users 1.580 + generate.</para> 1.581 + 1.582 + 1.583 + </sect2> 1.584 + <sect2> 1.585 + <title>Choosing a revision control tool</title> 1.586 + 1.587 + <para>With the exception of CVS, all of the tools listed above 1.588 + have unique strengths that suit them to particular styles of 1.589 + work. 
There is no single revision control tool that is best 1.590 + in all situations.</para> 1.591 + 1.592 + <para>As an example, Subversion is a good choice for working 1.593 + with frequently edited binary files, due to its centralised 1.594 + nature and support for file locking.</para> 1.595 + 1.596 + <para>I personally find Mercurial's properties of simplicity, 1.597 + performance, and good merge support to be a compelling 1.598 + combination that has served me well for several years.</para> 1.599 + 1.600 + 1.601 + </sect2> 1.602 + </sect1> 1.603 + <sect1> 1.604 + <title>Switching from another tool to Mercurial</title> 1.605 + 1.606 + <para>Mercurial is bundled with an extension named <literal 1.607 + role="hg-ext">convert</literal>, which can incrementally 1.608 + import revision history from several other revision control 1.609 + tools. By <quote>incremental</quote>, I mean that you can 1.610 + convert all of a project's history to date in one go, then rerun 1.611 + the conversion later to obtain new changes that happened after 1.612 + the initial conversion.</para> 1.613 + 1.614 + <para>The revision control tools supported by <literal 1.615 + role="hg-ext">convert</literal> are as follows:</para> 1.616 + <itemizedlist> 1.617 + <listitem><para>Subversion</para></listitem> 1.618 + <listitem><para>CVS</para></listitem> 1.619 + <listitem><para>Git</para></listitem> 1.620 + <listitem><para>Darcs</para></listitem></itemizedlist> 1.621 + 1.622 + <para>In addition, <literal role="hg-ext">convert</literal> can 1.623 + export changes from Mercurial to Subversion. This makes it 1.624 + possible to try Subversion and Mercurial in parallel before 1.625 + committing to a switchover, without risking the loss of any 1.626 + work.</para> 1.627 + 1.628 + <para>The <command role="hg-ext-convert">convert</command> command 1.629 + is easy to use. 
Simply point it at the path or URL of the 1.630 + source repository, optionally give it the name of the 1.631 + destination repository, and it will start working. After the 1.632 + initial conversion, just run the same command again to import 1.633 + new changes.</para> 1.634 + </sect1> 1.635 + 1.636 + <sect1> 1.637 + <title>A short history of revision control</title> 1.638 + 1.639 + <para>The best known of the old-time revision control tools is 1.640 + SCCS (Source Code Control System), which Marc Rochkind wrote at 1.641 + Bell Labs, in the early 1970s. SCCS operated on individual 1.642 + files, and required every person working on a project to have 1.643 + access to a shared workspace on a single system. Only one 1.644 + person could modify a file at any time; arbitration for access 1.645 + to files was via locks. It was common for people to lock files, 1.646 + and later forget to unlock them, preventing anyone else from 1.647 + modifying those files without the help of an 1.648 + administrator.</para> 1.649 + 1.650 + <para>Walter Tichy developed a free alternative to SCCS in the 1.651 + early 1980s; he called his program RCS (Revision Control System). 1.652 + Like SCCS, RCS required developers to work in a single shared 1.653 + workspace, and to lock files to prevent multiple people from 1.654 + modifying them simultaneously.</para> 1.655 + 1.656 + <para>Later in the 1980s, Dick Grune used RCS as a building block 1.657 + for a set of shell scripts he initially called cmt, but then 1.658 + renamed to CVS (Concurrent Versions System). The big innovation 1.659 + of CVS was that it let developers work simultaneously and 1.660 + somewhat independently in their own personal workspaces. The 1.661 + personal workspaces prevented developers from stepping on each 1.662 + other's toes all the time, as was common with SCCS and RCS. Each 1.663 + developer had a copy of every project file, and could modify 1.664 + their copies independently. 
They had to merge their edits prior 1.665 + to committing changes to the central repository.</para> 1.666 + 1.667 + <para>Brian Berliner took Grune's original scripts and rewrote 1.668 + them in C, releasing in 1989 the code that has since developed 1.669 + into the modern version of CVS. CVS subsequently acquired the 1.670 + ability to operate over a network connection, giving it a 1.671 + client/server architecture. CVS's architecture is centralised; 1.672 + only the server has a copy of the history of the project. Client 1.673 + workspaces just contain copies of recent versions of the 1.674 + project's files, and a little metadata to tell them where the 1.675 + server is. CVS has been enormously successful; it is probably 1.676 + the world's most widely used revision control system.</para> 1.677 + 1.678 + <para>In the early 1990s, Sun Microsystems developed an early 1.679 + distributed revision control system, called TeamWare. A 1.680 + TeamWare workspace contains a complete copy of the project's 1.681 + history. TeamWare has no notion of a central repository. (CVS 1.682 + relied upon RCS for its history storage; TeamWare used 1.683 + SCCS.)</para> 1.684 + 1.685 + <para>As the 1990s progressed, awareness grew of a number of 1.686 + problems with CVS. It records simultaneous changes to multiple 1.687 + files individually, instead of grouping them together as a 1.688 + single logically atomic operation. It does not manage its file 1.689 + hierarchy well; it is easy to make a mess of a repository by 1.690 + renaming files and directories. Worse, its source code is 1.691 + difficult to read and maintain, which made the <quote>pain 1.692 + level</quote> of fixing these architectural problems 1.693 + prohibitive.</para> 1.694 + 1.695 + <para>In 2000, Jim Blandy and Karl Fogel, two developers who had 1.696 + worked on CVS, started a project to replace it with a tool that 1.697 + would have a better architecture and cleaner code.
The result, 1.698 + Subversion, does not stray from CVS's centralised client/server 1.699 + model, but it adds multi-file atomic commits, better namespace 1.700 + management, and a number of other features that make it a 1.701 + generally better tool than CVS. Since its initial release, it 1.702 + has rapidly grown in popularity.</para> 1.703 + 1.704 + <para>More or less simultaneously, Graydon Hoare began working on 1.705 + an ambitious distributed revision control system that he named 1.706 + Monotone. While Monotone addresses many of CVS's design flaws 1.707 + and has a peer-to-peer architecture, it goes beyond earlier (and 1.708 + subsequent) revision control tools in a number of innovative 1.709 + ways. It uses cryptographic hashes as identifiers, and has an 1.710 + integral notion of <quote>trust</quote> for code from different 1.711 + sources.</para> 1.712 + 1.713 + <para>Mercurial began life in 2005. While a few aspects of its 1.714 + design are influenced by Monotone, Mercurial focuses on ease of 1.715 + use, high performance, and scalability to very large 1.716 + projects.</para> 1.717 + 1.718 + </sect1> 1.719 + 1.720 + <sect1> 1.721 + <title>Colophon&emdash;this book is Free</title> 1.722 1.723 <para>This book is licensed under the Open Publication License, 1.724 and is produced entirely using Free Software tools. It is