hgbook

annotate en/ch00-preface.xml @ 679:06458701453c

Fix up some links to example URLs that aren't actually real.
author Bryan O'Sullivan <bos@serpentine.com>
date Tue Apr 21 21:07:20 2009 -0700 (2009-04-21)
parents 3b33dd6aba87
children acf9dc5f088d
rev   line source
bos@559 1 <!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : -->
bos@26 2
bos@559 3 <preface id="chap:preface">
bos@587 4 <?dbhtml filename="preface.html"?>
bos@559 5 <title>Preface</title>
bos@26 6
bos@583 7 <sect1>
bos@583 8 <title>Why revision control? Why Mercurial?</title>
bos@583 9
bos@584 10 <para id="x_6d">Revision control is the process of managing multiple
bos@583 11 versions of a piece of information. In its simplest form, this
bos@583 12 is something that many people do by hand: every time you modify
bos@583 13 a file, save it under a new name that contains a number, each
bos@583 14 one higher than the number of the preceding version.</para>
bos@583 15
bos@584 16 <para id="x_6e">Manually managing multiple versions of even a single file is
bos@583 17 an error-prone task, though, so software tools to help automate
bos@583 18 this process have long been available. The earliest automated
bos@583 19 revision control tools were intended to help a single user to
bos@583 20 manage revisions of a single file. Over the past few decades,
bos@583 21 the scope of revision control tools has expanded greatly; they
bos@583 22 now manage multiple files, and help multiple people to work
bos@583 23 together. The best modern revision control tools have no
bos@583 24 problem coping with thousands of people working together on
bos@583 25 projects that consist of hundreds of thousands of files.</para>
bos@583 26
bos@584 27 <para id="x_6f">The arrival of distributed revision control is relatively
bos@583 28 recent, and so far this new field has grown due to people's
bos@583 29 willingness to explore ill-charted territory.</para>
bos@583 30
bos@584 31 <para id="x_70">I am writing a book about distributed revision control
bos@583 32 because I believe that it is an important subject that deserves
bos@583 33 a field guide. I chose to write about Mercurial because it is
bos@583 34 the easiest tool to learn the terrain with, and yet it scales to
bos@583 35 the demands of real, challenging environments where many other
bos@583 36 revision control tools buckle.</para>
bos@583 37
bos@583 38 <sect2>
bos@583 39 <title>Why use revision control?</title>
bos@583 40
bos@584 41 <para id="x_71">There are a number of reasons why you or your team might
bos@583 42 want to use an automated revision control tool for a
bos@583 43 project.</para>
bos@583 44
bos@583 45 <itemizedlist>
bos@584 46 <listitem><para id="x_72">It will track the history and evolution of
bos@583 47 your project, so you don't have to. For every change,
bos@583 48 you'll have a log of <emphasis>who</emphasis> made it;
bos@583 49 <emphasis>why</emphasis> they made it;
bos@583 50 <emphasis>when</emphasis> they made it; and
bos@583 51 <emphasis>what</emphasis> the change
bos@583 52 was.</para></listitem>
bos@584 53 <listitem><para id="x_73">When you're working with other people,
bos@583 54 revision control software makes it easier for you to
bos@583 55 collaborate. For example, when people more or less
bos@583 56 simultaneously make potentially incompatible changes, the
bos@583 57 software will help you to identify and resolve those
bos@583 58 conflicts.</para></listitem>
bos@584 59 <listitem><para id="x_74">It can help you to recover from mistakes. If
bos@583 60 you make a change that later turns out to be in error, you
bos@583 61 can revert to an earlier version of one or more files. In
bos@583 62 fact, a <emphasis>really</emphasis> good revision control
bos@583 63 tool will even help you to efficiently figure out exactly
bos@592 64 when a problem was introduced (see <xref
bos@583 65 linkend="sec:undo:bisect"/> for details).</para></listitem>
bos@584 66 <listitem><para id="x_75">It will help you to work simultaneously on,
bos@583 67 and manage the drift between, multiple versions of your
bos@583 68 project.</para></listitem>
bos@583 69 </itemizedlist>
bos@583 70
bos@609 71 <para id="x_76">Most of these reasons are equally
bos@609 72 valid&emdash;at least in theory&emdash;whether you're working
bos@609 73 on a project by yourself, or with a hundred other
bos@609 74 people.</para>
bos@583 75
bos@584 76 <para id="x_77">A key question about the practicality of revision control
bos@583 77 at these two different scales (<quote>lone hacker</quote> and
bos@583 78 <quote>huge team</quote>) is how its
bos@583 79 <emphasis>benefits</emphasis> compare to its
bos@583 80 <emphasis>costs</emphasis>. A revision control tool that's
bos@583 81 difficult to understand or use is going to impose a high
bos@583 82 cost.</para>
bos@583 83
bos@584 84 <para id="x_78">A five-hundred-person project is likely to collapse under
bos@583 85 its own weight almost immediately without a revision control
bos@583 86 tool and process. In this case, the cost of using revision
bos@583 87 control might hardly seem worth considering, since
bos@583 88 <emphasis>without</emphasis> it, failure is almost
bos@583 89 guaranteed.</para>
bos@583 90
bos@584 91 <para id="x_79">On the other hand, a one-person <quote>quick hack</quote>
bos@583 92 might seem like a poor place to use a revision control tool,
bos@583 93 because surely the cost of using one must be close to the
bos@583 94 overall cost of the project. Right?</para>
bos@583 95
bos@584 96 <para id="x_7a">Mercurial uniquely supports <emphasis>both</emphasis> of
bos@583 97 these scales of development. You can learn the basics in just
bos@583 98 a few minutes, and due to its low overhead, you can apply
bos@583 99 revision control to the smallest of projects with ease. Its
bos@583 100 simplicity means you won't have a lot of abstruse concepts or
bos@583 101 command sequences competing for mental space with whatever
bos@583 102 you're <emphasis>really</emphasis> trying to do. At the same
bos@583 103 time, Mercurial's high performance and peer-to-peer nature let
bos@583 104 you scale painlessly to handle large projects.</para>
bos@583 105
bos@584 106 <para id="x_7b">No revision control tool can rescue a poorly run project,
bos@583 107 but a good choice of tools can make a huge difference to the
bos@583 108 fluidity with which you can work on a project.</para>
bos@583 109
bos@583 110 </sect2>
bos@583 111
bos@583 112 <sect2>
bos@583 113 <title>The many names of revision control</title>
bos@583 114
bos@584 115 <para id="x_7c">Revision control is a diverse field, so much so that it is
bos@583 116 referred to by many names and acronyms. Here are a few of the
bos@583 117 more common variations you'll encounter:</para>
bos@583 118 <itemizedlist>
bos@584 119 <listitem><para id="x_7d">Revision control (RCS)</para></listitem>
bos@584 120 <listitem><para id="x_7e">Software configuration management (SCM), or
bos@583 121 configuration management</para></listitem>
bos@584 122 <listitem><para id="x_7f">Source code management</para></listitem>
bos@584 123 <listitem><para id="x_80">Source code control, or source
bos@583 124 control</para></listitem>
bos@584 125 <listitem><para id="x_81">Version control
bos@583 126 (VCS)</para></listitem></itemizedlist>
bos@584 127 <para id="x_82">Some people claim that these terms actually have different
bos@583 128 meanings, but in practice they overlap so much that there's no
bos@583 129 agreed or even useful way to tease them apart.</para>
bos@583 130
bos@583 131 </sect2>
bos@583 132 </sect1>
bos@26 133
bos@559 134 <sect1>
bos@559 135 <title>This book is a work in progress</title>
bos@26 136
bos@584 137 <para id="x_83">I am releasing this book while I am still writing it, in the
bos@583 138 hope that it will prove useful to others. I am writing under an
bos@583 139 open license in the hope that you, my readers, will contribute
bos@583 140 feedback and perhaps content of your own.</para>
bos@200 141
bos@559 142 </sect1>
bos@559 143 <sect1>
bos@559 144 <title>About the examples in this book</title>
bos@200 145
bos@584 146 <para id="x_84">This book takes an unusual approach to code samples. Every
bos@609 147 example is <quote>live</quote>&emdash;each one is actually the result
bos@559 148 of a shell script that executes the Mercurial commands you see.
bos@559 149 Every time an image of the book is built from its sources, all
bos@559 150 the example scripts are automatically run, and their current
bos@559 151 results compared against their expected results.</para>
bos@200 152
bos@584 153 <para id="x_85">The advantage of this approach is that the examples are
bos@559 154 always accurate; they describe <emphasis>exactly</emphasis> the
bos@672 155 behavior of the version of Mercurial that's mentioned at the
bos@559 156 front of the book. If I update the version of Mercurial that
bos@559 157 I'm documenting, and the output of some command changes, the
bos@559 158 build fails.</para>
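<para>The check described above can be pictured with a minimal sketch. This is not the book's actual build tooling: the function name <literal>check_example</literal> and its arguments are hypothetical, and Python is used purely for illustration. The sketch runs a command, captures its output, and raises an error if that output differs from what was recorded earlier.</para>

```python
import subprocess
import sys


def check_example(command, expected_output):
    """Run a command and fail if its output differs from the recorded output.

    Hypothetical helper for illustration; not part of Mercurial or of the
    book's real build scripts.
    """
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    if result.stdout != expected_output:
        raise AssertionError(
            "output of %r changed: expected %r, got %r"
            % (command, expected_output, result.stdout)
        )
    return result.stdout


if __name__ == "__main__":
    # Stand-in for a Mercurial command, so the sketch runs without hg installed.
    command = [sys.executable, "-c", "print('hello')"]
    check_example(command, "hello\n")
    print("example output matches")
```

<para>In a real build, the raised error would stop the build, which is the behavior the paragraph above relies on.</para>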
bos@200 159
bos@584 160 <para id="x_86">There is a small disadvantage to this approach, which is
bos@559 161 that the dates and times you'll see in examples tend to be
bos@559 162 <quote>squashed</quote> together in a way that they wouldn't be
bos@559 163 if the same commands were being typed by a human. Where a human
bos@559 164 can issue no more than one command every few seconds, with any
bos@559 165 resulting timestamps correspondingly spread out, my automated
bos@559 166 example scripts run many commands in one second.</para>
bos@200 167
bos@584 168 <para id="x_87">As an instance of this, several consecutive commits in an
bos@559 169 example can show up as having occurred during the same second.
bos@559 170 You can see this occur in the <literal
bos@592 171 role="hg-ext">bisect</literal> example in <xref
bos@589 172 linkend="sec:undo:bisect"/>, for instance.</para>
bos@200 173
bos@584 174 <para id="x_88">So when you're reading examples, don't place too much weight
bos@559 175 on the dates or times you see in the output of commands. But
bos@672 176 <emphasis>do</emphasis> be confident that the behavior you're
bos@559 177 seeing is consistent and reproducible.</para>
bos@26 178
bos@559 179 </sect1>
bos@583 180
bos@583 181 <sect1>
bos@583 182 <title>Trends in the field</title>
bos@583 183
bos@584 184 <para id="x_89">There has been an unmistakable trend in the development and
bos@583 185 use of revision control tools over the past four decades, as
bos@583 186 people have become familiar with the capabilities of their tools
bos@583 187 and constrained by their limitations.</para>
bos@583 188
bos@584 189 <para id="x_8a">The first generation began by managing single files on
bos@583 190 individual computers. Although these tools represented a huge
bos@583 191 advance over ad-hoc manual revision control, their locking model
bos@583 192 and reliance on a single computer limited them to small,
bos@583 193 tightly-knit teams.</para>
bos@583 194
bos@584 195 <para id="x_8b">The second generation loosened these constraints by moving
bos@583 196 to network-centered architectures, and managing entire projects
bos@583 197 at a time. As projects grew larger, they ran into new problems.
bos@583 198 With clients needing to talk to servers very frequently, server
bos@583 199 scaling became an issue for large projects. An unreliable
bos@583 200 network connection could prevent remote users from being able to
bos@583 201 talk to the server at all. As open source projects started
bos@583 202 making read-only access available anonymously to anyone, people
bos@583 203 without commit privileges found that they could not use the
bos@583 204 tools to interact with a project in a natural way, as they could
bos@583 205 not record their changes.</para>
bos@583 206
bos@584 207 <para id="x_8c">The current generation of revision control tools is
bos@583 208 peer-to-peer in nature. All of these systems have dropped the
bos@583 209 dependency on a single central server, and allow people to
bos@583 210 distribute their revision control data to where it's actually
bos@583 211 needed. Collaboration over the Internet has moved from being
bos@583 212 constrained by technology to being a matter of choice and consensus.
bos@583 213 Modern tools can operate offline indefinitely and autonomously,
bos@583 214 with a network connection only needed when syncing changes with
bos@583 215 another repository.</para>
bos@583 216
bos@583 217 </sect1>
bos@583 218 <sect1>
bos@583 219 <title>A few of the advantages of distributed revision
bos@583 220 control</title>
bos@583 221
bos@584 222 <para id="x_8d">Even though distributed revision control tools have for
bos@583 223 several years been as robust and usable as their
bos@583 224 previous-generation counterparts, people using older tools have
bos@583 225 not necessarily woken up to their advantages yet. There are a
bos@583 226 number of ways in which distributed tools shine relative to
bos@583 227 centralised ones.</para>
bos@583 228
bos@584 229 <para id="x_8e">For an individual developer, distributed tools are almost
bos@583 230 always much faster than centralised tools. This is for a simple
bos@583 231 reason: a centralised tool needs to talk over the network for
bos@583 232 many common operations, because most metadata is stored in a
bos@583 233 single copy on the central server. A distributed tool stores
bos@583 234 all of its metadata locally. All else being equal, talking over
bos@583 235 the network adds overhead to a centralised tool. Don't
bos@583 236 underestimate the value of a snappy, responsive tool: you're
bos@583 237 going to spend a lot of time interacting with your revision
bos@583 238 control software.</para>
bos@583 239
bos@584 240 <para id="x_8f">Distributed tools are indifferent to the vagaries of your
bos@583 241 server infrastructure, again because they replicate metadata to
bos@583 242 so many locations. If you use a centralised system and your
bos@583 243 server catches fire, you'd better hope that your backup media
bos@583 244 are reliable, and that your last backup was recent and actually
bos@583 245 worked. With a distributed tool, you have many backups
bos@583 246 available on every contributor's computer.</para>
bos@583 247
bos@584 248 <para id="x_90">The reliability of your network will affect distributed
bos@583 249 tools far less than it will centralised tools. You can't even
bos@583 250 use a centralised tool without a network connection, except for
bos@583 251 a few highly constrained commands. With a distributed tool, if
bos@583 252 your network connection goes down while you're working, you may
bos@583 253 not even notice. The only thing you won't be able to do is talk
bos@583 254 to repositories on other computers, something that is relatively
bos@583 255 rare compared with local operations. If you have a far-flung
bos@583 256 team of collaborators, this may be significant.</para>
bos@583 257
bos@583 258 <sect2>
bos@583 259 <title>Advantages for open source projects</title>
bos@583 260
bos@584 261 <para id="x_91">If you take a shine to an open source project and decide
bos@583 262 that you would like to start hacking on it, and that project
bos@583 263 uses a distributed revision control tool, you are at once a
bos@583 264 peer with the people who consider themselves the
bos@583 265 <quote>core</quote> of that project. If they publish their
bos@583 266 repositories, you can immediately copy their project history,
bos@583 267 start making changes, and record your work, using the same
bos@583 268 tools in the same ways as insiders. By contrast, with a
bos@583 269 centralised tool, you must use the software in a <quote>read
bos@583 270 only</quote> mode unless someone grants you permission to
bos@583 271 commit changes to their central server. Until then, you won't
bos@583 272 be able to record changes, and your local modifications will
bos@583 273 be at risk of corruption any time you try to update your
bos@583 274 client's view of the repository.</para>
bos@583 275
bos@583 276 <sect3>
bos@583 277 <title>The forking non-problem</title>
bos@583 278
bos@584 279 <para id="x_92">It has been suggested that distributed revision control
bos@583 280 tools pose some sort of risk to open source projects because
bos@583 281 they make it easy to <quote>fork</quote> the development of
bos@583 282 a project. A fork happens when there are differences in
bos@583 283 opinion or attitude between groups of developers that cause
bos@583 284 them to decide that they can't work together any longer.
bos@583 285 Each side takes a more or less complete copy of the
bos@583 286 project's source code, and goes off in its own
bos@583 287 direction.</para>
bos@583 288
bos@584 289 <para id="x_93">Sometimes the camps in a fork decide to reconcile their
bos@583 290 differences. With a centralised revision control system, the
bos@583 291 <emphasis>technical</emphasis> process of reconciliation is
bos@583 292 painful, and has to be performed largely by hand. You have
bos@583 293 to decide whose revision history is going to
bos@583 294 <quote>win</quote>, and graft the other team's changes into
bos@583 295 the tree somehow. This usually loses some or all of one
bos@583 296 side's revision history.</para>
bos@583 297
bos@584 298 <para id="x_94">With respect to forking, distributed tools
bos@583 299 make forking the <emphasis>only</emphasis> way to
bos@583 300 develop a project. Every single change that you make is
bos@583 301 potentially a fork point. The great strength of this
bos@583 302 approach is that a distributed revision control tool has to
bos@583 303 be really good at <emphasis>merging</emphasis> forks,
bos@583 304 because forks are absolutely fundamental: they happen all
bos@583 305 the time.</para>
bos@583 306
bos@584 307 <para id="x_95">If every piece of work that everybody does, all the
bos@583 308 time, is framed in terms of forking and merging, then what
bos@583 309 the open source world refers to as a <quote>fork</quote>
bos@583 310 becomes <emphasis>purely</emphasis> a social issue. If
bos@583 311 anything, distributed tools <emphasis>lower</emphasis> the
bos@583 312 likelihood of a fork:</para>
bos@583 313 <itemizedlist>
bos@584 314 <listitem><para id="x_96">They eliminate the social distinction that
bos@583 315 centralised tools impose: that between insiders (people
bos@583 316 with commit access) and outsiders (people
bos@583 317 without).</para></listitem>
bos@584 318 <listitem><para id="x_97">They make it easier to reconcile after a
bos@583 319 social fork, because all that's involved from the
bos@583 320 perspective of the revision control software is just
bos@583 321 another merge.</para></listitem></itemizedlist>
bos@583 322
bos@584 323 <para id="x_98">Some people resist distributed tools because they want
bos@583 324 to retain tight control over their projects, and they
bos@583 325 believe that centralised tools give them this control.
bos@583 326 However, if you're of this belief, and you publish your CVS
bos@583 327 or Subversion repositories publicly, there are plenty of
bos@583 328 tools available that can pull out your entire project's
bos@583 329 history (albeit slowly) and recreate it somewhere that you
bos@583 330 don't control. So while your control in this case is
bos@583 331 illusory, you are forgoing the ability to fluidly
bos@583 332 collaborate with whatever people feel compelled to mirror
bos@583 333 and fork your history.</para>
bos@583 334
bos@583 335 </sect3>
bos@583 336 </sect2>
bos@583 337 <sect2>
bos@583 338 <title>Advantages for commercial projects</title>
bos@583 339
bos@584 340 <para id="x_99">Many commercial projects are undertaken by teams that are
bos@583 341 scattered across the globe. Contributors who are far from a
bos@583 342 central server will see slower command execution and perhaps
bos@583 343 less reliability. Commercial revision control systems attempt
bos@583 344 to ameliorate these problems with remote-site replication
bos@583 345 add-ons that are typically expensive to buy and cantankerous
bos@583 346 to administer. A distributed system doesn't suffer from these
bos@583 347 problems in the first place. Better yet, you can easily set
bos@583 348 up multiple authoritative servers, say one per site, so that
bos@583 349 there's no redundant communication between repositories over
bos@583 350 expensive long-haul network links.</para>
bos@583 351
bos@584 352 <para id="x_9a">Centralised revision control systems tend to have
bos@583 353 relatively low scalability. It's not unusual for an expensive
bos@583 354 centralised system to fall over under the combined load of
bos@583 355 just a few dozen concurrent users. Once again, the typical
bos@583 356 response tends to be an expensive and clunky replication
bos@609 357 facility. Since the load on a central server&emdash;if you have
bos@609 358 one at all&emdash;is many times lower with a distributed tool
bos@583 359 (because all of the data is replicated everywhere), a single
bos@583 360 cheap server can handle the needs of a much larger team, and
bos@583 361 replication to balance load becomes a simple matter of
bos@583 362 scripting.</para>
bos@583 363
bos@584 364 <para id="x_9b">If you have an employee in the field, troubleshooting a
bos@583 365 problem at a customer's site, they'll benefit from distributed
bos@583 366 revision control. The tool will let them generate custom
bos@583 367 builds, try different fixes in isolation from each other, and
bos@583 368 search efficiently through history for the sources of bugs and
bos@583 369 regressions in the customer's environment, all without needing
bos@583 370 to connect to your company's network.</para>
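<para>As a rough illustration of why such a search through history can be efficient, the sketch below performs a binary search over a linear list of revisions. It is a simplification under stated assumptions rather than Mercurial's implementation: the names <literal>find_first_bad</literal> and <literal>is_bad</literal> are hypothetical, history is treated as a simple list, and the oldest revision is assumed good while the newest is assumed bad.</para>

```python
def find_first_bad(revisions, is_bad):
    """Return the first bad revision and the number of tests performed.

    Assumes revisions are ordered oldest to newest, the first one is good,
    the last one is bad, and every revision after the first bad one is bad.
    """
    good = 0                      # index of a revision known to be good
    bad = len(revisions) - 1      # index of a revision known to be bad
    tests = 0
    while bad - good > 1:
        middle = (good + bad) // 2
        tests += 1
        if is_bad(revisions[middle]):
            bad = middle
        else:
            good = middle
    return revisions[bad], tests


if __name__ == "__main__":
    history = list(range(1000))   # hypothetical revision numbers 0..999
    first_bad, tests = find_first_bad(history, lambda rev: rev >= 642)
    print(first_bad, tests)
```

<para>Each test halves the range of candidate revisions, so a history of a thousand revisions needs about ten tests, none of which requires a network connection when the full history is stored locally.</para>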
bos@583 371
bos@583 372 </sect2>
bos@583 373 </sect1>
bos@583 374 <sect1>
bos@583 375 <title>Why choose Mercurial?</title>
bos@583 376
bos@584 377 <para id="x_9c">Mercurial has a unique set of properties that make it a
bos@583 378 particularly good choice as a revision control system.</para>
bos@583 379 <itemizedlist>
bos@584 380 <listitem><para id="x_9d">It is easy to learn and use.</para></listitem>
bos@584 381 <listitem><para id="x_9e">It is lightweight.</para></listitem>
bos@584 382 <listitem><para id="x_9f">It scales excellently.</para></listitem>
bos@584 383 <listitem><para id="x_a0">It is easy to
bos@583 384 customise.</para></listitem></itemizedlist>
bos@583 385
bos@584 386 <para id="x_a1">If you are at all familiar with revision control systems,
bos@583 387 you should be able to get up and running with Mercurial in less
bos@583 388 than five minutes. Even if not, it will take no more than a few
bos@583 389 minutes longer. Mercurial's command and feature sets are
bos@583 390 generally uniform and consistent, so you can keep track of a few
bos@583 391 general rules instead of a host of exceptions.</para>
bos@583 392
bos@584 393 <para id="x_a2">On a small project, you can start working with Mercurial in
bos@583 394 moments. Creating new changes and branches; transferring changes
bos@583 395 around (whether locally or over a network); and history and
bos@583 396 status operations are all fast. Mercurial attempts to stay
bos@583 397 nimble and largely out of your way by combining low cognitive
bos@583 398 overhead with blazingly fast operations.</para>
bos@583 399
bos@584 400 <para id="x_a3">The usefulness of Mercurial is not limited to small
bos@583 401 projects: it is used by projects with hundreds to thousands of
bos@583 402 contributors, with each project containing tens of thousands of files and
bos@583 403 hundreds of megabytes of source code.</para>
bos@583 404
bos@584 405 <para id="x_a4">If the core functionality of Mercurial is not enough for
bos@583 406 you, it's easy to build on. Mercurial is well suited to
bos@583 407 scripting tasks, and its clean internals and implementation in
bos@583 408 Python make it easy to add features in the form of extensions.
bos@583 409 There are a number of popular and useful extensions already
bos@583 410 available, ranging from helping to identify bugs to improving
bos@583 411 performance.</para>
bos@583 412
bos@583 413 </sect1>
bos@583 414 <sect1>
bos@583 415 <title>Mercurial compared with other tools</title>
bos@583 416
bos@584 417 <para id="x_a5">Before you read on, please understand that this section
bos@583 418 necessarily reflects my own experiences, interests, and (dare I
bos@583 419 say it) biases. I have used every one of the revision control
bos@583 420 tools listed below, in most cases for several years at a
bos@583 421 time.</para>
bos@583 422
bos@583 423
bos@583 424 <sect2>
bos@583 425 <title>Subversion</title>
bos@583 426
bos@584 427 <para id="x_a6">Subversion is a popular revision control tool, developed
bos@583 428 to replace CVS. It has a centralised client/server
bos@583 429 architecture.</para>
bos@583 430
bos@584 431 <para id="x_a7">Subversion and Mercurial have similarly named commands for
bos@583 432 performing the same operations, so if you're familiar with
bos@583 433 one, it is easy to learn to use the other. Both tools are
bos@583 434 portable to all popular operating systems.</para>
bos@583 435
bos@584 436 <para id="x_a8">Prior to version 1.5, Subversion had no useful support for
bos@583 437 merges. At the time of writing, its merge tracking capability
bos@583 438 is new, and known to be <ulink
bos@583 439 url="http://svnbook.red-bean.com/nightly/en/svn.branchmerge.advanced.html#svn.branchmerge.advanced.finalword">complicated
bos@583 440 and buggy</ulink>.</para>
bos@583 441
bos@584 442 <para id="x_a9">Mercurial has a substantial performance advantage over
bos@583 443 Subversion on every revision control operation I have
bos@583 444 benchmarked. I have measured its advantage as ranging from a
bos@583 445 factor of two to a factor of six when compared with Subversion
bos@583 446 1.4.3's <emphasis>ra_local</emphasis> file store, which is the
bos@583 447 fastest access method available. In more realistic
bos@583 448 deployments involving a network-based store, Subversion will
bos@583 449 be at a substantially larger disadvantage. Because many
bos@583 450 Subversion commands must talk to the server and Subversion
bos@583 451 does not have useful replication facilities, server capacity
bos@583 452 and network bandwidth become bottlenecks for modestly large
bos@583 453 projects.</para>
bos@583 454
bos@584 455 <para id="x_aa">Additionally, Subversion incurs substantial storage
bos@583 456 overhead to avoid network transactions for a few common
bos@583 457 operations, such as finding modified files
bos@583 458 (<literal>status</literal>) and displaying modifications
bos@583 459 against the current revision (<literal>diff</literal>). As a
bos@583 460 result, a Subversion working copy is often the same size as,
bos@583 461 or larger than, a Mercurial repository and working directory,
bos@583 462 even though the Mercurial repository contains a complete
bos@583 463 history of the project.</para>
bos@583 464
bos@584 465 <para id="x_ab">Subversion is widely supported by third party tools.
bos@583 466 Mercurial currently lags considerably in this area. This gap
bos@583 467 is closing, however, and indeed some of Mercurial's GUI tools
bos@583 468 now outshine their Subversion equivalents. Like Mercurial,
bos@583 469 Subversion has an excellent user manual.</para>
bos@583 470
bos@584 471 <para id="x_ac">Because Subversion doesn't store revision history on the
bos@583 472 client, it is well suited to managing projects that deal with
bos@583 473 lots of large, opaque binary files. If you check in fifty
bos@583 474 revisions to an incompressible 10MB file, Subversion's
bos@583 475 client-side space usage stays constant. The space used by any
bos@583 476 distributed SCM will grow rapidly in proportion to the number
bos@583 477 of revisions, because the differences between each revision
bos@583 478 are large.</para>
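<para>The arithmetic behind this example is simple. The sketch below assumes the worst case, in which no revision compresses or shares data with its predecessor, so a repository holding full history stores every revision in full. The function name is hypothetical.</para>

```python
def full_history_size_mb(file_size_mb, revisions):
    """Worst-case space needed to keep every revision of one file.

    Assumes each revision is stored in full because the contents are
    incompressible and successive revisions share nothing.
    """
    return file_size_mb * revisions


if __name__ == "__main__":
    # Fifty revisions of a 10MB file, as in the example above.
    print(full_history_size_mb(10, 50))  # 500
```

<para>Under these assumptions, a repository that carries full history needs roughly 500MB for this one file, while a client that keeps only the current revision needs space on the order of the file itself.</para>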
bos@583 479
bos@584 480 <para id="x_ad">In addition, it's often difficult or, more usually,
bos@583 481 impossible to merge different versions of a binary file.
bos@583 482 Subversion's ability to let a user lock a file, so that they
bos@583 483 temporarily have the exclusive right to commit changes to it,
bos@583 484 can be a significant advantage to a project where binary files
bos@583 485 are widely used.</para>
bos@583 486
bos@584 487 <para id="x_ae">Mercurial can import revision history from a Subversion
bos@583 488 repository. It can also export revision history to a
bos@583 489 Subversion repository. This makes it easy to <quote>test the
bos@583 490 waters</quote> and use Mercurial and Subversion in parallel
bos@583 491 before deciding to switch. History conversion is incremental,
bos@583 492 so you can perform an initial conversion, then small
bos@583 493 additional conversions afterwards to bring in new
bos@583 494 changes.</para>
bos@583 495
bos@583 496
bos@583 497 </sect2>
bos@583 498 <sect2>
bos@583 499 <title>Git</title>
bos@583 500
bos@584 501 <para id="x_af">Git is a distributed revision control tool that was
bos@583 502 developed for managing the Linux kernel source tree. Like
bos@583 503 Mercurial, its early design was somewhat influenced by
bos@583 504 Monotone.</para>
bos@583 505
bos@584 506 <para id="x_b0">Git has a very large command set, with version 1.5.0
bos@583 507 providing 139 individual commands. It has something of a
bos@583 508 reputation for being difficult to learn. Compared to Git,
bos@583 509 Mercurial has a strong focus on simplicity.</para>
bos@583 510
bos@584 511 <para id="x_b1">In terms of performance, Git is extremely fast. In
bos@583 512 several cases, it is faster than Mercurial, at least on Linux,
bos@583 513 while Mercurial performs better on other operations. However,
bos@583 514 on Windows, the performance and general level of support that
bos@583 515 Git provides are, at the time of writing, far behind those of
bos@583 516 Mercurial.</para>
bos@583 517
bos@584 518 <para id="x_b2">While a Mercurial repository needs no maintenance, a Git
bos@583 519 repository requires frequent manual <quote>repacks</quote> of
bos@583 520 its metadata. Without these, performance degrades, while
bos@583 521 space usage grows rapidly. A server that contains many Git
bos@583 522 repositories that are not rigorously and frequently repacked
bos@583 523 will become heavily disk-bound during backups, and there have
bos@583 524 been instances of daily backups taking far longer than 24
bos@583 525 hours as a result. A freshly packed Git repository is
bos@583 526 slightly smaller than a Mercurial repository, but an unpacked
bos@583 527 repository is several orders of magnitude larger.</para>
bos@583 528
bos@584 529 <para id="x_b3">The core of Git is written in C. Many Git commands are
bos@583 530 implemented as shell or Perl scripts, and the quality of these
bos@583 531 scripts varies widely. I have encountered several instances
bos@583 532 where scripts charged along blindly in the presence of errors
bos@583 533 that should have been fatal.</para>
bos@583 534
bos@584 535 <para id="x_b4">Mercurial can import revision history from a Git
bos@583 536 repository.</para>
bos@583 537
bos@583 538
bos@583 539 </sect2>
bos@583 540 <sect2>
bos@583 541 <title>CVS</title>
bos@583 542
bos@584 543 <para id="x_b5">CVS is probably the most widely used revision control tool
bos@583 544 in the world. Due to its age and internal untidiness, it has
bos@583 545 been only lightly maintained for many years.</para>
bos@583 546
bos@584 547 <para id="x_b6">It has a centralised client/server architecture. It does
bos@583 548 not group related file changes into atomic commits, making it
bos@583 549 easy for people to <quote>break the build</quote>: one person
bos@583 550 can successfully commit part of a change and then be blocked
bos@583 551 by the need for a merge, causing other people to see only a
bos@583 552 portion of the work that person intended to commit. This also affects
bos@583 553 how you work with project history. If you want to see all of
bos@583 554 the modifications someone made as part of a task, you will
bos@583 555 need to manually inspect the descriptions and timestamps of
bos@583 556 the changes made to each file involved (if you even know what
bos@583 557 those files were).</para>
bos@583 558
bos@584 559 <para id="x_b7">CVS has a muddled notion of tags and branches that I will
bos@583 560 not even attempt to describe. It does not support renaming of
bos@583 561 files or directories well, making it easy to corrupt a
bos@583 562 repository. It has almost no internal consistency checking
bos@583 563 capabilities, so it is usually not even possible to tell
bos@583 564 whether or how a repository is corrupt. I would not recommend
bos@583 565 CVS for any project, existing or new.</para>
bos@583 566
bos@584 567 <para id="x_b8">Mercurial can import CVS revision history. However, there
bos@583 568 are a few caveats that apply; these are true of every other
bos@583 569 revision control tool's CVS importer, too. Due to CVS's lack
bos@583 570 of atomic changes and unversioned filesystem hierarchy, it is
bos@583 571 not possible to reconstruct CVS history completely accurately;
bos@583 572 some guesswork is involved, and renames will usually not show
bos@583 573 up. Because a lot of advanced CVS administration has to be
bos@583 574 done by hand and is hence error-prone, it's common for CVS
bos@583 575 importers to run into multiple problems with corrupted
bos@583 576 repositories (completely bogus revision timestamps and files
bos@583 577 that have remained locked for over a decade are just two of
bos@583 578 the less interesting problems I can recall from personal
bos@583 579 experience).</para>
bos@583 580
bos@584 581 <para id="x_b9">Mercurial can import revision history from a CVS
bos@583 582 repository.</para>
bos@583 583
bos@583 584
bos@583 585 </sect2>
bos@583 586 <sect2>
bos@583 587 <title>Commercial tools</title>
bos@583 588
bos@584 589 <para id="x_ba">Perforce has a centralised client/server architecture,
bos@583 590 with no client-side caching of any data. Unlike modern
bos@583 591 revision control tools, Perforce requires that a user run a
bos@583 592 command to inform the server about every file they intend to
bos@583 593 edit.</para>
bos@583 594
bos@584 595 <para id="x_bb">The performance of Perforce is quite good for small teams,
bos@583 596 but it falls off rapidly as the number of users grows beyond a
bos@583 597 few dozen. Modestly large Perforce installations require the
bos@583 598 deployment of proxies to cope with the load their users
bos@583 599 generate.</para>
bos@583 600
bos@583 601
bos@583 602 </sect2>
bos@583 603 <sect2>
bos@583 604 <title>Choosing a revision control tool</title>
bos@583 605
bos@584 606 <para id="x_bc">With the exception of CVS, all of the tools listed above
bos@583 607 have unique strengths that suit them to particular styles of
bos@583 608 work. There is no single revision control tool that is best
bos@583 609 in all situations.</para>
bos@583 610
bos@584 611 <para id="x_bd">As an example, Subversion is a good choice for working
bos@583 612 with frequently edited binary files, due to its centralised
bos@583 613 nature and support for file locking.</para>
bos@583 614
bos@584 615 <para id="x_be">I personally find Mercurial's properties of simplicity,
bos@583 616 performance, and good merge support to be a compelling
bos@583 617 combination that has served me well for several years.</para>
bos@583 618
bos@583 619
bos@583 620 </sect2>
bos@583 621 </sect1>
bos@583 622 <sect1>
bos@583 623 <title>Switching from another tool to Mercurial</title>
bos@583 624
bos@584 625 <para id="x_bf">Mercurial is bundled with an extension named <literal
bos@583 626 role="hg-ext">convert</literal>, which can incrementally
bos@583 627 import revision history from several other revision control
bos@583 628 tools. By <quote>incremental</quote>, I mean that you can
bos@583 629 convert all of a project's history to date in one go, then rerun
bos@583 630 the conversion later to obtain new changes that happened after
bos@583 631 the initial conversion.</para>
bos@583 632
bos@584 633 <para id="x_c0">The revision control tools supported by <literal
bos@583 634 role="hg-ext">convert</literal> are as follows:</para>
bos@583 635 <itemizedlist>
bos@584 636 <listitem><para id="x_c1">Subversion</para></listitem>
bos@584 637 <listitem><para id="x_c2">CVS</para></listitem>
bos@584 638 <listitem><para id="x_c3">Git</para></listitem>
bos@584 639 <listitem><para id="x_c4">Darcs</para></listitem></itemizedlist>
bos@584 640
bos@584 641 <para id="x_c5">In addition, <literal role="hg-ext">convert</literal> can
bos@583 642 export changes from Mercurial to Subversion. This makes it
bos@583 643 possible to try Subversion and Mercurial in parallel before
bos@583 644 committing to a switchover, without risking the loss of any
bos@583 645 work.</para>
bos@583 646
bos@584 647 <para id="x_c6">The <command role="hg-ext-convert">convert</command> command
bos@583 648 is easy to use. Simply point it at the path or URL of the
bos@583 649 source repository, optionally give it the name of the
bos@583 650 destination repository, and it will start working. After the
bos@583 651 initial conversion, just run the same command again to import
bos@583 652 new changes.</para>
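<para>As a rough sketch of what this looks like in practice, the
  fragments below enable the extension and then convert a repository.
  The names <filename>/path/to/source-repo</filename> and
  <filename>converted-hg</filename> are placeholders chosen purely for
  illustration, and the configuration shown is an assumption about a
  typical setup; the exact syntax may differ between Mercurial
  versions.</para>

<programlisting># In ~/.hgrc: enable the bundled convert extension.
[extensions]
convert =</programlisting>

<screen># Initial conversion: the source comes first, then the optional destination.
hg convert /path/to/source-repo converted-hg

# Later on, run the same command again to import changes made since then.
hg convert /path/to/source-repo converted-hg</screen>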
bos@583 653 </sect1>
bos@583 654
bos@583 655 <sect1>
bos@583 656 <title>A short history of revision control</title>
bos@583 657
bos@584 658 <para id="x_c7">The best known of the old-time revision control tools is
bos@583 659 SCCS (Source Code Control System), which Marc Rochkind wrote at
bos@583 660 Bell Labs in the early 1970s. SCCS operated on individual
bos@583 661 files, and required every person working on a project to have
bos@583 662 access to a shared workspace on a single system. Only one
bos@583 663 person could modify a file at any time; arbitration for access
bos@583 664 to files was via locks. It was common for people to lock files,
bos@583 665 and later forget to unlock them, preventing anyone else from
bos@583 666 modifying those files without the help of an
bos@583 667 administrator.</para>
bos@583 668
bos@584 669 <para id="x_c8">Walter Tichy developed a free alternative to SCCS in the
bos@583 670 early 1980s; he called his program RCS (Revision Control System).
bos@583 671 Like SCCS, RCS required developers to work in a single shared
bos@583 672 workspace, and to lock files to prevent multiple people from
bos@583 673 modifying them simultaneously.</para>
bos@583 674
bos@584 675 <para id="x_c9">Later in the 1980s, Dick Grune used RCS as a building block
bos@583 676 for a set of shell scripts he initially called cmt, but then
bos@583 677 renamed to CVS (Concurrent Versions System). The big innovation
bos@583 678 of CVS was that it let developers work simultaneously and
bos@583 679 somewhat independently in their own personal workspaces. The
bos@583 680 personal workspaces prevented developers from stepping on each
bos@583 681 other's toes all the time, as was common with SCCS and RCS. Each
bos@583 682 developer had a copy of every project file, and could modify
bos@583 683 their copies independently. They had to merge their edits prior
bos@583 684 to committing changes to the central repository.</para>
bos@583 685
bos@584 686 <para id="x_ca">Brian Berliner took Grune's original scripts and rewrote
bos@583 687 them in C, releasing in 1989 the code that has since developed
bos@583 688 into the modern version of CVS. CVS subsequently acquired the
bos@583 689 ability to operate over a network connection, giving it a
bos@583 690 client/server architecture. CVS's architecture is centralised;
bos@583 691 only the server has a copy of the history of the project. Client
bos@583 692 workspaces just contain copies of recent versions of the
bos@583 693 project's files, and a little metadata to tell them where the
bos@583 694 server is. CVS has been enormously successful; it is probably
bos@583 695 the world's most widely used revision control system.</para>
bos@583 696
bos@584 697 <para id="x_cb">In the early 1990s, Sun Microsystems developed an early
bos@583 698 distributed revision control system, called TeamWare. A
bos@583 699 TeamWare workspace contains a complete copy of the project's
bos@583 700 history. TeamWare has no notion of a central repository. (CVS
bos@583 701 relied upon RCS for its history storage; TeamWare used
bos@583 702 SCCS.)</para>
bos@583 703
bos@584 704 <para id="x_cc">As the 1990s progressed, awareness grew of a number of
bos@583 705 problems with CVS. It records simultaneous changes to multiple
bos@583 706 files individually, instead of grouping them together as a
bos@583 707 single logically atomic operation. It does not manage its file
bos@583 708 hierarchy well; it is easy to make a mess of a repository by
bos@583 709 renaming files and directories. Worse, its source code is
bos@583 710 difficult to read and maintain, which made the <quote>pain
bos@583 711 level</quote> of fixing these architectural problems
bos@583 712 prohibitive.</para>
bos@583 713
bos@584 714 <para id="x_cd">In 2000, Jim Blandy and Karl Fogel, two developers who had
bos@583 715 worked on CVS, started a project to replace it with a tool that
bos@583 716 would have a better architecture and cleaner code. The result,
bos@583 717 Subversion, does not stray from CVS's centralised client/server
bos@583 718 model, but it adds multi-file atomic commits, better namespace
bos@583 719 management, and a number of other features that make it a
bos@583 720 generally better tool than CVS. Since its initial release, it
bos@583 721 has rapidly grown in popularity.</para>
bos@583 722
bos@584 723 <para id="x_ce">More or less simultaneously, Graydon Hoare began working on
bos@583 724 an ambitious distributed revision control system that he named
bos@583 725 Monotone. Monotone not only addresses many of CVS's design flaws
bos@583 726 and has a peer-to-peer architecture, it also goes beyond earlier (and
bos@583 727 subsequent) revision control tools in a number of innovative
bos@583 728 ways. It uses cryptographic hashes as identifiers, and has an
bos@583 729 integral notion of <quote>trust</quote> for code from different
bos@583 730 sources.</para>
bos@583 731
bos@584 732 <para id="x_cf">Mercurial began life in 2005. While a few aspects of its
bos@583 733 design are influenced by Monotone, Mercurial focuses on ease of
bos@583 734 use, high performance, and scalability to very large
bos@583 735 projects.</para>
bos@583 736
bos@583 737 </sect1>
bos@583 738
bos@583 739 <sect1>
bos@583 740 <title>Colophon&emdash;this book is Free</title>
bos@26 741
bos@584 742 <para id="x_d0">This book is licensed under the Open Publication License,
bos@559 743 and is produced entirely using Free Software tools. It is
bos@580 744 typeset with DocBook XML. Illustrations are drawn and rendered with
bos@559 745 <ulink url="http://www.inkscape.org/">Inkscape</ulink>.</para>
bos@26 746
bos@584 747 <para id="x_d1">The complete source code for this book is published as a
bos@559 748 Mercurial repository, at <ulink
bos@559 749 url="http://hg.serpentine.com/mercurial/book">http://hg.serpentine.com/mercurial/book</ulink>.</para>
bos@559 750
bos@559 751 </sect1>
bos@559 752 </preface>
bos@559 753 <!--
bos@559 754 local variables:
bos@559 755 sgml-parent-document: ("00book.xml" "book" "preface")
bos@559 756 end:
bos@559 757 -->