hgbook

annotate en/ch00-preface.xml @ 584:c838b3975bc6

Add IDs to paragraphs.
author Bryan O'Sullivan <bos@serpentine.com>
date Thu Mar 19 21:18:52 2009 -0700 (2009-03-19)
parents 28b5a5befb08
children 34cb220eb717
rev   line source
bos@559 1 <!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : -->
bos@26 2
bos@559 3 <preface id="chap:preface">
bos@559 4 <title>Preface</title>
bos@26 5
bos@583 6 <sect1>
bos@583 7 <title>Why revision control? Why Mercurial?</title>
bos@583 8
bos@584 9 <para id="x_6d">Revision control is the process of managing multiple
bos@583 10 versions of a piece of information. In its simplest form, this
bos@583 11 is something that many people do by hand: every time you modify
bos@583 12 a file, save it under a new name that contains a number, each
bos@583 13 one higher than the number of the preceding version.</para>
bos@583 14
bos@584 15 <para id="x_6e">Manually managing multiple versions of even a single file is
bos@583 16 an error-prone task, though, so software tools to help automate
bos@583 17 this process have long been available. The earliest automated
bos@583 18 revision control tools were intended to help a single user to
bos@583 19 manage revisions of a single file. Over the past few decades,
bos@583 20 the scope of revision control tools has expanded greatly; they
bos@583 21 now manage multiple files, and help multiple people to work
bos@583 22 together. The best modern revision control tools have no
bos@583 23 problem coping with thousands of people working together on
bos@583 24 projects that consist of hundreds of thousands of files.</para>
bos@583 25
bos@584 26 <para id="x_6f">The arrival of distributed revision control is relatively
bos@583 27 recent, and so far this new field has grown due to people's
bos@583 28 willingness to explore ill-charted territory.</para>
bos@583 29
bos@584 30 <para id="x_70">I am writing a book about distributed revision control
bos@583 31 because I believe that it is an important subject that deserves
bos@583 32 a field guide. I chose to write about Mercurial because it is
bos@583 33 the easiest tool to learn the terrain with, and yet it scales to
bos@583 34 the demands of real, challenging environments where many other
bos@583 35 revision control tools buckle.</para>
bos@583 36
bos@583 37 <sect2>
bos@583 38 <title>Why use revision control?</title>
bos@583 39
bos@584 40 <para id="x_71">There are a number of reasons why you or your team might
bos@583 41 want to use an automated revision control tool for a
bos@583 42 project.</para>
bos@583 43
bos@583 44 <itemizedlist>
bos@584 45 <listitem><para id="x_72">It will track the history and evolution of
bos@583 46 your project, so you don't have to. For every change,
bos@583 47 you'll have a log of <emphasis>who</emphasis> made it;
bos@583 48 <emphasis>why</emphasis> they made it;
bos@583 49 <emphasis>when</emphasis> they made it; and
bos@583 50 <emphasis>what</emphasis> the change
bos@583 51 was.</para></listitem>
bos@584 52 <listitem><para id="x_73">When you're working with other people,
bos@583 53 revision control software makes it easier for you to
bos@583 54 collaborate. For example, when people more or less
bos@583 55 simultaneously make potentially incompatible changes, the
bos@583 56 software will help you to identify and resolve those
bos@583 57 conflicts.</para></listitem>
bos@584 58 <listitem><para id="x_74">It can help you to recover from mistakes. If
bos@583 59 you make a change that later turns out to be in error, you
bos@583 60 can revert to an earlier version of one or more files. In
bos@583 61 fact, a <emphasis>really</emphasis> good revision control
bos@583 62 tool will even help you to efficiently figure out exactly
bos@583 63 when a problem was introduced (see section <xref
bos@583 64 linkend="sec:undo:bisect"/> for details).</para></listitem>
bos@584 65 <listitem><para id="x_75">It will help you to work simultaneously on,
bos@583 66 and manage the drift between, multiple versions of your
bos@583 67 project.</para></listitem>
bos@583 68 </itemizedlist>
bos@583 69
bos@584 70 <para id="x_76">Most of these reasons are equally valid---at least in
bos@583 71 theory---whether you're working on a project by yourself, or
bos@583 72 with a hundred other people.</para>
bos@583 73
bos@584 74 <para id="x_77">A key question about the practicality of revision control
bos@583 75 at these two different scales (<quote>lone hacker</quote> and
bos@583 76 <quote>huge team</quote>) is how its
bos@583 77 <emphasis>benefits</emphasis> compare to its
bos@583 78 <emphasis>costs</emphasis>. A revision control tool that's
bos@583 79 difficult to understand or use is going to impose a high
bos@583 80 cost.</para>
bos@583 81
bos@584 82 <para id="x_78">A five-hundred-person project is likely to collapse under
bos@583 83 its own weight almost immediately without a revision control
bos@583 84 tool and process. In this case, the cost of using revision
bos@583 85 control might hardly seem worth considering, since
bos@583 86 <emphasis>without</emphasis> it, failure is almost
bos@583 87 guaranteed.</para>
bos@583 88
bos@584 89 <para id="x_79">On the other hand, a one-person <quote>quick hack</quote>
bos@583 90 might seem like a poor place to use a revision control tool,
bos@583 91 because surely the cost of using one must be close to the
bos@583 92 overall cost of the project. Right?</para>
bos@583 93
bos@584 94 <para id="x_7a">Mercurial uniquely supports <emphasis>both</emphasis> of
bos@583 95 these scales of development. You can learn the basics in just
bos@583 96 a few minutes, and due to its low overhead, you can apply
bos@583 97 revision control to the smallest of projects with ease. Its
bos@583 98 simplicity means you won't have a lot of abstruse concepts or
bos@583 99 command sequences competing for mental space with whatever
bos@583 100 you're <emphasis>really</emphasis> trying to do. At the same
bos@583 101 time, Mercurial's high performance and peer-to-peer nature let
bos@583 102 you scale painlessly to handle large projects.</para>
bos@583 103
bos@584 104 <para id="x_7b">No revision control tool can rescue a poorly run project,
bos@583 105 but a good choice of tools can make a huge difference to the
bos@583 106 fluidity with which you can work on a project.</para>
bos@583 107
bos@583 108 </sect2>
bos@583 109
bos@583 110 <sect2>
bos@583 111 <title>The many names of revision control</title>
bos@583 112
bos@584 113 <para id="x_7c">Revision control is a diverse field, so much so that it is
bos@583 114 referred to by many names and acronyms. Here are a few of the
bos@583 115 more common variations you'll encounter:</para>
bos@583 116 <itemizedlist>
bos@584 117 <listitem><para id="x_7d">Revision control (RCS)</para></listitem>
bos@584 118 <listitem><para id="x_7e">Software configuration management (SCM), or
bos@583 119 configuration management</para></listitem>
bos@584 120 <listitem><para id="x_7f">Source code management</para></listitem>
bos@584 121 <listitem><para id="x_80">Source code control, or source
bos@583 122 control</para></listitem>
bos@584 123 <listitem><para id="x_81">Version control
bos@583 124 (VCS)</para></listitem></itemizedlist>
bos@584 125 <para id="x_82">Some people claim that these terms actually have different
bos@583 126 meanings, but in practice they overlap so much that there's no
bos@583 127 agreed or even useful way to tease them apart.</para>
bos@583 128
bos@583 129 </sect2>
bos@583 130 </sect1>
bos@26 131
bos@559 132 <sect1>
bos@559 133 <title>This book is a work in progress</title>
bos@26 134
bos@584 135 <para id="x_83">I am releasing this book while I am still writing it, in the
bos@583 136 hope that it will prove useful to others. I am writing under an
bos@583 137 open license in the hope that you, my readers, will contribute
bos@583 138 feedback and perhaps content of your own.</para>
bos@200 139
bos@559 140 </sect1>
bos@559 141 <sect1>
bos@559 142 <title>About the examples in this book</title>
bos@200 143
bos@584 144 <para id="x_84">This book takes an unusual approach to code samples. Every
bos@559 145 example is <quote>live</quote>---each one is actually the result
bos@559 146 of a shell script that executes the Mercurial commands you see.
bos@559 147 Every time an image of the book is built from its sources, all
bos@559 148 the example scripts are automatically run, and their current
bos@559 149 results compared against their expected results.</para>
bos@200 150
bos@584 151 <para id="x_85">The advantage of this approach is that the examples are
bos@559 152 always accurate; they describe <emphasis>exactly</emphasis> the
bos@559 153 behaviour of the version of Mercurial that's mentioned at the
bos@559 154 front of the book. If I update the version of Mercurial that
bos@559 155 I'm documenting, and the output of some command changes, the
bos@559 156 build fails.</para>
bos@200 157
bos@584 158 <para id="x_86">There is a small disadvantage to this approach, which is
bos@559 159 that the dates and times you'll see in examples tend to be
bos@559 160 <quote>squashed</quote> together in a way that they wouldn't be
bos@559 161 if the same commands were being typed by a human. Where a human
bos@559 162 can issue no more than one command every few seconds, with any
bos@559 163 resulting timestamps correspondingly spread out, my automated
bos@559 164 example scripts run many commands in one second.</para>
bos@200 165
bos@584 166 <para id="x_87">As an instance of this, several consecutive commits in an
bos@559 167 example can show up as having occurred during the same second.
bos@559 168 You can see this occur in the <literal
bos@559 169 role="hg-ext">bisect</literal> example in section <xref
bos@559 170 linkend="sec:undo:bisect"/>, for instance.</para>
bos@200 171
bos@584 172 <para id="x_88">So when you're reading examples, don't place too much weight
bos@559 173 on the dates or times you see in the output of commands. But
bos@559 174 <emphasis>do</emphasis> be confident that the behaviour you're
bos@559 175 seeing is consistent and reproducible.</para>
bos@26 176
bos@559 177 </sect1>
bos@583 178
bos@583 179 <sect1>
bos@583 180 <title>Trends in the field</title>
bos@583 181
bos@584 182 <para id="x_89">There has been an unmistakable trend in the development and
bos@583 183 use of revision control tools over the past four decades, as
bos@583 184 people have become familiar with the capabilities of their tools
bos@583 185 and constrained by their limitations.</para>
bos@583 186
bos@584 187 <para id="x_8a">The first generation began by managing single files on
bos@583 188 individual computers. Although these tools represented a huge
bos@583 189 advance over ad-hoc manual revision control, their locking model
bos@583 190 and reliance on a single computer limited them to small,
bos@583 191 tightly-knit teams.</para>
bos@583 192
bos@584 193 <para id="x_8b">The second generation loosened these constraints by moving
bos@583 194 to network-centered architectures, and managing entire projects
bos@583 195 at a time. As projects grew larger, they ran into new problems.
bos@583 196 With clients needing to talk to servers very frequently, server
bos@583 197 scaling became an issue for large projects. An unreliable
bos@583 198 network connection could prevent remote users from being able to
bos@583 199 talk to the server at all. As open source projects started
bos@583 200 making read-only access available anonymously to anyone, people
bos@583 201 without commit privileges found that they could not use the
bos@583 202 tools to interact with a project in a natural way, as they could
bos@583 203 not record their changes.</para>
bos@583 204
bos@584 205 <para id="x_8c">The current generation of revision control tools is
bos@583 206 peer-to-peer in nature. All of these systems have dropped the
bos@583 207 dependency on a single central server, and allow people to
bos@583 208 distribute their revision control data to where it's actually
bos@583 209 needed. Collaboration over the Internet has moved from being
bos@583 210 constrained by technology to being a matter of choice and consensus.
bos@583 211 Modern tools can operate offline indefinitely and autonomously,
bos@583 212 with a network connection only needed when syncing changes with
bos@583 213 another repository.</para>
bos@583 214
bos@583 215 </sect1>
bos@583 216 <sect1>
bos@583 217 <title>A few of the advantages of distributed revision
bos@583 218 control</title>
bos@583 219
bos@584 220 <para id="x_8d">Even though distributed revision control tools have for
bos@583 221 several years been as robust and usable as their
bos@583 222 previous-generation counterparts, people using older tools have
bos@583 223 not yet necessarily woken up to their advantages. There are a
bos@583 224 number of ways in which distributed tools shine relative to
bos@583 225 centralised ones.</para>
bos@583 226
bos@584 227 <para id="x_8e">For an individual developer, distributed tools are almost
bos@583 228 always much faster than centralised tools. This is for a simple
bos@583 229 reason: a centralised tool needs to talk over the network for
bos@583 230 many common operations, because most metadata is stored in a
bos@583 231 single copy on the central server. A distributed tool stores
bos@583 232 all of its metadata locally. All else being equal, talking over
bos@583 233 the network adds overhead to a centralised tool. Don't
bos@583 234 underestimate the value of a snappy, responsive tool: you're
bos@583 235 going to spend a lot of time interacting with your revision
bos@583 236 control software.</para>
bos@583 237
bos@584 238 <para id="x_8f">Distributed tools are indifferent to the vagaries of your
bos@583 239 server infrastructure, again because they replicate metadata to
bos@583 240 so many locations. If you use a centralised system and your
bos@583 241 server catches fire, you'd better hope that your backup media
bos@583 242 are reliable, and that your last backup was recent and actually
bos@583 243 worked. With a distributed tool, you have many backups
bos@583 244 available on every contributor's computer.</para>
bos@583 245
bos@584 246 <para id="x_90">The reliability of your network will affect distributed
bos@583 247 tools far less than it will centralised tools. You can't even
bos@583 248 use a centralised tool without a network connection, except for
bos@583 249 a few highly constrained commands. With a distributed tool, if
bos@583 250 your network connection goes down while you're working, you may
bos@583 251 not even notice. The only thing you won't be able to do is talk
bos@583 252 to repositories on other computers, something that is relatively
bos@583 253 rare compared with local operations. If you have a far-flung
bos@583 254 team of collaborators, this may be significant.</para>
bos@583 255
bos@583 256 <sect2>
bos@583 257 <title>Advantages for open source projects</title>
bos@583 258
bos@584 259 <para id="x_91">If you take a shine to an open source project and decide
bos@583 260 that you would like to start hacking on it, and that project
bos@583 261 uses a distributed revision control tool, you are at once a
bos@583 262 peer with the people who consider themselves the
bos@583 263 <quote>core</quote> of that project. If they publish their
bos@583 264 repositories, you can immediately copy their project history,
bos@583 265 start making changes, and record your work, using the same
bos@583 266 tools in the same ways as insiders. By contrast, with a
bos@583 267 centralised tool, you must use the software in a <quote>read
bos@583 268 only</quote> mode unless someone grants you permission to
bos@583 269 commit changes to their central server. Until then, you won't
bos@583 270 be able to record changes, and your local modifications will
bos@583 271 be at risk of corruption any time you try to update your
bos@583 272 client's view of the repository.</para>
bos@583 273
bos@583 274 <sect3>
bos@583 275 <title>The forking non-problem</title>
bos@583 276
bos@584 277 <para id="x_92">It has been suggested that distributed revision control
bos@583 278 tools pose some sort of risk to open source projects because
bos@583 279 they make it easy to <quote>fork</quote> the development of
bos@583 280 a project. A fork happens when there are differences in
bos@583 281 opinion or attitude between groups of developers that cause
bos@583 282 them to decide that they can't work together any longer.
bos@583 283 Each side takes a more or less complete copy of the
bos@583 284 project's source code, and goes off in its own
bos@583 285 direction.</para>
bos@583 286
bos@584 287 <para id="x_93">Sometimes the camps in a fork decide to reconcile their
bos@583 288 differences. With a centralised revision control system, the
bos@583 289 <emphasis>technical</emphasis> process of reconciliation is
bos@583 290 painful, and has to be performed largely by hand. You have
bos@583 291 to decide whose revision history is going to
bos@583 292 <quote>win</quote>, and graft the other team's changes into
bos@583 293 the tree somehow. This usually loses some or all of one
bos@583 294 side's revision history.</para>
bos@583 295
bos@584 296 <para id="x_94">With respect to forking, distributed tools make it
bos@583 297 the <emphasis>only</emphasis> way to
bos@583 298 develop a project. Every single change that you make is
bos@583 299 potentially a fork point. The great strength of this
bos@583 300 approach is that a distributed revision control tool has to
bos@583 301 be really good at <emphasis>merging</emphasis> forks,
bos@583 302 because forks are absolutely fundamental: they happen all
bos@583 303 the time.</para>
bos@583 304
bos@584 305 <para id="x_95">If every piece of work that everybody does, all the
bos@583 306 time, is framed in terms of forking and merging, then what
bos@583 307 the open source world refers to as a <quote>fork</quote>
bos@583 308 becomes <emphasis>purely</emphasis> a social issue. If
bos@583 309 anything, distributed tools <emphasis>lower</emphasis> the
bos@583 310 likelihood of a fork:</para>
bos@583 311 <itemizedlist>
bos@584 312 <listitem><para id="x_96">They eliminate the social distinction that
bos@583 313 centralised tools impose: that between insiders (people
bos@583 314 with commit access) and outsiders (people
bos@583 315 without).</para></listitem>
bos@584 316 <listitem><para id="x_97">They make it easier to reconcile after a
bos@583 317 social fork, because all that's involved from the
bos@583 318 perspective of the revision control software is just
bos@583 319 another merge.</para></listitem></itemizedlist>
bos@583 320
bos@584 321 <para id="x_98">Some people resist distributed tools because they want
bos@583 322 to retain tight control over their projects, and they
bos@583 323 believe that centralised tools give them this control.
bos@583 324 However, if you're of this belief, and you publish your CVS
bos@583 325 or Subversion repositories publicly, there are plenty of
bos@583 326 tools available that can pull out your entire project's
bos@583 327 history (albeit slowly) and recreate it somewhere that you
bos@583 328 don't control. So while your control in this case is
bos@583 329 illusory, you are forgoing the ability to fluidly
bos@583 330 collaborate with whatever people feel compelled to mirror
bos@583 331 and fork your history.</para>
bos@583 332
bos@583 333 </sect3>
bos@583 334 </sect2>
bos@583 335 <sect2>
bos@583 336 <title>Advantages for commercial projects</title>
bos@583 337
bos@584 338 <para id="x_99">Many commercial projects are undertaken by teams that are
bos@583 339 scattered across the globe. Contributors who are far from a
bos@583 340 central server will see slower command execution and perhaps
bos@583 341 less reliability. Commercial revision control systems attempt
bos@583 342 to ameliorate these problems with remote-site replication
bos@583 343 add-ons that are typically expensive to buy and cantankerous
bos@583 344 to administer. A distributed system doesn't suffer from these
bos@583 345 problems in the first place. Better yet, you can easily set
bos@583 346 up multiple authoritative servers, say one per site, so that
bos@583 347 there's no redundant communication between repositories over
bos@583 348 expensive long-haul network links.</para>
bos@583 349
bos@584 350 <para id="x_9a">Centralised revision control systems tend to have
bos@583 351 relatively low scalability. It's not unusual for an expensive
bos@583 352 centralised system to fall over under the combined load of
bos@583 353 just a few dozen concurrent users. Once again, the typical
bos@583 354 response tends to be an expensive and clunky replication
bos@583 355 facility. Since the load on a central server---if you have
bos@583 356 one at all---is many times lower with a distributed tool
bos@583 357 (because all of the data is replicated everywhere), a single
bos@583 358 cheap server can handle the needs of a much larger team, and
bos@583 359 replication to balance load becomes a simple matter of
bos@583 360 scripting.</para>
bos@583 361
bos@584 362 <para id="x_9b">If you have an employee in the field, troubleshooting a
bos@583 363 problem at a customer's site, they'll benefit from distributed
bos@583 364 revision control. The tool will let them generate custom
bos@583 365 builds, try different fixes in isolation from each other, and
bos@583 366 search efficiently through history for the sources of bugs and
bos@583 367 regressions in the customer's environment, all without needing
bos@583 368 to connect to your company's network.</para>
bos@583 369
bos@583 370 </sect2>
bos@583 371 </sect1>
bos@583 372 <sect1>
bos@583 373 <title>Why choose Mercurial?</title>
bos@583 374
bos@584 375 <para id="x_9c">Mercurial has a unique set of properties that make it a
bos@583 376 particularly good choice as a revision control system.</para>
bos@583 377 <itemizedlist>
bos@584 378 <listitem><para id="x_9d">It is easy to learn and use.</para></listitem>
bos@584 379 <listitem><para id="x_9e">It is lightweight.</para></listitem>
bos@584 380 <listitem><para id="x_9f">It scales excellently.</para></listitem>
bos@584 381 <listitem><para id="x_a0">It is easy to
bos@583 382 customise.</para></listitem></itemizedlist>
bos@583 383
bos@584 384 <para id="x_a1">If you are at all familiar with revision control systems,
bos@583 385 you should be able to get up and running with Mercurial in less
bos@583 386 than five minutes. Even if not, it will take no more than a few
bos@583 387 minutes longer. Mercurial's command and feature sets are
bos@583 388 generally uniform and consistent, so you can keep track of a few
bos@583 389 general rules instead of a host of exceptions.</para>
bos@583 390
bos@584 391 <para id="x_a2">On a small project, you can start working with Mercurial in
bos@583 392 moments. Creating new changes and branches; transferring changes
bos@583 393 around (whether locally or over a network); and history and
bos@583 394 status operations are all fast. Mercurial attempts to stay
bos@583 395 nimble and largely out of your way by combining low cognitive
bos@583 396 overhead with blazingly fast operations.</para>
bos@583 397
bos@584 398 <para id="x_a3">The usefulness of Mercurial is not limited to small
bos@583 399 projects: it is used by projects with hundreds to thousands of
bos@583 400 contributors, each containing tens of thousands of files and
bos@583 401 hundreds of megabytes of source code.</para>
bos@583 402
bos@584 403 <para id="x_a4">If the core functionality of Mercurial is not enough for
bos@583 404 you, it's easy to build on. Mercurial is well suited to
bos@583 405 scripting tasks, and its clean internals and implementation in
bos@583 406 Python make it easy to add features in the form of extensions.
bos@583 407 There are a number of popular and useful extensions already
bos@583 408 available, ranging from helping to identify bugs to improving
bos@583 409 performance.</para>
bos@583 410
bos@583 411 </sect1>
bos@583 412 <sect1>
bos@583 413 <title>Mercurial compared with other tools</title>
bos@583 414
bos@584 415 <para id="x_a5">Before you read on, please understand that this section
bos@583 416 necessarily reflects my own experiences, interests, and (dare I
bos@583 417 say it) biases. I have used every one of the revision control
bos@583 418 tools listed below, in most cases for several years at a
bos@583 419 time.</para>
bos@583 420
bos@583 421
bos@583 422 <sect2>
bos@583 423 <title>Subversion</title>
bos@583 424
bos@584 425 <para id="x_a6">Subversion is a popular revision control tool, developed
bos@583 426 to replace CVS. It has a centralised client/server
bos@583 427 architecture.</para>
bos@583 428
bos@584 429 <para id="x_a7">Subversion and Mercurial have similarly named commands for
bos@583 430 performing the same operations, so if you're familiar with
bos@583 431 one, it is easy to learn to use the other. Both tools are
bos@583 432 portable to all popular operating systems.</para>
bos@583 433
bos@584 434 <para id="x_a8">Prior to version 1.5, Subversion had no useful support for
bos@583 435 merges. At the time of writing, its merge tracking capability
bos@583 436 is new, and known to be <ulink
bos@583 437 url="http://svnbook.red-bean.com/nightly/en/svn.branchmerge.advanced.html#svn.branchmerge.advanced.finalword">complicated
bos@583 438 and buggy</ulink>.</para>
bos@583 439
bos@584 440 <para id="x_a9">Mercurial has a substantial performance advantage over
bos@583 441 Subversion on every revision control operation I have
bos@583 442 benchmarked. I have measured its advantage as ranging from a
bos@583 443 factor of two to a factor of six when compared with Subversion
bos@583 444 1.4.3's <emphasis>ra_local</emphasis> file store, which is the
bos@583 445 fastest access method available. In more realistic
bos@583 446 deployments involving a network-based store, Subversion will
bos@583 447 be at a substantially larger disadvantage. Because many
bos@583 448 Subversion commands must talk to the server and Subversion
bos@583 449 does not have useful replication facilities, server capacity
bos@583 450 and network bandwidth become bottlenecks for modestly large
bos@583 451 projects.</para>
bos@583 452
bos@584 453 <para id="x_aa">Additionally, Subversion incurs substantial storage
bos@583 454 overhead to avoid network transactions for a few common
bos@583 455 operations, such as finding modified files
bos@583 456 (<literal>status</literal>) and displaying modifications
bos@583 457 against the current revision (<literal>diff</literal>). As a
bos@583 458 result, a Subversion working copy is often the same size as,
bos@583 459 or larger than, a Mercurial repository and working directory,
bos@583 460 even though the Mercurial repository contains a complete
bos@583 461 history of the project.</para>
bos@583 462
bos@584 463 <para id="x_ab">Subversion is widely supported by third party tools.
bos@583 464 Mercurial currently lags considerably in this area. This gap
bos@583 465 is closing, however, and indeed some of Mercurial's GUI tools
bos@583 466 now outshine their Subversion equivalents. Like Mercurial,
bos@583 467 Subversion has an excellent user manual.</para>
bos@583 468
bos@584 469 <para id="x_ac">Because Subversion doesn't store revision history on the
bos@583 470 client, it is well suited to managing projects that deal with
bos@583 471 lots of large, opaque binary files. If you check in fifty
bos@583 472 revisions to an incompressible 10MB file, Subversion's
bos@583 473 client-side space usage stays constant. The space used by any
bos@583 474 distributed SCM will grow rapidly in proportion to the number
bos@583 475 of revisions, because the differences between each revision
bos@583 476 are large.</para>
bos@583 477
bos@584 478 <para id="x_ad">In addition, it's often difficult or, more usually,
bos@583 479 impossible to merge different versions of a binary file.
bos@583 480 Subversion's ability to let a user lock a file, so that they
bos@583 481 temporarily have the exclusive right to commit changes to it,
bos@583 482 can be a significant advantage to a project where binary files
bos@583 483 are widely used.</para>
bos@583 484
bos@584 485 <para id="x_ae">Mercurial can import revision history from a Subversion
bos@583 486 repository. It can also export revision history to a
bos@583 487 Subversion repository. This makes it easy to <quote>test the
bos@583 488 waters</quote> and use Mercurial and Subversion in parallel
bos@583 489 before deciding to switch. History conversion is incremental,
bos@583 490 so you can perform an initial conversion, then small
bos@583 491 additional conversions afterwards to bring in new
bos@583 492 changes.</para>
bos@583 493
bos@583 494
bos@583 495 </sect2>
bos@583 496 <sect2>
bos@583 497 <title>Git</title>
bos@583 498
bos@584 499 <para id="x_af">Git is a distributed revision control tool that was
bos@583 500 developed for managing the Linux kernel source tree. Like
bos@583 501 Mercurial, its early design was somewhat influenced by
bos@583 502 Monotone.</para>
bos@583 503
bos@584 504 <para id="x_b0">Git has a very large command set, with version 1.5.0
bos@583 505 providing 139 individual commands. It has something of a
bos@583 506 reputation for being difficult to learn. Compared to Git,
bos@583 507 Mercurial has a strong focus on simplicity.</para>
bos@583 508
bos@584 509 <para id="x_b1">In terms of performance, Git is extremely fast. In
bos@583 510 several cases, it is faster than Mercurial, at least on Linux,
bos@583 511 while Mercurial performs better on other operations. However,
bos@583 512 on Windows, the performance and general level of support that
bos@583 513 Git provides are, at the time of writing, far behind those of
bos@583 514 Mercurial.</para>
bos@583 515
bos@584 516 <para id="x_b2">While a Mercurial repository needs no maintenance, a Git
bos@583 517 repository requires frequent manual <quote>repacks</quote> of
bos@583 518 its metadata. Without these, performance degrades, while
bos@583 519 space usage grows rapidly. A server that contains many Git
bos@583 520 repositories that are not rigorously and frequently repacked
bos@583 521 will become heavily disk-bound during backups, and there have
bos@583 522 been instances of daily backups taking far longer than 24
bos@583 523 hours as a result. A freshly packed Git repository is
bos@583 524 slightly smaller than a Mercurial repository, but an unpacked
bos@583 525 repository is several orders of magnitude larger.</para>
bos@583 526
bos@584 527 <para id="x_b3">The core of Git is written in C. Many Git commands are
bos@583 528 implemented as shell or Perl scripts, and the quality of these
bos@583 529 scripts varies widely. I have encountered several instances
bos@583 530 where scripts charged along blindly in the presence of errors
bos@583 531 that should have been fatal.</para>
bos@583 532
bos@584 533 <para id="x_b4">Mercurial can import revision history from a Git
bos@583 534 repository.</para>
bos@583 535
bos@583 536
bos@583 537 </sect2>
bos@583 538 <sect2>
bos@583 539 <title>CVS</title>
bos@583 540
bos@584 541 <para id="x_b5">CVS is probably the most widely used revision control tool
bos@583 542 in the world. Due to its age and internal untidiness, it has
bos@583 543 been only lightly maintained for many years.</para>
bos@583 544
bos@584 545 <para id="x_b6">It has a centralised client/server architecture. It does
bos@583 546 not group related file changes into atomic commits, making it
bos@583 547 easy for people to <quote>break the build</quote>: one person
bos@583 548 can successfully commit part of a change and then be blocked
bos@583 549 by the need for a merge, causing other people to see only a
bos@583 550 portion of the work they intended to do. This also affects
bos@583 551 how you work with project history. If you want to see all of
bos@583 552 the modifications someone made as part of a task, you will
bos@583 553 need to manually inspect the descriptions and timestamps of
bos@583 554 the changes made to each file involved (if you even know what
bos@583 555 those files were).</para>
bos@583 556
bos@584 557 <para id="x_b7">CVS has a muddled notion of tags and branches that I will
bos@583 558 not even attempt to describe. It does not support renaming of
bos@583 559 files or directories well, making it easy to corrupt a
bos@583 560 repository. It has almost no internal consistency checking
bos@583 561 capabilities, so it is usually not even possible to tell
bos@583 562 whether or how a repository is corrupt. I would not recommend
bos@583 563 CVS for any project, existing or new.</para>
bos@583 564
bos@584 565 <para id="x_b8">Mercurial can import CVS revision history. However, there
bos@583 566 are a few caveats that apply; these are true of every other
bos@583 567 revision control tool's CVS importer, too. Due to CVS's lack
bos@583 568 of atomic changes and unversioned filesystem hierarchy, it is
bos@583 569 not possible to reconstruct CVS history completely accurately;
bos@583 570 some guesswork is involved, and renames will usually not show
bos@583 571 up. Because a lot of advanced CVS administration has to be
bos@583 572 done by hand and is hence error-prone, it's common for CVS
bos@583 573 importers to run into multiple problems with corrupted
bos@583 574 repositories (completely bogus revision timestamps and files
bos@583 575 that have remained locked for over a decade are just two of
bos@583 576 the less interesting problems I can recall from personal
bos@583 577 experience).</para>
bos@583 578
bos@584 579 <para id="x_b9">Mercurial can import revision history from a CVS
bos@583 580 repository.</para>
bos@583 581
bos@583 582
bos@583 583 </sect2>
bos@583 584 <sect2>
bos@583 585 <title>Commercial tools</title>
bos@583 586
bos@584 587 <para id="x_ba">Perforce has a centralised client/server architecture,
bos@583 588 with no client-side caching of any data. Unlike modern
bos@583 589 revision control tools, Perforce requires that a user run a
bos@583 590 command to inform the server about every file they intend to
bos@583 591 edit.</para>
bos@583 592
bos@584 593 <para id="x_bb">The performance of Perforce is quite good for small teams,
bos@583 594 but it falls off rapidly as the number of users grows beyond a
bos@583 595 few dozen. Modestly large Perforce installations require the
bos@583 596 deployment of proxies to cope with the load their users
bos@583 597 generate.</para>
bos@583 598
bos@583 599
bos@583 600 </sect2>
bos@583 601 <sect2>
bos@583 602 <title>Choosing a revision control tool</title>
bos@583 603
bos@584 604 <para id="x_bc">With the exception of CVS, all of the tools listed above
bos@583 605 have unique strengths that suit them to particular styles of
bos@583 606 work. There is no single revision control tool that is best
bos@583 607 in all situations.</para>
bos@583 608
bos@584 609 <para id="x_bd">As an example, Subversion is a good choice for working
bos@583 610 with frequently edited binary files, due to its centralised
bos@583 611 nature and support for file locking.</para>
bos@583 612
bos@584 613 <para id="x_be">I personally find Mercurial's properties of simplicity,
bos@583 614 performance, and good merge support to be a compelling
bos@583 615 combination that has served me well for several years.</para>
bos@583 616
bos@583 617
bos@583 618 </sect2>
bos@583 619 </sect1>
bos@583 620 <sect1>
bos@583 621 <title>Switching from another tool to Mercurial</title>
bos@583 622
bos@584 623 <para id="x_bf">Mercurial is bundled with an extension named <literal
bos@583 624 role="hg-ext">convert</literal>, which can incrementally
bos@583 625 import revision history from several other revision control
bos@583 626 tools. By <quote>incremental</quote>, I mean that you can
bos@583 627 convert all of a project's history to date in one go, then rerun
bos@583 628 the conversion later to obtain new changes that happened after
bos@583 629 the initial conversion.</para>
bos@583 630
bos@584 631 <para id="x_c0">The revision control tools supported by <literal
bos@583 632 role="hg-ext">convert</literal> are as follows:</para>
bos@583 633 <itemizedlist>
bos@584 634 <listitem><para id="x_c1">Subversion</para></listitem>
bos@584 635 <listitem><para id="x_c2">CVS</para></listitem>
bos@584 636 <listitem><para id="x_c3">Git</para></listitem>
bos@584 637 <listitem><para id="x_c4">Darcs</para></listitem></itemizedlist>
bos@584 638
bos@584 639 <para id="x_c5">In addition, <literal role="hg-ext">convert</literal> can
bos@583 640 export changes from Mercurial to Subversion. This makes it
bos@583 641 possible to try Subversion and Mercurial in parallel before
bos@583 642 committing to a switchover, without risking the loss of any
bos@583 643 work.</para>
bos@583 644
bos@584 645 <para id="x_c6">The <command role="hg-ext-convert">convert</command> command
bos@583 646 is easy to use. Simply point it at the path or URL of the
bos@583 647 source repository, optionally give it the name of the
bos@583 648 destination repository, and it will start working. After the
bos@583 649 initial conversion, just run the same command again to import
bos@583 650 new changes.</para>
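<para>As a rough sketch of the workflow just described, the extension
first has to be enabled in your <filename>~/.hgrc</filename> file.
The source URL and destination name used below are hypothetical,
chosen only for illustration; substitute your own.</para>
<programlisting># ~/.hgrc: enable the bundled convert extension
[extensions]
convert =</programlisting>
<para>With the extension enabled, a conversion might look like this.
Running the identical command again later should pick up only the
changes made since the previous run.</para>
<programlisting># hypothetical source URL and destination directory
hg convert http://svn.example.com/myproject myproject-hg</programlisting>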
bos@583 651 </sect1>
bos@583 652
bos@583 653 <sect1>
bos@583 654 <title>A short history of revision control</title>
bos@583 655
bos@584 656 <para id="x_c7">The best known of the old-time revision control tools is
bos@583 657 SCCS (Source Code Control System), which Marc Rochkind wrote at
bos@583 658 Bell Labs, in the early 1970s. SCCS operated on individual
bos@583 659 files, and required every person working on a project to have
bos@583 660 access to a shared workspace on a single system. Only one
bos@583 661 person could modify a file at any time; arbitration for access
bos@583 662 to files was via locks. It was common for people to lock files,
bos@583 663 and later forget to unlock them, preventing anyone else from
bos@583 664 modifying those files without the help of an
bos@583 665 administrator.</para>
bos@583 666
bos@584 667 <para id="x_c8">Walter Tichy developed a free alternative to SCCS in the
bos@583 668 early 1980s; he called his program RCS (Revision Control System).
bos@583 669 Like SCCS, RCS required developers to work in a single shared
bos@583 670 workspace, and to lock files to prevent multiple people from
bos@583 671 modifying them simultaneously.</para>
bos@583 672
bos@584 673 <para id="x_c9">Later in the 1980s, Dick Grune used RCS as a building block
bos@583 674 for a set of shell scripts he initially called cmt, but then
bos@583 675 renamed to CVS (Concurrent Versions System). The big innovation
bos@583 676 of CVS was that it let developers work simultaneously and
bos@583 677 somewhat independently in their own personal workspaces. The
bos@583 678 personal workspaces prevented developers from stepping on each
bos@583 679 other's toes all the time, as was common with SCCS and RCS. Each
bos@583 680 developer had a copy of every project file, and could modify
bos@583 681 their copies independently. They had to merge their edits prior
bos@583 682 to committing changes to the central repository.</para>
bos@583 683
bos@584 684 <para id="x_ca">Brian Berliner took Grune's original scripts and rewrote
bos@583 685 them in C, releasing in 1989 the code that has since developed
bos@583 686 into the modern version of CVS. CVS subsequently acquired the
bos@583 687 ability to operate over a network connection, giving it a
bos@583 688 client/server architecture. CVS's architecture is centralised;
bos@583 689 only the server has a copy of the history of the project. Client
bos@583 690 workspaces just contain copies of recent versions of the
bos@583 691 project's files, and a little metadata to tell them where the
bos@583 692 server is. CVS has been enormously successful; it is probably
bos@583 693 the world's most widely used revision control system.</para>
bos@583 694
bos@584 695 <para id="x_cb">In the early 1990s, Sun Microsystems developed an early
bos@583 696 distributed revision control system, called TeamWare. A
bos@583 697 TeamWare workspace contains a complete copy of the project's
bos@583 698 history. TeamWare has no notion of a central repository. (CVS
bos@583 699 relied upon RCS for its history storage; TeamWare used
bos@583 700 SCCS.)</para>
bos@583 701
bos@584 702 <para id="x_cc">As the 1990s progressed, awareness grew of a number of
bos@583 703 problems with CVS. It records simultaneous changes to multiple
bos@583 704 files individually, instead of grouping them together as a
bos@583 705 single logically atomic operation. It does not manage its file
bos@583 706 hierarchy well; it is easy to make a mess of a repository by
bos@583 707 renaming files and directories. Worse, its source code is
bos@583 708 difficult to read and maintain, which made the <quote>pain
bos@583 709 level</quote> of fixing these architectural problems
bos@583 710 prohibitive.</para>
bos@583 711
bos@584 712 <para id="x_cd">In 2000, Jim Blandy and Karl Fogel, two developers who had
bos@583 713 worked on CVS, started a project to replace it with a tool that
bos@583 714 would have a better architecture and cleaner code. The result,
bos@583 715 Subversion, does not stray from CVS's centralised client/server
bos@583 716 model, but it adds multi-file atomic commits, better namespace
bos@583 717 management, and a number of other features that make it a
bos@583 718 generally better tool than CVS. Since its initial release, it
bos@583 719 has rapidly grown in popularity.</para>
bos@583 720
bos@584 721 <para id="x_ce">More or less simultaneously, Graydon Hoare began working on
bos@583 722 an ambitious distributed revision control system that he named
bos@583 723 Monotone. Monotone addresses many of CVS's design flaws and
bos@583 724 has a peer-to-peer architecture, and it also goes beyond earlier (and
bos@583 725 subsequent) revision control tools in a number of innovative
bos@583 726 ways. It uses cryptographic hashes as identifiers, and has an
bos@583 727 integral notion of <quote>trust</quote> for code from different
bos@583 728 sources.</para>
bos@583 729
bos@584 730 <para id="x_cf">Mercurial began life in 2005. While a few aspects of its
bos@583 731 design are influenced by Monotone, Mercurial focuses on ease of
bos@583 732 use, high performance, and scalability to very large
bos@583 733 projects.</para>
bos@583 734
bos@583 735 </sect1>
bos@583 736
bos@583 737 <sect1>
bos@583 738 <title>Colophon&emdash;this book is Free</title>
bos@26 739
bos@584 740 <para id="x_d0">This book is licensed under the Open Publication License,
bos@559 741 and is produced entirely using Free Software tools. It is
bos@580 742 typeset with DocBook XML. Illustrations are drawn and rendered with
bos@559 743 <ulink url="http://www.inkscape.org/">Inkscape</ulink>.</para>
bos@26 744
bos@584 745 <para id="x_d1">The complete source code for this book is published as a
bos@559 746 Mercurial repository, at <ulink
bos@559 747 url="http://hg.serpentine.com/mercurial/book">http://hg.serpentine.com/mercurial/book</ulink>.</para>
bos@559 748
bos@559 749 </sect1>
bos@559 750 </preface>
bos@559 751 <!--
bos@559 752 local variables:
bos@559 753 sgml-parent-document: ("00book.xml" "book" "preface")
bos@559 754 end:
bos@559 755 -->