hgbook

annotate en/ch00-preface.xml @ 650:7e7c47481e4f

Oops, this is the real merge for my hg's oddity
author Dongsheng Song <dongsheng.song@gmail.com>
date Fri Mar 20 16:43:35 2009 +0800 (2009-03-20)
parents d0160b0b1a9e
children 751ee9bf2e8d
<!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : -->

<preface id="chap.preface">
  <?dbhtml filename="preface.html"?>
  <title>Preface</title>

  <sect1>
    <title>Why revision control? Why Mercurial?</title>

    <para id="x_6d">Revision control is the process of managing multiple
      versions of a piece of information. In its simplest form, this
      is something that many people do by hand: every time you modify
      a file, save it under a new name that contains a number, each
      one higher than the number of the preceding version.</para>

    <para id="x_6e">Manually managing multiple versions of even a single file is
      an error-prone task, though, so software tools to help automate
      this process have long been available. The earliest automated
      revision control tools were intended to help a single user to
      manage revisions of a single file. Over the past few decades,
      the scope of revision control tools has expanded greatly; they
      now manage multiple files, and help multiple people to work
      together. The best modern revision control tools have no
      problem coping with thousands of people working together on
      projects that consist of hundreds of thousands of files.</para>

    <para id="x_6f">The arrival of distributed revision control is relatively
      recent, and so far this new field has grown due to people's
      willingness to explore ill-charted territory.</para>

    <para id="x_70">I am writing a book about distributed revision control
      because I believe that it is an important subject that deserves
      a field guide. I chose to write about Mercurial because it is
      the easiest tool to learn the terrain with, and yet it scales to
      the demands of real, challenging environments where many other
      revision control tools buckle.</para>

    <sect2>
      <title>Why use revision control?</title>

      <para id="x_71">There are a number of reasons why you or your team might
        want to use an automated revision control tool for a
        project.</para>

      <itemizedlist>
        <listitem><para id="x_72">It will track the history and evolution of
            your project, so you don't have to. For every change,
            you'll have a log of <emphasis>who</emphasis> made it;
            <emphasis>why</emphasis> they made it;
            <emphasis>when</emphasis> they made it; and
            <emphasis>what</emphasis> the change
            was.</para></listitem>
        <listitem><para id="x_73">When you're working with other people,
            revision control software makes it easier for you to
            collaborate. For example, when people more or less
            simultaneously make potentially incompatible changes, the
            software will help you to identify and resolve those
            conflicts.</para></listitem>
        <listitem><para id="x_74">It can help you to recover from mistakes. If
            you make a change that later turns out to be in error, you
            can revert to an earlier version of one or more files. In
            fact, a <emphasis>really</emphasis> good revision control
            tool will even help you to efficiently figure out exactly
            when a problem was introduced (see section <xref
              linkend="sec.undo.bisect"/> for details).</para></listitem>
        <listitem><para id="x_75">It will help you to work simultaneously on,
            and manage the drift between, multiple versions of your
            project.</para></listitem>
      </itemizedlist>

      <para id="x_76">Most of these reasons are equally valid---at least in
        theory---whether you're working on a project by yourself, or
        with a hundred other people.</para>

      <para id="x_77">A key question about the practicality of revision control
        at these two different scales (<quote>lone hacker</quote> and
        <quote>huge team</quote>) is how its
        <emphasis>benefits</emphasis> compare to its
        <emphasis>costs</emphasis>. A revision control tool that's
        difficult to understand or use is going to impose a high
        cost.</para>

      <para id="x_78">A five-hundred-person project is likely to collapse under
        its own weight almost immediately without a revision control
        tool and process. In this case, the cost of using revision
        control might hardly seem worth considering, since
        <emphasis>without</emphasis> it, failure is almost
        guaranteed.</para>

      <para id="x_79">On the other hand, a one-person <quote>quick hack</quote>
        might seem like a poor place to use a revision control tool,
        because surely the cost of using one must be close to the
        overall cost of the project. Right?</para>

      <para id="x_7a">Mercurial uniquely supports <emphasis>both</emphasis> of
        these scales of development. You can learn the basics in just
        a few minutes, and due to its low overhead, you can apply
        revision control to the smallest of projects with ease. Its
        simplicity means you won't have a lot of abstruse concepts or
        command sequences competing for mental space with whatever
        you're <emphasis>really</emphasis> trying to do. At the same
        time, Mercurial's high performance and peer-to-peer nature let
        you scale painlessly to handle large projects.</para>

      <para id="x_7b">No revision control tool can rescue a poorly run project,
        but a good choice of tools can make a huge difference to the
        fluidity with which you can work on a project.</para>

    </sect2>

    <sect2>
      <title>The many names of revision control</title>

      <para id="x_7c">Revision control is a diverse field, so much so that it is
        referred to by many names and acronyms. Here are a few of the
        more common variations you'll encounter:</para>
      <itemizedlist>
        <listitem><para id="x_7d">Revision control (RCS)</para></listitem>
        <listitem><para id="x_7e">Software configuration management (SCM), or
            configuration management</para></listitem>
        <listitem><para id="x_7f">Source code management</para></listitem>
        <listitem><para id="x_80">Source code control, or source
            control</para></listitem>
        <listitem><para id="x_81">Version control
            (VCS)</para></listitem></itemizedlist>
      <para id="x_82">Some people claim that these terms actually have different
        meanings, but in practice they overlap so much that there's no
        agreed or even useful way to tease them apart.</para>

    </sect2>
  </sect1>

  <sect1>
    <title>This book is a work in progress</title>

    <para id="x_83">I am releasing this book while I am still writing it, in the
      hope that it will prove useful to others. I am writing under an
      open license in the hope that you, my readers, will contribute
      feedback and perhaps content of your own.</para>

  </sect1>
  <sect1>
    <title>About the examples in this book</title>

    <para id="x_84">This book takes an unusual approach to code samples. Every
      example is <quote>live</quote>---each one is actually the result
      of a shell script that executes the Mercurial commands you see.
      Every time an image of the book is built from its sources, all
      the example scripts are automatically run, and their current
      results are compared against their expected results.</para>

    <para id="x_85">The advantage of this approach is that the examples are
      always accurate; they describe <emphasis>exactly</emphasis> the
      behaviour of the version of Mercurial that's mentioned at the
      front of the book. If I update the version of Mercurial that
      I'm documenting, and the output of some command changes, the
      build fails.</para>
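
    <para>The check described above can be pictured with a minimal sketch.
      It assumes Python 3.9 or later. The names
      <literal>run_example</literal>, <literal>failing_examples</literal>
      and <literal>build</literal> are hypothetical; this is not the
      book's actual build tooling, which runs shell scripts. Each example
      here is a command plus the output recorded for it. The build stops
      if any example's current output differs from its recorded
      output.</para>

```python
import subprocess
import sys

# Hypothetical harness: an example is a command (argv list) plus the
# output that was recorded for it the last time the book was built.
Example = tuple[list[str], str]


def run_example(argv: list[str]) -> str:
    """Run one example command and capture everything it prints."""
    result = subprocess.run(argv, capture_output=True, text=True)
    return result.stdout + result.stderr


def failing_examples(examples: dict[str, Example]) -> list[str]:
    """Return the names of examples whose current output no longer
    matches the recorded, expected output."""
    failures = []
    for name, (argv, expected) in examples.items():
        if run_example(argv) != expected:
            failures.append(name)
    return failures


def build(examples: dict[str, Example]) -> None:
    """Stop the (hypothetical) book build if any example has changed."""
    failures = failing_examples(examples)
    if failures:
        raise SystemExit("examples changed: " + ", ".join(failures))
```

    <para>For instance, an entry whose command is
      <literal>[sys.executable, "-c", "print('hello')"]</literal> and whose
      recorded output is <literal>"hello\n"</literal> passes. If the
      command's behaviour changes so that it prints anything else,
      <literal>build</literal> raises <literal>SystemExit</literal> and
      names the failing example.</para>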

    <para id="x_86">There is a small disadvantage to this approach, which is
      that the dates and times you'll see in examples tend to be
      <quote>squashed</quote> together in a way that they wouldn't be
      if the same commands were being typed by a human. Where a human
      can issue no more than one command every few seconds, with any
      resulting timestamps correspondingly spread out, my automated
      example scripts run many commands in one second.</para>

    <para id="x_87">For instance, several consecutive commits in an
      example can show up as having occurred during the same second.
      You can see this in the <literal
        role="hg-ext">bisect</literal> example in section <xref
        linkend="sec.undo.bisect"/>.</para>

    <para id="x_88">So when you're reading examples, don't place too much weight
      on the dates or times you see in the output of commands. But
      <emphasis>do</emphasis> be confident that the behaviour you're
      seeing is consistent and reproducible.</para>

  </sect1>

  <sect1>
    <title>Trends in the field</title>

    <para id="x_89">There has been an unmistakable trend in the development and
      use of revision control tools over the past four decades, as
      people have become familiar with the capabilities of their tools
      and found themselves constrained by their limitations.</para>

    <para id="x_8a">The first generation began by managing single files on
      individual computers. Although these tools represented a huge
      advance over ad-hoc manual revision control, their locking model
      and reliance on a single computer limited them to small,
      tightly-knit teams.</para>

    <para id="x_8b">The second generation loosened these constraints by moving
      to network-centered architectures, and managing entire projects
      at a time. As projects grew larger, they ran into new problems.
      With clients needing to talk to servers very frequently, server
      scaling became an issue for large projects. An unreliable
      network connection could prevent remote users from being able to
      talk to the server at all. As open source projects started
      making read-only access available anonymously to anyone, people
      without commit privileges found that they could not use the
      tools to interact with a project in a natural way, as they could
      not record their changes.</para>

    <para id="x_8c">The current generation of revision control tools is
      peer-to-peer in nature. All of these systems have dropped the
      dependency on a single central server, and allow people to
      distribute their revision control data to where it's actually
      needed. Collaboration over the Internet has moved from being
      constrained by technology to being a matter of choice and
      consensus. Modern tools can operate offline indefinitely and
      autonomously, with a network connection only needed when syncing
      changes with another repository.</para>

  </sect1>
  <sect1>
    <title>A few of the advantages of distributed revision
      control</title>

    <para id="x_8d">Even though distributed revision control tools have for
      several years been as robust and usable as their
      previous-generation counterparts, people using older tools have
      not yet necessarily woken up to their advantages. There are a
      number of ways in which distributed tools shine relative to
      centralised ones.</para>

    <para id="x_8e">For an individual developer, distributed tools are almost
      always much faster than centralised tools. This is for a simple
      reason: a centralised tool needs to talk over the network for
      many common operations, because most metadata is stored in a
      single copy on the central server. A distributed tool stores
      all of its metadata locally. All else being equal, talking over
      the network adds overhead to a centralised tool. Don't
      underestimate the value of a snappy, responsive tool: you're
      going to spend a lot of time interacting with your revision
      control software.</para>

    <para id="x_8f">Distributed tools are indifferent to the vagaries of your
      server infrastructure, again because they replicate metadata to
      so many locations. If you use a centralised system and your
      server catches fire, you'd better hope that your backup media
      are reliable, and that your last backup was recent and actually
      worked. With a distributed tool, you have many backups
      available, one on every contributor's computer.</para>

    <para id="x_90">The reliability of your network will affect distributed
      tools far less than it will centralised tools. You can't even
      use a centralised tool without a network connection, except for
      a few highly constrained commands. With a distributed tool, if
      your network connection goes down while you're working, you may
      not even notice. The only thing you won't be able to do is talk
      to repositories on other computers, something that is relatively
      rare compared with local operations. If you have a far-flung
      team of collaborators, this may be significant.</para>

    <sect2>
      <title>Advantages for open source projects</title>

      <para id="x_91">If you take a shine to an open source project and decide
        that you would like to start hacking on it, and that project
        uses a distributed revision control tool, you are at once a
        peer with the people who consider themselves the
        <quote>core</quote> of that project. If they publish their
        repositories, you can immediately copy their project history,
        start making changes, and record your work, using the same
        tools in the same ways as insiders. By contrast, with a
        centralised tool, you must use the software in a <quote>read
        only</quote> mode unless someone grants you permission to
        commit changes to their central server. Until then, you won't
        be able to record changes, and your local modifications will
        be at risk of corruption any time you try to update your
        client's view of the repository.</para>

      <sect3>
        <title>The forking non-problem</title>

        <para id="x_92">It has been suggested that distributed revision control
          tools pose some sort of risk to open source projects because
          they make it easy to <quote>fork</quote> the development of
          a project. A fork happens when there are differences in
          opinion or attitude between groups of developers that cause
          them to decide that they can't work together any longer.
          Each side takes a more or less complete copy of the
          project's source code, and goes off in its own
          direction.</para>

        <para id="x_93">Sometimes the camps in a fork decide to reconcile their
          differences. With a centralised revision control system, the
          <emphasis>technical</emphasis> process of reconciliation is
          painful, and has to be performed largely by hand. You have
          to decide whose revision history is going to
          <quote>win</quote>, and graft the other team's changes into
          the tree somehow. This usually loses some or all of one
          side's revision history.</para>

        <para id="x_94">What distributed tools do with respect to forking is
          to make forking the <emphasis>only</emphasis> way to
          develop a project. Every single change that you make is
          potentially a fork point. The great strength of this
          approach is that a distributed revision control tool has to
          be really good at <emphasis>merging</emphasis> forks,
          because forks are absolutely fundamental: they happen all
          the time.</para>

        <para id="x_95">If every piece of work that everybody does, all the
          time, is framed in terms of forking and merging, then what
          the open source world refers to as a <quote>fork</quote>
          becomes <emphasis>purely</emphasis> a social issue. If
          anything, distributed tools <emphasis>lower</emphasis> the
          likelihood of a fork:</para>
        <itemizedlist>
          <listitem><para id="x_96">They eliminate the social distinction that
              centralised tools impose: that between insiders (people
              with commit access) and outsiders (people
              without).</para></listitem>
          <listitem><para id="x_97">They make it easier to reconcile after a
              social fork, because all that's involved from the
              perspective of the revision control software is just
              another merge.</para></listitem></itemizedlist>

        <para id="x_98">Some people resist distributed tools because they want
          to retain tight control over their projects, and they
          believe that centralised tools give them this control.
          However, if you're of this belief, and you publish your CVS
          or Subversion repositories publicly, there are plenty of
          tools available that can pull out your entire project's
          history (albeit slowly) and recreate it somewhere that you
          don't control. So while your control in this case is
          illusory, you are forgoing the ability to fluidly
          collaborate with whoever feels compelled to mirror
          and fork your history.</para>

      </sect3>
    </sect2>
    <sect2>
      <title>Advantages for commercial projects</title>

      <para id="x_99">Many commercial projects are undertaken by teams that are
        scattered across the globe. Contributors who are far from a
        central server will see slower command execution and perhaps
        less reliability. Commercial revision control systems attempt
        to ameliorate these problems with remote-site replication
        add-ons that are typically expensive to buy and cantankerous
        to administer. A distributed system doesn't suffer from these
        problems in the first place. Better yet, you can easily set
        up multiple authoritative servers, say one per site, so that
        there's no redundant communication between repositories over
        expensive long-haul network links.</para>

      <para id="x_9a">Centralised revision control systems tend to have
        relatively low scalability. It's not unusual for an expensive
        centralised system to fall over under the combined load of
        just a few dozen concurrent users. Once again, the typical
        response tends to be an expensive and clunky replication
        facility. Since the load on a central server---if you have
        one at all---is many times lower with a distributed tool
        (because all of the data is replicated everywhere), a single
        cheap server can handle the needs of a much larger team, and
        replication to balance load becomes a simple matter of
        scripting.</para>

      <para id="x_9b">If you have an employee in the field, troubleshooting a
        problem at a customer's site, they'll benefit from distributed
        revision control. The tool will let them generate custom
        builds, try different fixes in isolation from each other, and
        search efficiently through history for the sources of bugs and
        regressions in the customer's environment, all without needing
        to connect to your company's network.</para>

    </sect2>
  </sect1>
  <sect1>
    <title>Why choose Mercurial?</title>

    <para id="x_9c">Mercurial has a unique set of properties that make it a
      particularly good choice as a revision control system.</para>
    <itemizedlist>
      <listitem><para id="x_9d">It is easy to learn and use.</para></listitem>
      <listitem><para id="x_9e">It is lightweight.</para></listitem>
      <listitem><para id="x_9f">It scales excellently.</para></listitem>
      <listitem><para id="x_a0">It is easy to
          customise.</para></listitem></itemizedlist>

    <para id="x_a1">If you are at all familiar with revision control systems,
      you should be able to get up and running with Mercurial in less
      than five minutes. Even if not, it will take no more than a few
      minutes longer. Mercurial's command and feature sets are
      generally uniform and consistent, so you can keep track of a few
      general rules instead of a host of exceptions.</para>

    <para id="x_a2">On a small project, you can start working with Mercurial in
      moments. Creating new changes and branches; transferring changes
      around (whether locally or over a network); and history and
      status operations are all fast. Mercurial attempts to stay
      nimble and largely out of your way by combining low cognitive
      overhead with blazingly fast operations.</para>

    <para id="x_a3">The usefulness of Mercurial is not limited to small
      projects: it is used by projects with hundreds to thousands of
      contributors, each containing tens of thousands of files and
      hundreds of megabytes of source code.</para>

    <para id="x_a4">If the core functionality of Mercurial is not enough for
      you, it's easy to build on. Mercurial is well suited to
      scripting tasks, and its clean internals and implementation in
      Python make it easy to add features in the form of extensions.
      There are a number of popular and useful extensions already
      available, ranging from helping to identify bugs to improving
      performance.</para>

  </sect1>
  <sect1>
    <title>Mercurial compared with other tools</title>

    <para id="x_a5">Before you read on, please understand that this section
      necessarily reflects my own experiences, interests, and (dare I
      say it) biases. I have used every one of the revision control
      tools listed below, in most cases for several years at a
      time.</para>

    <sect2>
      <title>Subversion</title>

      <para id="x_a6">Subversion is a popular revision control tool, developed
        to replace CVS. It has a centralised client/server
        architecture.</para>

      <para id="x_a7">Subversion and Mercurial have similarly named commands for
        performing the same operations, so if you're familiar with
        one, it is easy to learn to use the other. Both tools are
        portable to all popular operating systems.</para>

      <para id="x_a8">Prior to version 1.5, Subversion had no useful support for
        merges. At the time of writing, its merge tracking capability
        is new, and known to be <ulink
          url="http://svnbook.red-bean.com/nightly/en/svn.branchmerge.advanced.html#svn.branchmerge.advanced.finalword">complicated
          and buggy</ulink>.</para>

      <para id="x_a9">Mercurial has a substantial performance advantage over
        Subversion on every revision control operation I have
        benchmarked. I have measured its advantage as ranging from a
        factor of two to a factor of six when compared with Subversion
        1.4.3's <emphasis>ra_local</emphasis> file store, which is the
        fastest access method available. In more realistic
        deployments involving a network-based store, Subversion will
        be at a substantially larger disadvantage. Because many
        Subversion commands must talk to the server and Subversion
        does not have useful replication facilities, server capacity
        and network bandwidth become bottlenecks for modestly large
        projects.</para>

      <para id="x_aa">Additionally, Subversion incurs substantial storage
        overhead to avoid network transactions for a few common
        operations, such as finding modified files
        (<literal>status</literal>) and displaying modifications
        against the current revision (<literal>diff</literal>). As a
        result, a Subversion working copy is often the same size as,
        or larger than, a Mercurial repository and working directory,
        even though the Mercurial repository contains a complete
        history of the project.</para>

      <para id="x_ab">Subversion is widely supported by third-party tools.
        Mercurial currently lags considerably in this area. This gap
        is closing, however, and indeed some of Mercurial's GUI tools
        now outshine their Subversion equivalents. Like Mercurial,
        Subversion has an excellent user manual.</para>

      <para id="x_ac">Because Subversion doesn't store revision history on the
        client, it is well suited to managing projects that deal with
        lots of large, opaque binary files. If you check in fifty
        revisions to an incompressible 10MB file, Subversion's
        client-side space usage stays constant. The space used by any
        distributed SCM will grow rapidly in proportion to the number
        of revisions, because the differences between each revision
        are large.</para>
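
      <para>A rough model can make the arithmetic behind the fifty-revision
        example concrete. This is a sketch under stated assumptions, not a
        measurement. It assumes that each revision rewrites the whole
        incompressible file, so every stored delta is as large as the file
        itself. It also assumes that the centralised client keeps only the
        working file plus one pristine copy of it, as Subversion working
        copies do. The function names are hypothetical.</para>

```python
def centralised_client_mb(file_mb: float, revisions: int) -> float:
    """Client-side space for a centralised tool (assumed model).

    History lives on the server, so the client holds only the working
    file and one pristine copy. The revision count is deliberately
    unused: under this model the result does not depend on it.
    """
    return 2 * file_mb


def distributed_client_mb(file_mb: float, revisions: int) -> float:
    """Client-side space for a distributed tool (assumed model).

    Every revision is stored locally as a full-size delta, plus the
    working file itself.
    """
    return revisions * file_mb + file_mb


for revs in (1, 10, 50):
    # prints "1 20 20", then "10 20 110", then "50 20 510"
    print(revs, centralised_client_mb(10, revs), distributed_client_mb(10, revs))
```

      <para>Under these assumptions the centralised client stays at about
        20MB no matter how many revisions are committed. A distributed
        clone grows by roughly 10MB per revision and reaches about 510MB
        after fifty.</para>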

      <para id="x_ad">In addition, it's often difficult or, more usually,
        impossible to merge different versions of a binary file.
        Subversion's ability to let a user lock a file, so that they
        temporarily have the exclusive right to commit changes to it,
        can be a significant advantage to a project where binary files
        are widely used.</para>

      <para id="x_ae">Mercurial can import revision history from a Subversion
        repository. It can also export revision history to a
        Subversion repository. This makes it easy to <quote>test the
        waters</quote> and use Mercurial and Subversion in parallel
        before deciding to switch. History conversion is incremental,
        so you can perform an initial conversion, then small
        additional conversions afterwards to bring in new
        changes.</para>
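
      <para>Incremental conversion works because the converter remembers
        which source revisions it has already handled. Mercurial's
        <literal role="hg-ext">convert</literal> extension records this
        in a revision map file, by default <filename>.hg/shamap</filename>
        in the destination repository. The following sketch only
        illustrates the idea. <literal>convert_incrementally</literal> and
        its <literal>convert_one</literal> callback are hypothetical names,
        not part of Mercurial's API.</para>

```python
from typing import Callable


# Hypothetical helper illustrating the idea; not Mercurial's API.
def convert_incrementally(
    source_revs: list[str],
    revmap: dict[str, str],
    convert_one: Callable[[str], str],
) -> list[str]:
    """Convert only the source revisions not yet recorded in revmap.

    source_revs lists source revision IDs oldest first. revmap maps each
    already-converted source ID to its new (destination) ID and is
    updated in place, so the next run resumes where this one stopped.
    Returns the source IDs converted during this call.
    """
    converted = []
    for rev in source_revs:
        if rev in revmap:
            continue  # handled by an earlier run; skip it
        revmap[rev] = convert_one(rev)
        converted.append(rev)
    return converted


revmap: dict[str, str] = {}
print(convert_incrementally(["r1", "r2"], revmap, lambda rev: "hg-" + rev))
# prints ['r1', 'r2']
print(convert_incrementally(["r1", "r2", "r3"], revmap, lambda rev: "hg-" + rev))
# prints ['r3']
```

      <para>Because the map persists between runs, the second call does
        work only for the revision that appeared since the first one. This
        is what makes the small follow-up conversions cheap.</para>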

</sect2>
<sect2>
<title>Git</title>

<para id="x_af">Git is a distributed revision control tool that was
developed for managing the Linux kernel source tree. Like
Mercurial, its early design was somewhat influenced by
Monotone.</para>

<para id="x_b0">Git has a very large command set, with version 1.5.0
providing 139 individual commands. It has something of a
reputation for being difficult to learn. Compared to Git,
Mercurial has a strong focus on simplicity.</para>

<para id="x_b1">In terms of performance, Git is extremely fast. In
several cases, it is faster than Mercurial, at least on Linux,
while Mercurial performs better on other operations. However,
on Windows, the performance and general level of support that
Git provides are, at the time of writing, far behind those of
Mercurial.</para>

<para id="x_b2">While a Mercurial repository needs no maintenance, a Git
repository requires frequent manual <quote>repacks</quote> of
its metadata. Without them, performance degrades and space
usage grows rapidly. A server that contains many Git
repositories that are not rigorously and frequently repacked
will become heavily disk-bound during backups, and there have
been instances of daily backups taking far longer than 24
hours as a result. A freshly packed Git repository is
slightly smaller than a Mercurial repository, but an unpacked
repository is several orders of magnitude larger.</para>

<para id="x_b3">The core of Git is written in C. Many Git commands are
implemented as shell or Perl scripts, and the quality of these
scripts varies widely. I have encountered several instances
where scripts charged along blindly in the presence of errors
that should have been fatal.</para>

<para id="x_b4">Mercurial can import revision history from a Git
repository.</para>

</sect2>
<sect2>
<title>CVS</title>

<para id="x_b5">CVS is probably the most widely used revision control tool
in the world. Due to its age and internal untidiness, it has
been only lightly maintained for many years.</para>

<para id="x_b6">It has a centralised client/server architecture. It does
not group related file changes into atomic commits, making it
easy for people to <quote>break the build</quote>: one person
can successfully commit part of a change and then be blocked
by the need for a merge, causing other people to see only a
portion of the work they intended to do. This also affects
how you work with project history. If you want to see all of
the modifications someone made as part of a task, you will
need to manually inspect the descriptions and timestamps of
the changes made to each file involved (if you even know what
those files were).</para>

<para id="x_b7">CVS has a muddled notion of tags and branches that I will
not even attempt to describe. It does not support renaming of
files or directories well, making it easy to corrupt a
repository. It has almost no internal consistency checking
capabilities, so it is usually not even possible to tell
whether or how a repository is corrupt. I would not recommend
CVS for any project, existing or new.</para>

<para id="x_b8">Mercurial can import CVS revision history. However, there
are a few caveats that apply; these are true of every other
revision control tool's CVS importer, too. Due to CVS's lack
of atomic changes and unversioned filesystem hierarchy, it is
not possible to reconstruct CVS history completely accurately;
some guesswork is involved, and renames will usually not show
up. Because a lot of advanced CVS administration has to be
done by hand and is hence error-prone, it's common for CVS
importers to run into multiple problems with corrupted
repositories (completely bogus revision timestamps and files
that have remained locked for over a decade are just two of
the less interesting problems I can recall from personal
experience).</para>

<para id="x_b9">Mercurial can import revision history from a CVS
repository.</para>

</sect2>
<sect2>
<title>Commercial tools</title>

<para id="x_ba">Perforce has a centralised client/server architecture,
with no client-side caching of any data. Unlike modern
revision control tools, Perforce requires that a user run a
command to inform the server about every file they intend to
edit.</para>

<para id="x_bb">The performance of Perforce is quite good for small teams,
but it falls off rapidly as the number of users grows beyond a
few dozen. Modestly large Perforce installations require the
deployment of proxies to cope with the load their users
generate.</para>

</sect2>
<sect2>
<title>Choosing a revision control tool</title>

<para id="x_bc">With the exception of CVS, all of the tools listed above
have unique strengths that suit them to particular styles of
work. There is no single revision control tool that is best
in all situations.</para>

<para id="x_bd">As an example, Subversion is a good choice for working
with frequently edited binary files, due to its centralised
nature and support for file locking.</para>

<para id="x_be">I personally find Mercurial's properties of simplicity,
performance, and good merge support to be a compelling
combination that has served me well for several years.</para>

</sect2>
</sect1>
<sect1>
<title>Switching from another tool to Mercurial</title>

<para id="x_bf">Mercurial is bundled with an extension named <literal
role="hg-ext">convert</literal>, which can incrementally
import revision history from several other revision control
tools. By <quote>incremental</quote>, I mean that you can
convert all of a project's history to date in one go, then rerun
the conversion later to obtain new changes that happened after
the initial conversion.</para>

<para id="x_c0">The revision control tools supported by <literal
role="hg-ext">convert</literal> are as follows:</para>
<itemizedlist>
<listitem><para id="x_c1">Subversion</para></listitem>
<listitem><para id="x_c2">CVS</para></listitem>
<listitem><para id="x_c3">Git</para></listitem>
<listitem><para id="x_c4">Darcs</para></listitem></itemizedlist>

<para id="x_c5">In addition, <literal role="hg-ext">convert</literal> can
export changes from Mercurial to Subversion. This makes it
possible to try Subversion and Mercurial in parallel before
committing to a switchover, without risking the loss of any
work.</para>

<para id="x_c6">The <command role="hg-ext-convert">convert</command> command
is easy to use. Simply point it at the path or URL of the
source repository, optionally give it the name of the
destination repository, and it will start working. After the
initial conversion, just run the same command again to import
new changes.</para>
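
<para>The following is a minimal sketch, not a definitive recipe. It
assumes a hypothetical Subversion repository at
<literal>http://svn.example.org/project/trunk</literal> and
hypothetical local directory names. The extension ships with
Mercurial but is not enabled by default, so you would first turn it
on in your <filename role="special">~/.hgrc</filename>:</para>

<programlisting>[extensions]
# enable the convert extension that is bundled with Mercurial
convert =</programlisting>

<para>An initial conversion, a later incremental update, and an export
back to Subversion might then look something like this:</para>

<!-- Hypothetical example: the URL and directory names are placeholders. -->
<screen><prompt>$</prompt> <userinput>hg convert http://svn.example.org/project/trunk project-hg</userinput>
<prompt>$</prompt> <userinput>hg convert http://svn.example.org/project/trunk project-hg</userinput>
<prompt>$</prompt> <userinput>hg convert --dest-type svn project-hg project-svn</userinput></screen>

<para>The second run should import only the revisions that the first
one did not. <literal role="hg-ext">convert</literal> records what it
has already converted in a revision map. For a Mercurial destination,
that map is stored by default in <filename>.hg/shamap</filename>.
Options and supported formats vary between Mercurial versions, so
check <command role="hg-cmd">hg help convert</command> for the details
that apply to your installation.</para>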
</sect1>

<sect1>
<title>A short history of revision control</title>

<para id="x_c7">The best known of the old-time revision control tools is
SCCS (Source Code Control System), which Marc Rochkind wrote at
Bell Labs in the early 1970s. SCCS operated on individual
files, and required every person working on a project to have
access to a shared workspace on a single system. Only one
person could modify a file at any time; access to files was
arbitrated using locks. It was common for people to lock files
and later forget to unlock them, preventing anyone else from
modifying those files without the help of an
administrator.</para>

<para id="x_c8">Walter Tichy developed a free alternative to SCCS in the
early 1980s; he called his program RCS (Revision Control System).
Like SCCS, RCS required developers to work in a single shared
workspace, and to lock files to prevent multiple people from
modifying them simultaneously.</para>

<para id="x_c9">Later in the 1980s, Dick Grune used RCS as a building block
for a set of shell scripts he initially called cmt, but then
renamed to CVS (Concurrent Versions System). The big innovation
of CVS was that it let developers work simultaneously and
somewhat independently in their own personal workspaces. The
personal workspaces prevented developers from stepping on each
other's toes all the time, as was common with SCCS and RCS. Each
developer had a copy of every project file, and could modify
their copies independently. They had to merge their edits prior
to committing changes to the central repository.</para>

<para id="x_ca">Brian Berliner took Grune's original scripts and rewrote
them in C, releasing in 1989 the code that has since developed
into the modern version of CVS. CVS subsequently acquired the
ability to operate over a network connection, giving it a
client/server architecture. CVS's architecture is centralised;
only the server has a copy of the history of the project. Client
workspaces just contain copies of recent versions of the
project's files, and a little metadata to tell them where the
server is. CVS has been enormously successful; it is probably
the world's most widely used revision control system.</para>

<para id="x_cb">In the early 1990s, Sun Microsystems developed an early
distributed revision control system called TeamWare. A
TeamWare workspace contains a complete copy of the project's
history. TeamWare has no notion of a central repository. (CVS
relied upon RCS for its history storage; TeamWare used
SCCS.)</para>

<para id="x_cc">As the 1990s progressed, awareness grew of a number of
problems with CVS. It records simultaneous changes to multiple
files individually, instead of grouping them together as a
single logically atomic operation. It does not manage its file
hierarchy well; it is easy to make a mess of a repository by
renaming files and directories. Worse, its source code is
difficult to read and maintain, which made the <quote>pain
level</quote> of fixing these architectural problems
prohibitive.</para>

<para id="x_cd">In 2001, Jim Blandy and Karl Fogel, two developers who had
worked on CVS, started a project to replace it with a tool that
would have a better architecture and cleaner code. The result,
Subversion, does not stray from CVS's centralised client/server
model, but it adds multi-file atomic commits, better namespace
management, and a number of other features that make it a
generally better tool than CVS. Since its initial release, it
has rapidly grown in popularity.</para>

<para id="x_ce">More or less simultaneously, Graydon Hoare began working on
an ambitious distributed revision control system that he named
Monotone. Besides addressing many of CVS's design flaws and
having a peer-to-peer architecture, Monotone goes beyond earlier
(and subsequent) revision control tools in a number of
innovative ways. It uses cryptographic hashes as identifiers,
and has an integral notion of <quote>trust</quote> for code from
different sources.</para>

<para id="x_cf">Mercurial began life in 2005. While a few aspects of its
design are influenced by Monotone, Mercurial focuses on ease of
use, high performance, and scalability to very large
projects.</para>

</sect1>

<sect1>
<title>Colophon&emdash;this book is Free</title>

<para id="x_d0">This book is licensed under the Open Publication License,
and is produced entirely using Free Software tools. It is
typeset with DocBook XML. Illustrations are drawn and rendered with
<ulink url="http://www.inkscape.org/">Inkscape</ulink>.</para>

<para id="x_d1">The complete source code for this book is published as a
Mercurial repository, at <ulink
url="http://hg.serpentine.com/mercurial/book">http://hg.serpentine.com/mercurial/book</ulink>.</para>

</sect1>
</preface>
<!--
local variables:
sgml-parent-document: ("00book.xml" "book" "preface")
end:
-->