hgbook

annotate en/ch01-intro.xml @ 649:d13c7c706a58

Merge with http://hg.serpentine.com/mercurial/book
author Dongsheng Song <dongsheng.song@gmail.com>
date Fri Mar 20 15:40:06 2009 +0800 (2009-03-20)
parents 13513d2a128d
rev   line source
bos@553 1 <!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : -->
bos@553 2
dongsheng@625 3 <chapter id="chap.intro">
bos@572 4 <?dbhtml filename="introduction.html"?>
bos@553 5 <title>Introduction</title>
bos@553 6
bos@553 7 <sect1>
bos@553 8 <title>About revision control</title>
bos@553 9
bos@553 10 <para>Revision control is the process of managing multiple
bos@553 11 versions of a piece of information. In its simplest form, this
bos@553 12 is something that many people do by hand: every time you modify
bos@553 13 a file, save it under a new name that contains a number, each
bos@553 14 one higher than the number of the preceding version.</para>
bos@553 15
bos@553 16 <para>Manually managing multiple versions of even a single file is
bos@553 17 an error-prone task, though, so software tools to help automate
bos@553 18 this process have long been available. The earliest automated
bos@553 19 revision control tools were intended to help a single user to
bos@553 20 manage revisions of a single file. Over the past few decades,
bos@553 21 the scope of revision control tools has expanded greatly; they
bos@553 22 now manage multiple files, and help multiple people to work
bos@553 23 together. The best modern revision control tools have no
bos@553 24 problem coping with thousands of people working together on
bos@553 25 projects that consist of hundreds of thousands of files.</para>
bos@553 26
bos@553 27 <sect2>
bos@553 28 <title>Why use revision control?</title>
bos@553 29
bos@553 30 <para>There are a number of reasons why you or your team might
bos@553 31 want to use an automated revision control tool for a
bos@553 32 project.</para>
bos@553 33 <itemizedlist>
bos@553 34 <listitem><para>It will track the history and evolution of
bos@553 35 your project, so you don't have to. For every change,
bos@553 36 you'll have a log of <emphasis>who</emphasis> made it;
bos@553 37 <emphasis>why</emphasis> they made it;
bos@553 38 <emphasis>when</emphasis> they made it; and
bos@553 39 <emphasis>what</emphasis> the change
bos@553 40 was.</para></listitem>
bos@553 41 <listitem><para>When you're working with other people,
bos@553 42 revision control software makes it easier for you to
bos@553 43 collaborate. For example, when people more or less
bos@553 44 simultaneously make potentially incompatible changes, the
bos@553 45 software will help you to identify and resolve those
bos@553 46 conflicts.</para></listitem>
bos@553 47 <listitem><para>It can help you to recover from mistakes. If
bos@553 48 you make a change that later turns out to be in error, you
bos@553 49 can revert to an earlier version of one or more files. In
bos@553 50 fact, a <emphasis>really</emphasis> good revision control
bos@553 51 tool will even help you to efficiently figure out exactly
bos@553 52 when a problem was introduced (see section <xref
dongsheng@625 53 linkend="sec.undo.bisect"/> for details).</para></listitem>
bos@553 54 <listitem><para>It will help you to work simultaneously on,
bos@553 55 and manage the drift between, multiple versions of your
bos@553 56 project.</para></listitem></itemizedlist>
bos@553 57 <para>Most of these reasons are equally valid, at least in
bos@553 58 theory, whether you're working on a project by yourself or
bos@553 59 with a hundred other people.</para>
bos@553 60
bos@553 61 <para>A key question about the practicality of revision control
bos@553 62 at these two different scales (<quote>lone hacker</quote> and
bos@553 63 <quote>huge team</quote>) is how its
bos@553 64 <emphasis>benefits</emphasis> compare to its
bos@553 65 <emphasis>costs</emphasis>. A revision control tool that's
bos@553 66 difficult to understand or use is going to impose a high
bos@553 67 cost.</para>
bos@553 68
bos@553 69 <para>A five-hundred-person project is likely to collapse under
bos@553 70 its own weight almost immediately without a revision control
bos@553 71 tool and process. In this case, the cost of using revision
bos@553 72 control might hardly seem worth considering, since
bos@553 73 <emphasis>without</emphasis> it, failure is almost
bos@553 74 guaranteed.</para>
bos@553 75
bos@553 76 <para>On the other hand, a one-person <quote>quick hack</quote>
bos@553 77 might seem like a poor place to use a revision control tool,
bos@553 78 because surely the cost of using one must be close to the
bos@553 79 overall cost of the project. Right?</para>
bos@553 80
bos@553 81 <para>Mercurial uniquely supports <emphasis>both</emphasis> of
bos@553 82 these scales of development. You can learn the basics in just
bos@553 83 a few minutes, and due to its low overhead, you can apply
bos@553 84 revision control to the smallest of projects with ease. Its
bos@553 85 simplicity means you won't have a lot of abstruse concepts or
bos@553 86 command sequences competing for mental space with whatever
bos@553 87 you're <emphasis>really</emphasis> trying to do. At the same
bos@553 88 time, Mercurial's high performance and peer-to-peer nature let
bos@553 89 you scale painlessly to handle large projects.</para>
bos@553 90
bos@553 91 <para>No revision control tool can rescue a poorly run project,
bos@553 92 but a good choice of tools can make a huge difference to the
bos@553 93 fluidity with which you can work on a project.</para>
bos@553 94
bos@553 95 </sect2>
bos@553 96 <sect2>
bos@553 97 <title>The many names of revision control</title>
bos@553 98
bos@553 99 <para>Revision control is a diverse field, so much so that it
bos@553 100 doesn't actually have a single name or acronym. Here are a
bos@553 101 few of the more common names and acronyms you'll
bos@553 102 encounter:</para>
bos@553 103 <itemizedlist>
bos@553 104 <listitem><para>Revision control (RCS)</para></listitem>
bos@553 105 <listitem><para>Software configuration management (SCM), or
bos@553 106 configuration management</para></listitem>
bos@553 107 <listitem><para>Source code management</para></listitem>
bos@553 108 <listitem><para>Source code control, or source
bos@553 109 control</para></listitem>
bos@553 110 <listitem><para>Version control
bos@553 111 (VCS)</para></listitem></itemizedlist>
bos@553 112 <para>Some people claim that these terms actually have different
bos@553 113 meanings, but in practice they overlap so much that there's no
bos@553 114 agreed or even useful way to tease them apart.</para>
bos@553 115
bos@553 116 </sect2>
bos@553 117 </sect1>
bos@553 118 <sect1>
bos@553 119 <title>A short history of revision control</title>
bos@553 120
bos@553 121 <para>The best known of the old-time revision control tools is
bos@553 122 SCCS (Source Code Control System), which Marc Rochkind wrote at
bos@553 123 Bell Labs in the early 1970s. SCCS operated on individual
bos@553 124 files, and required every person working on a project to have
bos@553 125 access to a shared workspace on a single system. Only one
bos@553 126 person could modify a file at any time; arbitration for access
bos@553 127 to files was via locks. It was common for people to lock files,
bos@553 128 and later forget to unlock them, preventing anyone else from
bos@553 129 modifying those files without the help of an
bos@553 130 administrator.</para>
bos@553 131
bos@553 132 <para>Walter Tichy developed a free alternative to SCCS in the
ori@561 133 early 1980s; he called his program RCS (Revision Control System).
bos@553 134 Like SCCS, RCS required developers to work in a single shared
bos@553 135 workspace, and to lock files to prevent multiple people from
bos@553 136 modifying them simultaneously.</para>
bos@553 137
bos@553 138 <para>Later in the 1980s, Dick Grune used RCS as a building block
bos@553 139 for a set of shell scripts he initially called cmt, but then
bos@553 140 renamed to CVS (Concurrent Versions System). The big innovation
bos@553 141 of CVS was that it let developers work simultaneously and
bos@553 142 somewhat independently in their own personal workspaces. The
bos@553 143 personal workspaces prevented developers from stepping on each
bos@553 144 other's toes all the time, as was common with SCCS and RCS. Each
bos@553 145 developer had a copy of every project file, and could modify
bos@553 146 their copies independently. They had to merge their edits prior
bos@553 147 to committing changes to the central repository.</para>
bos@553 148
bos@553 149 <para>Brian Berliner took Grune's original scripts and rewrote
bos@553 150 them in C, releasing in 1989 the code that has since developed
bos@553 151 into the modern version of CVS. CVS subsequently acquired the
bos@553 152 ability to operate over a network connection, giving it a
bos@553 153 client/server architecture. CVS's architecture is centralised;
bos@553 154 only the server has a copy of the history of the project. Client
bos@553 155 workspaces just contain copies of recent versions of the
bos@553 156 project's files, and a little metadata to tell them where the
bos@553 157 server is. CVS has been enormously successful; it is probably
bos@553 158 the world's most widely used revision control system.</para>
bos@553 159
bos@553 160 <para>In the early 1990s, Sun Microsystems developed an early
bos@553 161 distributed revision control system, called TeamWare. A
bos@553 162 TeamWare workspace contains a complete copy of the project's
bos@553 163 history. TeamWare has no notion of a central repository. (CVS
bos@553 164 relied upon RCS for its history storage; TeamWare used
bos@553 165 SCCS.)</para>
bos@553 166
bos@553 167 <para>As the 1990s progressed, awareness grew of a number of
bos@553 168 problems with CVS. It records simultaneous changes to multiple
bos@553 169 files individually, instead of grouping them together as a
bos@553 170 single logically atomic operation. It does not manage its file
bos@553 171 hierarchy well; it is easy to make a mess of a repository by
bos@553 172 renaming files and directories. Worse, its source code is
bos@553 173 difficult to read and maintain, which made the <quote>pain
bos@553 174 level</quote> of fixing these architectural problems
bos@553 175 prohibitive.</para>
bos@553 176
bos@553 177 <para>In 2001, Jim Blandy and Karl Fogel, two developers who had
bos@553 178 worked on CVS, started a project to replace it with a tool that
bos@553 179 would have a better architecture and cleaner code. The result,
bos@553 180 Subversion, does not stray from CVS's centralised client/server
bos@553 181 model, but it adds multi-file atomic commits, better namespace
bos@553 182 management, and a number of other features that make it a
bos@553 183 generally better tool than CVS. Since its initial release, it
bos@553 184 has rapidly grown in popularity.</para>
bos@553 185
bos@553 186 <para>More or less simultaneously, Graydon Hoare began working on
bos@553 187 an ambitious distributed revision control system that he named
bos@553 188 Monotone. While Monotone addresses many of CVS's design flaws
bos@553 189 and has a peer-to-peer architecture, it goes beyond earlier (and
bos@553 190 subsequent) revision control tools in a number of innovative
bos@553 191 ways. It uses cryptographic hashes as identifiers, and has an
bos@553 192 integral notion of <quote>trust</quote> for code from different
bos@553 193 sources.</para>
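<para>To make the idea of hashes as identifiers concrete, here is a
minimal sketch in Python. It is not Monotone's (or Mercurial's) actual
storage format; the function name <literal>revision_id</literal>, the
all-zero <literal>ROOT</literal> placeholder, and the choice of SHA-1 are
assumptions made purely for illustration. The point is only that an
identifier derived from a revision's content and its parent's identifier
is reproducible by anyone holding the same data.</para>

```python
import hashlib


def revision_id(parent_id: str, content: bytes) -> str:
    """Return a hex identifier derived from the parent's id and the content."""
    digest = hashlib.sha1()
    digest.update(parent_id.encode("ascii"))
    digest.update(content)
    return digest.hexdigest()


# Placeholder id meaning "no parent" (an assumption of this sketch).
ROOT = "0" * 40

first = revision_id(ROOT, b"hello\n")
second = revision_id(first, b"hello, world\n")

# The same parent and the same content always give the same identifier,
# so two people who hold the same history agree on every id.
same_as_first = revision_id(ROOT, b"hello\n")
print(first == same_as_first)  # True
print(first != second)         # True
print(len(first))              # 40 hex characters for SHA-1
```

<para>Because each identifier also covers the parent's identifier, changing
any earlier revision would change every identifier that follows it.</para>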
bos@553 194
bos@553 195 <para>Mercurial began life in 2005. While a few aspects of its
bos@553 196 design are influenced by Monotone, Mercurial focuses on ease of
bos@553 197 use, high performance, and scalability to very large
bos@553 198 projects.</para>
bos@553 199
bos@553 200 </sect1>
bos@553 201 <sect1>
bos@553 202 <title>Trends in revision control</title>
bos@553 203
bos@553 204 <para>There has been an unmistakable trend in the development and
bos@553 205 use of revision control tools over the past four decades, as
bos@553 206 people have become familiar with the capabilities of their tools
bos@553 207 and constrained by their limitations.</para>
bos@553 208
bos@553 209 <para>The first generation began by managing single files on
bos@553 210 individual computers. Although these tools represented a huge
bos@553 211 advance over ad-hoc manual revision control, their locking model
bos@553 212 and reliance on a single computer limited them to small,
bos@553 213 tightly-knit teams.</para>
bos@553 214
bos@553 215 <para>The second generation loosened these constraints by moving
bos@553 216 to network-centered architectures, and managing entire projects
bos@553 217 at a time. As projects grew larger, they ran into new problems.
bos@553 218 With clients needing to talk to servers very frequently, server
bos@553 219 scaling became an issue for large projects. An unreliable
bos@553 220 network connection could prevent remote users from being able to
bos@553 221 talk to the server at all. As open source projects started
bos@553 222 making read-only access available anonymously to anyone, people
bos@553 223 without commit privileges found that they could not use the
bos@553 224 tools to interact with a project in a natural way, as they could
bos@553 225 not record their changes.</para>
bos@553 226
bos@553 227 <para>The current generation of revision control tools is
bos@553 228 peer-to-peer in nature. All of these systems have dropped the
bos@553 229 dependency on a single central server, and allow people to
bos@553 230 distribute their revision control data to where it's actually
bos@553 231 needed. Collaboration over the Internet has moved from being
bos@553 232 constrained by technology to being a matter of choice and consensus.
bos@553 233 Modern tools can operate offline indefinitely and autonomously,
bos@553 234 with a network connection only needed when syncing changes with
bos@553 235 another repository.</para>
bos@553 236
bos@553 237 </sect1>
bos@553 238 <sect1>
bos@553 239 <title>A few of the advantages of distributed revision
bos@553 240 control</title>
bos@553 241
bos@553 242 <para>Even though distributed revision control tools have for
bos@553 243 several years been as robust and usable as their
bos@553 244 previous-generation counterparts, people using older tools have
bos@553 245 not necessarily woken up to their advantages yet. There are a
bos@553 246 number of ways in which distributed tools shine relative to
bos@553 247 centralised ones.</para>
bos@553 248
bos@553 249 <para>For an individual developer, distributed tools are almost
bos@553 250 always much faster than centralised tools. This is for a simple
bos@553 251 reason: a centralised tool needs to talk over the network for
bos@553 252 many common operations, because most metadata is stored in a
bos@553 253 single copy on the central server. A distributed tool stores
bos@553 254 all of its metadata locally. All else being equal, talking over
bos@553 255 the network adds overhead to a centralised tool. Don't
bos@553 256 underestimate the value of a snappy, responsive tool: you're
bos@553 257 going to spend a lot of time interacting with your revision
bos@553 258 control software.</para>
bos@553 259
bos@553 260 <para>Distributed tools are indifferent to the vagaries of your
bos@553 261 server infrastructure, again because they replicate metadata to
bos@553 262 so many locations. If you use a centralised system and your
bos@553 263 server catches fire, you'd better hope that your backup media
bos@553 264 are reliable, and that your last backup was recent and actually
bos@553 265 worked. With a distributed tool, you have many backups
bos@553 266 available, one on every contributor's computer.</para>
bos@553 267
bos@553 268 <para>The reliability of your network will affect distributed
bos@553 269 tools far less than it will centralised tools. You can't even
bos@553 270 use a centralised tool without a network connection, except for
bos@553 271 a few highly constrained commands. With a distributed tool, if
bos@553 272 your network connection goes down while you're working, you may
bos@553 273 not even notice. The only thing you won't be able to do is talk
bos@553 274 to repositories on other computers, something that is relatively
bos@553 275 rare compared with local operations. If you have a far-flung
bos@553 276 team of collaborators, this may be significant.</para>
bos@553 277
bos@553 278 <sect2>
bos@553 279 <title>Advantages for open source projects</title>
bos@553 280
bos@553 281 <para>If you take a shine to an open source project and decide
bos@553 282 that you would like to start hacking on it, and that project
bos@553 283 uses a distributed revision control tool, you are at once a
bos@553 284 peer with the people who consider themselves the
bos@553 285 <quote>core</quote> of that project. If they publish their
bos@553 286 repositories, you can immediately copy their project history,
bos@553 287 start making changes, and record your work, using the same
bos@553 288 tools in the same ways as insiders. By contrast, with a
bos@553 289 centralised tool, you must use the software in a <quote>read
bos@553 290 only</quote> mode unless someone grants you permission to
bos@553 291 commit changes to their central server. Until then, you won't
bos@553 292 be able to record changes, and your local modifications will
bos@553 293 be at risk of corruption any time you try to update your
bos@553 294 client's view of the repository.</para>
bos@553 295
bos@553 296 <sect3>
bos@553 297 <title>The forking non-problem</title>
bos@553 298
bos@553 299 <para>It has been suggested that distributed revision control
bos@553 300 tools pose some sort of risk to open source projects because
bos@553 301 they make it easy to <quote>fork</quote> the development of
bos@553 302 a project. A fork happens when there are differences in
bos@553 303 opinion or attitude between groups of developers that cause
bos@553 304 them to decide that they can't work together any longer.
bos@553 305 Each side takes a more or less complete copy of the
bos@553 306 project's source code, and goes off in its own
bos@553 307 direction.</para>
bos@553 308
bos@553 309 <para>Sometimes the camps in a fork decide to reconcile their
bos@553 310 differences. With a centralised revision control system, the
bos@553 311 <emphasis>technical</emphasis> process of reconciliation is
bos@553 312 painful, and has to be performed largely by hand. You have
bos@553 313 to decide whose revision history is going to
bos@553 314 <quote>win</quote>, and graft the other team's changes into
bos@553 315 the tree somehow. This usually loses some or all of one
bos@553 316 side's revision history.</para>
bos@553 317
bos@553 318 <para>With respect to forking, what distributed tools do is
bos@553 319 make forking the <emphasis>only</emphasis> way to
bos@553 320 develop a project. Every single change that you make is
bos@553 321 potentially a fork point. The great strength of this
bos@553 322 approach is that a distributed revision control tool has to
bos@553 323 be really good at <emphasis>merging</emphasis> forks,
bos@553 324 because forks are absolutely fundamental: they happen all
bos@553 325 the time.</para>
bos@553 326
bos@553 327 <para>If every piece of work that everybody does, all the
bos@553 328 time, is framed in terms of forking and merging, then what
bos@553 329 the open source world refers to as a <quote>fork</quote>
bos@553 330 becomes <emphasis>purely</emphasis> a social issue. If
bos@553 331 anything, distributed tools <emphasis>lower</emphasis> the
bos@553 332 likelihood of a fork:</para>
bos@553 333 <itemizedlist>
bos@553 334 <listitem><para>They eliminate the social distinction that
bos@553 335 centralised tools impose: that between insiders (people
bos@553 336 with commit access) and outsiders (people
bos@553 337 without).</para></listitem>
bos@553 338 <listitem><para>They make it easier to reconcile after a
bos@553 339 social fork, because all that's involved from the
bos@553 340 perspective of the revision control software is just
bos@553 341 another merge.</para></listitem></itemizedlist>
bos@553 342
bos@553 343 <para>Some people resist distributed tools because they want
bos@553 344 to retain tight control over their projects, and they
bos@553 345 believe that centralised tools give them this control.
bos@553 346 However, if you're of this belief, and you publish your CVS
ori@561 347 or Subversion repositories publicly, there are plenty of
bos@553 348 tools available that can pull out your entire project's
bos@553 349 history (albeit slowly) and recreate it somewhere that you
bos@553 350 don't control. So while your control in this case is
bos@553 351 illusory, you are forgoing the ability to fluidly
bos@553 352 collaborate with whatever people feel compelled to mirror
bos@553 353 and fork your history.</para>
bos@553 354
bos@553 355 </sect3>
bos@553 356 </sect2>
bos@553 357 <sect2>
bos@553 358 <title>Advantages for commercial projects</title>
bos@553 359
bos@553 360 <para>Many commercial projects are undertaken by teams that are
bos@553 361 scattered across the globe. Contributors who are far from a
bos@553 362 central server will see slower command execution and perhaps
bos@553 363 less reliability. Commercial revision control systems attempt
bos@553 364 to ameliorate these problems with remote-site replication
bos@553 365 add-ons that are typically expensive to buy and cantankerous
bos@553 366 to administer. A distributed system doesn't suffer from these
bos@553 367 problems in the first place. Better yet, you can easily set
bos@553 368 up multiple authoritative servers, say one per site, so that
bos@553 369 there's no redundant communication between repositories over
bos@553 370 expensive long-haul network links.</para>
bos@553 371
bos@553 372 <para>Centralised revision control systems tend to have
bos@553 373 relatively low scalability. It's not unusual for an expensive
bos@553 374 centralised system to fall over under the combined load of
bos@553 375 just a few dozen concurrent users. Once again, the typical
bos@553 376 response tends to be an expensive and clunky replication
bos@553 377 facility. Since the load on a central server, if you have
bos@553 378 one at all, is many times lower with a distributed tool
bos@553 379 (because all of the data is replicated everywhere), a single
bos@553 380 cheap server can handle the needs of a much larger team, and
bos@553 381 replication to balance load becomes a simple matter of
bos@553 382 scripting.</para>
bos@553 383
bos@553 384 <para>If you have an employee in the field, troubleshooting a
bos@553 385 problem at a customer's site, they'll benefit from distributed
bos@553 386 revision control. The tool will let them generate custom
bos@553 387 builds, try different fixes in isolation from each other, and
bos@553 388 search efficiently through history for the sources of bugs and
bos@553 389 regressions in the customer's environment, all without needing
bos@553 390 to connect to your company's network.</para>
bos@553 391
bos@553 392 </sect2>
bos@553 393 </sect1>
bos@553 394 <sect1>
bos@553 395 <title>Why choose Mercurial?</title>
bos@553 396
bos@553 397 <para>Mercurial has a unique set of properties that make it a
bos@553 398 particularly good choice as a revision control system.</para>
bos@553 399 <itemizedlist>
bos@553 400 <listitem><para>It is easy to learn and use.</para></listitem>
bos@553 401 <listitem><para>It is lightweight.</para></listitem>
bos@553 402 <listitem><para>It scales excellently.</para></listitem>
bos@553 403 <listitem><para>It is easy to
bos@553 404 customise.</para></listitem></itemizedlist>
bos@553 405
bos@553 406 <para>If you are at all familiar with revision control systems,
bos@553 407 you should be able to get up and running with Mercurial in less
bos@553 408 than five minutes. Even if not, it will take no more than a few
bos@553 409 minutes longer. Mercurial's command and feature sets are
bos@553 410 generally uniform and consistent, so you can keep track of a few
bos@553 411 general rules instead of a host of exceptions.</para>
bos@553 412
bos@553 413 <para>On a small project, you can start working with Mercurial in
bos@553 414 moments. Creating new changes and branches; transferring changes
bos@553 415 around (whether locally or over a network); and history and
bos@553 416 status operations are all fast. Mercurial attempts to stay
bos@553 417 nimble and largely out of your way by combining low cognitive
bos@553 418 overhead with blazingly fast operations.</para>
bos@553 419
bos@553 420 <para>The usefulness of Mercurial is not limited to small
bos@553 421 projects: it is used by projects with hundreds to thousands of
bos@553 422 contributors, each project containing tens of thousands of files and
bos@553 423 hundreds of megabytes of source code.</para>
bos@553 424
bos@553 425 <para>If the core functionality of Mercurial is not enough for
bos@553 426 you, it's easy to build on. Mercurial is well suited to
bos@553 427 scripting tasks, and its clean internals and implementation in
bos@553 428 Python make it easy to add features in the form of extensions.
bos@553 429 There are a number of popular and useful extensions already
bos@553 430 available, ranging from helping to identify bugs to improving
bos@553 431 performance.</para>
bos@553 432
bos@553 433 </sect1>
bos@553 434 <sect1>
bos@553 435 <title>Mercurial compared with other tools</title>
bos@553 436
bos@553 437 <para>Before you read on, please understand that this section
bos@553 438 necessarily reflects my own experiences, interests, and (dare I
bos@553 439 say it) biases. I have used every one of the revision control
bos@553 440 tools listed below, in most cases for several years at a
bos@553 441 time.</para>
bos@553 442
bos@553 443
bos@553 444 <sect2>
bos@553 445 <title>Subversion</title>
bos@553 446
bos@553 447 <para>Subversion is a popular revision control tool, developed
bos@553 448 to replace CVS. It has a centralised client/server
bos@553 449 architecture.</para>
bos@553 450
bos@553 451 <para>Subversion and Mercurial have similarly named commands for
bos@553 452 performing the same operations, so if you're familiar with
bos@553 453 one, it is easy to learn to use the other. Both tools are
bos@553 454 portable to all popular operating systems.</para>
bos@553 455
bos@553 456 <para>Prior to version 1.5, Subversion had no useful support for
bos@553 457 merges. At the time of writing, its merge tracking capability
bos@553 458 is new, and known to be <ulink
bos@553 459 url="http://svnbook.red-bean.com/nightly/en/svn.branchmerge.advanced.html#svn.branchmerge.advanced.finalword">complicated
bos@553 460 and buggy</ulink>.</para>
bos@553 461
bos@553 462 <para>Mercurial has a substantial performance advantage over
bos@553 463 Subversion on every revision control operation I have
bos@553 464 benchmarked. I have measured its advantage as ranging from a
bos@553 465 factor of two to a factor of six when compared with Subversion
bos@553 466 1.4.3's <emphasis>ra_local</emphasis> file store, which is the
bos@553 467 fastest access method available. In more realistic
bos@553 468 deployments involving a network-based store, Subversion will
bos@553 469 be at a substantially larger disadvantage. Because many
bos@553 470 Subversion commands must talk to the server and Subversion
bos@553 471 does not have useful replication facilities, server capacity
bos@553 472 and network bandwidth become bottlenecks for modestly large
bos@553 473 projects.</para>
bos@553 474
bos@553 475 <para>Additionally, Subversion incurs substantial storage
bos@553 476 overhead to avoid network transactions for a few common
bos@553 477 operations, such as finding modified files
bos@553 478 (<literal>status</literal>) and displaying modifications
bos@553 479 against the current revision (<literal>diff</literal>). As a
bos@553 480 result, a Subversion working copy is often the same size as,
bos@553 481 or larger than, a Mercurial repository and working directory,
bos@553 482 even though the Mercurial repository contains a complete
bos@553 483 history of the project.</para>
bos@553 484
bos@553 485 <para>Subversion is widely supported by third party tools.
bos@553 486 Mercurial currently lags considerably in this area. This gap
bos@553 487 is closing, however, and indeed some of Mercurial's GUI tools
bos@553 488 now outshine their Subversion equivalents. Like Mercurial,
bos@553 489 Subversion has an excellent user manual.</para>
bos@553 490
bos@553 491 <para>Because Subversion doesn't store revision history on the
bos@553 492 client, it is well suited to managing projects that deal with
bos@553 493 lots of large, opaque binary files. If you check in fifty
bos@553 494 revisions to an incompressible 10MB file, Subversion's
bos@553 495 client-side space usage stays constant. The space used by any
bos@553 496 distributed SCM will grow rapidly in proportion to the number
bos@553 497 of revisions, because the differences between successive revisions
bos@553 498 are large.</para>
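<para>The arithmetic behind this comparison can be sketched in a few lines
of Python. The figures of 10MB and fifty revisions come from the paragraph
above; everything else is an assumption of the sketch: that the centralised
client keeps two copies of the file (the working file plus one pristine
copy), and that in the worst case each stored revision of an incompressible
file costs about as much as the whole file.</para>

```python
FILE_SIZE_MB = 10  # size of the incompressible binary file, from the text
REVISIONS = 50     # number of revisions checked in, from the text


def centralised_client_mb(file_size_mb: int, copies_per_file: int = 2) -> int:
    """Client-side space when only a fixed number of copies is kept.

    copies_per_file=2 assumes a working file plus one pristine copy.
    """
    return file_size_mb * copies_per_file


def distributed_client_mb(file_size_mb: int, revisions: int) -> int:
    """Worst-case client-side space when the full history is stored locally.

    Assumes every revision's delta is about as large as the file itself,
    plus one checked-out copy in the working directory.
    """
    return file_size_mb * revisions + file_size_mb


print(centralised_client_mb(FILE_SIZE_MB))             # 20
print(distributed_client_mb(FILE_SIZE_MB, REVISIONS))  # 510
```

<para>Under these assumptions the centralised client's usage does not depend
on the number of revisions at all, while the distributed client's usage
grows linearly with it.</para>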
bos@553 499
bos@553 500 <para>In addition, it's often difficult or, more usually,
bos@553 501 impossible to merge different versions of a binary file.
bos@553 502 Subversion's ability to let a user lock a file, so that they
bos@553 503 temporarily have the exclusive right to commit changes to it,
bos@553 504 can be a significant advantage to a project where binary files
bos@553 505 are widely used.</para>
bos@553 506
bos@553 507 <para>Mercurial can import revision history from a Subversion
bos@553 508 repository. It can also export revision history to a
bos@553 509 Subversion repository. This makes it easy to <quote>test the
bos@553 510 waters</quote> and use Mercurial and Subversion in parallel
bos@553 511 before deciding to switch. History conversion is incremental,
bos@553 512 so you can perform an initial conversion, then small
bos@553 513 additional conversions afterwards to bring in new
bos@553 514 changes.</para>
bos@553 515
bos@553 516
bos@553 517 </sect2>
bos@553 518 <sect2>
bos@553 519 <title>Git</title>
bos@553 520
bos@553 521 <para>Git is a distributed revision control tool that was
bos@553 522 developed for managing the Linux kernel source tree. Like
bos@553 523 Mercurial, its early design was somewhat influenced by
bos@553 524 Monotone.</para>
bos@553 525
bos@553 526 <para>Git has a very large command set, with version 1.5.0
bos@553 527 providing 139 individual commands. It has something of a
bos@553 528 reputation for being difficult to learn. Compared to Git,
bos@553 529 Mercurial has a strong focus on simplicity.</para>
bos@553 530
bos@553 531 <para>In terms of performance, Git is extremely fast. In
bos@553 532 several cases, it is faster than Mercurial, at least on Linux,
bos@553 533 while Mercurial performs better on other operations. However,
bos@553 534 on Windows, the performance and general level of support that
bos@553 535 Git provides are, at the time of writing, far behind those of
bos@553 536 Mercurial.</para>
bos@553 537
bos@553 538 <para>While a Mercurial repository needs no maintenance, a Git
bos@553 539 repository requires frequent manual <quote>repacks</quote> of
bos@553 540 its metadata. Without these, performance degrades, while
bos@553 541 space usage grows rapidly. A server that contains many Git
bos@553 542 repositories that are not rigorously and frequently repacked
bos@553 543 will become heavily disk-bound during backups, and there have
bos@553 544 been instances of daily backups taking far longer than 24
bos@553 545 hours as a result. A freshly packed Git repository is
bos@553 546 slightly smaller than a Mercurial repository, but an unpacked
bos@553 547 repository is several orders of magnitude larger.</para>
bos@553 548
bos@553 549 <para>The core of Git is written in C. Many Git commands are
bos@553 550 implemented as shell or Perl scripts, and the quality of these
bos@553 551 scripts varies widely. I have encountered several instances
bos@553 552 where scripts charged along blindly in the presence of errors
bos@553 553 that should have been fatal.</para>
bos@553 554
bos@553 555 <para>Mercurial can import revision history from a Git
bos@553 556 repository.</para>
bos@553 557
bos@553 558
bos@553 559 </sect2>
bos@553 560 <sect2>
bos@553 561 <title>CVS</title>
bos@553 562
bos@553 563 <para>CVS is probably the most widely used revision control tool
bos@553 564 in the world. Due to its age and internal untidiness, it has
bos@553 565 been only lightly maintained for many years.</para>
bos@553 566
bos@553 567 <para>It has a centralised client/server architecture. It does
bos@553 568 not group related file changes into atomic commits, making it
bos@553 569 easy for people to <quote>break the build</quote>: one person
bos@553 570 can successfully commit part of a change and then be blocked
bos@553 571 by the need for a merge, causing other people to see only a
bos@553 572 portion of the work that person intended to do. This also affects
bos@553 573 how you work with project history. If you want to see all of
bos@553 574 the modifications someone made as part of a task, you will
bos@553 575 need to manually inspect the descriptions and timestamps of
bos@553 576 the changes made to each file involved (if you even know what
bos@553 577 those files were).</para>
bos@553 578
bos@553 579 <para>CVS has a muddled notion of tags and branches that I will
bos@553 580 not even attempt to describe. It does not support renaming of
bos@553 581 files or directories well, making it easy to corrupt a
bos@553 582 repository. It has almost no internal consistency checking
bos@553 583 capabilities, so it is usually not even possible to tell
bos@553 584 whether or how a repository is corrupt. I would not recommend
bos@553 585 CVS for any project, existing or new.</para>
bos@553 586
bos@553 587 <para>Mercurial can import CVS revision history. However, there
bos@553 588 are a few caveats that apply; these are true of every other
bos@553 589 revision control tool's CVS importer, too. Because CVS lacks
bos@553 590 atomic changes and does not version its filesystem hierarchy, it
bos@553 591 is not possible to reconstruct CVS history completely accurately;
bos@553 592 some guesswork is involved, and renames will usually not show
bos@553 593 up. Because a lot of advanced CVS administration has to be
bos@553 594 done by hand and is hence error-prone, it's common for CVS
bos@553 595 importers to run into multiple problems with corrupted
bos@553 596 repositories (completely bogus revision timestamps and files
bos@553 597 that have remained locked for over a decade are just two of
bos@553 598 the less interesting problems I can recall from personal
bos@553 599 experience).</para>
bos@553 600
bos@553 603
bos@553 604
bos@553 605 </sect2>
bos@553 606 <sect2>
bos@553 607 <title>Commercial tools</title>
bos@553 608
bos@553 609 <para>Perforce has a centralised client/server architecture,
bos@553 610 with no client-side caching of any data. Unlike modern
bos@553 611 revision control tools, Perforce requires that a user run a
bos@553 612 command to inform the server about every file they intend to
bos@553 613 edit.</para>
bos@553 614
bos@553 615 <para>The performance of Perforce is quite good for small teams,
bos@553 616 but it falls off rapidly as the number of users grows beyond a
bos@553 617 few dozen. Modestly large Perforce installations require the
bos@553 618 deployment of proxies to cope with the load their users
bos@553 619 generate.</para>
bos@553 620
bos@553 621
bos@553 622 </sect2>
bos@553 623 <sect2>
bos@553 624 <title>Choosing a revision control tool</title>
bos@553 625
bos@553 626 <para>With the exception of CVS, all of the tools listed above
bos@553 627 have unique strengths that suit them to particular styles of
bos@553 628 work. There is no single revision control tool that is best
bos@553 629 in all situations.</para>
bos@553 630
bos@553 631 <para>As an example, Subversion is a good choice for working
bos@553 632 with frequently edited binary files, due to its centralised
bos@553 633 nature and support for file locking.</para>
bos@553 634
bos@553 635 <para>I personally find Mercurial's properties of simplicity,
bos@553 636 performance, and good merge support to be a compelling
bos@553 637 combination that has served me well for several years.</para>
bos@553 638
bos@553 639
bos@553 640 </sect2>
bos@553 641 </sect1>
bos@553 642 <sect1>
bos@553 643 <title>Switching from another tool to Mercurial</title>
bos@553 644
bos@553 645 <para>Mercurial is bundled with an extension named <literal
bos@553 646 role="hg-ext">convert</literal>, which can incrementally
bos@553 647 import revision history from several other revision control
bos@553 648 tools. By <quote>incremental</quote>, I mean that you can
bos@553 649 convert all of a project's history to date in one go, then rerun
bos@553 650 the conversion later to obtain new changes that happened after
bos@553 651 the initial conversion.</para>
bos@553 652
bos@553 653 <para>The revision control tools supported by <literal
bos@553 654 role="hg-ext">convert</literal> are as follows:</para>
bos@553 655 <itemizedlist>
bos@553 656 <listitem><para>Subversion</para></listitem>
bos@553 657 <listitem><para>CVS</para></listitem>
bos@553 658 <listitem><para>Git</para></listitem>
bos@553 659 <listitem><para>Darcs</para></listitem></itemizedlist>
bos@553 660
bos@553 661 <para>In addition, <literal role="hg-ext">convert</literal> can
bos@553 662 export changes from Mercurial to Subversion. This makes it
bos@553 663 possible to try Subversion and Mercurial in parallel before
bos@553 664 committing to a switchover, without risking the loss of any
bos@553 665 work.</para>
bos@553 666
bos@553 667 <para>The <command role="hg-ext-conver">convert</command> command
bos@553 668 is easy to use. Simply point it at the path or URL of the
bos@553 669 source repository, optionally give it the name of the
bos@553 670 destination repository, and it will start working. After the
bos@553 671 initial conversion, just run the same command again to import
bos@553 672 new changes.</para>
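<para>As a rough sketch of the workflow just described, the
following script shows an initial conversion followed by a later,
incremental one. The source path, the destination name, the
<literal>hg_convert</literal> helper, and the <literal>RUN</literal>
variable are all hypothetical names chosen for illustration. The
script assumes that the extension is enabled for a single invocation
with <literal>--config extensions.convert=</literal>, which avoids
editing a configuration file.</para>

```shell
#!/bin/sh
# Sketch of an initial and a follow-up conversion with "hg convert".
# The source path and destination name below are hypothetical placeholders.

# Print the conversion command by default; set RUN=1 to actually execute it.
# "--config extensions.convert=" enables the bundled convert extension for
# this one invocation only.
hg_convert() {
    src=$1
    dest=$2
    if [ "${RUN:-0}" = "1" ]; then
        hg --config extensions.convert= convert "$src" "$dest"
    else
        echo "hg --config extensions.convert= convert $src $dest"
    fi
}

# First run: converts all of the project's history to date.
hg_convert /path/to/old-repo project-hg

# Later run with the same arguments: imports only the changes that were
# made in the source repository after the first run.
hg_convert /path/to/old-repo project-hg
```

<para>Run as shown, the script only prints the two commands, so it
is safe to try without a source repository at hand. Setting
<literal>RUN=1</literal> in the environment makes it perform the
conversions for real.</para>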
bos@553 673 </sect1>
bos@553 674 </chapter>
bos@553 675
bos@553 676 <!--
bos@553 677 local variables:
bos@553 678 sgml-parent-document: ("00book.xml" "book" "chapter")
bos@553 679 end:
bos@553 680 -->