hgbook

diff en/ch01-intro.xml @ 990:b4ff7b04efdc

French translation : corrected some mistakes in ch05-daily
author Frédéric Bouquet <youshe.jaalon@gmail.com>
date Thu Sep 10 14:45:17 2009 +0200 (2009-09-10)
parents b338f5490029
children
     1.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
     1.2 +++ b/en/ch01-intro.xml	Thu Sep 10 14:45:17 2009 +0200
     1.3 @@ -0,0 +1,734 @@
     1.4 +<!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : -->
     1.5 +
     1.6 +<chapter id="chap:intro">
     1.7 +  <?dbhtml filename="how-did-we-get-here.html"?>
     1.8 +  <title>How did we get here?</title>
     1.9 +
    1.10 +  <sect1>
    1.11 +    <title>Why revision control? Why Mercurial?</title>
    1.12 +
    1.13 +    <para id="x_6d">Revision control is the process of managing multiple
    1.14 +      versions of a piece of information.  In its simplest form, this
    1.15 +      is something that many people do by hand: every time you modify
    1.16 +      a file, save it under a new name that contains a number, each
    1.17 +      one higher than the number of the preceding version.</para>
    1.18 +
    1.19 +    <para id="x_6e">Manually managing multiple versions of even a single file is
    1.20 +      an error-prone task, though, so software tools to help automate
    1.21 +      this process have long been available.  The earliest automated
    1.22 +      revision control tools were intended to help a single user to
    1.23 +      manage revisions of a single file.  Over the past few decades,
    1.24 +      the scope of revision control tools has expanded greatly; they
    1.25 +      now manage multiple files, and help multiple people to work
    1.26 +      together.  The best modern revision control tools have no
    1.27 +      problem coping with thousands of people working together on
    1.28 +      projects that consist of hundreds of thousands of files.</para>
    1.29 +
    1.30 +    <para id="x_6f">The arrival of distributed revision control is relatively
    1.31 +      recent, and so far this new field has grown due to people's
    1.32 +      willingness to explore ill-charted territory.</para>
    1.33 +
    1.34 +    <para id="x_70">I am writing a book about distributed revision control
    1.35 +      because I believe that it is an important subject that deserves
    1.36 +      a field guide. I chose to write about Mercurial because it is
    1.37 +      the easiest tool to learn the terrain with, and yet it scales to
    1.38 +      the demands of real, challenging environments where many other
    1.39 +      revision control tools buckle.</para>
    1.40 +
    1.41 +    <sect2>
    1.42 +      <title>Why use revision control?</title>
    1.43 +
    1.44 +      <para id="x_71">There are a number of reasons why you or your team might
    1.45 +	want to use an automated revision control tool for a
    1.46 +	project.</para>
    1.47 +
    1.48 +      <itemizedlist>
    1.49 +	<listitem><para id="x_72">It will track the history and evolution of
    1.50 +	    your project, so you don't have to.  For every change,
    1.51 +	    you'll have a log of <emphasis>who</emphasis> made it;
    1.52 +	    <emphasis>why</emphasis> they made it;
    1.53 +	    <emphasis>when</emphasis> they made it; and
    1.54 +	    <emphasis>what</emphasis> the change
    1.55 +	    was.</para></listitem>
    1.56 +	<listitem><para id="x_73">When you're working with other people,
    1.57 +	    revision control software makes it easier for you to
    1.58 +	    collaborate.  For example, when people more or less
    1.59 +	    simultaneously make potentially incompatible changes, the
    1.60 +	    software will help you to identify and resolve those
    1.61 +	    conflicts.</para></listitem>
    1.62 +	<listitem><para id="x_74">It can help you to recover from mistakes.  If
    1.63 +	    you make a change that later turns out to be in error, you
    1.64 +	    can revert to an earlier version of one or more files.  In
    1.65 +	    fact, a <emphasis>really</emphasis> good revision control
    1.66 +	    tool will even help you to efficiently figure out exactly
    1.67 +	    when a problem was introduced (see <xref
    1.68 +	      linkend="sec:undo:bisect"/> for details).</para></listitem>
    1.69 +	<listitem><para id="x_75">It will help you to work simultaneously on,
    1.70 +	    and manage the drift between, multiple versions of your
    1.71 +	    project.</para></listitem>
    1.72 +      </itemizedlist>
    1.73 +
    1.74 +      <para id="x_76">Most of these reasons are equally
    1.75 +	valid&emdash;at least in theory&emdash;whether you're working
    1.76 +	on a project by yourself, or with a hundred other
    1.77 +	people.</para>
    1.78 +
    1.79 +      <para id="x_77">A key question about the practicality of revision control
    1.80 +	at these two different scales (<quote>lone hacker</quote> and
    1.81 +	<quote>huge team</quote>) is how its
    1.82 +	<emphasis>benefits</emphasis> compare to its
    1.83 +	<emphasis>costs</emphasis>.  A revision control tool that's
    1.84 +	difficult to understand or use is going to impose a high
    1.85 +	cost.</para>
    1.86 +
    1.87 +      <para id="x_78">A five-hundred-person project is likely to collapse under
    1.88 +	its own weight almost immediately without a revision control
    1.89 +	tool and process. In this case, the cost of using revision
    1.90 +	control might hardly seem worth considering, since
    1.91 +	<emphasis>without</emphasis> it, failure is almost
    1.92 +	guaranteed.</para>
    1.93 +
    1.94 +      <para id="x_79">On the other hand, a one-person <quote>quick hack</quote>
    1.95 +	might seem like a poor place to use a revision control tool,
    1.96 +	because surely the cost of using one must be close to the
    1.97 +	overall cost of the project.  Right?</para>
    1.98 +
    1.99 +      <para id="x_7a">Mercurial uniquely supports <emphasis>both</emphasis> of
   1.100 +	these scales of development.  You can learn the basics in just
   1.101 +	a few minutes, and due to its low overhead, you can apply
   1.102 +	revision control to the smallest of projects with ease.  Its
   1.103 +	simplicity means you won't have a lot of abstruse concepts or
   1.104 +	command sequences competing for mental space with whatever
   1.105 +	you're <emphasis>really</emphasis> trying to do.  At the same
   1.106 +	time, Mercurial's high performance and peer-to-peer nature let
   1.107 +	you scale painlessly to handle large projects.</para>
   1.108 +
   1.109 +      <para id="x_7b">No revision control tool can rescue a poorly run project,
   1.110 +	but a good choice of tools can make a huge difference to the
   1.111 +	fluidity with which you can work on a project.</para>
   1.112 +
   1.113 +    </sect2>
   1.114 +
   1.115 +    <sect2>
   1.116 +      <title>The many names of revision control</title>
   1.117 +
   1.118 +      <para id="x_7c">Revision control is a diverse field, so much so that it is
   1.119 +	referred to by many names and acronyms.  Here are a few of the
   1.120 +	more common variations you'll encounter:</para>
   1.121 +      <itemizedlist>
   1.122 +	<listitem><para id="x_7d">Revision control (RCS)</para></listitem>
   1.123 +	<listitem><para id="x_7e">Software configuration management (SCM), or
   1.124 +	    configuration management</para></listitem>
   1.125 +	<listitem><para id="x_7f">Source code management</para></listitem>
   1.126 +	<listitem><para id="x_80">Source code control, or source
   1.127 +	    control</para></listitem>
   1.128 +	<listitem><para id="x_81">Version control
   1.129 +	    (VCS)</para></listitem></itemizedlist>
   1.130 +      <para id="x_82">Some people claim that these terms actually have different
   1.131 +	meanings, but in practice they overlap so much that there's no
   1.132 +	agreed or even useful way to tease them apart.</para>
   1.133 +
   1.134 +    </sect2>
   1.135 +  </sect1>
   1.136 +
   1.137 +  <sect1>
   1.138 +    <title>About the examples in this book</title>
   1.139 +
   1.140 +    <para id="x_84">This book takes an unusual approach to code samples.  Every
   1.141 +      example is <quote>live</quote>&emdash;each one is actually the result
   1.142 +      of a shell script that executes the Mercurial commands you see.
   1.143 +      Every time an image of the book is built from its sources, all
   1.144 +      the example scripts are automatically run, and their current
   1.145 +      results compared against their expected results.</para>
   1.146 +
   1.147 +    <para id="x_85">The advantage of this approach is that the examples are
   1.148 +      always accurate; they describe <emphasis>exactly</emphasis> the
   1.149 +      behavior of the version of Mercurial that's mentioned at the
   1.150 +      front of the book.  If I update the version of Mercurial that
   1.151 +      I'm documenting, and the output of some command changes, the
   1.152 +      build fails.</para>
   1.153 +
   1.154 +    <para id="x_86">There is a small disadvantage to this approach, which is
   1.155 +      that the dates and times you'll see in examples tend to be
   1.156 +      <quote>squashed</quote> together in a way that they wouldn't be
   1.157 +      if the same commands were being typed by a human.  Where a human
   1.158 +      can issue no more than one command every few seconds, with any
   1.159 +      resulting timestamps correspondingly spread out, my automated
   1.160 +      example scripts run many commands in one second.</para>
   1.161 +
   1.162 +    <para id="x_87">For instance, several consecutive commits in an
   1.163 +      example can show up as having occurred during the same second.
   1.164 +      You can see this in the <literal
   1.165 +	role="hg-ext">bisect</literal> example in <xref
   1.166 +	linkend="sec:undo:bisect"/>.</para>
   1.167 +
   1.168 +    <para id="x_88">So when you're reading examples, don't place too much weight
   1.169 +      on the dates or times you see in the output of commands.  But
   1.170 +      <emphasis>do</emphasis> be confident that the behavior you're
   1.171 +      seeing is consistent and reproducible.</para>
   1.172 +
   1.173 +  </sect1>
   1.174 +
   1.175 +  <sect1>
   1.176 +    <title>Trends in the field</title>
   1.177 +
   1.178 +    <para id="x_89">There has been an unmistakable trend in the development and
   1.179 +      use of revision control tools over the past four decades, as
   1.180 +      people have become familiar with the capabilities of their tools
   1.181 +      and constrained by their limitations.</para>
   1.182 +
   1.183 +    <para id="x_8a">The first generation began by managing single files on
   1.184 +      individual computers.  Although these tools represented a huge
   1.185 +      advance over ad-hoc manual revision control, their locking model
   1.186 +      and reliance on a single computer limited them to small,
   1.187 +      tightly-knit teams.</para>
   1.188 +
   1.189 +    <para id="x_8b">The second generation loosened these constraints by moving
   1.190 +      to network-centered architectures, and managing entire projects
   1.191 +      at a time.  As projects grew larger, they ran into new problems.
   1.192 +      With clients needing to talk to servers very frequently, server
   1.193 +      scaling became an issue for large projects.  An unreliable
   1.194 +      network connection could prevent remote users from being able to
   1.195 +      talk to the server at all.  As open source projects started
   1.196 +      making read-only access available anonymously to anyone, people
   1.197 +      without commit privileges found that they could not use the
   1.198 +      tools to interact with a project in a natural way, as they could
   1.199 +      not record their changes.</para>
   1.200 +
   1.201 +    <para id="x_8c">The current generation of revision control tools is
   1.202 +      peer-to-peer in nature.  All of these systems have dropped the
   1.203 +      dependency on a single central server, and allow people to
   1.204 +      distribute their revision control data to where it's actually
   1.205 +      needed.  Collaboration over the Internet has moved from being
   1.206 +      constrained by technology to being a matter of choice and consensus.
   1.207 +      Modern tools can operate offline indefinitely and autonomously,
   1.208 +      with a network connection only needed when syncing changes with
   1.209 +      another repository.</para>
   1.210 +
   1.211 +  </sect1>
   1.212 +  <sect1>
   1.213 +    <title>A few of the advantages of distributed revision
   1.214 +      control</title>
   1.215 +
   1.216 +    <para id="x_8d">Even though distributed revision control tools have for
   1.217 +      several years been as robust and usable as their
   1.218 +      previous-generation counterparts, many people using older tools have
   1.219 +      not yet woken up to their advantages.  There are a
   1.220 +      number of ways in which distributed tools shine relative to
   1.221 +      centralised ones.</para>
   1.222 +
   1.223 +    <para id="x_8e">For an individual developer, distributed tools are almost
   1.224 +      always much faster than centralised tools.  This is for a simple
   1.225 +      reason: a centralised tool needs to talk over the network for
   1.226 +      many common operations, because most metadata is stored in a
   1.227 +      single copy on the central server.  A distributed tool stores
   1.228 +      all of its metadata locally.  All else being equal, talking over
   1.229 +      the network adds overhead to a centralised tool.  Don't
   1.230 +      underestimate the value of a snappy, responsive tool: you're
   1.231 +      going to spend a lot of time interacting with your revision
   1.232 +      control software.</para>
   1.233 +
   1.234 +    <para id="x_8f">Distributed tools are indifferent to the vagaries of your
   1.235 +      server infrastructure, again because they replicate metadata to
   1.236 +      so many locations.  If you use a centralised system and your
   1.237 +      server catches fire, you'd better hope that your backup media
   1.238 +      are reliable, and that your last backup was recent and actually
   1.239 +      worked.  With a distributed tool, you have many backups
   1.240 +      available on every contributor's computer.</para>
   1.241 +
   1.242 +    <para id="x_90">The reliability of your network will affect distributed
   1.243 +      tools far less than it will centralised tools.  You can't even
   1.244 +      use a centralised tool without a network connection, except for
   1.245 +      a few highly constrained commands.  With a distributed tool, if
   1.246 +      your network connection goes down while you're working, you may
   1.247 +      not even notice.  The only thing you won't be able to do is talk
   1.248 +      to repositories on other computers, something that is relatively
   1.249 +      rare compared with local operations.  If you have a far-flung
   1.250 +      team of collaborators, this may be significant.</para>
   1.251 +
   1.252 +    <sect2>
   1.253 +      <title>Advantages for open source projects</title>
   1.254 +
   1.255 +      <para id="x_91">If you take a shine to an open source project and decide
   1.256 +	that you would like to start hacking on it, and that project
   1.257 +	uses a distributed revision control tool, you are at once a
   1.258 +	peer with the people who consider themselves the
   1.259 +	<quote>core</quote> of that project.  If they publish their
   1.260 +	repositories, you can immediately copy their project history,
   1.261 +	start making changes, and record your work, using the same
   1.262 +	tools in the same ways as insiders.  By contrast, with a
   1.263 +	centralised tool, you must use the software in a <quote>read
   1.264 +	  only</quote> mode unless someone grants you permission to
   1.265 +	commit changes to their central server.  Until then, you won't
   1.266 +	be able to record changes, and your local modifications will
   1.267 +	be at risk of corruption any time you try to update your
   1.268 +	client's view of the repository.</para>
   1.269 +
   1.270 +      <sect3>
   1.271 +	<title>The forking non-problem</title>
   1.272 +
   1.273 +	<para id="x_92">It has been suggested that distributed revision control
   1.274 +	  tools pose some sort of risk to open source projects because
   1.275 +	  they make it easy to <quote>fork</quote> the development of
   1.276 +	  a project.  A fork happens when there are differences in
   1.277 +	  opinion or attitude between groups of developers that cause
   1.278 +	  them to decide that they can't work together any longer.
   1.279 +	  Each side takes a more or less complete copy of the
   1.280 +	  project's source code, and goes off in its own
   1.281 +	  direction.</para>
   1.282 +
   1.283 +	<para id="x_93">Sometimes the camps in a fork decide to reconcile their
   1.284 +	  differences. With a centralised revision control system, the
   1.285 +	  <emphasis>technical</emphasis> process of reconciliation is
   1.286 +	  painful, and has to be performed largely by hand.  You have
   1.287 +	  to decide whose revision history is going to
   1.288 +	  <quote>win</quote>, and graft the other team's changes into
   1.289 +	  the tree somehow. This usually loses some or all of one
   1.290 +	  side's revision history.</para>
   1.291 +
   1.292 +	<para id="x_94">Distributed tools take a different approach to forking:
   1.293 +	  they make forking the <emphasis>only</emphasis> way to
   1.294 +	  develop a project.  Every single change that you make is
   1.295 +	  potentially a fork point.  The great strength of this
   1.296 +	  approach is that a distributed revision control tool has to
   1.297 +	  be really good at <emphasis>merging</emphasis> forks,
   1.298 +	  because forks are absolutely fundamental: they happen all
   1.299 +	  the time.</para>
   1.300 +
   1.301 +	<para id="x_95">If every piece of work that everybody does, all the
   1.302 +	  time, is framed in terms of forking and merging, then what
   1.303 +	  the open source world refers to as a <quote>fork</quote>
   1.304 +	  becomes <emphasis>purely</emphasis> a social issue.  If
   1.305 +	  anything, distributed tools <emphasis>lower</emphasis> the
   1.306 +	  likelihood of a fork:</para>
   1.307 +	<itemizedlist>
   1.308 +	  <listitem><para id="x_96">They eliminate the social distinction that
   1.309 +	      centralised tools impose: that between insiders (people
   1.310 +	      with commit access) and outsiders (people
   1.311 +	      without).</para></listitem>
   1.312 +	  <listitem><para id="x_97">They make it easier to reconcile after a
   1.313 +	      social fork, because all that's involved from the
   1.314 +	      perspective of the revision control software is just
   1.315 +	      another merge.</para></listitem></itemizedlist>
   1.316 +
   1.317 +	<para id="x_98">Some people resist distributed tools because they want
   1.318 +	  to retain tight control over their projects, and they
   1.319 +	  believe that centralised tools give them this control.
   1.320 +	  However, if you hold this belief and you publish your CVS
   1.321 +	  or Subversion repositories, there are plenty of
   1.322 +	  tools available that can pull out your entire project's
   1.323 +	  history (albeit slowly) and recreate it somewhere that you
   1.324 +	  don't control.  So not only is your control in this case
   1.325 +	  illusory, you are also forgoing the ability to fluidly
   1.326 +	  collaborate with whoever feels compelled to mirror
   1.327 +	  and fork your history.</para>
   1.328 +
   1.329 +      </sect3>
   1.330 +    </sect2>
   1.331 +    <sect2>
   1.332 +      <title>Advantages for commercial projects</title>
   1.333 +
   1.334 +      <para id="x_99">Many commercial projects are undertaken by teams that are
   1.335 +	scattered across the globe.  Contributors who are far from a
   1.336 +	central server will see slower command execution and perhaps
   1.337 +	less reliability.  Commercial revision control systems attempt
   1.338 +	to ameliorate these problems with remote-site replication
   1.339 +	add-ons that are typically expensive to buy and cantankerous
   1.340 +	to administer.  A distributed system doesn't suffer from these
   1.341 +	problems in the first place.  Better yet, you can easily set
   1.342 +	up multiple authoritative servers, say one per site, so that
   1.343 +	there's no redundant communication between repositories over
   1.344 +	expensive long-haul network links.</para>
   1.345 +
   1.346 +      <para id="x_9a">Centralised revision control systems tend to have
   1.347 +	relatively low scalability.  It's not unusual for an expensive
   1.348 +	centralised system to fall over under the combined load of
   1.349 +	just a few dozen concurrent users.  Once again, the typical
   1.350 +	response tends to be an expensive and clunky replication
   1.351 +	facility.  Since the load on a central server&emdash;if you have
   1.352 +	one at all&emdash;is many times lower with a distributed tool
   1.353 +	(because all of the data is replicated everywhere), a single
   1.354 +	cheap server can handle the needs of a much larger team, and
   1.355 +	replication to balance load becomes a simple matter of
   1.356 +	scripting.</para>
   1.357 +
   1.358 +      <para id="x_9b">If you have an employee in the field, troubleshooting a
   1.359 +	problem at a customer's site, they'll benefit from distributed
   1.360 +	revision control. The tool will let them generate custom
   1.361 +	builds, try different fixes in isolation from each other, and
   1.362 +	search efficiently through history for the sources of bugs and
   1.363 +	regressions in the customer's environment, all without needing
   1.364 +	to connect to your company's network.</para>
   1.365 +
   1.366 +    </sect2>
   1.367 +  </sect1>
   1.368 +  <sect1>
   1.369 +    <title>Why choose Mercurial?</title>
   1.370 +
   1.371 +    <para id="x_9c">Mercurial has a unique set of properties that make it a
   1.372 +      particularly good choice as a revision control system.</para>
   1.373 +    <itemizedlist>
   1.374 +      <listitem><para id="x_9d">It is easy to learn and use.</para></listitem>
   1.375 +      <listitem><para id="x_9e">It is lightweight.</para></listitem>
   1.376 +      <listitem><para id="x_9f">It scales excellently.</para></listitem>
   1.377 +      <listitem><para id="x_a0">It is easy to
   1.378 +	  customise.</para></listitem></itemizedlist>
   1.379 +
   1.380 +    <para id="x_a1">If you are at all familiar with revision control systems,
   1.381 +      you should be able to get up and running with Mercurial in less
   1.382 +      than five minutes.  Even if not, it will take no more than a few
   1.383 +      minutes longer.  Mercurial's command and feature sets are
   1.384 +      generally uniform and consistent, so you can keep track of a few
   1.385 +      general rules instead of a host of exceptions.</para>
   1.386 +
   1.387 +    <para id="x_a2">On a small project, you can start working with Mercurial in
   1.388 +      moments. Creating new changes and branches; transferring changes
   1.389 +      around (whether locally or over a network); and history and
   1.390 +      status operations are all fast.  Mercurial attempts to stay
   1.391 +      nimble and largely out of your way by combining low cognitive
   1.392 +      overhead with blazingly fast operations.</para>
   1.393 +
   1.394 +    <para id="x_a3">The usefulness of Mercurial is not limited to small
   1.395 +      projects: it is used by projects with hundreds to thousands of
   1.396 +      contributors, tens of thousands of files, and
   1.397 +      hundreds of megabytes of source code.</para>
   1.398 +
   1.399 +    <para id="x_a4">If the core functionality of Mercurial is not enough for
   1.400 +      you, it's easy to build on.  Mercurial is well suited to
   1.401 +      scripting tasks, and its clean internals and implementation in
   1.402 +      Python make it easy to add features in the form of extensions.
   1.403 +      There are a number of popular and useful extensions already
   1.404 +      available, ranging from helping to identify bugs to improving
   1.405 +      performance.</para>
   1.406 +
   1.407 +  </sect1>
   1.408 +  <sect1>
   1.409 +    <title>Mercurial compared with other tools</title>
   1.410 +
   1.411 +    <para id="x_a5">Before you read on, please understand that this section
   1.412 +      necessarily reflects my own experiences, interests, and (dare I
   1.413 +      say it) biases.  I have used every one of the revision control
   1.414 +      tools listed below, in most cases for several years at a
   1.415 +      time.</para>
   1.416 +
   1.417 +
   1.418 +    <sect2>
   1.419 +      <title>Subversion</title>
   1.420 +
   1.421 +      <para id="x_a6">Subversion is a popular revision control tool, developed
   1.422 +	to replace CVS.  It has a centralised client/server
   1.423 +	architecture.</para>
   1.424 +
   1.425 +      <para id="x_a7">Subversion and Mercurial have similarly named commands for
   1.426 +	performing the same operations, so if you're familiar with
   1.427 +	one, it is easy to learn to use the other.  Both tools are
   1.428 +	portable to all popular operating systems.</para>
   1.429 +
   1.430 +      <para id="x_a8">Prior to version 1.5, Subversion had no useful support for
   1.431 +	merges. At the time of writing, its merge tracking capability
   1.432 +	is new, and known to be <ulink
   1.433 +	  url="http://svnbook.red-bean.com/nightly/en/svn.branchmerge.advanced.html#svn.branchmerge.advanced.finalword">complicated 
   1.434 +	  and buggy</ulink>.</para>
   1.435 +
   1.436 +      <para id="x_a9">Mercurial has a substantial performance advantage over
   1.437 +	Subversion on every revision control operation I have
   1.438 +	benchmarked.  I have measured its advantage as ranging from a
   1.439 +	factor of two to a factor of six when compared with Subversion
   1.440 +	1.4.3's <emphasis>ra_local</emphasis> file store, which is the
   1.441 +	fastest access method available.  In more realistic
   1.442 +	deployments involving a network-based store, Subversion will
   1.443 +	be at a substantially larger disadvantage.  Because many
   1.444 +	Subversion commands must talk to the server and Subversion
   1.445 +	does not have useful replication facilities, server capacity
   1.446 +	and network bandwidth become bottlenecks for modestly large
   1.447 +	projects.</para>
   1.448 +
   1.449 +      <para id="x_aa">Additionally, Subversion incurs substantial storage
   1.450 +	overhead to avoid network transactions for a few common
   1.451 +	operations, such as finding modified files
   1.452 +	(<literal>status</literal>) and displaying modifications
   1.453 +	against the current revision (<literal>diff</literal>).  As a
   1.454 +	result, a Subversion working copy is often the same size as,
   1.455 +	or larger than, a Mercurial repository and working directory,
   1.456 +	even though the Mercurial repository contains a complete
   1.457 +	history of the project.</para>
   1.458 +
   1.459 +      <para id="x_ab">Subversion is widely supported by third-party tools.
   1.460 +	Mercurial currently lags considerably in this area.  This gap
   1.461 +	is closing, however, and indeed some of Mercurial's GUI tools
   1.462 +	now outshine their Subversion equivalents.  Like Mercurial,
   1.463 +	Subversion has an excellent user manual.</para>
   1.464 +
   1.465 +      <para id="x_ac">Because Subversion doesn't store revision history on the
   1.466 +	client, it is well suited to managing projects that deal with
   1.467 +	lots of large, opaque binary files.  If you check in fifty
   1.468 +	revisions to an incompressible 10MB file, Subversion's
   1.469 +	client-side space usage stays constant.  The space used by any
   1.470 +	distributed SCM will grow rapidly in proportion to the number
   1.471 +	of revisions, because the differences between each revision
   1.472 +	are large.</para>
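The following rough Python sketch puts numbers on the fifty-revision, 10MB example above. It is a back-of-the-envelope model, not a measurement, and it rests on three assumptions. A Subversion working copy is taken to hold the working file plus one pristine copy of it, which is the extra copy Subversion keeps so that status and diff work offline. Every stored revision of the incompressible file is taken to cost about as much as the file itself. All other metadata is ignored. The function names are hypothetical.

```python
MB = 1024 * 1024  # one megabyte, in bytes


def centralised_client_bytes(file_size: int, revisions: int) -> int:
    """Rough client-side space for a centralised tool such as Subversion.

    Assumes the client keeps the working file plus one pristine copy of it
    and no history at all, so the revision count does not matter.
    """
    del revisions  # deliberately unused: history stays on the server
    return 2 * file_size


def distributed_clone_bytes(file_size: int, revisions: int) -> int:
    """Rough space for a distributed clone (repository plus working copy).

    Worst case assumed: the file is incompressible and successive revisions
    differ so much that each stored revision is about as large as the file.
    """
    history = revisions * file_size  # every revision lives in the local clone
    working_copy = file_size
    return history + working_copy


if __name__ == "__main__":
    for revs in (1, 10, 50):
        svn = centralised_client_bytes(10 * MB, revs) // MB
        dvcs = distributed_clone_bytes(10 * MB, revs) // MB
        print(f"{revs:3d} revisions: centralised {svn} MB, distributed {dvcs} MB")
```

Under these assumptions, fifty revisions cost about 20MB on a centralised client and about 510MB in a distributed clone. Real tools will differ in the details, but the linear growth of the distributed case is the point the paragraph makes.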
   1.473 +
   1.474 +      <para id="x_ad">In addition, it's often difficult or, more usually,
   1.475 +	impossible to merge different versions of a binary file.
   1.476 +	Subversion's ability to let a user lock a file, so that they
   1.477 +	temporarily have the exclusive right to commit changes to it,
   1.478 +	can be a significant advantage to a project where binary files
   1.479 +	are widely used.</para>
   1.480 +
   1.481 +      <para id="x_ae">Mercurial can import revision history from a Subversion
   1.482 +	repository. It can also export revision history to a
   1.483 +	Subversion repository.  This makes it easy to <quote>test the
   1.484 +	  waters</quote> and use Mercurial and Subversion in parallel
   1.485 +	before deciding to switch.  History conversion is incremental,
   1.486 +	so you can perform an initial conversion, then small
   1.487 +	additional conversions afterwards to bring in new
   1.488 +	changes.</para>
   1.489 +
   1.490 +
   1.491 +    </sect2>
   1.492 +    <sect2>
   1.493 +      <title>Git</title>
   1.494 +
   1.495 +      <para id="x_af">Git is a distributed revision control tool that was
   1.496 +	developed for managing the Linux kernel source tree.  Like
   1.497 +	Mercurial, its early design was somewhat influenced by
   1.498 +	Monotone.</para>
   1.499 +
   1.500 +      <para id="x_b0">Git has a very large command set, with version 1.5.0
   1.501 +	providing 139 individual commands.  It has something of a
   1.502 +	reputation for being difficult to learn.  Compared to Git,
   1.503 +	Mercurial has a strong focus on simplicity.</para>
   1.504 +
   1.505 +      <para id="x_b1">In terms of performance, Git is extremely fast.  In
   1.506 +	several cases, it is faster than Mercurial, at least on Linux,
   1.507 +	while Mercurial performs better on other operations.  However,
   1.508 +	on Windows, the performance and general level of support that
   1.509 +	Git provides is, at the time of writing, far behind that of
   1.510 +	Mercurial.</para>
   1.511 +
   1.512 +      <para id="x_b2">While a Mercurial repository needs no maintenance, a Git
   1.513 +	repository requires frequent manual <quote>repacks</quote> of
   1.514 +	its metadata.  Without these, performance degrades, while
   1.515 +	space usage grows rapidly.  A server that contains many Git
   1.516 +	repositories that are not rigorously and frequently repacked
   1.517 +	will become heavily disk-bound during backups, and there have
   1.518 +	been instances of daily backups taking far longer than 24
   1.519 +	hours as a result.  A freshly packed Git repository is
   1.520 +	slightly smaller than a Mercurial repository, but an unpacked
   1.521 +	repository is several orders of magnitude larger.</para>
   1.522 +
   1.523 +      <para id="x_b3">The core of Git is written in C.  Many Git commands are
   1.524 +	implemented as shell or Perl scripts, and the quality of these
   1.525 +	scripts varies widely. I have encountered several instances
   1.526 +	where scripts charged along blindly in the presence of errors
   1.527 +	that should have been fatal.</para>
   1.528 +
   1.529 +      <para id="x_b4">Mercurial can import revision history from a Git
   1.530 +	repository.</para>
   1.531 +
   1.532 +
   1.533 +    </sect2>
   1.534 +    <sect2>
   1.535 +      <title>CVS</title>
   1.536 +
   1.537 +      <para id="x_b5">CVS is probably the most widely used revision control tool
   1.538 +	in the world.  Due to its age and internal untidiness, it has
   1.539 +	been only lightly maintained for many years.</para>
   1.540 +
   1.541 +      <para id="x_b6">It has a centralised client/server architecture.  It does
   1.542 +	not group related file changes into atomic commits, making it
   1.543 +	easy for people to <quote>break the build</quote>: one person
   1.544 +	can successfully commit part of a change and then be blocked
   1.545 +	by the need for a merge, so that other people see only a
   1.546 +	portion of the work that person intended to commit.  This also affects
   1.547 +	how you work with project history.  If you want to see all of
   1.548 +	the modifications someone made as part of a task, you will
   1.549 +	need to manually inspect the descriptions and timestamps of
   1.550 +	the changes made to each file involved (if you even know what
   1.551 +	those files were).</para>
   1.552 +
   1.553 +      <para id="x_b7">CVS has a muddled notion of tags and branches that I will
   1.554 +	not even attempt to describe.  It does not support renaming of
   1.555 +	files or directories well, making it easy to corrupt a
   1.556 +	repository.  It has almost no internal consistency checking
   1.557 +	capabilities, so it is usually not even possible to tell
   1.558 +	whether or how a repository is corrupt.  I would not recommend
   1.559 +	CVS for any project, existing or new.</para>
   1.560 +
   1.561 +      <para id="x_b8">Mercurial can import CVS revision history.  However, a
   1.562 +	few caveats apply, and they are equally true of every other
   1.563 +	revision control tool's CVS importer.  Due to CVS's lack
   1.564 +	of atomic changes and unversioned filesystem hierarchy, it is
   1.565 +	not possible to reconstruct CVS history completely accurately;
   1.566 +	some guesswork is involved, and renames will usually not show
   1.567 +	up.  Because a lot of advanced CVS administration has to be
   1.568 +	done by hand and is hence error-prone, it's common for CVS
   1.569 +	importers to run into multiple problems with corrupted
   1.570 +	repositories (completely bogus revision timestamps and files
   1.571 +	that have remained locked for over a decade are just two of
   1.572 +	the less interesting problems I can recall from personal
   1.573 +	experience).</para>
   1.574 +
   1.575 +      <para id="x_b9">Mercurial can import revision history from a CVS
   1.576 +	repository.</para>
   1.577 +
   1.578 +
   1.579 +    </sect2>
   1.580 +    <sect2>
   1.581 +      <title>Commercial tools</title>
   1.582 +
   1.583 +      <para id="x_ba">Perforce has a centralised client/server architecture,
   1.584 +	with no client-side caching of any data.  Unlike modern
   1.585 +	revision control tools, Perforce requires that a user run a
   1.586 +	command to inform the server about every file they intend to
   1.587 +	edit.</para>
   1.588 +
   1.589 +      <para id="x_bb">The performance of Perforce is quite good for small teams,
   1.590 +	but it falls off rapidly as the number of users grows beyond a
   1.591 +	few dozen. Modestly large Perforce installations require the
   1.592 +	deployment of proxies to cope with the load their users
   1.593 +	generate.</para>
   1.594 +
   1.595 +
   1.596 +    </sect2>
   1.597 +    <sect2>
   1.598 +      <title>Choosing a revision control tool</title>
   1.599 +
   1.600 +      <para id="x_bc">With the exception of CVS, all of the tools listed above
   1.601 +	have unique strengths that suit them to particular styles of
   1.602 +	work.  There is no single revision control tool that is best
   1.603 +	in all situations.</para>
   1.604 +
   1.605 +      <para id="x_bd">As an example, Subversion is a good choice for working
   1.606 +	with frequently edited binary files, due to its centralised
   1.607 +	nature and support for file locking.</para>
   1.608 +
   1.609 +      <para id="x_be">I personally find Mercurial's properties of simplicity,
   1.610 +	performance, and good merge support to be a compelling
   1.611 +	combination that has served me well for several years.</para>
   1.612 +
   1.613 +
   1.614 +    </sect2>
   1.615 +  </sect1>
   1.616 +  <sect1>
   1.617 +    <title>Switching from another tool to Mercurial</title>
   1.618 +
   1.619 +    <para id="x_bf">Mercurial is bundled with an extension named <literal
   1.620 +	role="hg-ext">convert</literal>, which can incrementally
   1.621 +      import revision history from several other revision control
   1.622 +      tools.  By <quote>incremental</quote>, I mean that you can
   1.623 +      convert all of a project's history to date in one go, then rerun
   1.624 +      the conversion later to obtain new changes that happened after
   1.625 +      the initial conversion.</para>
   1.626 +
   1.627 +    <para id="x_c0">The revision control tools supported by <literal
   1.628 +	role="hg-ext">convert</literal> are as follows:</para>
   1.629 +    <itemizedlist>
   1.630 +      <listitem><para id="x_c1">Subversion</para></listitem>
   1.631 +      <listitem><para id="x_c2">CVS</para></listitem>
   1.632 +      <listitem><para id="x_c3">Git</para></listitem>
   1.633 +      <listitem><para id="x_c4">Darcs</para></listitem></itemizedlist>
   1.634 +
   1.635 +    <para id="x_c5">In addition, <literal role="hg-ext">convert</literal> can
   1.636 +      export changes from Mercurial to Subversion.  This makes it
   1.637 +      possible to try Subversion and Mercurial in parallel before
   1.638 +      committing to a switchover, without risking the loss of any
   1.639 +      work.</para>
   1.640 +
   1.641 +    <para id="x_c6">The <command role="hg-ext-convert">convert</command> command
   1.642 +      is easy to use.  Simply point it at the path or URL of the
   1.643 +      source repository, optionally give it the name of the
   1.644 +      destination repository, and it will start working.  After the
   1.645 +      initial conversion, just run the same command again to import
   1.646 +      new changes.</para>
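    <para>The following is a rough sketch of that workflow for a Subversion
      source.  It assumes that you have enabled the extension by adding a
      <literal>convert =</literal> line to the
      <literal>[extensions]</literal> section of your <filename
	role="special">~/.hgrc</filename>.  The repository URL and the
      destination name <filename>myproject-hg</filename> are hypothetical.
      The first run performs the initial conversion:</para>

    <screen><prompt>$</prompt> <userinput>hg convert http://svn.example.org/myproject/trunk myproject-hg</userinput></screen>

    <para>Later, you can rerun exactly the same command.  By default it
      reads the revision map that the first run recorded in the destination
      repository (<filename>.hg/shamap</filename>) and imports only the
      changes made since then:</para>

    <screen><prompt>$</prompt> <userinput>hg convert http://svn.example.org/myproject/trunk myproject-hg</userinput></screen>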
   1.647 +  </sect1>
   1.648 +
   1.649 +  <sect1>
   1.650 +    <title>A short history of revision control</title>
   1.651 +
   1.652 +    <para id="x_c7">The best known of the old-time revision control tools is
   1.653 +      SCCS (Source Code Control System), which Marc Rochkind wrote at
   1.654 +      Bell Labs in the early 1970s.  SCCS operated on individual
   1.655 +      files, and required every person working on a project to have
   1.656 +      access to a shared workspace on a single system.  Only one
   1.657 +      person could modify a file at any time; arbitration for access
   1.658 +      to files was via locks.  It was common for people to lock files,
   1.659 +      and later forget to unlock them, preventing anyone else from
   1.660 +      modifying those files without the help of an
   1.661 +      administrator.</para>
   1.662 +
   1.663 +    <para id="x_c8">Walter Tichy developed a free alternative to SCCS in the
   1.664 +      early 1980s; he called his program RCS (Revision Control System).
   1.665 +      Like SCCS, RCS required developers to work in a single shared
   1.666 +      workspace, and to lock files to prevent multiple people from
   1.667 +      modifying them simultaneously.</para>
   1.668 +
   1.669 +    <para id="x_c9">Later in the 1980s, Dick Grune used RCS as a building block
   1.670 +      for a set of shell scripts he initially called cmt, but then
   1.671 +      renamed to CVS (Concurrent Versions System).  The big innovation
   1.672 +      of CVS was that it let developers work simultaneously and
   1.673 +      somewhat independently in their own personal workspaces.  The
   1.674 +      personal workspaces prevented developers from stepping on each
   1.675 +      other's toes all the time, as was common with SCCS and RCS. Each
   1.676 +      developer had a copy of every project file, and could modify
   1.677 +      their copies independently.  They had to merge their edits prior
   1.678 +      to committing changes to the central repository.</para>
   1.679 +
   1.680 +    <para id="x_ca">Brian Berliner took Grune's original scripts and rewrote
   1.681 +      them in C, releasing in 1989 the code that has since developed
   1.682 +      into the modern version of CVS.  CVS subsequently acquired the
   1.683 +      ability to operate over a network connection, giving it a
   1.684 +      client/server architecture.  CVS's architecture is centralised;
   1.685 +      only the server has a copy of the history of the project. Client
   1.686 +      workspaces just contain copies of recent versions of the
   1.687 +      project's files, and a little metadata to tell them where the
   1.688 +      server is.  CVS has been enormously successful; it is probably
   1.689 +      the world's most widely used revision control system.</para>
   1.690 +
   1.691 +    <para id="x_cb">In the early 1990s, Sun Microsystems developed TeamWare, an early
   1.692 +      distributed revision control system.  A
   1.693 +      TeamWare workspace contains a complete copy of the project's
   1.694 +      history.  TeamWare has no notion of a central repository.  (CVS
   1.695 +      relied upon RCS for its history storage; TeamWare used
   1.696 +      SCCS.)</para>
   1.697 +
   1.698 +    <para id="x_cc">As the 1990s progressed, people became increasingly
   1.699 +      aware of a number of problems with CVS.  It records simultaneous
   1.700 +      changes to multiple files individually, instead of grouping them together as a
   1.701 +      single logically atomic operation.  It does not manage its file
   1.702 +      hierarchy well; it is easy to make a mess of a repository by
   1.703 +      renaming files and directories.  Worse, its source code is
   1.704 +      difficult to read and maintain, which made the <quote>pain
   1.705 +	level</quote> of fixing these architectural problems
   1.706 +      prohibitive.</para>
   1.707 +
   1.708 +    <para id="x_cd">In 2001, Jim Blandy and Karl Fogel, two developers who had
   1.709 +      worked on CVS, started a project to replace it with a tool that
   1.710 +      would have a better architecture and cleaner code.  The result,
   1.711 +      Subversion, does not stray from CVS's centralised client/server
   1.712 +      model, but it adds multi-file atomic commits, better namespace
   1.713 +      management, and a number of other features that make it a
   1.714 +      generally better tool than CVS. Since its initial release, it
   1.715 +      has rapidly grown in popularity.</para>
   1.716 +
   1.717 +    <para id="x_ce">More or less simultaneously, Graydon Hoare began working on
   1.718 +      an ambitious distributed revision control system that he named
   1.719 +      Monotone.  Besides addressing many of CVS's design flaws and
   1.720 +      having a peer-to-peer architecture, Monotone goes beyond earlier (and
   1.721 +      subsequent) revision control tools in a number of innovative
   1.722 +      ways.  It uses cryptographic hashes as identifiers, and has an
   1.723 +      integral notion of <quote>trust</quote> for code from different
   1.724 +      sources.</para>
   1.725 +
   1.726 +    <para id="x_cf">Mercurial began life in 2005.  While a few aspects of its
   1.727 +      design are influenced by Monotone, Mercurial focuses on ease of
   1.728 +      use, high performance, and scalability to very large
   1.729 +      projects.</para>
   1.730 +  </sect1>
   1.731 +</chapter>
   1.732 +
   1.733 +<!--
   1.734 +local variables: 
   1.735 +sgml-parent-document: ("00book.xml" "book" "chapter")
   1.736 +end:
   1.737 +-->