hgbook

annotate en/intro.tex @ 298:2b8c6aa370d5

Fix sample output for test 'branch-repo'.
author Guido Ostkamp <hg@ostkamp.fastmail.fm>
date Wed Aug 20 21:54:18 2008 +0200 (2008-08-20)
rev   line source
bos@16 1 \chapter{Introduction}
bos@16 2 \label{chap:intro}
bos@16 3
bos@217 4 \section{About revision control}
bos@155 5
bos@219 6 Revision control is the process of managing multiple versions of a
bos@219 7 piece of information. In its simplest form, this is something that
bos@219 8 many people do by hand: every time you modify a file, save it under a
bos@219 9 new name that contains a number, each one higher than the number of
bos@219 10 the preceding version.
bos@217 11
bos@217 12 Manually managing multiple versions of even a single file is an
bos@217 13 error-prone task, though, so software tools to help automate this
bos@217 14 process have long been available. The earliest automated revision
bos@217 15 control tools were intended to help a single user to manage revisions
bos@219 16 of a single file. Over the past few decades, the scope of revision
bos@219 17 control tools has expanded greatly; they now manage multiple files,
bos@219 18 and help multiple people to work together. The best modern revision
bos@219 19 control tools have no problem coping with thousands of people working
bos@219 20 together on projects that consist of hundreds of thousands of files.
bos@217 21
bos@217 22 \subsection{Why use revision control?}
bos@217 23
bos@217 24 There are a number of reasons why you or your team might want to use
bos@217 25 an automated revision control tool for a project.
bos@217 26 \begin{itemize}
bos@219 27 \item It will track the history and evolution of your project, so you
bos@219 28 don't have to. For every change, you'll have a log of \emph{who}
bos@219 29 made it; \emph{why} they made it; \emph{when} they made it; and
bos@219 30 \emph{what} the change was.
bos@219 31 \item When you're working with other people, revision control software
bos@219 32 makes it easier for you to collaborate. For example, when people
bos@219 33 more or less simultaneously make potentially incompatible changes,
bos@219 34 the software will help you to identify and resolve those conflicts.
bos@217 35 \item It can help you to recover from mistakes. If you make a change
bos@217 36 that later turns out to be in error, you can revert to an earlier
bos@217 37 version of one or more files. In fact, a \emph{really} good
bos@217 38 revision control tool will even help you to efficiently figure out
bos@217 39 exactly when a problem was introduced (see
bos@217 40 section~\ref{sec:undo:bisect} for details).
bos@218 41 \item It will help you to work simultaneously on, and manage the drift
bos@218 42 between, multiple versions of your project.
bos@217 43 \end{itemize}
bos@218 44 Most of these reasons are equally valid---at least in theory---whether
bos@218 45 you're working on a project by yourself, or with a hundred other
bos@218 46 people.
bos@218 47
bos@218 48 A key question about the practicality of revision control at these two
bos@218 49 different scales (``lone hacker'' and ``huge team'') is how its
bos@218 50 \emph{benefits} compare to its \emph{costs}. A revision control tool
bos@218 51 that's difficult to understand or use is going to impose a high cost.
bos@218 52
bos@219 53 A five-hundred-person project is likely to collapse under its own
bos@219 54 weight almost immediately without a revision control tool and process.
bos@219 55 In this case, the cost of using revision control might hardly seem
bos@219 56 worth considering, since \emph{without} it, failure is almost
bos@219 57 guaranteed.
bos@218 58
bos@218 59 On the other hand, a one-person ``quick hack'' might seem like a poor
bos@218 60 place to use a revision control tool, because surely the cost of using
bos@218 61 one must be close to the overall cost of the project. Right?
bos@218 62
bos@218 63 Mercurial uniquely supports \emph{both} of these scales of
bos@218 64 development. You can learn the basics in just a few minutes, and due
bos@218 65 to its low overhead, you can apply revision control to the smallest of
bos@218 66 projects with ease. Its simplicity means you won't have a lot of
bos@218 67 abstruse concepts or command sequences competing for mental space with
bos@218 68 whatever you're \emph{really} trying to do. At the same time,
bos@218 69 Mercurial's high performance and peer-to-peer nature let you scale
bos@218 70 painlessly to handle large projects.
bos@217 71
bos@219 72 No revision control tool can rescue a poorly run project, but a good
bos@219 73 choice of tools can make a huge difference to the fluidity with which
bos@219 74 you can work on a project.
bos@219 75
bos@217 76 \subsection{The many names of revision control}
bos@217 77
bos@217 78 Revision control is a diverse field, so much so that it doesn't
bos@217 79 actually have a single name or acronym. Here are a few of the more
bos@217 80 common names and acronyms you'll encounter:
bos@217 81 \begin{itemize}
bos@217 82 \item Revision control (RCS)
bos@219 83 \item Software configuration management (SCM), or configuration management
bos@218 84 \item Source code management
bos@219 85 \item Source code control, or source control
bos@217 86 \item Version control (VCS)
bos@217 87 \end{itemize}
bos@217 88 Some people claim that these terms actually have different meanings,
bos@217 89 but in practice they overlap so much that there's no agreed or even
bos@217 90 useful way to tease them apart.
bos@155 91
bos@219 92 \section{A short history of revision control}
bos@155 93
bos@218 94 The best known of the old-time revision control tools is SCCS (Source
bos@218 95 Code Control System), which Marc Rochkind wrote at Bell Labs, in the
bos@218 96 early 1970s. SCCS operated on individual files, and required every
bos@218 97 person working on a project to have access to a shared workspace on a
bos@218 98 single system. Only one person could modify a file at any time;
bos@218 99 arbitration for access to files was via locks. It was common for
bos@218 100 people to lock files, and later forget to unlock them, preventing
bos@218 101 anyone else from modifying those files without the help of an
bos@218 102 administrator.
bos@218 103
bos@218 104 Walter Tichy developed a free alternative to SCCS in the early 1980s;
bos@218 105 he called his program RCS (Revison Control System). Like SCCS, RCS
bos@218 106 required developers to work in a single shared workspace, and to lock
bos@218 107 files to prevent multiple people from modifying them simultaneously.
bos@218 108
bos@218 109 Later in the 1980s, Dick Grune used RCS as a building block for a set
bos@218 110 of shell scripts he initially called cmt, but then renamed to CVS
bos@218 111 (Concurrent Versions System). The big innovation of CVS was that it
bos@218 112 let developers work simultaneously and somewhat independently in their
bos@218 113 own personal workspaces. The personal workspaces prevented developers
bos@218 114 from stepping on each other's toes all the time, as was common with
bos@218 115 SCCS and RCS. Each developer had a copy of every project file, and
bos@218 116 could modify their copies independently. They had to merge their
bos@218 117 edits prior to committing changes to the central repository.
bos@218 118
bos@218 119 Brian Berliner took Grune's original scripts and rewrote them in~C,
bos@218 120 releasing in 1989 the code that has since developed into the modern
bos@218 121 version of CVS. CVS subsequently acquired the ability to operate over
bos@218 122 a network connection, giving it a client/server architecture. CVS's
bos@218 123 architecture is centralised; only the server has a copy of the history
bos@218 124 of the project. Client workspaces just contain copies of recent
bos@218 125 versions of the project's files, and a little metadata to tell them
bos@218 126 where the server is. CVS has been enormously successful; it is
bos@218 127 probably the world's most widely used revision control system.
bos@218 128
bos@218 129 In the early 1990s, Sun Microsystems developed an early distributed
bos@218 130 revision control system, called TeamWare. A TeamWare workspace
bos@218 131 contains a complete copy of the project's history. TeamWare has no
bos@218 132 notion of a central repository. (CVS relied upon RCS for its history
bos@218 133 storage; TeamWare used SCCS.)
bos@218 134
bos@218 135 As the 1990s progressed, awareness grew of a number of problems with
bos@218 136 CVS. It records simultaneous changes to multiple files individually,
bos@218 137 instead of grouping them together as a single logically atomic
bos@218 138 operation. It does not manage its file hierarchy well; it is easy to
bos@218 139 make a mess of a repository by renaming files and directories. Worse,
bos@218 140 its source code is difficult to read and maintain, which made the
bos@218 141 ``pain level'' of fixing these architectural problems prohibitive.
bos@218 142
bos@218 143 In 2001, Jim Blandy and Karl Fogel, two developers who had worked on
bos@218 144 CVS, started a project to replace it with a tool that would have a
bos@218 145 better architecture and cleaner code. The result, Subversion, does
bos@218 146 not stray from CVS's centralised client/server model, but it adds
bos@218 147 multi-file atomic commits, better namespace management, and a number
bos@218 148 of other features that make it a generally better tool than CVS.
bos@218 149 Since its initial release, it has rapidly grown in popularity.
bos@218 150
bos@218 151 More or less simultaneously, Graydon Hoare began working on an
bos@218 152 ambitious distributed revision control system that he named Monotone.
bos@218 153 While Monotone addresses many of CVS's design flaws and has a
bos@218 154 peer-to-peer architecture, it goes beyond earlier (and subsequent)
bos@218 155 revision control tools in a number of innovative ways. It uses
bos@218 156 cryptographic hashes as identifiers, and has an integral notion of
bos@218 157 ``trust'' for code from different sources.
bos@218 158
bos@218 159 Mercurial began life in 2005. While a few aspects of its design are
bos@218 160 influenced by Monotone, Mercurial focuses on ease of use, high
bos@218 161 performance, and scalability to very large projects.
bos@155 162
bos@219 163 \section{Trends in revision control}
bos@219 164
bos@219 165 There has been an unmistakable trend in the development and use of
bos@219 166 revision control tools over the past four decades, as people have
bos@219 167 become familiar with the capabilities of their tools and constrained
bos@219 168 by their limitations.
bos@219 169
bos@219 170 The first generation began by managing single files on individual
bos@219 171 computers. Although these tools represented a huge advance over
bos@219 172 ad-hoc manual revision control, their locking model and reliance on a
bos@219 173 single computer limited them to small, tightly knit teams.
bos@219 174
bos@219 175 The second generation loosened these constraints by moving to
bos@219 176 network-centered architectures, and managing entire projects at a
bos@219 177 time. As projects grew larger, they ran into new problems. With
bos@219 178 clients needing to talk to servers very frequently, server scaling
bos@219 179 became an issue for large projects. An unreliable network connection
bos@219 180 could prevent remote users from being able to talk to the server at
bos@219 181 all. As open source projects started making read-only access
bos@219 182 available anonymously to anyone, people without commit privileges
bos@219 183 found that they could not use the tools to interact with a project in
bos@219 184 a natural way, as they could not record their changes.
bos@219 185
bos@219 186 The current generation of revision control tools is peer-to-peer in
bos@219 187 nature. All of these systems have dropped the dependency on a single
bos@219 188 central server, and allow people to distribute their revision control
bos@219 189 data to where it's actually needed. Collaboration over the Internet
bos@219 190 has moved from being constrained by technology to a matter of choice and
bos@219 191 consensus. Modern tools can operate offline indefinitely and
bos@219 192 autonomously, with a network connection only needed when syncing
bos@219 193 changes with another repository.
bos@219 194
bos@219 195 \section{A few of the advantages of distributed revision control}
bos@219 196
bos@219 197 Even though distributed revision control tools have for several years
bos@219 198 been as robust and usable as their previous-generation counterparts,
bos@219 199 people using older tools have not yet necessarily woken up to their
bos@219 200 advantages. There are a number of ways in which distributed tools
bos@219 201 shine relative to centralised ones.
bos@219 202
bos@219 203 For an individual developer, distributed tools are almost always much
bos@219 204 faster than centralised tools. This is for a simple reason: a
bos@219 205 centralised tool needs to talk over the network for many common
bos@219 206 operations, because most metadata is stored in a single copy on the
bos@219 207 central server. A distributed tool stores all of its metadata
bos@219 208 locally. All else being equal, talking over the network adds overhead
bos@219 209 to a centralised tool. Don't underestimate the value of a snappy,
bos@219 210 responsive tool: you're going to spend a lot of time interacting with
bos@219 211 your revision control software.
bos@219 212
bos@219 213 Distributed tools are indifferent to the vagaries of your server
bos@219 214 infrastructure, again because they replicate metadata to so many
bos@219 215 locations. If you use a centralised system and your server catches
bos@219 216 fire, you'd better hope that your backup media are reliable, and that
bos@219 217 your last backup was recent and actually worked. With a distributed
bos@219 218 tool, you have many backups available on every contributor's computer.
bos@219 219
bos@219 220 The reliability of your network will affect distributed tools far less
bos@219 221 than it will centralised tools. You can't even use a centralised tool
bos@219 222 without a network connection, except for a few highly constrained
bos@219 223 commands. With a distributed tool, if your network connection goes
bos@219 224 down while you're working, you may not even notice. The only thing
bos@219 225 you won't be able to do is talk to repositories on other computers,
bos@219 226 something that is relatively rare compared with local operations. If
bos@219 227 you have a far-flung team of collaborators, this may be significant.
bos@219 228
bos@220 229 \subsection{Advantages for open source projects}
bos@220 230
bos@219 231 If you take a shine to an open source project and decide that you
bos@219 232 would like to start hacking on it, and that project uses a distributed
bos@219 233 revision control tool, you are at once a peer with the people who
bos@219 234 consider themselves the ``core'' of that project. If they publish
bos@219 235 their repositories, you can immediately copy their project history,
bos@219 236 start making changes, and record your work, using the same tools in
bos@219 237 the same ways as insiders. By contrast, with a centralised tool, you
bos@219 238 must use the software in a ``read only'' mode unless someone grants
bos@219 239 you permission to commit changes to their central server. Until then,
bos@219 240 you won't be able to record changes, and your local modifications will
bos@219 241 be at risk of corruption any time you try to update your client's view
bos@219 242 of the repository.
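
As a minimal sketch of that workflow, assume a project publishes its
repository at the hypothetical URL
\texttt{http://hg.example.org/project}. The URL, directory name, and
commit message below are illustrative only. An outsider can copy the
full history and record work locally like this:
\begin{verbatim}
hg clone http://hg.example.org/project project
cd project
# ... edit some files ...
hg commit -m "Describe the change here"
\end{verbatim}
The commit is recorded in your own clone, so it needs no permission
from the project's maintainers.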
bos@155 243
bos@220 244 \subsubsection{The forking non-problem}
bos@220 245
bos@220 246 It has been suggested that distributed revision control tools pose
bos@220 247 some sort of risk to open source projects because they make it easy to
bos@220 248 ``fork'' the development of a project. A fork happens when there are
bos@220 249 differences in opinion or attitude between groups of developers that
bos@220 250 cause them to decide that they can't work together any longer. Each
bos@220 251 side takes a more or less complete copy of the project's source code,
bos@220 252 and goes off in its own direction.
bos@220 253
bos@220 254 Sometimes the camps in a fork decide to reconcile their differences.
bos@220 255 With a centralised revision control system, the \emph{technical}
bos@220 256 process of reconciliation is painful, and has to be performed largely
bos@220 257 by hand. You have to decide whose revision history is going to
bos@220 258 ``win'', and graft the other team's changes into the tree somehow.
bos@220 259 This usually loses some or all of one side's revision history.
bos@220 260
bos@220 261 What distributed tools do with respect to forking is they make forking
bos@220 262 the \emph{only} way to develop a project. Every single change that
bos@220 263 you make is potentially a fork point. The great strength of this
bos@220 264 approach is that a distributed revision control tool has to be really
bos@220 265 good at \emph{merging} forks, because forks are absolutely
bos@220 266 fundamental: they happen all the time.
bos@220 267
bos@220 268 If every piece of work that everybody does, all the time, is framed in
bos@220 269 terms of forking and merging, then what the open source world refers
bos@220 270 to as a ``fork'' becomes \emph{purely} a social issue. If anything,
bos@220 271 distributed tools \emph{lower} the likelihood of a fork:
bos@220 272 \begin{itemize}
bos@220 273 \item They eliminate the social distinction that centralised tools
bos@220 274 impose: that between insiders (people with commit access) and
bos@220 275 outsiders (people without).
bos@220 276 \item They make it easier to reconcile after a social fork, because
bos@220 277 all that's involved from the perspective of the revision control
bos@220 278 software is just another merge.
bos@220 279 \end{itemize}
bos@220 280
bos@220 281 Some people resist distributed tools because they want to retain tight
bos@220 282 control over their projects, and they believe that centralised tools
bos@220 283 give them this control. However, if you're of this belief, and you
bos@220 284 publish your CVS or Subversion repositories publicly, there are
bos@220 285 plenty of tools available that can pull out your entire project's
bos@220 286 history (albeit slowly) and recreate it somewhere that you don't
bos@220 287 control. So while your control in this case is illusory, you are
tktan@263 288 forgoing the ability to fluidly collaborate with whatever people feel
bos@220 289 compelled to mirror and fork your history.
bos@220 290
bos@220 291 \subsection{Advantages for commercial projects}
bos@220 292
bos@220 293 Many commercial projects are undertaken by teams that are scattered
bos@220 294 across the globe. Contributors who are far from a central server will
bos@220 295 see slower command execution and perhaps less reliability. Commercial
bos@220 296 revision control systems attempt to ameliorate these problems with
bos@220 297 remote-site replication add-ons that are typically expensive to buy
bos@220 298 and cantankerous to administer. A distributed system doesn't suffer
bos@220 299 from these problems in the first place. Better yet, you can easily
bos@220 300 set up multiple authoritative servers, say one per site, so that
bos@220 301 there's no redundant communication between repositories over expensive
bos@220 302 long-haul network links.
bos@220 303
bos@220 304 Centralised revision control systems tend to have relatively low
bos@220 305 scalability. It's not unusual for an expensive centralised system to
bos@220 306 fall over under the combined load of just a few dozen concurrent
bos@220 307 users. Once again, the typical response tends to be an expensive and
bos@220 308 clunky replication facility. Since the load on a central server---if
bos@280 309 you have one at all---is many times lower with a distributed
bos@220 310 tool (because all of the data is replicated everywhere), a single
bos@220 311 cheap server can handle the needs of a much larger team, and
bos@220 312 replication to balance load becomes a simple matter of scripting.
bos@220 313
bos@220 314 If you have an employee in the field, troubleshooting a problem at a
bos@220 315 customer's site, they'll benefit from distributed revision control.
bos@220 316 The tool will let them generate custom builds, try different fixes in
bos@220 317 isolation from each other, and search efficiently through history for
bos@220 318 the sources of bugs and regressions in the customer's environment, all
bos@220 319 without needing to connect to your company's network.
bos@219 320
bos@155 321 \section{Why choose Mercurial?}
bos@155 322
bos@221 323 Mercurial has a unique set of properties that make it a particularly
bos@221 324 good choice as a revision control system.
bos@221 325 \begin{itemize}
bos@221 326 \item It is easy to learn and use.
bos@221 327 \item It is lightweight.
bos@221 328 \item It scales excellently.
bos@221 329 \item It is easy to customise.
bos@221 330 \end{itemize}
bos@221 331
bos@221 332 If you are at all familiar with revision control systems, you should
bos@221 333 be able to get up and running with Mercurial in less than five
bos@221 334 minutes. Even if not, it will take no more than a few minutes
bos@221 335 longer. Mercurial's command and feature sets are generally uniform
bos@221 336 and consistent, so you can keep track of a few general rules instead
bos@221 337 of a host of exceptions.
bos@221 338
bos@221 339 On a small project, you can start working with Mercurial in moments.
bos@221 340 Creating new changes and branches; transferring changes around
bos@221 341 (whether locally or over a network); and history and status operations
bos@221 342 are all fast. Mercurial attempts to stay nimble and largely out of
bos@221 343 your way by combining low cognitive overhead with blazingly fast
bos@221 344 operations.
bos@221 345
bos@221 346 The usefulness of Mercurial is not limited to small projects: it is
bos@221 347 used by projects with hundreds to thousands of contributors, each
bos@221 348 containing tens of thousands of files and hundreds of megabytes of
bos@221 349 source code.
bos@221 350
bos@221 351 If the core functionality of Mercurial is not enough for you, it's
bos@221 352 easy to build on. Mercurial is well suited to scripting tasks, and
bos@221 353 its clean internals and implementation in Python make it easy to add
bos@221 354 features in the form of extensions. There are a number of popular and
bos@221 355 useful extensions already available, ranging from helping to identify
bos@221 356 bugs to improving performance.
bos@221 357
bos@221 358 \section{Mercurial compared with other tools}
bos@221 359
bos@221 360 Before you read on, please understand that this section necessarily
bos@221 361 reflects my own experiences, interests, and (dare I say it) biases. I
bos@221 362 have used every one of the revision control tools listed below, in
bos@221 363 most cases for several years at a time.
bos@221 364
bos@280 365
bos@221 366 \subsection{Subversion}
bos@221 367
bos@221 368 Subversion is a popular revision control tool, developed to replace
bos@221 369 CVS. It has a centralised client/server architecture.
bos@221 370
bos@221 371 Subversion and Mercurial have similarly named commands for performing
bos@280 372 the same operations, so if you're familiar with one, it is easy to
bos@280 373 learn to use the other. Both tools are portable to all popular
bos@221 374 operating systems.
bos@221 375
bos@256 376 Subversion lacks a history-aware merge capability, forcing its users
bos@256 377 to manually track exactly which revisions have been merged between
bos@256 378 branches. If users fail to do this, or make mistakes, they face the
bos@256 379 prospect of manually resolving merges with unnecessary conflicts.
bos@280 380 Subversion also fails to merge changes when files or directories are
bos@280 381 renamed. Subversion's poor merge support is its single biggest
bos@280 382 weakness.
bos@256 383
bos@221 384 Mercurial has a substantial performance advantage over Subversion on
bos@221 385 every revision control operation I have benchmarked. I have measured
bos@221 386 its advantage as ranging from a factor of two to a factor of six when
bos@221 387 compared with Subversion~1.4.3's \emph{ra\_local} file store, which is
bos@221 388 the fastest access method available. In more realistic deployments
bos@221 389 involving a network-based store, Subversion will be at a substantially
bos@256 390 larger disadvantage. Because many Subversion commands must talk to
bos@256 391 the server and Subversion does not have useful replication facilities,
bos@280 392 server capacity and network bandwidth become bottlenecks for modestly
bos@280 393 large projects.
bos@280 394
bos@280 395 Additionally, Subversion incurs substantial storage overhead to avoid
bos@280 396 network transactions for a few common operations, such as finding
bos@280 397 modified files (\texttt{status}) and displaying modifications against
bos@280 398 the current revision (\texttt{diff}). As a result, a Subversion
bos@280 399 working copy is often the same size as, or larger than, a Mercurial
bos@280 400 repository and working directory, even though the Mercurial repository
bos@280 401 contains a complete history of the project.
bos@280 402
bos@280 403 Subversion is widely supported by third-party tools. Mercurial
bos@280 404 currently lags considerably in this area. This gap is closing,
bos@280 405 however, and indeed some of Mercurial's GUI tools now outshine their
bos@280 406 Subversion equivalents. Like Mercurial, Subversion has an excellent
bos@280 407 user manual.
bos@280 408
bos@280 409 Because Subversion doesn't store revision history on the client, it is
bos@280 410 well suited to managing projects that deal with lots of large, opaque
bos@280 411 binary files. If you check in fifty revisions to an incompressible
bos@280 412 10MB file, Subversion's client-side space usage stays constant. The
bos@280 413 space used by any distributed SCM will grow rapidly in proportion to
bos@280 414 the number of revisions, because the differences between each revision
bos@280 415 are large.
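
As a rough worked figure, assume that each revision of the file is
stored essentially in full, because an incompressible file yields
neither small deltas nor any gain from compression. The history then
occupies about
\[
  50 \times 10\,\mbox{MB} = 500\,\mbox{MB}
\]
in every clone of a distributed repository, while a Subversion working
copy stays at a roughly constant size. This is an estimate under the
stated assumption, not a measurement.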
bos@280 416
bos@280 417 In addition, it's often difficult or, more usually, impossible to
bos@280 418 merge different versions of a binary file. Subversion's ability to
bos@280 419 let a user lock a file, so that they temporarily have the exclusive
bos@280 420 right to commit changes to it, can be a significant advantage to a
bos@280 421 project where binary files are widely used.
bos@280 422
bos@280 423 Mercurial can import revision history from a Subversion repository.
bos@280 424 It can also export revision history to a Subversion repository. This
bos@280 425 makes it easy to ``test the waters'' and use Mercurial and Subversion
bos@280 426 in parallel before deciding to switch. History conversion is
bos@280 427 incremental, so you can perform an initial conversion, then small
bos@280 428 additional conversions afterwards to bring in new changes.
bos@280 429
bos@221 430
bos@221 431 \subsection{Git}
bos@221 432
bos@221 433 Git is a distributed revision control tool that was developed for
bos@221 434 managing the Linux kernel source tree. Like Mercurial, its early
bos@221 435 design was somewhat influenced by Monotone.
bos@221 436
bos@280 437 Git has a very large command set, with version~1.5.0 providing~139
bos@280 438 individual commands. It has something of a reputation for being
bos@280 439 difficult to learn. Compared to Git, Mercurial has a strong focus on
bos@280 440 simplicity.
bos@280 441
bos@280 442 In terms of performance, Git is extremely fast. In several cases, it
bos@280 443 is faster than Mercurial, at least on Linux, while Mercurial performs
bos@280 444 better on other operations. However, on Windows, the performance and
bos@280 445 general level of support that Git provides are, at the time of writing,
bos@280 446 far behind that of Mercurial.
bos@221 447
bos@221 448 While a Mercurial repository needs no maintenance, a Git repository
bos@221 449 requires frequent manual ``repacks'' of its metadata. Without these,
bos@221 450 performance degrades, while space usage grows rapidly. A server that
bos@221 451 contains many Git repositories that are not rigorously and frequently
bos@221 452 repacked will become heavily disk-bound during backups, and there have
bos@221 453 been instances of daily backups taking far longer than~24 hours as a
bos@221 454 result. A freshly packed Git repository is slightly smaller than a
bos@221 455 Mercurial repository, but an unpacked repository is several orders of
bos@221 456 magnitude larger.
bos@221 457
bos@221 458 The core of Git is written in C. Many Git commands are implemented as
bos@221 459 shell or Perl scripts, and the quality of these scripts varies widely.
bos@280 460 I have encountered several instances where scripts charged along
bos@221 461 blindly in the presence of errors that should have been fatal.
bos@221 462
bos@280 463 Mercurial can import revision history from a Git repository.
bos@280 464
bos@280 465
bos@221 466 \subsection{CVS}
bos@221 467
bos@221 468 CVS is probably the most widely used revision control tool in the
bos@280 469 world. Due to its age and internal untidiness, it has been only
bos@280 470 lightly maintained for many years.
bos@221 471
bos@221 472 It has a centralised client/server architecture. It does not group
bos@221 473 related file changes into atomic commits, making it easy for people to
bos@256 474 ``break the build'': one person can successfully commit part of a
bos@256 475 change and then be blocked by the need for a merge, causing other
bos@256 476 people to see only a portion of the work that person intended to commit. This
bos@256 477 also affects how you work with project history. If you want to see
bos@256 478 all of the modifications someone made as part of a task, you will need
bos@256 479 to manually inspect the descriptions and timestamps of the changes
bos@256 480 made to each file involved (if you even know what those files were).
bos@256 481
bos@256 482 CVS has a muddled notion of tags and branches that I will not even
bos@256 483 attempt to describe. It does not support renaming of files or
bos@256 484 directories well, making it easy to corrupt a repository. It has
bos@256 485 almost no internal consistency checking capabilities, so it is usually
bos@256 486 not even possible to tell whether or how a repository is corrupt. I
bos@256 487 would not recommend CVS for any project, existing or new.
bos@221 488
bos@221 489 Mercurial can import CVS revision history. However, there are a few
bos@221 490 caveats that apply; these are true of every other revision control
bos@221 491 tool's CVS importer, too. Due to CVS's lack of atomic changes and
bos@221 492 unversioned filesystem hierarchy, it is not possible to reconstruct
bos@221 493 CVS history completely accurately; some guesswork is involved, and
bos@221 494 renames will usually not show up. Because a lot of advanced CVS
bos@221 495 administration has to be done by hand and is hence error-prone, it's
bos@221 496 common for CVS importers to run into multiple problems with corrupted
bos@221 497 repositories (completely bogus revision timestamps and files that have
bos@221 498 remained locked for over a decade are just two of the less interesting
bos@221 499 problems I can recall from personal experience).
bos@221 500
bos@280 501 Mercurial can import revision history from a CVS repository.
bos@280 502
bos@280 503
bos@221 504 \subsection{Commercial tools}
bos@221 505
bos@221 506 Perforce has a centralised client/server architecture, with no
bos@221 507 client-side caching of any data. Unlike modern revision control
bos@221 508 tools, Perforce requires that a user run a command to inform the
bos@221 509 server about every file they intend to edit.
bos@221 510
bos@221 511 The performance of Perforce is quite good for small teams, but it
bos@221 512 falls off rapidly as the number of users grows beyond a few dozen.
bos@221 513 Modestly large Perforce installations require the deployment of
bos@221 514 proxies to cope with the load their users generate.
bos@16 515
bos@280 516
bos@280 517 \subsection{Choosing a revision control tool}
bos@280 518
bos@280 519 With the exception of CVS, all of the tools listed above have unique
bos@280 520 strengths that suit them to particular styles of work. There is no
bos@280 521 single revision control tool that is best in all situations.
bos@280 522
bos@280 523 As an example, Subversion is a good choice for working with frequently
bos@280 524 edited binary files, due to its centralised nature and support for
bos@280 525 file locking. If you're averse to the command line, it currently has
bos@280 526 better GUI support than other free revision control tools. However,
bos@280 527 its poor merging is a substantial liability for busy projects with
bos@280 528 overlapping development.
bos@280 529
bos@280 530 I personally find Mercurial's properties of simplicity, performance,
bos@280 531 and good merge support to be a compelling combination that has served
bos@280 532 me well for several years.
bos@280 533
bos@280 534
bos@280 535 \section{Switching from another tool to Mercurial}
bos@280 536
bos@280 537 Mercurial is bundled with an extension named \hgext{convert}, which
bos@280 538 can incrementally import revision history from several other revision
bos@280 539 control tools. By ``incremental'', I mean that you can convert all of
bos@280 540 a project's history to date in one go, then rerun the conversion later
bos@280 541 to obtain new changes that happened after the initial conversion.
bos@280 542
bos@280 543 The revision control tools supported by \hgext{convert} are as
bos@280 544 follows:
bos@280 545 \begin{itemize}
bos@280 546 \item Subversion
bos@280 547 \item CVS
bos@280 548 \item Git
bos@280 549 \item Darcs
bos@280 550 \end{itemize}
bos@280 551
bos@280 552 In addition, \hgext{convert} can export changes from Mercurial to
bos@280 553 Subversion. This makes it possible to try Subversion and Mercurial in
bos@280 554 parallel before committing to a switchover, without risking the loss
bos@280 555 of any work.
bos@280 556
bos@280 557 The \hgxcmd{convert}{convert} command is easy to use. Simply point it
bos@280 558 at the path or URL of the source repository, optionally give it the
bos@280 559 name of the destination repository, and it will start working. After
bos@280 560 the initial conversion, just run the same command again to import new
bos@280 561 changes.
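
As a sketch of what this might look like, the extension first has to be
enabled in your \texttt{hgrc} file. The exact form of the entry may
differ between Mercurial versions.
\begin{verbatim}
[extensions]
convert =
\end{verbatim}
The conversion itself is then a single command. The source URL
\texttt{svn://svn.example.org/project/trunk} and the destination name
\texttt{project-hg} are hypothetical and stand in for your own
repositories.
\begin{verbatim}
# Initial conversion of all history to date
hg convert svn://svn.example.org/project/trunk project-hg

# Later: rerun the same command to import only the new changes
hg convert svn://svn.example.org/project/trunk project-hg
\end{verbatim}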
bos@280 562
bos@280 563
bos@16 564 %%% Local Variables:
bos@16 565 %%% mode: latex
bos@16 566 %%% TeX-master: "00book"
bos@16 567 %%% End: