hgbook

annotate fr/intro.tex @ 924:6a2ccedd1e4c

Work in progress on intro.tex...
author Romain PELISSE <romain.pelisse@atosorigin.com>
date Fri Feb 06 20:49:16 2009 +0100 (2009-02-06)
parents 0d08ac613527
children 730d912ef843
rev   line source
bos@16 1 \chapter{Introduction}
bos@16 2 \label{chap:intro}
bos@16 3
romain@923 4 \section{About revision control}
romain@923 5
romain@923 6 Revision control is the process of managing multiple versions of a
romain@923 7 piece of information. In its simplest form, this is something that
romain@923 8 everyone does by hand: every time you modify a file, you save it
romain@923 9 under a new name that contains a number, each one higher than the
romain@923 10 number of the preceding version.
romain@923 11
romain@923 12 This kind of manual version management is error-prone, though, so
romain@923 13 software tools that automate the process have long been
romain@923 14 available. The earliest revision control tools were intended
romain@923 15 to help a single user automate the management of the versions
romain@923 16 of a single file. Over the past few decades, the scope of these tools
romain@923 17 has expanded greatly; they now manage multiple files, and help
romain@923 18 many people to work together. The most
romain@923 19 modern tools have no difficulty coping with thousands of
romain@923 20 people working together on projects that consist of hundreds of
romain@923 21 thousands of files.
romain@923 22
romain@923 23 \subsection{Why use revision control?}
romain@923 24
romain@923 25 There are a number of reasons why you or your team might want to use
romain@923 26 an automated revision control tool for a project.
bos@217 27 \begin{itemize}
romain@923 28 \item It will track the history and evolution of your project, so
romain@923 29 you don't have to. For every change, you'll have at your disposal
romain@923 30 a log of \emph{who} made it, \emph{why} they made it,
romain@923 31 \emph{when} they made it, and \emph{what} the change
romain@923 32 was.
romain@923 33 \item When you're working with other people, revision control
romain@923 34 software makes it easier to collaborate. For example, when several
romain@923 35 people make incompatible changes more or less simultaneously, the
romain@923 36 software will help you to identify and resolve those conflicts.
romain@924 37 \item It can help you to recover from mistakes. If you make a change
romain@924 38 that later turns out to be an error, you can reliably revert to an
romain@924 39 earlier version of one file or even of a set of files. In fact, a
romain@924 40 \emph{really} good revision control tool will even help you to figure out
romain@924 41 exactly when a problem was introduced (see section~\ref{sec:undo:bisect}
romain@924 42 for details).
romain@924 43 \item It will help you to work simultaneously on multiple versions of
romain@924 44 your project, and to manage the drift between them.
bos@217 45 \end{itemize}
romain@924 46 Most of these reasons are equally valid, at least in theory, whether
romain@924 47 you're working on a project by yourself, or with a hundred other
romain@924 48 people.
romain@924 49
romain@924 50 A key question about revision control tools, whether for
romain@924 51 a one-person project or a large team, is how their
romain@924 52 \emph{benefits} compare to their \emph{costs}. A tool that is difficult to
romain@924 53 use or understand will demand extra effort to adopt.
romain@924 54
romain@924 55 A five-thousand-person project will almost certainly collapse under its
romain@924 56 own weight without a revision control tool and process. In this case,
romain@924 57 the cost of using revision control is negligible, since
romain@924 58 \emph{without} it, failure is almost guaranteed.
romain@924 59
romain@924 60 On the other hand, a one-person ``quick hack'' might seem like a poor
romain@924 61 place to use a revision control tool, because surely the cost of
romain@924 62 using one must exceed the total cost of the project. Right?
romain@924 63
romain@924 64 Mercurial supports \emph{both} of these scales of work. You can learn
romain@924 65 the basics in just a few minutes, and, thanks to its performance, you
romain@924 66 can use it with ease on the smallest of projects. This simplicity
romain@924 67 means you won't have to deal with obscure concepts or mind-bending
romain@924 68 command sequences that are completely unrelated to \emph{what you're
romain@924 69 really trying to do}. At the same time, that same performance and
romain@924 70 Mercurial's ``peer-to-peer'' nature let you scale it up, painlessly,
romain@924 71 to handle very large projects.
romain@924 72
romain@924 73 No revision control tool can rescue a poorly run project, but a
romain@924 74 good tool can make a huge difference to how smoothly
romain@924 75 you can work on it.
romain@924 76
romain@924 77 \subsection{The many names of revision control}
romain@924 78
romain@924 79 Revision control is a diverse field, so much so that it doesn't
romain@924 80 have a single name or acronym. Here are a few of the names and
romain@924 81 acronyms you'll encounter most often:
bos@217 82 \begin{itemize}
romain@924 83 \item \textit{Revision control (RCS)};
romain@924 84 \item \textit{Software configuration management (SCM)}, or \textit{configuration management};
romain@924 85 \item \textit{Source code management};
romain@924 86 \item \textit{Source code control}, or \textit{source control};
romain@924 87 \item \textit{Version control (VCS)}.
bos@217 88 \end{itemize}
romain@924 89
romain@924 90 \notebox {
romain@924 91 Translator's note: I kept the list of names in English for convenience (they are easier to search for on Google). I chose to keep the term ``gestion de sources'' as the single French translation throughout the document.
romain@924 92
romain@924 93 In addition, I chose to keep all of Mercurial's operations (commit, push, pull, ...) in English, again to make it easier to read other documents written in English, and
romain@924 94 to use the tool itself.
romain@924 95 }
romain@924 96
romain@924 97 Some people claim that these terms actually have different
romain@924 98 meanings, but in practice they overlap so much that there's no
romain@924 99 really meaningful way to tell them apart.
romain@924 100
romain@924 101 \section{A short history of revision control}
romain@924 102
romain@924 103 The best known of the early revision control tools is \textit{SCCS (Source
romain@924 104 Code Control System)}, which Marc Rochkind wrote at Bell
romain@924 105 Labs in the early 1970s. \textit{SCCS} operated only on individual files, and required every person working on a project to have access to a shared workspace on a single system.
romain@924 106 Only one person could modify a file at any time; this was enforced through locks. It was common for people to lock
romain@924 107 files, and later forget to unlock them, preventing anyone else from
romain@924 108 working on those files without the help of an administrator.
romain@924 109
romain@924 110 Walter Tichy developed a free alternative to \textit{SCCS} in the early 1980s, which he
romain@924 111 called \textit{RCS (Revision Control System)}. Like \textit{SCCS}, \textit{RCS}
romain@924 112 required developers to work in a single shared workspace, and to lock
romain@924 113 files to prevent conflicts caused by concurrent modifications.
romain@924 114
romain@924 115 Later in the 1980s, Dick Grune used \textit{RCS} as a building block for a set of \textit{shell} scripts that he originally called cmt, before renaming them \textit{CVS (Concurrent Versions System)}. The big innovation of CVS was that it let developers work simultaneously and independently in their own workspaces. These private workspaces kept developers from stepping on each other's toes, as was common with RCS and SCCS. Each developer had a copy of every project file, and could modify those copies freely. They nevertheless had to merge their changes before committing them to the central repository.
bos@218 116
bos@218 117 Brian Berliner took Grune's original scripts and rewrote them in~C,
bos@218 118 releasing in 1989 the code that has since developed into the modern
bos@218 119 version of CVS. CVS subsequently acquired the ability to operate over
bos@218 120 a network connection, giving it a client/server architecture. CVS's
bos@218 121 architecture is centralised; only the server has a copy of the history
bos@218 122 of the project. Client workspaces just contain copies of recent
bos@218 123 versions of the project's files, and a little metadata to tell them
bos@218 124 where the server is. CVS has been enormously successful; it is
bos@218 125 probably the world's most widely used revision control system.
bos@218 126
bos@218 127 In the early 1990s, Sun Microsystems developed an early distributed
bos@218 128 revision control system, called TeamWare. A TeamWare workspace
bos@218 129 contains a complete copy of the project's history. TeamWare has no
bos@218 130 notion of a central repository. (CVS relied upon RCS for its history
bos@218 131 storage; TeamWare used SCCS.)
bos@218 132
bos@218 133 As the 1990s progressed, awareness grew of a number of problems with
bos@218 134 CVS. It records simultaneous changes to multiple files individually,
bos@218 135 instead of grouping them together as a single logically atomic
bos@218 136 operation. It does not manage its file hierarchy well; it is easy to
bos@218 137 make a mess of a repository by renaming files and directories. Worse,
bos@218 138 its source code is difficult to read and maintain, which made the
bos@218 139 ``pain level'' of fixing these architectural problems prohibitive.
bos@218 140
bos@218 141 In 2001, Jim Blandy and Karl Fogel, two developers who had worked on
bos@218 142 CVS, started a project to replace it with a tool that would have a
bos@218 143 better architecture and cleaner code. The result, Subversion, does
bos@218 144 not stray from CVS's centralised client/server model, but it adds
bos@218 145 multi-file atomic commits, better namespace management, and a number
bos@218 146 of other features that make it a generally better tool than CVS.
bos@218 147 Since its initial release, it has rapidly grown in popularity.
bos@218 148
bos@218 149 More or less simultaneously, Graydon Hoare began working on an
bos@218 150 ambitious distributed revision control system that he named Monotone.
bos@218 151 While Monotone addresses many of CVS's design flaws and has a
bos@218 152 peer-to-peer architecture, it goes beyond earlier (and subsequent)
bos@218 153 revision control tools in a number of innovative ways. It uses
bos@218 154 cryptographic hashes as identifiers, and has an integral notion of
bos@218 155 ``trust'' for code from different sources.
bos@218 156
bos@218 157 Mercurial began life in 2005. While a few aspects of its design are
bos@218 158 influenced by Monotone, Mercurial focuses on ease of use, high
bos@218 159 performance, and scalability to very large projects.
bos@155 160
bos@219 161 \section{Trends in revision control}
bos@219 162
bos@219 163 There has been an unmistakable trend in the development and use of
bos@219 164 revision control tools over the past four decades, as people have
bos@219 165 become familiar with the capabilities of their tools and constrained
bos@219 166 by their limitations.
bos@219 167
bos@219 168 The first generation began by managing single files on individual
bos@219 169 computers. Although these tools represented a huge advance over
bos@219 170 ad-hoc manual revision control, their locking model and reliance on a
bos@219 171 single computer limited them to small, tightly-knit teams.
bos@219 172
bos@219 173 The second generation loosened these constraints by moving to
bos@219 174 network-centered architectures, and managing entire projects at a
bos@219 175 time. As projects grew larger, they ran into new problems. With
bos@219 176 clients needing to talk to servers very frequently, server scaling
bos@219 177 became an issue for large projects. An unreliable network connection
bos@219 178 could prevent remote users from being able to talk to the server at
bos@219 179 all. As open source projects started making read-only access
bos@219 180 available anonymously to anyone, people without commit privileges
bos@219 181 found that they could not use the tools to interact with a project in
bos@219 182 a natural way, as they could not record their changes.
bos@219 183
bos@219 184 The current generation of revision control tools is peer-to-peer in
bos@219 185 nature. All of these systems have dropped the dependency on a single
bos@219 186 central server, and allow people to distribute their revision control
bos@219 187 data to where it's actually needed. Collaboration over the Internet
bos@219 188 has moved from being constrained by technology to a matter of choice and
bos@219 189 consensus. Modern tools can operate offline indefinitely and
bos@219 190 autonomously, with a network connection only needed when syncing
bos@219 191 changes with another repository.
bos@219 192
bos@219 193 \section{A few of the advantages of distributed revision control}
bos@219 194
bos@219 195 Even though distributed revision control tools have for several years
bos@219 196 been as robust and usable as their previous-generation counterparts,
bos@219 197 people using older tools have not yet necessarily woken up to their
bos@219 198 advantages. There are a number of ways in which distributed tools
bos@219 199 shine relative to centralised ones.
bos@219 200
bos@219 201 For an individual developer, distributed tools are almost always much
bos@219 202 faster than centralised tools. This is for a simple reason: a
bos@219 203 centralised tool needs to talk over the network for many common
bos@219 204 operations, because most metadata is stored in a single copy on the
bos@219 205 central server. A distributed tool stores all of its metadata
bos@219 206 locally. All else being equal, talking over the network adds overhead
bos@219 207 to a centralised tool. Don't underestimate the value of a snappy,
bos@219 208 responsive tool: you're going to spend a lot of time interacting with
bos@219 209 your revision control software.
bos@219 210
bos@219 211 Distributed tools are indifferent to the vagaries of your server
bos@219 212 infrastructure, again because they replicate metadata to so many
bos@219 213 locations. If you use a centralised system and your server catches
bos@219 214 fire, you'd better hope that your backup media are reliable, and that
bos@219 215 your last backup was recent and actually worked. With a distributed
bos@219 216 tool, you have many backups available on every contributor's computer.
bos@219 217
bos@219 218 The reliability of your network will affect distributed tools far less
bos@219 219 than it will centralised tools. You can't even use a centralised tool
bos@219 220 without a network connection, except for a few highly constrained
bos@219 221 commands. With a distributed tool, if your network connection goes
bos@219 222 down while you're working, you may not even notice. The only thing
bos@219 223 you won't be able to do is talk to repositories on other computers,
bos@219 224 something that is relatively rare compared with local operations. If
bos@219 225 you have a far-flung team of collaborators, this may be significant.
bos@219 226
bos@220 227 \subsection{Advantages for open source projects}
bos@220 228
bos@219 229 If you take a shine to an open source project and decide that you
bos@219 230 would like to start hacking on it, and that project uses a distributed
bos@219 231 revision control tool, you are at once a peer with the people who
bos@219 232 consider themselves the ``core'' of that project. If they publish
bos@219 233 their repositories, you can immediately copy their project history,
bos@219 234 start making changes, and record your work, using the same tools in
bos@219 235 the same ways as insiders. By contrast, with a centralised tool, you
bos@219 236 must use the software in a ``read only'' mode unless someone grants
bos@219 237 you permission to commit changes to their central server. Until then,
bos@219 238 you won't be able to record changes, and your local modifications will
bos@219 239 be at risk of corruption any time you try to update your client's view
bos@219 240 of the repository.
bos@155 241
bos@220 242 \subsubsection{The forking non-problem}
bos@220 243
bos@220 244 It has been suggested that distributed revision control tools pose
bos@220 245 some sort of risk to open source projects because they make it easy to
bos@220 246 ``fork'' the development of a project. A fork happens when there are
bos@220 247 differences in opinion or attitude between groups of developers that
bos@220 248 cause them to decide that they can't work together any longer. Each
bos@220 249 side takes a more or less complete copy of the project's source code,
bos@220 250 and goes off in its own direction.
bos@220 251
bos@220 252 Sometimes the camps in a fork decide to reconcile their differences.
bos@220 253 With a centralised revision control system, the \emph{technical}
bos@220 254 process of reconciliation is painful, and has to be performed largely
bos@220 255 by hand. You have to decide whose revision history is going to
bos@220 256 ``win'', and graft the other team's changes into the tree somehow.
bos@220 257 This usually loses some or all of one side's revision history.
bos@220 258
bos@220 259 What distributed tools do with respect to forking is they make forking
bos@220 260 the \emph{only} way to develop a project. Every single change that
bos@220 261 you make is potentially a fork point. The great strength of this
bos@220 262 approach is that a distributed revision control tool has to be really
bos@220 263 good at \emph{merging} forks, because forks are absolutely
bos@220 264 fundamental: they happen all the time.
bos@220 265
bos@220 266 If every piece of work that everybody does, all the time, is framed in
bos@220 267 terms of forking and merging, then what the open source world refers
bos@220 268 to as a ``fork'' becomes \emph{purely} a social issue. If anything,
bos@220 269 distributed tools \emph{lower} the likelihood of a fork:
bos@220 270 \begin{itemize}
bos@220 271 \item They eliminate the social distinction that centralised tools
bos@220 272 impose: that between insiders (people with commit access) and
bos@220 273 outsiders (people without).
bos@220 274 \item They make it easier to reconcile after a social fork, because
bos@220 275 all that's involved from the perspective of the revision control
bos@220 276 software is just another merge.
bos@220 277 \end{itemize}
bos@220 278
bos@220 279 Some people resist distributed tools because they want to retain tight
bos@220 280 control over their projects, and they believe that centralised tools
bos@220 281 give them this control. However, if you're of this belief, and you
bos@220 282 publish your CVS or Subversion repositories publicly, there are
bos@220 283 plenty of tools available that can pull out your entire project's
bos@220 284 history (albeit slowly) and recreate it somewhere that you don't
bos@220 285 control. So while your control in this case is illusory, you are
tktan@263 286 forgoing the ability to fluidly collaborate with whatever people feel
bos@220 287 compelled to mirror and fork your history.
bos@220 288
bos@220 289 \subsection{Advantages for commercial projects}
bos@220 290
bos@220 291 Many commercial projects are undertaken by teams that are scattered
bos@220 292 across the globe. Contributors who are far from a central server will
bos@220 293 see slower command execution and perhaps less reliability. Commercial
bos@220 294 revision control systems attempt to ameliorate these problems with
bos@220 295 remote-site replication add-ons that are typically expensive to buy
bos@220 296 and cantankerous to administer. A distributed system doesn't suffer
bos@220 297 from these problems in the first place. Better yet, you can easily
bos@220 298 set up multiple authoritative servers, say one per site, so that
bos@220 299 there's no redundant communication between repositories over expensive
bos@220 300 long-haul network links.
bos@220 301
bos@220 302 Centralised revision control systems tend to have relatively low
bos@220 303 scalability. It's not unusual for an expensive centralised system to
bos@220 304 fall over under the combined load of just a few dozen concurrent
bos@220 305 users. Once again, the typical response tends to be an expensive and
bos@220 306 clunky replication facility. Since the load on a central server---if
bos@280 307 you have one at all---is many times lower with a distributed
bos@220 308 tool (because all of the data is replicated everywhere), a single
bos@220 309 cheap server can handle the needs of a much larger team, and
bos@220 310 replication to balance load becomes a simple matter of scripting.
bos@220 311
bos@220 312 If you have an employee in the field, troubleshooting a problem at a
bos@220 313 customer's site, they'll benefit from distributed revision control.
bos@220 314 The tool will let them generate custom builds, try different fixes in
bos@220 315 isolation from each other, and search efficiently through history for
bos@220 316 the sources of bugs and regressions in the customer's environment, all
bos@220 317 without needing to connect to your company's network.
bos@219 318
bos@155 319 \section{Why choose Mercurial?}
bos@155 320
bos@221 321 Mercurial has a unique set of properties that make it a particularly
bos@221 322 good choice as a revision control system.
bos@221 323 \begin{itemize}
bos@221 324 \item It is easy to learn and use.
bos@221 325 \item It is lightweight.
bos@221 326 \item It scales excellently.
bos@221 327 \item It is easy to customise.
bos@221 328 \end{itemize}
bos@221 329
bos@221 330 If you are at all familiar with revision control systems, you should
bos@221 331 be able to get up and running with Mercurial in less than five
bos@221 332 minutes. Even if not, it will take no more than a few minutes
bos@221 333 longer. Mercurial's command and feature sets are generally uniform
bos@221 334 and consistent, so you can keep track of a few general rules instead
bos@221 335 of a host of exceptions.
bos@221 336
bos@221 337 On a small project, you can start working with Mercurial in moments.
bos@221 338 Creating new changes and branches; transferring changes around
bos@221 339 (whether locally or over a network); and history and status operations
bos@221 340 are all fast. Mercurial attempts to stay nimble and largely out of
bos@221 341 your way by combining low cognitive overhead with blazingly fast
bos@221 342 operations.
bos@221 343
bos@221 344 The usefulness of Mercurial is not limited to small projects: it is
bos@221 345 used by projects with hundreds to thousands of contributors, each
bos@221 346 containing tens of thousands of files and hundreds of megabytes of
bos@221 347 source code.
bos@221 348
bos@221 349 If the core functionality of Mercurial is not enough for you, it's
bos@221 350 easy to build on. Mercurial is well suited to scripting tasks, and
bos@221 351 its clean internals and implementation in Python make it easy to add
bos@221 352 features in the form of extensions. There are a number of popular and
bos@221 353 useful extensions already available, ranging from helping to identify
bos@221 354 bugs to improving performance.
bos@221 355
bos@221 356 \section{Mercurial compared with other tools}
bos@221 357
bos@221 358 Before you read on, please understand that this section necessarily
bos@221 359 reflects my own experiences, interests, and (dare I say it) biases. I
bos@221 360 have used every one of the revision control tools listed below, in
bos@221 361 most cases for several years at a time.
bos@221 362
bos@280 363
bos@221 364 \subsection{Subversion}
bos@221 365
bos@221 366 Subversion is a popular revision control tool, developed to replace
bos@221 367 CVS. It has a centralised client/server architecture.
bos@221 368
bos@221 369 Subversion and Mercurial have similarly named commands for performing
bos@280 370 the same operations, so if you're familiar with one, it is easy to
bos@280 371 learn to use the other. Both tools are portable to all popular
bos@221 372 operating systems.
bos@221 373
bos@315 374 Prior to version 1.5, Subversion had no useful support for merges.
bos@315 375 At the time of writing, its merge tracking capability is new, and known to be
bos@315 376 \href{http://svnbook.red-bean.com/nightly/en/svn.branchmerge.advanced.html#svn.branchmerge.advanced.finalword}{complicated
bos@315 377 and buggy}.
bos@256 378
bos@221 379 Mercurial has a substantial performance advantage over Subversion on
bos@221 380 every revision control operation I have benchmarked. I have measured
bos@221 381 its advantage as ranging from a factor of two to a factor of six when
bos@221 382 compared with Subversion~1.4.3's \emph{ra\_local} file store, which is
simon@313 383 the fastest access method available. In more realistic deployments
bos@221 384 involving a network-based store, Subversion will be at a substantially
bos@256 385 larger disadvantage. Because many Subversion commands must talk to
bos@256 386 the server and Subversion does not have useful replication facilities,
bos@280 387 server capacity and network bandwidth become bottlenecks for modestly
bos@280 388 large projects.
bos@280 389
bos@280 390 Additionally, Subversion incurs substantial storage overhead to avoid
bos@280 391 network transactions for a few common operations, such as finding
bos@280 392 modified files (\texttt{status}) and displaying modifications against
bos@280 393 the current revision (\texttt{diff}). As a result, a Subversion
bos@280 394 working copy is often the same size as, or larger than, a Mercurial
bos@280 395 repository and working directory, even though the Mercurial repository
bos@280 396 contains a complete history of the project.
bos@280 397
bos@280 398 Subversion is widely supported by third party tools. Mercurial
bos@280 399 currently lags considerably in this area. This gap is closing,
bos@280 400 however, and indeed some of Mercurial's GUI tools now outshine their
bos@280 401 Subversion equivalents. Like Mercurial, Subversion has an excellent
bos@280 402 user manual.
bos@280 403
bos@280 404 Because Subversion doesn't store revision history on the client, it is
bos@280 405 well suited to managing projects that deal with lots of large, opaque
bos@280 406 binary files. If you check in fifty revisions to an incompressible
bos@280 407 10MB file, Subversion's client-side space usage stays constant. The
bos@280 408 space used by any distributed SCM will grow rapidly in proportion to
bos@280 409 the number of revisions, because the differences between each revision
bos@280 410 are large.
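As a rough worked figure (assuming, for illustration only, that each
revision's delta is about as large as the file itself), the history
held in every clone of a distributed repository grows to about
\[
  50~\mbox{revisions} \times 10~\mbox{MB} = 500~\mbox{MB},
\]
whereas the space used on the Subversion client does not depend on
the number of revisions.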
bos@280 411
bos@280 412 In addition, it's often difficult or, more usually, impossible to
bos@280 413 merge different versions of a binary file. Subversion's ability to
bos@280 414 let a user lock a file, so that they temporarily have the exclusive
bos@280 415 right to commit changes to it, can be a significant advantage to a
bos@280 416 project where binary files are widely used.
bos@280 417
bos@280 418 Mercurial can import revision history from a Subversion repository.
bos@280 419 It can also export revision history to a Subversion repository. This
bos@280 420 makes it easy to ``test the waters'' and use Mercurial and Subversion
bos@280 421 in parallel before deciding to switch. History conversion is
bos@280 422 incremental, so you can perform an initial conversion, then small
bos@280 423 additional conversions afterwards to bring in new changes.
bos@280 424
bos@221 425
bos@221 426 \subsection{Git}
bos@221 427
bos@221 428 Git is a distributed revision control tool that was developed for
bos@221 429 managing the Linux kernel source tree. Like Mercurial, its early
bos@221 430 design was somewhat influenced by Monotone.
bos@221 431
bos@280 432 Git has a very large command set, with version~1.5.0 providing~139
bos@280 433 individual commands. It has something of a reputation for being
bos@280 434 difficult to learn. Compared to Git, Mercurial has a strong focus on
bos@280 435 simplicity.
bos@280 436
bos@280 437 In terms of performance, Git is extremely fast. In several cases, it
bos@280 438 is faster than Mercurial, at least on Linux, while Mercurial performs
bos@280 439 better on other operations. However, on Windows, the performance and
bos@280 440 general level of support that Git provides are, at the time of writing,
bos@280 441 far behind those of Mercurial.
bos@221 442
bos@221 443 While a Mercurial repository needs no maintenance, a Git repository
bos@221 444 requires frequent manual ``repacks'' of its metadata. Without these,
bos@221 445 performance degrades, while space usage grows rapidly. A server that
bos@221 446 contains many Git repositories that are not rigorously and frequently
bos@221 447 repacked will become heavily disk-bound during backups, and there have
bos@221 448 been instances of daily backups taking far longer than~24 hours as a
bos@221 449 result. A freshly packed Git repository is slightly smaller than a
bos@221 450 Mercurial repository, but an unpacked repository is several orders of
bos@221 451 magnitude larger.
bos@221 452
bos@221 453 The core of Git is written in C. Many Git commands are implemented as
bos@221 454 shell or Perl scripts, and the quality of these scripts varies widely.
bos@280 455 I have encountered several instances where scripts charged along
bos@221 456 blindly in the presence of errors that should have been fatal.
bos@221 457
bos@280 458 Mercurial can import revision history from a Git repository.
bos@280 459
bos@280 460
bos@221 461 \subsection{CVS}
bos@221 462
bos@221 463 CVS is probably the most widely used revision control tool in the
bos@280 464 world. Due to its age and internal untidiness, it has been only
bos@280 465 lightly maintained for many years.
bos@221 466
bos@221 467 It has a centralised client/server architecture. It does not group
bos@221 468 related file changes into atomic commits, making it easy for people to
bos@256 469 ``break the build'': one person can successfully commit part of a
bos@256 470 change and then be blocked by the need for a merge, causing other
bos@256 471 people to see only a portion of the work they intended to do. This
bos@256 472 also affects how you work with project history. If you want to see
bos@256 473 all of the modifications someone made as part of a task, you will need
bos@256 474 to manually inspect the descriptions and timestamps of the changes
bos@256 475 made to each file involved (if you even know what those files were).
bos@256 476
bos@256 477 CVS has a muddled notion of tags and branches that I will not attempt
bos@256 478 to even describe. It does not support renaming of files or
bos@256 479 directories well, making it easy to corrupt a repository. It has
bos@256 480 almost no internal consistency checking capabilities, so it is usually
bos@256 481 not even possible to tell whether or how a repository is corrupt. I
bos@256 482 would not recommend CVS for any project, existing or new.
bos@221 483
bos@221 484 Mercurial can import CVS revision history. However, there are a few
bos@221 485 caveats that apply; these are true of every other revision control
bos@221 486 tool's CVS importer, too. Due to CVS's lack of atomic changes and
bos@221 487 unversioned filesystem hierarchy, it is not possible to reconstruct
bos@221 488 CVS history completely accurately; some guesswork is involved, and
bos@221 489 renames will usually not show up. Because a lot of advanced CVS
bos@221 490 administration has to be done by hand and is hence error-prone, it's
bos@221 491 common for CVS importers to run into multiple problems with corrupted
bos@221 492 repositories (completely bogus revision timestamps and files that have
bos@221 493 remained locked for over a decade are just two of the less interesting
bos@221 494 problems I can recall from personal experience).
bos@221 495
bos@280 496 Mercurial can import revision history from a CVS repository.
bos@280 497
bos@280 498
bos@221 499 \subsection{Commercial tools}
bos@221 500
bos@221 501 Perforce has a centralised client/server architecture, with no
bos@221 502 client-side caching of any data. Unlike modern revision control
bos@221 503 tools, Perforce requires that a user run a command to inform the
bos@221 504 server about every file they intend to edit.
bos@221 505
bos@221 506 The performance of Perforce is quite good for small teams, but it
bos@221 507 falls off rapidly as the number of users grows beyond a few dozen.
bos@221 508 Modestly large Perforce installations require the deployment of
bos@221 509 proxies to cope with the load their users generate.
bos@16 510
bos@280 511
bos@280 512 \subsection{Choosing a revision control tool}
bos@280 513
bos@280 514 With the exception of CVS, all of the tools listed above have unique
bos@280 515 strengths that suit them to particular styles of work. There is no
bos@280 516 single revision control tool that is best in all situations.
bos@280 517
bos@280 518 As an example, Subversion is a good choice for working with frequently
bos@280 519 edited binary files, due to its centralised nature and support for
bos@318 520 file locking.
bos@280 521
bos@280 522 I personally find Mercurial's properties of simplicity, performance,
bos@280 523 and good merge support to be a compelling combination that has served
bos@280 524 me well for several years.
bos@280 525
bos@280 526
bos@280 527 \section{Switching from another tool to Mercurial}
bos@280 528
bos@280 529 Mercurial is bundled with an extension named \hgext{convert}, which
bos@280 530 can incrementally import revision history from several other revision
bos@280 531 control tools. By ``incremental'', I mean that you can convert all of
bos@280 532 a project's history to date in one go, then rerun the conversion later
bos@280 533 to obtain new changes that happened after the initial conversion.
bos@280 534
bos@280 535 The revision control tools supported by \hgext{convert} are as
bos@280 536 follows:
bos@280 537 \begin{itemize}
bos@280 538 \item Subversion
bos@280 539 \item CVS
bos@280 540 \item Git
bos@280 541 \item Darcs
bos@280 542 \end{itemize}
bos@280 543
bos@280 544 In addition, \hgext{convert} can export changes from Mercurial to
bos@280 545 Subversion. This makes it possible to try Subversion and Mercurial in
bos@280 546 parallel before committing to a switchover, without risking the loss
bos@280 547 of any work.
bos@280 548
bos@280 549 The \hgxcmd{convert}{convert} command is easy to use. Simply point it
bos@280 550 at the path or URL of the source repository, optionally give it the
bos@280 551 name of the destination repository, and it will start working. After
bos@280 552 the initial conversion, just run the same command again to import new
bos@280 553 changes.
bos@280 554
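As a rough sketch of what such a session might look like, consider the
following. The repository URL and the destination name
\texttt{myproject-hg} are hypothetical, and I am assuming that the
extension has already been enabled, for example by adding a
\texttt{convert =} line to the \texttt{[extensions]} section of your
\texttt{hgrc} file.
\begin{verbatim}
# Initial conversion: import all of the project's history to date.
# (Hypothetical source URL and destination directory.)
hg convert http://svn.example.com/myproject myproject-hg

# Some time later: rerun the same command to import only the
# changes made since the previous conversion.
hg convert http://svn.example.com/myproject myproject-hg
\end{verbatim}
If the destination name is left out, \hgext{convert} chooses one
itself, based on the name of the source.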
bos@280 555
bos@16 556 %%% Local Variables:
bos@16 557 %%% mode: latex
bos@16 558 %%% TeX-master: "00book"
bos@16 559 %%% End: