hgbook: fr/intro.tex annotate

annotate fr/intro.tex @ 924:6a2ccedd1e4c

Work in progress on intro.tex...

author	Romain PELISSE <romain.pelisse@atosorigin.com>
date	Fri Feb 06 20:49:16 2009 +0100 (2009-02-06)
parents	0d08ac613527
children	730d912ef843

rev	line source
bos@16	1 \chapter{Introduction}
bos@16	2 \label{chap:intro}
bos@16	3
romain@923	4 \section{About revision control}
romain@923	5
romain@923	6 Revision control is the process of managing multiple versions of
romain@923	7 the same piece of information. In its simplest form, it is
romain@923	8 something that everybody does by hand: each time you modify a
romain@923	9 file, you save it under a new name that contains a number, each
romain@923	10 one higher than the number of the previous version.
romain@923	11
romain@923	12 Manual revision control of this kind is error-prone, though, so
romain@923	13 software tools that automate the process have long been
romain@923	14 available. The earliest revision control tools were intended to
romain@923	15 help a single user to automate the management of the versions of
romain@923	16 a single file. Over the past few decades, their scope has
romain@923	17 expanded greatly: they now manage multiple files and help large
romain@923	18 numbers of people to work together. The best modern tools have
romain@923	19 no difficulty coping with several thousand people working
romain@923	20 together on projects that consist of several hundred thousand
romain@923	21 files.
romain@923	22
romain@923	23 \subsection{Why use revision control?}
romain@923	24
romain@923	25 There are a number of reasons why you or your team might want to
romain@923	26 use an automated revision control tool for your project.
bos@217	27 \begin{itemize}
romain@923	28 \item The tool will track the evolution of your project, so that
romain@923	29 you don't have to. For every change, you will have a log that
romain@923	30 records \emph{who} made it, \emph{why} they made it,
romain@923	31 \emph{when} they made it, and \emph{what} they
romain@923	32 changed.
romain@923	33 \item When you work with other people, revision control software
romain@923	34 makes it easier to collaborate. For example, when several people
romain@923	35 make incompatible changes more or less simultaneously, the software
romain@923	36 will help you to identify and resolve the conflicts.
romain@924	37 \item The tool will help you to recover from mistakes. If you make a change
romain@924	38 that later turns out to be an error, you can reliably revert to an earlier
romain@924	39 version of one file, or even of a whole set of files. In fact, a
romain@924	40 \emph{really} good revision control tool will help you to identify the
romain@924	41 point at which the problem was introduced (see section~\ref{sec:undo:bisect}
romain@924	42 for details).
romain@924	43 \item The tool will also help you to work on several different versions
romain@924	44 of your project at once, and to manage the drift between them.
bos@217	45 \end{itemize}
romain@924	46 Most of these reasons are equally valid, at least in theory, whether
romain@924	47 you are working on a project by yourself or with a hundred other
romain@924	48 people.
romain@924	49
romain@924	50 A key question about revision control tools, whether for a one-person
romain@924	51 project or for a large team, is how their
romain@924	52 \emph{benefits} compare to their \emph{costs}. A tool that is difficult to
romain@924	53 use or to understand will demand a real effort to adopt.
romain@924	54
romain@924	55 A five-thousand-person project will almost certainly collapse under its own
romain@924	56 weight without revision control tools and processes. In this case, the cost
romain@924	57 of using revision control software is negligible, since
romain@924	58 \emph{without} it, failure is almost guaranteed.
romain@924	59
romain@924	60 On the other hand, a one-person ``quick hack'' might seem like a
romain@924	61 poor setting for a revision control tool, because surely
romain@924	62 the cost of using one exceeds the total cost of the project. Right?
romain@924	63
romain@924	64 Mercurial supports \emph{both} of these scales of work. You can learn
romain@924	65 the basics in just a few minutes, and thanks to its performance, you can
romain@924	66 use it with ease on the smallest of projects. Its simplicity
romain@924	67 means that you won't face obscure concepts or mind-bending command
romain@924	68 sequences that have nothing to do with \emph{what you're
romain@924	69 really trying to do}. At the same time, that same performance and its
romain@924	70 peer-to-peer nature let you scale its use painlessly
romain@924	71 to very large projects.
romain@924	72
romain@924	73 No revision control tool can rescue a poorly run project, but a
romain@924	74 good tool can make a big difference to how smoothly you are able to
romain@924	75 work on it.
romain@924	76
romain@924	77 \subsection{The many names of revision control}
romain@924	78
romain@924	79 Revision control is a diverse field, so much so that it doesn't have
romain@924	80 a single name or acronym. Here are a few of the names and
romain@924	81 acronyms that you will encounter most often:
bos@217	82 \begin{itemize}
romain@924	83 \item \textit{Revision control (RCS)};
romain@924	84 \item \textit{Software configuration management (SCM)}, or \textit{configuration management};
romain@924	85 \item \textit{Source code management};
romain@924	86 \item \textit{Source code control}, or \textit{source control};
romain@924	87 \item \textit{Version control (VCS)}.
bos@217	88 \end{itemize}
romain@924	89
romain@924	90 \notebox {
romain@924	91 Translator's note: I have kept the list of names in English for convenience (they are more ``googleable''). I have chosen to keep the term ``gestion de sources'' as the single translation throughout the document.
romain@924	92
romain@924	93 In addition, I have chosen to keep the names of all Mercurial operations (commit, push, pull, ...) in English, again to make it easier to read other documents written in English, and
romain@924	94 to use the tool itself.
romain@924	95 }
romain@924	96
romain@924	97 Some people claim that these terms actually have different
romain@924	98 meanings, but in practice they overlap so much that there is no
romain@924	99 really meaningful way to tell them apart.
romain@924	100
romain@924	101 \section{A short history of revision control}
romain@924	102
romain@924	103 The best known of the old-time revision control tools is \textit{SCCS (Source
romain@924	104 Code Control System)}, which Marc Rochkind wrote at Bell Labs
romain@924	105 in the early 1970s. \textit{SCCS} operated only on individual files, and required every person working on a project to have access to a shared workspace on a single system.
romain@924	106 Only one person could modify a file at any one time; this was enforced through the use of locks. It was common for people to lock
romain@924	107 files and later forget to unlock them, preventing anyone else from
romain@924	108 working on those files without the help of an administrator.
romain@924	109
romain@924	110 Walter Tichy developed a free alternative to \textit{SCCS} in the early 1980s, which he
romain@924	111 called \textit{RCS (Revision Control System)}. Like \textit{SCCS}, \textit{RCS}
romain@924	112 required developers to work in a single shared workspace, and to lock
romain@924	113 files to prevent conflicts arising from concurrent modifications.
romain@924	114
romain@924	115 Later in the 1980s, Dick Grune used \textit{RCS} as a building block for a set of shell scripts that he originally called cmt, and then renamed \textit{CVS (Concurrent Versions System)}. The big innovation of CVS was that it let developers work simultaneously and independently in their own personal workspaces. These private workspaces kept developers from stepping on each other's toes, as was common with RCS and SCCS. Each developer had their own copy of every project file, and could modify it freely. They nevertheless had to merge their files before committing their changes to the central repository.
bos@218	116
bos@218	117 Brian Berliner took Grune's original scripts and rewrote them in~C,
bos@218	118 releasing in 1989 the code that has since developed into the modern
bos@218	119 version of CVS. CVS subsequently acquired the ability to operate over
bos@218	120 a network connection, giving it a client/server architecture. CVS's
bos@218	121 architecture is centralised; only the server has a copy of the history
bos@218	122 of the project. Client workspaces just contain copies of recent
bos@218	123 versions of the project's files, and a little metadata to tell them
bos@218	124 where the server is. CVS has been enormously successful; it is
bos@218	125 probably the world's most widely used revision control system.
bos@218	126
bos@218	127 In the early 1990s, Sun Microsystems developed an early distributed
bos@218	128 revision control system, called TeamWare. A TeamWare workspace
bos@218	129 contains a complete copy of the project's history. TeamWare has no
bos@218	130 notion of a central repository. (CVS relied upon RCS for its history
bos@218	131 storage; TeamWare used SCCS.)
bos@218	132
bos@218	133 As the 1990s progressed, awareness grew of a number of problems with
bos@218	134 CVS. It records simultaneous changes to multiple files individually,
bos@218	135 instead of grouping them together as a single logically atomic
bos@218	136 operation. It does not manage its file hierarchy well; it is easy to
bos@218	137 make a mess of a repository by renaming files and directories. Worse,
bos@218	138 its source code is difficult to read and maintain, which made the
bos@218	139 ``pain level'' of fixing these architectural problems prohibitive.
bos@218	140
bos@218	141 In 2001, Jim Blandy and Karl Fogel, two developers who had worked on
bos@218	142 CVS, started a project to replace it with a tool that would have a
bos@218	143 better architecture and cleaner code. The result, Subversion, does
bos@218	144 not stray from CVS's centralised client/server model, but it adds
bos@218	145 multi-file atomic commits, better namespace management, and a number
bos@218	146 of other features that make it a generally better tool than CVS.
bos@218	147 Since its initial release, it has rapidly grown in popularity.
bos@218	148
bos@218	149 More or less simultaneously, Graydon Hoare began working on an
bos@218	150 ambitious distributed revision control system that he named Monotone.
bos@218	151 While Monotone addresses many of CVS's design flaws and has a
bos@218	152 peer-to-peer architecture, it goes beyond earlier (and subsequent)
bos@218	153 revision control tools in a number of innovative ways. It uses
bos@218	154 cryptographic hashes as identifiers, and has an integral notion of
bos@218	155 ``trust'' for code from different sources.
bos@218	156
bos@218	157 Mercurial began life in 2005. While a few aspects of its design are
bos@218	158 influenced by Monotone, Mercurial focuses on ease of use, high
bos@218	159 performance, and scalability to very large projects.
bos@155	160
bos@219	161 \section{Trends in revision control}
bos@219	162
bos@219	163 There has been an unmistakable trend in the development and use of
bos@219	164 revision control tools over the past four decades, as people have
bos@219	165 become familiar with the capabilities of their tools and constrained
bos@219	166 by their limitations.
bos@219	167
bos@219	168 The first generation began by managing single files on individual
bos@219	169 computers. Although these tools represented a huge advance over
bos@219	170 ad-hoc manual revision control, their locking model and reliance on a
bos@219	171 single computer limited them to small, tightly-knit teams.
bos@219	172
bos@219	173 The second generation loosened these constraints by moving to
bos@219	174 network-centered architectures, and managing entire projects at a
bos@219	175 time. As projects grew larger, they ran into new problems. With
bos@219	176 clients needing to talk to servers very frequently, server scaling
bos@219	177 became an issue for large projects. An unreliable network connection
bos@219	178 could prevent remote users from being able to talk to the server at
bos@219	179 all. As open source projects started making read-only access
bos@219	180 available anonymously to anyone, people without commit privileges
bos@219	181 found that they could not use the tools to interact with a project in
bos@219	182 a natural way, as they could not record their changes.
bos@219	183
bos@219	184 The current generation of revision control tools is peer-to-peer in
bos@219	185 nature. All of these systems have dropped the dependency on a single
bos@219	186 central server, and allow people to distribute their revision control
bos@219	187 data to where it's actually needed. Collaboration over the Internet
bos@219	188 has moved from being constrained by technology to being a matter of choice and
bos@219	189 consensus. Modern tools can operate offline indefinitely and
bos@219	190 autonomously, with a network connection only needed when syncing
bos@219	191 changes with another repository.
bos@219	192
bos@219	193 \section{A few of the advantages of distributed revision control}
bos@219	194
bos@219	195 Even though distributed revision control tools have for several years
bos@219	196 been as robust and usable as their previous-generation counterparts,
bos@219	197 people using older tools have not yet necessarily woken up to their
bos@219	198 advantages. There are a number of ways in which distributed tools
bos@219	199 shine relative to centralised ones.
bos@219	200
bos@219	201 For an individual developer, distributed tools are almost always much
bos@219	202 faster than centralised tools. This is for a simple reason: a
bos@219	203 centralised tool needs to talk over the network for many common
bos@219	204 operations, because most metadata is stored in a single copy on the
bos@219	205 central server. A distributed tool stores all of its metadata
bos@219	206 locally. All else being equal, talking over the network adds overhead
bos@219	207 to a centralised tool. Don't underestimate the value of a snappy,
bos@219	208 responsive tool: you're going to spend a lot of time interacting with
bos@219	209 your revision control software.
bos@219	210
bos@219	211 Distributed tools are indifferent to the vagaries of your server
bos@219	212 infrastructure, again because they replicate metadata to so many
bos@219	213 locations. If you use a centralised system and your server catches
bos@219	214 fire, you'd better hope that your backup media are reliable, and that
bos@219	215 your last backup was recent and actually worked. With a distributed
bos@219	216 tool, you have many backups available on every contributor's computer.
bos@219	217
bos@219	218 The reliability of your network will affect distributed tools far less
bos@219	219 than it will centralised tools. You can't even use a centralised tool
bos@219	220 without a network connection, except for a few highly constrained
bos@219	221 commands. With a distributed tool, if your network connection goes
bos@219	222 down while you're working, you may not even notice. The only thing
bos@219	223 you won't be able to do is talk to repositories on other computers,
bos@219	224 something that is relatively rare compared with local operations. If
bos@219	225 you have a far-flung team of collaborators, this may be significant.
bos@219	226
bos@220	227 \subsection{Advantages for open source projects}
bos@220	228
bos@219	229 If you take a shine to an open source project and decide that you
bos@219	230 would like to start hacking on it, and that project uses a distributed
bos@219	231 revision control tool, you are at once a peer with the people who
bos@219	232 consider themselves the ``core'' of that project. If they publish
bos@219	233 their repositories, you can immediately copy their project history,
bos@219	234 start making changes, and record your work, using the same tools in
bos@219	235 the same ways as insiders. By contrast, with a centralised tool, you
bos@219	236 must use the software in a ``read only'' mode unless someone grants
bos@219	237 you permission to commit changes to their central server. Until then,
bos@219	238 you won't be able to record changes, and your local modifications will
bos@219	239 be at risk of corruption any time you try to update your client's view
bos@219	240 of the repository.
bos@155	241
bos@220	242 \subsubsection{The forking non-problem}
bos@220	243
bos@220	244 It has been suggested that distributed revision control tools pose
bos@220	245 some sort of risk to open source projects because they make it easy to
bos@220	246 ``fork'' the development of a project. A fork happens when there are
bos@220	247 differences in opinion or attitude between groups of developers that
bos@220	248 cause them to decide that they can't work together any longer. Each
bos@220	249 side takes a more or less complete copy of the project's source code,
bos@220	250 and goes off in its own direction.
bos@220	251
bos@220	252 Sometimes the camps in a fork decide to reconcile their differences.
bos@220	253 With a centralised revision control system, the \emph{technical}
bos@220	254 process of reconciliation is painful, and has to be performed largely
bos@220	255 by hand. You have to decide whose revision history is going to
bos@220	256 ``win'', and graft the other team's changes into the tree somehow.
bos@220	257 This usually loses some or all of one side's revision history.
bos@220	258
bos@220	259 With respect to forking, distributed tools make it
bos@220	260 the \emph{only} way to develop a project. Every single change that
bos@220	261 you make is potentially a fork point. The great strength of this
bos@220	262 approach is that a distributed revision control tool has to be really
bos@220	263 good at \emph{merging} forks, because forks are absolutely
bos@220	264 fundamental: they happen all the time.
bos@220	265
bos@220	266 If every piece of work that everybody does, all the time, is framed in
bos@220	267 terms of forking and merging, then what the open source world refers
bos@220	268 to as a ``fork'' becomes \emph{purely} a social issue. If anything,
bos@220	269 distributed tools \emph{lower} the likelihood of a fork:
bos@220	270 \begin{itemize}
bos@220	271 \item They eliminate the social distinction that centralised tools
bos@220	272 impose: that between insiders (people with commit access) and
bos@220	273 outsiders (people without).
bos@220	274 \item They make it easier to reconcile after a social fork, because
bos@220	275 all that's involved from the perspective of the revision control
bos@220	276 software is just another merge.
bos@220	277 \end{itemize}
bos@220	278
bos@220	279 Some people resist distributed tools because they want to retain tight
bos@220	280 control over their projects, and they believe that centralised tools
bos@220	281 give them this control. However, if you're of this belief, and you
bos@220	282 publish your CVS or Subversion repositories publicly, there are
bos@220	283 plenty of tools available that can pull out your entire project's
bos@220	284 history (albeit slowly) and recreate it somewhere that you don't
bos@220	285 control. So while your control in this case is illusory, you are
tktan@263	286 forgoing the ability to fluidly collaborate with whatever people feel
bos@220	287 compelled to mirror and fork your history.
bos@220	288
bos@220	289 \subsection{Advantages for commercial projects}
bos@220	290
bos@220	291 Many commercial projects are undertaken by teams that are scattered
bos@220	292 across the globe. Contributors who are far from a central server will
bos@220	293 see slower command execution and perhaps less reliability. Commercial
bos@220	294 revision control systems attempt to ameliorate these problems with
bos@220	295 remote-site replication add-ons that are typically expensive to buy
bos@220	296 and cantankerous to administer. A distributed system doesn't suffer
bos@220	297 from these problems in the first place. Better yet, you can easily
bos@220	298 set up multiple authoritative servers, say one per site, so that
bos@220	299 there's no redundant communication between repositories over expensive
bos@220	300 long-haul network links.
bos@220	301
bos@220	302 Centralised revision control systems tend to have relatively low
bos@220	303 scalability. It's not unusual for an expensive centralised system to
bos@220	304 fall over under the combined load of just a few dozen concurrent
bos@220	305 users. Once again, the typical response tends to be an expensive and
bos@220	306 clunky replication facility. Since the load on a central server---if
bos@280	307 you have one at all---is many times lower with a distributed
bos@220	308 tool (because all of the data is replicated everywhere), a single
bos@220	309 cheap server can handle the needs of a much larger team, and
bos@220	310 replication to balance load becomes a simple matter of scripting.
bos@220	311
bos@220	312 If you have an employee in the field, troubleshooting a problem at a
bos@220	313 customer's site, they'll benefit from distributed revision control.
bos@220	314 The tool will let them generate custom builds, try different fixes in
bos@220	315 isolation from each other, and search efficiently through history for
bos@220	316 the sources of bugs and regressions in the customer's environment, all
bos@220	317 without needing to connect to your company's network.
bos@219	318
bos@155	319 \section{Why choose Mercurial?}
bos@155	320
bos@221	321 Mercurial has a unique set of properties that make it a particularly
bos@221	322 good choice as a revision control system.
bos@221	323 \begin{itemize}
bos@221	324 \item It is easy to learn and use.
bos@221	325 \item It is lightweight.
bos@221	326 \item It scales excellently.
bos@221	327 \item It is easy to customise.
bos@221	328 \end{itemize}
bos@221	329
bos@221	330 If you are at all familiar with revision control systems, you should
bos@221	331 be able to get up and running with Mercurial in less than five
bos@221	332 minutes. Even if not, it will take no more than a few minutes
bos@221	333 longer. Mercurial's command and feature sets are generally uniform
bos@221	334 and consistent, so you can keep track of a few general rules instead
bos@221	335 of a host of exceptions.
bos@221	336
bos@221	337 On a small project, you can start working with Mercurial in moments.
bos@221	338 Creating new changes and branches; transferring changes around
bos@221	339 (whether locally or over a network); and history and status operations
bos@221	340 are all fast. Mercurial attempts to stay nimble and largely out of
bos@221	341 your way by combining low cognitive overhead with blazingly fast
bos@221	342 operations.
bos@221	343
bos@221	344 The usefulness of Mercurial is not limited to small projects: it is
bos@221	345 used by projects with hundreds to thousands of contributors, each
bos@221	346 of which contains tens of thousands of files and hundreds of megabytes of
bos@221	347 source code.
bos@221	348
bos@221	349 If the core functionality of Mercurial is not enough for you, it's
bos@221	350 easy to build on. Mercurial is well suited to scripting tasks, and
bos@221	351 its clean internals and implementation in Python make it easy to add
bos@221	352 features in the form of extensions. There are a number of popular and
bos@221	353 useful extensions already available, ranging from helping to identify
bos@221	354 bugs to improving performance.
bos@221	355
bos@221	356 \section{Mercurial compared with other tools}
bos@221	357
bos@221	358 Before you read on, please understand that this section necessarily
bos@221	359 reflects my own experiences, interests, and (dare I say it) biases. I
bos@221	360 have used every one of the revision control tools listed below, in
bos@221	361 most cases for several years at a time.
bos@221	362
bos@280	363
bos@221	364 \subsection{Subversion}
bos@221	365
bos@221	366 Subversion is a popular revision control tool, developed to replace
bos@221	367 CVS. It has a centralised client/server architecture.
bos@221	368
bos@221	369 Subversion and Mercurial have similarly named commands for performing
bos@280	370 the same operations, so if you're familiar with one, it is easy to
bos@280	371 learn to use the other. Both tools are portable to all popular
bos@221	372 operating systems.
bos@221	373
bos@315	374 Prior to version 1.5, Subversion had no useful support for merges.
bos@315	375 At the time of writing, its merge tracking capability is new, and known to be
bos@315	376 \href{http://svnbook.red-bean.com/nightly/en/svn.branchmerge.advanced.html#svn.branchmerge.advanced.finalword}{complicated
bos@315	377 and buggy}.
bos@256	378
bos@221	379 Mercurial has a substantial performance advantage over Subversion on
bos@221	380 every revision control operation I have benchmarked. I have measured
bos@221	381 its advantage as ranging from a factor of two to a factor of six when
bos@221	382 compared with Subversion~1.4.3's \emph{ra\_local} file store, which is
simon@313	383 the fastest access method available. In more realistic deployments
bos@221	384 involving a network-based store, Subversion will be at a substantially
bos@256	385 larger disadvantage. Because many Subversion commands must talk to
bos@256	386 the server and Subversion does not have useful replication facilities,
bos@280	387 server capacity and network bandwidth become bottlenecks for modestly
bos@280	388 large projects.
bos@280	389
bos@280	390 Additionally, Subversion incurs substantial storage overhead to avoid
bos@280	391 network transactions for a few common operations, such as finding
bos@280	392 modified files (\texttt{status}) and displaying modifications against
bos@280	393 the current revision (\texttt{diff}). As a result, a Subversion
bos@280	394 working copy is often the same size as, or larger than, a Mercurial
bos@280	395 repository and working directory, even though the Mercurial repository
bos@280	396 contains a complete history of the project.
bos@280	397
bos@280	398 Subversion is widely supported by third party tools. Mercurial
bos@280	399 currently lags considerably in this area. This gap is closing,
bos@280	400 however, and indeed some of Mercurial's GUI tools now outshine their
bos@280	401 Subversion equivalents. Like Mercurial, Subversion has an excellent
bos@280	402 user manual.
bos@280	403
bos@280	404 Because Subversion doesn't store revision history on the client, it is
bos@280	405 well suited to managing projects that deal with lots of large, opaque
bos@280	406 binary files. If you check in fifty revisions to an incompressible
bos@280	407 10MB file, Subversion's client-side space usage stays constant. The
bos@280	408 space used by any distributed SCM will grow rapidly in proportion to
bos@280	409 the number of revisions, because the differences between each revision
bos@280	410 are large.
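
As a back-of-the-envelope illustration of the figures above (a rough
sketch that assumes no savings at all from storing differences between
revisions, and that ignores repository metadata), the local history
kept by a distributed tool for those fifty revisions comes to roughly
\[
  50 \times 10\,\mathrm{MB} = 500\,\mathrm{MB},
\]
whereas a client that stores no history needs space on the order of
the 10MB file itself, however many revisions are committed.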
bos@280	411
bos@280	412 In addition, it's often difficult or, more usually, impossible to
bos@280	413 merge different versions of a binary file. Subversion's ability to
bos@280	414 let a user lock a file, so that they temporarily have the exclusive
bos@280	415 right to commit changes to it, can be a significant advantage to a
bos@280	416 project where binary files are widely used.
bos@280	417
bos@280	418 Mercurial can import revision history from a Subversion repository.
bos@280	419 It can also export revision history to a Subversion repository. This
bos@280	420 makes it easy to ``test the waters'' and use Mercurial and Subversion
bos@280	421 in parallel before deciding to switch. History conversion is
bos@280	422 incremental, so you can perform an initial conversion, then small
bos@280	423 additional conversions afterwards to bring in new changes.
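
The incremental workflow described above might look something like the
following sketch. It assumes that the \texttt{convert} extension
bundled with Mercurial is enabled and that its Subversion prerequisites
(typically the Subversion Python bindings) are installed; the
repository URL and the destination directory name are placeholders,
not real locations. First, enable the extension in your
\texttt{hgrc} file:
\begin{verbatim}
[extensions]
convert =
\end{verbatim}
Then run the conversion, and simply repeat the same command later to
bring in whatever has been committed since the previous run:
\begin{verbatim}
# Initial conversion (placeholder URL and directory name).
hg convert http://svn.example.com/project/trunk project-hg

# Later: the same command converts only the new revisions.
hg convert http://svn.example.com/project/trunk project-hg
\end{verbatim}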
bos@280	424
bos@221	425
bos@221	426 \subsection{Git}
bos@221	427
bos@221	428 Git is a distributed revision control tool that was developed for
bos@221	429 managing the Linux kernel source tree. Like Mercurial, its early
bos@221	430 design was somewhat influenced by Monotone.
bos@221	431
bos@280	432 Git has a very large command set, with version~1.5.0 providing~139
bos@280	433 individual commands. It has something of a reputation for being
bos@280	434 difficult to learn. Compared to Git, Mercurial has a strong focus on
bos@280	435 simplicity.
bos@280	436
bos@280	437 In terms of performance, Git is extremely fast. In several cases, it
bos@280	438 is faster than Mercurial, at least on Linux, while Mercurial performs
bos@280	439 better on other operations. However, on Windows, the performance and
bos@280	440 general level of support that Git provides are, at the time of writing,
bos@280	441 far behind that of Mercurial.
bos@221	442
bos@221	443 While a Mercurial repository needs no maintenance, a Git repository
bos@221	444 requires frequent manual ``repacks'' of its metadata. Without these,
bos@221	445 performance degrades, while space usage grows rapidly. A server that
bos@221	446 contains many Git repositories that are not rigorously and frequently
bos@221	447 repacked will become heavily disk-bound during backups, and there have
bos@221	448 been instances of daily backups taking far longer than~24 hours as a
bos@221	449 result. A freshly packed Git repository is slightly smaller than a
bos@221	450 Mercurial repository, but an unpacked repository is several orders of
bos@221	451 magnitude larger.
bos@221	452
bos@221	453 The core of Git is written in C. Many Git commands are implemented as
bos@221	454 shell or Perl scripts, and the quality of these scripts varies widely.
bos@280	455 I have encountered several instances where scripts charged along
bos@221	456 blindly in the presence of errors that should have been fatal.
bos@221	457
bos@280	458 Mercurial can import revision history from a Git repository.
bos@280	459
bos@280	460
bos@221	461 \subsection{CVS}
bos@221	462
bos@221	463 CVS is probably the most widely used revision control tool in the
bos@280	464 world. Due to its age and internal untidiness, it has been only
bos@280	465 lightly maintained for many years.
bos@221	466
bos@221	467 It has a centralised client/server architecture. It does not group
bos@221	468 related file changes into atomic commits, making it easy for people to
bos@256	469 ``break the build'': one person can successfully commit part of a
bos@256	470 change and then be blocked by the need for a merge, causing other
bos@256	471 people to see only a portion of the work that person intended to do. This
bos@256	472 also affects how you work with project history. If you want to see
bos@256	473 all of the modifications someone made as part of a task, you will need
bos@256	474 to manually inspect the descriptions and timestamps of the changes
bos@256	475 made to each file involved (if you even know what those files were).
bos@256	476
bos@256	477 CVS has a muddled notion of tags and branches that I will not even
bos@256	478 attempt to describe. It does not support renaming of files or
bos@256	479 directories well, making it easy to corrupt a repository. It has
bos@256	480 almost no internal consistency checking capabilities, so it is usually
bos@256	481 not even possible to tell whether or how a repository is corrupt. I
bos@256	482 would not recommend CVS for any project, existing or new.
bos@221	483
bos@221	484 Mercurial can import CVS revision history. However, there are a few
bos@221	485 caveats that apply; these are true of every other revision control
bos@221	486 tool's CVS importer, too. Due to CVS's lack of atomic changes and
bos@221	487 unversioned filesystem hierarchy, it is not possible to reconstruct
bos@221	488 CVS history completely accurately; some guesswork is involved, and
bos@221	489 renames will usually not show up. Because a lot of advanced CVS
bos@221	490 administration has to be done by hand and is hence error-prone, it's
bos@221	491 common for CVS importers to run into multiple problems with corrupted
bos@221	492 repositories (completely bogus revision timestamps and files that have
bos@221	493 remained locked for over a decade are just two of the less interesting
bos@221	494 problems I can recall from personal experience).
bos@221	495
bos@280	496 Mercurial can import revision history from a CVS repository.
bos@280	497
bos@280	498
bos@221	499 \subsection{Commercial tools}
bos@221	500
bos@221	501 Perforce has a centralised client/server architecture, with no
bos@221	502 client-side caching of any data. Unlike modern revision control
bos@221	503 tools, Perforce requires that a user run a command to inform the
bos@221	504 server about every file they intend to edit.
bos@221	505
bos@221	506 The performance of Perforce is quite good for small teams, but it
bos@221	507 falls off rapidly as the number of users grows beyond a few dozen.
bos@221	508 Modestly large Perforce installations require the deployment of
bos@221	509 proxies to cope with the load their users generate.
bos@16	510
bos@280	511
bos@280	512 \subsection{Choosing a revision control tool}
bos@280	513
bos@280	514 With the exception of CVS, all of the tools listed above have unique
bos@280	515 strengths that suit them to particular styles of work. There is no
bos@280	516 single revision control tool that is best in all situations.
bos@280	517
bos@280	518 As an example, Subversion is a good choice for working with frequently
bos@280	519 edited binary files, due to its centralised nature and support for
bos@318	520 file locking.
bos@280	521
bos@280	522 I personally find Mercurial's properties of simplicity, performance,
bos@280	523 and good merge support to be a compelling combination that has served
bos@280	524 me well for several years.
bos@280	525
bos@280	526
bos@280	527 \section{Switching from another tool to Mercurial}
bos@280	528
bos@280	529 Mercurial is bundled with an extension named \hgext{convert}, which
bos@280	530 can incrementally import revision history from several other revision
bos@280	531 control tools. By ``incremental'', I mean that you can convert all of
bos@280	532 a project's history to date in one go, then rerun the conversion later
bos@280	533 to obtain new changes that happened after the initial conversion.
bos@280	534
bos@280	535 The revision control tools supported by \hgext{convert} are as
bos@280	536 follows:
bos@280	537 \begin{itemize}
bos@280	538 \item Subversion
bos@280	539 \item CVS
bos@280	540 \item Git
bos@280	541 \item Darcs
bos@280	542 \end{itemize}
bos@280	543
bos@280	544 In addition, \hgext{convert} can export changes from Mercurial to
bos@280	545 Subversion. This makes it possible to try Subversion and Mercurial in
bos@280	546 parallel before committing to a switchover, without risking the loss
bos@280	547 of any work.
bos@280	548
bos@280	549 The \hgxcmd{convert}{convert} command is easy to use. Simply point it
bos@280	550 at the path or URL of the source repository, optionally give it the
bos@280	551 name of the destination repository, and it will start working. After
bos@280	552 the initial conversion, just run the same command again to import new
bos@280	553 changes.
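
As a rough sketch of what this looks like in practice: bundled
extensions are not active until they are enabled in a configuration
file, so \hgext{convert} first needs an entry such as the following in
\texttt{\~{}/.hgrc} (older releases may expect the key to be spelled
\texttt{hgext.convert} instead):
\begin{verbatim}
[extensions]
convert =
\end{verbatim}

The commands below then perform an initial conversion and a later
incremental one. The Subversion URL and the destination name
\texttt{myproject-hg} are hypothetical placeholders, not taken from
any real project:
\begin{verbatim}
# Initial conversion: source first, optional destination second
hg convert http://svn.example.com/myproject/trunk myproject-hg

# Some time later: the same command imports only the new changes
hg convert http://svn.example.com/myproject/trunk myproject-hg
\end{verbatim}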
bos@280	554
bos@280	555
bos@16	556 %%% Local Variables:
bos@16	557 %%% mode: latex
bos@16	558 %%% TeX-master: "00book"
bos@16	559 %%% End: