\chapter{Introduction}
\label{chap:intro}

\section{About revision control}

Revision control is the process of managing multiple versions of the
same piece of information. In its simplest form, it is something that
everyone does by hand: when you modify a file, you save it under a new
name that contains a number, each one higher than the number of the
preceding version.

Managing versions by hand like this is error-prone, however, so
software tools to automate the process have existed for a long time.
The earliest revision control tools were intended to help a single
user automate the management of revisions of a single file. Over the
past few decades, their scope has expanded greatly: they now manage
multiple files, and help large numbers of people to work together. The
most modern tools have no difficulty coping with thousands of people
working together on projects that consist of hundreds of thousands of
files.

\subsection{Why use revision control?}

There are a number of reasons why you or your team might want to use
a tool that automates revision control for your project.
\begin{itemize}
\item The tool will track the evolution of your project, so that you
don't have to. For every change, you will have a log telling you
\emph{who} made it, \emph{why} they made it, \emph{when} they made
it, and \emph{what} they changed.
\item When you work with other people, revision control software makes
collaboration easier. For example, when several people make
incompatible changes more or less simultaneously, the software will
help you to identify and resolve the resulting conflicts.
\item It can help you to recover from mistakes. If you make a change
that later turns out to be in error, you can revert to an earlier
version of one or more files. In fact, a \emph{really} good
revision control tool will even help you to efficiently figure out
exactly when a problem was introduced (see
section~\ref{sec:undo:bisect} for details).
\item It will help you to work simultaneously on, and manage the drift
between, multiple versions of your project.
\end{itemize}
Most of these reasons are equally valid---at least in theory---whether
you're working on a project by yourself, or with a hundred other
people.

A key question about the practicality of revision control at these two
different scales (``lone hacker'' and ``huge team'') is how its
\emph{benefits} compare to its \emph{costs}. A revision control tool
that's difficult to understand or use is going to impose a high cost.

A five-hundred-person project is likely to collapse under its own
weight almost immediately without a revision control tool and process.
In this case, the cost of using revision control might hardly seem
worth considering, since \emph{without} it, failure is almost
guaranteed.

On the other hand, a one-person ``quick hack'' might seem like a poor
place to use a revision control tool, because surely the cost of using
one must be close to the overall cost of the project. Right?

Mercurial uniquely supports \emph{both} of these scales of
development. You can learn the basics in just a few minutes, and due
to its low overhead, you can apply revision control to the smallest of
projects with ease. Its simplicity means you won't have a lot of
abstruse concepts or command sequences competing for mental space with
whatever you're \emph{really} trying to do. At the same time,
Mercurial's high performance and peer-to-peer nature let you scale
painlessly to handle large projects.

No revision control tool can rescue a poorly run project, but a good
choice of tools can make a huge difference to the fluidity with which
you can work on a project.

\subsection{The many names of revision control}

Revision control is a diverse field, so much so that it doesn't
actually have a single name or acronym. Here are a few of the more
common names and acronyms you'll encounter:
\begin{itemize}
\item Revision control (RCS)
\item Software configuration management (SCM), or configuration management
\item Source code management
\item Source code control, or source control
\item Version control (VCS)
\end{itemize}
Some people claim that these terms actually have different meanings,
but in practice they overlap so much that there's no agreed or even
useful way to tease them apart.

\section{A short history of revision control}

The best known of the old-time revision control tools is SCCS (Source
Code Control System), which Marc Rochkind wrote at Bell Labs in the
early 1970s. SCCS operated on individual files, and required every
person working on a project to have access to a shared workspace on a
single system. Only one person could modify a file at any time;
arbitration for access to files was via locks. It was common for
people to lock files, and later forget to unlock them, preventing
anyone else from modifying those files without the help of an
administrator.

Walter Tichy developed a free alternative to SCCS in the early 1980s;
he called his program RCS (Revision Control System). Like SCCS, RCS
required developers to work in a single shared workspace, and to lock
files to prevent multiple people from modifying them simultaneously.

Later in the 1980s, Dick Grune used RCS as a building block for a set
of shell scripts he initially called cmt, but then renamed to CVS
(Concurrent Versions System). The big innovation of CVS was that it
let developers work simultaneously and somewhat independently in their
own personal workspaces. The personal workspaces prevented developers
from stepping on each other's toes all the time, as was common with
SCCS and RCS. Each developer had a copy of every project file, and
could modify their copies independently. They had to merge their
edits prior to committing changes to the central repository.

Brian Berliner took Grune's original scripts and rewrote them in~C,
releasing in 1989 the code that has since developed into the modern
version of CVS. CVS subsequently acquired the ability to operate over
a network connection, giving it a client/server architecture. CVS's
architecture is centralised; only the server has a copy of the history
of the project. Client workspaces just contain copies of recent
versions of the project's files, and a little metadata to tell them
where the server is. CVS has been enormously successful; it is
probably the world's most widely used revision control system.

In the early 1990s, Sun Microsystems developed an early distributed
revision control system, called TeamWare. A TeamWare workspace
contains a complete copy of the project's history. TeamWare has no
notion of a central repository. (CVS relied upon RCS for its history
storage; TeamWare used SCCS.)

As the 1990s progressed, awareness grew of a number of problems with
CVS. It records simultaneous changes to multiple files individually,
instead of grouping them together as a single logically atomic
operation. It does not manage its file hierarchy well; it is easy to
make a mess of a repository by renaming files and directories. Worse,
its source code is difficult to read and maintain, which made the
``pain level'' of fixing these architectural problems prohibitive.

In 2001, Jim Blandy and Karl Fogel, two developers who had worked on
CVS, started a project to replace it with a tool that would have a
better architecture and cleaner code. The result, Subversion, does
not stray from CVS's centralised client/server model, but it adds
multi-file atomic commits, better namespace management, and a number
of other features that make it a generally better tool than CVS.
Since its initial release, it has rapidly grown in popularity.

More or less simultaneously, Graydon Hoare began working on an
ambitious distributed revision control system that he named Monotone.
Monotone addresses many of CVS's design flaws and has a peer-to-peer
architecture, but it also goes beyond earlier (and subsequent)
revision control tools in a number of innovative ways. It uses
cryptographic hashes as identifiers, and has an integral notion of
``trust'' for code from different sources.

Mercurial began life in 2005. While a few aspects of its design are
influenced by Monotone, Mercurial focuses on ease of use, high
performance, and scalability to very large projects.

\section{Trends in revision control}

There has been an unmistakable trend in the development and use of
revision control tools over the past four decades, as people have
become familiar with the capabilities of their tools and constrained
by their limitations.

The first generation began by managing single files on individual
computers. Although these tools represented a huge advance over
ad-hoc manual revision control, their locking model and reliance on a
single computer limited them to small, tightly-knit teams.

The second generation loosened these constraints by moving to
network-centered architectures, and managing entire projects at a
time. As projects grew larger, they ran into new problems. With
clients needing to talk to servers very frequently, server scaling
became an issue for large projects. An unreliable network connection
could prevent remote users from being able to talk to the server at
all. As open source projects started making read-only access
available anonymously to anyone, people without commit privileges
found that they could not use the tools to interact with a project in
a natural way, as they could not record their changes.

The current generation of revision control tools is peer-to-peer in
nature. All of these systems have dropped the dependency on a single
central server, and allow people to distribute their revision control
data to where it's actually needed. Collaboration over the Internet
has moved from being constrained by technology to being a matter of
choice and consensus. Modern tools can operate offline indefinitely
and autonomously, with a network connection only needed when syncing
changes with another repository.

\section{A few of the advantages of distributed revision control}

Even though distributed revision control tools have for several years
been as robust and usable as their previous-generation counterparts,
people using older tools have not yet necessarily woken up to their
advantages. There are a number of ways in which distributed tools
shine relative to centralised ones.

For an individual developer, distributed tools are almost always much
faster than centralised tools. This is for a simple reason: a
centralised tool needs to talk over the network for many common
operations, because most metadata is stored in a single copy on the
central server. A distributed tool stores all of its metadata
locally. All else being equal, talking over the network adds overhead
to a centralised tool. Don't underestimate the value of a snappy,
responsive tool: you're going to spend a lot of time interacting with
your revision control software.

Distributed tools are indifferent to the vagaries of your server
infrastructure, again because they replicate metadata to so many
locations. If you use a centralised system and your server catches
fire, you'd better hope that your backup media are reliable, and that
your last backup was recent and actually worked. With a distributed
tool, you have many backups available, one on every contributor's
computer.

The reliability of your network will affect distributed tools far less
than it will centralised tools. You can't even use a centralised tool
without a network connection, except for a few highly constrained
commands. With a distributed tool, if your network connection goes
down while you're working, you may not even notice. The only thing
you won't be able to do is talk to repositories on other computers,
something that is relatively rare compared with local operations. If
you have a far-flung team of collaborators, this may be significant.

\subsection{Advantages for open source projects}

If you take a shine to an open source project and decide that you
would like to start hacking on it, and that project uses a distributed
revision control tool, you are at once a peer with the people who
consider themselves the ``core'' of that project. If they publish
their repositories, you can immediately copy their project history,
start making changes, and record your work, using the same tools in
the same ways as insiders. By contrast, with a centralised tool, you
must use the software in a ``read only'' mode unless someone grants
you permission to commit changes to their central server. Until then,
you won't be able to record changes, and your local modifications will
be at risk of corruption any time you try to update your client's view
of the repository.

\subsubsection{The forking non-problem}

It has been suggested that distributed revision control tools pose
some sort of risk to open source projects because they make it easy to
``fork'' the development of a project. A fork happens when there are
differences in opinion or attitude between groups of developers that
cause them to decide that they can't work together any longer. Each
side takes a more or less complete copy of the project's source code,
and goes off in its own direction.

Sometimes the camps in a fork decide to reconcile their differences.
With a centralised revision control system, the \emph{technical}
process of reconciliation is painful, and has to be performed largely
by hand. You have to decide whose revision history is going to
``win'', and graft the other team's changes into the tree somehow.
This usually loses some or all of one side's revision history.

What distributed tools do with respect to forking is make forking
the \emph{only} way to develop a project. Every single change that
you make is potentially a fork point. The great strength of this
approach is that a distributed revision control tool has to be really
good at \emph{merging} forks, because forks are absolutely
fundamental: they happen all the time.

If every piece of work that everybody does, all the time, is framed in
terms of forking and merging, then what the open source world refers
to as a ``fork'' becomes \emph{purely} a social issue. If anything,
distributed tools \emph{lower} the likelihood of a fork:
\begin{itemize}
\item They eliminate the social distinction that centralised tools
impose: that between insiders (people with commit access) and
outsiders (people without).
\item They make it easier to reconcile after a social fork, because
all that's involved from the perspective of the revision control
software is just another merge.
\end{itemize}

Some people resist distributed tools because they want to retain tight
control over their projects, and they believe that centralised tools
give them this control. However, if you're of this belief, and you
publish your CVS or Subversion repositories publicly, there are
plenty of tools available that can pull out your entire project's
history (albeit slowly) and recreate it somewhere that you don't
control. So not only is your control in this case illusory, but you
are also forgoing the ability to fluidly collaborate with whatever
people feel compelled to mirror and fork your history.

\subsection{Advantages for commercial projects}

Many commercial projects are undertaken by teams that are scattered
across the globe. Contributors who are far from a central server will
see slower command execution and perhaps less reliability. Commercial
revision control systems attempt to ameliorate these problems with
remote-site replication add-ons that are typically expensive to buy
and cantankerous to administer. A distributed system doesn't suffer
from these problems in the first place. Better yet, you can easily
set up multiple authoritative servers, say one per site, so that
there's no redundant communication between repositories over expensive
long-haul network links.

Centralised revision control systems tend to have relatively low
scalability. It's not unusual for an expensive centralised system to
fall over under the combined load of just a few dozen concurrent
users. Once again, the typical response tends to be an expensive and
clunky replication facility. Since the load on a central server---if
you have one at all---is many times lower with a distributed
tool (because all of the data is replicated everywhere), a single
cheap server can handle the needs of a much larger team, and
replication to balance load becomes a simple matter of scripting.

If you have an employee in the field, troubleshooting a problem at a
customer's site, they'll benefit from distributed revision control.
The tool will let them generate custom builds, try different fixes in
isolation from each other, and search efficiently through history for
the sources of bugs and regressions in the customer's environment, all
without needing to connect to your company's network.

\section{Why choose Mercurial?}

Mercurial has a unique set of properties that make it a particularly
good choice as a revision control system.
\begin{itemize}
\item It is easy to learn and use.
\item It is lightweight.
\item It scales excellently.
\item It is easy to customise.
\end{itemize}

If you are at all familiar with revision control systems, you should
be able to get up and running with Mercurial in less than five
minutes. Even if not, it will take no more than a few minutes
longer. Mercurial's command and feature sets are generally uniform
and consistent, so you can keep track of a few general rules instead
of a host of exceptions.

On a small project, you can start working with Mercurial in moments.
Creating new changes and branches; transferring changes around
(whether locally or over a network); and history and status operations
are all fast. Mercurial attempts to stay nimble and largely out of
your way by combining low cognitive overhead with blazingly fast
operations.

The usefulness of Mercurial is not limited to small projects: it is
used by projects with hundreds to thousands of contributors, each
project containing tens of thousands of files and hundreds of
megabytes of source code.

If the core functionality of Mercurial is not enough for you, it's
easy to build on. Mercurial is well suited to scripting tasks, and
its clean internals and implementation in Python make it easy to add
features in the form of extensions. There are a number of popular and
useful extensions already available, ranging from helping to identify
bugs to improving performance.

\section{Mercurial compared with other tools}

Before you read on, please understand that this section necessarily
reflects my own experiences, interests, and (dare I say it) biases. I
have used every one of the revision control tools listed below, in
most cases for several years at a time.

\subsection{Subversion}

Subversion is a popular revision control tool, developed to replace
CVS. It has a centralised client/server architecture.

Subversion and Mercurial have similarly named commands for performing
the same operations, so if you're familiar with one, it is easy to
learn to use the other. Both tools are portable to all popular
operating systems.

Prior to version 1.5, Subversion had no useful support for merges.
At the time of writing, its merge tracking capability is new, and known to be
\href{http://svnbook.red-bean.com/nightly/en/svn.branchmerge.advanced.html#svn.branchmerge.advanced.finalword}{complicated
and buggy}.

Mercurial has a substantial performance advantage over Subversion on
every revision control operation I have benchmarked. I have measured
its advantage as ranging from a factor of two to a factor of six when
compared with Subversion~1.4.3's \emph{ra\_local} file store, which is
the fastest access method available. In more realistic deployments
involving a network-based store, Subversion will be at a substantially
larger disadvantage. Because many Subversion commands must talk to
the server and Subversion does not have useful replication facilities,
server capacity and network bandwidth become bottlenecks for modestly
large projects.

Additionally, Subversion incurs substantial storage overhead to avoid
network transactions for a few common operations, such as finding
modified files (\texttt{status}) and displaying modifications against
the current revision (\texttt{diff}). As a result, a Subversion
working copy is often the same size as, or larger than, a Mercurial
repository and working directory, even though the Mercurial repository
contains a complete history of the project.

Subversion is widely supported by third party tools. Mercurial
currently lags considerably in this area. This gap is closing,
however, and indeed some of Mercurial's GUI tools now outshine their
Subversion equivalents. Like Mercurial, Subversion has an excellent
user manual.

Because Subversion doesn't store revision history on the client, it is
well suited to managing projects that deal with lots of large, opaque
binary files. If you check in fifty revisions to an incompressible
10MB file, Subversion's client-side space usage stays constant. The
space used by any distributed SCM will grow rapidly in proportion to
the number of revisions, because the differences between each revision
are large.

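As a rough worked example, take the hypothetical worst case in which
every one of those fifty revisions rewrites the file so thoroughly
that neither deltas nor compression save any space, and ignore
metadata. A Subversion client holds the working file plus the pristine
copy it keeps for \texttt{status} and \texttt{diff}, while a clone
made with a distributed tool also stores every revision:
\[
\mbox{Subversion: } 2 \times 10\,\mathrm{MB} = 20\,\mathrm{MB},
\qquad
\mbox{distributed: } 50 \times 10\,\mathrm{MB} + 10\,\mathrm{MB} = 510\,\mathrm{MB}.
\]
Real binary files usually delta or compress at least a little, so
these figures are an upper bound rather than a measurement, but they
show why the number of revisions dominates the space a clone needs.
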
In addition, it's often difficult or, more usually, impossible to
merge different versions of a binary file. Subversion's ability to
let a user lock a file, so that they temporarily have the exclusive
right to commit changes to it, can be a significant advantage to a
project where binary files are widely used.

Mercurial can import revision history from a Subversion repository.
It can also export revision history to a Subversion repository. This
makes it easy to ``test the waters'' and use Mercurial and Subversion
in parallel before deciding to switch. History conversion is
incremental, so you can perform an initial conversion, then small
additional conversions afterwards to bring in new changes.

\subsection{Git}

Git is a distributed revision control tool that was developed for
managing the Linux kernel source tree. Like Mercurial, its early
design was somewhat influenced by Monotone.

Git has a very large command set, with version~1.5.0 providing~139
individual commands. It has something of a reputation for being
difficult to learn. Compared to Git, Mercurial has a strong focus on
simplicity.

In terms of performance, Git is extremely fast. In several cases, it
is faster than Mercurial, at least on Linux, while Mercurial performs
better on other operations. However, on Windows, the performance and
general level of support that Git provides are, at the time of writing,
far behind those of Mercurial.

While a Mercurial repository needs no maintenance, a Git repository
requires frequent manual ``repacks'' of its metadata. Without these,
performance degrades, while space usage grows rapidly. A server that
contains many Git repositories that are not rigorously and frequently
repacked will become heavily disk-bound during backups, and there have
been instances of daily backups taking far longer than~24 hours as a
result. A freshly packed Git repository is slightly smaller than a
Mercurial repository, but an unpacked repository is several orders of
magnitude larger.

The core of Git is written in C. Many Git commands are implemented as
shell or Perl scripts, and the quality of these scripts varies widely.
I have encountered several instances where scripts charged along
blindly in the presence of errors that should have been fatal.

Mercurial can import revision history from a Git repository.

\subsection{CVS}

CVS is probably the most widely used revision control tool in the
world. Due to its age and internal untidiness, it has been only
lightly maintained for many years.

It has a centralised client/server architecture. It does not group
related file changes into atomic commits, making it easy for people to
``break the build'': one person can successfully commit part of a
change and then be blocked by the need for a merge, causing other
people to see only a portion of the work they intended to do. This
also affects how you work with project history. If you want to see
all of the modifications someone made as part of a task, you will need
to manually inspect the descriptions and timestamps of the changes
made to each file involved (if you even know what those files were).

CVS has a muddled notion of tags and branches that I will not attempt
to even describe. It does not support renaming of files or
directories well, making it easy to corrupt a repository. It has
almost no internal consistency checking capabilities, so it is usually
not even possible to tell whether or how a repository is corrupt. I
would not recommend CVS for any project, existing or new.

Mercurial can import CVS revision history. However, there are a few
caveats that apply; these are true of every other revision control
tool's CVS importer, too. Due to CVS's lack of atomic changes and
unversioned filesystem hierarchy, it is not possible to reconstruct
CVS history completely accurately; some guesswork is involved, and
renames will usually not show up. Because a lot of advanced CVS
administration has to be done by hand and is hence error-prone, it's
common for CVS importers to run into multiple problems with corrupted
repositories (completely bogus revision timestamps and files that have
remained locked for over a decade are just two of the less interesting
problems I can recall from personal experience).

Mercurial can import revision history from a CVS repository.

bos@221
|
\subsection{Commercial tools}

Perforce has a centralised client/server architecture, with no
client-side caching of any data. Unlike modern revision control
tools, Perforce requires that a user run a command to inform the
server about every file they intend to edit.

The performance of Perforce is quite good for small teams, but it
falls off rapidly as the number of users grows beyond a few dozen.
Modestly large Perforce installations require the deployment of
proxies to cope with the load their users generate.


\subsection{Choosing a revision control tool}

With the exception of CVS, all of the tools listed above have unique
strengths that suit them to particular styles of work. There is no
single revision control tool that is best in all situations.

As an example, Subversion is a good choice for working with frequently
edited binary files, due to its centralised nature and support for
file locking.

I personally find Mercurial's properties of simplicity, performance,
and good merge support to be a compelling combination that has served
me well for several years.


\section{Switching from another tool to Mercurial}

Mercurial is bundled with an extension named \hgext{convert}, which
can incrementally import revision history from several other revision
control tools. By ``incremental'', I mean that you can convert all of
a project's history to date in one go, then rerun the conversion later
to obtain new changes that happened after the initial conversion.

The revision control tools supported by \hgext{convert} are as
follows:
\begin{itemize}
\item Subversion
\item CVS
\item Git
\item Darcs
\end{itemize}

In addition, \hgext{convert} can export changes from Mercurial to
Subversion. This makes it possible to try Subversion and Mercurial in
parallel before committing to a switchover, without risking the loss
of any work.

The \hgxcmd{convert}{convert} command is easy to use. Simply point it
at the path or URL of the source repository, optionally give it the
name of the destination repository, and it will start working. After
the initial conversion, just run the same command again to import new
changes.
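
As a minimal sketch of a typical session (not a complete recipe),
assume that you are converting a Subversion project. Because
\hgext{convert} is bundled with Mercurial but not enabled by default,
you first turn it on in your Mercurial configuration file, for example
the \texttt{.hgrc} file in your home directory:
\begin{verbatim}
# Enable the bundled convert extension.
[extensions]
convert =
\end{verbatim}
You can then run the conversion. The Subversion URL and the
destination name \texttt{myproject-hg} below are hypothetical
placeholders; substitute your own.
\begin{verbatim}
# Initial conversion of the whole history
# (hypothetical source URL and destination name).
hg convert http://svn.example.org/repos/myproject/trunk myproject-hg
# Later: rerun the same command to import only the newer changes.
hg convert http://svn.example.org/repos/myproject/trunk myproject-hg
\end{verbatim}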


%%% Local Variables:
%%% mode: latex
%%% TeX-master: "00book"
%%% End: