hgbook: 2b8c6aa370d5 en/intro.tex

view en/intro.tex @ 298:2b8c6aa370d5

Fix sample output for test 'branch-repo'.

author	Guido Ostkamp <hg@ostkamp.fastmail.fm>
date	Wed Aug 20 21:54:18 2008 +0200 (2008-08-20)
parents	9dbed77d3ba6
children	a168daed199b 231c8469a0ec

1 \chapter{Introduction}

2 \label{chap:intro}

4 \section{About revision control}

6 Revision control is the process of managing multiple versions of a

7 piece of information. In its simplest form, this is something that

8 many people do by hand: every time you modify a file, save it under a

9 new name that contains a number, each one higher than the number of

10 the preceding version.

12 Manually managing multiple versions of even a single file is an

13 error-prone task, though, so software tools to help automate this

14 process have long been available. The earliest automated revision

15 control tools were intended to help a single user to manage revisions

16 of a single file. Over the past few decades, the scope of revision

17 control tools has expanded greatly; they now manage multiple files,

18 and help multiple people to work together. The best modern revision

19 control tools have no problem coping with thousands of people working

20 together on projects that consist of hundreds of thousands of files.

22 \subsection{Why use revision control?}

24 There are a number of reasons why you or your team might want to use

25 an automated revision control tool for a project.

26 \begin{itemize}

27 \item It will track the history and evolution of your project, so you

28 don't have to. For every change, you'll have a log of \emph{who}

29 made it; \emph{why} they made it; \emph{when} they made it; and

30 \emph{what} the change was.

31 \item When you're working with other people, revision control software

32 makes it easier for you to collaborate. For example, when people

33 more or less simultaneously make potentially incompatible changes,

34 the software will help you to identify and resolve those conflicts.

35 \item It can help you to recover from mistakes. If you make a change

36 that later turns out to be in error, you can revert to an earlier

37 version of one or more files. In fact, a \emph{really} good

38 revision control tool will even help you to efficiently figure out

39 exactly when a problem was introduced (see

40 section~\ref{sec:undo:bisect} for details).

41 \item It will help you to work simultaneously on, and manage the drift

42 between, multiple versions of your project.

43 \end{itemize}

44 Most of these reasons are equally valid---at least in theory---whether

45 you're working on a project by yourself, or with a hundred other

46 people.

48 A key question about the practicality of revision control at these two

49 different scales (``lone hacker'' and ``huge team'') is how its

50 \emph{benefits} compare to its \emph{costs}. A revision control tool

51 that's difficult to understand or use is going to impose a high cost.

53 A five-hundred-person project is likely to collapse under its own

54 weight almost immediately without a revision control tool and process.

55 In this case, the cost of using revision control might hardly seem

56 worth considering, since \emph{without} it, failure is almost

57 guaranteed.

59 On the other hand, a one-person ``quick hack'' might seem like a poor

60 place to use a revision control tool, because surely the cost of using

61 one must be close to the overall cost of the project. Right?

63 Mercurial uniquely supports \emph{both} of these scales of

64 development. You can learn the basics in just a few minutes, and due

65 to its low overhead, you can apply revision control to the smallest of

66 projects with ease. Its simplicity means you won't have a lot of

67 abstruse concepts or command sequences competing for mental space with

68 whatever you're \emph{really} trying to do. At the same time,

69 Mercurial's high performance and peer-to-peer nature let you scale

70 painlessly to handle large projects.

72 No revision control tool can rescue a poorly run project, but a good

73 choice of tools can make a huge difference to the fluidity with which

74 you can work on a project.

76 \subsection{The many names of revision control}

78 Revision control is a diverse field, so much so that it doesn't

79 actually have a single name or acronym. Here are a few of the more

80 common names and acronyms you'll encounter:

81 \begin{itemize}

82 \item Revision control (RCS)

83 \item Software configuration management (SCM), or configuration management

84 \item Source code management

85 \item Source code control, or source control

86 \item Version control (VCS)

87 \end{itemize}

88 Some people claim that these terms actually have different meanings,

89 but in practice they overlap so much that there's no agreed or even

90 useful way to tease them apart.

92 \section{A short history of revision control}

94 The best known of the old-time revision control tools is SCCS (Source

95 Code Control System), which Marc Rochkind wrote at Bell Labs, in the

96 early 1970s. SCCS operated on individual files, and required every

97 person working on a project to have access to a shared workspace on a

98 single system. Only one person could modify a file at any time;

99 arbitration for access to files was via locks. It was common for

100 people to lock files, and later forget to unlock them, preventing

101 anyone else from modifying those files without the help of an

102 administrator.

103

104 Walter Tichy developed a free alternative to SCCS in the early 1980s;

105 he called his program RCS (Revision Control System). Like SCCS, RCS

106 required developers to work in a single shared workspace, and to lock

107 files to prevent multiple people from modifying them simultaneously.

108

109 Later in the 1980s, Dick Grune used RCS as a building block for a set

110 of shell scripts he initially called cmt, but then renamed to CVS

111 (Concurrent Versions System). The big innovation of CVS was that it

112 let developers work simultaneously and somewhat independently in their

113 own personal workspaces. The personal workspaces prevented developers

114 from stepping on each other's toes all the time, as was common with

115 SCCS and RCS. Each developer had a copy of every project file, and

116 could modify their copies independently. They had to merge their

117 edits prior to committing changes to the central repository.

118

119 Brian Berliner took Grune's original scripts and rewrote them in~C,

120 releasing in 1989 the code that has since developed into the modern

121 version of CVS. CVS subsequently acquired the ability to operate over

122 a network connection, giving it a client/server architecture. CVS's

123 architecture is centralised; only the server has a copy of the history

124 of the project. Client workspaces just contain copies of recent

125 versions of the project's files, and a little metadata to tell them

126 where the server is. CVS has been enormously successful; it is

127 probably the world's most widely used revision control system.

128

129 In the early 1990s, Sun Microsystems developed an early distributed

130 revision control system, called TeamWare. A TeamWare workspace

131 contains a complete copy of the project's history. TeamWare has no

132 notion of a central repository. (CVS relied upon RCS for its history

133 storage; TeamWare used SCCS.)

134

135 As the 1990s progressed, awareness grew of a number of problems with

136 CVS. It records simultaneous changes to multiple files individually,

137 instead of grouping them together as a single logically atomic

138 operation. It does not manage its file hierarchy well; it is easy to

139 make a mess of a repository by renaming files and directories. Worse,

140 its source code is difficult to read and maintain, which made the

141 ``pain level'' of fixing these architectural problems prohibitive.

142

143 In 2000, Jim Blandy and Karl Fogel, two developers who had worked on

144 CVS, started a project to replace it with a tool that would have a

145 better architecture and cleaner code. The result, Subversion, does

146 not stray from CVS's centralised client/server model, but it adds

147 multi-file atomic commits, better namespace management, and a number

148 of other features that make it a generally better tool than CVS.

149 Since its initial release, it has rapidly grown in popularity.

150

151 More or less simultaneously, Graydon Hoare began working on an

152 ambitious distributed revision control system that he named Monotone.

153 While Monotone addresses many of CVS's design flaws and has a

154 peer-to-peer architecture, it goes beyond earlier (and subsequent)

155 revision control tools in a number of innovative ways. It uses

156 cryptographic hashes as identifiers, and has an integral notion of

157 ``trust'' for code from different sources.

158

159 Mercurial began life in 2005. While a few aspects of its design are

160 influenced by Monotone, Mercurial focuses on ease of use, high

161 performance, and scalability to very large projects.

162

163 \section{Trends in revision control}

164

165 There has been an unmistakable trend in the development and use of

166 revision control tools over the past four decades, as people have

167 become familiar with the capabilities of their tools and constrained

168 by their limitations.

169

170 The first generation began by managing single files on individual

171 computers. Although these tools represented a huge advance over

172 ad-hoc manual revision control, their locking model and reliance on a

173 single computer limited them to small, tightly knit teams.

174

175 The second generation loosened these constraints by moving to

176 network-centered architectures, and managing entire projects at a

177 time. As projects grew larger, they ran into new problems. With

178 clients needing to talk to servers very frequently, server scaling

179 became an issue for large projects. An unreliable network connection

180 could prevent remote users from being able to talk to the server at

181 all. As open source projects started making read-only access

182 available anonymously to anyone, people without commit privileges

183 found that they could not use the tools to interact with a project in

184 a natural way, as they could not record their changes.

185

186 The current generation of revision control tools is peer-to-peer in

187 nature. All of these systems have dropped the dependency on a single

188 central server, and allow people to distribute their revision control

189 data to where it's actually needed. Collaboration over the Internet

190 has moved from being constrained by technology to being a matter of choice and

191 consensus. Modern tools can operate offline indefinitely and

192 autonomously, with a network connection only needed when syncing

193 changes with another repository.

194

195 \section{A few of the advantages of distributed revision control}

196

197 Even though distributed revision control tools have for several years

198 been as robust and usable as their previous-generation counterparts,

199 people using older tools have not yet necessarily woken up to their

200 advantages. There are a number of ways in which distributed tools

201 shine relative to centralised ones.

202

203 For an individual developer, distributed tools are almost always much

204 faster than centralised tools. This is for a simple reason: a

205 centralised tool needs to talk over the network for many common

206 operations, because most metadata is stored in a single copy on the

207 central server. A distributed tool stores all of its metadata

208 locally. All else being equal, talking over the network adds overhead

209 to a centralised tool. Don't underestimate the value of a snappy,

210 responsive tool: you're going to spend a lot of time interacting with

211 your revision control software.

212

213 Distributed tools are indifferent to the vagaries of your server

214 infrastructure, again because they replicate metadata to so many

215 locations. If you use a centralised system and your server catches

216 fire, you'd better hope that your backup media are reliable, and that

217 your last backup was recent and actually worked. With a distributed

218 tool, you have many backups available on every contributor's computer.

219

220 The reliability of your network will affect distributed tools far less

221 than it will centralised tools. You can't even use a centralised tool

222 without a network connection, except for a few highly constrained

223 commands. With a distributed tool, if your network connection goes

224 down while you're working, you may not even notice. The only thing

225 you won't be able to do is talk to repositories on other computers,

226 something that is relatively rare compared with local operations. If

227 you have a far-flung team of collaborators, this may be significant.

228

229 \subsection{Advantages for open source projects}

230

231 If you take a shine to an open source project and decide that you

232 would like to start hacking on it, and that project uses a distributed

233 revision control tool, you are at once a peer with the people who

234 consider themselves the ``core'' of that project. If they publish

235 their repositories, you can immediately copy their project history,

236 start making changes, and record your work, using the same tools in

237 the same ways as insiders. By contrast, with a centralised tool, you

238 must use the software in a ``read only'' mode unless someone grants

239 you permission to commit changes to their central server. Until then,

240 you won't be able to record changes, and your local modifications will

241 be at risk of corruption any time you try to update your client's view

242 of the repository.

243

244 \subsubsection{The forking non-problem}

245

246 It has been suggested that distributed revision control tools pose

247 some sort of risk to open source projects because they make it easy to

248 ``fork'' the development of a project. A fork happens when there are

249 differences in opinion or attitude between groups of developers that

250 cause them to decide that they can't work together any longer. Each

251 side takes a more or less complete copy of the project's source code,

252 and goes off in its own direction.

253

254 Sometimes the camps in a fork decide to reconcile their differences.

255 With a centralised revision control system, the \emph{technical}

256 process of reconciliation is painful, and has to be performed largely

257 by hand. You have to decide whose revision history is going to

258 ``win'', and graft the other team's changes into the tree somehow.

259 This usually loses some or all of one side's revision history.

260

261 What distributed tools do with respect to forking is to make forking

262 the \emph{only} way to develop a project. Every single change that

263 you make is potentially a fork point. The great strength of this

264 approach is that a distributed revision control tool has to be really

265 good at \emph{merging} forks, because forks are absolutely

266 fundamental: they happen all the time.

267

268 If every piece of work that everybody does, all the time, is framed in

269 terms of forking and merging, then what the open source world refers

270 to as a ``fork'' becomes \emph{purely} a social issue. If anything,

271 distributed tools \emph{lower} the likelihood of a fork:

272 \begin{itemize}

273 \item They eliminate the social distinction that centralised tools

274 impose: that between insiders (people with commit access) and

275 outsiders (people without).

276 \item They make it easier to reconcile after a social fork, because

277 all that's involved from the perspective of the revision control

278 software is just another merge.

279 \end{itemize}

280

281 Some people resist distributed tools because they want to retain tight

282 control over their projects, and they believe that centralised tools

283 give them this control. However, if you're of this belief, and you

284 publish your CVS or Subversion repositories publicly, there are

285 plenty of tools available that can pull out your entire project's

286 history (albeit slowly) and recreate it somewhere that you don't

287 control. So while your control in this case is illusory, you are

288 forgoing the ability to fluidly collaborate with whatever people feel

289 compelled to mirror and fork your history.

290

291 \subsection{Advantages for commercial projects}

292

293 Many commercial projects are undertaken by teams that are scattered

294 across the globe. Contributors who are far from a central server will

295 see slower command execution and perhaps less reliability. Commercial

296 revision control systems attempt to ameliorate these problems with

297 remote-site replication add-ons that are typically expensive to buy

298 and cantankerous to administer. A distributed system doesn't suffer

299 from these problems in the first place. Better yet, you can easily

300 set up multiple authoritative servers, say one per site, so that

301 there's no redundant communication between repositories over expensive

302 long-haul network links.

303

304 Centralised revision control systems tend to have relatively low

305 scalability. It's not unusual for an expensive centralised system to

306 fall over under the combined load of just a few dozen concurrent

307 users. Once again, the typical response tends to be an expensive and

308 clunky replication facility. Since the load on a central server---if

309 you have one at all---is many times lower with a distributed

310 tool (because all of the data is replicated everywhere), a single

311 cheap server can handle the needs of a much larger team, and

312 replication to balance load becomes a simple matter of scripting.

313

314 If you have an employee in the field, troubleshooting a problem at a

315 customer's site, they'll benefit from distributed revision control.

316 The tool will let them generate custom builds, try different fixes in

317 isolation from each other, and search efficiently through history for

318 the sources of bugs and regressions in the customer's environment, all

319 without needing to connect to your company's network.

320

321 \section{Why choose Mercurial?}

322

323 Mercurial has a unique set of properties that make it a particularly

324 good choice as a revision control system.

325 \begin{itemize}

326 \item It is easy to learn and use.

327 \item It is lightweight.

328 \item It scales excellently.

329 \item It is easy to customise.

330 \end{itemize}

331

332 If you are at all familiar with revision control systems, you should

333 be able to get up and running with Mercurial in less than five

334 minutes. Even if not, it will take no more than a few minutes

335 longer. Mercurial's command and feature sets are generally uniform

336 and consistent, so you can keep track of a few general rules instead

337 of a host of exceptions.

338

339 On a small project, you can start working with Mercurial in moments.

340 Creating new changes and branches; transferring changes around

341 (whether locally or over a network); and history and status operations

342 are all fast. Mercurial attempts to stay nimble and largely out of

343 your way by combining low cognitive overhead with blazingly fast

344 operations.

345

346 The usefulness of Mercurial is not limited to small projects: it is

347 used by projects with hundreds to thousands of contributors, with each

348 project containing tens of thousands of files and hundreds of megabytes of

349 source code.

350

351 If the core functionality of Mercurial is not enough for you, it's

352 easy to build on. Mercurial is well suited to scripting tasks, and

353 its clean internals and implementation in Python make it easy to add

354 features in the form of extensions. There are a number of popular and

355 useful extensions already available, ranging from helping to identify

356 bugs to improving performance.

357

358 \section{Mercurial compared with other tools}

359

360 Before you read on, please understand that this section necessarily

361 reflects my own experiences, interests, and (dare I say it) biases. I

362 have used every one of the revision control tools listed below, in

363 most cases for several years at a time.

364

365

366 \subsection{Subversion}

367

368 Subversion is a popular revision control tool, developed to replace

369 CVS. It has a centralised client/server architecture.

370

371 Subversion and Mercurial have similarly named commands for performing

372 the same operations, so if you're familiar with one, it is easy to

373 learn to use the other. Both tools are portable to all popular

374 operating systems.

375

376 Subversion lacks a history-aware merge capability, forcing its users

377 to manually track exactly which revisions have been merged between

378 branches. If users fail to do this, or make mistakes, they face the

379 prospect of manually resolving merges with unnecessary conflicts.

380 Subversion also fails to merge changes when files or directories are

381 renamed. Subversion's poor merge support is its single biggest

382 weakness.

383

384 Mercurial has a substantial performance advantage over Subversion on

385 every revision control operation I have benchmarked. I have measured

386 its advantage as ranging from a factor of two to a factor of six when

387 compared with Subversion~1.4.3's \emph{ra\_local} file store, which is

388 the fastest access method available. In more realistic deployments

389 involving a network-based store, Subversion will be at a substantially

390 larger disadvantage. Because many Subversion commands must talk to

391 the server and Subversion does not have useful replication facilities,

392 server capacity and network bandwidth become bottlenecks for modestly

393 large projects.

394

395 Additionally, Subversion incurs substantial storage overhead to avoid

396 network transactions for a few common operations, such as finding

397 modified files (\texttt{status}) and displaying modifications against

398 the current revision (\texttt{diff}). As a result, a Subversion

399 working copy is often the same size as, or larger than, a Mercurial

400 repository and working directory, even though the Mercurial repository

401 contains a complete history of the project.

402

403 Subversion is widely supported by third-party tools. Mercurial

404 currently lags considerably in this area. This gap is closing,

405 however, and indeed some of Mercurial's GUI tools now outshine their

406 Subversion equivalents. Like Mercurial, Subversion has an excellent

407 user manual.

408

409 Because Subversion doesn't store revision history on the client, it is

410 well suited to managing projects that deal with lots of large, opaque

411 binary files. If you check in fifty revisions to an incompressible

412 10MB file, Subversion's client-side space usage stays constant. The

413 space used by any distributed SCM will grow rapidly in proportion to

414 the number of revisions, because the differences between each revision

415 are large.
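
As a rough worked estimate of the example above, assume that each of the
fifty revisions is stored as a delta about as large as the file itself
(an assumption that holds only for truly incompressible data). Every
clone of a distributed repository would then carry approximately
\[
  50 \times 10\,\mathrm{MB} = 500\,\mathrm{MB}
\]
of history for that one file. Assuming the Subversion client keeps one
pristine copy alongside the working file, it needs roughly
$2 \times 10\,\mathrm{MB} = 20\,\mathrm{MB}$, however many revisions are
committed.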

416

417 In addition, it's often difficult or, more usually, impossible to

418 merge different versions of a binary file. Subversion's ability to

419 let a user lock a file, so that they temporarily have the exclusive

420 right to commit changes to it, can be a significant advantage to a

421 project where binary files are widely used.

422

423 Mercurial can import revision history from a Subversion repository.

424 It can also export revision history to a Subversion repository. This

425 makes it easy to ``test the waters'' and use Mercurial and Subversion

426 in parallel before deciding to switch. History conversion is

427 incremental, so you can perform an initial conversion, then small

428 additional conversions afterwards to bring in new changes.

429

430

431 \subsection{Git}

432

433 Git is a distributed revision control tool that was developed for

434 managing the Linux kernel source tree. Like Mercurial, its early

435 design was somewhat influenced by Monotone.

436

437 Git has a very large command set, with version~1.5.0 providing~139

438 individual commands. It has something of a reputation for being

439 difficult to learn. Compared to Git, Mercurial has a strong focus on

440 simplicity.

441

442 In terms of performance, Git is extremely fast. In several cases, it

443 is faster than Mercurial, at least on Linux, while Mercurial performs

444 better on other operations. However, on Windows, the performance and

445 general level of support that Git provides are, at the time of writing,

446 far behind those of Mercurial.

447

448 While a Mercurial repository needs no maintenance, a Git repository

449 requires frequent manual ``repacks'' of its metadata. Without these,

450 performance degrades, while space usage grows rapidly. A server that

451 contains many Git repositories that are not rigorously and frequently

452 repacked will become heavily disk-bound during backups, and there have

453 been instances of daily backups taking far longer than~24 hours as a

454 result. A freshly packed Git repository is slightly smaller than a

455 Mercurial repository, but an unpacked repository is several orders of

456 magnitude larger.

457

458 The core of Git is written in C. Many Git commands are implemented as

459 shell or Perl scripts, and the quality of these scripts varies widely.

460 I have encountered several instances where scripts charged along

461 blindly in the presence of errors that should have been fatal.

462

463 Mercurial can import revision history from a Git repository.

464

465

466 \subsection{CVS}

467

468 CVS is probably the most widely used revision control tool in the

469 world. Due to its age and internal untidiness, it has been only

470 lightly maintained for many years.

471

472 It has a centralised client/server architecture. It does not group

473 related file changes into atomic commits, making it easy for people to

474 ``break the build'': one person can successfully commit part of a

475 change and then be blocked by the need for a merge, causing other

476 people to see only a portion of the work that person intended to do. This

477 also affects how you work with project history. If you want to see

478 all of the modifications someone made as part of a task, you will need

479 to manually inspect the descriptions and timestamps of the changes

480 made to each file involved (if you even know what those files were).

481

482 CVS has a muddled notion of tags and branches that I will not even

483 attempt to describe. It does not support renaming of files or

484 directories well, making it easy to corrupt a repository. It has

485 almost no internal consistency checking capabilities, so it is usually

486 not even possible to tell whether or how a repository is corrupt. I

487 would not recommend CVS for any project, existing or new.

488

489 Mercurial can import CVS revision history. However, there are a few

490 caveats that apply; these are true of every other revision control

491 tool's CVS importer, too. Due to CVS's lack of atomic changes and

492 unversioned filesystem hierarchy, it is not possible to reconstruct

493 CVS history completely accurately; some guesswork is involved, and

494 renames will usually not show up. Because a lot of advanced CVS

495 administration has to be done by hand and is hence error-prone, it's

496 common for CVS importers to run into multiple problems with corrupted

497 repositories (completely bogus revision timestamps and files that have

498 remained locked for over a decade are just two of the less interesting

499 problems I can recall from personal experience).

500


502

503

504 \subsection{Commercial tools}

505

506 Perforce has a centralised client/server architecture, with no

507 client-side caching of any data. Unlike modern revision control

508 tools, Perforce requires that a user run a command to inform the

509 server about every file they intend to edit.

510

511 The performance of Perforce is quite good for small teams, but it

512 falls off rapidly as the number of users grows beyond a few dozen.

513 Modestly large Perforce installations require the deployment of

514 proxies to cope with the load their users generate.

515

516

517 \subsection{Choosing a revision control tool}

518

519 With the exception of CVS, all of the tools listed above have unique

520 strengths that suit them to particular styles of work. There is no

521 single revision control tool that is best in all situations.

522

523 As an example, Subversion is a good choice for working with frequently

524 edited binary files, due to its centralised nature and support for

525 file locking. If you're averse to the command line, it currently has

526 better GUI support than other free revision control tools. However,

527 its poor merging is a substantial liability for busy projects with

528 overlapping development.

529

530 I personally find Mercurial's properties of simplicity, performance,

531 and good merge support to be a compelling combination that has served

532 me well for several years.

533

534

535 \section{Switching from another tool to Mercurial}

536

537 Mercurial is bundled with an extension named \hgext{convert}, which

538 can incrementally import revision history from several other revision

539 control tools. By ``incremental'', I mean that you can convert all of

540 a project's history to date in one go, then rerun the conversion later

541 to obtain new changes that happened after the initial conversion.

542

543 The revision control tools supported by \hgext{convert} are as

544 follows:

545 \begin{itemize}

546 \item Subversion

547 \item CVS

548 \item Git

549 \item Darcs

550 \end{itemize}

551

552 In addition, \hgext{convert} can export changes from Mercurial to

553 Subversion. This makes it possible to try Subversion and Mercurial in

554 parallel before committing to a switchover, without risking the loss

555 of any work.

556

557 The \hgxcmd{convert}{convert} command is easy to use. Simply point it

558 at the path or URL of the source repository, optionally give it the

559 name of the destination repository, and it will start working. After

560 the initial conversion, just run the same command again to import new

561 changes.
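
As a minimal sketch of such a session, assume a Subversion repository
at the hypothetical URL \texttt{http://svn.example.com/project/trunk}
and a destination directory with the hypothetical name
\texttt{project-hg}; neither name comes from a real project. Because
\hgext{convert} is an extension, it first has to be enabled in the
\texttt{[extensions]} section of your \texttt{.hgrc} file:
\begin{verbatim}
# Enable the bundled convert extension.
[extensions]
hgext.convert =
\end{verbatim}
The conversion itself might then look like this:
\begin{verbatim}
# Hypothetical source URL and destination name; substitute your own.
$ hg convert http://svn.example.com/project/trunk project-hg
\end{verbatim}
Running exactly the same command again at a later date should import
only the changes that were committed to the source repository after
the previous run.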

562

563

564 %%% Local Variables:

565 %%% mode: latex

566 %%% TeX-master: "00book"

567 %%% End: