hgbook: c44d5854620b en/ch00-preface.xml


view en/ch00-preface.xml @ 609:c44d5854620b

Fix up chapter 1.

author	Bryan O'Sullivan <bos@serpentine.com>
date	Tue Mar 31 22:38:30 2009 -0700 (2009-03-31)
parents	4ce9d0754af3
children	3b33dd6aba87


1

3 <preface id="chap:preface">

4 <?dbhtml filename="preface.html"?>

5 <title>Preface</title>

7 <sect1>

8 <title>Why revision control? Why Mercurial?</title>

10 <para id="x_6d">Revision control is the process of managing multiple

11 versions of a piece of information. In its simplest form, this

12 is something that many people do by hand: every time you modify

13 a file, save it under a new name that contains a number, each

14 one higher than the number of the preceding version.</para>

16 <para id="x_6e">Manually managing multiple versions of even a single file is

17 an error-prone task, though, so software tools to help automate

18 this process have long been available. The earliest automated

19 revision control tools were intended to help a single user to

20 manage revisions of a single file. Over the past few decades,

21 the scope of revision control tools has expanded greatly; they

22 now manage multiple files, and help multiple people to work

23 together. The best modern revision control tools have no

24 problem coping with thousands of people working together on

25 projects that consist of hundreds of thousands of files.</para>

27 <para id="x_6f">The arrival of distributed revision control is relatively

28 recent, and so far this new field has grown due to people's

29 willingness to explore ill-charted territory.</para>

31 <para id="x_70">I am writing a book about distributed revision control

32 because I believe that it is an important subject that deserves

33 a field guide. I chose to write about Mercurial because it is

34 the easiest tool to learn the terrain with, and yet it scales to

35 the demands of real, challenging environments where many other

36 revision control tools buckle.</para>

38 <sect2>

39 <title>Why use revision control?</title>

41 <para id="x_71">There are a number of reasons why you or your team might

42 want to use an automated revision control tool for a

43 project.</para>

45 <itemizedlist>

46 <listitem><para id="x_72">It will track the history and evolution of

47 your project, so you don't have to. For every change,

48 you'll have a log of <emphasis>who</emphasis> made it;

49 <emphasis>why</emphasis> they made it;

50 <emphasis>when</emphasis> they made it; and

51 <emphasis>what</emphasis> the change

52 was.</para></listitem>

53 <listitem><para id="x_73">When you're working with other people,

54 revision control software makes it easier for you to

55 collaborate. For example, when people more or less

56 simultaneously make potentially incompatible changes, the

57 software will help you to identify and resolve those

58 conflicts.</para></listitem>

59 <listitem><para id="x_74">It can help you to recover from mistakes. If

60 you make a change that later turns out to be in error, you

61 can revert to an earlier version of one or more files. In

62 fact, a <emphasis>really</emphasis> good revision control

63 tool will even help you to efficiently figure out exactly

64 when a problem was introduced (see <xref

65 linkend="sec:undo:bisect"/> for details).</para></listitem>

66 <listitem><para id="x_75">It will help you to work simultaneously on,

67 and manage the drift between, multiple versions of your

68 project.</para></listitem>

69 </itemizedlist>
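<para>To make the idea of efficiently finding when a problem was
introduced more concrete, here is a minimal Python sketch of a binary
search over history. It is not how Mercurial implements this: it
assumes a simple linear history, and the names
<literal>first_bad_revision</literal> and <literal>is_bad</literal>
are hypothetical, chosen only for illustration.</para>

<programlisting><![CDATA[
```python
def first_bad_revision(revisions, is_bad):
    """Return the earliest revision for which is_bad() is true.

    Assumes a linear history in which revisions[0] is known to be good,
    revisions[-1] is known to be bad, and a problem, once introduced,
    stays present in every later revision.
    """
    good, bad = 0, len(revisions) - 1
    while bad - good > 1:
        middle = (good + bad) // 2
        if is_bad(revisions[middle]):
            bad = middle    # the problem already exists here; look earlier
        else:
            good = middle   # still fine here; look later
    return revisions[bad]


# Hypothetical history of 100 revisions in which the bug arrived in revision 37.
tested = []


def is_bad(revision):
    tested.append(revision)   # record how many revisions had to be tested
    return revision >= 37


culprit = first_bad_revision(list(range(100)), is_bad)
print(culprit, len(tested))
```
]]></programlisting>

<para>Because each test halves the remaining range, a history of a
hundred revisions needs only a handful of tests rather than a
hundred.</para>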

71 <para id="x_76">Most of these reasons are equally

72 valid&emdash;at least in theory&emdash;whether you're working

73 on a project by yourself, or with a hundred other

74 people.</para>

76 <para id="x_77">A key question about the practicality of revision control

77 at these two different scales (<quote>lone hacker</quote> and

78 <quote>huge team</quote>) is how its

79 <emphasis>benefits</emphasis> compare to its

80 <emphasis>costs</emphasis>. A revision control tool that's

81 difficult to understand or use is going to impose a high

82 cost.</para>

84 <para id="x_78">A five-hundred-person project is likely to collapse under

85 its own weight almost immediately without a revision control

86 tool and process. In this case, the cost of using revision

87 control might hardly seem worth considering, since

88 <emphasis>without</emphasis> it, failure is almost

89 guaranteed.</para>

91 <para id="x_79">On the other hand, a one-person <quote>quick hack</quote>

92 might seem like a poor place to use a revision control tool,

93 because surely the cost of using one must be close to the

94 overall cost of the project. Right?</para>

96 <para id="x_7a">Mercurial uniquely supports <emphasis>both</emphasis> of

97 these scales of development. You can learn the basics in just

98 a few minutes, and due to its low overhead, you can apply

99 revision control to the smallest of projects with ease. Its

100 simplicity means you won't have a lot of abstruse concepts or

101 command sequences competing for mental space with whatever

102 you're <emphasis>really</emphasis> trying to do. At the same

103 time, Mercurial's high performance and peer-to-peer nature let

104 you scale painlessly to handle large projects.</para>

105

106 <para id="x_7b">No revision control tool can rescue a poorly run project,

107 but a good choice of tools can make a huge difference to the

108 fluidity with which you can work on a project.</para>

109

110 </sect2>

111

112 <sect2>

113 <title>The many names of revision control</title>

114

115 <para id="x_7c">Revision control is a diverse field, so much so that it is

116 referred to by many names and acronyms. Here are a few of the

117 more common variations you'll encounter:</para>

118 <itemizedlist>

119 <listitem><para id="x_7d">Revision control (RCS)</para></listitem>

120 <listitem><para id="x_7e">Software configuration management (SCM), or

121 configuration management</para></listitem>

122 <listitem><para id="x_7f">Source code management</para></listitem>

123 <listitem><para id="x_80">Source code control, or source

124 control</para></listitem>

125 <listitem><para id="x_81">Version control

126 (VCS)</para></listitem></itemizedlist>

127 <para id="x_82">Some people claim that these terms actually have different

128 meanings, but in practice they overlap so much that there's no

129 agreed or even useful way to tease them apart.</para>

130

131 </sect2>

132 </sect1>

133

134 <sect1>

135 <title>This book is a work in progress</title>

136

137 <para id="x_83">I am releasing this book while I am still writing it, in the

138 hope that it will prove useful to others. I am writing under an

139 open license in the hope that you, my readers, will contribute

140 feedback and perhaps content of your own.</para>

141

142 </sect1>

143 <sect1>

144 <title>About the examples in this book</title>

145

146 <para id="x_84">This book takes an unusual approach to code samples. Every

147 example is <quote>live</quote>&emdash;each one is actually the result

148 of a shell script that executes the Mercurial commands you see.

149 Every time an image of the book is built from its sources, all

150 the example scripts are automatically run, and their current

151 results compared against their expected results.</para>

152

153 <para id="x_85">The advantage of this approach is that the examples are

154 always accurate; they describe <emphasis>exactly</emphasis> the

155 behaviour of the version of Mercurial that's mentioned at the

156 front of the book. If I update the version of Mercurial that

157 I'm documenting, and the output of some command changes, the

158 build fails.</para>
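<para>The mechanism described above can be sketched in a few lines of
Python. This is not the book's actual build tooling;
<literal>check_example</literal> is a hypothetical helper, and a
trivial Python one-liner stands in for a Mercurial command so that the
sketch runs without Mercurial installed.</para>

<programlisting><![CDATA[
```python
import subprocess
import sys


def check_example(command, expected_output):
    """Run one example command and compare its output with the expected text.

    check=True raises an exception if the command itself exits with an error.
    """
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout == expected_output


# Stand-in for a Mercurial command (an assumption made for illustration).
example = [sys.executable, "-c", "print('changeset added')"]

up_to_date = check_example(example, "changeset added\n")   # output matches
stale = check_example(example, "some older output\n")      # output has changed
```
]]></programlisting>

<para>A build script following this pattern would stop with an error
whenever the comparison fails, which is what keeps the printed
examples in step with the tool.</para>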

159

160 <para id="x_86">There is a small disadvantage to this approach, which is

161 that the dates and times you'll see in examples tend to be

162 <quote>squashed</quote> together in a way that they wouldn't be

163 if the same commands were being typed by a human. Where a human

164 can issue no more than one command every few seconds, with any

165 resulting timestamps correspondingly spread out, my automated

166 example scripts run many commands in one second.</para>

167

168 <para id="x_87">As an instance of this, several consecutive commits in an

169 example can show up as having occurred during the same second.

170 You can see this occur in the <literal

171 role="hg-ext">bisect</literal> example in <xref

172 linkend="sec:undo:bisect"/>.</para>

173

174 <para id="x_88">So when you're reading examples, don't place too much weight

175 on the dates or times you see in the output of commands. But

176 <emphasis>do</emphasis> be confident that the behaviour you're

177 seeing is consistent and reproducible.</para>

178

179 </sect1>

180

181 <sect1>

182 <title>Trends in the field</title>

183

184 <para id="x_89">There has been an unmistakable trend in the development and

185 use of revision control tools over the past four decades, as

186 people have become familiar with the capabilities of their tools

187 and constrained by their limitations.</para>

188

189 <para id="x_8a">The first generation began by managing single files on

190 individual computers. Although these tools represented a huge

191 advance over ad-hoc manual revision control, their locking model

192 and reliance on a single computer limited them to small,

193 tightly-knit teams.</para>

194

195 <para id="x_8b">The second generation loosened these constraints by moving

196 to network-centered architectures, and managing entire projects

197 at a time. As projects grew larger, they ran into new problems.

198 With clients needing to talk to servers very frequently, server

199 scaling became an issue for large projects. An unreliable

200 network connection could prevent remote users from being able to

201 talk to the server at all. As open source projects started

202 making read-only access available anonymously to anyone, people

203 without commit privileges found that they could not use the

204 tools to interact with a project in a natural way, as they could

205 not record their changes.</para>

206

207 <para id="x_8c">The current generation of revision control tools is

208 peer-to-peer in nature. All of these systems have dropped the

209 dependency on a single central server, and allow people to

210 distribute their revision control data to where it's actually

211 needed. Collaboration over the Internet has moved from being

212 constrained by technology to being a matter of choice and consensus.

213 Modern tools can operate offline indefinitely and autonomously,

214 with a network connection only needed when syncing changes with

215 another repository.</para>

216

217 </sect1>

218 <sect1>

219 <title>A few of the advantages of distributed revision

220 control</title>

221

222 <para id="x_8d">Even though distributed revision control tools have for

223 several years been as robust and usable as their

224 previous-generation counterparts, people using older tools have

225 not necessarily woken up to their advantages yet. There are a

226 number of ways in which distributed tools shine relative to

227 centralised ones.</para>

228

229 <para id="x_8e">For an individual developer, distributed tools are almost

230 always much faster than centralised tools. This is for a simple

231 reason: a centralised tool needs to talk over the network for

232 many common operations, because most metadata is stored in a

233 single copy on the central server. A distributed tool stores

234 all of its metadata locally. All else being equal, talking over

235 the network adds overhead to a centralised tool. Don't

236 underestimate the value of a snappy, responsive tool: you're

237 going to spend a lot of time interacting with your revision

238 control software.</para>

239

240 <para id="x_8f">Distributed tools are indifferent to the vagaries of your

241 server infrastructure, again because they replicate metadata to

242 so many locations. If you use a centralised system and your

243 server catches fire, you'd better hope that your backup media

244 are reliable, and that your last backup was recent and actually

245 worked. With a distributed tool, you have many backups

246 available on every contributor's computer.</para>

247

248 <para id="x_90">The reliability of your network will affect distributed

249 tools far less than it will centralised tools. You can't even

250 use a centralised tool without a network connection, except for

251 a few highly constrained commands. With a distributed tool, if

252 your network connection goes down while you're working, you may

253 not even notice. The only thing you won't be able to do is talk

254 to repositories on other computers, something that is relatively

255 rare compared with local operations. If you have a far-flung

256 team of collaborators, this may be significant.</para>

257

258 <sect2>

259 <title>Advantages for open source projects</title>

260

261 <para id="x_91">If you take a shine to an open source project and decide

262 that you would like to start hacking on it, and that project

263 uses a distributed revision control tool, you are at once a

264 peer with the people who consider themselves the

265 <quote>core</quote> of that project. If they publish their

266 repositories, you can immediately copy their project history,

267 start making changes, and record your work, using the same

268 tools in the same ways as insiders. By contrast, with a

269 centralised tool, you must use the software in a <quote>read

270 only</quote> mode unless someone grants you permission to

271 commit changes to their central server. Until then, you won't

272 be able to record changes, and your local modifications will

273 be at risk of corruption any time you try to update your

274 client's view of the repository.</para>

275

276 <sect3>

277 <title>The forking non-problem</title>

278

279 <para id="x_92">It has been suggested that distributed revision control

280 tools pose some sort of risk to open source projects because

281 they make it easy to <quote>fork</quote> the development of

282 a project. A fork happens when there are differences in

283 opinion or attitude between groups of developers that cause

284 them to decide that they can't work together any longer.

285 Each side takes a more or less complete copy of the

286 project's source code, and goes off in its own

287 direction.</para>

288

289 <para id="x_93">Sometimes the camps in a fork decide to reconcile their

290 differences. With a centralised revision control system, the

291 <emphasis>technical</emphasis> process of reconciliation is

292 painful, and has to be performed largely by hand. You have

293 to decide whose revision history is going to

294 <quote>win</quote>, and graft the other team's changes into

295 the tree somehow. This usually loses some or all of one

296 side's revision history.</para>

297

298 <para id="x_94">As far as forking is concerned, distributed tools

299 make forking the <emphasis>only</emphasis> way to

300 develop a project. Every single change that you make is

301 potentially a fork point. The great strength of this

302 approach is that a distributed revision control tool has to

303 be really good at <emphasis>merging</emphasis> forks,

304 because forks are absolutely fundamental: they happen all

305 the time.</para>

306

307 <para id="x_95">If every piece of work that everybody does, all the

308 time, is framed in terms of forking and merging, then what

309 the open source world refers to as a <quote>fork</quote>

310 becomes <emphasis>purely</emphasis> a social issue. If

311 anything, distributed tools <emphasis>lower</emphasis> the

312 likelihood of a fork:</para>

313 <itemizedlist>

314 <listitem><para id="x_96">They eliminate the social distinction that

315 centralised tools impose: that between insiders (people

316 with commit access) and outsiders (people

317 without).</para></listitem>

318 <listitem><para id="x_97">They make it easier to reconcile after a

319 social fork, because all that's involved from the

320 perspective of the revision control software is just

321 another merge.</para></listitem></itemizedlist>

322

323 <para id="x_98">Some people resist distributed tools because they want

324 to retain tight control over their projects, and they

325 believe that centralised tools give them this control.

326 However, if you're of this belief, and you publish your CVS

327 or Subversion repositories publicly, there are plenty of

328 tools available that can pull out your entire project's

329 history (albeit slowly) and recreate it somewhere that you

330 don't control. So while your control in this case is

331 illusory, you are forgoing the ability to fluidly

332 collaborate with whatever people feel compelled to mirror

333 and fork your history.</para>

334

335 </sect3>

336 </sect2>

337 <sect2>

338 <title>Advantages for commercial projects</title>

339

340 <para id="x_99">Many commercial projects are undertaken by teams that are

341 scattered across the globe. Contributors who are far from a

342 central server will see slower command execution and perhaps

343 less reliability. Commercial revision control systems attempt

344 to ameliorate these problems with remote-site replication

345 add-ons that are typically expensive to buy and cantankerous

346 to administer. A distributed system doesn't suffer from these

347 problems in the first place. Better yet, you can easily set

348 up multiple authoritative servers, say one per site, so that

349 there's no redundant communication between repositories over

350 expensive long-haul network links.</para>

351

352 <para id="x_9a">Centralised revision control systems tend to have

353 relatively low scalability. It's not unusual for an expensive

354 centralised system to fall over under the combined load of

355 just a few dozen concurrent users. Once again, the typical

356 response tends to be an expensive and clunky replication

357 facility. Since the load on a central server&emdash;if you have

358 one at all&emdash;is many times lower with a distributed tool

359 (because all of the data is replicated everywhere), a single

360 cheap server can handle the needs of a much larger team, and

361 replication to balance load becomes a simple matter of

362 scripting.</para>

363

364 <para id="x_9b">If you have an employee in the field, troubleshooting a

365 problem at a customer's site, they'll benefit from distributed

366 revision control. The tool will let them generate custom

367 builds, try different fixes in isolation from each other, and

368 search efficiently through history for the sources of bugs and

369 regressions in the customer's environment, all without needing

370 to connect to your company's network.</para>

371

372 </sect2>

373 </sect1>

374 <sect1>

375 <title>Why choose Mercurial?</title>

376

377 <para id="x_9c">Mercurial has a unique set of properties that make it a

378 particularly good choice as a revision control system.</para>

379 <itemizedlist>

380 <listitem><para id="x_9d">It is easy to learn and use.</para></listitem>

381 <listitem><para id="x_9e">It is lightweight.</para></listitem>

382 <listitem><para id="x_9f">It scales excellently.</para></listitem>

383 <listitem><para id="x_a0">It is easy to

384 customise.</para></listitem></itemizedlist>

385

386 <para id="x_a1">If you are at all familiar with revision control systems,

387 you should be able to get up and running with Mercurial in less

388 than five minutes. Even if not, it will take no more than a few

389 minutes longer. Mercurial's command and feature sets are

390 generally uniform and consistent, so you can keep track of a few

391 general rules instead of a host of exceptions.</para>

392

393 <para id="x_a2">On a small project, you can start working with Mercurial in

394 moments. Creating new changes and branches; transferring changes

395 around (whether locally or over a network); and history and

396 status operations are all fast. Mercurial attempts to stay

397 nimble and largely out of your way by combining low cognitive

398 overhead with blazingly fast operations.</para>

399

400 <para id="x_a3">The usefulness of Mercurial is not limited to small

401 projects: it is used by projects with hundreds to thousands of

402 contributors, each containing tens of thousands of files and

403 hundreds of megabytes of source code.</para>

404

405 <para id="x_a4">If the core functionality of Mercurial is not enough for

406 you, it's easy to build on. Mercurial is well suited to

407 scripting tasks, and its clean internals and implementation in

408 Python make it easy to add features in the form of extensions.

409 There are a number of popular and useful extensions already

410 available, ranging from helping to identify bugs to improving

411 performance.</para>

412

413 </sect1>

414 <sect1>

415 <title>Mercurial compared with other tools</title>

416

417 <para id="x_a5">Before you read on, please understand that this section

418 necessarily reflects my own experiences, interests, and (dare I

419 say it) biases. I have used every one of the revision control

420 tools listed below, in most cases for several years at a

421 time.</para>

422

423

424 <sect2>

425 <title>Subversion</title>

426

427 <para id="x_a6">Subversion is a popular revision control tool, developed

428 to replace CVS. It has a centralised client/server

429 architecture.</para>

430

431 <para id="x_a7">Subversion and Mercurial have similarly named commands for

432 performing the same operations, so if you're familiar with

433 one, it is easy to learn to use the other. Both tools are

434 portable to all popular operating systems.</para>

435

436 <para id="x_a8">Prior to version 1.5, Subversion had no useful support for

437 merges. At the time of writing, its merge tracking capability

438 is new, and known to be <ulink

439 url="http://svnbook.red-bean.com/nightly/en/svn.branchmerge.advanced.html#svn.branchmerge.advanced.finalword">complicated

440 and buggy</ulink>.</para>

441

442 <para id="x_a9">Mercurial has a substantial performance advantage over

443 Subversion on every revision control operation I have

444 benchmarked. I have measured its advantage as ranging from a

445 factor of two to a factor of six when compared with Subversion

446 1.4.3's <emphasis>ra_local</emphasis> file store, which is the

447 fastest access method available. In more realistic

448 deployments involving a network-based store, Subversion will

449 be at a substantially larger disadvantage. Because many

450 Subversion commands must talk to the server and Subversion

451 does not have useful replication facilities, server capacity

452 and network bandwidth become bottlenecks for modestly large

453 projects.</para>

454

455 <para id="x_aa">Additionally, Subversion incurs substantial storage

456 overhead to avoid network transactions for a few common

457 operations, such as finding modified files

458 (<literal>status</literal>) and displaying modifications

459 against the current revision (<literal>diff</literal>). As a

460 result, a Subversion working copy is often the same size as,

461 or larger than, a Mercurial repository and working directory,

462 even though the Mercurial repository contains a complete

463 history of the project.</para>

464

465 <para id="x_ab">Subversion is widely supported by third party tools.

466 Mercurial currently lags considerably in this area. This gap

467 is closing, however, and indeed some of Mercurial's GUI tools

468 now outshine their Subversion equivalents. Like Mercurial,

469 Subversion has an excellent user manual.</para>

470

471 <para id="x_ac">Because Subversion doesn't store revision history on the

472 client, it is well suited to managing projects that deal with

473 lots of large, opaque binary files. If you check in fifty

474 revisions to an incompressible 10MB file, Subversion's

475 client-side space usage stays constant. The space used by any

476 distributed SCM will grow rapidly in proportion to the number

477 of revisions, because the differences between each revision

478 are large.</para>
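<para>As a rough worked example of the arithmetic, assume the worst
case, in which each revision's delta is about as large as the file
itself. Real tools may do somewhat better than this upper bound, and
the function name below is hypothetical.</para>

<programlisting><![CDATA[
```python
def worst_case_store_mb(file_size_mb, revisions):
    """Upper-bound estimate: every revision costs about one full copy.

    Assumes incompressible content whose revisions share almost nothing,
    so storing a delta saves no space over storing the whole file.
    """
    return file_size_mb * revisions


# Fifty revisions of an incompressible 10MB file, kept in full on every clone.
full_history_mb = worst_case_store_mb(10, 50)
print(full_history_mb)  # 500
```
]]></programlisting>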

479

480 <para id="x_ad">In addition, it's often difficult or, more usually,

481 impossible to merge different versions of a binary file.

482 Subversion's ability to let a user lock a file, so that they

483 temporarily have the exclusive right to commit changes to it,

484 can be a significant advantage to a project where binary files

485 are widely used.</para>

486

487 <para id="x_ae">Mercurial can import revision history from a Subversion

488 repository. It can also export revision history to a

489 Subversion repository. This makes it easy to <quote>test the

490 waters</quote> and use Mercurial and Subversion in parallel

491 before deciding to switch. History conversion is incremental,

492 so you can perform an initial conversion, then small

493 additional conversions afterwards to bring in new

494 changes.</para>

495

496

497 </sect2>

498 <sect2>

499 <title>Git</title>

500

501 <para id="x_af">Git is a distributed revision control tool that was

502 developed for managing the Linux kernel source tree. Like

503 Mercurial, its early design was somewhat influenced by

504 Monotone.</para>

505

506 <para id="x_b0">Git has a very large command set, with version 1.5.0

507 providing 139 individual commands. It has something of a

508 reputation for being difficult to learn. Compared to Git,

509 Mercurial has a strong focus on simplicity.</para>

510

511 <para id="x_b1">In terms of performance, Git is extremely fast. In

512 several cases, it is faster than Mercurial, at least on Linux,

513 while Mercurial performs better on other operations. However,

514 on Windows, the performance and general level of support that

515 Git provides are, at the time of writing, far behind those of

516 Mercurial.</para>

517

518 <para id="x_b2">While a Mercurial repository needs no maintenance, a Git

519 repository requires frequent manual <quote>repacks</quote> of

520 its metadata. Without these, performance degrades, while

521 space usage grows rapidly. A server that contains many Git

522 repositories that are not rigorously and frequently repacked

523 will become heavily disk-bound during backups, and there have

524 been instances of daily backups taking far longer than 24

525 hours as a result. A freshly packed Git repository is

526 slightly smaller than a Mercurial repository, but an unpacked

527 repository is several orders of magnitude larger.</para>

528

529 <para id="x_b3">The core of Git is written in C. Many Git commands are

530 implemented as shell or Perl scripts, and the quality of these

531 scripts varies widely. I have encountered several instances

532 where scripts charged along blindly in the presence of errors

533 that should have been fatal.</para>

534

535 <para id="x_b4">Mercurial can import revision history from a Git

536 repository.</para>

537

538

539 </sect2>

540 <sect2>

541 <title>CVS</title>

542

543 <para id="x_b5">CVS is probably the most widely used revision control tool

544 in the world. Due to its age and internal untidiness, it has

545 been only lightly maintained for many years.</para>

546

547 <para id="x_b6">It has a centralised client/server architecture. It does

548 not group related file changes into atomic commits, making it

549 easy for people to <quote>break the build</quote>: one person

550 can successfully commit part of a change and then be blocked

551 by the need for a merge, causing other people to see only a

552 portion of the work the committer intended to do. This also affects

553 how you work with project history. If you want to see all of

554 the modifications someone made as part of a task, you will

555 need to manually inspect the descriptions and timestamps of

556 the changes made to each file involved (if you even know what

557 those files were).</para>

558

559 <para id="x_b7">CVS has a muddled notion of tags and branches that I will

560 not even attempt to describe. It does not support renaming of

561 files or directories well, making it easy to corrupt a

562 repository. It has almost no internal consistency checking

563 capabilities, so it is usually not even possible to tell

564 whether or how a repository is corrupt. I would not recommend

565 CVS for any project, existing or new.</para>

566

567 <para id="x_b8">Mercurial can import CVS revision history. However, there

568 are a few caveats that apply; these are true of every other

569 revision control tool's CVS importer, too. Due to CVS's lack

570 of atomic changes and unversioned filesystem hierarchy, it is

571 not possible to reconstruct CVS history completely accurately;

572 some guesswork is involved, and renames will usually not show

573 up. Because a lot of advanced CVS administration has to be

574 done by hand and is hence error-prone, it's common for CVS

575 importers to run into multiple problems with corrupted

576 repositories (completely bogus revision timestamps and files

577 that have remained locked for over a decade are just two of

578 the less interesting problems I can recall from personal

579 experience).</para>

580

581 <para id="x_b9">Mercurial can import revision history from a CVS

582 repository.</para>

583

584

585 </sect2>

586 <sect2>

587 <title>Commercial tools</title>

588

589 <para id="x_ba">Perforce has a centralised client/server architecture,

590 with no client-side caching of any data. Unlike modern

591 revision control tools, Perforce requires that a user run a

592 command to inform the server about every file they intend to

593 edit.</para>

594

595 <para id="x_bb">The performance of Perforce is quite good for small teams,

596 but it falls off rapidly as the number of users grows beyond a

597 few dozen. Modestly large Perforce installations require the

598 deployment of proxies to cope with the load their users

599 generate.</para>

600

601

602 </sect2>

603 <sect2>

604 <title>Choosing a revision control tool</title>

605

606 <para id="x_bc">With the exception of CVS, all of the tools listed above

607 have unique strengths that suit them to particular styles of

608 work. There is no single revision control tool that is best

609 in all situations.</para>

610

611 <para id="x_bd">As an example, Subversion is a good choice for working

612 with frequently edited binary files, due to its centralised

613 nature and support for file locking.</para>

614

615 <para id="x_be">I personally find Mercurial's properties of simplicity,

616 performance, and good merge support to be a compelling

617 combination that has served me well for several years.</para>

618

619

620 </sect2>

621 </sect1>

622 <sect1>

623 <title>Switching from another tool to Mercurial</title>

624

625 <para id="x_bf">Mercurial is bundled with an extension named <literal

626 role="hg-ext">convert</literal>, which can incrementally

627 import revision history from several other revision control

628 tools. By <quote>incremental</quote>, I mean that you can

629 convert all of a project's history to date in one go, then rerun

630 the conversion later to obtain new changes that happened after

631 the initial conversion.</para>

632

633 <para id="x_c0">The revision control tools supported by <literal

634 role="hg-ext">convert</literal> are as follows:</para>

635 <itemizedlist>

636 <listitem><para id="x_c1">Subversion</para></listitem>

637 <listitem><para id="x_c2">CVS</para></listitem>

638 <listitem><para id="x_c3">Git</para></listitem>

639 <listitem><para id="x_c4">Darcs</para></listitem></itemizedlist>

640

641 <para id="x_c5">In addition, <literal role="hg-ext">convert</literal> can

642 export changes from Mercurial to Subversion. This makes it

643 possible to try Subversion and Mercurial in parallel before

644 committing to a switchover, without risking the loss of any

645 work.</para>

646

647 <para id="x_c6">The <command role="hg-ext-convert">convert</command> command

648 is easy to use. Simply point it at the path or URL of the

649 source repository, optionally give it the name of the

650 destination repository, and it will start working. After the

651 initial conversion, just run the same command again to import

652 new changes.</para>
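<para>As a minimal sketch of that workflow, assume a Subversion project at the hypothetical URL <literal>http://svn.example.com/myproject/trunk</literal> and a hypothetical destination repository named <literal>myproject-hg</literal>; neither name comes from a real project. The <option>--config</option> setting below enables the bundled extension for a single invocation, which is one of several ways to turn it on.</para>

<programlisting>
```shell
# Initial conversion: import all history to date.
# Both the source URL and the destination name are placeholders.
hg --config extensions.convert= convert http://svn.example.com/myproject/trunk myproject-hg

# Later: rerun the identical command to import only the changes
# made in the source repository since the previous run.
hg --config extensions.convert= convert http://svn.example.com/myproject/trunk myproject-hg
```
</programlisting>

<para>Adding a line reading <literal>convert =</literal> to the <literal>[extensions]</literal> section of your <filename>~/.hgrc</filename> should have the same effect permanently, after which the <option>--config</option> setting can be dropped.</para>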

653 </sect1>

654

655 <sect1>

656 <title>A short history of revision control</title>

657

658 <para id="x_c7">The best known of the old-time revision control tools is

659 SCCS (Source Code Control System), which Marc Rochkind wrote at

660 Bell Labs in the early 1970s. SCCS operated on individual

661 files, and required every person working on a project to have

662 access to a shared workspace on a single system. Only one

663 person could modify a file at any time; arbitration for access

664 to files was via locks. It was common for people to lock files,

665 and later forget to unlock them, preventing anyone else from

666 modifying those files without the help of an

667 administrator.</para>

668

669 <para id="x_c8">Walter Tichy developed a free alternative to SCCS in the

670 early 1980s; he called his program RCS (Revision Control System).

671 Like SCCS, RCS required developers to work in a single shared

672 workspace, and to lock files to prevent multiple people from

673 modifying them simultaneously.</para>

674

675 <para id="x_c9">Later in the 1980s, Dick Grune used RCS as a building block

676 for a set of shell scripts he initially called cmt, but then

677 renamed to CVS (Concurrent Versions System). The big innovation

678 of CVS was that it let developers work simultaneously and

679 somewhat independently in their own personal workspaces. The

680 personal workspaces prevented developers from stepping on each

681 other's toes all the time, as was common with SCCS and RCS. Each

682 developer had a copy of every project file, and could modify

683 their copies independently. They had to merge their edits prior

684 to committing changes to the central repository.</para>

685

686 <para id="x_ca">Brian Berliner took Grune's original scripts and rewrote

687 them in C, releasing in 1989 the code that has since developed

688 into the modern version of CVS. CVS subsequently acquired the

689 ability to operate over a network connection, giving it a

690 client/server architecture. CVS's architecture is centralised;

691 only the server has a copy of the history of the project. Client

692 workspaces just contain copies of recent versions of the

693 project's files, and a little metadata to tell them where the

694 server is. CVS has been enormously successful; it is probably

695 the world's most widely used revision control system.</para>

696

697 <para id="x_cb">In the early 1990s, Sun Microsystems developed an early

698 distributed revision control system, called TeamWare. A

699 TeamWare workspace contains a complete copy of the project's

700 history. TeamWare has no notion of a central repository. (CVS

701 relied upon RCS for its history storage; TeamWare used

702 SCCS.)</para>

703

704 <para id="x_cc">As the 1990s progressed, awareness grew of a number of

705 problems with CVS. It records simultaneous changes to multiple

706 files individually, instead of grouping them together as a

707 single logically atomic operation. It does not manage its file

708 hierarchy well; it is easy to make a mess of a repository by

709 renaming files and directories. Worse, its source code is

710 difficult to read and maintain, which makes the <quote>pain

711 level</quote> of fixing these architectural problems

712 prohibitive.</para>

713

714 <para id="x_cd">In 2000, Jim Blandy and Karl Fogel, two developers who had

715 worked on CVS, started a project to replace it with a tool that

716 would have a better architecture and cleaner code. The result,

717 Subversion, does not stray from CVS's centralised client/server

718 model, but it adds multi-file atomic commits, better namespace

719 management, and a number of other features that make it a

720 generally better tool than CVS. Since its initial release, it

721 has rapidly grown in popularity.</para>

722

723 <para id="x_ce">More or less simultaneously, Graydon Hoare began working on

724 an ambitious distributed revision control system that he named

725 Monotone. It addresses many of CVS's design flaws

726 and has a peer-to-peer architecture, but it also goes beyond earlier (and

727 subsequent) revision control tools in a number of innovative

728 ways. It uses cryptographic hashes as identifiers, and has an

729 integral notion of <quote>trust</quote> for code from different

730 sources.</para>

731

732 <para id="x_cf">Mercurial began life in 2005. While a few aspects of its

733 design are influenced by Monotone, Mercurial focuses on ease of

734 use, high performance, and scalability to very large

735 projects.</para>

736

737 </sect1>

738

739 <sect1>

740 <title>Colophon&emdash;this book is Free</title>

741

742 <para id="x_d0">This book is licensed under the Open Publication License,

743 and is produced entirely using Free Software tools. It is

744 typeset with DocBook XML. Illustrations are drawn and rendered with

745 <ulink url="http://www.inkscape.org/">Inkscape</ulink>.</para>

746

747 <para id="x_d1">The complete source code for this book is published as a

748 Mercurial repository, at <ulink

749 url="http://hg.serpentine.com/mercurial/book">http://hg.serpentine.com/mercurial/book</ulink>.</para>

750

751 </sect1>

752 </preface>

753 <!--

754 local variables:

755 sgml-parent-document: ("00book.xml" "book" "preface")

756 end:

757 -->