hgbook: 4c9b9416cd23 en/undo.tex


view en/undo.tex @ 223:4c9b9416cd23

Skeleton for chapter on extensions.

author	Bryan O'Sullivan <bos@serpentine.com>
date	Tue May 15 14:55:54 2007 -0700 (2007-05-15)
parents	9bba958be4c6
children	3f53563c7579


1 \chapter{Finding and fixing your mistakes}

2 \label{chap:undo}

4 To err might be human, but to really handle the consequences well

5 takes a top-notch revision control system. In this chapter, we'll

6 discuss some of the techniques you can use when you find that a

7 problem has crept into your project. Mercurial has some highly

8 capable features that will help you to isolate the sources of

9 problems, and to handle them appropriately.

11 \section{Erasing local history}

13 \subsection{The accidental commit}

15 I have the occasional but persistent problem of typing rather more

16 quickly than I can think, which sometimes results in me committing a

17 changeset that is either incomplete or plain wrong. In my case, the

18 usual kind of incomplete changeset is one in which I've created a new

19 source file, but forgotten to \hgcmd{add} it. A ``plain wrong''

20 changeset is not as common, but no less annoying.

22 \subsection{Rolling back a transaction}

23 \label{sec:undo:rollback}

25 In section~\ref{sec:concepts:txn}, I mentioned that Mercurial treats

26 each modification of a repository as a \emph{transaction}. Every time

27 you commit a changeset or pull changes from another repository,

28 Mercurial remembers what you did. You can undo, or \emph{roll back},

29 exactly one of these actions using the \hgcmd{rollback} command. (See

30 section~\ref{sec:undo:rollback-after-push} for an important caveat

31 about the use of this command.)

33 Here's a mistake that I often find myself making: committing a change

34 in which I've created a new file, but forgotten to \hgcmd{add} it.

35 \interaction{rollback.commit}

36 Looking at the output of \hgcmd{status} after the commit immediately

37 confirms the error.

38 \interaction{rollback.status}

39 The commit captured the changes to the file \filename{a}, but not the

40 new file \filename{b}. If I were to push this changeset to a

41 repository that I shared with a colleague, the chances are high that

42 something in \filename{a} would refer to \filename{b}, which would not

43 be present in their repository when they pulled my changes. I would

44 thus become the object of some indignation.

46 However, luck is with me---I've caught my error before I pushed the

47 changeset. I use the \hgcmd{rollback} command, and Mercurial makes

48 that last changeset vanish.

49 \interaction{rollback.rollback}

50 Notice that the changeset is no longer present in the repository's

51 history, and the working directory once again thinks that the file

52 \filename{a} is modified. The commit and rollback have left the

53 working directory exactly as it was prior to the commit; the changeset

54 has been completely erased. I can now safely \hgcmd{add} the file

55 \filename{b}, and rerun my commit.

56 \interaction{rollback.add}

58 \subsection{The erroneous pull}

60 It's common practice with Mercurial to maintain separate development

61 branches of a project in different repositories. Your development

62 team might have one shared repository for your project's ``0.9''

63 release, and another, containing different changes, for the ``1.0''

64 release.

66 Given this, you can imagine that the consequences could be messy if

67 you had a local ``0.9'' repository, and accidentally pulled changes

68 from the shared ``1.0'' repository into it. At worst, you could be

69 paying insufficient attention, and push those changes into the shared

70 ``0.9'' tree, confusing your entire team (but don't worry, we'll

71 return to this horror scenario later). However, it's more likely that

72 you'll notice immediately, because Mercurial will display the URL it's

73 pulling from, or you will see it pull a suspiciously large number of

74 changes into the repository.

76 The \hgcmd{rollback} command will work nicely to expunge all of the

77 changesets that you just pulled. Mercurial groups all changes from

78 one \hgcmd{pull} into a single transaction, so one \hgcmd{rollback} is

79 all you need to undo this mistake.

81 \subsection{Rolling back is useless once you've pushed}

82 \label{sec:undo:rollback-after-push}

84 The value of the \hgcmd{rollback} command drops to zero once you've

85 pushed your changes to another repository. Rolling back a change

86 makes it disappear entirely, but \emph{only} in the repository in

87 which you perform the \hgcmd{rollback}. Because a rollback eliminates

88 history, there's no way for the disappearance of a change to propagate

89 between repositories.

91 If you've pushed a change to another repository---particularly if it's

92 a shared repository---it has essentially ``escaped into the wild,''

93 and you'll have to recover from your mistake in a different way. What

94 will happen if you push a changeset somewhere, then roll it back, then

95 pull from the repository you pushed to, is that the changeset will

96 reappear in your repository.

98 (If you absolutely know for sure that the change you want to roll back

99 is the most recent change in the repository that you pushed to,

100 \emph{and} you know that nobody else could have pulled it from that

101 repository, you can roll back the changeset there, too, but you really

102 should not rely on this working reliably. If you do this,

103 sooner or later a change really will make it into a repository that

104 you don't directly control (or have forgotten about), and come back to

105 bite you.)

106

107 \subsection{You can only roll back once}

108

109 Mercurial stores exactly one transaction in its transaction log; that

110 transaction is the most recent one that occurred in the repository.

111 This means that you can only roll back one transaction. If you expect

112 to be able to roll back one transaction, then its predecessor, this is

113 not the behaviour you will get.

114 \interaction{rollback.twice}

115 Once you've rolled back one transaction in a repository, you can't

116 roll back again in that repository until you perform another commit or

117 pull.
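
The single-slot behaviour described above can be pictured with a toy model. The following Python sketch is not how Mercurial is implemented; the class \texttt{ToyRepo} and its methods are hypothetical names invented for illustration. It remembers only enough to undo the most recent transaction, so a second rollback has nothing left to work with.

```python
class ToyRepo:
    """Toy model of a repository that can undo only its latest transaction."""

    def __init__(self):
        self.changesets = []   # committed or pulled changesets, oldest first
        self._undo = None      # length of the history before the last transaction

    def commit(self, name):
        # A commit is one transaction that adds one changeset.
        self._undo = len(self.changesets)
        self.changesets.append(name)

    def pull(self, names):
        # A pull is one transaction, however many changesets it brings in.
        self._undo = len(self.changesets)
        self.changesets.extend(names)

    def rollback(self):
        # Only the most recent transaction is remembered.
        if self._undo is None:
            return False       # nothing to roll back
        del self.changesets[self._undo:]
        self._undo = None
        return True


repo = ToyRepo()
repo.commit("a")
repo.pull(["b", "c", "d"])
first = repo.rollback()    # undoes the whole pull in one step
second = repo.rollback()   # no transaction is left to undo
print(repo.changesets, first, second)
```

In this model, as in the text, a new commit or pull is what makes another rollback possible.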

118

119 \section{Reverting the mistaken change}

120

121 If you make a modification to a file, and decide that you really

122 didn't want to change the file at all, and you haven't yet committed

123 your changes, the \hgcmd{revert} command is the one you'll need. It

124 looks at the changeset that's the parent of the working directory, and

125 restores the contents of the file to their state as of that changeset.

126 (That's a long-winded way of saying that, in the normal case, it

127 undoes your modifications.)

128

129 Let's illustrate how the \hgcmd{revert} command works with yet another

130 small example. We'll begin by modifying a file that Mercurial is

131 already tracking.

132 \interaction{daily.revert.modify}

133 If we don't want that change, we can simply \hgcmd{revert} the file.

134 \interaction{daily.revert.unmodify}

135 The \hgcmd{revert} command provides us with an extra degree of safety

136 by saving our modified file with a \filename{.orig} extension.

137 \interaction{daily.revert.status}

138

139 Here is a summary of the cases that the \hgcmd{revert} command can

140 deal with. We will describe each of these in more detail in the

141 section that follows.

142 \begin{itemize}

143 \item If you modify a file, it will restore the file to its unmodified

144 state.

145 \item If you \hgcmd{add} a file, it will undo the ``added'' state of

146 the file, but leave the file itself untouched.

147 \item If you delete a file without telling Mercurial, it will restore

148 the file to its unmodified contents.

149 \item If you use the \hgcmd{remove} command to remove a file, it will

150 undo the ``removed'' state of the file, and restore the file to its

151 unmodified contents.

152 \end{itemize}

153

154 \subsection{File management errors}

155 \label{sec:undo:mgmt}

156

157 The \hgcmd{revert} command is useful for more than just modified

158 files. It lets you reverse the results of all of Mercurial's file

159 management commands---\hgcmd{add}, \hgcmd{remove}, and so on.

160

161 If you \hgcmd{add} a file, then decide that in fact you don't want

162 Mercurial to track it, use \hgcmd{revert} to undo the add. Don't

163 worry; Mercurial will not modify the file in any way. It will just

164 ``unmark'' the file.

165 \interaction{daily.revert.add}

166

167 Similarly, if you ask Mercurial to \hgcmd{remove} a file, you can use

168 \hgcmd{revert} to restore it to the contents it had as of the parent

169 of the working directory.

170 \interaction{daily.revert.remove}

171 This works just as well for a file that you deleted by hand, without

172 telling Mercurial (recall that in Mercurial terminology, this kind of

173 file is called ``missing'').

174 \interaction{daily.revert.missing}

175

176 If you revert a \hgcmd{copy}, the copied-to file remains in your

177 working directory afterwards, untracked. Since a copy doesn't affect

178 the copied-from file in any way, Mercurial doesn't do anything with

179 the copied-from file.

180 \interaction{daily.revert.copy}

181

182 \subsubsection{A slightly special case: reverting a rename}

183

184 If you \hgcmd{rename} a file, there is one small detail that

185 you should remember. When you \hgcmd{revert} a rename, it's not

186 enough to provide the name of the renamed-to file, as you can see

187 here.

188 \interaction{daily.revert.rename}

189 As you can see from the output of \hgcmd{status}, the renamed-to file

190 is no longer identified as added, but the renamed-\emph{from} file is

191 still removed! This is counter-intuitive (at least to me), but at

192 least it's easy to deal with.

193 \interaction{daily.revert.rename-orig}

194 So remember, to revert a \hgcmd{rename}, you must provide \emph{both}

195 the source and destination names.

196

197 (By the way, if you rename a file, then modify the renamed-to file,

198 then revert both components of the rename, when Mercurial restores the

199 file that was removed as part of the rename, it will be unmodified.

200 If you need the modifications in the renamed-to file to show up in the

201 renamed-from file, don't forget to copy them over.)

202

203 These fiddly aspects of reverting a rename arguably constitute a small

204 bug in Mercurial.

205

206 \section{Dealing with committed changes}

207

208 Consider a case where you have committed a change $a$, and another

209 change $b$ on top of it; you then realise that change $a$ was

210 incorrect. Mercurial lets you ``back out'' an entire changeset

211 automatically, and provides building blocks that let you reverse part of a

212 changeset by hand.

213

214 Before you read this section, here's something to keep in mind: the

215 \hgcmd{backout} command undoes changes by \emph{adding} history, not

216 by modifying or erasing it. It's the right tool to use if you're

217 fixing bugs, but not if you're trying to undo some change that has

218 catastrophic consequences. To deal with those, see

219 section~\ref{sec:undo:aaaiiieee}.

220

221 \subsection{Backing out a changeset}

222

223 The \hgcmd{backout} command lets you ``undo'' the effects of an entire

224 changeset in an automated fashion. Because Mercurial's history is

225 immutable, this command \emph{does not} get rid of the changeset you

226 want to undo. Instead, it creates a new changeset that

227 \emph{reverses} the effect of the to-be-undone changeset.

228

229 The operation of the \hgcmd{backout} command is a little intricate, so

230 let's illustrate it with some examples. First, we'll create a

231 repository with some simple changes.

232 \interaction{backout.init}

233

234 The \hgcmd{backout} command takes a single changeset ID as its

235 argument; this is the changeset to back out. Normally,

236 \hgcmd{backout} will drop you into a text editor to write a commit

237 message, so you can record why you're backing the change out. In this

238 example, we provide a commit message on the command line using the

239 \hgopt{backout}{-m} option.

240

241 \subsection{Backing out the tip changeset}

242

243 We're going to start by backing out the last changeset we committed.

244 \interaction{backout.simple}

245 You can see that the second line from \filename{myfile} is no longer

246 present. Taking a look at the output of \hgcmd{log} gives us an idea

247 of what the \hgcmd{backout} command has done.

248 \interaction{backout.simple.log}

249 Notice that the new changeset that \hgcmd{backout} has created is a

250 child of the changeset we backed out. It's easier to see this in

251 figure~\ref{fig:undo:backout}, which presents a graphical view of the

252 change history. As you can see, the history is nice and linear.

253

254 \begin{figure}[htb]

255 \centering

256 \grafix{undo-simple}

257 \caption{Backing out a change using the \hgcmd{backout} command}

258 \label{fig:undo:backout}

259 \end{figure}

260

261 \subsection{Backing out a non-tip change}

262

263 If you want to back out a change other than the last one you

264 committed, pass the \hgopt{backout}{--merge} option to the

265 \hgcmd{backout} command.

266 \interaction{backout.non-tip.clone}

267 This makes backing out any changeset a ``one-shot'' operation that's

268 usually simple and fast.

269 \interaction{backout.non-tip.backout}

270

271 If you take a look at the contents of \filename{myfile} after the

272 backout finishes, you'll see that the first and third changes are

273 present, but not the second.

274 \interaction{backout.non-tip.cat}

275

276 As the graphical history in figure~\ref{fig:undo:backout-non-tip}

277 illustrates, Mercurial actually commits \emph{two} changes in this

278 kind of situation (the box-shaped nodes are the ones that Mercurial

279 commits automatically). Before Mercurial begins the backout process,

280 it first remembers what the current parent of the working directory

281 is. It then backs out the target changeset, and commits that as a

282 changeset. Finally, it merges back to the previous parent of the

283 working directory, and commits the result of the merge.

284

285 \begin{figure}[htb]

286 \centering

287 \grafix{undo-non-tip}

288 \caption{Automated backout of a non-tip change using the \hgcmd{backout} command}

289 \label{fig:undo:backout-non-tip}

290 \end{figure}

291

292 The result is that you end up ``back where you were'', only with some

293 extra history that undoes the effect of the changeset you wanted to

294 back out.

295

296 \subsubsection{Always use the \hgopt{backout}{--merge} option}

297

298 In fact, since the \hgopt{backout}{--merge} option will do the ``right

299 thing'' whether or not the changeset you're backing out is the tip

300 (i.e.~it won't try to merge if it's backing out the tip, since there's

301 no need), you should \emph{always} use this option when you run the

302 \hgcmd{backout} command.

303

304 \subsection{Gaining more control of the backout process}

305

306 While I've recommended that you always use the

307 \hgopt{backout}{--merge} option when backing out a change, the

308 \hgcmd{backout} command lets you decide how to merge a backout

309 changeset. Taking control of the backout process by hand is something

310 you will rarely need to do, but it can be useful to understand what

311 the \hgcmd{backout} command is doing for you automatically. To

312 illustrate this, let's clone our first repository, but omit the

313 backout change that it contains.

314

315 \interaction{backout.manual.clone}

316 As with our earlier example, we'll commit a third changeset, then back

317 out its parent, and see what happens.

318 \interaction{backout.manual.backout}

319 Our new changeset is again a descendant of the changeset we backed

320 out; it's thus a new head, \emph{not} a descendant of the changeset

321 that was the tip. The \hgcmd{backout} command was quite explicit in

322 telling us this.

323 \interaction{backout.manual.log}

324

325 Again, it's easier to see what has happened by looking at a graph of

326 the revision history, in figure~\ref{fig:undo:backout-manual}. This

327 makes it clear that when we use \hgcmd{backout} to back out a change

328 other than the tip, Mercurial adds a new head to the repository (the

329 change it committed is box-shaped).

330

331 \begin{figure}[htb]

332 \centering

333 \grafix{undo-manual}

334 \caption{Backing out a change using the \hgcmd{backout} command}

335 \label{fig:undo:backout-manual}

336 \end{figure}

337

338 After the \hgcmd{backout} command has completed, it leaves the new

339 ``backout'' changeset as the parent of the working directory.

340 \interaction{backout.manual.parents}

341 Now we have two isolated sets of changes.

342 \interaction{backout.manual.heads}

343

344 Let's think about what we expect to see as the contents of

345 \filename{myfile} now. The first change should be present, because

346 we've never backed it out. The second change should be missing, as

347 that's the change we backed out. Since the history graph shows the

348 third change as a separate head, we \emph{don't} expect to see the

349 third change present in \filename{myfile}.

350 \interaction{backout.manual.cat}

351 To get the third change back into the file, we just do a normal merge

352 of our two heads.

353 \interaction{backout.manual.merge}

354 Afterwards, the graphical history of our repository looks like

355 figure~\ref{fig:undo:backout-manual-merge}.

356

357 \begin{figure}[htb]

358 \centering

359 \grafix{undo-manual-merge}

360 \caption{Manually merging a backout change}

361 \label{fig:undo:backout-manual-merge}

362 \end{figure}

363

364 \subsection{Why \hgcmd{backout} works as it does}

365

366 Here's a brief description of how the \hgcmd{backout} command works.

367 \begin{enumerate}

368 \item It ensures that the working directory is ``clean'', i.e.~that

369 the output of \hgcmd{status} would be empty.

370 \item It remembers the current parent of the working directory. Let's

371 call this changeset \texttt{orig}.

372 \item It does the equivalent of a \hgcmd{update} to sync the working

373 directory to the changeset you want to back out. Let's call this

374 changeset \texttt{backout}.

375 \item It finds the parent of that changeset. Let's call that

376 changeset \texttt{parent}.

377 \item For each file that the \texttt{backout} changeset affected, it

378 does the equivalent of a \hgcmdargs{revert}{-r parent} on that file,

379 to restore it to the contents it had before that changeset was

380 committed.

381 \item It commits the result as a new changeset. This changeset has

382 \texttt{backout} as its parent.

383 \item If you specify \hgopt{backout}{--merge} on the command line, it

384 merges with \texttt{orig}, and commits the result of the merge.

385 \end{enumerate}
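
The steps above can be sketched with a toy model. This is only an illustration, not Mercurial's merge machinery: each changeset is reduced to the set of lines in \filename{myfile}, and the function \texttt{three\_way\_merge} is a hypothetical helper that merges two such sets relative to their common ancestor. The names \texttt{orig}, \texttt{backout} and \texttt{parent} follow the list above.

```python
def three_way_merge(base, ours, theirs):
    """Merge two sets of lines relative to their common ancestor."""
    kept = base & ours & theirs              # lines that neither side removed
    added = (ours - base) | (theirs - base)  # lines that either side added
    return kept | added


# Toy history: each changeset is the set of lines in myfile.
parent = {"first change"}                                  # before the bad change
backout = {"first change", "second change"}                # changeset to back out
orig = {"first change", "second change", "third change"}   # working directory parent

# Update to `backout`, revert its files to their state in `parent`, and
# commit: the new changeset has the contents of `parent`.
backout_commit = set(parent)

# With --merge: merge the new head with `orig`. Their common ancestor is
# the `backout` changeset itself.
merged = three_way_merge(backout, backout_commit, orig)
print(sorted(merged))
```

The merged result keeps the first and third changes and drops the second, because the merge sees the removal of the second change on one side and the addition of the third on the other.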

386

387 An alternative way to implement the \hgcmd{backout} command would be

388 to \hgcmd{export} the to-be-backed-out changeset as a diff, then use

389 the \cmdopt{patch}{--reverse} option to the \command{patch} command to

390 reverse the effect of the change without fiddling with the working

391 directory. This sounds much simpler, but it would not work nearly as

392 well.

393

394 The reason that \hgcmd{backout} does an update, a commit, a merge, and

395 another commit is to give the merge machinery the best chance to do a

396 good job when dealing with all the changes \emph{between} the change

397 you're backing out and the current tip.

398

399 If you're backing out a changeset that's~100 revisions back in your

400 project's history, the chances that the \command{patch} command will

401 be able to apply a reverse diff cleanly are not good, because

402 intervening changes are likely to have ``broken the context'' that

403 \command{patch} uses to determine whether it can apply a patch (if

404 this sounds like gibberish, see section~\ref{sec:mq:patch} for a

405 discussion of the \command{patch} command). Also, Mercurial's merge

406 machinery will handle files and directories being renamed, permission

407 changes, and modifications to binary files, none of which

408 \command{patch} can deal with.

409

410 \section{Changes that should never have been}

411 \label{sec:undo:aaaiiieee}

412

413 Most of the time, the \hgcmd{backout} command is exactly what you need

414 if you want to undo the effects of a change. It leaves a permanent

415 record of exactly what you did, both when committing the original

416 changeset and when you cleaned up after it.

417

418 On rare occasions, though, you may find that you've committed a change

419 that really should not be present in the repository at all. For

420 example, it would be very unusual, and usually considered a mistake,

421 to commit a software project's object files as well as its source

422 files. Object files have almost no intrinsic value, and they're

423 \emph{big}, so they increase the size of the repository and the amount

424 of time it takes to clone or pull changes.

425

426 Before I discuss the options that you have if you commit a ``brown

427 paper bag'' change (the kind that's so bad that you want to pull a

428 brown paper bag over your head), let me first discuss some approaches

429 that probably won't work.

430

431 Since Mercurial treats history as accumulative---every change builds

432 on top of all changes that preceded it---you generally can't just make

433 disastrous changes disappear. The one exception is when you've just

434 committed a change, and it hasn't been pushed or pulled into another

435 repository. That's when you can safely use the \hgcmd{rollback}

436 command, as I detailed in section~\ref{sec:undo:rollback}.

437

438 After you've pushed a bad change to another repository, you

439 \emph{could} still use \hgcmd{rollback} to make your local copy of the

440 change disappear, but it won't have the consequences you want. The

441 change will still be present in the remote repository, so it will

442 reappear in your local repository the next time you pull.

443

444 If a situation like this arises, and you know which repositories your

445 bad change has propagated into, you can \emph{try} to get rid of the

446 change from \emph{every} one of those repositories. This is, of

447 course, not a satisfactory solution: if you miss even a single

448 repository while you're expunging, the change is still ``in the

449 wild'', and could propagate further.

450

451 If you've committed one or more changes \emph{after} the change that

452 you'd like to see disappear, your options are further reduced.

453 Mercurial doesn't provide a way to ``punch a hole'' in history,

454 leaving the surrounding changesets intact.

455

456 XXX This needs filling out. The \texttt{hg-replay} script in the

457 \texttt{examples} directory works, but doesn't handle merge

458 changesets. Kind of an important omission.

459

460 \subsection{Protect yourself from ``escaped'' changes}

461

462 If you've committed some changes to your local repository and they've

463 been pushed or pulled somewhere else, this isn't necessarily a

464 disaster. You can protect yourself ahead of time against some classes

465 of bad changeset. This is particularly easy if your team usually

466 pulls changes from a central repository.

467

468 By configuring some hooks on that repository to validate incoming

469 changesets (see chapter~\ref{chap:hook}), you can automatically

470 prevent some kinds of bad changeset from being pushed to the central

471 repository at all. With such a configuration in place, some kinds of

472 bad changeset will naturally tend to ``die out'' because they can't

473 propagate into the central repository. Better yet, this happens

474 without any need for explicit intervention.

475

476 For instance, an incoming change hook that verifies that a changeset

477 will actually compile can prevent people from inadvertently ``breaking

478 the build''.

479

480 \section{Finding the source of a bug}

481 \label{sec:undo:bisect}

482

483 While it's all very well to be able to back out a changeset that

484 introduced a bug, this requires that you know which changeset to back

485 out. Mercurial provides an invaluable extension, called

486 \hgext{bisect}, that helps you to automate this process and accomplish

487 it very efficiently.

488

489 The idea behind the \hgext{bisect} extension is that a changeset has

490 introduced some change of behaviour that you can identify with a

491 simple binary test. You don't know which piece of code introduced the

492 change, but you know how to test for the presence of the bug. The

493 \hgext{bisect} extension uses your test to direct its search for the

494 changeset that introduced the code that caused the bug.

495

496 Here are a few scenarios to help you understand how you might apply this

497 extension.

498 \begin{itemize}

499 \item The most recent version of your software has a bug that you

500 remember wasn't present a few weeks ago, but you don't know when it

501 was introduced. Here, your binary test checks for the presence of

502 that bug.

503 \item You fixed a bug in a rush, and now it's time to close the entry

504 in your team's bug database. The bug database requires a changeset

505 ID when you close an entry, but you don't remember which changeset

506 you fixed the bug in. Once again, your binary test checks for the

507 presence of the bug.

508 \item Your software works correctly, but runs~15\% slower than the

509 last time you measured it. You want to know which changeset

510 introduced the performance regression. In this case, your binary

511 test measures the performance of your software, to see whether it's

512 ``fast'' or ``slow''.

513 \item The sizes of the components of your project that you ship

514 exploded recently, and you suspect that something changed in the way

515 you build your project.

516 \end{itemize}

517

518 From these examples, it should be clear that the \hgext{bisect}

519 extension is useful for more than just finding the sources of bugs. You can

520 use it to find any ``emergent property'' of a repository (anything

521 that you can't find from a simple text search of the files in the

522 tree) for which you can write a binary test.

523

524 We'll introduce a little bit of terminology here, just to make it

525 clear which parts of the search process are your responsibility, and

526 which are Mercurial's. A \emph{test} is something that \emph{you} run

527 when \hgext{bisect} chooses a changeset. A \emph{probe} is what

528 \hgext{bisect} runs to tell whether a revision is good. Finally,

529 we'll use the word ``bisect'', as both a noun and a verb, to stand in

530 for the phrase ``search using the \hgext{bisect} extension''.

531

532 One simple way to automate the searching process would be simply to

533 probe every changeset. However, this scales poorly. If it took ten

534 minutes to test a single changeset, and you had 10,000 changesets in

535 your repository, the exhaustive approach would take on average~35

536 \emph{days} to find the changeset that introduced a bug. Even if you

537 knew that the bug was introduced by one of the last 500 changesets,

538 and limited your search to those, you'd still be looking at over 40

539 hours to find the changeset that introduced your bug.

540

541 What the \hgext{bisect} extension does is use its knowledge of the

542 ``shape'' of your project's revision history to perform a search in

543 time proportional to the \emph{logarithm} of the number of changesets

544 to check (the kind of search it performs is called a dichotomic

545 search). With this approach, searching through 10,000 changesets will

546 take less than three hours, even at ten minutes per test. Limit your

547 search to the last 500 changesets, and it will take about an hour and a half.
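
To make the arithmetic behind these estimates explicit, assume that an exhaustive search probes, on average, half of the $n$ candidate changesets, and that a bisection needs about $\lceil \log_2 n \rceil$ probes. Both are simplifying assumptions that ignore the shape of the history. At ten minutes per test, the exhaustive search costs
\[
  \frac{10000}{2} \times 10 = 50000~\mbox{minutes} \approx 34.7~\mbox{days},
  \qquad
  \frac{500}{2} \times 10 = 2500~\mbox{minutes} \approx 41.7~\mbox{hours},
\]
while the bisection costs
\[
  \lceil \log_2 10000 \rceil \times 10 = 14 \times 10 = 140~\mbox{minutes},
  \qquad
  \lceil \log_2 500 \rceil \times 10 = 9 \times 10 = 90~\mbox{minutes}.
\]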

548

549 The \hgext{bisect} extension is aware of the ``branchy'' nature of a

550 Mercurial project's revision history, so it has no problems dealing

551 with branches, merges, or multiple heads in a repository. It can

552 prune entire branches of history with a single probe, which is how it

553 operates so efficiently.

554

555 \subsection{Using the \hgext{bisect} extension}

556

557 Here's an example of \hgext{bisect} in action. To keep the core of

558 Mercurial simple, \hgext{bisect} is packaged as an extension; this

559 means that it won't be present unless you explicitly enable it. To do

560 this, edit your \hgrc\ and add the following section header (if it's

561 not already present):

562 \begin{codesample2}

563 [extensions]

564 \end{codesample2}

565 Then add a line to this section to enable the extension:

566 \begin{codesample2}

567 hbisect =

568 \end{codesample2}

569 \begin{note}

570 That's right, there's a ``\texttt{h}'' at the front of the name of

571 the \hgext{bisect} extension. The reason is that Mercurial is

572 written in Python, and uses a standard Python package called

573 \texttt{bisect}. If you omit the ``\texttt{h}'' from the name

574 ``\texttt{hbisect}'', Mercurial will erroneously find the standard

575 Python \texttt{bisect} package, and try to use it as a Mercurial

576 extension. This won't work, and Mercurial will crash repeatedly

577 until you fix the spelling in your \hgrc. Ugh.

578 \end{note}

579

580 Now let's create a repository, so that we can try out the

581 \hgext{bisect} extension in isolation.

582 \interaction{bisect.init}

583 We'll simulate a project that has a bug in it in a simple-minded way:

584 create trivial changes in a loop, and nominate one specific change

585 that will have the ``bug''. This loop creates 50 changesets, each

586 adding a single file to the repository. We'll represent our ``bug''

587 with a file that contains the text ``i have a gub''.

588 \interaction{bisect.commits}

589

590 The next thing that we'd like to do is figure out how to use the

591 \hgext{bisect} extension. We can use Mercurial's normal built-in help

592 mechanism for this.

593 \interaction{bisect.help}

594

595 The \hgext{bisect} extension works in steps. Each step proceeds as follows.

596 \begin{enumerate}

597 \item You run your binary test.

598 \begin{itemize}

599 \item If the test succeeded, you tell \hgext{bisect} by running the

600 \hgcmdargs{bisect}{good} command.

601 \item If it failed, use the \hgcmdargs{bisect}{bad} command to let

602 the \hgext{bisect} extension know.

603 \end{itemize}

604 \item The extension uses your information to decide which changeset to

605 test next.

606 \item It updates the working directory to that changeset, and the

607 process begins again.

608 \end{enumerate}

609 The process ends when \hgext{bisect} identifies a unique changeset

610 that marks the point where your test transitioned from ``succeeding''

611 to ``failing''.
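
The loop described above can be sketched in a few lines of Python. This is a simplified illustration rather than the extension's actual code: it assumes a strictly linear history in which every revision after the first bad one is also bad, whereas \hgext{bisect} works on the full revision graph. The function \texttt{bisect\_search} and the revision numbers are hypothetical.

```python
def bisect_search(good, bad, is_bad):
    """Find the first bad revision between a known-good and a known-bad one.

    Assumes a linear history in which every revision after the first bad
    one is also bad. Returns (first_bad_revision, number_of_tests_run).
    """
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2   # the changeset chosen for the next test
        tests += 1
        if is_bad(mid):           # corresponds to reporting "bad"
            bad = mid
        else:                     # corresponds to reporting "good"
            good = mid
    return bad, tests


FIRST_BAD = 22  # hypothetical revision that introduced the bug

found, tests = bisect_search(good=10, bad=50, is_bad=lambda rev: rev >= FIRST_BAD)
print(found, tests)
```

Each pass through the loop halves the range between the newest known-good and the oldest known-bad revision, which is why the number of tests grows with the logarithm of the number of changesets.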

612

613 To start the search, we must run the \hgcmdargs{bisect}{init} command.

614 \interaction{bisect.search.init}

615

616 In our case, the binary test we use is simple: we check to see if any

617 file in the repository contains the string ``i have a gub''. If it

618 does, this changeset contains the change that ``caused the bug''. By

619 convention, a changeset that has the property we're searching for is

620 ``bad'', while one that doesn't is ``good''.

621

622 Most of the time, the revision to which the working directory is

623 synced (usually the tip) already exhibits the problem introduced by

624 the buggy change, so we'll mark it as ``bad''.

625 \interaction{bisect.search.bad-init}

626

627 Our next task is to nominate a changeset that we know \emph{doesn't}

628 have the bug; the \hgext{bisect} extension will ``bracket'' its search

629 between the first pair of good and bad changesets. In our case, we

630 know that revision~10 didn't have the bug. (I'll have more words

631 about choosing the first ``good'' changeset later.)

632 \interaction{bisect.search.good-init}

633

634 Notice that this command printed some output.

635 \begin{itemize}

636 \item It told us how many changesets it must consider before it can

637 identify the one that introduced the bug, and how many tests that

638 will require.

639 \item It updated the working directory to the next changeset to test,

640 and told us which changeset it's testing.

641 \end{itemize}

642

643 We now run our test in the working directory. We use the

644 \command{grep} command to see if our ``bad'' file is present in the

645 working directory. If it is, this revision is bad; if not, this

646 revision is good.

647 \interaction{bisect.search.step1}

648

649 This test looks like a perfect candidate for automation, so let's turn

650 it into a shell function.

651 \interaction{bisect.search.mytest}

652 We can now run an entire test step with a single command,

653 \texttt{mytest}.

654 \interaction{bisect.search.step2}

655 A few more invocations of our canned test step command, and we're

656 done.

657 \interaction{bisect.search.rest}

658

659 Even though we had~40 changesets to search through, the \hgext{bisect}

660 extension let us find the changeset that introduced our ``bug'' with

661 only five tests. Because the number of tests that the \hgext{bisect}

662 extension performs grows logarithmically with the number of changesets to

663 search, the advantage that it has over the ``brute force'' search

664 approach increases with every changeset you add.
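To put a rough number on this, assume a linear history of $n$ candidate changesets in which each test halves the remaining range. The number of tests $t$ is then approximately
\[
  t \approx \log_2 n, \qquad \log_2 40 \approx 5.3,
\]
so a search over about~40 changesets takes five or six tests, whereas testing the changesets one at a time may require checking nearly all of them.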

665

666 \subsection{Cleaning up after your search}

667

668 When you're finished using the \hgext{bisect} extension in a

669 repository, you can use the \hgcmdargs{bisect}{reset} command to drop

670 the information it was using to drive your search. The extension

671 doesn't use much space, so it doesn't matter if you forget to run this

672 command. However, \hgext{bisect} won't let you start a new search in

673 that repository until you do a \hgcmdargs{bisect}{reset}.

674 \interaction{bisect.search.reset}

675

676 \section{Tips for finding bugs effectively}

677

678 \subsection{Give consistent input}

679

680 The \hgext{bisect} extension requires that you correctly report the

681 result of every test you perform. If you tell it that a test failed

682 when it really succeeded, it \emph{might} be able to detect the

683 inconsistency. If it can identify an inconsistency in your reports,

684 it will tell you that a particular changeset is both good and bad.

685 However, it can't do this perfectly; it's about as likely to report

686 the wrong changeset as the source of the bug.

687

688 \subsection{Automate as much as possible}

689

690 When I started using the \hgext{bisect} extension, I tried a few times

691 to run my tests by hand, on the command line. This is an approach

692 that I, at least, am not suited to. After a few tries, I found that I

693 was making enough mistakes that I was having to restart my searches

694 several times before finally getting correct results.

695

696 My initial problems with driving the \hgext{bisect} extension by hand

697 occurred even with simple searches on small repositories; if the

698 problem you're looking for is more subtle, or the number of tests that

699 \hgext{bisect} must perform increases, the likelihood of operator

700 error ruining the search is much higher. Once I started automating my

701 tests, I had much better results.

702

703 The key to automated testing is twofold:

704 \begin{itemize}

705 \item always test for the same symptom, and

706 \item always feed consistent input to the \hgcmd{bisect} command.

707 \end{itemize}

708 In my tutorial example above, the \command{grep} command tests for the

709 symptom, and the \texttt{if} statement takes the result of this check

710 and ensures that we always feed the same input to the \hgcmd{bisect}

711 command. The \texttt{mytest} function marries these together in a

712 reproducible way, so that every test is uniform and consistent.
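The same two ideas can be sketched in Python, as an alternative to the shell function. This is only an illustration under stated assumptions: the function name \texttt{verdict} is hypothetical, it looks only at regular files directly inside the given directory, and it does not invoke Mercurial itself. It always performs the same check and always answers with one of the same two words, which you would then pass on to the \hgcmd{bisect} command.

```python
import os
import tempfile


def verdict(directory, needle="i have a gub"):
    """Return "bad" if any regular file directly in `directory` contains
    `needle`, otherwise "good". Same check and same two answers every time."""
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories such as .hg
        with open(path, errors="replace") as f:
            if needle in f.read():
                return "bad"
    return "good"


if __name__ == "__main__":
    # Demonstrate on a throwaway directory rather than a real repository.
    with tempfile.TemporaryDirectory() as d:
        print(verdict(d))
        with open(os.path.join(d, "file"), "w") as f:
            f.write("i have a gub\n")
        print(verdict(d))
```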

713

714 \subsection{Check your results}

715

716 Because the output of a \hgext{bisect} search is only as good as the

717 input you give it, don't take the changeset it reports as the

718 absolute truth. A simple way to cross-check its report is to manually

719 run your test at each of the following changesets:

720 \begin{itemize}

721 \item The changeset that it reports as the first bad revision. Your

722 test should still report this as bad.

723 \item The parent of that changeset (either parent, if it's a merge).

724 Your test should report this changeset as good.

725 \item A child of that changeset. Your test should report this

726 changeset as bad.

727 \end{itemize}
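The checklist above can be written down as a small Python sketch. The names \texttt{cross\_check} and \texttt{is\_bad} are hypothetical; \texttt{is\_bad} stands for whatever test you ran during the search, and the caller is assumed to supply the parent and child revisions it wants to check.

```python
def cross_check(first_bad, parents, children, is_bad):
    """Return a list of problems found; an empty list means the report
    is consistent with the test at the revisions that were checked."""
    problems = []
    if not is_bad(first_bad):
        problems.append(f"{first_bad} was reported as first bad but tests good")
    for p in parents:
        if is_bad(p):
            problems.append(f"parent {p} tests bad")
    for c in children:
        if not is_bad(c):
            problems.append(f"child {c} tests good")
    return problems


if __name__ == "__main__":
    # Hypothetical linear history in which the bug first appears in revision 22.
    print(cross_check(22, [21], [23], lambda rev: rev >= 22))
```

An empty result does not prove that the report is right, only that it agrees with your test at the revisions you checked.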

728

729 \subsection{Beware interference between bugs}

730

731 It's possible that your search for one bug could be disrupted by the

732 presence of another. For example, let's say your software crashes at

733 revision 100, and worked correctly at revision 50. Unknown to you,

734 someone else introduced a different crashing bug at revision 60, and

735 fixed it at revision 80. This could distort your results in one of

736 several ways.

737

738 It is possible that this other bug completely ``masks'' yours, which

739 is to say that it occurs before your bug has a chance to manifest

740 itself. If you can't avoid that other bug (for example, it prevents

741 your project from building), and so can't tell whether your bug is

742 present in a particular changeset, the \hgext{bisect} extension cannot

743 help you directly. Instead, you'll need to manually avoid the

744 changesets where that bug is present, and do separate searches

745 ``around'' it.

746

747 A different problem could arise if your test for a bug's presence is

748 not specific enough. If your test checks for ``my program crashes'', then

749 both your crashing bug and an unrelated crashing bug that masks it

750 will look like the same thing, and mislead \hgext{bisect}.
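One way to make a test more specific is to look for something that identifies your bug, not just for a crash. The following Python sketch assumes that your bug leaves a recognisable message in the program's output; the function name \texttt{classify} and the \texttt{signature} parameter are hypothetical. The third answer, ``unknown'', is not something to report to \hgext{bisect}: it marks a changeset that you would have to handle by hand.

```python
def classify(exit_code, output, signature):
    """Classify one test run.

    "good"    - the program ran without crashing
    "bad"     - it crashed and the output carries our bug's signature
    "unknown" - it crashed for some other reason, so we cannot tell
    """
    if exit_code == 0:
        return "good"
    if signature in output:
        return "bad"
    return "unknown"


if __name__ == "__main__":
    # Hypothetical outputs from three different changesets.
    print(classify(0, "all fine", "gub detected"))
    print(classify(1, "fatal: gub detected", "gub detected"))
    print(classify(1, "fatal: out of memory", "gub detected"))
```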

751

752 \subsection{Bracket your search lazily}

753

754 Choosing the first ``good'' and ``bad'' changesets that will mark the

755 end points of your search is often easy, but it bears a little

756 discussion nevertheless. From the perspective of \hgext{bisect}, the

757 ``newest'' changeset is conventionally ``bad'', and the older

758 changeset is ``good''.

759

760 If you're having trouble remembering when a suitable ``good'' change

761 was, so that you can tell \hgext{bisect}, you could do worse than

762 testing changesets at random. Just remember to eliminate contenders

763 that can't possibly exhibit the bug (perhaps because the feature with

764 the bug isn't present yet) and those where another problem masks the

765 bug (as I discussed above).

766

767 Even if you end up ``early'' by thousands of changesets or months of

768 history, you will only add a handful of tests to the total number that

769 \hgext{bisect} must perform, thanks to its logarithmic behaviour.
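As a back-of-the-envelope check, assume a linear history in which each test halves the remaining range. Widening the search from $n$ to $n + k$ changesets then adds about
\[
  \log_2 (n + k) - \log_2 n = \log_2\!\left(1 + \frac{k}{n}\right)
\]
tests. For example, going from $n = 100$ to $100 + 3000$ changesets adds roughly $\log_2 31 \approx 5$ tests.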

770

771 %%% Local Variables:

772 %%% mode: latex

773 %%% TeX-master: "00book"

774 %%% End: