hgbook: 76697ae503db en/undo.tex

view en/undo.tex @ 197:76697ae503db

Local branches.

author	Bryan O'Sullivan <bos@serpentine.com>
date	Mon Apr 16 16:33:51 2007 -0700 (2007-04-16)
parents	26b7a4e943aa
children	9bba958be4c6

1 \chapter{Finding and fixing your mistakes}

2 \label{chap:undo}

4 To err might be human, but to really handle the consequences well

5 takes a top-notch revision control system. In this chapter, we'll

6 discuss some of the techniques you can use when you find that a

7 problem has crept into your project. Mercurial has some highly

8 capable features that will help you to isolate the sources of

9 problems, and to handle them appropriately.

11 \section{Erasing local history}

13 \subsection{The accidental commit}

15 I have the occasional but persistent problem of typing rather more

16 quickly than I can think, which sometimes results in me committing a

17 changeset that is either incomplete or plain wrong. In my case, the

18 usual kind of incomplete changeset is one in which I've created a new

19 source file, but forgotten to \hgcmd{add} it. A ``plain wrong''

20 changeset is not as common, but no less annoying.

22 \subsection{Rolling back a transaction}

23 \label{sec:undo:rollback}

25 In section~\ref{sec:concepts:txn}, I mentioned that Mercurial treats

26 each modification of a repository as a \emph{transaction}. Every time

27 you commit a changeset or pull changes from another repository,

28 Mercurial remembers what you did. You can undo, or \emph{roll back},

29 exactly one of these actions using the \hgcmd{rollback} command.

31 Here's a mistake that I often find myself making: committing a change

32 in which I've created a new file, but forgotten to \hgcmd{add} it.

33 \interaction{rollback.commit}

34 Looking at the output of \hgcmd{status} after the commit immediately

35 confirms the error.

36 \interaction{rollback.status}

37 The commit captured the changes to the file \filename{a}, but not the

38 new file \filename{b}. If I were to push this changeset to a

39 repository that I shared with a colleague, the chances are high that

40 something in \filename{a} would refer to \filename{b}, which would not

41 be present in their repository when they pulled my changes. I would

42 thus become the object of some indignation.

44 However, luck is with me---I've caught my error before I pushed the

45 changeset. I use the \hgcmd{rollback} command, and Mercurial makes

46 that last changeset vanish.

47 \interaction{rollback.rollback}

48 Notice that the changeset is no longer present in the repository's

49 history, and the working directory once again thinks that the file

50 \filename{a} is modified. The commit and rollback have left the

51 working directory exactly as it was prior to the commit; the changeset

52 has been completely erased. I can now safely \hgcmd{add} the file

53 \filename{b}, and rerun my commit.

54 \interaction{rollback.add}

56 \subsection{The erroneous pull}

58 It's common practice with Mercurial to maintain separate development

59 branches of a project in different repositories. Your development

60 team might have one shared repository for your project's ``0.9''

61 release, and another, containing different changes, for the ``1.0''

62 release.

64 Given this, you can imagine that the consequences could be messy if

65 you had a local ``0.9'' repository, and accidentally pulled changes

66 from the shared ``1.0'' repository into it. At worst, you could be

67 paying insufficient attention, and push those changes into the shared

68 ``0.9'' tree, confusing your entire team (but don't worry, we'll

69 return to this horror scenario later). However, it's more likely that

70 you'll notice immediately, because Mercurial will display the URL it's

71 pulling from, or you will see it pull a suspiciously large number of

72 changes into the repository.

74 The \hgcmd{rollback} command will work nicely to expunge all of the

75 changesets that you just pulled. Mercurial groups all changes from

76 one \hgcmd{pull} into a single transaction, so one \hgcmd{rollback} is

77 all you need to undo this mistake.

79 \subsection{Rolling back is useless once you've pushed}

81 The value of the \hgcmd{rollback} command drops to zero once you've

82 pushed your changes to another repository. Rolling back a change

83 makes it disappear entirely, but \emph{only} in the repository in

84 which you perform the \hgcmd{rollback}. Because a rollback eliminates

85 history, there's no way for the disappearance of a change to propagate

86 between repositories.

88 If you've pushed a change to another repository---particularly if it's

89 a shared repository---it has essentially ``escaped into the wild,''

90 and you'll have to recover from your mistake in a different way. What

91 will happen if you push a changeset somewhere, then roll it back, then

92 pull from the repository you pushed to, is that the changeset will

93 reappear in your repository.

95 (If you absolutely know for sure that the change you want to roll back

96 is the most recent change in the repository that you pushed to,

97 \emph{and} you know that nobody else could have pulled it from that

98 repository, you can roll back the changeset there, too, but you really

99 should not rely on this working reliably. If you do this,

100 sooner or later a change really will make it into a repository that

101 you don't directly control (or have forgotten about), and come back to

102 bite you.)

103

104 \subsection{You can only roll back once}

105

106 Mercurial stores exactly one transaction in its transaction log; that

107 transaction is the most recent one that occurred in the repository.

108 This means that you can only roll back one transaction. If you expect

109 to be able to roll back one transaction, then its predecessor, this is

110 not the behaviour you will get.

111 \interaction{rollback.twice}

112 Once you've rolled back one transaction in a repository, you can't

113 roll back again in that repository until you perform another commit or

114 pull.
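The one-entry transaction log can be pictured with a small toy model.
This is only an illustrative sketch, not how Mercurial stores its
journal; the class \texttt{ToyRepo} and its methods are hypothetical
names invented for the example.

```python
class ToyRepo:
    """Toy model of a repository with a one-entry transaction log."""

    def __init__(self):
        self.changesets = []   # history, oldest first
        self.last_txn = 0      # changesets added by the most recent transaction

    def commit(self, name):
        # A commit is a transaction that adds one changeset.
        self.changesets.append(name)
        self.last_txn = 1

    def pull(self, names):
        # A pull is a single transaction, however many changesets it brings in.
        self.changesets.extend(names)
        self.last_txn = len(names)

    def rollback(self):
        # Only the most recent transaction can be undone, and only once.
        if self.last_txn == 0:
            return False
        del self.changesets[-self.last_txn:]
        self.last_txn = 0
        return True


repo = ToyRepo()
repo.commit("a")
repo.pull(["b", "c", "d"])
first = repo.rollback()    # undoes the whole pull in one step
second = repo.rollback()   # nothing is left in the log to roll back
print(repo.changesets, first, second)
```

In this model the second rollback is refused, and the commit that
preceded the pull stays in the history.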

115

116 \section{Reverting the mistaken change}

117

118 If you make a modification to a file, and decide that you really

119 didn't want to change the file at all, and you haven't yet committed

120 your changes, the \hgcmd{revert} command is the one you'll need. It

121 looks at the changeset that's the parent of the working directory, and

122 restores the contents of the file to their state as of that changeset.

123 (That's a long-winded way of saying that, in the normal case, it

124 undoes your modifications.)

125

126 Let's illustrate how the \hgcmd{revert} command works with yet another

127 small example. We'll begin by modifying a file that Mercurial is

128 already tracking.

129 \interaction{daily.revert.modify}

130 If we don't want that change, we can simply \hgcmd{revert} the file.

131 \interaction{daily.revert.unmodify}

132 The \hgcmd{revert} command provides us with an extra degree of safety

133 by saving our modified file with a \filename{.orig} extension.

134 \interaction{daily.revert.status}

135

136 Here is a summary of the cases that the \hgcmd{revert} command can

137 deal with. We will describe each of these in more detail in the

138 section that follows.

139 \begin{itemize}

140 \item If you modify a file, it will restore the file to its unmodified

141 state.

142 \item If you \hgcmd{add} a file, it will undo the ``added'' state of

143 the file, but leave the file itself untouched.

144 \item If you delete a file without telling Mercurial, it will restore

145 the file to its unmodified contents.

146 \item If you use the \hgcmd{remove} command to remove a file, it will

147 undo the ``removed'' state of the file, and restore the file to its

148 unmodified contents.

149 \end{itemize}
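The four cases above can be summarised in a toy model. This is a
sketch under simplifying assumptions, not Mercurial's implementation:
files are plain dictionary entries, and the function \texttt{revert}
and its parameters are hypothetical names used only for illustration.

```python
def revert(name, parent, working, state):
    """Toy model of reverting one file.

    parent  -- file name -> contents in the parent of the working directory
    working -- file name -> contents currently on disk
    state   -- file name -> 'added' or 'removed' (absent: no pending add/remove)
    """
    if state.get(name) == "added":
        # Undo the "added" state, but leave the file on disk untouched.
        del state[name]
        return
    # Modified, missing, or removed: clear any pending state and
    # restore the contents recorded in the parent changeset.
    state.pop(name, None)
    working[name] = parent[name]


parent = {"modified": "old\n", "missing": "m\n", "removed": "r\n"}
working = {"modified": "new\n", "added": "fresh\n"}
state = {"added": "added", "removed": "removed"}

for name in ("modified", "added", "missing", "removed"):
    revert(name, parent, working, state)

print(working, state)
```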

150

151 \subsection{File management errors}

152 \label{sec:undo:mgmt}

153

154 The \hgcmd{revert} command is useful for more than just modified

155 files. It lets you reverse the results of all of Mercurial's file

156 management commands---\hgcmd{add}, \hgcmd{remove}, and so on.

157

158 If you \hgcmd{add} a file, then decide that in fact you don't want

159 Mercurial to track it, use \hgcmd{revert} to undo the add. Don't

160 worry; Mercurial will not modify the file in any way. It will just

161 ``unmark'' the file.

162 \interaction{daily.revert.add}

163

164 Similarly, if you ask Mercurial to \hgcmd{remove} a file, you can use

165 \hgcmd{revert} to restore it to the contents it had as of the parent

166 of the working directory.

167 \interaction{daily.revert.remove}

168 This works just as well for a file that you deleted by hand, without

169 telling Mercurial (recall that in Mercurial terminology, this kind of

170 file is called ``missing'').

171 \interaction{daily.revert.missing}

172

173 If you revert a \hgcmd{copy}, the copied-to file remains in your

174 working directory afterwards, untracked. Since a copy doesn't affect

175 the copied-from file in any way, Mercurial doesn't do anything with

176 the copied-from file.

177 \interaction{daily.revert.copy}

178

179 \subsubsection{A slightly special case: reverting a rename}

180

181 If you \hgcmd{rename} a file, there is one small detail that

182 you should remember. When you \hgcmd{revert} a rename, it's not

183 enough to provide the name of the renamed-to file, as you can see

184 here.

185 \interaction{daily.revert.rename}

186 As you can see from the output of \hgcmd{status}, the renamed-to file

187 is no longer identified as added, but the renamed-\emph{from} file is

188 still removed! This is counter-intuitive (at least to me), but at

189 least it's easy to deal with.

190 \interaction{daily.revert.rename-orig}

191 So remember, to revert a \hgcmd{rename}, you must provide \emph{both}

192 the source and destination names.

193

194 (By the way, if you rename a file, then modify the renamed-to file,

195 then revert both components of the rename, when Mercurial restores the

196 file that was removed as part of the rename, it will be unmodified.

197 If you need the modifications in the renamed-to file to show up in the

198 renamed-from file, don't forget to copy them over.)

199

200 These fiddly aspects of reverting a rename arguably constitute a small

201 bug in Mercurial.

202

203 \section{Dealing with committed changes}

204

205 Consider a case where you have committed a change $a$, and another

206 change $b$ on top of it; you then realise that change $a$ was

207 incorrect. Mercurial lets you ``back out'' an entire changeset

208 automatically, and provides building blocks that let you reverse part of a

209 changeset by hand.

210

211 Before you read this section, here's something to keep in mind: the

212 \hgcmd{backout} command undoes changes by \emph{adding} history, not

213 by modifying or erasing it. It's the right tool to use if you're

214 fixing bugs, but not if you're trying to undo some change that has

215 catastrophic consequences. To deal with those, see

216 section~\ref{sec:undo:aaaiiieee}.

217

218 \subsection{Backing out a changeset}

219

220 The \hgcmd{backout} command lets you ``undo'' the effects of an entire

221 changeset in an automated fashion. Because Mercurial's history is

222 immutable, this command \emph{does not} get rid of the changeset you

223 want to undo. Instead, it creates a new changeset that

224 \emph{reverses} the effect of the to-be-undone changeset.

225

226 The operation of the \hgcmd{backout} command is a little intricate, so

227 let's illustrate it with some examples. First, we'll create a

228 repository with some simple changes.

229 \interaction{backout.init}

230

231 The \hgcmd{backout} command takes a single changeset ID as its

232 argument; this is the changeset to back out. Normally,

233 \hgcmd{backout} will drop you into a text editor to write a commit

234 message, so you can record why you're backing the change out. In this

235 example, we provide a commit message on the command line using the

236 \hgopt{backout}{-m} option.

237

238 \subsection{Backing out the tip changeset}

239

240 We're going to start by backing out the last changeset we committed.

241 \interaction{backout.simple}

242 You can see that the second line from \filename{myfile} is no longer

243 present. Taking a look at the output of \hgcmd{log} gives us an idea

244 of what the \hgcmd{backout} command has done.

245 \interaction{backout.simple.log}

246 Notice that the new changeset that \hgcmd{backout} has created is a

247 child of the changeset we backed out. It's easier to see this in

248 figure~\ref{fig:undo:backout}, which presents a graphical view of the

249 change history. As you can see, the history is nice and linear.

250

251 \begin{figure}[htb]

252 \centering

253 \grafix{undo-simple}

254 \caption{Backing out a change using the \hgcmd{backout} command}

255 \label{fig:undo:backout}

256 \end{figure}

257

258 \subsection{Backing out a non-tip change}

259

260 If you want to back out a change other than the last one you

261 committed, pass the \hgopt{backout}{--merge} option to the

262 \hgcmd{backout} command.

263 \interaction{backout.non-tip.clone}

264 This makes backing out any changeset a ``one-shot'' operation that's

265 usually simple and fast.

266 \interaction{backout.non-tip.backout}

267

268 If you take a look at the contents of \filename{myfile} after the

269 backout finishes, you'll see that the first and third changes are

270 present, but not the second.

271 \interaction{backout.non-tip.cat}

272

273 As the graphical history in figure~\ref{fig:undo:backout-non-tip}

274 illustrates, Mercurial actually commits \emph{two} changes in this

275 kind of situation (the box-shaped nodes are the ones that Mercurial

276 commits automatically). Before Mercurial begins the backout process,

277 it first remembers what the current parent of the working directory

278 is. It then backs out the target changeset, and commits that as a

279 changeset. Finally, it merges back to the previous parent of the

280 working directory, and commits the result of the merge.

281

282 \begin{figure}[htb]

283 \centering

284 \grafix{undo-non-tip}

285 \caption{Automated backout of a non-tip change using the \hgcmd{backout} command}

286 \label{fig:undo:backout-non-tip}

287 \end{figure}

288

289 The result is that you end up ``back where you were'', only with some

290 extra history that undoes the effect of the changeset you wanted to

291 back out.

292

293 \subsubsection{Always use the \hgopt{backout}{--merge} option}

294

295 In fact, since the \hgopt{backout}{--merge} option will do the ``right

296 thing'' whether or not the changeset you're backing out is the tip

297 (i.e.~it won't try to merge if it's backing out the tip, since there's

298 no need), you should \emph{always} use this option when you run the

299 \hgcmd{backout} command.

300

301 \subsection{Gaining more control of the backout process}

302

303 While I've recommended that you always use the

304 \hgopt{backout}{--merge} option when backing out a change, the

305 \hgcmd{backout} command lets you decide how to merge a backout

306 changeset. Taking control of the backout process by hand is something

307 you will rarely need to do, but it can be useful to understand what

308 the \hgcmd{backout} command is doing for you automatically. To

309 illustrate this, let's clone our first repository, but omit the

310 backout change that it contains.

311

312 \interaction{backout.manual.clone}

313 As with our earlier example, we'll commit a third changeset, then back

314 out its parent, and see what happens.

315 \interaction{backout.manual.backout}

316 Our new changeset is again a descendant of the changeset we backed

317 out; it's thus a new head, \emph{not} a descendant of the changeset

318 that was the tip. The \hgcmd{backout} command was quite explicit in

319 telling us this.

320 \interaction{backout.manual.log}

321

322 Again, it's easier to see what has happened by looking at a graph of

323 the revision history, in figure~\ref{fig:undo:backout-manual}. This

324 makes it clear that when we use \hgcmd{backout} to back out a change

325 other than the tip, Mercurial adds a new head to the repository (the

326 change it committed is box-shaped).

327

328 \begin{figure}[htb]

329 \centering

330 \grafix{undo-manual}

331 \caption{Backing out a change using the \hgcmd{backout} command}

332 \label{fig:undo:backout-manual}

333 \end{figure}

334

335 After the \hgcmd{backout} command has completed, it leaves the new

336 ``backout'' changeset as the parent of the working directory.

337 \interaction{backout.manual.parents}

338 Now we have two isolated sets of changes.

339 \interaction{backout.manual.heads}

340

341 Let's think about what we expect to see as the contents of

342 \filename{myfile} now. The first change should be present, because

343 we've never backed it out. The second change should be missing, as

344 that's the change we backed out. Since the history graph shows the

345 third change as a separate head, we \emph{don't} expect to see the

346 third change present in \filename{myfile}.

347 \interaction{backout.manual.cat}

348 To get the third change back into the file, we just do a normal merge

349 of our two heads.

350 \interaction{backout.manual.merge}

351 Afterwards, the graphical history of our repository looks like

352 figure~\ref{fig:undo:backout-manual-merge}.

353

354 \begin{figure}[htb]

355 \centering

356 \grafix{undo-manual-merge}

357 \caption{Manually merging a backout change}

358 \label{fig:undo:backout-manual-merge}

359 \end{figure}

360

361 \subsection{Why \hgcmd{backout} works as it does}

362

363 Here's a brief description of how the \hgcmd{backout} command works.

364 \begin{enumerate}

365 \item It ensures that the working directory is ``clean'', i.e.~that

366 the output of \hgcmd{status} would be empty.

367 \item It remembers the current parent of the working directory. Let's

368 call this changeset \texttt{orig}.

369 \item It does the equivalent of a \hgcmd{update} to sync the working

370 directory to the changeset you want to back out. Let's call this

371 changeset \texttt{backout}.

372 \item It finds the parent of that changeset. Let's call that

373 changeset \texttt{parent}.

374 \item For each file that the \texttt{backout} changeset affected, it

375 does the equivalent of a \hgcmdargs{revert}{-r parent} on that file,

376 to restore it to the contents it had before that changeset was

377 committed.

378 \item It commits the result as a new changeset. This changeset has

379 \texttt{backout} as its parent.

380 \item If you specify \hgopt{backout}{--merge} on the command line, it

381 merges with \texttt{orig}, and commits the result of the merge.

382 \end{enumerate}
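The steps above can be traced on a toy history. The sketch below is
deliberately simplified and is not Mercurial's merge algorithm: a file
is modelled as a list of unique lines, and \texttt{merge3} is a
hypothetical helper that performs a naive three-way merge on such
lists.

```python
def merge3(base, local, other):
    """Naive three-way merge of files modelled as lists of unique lines."""
    merged = []
    for line in local + [l for l in other if l not in local]:
        in_base, in_local, in_other = line in base, line in local, line in other
        # Keep a line if both sides have it, or if one side added it.
        if (in_local and in_other) or (not in_base and (in_local or in_other)):
            merged.append(line)
    return merged


history = {
    0: ["first change"],
    1: ["first change", "second change"],
    2: ["first change", "second change", "third change"],
}
parents = {1: 0, 2: 1}

orig = 2                   # parent of the working directory before the backout
backout = 1                # the changeset to back out
parent = parents[backout]  # its parent

# Revert the file to the parent's contents; this becomes a child of `backout`.
backout_cset = list(history[parent])

# --merge: merge the new changeset with `orig`. Their common ancestor
# is `backout` itself, so its removal of a line wins over `orig`.
result = merge3(history[backout], backout_cset, history[orig])
print(result)
```

The merged file keeps the first and third changes and drops the second.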

383

384 An alternative way to implement the \hgcmd{backout} command would be

385 to \hgcmd{export} the to-be-backed-out changeset as a diff, then use

386 the \cmdopt{patch}{--reverse} option to the \command{patch} command to

387 reverse the effect of the change without fiddling with the working

388 directory. This sounds much simpler, but it would not work nearly as

389 well.

390

391 The reason that \hgcmd{backout} does an update, a commit, a merge, and

392 another commit is to give the merge machinery the best chance to do a

393 good job when dealing with all the changes \emph{between} the change

394 you're backing out and the current tip.

395

396 If you're backing out a changeset that's~100 revisions back in your

397 project's history, the chances that the \command{patch} command will

398 be able to apply a reverse diff cleanly are not good, because

399 intervening changes are likely to have ``broken the context'' that

400 \command{patch} uses to determine whether it can apply a patch (if

401 this sounds like gibberish, see section~\ref{sec:mq:patch} for a

402 discussion of the \command{patch} command). Also, Mercurial's merge

403 machinery will handle files and directories being renamed, permission

404 changes, and modifications to binary files, none of which

405 \command{patch} can deal with.

406

407 \section{Changes that should never have been}

408 \label{sec:undo:aaaiiieee}

409

410 Most of the time, the \hgcmd{backout} command is exactly what you need

411 if you want to undo the effects of a change. It leaves a permanent

412 record of exactly what you did, both when committing the original

413 changeset and when you cleaned up after it.

414

415 On rare occasions, though, you may find that you've committed a change

416 that really should not be present in the repository at all. For

417 example, it would be very unusual, and usually considered a mistake,

418 to commit a software project's object files as well as its source

419 files. Object files have almost no intrinsic value, and they're

420 \emph{big}, so they increase the size of the repository and the amount

421 of time it takes to clone or pull changes.

422

423 Before I discuss the options that you have if you commit a ``brown

424 paper bag'' change (the kind that's so bad that you want to pull a

425 brown paper bag over your head), let me first discuss some approaches

426 that probably won't work.

427

428 Since Mercurial treats history as accumulative---every change builds

429 on top of all changes that preceded it---you generally can't just make

430 disastrous changes disappear. The one exception is when you've just

431 committed a change, and it hasn't been pushed or pulled into another

432 repository. That's when you can safely use the \hgcmd{rollback}

433 command, as I detailed in section~\ref{sec:undo:rollback}.

434

435 After you've pushed a bad change to another repository, you

436 \emph{could} still use \hgcmd{rollback} to make your local copy of the

437 change disappear, but it won't have the consequences you want. The

438 change will still be present in the remote repository, so it will

439 reappear in your local repository the next time you pull.

440

441 If a situation like this arises, and you know which repositories your

442 bad change has propagated into, you can \emph{try} to get rid of the

443 change from \emph{every} one of those repositories. This is, of

444 course, not a satisfactory solution: if you miss even a single

445 repository while you're expunging, the change is still ``in the

446 wild'', and could propagate further.

447

448 If you've committed one or more changes \emph{after} the change that

449 you'd like to see disappear, your options are further reduced.

450 Mercurial doesn't provide a way to ``punch a hole'' in history,

451 leaving the surrounding changesets intact.

452

453 XXX This needs filling out. The \texttt{hg-replay} script in the

454 \texttt{examples} directory works, but doesn't handle merge

455 changesets. Kind of an important omission.

456

457 \section{Finding the source of a bug}

458

459 While it's all very well to be able to back out a changeset that

460 introduced a bug, this requires that you know which changeset to back

461 out. Mercurial provides an invaluable extension, called

462 \hgext{bisect}, that helps you to automate this process and accomplish

463 it very efficiently.

464

465 The idea behind the \hgext{bisect} extension is that a changeset has

466 introduced some change of behaviour that you can identify with a

467 simple binary test. You don't know which piece of code introduced the

468 change, but you know how to test for the presence of the bug. The

469 \hgext{bisect} extension uses your test to direct its search for the

470 changeset that introduced the code that caused the bug.

471

472 Here are a few scenarios to help you understand how you might apply this

473 extension.

474 \begin{itemize}

475 \item The most recent version of your software has a bug that you

476 remember wasn't present a few weeks ago, but you don't know when it

477 was introduced. Here, your binary test checks for the presence of

478 that bug.

479 \item You fixed a bug in a rush, and now it's time to close the entry

480 in your team's bug database. The bug database requires a changeset

481 ID when you close an entry, but you don't remember which changeset

482 you fixed the bug in. Once again, your binary test checks for the

483 presence of the bug.

484 \item Your software works correctly, but runs~15\% slower than the

485 last time you measured it. You want to know which changeset

486 introduced the performance regression. In this case, your binary

487 test measures the performance of your software, to see whether it's

488 ``fast'' or ``slow''.

489 \item The sizes of the components of your project that you ship

490 exploded recently, and you suspect that something changed in the way

491 you build your project.

492 \end{itemize}

493

494 From these examples, it should be clear that the \hgext{bisect}

495 extension is useful for more than just finding the sources of bugs. You can

496 use it to find any ``emergent property'' of a repository (anything

497 that you can't find from a simple text search of the files in the

498 tree) for which you can write a binary test.

499

500 We'll introduce a little bit of terminology here, just to make it

501 clear which parts of the search process are your responsibility, and

502 which are Mercurial's. A \emph{test} is something that \emph{you} run

503 when \hgext{bisect} chooses a changeset. A \emph{probe} is what

504 \hgext{bisect} runs to tell whether a revision is good. Finally,

505 we'll use the word ``bisect'', as both a noun and a verb, to stand in

506 for the phrase ``search using the \hgext{bisect} extension''.

507

508 One simple way to automate the searching process would be simply to

509 probe every changeset. However, this scales poorly. If it took ten

510 minutes to test a single changeset, and you had 10,000 changesets in

511 your repository, the exhaustive approach would take on average~35

512 \emph{days} to find the changeset that introduced a bug. Even if you

513 knew that the bug was introduced by one of the last 500 changesets,

514 and limited your search to those, you'd still be looking at over 40

515 hours to find the changeset that introduced your bug.
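As a rough check of these figures, assume that an exhaustive search
finds the culprit, on average, halfway through the candidate
changesets:
\[
  \frac{10{,}000 \times 10~\mathrm{min}}{2} = 50{,}000~\mathrm{min}
  \approx 34.7~\mathrm{days},
  \qquad
  \frac{500 \times 10~\mathrm{min}}{2} = 2{,}500~\mathrm{min}
  \approx 41.7~\mathrm{hours}.
\]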

516

517 What the \hgext{bisect} extension does is use its knowledge of the

518 ``shape'' of your project's revision history to perform a search in

519 time proportional to the \emph{logarithm} of the number of changesets

520 to check (the kind of search it performs is called a dichotomic

521 search). With this approach, searching through 10,000 changesets will

522 take under two and a half hours (about 14 tests), even at ten minutes per test. Limit your

523 search to the last 500 changesets, and it will take about an hour and a half (about nine tests).
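A quick way to estimate the cost of such a search is to count the
tests a plain binary search would need. This sketch assumes a linear
history and uses $\lceil \log_2 n \rceil$ as the number of tests; the
function name \texttt{bisect\_minutes} is made up for the example.

```python
import math

MINUTES_PER_TEST = 10


def bisect_minutes(changesets):
    # A binary search needs about ceil(log2(n)) tests to narrow
    # n candidate changesets down to one.
    tests = math.ceil(math.log2(changesets))
    return tests, tests * MINUTES_PER_TEST


for n in (10_000, 500):
    tests, minutes = bisect_minutes(n)
    print(f"{n} changesets: {tests} tests, {minutes} minutes")
```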

524

525 The \hgext{bisect} extension is aware of the ``branchy'' nature of a

526 Mercurial project's revision history, so it has no problems dealing

527 with branches, merges, or multiple heads in a repository. It can

528 prune entire branches of history with a single probe, which is how it

529 operates so efficiently.

530

531 \subsection{Using the \hgext{bisect} extension}

532

533 Here's an example of \hgext{bisect} in action. To keep the core of

534 Mercurial simple, \hgext{bisect} is packaged as an extension; this

535 means that it won't be present unless you explicitly enable it. To do

536 this, edit your \hgrc\ and add the following section header (if it's

537 not already present):

538 \begin{codesample2}

539 [extensions]

540 \end{codesample2}

541 Then add a line to this section to enable the extension:

542 \begin{codesample2}

543 hbisect =

544 \end{codesample2}

545 \begin{note}

546 That's right, there's a ``\texttt{h}'' at the front of the name of

547 the \hgext{bisect} extension. The reason is that Mercurial is

548 written in Python, and uses a standard Python package called

549 \texttt{bisect}. If you omit the ``\texttt{h}'' from the name

550 ``\texttt{hbisect}'', Mercurial will erroneously find the standard

551 Python \texttt{bisect} package, and try to use it as a Mercurial

552 extension. This won't work, and Mercurial will crash repeatedly

553 until you fix the spelling in your \hgrc. Ugh.

554 \end{note}

555

556 Now let's create a repository, so that we can try out the

557 \hgext{bisect} extension in isolation.

558 \interaction{bisect.init}

559 We'll simulate a project that has a bug in it in a simple-minded way:

560 create trivial changes in a loop, and nominate one specific change

561 that will have the ``bug''. This loop creates 50 changesets, each

562 adding a single file to the repository. We'll represent our ``bug''

563 with a file that contains the text ``i have a gub''.

564 \interaction{bisect.commits}

565

566 The next thing that we'd like to do is figure out how to use the

567 \hgext{bisect} extension. We can use Mercurial's normal built-in help

568 mechanism for this.

569 \interaction{bisect.help}

570

571 The \hgext{bisect} extension works in steps. Each step proceeds as follows.

572 \begin{enumerate}

573 \item You run your binary test.

574 \begin{itemize}

575 \item If the test succeeded, you tell \hgext{bisect} by running the

576 \hgcmdargs{bisect}{good} command.

577 \item If it failed, use the \hgcmdargs{bisect}{bad} command to let

578 the \hgext{bisect} extension know.

579 \end{itemize}

580 \item The extension uses your information to decide which changeset to

581 test next.

582 \item It updates the working directory to that changeset, and the

583 process begins again.

584 \end{enumerate}

585 The process ends when \hgext{bisect} identifies a unique changeset

586 that marks the point where your test transitioned from ``succeeding''

587 to ``failing''.
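The loop described above can be sketched as an ordinary binary search.
This is an illustration only: it assumes a linear history numbered by
integer revision, whereas the real extension also copes with branches
and merges. The function \texttt{bisect}, the constant
\texttt{FIRST\_BAD}, and the choice of revision~49 as the bad tip are
assumptions made for the example.

```python
def bisect(good, bad, is_bad):
    """Return (first bad revision, number of probes) on a linear history.

    `good` is a revision known to pass the test; `bad` is a later
    revision known to fail it.
    """
    probes = 0
    while bad - good > 1:
        mid = (good + bad) // 2   # the changeset to test next
        probes += 1
        if is_bad(mid):
            bad = mid             # the change was introduced at or before mid
        else:
            good = mid            # the change was introduced after mid
    return bad, probes


FIRST_BAD = 22                    # hypothetical revision that introduced the bug
culprit, probes = bisect(10, 49, lambda rev: rev >= FIRST_BAD)
print(culprit, probes)
```

Each pass through the loop corresponds to one round of running your
test and reporting the result as good or bad.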

588

589 To start the search, we must run the \hgcmdargs{bisect}{init} command.

590 \interaction{bisect.search.init}

591

592 In our case, the binary test we use is simple: we check to see if any

593 file in the repository contains the string ``i have a gub''. If it

594 does, this changeset contains the change that ``caused the bug''. By

595 convention, a changeset that has the property we're searching for is

596 ``bad'', while one that doesn't is ``good''.

597

598 Most of the time, the revision to which the working directory is

599 synced (usually the tip) already exhibits the problem introduced by

600 the buggy change, so we'll mark it as ``bad''.

601 \interaction{bisect.search.bad-init}

602

603 Our next task is to nominate a changeset that we know \emph{doesn't}

604 have the bug; the \hgext{bisect} extension will ``bracket'' its search

605 between the first pair of good and bad changesets. In our case, we

606 know that revision~10 didn't have the bug. (I'll have more words

607 about choosing the first ``good'' changeset later.)

608 \interaction{bisect.search.good-init}

609

610 Notice that this command printed some output.

611 \begin{itemize}

612 \item It told us how many changesets it must consider before it can

613 identify the one that introduced the bug, and how many tests that

614 will require.

615 \item It updated the working directory to the next changeset to test,

616 and told us which changeset it's testing.

617 \end{itemize}

618

619 We now run our test in the working directory. We use the

620 \command{grep} command to see if our ``bad'' file is present in the

621 working directory. If it is, this revision is bad; if not, this

622 revision is good.

623 \interaction{bisect.search.step1}

624

625 This test looks like a perfect candidate for automation, so let's turn

626 it into a shell function.

627 \interaction{bisect.search.mytest}

628 We can now run an entire test step with a single command,

629 \texttt{mytest}.

630 \interaction{bisect.search.step2}

631 A few more invocations of our canned test step command, and we're

632 done.

633 \interaction{bisect.search.rest}

634

635 Even though we had~40 changesets to search through, the \hgext{bisect}

636 extension let us find the changeset that introduced our ``bug'' with

637 only five tests. Because the number of tests that the \hgext{bisect}

638 extension performs grows logarithmically with the number of changesets to

639 search, the advantage that it has over the ``brute force'' search

640 approach increases with every changeset you add.
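
To make the logarithmic growth concrete, here is a minimal sketch in Python of a binary search over a linear history. It is not the \hgext{bisect} extension's own code: the name \texttt{count\_tests} and the revision numbers are made up for illustration, and the model assumes that every revision from the first bad one onwards is bad.

```python
def count_tests(good, bad, first_bad):
    """Count the tests a binary search needs on a linear history.

    Hypothetical model: revisions are integers, `good` is known good,
    `bad` is known bad, and every revision >= first_bad is bad.
    """
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1
        if mid >= first_bad:
            bad = mid      # mid exhibits the bug: keep searching older revisions
        else:
            good = mid     # mid is clean: keep searching newer revisions
    assert bad == first_bad
    return tests


if __name__ == "__main__":
    for span in (40, 400, 4000):
        worst = max(count_tests(0, span, fb) for fb in range(1, span + 1))
        print(span, "changesets:", worst, "tests at most")
```

In this model, each tenfold increase in the number of changesets adds only three or four tests to the worst case.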

641

642 \subsection{Cleaning up after your search}

643

644 When you're finished using the \hgext{bisect} extension in a

645 repository, you can use the \hgcmdargs{bisect}{reset} command to drop

646 the information it was using to drive your search. The extension

647 doesn't use much space, so it doesn't matter if you forget to run this

648 command. However, \hgext{bisect} won't let you start a new search in

649 that repository until you do a \hgcmdargs{bisect}{reset}.

650 \interaction{bisect.search.reset}

651

652 \section{Tips for finding bugs effectively}

653

654 \subsection{Give consistent input}

655

656 The \hgext{bisect} extension requires that you correctly report the

657 result of every test you perform. If you tell it that a test failed

658 when it really succeeded, it \emph{might} be able to detect the

659 inconsistency. If it can identify an inconsistency in your reports,

660 it will tell you that a particular changeset is both good and bad.

661 However, it can't do this perfectly; it's about as likely to report

662 the wrong changeset as the source of the bug.

663

664 \subsection{Automate as much as possible}

665

666 When I started using the \hgext{bisect} extension, I tried a few times

667 to run my tests by hand, on the command line. This is an approach

668 that I, at least, am not suited to. After a few tries, I found that I

669 was making enough mistakes that I was having to restart my searches

670 several times before finally getting correct results.

671

672 My initial problems with driving the \hgext{bisect} extension by hand

673 occurred even with simple searches on small repositories; if the

674 problem you're looking for is more subtle, or the number of tests that

675 \hgext{bisect} must perform increases, the likelihood of operator

676 error ruining the search is much higher. Once I started automating my

677 tests, I had much better results.

678

679 The key to automated testing is twofold:

680 \begin{itemize}

681 \item always test for the same symptom, and

682 \item always feed consistent input to the \hgcmd{bisect} command.

683 \end{itemize}

684 In my tutorial example above, the \command{grep} command tests for the

685 symptom, and the \texttt{if} statement takes the result of this check

686 and ensures that we always feed the same input to the \hgcmd{bisect}

687 command. The \texttt{mytest} function marries these together in a

688 reproducible way, so that every test is uniform and consistent.
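
The same idea can be sketched in Python. This is a hypothetical helper, not the \texttt{mytest} shell function from the tutorial: the name \texttt{verdict} is an assumption, and it only computes the word ``good'' or ``bad'' that you would then report to \hgcmd{bisect}.

```python
import os


def verdict(root, needle="i have a gub"):
    """Return "bad" if any file under root contains needle, else "good".

    Hypothetical helper: it mirrors the grep-plus-if logic described
    in the text, but it is not the mytest function from the example.
    """
    needle_bytes = needle.encode("utf-8")
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip Mercurial's own metadata; only working files are of interest.
        if ".hg" in dirnames:
            dirnames.remove(".hg")
        for name in filenames:
            try:
                with open(os.path.join(dirpath, name), "rb") as f:
                    if needle_bytes in f.read():
                        return "bad"
            except OSError:
                continue  # unreadable file: treat it as not containing the string
    return "good"
```

Because the symptom check and the decision live in one function, every revision is judged by exactly the same rule.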

689

690 \subsection{Check your results}

691

692 Because the output of a \hgext{bisect} search is only as good as the

693 input you give it, don't take the changeset it reports as the

694 absolute truth. A simple way to cross-check its report is to manually

695 run your test at each of the following changesets:

696 \begin{itemize}

697 \item The changeset that it reports as the first bad revision. Your

698 test should still report this as bad.

699 \item The parent of that changeset (either parent, if it's a merge).

700 Your test should report this changeset as good.

701 \item A child of that changeset. Your test should report this

702 changeset as bad.

703 \end{itemize}
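
The checklist above can be written down as a small Python sketch. Everything here is hypothetical: \texttt{cross\_check} and \texttt{is\_bad} are illustrative names, and \texttt{is\_bad} stands in for whatever test you ran during the search.

```python
def cross_check(is_bad, first_bad, parents, child=None):
    """Return a list of complaints; an empty list means the report holds up.

    is_bad    -- hypothetical callable: is_bad(rev) is True when the test fails at rev
    first_bad -- the changeset that the search reported
    parents   -- the parent(s) you want to check; a merge has two
    child     -- one child of first_bad, if it has any
    """
    problems = []
    if not is_bad(first_bad):
        problems.append("%s should test bad" % first_bad)
    for p in parents:
        if is_bad(p):
            problems.append("parent %s should test good" % p)
    if child is not None and not is_bad(child):
        problems.append("child %s should test bad" % child)
    return problems
```

An empty result does not prove the report correct, but a non-empty one is a strong hint that some test result was reported wrongly.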

704

705 \subsection{Beware interference between bugs}

706

707 It's possible that your search for one bug could be disrupted by the

708 presence of another. For example, let's say your software crashes at

709 revision 100, and worked correctly at revision 50. Unknown to you,

710 someone else introduced a different crashing bug at revision 60, and

711 fixed it at revision 80. This could distort your results in one of

712 several ways.

713

714 It is possible that this other bug completely ``masks'' yours, which

715 is to say that it occurs before your bug has a chance to manifest

716 itself. If you can't avoid that other bug (for example, it prevents

717 your project from building), and so can't tell whether your bug is

718 present in a particular changeset, the \hgext{bisect} extension cannot

719 help you directly. Instead, you'll need to manually avoid the

720 changesets where that bug is present, and do separate searches

721 ``around'' it.

722

723 A different problem could arise if your test for a bug's presence is

724 not specific enough. If you check for ``my program crashes'', then

725 both your crashing bug and an unrelated crashing bug that masks it

726 will look like the same thing, and mislead \hgext{bisect}.
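
Here is a small Python simulation of that failure, using the revision numbers from the example above. It models a plain binary search on a linear history, not the extension's real implementation. The revision at which ``your'' bug appears (90) is an assumption added for illustration, since the text only says that revision~50 worked and revision~100 crashes.

```python
def first_bad(good, bad, is_bad):
    """Binary search on a linear history; return the first revision reported bad."""
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(mid):
            bad = mid
        else:
            good = mid
    return bad


MY_BUG_START = 90  # assumption for illustration; the text does not say where it begins


def has_my_bug(rev):
    return rev >= MY_BUG_START


def has_other_bug(rev):
    return 60 <= rev < 80  # introduced at revision 60, fixed at revision 80


def crashes(rev):
    # Too broad: any crash counts, whichever bug caused it.
    return has_my_bug(rev) or has_other_bug(rev)


print(first_bad(50, 100, has_my_bug))  # specific test
print(first_bad(50, 100, crashes))     # broad test
```

With these made-up numbers, the specific test lands on revision~90, while the broad test lands on revision~60, where the unrelated bug was introduced.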

727

728 \subsection{Bracket your search lazily}

729

730 Choosing the first ``good'' and ``bad'' changesets that will mark the

731 end points of your search is often easy, but it bears a little

732 discussion nevertheless. From the perspective of \hgext{bisect}, the

733 ``newest'' changeset is conventionally ``bad'', and the older

734 changeset is ``good''.

735

736 If you're having trouble remembering when a suitable ``good'' change

737 was, so that you can tell \hgext{bisect}, you could do worse than

738 testing changesets at random. Just remember to eliminate contenders

739 that can't possibly exhibit the bug (perhaps because the feature with

740 the bug isn't present yet) and those where another problem masks the

741 bug (as I discussed above).

742

743 Even if you end up ``early'' by thousands of changesets or months of

744 history, you will only add a handful of tests to the total number that

745 \hgext{bisect} must perform, thanks to its logarithmic behaviour.
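
As a rough illustration, assuming a linear history and an idealised binary search (an estimate, not a statement about the extension's exact behaviour), a search over $n$ changesets needs about $\lceil \log_2 n \rceil$ tests:
\[
  \lceil \log_2 100 \rceil = 7, \qquad
  \lceil \log_2 10\,000 \rceil = 14 .
\]
Under that assumption, widening the search from a hundred changesets to ten thousand costs only about seven extra tests.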

746

747 %%% Local Variables:

748 %%% mode: latex

749 %%% TeX-master: "00book"

750 %%% End: