hgbook

view en/hgext.tex @ 287:0a5879ea5416

Fix wording, per Jim Blandy.
author Bryan O'Sullivan <bos@serpentine.com>
date Fri Jan 18 11:58:40 2008 -0800 (2008-01-18)
parents 7df934d3dcb5
1 \chapter{Adding functionality with extensions}
2 \label{chap:hgext}
4 While the core of Mercurial is quite complete from a functionality
5 standpoint, it's deliberately shorn of fancy features. This approach
6 of preserving simplicity keeps the software easy to deal with for both
7 maintainers and users.
9 However, Mercurial doesn't box you in with an inflexible command set:
10 you can add features to it as \emph{extensions} (sometimes known as
11 \emph{plugins}). We've already discussed a few of these extensions in
12 earlier chapters.
13 \begin{itemize}
14 \item Section~\ref{sec:tour-merge:fetch} covers the \hgext{fetch}
15 extension; this combines pulling new changes and merging them with
16 local changes into a single command, \hgxcmd{fetch}{fetch}.
17 \item In chapter~\ref{chap:hook}, we covered several extensions that
18 are useful for hook-related functionality: \hgext{acl} adds access
19 control lists; \hgext{bugzilla} adds integration with the Bugzilla
20 bug tracking system; and \hgext{notify} sends notification emails on
21 new changes.
22 \item The Mercurial Queues patch management extension is so invaluable
23 that it merits two chapters and an appendix all to itself.
24 Chapter~\ref{chap:mq} covers the basics;
25 chapter~\ref{chap:mq-collab} discusses advanced topics; and
26 appendix~\ref{chap:mqref} goes into detail on each command.
27 \end{itemize}
29 In this chapter, we'll cover some of the other extensions that are
30 available for Mercurial, and briefly touch on some of the machinery
31 you'll need to know about if you want to write an extension of your
32 own.
33 \begin{itemize}
34 \item In section~\ref{sec:hgext:inotify}, we'll discuss the
35 possibility of \emph{huge} performance improvements using the
36 \hgext{inotify} extension.
37 \end{itemize}
39 \section{Improve performance with the \hgext{inotify} extension}
40 \label{sec:hgext:inotify}
42 Are you interested in having some of the most common Mercurial
43 operations run as much as a hundred times faster? Read on!
45 Mercurial has great performance under normal circumstances, but some
46 work is unavoidable. When you run the \hgcmd{status} command, Mercurial has to
47 scan almost every directory and file in your repository so that it can
48 display file status. Many other Mercurial commands need to do the
49 same work behind the scenes; for example, the \hgcmd{diff} command
50 uses the status machinery to avoid doing an expensive comparison
51 operation on files that obviously haven't changed.
53 Because obtaining file status is crucial to good performance, the
54 authors of Mercurial have optimised this code to within an inch of its
55 life. However, there's no avoiding the fact that when you run
56 \hgcmd{status}, Mercurial is going to have to perform at least one
57 expensive system call for each managed file to determine whether it's
58 changed since the last time Mercurial checked. For a sufficiently
59 large repository, this can take a long time.
61 To put a number on the magnitude of this effect, I created a
62 repository containing 150,000 managed files. I timed \hgcmd{status}
63 as taking ten seconds to run, even when \emph{none} of those files had
64 been modified.
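To make that cost concrete, here is a minimal Python sketch of a status-style scan. It is not Mercurial's actual code: the function name \texttt{scan\_status} and the recorded \texttt{(size, mtime)} pairs are assumptions made for illustration. The point is only that the loop makes one \texttt{lstat} call per managed file, even when nothing has changed.

```python
import os


def scan_status(root, recorded):
    """Return sorted names of managed files that look modified or missing.

    `recorded` maps a path relative to `root` to the (size, mtime_ns)
    pair seen at the previous check. Hypothetical helper, for
    illustration only; this is not how Mercurial stores its state.
    """
    changed = []
    for relpath, (size, mtime_ns) in recorded.items():
        try:
            # One system call per managed file: this is the cost that
            # grows with the number of files in the repository.
            st = os.lstat(os.path.join(root, relpath))
        except FileNotFoundError:
            changed.append(relpath)
            continue
        if (st.st_size, st.st_mtime_ns) != (size, mtime_ns):
            changed.append(relpath)
    return sorted(changed)
```

With 150,000 entries in \texttt{recorded}, this loop issues 150,000 \texttt{lstat} calls on every run, which is the kind of work that a notification facility can avoid.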
66 Many modern operating systems contain a file notification facility.
67 If a program signs up to an appropriate service, the operating system
68 will notify it every time a file of interest is created, modified, or
69 deleted. On Linux systems, the kernel component that does this is
70 called \texttt{inotify}.
72 Mercurial's \hgext{inotify} extension talks to the kernel's
73 \texttt{inotify} component to optimise \hgcmd{status} commands. The
74 extension has two components. A daemon sits in the background and
75 receives notifications from the \texttt{inotify} subsystem. It also
76 listens for connections from a regular Mercurial command. The
77 extension modifies Mercurial's behaviour so that instead of scanning
78 the filesystem, it queries the daemon. Since the daemon has perfect
79 information about the state of the repository, it can respond with a
80 result instantaneously, avoiding the need to scan every directory and
81 file in the repository.
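The division of labour described above can be sketched in a few lines of Python. This is only a model of the idea, not the extension's implementation: the class name \texttt{StatusCache}, its methods, and the event names are all hypothetical, and no real \texttt{inotify} binding is used. The daemon applies each notification to an in-memory set, so a later query never has to touch the filesystem.

```python
class StatusCache:
    """In-memory record of which paths differ from the baseline scan."""

    def __init__(self):
        self._dirty = set()

    def on_event(self, path, kind):
        """Apply one change notification.

        `kind` is "created", "modified" or "deleted". These names are
        chosen for this sketch; they are not the kernel's event names.
        """
        if kind not in ("created", "modified", "deleted"):
            raise ValueError("unknown event kind: %r" % (kind,))
        self._dirty.add(path)

    def mark_clean(self, path):
        """Forget a path once it matches the baseline again (assumed hook)."""
        self._dirty.discard(path)

    def query(self):
        """Answer a status query from memory, with no directory scan."""
        return sorted(self._dirty)
```

The cost of \texttt{query} depends on how many files have changed, not on how many files the repository contains.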
83 Recall the ten seconds that I measured plain Mercurial as taking to
84 run \hgcmd{status} on a 150,000 file repository. With the
85 \hgext{inotify} extension enabled, the time dropped to 0.1~seconds, a
86 factor of \emph{one hundred} faster.
88 Before we continue, please pay attention to some caveats.
89 \begin{itemize}
90 \item The \hgext{inotify} extension is Linux-specific. Because it
91 interfaces directly to the Linux kernel's \texttt{inotify}
92 subsystem, it does not work on other operating systems.
93 \item It should work on any Linux distribution that was released after
94 early~2005. Older distributions are likely to have a kernel that
95 lacks \texttt{inotify}, or a version of \texttt{glibc} that does not
96 have the necessary interfacing support.
97 \item Not all filesystems are suitable for use with the
98 \hgext{inotify} extension. Network filesystems such as NFS are a
99 non-starter, for example, particularly if you're running Mercurial
100 on several systems, all mounting the same network filesystem. The
101 kernel's \texttt{inotify} system has no way of knowing about changes
102 made on another system. Most local filesystems (e.g.~ext3, XFS,
103 ReiserFS) should work fine.
104 \end{itemize}
106 The \hgext{inotify} extension is not yet shipped with Mercurial as of
107 May~2007, so it's a little more involved to set up than other
108 extensions. But the performance improvement is worth it!
110 The extension currently comes in two parts: a set of patches to the
111 Mercurial source code, and a library of Python bindings to the
112 \texttt{inotify} subsystem.
113 \begin{note}
114 There are \emph{two} Python \texttt{inotify} binding libraries. One
115 of them is called \texttt{pyinotify}, and is packaged by some Linux
116 distributions as \texttt{python-inotify}. This is \emph{not} the
117 one you'll need, as it is too buggy and inefficient to be practical.
118 \end{note}
119 To get going, it's best to already have a functioning copy of
120 Mercurial installed.
121 \begin{note}
122 If you follow the instructions below, you'll be \emph{replacing} and
123 overwriting any existing installation of Mercurial that you might
124 already have, using the latest ``bleeding edge'' Mercurial code.
125 Don't say you weren't warned!
126 \end{note}
127 \begin{enumerate}
128 \item Clone the Python \texttt{inotify} binding repository. Build and
129 install it.
130 \begin{codesample4}
131 hg clone http://hg.kublai.com/python/inotify
132 cd inotify
133 python setup.py build --force
134 sudo python setup.py install --skip-build
135 \end{codesample4}
136 \item Clone the \dirname{crew} Mercurial repository. Clone the
137 \hgext{inotify} patch repository so that Mercurial Queues will be
138 able to apply patches to your copy of the \dirname{crew} repository.
139 \begin{codesample4}
140 hg clone http://hg.intevation.org/mercurial/crew
141 hg clone crew inotify
142 hg clone http://hg.kublai.com/mercurial/patches/inotify inotify/.hg/patches
143 \end{codesample4}
144 \item Make sure that you have the Mercurial Queues extension,
145 \hgext{mq}, enabled. If you've never used MQ, read
146 section~\ref{sec:mq:start} to get started quickly.
147 \item Go into the \dirname{inotify} repo, and apply all of the
148 \hgext{inotify} patches using the \hgxopt{mq}{qpush}{-a} option to
149 the \hgxcmd{mq}{qpush} command.
150 \begin{codesample4}
151 cd inotify
152 hg qpush -a
153 \end{codesample4}
154 If you get an error message from \hgxcmd{mq}{qpush}, you should not
155 continue. Instead, ask for help.
156 \item Build and install the patched version of Mercurial.
157 \begin{codesample4}
158 python setup.py build --force
159 sudo python setup.py install --skip-build
160 \end{codesample4}
161 \end{enumerate}
162 Once you've built a suitably patched version of Mercurial, all you
163 need to do to enable the \hgext{inotify} extension is add an entry to
164 your \hgrc.
165 \begin{codesample2}
166 [extensions]
167 inotify =
168 \end{codesample2}
169 When the \hgext{inotify} extension is enabled, Mercurial will
170 automatically and transparently start the status daemon the first time
171 you run a command that needs status in a repository. It runs one
172 status daemon per repository.
174 The status daemon is started silently, and runs in the background. If
175 you look at a list of running processes after you've enabled the
176 \hgext{inotify} extension and run a few commands in different
177 repositories, you'll thus see a few \texttt{hg} processes sitting
178 around, waiting for updates from the kernel and queries from
179 Mercurial.
181 The first time you run a Mercurial command in a repository when you
182 have the \hgext{inotify} extension enabled, it will run with about the
183 same performance as a normal Mercurial command. This is because the
184 status daemon needs to perform a normal status scan so that it has a
185 baseline against which to apply later updates from the kernel.
186 However, \emph{every} subsequent command that does any kind of status
187 check should be noticeably faster on repositories of even fairly
188 modest size. Better yet, the bigger your repository is, the greater a
189 performance advantage you'll see. The \hgext{inotify} daemon makes
190 status operations almost instantaneous on repositories of all sizes!
192 If you like, you can manually start a status daemon using the
193 \hgxcmd{inotify}{inserve} command. This gives you slightly finer
194 control over how the daemon ought to run. This command will of course
195 only be available when the \hgext{inotify} extension is enabled.
197 When you're using the \hgext{inotify} extension, you should notice
198 \emph{no difference at all} in Mercurial's behaviour, with the sole
199 exception of status-related commands running a whole lot faster than
200 they used to. You should specifically expect that commands will not
201 print different output; neither should they give different results.
202 If either of these situations occurs, please report a bug.
204 \section{Flexible diff support with the \hgext{extdiff} extension}
205 \label{sec:hgext:extdiff}
207 Mercurial's built-in \hgcmd{diff} command outputs plaintext unified
208 diffs.
209 \interaction{extdiff.diff}
210 If you would like to use an external tool to display modifications,
211 you'll want to use the \hgext{extdiff} extension. This will let you
212 use, for example, a graphical diff tool.
214 The \hgext{extdiff} extension is bundled with Mercurial, so it's easy
215 to set up. In the \rcsection{extensions} section of your \hgrc,
216 simply add a one-line entry to enable the extension.
217 \begin{codesample2}
218 [extensions]
219 extdiff =
220 \end{codesample2}
221 This introduces a command named \hgxcmd{extdiff}{extdiff}, which by
222 default uses your system's \command{diff} command to generate a
223 unified diff in the same form as the built-in \hgcmd{diff} command.
224 \interaction{extdiff.extdiff}
225 The result won't be exactly the same as the output of the built-in
226 \hgcmd{diff} command, because the output of \command{diff} varies from one
227 system to another, even when passed the same options.
229 As the ``\texttt{making snapshot}'' lines of output above imply, the
230 \hgxcmd{extdiff}{extdiff} command works by creating two snapshots of
231 your source tree. The first snapshot is of the source revision; the
232 second, of the target revision or working directory. The
233 \hgxcmd{extdiff}{extdiff} command generates these snapshots in a
234 temporary directory, passes the name of each directory to an external
235 diff viewer, then deletes the temporary directory. For efficiency, it
236 only snapshots the directories and files that have changed between the
237 two revisions.
239 Snapshot directory names have the same base name as your repository.
240 If your repository path is \dirname{/quux/bar/foo}, then \dirname{foo}
241 will be the name of each snapshot directory. Each snapshot directory
242 name has its changeset ID appended, if appropriate. If a snapshot is
243 of revision \texttt{a631aca1083f}, the directory will be named
244 \dirname{foo.a631aca1083f}. A snapshot of the working directory won't
245 have a changeset ID appended, so it would just be \dirname{foo} in
246 this example. To see what this looks like in practice, look again at
247 the \hgxcmd{extdiff}{extdiff} example above. Notice that the diff has
248 the snapshot directory names embedded in its header.
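The naming rule just described is simple enough to write down as a small Python function. This is a sketch of the rule as stated in the text, not code taken from the extension; the function name \texttt{snapshot\_dirname} is hypothetical.

```python
import posixpath


def snapshot_dirname(repo_path, changeset_id=None):
    """Return the snapshot directory name for a repository path.

    A snapshot of a revision gets the changeset ID appended after a
    dot; a snapshot of the working directory (changeset_id=None) uses
    the bare base name. Hypothetical helper, for illustration only.
    """
    base = posixpath.basename(repo_path.rstrip("/"))
    if changeset_id is None:
        return base
    return "%s.%s" % (base, changeset_id)
```

For the repository path \dirname{/quux/bar/foo} and revision \texttt{a631aca1083f}, this yields \dirname{foo.a631aca1083f}, and plain \dirname{foo} for the working directory.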
250 The \hgxcmd{extdiff}{extdiff} command accepts two important options.
251 The \hgxopt{extdiff}{extdiff}{-p} option lets you choose a program to
252 view differences with, instead of \command{diff}. With the
253 \hgxopt{extdiff}{extdiff}{-o} option, you can change the options that
254 \hgxcmd{extdiff}{extdiff} passes to the program (by default, these
255 options are ``\texttt{-Npru}'', which only make sense if you're
256 running \command{diff}). In other respects, the
257 \hgxcmd{extdiff}{extdiff} command acts similarly to the built-in
258 \hgcmd{diff} command: you use the same option names, syntax, and
259 arguments to specify the revisions you want, the files you want, and
260 so on.
262 As an example, here's how to run the normal system \command{diff}
263 command, getting it to generate context diffs (using the
264 \cmdopt{diff}{-c} option) instead of unified diffs, and five lines of
265 context instead of the default three (passing \texttt{5} as the
266 argument to the \cmdopt{diff}{-C} option).
267 \interaction{extdiff.extdiff-ctx}
269 Launching a visual diff tool is just as easy. Here's how to launch
270 the \command{kdiff3} viewer.
271 \begin{codesample2}
272 hg extdiff -p kdiff3 -o ''
273 \end{codesample2}
275 If your diff viewing command can't deal with directories, you can
276 easily work around this with a little scripting. For an example of
277 such scripting in action with the \hgext{mq} extension and the
278 \command{interdiff} command, see
279 section~\ref{mq-collab:tips:interdiff}.
281 \subsection{Defining command aliases}
283 It can be cumbersome to remember the options to both the
284 \hgxcmd{extdiff}{extdiff} command and the diff viewer you want to use,
285 so the \hgext{extdiff} extension lets you define \emph{new} commands
286 that will invoke your diff viewer with exactly the right options.
288 All you need to do is edit your \hgrc, and add a section named
289 \rcsection{extdiff}. Inside this section, you can define multiple
290 commands. Here's how to add a \texttt{kdiff3} command. Once you've
291 defined this, you can type ``\texttt{hg kdiff3}'' and the
292 \hgext{extdiff} extension will run \command{kdiff3} for you.
293 \begin{codesample2}
294 [extdiff]
295 cmd.kdiff3 =
296 \end{codesample2}
297 If you leave the right hand side of the definition empty, as above,
298 the \hgext{extdiff} extension uses the name of the command you defined
299 as the name of the external program to run. But these names don't
300 have to be the same. Here, we define a command named ``\texttt{hg
301 wibble}'', which runs \command{kdiff3}.
302 \begin{codesample2}
303 [extdiff]
304 cmd.wibble = kdiff3
305 \end{codesample2}
307 You can also specify the default options that you want to invoke your
308 diff viewing program with. The prefix to use is ``\texttt{opts.}'',
309 followed by the name of the command to which the options apply. This
310 example defines a ``\texttt{hg vimdiff}'' command that runs the
311 \command{vim} editor's \texttt{DirDiff} extension.
312 \begin{codesample2}
313 [extdiff]
314 cmd.vimdiff = vim
315 opts.vimdiff = -f '+next' '+execute "DirDiff" argv(0) argv(1)'
316 \end{codesample2}
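To show how the \texttt{cmd.} and \texttt{opts.} entries fit together, here is a minimal Python sketch that resolves a command name to the program and options it would run. It rests on several assumptions: it uses Python's standard \texttt{configparser} rather than Mercurial's own configuration reader, it splits options with \texttt{shlex}, it treats a missing \texttt{opts.} entry as an empty option list, and the function name \texttt{resolve\_extdiff\_command} is hypothetical.

```python
import configparser
import shlex


def resolve_extdiff_command(hgrc_text, name):
    """Return (program, options) for the command `hg <name>`, or None.

    Sketch only: an empty right hand side for `cmd.<name>` means the
    program has the same name as the command, as the text describes.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(hgrc_text)
    if not parser.has_option("extdiff", "cmd." + name):
        return None
    program = parser.get("extdiff", "cmd." + name).strip() or name
    # Assumption: no `opts.<name>` entry means no extra options.
    opts = parser.get("extdiff", "opts." + name, fallback="")
    return program, shlex.split(opts)
```

Given the three definitions above in a single \rcsection{extdiff} section, \texttt{kdiff3} and \texttt{wibble} both resolve to the \command{kdiff3} program with no options, and \texttt{vimdiff} resolves to \command{vim} with three arguments.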
318 \section{Cherrypicking changes with the \hgext{transplant} extension}
319 \label{sec:hgext:transplant}
321 Need to have a long chat with Brendan about this.
323 \section{Send changes via email with the \hgext{patchbomb} extension}
324 \label{sec:hgext:patchbomb}
326 Many projects have a culture of ``change review'', in which people
327 send their modifications to a mailing list for others to read and
328 comment on before they commit the final version to a shared
329 repository. Some projects have people who act as gatekeepers; they
330 apply changes from other people to a repository to which those others
331 don't have access.
333 Mercurial makes it easy to send changes over email for review or
334 application, via its \hgext{patchbomb} extension. The extension is so
335 named because changes are formatted as patches, and it's usual to send
336 one changeset per email message. Sending a long series of changes by
337 email is thus much like ``bombing'' the recipient's inbox, hence
338 ``patchbomb''.
340 As usual, the basic configuration of the \hgext{patchbomb} extension
341 takes just one or two lines in your \hgrc.
342 \begin{codesample2}
343 [extensions]
344 patchbomb =
345 \end{codesample2}
346 Once you've enabled the extension, you will have a new command
347 available, named \hgxcmd{patchbomb}{email}.
349 The safest and best way to invoke the \hgxcmd{patchbomb}{email}
350 command is to \emph{always} run it first with the
351 \hgxopt{patchbomb}{email}{-n} option. This will show you what the
352 command \emph{would} send, without actually sending anything. Once
353 you've had a quick glance over the changes and verified that you are
354 sending the right ones, you can rerun the same command, with the
355 \hgxopt{patchbomb}{email}{-n} option removed.
357 The \hgxcmd{patchbomb}{email} command accepts the same kind of
358 revision syntax as every other Mercurial command. For example, this
359 command will send every revision between 7 and \texttt{tip},
360 inclusive.
361 \begin{codesample2}
362 hg email -n 7:tip
363 \end{codesample2}
364 You can also specify a \emph{repository} to compare with. If you
365 provide a repository but no revisions, the \hgxcmd{patchbomb}{email}
366 command will send all revisions in the local repository that are not
367 present in the remote repository. If you additionally specify
368 revisions or a branch name (the latter using the
369 \hgxopt{patchbomb}{email}{-b} option), this will constrain the
370 revisions sent.
372 It's perfectly safe to run the \hgxcmd{patchbomb}{email} command
373 without the names of the people you want to send to: if you do this,
374 it will just prompt you for those values interactively. (If you're
375 using a Linux or Unix-like system, you should have enhanced
376 \texttt{readline}-style editing capabilities when entering those
377 headers, too, which is useful.)
379 When you are sending just one revision, the \hgxcmd{patchbomb}{email}
380 command will by default use the first line of the changeset
381 description as the subject of the single email message it sends.
383 If you send multiple revisions, the \hgxcmd{patchbomb}{email} command
384 will usually send one message per changeset. It will preface the
385 series with an introductory message, in which you should describe the
386 purpose of the series of changes you're sending.
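The two cases above, a single revision versus a series, can be modelled with a short Python sketch. It only mirrors the behaviour described in the text and does not reproduce the extension's real message formatting; the function name \texttt{plan\_messages} and the \texttt{"intro"} and \texttt{"patch"} labels are hypothetical.

```python
def plan_messages(descriptions, intro_subject=None):
    """Return a list of (kind, subject) pairs for the messages to send.

    `descriptions` holds one changeset description per revision, in
    the order the revisions will be sent. Illustrative sketch only.
    """
    # The subject of each patch message is the first line of the
    # changeset description.
    summaries = [(d.strip().splitlines() or [""])[0] for d in descriptions]
    if len(summaries) <= 1:
        return [("patch", s) for s in summaries]
    if not intro_subject:
        raise ValueError("a series needs a subject for its introduction")
    # A series is prefaced with one introductory message.
    return [("intro", intro_subject)] + [("patch", s) for s in summaries]
```

A series of three changesets therefore produces four messages: the introduction followed by one message per changeset.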
388 \subsection{Changing the behaviour of patchbombs}
390 Not every project has exactly the same conventions for sending changes
391 in email; the \hgext{patchbomb} extension tries to accommodate a
392 number of variations through command line options.
393 \begin{itemize}
394 \item You can write a subject for the introductory message on the
395 command line using the \hgxopt{patchbomb}{email}{-s} option. This
396 takes one argument, the text of the subject to use.
397 \item To change the email address from which the messages originate,
398 use the \hgxopt{patchbomb}{email}{-f} option. This takes one
399 argument, the email address to use.
400 \item The default behaviour is to send unified diffs (see
401 section~\ref{sec:mq:patch} for a description of the format), one per
402 message. You can send a binary bundle instead with the
403 \hgxopt{patchbomb}{email}{-b} option.
404 \item Unified diffs are normally prefaced with a metadata header. You
405 can omit this, and send unadorned diffs, with the
406 \hgxopt{patchbomb}{email}{--plain} option.
407 \item Diffs are normally sent ``inline'', in the same body part as the
408 description of a patch. This makes it easiest for the largest
409 number of readers to quote and respond to parts of a diff, as some
410 mail clients will only quote the first MIME body part in a message.
411 If you'd prefer to send the description and the diff in separate
412 body parts, use the \hgxopt{patchbomb}{email}{-a} option.
413 \item Instead of sending mail messages, you can write them to an
414 \texttt{mbox}-format mail folder using the
415 \hgxopt{patchbomb}{email}{-m} option. That option takes one
416 argument, the name of the file to write to.
417 \item If you would like to add a \command{diffstat}-format summary to
418 each patch, and one to the introductory message, use the
419 \hgxopt{patchbomb}{email}{-d} option. The \command{diffstat}
420 command displays a table containing the name of each file patched,
421 the number of lines affected, and a histogram showing how much each
422 file is modified. This gives readers a qualitative glance at how
423 complex a patch is.
424 \end{itemize}
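Several of the options listed above are often combined. The following Python sketch assembles an \hgxcmd{patchbomb}{email} command line from them; it uses only option letters named in this section, and it defaults to the \hgxopt{patchbomb}{email}{-n} dry run recommended earlier. The function name \texttt{build\_email\_command} and its parameter names are hypothetical.

```python
def build_email_command(revisions, dry_run=True, subject=None, sender=None,
                        diffstat=False, mbox=None):
    """Build an argument list for `hg email` from the options in the text.

    Sketch only: it assembles the command but does not run it.
    """
    argv = ["hg", "email"]
    if dry_run:
        argv.append("-n")          # show what would be sent, send nothing
    if subject is not None:
        argv += ["-s", subject]    # subject of the introductory message
    if sender is not None:
        argv += ["-f", sender]     # address the messages originate from
    if diffstat:
        argv.append("-d")          # add a diffstat-format summary
    if mbox is not None:
        argv += ["-m", mbox]       # write to an mbox file instead of sending
    argv.append(revisions)
    return argv
```

The resulting list can be handed to \texttt{subprocess.run}; a cautious workflow is to run it once with \texttt{dry\_run=True}, read the output, and only then repeat the call with \texttt{dry\_run=False}.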
426 %%% Local Variables:
427 %%% mode: latex
428 %%% TeX-master: "00book"
429 %%% End: