% hgbook: en/hgext.tex @ 226:eef2171243e8
% Document the extdiff extension.
% Author: Bryan O'Sullivan <bos@serpentine.com>
% Date: Sat May 26 11:52:18 2007 -0700 (2007-05-26)

\chapter{Adding functionality with extensions}
\label{chap:hgext}

While the core of Mercurial is quite complete from a functionality
standpoint, it's deliberately shorn of fancy features. This approach
of preserving simplicity keeps the software easy to deal with for both
maintainers and users.

However, Mercurial doesn't box you in with an inflexible command set:
you can add features to it as \emph{extensions} (sometimes known as
\emph{plugins}). We've already discussed a few of these extensions in
earlier chapters.
\begin{itemize}
\item Section~\ref{sec:tour-merge:fetch} covers the \hgext{fetch}
  extension; this combines pulling new changes and merging them with
  local changes into a single command, \hgcmd{fetch}.
\item The \hgext{bisect} extension adds an efficient pruning search
  for changes that introduced bugs, and we documented it in
  section~\ref{sec:undo:bisect}.
\item In chapter~\ref{chap:hook}, we covered several extensions that
  are useful for hook-related functionality: \hgext{acl} adds access
  control lists; \hgext{bugzilla} adds integration with the Bugzilla
  bug tracking system; and \hgext{notify} sends notification emails on
  new changes.
\item The Mercurial Queues patch management extension is so invaluable
  that it merits two chapters and an appendix all to itself.
  Chapter~\ref{chap:mq} covers the basics;
  chapter~\ref{chap:mq-collab} discusses advanced topics; and
  appendix~\ref{chap:mqref} goes into detail on each command.
\end{itemize}

In this chapter, we'll cover some of the other extensions that are
available for Mercurial, and briefly touch on some of the machinery
you'll need to know about if you want to write an extension of your
own.
\begin{itemize}
\item In section~\ref{sec:hgext:inotify}, we'll discuss the
  possibility of \emph{huge} performance improvements using the
  \hgext{inotify} extension.
\item In section~\ref{sec:hgext:extdiff}, we'll see how to view
  changes with an external diff tool using the \hgext{extdiff}
  extension.
\end{itemize}

\section{Improve performance with the \hgext{inotify} extension}
\label{sec:hgext:inotify}

Are you interested in having some of the most common Mercurial
operations run as much as a hundred times faster? Read on!

Mercurial has great performance under normal circumstances. Even so,
some common operations are inherently expensive. For example, when
you run the \hgcmd{status} command, Mercurial has to scan almost every
directory and file in your repository so that it can display file
status. Many other Mercurial commands need to do the same work behind
the scenes; for example, the \hgcmd{diff} command uses the status
machinery to avoid doing an expensive comparison operation on files
that obviously haven't changed.

Because obtaining file status is crucial to good performance, the
authors of Mercurial have optimised this code to within an inch of its
life. However, there's no avoiding the fact that when you run
\hgcmd{status}, Mercurial is going to have to perform at least one
expensive system call for each managed file to determine whether it's
changed since the last time Mercurial checked. For a sufficiently
large repository, this can take a long time.

To put a number on the magnitude of this effect, I created a
repository containing 150,000 managed files. I timed \hgcmd{status}
as taking ten seconds to run, even when \emph{none} of those files had
been modified.
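
To get a rough feel for where that time goes, here is a minimal Python
sketch that walks a directory tree and makes one \texttt{os.lstat}
call per file, which approximates the per-file work described above.
It is \emph{not} Mercurial's own code, the name
\texttt{time\_stat\_scan} is made up for this illustration, and the
numbers you see will depend heavily on your hardware and on whether the
filesystem cache is warm.

```python
import os
import time


def time_stat_scan(root):
    """Call os.lstat() once on every file under root.

    Returns (number_of_files, elapsed_seconds).  This only approximates
    the per-file cost of a status check; it is not how Mercurial itself
    is implemented.
    """
    count = 0
    start = time.perf_counter()
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip Mercurial's own metadata directory, if present.
        if ".hg" in dirnames:
            dirnames.remove(".hg")
        for name in filenames:
            os.lstat(os.path.join(dirpath, name))
            count += 1
    return count, time.perf_counter() - start


if __name__ == "__main__":
    files, seconds = time_stat_scan(".")
    print("stat'ed %d files in %.3f seconds" % (files, seconds))
```

Run from the root of a large working directory, this gives a lower
bound on what any tool that must examine every file will cost.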

Many modern operating systems contain a file notification facility.
If a program signs up to an appropriate service, the operating system
will notify it every time a file of interest is created, modified, or
deleted. On Linux systems, the kernel component that does this is
called \texttt{inotify}.

Mercurial's \hgext{inotify} extension talks to the kernel's
\texttt{inotify} component to optimise \hgcmd{status} commands. The
extension has two components. A daemon sits in the background and
receives notifications from the \texttt{inotify} subsystem. It also
listens for connections from a regular Mercurial command. The
extension modifies Mercurial's behaviour so that instead of scanning
the filesystem, it queries the daemon. Since the daemon has perfect
information about the state of the repository, it can respond with a
result instantaneously, avoiding the need to scan every directory and
file in the repository.
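
The idea of ``scan once, then apply notifications'' can be sketched
with a toy model in Python. This is only an illustration of the
design described above, not the extension's real code: the class name
\texttt{StatusCache}, its methods, and the event names
(\texttt{modify}, \texttt{delete}, \texttt{create}) are all
hypothetical, and a real daemon would receive its events from the
kernel rather than from direct method calls.

```python
class StatusCache:
    """Toy model of a status daemon: take a baseline, then apply events."""

    def __init__(self, tracked):
        # 'tracked' stands in for the result of the initial full scan:
        # the names of all managed files, assumed unmodified.
        self.tracked = set(tracked)
        self.modified = set()
        self.deleted = set()

    def on_event(self, kind, name):
        """Record one (hypothetical) file notification."""
        if name not in self.tracked:
            return  # this toy model ignores unmanaged files
        if kind in ("modify", "create"):
            # A managed file that changes or reappears counts as modified.
            self.modified.add(name)
            self.deleted.discard(name)
        elif kind == "delete":
            self.deleted.add(name)
            self.modified.discard(name)

    def status(self):
        """Answer a status query without touching the filesystem."""
        return {
            "modified": sorted(self.modified),
            "deleted": sorted(self.deleted),
        }


if __name__ == "__main__":
    cache = StatusCache(["a.c", "b.c", "c.c"])
    cache.on_event("modify", "b.c")
    cache.on_event("delete", "c.c")
    print(cache.status())
```

The point of the sketch is that answering a query costs time
proportional to the number of \emph{changed} files, not to the size of
the repository.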

Recall the ten seconds that I measured plain Mercurial as taking to
run \hgcmd{status} on a 150,000 file repository. With the
\hgext{inotify} extension enabled, the time dropped to 0.1~seconds, a
factor of \emph{one hundred} faster.

Before we continue, please pay attention to some caveats.
\begin{itemize}
\item The \hgext{inotify} extension is Linux-specific. Because it
  interfaces directly to the Linux kernel's \texttt{inotify}
  subsystem, it does not work on other operating systems.
\item It should work on any Linux distribution that was released after
  early~2005. Older distributions are likely to have a kernel that
  lacks \texttt{inotify}, or a version of \texttt{glibc} that does not
  have the necessary interfacing support.
\item Not all filesystems are suitable for use with the
  \hgext{inotify} extension. Network filesystems such as NFS are a
  non-starter, for example, particularly if you're running Mercurial
  on several systems, all mounting the same network filesystem. The
  kernel's \texttt{inotify} system has no way of knowing about changes
  made on another system. Most local filesystems (e.g.~ext3, XFS,
  ReiserFS) should work fine.
\end{itemize}
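
If you're not sure whether your system meets the first two caveats,
the following Python sketch offers a quick heuristic. It relies on
the fact that Linux kernels with \texttt{inotify} support expose their
tunables under \dirname{/proc/sys/fs/inotify}. The function name is
made up for this example, and the check says nothing about the third
caveat: it cannot tell you whether a particular repository lives on a
network filesystem.

```python
import os
import sys


def inotify_probably_available():
    """Heuristic check: are we on Linux with a kernel exposing inotify?

    Kernels with inotify support publish tunables under
    /proc/sys/fs/inotify.  This says nothing about whether the
    filesystem holding a given repository is local or network-mounted.
    """
    if not sys.platform.startswith("linux"):
        return False
    return os.path.isdir("/proc/sys/fs/inotify")


if __name__ == "__main__":
    print("inotify available:", inotify_probably_available())
```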

The \hgext{inotify} extension is not yet shipped with Mercurial as of
May~2007, so it's a little more involved to set up than other
extensions. But the performance improvement is worth it!

The extension currently comes in two parts: a set of patches to the
Mercurial source code, and a library of Python bindings to the
\texttt{inotify} subsystem.
\begin{note}
  There are \emph{two} Python \texttt{inotify} binding libraries. One
  of them is called \texttt{pyinotify}, and is packaged by some Linux
  distributions as \texttt{python-inotify}. This is \emph{not} the
  one you'll need, as it is too buggy and inefficient to be practical.
\end{note}
To get going, it's best to already have a functioning copy of
Mercurial installed.
\begin{note}
  If you follow the instructions below, you'll be \emph{replacing} and
  overwriting any existing installation of Mercurial that you might
  already have, using the latest ``bleeding edge'' Mercurial code.
  Don't say you weren't warned!
\end{note}
\begin{enumerate}
\item Clone the Python \texttt{inotify} binding repository. Build and
  install it.
  \begin{codesample4}
    hg clone http://hg.kublai.com/python/inotify
    cd inotify
    python setup.py build --force
    sudo python setup.py install --skip-build
  \end{codesample4}
\item Clone the \dirname{crew} Mercurial repository, then make a local
  clone of it named \dirname{inotify}. Clone the \hgext{inotify}
  patch repository into that clone's \dirname{.hg/patches} directory,
  so that Mercurial Queues will be able to apply patches to your copy
  of the \dirname{crew} repository.
  \begin{codesample4}
    hg clone http://hg.intevation.org/mercurial/crew
    hg clone crew inotify
    hg clone http://hg.kublai.com/mercurial/patches/inotify inotify/.hg/patches
  \end{codesample4}
\item Make sure that you have the Mercurial Queues extension,
  \hgext{mq}, enabled. If you've never used MQ, read
  section~\ref{sec:mq:start} to get started quickly.
\item Go into the \dirname{inotify} repo, and apply all of the
  \hgext{inotify} patches using the \hgopt{qpush}{-a} option to the
  \hgcmd{qpush} command.
  \begin{codesample4}
    cd inotify
    hg qpush -a
  \end{codesample4}
  If you get an error message from \hgcmd{qpush}, you should not
  continue. Instead, ask for help.
\item Build and install the patched version of Mercurial.
  \begin{codesample4}
    python setup.py build --force
    sudo python setup.py install --skip-build
  \end{codesample4}
\end{enumerate}
Once you've built a suitably patched version of Mercurial, all you
need to do to enable the \hgext{inotify} extension is add an entry to
your \hgrc.
\begin{codesample2}
  [extensions]
  inotify =
\end{codesample2}
When the \hgext{inotify} extension is enabled, Mercurial will
automatically and transparently start the status daemon the first time
you run a command that needs status in a repository. It runs one
status daemon per repository.

The status daemon is started silently, and runs in the background. If
you look at a list of running processes after you've enabled the
\hgext{inotify} extension and run a few commands in different
repositories, you'll thus see a few \texttt{hg} processes sitting
around, waiting for updates from the kernel and queries from
Mercurial.

The first time you run a Mercurial command in a repository when you
have the \hgext{inotify} extension enabled, it will run with about the
same performance as a normal Mercurial command. This is because the
status daemon needs to perform a normal status scan so that it has a
baseline against which to apply later updates from the kernel.
However, \emph{every} subsequent command that does any kind of status
check should be noticeably faster on repositories of even fairly
modest size. Better yet, the bigger your repository is, the greater a
performance advantage you'll see. The \hgext{inotify} daemon makes
status operations almost instantaneous on repositories of all sizes!

If you like, you can manually start a status daemon using the
\hgcmd{inserve} command. This gives you slightly finer control over
how the daemon ought to run. This command will of course only be
available when the \hgext{inotify} extension is enabled.

When you're using the \hgext{inotify} extension, you should notice
\emph{no difference at all} in Mercurial's behaviour, with the sole
exception of status-related commands running a whole lot faster than
they used to. You should specifically expect that commands will not
print different output; neither should they give different results.
If either of these situations occurs, please report a bug.

\section{Flexible diff support with the \hgext{extdiff} extension}
\label{sec:hgext:extdiff}

Mercurial's built-in \hgcmd{diff} command outputs plaintext unified
diffs.
\interaction{extdiff.diff}
If you would like to use an external tool to display modifications,
you'll want to use the \hgext{extdiff} extension. This will let you
use, for example, a graphical diff tool.

The \hgext{extdiff} extension is bundled with Mercurial, so it's easy
to set up. In the \rcsection{extensions} section of your \hgrc,
simply add a one-line entry to enable the extension.
\begin{codesample2}
  [extensions]
  extdiff =
\end{codesample2}
This introduces a command named \hgcmd{extdiff}, which by default uses
your system's \command{diff} command to generate a unified diff in the
same form as the built-in \hgcmd{diff} command.
\interaction{extdiff.extdiff}
The result won't be exactly the same as the output of the built-in
\hgcmd{diff} command, because the output of \command{diff} varies from
one system to another, even when passed the same options.

As the ``\texttt{making snapshot}'' lines of output above imply, the
\hgcmd{extdiff} command works by creating two snapshots of your source
tree. The first snapshot is of the source revision; the second, of
the target revision or working directory. The \hgcmd{extdiff} command
generates these snapshots in a temporary directory, passes the name of
each directory to an external diff viewer, then deletes the temporary
directory. For efficiency, it only snapshots the directories and
files that have changed between the two revisions.

Snapshot directory names have the same base name as your repository.
If your repository path is \dirname{/quux/bar/foo}, then \dirname{foo}
will be the name of each snapshot directory. Each snapshot directory
name has its changeset ID appended, if appropriate. If a snapshot is
of revision \texttt{a631aca1083f}, the directory will be named
\dirname{foo.a631aca1083f}. A snapshot of the working directory won't
have a changeset ID appended, so it would just be \dirname{foo} in
this example. To see what this looks like in practice, look again at
the \hgcmd{extdiff} example above. Notice that the diff has the
snapshot directory names embedded in its header.
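
The naming rule is simple enough to restate as a few lines of Python.
This is a sketch of the rule as described above, not the extension's
actual code; the function name \texttt{snapshot\_dirname} is
hypothetical.

```python
import os


def snapshot_dirname(repo_path, changeset_id=None):
    """Return the snapshot directory name for a repository path.

    The base name of the repository is used as-is for the working
    directory; for a specific revision, the changeset ID is appended
    after a dot.
    """
    base = os.path.basename(os.path.normpath(repo_path))
    if changeset_id is None:
        return base
    return "%s.%s" % (base, changeset_id)


if __name__ == "__main__":
    print(snapshot_dirname("/quux/bar/foo", "a631aca1083f"))
    print(snapshot_dirname("/quux/bar/foo"))
```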

The \hgcmd{extdiff} command accepts two important options. The
\hgopt{extdiff}{-p} option lets you choose a program to view
differences with, instead of \command{diff}. With the
\hgopt{extdiff}{-o} option, you can change the options that
\hgcmd{extdiff} passes to the program (by default, these options are
``\texttt{-Npru}'', which only make sense if you're running
\command{diff}). In other respects, the \hgcmd{extdiff} command acts
similarly to the built-in \hgcmd{diff} command: you use the same
option names, syntax, and arguments to specify the revisions you want,
the files you want, and so on.

As an example, here's how to run the normal system \command{diff}
command, getting it to generate context diffs (using the
\cmdopt{diff}{-c} option) instead of unified diffs, and five lines of
context instead of the default three (passing \texttt{5} as the
argument to the \cmdopt{diff}{-C} option).
\interaction{extdiff.extdiff-ctx}

Launching a visual diff tool is just as easy. Here's how to launch
the \command{kdiff3} viewer.
\begin{codesample2}
  hg extdiff -p kdiff3 -o ''
\end{codesample2}

If your diff viewing command can't deal with directories, you can
easily work around this with a little scripting. For an example of
such scripting in action with the \hgext{mq} extension and the
\command{interdiff} command, see
section~\ref{mq-collab:tips:interdiff}.
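
As a rough idea of what such a wrapper might look like, here is a
minimal Python sketch that takes a viewer command and two snapshot
directories, finds the files that differ, and runs the viewer once per
pair of files. Everything about it is an assumption made for
illustration: the function name \texttt{changed\_pairs}, the argument
order, and the choice to ignore files that exist on only one side. It
is not part of Mercurial or of the \hgext{extdiff} extension.

```python
import filecmp
import os
import subprocess
import sys


def changed_pairs(old_dir, new_dir):
    """Yield (old_path, new_path) for files in both trees that differ.

    filecmp.dircmp treats files with identical type, size, and
    modification time as equal without reading them.  Files present on
    only one side are ignored by this sketch.
    """
    comparison = filecmp.dircmp(old_dir, new_dir)
    for name in comparison.diff_files:
        yield os.path.join(old_dir, name), os.path.join(new_dir, name)
    for sub in comparison.common_dirs:
        yield from changed_pairs(os.path.join(old_dir, sub),
                                 os.path.join(new_dir, sub))


# Expected (assumed) invocation: script.py VIEWER OLD NEW
if __name__ == "__main__" and len(sys.argv) == 4:
    viewer, old, new = sys.argv[1:4]
    if os.path.isdir(old) and os.path.isdir(new):
        for old_file, new_file in changed_pairs(old, new):
            subprocess.call([viewer, old_file, new_file])
    else:
        # Already given two plain files: hand them straight to the viewer.
        subprocess.call([viewer, old, new])
```

If you saved this as an executable script, you could try naming it
with \hgopt{extdiff}{-p} and passing your viewer's name with
\hgopt{extdiff}{-o}; this assumes the options reach the script ahead
of the two snapshot names, which you should verify on your own
installation.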

\subsection{Defining command aliases}

It can be cumbersome to remember the options to both the
\hgcmd{extdiff} command and the diff viewer you want to use, so the
\hgext{extdiff} extension lets you define \emph{new} commands that
will invoke your diff viewer with exactly the right options.

All you need to do is edit your \hgrc, and add a section named
\rcsection{extdiff}. Inside this section, you can define multiple
commands. Here's how to add a \texttt{kdiff3} command. Once you've
defined this, you can type ``\texttt{hg kdiff3}'' and the
\hgext{extdiff} extension will run \command{kdiff3} for you.
\begin{codesample2}
  [extdiff]
  cmd.kdiff3 =
\end{codesample2}
If you leave the right hand side of the definition empty, as above,
the \hgext{extdiff} extension uses the name of the command you defined
as the name of the external program to run. But these names don't
have to be the same. Here, we define a command named ``\texttt{hg
  wibble}'', which runs \command{kdiff3}.
\begin{codesample2}
  [extdiff]
  cmd.wibble = kdiff3
\end{codesample2}

You can also specify the default options that you want to invoke your
diff viewing program with. The prefix to use is ``\texttt{opts.}'',
followed by the name of the command to which the options apply. This
example defines a ``\texttt{hg vimdiff}'' command that runs the
\command{vim} editor's \texttt{DirDiff} extension.
\begin{codesample2}
  [extdiff]
  cmd.vimdiff = vim
  opts.vimdiff = -f '+next' '+execute "DirDiff" argv(0) argv(1)'
\end{codesample2}
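
To make the lookup rules concrete, here is a small Python sketch that
reads an \rcsection{extdiff} section and works out, for each
\texttt{cmd.} entry, which program would run and with which default
options. It illustrates the rules described above and is not how
Mercurial itself reads its configuration: the function name
\texttt{resolve\_extdiff\_commands} is invented, the parsing is done
with Python's standard \texttt{configparser} module, and splitting the
options with shell-style quoting rules is an assumption.

```python
import configparser
import shlex


def resolve_extdiff_commands(hgrc_text):
    """Map each cmd.NAME entry to a (program, option_list) pair."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(hgrc_text)
    commands = {}
    if not parser.has_section("extdiff"):
        return commands
    section = parser["extdiff"]
    for key, value in section.items():
        if not key.startswith("cmd."):
            continue
        name = key[len("cmd."):]
        # An empty right hand side means "run a program of the same name".
        program = value.strip() or name
        # Assumption: options are split the way a shell would split them.
        options = shlex.split(section.get("opts." + name, fallback=""))
        commands[name] = (program, options)
    return commands


if __name__ == "__main__":
    sample = (
        "[extdiff]\n"
        "cmd.kdiff3 =\n"
        "cmd.wibble = kdiff3\n"
        "cmd.vimdiff = vim\n"
        "opts.vimdiff = -f '+next' '+execute \"DirDiff\" argv(0) argv(1)'\n"
    )
    for name, (program, options) in sorted(resolve_extdiff_commands(sample).items()):
        print("hg %s -> %s %r" % (name, program, options))
```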

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "00book"
%%% End: