hgbook
diff en/hgext.tex @ 224:34943a3d50d6
Start writing up extensions. Begin with inotify.
author | Bryan O'Sullivan <bos@serpentine.com> |
---|---|
date | Tue May 15 16:24:20 2007 -0700 (2007-05-15) |
parents | 4c9b9416cd23 |
children | eef2171243e8 |
--- a/en/hgext.tex Tue May 15 14:55:54 2007 -0700
+++ b/en/hgext.tex Tue May 15 16:24:20 2007 -0700
@@ -1,6 +1,208 @@
 \chapter{Adding functionality with extensions}
 \label{chap:hgext}

+While the core of Mercurial is quite complete from a functionality
+standpoint, it's deliberately shorn of fancy features. This approach
+of preserving simplicity keeps the software easy to deal with for both
+maintainers and users.
+
+However, Mercurial doesn't box you in with an inflexible command set:
+you can add features to it as \emph{extensions} (sometimes known as
+\emph{plugins}). We've already discussed a few of these extensions in
+earlier chapters.
+\begin{itemize}
+\item Section~\ref{sec:tour-merge:fetch} covers the \hgext{fetch}
+  extension; this combines pulling new changes and merging them with
+  local changes into a single command, \hgcmd{fetch}.
+\item The \hgext{bisect} extension adds an efficient pruning search
+  for changes that introduced bugs, and we documented it in
+  section~\ref{sec:undo:bisect}.
+\item In chapter~\ref{chap:hook}, we covered several extensions that
+  are useful for hook-related functionality: \hgext{acl} adds access
+  control lists; \hgext{bugzilla} adds integration with the Bugzilla
+  bug tracking system; and \hgext{notify} sends notification emails on
+  new changes.
+\item The Mercurial Queues patch management extension is so invaluable
+  that it merits two chapters and an appendix all to itself.
+  Chapter~\ref{chap:mq} covers the basics;
+  chapter~\ref{chap:mq-collab} discusses advanced topics; and
+  appendix~\ref{chap:mqref} goes into detail on each command.
+\end{itemize}
+
+In this chapter, we'll cover some of the other extensions that are
+available for Mercurial, and briefly touch on some of the machinery
+you'll need to know about if you want to write an extension of your
+own.
+\begin{itemize}
+\item In section~\ref{sec:hgext:inotify}, we'll discuss the
+  possibility of \emph{huge} performance improvements using the
+  \hgext{inotify} extension.
+\end{itemize}
+
+\section{Improve performance with the \hgext{inotify} extension}
+\label{sec:hgext:inotify}
+
+Are you interested in having some of the most common Mercurial
+operations run as much as a hundred times faster? Read on!
+
+Mercurial has great performance under normal circumstances, but some
+common operations still involve a lot of work. For example, when you
+run the \hgcmd{status} command, Mercurial has to scan almost every
+directory and file in your repository so that it can display file
+status. Many other Mercurial commands need to do the same work behind
+the scenes; for example, the \hgcmd{diff} command uses the status
+machinery to avoid doing an expensive comparison operation on files
+that obviously haven't changed.
+
+Because obtaining file status is crucial to good performance, the
+authors of Mercurial have optimised this code to within an inch of its
+life. However, there's no avoiding the fact that when you run
+\hgcmd{status}, Mercurial is going to have to perform at least one
+expensive system call for each managed file to determine whether it's
+changed since the last time Mercurial checked. For a sufficiently
+large repository, this can take a long time.
+
+To put a number on the magnitude of this effect, I created a
+repository containing 150,000 managed files. I timed \hgcmd{status}
+as taking ten seconds to run, even when \emph{none} of those files had
+been modified.
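To get a feel for the per-file cost described above, here is a minimal sketch that is \emph{not} Mercurial's actual status code. It assumes that the expensive call is a plain \texttt{lstat} on each file, and it simply walks a directory tree instead of consulting Mercurial's list of managed files. The function names \texttt{scan\_tree} and \texttt{timed\_scan} are hypothetical, chosen only for this illustration.

```python
import os
import tempfile
import time


def scan_tree(root):
    """Call lstat() once on every file under root; return the file count."""
    count = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            # One system call per file: this is the cost that grows
            # linearly with the number of files in the tree.
            os.lstat(os.path.join(dirpath, name))
            count += 1
    return count


def timed_scan(root):
    """Return (file count, elapsed seconds) for one full scan of root."""
    start = time.perf_counter()
    count = scan_tree(root)
    return count, time.perf_counter() - start


if __name__ == "__main__":
    # Build a small throwaway tree so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as root:
        for i in range(1000):
            with open(os.path.join(root, "f%d" % i), "w") as f:
                f.write("x")
        count, elapsed = timed_scan(root)
        print("%d files scanned in %.4f seconds" % (count, elapsed))
```

Pointing \texttt{timed\_scan} at a large working directory gives a rough lower bound on what any scan-based status check has to pay, since the work is proportional to the number of files even when nothing has changed.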
+
+Many modern operating systems contain a file notification facility.
+If a program signs up to an appropriate service, the operating system
+will notify it every time a file of interest is created, modified, or
+deleted. On Linux systems, the kernel component that does this is
+called \texttt{inotify}.
+
+Mercurial's \hgext{inotify} extension talks to the kernel's
+\texttt{inotify} component to optimise \hgcmd{status} commands. The
+extension has two components. A daemon sits in the background and
+receives notifications from the \texttt{inotify} subsystem. It also
+listens for connections from a regular Mercurial command. The
+extension modifies Mercurial's behaviour so that instead of scanning
+the filesystem, it queries the daemon. Since the daemon has perfect
+information about the state of the repository, it can respond with a
+result instantaneously, avoiding the need to scan every directory and
+file in the repository.
+
+Recall the ten seconds that I measured plain Mercurial as taking to
+run \hgcmd{status} on a 150,000 file repository. With the
+\hgext{inotify} extension enabled, the time dropped to 0.1~seconds, a
+factor of \emph{one hundred} faster.
+
+Before we continue, please pay attention to some caveats.
+\begin{itemize}
+\item The \hgext{inotify} extension is Linux-specific. Because it
+  interfaces directly to the Linux kernel's \texttt{inotify}
+  subsystem, it does not work on other operating systems.
+\item It should work on any Linux distribution that was released after
+  early~2005. Older distributions are likely to have a kernel that
+  lacks \texttt{inotify}, or a version of \texttt{glibc} that does not
+  have the necessary interfacing support.
+\item Not all filesystems are suitable for use with the
+  \hgext{inotify} extension.
+  Network filesystems such as NFS are a
+  non-starter, for example, particularly if you're running Mercurial
+  on several systems, all mounting the same network filesystem. The
+  kernel's \texttt{inotify} system has no way of knowing about changes
+  made on another system. Most local filesystems (e.g.~ext3, XFS,
+  ReiserFS) should work fine.
+\end{itemize}
+
+The \hgext{inotify} extension is not yet shipped with Mercurial as of
+May~2007, so it's a little more involved to set up than other
+extensions. But the performance improvement is worth it!
+
+The extension currently comes in two parts: a set of patches to the
+Mercurial source code, and a library of Python bindings to the
+\texttt{inotify} subsystem.
+\begin{note}
+  There are \emph{two} Python \texttt{inotify} binding libraries. One
+  of them is called \texttt{pyinotify}, and is packaged by some Linux
+  distributions as \texttt{python-inotify}. This is \emph{not} the
+  one you'll need, as it is too buggy and inefficient to be practical.
+\end{note}
+To get going, it's best to already have a functioning copy of
+Mercurial installed.
+\begin{note}
+  If you follow the instructions below, you'll be \emph{replacing} and
+  overwriting any existing installation of Mercurial that you might
+  already have, using the latest ``bleeding edge'' Mercurial code.
+  Don't say you weren't warned!
+\end{note}
+\begin{enumerate}
+\item Clone the Python \texttt{inotify} binding repository. Build and
+  install it.
+  \begin{codesample4}
+    hg clone http://hg.kublai.com/python/inotify
+    cd inotify
+    python setup.py build --force
+    sudo python setup.py install --skip-build
+  \end{codesample4}
+\item Clone the \dirname{crew} Mercurial repository.
+  Clone the \hgext{inotify} patch repository so that Mercurial Queues
+  will be able to apply patches to your copy of the \dirname{crew}
+  repository.
+  \begin{codesample4}
+    hg clone http://hg.intevation.org/mercurial/crew
+    hg clone crew inotify
+    hg clone http://hg.kublai.com/mercurial/patches/inotify inotify/.hg/patches
+  \end{codesample4}
+\item Make sure that you have the Mercurial Queues extension,
+  \hgext{mq}, enabled. If you've never used MQ, read
+  section~\ref{sec:mq:start} to get started quickly.
+\item Go into the \dirname{inotify} repo, and apply all of the
+  \hgext{inotify} patches using the \hgopt{qpush}{-a} option to the
+  \hgcmd{qpush} command.
+  \begin{codesample4}
+    cd inotify
+    hg qpush -a
+  \end{codesample4}
+  If you get an error message from \hgcmd{qpush}, you should not
+  continue. Instead, ask for help.
+\item Build and install the patched version of Mercurial.
+  \begin{codesample4}
+    python setup.py build --force
+    sudo python setup.py install --skip-build
+  \end{codesample4}
+\end{enumerate}
+Once you've built a suitably patched version of Mercurial, all you
+need to do to enable the \hgext{inotify} extension is add an entry to
+your \hgrc.
+\begin{codesample2}
+  [extensions]
+  inotify =
+\end{codesample2}
+When the \hgext{inotify} extension is enabled, Mercurial will
+automatically and transparently start the status daemon the first time
+you run a command that needs status in a repository. It runs one
+status daemon per repository.
+
+The status daemon is started silently, and runs in the background.
+If you look at a list of running processes after you've enabled the
+\hgext{inotify} extension and run a few commands in different
+repositories, you'll thus see a few \texttt{hg} processes sitting
+around, waiting for updates from the kernel and queries from
+Mercurial.
+
+The first time you run a Mercurial command in a repository when you
+have the \hgext{inotify} extension enabled, it will run with about the
+same performance as a normal Mercurial command. This is because the
+status daemon needs to perform a normal status scan so that it has a
+baseline against which to apply later updates from the kernel.
+However, \emph{every} subsequent command that does any kind of status
+check should be noticeably faster on repositories of even fairly
+modest size. Better yet, the bigger your repository is, the greater a
+performance advantage you'll see. The \hgext{inotify} daemon makes
+status operations almost instantaneous on repositories of all sizes!
+
+If you like, you can manually start a status daemon using the
+\hgcmd{inserve} command. This gives you slightly finer control over
+how the daemon ought to run. This command will of course only be
+available when the \hgext{inotify} extension is enabled.
+
+When you're using the \hgext{inotify} extension, you should notice
+\emph{no difference at all} in Mercurial's behaviour, with the sole
+exception of status-related commands running a whole lot faster than
+they used to. You should specifically expect that commands will not
+print different output; neither should they give different results.
+If either of these situations occurs, please report a bug.

 %%% Local Variables:
 %%% mode: latex