hgbook

changeset 224:34943a3d50d6

Start writing up extensions. Begin with inotify.
author Bryan O'Sullivan <bos@serpentine.com>
date Tue May 15 16:24:20 2007 -0700 (2007-05-15)
parents 4c9b9416cd23
children a631aca1083f
files en/hgext.tex en/mq-collab.tex en/mq-ref.tex en/mq.tex en/tour-merge.tex
line diff
     1.1 --- a/en/hgext.tex	Tue May 15 14:55:54 2007 -0700
     1.2 +++ b/en/hgext.tex	Tue May 15 16:24:20 2007 -0700
     1.3 @@ -1,6 +1,208 @@
     1.4  \chapter{Adding functionality with extensions}
     1.5  \label{chap:hgext}
     1.6  
     1.7 +While the core of Mercurial is quite complete from a functionality
     1.8 +standpoint, it's deliberately shorn of fancy features.  This approach
     1.9 +of preserving simplicity keeps the software easy to deal with for both
    1.10 +maintainers and users.
    1.11 +
    1.12 +However, Mercurial doesn't box you in with an inflexible command set:
    1.13 +you can add features to it as \emph{extensions} (sometimes known as
    1.14 +\emph{plugins}).  We've already discussed a few of these extensions in
    1.15 +earlier chapters.
    1.16 +\begin{itemize}
    1.17 +\item Section~\ref{sec:tour-merge:fetch} covers the \hgext{fetch}
    1.18 +  extension; this combines pulling new changes and merging them with
    1.19 +  local changes into a single command, \hgcmd{fetch}.
    1.20 +\item The \hgext{bisect} extension adds an efficient pruning search
    1.21 +  for changes that introduced bugs, and we documented it in
    1.22 +  section~\ref{sec:undo:bisect}.
    1.23 +\item In chapter~\ref{chap:hook}, we covered several extensions that
    1.24 +  are useful for hook-related functionality: \hgext{acl} adds access
    1.25 +  control lists; \hgext{bugzilla} adds integration with the Bugzilla
    1.26 +  bug tracking system; and \hgext{notify} sends notification emails on
    1.27 +  new changes.
    1.28 +\item The Mercurial Queues patch management extension is so invaluable
    1.29 +  that it merits two chapters and an appendix all to itself.
    1.30 +  Chapter~\ref{chap:mq} covers the basics;
    1.31 +  chapter~\ref{chap:mq-collab} discusses advanced topics; and
    1.32 +  appendix~\ref{chap:mqref} goes into detail on each command.
    1.33 +\end{itemize}
    1.34 +
    1.35 +In this chapter, we'll cover some of the other extensions that are
    1.36 +available for Mercurial, and briefly touch on some of the machinery
    1.37 +you'll need to know about if you want to write an extension of your
    1.38 +own.
    1.39 +\begin{itemize}
    1.40 +\item In section~\ref{sec:hgext:inotify}, we'll discuss the
    1.41 +  possibility of \emph{huge} performance improvements using the
    1.42 +  \hgext{inotify} extension.
    1.43 +\end{itemize}
    1.44 +
    1.45 +\section{Improve performance with the \hgext{inotify} extension}
    1.46 +\label{sec:hgext:inotify}
    1.47 +
    1.48 +Are you interested in having some of the most common Mercurial
    1.49 +operations run as much as a hundred times faster?  Read on!
    1.50 +
    1.51 +Mercurial has great performance under normal circumstances, but some
    1.52 +work is unavoidable.  When you run the \hgcmd{status} command, it must
    1.53 +scan almost every directory and file in your repository so that it can
    1.54 +display file status.  Many other Mercurial commands need to do the
    1.55 +same work behind the scenes; for example, the \hgcmd{diff} command
    1.56 +uses the status machinery to avoid doing an expensive comparison
    1.57 +operation on files that obviously haven't changed.
    1.58 +
    1.59 +Because obtaining file status is crucial to good performance, the
    1.60 +authors of Mercurial have optimised this code to within an inch of its
    1.61 +life.  However, there's no avoiding the fact that when you run
    1.62 +\hgcmd{status}, Mercurial is going to have to perform at least one
    1.63 +expensive system call for each managed file to determine whether it's
    1.64 +changed since the last time Mercurial checked.  For a sufficiently
    1.65 +large repository, this can take a long time.
    1.66 +
    1.67 +To put a number on the magnitude of this effect, I created a
    1.68 +repository containing 150,000 managed files.  I timed \hgcmd{status}
    1.69 +as taking ten seconds to run, even when \emph{none} of those files had
    1.70 +been modified.
    1.71 +
    1.72 +Many modern operating systems contain a file notification facility.
    1.73 +If a program signs up to an appropriate service, the operating system
    1.74 +will notify it every time a file of interest is created, modified, or
    1.75 +deleted.  On Linux systems, the kernel component that does this is
    1.76 +called \texttt{inotify}.
    1.77 +
    1.78 +Mercurial's \hgext{inotify} extension talks to the kernel's
    1.79 +\texttt{inotify} component to optimise \hgcmd{status} commands.  The
    1.80 +extension has two components.  A daemon sits in the background and
    1.81 +receives notifications from the \texttt{inotify} subsystem.  It also
    1.82 +listens for connections from a regular Mercurial command.  The
    1.83 +extension modifies Mercurial's behaviour so that instead of scanning
    1.84 +the filesystem, it queries the daemon.  Since the daemon has perfect
    1.85 +information about the state of the repository, it can respond with a
    1.86 +result instantaneously, avoiding the need to scan every directory and
    1.87 +file in the repository.
    1.88 +
    1.89 +Recall the ten seconds that I measured plain Mercurial as taking to
    1.90 +run \hgcmd{status} on a 150,000 file repository.  With the
    1.91 +\hgext{inotify} extension enabled, the time dropped to 0.1~seconds, a
    1.92 +factor of \emph{one hundred} faster.
    1.93 +
    1.94 +Before we continue, please pay attention to some caveats.
    1.95 +\begin{itemize}
    1.96 +\item The \hgext{inotify} extension is Linux-specific.  Because it
    1.97 +  interfaces directly to the Linux kernel's \texttt{inotify}
    1.98 +  subsystem, it does not work on other operating systems.
    1.99 +\item It should work on any Linux distribution that was released after
   1.100 +  early~2005.  Older distributions are likely to have a kernel that
   1.101 +  lacks \texttt{inotify}, or a version of \texttt{glibc} that does not
   1.102 +  have the necessary interfacing support.
   1.103 +\item Not all filesystems are suitable for use with the
   1.104 +  \hgext{inotify} extension.  Network filesystems such as NFS are a
   1.105 +  non-starter, for example, particularly if you're running Mercurial
   1.106 +  on several systems, all mounting the same network filesystem.  The
   1.107 +  kernel's \texttt{inotify} system has no way of knowing about changes
   1.108 +  made on another system.  Most local filesystems (e.g.~ext3, XFS,
   1.109 +  ReiserFS) should work fine.
   1.110 +\end{itemize}
   1.111 +
   1.112 +The \hgext{inotify} extension is not yet shipped with Mercurial as of
   1.113 +May~2007, so it's a little more involved to set up than other
   1.114 +extensions.  But the performance improvement is worth it!
   1.115 +
   1.116 +The extension currently comes in two parts: a set of patches to the
   1.117 +Mercurial source code, and a library of Python bindings to the
   1.118 +\texttt{inotify} subsystem.
   1.119 +\begin{note}
   1.120 +  There are \emph{two} Python \texttt{inotify} binding libraries.  One
   1.121 +  of them is called \texttt{pyinotify}, and is packaged by some Linux
   1.122 +  distributions as \texttt{python-inotify}.  This is \emph{not} the
   1.123 +  one you'll need, as it is too buggy and inefficient to be practical.
   1.124 +\end{note}
   1.125 +To get going, it's best to already have a functioning copy of
   1.126 +Mercurial installed.
   1.127 +\begin{note}
   1.128 +  If you follow the instructions below, you'll be \emph{replacing} and
   1.129 +  overwriting any existing installation of Mercurial that you might
   1.130 +  already have, using the latest ``bleeding edge'' Mercurial code.
   1.131 +  Don't say you weren't warned!
   1.132 +\end{note}
   1.133 +\begin{enumerate}
   1.134 +\item Clone the Python \texttt{inotify} binding repository.  Build and
   1.135 +  install it.
   1.136 +  \begin{codesample4}
   1.137 +    hg clone http://hg.kublai.com/python/inotify
   1.138 +    cd inotify
   1.139 +    python setup.py build --force
   1.140 +    sudo python setup.py install --skip-build
   1.141 +  \end{codesample4}
   1.142 +\item Clone the \dirname{crew} Mercurial repository.  Clone the
   1.143 +  \hgext{inotify} patch repository so that Mercurial Queues will be
   1.144 +  able to apply patches to your copy of the \dirname{crew} repository.
   1.145 +  \begin{codesample4}
   1.146 +    hg clone http://hg.intevation.org/mercurial/crew
   1.147 +    hg clone crew inotify
   1.148 +    hg clone http://hg.kublai.com/mercurial/patches/inotify inotify/.hg/patches
   1.149 +  \end{codesample4}
   1.150 +\item Make sure that you have the Mercurial Queues extension,
   1.151 +  \hgext{mq}, enabled.  If you've never used MQ, read
   1.152 +  section~\ref{sec:mq:start} to get started quickly.
   1.153 +\item Go into the \dirname{inotify} repo, and apply all of the
   1.154 +  \hgext{inotify} patches using the \hgopt{qpush}{-a} option to the
   1.155 +  \hgcmd{qpush} command.
   1.156 +  \begin{codesample4}
   1.157 +    cd inotify
   1.158 +    hg qpush -a
   1.159 +  \end{codesample4}
   1.160 +  If you get an error message from \hgcmd{qpush}, you should not
   1.161 +  continue.  Instead, ask for help.
   1.162 +\item Build and install the patched version of Mercurial.
   1.163 +  \begin{codesample4}
   1.164 +    python setup.py build --force
   1.165 +    sudo python setup.py install --skip-build
   1.166 +  \end{codesample4}
   1.167 +\end{enumerate}
   1.168 +Once you've built a suitably patched version of Mercurial, all you
   1.169 +need to do to enable the \hgext{inotify} extension is add an entry to
   1.170 +your \hgrc.
   1.171 +\begin{codesample2}
   1.172 +  [extensions]
   1.173 +  inotify =
   1.174 +\end{codesample2}
   1.175 +When the \hgext{inotify} extension is enabled, Mercurial will
   1.176 +automatically and transparently start the status daemon the first time
   1.177 +you run a command that needs status in a repository.  It runs one
   1.178 +status daemon per repository.
   1.179 +
   1.180 +The status daemon is started silently, and runs in the background.  If
   1.181 +you look at a list of running processes after you've enabled the
   1.182 +\hgext{inotify} extension and run a few commands in different
   1.183 +repositories, you'll thus see a few \texttt{hg} processes sitting
   1.184 +around, waiting for updates from the kernel and queries from
   1.185 +Mercurial.
   1.186 +
   1.187 +The first time you run a Mercurial command in a repository when you
   1.188 +have the \hgext{inotify} extension enabled, it will run with about the
   1.189 +same performance as a normal Mercurial command.  This is because the
   1.190 +status daemon needs to perform a normal status scan so that it has a
   1.191 +baseline against which to apply later updates from the kernel.
   1.192 +However, \emph{every} subsequent command that does any kind of status
   1.193 +check should be noticeably faster on repositories of even fairly
   1.194 +modest size.  Better yet, the bigger your repository is, the greater a
   1.195 +performance advantage you'll see.  The \hgext{inotify} daemon makes
   1.196 +status operations almost instantaneous on repositories of all sizes!
   1.197 +
   1.198 +If you like, you can manually start a status daemon using the
   1.199 +\hgcmd{inserve} command.  This gives you slightly finer control over
   1.200 +how the daemon ought to run.  This command will of course only be
   1.201 +available when the \hgext{inotify} extension is enabled.
   1.202 +
   1.203 +When you're using the \hgext{inotify} extension, you should notice
   1.204 +\emph{no difference at all} in Mercurial's behaviour, with the sole
   1.205 +exception of status-related commands running a whole lot faster than
   1.206 +they used to.  You should specifically expect that commands will not
   1.207 +print different output; neither should they give different results.
   1.208 +If either of these situations occurs, please report a bug.
   1.209  
   1.210  %%% Local Variables: 
   1.211  %%% mode: latex
     2.1 --- a/en/mq-collab.tex	Tue May 15 14:55:54 2007 -0700
     2.2 +++ b/en/mq-collab.tex	Tue May 15 16:24:20 2007 -0700
     2.3 @@ -1,4 +1,5 @@
     2.4  \chapter{Advanced uses of Mercurial Queues}
     2.5 +\label{chap:mq-collab}
     2.6  
     2.7  While it's easy to pick up straightforward uses of Mercurial Queues,
     2.8  use of a little discipline and some of MQ's less frequently used
     3.1 --- a/en/mq-ref.tex	Tue May 15 14:55:54 2007 -0700
     3.2 +++ b/en/mq-ref.tex	Tue May 15 16:24:20 2007 -0700
     3.3 @@ -1,7 +1,8 @@
     3.4  \chapter{Mercurial Queues reference}
     3.5 +\label{chap:mqref}
     3.6  
     3.7  \section{MQ command reference}
     3.8 -\label{sec:mq:cmdref}
     3.9 +\label{sec:mqref:cmdref}
    3.10  
    3.11  For an overview of the commands provided by MQ, use the command
    3.12  \hgcmdargs{help}{mq}.
    3.13 @@ -178,7 +179,7 @@
    3.14  This will become the topmost applied patch if you run \hgcmd{qpop}.
    3.15  
    3.16  \subsection{\hgcmd{qpush}---push patches onto the stack}
    3.17 -\label{sec:mq:cmd:qpush}
    3.18 +\label{sec:mqref:cmd:qpush}
    3.19  
    3.20  The \hgcmd{qpush} command adds patches onto the applied stack.  By
    3.21  default, it adds only one patch.
     4.1 --- a/en/mq.tex	Tue May 15 14:55:54 2007 -0700
     4.2 +++ b/en/mq.tex	Tue May 15 16:24:20 2007 -0700
     4.3 @@ -790,8 +790,8 @@
     4.4  \item Normally, when you \hgcmd{qpop} a patch and \hgcmd{qpush} it
     4.5    again, the changeset that represents the patch after the pop/push
     4.6    will have a \emph{different identity} than the changeset that
     4.7 -  represented the hash beforehand.  See section~\ref{sec:mq:cmd:qpush}
     4.8 -  for information as to why this is.
     4.9 +  represented the hash beforehand.  See
    4.10 +  section~\ref{sec:mqref:cmd:qpush} for information as to why this is.
    4.11  \item It's not a good idea to \hgcmd{merge} changes from another
    4.12    branch with a patch changeset, at least if you want to maintain the
    4.13    ``patchiness'' of that changeset and changesets below it on the
     5.1 --- a/en/tour-merge.tex	Tue May 15 14:55:54 2007 -0700
     5.2 +++ b/en/tour-merge.tex	Tue May 15 16:24:20 2007 -0700
     5.3 @@ -231,8 +231,8 @@
     5.4  of our merge:
     5.5  \interaction{tour-merge-conflict.commit}
     5.6  
     5.7 -\section{Simplifying the pull-merge-commit 
     5.8 -  sequence}
     5.9 +\section{Simplifying the pull-merge-commit sequence}
    5.10 +\label{sec:tour-merge:fetch}
    5.11  
    5.12  The process of merging changes as outlined above is straightforward,
    5.13  but requires running three commands in sequence.
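
Illustrative note on the en/hgext.tex hunk above: the new inotify section argues that a plain status check costs at least one system call per managed file, even when nothing has changed. The Python sketch below shows that kind of scan in its simplest form. It is not Mercurial's status code; the names `snapshot` and `changed_files` are hypothetical, and comparing only size and modification time is an assumption made to keep the example short.

```python
import os


def snapshot(root):
    """Record (size, mtime_ns) for every file under root.

    This costs one lstat() call per file, which is the per-file expense
    that the inotify section describes.
    """
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            state[os.path.relpath(path, root)] = (st.st_size, st.st_mtime_ns)
    return state


def changed_files(baseline, current):
    """Return sorted paths that were added, removed, or differ in size/mtime."""
    paths = set(baseline) | set(current)
    return sorted(p for p in paths if baseline.get(p) != current.get(p))
```

Every call to `snapshot` walks the whole tree, so its cost grows with the number of files rather than with the number of changes. A notification-driven daemon, as the section describes, would pay for that scan once to establish a baseline and afterwards only process the events the kernel reports.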