hgbook

changeset 550:5cd47f721686
Rename LaTeX input files to have numeric prefixes
author: Bryan O'Sullivan <bos@serpentine.com>
date: Thu Jan 29 22:56:27 2009 -0800 (2009-01-29)
parents: bc14f94e726a
children: f72b7e6cbe90
files: en/00book.tex en/Makefile en/appA-cmdref.tex en/appB-mq-ref.tex en/appC-srcinstall.tex en/appD-license.tex en/branch.tex en/ch00-preface.tex en/ch01-intro.tex en/ch02-tour-basic.tex en/ch03-tour-merge.tex en/ch04-concepts.tex en/ch05-daily.tex en/ch06-collab.tex en/ch07-filenames.tex en/ch08-branch.tex en/ch09-undo.tex en/ch10-hook.tex en/ch11-template.tex en/ch12-mq.tex en/ch13-mq-collab.tex en/ch14-hgext.tex en/cmdref.tex en/collab.tex en/concepts.tex en/daily.tex en/filenames.tex en/hgext.tex en/hook.tex en/intro.tex en/license.tex en/mq-collab.tex en/mq-ref.tex en/mq.tex en/preface.tex en/srcinstall.tex en/template.tex en/tour-basic.tex en/tour-merge.tex en/undo.tex
     1.1 --- a/en/00book.tex	Thu Jan 29 22:47:34 2009 -0800
     1.2 +++ b/en/00book.tex	Thu Jan 29 22:56:27 2009 -0800
     1.3 @@ -40,27 +40,27 @@
     1.4  
     1.5  \pagenumbering{arabic}
     1.6  
     1.7 -\include{preface}
     1.8 -\include{intro}
     1.9 -\include{tour-basic}
    1.10 -\include{tour-merge}
    1.11 -\include{concepts}
    1.12 -\include{daily}
    1.13 -\include{collab}
    1.14 -\include{filenames}
    1.15 -\include{branch}
    1.16 -\include{undo}
    1.17 -\include{hook}
    1.18 -\include{template}
    1.19 -\include{mq}
    1.20 -\include{mq-collab}
    1.21 -\include{hgext}
    1.22 +\include{ch00-preface}
    1.23 +\include{ch01-intro}
    1.24 +\include{ch02-tour-basic}
    1.25 +\include{ch03-tour-merge}
    1.26 +\include{ch04-concepts}
    1.27 +\include{ch05-daily}
    1.28 +\include{ch06-collab}
    1.29 +\include{ch07-filenames}
    1.30 +\include{ch08-branch}
    1.31 +\include{ch09-undo}
    1.32 +\include{ch10-hook}
    1.33 +\include{ch11-template}
    1.34 +\include{ch12-mq}
    1.35 +\include{ch13-mq-collab}
    1.36 +\include{ch14-hgext}
    1.37  
    1.38  \appendix
    1.39 -\include{cmdref}
    1.40 -\include{mq-ref}
    1.41 -\include{srcinstall}
    1.42 -\include{license}
    1.43 +\include{appA-cmdref}
    1.44 +\include{appB-mq-ref}
    1.45 +\include{appC-srcinstall}
    1.46 +\include{appD-license}
    1.47  \addcontentsline{toc}{chapter}{Bibliography}
    1.48  \bibliographystyle{alpha}
    1.49  \bibliography{99book}
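
Note on the hunk above: the renaming follows a mechanical scheme, which the following minimal Python sketch reproduces. It assumes the chapter and appendix order given by the \include lines; the function name numbered_names and the variable names are hypothetical and are not part of the changeset.

```python
def numbered_names(chapters, appendices):
    """Map old include names to the prefixed names used after the rename."""
    mapping = {}
    # Chapters get a zero-padded, zero-based index: ch00-, ch01-, ...
    for index, name in enumerate(chapters):
        mapping[name] = f"ch{index:02d}-{name}"
    # Appendices get a capital letter instead: appA-, appB-, ...
    for index, name in enumerate(appendices):
        letter = chr(ord("A") + index)
        mapping[name] = f"app{letter}-{name}"
    return mapping


# Order taken from the removed \include lines in en/00book.tex.
chapters = [
    "preface", "intro", "tour-basic", "tour-merge", "concepts",
    "daily", "collab", "filenames", "branch", "undo",
    "hook", "template", "mq", "mq-collab", "hgext",
]
appendices = ["cmdref", "mq-ref", "srcinstall", "license"]

renames = numbered_names(chapters, appendices)
for old, new in renames.items():
    print(f"{old}.tex -> {new}.tex")
```

The numeric prefixes make the files sort in reading order, which is what lets the Makefile hunk below replace the explicit file list with the patterns app*.tex and ch*.tex.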

     2.1 --- a/en/Makefile	Thu Jan 29 22:47:34 2009 -0800
     2.2 +++ b/en/Makefile	Thu Jan 29 22:56:27 2009 -0800
     2.3 @@ -4,26 +4,8 @@
     2.4  	00book.tex \
     2.5  	99book.bib \
     2.6  	99defs.tex \
     2.7 -	build_id.tex \
     2.8 -	branch.tex \
     2.9 -	cmdref.tex \
    2.10 -	collab.tex \
    2.11 -	concepts.tex \
    2.12 -	daily.tex \
    2.13 -	filenames.tex \
    2.14 -	hg_id.tex \
    2.15 -	hgext.tex \
    2.16 -	hook.tex \
    2.17 -	intro.tex \
    2.18 -	mq.tex \
    2.19 -	mq-collab.tex \
    2.20 -	mq-ref.tex \
    2.21 -	preface.tex \
    2.22 -	srcinstall.tex \
    2.23 -	template.tex \
    2.24 -	tour-basic.tex \
    2.25 -	tour-merge.tex \
    2.26 -	undo.tex
    2.27 +	app*.tex \
    2.28 +	ch*.tex
    2.29  
    2.30  image-sources := \
    2.31  	feature-branches.dot \

     3.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
     3.2 +++ b/en/appA-cmdref.tex	Thu Jan 29 22:56:27 2009 -0800
     3.3 @@ -0,0 +1,176 @@
     3.4 +\chapter{Command reference}
     3.5 +\label{cmdref}
     3.6 +
     3.7 +\cmdref{add}{add files at the next commit}
     3.8 +\optref{add}{I}{include}
     3.9 +\optref{add}{X}{exclude}
    3.10 +\optref{add}{n}{dry-run}
    3.11 +
    3.12 +\cmdref{diff}{print changes in history or working directory}
    3.13 +
    3.14 +Show differences between revisions for the specified files or
    3.15 +directories, using the unified diff format.  For a description of the
    3.16 +unified diff format, see section~\ref{sec:mq:patch}.
    3.17 +
    3.18 +By default, this command does not print diffs for files that Mercurial
    3.19 +considers to contain binary data.  To control this behaviour, see the
    3.20 +\hgopt{diff}{-a} and \hgopt{diff}{--git} options.
    3.21 +
    3.22 +\subsection{Options}
    3.23 +
    3.24 +\loptref{diff}{nodates}
    3.25 +
    3.26 +Omit date and time information when printing diff headers.
    3.27 +
    3.28 +\optref{diff}{B}{ignore-blank-lines}
    3.29 +
    3.30 +Do not print changes that only insert or delete blank lines.  A line
    3.31 +that contains only whitespace is not considered blank.
    3.32 +
    3.33 +\optref{diff}{I}{include}
    3.34 +
    3.35 +Include files and directories whose names match the given patterns.
    3.36 +
    3.37 +\optref{diff}{X}{exclude}
    3.38 +
    3.39 +Exclude files and directories whose names match the given patterns.
    3.40 +
    3.41 +\optref{diff}{a}{text}
    3.42 +
    3.43 +If this option is not specified, \hgcmd{diff} will refuse to print
    3.44 +diffs for files that it detects as binary. Specifying \hgopt{diff}{-a}
    3.45 +forces \hgcmd{diff} to treat all files as text, and generate diffs for
    3.46 +all of them.
    3.47 +
    3.48 +This option is useful for files that are ``mostly text'' but have a
    3.49 +few embedded NUL characters.  If you use it on files that contain a
    3.50 +lot of binary data, its output will be incomprehensible.
    3.51 +
    3.52 +\optref{diff}{b}{ignore-space-change}
    3.53 +
    3.54 +Do not print a line if the only change to that line is in the amount
    3.55 +of white space it contains.
    3.56 +
    3.57 +\optref{diff}{g}{git}
    3.58 +
    3.59 +Print \command{git}-compatible diffs.  XXX reference a format
    3.60 +description.
    3.61 +
    3.62 +\optref{diff}{p}{show-function}
    3.63 +
    3.64 +Display the name of the enclosing function in a hunk header, using a
    3.65 +simple heuristic.  This functionality is enabled by default, so the
    3.66 +\hgopt{diff}{-p} option has no effect unless you change the value of
    3.67 +the \rcitem{diff}{showfunc} config item, as in the following example.
    3.68 +\interaction{cmdref.diff-p}
    3.69 +
    3.70 +\optref{diff}{r}{rev}
    3.71 +
    3.72 +Specify one or more revisions to compare.  The \hgcmd{diff} command
    3.73 +accepts up to two \hgopt{diff}{-r} options to specify the revisions to
    3.74 +compare.
    3.75 +
    3.76 +\begin{enumerate}
    3.77 +\setcounter{enumi}{-1}
    3.78 +\item Display the differences between the parent revision of the
    3.79 +  working directory and the working directory.
    3.80 +\item Display the differences between the specified changeset and the
    3.81 +  working directory.
    3.82 +\item Display the differences between the two specified changesets.
    3.83 +\end{enumerate}
    3.84 +
    3.85 +You can specify two revisions using either two \hgopt{diff}{-r}
    3.86 +options or revision range notation.  For example, the two revision
    3.87 +specifications below are equivalent.
    3.88 +\begin{codesample2}
    3.89 +  hg diff -r 10 -r 20
    3.90 +  hg diff -r10:20
    3.91 +\end{codesample2}
    3.92 +
    3.93 +When you provide two revisions, Mercurial treats the order of those
    3.94 +revisions as significant.  Thus, \hgcmdargs{diff}{-r10:20} will
    3.95 +produce a diff that will transform files from their contents as of
    3.96 +revision~10 to their contents as of revision~20, while
    3.97 +\hgcmdargs{diff}{-r20:10} means the opposite: the diff that will
    3.98 +transform files from their revision~20 contents to their revision~10
    3.99 +contents.  You cannot reverse the ordering in this way if you are
   3.100 +diffing against the working directory.
   3.101 +
   3.102 +\optref{diff}{w}{ignore-all-space}
   3.103 +
   3.104 +\cmdref{version}{print version and copyright information}
   3.105 +
   3.106 +This command displays the version of Mercurial you are running, and
   3.107 +its copyright license.  There are four kinds of version string that
   3.108 +you may see.
   3.109 +\begin{itemize}
   3.110 +\item The string ``\texttt{unknown}''. This version of Mercurial was
   3.111 +  not built in a Mercurial repository, and cannot determine its own
   3.112 +  version.
   3.113 +\item A short numeric string, such as ``\texttt{1.1}''. This is a
   3.114 +  build of a revision of Mercurial that was identified by a specific
   3.115 +  tag in the repository where it was built.  (This doesn't necessarily
   3.116 +  mean that you're running an official release; someone else could
   3.117 +  have added that tag to any revision in the repository where they
   3.118 +  built Mercurial.)
   3.119 +\item A hexadecimal string, such as ``\texttt{875489e31abe}''.  This
   3.120 +  is a build of the given revision of Mercurial.
   3.121 +\item A hexadecimal string followed by a date, such as
   3.122 +  ``\texttt{875489e31abe+20070205}''.  This is a build of the given
   3.123 +  revision of Mercurial, where the build repository contained some
   3.124 +  local changes that had not been committed.
   3.125 +\end{itemize}
   3.126 +
   3.127 +\subsection{Tips and tricks}
   3.128 +
   3.129 +\subsubsection{Why do the results of \hgcmd{diff} and \hgcmd{status}
   3.130 +  differ?}
   3.131 +\label{cmdref:diff-vs-status}
   3.132 +
   3.133 +When you run the \hgcmd{status} command, you'll see a list of files
   3.134 +that Mercurial will record changes for the next time you perform a
   3.135 +commit.  If you run the \hgcmd{diff} command, you may notice that it
   3.136 +prints diffs for only a \emph{subset} of the files that \hgcmd{status}
   3.137 +listed.  There are two possible reasons for this.
   3.138 +
   3.139 +The first is that \hgcmd{status} prints some kinds of modifications
   3.140 +that \hgcmd{diff} doesn't normally display.  The \hgcmd{diff} command
   3.141 +normally outputs unified diffs, which don't have the ability to
   3.142 +represent some changes that Mercurial can track.  Most notably,
   3.143 +traditional diffs can't represent a change in whether or not a file is
   3.144 +executable, but Mercurial records this information.
   3.145 +
   3.146 +If you use the \hgopt{diff}{--git} option to \hgcmd{diff}, it will
   3.147 +display \command{git}-compatible diffs that \emph{can} display this
   3.148 +extra information.
   3.149 +
   3.150 +The second possible reason that \hgcmd{diff} might be printing diffs
   3.151 +for a subset of the files displayed by \hgcmd{status} is that if you
   3.152 +invoke it without any arguments, \hgcmd{diff} prints diffs against the
   3.153 +first parent of the working directory.  If you have run \hgcmd{merge}
   3.154 +to merge two changesets, but you haven't yet committed the results of
   3.155 +the merge, your working directory has two parents (use \hgcmd{parents}
   3.156 +to see them).  While \hgcmd{status} prints modifications relative to
   3.157 +\emph{both} parents after an uncommitted merge, \hgcmd{diff} still
   3.158 +operates relative only to the first parent.  You can get it to print
   3.159 +diffs relative to the second parent by specifying that parent with the
   3.160 +\hgopt{diff}{-r} option.  There is no way to print diffs relative to
   3.161 +both parents.
   3.162 +
   3.163 +\subsubsection{Generating safe binary diffs}
   3.164 +
   3.165 +If you use the \hgopt{diff}{-a} option to force Mercurial to print
   3.166 +diffs of files that are either ``mostly text'' or contain lots of
   3.167 +binary data, those diffs cannot subsequently be applied by either
   3.168 +Mercurial's \hgcmd{import} command or the system's \command{patch}
   3.169 +command.  
   3.170 +
   3.171 +If you want to generate a diff of a binary file that is safe to use as
   3.171 +input for \hgcmd{import}, use the \hgopt{diff}{--git} option when you
   3.173 +generate the patch.  The system \command{patch} command cannot handle
   3.174 +binary patches at all.
   3.175 +
   3.176 +%%% Local Variables: 
   3.177 +%%% mode: latex
   3.178 +%%% TeX-master: "00book"
   3.179 +%%% End: 
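
Note on the file above: the four kinds of version string listed under the version command can be told apart mechanically. The sketch below is a rough illustration only. It assumes that revision identifiers are 12 hexadecimal digits and that the date suffix is 8 digits, as in the examples in the text; the function name classify_version and the category labels are hypothetical and do not come from Mercurial.

```python
import re


def classify_version(version):
    """Classify a version string into the four kinds described in the text."""
    if version == "unknown":
        return "unknown"
    # Assumption: 12 hex digits, then '+' and an 8-digit date (YYYYMMDD).
    if re.fullmatch(r"[0-9a-f]{12}\+\d{8}", version):
        return "revision with local changes"
    # Assumption: a bare revision is exactly 12 hex digits.
    if re.fullmatch(r"[0-9a-f]{12}", version):
        return "revision"
    # A short numeric string such as "1.1" indicates a tagged build.
    if re.fullmatch(r"\d+(\.\d+)*", version):
        return "tagged"
    return "unrecognised"


for sample in ["unknown", "1.1", "875489e31abe", "875489e31abe+20070205"]:
    print(sample, "->", classify_version(sample))
```

The hexadecimal checks run before the numeric check so that a 12-digit identifier made up only of decimal digits is still treated as a revision rather than a tag.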

     4.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
     4.2 +++ b/en/appB-mq-ref.tex	Thu Jan 29 22:56:27 2009 -0800
     4.3 @@ -0,0 +1,349 @@
     4.4 +\chapter{Mercurial Queues reference}
     4.5 +\label{chap:mqref}
     4.6 +
     4.7 +\section{MQ command reference}
     4.8 +\label{sec:mqref:cmdref}
     4.9 +
    4.10 +For an overview of the commands provided by MQ, use the command
    4.11 +\hgcmdargs{help}{mq}.
    4.12 +
    4.13 +\subsection{\hgxcmd{mq}{qapplied}---print applied patches}
    4.14 +
    4.15 +The \hgxcmd{mq}{qapplied} command prints the current stack of applied
    4.16 +patches.  Patches are printed in oldest-to-newest order, so the last
    4.17 +patch in the list is the ``top'' patch.
    4.18 +
    4.19 +\subsection{\hgxcmd{mq}{qcommit}---commit changes in the queue repository}
    4.20 +
    4.21 +The \hgxcmd{mq}{qcommit} command commits any outstanding changes in the
    4.22 +\sdirname{.hg/patches} repository.  This command only works if the
    4.23 +\sdirname{.hg/patches} directory is a repository, i.e.~you created the
    4.24 +directory using \hgcmdargs{qinit}{\hgxopt{mq}{qinit}{-c}} or ran
    4.25 +\hgcmd{init} in the directory after running \hgxcmd{mq}{qinit}.
    4.26 +
    4.27 +This command is shorthand for \hgcmdargs{commit}{--cwd .hg/patches}.
    4.28 +
    4.29 +\subsection{\hgxcmd{mq}{qdelete}---delete a patch from the
    4.30 +  \sfilename{series} file}
    4.31 +
    4.32 +The \hgxcmd{mq}{qdelete} command removes the entry for a patch from the
    4.33 +\sfilename{series} file in the \sdirname{.hg/patches} directory.  It
    4.34 +does not pop the patch if the patch is already applied.  By default,
    4.35 +it does not delete the patch file; use the \hgxopt{mq}{qdel}{-f} option to
    4.36 +do that.
    4.37 +
    4.38 +Options:
    4.39 +\begin{itemize}
    4.40 +\item[\hgxopt{mq}{qdel}{-f}] Delete the patch file.
    4.41 +\end{itemize}
    4.42 +
    4.43 +\subsection{\hgxcmd{mq}{qdiff}---print a diff of the topmost applied patch}
    4.44 +
    4.45 +The \hgxcmd{mq}{qdiff} command prints a diff of the topmost applied patch.
    4.46 +It is equivalent to \hgcmdargs{diff}{-r-2:-1}.
    4.47 +
    4.48 +\subsection{\hgxcmd{mq}{qfold}---merge (``fold'') several patches into one}
    4.49 +
    4.50 +The \hgxcmd{mq}{qfold} command merges multiple patches into the topmost
    4.51 +applied patch, so that the topmost applied patch makes the union of
    4.52 +all of the changes in the patches in question.
    4.53 +
    4.54 +The patches to fold must not be applied; \hgxcmd{mq}{qfold} will exit with
    4.55 +an error if any is.  The order in which patches are folded is
    4.56 +significant; \hgcmdargs{qfold}{a b} means ``apply the current topmost
    4.57 +patch, followed by \texttt{a}, followed by \texttt{b}''.
    4.58 +
    4.59 +The comments from the folded patches are appended to the comments of
    4.60 +the destination patch, with each block of comments separated by three
    4.61 +asterisk (``\texttt{*}'') characters.  Use the \hgxopt{mq}{qfold}{-e}
    4.62 +option to edit the commit message for the combined patch/changeset
    4.63 +after the folding has completed.
    4.64 +
    4.65 +Options:
    4.66 +\begin{itemize}
    4.67 +\item[\hgxopt{mq}{qfold}{-e}] Edit the commit message and patch description
    4.68 +  for the newly folded patch.
    4.69 +\item[\hgxopt{mq}{qfold}{-l}] Use the contents of the given file as the new
    4.70 +  commit message and patch description for the folded patch.
    4.71 +\item[\hgxopt{mq}{qfold}{-m}] Use the given text as the new commit message
    4.72 +  and patch description for the folded patch.
    4.73 +\end{itemize}
    4.74 +
    4.75 +\subsection{\hgxcmd{mq}{qheader}---display the header/description of a patch}
    4.76 +
    4.77 +The \hgxcmd{mq}{qheader} command prints the header, or description, of a
    4.78 +patch.  By default, it prints the header of the topmost applied patch.
    4.79 +Given an argument, it prints the header of the named patch.
    4.80 +
    4.81 +\subsection{\hgxcmd{mq}{qimport}---import a third-party patch into the queue}
    4.82 +
    4.83 +The \hgxcmd{mq}{qimport} command adds an entry for an external patch to the
    4.84 +\sfilename{series} file, and copies the patch into the
    4.85 +\sdirname{.hg/patches} directory.  It adds the entry immediately after
    4.86 +the topmost applied patch, but does not push the patch.
    4.87 +
    4.88 +If the \sdirname{.hg/patches} directory is a repository,
    4.89 +\hgxcmd{mq}{qimport} automatically does an \hgcmd{add} of the imported
    4.90 +patch.
    4.91 +
    4.92 +\subsection{\hgxcmd{mq}{qinit}---prepare a repository to work with MQ}
    4.93 +
    4.94 +The \hgxcmd{mq}{qinit} command prepares a repository to work with MQ.  It
    4.95 +creates a directory called \sdirname{.hg/patches}.
    4.96 +
    4.97 +Options:
    4.98 +\begin{itemize}
    4.99 +\item[\hgxopt{mq}{qinit}{-c}] Create \sdirname{.hg/patches} as a repository
   4.100 +  in its own right.  Also creates a \sfilename{.hgignore} file that
   4.101 +  will ignore the \sfilename{status} file.
   4.102 +\end{itemize}
   4.103 +
   4.104 +When the \sdirname{.hg/patches} directory is a repository, the
   4.105 +\hgxcmd{mq}{qimport} and \hgxcmd{mq}{qnew} commands automatically \hgcmd{add}
   4.106 +new patches.
   4.107 +
   4.108 +\subsection{\hgxcmd{mq}{qnew}---create a new patch}
   4.109 +
   4.110 +The \hgxcmd{mq}{qnew} command creates a new patch.  It takes one mandatory
   4.111 +argument, the name to use for the patch file.  The newly created patch
   4.112 +is created empty by default.  It is added to the \sfilename{series}
   4.113 +file after the current topmost applied patch, and is immediately
   4.114 +pushed on top of that patch.
   4.115 +
   4.116 +If \hgxcmd{mq}{qnew} finds modified files in the working directory, it will
   4.117 +refuse to create a new patch unless the \hgxopt{mq}{qnew}{-f} option is
   4.118 +used (see below).  This behaviour allows you to \hgxcmd{mq}{qrefresh} your
   4.119 +topmost applied patch before you apply a new patch on top of it.
   4.120 +
   4.121 +Options:
   4.122 +\begin{itemize}
   4.123 +\item[\hgxopt{mq}{qnew}{-f}] Create a new patch if the contents of the
   4.124 +  working directory are modified.  Any outstanding modifications are
   4.125 +  added to the newly created patch, so after this command completes,
   4.126 +  the working directory will no longer be modified.
   4.127 +\item[\hgxopt{mq}{qnew}{-m}] Use the given text as the commit message.
   4.128 +  This text will be stored at the beginning of the patch file, before
   4.129 +  the patch data.
   4.130 +\end{itemize}
   4.131 +
   4.132 +\subsection{\hgxcmd{mq}{qnext}---print the name of the next patch}
   4.133 +
   4.134 +The \hgxcmd{mq}{qnext} command prints the name of the next patch in
   4.135 +the \sfilename{series} file after the topmost applied patch.  This
   4.136 +patch will become the topmost applied patch if you run \hgxcmd{mq}{qpush}.
   4.137 +
   4.138 +\subsection{\hgxcmd{mq}{qpop}---pop patches off the stack}
   4.139 +
   4.140 +The \hgxcmd{mq}{qpop} command removes applied patches from the top of the
   4.141 +stack of applied patches.  By default, it removes only one patch.
   4.142 +
   4.143 +This command removes the changesets that represent the popped patches
   4.144 +from the repository, and updates the working directory to undo the
   4.145 +effects of the patches.
   4.146 +
   4.147 +This command takes an optional argument, which it uses as the name or
   4.148 +index of the patch to pop to.  If given a name, it will pop patches
   4.149 +until the named patch is the topmost applied patch.  If given a
   4.150 +number, \hgxcmd{mq}{qpop} treats the number as an index into the entries in
   4.151 +the series file, counting from zero (empty lines and lines containing
   4.152 +only comments do not count).  It pops patches until the patch
   4.153 +identified by the given index is the topmost applied patch.
   4.154 +
   4.155 +The \hgxcmd{mq}{qpop} command does not read or write patches or the
   4.156 +\sfilename{series} file.  It is thus safe to \hgxcmd{mq}{qpop} a patch that
   4.157 +you have removed from the \sfilename{series} file, or a patch that you
   4.158 +have renamed or deleted entirely.  In the latter two cases, use the
   4.159 +name of the patch as it was when you applied it.
   4.160 +
   4.161 +By default, the \hgxcmd{mq}{qpop} command will not pop any patches if the
   4.162 +working directory has been modified.  You can override this behaviour
   4.163 +using the \hgxopt{mq}{qpop}{-f} option, which reverts all modifications in
   4.164 +the working directory.
   4.165 +
   4.166 +Options:
   4.167 +\begin{itemize}
   4.168 +\item[\hgxopt{mq}{qpop}{-a}] Pop all applied patches.  This returns the
   4.169 +  repository to its state before you applied any patches.
   4.170 +\item[\hgxopt{mq}{qpop}{-f}] Forcibly revert any modifications to the
   4.171 +  working directory when popping.
   4.172 +\item[\hgxopt{mq}{qpop}{-n}] Pop a patch from the named queue.
   4.173 +\end{itemize}
   4.174 +
   4.175 +The \hgxcmd{mq}{qpop} command removes one line from the end of the
   4.176 +\sfilename{status} file for each patch that it pops.
   4.177 +
   4.178 +\subsection{\hgxcmd{mq}{qprev}---print the name of the previous patch}
   4.179 +
   4.180 +The \hgxcmd{mq}{qprev} command prints the name of the patch in the
   4.181 +\sfilename{series} file that comes before the topmost applied patch.
   4.182 +This will become the topmost applied patch if you run \hgxcmd{mq}{qpop}.
   4.183 +
   4.184 +\subsection{\hgxcmd{mq}{qpush}---push patches onto the stack}
   4.185 +\label{sec:mqref:cmd:qpush}
   4.186 +
   4.187 +The \hgxcmd{mq}{qpush} command adds patches onto the applied stack.  By
   4.188 +default, it adds only one patch.
   4.189 +
   4.190 +This command creates a new changeset to represent each applied patch,
   4.191 +and updates the working directory to apply the effects of the patches.
   4.192 +
   4.193 +The default data used when creating a changeset are as follows:
   4.194 +\begin{itemize}
   4.195 +\item The commit date and time zone are the current date and time
   4.196 +  zone.  Because these data are used to compute the identity of a
   4.197 +  changeset, this means that if you \hgxcmd{mq}{qpop} a patch and
   4.198 +  \hgxcmd{mq}{qpush} it again, the changeset that you push will have a
   4.199 +  different identity than the changeset you popped.
   4.200 +\item The author is the same as the default used by the \hgcmd{commit}
   4.201 +  command.
   4.202 +\item The commit message is any text from the patch file that comes
   4.203 +  before the first diff header.  If there is no such text, a default
   4.204 +  commit message is used that identifies the name of the patch.
   4.205 +\end{itemize}
   4.206 +If a patch contains a Mercurial patch header (XXX add link), the
   4.207 +information in the patch header overrides these defaults.
   4.208 +
   4.209 +Options:
   4.210 +\begin{itemize}
   4.211 +\item[\hgxopt{mq}{qpush}{-a}] Push all unapplied patches from the
   4.212 +  \sfilename{series} file until there are none left to push.
   4.213 +\item[\hgxopt{mq}{qpush}{-l}] Add the name of the patch to the end
   4.214 +  of the commit message.
   4.215 +\item[\hgxopt{mq}{qpush}{-m}] If a patch fails to apply cleanly, use the
   4.216 +  entry for the patch in another saved queue to compute the parameters
   4.217 +  for a three-way merge, and perform a three-way merge using the
   4.218 +  normal Mercurial merge machinery.  Use the resolution of the merge
   4.219 +  as the new patch content.
   4.220 +\item[\hgxopt{mq}{qpush}{-n}] Use the named queue if merging while pushing.
   4.221 +\end{itemize}
   4.222 +
   4.223 +The \hgxcmd{mq}{qpush} command reads, but does not modify, the
   4.223 +\sfilename{series} file.  It appends one line to the \sfilename{status}
   4.225 +file for each patch that it pushes.
   4.226 +
   4.227 +\subsection{\hgxcmd{mq}{qrefresh}---update the topmost applied patch}
   4.228 +
   4.229 +The \hgxcmd{mq}{qrefresh} command updates the topmost applied patch.  It
   4.230 +modifies the patch, removes the old changeset that represented the
   4.231 +patch, and creates a new changeset to represent the modified patch.
   4.232 +
   4.233 +The \hgxcmd{mq}{qrefresh} command looks for the following modifications:
   4.234 +\begin{itemize}
   4.235 +\item Changes to the commit message, i.e.~the text before the first
   4.236 +  diff header in the patch file, are reflected in the new changeset
   4.237 +  that represents the patch.
   4.238 +\item Modifications to tracked files in the working directory are
   4.239 +  added to the patch.
   4.240 +\item Changes to the files tracked using \hgcmd{add}, \hgcmd{copy},
   4.241 +  \hgcmd{remove}, or \hgcmd{rename}.  Added files and copy and rename
   4.242 +  destinations are added to the patch, while removed files and rename
   4.243 +  sources are removed.
   4.244 +\end{itemize}
   4.245 +
   4.246 +Even if \hgxcmd{mq}{qrefresh} detects no changes, it still recreates the
   4.247 +changeset that represents the patch.  This causes the identity of the
   4.248 +changeset to differ from the previous changeset that identified the
   4.249 +patch.
   4.250 +
   4.251 +Options:
   4.252 +\begin{itemize}
   4.253 +\item[\hgxopt{mq}{qrefresh}{-e}] Modify the commit message and patch description,
   4.254 +  using the preferred text editor.
   4.255 +\item[\hgxopt{mq}{qrefresh}{-m}] Modify the commit message and patch
   4.256 +  description, using the given text.
   4.257 +\item[\hgxopt{mq}{qrefresh}{-l}] Modify the commit message and patch
   4.258 +  description, using text from the given file.
   4.259 +\end{itemize}
   4.260 +
   4.261 +\subsection{\hgxcmd{mq}{qrename}---rename a patch}
   4.262 +
   4.263 +The \hgxcmd{mq}{qrename} command renames a patch, and changes the entry for
   4.264 +the patch in the \sfilename{series} file.
   4.265 +
   4.266 +With a single argument, \hgxcmd{mq}{qrename} renames the topmost applied
   4.267 +patch.  With two arguments, it renames its first argument to its
   4.268 +second.
   4.269 +
   4.270 +\subsection{\hgxcmd{mq}{qrestore}---restore saved queue state}
   4.271 +
   4.272 +XXX No idea what this does.
   4.273 +
   4.274 +\subsection{\hgxcmd{mq}{qsave}---save current queue state}
   4.275 +
   4.276 +XXX Likewise.
   4.277 +
   4.278 +\subsection{\hgxcmd{mq}{qseries}---print the entire patch series}
   4.279 +
   4.280 +The \hgxcmd{mq}{qseries} command prints the entire patch series from the
   4.281 +\sfilename{series} file.  It prints only patch names, not empty lines
   4.282 +or comments.  It prints in order from first to be applied to last.
   4.283 +
   4.284 +\subsection{\hgxcmd{mq}{qtop}---print the name of the current patch}
   4.285 +
   4.286 +The \hgxcmd{mq}{qtop} command prints the name of the topmost currently applied
   4.287 +patch.
   4.288 +
   4.289 +\subsection{\hgxcmd{mq}{qunapplied}---print patches not yet applied}
   4.290 +
   4.291 +The \hgxcmd{mq}{qunapplied} command prints the names of patches from the
   4.292 +\sfilename{series} file that are not yet applied.  It prints them in
   4.293 +order from the next patch that will be pushed to the last.
   4.294 +
   4.295 +\subsection{\hgcmd{strip}---remove a revision and descendants}
   4.296 +
   4.297 +The \hgcmd{strip} command removes a revision, and all of its
   4.298 +descendants, from the repository.  It undoes the effects of the
   4.299 +removed revisions from the repository, and updates the working
   4.300 +directory to the first parent of the removed revision.
   4.301 +
   4.302 +The \hgcmd{strip} command saves a backup of the removed changesets in
   4.303 +a bundle, so that they can be reapplied if removed in error.
   4.304 +
   4.305 +Options:
   4.306 +\begin{itemize}
   4.307 +\item[\hgopt{strip}{-b}] Save unrelated changesets that are intermixed
   4.308 +  with the stripped changesets in the backup bundle.
   4.309 +\item[\hgopt{strip}{-f}] If a branch has multiple heads, remove all
   4.310 +  heads. XXX This should be renamed, and use \texttt{-f} to strip revs
   4.311 +  when there are pending changes.
   4.312 +\item[\hgopt{strip}{-n}] Do not save a backup bundle.
   4.313 +\end{itemize}
   4.314 +
   4.315 +\section{MQ file reference}
   4.316 +
   4.317 +\subsection{The \sfilename{series} file}
   4.318 +
   4.319 +The \sfilename{series} file contains a list of the names of all
   4.320 +patches that MQ can apply.  It is represented as a list of names, with
   4.321 +one name saved per line.  Leading and trailing white space in each
   4.322 +line are ignored.
   4.323 +
   4.324 +Lines may contain comments.  A comment begins with the ``\texttt{\#}''
   4.325 +character, and extends to the end of the line.  Empty lines, and lines
   4.326 +that contain only comments, are ignored.
   4.327 +
   4.328 +You will often need to edit the \sfilename{series} file by hand, hence
   4.329 +the support for comments and empty lines noted above.  For example,
   4.330 +you can comment out a patch temporarily, and \hgxcmd{mq}{qpush} will skip
   4.331 +over that patch when applying patches.  You can also change the order
   4.332 +in which patches are applied by reordering their entries in the
   4.333 +\sfilename{series} file.
   4.334 +
   4.335 +Placing the \sfilename{series} file under revision control is also
   4.336 +supported; it is a good idea to place all of the patches that it
   4.337 +refers to under revision control, as well.  If you create a patch
   4.338 +directory using the \hgxopt{mq}{qinit}{-c} option to \hgxcmd{mq}{qinit}, this
   4.339 +will be done for you automatically.
   4.340 +
   4.341 +\subsection{The \sfilename{status} file}
   4.342 +
   4.343 +The \sfilename{status} file contains the names and changeset hashes of
   4.344 +all patches that MQ currently has applied.  Unlike the
   4.345 +\sfilename{series} file, this file is not intended for editing.  You
   4.346 +should not place this file under revision control, or modify it in any
   4.347 +way.  It is used by MQ strictly for internal book-keeping.
   4.348 +
   4.349 +%%% Local Variables: 
   4.350 +%%% mode: latex
   4.351 +%%% TeX-master: "00book"
   4.352 +%%% End: 
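
Note on the file above: the rules given for the series file (one name per line, surrounding white space ignored, comments starting at "#", empty and comment-only lines skipped) can be sketched in a few lines of Python. This is a simplified illustration of the rules as the text describes them, not MQ's actual parser; the function name parse_series and the patch names are hypothetical.

```python
def parse_series(text):
    """Return the patch names in a series file, in application order."""
    names = []
    for line in text.splitlines():
        # Drop everything from the first '#' onward, then trim white space.
        entry = line.split("#", 1)[0].strip()
        # Empty lines and comment-only lines contribute nothing.
        if entry:
            names.append(entry)
    return names


example = """\
# queue for the current work
fix-typo.patch
   add-feature.patch   # needs review

# broken.patch
cleanup.patch
"""

series = parse_series(example)
print(series)
```

Because skipped lines do not count, the zero-based index that qpop accepts refers to positions in the resulting list: here index 1 is add-feature.patch, and the commented-out broken.patch is passed over just as qpush would skip it.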

     5.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
     5.2 +++ b/en/appC-srcinstall.tex	Thu Jan 29 22:56:27 2009 -0800
     5.3 @@ -0,0 +1,53 @@
     5.4 +\chapter{Installing Mercurial from source}
     5.5 +\label{chap:srcinstall}
     5.6 +
     5.7 +\section{On a Unix-like system}
     5.8 +\label{sec:srcinstall:unixlike}
     5.9 +
    5.10 +If you are using a Unix-like system that has a sufficiently recent
    5.11 +version of Python (2.3~or newer) available, it is easy to install
    5.12 +Mercurial from source.
    5.13 +\begin{enumerate}
    5.14 +\item Download a recent source tarball from
    5.15 +  \url{http://www.selenic.com/mercurial/download}.
    5.16 +\item Unpack the tarball:
    5.17 +  \begin{codesample4}
    5.18 +    gzip -dc mercurial-\emph{version}.tar.gz | tar xf -
    5.19 +  \end{codesample4}
    5.20 +\item Go into the source directory and run the installer script.  This
    5.21 +  will build Mercurial and install it in your home directory.
    5.22 +  \begin{codesample4}
    5.23 +    cd mercurial-\emph{version}
    5.24 +    python setup.py install --force --home=\$HOME
    5.25 +  \end{codesample4}
    5.26 +\end{enumerate}
    5.27 +Once the install finishes, Mercurial will be in the \texttt{bin}
    5.28 +subdirectory of your home directory.  Don't forget to make sure that
    5.29 +this directory is present in your shell's search path.
    5.30 +
    5.31 +You will probably need to set the \envar{PYTHONPATH} environment
    5.32 +variable so that the Mercurial executable can find the rest of the
    5.33 +Mercurial packages.  For example, on my laptop, I have set it to
    5.34 +\texttt{/home/bos/lib/python}.  The exact path that you will need to
    5.35 +use depends on how Python was built for your system, but should be
    5.36 +easy to figure out.  If you're uncertain, look through the output of
    5.37 +the installer script above, and see where the contents of the
    5.38 +\texttt{mercurial} directory were installed to.
    5.39 +
    5.40 +\section{On Windows}
    5.41 +
    5.42 +Building and installing Mercurial on Windows requires a variety of
    5.43 +tools, a fair amount of technical knowledge, and considerable
    5.44 +patience.  I very much \emph{do not recommend} this route if you are a
    5.45 +``casual user''.  Unless you intend to hack on Mercurial, I strongly
    5.46 +suggest that you use a binary package instead.
    5.47 +
    5.48 +If you are intent on building Mercurial from source on Windows, follow
    5.49 +the ``hard way'' directions on the Mercurial wiki at
    5.50 +\url{http://www.selenic.com/mercurial/wiki/index.cgi/WindowsInstall},
    5.51 +and expect the process to involve a lot of fiddly work.
    5.52 +
    5.53 +%%% Local Variables: 
    5.54 +%%% mode: latex
    5.55 +%%% TeX-master: "00book"
    5.56 +%%% End: 
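
The install layout described in appC-srcinstall.tex above can be sketched in Python. This is a minimal illustration and not part of the changeset: the function names are hypothetical, and the `lib/python` location is an assumption taken from the author's example path (`/home/bos/lib/python`). The text itself notes that the exact path depends on how Python was built.

```python
import posixpath


def home_install_paths(home):
    """Guess where `python setup.py install --home=<home>` puts things.

    Assumption: scripts land in <home>/bin and packages in <home>/lib/python,
    matching the example in the text. Your Python build may differ.
    """
    return {
        "bin": posixpath.join(home, "bin"),
        "pythonpath": posixpath.join(home, "lib", "python"),
    }


def is_on_search_path(directory, search_path):
    """Return True if directory is an entry in a colon-separated path string."""
    entries = [entry for entry in search_path.split(":") if entry]
    return directory in entries
```

A check such as `is_on_search_path(home_install_paths(home)["bin"], os.environ["PATH"])` would tell you whether the shell can already find the `hg` executable. If the guessed `pythonpath` looks wrong, the installer script's output remains the authoritative source.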

     6.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
     6.2 +++ b/en/appD-license.tex	Thu Jan 29 22:56:27 2009 -0800
     6.3 @@ -0,0 +1,138 @@
     6.4 +\chapter{Open Publication License}
     6.5 +\label{cha:opl}
     6.6 +
     6.7 +Version 1.0, 8 June 1999
     6.8 +
     6.9 +\section{Requirements on both unmodified and modified versions}
    6.10 +
    6.11 +The Open Publication works may be reproduced and distributed in whole
    6.12 +or in part, in any medium physical or electronic, provided that the
    6.13 +terms of this license are adhered to, and that this license or an
    6.14 +incorporation of it by reference (with any options elected by the
    6.15 +author(s) and/or publisher) is displayed in the reproduction.
    6.16 +
    6.17 +Proper form for an incorporation by reference is as follows:
    6.18 +
    6.19 +\begin{quote}
    6.20 +  Copyright (c) \emph{year} by \emph{author's name or designee}. This
    6.21 +  material may be distributed only subject to the terms and conditions
    6.22 +  set forth in the Open Publication License, v\emph{x.y} or later (the
    6.23 +  latest version is presently available at
    6.24 +  \url{http://www.opencontent.org/openpub/}).
    6.25 +\end{quote}
    6.26 +
    6.27 +The reference must be immediately followed with any options elected by
    6.28 +the author(s) and/or publisher of the document (see
    6.29 +section~\ref{sec:opl:options}).
    6.30 +
    6.31 +Commercial redistribution of Open Publication-licensed material is
    6.32 +permitted.
    6.33 +
    6.34 +Any publication in standard (paper) book form shall require the
    6.35 +citation of the original publisher and author. The publisher and
    6.36 +author's names shall appear on all outer surfaces of the book. On all
    6.37 +outer surfaces of the book the original publisher's name shall be as
    6.38 +large as the title of the work and cited as possessive with respect to
    6.39 +the title.
    6.40 +
    6.41 +\section{Copyright}
    6.42 +
    6.43 +The copyright to each Open Publication is owned by its author(s) or
    6.44 +designee.
    6.45 +
    6.46 +\section{Scope of license}
    6.47 +
    6.48 +The following license terms apply to all Open Publication works,
    6.49 +unless otherwise explicitly stated in the document.
    6.50 +
    6.51 +Mere aggregation of Open Publication works or a portion of an Open
    6.52 +Publication work with other works or programs on the same media shall
    6.53 +not cause this license to apply to those other works. The aggregate
    6.54 +work shall contain a notice specifying the inclusion of the Open
    6.55 +Publication material and appropriate copyright notice.
    6.56 +
    6.57 +\textbf{Severability}. If any part of this license is found to be
    6.58 +unenforceable in any jurisdiction, the remaining portions of the
    6.59 +license remain in force.
    6.60 +
    6.61 +\textbf{No warranty}. Open Publication works are licensed and provided
    6.62 +``as is'' without warranty of any kind, express or implied, including,
    6.63 +but not limited to, the implied warranties of merchantability and
    6.64 +fitness for a particular purpose or a warranty of non-infringement.
    6.65 +
    6.66 +\section{Requirements on modified works}
    6.67 +
    6.68 +All modified versions of documents covered by this license, including
    6.69 +translations, anthologies, compilations and partial documents, must
    6.70 +meet the following requirements:
    6.71 +
    6.72 +\begin{enumerate}
    6.73 +\item The modified version must be labeled as such.
    6.74 +\item The person making the modifications must be identified and the
    6.75 +  modifications dated.
    6.76 +\item Acknowledgement of the original author and publisher if
    6.77 +  applicable must be retained according to normal academic citation
    6.78 +  practices.
    6.79 +\item The location of the original unmodified document must be
    6.80 +  identified.
    6.81 +\item The original author's (or authors') name(s) may not be used to
    6.82 +  assert or imply endorsement of the resulting document without the
    6.83 +  original author's (or authors') permission.
    6.84 +\end{enumerate}
    6.85 +
    6.86 +\section{Good-practice recommendations}
    6.87 +
    6.88 +In addition to the requirements of this license, it is requested from
    6.89 +and strongly recommended of redistributors that:
    6.90 +
    6.91 +\begin{enumerate}
    6.92 +\item If you are distributing Open Publication works on hardcopy or
    6.93 +  CD-ROM, you provide email notification to the authors of your intent
    6.94 +  to redistribute at least thirty days before your manuscript or media
    6.95 +  freeze, to give the authors time to provide updated documents. This
    6.96 +  notification should describe modifications, if any, made to the
    6.97 +  document.
    6.98 +\item All substantive modifications (including deletions) be either
    6.99 +  clearly marked up in the document or else described in an attachment
   6.100 +  to the document.
   6.101 +\item Finally, while it is not mandatory under this license, it is
   6.102 +  considered good form to offer a free copy of any hardcopy and CD-ROM
   6.103 +  expression of an Open Publication-licensed work to its author(s).
   6.104 +\end{enumerate}
   6.105 +
   6.106 +\section{License options}
   6.107 +\label{sec:opl:options}
   6.108 +
   6.109 +The author(s) and/or publisher of an Open Publication-licensed
   6.110 +document may elect certain options by appending language to the
   6.111 +reference to or copy of the license. These options are considered part
   6.112 +of the license instance and must be included with the license (or its
   6.113 +incorporation by reference) in derived works.
   6.114 +
   6.115 +\begin{enumerate}[A]
   6.116 +\item To prohibit distribution of substantively modified versions
   6.117 +  without the explicit permission of the author(s). ``Substantive
   6.118 +  modification'' is defined as a change to the semantic content of the
   6.119 +  document, and excludes mere changes in format or typographical
   6.120 +  corrections.
   6.121 +
   6.122 +  To accomplish this, add the phrase ``Distribution of substantively
   6.123 +  modified versions of this document is prohibited without the
   6.124 +  explicit permission of the copyright holder.'' to the license
   6.125 +  reference or copy.
   6.126 +
   6.127 +\item To prohibit any publication of this work or derivative works in
   6.128 +  whole or in part in standard (paper) book form for commercial
   6.129 +  purposes is prohibited unless prior permission is obtained from the
   6.130 +  copyright holder.
   6.131 +
   6.132 +  To accomplish this, add the phrase ``Distribution of the work or
   6.133 +  derivative of the work in any standard (paper) book form is
   6.134 +  prohibited unless prior permission is obtained from the copyright
   6.135 +  holder.'' to the license reference or copy.
   6.136 +\end{enumerate}
   6.137 +
   6.138 +%%% Local Variables: 
   6.139 +%%% mode: latex
   6.140 +%%% TeX-master: "00book"
   6.141 +%%% End: 

     7.1 --- a/en/branch.tex	Thu Jan 29 22:47:34 2009 -0800
     7.2 +++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
     7.3 @@ -1,392 +0,0 @@
     7.4 -\chapter{Managing releases and branchy development}
     7.5 -\label{chap:branch}
     7.6 -
     7.7 -Mercurial provides several mechanisms for you to manage a project that
     7.8 -is making progress on multiple fronts at once.  To understand these
     7.9 -mechanisms, let's first take a brief look at a fairly normal software
    7.10 -project structure.
    7.11 -
    7.12 -Many software projects issue periodic ``major'' releases that contain
    7.13 -substantial new features.  In parallel, they may issue ``minor''
    7.14 -releases.  These are usually identical to the major releases off which
    7.15 -they're based, but with a few bugs fixed.
    7.16 -
    7.17 -In this chapter, we'll start by talking about how to keep records of
    7.18 -project milestones such as releases.  We'll then continue on to talk
    7.19 -about the flow of work between different phases of a project, and how
    7.20 -Mercurial can help you to isolate and manage this work.
    7.21 -
    7.22 -\section{Giving a persistent name to a revision}
    7.23 -
    7.24 -Once you decide that you'd like to call a particular revision a
    7.25 -``release'', it's a good idea to record the identity of that revision.
    7.26 -This will let you reproduce that release at a later date, for whatever
    7.27 -purpose you might need at the time (reproducing a bug, porting to a
    7.28 -new platform, etc).
    7.29 -\interaction{tag.init}
    7.30 -
    7.31 -Mercurial lets you give a permanent name to any revision using the
    7.32 -\hgcmd{tag} command.  Not surprisingly, these names are called
    7.33 -``tags''.
    7.34 -\interaction{tag.tag}
    7.35 -
    7.36 -A tag is nothing more than a ``symbolic name'' for a revision.  Tags
    7.37 -exist purely for your convenience, so that you have a handy permanent
    7.38 -way to refer to a revision; Mercurial doesn't interpret the tag names
    7.39 -you use in any way.  Neither does Mercurial place any restrictions on
    7.40 -the name of a tag, beyond a few that are necessary to ensure that a
    7.41 -tag can be parsed unambiguously.  A tag name cannot contain any of the
    7.42 -following characters:
    7.43 -\begin{itemize}
    7.44 -\item Colon (ASCII 58, ``\texttt{:}'')
    7.45 -\item Carriage return (ASCII 13, ``\Verb+\r+'')
    7.46 -\item Newline (ASCII 10, ``\Verb+\n+'')
    7.47 -\end{itemize}
    7.48 -
    7.49 -You can use the \hgcmd{tags} command to display the tags present in
    7.50 -your repository.  In the output, each tagged revision is identified
    7.51 -first by its name, then by revision number, and finally by the unique
    7.52 -hash of the revision.  
    7.53 -\interaction{tag.tags}
    7.54 -Notice that \texttt{tip} is listed in the output of \hgcmd{tags}.  The
    7.55 -\texttt{tip} tag is a special ``floating'' tag, which always
    7.56 -identifies the newest revision in the repository.
    7.57 -
    7.58 -In the output of the \hgcmd{tags} command, tags are listed in reverse
    7.59 -order, by revision number.  This usually means that recent tags are
    7.60 -listed before older tags.  It also means that \texttt{tip} is always
    7.61 -going to be the first tag listed in the output of \hgcmd{tags}.
    7.62 -
    7.63 -When you run \hgcmd{log}, if it displays a revision that has tags
    7.64 -associated with it, it will print those tags.
    7.65 -\interaction{tag.log}
    7.66 -
    7.67 -Any time you need to provide a revision~ID to a Mercurial command, the
    7.68 -command will accept a tag name in its place.  Internally, Mercurial
    7.69 -will translate your tag name into the corresponding revision~ID, then
    7.70 -use that.
    7.71 -\interaction{tag.log.v1.0}
    7.72 -
    7.73 -There's no limit on the number of tags you can have in a repository,
    7.74 -or on the number of tags that a single revision can have.  As a
    7.75 -practical matter, it's not a great idea to have ``too many'' (a number
    7.76 -which will vary from project to project), simply because tags are
    7.77 -supposed to help you to find revisions.  If you have lots of tags, the
    7.78 -ease of using them to identify revisions diminishes rapidly.
    7.79 -
    7.80 -For example, if your project has milestones as frequent as every few
    7.81 -days, it's perfectly reasonable to tag each one of those.  But if you
    7.82 -have a continuous build system that makes sure every revision can be
    7.83 -built cleanly, you'd be introducing a lot of noise if you were to tag
    7.84 -every clean build.  Instead, you could tag failed builds (on the
    7.85 -assumption that they're rare!), or simply not use tags to track
    7.86 -buildability.
    7.87 -
    7.88 -If you want to remove a tag that you no longer want, use
    7.89 -\hgcmdargs{tag}{--remove}.  
    7.90 -\interaction{tag.remove}
    7.91 -You can also modify a tag at any time, so that it identifies a
    7.92 -different revision, by simply issuing a new \hgcmd{tag} command.
    7.93 -You'll have to use the \hgopt{tag}{-f} option to tell Mercurial that
    7.94 -you \emph{really} want to update the tag.
    7.95 -\interaction{tag.replace}
    7.96 -There will still be a permanent record of the previous identity of the
    7.97 -tag, but Mercurial will no longer use it.  There's thus no penalty to
    7.98 -tagging the wrong revision; all you have to do is turn around and tag
    7.99 -the correct revision once you discover your error.
   7.100 -
   7.101 -Mercurial stores tags in a normal revision-controlled file in your
   7.102 -repository.  If you've created any tags, you'll find them in a file
   7.103 -named \sfilename{.hgtags}.  When you run the \hgcmd{tag} command,
   7.104 -Mercurial modifies this file, then automatically commits the change to
   7.105 -it.  This means that every time you run \hgcmd{tag}, you'll see a
   7.106 -corresponding changeset in the output of \hgcmd{log}.
   7.107 -\interaction{tag.tip}
   7.108 -
   7.109 -\subsection{Handling tag conflicts during a merge}
   7.110 -
   7.111 -You won't often need to care about the \sfilename{.hgtags} file, but
   7.112 -it sometimes makes its presence known during a merge.  The format of
   7.113 -the file is simple: it consists of a series of lines.  Each line
   7.114 -starts with a changeset hash, followed by a space, followed by the
   7.115 -name of a tag.
   7.116 -
   7.117 -If you're resolving a conflict in the \sfilename{.hgtags} file during
   7.118 -a merge, there's one twist to modifying the \sfilename{.hgtags} file:
   7.119 -when Mercurial is parsing the tags in a repository, it \emph{never}
   7.120 -reads the working copy of the \sfilename{.hgtags} file.  Instead, it
   7.121 -reads the \emph{most recently committed} revision of the file.
   7.122 -
   7.123 -An unfortunate consequence of this design is that you can't actually
   7.124 -verify that your merged \sfilename{.hgtags} file is correct until
   7.125 -\emph{after} you've committed a change.  So if you find yourself
   7.126 -resolving a conflict on \sfilename{.hgtags} during a merge, be sure to
   7.127 -run \hgcmd{tags} after you commit.  If it finds an error in the
   7.128 -\sfilename{.hgtags} file, it will report the location of the error,
   7.129 -which you can then fix and commit.  You should then run \hgcmd{tags}
   7.130 -again, just to be sure that your fix is correct.
   7.131 -
   7.132 -\subsection{Tags and cloning}
   7.133 -
   7.134 -You may have noticed that the \hgcmd{clone} command has a
   7.135 -\hgopt{clone}{-r} option that lets you clone an exact copy of the
   7.136 -repository as of a particular changeset.  The new clone will not
   7.137 -contain any project history that comes after the revision you
   7.138 -specified.  This has an interaction with tags that can surprise the
   7.139 -unwary.
   7.140 -
   7.141 -Recall that a tag is stored as a revision to the \sfilename{.hgtags}
   7.142 -file, so that when you create a tag, the changeset in which it's
   7.143 -recorded necessarily refers to an older changeset.  When you run
   7.144 -\hgcmdargs{clone}{-r foo} to clone a repository as of tag
   7.145 -\texttt{foo}, the new clone \emph{will not contain the history that
   7.146 -  created the tag} that you used to clone the repository.  The result
   7.147 -is that you'll get exactly the right subset of the project's history
   7.148 -in the new repository, but \emph{not} the tag you might have expected.
   7.149 -
   7.150 -\subsection{When permanent tags are too much}
   7.151 -
   7.152 -Since Mercurial's tags are revision controlled and carried around with
   7.153 -a project's history, everyone you work with will see the tags you
   7.154 -create.  But giving names to revisions has uses beyond simply noting
   7.155 -that revision \texttt{4237e45506ee} is really \texttt{v2.0.2}.  If
   7.156 -you're trying to track down a subtle bug, you might want a tag to
   7.157 -remind you of something like ``Anne saw the symptoms with this
   7.158 -revision''.
   7.159 -
   7.160 -For cases like this, what you might want to use are \emph{local} tags.
   7.161 -You can create a local tag with the \hgopt{tag}{-l} option to the
   7.162 -\hgcmd{tag} command.  This will store the tag in a file called
   7.163 -\sfilename{.hg/localtags}.  Unlike \sfilename{.hgtags},
   7.164 -\sfilename{.hg/localtags} is not revision controlled.  Any tags you
   7.165 -create using \hgopt{tag}{-l} remain strictly local to the repository
   7.166 -you're currently working in.
   7.167 -
   7.168 -\section{The flow of changes---big picture vs. little}
   7.169 -
   7.170 -To return to the outline I sketched at the beginning of the chapter,
   7.171 -let's think about a project that has multiple concurrent pieces of
   7.172 -work under development at once.
   7.173 -
   7.174 -There might be a push for a new ``main'' release; a new minor bugfix
   7.175 -release to the last main release; and an unexpected ``hot fix'' to an
   7.176 -old release that is now in maintenance mode.
   7.177 -
   7.178 -The usual way people refer to these different concurrent directions of
   7.179 -development is as ``branches''.  However, we've already seen numerous
   7.180 -times that Mercurial treats \emph{all of history} as a series of
   7.181 -branches and merges.  Really, what we have here is two ideas that are
   7.182 -peripherally related, but which happen to share a name.
   7.183 -\begin{itemize}
   7.184 -\item ``Big picture'' branches represent the sweep of a project's
   7.185 -  evolution; people give them names, and talk about them in
   7.186 -  conversation.
   7.187 -\item ``Little picture'' branches are artefacts of the day-to-day
   7.188 -  activity of developing and merging changes.  They expose the
   7.189 -  narrative of how the code was developed.
   7.190 -\end{itemize}
   7.191 -
   7.192 -\section{Managing big-picture branches in repositories}
   7.193 -
   7.194 -The easiest way to isolate a ``big picture'' branch in Mercurial is in
   7.195 -a dedicated repository.  If you have an existing shared
   7.196 -repository---let's call it \texttt{myproject}---that reaches a ``1.0''
   7.197 -milestone, you can start to prepare for future maintenance releases on
   7.198 -top of version~1.0 by tagging the revision from which you prepared
   7.199 -the~1.0 release.
   7.200 -\interaction{branch-repo.tag}
   7.201 -You can then clone a new shared \texttt{myproject-1.0.1} repository as
   7.202 -of that tag.
   7.203 -\interaction{branch-repo.clone}
   7.204 -
   7.205 -Afterwards, if someone needs to work on a bug fix that ought to go
   7.206 -into an upcoming~1.0.1 minor release, they clone the
   7.207 -\texttt{myproject-1.0.1} repository, make their changes, and push them
   7.208 -back.
   7.209 -\interaction{branch-repo.bugfix}
   7.210 -Meanwhile, development for the next major release can continue,
   7.211 -isolated and unabated, in the \texttt{myproject} repository.
   7.212 -\interaction{branch-repo.new}
   7.213 -
   7.214 -\section{Don't repeat yourself: merging across branches}
   7.215 -
   7.216 -In many cases, if you have a bug to fix on a maintenance branch, the
   7.217 -chances are good that the bug exists on your project's main branch
   7.218 -(and possibly other maintenance branches, too).  It's a rare developer
   7.219 -who wants to fix the same bug multiple times, so let's look at a few
   7.220 -ways that Mercurial can help you to manage these bugfixes without
   7.221 -duplicating your work.
   7.222 -
   7.223 -In the simplest instance, all you need to do is pull changes from your
   7.224 -maintenance branch into your local clone of the target branch.
   7.225 -\interaction{branch-repo.pull}
   7.226 -You'll then need to merge the heads of the two branches, and push back
   7.227 -to the main branch.
   7.228 -\interaction{branch-repo.merge}
   7.229 -
   7.230 -\section{Naming branches within one repository}
   7.231 -
   7.232 -In most instances, isolating branches in repositories is the right
   7.233 -approach.  Its simplicity makes it easy to understand; and so it's
   7.234 -hard to make mistakes.  There's a one-to-one relationship between
   7.235 -branches you're working in and directories on your system.  This lets
   7.236 -you use normal (non-Mercurial-aware) tools to work on files within a
   7.237 -branch/repository.
   7.238 -
   7.239 -If you're more in the ``power user'' category (\emph{and} your
   7.240 -collaborators are too), there is an alternative way of handling
   7.241 -branches that you can consider.  I've already mentioned the
   7.242 -human-level distinction between ``little picture'' and ``big picture''
   7.243 -branches.  While Mercurial works with multiple ``little picture''
   7.244 -branches in a repository all the time (for example after you pull
   7.245 -changes in, but before you merge them), it can \emph{also} work with
   7.246 -multiple ``big picture'' branches.
   7.247 -
   7.248 -The key to working this way is that Mercurial lets you assign a
   7.249 -persistent \emph{name} to a branch.  There always exists a branch
   7.250 -named \texttt{default}.  Even before you start naming branches
   7.251 -yourself, you can find traces of the \texttt{default} branch if you
   7.252 -look for them.
   7.253 -
   7.254 -As an example, when you run the \hgcmd{commit} command, and it pops up
   7.255 -your editor so that you can enter a commit message, look for a line
   7.256 -that contains the text ``\texttt{HG: branch default}'' at the bottom.
   7.257 -This is telling you that your commit will occur on the branch named
   7.258 -\texttt{default}.
   7.259 -
   7.260 -To start working with named branches, use the \hgcmd{branches}
   7.261 -command.  This command lists the named branches already present in
   7.262 -your repository, telling you which changeset is the tip of each.
   7.263 -\interaction{branch-named.branches}
   7.264 -Since you haven't created any named branches yet, the only one that
   7.265 -exists is \texttt{default}.
   7.266 -
   7.267 -To find out what the ``current'' branch is, run the \hgcmd{branch}
   7.268 -command, giving it no arguments.  This tells you what branch the
   7.269 -parent of the current changeset is on.
   7.270 -\interaction{branch-named.branch}
   7.271 -
   7.272 -To create a new branch, run the \hgcmd{branch} command again.  This
   7.273 -time, give it one argument: the name of the branch you want to create.
   7.274 -\interaction{branch-named.create}
   7.275 -
   7.276 -After you've created a branch, you might wonder what effect the
   7.277 -\hgcmd{branch} command has had.  What do the \hgcmd{status} and
   7.278 -\hgcmd{tip} commands report?
   7.279 -\interaction{branch-named.status}
   7.280 -Nothing has changed in the working directory, and there's been no new
   7.281 -history created.  As this suggests, running the \hgcmd{branch} command
   7.282 -has no permanent effect; it only tells Mercurial what branch name to
   7.283 -use the \emph{next} time you commit a changeset.
   7.284 -
   7.285 -When you commit a change, Mercurial records the name of the branch on
   7.286 -which you committed.  Once you've switched from the \texttt{default}
   7.287 -branch to another and committed, you'll see the name of the new branch
   7.288 -show up in the output of \hgcmd{log}, \hgcmd{tip}, and other commands
   7.289 -that display the same kind of output.
   7.290 -\interaction{branch-named.commit}
   7.291 -The \hgcmd{log}-like commands will print the branch name of every
   7.292 -changeset that's not on the \texttt{default} branch.  As a result, if
   7.293 -you never use named branches, you'll never see this information.
   7.294 -
   7.295 -Once you've named a branch and committed a change with that name,
   7.296 -every subsequent commit that descends from that change will inherit
   7.297 -the same branch name.  You can change the name of a branch at any
   7.298 -time, using the \hgcmd{branch} command.  
   7.299 -\interaction{branch-named.rebranch}
   7.300 -In practice, this is something you won't do very often, as branch
   7.301 -names tend to have fairly long lifetimes.  (This isn't a rule, just an
   7.302 -observation.)
   7.303 -
   7.304 -\section{Dealing with multiple named branches in a repository}
   7.305 -
   7.306 -If you have more than one named branch in a repository, Mercurial will
   7.307 -remember the branch that your working directory is on when you start a
   7.308 -command like \hgcmd{update} or \hgcmdargs{pull}{-u}.  It will update
   7.309 -the working directory to the tip of this branch, no matter what the
   7.310 -``repo-wide'' tip is.  To update to a revision that's on a different
   7.311 -named branch, you may need to use the \hgopt{update}{-C} option to
   7.312 -\hgcmd{update}.
   7.313 -
   7.314 -This behaviour is a little subtle, so let's see it in action.  First,
   7.315 -let's remind ourselves what branch we're currently on, and what
   7.316 -branches are in our repository.
   7.317 -\interaction{branch-named.parents}
   7.318 -We're on the \texttt{bar} branch, but there also exists an older
   7.319 -\texttt{foo} branch.
   7.320 -
   7.321 -We can \hgcmd{update} back and forth between the tips of the
   7.322 -\texttt{foo} and \texttt{bar} branches without needing to use the
   7.323 -\hgopt{update}{-C} option, because this only involves going backwards
   7.324 -and forwards linearly through our change history.
   7.325 -\interaction{branch-named.update-switchy}
   7.326 -
   7.327 -If we go back to the \texttt{foo} branch and then run \hgcmd{update},
   7.328 -it will keep us on \texttt{foo}, not move us to the tip of
   7.329 -\texttt{bar}.
   7.330 -\interaction{branch-named.update-nothing}
   7.331 -
   7.332 -Committing a new change on the \texttt{foo} branch introduces a new
   7.333 -head.
   7.334 -\interaction{branch-named.foo-commit}
   7.335 -
   7.336 -\section{Branch names and merging}
   7.337 -
   7.338 -As you've probably noticed, merges in Mercurial are not symmetrical.
   7.339 -Let's say our repository has two heads, 17 and 23.  If I
   7.340 -\hgcmd{update} to 17 and then \hgcmd{merge} with 23, Mercurial records
   7.341 -17 as the first parent of the merge, and 23 as the second.  Whereas if
   7.342 -I \hgcmd{update} to 23 and then \hgcmd{merge} with 17, it records 23
   7.343 -as the first parent, and 17 as the second.
   7.344 -
   7.345 -This affects Mercurial's choice of branch name when you merge.  After
   7.346 -a merge, Mercurial will retain the branch name of the first parent
   7.347 -when you commit the result of the merge.  If your first parent's
   7.348 -branch name is \texttt{foo}, and you merge with \texttt{bar}, the
   7.349 -branch name will still be \texttt{foo} after you merge.
   7.350 -
   7.351 -It's not unusual for a repository to contain multiple heads, each with
   7.352 -the same branch name.  Let's say I'm working on the \texttt{foo}
   7.353 -branch, and so are you.  We commit different changes; I pull your
   7.354 -changes; I now have two heads, each claiming to be on the \texttt{foo}
   7.355 -branch.  The result of a merge will be a single head on the
   7.356 -\texttt{foo} branch, as you might hope.
   7.357 -
   7.358 -But if I'm working on the \texttt{bar} branch, and I merge work from
   7.359 -the \texttt{foo} branch, the result will remain on the \texttt{bar}
   7.360 -branch.
   7.361 -\interaction{branch-named.merge}
   7.362 -
   7.363 -To give a more concrete example, if I'm working on the
   7.364 -\texttt{bleeding-edge} branch, and I want to bring in the latest fixes
   7.365 -from the \texttt{stable} branch, Mercurial will choose the ``right''
   7.366 -(\texttt{bleeding-edge}) branch name when I pull and merge from
   7.367 -\texttt{stable}.
   7.368 -
   7.369 -\section{Branch naming is generally useful}
   7.370 -
   7.371 -You shouldn't think of named branches as applicable only to situations
   7.372 -where you have multiple long-lived branches cohabiting in a single
   7.373 -repository.  They're very useful even in the one-branch-per-repository
   7.374 -case.  
   7.375 -
   7.376 -In the simplest case, giving a name to each branch gives you a
   7.377 -permanent record of which branch a changeset originated on.  This
   7.378 -gives you more context when you're trying to follow the history of a
   7.379 -long-lived branchy project.
   7.380 -
   7.381 -If you're working with shared repositories, you can set up a
   7.382 -\hook{pretxnchangegroup} hook on each that will block incoming changes
   7.383 -that have the ``wrong'' branch name.  This provides a simple, but
   7.384 -effective, defence against people accidentally pushing changes from a
   7.385 -``bleeding edge'' branch to a ``stable'' branch.  Such a hook might
   7.386 -look like this inside the shared repo's \hgrc.
   7.387 -\begin{codesample2}
   7.388 -  [hooks]
   7.389 -  pretxnchangegroup.branch = hg heads --template '{branches} ' | grep mybranch
   7.390 -\end{codesample2}
   7.391 -
   7.392 -%%% Local Variables: 
   7.393 -%%% mode: latex
   7.394 -%%% TeX-master: "00book"
   7.395 -%%% End: 
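
The `.hgtags` format and the tag-name restrictions described in the branch chapter above (one line per tag: a changeset hash, a space, then the tag name; no colon, carriage return, or newline in a name) can be sketched as a small parser. This is an illustration only, not Mercurial's own code: the function names are hypothetical, and letting later lines override earlier ones is an assumption based on the remark that a replaced tag's previous identity stays on record but is no longer used.

```python
# Characters the text says a tag name cannot contain.
FORBIDDEN_CHARS = (":", "\r", "\n")


def is_valid_tag_name(name):
    """Return True if name is non-empty and has no forbidden character."""
    return bool(name) and not any(ch in name for ch in FORBIDDEN_CHARS)


def parse_hgtags(text):
    """Parse .hgtags-style text into a {tag_name: changeset_hash} dict.

    Assumption: when a tag appears more than once, the last line wins.
    Raises ValueError, naming the line, if a line is malformed.
    """
    tags = {}
    for lineno, line in enumerate(text.split("\n"), 1):
        if not line.strip():
            continue  # ignore blank lines, such as the one after a final newline
        node, sep, name = line.partition(" ")
        if not sep or not is_valid_tag_name(name):
            raise ValueError("line %d: expected '<hash> <tag>'" % lineno)
        tags[name] = node
    return tags
```

Reporting the line number on failure mirrors the behaviour the text attributes to `hg tags`, which reports the location of an error in a badly merged `.hgtags` file so that you can fix it and commit again.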

     8.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
     8.2 +++ b/en/ch00-preface.tex	Thu Jan 29 22:56:27 2009 -0800
     8.3 @@ -0,0 +1,67 @@
     8.4 +\chapter*{Preface}
     8.5 +\addcontentsline{toc}{chapter}{Preface}
     8.6 +\label{chap:preface}
     8.7 +
     8.8 +Distributed revision control is a relatively new field, and has
     8.9 +thus far grown due to people's willingness to strike out into
    8.10 +ill-charted territory.
    8.11 +
    8.12 +I am writing a book about distributed revision control because I
    8.13 +believe that it is an important subject that deserves a field guide.
    8.14 +I chose to write about Mercurial because it is the easiest tool to
    8.15 +learn the terrain with, and yet it scales to the demands of real,
    8.16 +challenging environments where many other revision control tools fail.
    8.17 +
    8.18 +\section{This book is a work in progress}
    8.19 +
    8.20 +I am releasing this book while I am still writing it, in the hope that
    8.21 +it will prove useful to others.  I also hope that readers will
    8.22 +contribute as they see fit.
    8.23 +
    8.24 +\section{About the examples in this book}
    8.25 +
    8.26 +This book takes an unusual approach to code samples.  Every example is
    8.27 +``live''---each one is actually the result of a shell script that
    8.28 +executes the Mercurial commands you see.  Every time an image of the
    8.29 +book is built from its sources, all the example scripts are
    8.30 +automatically run, and their current results compared against their
    8.31 +expected results.
    8.32 +
    8.33 +The advantage of this approach is that the examples are always
    8.34 +accurate; they describe \emph{exactly} the behaviour of the version of
    8.35 +Mercurial that's mentioned at the front of the book.  If I update the
    8.36 +version of Mercurial that I'm documenting, and the output of some
    8.37 +command changes, the build fails.
    8.38 +
    8.39 +There is a small disadvantage to this approach, which is that the
    8.40 +dates and times you'll see in examples tend to be ``squashed''
    8.41 +together in a way that they wouldn't be if the same commands were
    8.42 +being typed by a human.  Where a human can issue no more than one
    8.43 +command every few seconds, with any resulting timestamps
    8.44 +correspondingly spread out, my automated example scripts run many
    8.45 +commands in one second.
    8.46 +
    8.47 +As an instance of this, several consecutive commits in an example can
    8.48 +show up as having occurred during the same second.  You can see this
    8.49 +occur in the \hgext{bisect} example in section~\ref{sec:undo:bisect},
    8.50 +for instance.
    8.51 +
    8.52 +So when you're reading examples, don't place too much weight on the
    8.53 +dates or times you see in the output of commands.  But \emph{do} be
    8.54 +confident that the behaviour you're seeing is consistent and
    8.55 +reproducible.
    8.56 +
    8.57 +\section{Colophon---this book is Free}
    8.58 +
    8.59 +This book is licensed under the Open Publication License, and is
    8.60 +produced entirely using Free Software tools.  It is typeset with
    8.61 +\LaTeX{}; illustrations are drawn and rendered with
    8.62 +\href{http://www.inkscape.org/}{Inkscape}.
    8.63 +
    8.64 +The complete source code for this book is published as a Mercurial
    8.65 +repository, at \url{http://hg.serpentine.com/mercurial/book}.
    8.66 +
    8.67 +%%% Local Variables: 
    8.68 +%%% mode: latex
    8.69 +%%% TeX-master: "00book"
    8.70 +%%% End: 

     9.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
     9.2 +++ b/en/ch01-intro.tex	Thu Jan 29 22:56:27 2009 -0800
     9.3 @@ -0,0 +1,561 @@
     9.4 +\chapter{Introduction}
     9.5 +\label{chap:intro}
     9.6 +
     9.7 +\section{About revision control}
     9.8 +
     9.9 +Revision control is the process of managing multiple versions of a
    9.10 +piece of information.  In its simplest form, this is something that
    9.11 +many people do by hand: every time you modify a file, save it under a
    9.12 +new name that contains a number, each one higher than the number of
    9.13 +the preceding version.
    9.14 +
    9.15 +Manually managing multiple versions of even a single file is an
    9.16 +error-prone task, though, so software tools to help automate this
    9.17 +process have long been available.  The earliest automated revision
    9.18 +control tools were intended to help a single user to manage revisions
    9.19 +of a single file.  Over the past few decades, the scope of revision
    9.20 +control tools has expanded greatly; they now manage multiple files,
    9.21 +and help multiple people to work together.  The best modern revision
    9.22 +control tools have no problem coping with thousands of people working
    9.23 +together on projects that consist of hundreds of thousands of files.
    9.24 +
    9.25 +\subsection{Why use revision control?}
    9.26 +
    9.27 +There are a number of reasons why you or your team might want to use
    9.28 +an automated revision control tool for a project.
    9.29 +\begin{itemize}
    9.30 +\item It will track the history and evolution of your project, so you
    9.31 +  don't have to.  For every change, you'll have a log of \emph{who}
    9.32 +  made it; \emph{why} they made it; \emph{when} they made it; and
    9.33 +  \emph{what} the change was.
    9.34 +\item When you're working with other people, revision control software
    9.35 +  makes it easier for you to collaborate.  For example, when people
    9.36 +  more or less simultaneously make potentially incompatible changes,
    9.37 +  the software will help you to identify and resolve those conflicts.
    9.38 +\item It can help you to recover from mistakes.  If you make a change
    9.39 +  that later turns out to be in error, you can revert to an earlier
    9.40 +  version of one or more files.  In fact, a \emph{really} good
    9.41 +  revision control tool will even help you to efficiently figure out
    9.42 +  exactly when a problem was introduced (see
    9.43 +  section~\ref{sec:undo:bisect} for details).
    9.44 +\item It will help you to work simultaneously on, and manage the drift
    9.45 +  between, multiple versions of your project.
    9.46 +\end{itemize}
    9.47 +Most of these reasons are equally valid---at least in theory---whether
    9.48 +you're working on a project by yourself, or with a hundred other
    9.49 +people.
    9.50 +
    9.51 +A key question about the practicality of revision control at these two
    9.52 +different scales (``lone hacker'' and ``huge team'') is how its
    9.53 +\emph{benefits} compare to its \emph{costs}.  A revision control tool
    9.54 +that's difficult to understand or use is going to impose a high cost.
    9.55 +
    9.56 +A five-hundred-person project is likely to collapse under its own
    9.57 +weight almost immediately without a revision control tool and process.
    9.58 +In this case, the cost of using revision control might hardly seem
    9.59 +worth considering, since \emph{without} it, failure is almost
    9.60 +guaranteed.
    9.61 +
    9.62 +On the other hand, a one-person ``quick hack'' might seem like a poor
    9.63 +place to use a revision control tool, because surely the cost of using
    9.64 +one must be close to the overall cost of the project.  Right?
    9.65 +
    9.66 +Mercurial uniquely supports \emph{both} of these scales of
    9.67 +development.  You can learn the basics in just a few minutes, and due
    9.68 +to its low overhead, you can apply revision control to the smallest of
    9.69 +projects with ease.  Its simplicity means you won't have a lot of
    9.70 +abstruse concepts or command sequences competing for mental space with
    9.71 +whatever you're \emph{really} trying to do.  At the same time,
    9.72 +Mercurial's high performance and peer-to-peer nature let you scale
    9.73 +painlessly to handle large projects.
    9.74 +
    9.75 +No revision control tool can rescue a poorly run project, but a good
    9.76 +choice of tools can make a huge difference to the fluidity with which
    9.77 +you can work on a project.
    9.78 +
    9.79 +\subsection{The many names of revision control}
    9.80 +
    9.81 +Revision control is a diverse field, so much so that it doesn't
    9.82 +actually have a single name or acronym.  Here are a few of the more
    9.83 +common names and acronyms you'll encounter:
    9.84 +\begin{itemize}
    9.85 +\item Revision control (RCS)
    9.86 +\item Software configuration management (SCM), or configuration management
    9.87 +\item Source code management
    9.88 +\item Source code control, or source control
    9.89 +\item Version control (VCS)
    9.90 +\end{itemize}
    9.91 +Some people claim that these terms actually have different meanings,
    9.92 +but in practice they overlap so much that there's no agreed or even
    9.93 +useful way to tease them apart.
    9.94 +
    9.95 +\section{A short history of revision control}
    9.96 +
    9.97 +The best known of the old-time revision control tools is SCCS (Source
    9.98 +Code Control System), which Marc Rochkind wrote at Bell Labs, in the
    9.99 +early 1970s.  SCCS operated on individual files, and required every
   9.100 +person working on a project to have access to a shared workspace on a
   9.101 +single system.  Only one person could modify a file at any time;
   9.102 +arbitration for access to files was via locks.  It was common for
   9.103 +people to lock files, and later forget to unlock them, preventing
   9.104 +anyone else from modifying those files without the help of an
   9.105 +administrator.  
   9.106 +
   9.107 +Walter Tichy developed a free alternative to SCCS in the early 1980s;
   9.108 +he called his program RCS (Revision Control System).  Like SCCS, RCS
   9.109 +required developers to work in a single shared workspace, and to lock
   9.110 +files to prevent multiple people from modifying them simultaneously.
   9.111 +
   9.112 +Later in the 1980s, Dick Grune used RCS as a building block for a set
   9.113 +of shell scripts he initially called cmt, but then renamed to CVS
   9.114 +(Concurrent Versions System).  The big innovation of CVS was that it
   9.115 +let developers work simultaneously and somewhat independently in their
   9.116 +own personal workspaces.  The personal workspaces prevented developers
   9.117 +from stepping on each other's toes all the time, as was common with
   9.118 +SCCS and RCS.  Each developer had a copy of every project file, and
   9.119 +could modify their copies independently.  They had to merge their
   9.120 +edits prior to committing changes to the central repository.
   9.121 +
   9.122 +Brian Berliner took Grune's original scripts and rewrote them in~C,
   9.123 +releasing in 1989 the code that has since developed into the modern
   9.124 +version of CVS.  CVS subsequently acquired the ability to operate over
   9.125 +a network connection, giving it a client/server architecture.  CVS's
   9.126 +architecture is centralised; only the server has a copy of the history
   9.127 +of the project.  Client workspaces just contain copies of recent
   9.128 +versions of the project's files, and a little metadata to tell them
   9.129 +where the server is.  CVS has been enormously successful; it is
   9.130 +probably the world's most widely used revision control system.
   9.131 +
   9.132 +In the early 1990s, Sun Microsystems developed an early distributed
   9.133 +revision control system, called TeamWare.  A TeamWare workspace
   9.134 +contains a complete copy of the project's history.  TeamWare has no
   9.135 +notion of a central repository.  (CVS relied upon RCS for its history
   9.136 +storage; TeamWare used SCCS.)
   9.137 +
   9.138 +As the 1990s progressed, awareness grew of a number of problems with
   9.139 +CVS.  It records simultaneous changes to multiple files individually,
   9.140 +instead of grouping them together as a single logically atomic
   9.141 +operation.  It does not manage its file hierarchy well; it is easy to
   9.142 +make a mess of a repository by renaming files and directories.  Worse,
   9.143 +its source code is difficult to read and maintain, which made the
   9.144 +``pain level'' of fixing these architectural problems prohibitive.
   9.145 +
   9.146 +In 2001, Jim Blandy and Karl Fogel, two developers who had worked on
   9.147 +CVS, started a project to replace it with a tool that would have a
   9.148 +better architecture and cleaner code.  The result, Subversion, does
   9.149 +not stray from CVS's centralised client/server model, but it adds
   9.150 +multi-file atomic commits, better namespace management, and a number
   9.151 +of other features that make it a generally better tool than CVS.
   9.152 +Since its initial release, it has rapidly grown in popularity.
   9.153 +
   9.154 +More or less simultaneously, Graydon Hoare began working on an
   9.155 +ambitious distributed revision control system that he named Monotone.
   9.156 +While Monotone addresses many of CVS's design flaws and has a
   9.157 +peer-to-peer architecture, it goes beyond earlier (and subsequent)
   9.158 +revision control tools in a number of innovative ways.  It uses
   9.159 +cryptographic hashes as identifiers, and has an integral notion of
   9.160 +``trust'' for code from different sources.
   9.161 +
   9.162 +Mercurial began life in 2005.  While a few aspects of its design are
   9.163 +influenced by Monotone, Mercurial focuses on ease of use, high
   9.164 +performance, and scalability to very large projects.
   9.165 +
   9.166 +\section{Trends in revision control}
   9.167 +
   9.168 +There has been an unmistakable trend in the development and use of
   9.169 +revision control tools over the past four decades, as people have
   9.170 +become familiar with the capabilities of their tools and constrained
   9.171 +by their limitations.
   9.172 +
   9.173 +The first generation began by managing single files on individual
   9.174 +computers.  Although these tools represented a huge advance over
   9.175 +ad-hoc manual revision control, their locking model and reliance on a
   9.176 +single computer limited them to small, tightly-knit teams.
   9.177 +
   9.178 +The second generation loosened these constraints by moving to
   9.179 +network-centered architectures, and managing entire projects at a
   9.180 +time.  As projects grew larger, they ran into new problems.  With
   9.181 +clients needing to talk to servers very frequently, server scaling
   9.182 +became an issue for large projects.  An unreliable network connection
   9.183 +could prevent remote users from being able to talk to the server at
   9.184 +all.  As open source projects started making read-only access
   9.185 +available anonymously to anyone, people without commit privileges
   9.186 +found that they could not use the tools to interact with a project in
   9.187 +a natural way, as they could not record their changes.
   9.188 +
   9.189 +The current generation of revision control tools is peer-to-peer in
   9.190 +nature.  All of these systems have dropped the dependency on a single
   9.191 +central server, and allow people to distribute their revision control
   9.192 +data to where it's actually needed.  Collaboration over the Internet
   9.193 +has moved from being constrained by technology to a matter of choice and
   9.194 +consensus.  Modern tools can operate offline indefinitely and
   9.195 +autonomously, with a network connection only needed when syncing
   9.196 +changes with another repository.
   9.197 +
   9.198 +\section{A few of the advantages of distributed revision control}
   9.199 +
   9.200 +Even though distributed revision control tools have for several years
   9.201 +been as robust and usable as their previous-generation counterparts,
   9.202 +people using older tools have not yet necessarily woken up to their
   9.203 +advantages.  There are a number of ways in which distributed tools
   9.204 +shine relative to centralised ones.
   9.205 +
   9.206 +For an individual developer, distributed tools are almost always much
   9.207 +faster than centralised tools.  This is for a simple reason: a
   9.208 +centralised tool needs to talk over the network for many common
   9.209 +operations, because most metadata is stored in a single copy on the
   9.210 +central server.  A distributed tool stores all of its metadata
   9.211 +locally.  All else being equal, talking over the network adds overhead
   9.212 +to a centralised tool.  Don't underestimate the value of a snappy,
   9.213 +responsive tool: you're going to spend a lot of time interacting with
   9.214 +your revision control software.
   9.215 +
   9.216 +Distributed tools are indifferent to the vagaries of your server
   9.217 +infrastructure, again because they replicate metadata to so many
   9.218 +locations.  If you use a centralised system and your server catches
   9.219 +fire, you'd better hope that your backup media are reliable, and that
   9.220 +your last backup was recent and actually worked.  With a distributed
   9.221 +tool, you have many backups available on every contributor's computer.
   9.222 +
   9.223 +The reliability of your network will affect distributed tools far less
   9.224 +than it will centralised tools.  You can't even use a centralised tool
   9.225 +without a network connection, except for a few highly constrained
   9.226 +commands.  With a distributed tool, if your network connection goes
   9.227 +down while you're working, you may not even notice.  The only thing
   9.228 +you won't be able to do is talk to repositories on other computers,
   9.229 +something that is relatively rare compared with local operations.  If
   9.230 +you have a far-flung team of collaborators, this may be significant.
   9.231 +
   9.232 +\subsection{Advantages for open source projects}
   9.233 +
   9.234 +If you take a shine to an open source project and decide that you
   9.235 +would like to start hacking on it, and that project uses a distributed
   9.236 +revision control tool, you are at once a peer with the people who
   9.237 +consider themselves the ``core'' of that project.  If they publish
   9.238 +their repositories, you can immediately copy their project history,
   9.239 +start making changes, and record your work, using the same tools in
   9.240 +the same ways as insiders.  By contrast, with a centralised tool, you
   9.241 +must use the software in a ``read only'' mode unless someone grants
   9.242 +you permission to commit changes to their central server.  Until then,
   9.243 +you won't be able to record changes, and your local modifications will
   9.244 +be at risk of corruption any time you try to update your client's view
   9.245 +of the repository.
   9.246 +
   9.247 +\subsubsection{The forking non-problem}
   9.248 +
   9.249 +It has been suggested that distributed revision control tools pose
   9.250 +some sort of risk to open source projects because they make it easy to
   9.251 +``fork'' the development of a project.  A fork happens when there are
   9.252 +differences in opinion or attitude between groups of developers that
   9.253 +cause them to decide that they can't work together any longer.  Each
   9.254 +side takes a more or less complete copy of the project's source code,
   9.255 +and goes off in its own direction.
   9.256 +
   9.257 +Sometimes the camps in a fork decide to reconcile their differences.
   9.258 +With a centralised revision control system, the \emph{technical}
   9.259 +process of reconciliation is painful, and has to be performed largely
   9.260 +by hand.  You have to decide whose revision history is going to
   9.261 +``win'', and graft the other team's changes into the tree somehow.
   9.262 +This usually loses some or all of one side's revision history.
   9.263 +
   9.264 +With respect to forking, what distributed tools do is make forking
   9.265 +the \emph{only} way to develop a project.  Every single change that
   9.266 +you make is potentially a fork point.  The great strength of this
   9.267 +approach is that a distributed revision control tool has to be really
   9.268 +good at \emph{merging} forks, because forks are absolutely
   9.269 +fundamental: they happen all the time.  
   9.270 +
   9.271 +If every piece of work that everybody does, all the time, is framed in
   9.272 +terms of forking and merging, then what the open source world refers
   9.273 +to as a ``fork'' becomes \emph{purely} a social issue.  If anything,
   9.274 +distributed tools \emph{lower} the likelihood of a fork:
   9.275 +\begin{itemize}
   9.276 +\item They eliminate the social distinction that centralised tools
   9.277 +  impose: that between insiders (people with commit access) and
   9.278 +  outsiders (people without).
   9.279 +\item They make it easier to reconcile after a social fork, because
   9.280 +  all that's involved from the perspective of the revision control
   9.281 +  software is just another merge.
   9.282 +\end{itemize}
   9.283 +
   9.284 +Some people resist distributed tools because they want to retain tight
   9.285 +control over their projects, and they believe that centralised tools
   9.286 +give them this control.  However, if you're of this belief, and you
   9.287 +publish your CVS or Subversion repositories publicly, there are
   9.288 +plenty of tools available that can pull out your entire project's
   9.289 +history (albeit slowly) and recreate it somewhere that you don't
   9.290 +control.  So while your control in this case is illusory, you are
   9.291 +forgoing the ability to fluidly collaborate with whatever people feel
   9.292 +compelled to mirror and fork your history.
   9.293 +
   9.294 +\subsection{Advantages for commercial projects}
   9.295 +
   9.296 +Many commercial projects are undertaken by teams that are scattered
   9.297 +across the globe.  Contributors who are far from a central server will
   9.298 +see slower command execution and perhaps less reliability.  Commercial
   9.299 +revision control systems attempt to ameliorate these problems with
   9.300 +remote-site replication add-ons that are typically expensive to buy
   9.301 +and cantankerous to administer.  A distributed system doesn't suffer
   9.302 +from these problems in the first place.  Better yet, you can easily
   9.303 +set up multiple authoritative servers, say one per site, so that
   9.304 +there's no redundant communication between repositories over expensive
   9.305 +long-haul network links.
   9.306 +
   9.307 +Centralised revision control systems tend to have relatively low
   9.308 +scalability.  It's not unusual for an expensive centralised system to
   9.309 +fall over under the combined load of just a few dozen concurrent
   9.310 +users.  Once again, the typical response tends to be an expensive and
   9.311 +clunky replication facility.  Since the load on a central server---if
   9.312 +you have one at all---is many times lower with a distributed
   9.313 +tool (because all of the data is replicated everywhere), a single
   9.314 +cheap server can handle the needs of a much larger team, and
   9.315 +replication to balance load becomes a simple matter of scripting.
   9.316 +
   9.317 +If you have an employee in the field, troubleshooting a problem at a
   9.318 +customer's site, they'll benefit from distributed revision control.
   9.319 +The tool will let them generate custom builds, try different fixes in
   9.320 +isolation from each other, and search efficiently through history for
   9.321 +the sources of bugs and regressions in the customer's environment, all
   9.322 +without needing to connect to your company's network.
   9.323 +
   9.324 +\section{Why choose Mercurial?}
   9.325 +
   9.326 +Mercurial has a unique set of properties that make it a particularly
   9.327 +good choice as a revision control system.
   9.328 +\begin{itemize}
   9.329 +\item It is easy to learn and use.
   9.330 +\item It is lightweight.
   9.331 +\item It scales excellently.
   9.332 +\item It is easy to customise.
   9.333 +\end{itemize}
   9.334 +
   9.335 +If you are at all familiar with revision control systems, you should
   9.336 +be able to get up and running with Mercurial in less than five
   9.337 +minutes.  Even if not, it will take no more than a few minutes
   9.338 +longer.  Mercurial's command and feature sets are generally uniform
   9.339 +and consistent, so you can keep track of a few general rules instead
   9.340 +of a host of exceptions.
   9.341 +
   9.342 +On a small project, you can start working with Mercurial in moments.
   9.343 +Creating new changes and branches; transferring changes around
   9.344 +(whether locally or over a network); and history and status operations
   9.345 +are all fast.  Mercurial attempts to stay nimble and largely out of
   9.346 +your way by combining low cognitive overhead with blazingly fast
   9.347 +operations.
   9.348 +
   9.349 +The usefulness of Mercurial is not limited to small projects: it is
   9.350 +used by projects with hundreds to thousands of contributors, each
   9.351 +containing tens of thousands of files and hundreds of megabytes of
   9.352 +source code.
   9.353 +
   9.354 +If the core functionality of Mercurial is not enough for you, it's
   9.355 +easy to build on.  Mercurial is well suited to scripting tasks, and
   9.356 +its clean internals and implementation in Python make it easy to add
   9.357 +features in the form of extensions.  There are a number of popular and
   9.358 +useful extensions already available, ranging from helping to identify
   9.359 +bugs to improving performance.
   9.360 +
   9.361 +\section{Mercurial compared with other tools}
   9.362 +
   9.363 +Before you read on, please understand that this section necessarily
   9.364 +reflects my own experiences, interests, and (dare I say it) biases.  I
   9.365 +have used every one of the revision control tools listed below, in
   9.366 +most cases for several years at a time.
   9.367 +
   9.368 +
   9.369 +\subsection{Subversion}
   9.370 +
   9.371 +Subversion is a popular revision control tool, developed to replace
   9.372 +CVS.  It has a centralised client/server architecture.
   9.373 +
   9.374 +Subversion and Mercurial have similarly named commands for performing
   9.375 +the same operations, so if you're familiar with one, it is easy to
   9.376 +learn to use the other.  Both tools are portable to all popular
   9.377 +operating systems.
   9.378 +
   9.379 +Prior to version 1.5, Subversion had no useful support for merges.
   9.380 +At the time of writing, its merge tracking capability is new, and known to be
   9.381 +\href{http://svnbook.red-bean.com/nightly/en/svn.branchmerge.advanced.html#svn.branchmerge.advanced.finalword}{complicated
   9.382 +  and buggy}.
   9.383 +
   9.384 +Mercurial has a substantial performance advantage over Subversion on
   9.385 +every revision control operation I have benchmarked.  I have measured
   9.386 +its advantage as ranging from a factor of two to a factor of six when
   9.387 +compared with Subversion~1.4.3's \emph{ra\_local} file store, which is
   9.388 +the fastest access method available.  In more realistic deployments
   9.389 +involving a network-based store, Subversion will be at a substantially
   9.390 +larger disadvantage.  Because many Subversion commands must talk to
   9.391 +the server and Subversion does not have useful replication facilities,
   9.392 +server capacity and network bandwidth become bottlenecks for modestly
   9.393 +large projects.
   9.394 +
   9.395 +Additionally, Subversion incurs substantial storage overhead to avoid
   9.396 +network transactions for a few common operations, such as finding
   9.397 +modified files (\texttt{status}) and displaying modifications against
   9.398 +the current revision (\texttt{diff}).  As a result, a Subversion
   9.399 +working copy is often the same size as, or larger than, a Mercurial
   9.400 +repository and working directory, even though the Mercurial repository
   9.401 +contains a complete history of the project.
   9.402 +
   9.403 +Subversion is widely supported by third party tools.  Mercurial
   9.404 +currently lags considerably in this area.  This gap is closing,
   9.405 +however, and indeed some of Mercurial's GUI tools now outshine their
   9.406 +Subversion equivalents.  Like Mercurial, Subversion has an excellent
   9.407 +user manual.
   9.408 +
   9.409 +Because Subversion doesn't store revision history on the client, it is
   9.410 +well suited to managing projects that deal with lots of large, opaque
   9.411 +binary files.  If you check in fifty revisions to an incompressible
   9.412 +10MB file, Subversion's client-side space usage stays constant.  The
   9.413 +space used by any distributed SCM will grow rapidly in proportion to
   9.414 +the number of revisions, because the differences between each revision
   9.415 +are large.
   9.416 +
   9.417 +In addition, it's often difficult or, more usually, impossible to
   9.418 +merge different versions of a binary file.  Subversion's ability to
   9.419 +let a user lock a file, so that they temporarily have the exclusive
   9.420 +right to commit changes to it, can be a significant advantage to a
   9.421 +project where binary files are widely used.
   9.422 +
   9.423 +Mercurial can import revision history from a Subversion repository.
   9.424 +It can also export revision history to a Subversion repository.  This
   9.425 +makes it easy to ``test the waters'' and use Mercurial and Subversion
   9.426 +in parallel before deciding to switch.  History conversion is
   9.427 +incremental, so you can perform an initial conversion, then small
   9.428 +additional conversions afterwards to bring in new changes.
   9.429 +
   9.430 +
   9.431 +\subsection{Git}
   9.432 +
   9.433 +Git is a distributed revision control tool that was developed for
   9.434 +managing the Linux kernel source tree.  Like Mercurial, its early
   9.435 +design was somewhat influenced by Monotone.
   9.436 +
   9.437 +Git has a very large command set, with version~1.5.0 providing~139
   9.438 +individual commands.  It has something of a reputation for being
   9.439 +difficult to learn.  Compared to Git, Mercurial has a strong focus on
   9.440 +simplicity.
   9.441 +
   9.442 +In terms of performance, Git is extremely fast.  In several cases, it
   9.443 +is faster than Mercurial, at least on Linux, while Mercurial performs
   9.444 +better on other operations.  However, on Windows, the performance and
   9.445 +general level of support that Git provides are, at the time of writing,
   9.446 +far behind those of Mercurial.
   9.447 +
   9.448 +While a Mercurial repository needs no maintenance, a Git repository
   9.449 +requires frequent manual ``repacks'' of its metadata.  Without these,
   9.450 +performance degrades, while space usage grows rapidly.  A server that
   9.451 +contains many Git repositories that are not rigorously and frequently
   9.452 +repacked will become heavily disk-bound during backups, and there have
   9.453 +been instances of daily backups taking far longer than~24 hours as a
   9.454 +result.  A freshly packed Git repository is slightly smaller than a
   9.455 +Mercurial repository, but an unpacked repository is several orders of
   9.456 +magnitude larger.
   9.457 +
   9.458 +The core of Git is written in C.  Many Git commands are implemented as
   9.459 +shell or Perl scripts, and the quality of these scripts varies widely.
   9.460 +I have encountered several instances where scripts charged along
   9.461 +blindly in the presence of errors that should have been fatal.
   9.462 +
   9.463 +Mercurial can import revision history from a Git repository.
   9.464 +
   9.465 +
   9.466 +\subsection{CVS}
   9.467 +
   9.468 +CVS is probably the most widely used revision control tool in the
   9.469 +world.  Due to its age and internal untidiness, it has been only
   9.470 +lightly maintained for many years.
   9.471 +
   9.472 +It has a centralised client/server architecture.  It does not group
   9.473 +related file changes into atomic commits, making it easy for people to
   9.474 +``break the build'': one person can successfully commit part of a
   9.475 +change and then be blocked by the need for a merge, causing other
   9.476 +people to see only a portion of the work the committer intended to do.  This
   9.477 +also affects how you work with project history.  If you want to see
   9.478 +all of the modifications someone made as part of a task, you will need
   9.479 +to manually inspect the descriptions and timestamps of the changes
   9.480 +made to each file involved (if you even know what those files were).
   9.481 +
   9.482 +CVS has a muddled notion of tags and branches that I will not attempt
   9.483 +to even describe.  It does not support renaming of files or
   9.484 +directories well, making it easy to corrupt a repository.  It has
   9.485 +almost no internal consistency checking capabilities, so it is usually
   9.486 +not even possible to tell whether or how a repository is corrupt.  I
   9.487 +would not recommend CVS for any project, existing or new.
   9.488 +
   9.489 +Mercurial can import CVS revision history.  However, there are a few
   9.490 +caveats that apply; these are true of every other revision control
   9.491 +tool's CVS importer, too.  Due to CVS's lack of atomic changes and
   9.492 +unversioned filesystem hierarchy, it is not possible to reconstruct
   9.493 +CVS history completely accurately; some guesswork is involved, and
   9.494 +renames will usually not show up.  Because a lot of advanced CVS
   9.495 +administration has to be done by hand and is hence error-prone, it's
   9.496 +common for CVS importers to run into multiple problems with corrupted
   9.497 +repositories (completely bogus revision timestamps and files that have
   9.498 +remained locked for over a decade are just two of the less interesting
   9.499 +problems I can recall from personal experience).
   9.500 +
   9.503 +
   9.504 +\subsection{Commercial tools}
   9.505 +
   9.506 +Perforce has a centralised client/server architecture, with no
   9.507 +client-side caching of any data.  Unlike modern revision control
   9.508 +tools, Perforce requires that a user run a command to inform the
   9.509 +server about every file they intend to edit.
   9.510 +
   9.511 +The performance of Perforce is quite good for small teams, but it
   9.512 +falls off rapidly as the number of users grows beyond a few dozen.
   9.513 +Modestly large Perforce installations require the deployment of
   9.514 +proxies to cope with the load their users generate.
   9.515 +
   9.516 +
   9.517 +\subsection{Choosing a revision control tool}
   9.518 +
   9.519 +With the exception of CVS, all of the tools listed above have unique
   9.520 +strengths that suit them to particular styles of work.  There is no
   9.521 +single revision control tool that is best in all situations.
   9.522 +
   9.523 +As an example, Subversion is a good choice for working with frequently
   9.524 +edited binary files, due to its centralised nature and support for
   9.525 +file locking.
   9.526 +
   9.527 +I personally find Mercurial's properties of simplicity, performance,
   9.528 +and good merge support to be a compelling combination that has served
   9.529 +me well for several years.
   9.530 +
   9.531 +
   9.532 +\section{Switching from another tool to Mercurial}
   9.533 +
   9.534 +Mercurial is bundled with an extension named \hgext{convert}, which
   9.535 +can incrementally import revision history from several other revision
   9.536 +control tools.  By ``incremental'', I mean that you can convert all of
   9.537 +a project's history to date in one go, then rerun the conversion later
   9.538 +to obtain new changes that happened after the initial conversion.
   9.539 +
   9.540 +The revision control tools supported by \hgext{convert} are as
   9.541 +follows:
   9.542 +\begin{itemize}
   9.543 +\item Subversion
   9.544 +\item CVS
   9.545 +\item Git
   9.546 +\item Darcs
   9.547 +\end{itemize}
   9.548 +
   9.549 +In addition, \hgext{convert} can export changes from Mercurial to
   9.550 +Subversion.  This makes it possible to try Subversion and Mercurial in
   9.551 +parallel before committing to a switchover, without risking the loss
   9.552 +of any work.
   9.553 +
   9.554 +The \hgxcmd{convert}{convert} command is easy to use.  Simply point it
   9.555 +at the path or URL of the source repository, optionally give it the
   9.556 +name of the destination repository, and it will start working.  After
   9.557 +the initial conversion, just run the same command again to import new
   9.558 +changes.
   9.559 +
   9.560 +
   9.561 +%%% Local Variables: 
   9.562 +%%% mode: latex
   9.563 +%%% TeX-master: "00book"
   9.564 +%%% End: 

    10.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    10.2 +++ b/en/ch02-tour-basic.tex	Thu Jan 29 22:56:27 2009 -0800
    10.3 @@ -0,0 +1,624 @@
    10.4 +\chapter{A tour of Mercurial: the basics}
    10.5 +\label{chap:tour-basic}
    10.6 +
    10.7 +\section{Installing Mercurial on your system}
    10.8 +\label{sec:tour:install}
    10.9 +
   10.10 +Prebuilt binary packages of Mercurial are available for every popular
   10.11 +operating system.  These make it easy to start using Mercurial on your
   10.12 +computer immediately.
   10.13 +
   10.14 +\subsection{Linux}
   10.15 +
   10.16 +Because each Linux distribution has its own packaging tools, policies,
   10.17 +and rate of development, it's difficult to give a comprehensive set of
   10.18 +instructions on how to install Mercurial binaries.  The version of
   10.19 +Mercurial that you will end up with depends on how active the
   10.20 +maintainer of your distribution's package is.
   10.21 +
   10.22 +To keep things simple, I will focus on installing Mercurial from the
   10.23 +command line under the most popular Linux distributions.  Most of
   10.24 +these distributions provide graphical package managers that will let
   10.25 +you install Mercurial with a single click; the package name to look
   10.26 +for is \texttt{mercurial}.
   10.27 +
   10.28 +\begin{itemize}
   10.29 +\item[Debian]
   10.30 +  \begin{codesample4}
   10.31 +    apt-get install mercurial
   10.32 +  \end{codesample4}
   10.33 +
   10.34 +\item[Fedora Core]
   10.35 +  \begin{codesample4}
   10.36 +    yum install mercurial
   10.37 +  \end{codesample4}
   10.38 +
   10.39 +\item[Gentoo]
   10.40 +  \begin{codesample4}
   10.41 +    emerge mercurial
   10.42 +  \end{codesample4}
   10.43 +
   10.44 +\item[OpenSUSE]
   10.45 +  \begin{codesample4}
   10.46 +    zypper install mercurial
   10.47 +  \end{codesample4}
   10.48 +
   10.49 +\item[Ubuntu] Ubuntu's Mercurial package is based on Debian's.  To
   10.50 +  install it, run the following command.
   10.51 +  \begin{codesample4}
   10.52 +    apt-get install mercurial
   10.53 +  \end{codesample4}
   10.54 +  The Ubuntu package for Mercurial tends to lag behind the Debian
   10.55 +  version by a considerable time margin (at the time of writing, seven
   10.56 +  months), which in some cases will mean that on Ubuntu, you may run
   10.57 +  into problems that have since been fixed in the Debian package.
   10.58 +\end{itemize}
   10.59 +
   10.60 +\subsection{Solaris}
   10.61 +
   10.62 +SunFreeWare, at \url{http://www.sunfreeware.com}, is a good source for a
   10.63 +large number of pre-built Solaris packages for 32 and 64 bit Intel and
   10.64 +Sparc architectures, including current versions of Mercurial.
   10.65 +
   10.66 +\subsection{Mac OS X}
   10.67 +
   10.68 +Lee Cantey publishes an installer of Mercurial for Mac OS~X at
   10.69 +\url{http://mercurial.berkwood.com}.  This package works on both
   10.70 +Intel-~and PowerPC-based Macs.  Before you can use it, you must install
   10.71 +a compatible version of Universal MacPython~\cite{web:macpython}.  This
   10.72 +is easy to do; simply follow the instructions on Lee's site.
   10.73 +
   10.74 +It's also possible to install Mercurial using Fink or MacPorts,
   10.75 +two popular free package managers for Mac OS X.  If you have Fink,
   10.76 +use \command{sudo apt-get install mercurial-py25}.  If you have MacPorts,
   10.77 +use \command{sudo port install mercurial}.
   10.78 +
   10.79 +\subsection{Windows}
   10.80 +
   10.81 +Lee Cantey publishes an installer of Mercurial for Windows at
   10.82 +\url{http://mercurial.berkwood.com}.  This package has no external
   10.83 +dependencies; it ``just works''.
   10.84 +
   10.85 +\begin{note}
   10.86 +  The Windows version of Mercurial does not automatically convert line
   10.87 +  endings between Windows and Unix styles.  If you want to share work
   10.88 +  with Unix users, you must do a little additional configuration
   10.89 +  work. XXX Flesh this out.
   10.90 +\end{note}
   10.91 +
   10.92 +\section{Getting started}
   10.93 +
   10.94 +To begin, we'll use the \hgcmd{version} command to find out whether
   10.95 +Mercurial is actually installed properly.  The actual version
   10.96 +information that it prints isn't so important; it's whether it prints
   10.97 +anything at all that we care about.
   10.98 +\interaction{tour.version}
   10.99 +
  10.100 +\subsection{Built-in help}
  10.101 +
  10.102 +Mercurial provides a built-in help system.  This is invaluable for those
  10.103 +times when you find yourself stuck trying to remember how to run a
  10.104 +command.  If you are completely stuck, simply run \hgcmd{help}; it
  10.105 +will print a brief list of commands, along with a description of what
  10.106 +each does.  If you ask for help on a specific command (as below), it
  10.107 +prints more detailed information.
  10.108 +\interaction{tour.help}
  10.109 +For a more impressive level of detail (which you won't usually need)
  10.110 +run \hgcmdargs{help}{\hggopt{-v}}.  The \hggopt{-v} option is short
  10.111 +for \hggopt{--verbose}, and tells Mercurial to print more information
  10.112 +than it usually would.
  10.113 +
  10.114 +\section{Working with a repository}
  10.115 +
  10.116 +In Mercurial, everything happens inside a \emph{repository}.  The
  10.117 +repository for a project contains all of the files that ``belong to''
  10.118 +that project, along with a historical record of the project's files.
  10.119 +
  10.120 +There's nothing particularly magical about a repository; it is simply
  10.121 +a directory tree in your filesystem that Mercurial treats as special.
  10.122 +You can rename or delete a repository any time you like, using either the
  10.123 +command line or your file browser.
  10.124 +
  10.125 +\subsection{Making a local copy of a repository}
  10.126 +
  10.127 +\emph{Copying} a repository is just a little bit special.  While you
  10.128 +could use a normal file copying command to make a copy of a
  10.129 +repository, it's best to use a built-in command that Mercurial
  10.130 +provides.  This command is called \hgcmd{clone}, because it creates an
  10.131 +identical copy of an existing repository.
  10.132 +\interaction{tour.clone}
  10.133 +If our clone succeeded, we should now have a local directory called
  10.134 +\dirname{hello}.  This directory will contain some files.
  10.135 +\interaction{tour.ls}
  10.136 +These files have the same contents and history in our repository as
  10.137 +they do in the repository we cloned.
  10.138 +
  10.139 +Every Mercurial repository is complete, self-contained, and
  10.140 +independent.  It contains its own private copy of a project's files
  10.141 +and history.  A cloned repository remembers the location of the
  10.142 +repository it was cloned from, but it does not communicate with that
  10.143 +repository, or any other, unless you tell it to.
  10.144 +
  10.145 +What this means for now is that we're free to experiment with our
  10.146 +repository, safe in the knowledge that it's a private ``sandbox'' that
  10.147 +won't affect anyone else.
  10.148 +
  10.149 +\subsection{What's in a repository?}
  10.150 +
  10.151 +When we take a more detailed look inside a repository, we can see that
  10.152 +it contains a directory named \dirname{.hg}.  This is where Mercurial
  10.153 +keeps all of its metadata for the repository.
  10.154 +\interaction{tour.ls-a}
  10.155 +
  10.156 +The contents of the \dirname{.hg} directory and its subdirectories are
  10.157 +private to Mercurial.  Every other file and directory in the
  10.158 +repository is yours to do with as you please.
  10.159 +
  10.160 +To introduce a little terminology, the \dirname{.hg} directory is the
  10.161 +``real'' repository, and all of the files and directories that coexist
  10.162 +with it are said to live in the \emph{working directory}.  An easy way
  10.163 +to remember the distinction is that the \emph{repository} contains the
  10.164 +\emph{history} of your project, while the \emph{working directory}
  10.165 +contains a \emph{snapshot} of your project at a particular point in
  10.166 +history.
  10.167 +
  10.168 +\section{A tour through history}
  10.169 +
  10.170 +One of the first things we might want to do with a new, unfamiliar
  10.171 +repository is understand its history.  The \hgcmd{log} command gives
  10.172 +us a view of history.
  10.173 +\interaction{tour.log}
  10.174 +By default, this command prints a brief paragraph of output for each
  10.175 +change to the project that was recorded.  In Mercurial terminology, we
  10.176 +call each of these recorded events a \emph{changeset}, because it can
  10.177 +contain a record of changes to several files.
  10.178 +
  10.179 +The fields in a record of output from \hgcmd{log} are as follows.
  10.180 +\begin{itemize}
  10.181 +\item[\texttt{changeset}] This field has the format of a number,
  10.182 +  followed by a colon, followed by a hexadecimal string.  These are
  10.183 +  \emph{identifiers} for the changeset.  There are two identifiers
  10.184 +  because the number is shorter and easier to type than the hex
  10.185 +  string.
  10.186 +\item[\texttt{user}] The identity of the person who created the
  10.187 +  changeset.  This is a free-form field, but it most often contains a
  10.188 +  person's name and email address.
  10.189 +\item[\texttt{date}] The date and time on which the changeset was
  10.190 +  created, and the timezone in which it was created.  (The date and
  10.191 +  time are local to that timezone; they display what time and date it
  10.192 +  was for the person who created the changeset.)
  10.193 +\item[\texttt{summary}] The first line of the text message that the
  10.194 +  creator of the changeset entered to describe the changeset.
  10.195 +\end{itemize}
  10.196 +The default output printed by \hgcmd{log} is purely a summary; it is
  10.197 +missing a lot of detail.
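
To make the field layout concrete, here is a minimal Python sketch that
splits one record of default \hgcmd{log} output into its fields.  The
helper name \texttt{parse\_log\_record} and the keys \texttt{rev} and
\texttt{node} are hypothetical names chosen for illustration; they are
not part of Mercurial.  The sample record is the one quoted later in
this chapter.

```python
# Illustrative sketch only: parse_log_record is a hypothetical helper,
# not something that Mercurial provides.
SAMPLE = """\
changeset:   73:584af0e231be
user:        Censored Person <censored.person@example.org>
date:        Tue Sep 26 21:37:07 2006 -0700
summary:     include buildmeister/commondefs.   Add an exports and install
"""


def parse_log_record(text):
    """Split one record of default `hg log` output into a dict of fields."""
    record = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first colon only; values (dates, for example)
        # may contain further colons.
        name, _, value = line.partition(":")
        record[name.strip()] = value.strip()
    # The changeset field is "<revision number>:<hexadecimal string>".
    number, _, hex_id = record["changeset"].partition(":")
    record["rev"] = int(number)   # hypothetical key for the number
    record["node"] = hex_id       # hypothetical key for the hex string
    return record


record = parse_log_record(SAMPLE)
print(record["rev"], record["node"])
```

The sketch assumes the default output format described above; verbose
or templated output would need different handling.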
  10.198 +
  10.199 +Figure~\ref{fig:tour-basic:history} provides a graphical representation of
  10.200 +the history of the \dirname{hello} repository, to make it a little
  10.201 +easier to see which direction history is ``flowing'' in.  We'll be
  10.202 +returning to this figure several times in this chapter and the chapter
  10.203 +that follows.
  10.204 +
  10.205 +\begin{figure}[ht]
  10.206 +  \centering
  10.207 +  \grafix{tour-history}
  10.208 +  \caption{Graphical history of the \dirname{hello} repository}
  10.209 +  \label{fig:tour-basic:history}
  10.210 +\end{figure}
  10.211 +
  10.212 +\subsection{Changesets, revisions, and talking to other 
  10.213 +  people}
  10.214 +
  10.215 +As English is a notoriously sloppy language, and computer science has
  10.216 +a hallowed history of terminological confusion (why use one term when
  10.217 +four will do?), revision control has a variety of words and phrases
  10.218 +that mean the same thing.  If you are talking about Mercurial history
  10.219 +with other people, you will find that the word ``changeset'' is often
  10.220 +compressed to ``change'' or (when written) ``cset'', and sometimes a
  10.221 +changeset is referred to as a ``revision'' or a ``rev''.
  10.222 +
  10.223 +While it doesn't matter what \emph{word} you use to refer to the
  10.224 +concept of ``a~changeset'', the \emph{identifier} that you use to
  10.225 +refer to ``a~\emph{specific} changeset'' is of great importance.
  10.226 +Recall that the \texttt{changeset} field in the output from
  10.227 +\hgcmd{log} identifies a changeset using both a number and a
  10.228 +hexadecimal string.
  10.229 +\begin{itemize}
  10.230 +\item The revision number is \emph{only valid in that repository},
  10.231 +\item while the hex string is the \emph{permanent, unchanging
  10.232 +    identifier} that will always identify that exact changeset in
  10.233 +  \emph{every} copy of the repository.
  10.234 +\end{itemize}
  10.235 +This distinction is important.  If you send someone an email talking
  10.236 +about ``revision~33'', there's a high likelihood that their
  10.237 +revision~33 will \emph{not be the same} as yours.  The reason for this
  10.238 +is that a revision number depends on the order in which changes
  10.239 +arrived in a repository, and there is no guarantee that the same
  10.240 +changes will happen in the same order in different repositories.
  10.241 +Three changes $a,b,c$ can easily appear in one repository as $0,1,2$,
  10.242 +while in another as $1,0,2$.
  10.243 +
  10.244 +Mercurial uses revision numbers purely as a convenient shorthand.  If
  10.245 +you need to discuss a changeset with someone, or make a record of a
  10.246 +changeset for some other reason (for example, in a bug report), use
  10.247 +the hexadecimal identifier.
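
The following toy model, written in Python, illustrates why a revision
number is local while an identifier is not.  It is only a sketch under
simplifying assumptions: the class \texttt{ToyRepo} and the function
\texttt{changeset\_id} are hypothetical, and hashing a short label is a
stand-in for the way Mercurial really computes identifiers.

```python
import hashlib


def changeset_id(label):
    """Stand-in identifier: a short hash of a label (an assumption made
    for illustration; Mercurial's real identifiers are computed from
    more than this)."""
    return hashlib.sha1(label.encode("utf-8")).hexdigest()[:12]


class ToyRepo:
    """Hypothetical repository that numbers changesets in arrival order."""

    def __init__(self):
        self.changesets = []

    def add(self, label):
        self.changesets.append(changeset_id(label))
        return len(self.changesets) - 1   # the local revision number

    def rev_of(self, node):
        return self.changesets.index(node)


alice = ToyRepo()
bob = ToyRepo()
for label in ("a", "b", "c"):   # changes arrive as a, b, c
    alice.add(label)
for label in ("b", "a", "c"):   # same changes, different arrival order
    bob.add(label)

node_a = changeset_id("a")
# Same identifier in both repositories, but a different revision number.
print(alice.rev_of(node_a), bob.rev_of(node_a))
```

In this model, change $a$ is revision~0 in one repository and
revision~1 in the other, yet its identifier is the same in both.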
  10.248 +
  10.249 +\subsection{Viewing specific revisions}
  10.250 +
  10.251 +To narrow the output of \hgcmd{log} down to a single revision, use the
  10.252 +\hgopt{log}{-r} (or \hgopt{log}{--rev}) option.  You can use either a
  10.253 +revision number or a long-form changeset identifier, and you can
  10.254 +provide as many revisions as you want.  \interaction{tour.log-r}
  10.255 +
  10.256 +If you want to see the history of several revisions without having to
  10.257 +list each one, you can use \emph{range notation}; this lets you
  10.258 +express the idea ``I want all revisions between $a$ and $b$,
  10.259 +inclusive''.
  10.260 +\interaction{tour.log.range}
  10.261 +Mercurial also honours the order in which you specify revisions, so
  10.262 +\hgcmdargs{log}{-r 2:4} prints $2,3,4$ while \hgcmdargs{log}{-r 4:2}
  10.263 +prints $4,3,2$.
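
The ordering rule can be sketched in a few lines of Python.  The
function \texttt{expand\_range} is a hypothetical helper that mimics
the behaviour described above for plain revision numbers only; it is
not how Mercurial itself parses revision arguments.

```python
def expand_range(spec):
    """Expand "a:b" into the inclusive list of revision numbers between
    a and b, in the order the endpoints were given.

    Hypothetical helper: handles plain revision numbers only.
    """
    start_text, sep, end_text = spec.partition(":")
    start = int(start_text)
    if not sep:
        return [start]          # a single revision, no range
    end = int(end_text)
    step = 1 if start <= end else -1
    return list(range(start, end + step, step))


print(expand_range("2:4"))
print(expand_range("4:2"))
```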
  10.264 +
  10.265 +\subsection{More detailed information}
  10.266 +
  10.267 +While the summary information printed by \hgcmd{log} is useful if you
  10.268 +already know what you're looking for, you may need to see a complete
  10.269 +description of the change, or a list of the files changed, if you're
  10.270 +trying to decide whether a changeset is the one you're looking for.
  10.271 +The \hgcmd{log} command's \hggopt{-v} (or \hggopt{--verbose})
  10.272 +option gives you this extra detail.
  10.273 +\interaction{tour.log-v}
  10.274 +
  10.275 +If you want to see both the description and content of a change, add
  10.276 +the \hgopt{log}{-p} (or \hgopt{log}{--patch}) option.  This displays
  10.277 +the content of a change as a \emph{unified diff} (if you've never seen
  10.278 +a unified diff before, see section~\ref{sec:mq:patch} for an overview).
  10.279 +\interaction{tour.log-vp}
  10.280 +
  10.281 +\section{All about command options}
  10.282 +
  10.283 +Let's take a brief break from exploring Mercurial commands to discuss
  10.284 +a pattern in the way that they work; you may find this useful to keep
  10.285 +in mind as we continue our tour.
  10.286 +
  10.287 +Mercurial has a consistent and straightforward approach to dealing
  10.288 +with the options that you can pass to commands.  It follows the
  10.289 +conventions for options that are common to modern Linux and Unix
  10.290 +systems.
  10.291 +\begin{itemize}
  10.292 +\item Every option has a long name.  For example, as we've already
  10.293 +  seen, the \hgcmd{log} command accepts a \hgopt{log}{--rev} option.
  10.294 +\item Most options have short names, too.  Instead of
  10.295 +  \hgopt{log}{--rev}, we can use \hgopt{log}{-r}.  (The reason that
  10.296 +  some options don't have short names is that the options in question
  10.297 +  are rarely used.)
  10.298 +\item Long options start with two dashes (e.g.~\hgopt{log}{--rev}),
  10.299 +  while short options start with one (e.g.~\hgopt{log}{-r}).
  10.300 +\item Option naming and usage is consistent across commands.  For
  10.301 +  example, every command that lets you specify a changeset~ID or
  10.302 +  revision number accepts both \hgopt{log}{-r} and \hgopt{log}{--rev}
  10.303 +  arguments.
  10.304 +\end{itemize}
  10.305 +In the examples throughout this book, I use short options instead of
  10.306 +long.  This just reflects my own preference, so don't read anything
  10.307 +significant into it.
  10.308 +
  10.309 +Most commands that print output of some kind will print more output
  10.310 +when passed a \hggopt{-v} (or \hggopt{--verbose}) option, and less
  10.311 +when passed \hggopt{-q} (or \hggopt{--quiet}).
  10.312 +
  10.313 +\section{Making and reviewing changes}
  10.314 +
  10.315 +Now that we have a grasp of viewing history in Mercurial, let's take a
  10.316 +look at making some changes and examining them.
  10.317 +
  10.318 +The first thing we'll do is isolate our experiment in a repository of
  10.319 +its own.  We use the \hgcmd{clone} command, but we don't need to
  10.320 +clone a copy of the remote repository.  Since we already have a copy
  10.321 +of it locally, we can just clone that instead.  This is much faster
  10.322 +than cloning over the network, and cloning a local repository uses
  10.323 +less disk space in most cases, too.
  10.324 +\interaction{tour.reclone}
  10.325 +As an aside, it's often good practice to keep a ``pristine'' copy of a
  10.326 +remote repository around, which you can then make temporary clones of
  10.327 +to create sandboxes for each task you want to work on.  This lets you
  10.328 +work on multiple tasks in parallel, each isolated from the others
  10.329 +until it's complete and you're ready to integrate it back.  Because
  10.330 +local clones are so cheap, there's almost no overhead to cloning and
  10.331 +destroying repositories whenever you want.
  10.332 +
  10.333 +In our \dirname{my-hello} repository, we have a file
  10.334 +\filename{hello.c} that contains the classic ``hello, world'' program.
  10.335 +Let's use the ancient and venerable \command{sed} command to edit this
  10.336 +file so that it prints a second line of output.  (I'm only using
  10.337 +\command{sed} to do this because it's easy to write a scripted example
  10.338 +this way.  Since you're not under the same constraint, you probably
  10.339 +won't want to use \command{sed}; simply use your preferred text editor to
  10.340 +do the same thing.)
  10.341 +\interaction{tour.sed}
  10.342 +
  10.343 +Mercurial's \hgcmd{status} command will tell us what Mercurial knows
  10.344 +about the files in the repository.
  10.345 +\interaction{tour.status}
  10.346 +The \hgcmd{status} command prints no output for some files, but a line
  10.347 +starting with ``\texttt{M}'' for \filename{hello.c}.  Unless you tell
  10.348 +it to, \hgcmd{status} will not print any output for files that have
  10.349 +not been modified.  
  10.350 +
  10.351 +The ``\texttt{M}'' indicates that Mercurial has noticed that we
  10.352 +modified \filename{hello.c}.  We didn't need to \emph{inform}
  10.353 +Mercurial that we were going to modify the file before we started, or
  10.354 +that we had modified the file after we were done; it was able to
  10.355 +figure this out itself.
  10.356 +
  10.357 +It's a little bit helpful to know that we've modified
  10.358 +\filename{hello.c}, but we might prefer to know exactly \emph{what}
  10.359 +changes we've made to it.  To do this, we use the \hgcmd{diff}
  10.360 +command.
  10.361 +\interaction{tour.diff}
  10.362 +
  10.363 +\section{Recording changes in a new changeset}
  10.364 +
  10.365 +We can modify files, build and test our changes, and use
  10.366 +\hgcmd{status} and \hgcmd{diff} to review our changes, until we're
  10.367 +satisfied with what we've done and arrive at a natural stopping point
  10.368 +where we want to record our work in a new changeset.
  10.369 +
  10.370 +The \hgcmd{commit} command lets us create a new changeset; we'll
  10.371 +usually refer to this as ``making a commit'' or ``committing''.  
  10.372 +
  10.373 +\subsection{Setting up a username}
  10.374 +
  10.375 +When you try to run \hgcmd{commit} for the first time, it is not
  10.376 +guaranteed to succeed.  Mercurial records your name and address with
  10.377 +each change that you commit, so that you and others will later be able
  10.378 +to tell who made each change.  Mercurial tries to automatically figure
  10.379 +out a sensible username to commit the change with.  It will attempt
  10.380 +each of the following methods, in order:
  10.381 +\begin{enumerate}
  10.382 +\item If you specify a \hgopt{commit}{-u} option to the \hgcmd{commit}
  10.383 +  command on the command line, followed by a username, this is always
  10.384 +  given the highest precedence.
  10.385 +\item If you have set the \envar{HGUSER} environment variable, this is
  10.386 +  checked next.
  10.387 +\item If you create a file in your home directory called
  10.388 +  \sfilename{.hgrc}, with a \rcitem{ui}{username} entry, that will be
  10.389 +  used next.  To see what the contents of this file should look like,
  10.390 +  refer to section~\ref{sec:tour-basic:username} below.
  10.391 +\item If you have set the \envar{EMAIL} environment variable, this
  10.392 +  will be used next.
  10.393 +\item Mercurial will query your system to find out your local user
  10.394 +  name and host name, and construct a username from these components.
  10.395 +  Since this often results in a username that is not very useful, it
  10.396 +  will print a warning if it has to do this.
  10.397 +\end{enumerate}
  10.398 +If all of these mechanisms fail, Mercurial will fail, printing an
  10.399 +error message.  In this case, it will not let you commit until you set
  10.400 +up a username.
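
The order of precedence listed above can be summarised as a small
Python function.  This is a sketch of the documented order, not
Mercurial's own code: the name \texttt{pick\_username} and its
parameters are hypothetical, and the \texttt{user@host} form of the
last fallback is an assumption made for illustration.

```python
def pick_username(cli_user=None, environ=None, hgrc_username=None,
                  local_user=None, local_host=None):
    """Return (username, source), trying each source in the order
    described in the text.  Hypothetical helper for illustration."""
    environ = environ or {}
    if cli_user:                       # 1. the -u option to commit
        return cli_user, "-u option"
    if environ.get("HGUSER"):          # 2. the HGUSER environment variable
        return environ["HGUSER"], "HGUSER"
    if hgrc_username:                  # 3. the username item in .hgrc
        return hgrc_username, ".hgrc"
    if environ.get("EMAIL"):           # 4. the EMAIL environment variable
        return environ["EMAIL"], "EMAIL"
    if local_user and local_host:      # 5. local user and host name
        # Assumed format; the text says only that a username is
        # constructed from these components, with a warning.
        return f"{local_user}@{local_host}", "local (with a warning)"
    raise LookupError("no username available; the commit is refused")


name, source = pick_username(
    environ={"EMAIL": "someone@example.org"},
    hgrc_username="Firstname Lastname <email.address@domain.net>",
)
print(name, "from", source)
```

Here the \sfilename{.hgrc} entry wins over \envar{EMAIL}, because it
comes earlier in the list.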
  10.401 +
  10.402 +You should think of the \envar{HGUSER} environment variable and the
  10.403 +\hgopt{commit}{-u} option to the \hgcmd{commit} command as ways to
  10.404 +\emph{override} Mercurial's default selection of username.  For normal
  10.405 +use, the simplest and most robust way to set a username for yourself
  10.406 +is by creating a \sfilename{.hgrc} file; see below for details.
  10.407 +
  10.408 +\subsubsection{Creating a Mercurial configuration file}
  10.409 +\label{sec:tour-basic:username}
  10.410 +
  10.411 +To set a user name, use your favourite editor to create a file called
  10.412 +\sfilename{.hgrc} in your home directory.  Mercurial will use this
  10.413 +file to look up your personalised configuration settings.  The initial
  10.414 +contents of your \sfilename{.hgrc} should look like this.
  10.415 +\begin{codesample2}
  10.416 +  # This is a Mercurial configuration file.
  10.417 +  [ui]
  10.418 +  username = Firstname Lastname <email.address@domain.net>
  10.419 +\end{codesample2}
  10.420 +The ``\texttt{[ui]}'' line begins a \emph{section} of the config file,
  10.421 +so you can read the ``\texttt{username = ...}'' line as meaning ``set
  10.422 +the value of the \texttt{username} item in the \texttt{ui} section''.
  10.423 +A section continues until a new section begins, or the end of the
  10.424 +file.  Mercurial ignores empty lines and treats any text from
  10.425 +``\texttt{\#}'' to the end of a line as a comment.
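
To check your reading of the file format, the following Python sketch
parses the sample file shown above.  It uses \texttt{configparser}
from Python's standard library as a stand-in; Mercurial has its own
configuration reader, and the two may differ in details, so treat this
as an illustration of sections, items, and comments, not as a
description of how Mercurial reads the file.

```python
import configparser

# The sample configuration from the text, as a string.
HGRC_TEXT = """\
# This is a Mercurial configuration file.
[ui]
username = Firstname Lastname <email.address@domain.net>
"""

# interpolation=None keeps characters such as "%" in values literal.
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(HGRC_TEXT)

# "The value of the username item in the ui section".
username = parser["ui"]["username"]
print(username)
```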
  10.426 +
  10.427 +\subsubsection{Choosing a user name}
  10.428 +
  10.429 +You can use any text you like as the value of the \texttt{username}
  10.430 +config item, since this information is for reading by other people,
  10.431 +not for interpreting by Mercurial.  The convention that most people
  10.432 +follow is to use their name and email address, as in the example
  10.433 +above.
  10.434 +
  10.435 +\begin{note}
  10.436 +  Mercurial's built-in web server obfuscates email addresses, to make
  10.437 +  it more difficult for the email harvesting tools that spammers use.
  10.438 +  This reduces the likelihood that you'll start receiving more junk
  10.439 +  email if you publish a Mercurial repository on the web.
  10.440 +\end{note}
  10.441 +
  10.442 +\subsection{Writing a commit message}
  10.443 +
  10.444 +When we commit a change, Mercurial drops us into a text editor, to
  10.445 +enter a message that will describe the modifications we've made in
  10.446 +this changeset.  This is called the \emph{commit message}.  It will be
  10.447 +a record for readers of what we did and why, and it will be printed by
  10.448 +\hgcmd{log} after we've finished committing.
  10.449 +\interaction{tour.commit}
  10.450 +
  10.451 +The editor that the \hgcmd{commit} command drops us into will contain
  10.452 +an empty line, followed by a number of lines starting with
  10.453 +``\texttt{HG:}''.
  10.454 +\begin{codesample2}
  10.455 +  \emph{empty line}
  10.456 +  HG: changed hello.c
  10.457 +\end{codesample2}
  10.458 +Mercurial ignores the lines that start with ``\texttt{HG:}''; it uses
  10.459 +them only to tell us which files it's recording changes to.  Modifying
  10.460 +or deleting these lines has no effect.
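
As a rough illustration of what ``ignores'' means here, this Python
sketch drops the \texttt{HG:} lines from the text left in the editor.
The function \texttt{clean\_commit\_message} and the sample message
are hypothetical; the sketch shows the idea only, not Mercurial's
actual handling of the editor contents.

```python
def clean_commit_message(editor_text):
    """Drop every line that starts with "HG:" and trim the blank lines
    left at either end.  Hypothetical helper for illustration."""
    kept = [line for line in editor_text.splitlines()
            if not line.startswith("HG:")]
    return "\n".join(kept).strip()


# Hypothetical editor contents after we have typed a one-line message.
editor_text = "Print a second line of output\n\nHG: changed hello.c\n"
message = clean_commit_message(editor_text)
print(message)
```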
  10.461 +
  10.462 +\subsection{Writing a good commit message}
  10.463 +
  10.464 +Since \hgcmd{log} only prints the first line of a commit message by
  10.465 +default, it's best to write a commit message whose first line stands
  10.466 +alone.  Here's a real example of a commit message that \emph{doesn't}
  10.467 +follow this guideline, and hence has a summary that is not readable.
  10.468 +\begin{codesample2}
  10.469 +  changeset:   73:584af0e231be
  10.470 +  user:        Censored Person <censored.person@example.org>
  10.471 +  date:        Tue Sep 26 21:37:07 2006 -0700
  10.472 +  summary:     include buildmeister/commondefs.   Add an exports and install
  10.473 +\end{codesample2}
  10.474 +
  10.475 +As far as the remainder of the contents of the commit message are
  10.476 +concerned, there are no hard-and-fast rules.  Mercurial itself doesn't
  10.477 +interpret or care about the contents of the commit message, though
  10.478 +your project may have policies that dictate a certain kind of
  10.479 +formatting.
  10.480 +
  10.481 +My personal preference is for short, but informative, commit messages
  10.482 +that tell me something that I can't figure out with a quick glance at
  10.483 +the output of \hgcmdargs{log}{--patch}.
  10.484 +
  10.485 +\subsection{Aborting a commit}
  10.486 +
  10.487 +If you decide that you don't want to commit while in the middle of
  10.488 +editing a commit message, simply exit from your editor without saving
  10.489 +the file that it's editing.  This will cause nothing to happen to
  10.490 +either the repository or the working directory.
  10.491 +
  10.492 +If we run the \hgcmd{commit} command without any arguments, it records
  10.493 +all of the changes we've made, as reported by \hgcmd{status} and
  10.494 +\hgcmd{diff}.
  10.495 +
  10.496 +\subsection{Admiring our new handiwork}
  10.497 +
  10.498 +Once we've finished the commit, we can use the \hgcmd{tip} command to
  10.499 +display the changeset we just created.  This command produces output
  10.500 +that is identical to \hgcmd{log}, but it only displays the newest
  10.501 +revision in the repository.
  10.502 +\interaction{tour.tip}
  10.503 +We refer to the newest revision in the repository as the tip revision,
  10.504 +or simply the tip.
  10.505 +
  10.506 +\section{Sharing changes}
  10.507 +
  10.508 +We mentioned earlier that repositories in Mercurial are
  10.509 +self-contained.  This means that the changeset we just created exists
  10.510 +only in our \dirname{my-hello} repository.  Let's look at a few ways
  10.511 +that we can propagate this change into other repositories.
  10.512 +
  10.513 +\subsection{Pulling changes from another repository}
  10.514 +\label{sec:tour:pull}
  10.515 +
  10.516 +To get started, let's clone our original \dirname{hello} repository,
  10.517 +which does not contain the change we just committed.  We'll call our
  10.518 +temporary repository \dirname{hello-pull}.
  10.519 +\interaction{tour.clone-pull}
  10.520 +
  10.521 +We'll use the \hgcmd{pull} command to bring changes from
  10.522 +\dirname{my-hello} into \dirname{hello-pull}.  However, blindly
  10.523 +pulling unknown changes into a repository is a somewhat scary
  10.524 +prospect.  Mercurial provides the \hgcmd{incoming} command to tell us
  10.525 +what changes the \hgcmd{pull} command \emph{would} pull into the
  10.526 +repository, without actually pulling the changes in.
  10.527 +\interaction{tour.incoming}
  10.528 +(Of course, someone could add more changesets to the repository that
  10.529 +we ran \hgcmd{incoming} against before we get a chance to
  10.530 +\hgcmd{pull} the changes, so that we could end up pulling changes that we
  10.531 +didn't expect.)
  10.532 +
  10.533 +Bringing changes into a repository is a simple matter of running the
  10.534 +\hgcmd{pull} command, and telling it which repository to pull from.
  10.535 +\interaction{tour.pull}
  10.536 +As you can see from the before-and-after output of \hgcmd{tip}, we
  10.537 +have successfully pulled changes into our repository.  There remains
  10.538 +one step before we can see these changes in the working directory.
  10.539 +
  10.540 +\subsection{Updating the working directory}
  10.541 +
  10.542 +We have so far glossed over the relationship between a repository and
  10.543 +its working directory.  The \hgcmd{pull} command that we ran in
  10.544 +section~\ref{sec:tour:pull} brought changes into the repository, but
  10.545 +if we check, there's no sign of those changes in the working
  10.546 +directory.  This is because \hgcmd{pull} does not (by default) touch
  10.547 +the working directory.  Instead, we use the \hgcmd{update} command to
  10.548 +do this.
  10.549 +\interaction{tour.update}
  10.550 +
  10.551 +It might seem a bit strange that \hgcmd{pull} doesn't update the
  10.552 +working directory automatically.  There's actually a good reason for
  10.553 +this: you can use \hgcmd{update} to update the working directory to
  10.554 +the state it was in at \emph{any revision} in the history of the
  10.555 +repository.  If you had the working directory updated to an old
  10.556 +revision---to hunt down the origin of a bug, say---and ran a
  10.557 +\hgcmd{pull} which automatically updated the working directory to a
  10.558 +new revision, you might not be terribly happy.
  10.559 +
  10.560 +However, since pull-then-update is such a common thing to do,
  10.561 +Mercurial lets you combine the two by passing the \hgopt{pull}{-u}
  10.562 +option to \hgcmd{pull}.
  10.563 +\begin{codesample2}
  10.564 +  hg pull -u
  10.565 +\end{codesample2}
  10.566 +If you look back at the output of \hgcmd{pull} in
  10.567 +section~\ref{sec:tour:pull} when we ran it without \hgopt{pull}{-u},
  10.568 +you can see that it printed a helpful reminder that we'd have to take
  10.569 +an explicit step to update the working directory:
  10.570 +\begin{codesample2}
  10.571 +  (run 'hg update' to get a working copy)
  10.572 +\end{codesample2}
  10.573 +
  10.574 +To find out what revision the working directory is at, use the
  10.575 +\hgcmd{parents} command.
  10.576 +\interaction{tour.parents}
  10.577 +If you look back at figure~\ref{fig:tour-basic:history}, you'll see
  10.578 +arrows connecting each changeset.  The node that the arrow leads
  10.579 +\emph{from} in each case is a parent, and the node that the arrow
  10.580 +leads \emph{to} is its child.  The working directory has a parent in
  10.581 +just the same way; this is the changeset that the working directory
  10.582 +currently contains.
  10.583 +
  10.584 +To update the working directory to a particular revision, give a
  10.585 +revision number or changeset~ID to the \hgcmd{update} command.
  10.586 +\interaction{tour.older}
  10.587 +If you omit an explicit revision, \hgcmd{update} will update to the
  10.588 +tip revision, as shown by the second call to \hgcmd{update} in the
  10.589 +example above.
  10.590 +
  10.591 +\subsection{Pushing changes to another repository}
  10.592 +
  10.593 +Mercurial lets us push changes to another repository, from the
  10.594 +repository we're currently visiting.  As with the example of
  10.595 +\hgcmd{pull} above, we'll create a temporary repository to push our
  10.596 +changes into.
  10.597 +\interaction{tour.clone-push}
  10.598 +The \hgcmd{outgoing} command tells us what changes would be pushed
  10.599 +into another repository.
  10.600 +\interaction{tour.outgoing}
  10.601 +And the \hgcmd{push} command does the actual push.
  10.602 +\interaction{tour.push}
  10.603 +As with \hgcmd{pull}, the \hgcmd{push} command does not update the
  10.604 +working directory in the repository that it's pushing changes into.
  10.605 +(Unlike \hgcmd{pull}, \hgcmd{push} does not provide a \texttt{-u}
  10.606 +option that updates the other repository's working directory.)
  10.607 +
  10.608 +What happens if we try to pull or push changes and the receiving
  10.609 +repository already has those changes?  Nothing too exciting.
  10.610 +\interaction{tour.push.nothing}
  10.611 +
  10.612 +\subsection{Sharing changes over a network}
  10.613 +
  10.614 +The commands we have covered in the previous few sections are not
  10.615 +limited to working with local repositories.  Each works in exactly the
  10.616 +same fashion over a network connection; simply pass in a URL instead
  10.617 +of a local path.
  10.618 +\interaction{tour.outgoing.net}
  10.619 +In this example, we can see what changes we could push to the remote
  10.620 +repository, but the repository is understandably not set up to let
  10.621 +anonymous users push to it.
  10.622 +\interaction{tour.push.net}
  10.623 +
  10.624 +%%% Local Variables: 
  10.625 +%%% mode: latex
  10.626 +%%% TeX-master: "00book"
  10.627 +%%% End: 

    11.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    11.2 +++ b/en/ch03-tour-merge.tex	Thu Jan 29 22:56:27 2009 -0800
    11.3 @@ -0,0 +1,286 @@
    11.4 +\chapter{A tour of Mercurial: merging work}
    11.5 +\label{chap:tour-merge}
    11.6 +
    11.7 +We've now covered cloning a repository, making changes in a
    11.8 +repository, and pulling or pushing changes from one repository into
    11.9 +another.  Our next step is \emph{merging} changes from separate
   11.10 +repositories.
   11.11 +
   11.12 +\section{Merging streams of work}
   11.13 +
   11.14 +Merging is a fundamental part of working with a distributed revision
   11.15 +control tool.
   11.16 +\begin{itemize}
   11.17 +\item Alice and Bob each have a personal copy of a repository for a
   11.18 +  project they're collaborating on.  Alice fixes a bug in her
   11.19 +  repository; Bob adds a new feature in his.  They want the shared
   11.20 +  repository to contain both the bug fix and the new feature.
   11.21 +\item I frequently work on several different tasks for a single
   11.22 +  project at once, each safely isolated in its own repository.
   11.23 +  Working this way means that I often need to merge one piece of my
   11.24 +  own work with another.
   11.25 +\end{itemize}
   11.26 +
   11.27 +Because merging is such a common thing to need to do, Mercurial makes
   11.28 +it easy.  Let's walk through the process.  We'll begin by cloning yet
   11.29 +another repository (see how often they spring up?) and making a change
   11.30 +in it.
   11.31 +\interaction{tour.merge.clone}
   11.32 +We should now have two copies of \filename{hello.c} with different
   11.33 +contents.  The histories of the two repositories have also diverged,
   11.34 +as illustrated in figure~\ref{fig:tour-merge:sep-repos}.
   11.35 +\interaction{tour.merge.cat}
   11.36 +
   11.37 +\begin{figure}[ht]
   11.38 +  \centering
   11.39 +  \grafix{tour-merge-sep-repos}
   11.40 +  \caption{Divergent recent histories of the \dirname{my-hello} and
   11.41 +    \dirname{my-new-hello} repositories}
   11.42 +  \label{fig:tour-merge:sep-repos}
   11.43 +\end{figure}
   11.44 +
   11.45 +We already know that pulling changes from our \dirname{my-hello}
   11.46 +repository will have no effect on the working directory.
   11.47 +\interaction{tour.merge.pull}
   11.48 +However, the \hgcmd{pull} command says something about ``heads''.  
   11.49 +
   11.50 +\subsection{Head changesets}
   11.51 +
   11.52 +A head is a changeset that has no descendants, or children, as they're
   11.53 +also known.  The tip revision is thus a head, because the newest
   11.54 +revision in a repository doesn't have any children, but a repository
   11.55 +can contain more than one head.
   11.56 +
   11.57 +\begin{figure}[ht]
   11.58 +  \centering
   11.59 +  \grafix{tour-merge-pull}
   11.60 +  \caption{Repository contents after pulling from \dirname{my-hello} into
   11.61 +    \dirname{my-new-hello}}
   11.62 +  \label{fig:tour-merge:pull}
   11.63 +\end{figure}
   11.64 +
   11.65 +In figure~\ref{fig:tour-merge:pull}, you can see the effect of the
   11.66 +pull from \dirname{my-hello} into \dirname{my-new-hello}.  The history
   11.67 +that was already present in \dirname{my-new-hello} is untouched, but a
   11.68 +new revision has been added.  By referring to
   11.69 +figure~\ref{fig:tour-merge:sep-repos}, we can see that the
   11.70 +\emph{changeset ID} remains the same in the new repository, but the
   11.71 +\emph{revision number} has changed.  (This, incidentally, is a fine
   11.72 +example of why it's not safe to use revision numbers when discussing
   11.73 +changesets.)  We can view the heads in a repository using the
   11.74 +\hgcmd{heads} command.
   11.75 +\interaction{tour.merge.heads}
   11.76 +
   11.77 +\subsection{Performing the merge}
   11.78 +
   11.79 +What happens if we try to use the normal \hgcmd{update} command to
   11.80 +update to the new tip?
   11.81 +\interaction{tour.merge.update}
   11.82 +Mercurial is telling us that the \hgcmd{update} command won't do a
   11.83 +merge; it won't update the working directory when it thinks we might
   11.84 +want to do a merge, unless we force it to do so.  Instead, we
   11.85 +use the \hgcmd{merge} command to merge the two heads.
   11.86 +\interaction{tour.merge.merge}
   11.87 +
   11.88 +\begin{figure}[ht]
   11.89 +  \centering
   11.90 +  \grafix{tour-merge-merge}
   11.91 +  \caption{Working directory and repository during merge, and
   11.92 +    following commit}
   11.93 +  \label{fig:tour-merge:merge}
   11.94 +\end{figure}
   11.95 +
   11.96 +This updates the working directory so that it contains changes from
   11.97 +\emph{both} heads, which is reflected in both the output of
   11.98 +\hgcmd{parents} and the contents of \filename{hello.c}.
   11.99 +\interaction{tour.merge.parents}
  11.100 +
  11.101 +\subsection{Committing the results of the merge}
  11.102 +
  11.103 +Whenever we've done a merge, \hgcmd{parents} will display two parents
  11.104 +until we \hgcmd{commit} the results of the merge.
  11.105 +\interaction{tour.merge.commit}
  11.106 +We now have a new tip revision; notice that it has \emph{both} of
  11.107 +our former heads as its parents.  These are the same revisions that
  11.108 +were previously displayed by \hgcmd{parents}.
  11.109 +\interaction{tour.merge.tip}
  11.110 +In figure~\ref{fig:tour-merge:merge}, you can see a representation of
  11.111 +what happens to the working directory during the merge, and how this
  11.112 +affects the repository when the commit happens.  During the merge, the
  11.113 +working directory has two parent changesets, and these become the
  11.114 +parents of the new changeset.
  11.115 +
  11.116 +\section{Merging conflicting changes}
  11.117 +
  11.118 +Most merges are simple affairs, but sometimes you'll find yourself
  11.119 +merging changes where each modifies the same portions of the same
  11.120 +files.  Unless both modifications are identical, this results in a
  11.121 +\emph{conflict}, where you have to decide how to reconcile the
  11.122 +different changes into something coherent.
  11.123 +
  11.124 +\begin{figure}[ht]
  11.125 +  \centering
  11.126 +  \grafix{tour-merge-conflict}
  11.127 +  \caption{Conflicting changes to a document}
  11.128 +  \label{fig:tour-merge:conflict}
  11.129 +\end{figure}
  11.130 +
  11.131 +Figure~\ref{fig:tour-merge:conflict} illustrates an instance of two
  11.132 +conflicting changes to a document.  We started with a single version
  11.133 +of the file; then we made some changes, while someone else made
  11.134 +different changes to the same text.  Our task in resolving the
  11.135 +conflicting changes is to decide what the file should look like.
  11.136 +
  11.137 +Mercurial doesn't have a built-in facility for handling conflicts.
  11.138 +Instead, it runs an external program called \command{hgmerge}.  This
  11.139 +is a shell script that is bundled with Mercurial; you can change it to
  11.140 +behave however you please.  What it does by default is try to find one
  11.141 +of several different merging tools that are likely to be installed on
  11.142 +your system.  It first tries a few fully automatic merging tools; if
  11.143 +these don't succeed (because the resolution process requires human
  11.144 +guidance) or aren't present, the script tries a few different
  11.145 +graphical merging tools.
  11.146 +
  11.147 +It's also possible to get Mercurial to run another program or script
  11.148 +instead of \command{hgmerge}, by setting the \envar{HGMERGE}
  11.149 +environment variable to the name of your preferred program.
  11.150 +
  11.151 +\subsection{Using a graphical merge tool}
  11.152 +
  11.153 +My preferred graphical merge tool is \command{kdiff3}, which I'll use
  11.154 +to describe the features that are common to graphical file merging
  11.155 +tools.  You can see a screenshot of \command{kdiff3} in action in
  11.156 +figure~\ref{fig:tour-merge:kdiff3}.  The kind of merge it is
  11.157 +performing is called a \emph{three-way merge}, because there are three
  11.158 +different versions of the file of interest to us.  The tool thus
  11.159 +splits the upper portion of the window into three panes:
  11.160 +\begin{itemize}
  11.161 +\item At the left is the \emph{base} version of the file, i.e.~the
  11.162 +  most recent version from which the two versions we're trying to
  11.163 +  merge are descended.
  11.164 +\item In the middle is ``our'' version of the file, with the contents
  11.165 +  that we modified.
  11.166 +\item On the right is ``their'' version of the file, the one
  11.167 +  from the changeset that we're trying to merge with.
  11.168 +\end{itemize}
  11.169 +In the pane below these is the current \emph{result} of the merge.
  11.170 +Our task is to replace all of the red text, which indicates unresolved
  11.171 +conflicts, with some sensible merger of the ``ours'' and ``theirs''
  11.172 +versions of the file.
  11.173 +
  11.174 +All four of these panes are \emph{locked together}; if we scroll
  11.175 +vertically or horizontally in any of them, the others are updated to
  11.176 +display the corresponding sections of their respective files.
  11.177 +
  11.178 +\begin{figure}[ht]
  11.179 +  \centering
  11.180 +  \grafix{kdiff3}
  11.181 +  \caption{Using \command{kdiff3} to merge versions of a file}
  11.182 +  \label{fig:tour-merge:kdiff3}
  11.183 +\end{figure}
  11.184 +
  11.185 +For each conflicting portion of the file, we can choose to resolve
  11.186 +the conflict using some combination of text from the base version,
  11.187 +ours, or theirs.  We can also manually edit the merged file at any
  11.188 +time, in case we need to make further modifications.
  11.189 +
  11.190 +There are \emph{many} file merging tools available, too many to cover
  11.191 +here.  They vary in which platforms they are available for, and in
  11.192 +their particular strengths and weaknesses.  Most are tuned for merging
  11.193 +files containing plain text, while a few are aimed at specialised file
  11.194 +formats (generally XML).
  11.195 +
  11.196 +\subsection{A worked example}
  11.197 +
  11.198 +In this example, we will reproduce the file modification history of
  11.199 +figure~\ref{fig:tour-merge:conflict} above.  Let's begin by creating a
  11.200 +repository with a base version of our document.
  11.201 +\interaction{tour-merge-conflict.wife}
  11.202 +We'll clone the repository and make a change to the file.
  11.203 +\interaction{tour-merge-conflict.cousin}
  11.204 +And another clone, to simulate someone else making a change to the
  11.205 +file.  (This hints at the idea that it's not all that unusual to merge
  11.206 +with yourself when you isolate tasks in separate repositories, and
  11.207 +indeed to find and resolve conflicts while doing so.)
  11.208 +\interaction{tour-merge-conflict.son}
  11.209 +Having created two different versions of the file, we'll set up an
  11.210 +environment suitable for running our merge.
  11.211 +\interaction{tour-merge-conflict.pull}
  11.212 +
  11.213 +In this example, I won't use Mercurial's normal \command{hgmerge}
  11.214 +program to do the merge, because it would drop my nice automated
  11.215 +example-running tool into a graphical user interface.  Instead, I'll
  11.216 +set \envar{HGMERGE} to tell Mercurial to use the non-interactive
  11.217 +\command{merge} command.  This is bundled with many Unix-like systems.
  11.218 +If you're following this example on your computer, don't bother
  11.219 +setting \envar{HGMERGE}.
  11.220 +
  11.221 +\textbf{XXX FIX THIS EXAMPLE.}
  11.222 +
  11.223 +\interaction{tour-merge-conflict.merge}
  11.224 +Because \command{merge} can't resolve the conflicting changes, it
  11.225 +leaves \emph{merge markers} inside the file that has conflicts,
  11.226 +indicating which lines have conflicts, and whether they came from our
  11.227 +version of the file or theirs.
  11.228 +
  11.229 +Mercurial can tell from the way \command{merge} exits that it wasn't
  11.230 +able to merge successfully, so it tells us what commands we'll need to
  11.231 +run if we want to redo the merging operation.  This could be useful
  11.232 +if, for example, we were running a graphical merge tool and quit
  11.233 +because we were confused or realised we had made a mistake.
  11.234 +
  11.235 +If automatic or manual merges fail, there's nothing to prevent us from
  11.236 +``fixing up'' the affected files ourselves, and committing the results
  11.237 +of our merge:
  11.238 +\interaction{tour-merge-conflict.commit}
  11.239 +
  11.240 +\section{Simplifying the pull-merge-commit sequence}
  11.241 +\label{sec:tour-merge:fetch}
  11.242 +
  11.243 +The process of merging changes as outlined above is straightforward,
  11.244 +but requires running three commands in sequence.
  11.245 +\begin{codesample2}
  11.246 +  hg pull
  11.247 +  hg merge
  11.248 +  hg commit -m 'Merged remote changes'
  11.249 +\end{codesample2}
  11.250 +In the case of the final commit, you also need to enter a commit
  11.251 +message, which is almost always going to be a piece of uninteresting
  11.252 +``boilerplate'' text.
  11.253 +
  11.254 +It would be nice to reduce the number of steps needed, if this were
  11.255 +possible.  Indeed, Mercurial is distributed with an extension called
  11.256 +\hgext{fetch} that does just this.
  11.257 +
  11.258 +Mercurial provides a flexible extension mechanism that lets people
  11.259 +extend its functionality, while keeping the core of Mercurial small
  11.260 +and easy to deal with.  Some extensions add new commands that you can
  11.261 +use from the command line, while others work ``behind the scenes,''
  11.262 +for example adding capabilities to the server.
  11.263 +
  11.264 +The \hgext{fetch} extension adds a new command called, not
  11.265 +surprisingly, \hgcmd{fetch}.  This extension acts as a combination of
  11.266 +\hgcmd{pull}, \hgcmd{update} and \hgcmd{merge}.  It begins by pulling
  11.267 +changes from another repository into the current repository.  If it
  11.268 +finds that the changes added a new head to the repository, it begins a
  11.269 +merge, then commits the result of the merge with an
  11.270 +automatically-generated commit message.  If no new heads were added,
  11.271 +it updates the working directory to the new tip changeset.
  11.272 +
  11.273 +Enabling the \hgext{fetch} extension is easy.  Edit your
  11.274 +\sfilename{.hgrc}, and either go to the \rcsection{extensions} section
  11.275 +or create an \rcsection{extensions} section.  Then add a line that
  11.276 +simply reads ``\Verb+fetch =+''.
  11.277 +\begin{codesample2}
  11.278 +  [extensions]
  11.279 +  fetch =
  11.280 +\end{codesample2}
  11.281 +(Normally, on the right-hand side of the ``\texttt{=}'' would appear
  11.282 +the location of the extension, but since the \hgext{fetch} extension
  11.283 +is in the standard distribution, Mercurial knows where to search for
  11.284 +it.)
  11.285 +
  11.286 +%%% Local Variables: 
  11.287 +%%% mode: latex
  11.288 +%%% TeX-master: "00book"
  11.289 +%%% End: 

    12.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    12.2 +++ b/en/ch04-concepts.tex	Thu Jan 29 22:56:27 2009 -0800
    12.3 @@ -0,0 +1,577 @@
    12.4 +\chapter{Behind the scenes}
    12.5 +\label{chap:concepts}
    12.6 +
    12.7 +Unlike many revision control systems, the concepts upon which
    12.8 +Mercurial is built are simple enough that it's easy to understand how
    12.9 +the software really works.  Knowing this certainly isn't necessary,
   12.10 +but I find it useful to have a ``mental model'' of what's going on.
   12.11 +
   12.12 +This understanding gives me confidence that Mercurial has been
   12.13 +carefully designed to be both \emph{safe} and \emph{efficient}.  And
   12.14 +just as importantly, if it's easy for me to retain a good idea of what
   12.15 +the software is doing when I perform a revision control task, I'm less
   12.16 +likely to be surprised by its behaviour.
   12.17 +
   12.18 +In this chapter, we'll initially cover the core concepts behind
   12.19 +Mercurial's design, then continue to discuss some of the interesting
   12.20 +details of its implementation.
   12.21 +
   12.22 +\section{Mercurial's historical record}
   12.23 +
   12.24 +\subsection{Tracking the history of a single file}
   12.25 +
   12.26 +When Mercurial tracks modifications to a file, it stores the history
   12.27 +of that file in a metadata object called a \emph{filelog}.  Each entry
   12.28 +in the filelog contains enough information to reconstruct one revision
   12.29 +of the file that is being tracked.  Filelogs are stored as files in
   12.30 +the \sdirname{.hg/store/data} directory.  A filelog contains two kinds
   12.31 +of information: revision data, and an index to help Mercurial to find
   12.32 +a revision efficiently.
   12.33 +
   12.34 +A file that is large, or has a lot of history, has its filelog stored
   12.35 +in separate data (``\texttt{.d}'' suffix) and index (``\texttt{.i}''
   12.36 +suffix) files.  For small files without much history, the revision
   12.37 +data and index are combined in a single ``\texttt{.i}'' file.  The
   12.38 +correspondence between a file in the working directory and the filelog
   12.39 +that tracks its history in the repository is illustrated in
   12.40 +figure~\ref{fig:concepts:filelog}.
   12.41 +
   12.42 +\begin{figure}[ht]
   12.43 +  \centering
   12.44 +  \grafix{filelog}
   12.45 +  \caption{Relationships between files in working directory and
   12.46 +    filelogs in repository}
   12.47 +  \label{fig:concepts:filelog}
   12.48 +\end{figure}
   12.49 +
   12.50 +\subsection{Managing tracked files}
   12.51 +
   12.52 +Mercurial uses a structure called a \emph{manifest} to collect
   12.53 +together information about the files that it tracks.  Each entry in
   12.54 +the manifest contains information about the files present in a single
   12.55 +changeset.  An entry records which files are present in the changeset,
   12.56 +the revision of each file, and a few other pieces of file metadata.
   12.57 +
   12.58 +\subsection{Recording changeset information}
   12.59 +
   12.60 +The \emph{changelog} contains information about each changeset.  Each
   12.61 +revision records who committed a change, the changeset comment, other
   12.62 +pieces of changeset-related information, and the revision of the
   12.63 +manifest to use.
   12.64 +
   12.65 +\subsection{Relationships between revisions}
   12.66 +
   12.67 +Within a changelog, a manifest, or a filelog, each revision stores a
   12.68 +pointer to its immediate parent (or to its two parents, if it's a
   12.69 +merge revision).  As I mentioned above, there are also relationships
   12.70 +between revisions \emph{across} these structures, and they are
   12.71 +hierarchical in nature.
   12.72 +
   12.73 +For every changeset in a repository, there is exactly one revision
   12.74 +stored in the changelog.  Each revision of the changelog contains a
   12.75 +pointer to a single revision of the manifest.  A revision of the
   12.76 +manifest stores a pointer to a single revision of each filelog tracked
   12.77 +when that changeset was created.  These relationships are illustrated
   12.78 +in figure~\ref{fig:concepts:metadata}.
   12.79 +
   12.80 +\begin{figure}[ht]
   12.81 +  \centering
   12.82 +  \grafix{metadata}
   12.83 +  \caption{Metadata relationships}
   12.84 +  \label{fig:concepts:metadata}
   12.85 +\end{figure}
   12.86 +
   12.87 +As the illustration shows, there is \emph{not} a ``one to one''
   12.88 +relationship between revisions in the changelog, manifest, or filelog.
   12.89 +If the manifest hasn't changed between two changesets, the changelog
   12.90 +entries for those changesets will point to the same revision of the
   12.91 +manifest.  If a file that Mercurial tracks hasn't changed between two
   12.92 +changesets, the entry for that file in the two revisions of the
   12.93 +manifest will point to the same revision of its filelog.
   12.94 +
   12.95 +\section{Safe, efficient storage}
   12.96 +
   12.97 +The underpinnings of changelogs, manifests, and filelogs are provided
   12.98 +by a single structure called the \emph{revlog}.
   12.99 +
  12.100 +\subsection{Efficient storage}
  12.101 +
  12.102 +The revlog provides efficient storage of revisions using a
  12.103 +\emph{delta} mechanism.  Instead of storing a complete copy of a file
  12.104 +for each revision, it stores the changes needed to transform an older
  12.105 +revision into the new revision.  For many kinds of file data, these
  12.106 +deltas are typically a fraction of a percent of the size of a full
  12.107 +copy of a file.
  12.108 +
  12.109 +Some obsolete revision control systems can only work with deltas of
  12.110 +text files.  They must either store binary files as complete snapshots
  12.111 +or encoded into a text representation, both of which are wasteful
  12.112 +approaches.  Mercurial can efficiently handle deltas of files with
  12.113 +arbitrary binary contents; it doesn't need to treat text as special.
  12.114 +
  12.115 +\subsection{Safe operation}
  12.116 +\label{sec:concepts:txn}
  12.117 +
  12.118 +Mercurial only ever \emph{appends} data to the end of a revlog file.
  12.119 +It never modifies a section of a file after it has written it.  This
  12.120 +is both more robust and more efficient than schemes that need to modify or
  12.121 +rewrite data.
  12.122 +
  12.123 +In addition, Mercurial treats every write as part of a
  12.124 +\emph{transaction} that can span a number of files.  A transaction is
  12.125 +\emph{atomic}: either the entire transaction succeeds and its effects
  12.126 +are all visible to readers in one go, or the whole thing is undone.
  12.127 +This guarantee of atomicity means that if you're running two copies of
  12.128 +Mercurial, where one is reading data and one is writing it, the reader
  12.129 +will never see a partially written result that might confuse it.
  12.130 +
  12.131 +The fact that Mercurial only appends to files makes it easier to
  12.132 +provide this transactional guarantee.  The easier it is to do stuff
  12.133 +like this, the more confident you should be that it's done correctly.
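
To see why append-only files make rollback cheap, here is a minimal
sketch in Python.  It is an illustration only, not Mercurial's
implementation: the \texttt{ToyTransaction} class and its method names
are hypothetical.  The idea is to remember how long each file was
before the first append, so that undoing the transaction is just a
matter of truncating each file back to that length.  The sketch does
not try to model how readers are kept from seeing a half-finished
write.

```python
import os


class ToyTransaction:
    """Hypothetical append-only transaction; not Mercurial's real code."""

    def __init__(self):
        # Maps each touched path to its size before our first append.
        self._journal = {}

    def append(self, path, data):
        """Append bytes to a file, journalling its original size once."""
        if path not in self._journal:
            size = os.path.getsize(path) if os.path.exists(path) else 0
            self._journal[path] = size
        with open(path, "ab") as f:
            f.write(data)

    def rollback(self):
        """Undo every append by truncating files to their journalled sizes."""
        for path, size in self._journal.items():
            with open(path, "r+b") as f:
                f.truncate(size)
        self._journal.clear()

    def commit(self):
        """Keep the appended data; there is nothing left to undo."""
        self._journal.clear()
```

Because existing bytes are never overwritten, the journal needs only
one number per file, which is part of what makes a scheme like this
easy to get right.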
  12.134 +
  12.135 +\subsection{Fast retrieval}
  12.136 +
  12.137 +Mercurial cleverly avoids a pitfall common to many earlier
  12.138 +revision control systems: the problem of \emph{inefficient retrieval}.
  12.139 +Most revision control systems store the contents of a revision as an
  12.140 +incremental series of modifications against a ``snapshot''.  To
  12.141 +reconstruct a specific revision, you must first read the snapshot, and
  12.142 +then every one of the revisions between the snapshot and your target
  12.143 +revision.  The more history that a file accumulates, the more
  12.144 +revisions you must read, hence the longer it takes to reconstruct a
  12.145 +particular revision.
  12.146 +
  12.147 +\begin{figure}[ht]
  12.148 +  \centering
  12.149 +  \grafix{snapshot}
  12.150 +  \caption{Snapshot of a revlog, with incremental deltas}
  12.151 +  \label{fig:concepts:snapshot}
  12.152 +\end{figure}
  12.153 +
  12.154 +The innovation that Mercurial applies to this problem is simple but
  12.155 +effective.  Once the cumulative amount of delta information stored
  12.156 +since the last snapshot exceeds a fixed threshold, it stores a new
  12.157 +snapshot (compressed, of course), instead of another delta.  This
  12.158 +makes it possible to reconstruct \emph{any} revision of a file
  12.159 +quickly.  This approach works so well that it has since been copied by
  12.160 +several other revision control systems.
  12.161 +
  12.162 +Figure~\ref{fig:concepts:snapshot} illustrates the idea.  In an entry
  12.163 +in a revlog's index file, Mercurial stores the range of entries from
  12.164 +the data file that it must read to reconstruct a particular revision.
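
The snapshot-plus-deltas idea can be sketched in a few lines of
Python.  Everything here is a toy under stated assumptions: the
\texttt{ToyRevlog} class, the \texttt{SNAPSHOT\_THRESHOLD} value, and
the crude prefix/suffix delta format are all hypothetical, and none of
them reflects Mercurial's real on-disk format.  What the sketch does
show is the rule described above: once the deltas stored since the
last snapshot grow past a threshold, the next revision is stored in
full, so reading any revision touches only a bounded run of entries.

```python
SNAPSHOT_THRESHOLD = 64  # hypothetical cap on delta bytes since last snapshot


def make_delta(old, new):
    """Describe `new` relative to `old` as (prefix_len, suffix_len, middle)."""
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    while suffix < limit - prefix and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1
    return prefix, suffix, new[prefix:len(new) - suffix]


def apply_delta(old, delta):
    """Rebuild the newer text from the older text and a delta."""
    prefix, suffix, middle = delta
    return old[:prefix] + middle + old[len(old) - suffix:]


class ToyRevlog:
    """Toy store of revisions as snapshots plus incremental deltas."""

    def __init__(self):
        self.entries = []      # ("snapshot", bytes) or ("delta", tuple)
        self.chain_base = []   # per revision: index of its chain's snapshot
        self._delta_bytes = 0  # delta bytes stored since the last snapshot
        self._tip_text = b""   # full text of the newest revision

    def add(self, text):
        """Store a new revision and return its revision number."""
        delta = make_delta(self._tip_text, text) if self.entries else None
        size = len(delta[2]) if delta else 0
        if delta is None or self._delta_bytes + size > SNAPSHOT_THRESHOLD:
            # Too much delta data has piled up: store a full snapshot.
            self.entries.append(("snapshot", text))
            self.chain_base.append(len(self.entries) - 1)
            self._delta_bytes = 0
        else:
            self.entries.append(("delta", delta))
            self.chain_base.append(self.chain_base[-1])
            self._delta_bytes += size
        self._tip_text = text
        return len(self.entries) - 1

    def read(self, rev):
        """Reconstruct a revision from its snapshot and the deltas after it."""
        base = self.chain_base[rev]
        text = self.entries[base][1]
        for _kind, delta in self.entries[base + 1:rev + 1]:
            text = apply_delta(text, delta)
        return text
```

The \texttt{chain\_base} list plays the role of the index entry
mentioned above: for each revision, it records where the run of
entries needed to rebuild that revision begins.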
  12.165 +
  12.166 +\subsubsection{Aside: the influence of video compression}
  12.167 +
  12.168 +If you're familiar with video compression or have ever watched a TV
  12.169 +feed through a digital cable or satellite service, you may know that
  12.170 +most video compression schemes store each frame of video as a delta
  12.171 +against its predecessor frame.  In addition, these schemes use
  12.172 +``lossy'' compression techniques to increase the compression ratio, so
  12.173 +visual errors accumulate over the course of a number of inter-frame
  12.174 +deltas.
  12.175 +
  12.176 +Because it's possible for a video stream to ``drop out'' occasionally
  12.177 +due to signal glitches, and to limit the accumulation of artefacts
  12.178 +introduced by the lossy compression process, video encoders
  12.179 +periodically insert a complete frame (called a ``key frame'') into the
  12.180 +video stream; the next delta is generated against that frame.  This
  12.181 +means that if the video signal gets interrupted, it will resume once
  12.182 +the next key frame is received.  Also, the accumulation of encoding
  12.183 +errors restarts anew with each key frame.
  12.184 +
  12.185 +\subsection{Identification and strong integrity}
  12.186 +
  12.187 +Along with delta or snapshot information, a revlog entry contains a
  12.188 +cryptographic hash of the data that it represents.  This makes it
  12.189 +difficult to forge the contents of a revision, and easy to detect
  12.190 +accidental corruption.  
  12.191 +
  12.192 +Hashes provide more than a mere check against corruption; they are
  12.193 +used as the identifiers for revisions.  The changeset identification
  12.194 +hashes that you see as an end user are from revisions of the
  12.195 +changelog.  Although filelogs and the manifest also use hashes,
  12.196 +Mercurial only uses these behind the scenes.
  12.197 +
  12.198 +Mercurial verifies that hashes are correct when it retrieves file
  12.199 +revisions and when it pulls changes from another repository.  If it
  12.200 +encounters an integrity problem, it will complain and stop whatever
  12.201 +it's doing.
  12.202 +
  12.203 +In addition to the effect it has on retrieval efficiency, Mercurial's
  12.204 +use of periodic snapshots makes it more robust against partial data
  12.205 +corruption.  If a revlog becomes partly corrupted due to a hardware
  12.206 +error or system bug, it's often possible to reconstruct some or most
  12.207 +revisions from the uncorrupted sections of the revlog, both before and
  12.208 +after the corrupted section.  This would not be possible with a
  12.209 +delta-only storage model.
  12.210 +
  12.211 +\section{Revision history, branching,
  12.212 +  and merging}
  12.213 +
  12.214 +Every entry in a Mercurial revlog knows the identity of its immediate
  12.215 +ancestor revision, usually referred to as its \emph{parent}.  In fact,
  12.216 +a revision contains room for not one parent, but two.  Mercurial uses
  12.217 +a special hash, called the ``null ID'', to represent the idea ``there
  12.218 +is no parent here''.  This hash is simply a string of zeroes.
  12.219 +
  12.220 +In figure~\ref{fig:concepts:revlog}, you can see an example of the
  12.221 +conceptual structure of a revlog.  Filelogs, manifests, and changelogs
  12.222 +all have this same structure; they differ only in the kind of data
  12.223 +stored in each delta or snapshot.
  12.224 +
  12.225 +The first revision in a revlog (at the bottom of the image) has the
  12.226 +null ID in both of its parent slots.  For a ``normal'' revision, its
  12.227 +first parent slot contains the ID of its parent revision, and its
  12.228 +second contains the null ID, indicating that the revision has only one
  12.229 +real parent.  Any two revisions that have the same parent ID are
  12.230 +branches.  A revision that represents a merge between branches has two
  12.231 +normal revision IDs in its parent slots.
  12.232 +
  12.233 +\begin{figure}[ht]
  12.234 +  \centering
  12.235 +  \grafix{revlog}
  12.236 +  \caption{The conceptual structure of a revlog}
  12.237 +  \label{fig:concepts:revlog}
  12.238 +\end{figure}
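
The parent slots described above can be sketched in a few lines of
Python.  This is only an illustration, not Mercurial's own code: the
\texttt{Revision} class and the \texttt{node\_id} helper are
hypothetical names, and the sketch assumes that a revision's ID is a
SHA-1 hash computed over its two parent IDs (sorted) followed by its
data.

```python
import hashlib
from dataclasses import dataclass

NULL_ID = b"\0" * 20  # "there is no parent here": a string of zero bytes


def node_id(text, p1, p2):
    # Assumption for this sketch: the ID is SHA-1 over the sorted
    # parent IDs followed by the revision data.
    lo, hi = sorted([p1, p2])
    return hashlib.sha1(lo + hi + text).digest()


@dataclass(frozen=True)
class Revision:
    text: bytes
    p1: bytes = NULL_ID  # first parent slot
    p2: bytes = NULL_ID  # second parent slot

    @property
    def node(self):
        return node_id(self.text, self.p1, self.p2)

    def kind(self):
        if self.p1 == NULL_ID and self.p2 == NULL_ID:
            return "root"    # first revision: both slots hold the null ID
        if self.p2 == NULL_ID:
            return "normal"  # exactly one real parent
        return "merge"       # two real parents


root = Revision(b"first\n")
left = Revision(b"left\n", p1=root.node)
right = Revision(b"right\n", p1=root.node)  # same parent as left: a branch
merged = Revision(b"merged\n", p1=left.node, p2=right.node)
```

Because the parent IDs feed into the hash, changing either the data or
the ancestry of a revision yields a different identifier, which is what
lets the same value serve as both a name and an integrity check.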
  12.239 +
  12.240 +\section{The working directory}
  12.241 +
  12.242 +In the working directory, Mercurial stores a snapshot of the files
  12.243 +from the repository as of a particular changeset.
  12.244 +
  12.245 +The working directory ``knows'' which changeset it contains.  When you
  12.246 +update the working directory to contain a particular changeset,
  12.247 +Mercurial looks up the appropriate revision of the manifest to find
  12.248 +out which files it was tracking at the time that changeset was
  12.249 +committed, and which revision of each file was then current.  It then
  12.250 +recreates a copy of each of those files, with the same contents it had
  12.251 +when the changeset was committed.
  12.252 +
  12.253 +The \emph{dirstate} contains Mercurial's knowledge of the working
  12.254 +directory.  This details which changeset the working directory is
  12.255 +updated to, and all of the files that Mercurial is tracking in the
  12.256 +working directory.
  12.257 +
  12.258 +Just as a revision of a revlog has room for two parents, so that it
  12.259 +can represent either a normal revision (with one parent) or a merge of
  12.260 +two earlier revisions, the dirstate has slots for two parents.  When
  12.261 +you use the \hgcmd{update} command, the changeset that you update to
  12.262 +is stored in the ``first parent'' slot, and the null ID in the second.
  12.263 +When you \hgcmd{merge} with another changeset, the first parent
  12.264 +remains unchanged, and the second parent is filled in with the
  12.265 +changeset you're merging with.  The \hgcmd{parents} command tells you
  12.266 +what the parents of the dirstate are.
  12.267 +
  12.268 +\subsection{What happens when you commit}
  12.269 +
  12.270 +The dirstate stores parent information for more than just book-keeping
  12.271 +purposes.  Mercurial uses the parents of the dirstate as \emph{the
  12.272 +  parents of a new changeset} when you perform a commit.
  12.273 +
  12.274 +\begin{figure}[ht]
  12.275 +  \centering
  12.276 +  \grafix{wdir}
  12.277 +  \caption{The working directory can have two parents}
  12.278 +  \label{fig:concepts:wdir}
  12.279 +\end{figure}
  12.280 +
  12.281 +Figure~\ref{fig:concepts:wdir} shows the normal state of the working
  12.282 +directory, where it has a single changeset as parent.  That changeset
  12.283 +is the \emph{tip}, the newest changeset in the repository that has no
  12.284 +children.
  12.285 +
  12.286 +\begin{figure}[ht]
  12.287 +  \centering
  12.288 +  \grafix{wdir-after-commit}
  12.289 +  \caption{The working directory gains new parents after a commit}
  12.290 +  \label{fig:concepts:wdir-after-commit}
  12.291 +\end{figure}
  12.292 +
  12.293 +It's useful to think of the working directory as ``the changeset I'm
  12.294 +about to commit''.  Any files that you tell Mercurial that you've
  12.295 +added, removed, renamed, or copied will be reflected in that
  12.296 +changeset, as will modifications to any files that Mercurial is
  12.297 +already tracking; the new changeset will have the parents of the
  12.298 +working directory as its parents.
  12.299 +
  12.300 +After a commit, Mercurial will update the parents of the working
  12.301 +directory, so that the first parent is the ID of the new changeset,
  12.302 +and the second is the null ID.  This is shown in
  12.303 +figure~\ref{fig:concepts:wdir-after-commit}.  Mercurial doesn't touch
  12.304 +any of the files in the working directory when you commit; it just
  12.305 +modifies the dirstate to note its new parents.
  12.306 +
  12.307 +\subsection{Creating a new head}
  12.308 +
  12.309 +It's perfectly normal to update the working directory to a changeset
  12.310 +other than the current tip.  For example, you might want to know what
  12.311 +your project looked like last Tuesday, or you could be looking through
  12.312 +changesets to see which one introduced a bug.  In cases like this, the
  12.313 +natural thing to do is update the working directory to the changeset
  12.314 +you're interested in, and then examine the files in the working
  12.315 +directory directly to see their contents as they were when you
  12.316 +committed that changeset.  The effect of this is shown in
  12.317 +figure~\ref{fig:concepts:wdir-pre-branch}.
  12.318 +
  12.319 +\begin{figure}[ht]
  12.320 +  \centering
  12.321 +  \grafix{wdir-pre-branch}
  12.322 +  \caption{The working directory, updated to an older changeset}
  12.323 +  \label{fig:concepts:wdir-pre-branch}
  12.324 +\end{figure}
  12.325 +
  12.326 +Having updated the working directory to an older changeset, what
  12.327 +happens if you make some changes, and then commit?  Mercurial behaves
  12.328 +in the same way as I outlined above.  The parents of the working
  12.329 +directory become the parents of the new changeset.  This new changeset
  12.330 +has no children, so it becomes the new tip.  And the repository now
  12.331 +contains two changesets that have no children; we call these
  12.332 +\emph{heads}.  You can see the structure that this creates in
  12.333 +figure~\ref{fig:concepts:wdir-branch}.
  12.334 +
  12.335 +\begin{figure}[ht]
  12.336 +  \centering
  12.337 +  \grafix{wdir-branch}
  12.338 +  \caption{After a commit made while synced to an older changeset}
  12.339 +  \label{fig:concepts:wdir-branch}
  12.340 +\end{figure}
  12.341 +
  12.342 +\begin{note}
  12.343 +  If you're new to Mercurial, you should keep in mind a common
  12.344 +  ``error'', which is to use the \hgcmd{pull} command without any
  12.345 +  options.  By default, the \hgcmd{pull} command \emph{does not}
  12.346 +  update the working directory, so you'll bring new changesets into
  12.347 +  your repository, but the working directory will stay synced at the
  12.348 +  same changeset as before the pull.  If you make some changes and
  12.349 +  commit afterwards, you'll thus create a new head, because your
  12.350 +  working directory isn't synced to whatever the current tip is.
  12.351 +
  12.352 +  I put the word ``error'' in quotes because all that you need to do
  12.353 +  to rectify this situation is \hgcmd{merge}, then \hgcmd{commit}.  In
  12.354 +  other words, this almost never has negative consequences; it just
  12.355 +  surprises people.  I'll discuss other ways to avoid this behaviour,
  12.356 +  and why Mercurial behaves in this initially surprising way, later
  12.357 +  on.
  12.358 +\end{note}
  12.359 +
  12.360 +\subsection{Merging heads}
  12.361 +
  12.362 +When you run the \hgcmd{merge} command, Mercurial leaves the first
  12.363 +parent of the working directory unchanged, and sets the second parent
  12.364 +to the changeset you're merging with, as shown in
  12.365 +figure~\ref{fig:concepts:wdir-merge}.
  12.366 +
  12.367 +\begin{figure}[ht]
  12.368 +  \centering
  12.369 +  \grafix{wdir-merge}
  12.370 +  \caption{Merging two heads}
  12.371 +  \label{fig:concepts:wdir-merge}
  12.372 +\end{figure}
  12.373 +
  12.374 +Mercurial also has to modify the working directory, to merge the files
  12.375 +managed in the two changesets.  Simplified a little, the merging
  12.376 +process goes like this, for every file in the manifests of both
  12.377 +changesets:
  12.378 +\begin{itemize}
  12.379 +\item If neither changeset has modified a file, do nothing with that
  12.380 +  file.
  12.381 +\item If one changeset has modified a file, and the other hasn't,
  12.382 +  create the modified copy of the file in the working directory.
  12.383 +\item If one changeset has removed a file, and the other hasn't (or
  12.384 +  has also deleted it), delete the file from the working directory.
  12.385 +\item If one changeset has removed a file, but the other has modified
  12.386 +  the file, ask the user what to do: keep the modified file, or remove
  12.387 +  it?
  12.388 +\item If both changesets have modified a file, invoke an external
  12.389 +  merge program to choose the new contents for the merged file.  This
  12.390 +  may require input from the user.
  12.391 +\item If one changeset has modified a file, and the other has renamed
  12.392 +  or copied the file, make sure that the changes follow the new name
  12.393 +  of the file.
  12.394 +\end{itemize}
  12.395 +There are more details---merging has plenty of corner cases---but
  12.396 +these are the most common choices that are involved in a merge.  As
  12.397 +you can see, most cases are completely automatic, and indeed most
  12.398 +merges finish automatically, without requiring your input to resolve
  12.399 +any conflicts.
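
The per-file decisions listed above can be written down as a small
Python function.  This is a simplified sketch under stated assumptions:
the function name \texttt{merge\_action} and the three state labels are
made up for illustration, each state describes how one changeset
treated the file relative to the common ancestor, and the rename and
copy case is left out.

```python
def merge_action(local, other):
    """Decide what to do with one file, given how each side treated it.

    Each argument is one of "unchanged", "modified" or "removed".
    """
    states = {local, other}
    if states == {"unchanged"}:
        return "keep"                # neither side touched the file
    if states == {"modified", "unchanged"}:
        return "take modified"       # only one side changed it
    if states in ({"removed"}, {"removed", "unchanged"}):
        return "delete"              # removed, and nobody else changed it
    if states == {"removed", "modified"}:
        return "ask user"            # keep the modified file, or remove it?
    if states == {"modified"}:
        return "run merge program"   # both sides changed it
    raise ValueError("unknown state: %r, %r" % (local, other))
```

Only two of the five outcomes can need a human decision, which matches
the observation that most merges finish without any input.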
  12.400 +
  12.401 +When you're thinking about what happens when you commit after a merge,
  12.402 +once again the working directory is ``the changeset I'm about to
  12.403 +commit''.  After the \hgcmd{merge} command completes, the working
  12.404 +directory has two parents; these will become the parents of the new
  12.405 +changeset.
  12.406 +
  12.407 +Mercurial lets you perform multiple merges, but you must commit the
  12.408 +results of each individual merge as you go.  This is necessary because
  12.409 +Mercurial tracks only two parents, both for revisions and for the working
  12.410 +directory.  While it would be technically possible to merge multiple
  12.411 +changesets at once, the prospect of user confusion and making a
  12.412 +terrible mess of a merge immediately becomes overwhelming.
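
To tie together what update, merge, and commit do to the parents of the
working directory, here is a toy model in Python.  The class and its
method names are hypothetical and the changeset IDs are shortened
placeholders; the sketch tracks only the two parent slots and ignores
file contents entirely.

```python
NULL_ID = "000000000000"  # shortened stand-in for the null ID


class WorkingDirectory:
    def __init__(self):
        self.parents = (NULL_ID, NULL_ID)

    def update(self, changeset):
        # update: the target goes in the first slot, the second is cleared
        self.parents = (changeset, NULL_ID)

    def merge(self, other):
        if self.parents[1] != NULL_ID:
            # only two slots exist, so each merge must be committed first
            raise RuntimeError("commit the outstanding merge first")
        # merge: first parent unchanged, second parent is the other head
        self.parents = (self.parents[0], other)

    def commit(self, new_changeset):
        # The new changeset takes the working directory's parents ...
        recorded = (new_changeset, self.parents)
        # ... and then becomes the sole parent of the working directory.
        self.parents = (new_changeset, NULL_ID)
        return recorded


wd = WorkingDirectory()
wd.update("aaa111")
wd.merge("bbb222")
during_merge = wd.parents
merge_changeset = wd.commit("ccc333")
```

After the final commit, the recorded changeset has both heads as its
parents, and the working directory is back to a single parent.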
  12.413 +
  12.414 +\section{Other interesting design features}
  12.415 +
  12.416 +In the sections above, I've tried to highlight some of the most
  12.417 +important aspects of Mercurial's design, to illustrate that it pays
  12.418 +careful attention to reliability and performance.  However, the
  12.419 +attention to detail doesn't stop there.  There are a number of other
  12.420 +aspects of Mercurial's construction that I personally find
  12.421 +interesting.  I'll detail a few of them here, separate from the ``big
  12.422 +ticket'' items above, so that if you're interested, you can gain a
  12.423 +better idea of the amount of thinking that goes into a well-designed
  12.424 +system.
  12.425 +
  12.426 +\subsection{Clever compression}
  12.427 +
  12.428 +When appropriate, Mercurial will store both snapshots and deltas in
  12.429 +compressed form.  It does this by always \emph{trying to} compress a
  12.430 +snapshot or delta, but only storing the compressed version if it's
  12.431 +smaller than the uncompressed version.
  12.432 +
  12.433 +This means that Mercurial does ``the right thing'' when storing a file
  12.434 +whose native form is compressed, such as a \texttt{zip} archive or a
  12.435 +JPEG image.  When these types of files are compressed a second time,
  12.436 +the resulting file is usually bigger than the once-compressed form,
  12.437 +and so Mercurial will store the plain \texttt{zip} or JPEG.
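
The ``try to compress, keep the smaller form'' rule is easy to sketch
with Python's standard \texttt{zlib} module.  The \texttt{store} and
\texttt{load} functions and the flag strings are assumptions made for
this example; they do not reflect Mercurial's on-disk format.

```python
import os
import zlib


def store(data):
    """Return (flag, payload), keeping the compressed form only if smaller."""
    packed = zlib.compress(data)
    if len(packed) < len(data):
        return ("compressed", packed)
    return ("plain", data)


def load(flag, payload):
    """Undo store()."""
    return zlib.decompress(payload) if flag == "compressed" else payload


source_text = b"the same line of source code\n" * 100
# Random bytes passed through zlib stand in for a zip archive or JPEG.
already_compressed = zlib.compress(os.urandom(4096))

text_flag, text_payload = store(source_text)
blob_flag, blob_payload = store(already_compressed)
```

Here the repetitive text is stored compressed, while the data that was
already compressed is stored as it is, because a second round of
compression would only add overhead.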
  12.438 +
  12.439 +Deltas between revisions of a compressed file are usually larger than
  12.440 +snapshots of the file, and Mercurial again does ``the right thing'' in
  12.441 +these cases.  It finds that such a delta exceeds the threshold at
  12.442 +which it should store a complete snapshot of the file, so it stores
  12.443 +the snapshot, again saving space compared to a naive delta-only
  12.444 +approach.
  12.445 +
  12.446 +\subsubsection{Network recompression}
  12.447 +
  12.448 +When storing revisions on disk, Mercurial uses the ``deflate''
  12.449 +compression algorithm (the same one used by the popular \texttt{zip}
  12.450 +archive format), which balances good speed with a respectable
  12.451 +compression ratio.  However, when transmitting revision data over a
  12.452 +network connection, Mercurial uncompresses the compressed revision
  12.453 +data.
  12.454 +
  12.455 +If the connection is over HTTP, Mercurial recompresses the entire
  12.456 +stream of data using a compression algorithm that gives a better
  12.457 +compression ratio (the Burrows-Wheeler algorithm from the widely used
  12.458 +\texttt{bzip2} compression package).  This combination of algorithm
  12.459 +and compression of the entire stream (instead of a revision at a time)
  12.460 +substantially reduces the number of bytes to be transferred, yielding
  12.461 +better network performance over almost all kinds of network.
  12.462 +
  12.463 +(If the connection is over \command{ssh}, Mercurial \emph{doesn't}
  12.464 +recompress the stream, because \command{ssh} can already do this
  12.465 +itself.)
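
The benefit of recompressing the whole stream can be seen in a small
experiment with Python's \texttt{zlib} and \texttt{bz2} modules.  The
revision contents below are invented for the example, and the sketch
only compares sizes; it does not model Mercurial's actual wire
protocol.

```python
import bz2
import zlib

# Invented data: 200 revisions that share most of their content.
shared = b"".join(
    b"int field_%d;  /* common to every revision */\n" % i for i in range(10)
)
revisions = [shared + b"int added_in_revision_%d;\n" % n for n in range(200)]

# On disk: each revision is compressed on its own with deflate.
on_disk = [zlib.compress(r) for r in revisions]
per_revision_total = sum(len(c) for c in on_disk)

# On the wire: uncompress, then recompress the entire stream at once.
stream = b"".join(zlib.decompress(c) for c in on_disk)
on_wire = bz2.compress(stream)
whole_stream_size = len(on_wire)
```

Compressing one revision at a time cannot exploit the content that the
revisions have in common, whereas compressing the stream as a whole
can, so \texttt{whole\_stream\_size} comes out smaller than
\texttt{per\_revision\_total} for data of this shape.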
  12.466 +
  12.467 +\subsection{Read/write ordering and atomicity}
  12.468 +
  12.469 +Appending to files isn't the whole story when it comes to guaranteeing
  12.470 +that a reader won't see a partial write.  If you recall
  12.471 +figure~\ref{fig:concepts:metadata}, revisions in the changelog point to
  12.472 +revisions in the manifest, and revisions in the manifest point to
  12.473 +revisions in filelogs.  This hierarchy is deliberate.
  12.474 +
  12.475 +A writer starts a transaction by writing filelog and manifest data,
  12.476 +and doesn't write any changelog data until those are finished.  A
  12.477 +reader starts by reading changelog data, then manifest data, followed
  12.478 +by filelog data.
  12.479 +
  12.480 +Since the writer has always finished writing filelog and manifest data
  12.481 +before it writes to the changelog, a reader will never read a pointer
  12.482 +to a partially written manifest revision from the changelog, and it will
  12.483 +never read a pointer to a partially written filelog revision from the
  12.484 +manifest.
  12.485 +
  12.486 +\subsection{Concurrent access}
  12.487 +
  12.488 +The read/write ordering and atomicity guarantees mean that Mercurial
  12.489 +never needs to \emph{lock} a repository when it's reading data, even
  12.490 +if the repository is being written to while the read is occurring.
  12.491 +This has a big effect on scalability; you can have an arbitrary number
  12.492 +of Mercurial processes safely reading data from a repository all at
  12.493 +once, no matter whether it's being written to or not.
  12.494 +
  12.495 +The lockless nature of reading means that if you're sharing a
  12.496 +repository on a multi-user system, you don't need to grant other local
  12.497 +users permission to \emph{write} to your repository in order for them
  12.498 +to be able to clone it or pull changes from it; they only need
  12.499 +\emph{read} permission.  (This is \emph{not} a common feature among
  12.500 +revision control systems, so don't take it for granted!  Most require
  12.501 +readers to be able to lock a repository to access it safely, and this
  12.502 +requires write permission on at least one directory, which of course
  12.503 +makes for all kinds of nasty and annoying security and administrative
  12.504 +problems.)
  12.505 +
  12.506 +Mercurial uses locks to ensure that only one process can write to a
  12.507 +repository at a time (the locking mechanism is safe even over
  12.508 +filesystems that are notoriously hostile to locking, such as NFS).  If
  12.509 +a repository is locked, a writer will wait for a while and retry in
  12.510 +case the repository becomes unlocked, but if the repository remains
  12.511 +locked for too long, the process attempting to write will time out.
  12.512 +This means that your daily automated scripts won't get stuck forever
  12.513 +and pile up if a system crashes unnoticed, for example.  (Yes, the
  12.514 +timeout is configurable, from zero to infinity.)
  12.515 +
  12.516 +\subsubsection{Safe dirstate access}
  12.517 +
  12.518 +As with revision data, Mercurial doesn't take a lock to read the
  12.519 +dirstate file; it does acquire a lock to write it.  To avoid the
  12.520 +possibility of reading a partially written copy of the dirstate file,
  12.521 +Mercurial writes to a file with a unique name in the same directory as
  12.522 +the dirstate file, then renames the temporary file atomically to
  12.523 +\filename{dirstate}.  The file named \filename{dirstate} is thus
  12.524 +guaranteed to be complete, not partially written.
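
The write-then-rename technique looks like this in Python.  It is a
minimal sketch, assuming a POSIX-style filesystem where renaming within
one directory is atomic; the helper name \texttt{write\_atomically} is
hypothetical and the file contents are placeholders.

```python
import os
import tempfile


def write_atomically(path, data):
    """Replace path with data so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Unique temporary name in the same directory as the target.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, path)  # atomic rename over the old file
    except BaseException:
        os.unlink(tmp)         # the write or rename failed: clean up
        raise


with tempfile.TemporaryDirectory() as repo:
    target = os.path.join(repo, "dirstate")
    write_atomically(target, b"parents: aaa111")
    write_atomically(target, b"parents: ccc333")
    with open(target, "rb") as f:
        final = f.read()
    leftovers = os.listdir(repo)
```

A reader that opens \filename{dirstate} at any moment sees either the
old contents or the new contents in full, never a mixture of the two.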
  12.525 +
  12.526 +\subsection{Avoiding seeks}
  12.527 +
  12.528 +Critical to Mercurial's performance is the avoidance of seeks of the
  12.529 +disk head, since any seek is far more expensive than even a
  12.530 +comparatively large read operation.
  12.531 +
  12.532 +This is why, for example, the dirstate is stored in a single file.  If
  12.533 +there were a dirstate file per directory that Mercurial tracked, the
  12.534 +disk would seek once per directory.  Instead, Mercurial reads the
  12.535 +entire single dirstate file in one step.
  12.536 +
  12.537 +Mercurial also uses a ``copy on write'' scheme when cloning a
  12.538 +repository on local storage.  Instead of copying every revlog file
  12.539 +from the old repository into the new repository, it makes a ``hard
  12.540 +link'', which is a shorthand way to say ``these two names point to the
  12.541 +same file''.  When Mercurial is about to write to one of a revlog's
  12.542 +files, it checks to see if the number of names pointing at the file is
  12.543 +greater than one.  If it is, more than one repository is using the
  12.544 +file, so Mercurial makes a new copy of the file that is private to
  12.545 +this repository.
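
The link-count check can be sketched in Python as follows.  The
function names and file names are invented for illustration, and the
sketch assumes a filesystem that supports hard links.

```python
import os
import shutil
import tempfile


def clone_file(src, dst):
    os.link(src, dst)  # a second name for the same file; no data is copied


def append(path, data):
    if os.stat(path).st_nlink > 1:
        # Another repository shares this file: make a private copy first.
        private = path + ".copy"
        shutil.copyfile(path, private)
        os.replace(private, path)
    with open(path, "ab") as f:
        f.write(data)


with tempfile.TemporaryDirectory() as top:
    original = os.path.join(top, "original.i")
    clone = os.path.join(top, "clone.i")
    with open(original, "wb") as f:
        f.write(b"rev0")
    clone_file(original, clone)
    links_while_shared = os.stat(original).st_nlink
    append(clone, b"rev1")  # triggers the private copy
    links_after_write = os.stat(original).st_nlink
    with open(original, "rb") as f:
        original_data = f.read()
    with open(clone, "rb") as f:
        clone_data = f.read()
```

The write to the clone leaves the original file untouched, and the
operating system's link count is the only book-keeping required.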
  12.546 +
  12.547 +A few revision control developers have pointed out that this idea of
  12.548 +making a complete private copy of a file is not very efficient in its
  12.549 +use of storage.  While this is true, storage is cheap, and this method
  12.550 +gives the highest performance while deferring most book-keeping to the
  12.551 +operating system.  An alternative scheme would most likely reduce
  12.552 +performance and increase the complexity of the software, each of which
  12.553 +is much more important to the ``feel'' of day-to-day use.
  12.554 +
  12.555 +\subsection{Other contents of the dirstate}
  12.556 +
  12.557 +Because Mercurial doesn't force you to tell it when you're modifying a
  12.558 +file, it uses the dirstate to store some extra information so it can
  12.559 +determine efficiently whether you have modified a file.  For each file
  12.560 +in the working directory, it stores the time that it last modified the
  12.561 +file itself, and the size of the file at that time.  
  12.562 +
  12.563 +When you explicitly \hgcmd{add}, \hgcmd{remove}, \hgcmd{rename} or
  12.564 +\hgcmd{copy} files, Mercurial updates the dirstate so that it knows
  12.565 +what to do with those files when you commit.
  12.566 +
  12.567 +When Mercurial is checking the states of files in the working
  12.568 +directory, it first checks a file's modification time.  If that has
  12.569 +not changed, the file must not have been modified.  If the file's size
  12.570 +has changed, the file must have been modified.  If the modification
  12.571 +time has changed, but the size has not, only then does Mercurial need
  12.572 +to read the actual contents of the file to see if they've changed.
  12.573 +Storing these few extra pieces of information dramatically reduces the
  12.574 +amount of data that Mercurial needs to read, which yields large
  12.575 +performance improvements compared to other revision control systems.
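
The order of these checks can be expressed as a short Python sketch.
The names \texttt{record} and \texttt{is\_modified} are hypothetical,
and the \texttt{data} field merely stands in for the committed revision
that Mercurial would compare against; the real dirstate stores only the
time and size.

```python
import os
import tempfile


def record(path):
    """Remember what a file looked like when it was last known clean."""
    st = os.stat(path)
    with open(path, "rb") as f:
        data = f.read()
    return {"mtime": st.st_mtime_ns, "size": st.st_size, "data": data}


def is_modified(path, entry):
    st = os.stat(path)
    if st.st_mtime_ns == entry["mtime"]:
        return False              # same timestamp: treat as untouched
    if st.st_size != entry["size"]:
        return True               # different size: certainly changed
    with open(path, "rb") as f:   # same size, new timestamp: read it
        return f.read() != entry["data"]


def rewrite(path, data, mtime_ns):
    with open(path, "wb") as f:
        f.write(data)
    os.utime(path, ns=(mtime_ns, mtime_ns))  # force a known timestamp


with tempfile.TemporaryDirectory() as wd:
    path = os.path.join(wd, "hello.c")
    rewrite(path, b"int x;\n", 1_000_000_000)
    entry = record(path)
    untouched = is_modified(path, entry)
    rewrite(path, b"int x;\n", 2_000_000_000)    # new time, same bytes
    touched_only = is_modified(path, entry)
    rewrite(path, b"int y;\n", 3_000_000_000)    # same size, new bytes
    same_size_edit = is_modified(path, entry)
    rewrite(path, b"int xyz;\n", 4_000_000_000)  # different size
    resized = is_modified(path, entry)
```

Only the second and third cases read the file's contents; the other
two are settled by a single \texttt{stat} call.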
  12.576 +
  12.577 +%%% Local Variables: 
  12.578 +%%% mode: latex
  12.579 +%%% TeX-master: "00book"
  12.580 +%%% End:

    13.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    13.2 +++ b/en/ch05-daily.tex	Thu Jan 29 22:56:27 2009 -0800
    13.3 @@ -0,0 +1,381 @@
    13.4 +\chapter{Mercurial in daily use}
    13.5 +\label{chap:daily}
    13.6 +
    13.7 +\section{Telling Mercurial which files to track}
    13.8 +
    13.9 +Mercurial does not work with files in your repository unless you tell
   13.10 +it to manage them.  The \hgcmd{status} command will tell you which
   13.11 +files Mercurial doesn't know about; it uses a ``\texttt{?}'' to
   13.12 +display such files.
   13.13 +
   13.14 +To tell Mercurial to track a file, use the \hgcmd{add} command.  Once
   13.15 +you have added a file, the entry in the output of \hgcmd{status} for
   13.16 +that file changes from ``\texttt{?}'' to ``\texttt{A}''.
   13.17 +\interaction{daily.files.add}
   13.18 +
   13.19 +After you run a \hgcmd{commit}, the files that you added before the
   13.20 +commit will no longer be listed in the output of \hgcmd{status}.  The
   13.21 +reason for this is that \hgcmd{status} only tells you about
   13.22 +``interesting'' files---those that you have modified or told Mercurial
   13.23 +to do something with---by default.  If you have a repository that
   13.24 +contains thousands of files, you will rarely want to know about files
   13.25 +that Mercurial is tracking, but that have not changed.  (You can still
   13.26 +get this information; we'll return to this later.)
   13.27 +
   13.28 +Once you add a file, Mercurial doesn't do anything with it
   13.29 +immediately.  Instead, it will take a snapshot of the file's state the
   13.30 +next time you perform a commit.  It will then continue to track the
   13.31 +changes you make to the file every time you commit, until you remove
   13.32 +the file.
   13.33 +
   13.34 +\subsection{Explicit versus implicit file naming}
   13.35 +
    13.36 +A useful behaviour of Mercurial is that every command treats the name
    13.37 +of a directory that you pass to it as
   13.38 +``I want to operate on every file in this directory and its
   13.39 +subdirectories''.
   13.40 +\interaction{daily.files.add-dir}
   13.41 +Notice in this example that Mercurial printed the names of the files
   13.42 +it added, whereas it didn't do so when we added the file named
   13.43 +\filename{a} in the earlier example.
   13.44 +
    13.45 +What's going on is that in the earlier case, we explicitly named the
    13.46 +file to add on the command line.  Mercurial assumes in such cases
    13.47 +that we know what we are doing, and it
    13.48 +doesn't print any output.
   13.49 +
   13.50 +However, when we \emph{imply} the names of files by giving the name of
   13.51 +a directory, Mercurial takes the extra step of printing the name of
    13.52 +each file that it does something with.  This makes it clearer what
   13.53 +is happening, and reduces the likelihood of a silent and nasty
   13.54 +surprise.  This behaviour is common to most Mercurial commands.
   13.55 +
   13.56 +\subsection{Aside: Mercurial tracks files, not directories}
   13.57 +
   13.58 +Mercurial does not track directory information.  Instead, it tracks
   13.59 +the path to a file.  Before creating a file, it first creates any
   13.60 +missing directory components of the path.  After it deletes a file, it
   13.61 +then deletes any empty directories that were in the deleted file's
   13.62 +path.  This sounds like a trivial distinction, but it has one minor
   13.63 +practical consequence: it is not possible to represent a completely
   13.64 +empty directory in Mercurial.
   13.65 +
   13.66 +Empty directories are rarely useful, and there are unintrusive
   13.67 +workarounds that you can use to achieve an appropriate effect.  The
   13.68 +developers of Mercurial thus felt that the complexity that would be
   13.69 +required to manage empty directories was not worth the limited benefit
   13.70 +this feature would bring.
   13.71 +
   13.72 +If you need an empty directory in your repository, there are a few
   13.73 +ways to achieve this. One is to create a directory, then \hgcmd{add} a
   13.74 +``hidden'' file to that directory.  On Unix-like systems, any file
   13.75 +name that begins with a period (``\texttt{.}'') is treated as hidden
   13.76 +by most commands and GUI tools.  This approach is illustrated in
   13.77 +figure~\ref{ex:daily:hidden}.
   13.78 +
   13.79 +\begin{figure}[ht]
   13.80 +  \interaction{daily.files.hidden}
   13.81 +  \caption{Simulating an empty directory using a hidden file}
   13.82 +  \label{ex:daily:hidden}
   13.83 +\end{figure}
   13.84 +
   13.85 +Another way to tackle a need for an empty directory is to simply
   13.86 +create one in your automated build scripts before they will need it.
   13.87 +
   13.88 +\section{How to stop tracking a file}
   13.89 +
   13.90 +Once you decide that a file no longer belongs in your repository, use
   13.91 +the \hgcmd{remove} command; this deletes the file, and tells Mercurial
   13.92 +to stop tracking it.  A removed file is represented in the output of
    13.93 +\hgcmd{status} with an ``\texttt{R}''.
   13.94 +\interaction{daily.files.remove}
   13.95 +
   13.96 +After you \hgcmd{remove} a file, Mercurial will no longer track
   13.97 +changes to that file, even if you recreate a file with the same name
   13.98 +in your working directory.  If you do recreate a file with the same
   13.99 +name and want Mercurial to track the new file, simply \hgcmd{add} it.
  13.100 +Mercurial will know that the newly added file is not related to the
  13.101 +old file of the same name.
  13.102 +
  13.103 +\subsection{Removing a file does not affect its history}
  13.104 +
  13.105 +It is important to understand that removing a file has only two
  13.106 +effects.
  13.107 +\begin{itemize}
  13.108 +\item It removes the current version of the file from the working
  13.109 +  directory.
  13.110 +\item It stops Mercurial from tracking changes to the file, from the
  13.111 +  time of the next commit.
  13.112 +\end{itemize}
  13.113 +Removing a file \emph{does not} in any way alter the \emph{history} of
  13.114 +the file.
  13.115 +
  13.116 +If you update the working directory to a changeset in which a file
  13.117 +that you have removed was still tracked, it will reappear in the
  13.118 +working directory, with the contents it had when you committed that
  13.119 +changeset.  If you then update the working directory to a later
  13.120 +changeset, in which the file had been removed, Mercurial will once
  13.121 +again remove the file from the working directory.
  13.122 +
  13.123 +\subsection{Missing files}
  13.124 +
   13.125 +Mercurial considers a file that you have deleted without using
   13.126 +\hgcmd{remove} to be \emph{missing}.  A missing file is
  13.127 +represented with ``\texttt{!}'' in the output of \hgcmd{status}.
  13.128 +Mercurial commands will not generally do anything with missing files.
  13.129 +\interaction{daily.files.missing}
  13.130 +
  13.131 +If your repository contains a file that \hgcmd{status} reports as
  13.132 +missing, and you want the file to stay gone, you can run
  13.133 +\hgcmdargs{remove}{\hgopt{remove}{--after}} at any time later on, to
  13.134 +tell Mercurial that you really did mean to remove the file.
  13.135 +\interaction{daily.files.remove-after}
  13.136 +
  13.137 +On the other hand, if you deleted the missing file by accident, use
  13.138 +\hgcmdargs{revert}{\emph{filename}} to recover the file.  It will
  13.139 +reappear, in unmodified form.
  13.140 +\interaction{daily.files.recover-missing}
  13.141 +
  13.142 +\subsection{Aside: why tell Mercurial explicitly to 
  13.143 +  remove a file?}
  13.144 +
  13.145 +You might wonder why Mercurial requires you to explicitly tell it that
  13.146 +you are deleting a file.  Early during the development of Mercurial,
  13.147 +it let you delete a file however you pleased; Mercurial would notice
  13.148 +the absence of the file automatically when you next ran a
  13.149 +\hgcmd{commit}, and stop tracking the file.  In practice, this made it
  13.150 +too easy to accidentally remove a file without noticing.
  13.151 +
  13.152 +\subsection{Useful shorthand---adding and removing files
  13.153 +  in one step}
  13.154 +
  13.155 +Mercurial offers a combination command, \hgcmd{addremove}, that adds
  13.156 +untracked files and marks missing files as removed.  
  13.157 +\interaction{daily.files.addremove}
  13.158 +The \hgcmd{commit} command also provides a \hgopt{commit}{-A} option
  13.159 +that performs this same add-and-remove, immediately followed by a
  13.160 +commit.
  13.161 +\interaction{daily.files.commit-addremove}
  13.162 +
  13.163 +\section{Copying files}
  13.164 +
  13.165 +Mercurial provides a \hgcmd{copy} command that lets you make a new
  13.166 +copy of a file.  When you copy a file using this command, Mercurial
  13.167 +makes a record of the fact that the new file is a copy of the original
  13.168 +file.  It treats these copied files specially when you merge your work
  13.169 +with someone else's.
  13.170 +
  13.171 +\subsection{The results of copying during a merge}
  13.172 +
  13.173 +What happens during a merge is that changes ``follow'' a copy.  To
  13.174 +best illustrate what this means, let's create an example.  We'll start
  13.175 +with the usual tiny repository that contains a single file.
  13.176 +\interaction{daily.copy.init}
  13.177 +We need to do some work in parallel, so that we'll have something to
  13.178 +merge.  So let's clone our repository.
  13.179 +\interaction{daily.copy.clone}
  13.180 +Back in our initial repository, let's use the \hgcmd{copy} command to
  13.181 +make a copy of the first file we created.
  13.182 +\interaction{daily.copy.copy}
  13.183 +
  13.184 +If we look at the output of the \hgcmd{status} command afterwards, the
  13.185 +copied file looks just like a normal added file.
  13.186 +\interaction{daily.copy.status}
  13.187 +But if we pass the \hgopt{status}{-C} option to \hgcmd{status}, it
  13.188 +prints another line of output: this is the file that our newly-added
  13.189 +file was copied \emph{from}.
  13.190 +\interaction{daily.copy.status-copy}
  13.191 +
  13.192 +Now, back in the repository we cloned, let's make a change in
  13.193 +parallel.  We'll add a line of content to the original file that we
  13.194 +created.
  13.195 +\interaction{daily.copy.other}
  13.196 +Now we have a modified \filename{file} in this repository.  When we
  13.197 +pull the changes from the first repository, and merge the two heads,
  13.198 +Mercurial will propagate the changes that we made locally to
  13.199 +\filename{file} into its copy, \filename{new-file}.
  13.200 +\interaction{daily.copy.merge}
  13.201 +
  13.202 +\subsection{Why should changes follow copies?}
  13.203 +\label{sec:daily:why-copy}
  13.204 +
  13.205 +This behaviour, of changes to a file propagating out to copies of the
  13.206 +file, might seem esoteric, but in most cases it's highly desirable.
  13.207 +
  13.208 +First of all, remember that this propagation \emph{only} happens when
  13.209 +you merge.  So if you \hgcmd{copy} a file, and subsequently modify the
  13.210 +original file during the normal course of your work, nothing will
  13.211 +happen.
  13.212 +
  13.213 +The second thing to know is that modifications will only propagate
  13.214 +across a copy as long as the repository that you're pulling changes
  13.215 +from \emph{doesn't know} about the copy.
  13.216 +
  13.217 +The reason that Mercurial does this is as follows.  Let's say I make
  13.218 +an important bug fix in a source file, and commit my changes.
  13.219 +Meanwhile, you've decided to \hgcmd{copy} the file in your repository,
  13.220 +without knowing about the bug or having seen the fix, and you have
  13.221 +started hacking on your copy of the file.
  13.222 +
  13.223 +If you pulled and merged my changes, and Mercurial \emph{didn't}
  13.224 +propagate changes across copies, your copy would not receive my
  13.225 +fix, and unless you remembered to propagate the bug fix by hand,
  13.226 +the bug would \emph{remain} in your copy of the file.
  13.227 +
  13.228 +By automatically propagating the change that fixed the bug from the
  13.229 +original file to the copy, Mercurial prevents this class of problem.
  13.230 +To my knowledge, Mercurial is the \emph{only} revision control system
  13.231 +that propagates changes across copies like this.
  13.232 +
  13.233 +Once your change history has a record that the copy and subsequent
  13.234 +merge occurred, there's usually no further need to propagate changes
  13.235 +from the original file to the copied file, and that's why Mercurial
  13.236 +only propagates changes across copies until this point, and no
  13.237 +further.
  13.238 +
  13.239 +\subsection{How to make changes \emph{not} follow a copy}
  13.240 +
  13.241 +If, for some reason, you decide that this business of automatically
  13.242 +propagating changes across copies is not for you, simply use your
  13.243 +system's normal file copy command (on Unix-like systems, that's
  13.244 +\command{cp}) to make a copy of a file, then \hgcmd{add} the new copy
  13.245 +by hand.  Before you do so, though, please do reread
  13.246 +section~\ref{sec:daily:why-copy}, and make an informed decision that
  13.247 +this behaviour is not appropriate to your specific case.
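         +
         +As a minimal sketch of the steps just described (the names
         +\filename{file} and \filename{plain-copy} are only placeholders), an
         +untracked copy might be made like this on a Unix-like system:
         +\begin{verbatim}
         +# Copy with the system command, so Mercurial records no copy information.
         +cp file plain-copy
         +# Add the new file by hand; Mercurial treats it as an ordinary new file.
         +hg add plain-copy
         +hg commit -m 'Add plain-copy as an independent file'
         +\end{verbatim}
         +Running \hgcmd{status} with the \hgopt{status}{-C} option before
         +committing should then show no copied-from line for the new file.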
  13.248 +
  13.249 +\subsection{Behaviour of the \hgcmd{copy} command}
  13.250 +
  13.251 +When you use the \hgcmd{copy} command, Mercurial makes a copy of each
  13.252 +source file as it currently stands in the working directory.  This
  13.253 +means that if you make some modifications to a file, then \hgcmd{copy}
  13.254 +it without first having committed those changes, the new copy will
  13.255 +also contain the modifications you have made up until that point.  (I
  13.256 +find this behaviour a little counterintuitive, which is why I mention
  13.257 +it here.)
  13.258 +
  13.259 +The \hgcmd{copy} command acts similarly to the Unix \command{cp}
  13.260 +command (you can use the \hgcmd{cp} alias if you prefer).  The last
  13.261 +argument is the \emph{destination}, and all prior arguments are
  13.262 +\emph{sources}.  If you pass it a single file as the source, and the
  13.263 +destination does not exist, it creates a new file with that name.
  13.264 +\interaction{daily.copy.simple}
  13.265 +If the destination is a directory, Mercurial copies its sources into
  13.266 +that directory.
  13.267 +\interaction{daily.copy.dir-dest}
  13.268 +Copying a directory is recursive, and preserves the directory
  13.269 +structure of the source.
  13.270 +\interaction{daily.copy.dir-src}
  13.271 +If the source and destination are both directories, the source tree is
  13.272 +recreated in the destination directory.
  13.273 +\interaction{daily.copy.dir-src-dest}
  13.274 +
  13.275 +As with the \hgcmd{rename} command, if you copy a file manually and
  13.276 +then want Mercurial to know that you've copied the file, simply use
  13.277 +the \hgopt{copy}{--after} option to \hgcmd{copy}.
  13.278 +\interaction{daily.copy.after}
  13.279 +
  13.280 +\section{Renaming files}
  13.281 +
  13.282 +It's rather more common to need to rename a file than to make a copy
  13.283 +of it.  The reason I discussed the \hgcmd{copy} command before talking
  13.284 +about renaming files is that Mercurial treats a rename in essentially
  13.285 +the same way as a copy.  Therefore, knowing what Mercurial does when
  13.286 +you copy a file tells you what to expect when you rename a file.
  13.287 +
  13.288 +When you use the \hgcmd{rename} command, Mercurial makes a copy of
  13.289 +each source file, then deletes the original and marks it as removed.
  13.290 +\interaction{daily.rename.rename}
  13.291 +The \hgcmd{status} command shows the newly copied file as added, and
  13.292 +the copied-from file as removed.
  13.293 +\interaction{daily.rename.status}
  13.294 +As with the results of a \hgcmd{copy}, we must use the
  13.295 +\hgopt{status}{-C} option to \hgcmd{status} to see that the added file
  13.296 +is really being tracked by Mercurial as a copy of the original, now
  13.297 +removed, file.
  13.298 +\interaction{daily.rename.status-copy}
  13.299 +
  13.300 +As with \hgcmd{remove} and \hgcmd{copy}, you can tell Mercurial about
  13.301 +a rename after the fact using the \hgopt{rename}{--after} option.  In
  13.302 +most other respects, the behaviour of the \hgcmd{rename} command, and
  13.303 +the options it accepts, are similar to the \hgcmd{copy} command.
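         +
         +For instance, recording a rename after the fact might look roughly
         +like this (a sketch only; \filename{old-name} and \filename{new-name}
         +are placeholder names, and \command{mv} is the Unix rename command):
         +\begin{verbatim}
         +# Rename the file behind Mercurial's back.
         +mv old-name new-name
         +# Tell Mercurial that the rename has already happened.
         +hg rename --after old-name new-name
         +# Check that new-name is recorded as a copy of old-name.
         +hg status -C
         +\end{verbatim}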
  13.304 +
  13.305 +\subsection{Renaming files and merging changes}
  13.306 +
  13.307 +Since Mercurial's rename is implemented as copy-and-remove, the same
  13.308 +propagation of changes happens when you merge after a rename as after
  13.309 +a copy.
  13.310 +
  13.311 +If I modify a file, and you rename it to a new name, and then we merge
  13.312 +our respective changes, my modifications to the file under its
  13.313 +original name will be propagated into the file under its new name.
  13.314 +(This is something you might expect to ``simply work,'' but not all
  13.315 +revision control systems actually do this.)
  13.316 +
  13.317 +Whereas having changes follow a copy is a feature where you can
  13.318 +perhaps nod and say ``yes, that might be useful,'' it should be clear
  13.319 +that having them follow a rename is definitely important.  Without
  13.320 +this facility, it would simply be too easy for changes to become
  13.321 +orphaned when files are renamed.
  13.322 +
  13.323 +\subsection{Divergent renames and merging}
  13.324 +
  13.325 +The case of diverging names occurs when two developers start with a
  13.326 +file---let's call it \filename{foo}---in their respective
  13.327 +repositories.
  13.328 +
  13.329 +\interaction{rename.divergent.clone}
  13.330 +Anne renames the file to \filename{bar}.
  13.331 +\interaction{rename.divergent.rename.anne}
  13.332 +Meanwhile, Bob renames it to \filename{quux}.
  13.333 +\interaction{rename.divergent.rename.bob}
  13.334 +
  13.335 +I like to think of this as a conflict because each developer has
  13.336 +expressed different intentions about what the file ought to be named.
  13.337 +
  13.338 +What do you think should happen when they merge their work?
  13.339 +Mercurial's actual behaviour is that it always preserves \emph{both}
  13.340 +names when it merges changesets that contain divergent renames.
  13.341 +\interaction{rename.divergent.merge}
  13.342 +
  13.343 +Notice that Mercurial does warn about the divergent renames, but it
  13.344 +leaves it up to you to do something about the divergence after the merge.
  13.345 +
  13.346 +\subsection{Convergent renames and merging}
  13.347 +
  13.348 +Another kind of rename conflict occurs when two people choose to
  13.349 +rename different \emph{source} files to the same \emph{destination}.
  13.350 +In this case, Mercurial runs its normal merge machinery, and lets you
  13.351 +guide it to a suitable resolution.
  13.352 +
  13.353 +\subsection{Other name-related corner cases}
  13.354 +
  13.355 +Mercurial has a longstanding bug in which it fails to handle a merge
  13.356 +where one side has a file with a given name, while another has a
  13.357 +directory with the same name.  This is documented as~\bug{29}.
  13.358 +\interaction{issue29.go}
  13.359 +
  13.360 +\section{Recovering from mistakes}
  13.361 +
  13.362 +Mercurial has some useful commands that will help you to recover from
  13.363 +some common mistakes.
  13.364 +
  13.365 +The \hgcmd{revert} command lets you undo changes that you have made to
  13.366 +your working directory.  For example, if you \hgcmd{add} a file by
  13.367 +accident, just run \hgcmd{revert} with the name of the file you added,
  13.368 +and while the file won't be touched in any way, it won't be tracked
  13.369 +for adding by Mercurial any longer, either.  You can also use
  13.370 +\hgcmd{revert} to get rid of erroneous changes to a file.
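         +
         +A short sketch of both uses follows; the file names
         +\filename{oops.txt} and \filename{file.txt} are made up for
         +illustration.
         +\begin{verbatim}
         +# Undo an accidental add: the file stays on disk but is no longer
         +# scheduled to be added.
         +hg add oops.txt
         +hg revert oops.txt
         +# Throw away uncommitted edits to a file that Mercurial already tracks.
         +hg revert file.txt
         +\end{verbatim}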
  13.371 +
  13.372 +Bear in mind that the \hgcmd{revert} command only helps with
  13.373 +changes that you have not yet committed.  Once you've committed a
  13.374 +change, if you decide it was a mistake, you can still do something
  13.375 +about it, though your options may be more limited.
  13.376 +
  13.377 +For more information about the \hgcmd{revert} command, and details
  13.378 +about how to deal with changes you have already committed, see
  13.379 +chapter~\ref{chap:undo}.
  13.380 +
  13.381 +%%% Local Variables: 
  13.382 +%%% mode: latex
  13.383 +%%% TeX-master: "00book"
  13.384 +%%% End: 

    14.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    14.2 +++ b/en/ch06-collab.tex	Thu Jan 29 22:56:27 2009 -0800
    14.3 @@ -0,0 +1,1118 @@
    14.4 +\chapter{Collaborating with other people}
    14.5 +\label{cha:collab}
    14.6 +
    14.7 +As a completely decentralised tool, Mercurial doesn't impose any
    14.8 +policy on how people ought to work with each other.  However, if
    14.9 +you're new to distributed revision control, it helps to have some
   14.10 +tools and examples in mind when you're thinking about possible
   14.11 +workflow models.
   14.12 +
   14.13 +\section{Mercurial's web interface}
   14.14 +
   14.15 +Mercurial has a powerful web interface that provides several 
   14.16 +useful capabilities.
   14.17 +
   14.18 +For interactive use, the web interface lets you browse a single
   14.19 +repository or a collection of repositories.  You can view the history
   14.20 +of a repository, examine each change (comments and diffs), and view
   14.21 +the contents of each directory and file.
   14.22 +
   14.23 +Also for human consumption, the web interface provides an RSS feed of
   14.24 +the changes in a repository.  This lets you ``subscribe'' to a
   14.25 +repository using your favourite feed reader, and be automatically
   14.26 +notified of activity in that repository as soon as it happens.  I find
   14.27 +this capability much more convenient than the model of subscribing to
   14.28 +a mailing list to which notifications are sent, as it requires no
   14.29 +additional configuration on the part of whoever is serving the
   14.30 +repository.
   14.31 +
   14.32 +The web interface also lets remote users clone a repository, pull
   14.33 +changes from it, and (when the server is configured to permit it) push
   14.34 +changes back to it.  Mercurial's HTTP tunneling protocol aggressively
   14.35 +compresses data, so that it works efficiently even over low-bandwidth
   14.36 +network connections.
   14.37 +
   14.38 +The easiest way to get started with the web interface is to use your
   14.39 +web browser to visit an existing repository, such as the master
   14.40 +Mercurial repository at
   14.41 +\url{http://www.selenic.com/repo/hg?style=gitweb}.
   14.42 +
   14.43 +If you're interested in providing a web interface to your own
   14.44 +repositories, Mercurial provides two ways to do this.  The first is
   14.45 +using the \hgcmd{serve} command, which is best suited to short-term
   14.46 +``lightweight'' serving.  See section~\ref{sec:collab:serve} below for
   14.47 +details of how to use this command.  If you have a long-lived
   14.48 +repository that you'd like to make permanently available, Mercurial
   14.49 +has built-in support for the CGI (Common Gateway Interface) standard,
   14.50 +which all common web servers support.  See
   14.51 +section~\ref{sec:collab:cgi} for details of CGI configuration.
   14.52 +
   14.53 +\section{Collaboration models}
   14.54 +
   14.55 +With a suitably flexible tool, making decisions about workflow is much
   14.56 +more of a social engineering challenge than a technical one.
   14.57 +Mercurial imposes few limitations on how you can structure the flow of
   14.58 +work in a project, so it's up to you and your group to set up and live
   14.59 +with a model that matches your own particular needs.
   14.60 +
   14.61 +\subsection{Factors to keep in mind}
   14.62 +
   14.63 +The most important aspect of any model that you must keep in mind is
   14.64 +how well it matches the needs and capabilities of the people who will
   14.65 +be using it.  This might seem self-evident; even so, you still can't
   14.66 +afford to forget it for a moment.
   14.67 +
   14.68 +I once put together a workflow model that seemed to make perfect sense
   14.69 +to me, but that caused a considerable amount of consternation and
   14.70 +strife within my development team.  In spite of my attempts to explain
   14.71 +why we needed a complex set of branches, and how changes ought to flow
   14.72 +between them, a few team members revolted.  Even though they were
   14.73 +smart people, they didn't want to pay attention to the constraints we
   14.74 +were operating under, or face the consequences of those constraints in
   14.75 +the details of the model that I was advocating.
   14.76 +
   14.77 +Don't sweep foreseeable social or technical problems under the rug.
   14.78 +Whatever scheme you put into effect, you should plan for mistakes and
   14.79 +problem scenarios.  Consider adding automated machinery to prevent, or
   14.80 +quickly recover from, trouble that you can anticipate.  As an example,
   14.81 +if you intend to have a branch with not-for-release changes in it,
   14.82 +you'd do well to think early about the possibility that someone might
   14.83 +accidentally merge those changes into a release branch.  You could
   14.84 +avoid this particular problem by writing a hook that prevents changes
   14.85 +from being merged from an inappropriate branch.
   14.86 +
   14.87 +\subsection{Informal anarchy}
   14.88 +
   14.89 +I wouldn't suggest an ``anything goes'' approach as something
   14.90 +sustainable, but it's a model that's easy to grasp, and it works
   14.91 +perfectly well in a few unusual situations.
   14.92 +
   14.93 +As one example, many projects have a loose-knit group of collaborators
   14.94 +who rarely physically meet each other.  Some groups like to overcome
   14.95 +the isolation of working at a distance by organising occasional
   14.96 +``sprints''.  In a sprint, a number of people get together in a single
   14.97 +location (a company's conference room, a hotel meeting room, that kind
   14.98 +of place) and spend several days more or less locked in there, hacking
   14.99 +intensely on a handful of projects.
  14.100 +
  14.101 +A sprint is the perfect place to use the \hgcmd{serve} command, since
  14.102 +\hgcmd{serve} does not require any fancy server infrastructure.  You
  14.103 +can get started with \hgcmd{serve} in moments, by reading
  14.104 +section~\ref{sec:collab:serve} below.  Then simply tell the person
  14.105 +next to you that you're running a server, send the URL to them in an
  14.106 +instant message, and you immediately have a quick-turnaround way to
  14.107 +work together.  They can type your URL into their web browser and
  14.108 +quickly review your changes; or they can pull a bugfix from you and
  14.109 +verify it; or they can clone a branch containing a new feature and try
  14.110 +it out.
  14.111 +
  14.112 +The charm, and the problem, with doing things in an ad hoc fashion
  14.113 +like this is that only people who know about your changes, and where
  14.114 +they are, can see them.  Such an informal approach simply doesn't
  14.115 +scale beyond a handful of people, because each individual needs to know
  14.116 +about $n$ different repositories to pull from.
  14.117 +
  14.118 +\subsection{A single central repository}
  14.119 +
  14.120 +For smaller projects migrating from a centralised revision control
  14.121 +tool, perhaps the easiest way to get started is to have changes flow
  14.122 +through a single shared central repository.  This is also the
  14.123 +most common ``building block'' for more ambitious workflow schemes.
  14.124 +
  14.125 +Contributors start by cloning a copy of this repository.  They can
  14.126 +pull changes from it whenever they need to, and some (perhaps all)
  14.127 +developers have permission to push a change back when they're ready
  14.128 +for other people to see it.
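         +
         +In terms of commands, a contributor's routine under this model might
         +look something like the following sketch.  The URL is purely
         +hypothetical, and the commit message is a placeholder.
         +\begin{verbatim}
         +# One-time setup: clone the shared central repository.
         +hg clone ssh://hg.example.com/project
         +cd project
         +# ...edit files, then record the work locally.
         +hg commit -m 'Describe the change here'
         +# Fetch whatever other people have pushed since the clone.
         +hg pull
         +# If the pull created a new head, merge and commit before pushing.
         +hg merge
         +hg commit -m 'Merge with central repository'
         +# Publish the result for everyone else (requires push permission).
         +hg push
         +\end{verbatim}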
  14.129 +
  14.130 +Under this model, it can still often make sense for people to pull
  14.131 +changes directly from each other, without going through the central
  14.132 +repository.  Consider a case in which I have a tentative bug fix, but
  14.133 +I am worried that if I were to publish it to the central repository,
  14.134 +it might subsequently break everyone else's trees as they pull it.  To
  14.135 +reduce the potential for damage, I can ask you to clone my repository
  14.136 +into a temporary repository of your own and test it.  This lets us put
  14.137 +off publishing the potentially unsafe change until it has had a little
  14.138 +testing.
  14.139 +
  14.140 +In this kind of scenario, people usually use the \command{ssh}
  14.141 +protocol to securely push changes to the central repository, as
  14.142 +documented in section~\ref{sec:collab:ssh}.  It's also usual to
  14.143 +publish a read-only copy of the repository over HTTP using CGI, as in
  14.144 +section~\ref{sec:collab:cgi}.  Publishing over HTTP satisfies the
  14.145 +needs of people who don't have push access, and those who want to use
  14.146 +web browsers to browse the repository's history.
  14.147 +
  14.148 +\subsection{Working with multiple branches}
  14.149 +
  14.150 +Projects of any significant size naturally tend to make progress on
  14.151 +several fronts simultaneously.  In the case of software, it's common
  14.152 +for a project to go through periodic official releases.  A release
  14.153 +might then go into ``maintenance mode'' for a while after its first
  14.154 +publication; maintenance releases tend to contain only bug fixes, not
  14.155 +new features.  In parallel with these maintenance releases, one or
  14.156 +more future releases may be under development.  People normally use
  14.157 +the word ``branch'' to refer to one of these many slightly different
  14.158 +directions in which development is proceeding.
  14.159 +
  14.160 +Mercurial is particularly well suited to managing a number of
  14.161 +simultaneous, but not identical, branches.  Each ``development
  14.162 +direction'' can live in its own central repository, and you can merge
  14.163 +changes from one to another as the need arises.  Because repositories
  14.164 +are independent of each other, unstable changes in a development
  14.165 +branch will never affect a stable branch unless someone explicitly
  14.166 +merges those changes in.
  14.167 +
  14.168 +Here's an example of how this can work in practice.  Let's say you
  14.169 +have one ``main branch'' on a central server.
  14.170 +\interaction{branching.init}
  14.171 +People clone it, make changes locally, test them, and push them back.
  14.172 +
  14.173 +Once the main branch reaches a release milestone, you can use the
  14.174 +\hgcmd{tag} command to give a permanent name to the milestone
  14.175 +revision.
  14.176 +\interaction{branching.tag}
  14.177 +Let's say some ongoing development occurs on the main branch.
  14.178 +\interaction{branching.main}
  14.179 +Using the tag that was recorded at the milestone, people who clone
  14.180 +that repository at any time in the future can use \hgcmd{update} to
  14.181 +get a copy of the working directory exactly as it was when that tagged
  14.182 +revision was committed.  
  14.183 +\interaction{branching.update}
  14.184 +
  14.185 +In addition, immediately after the main branch is tagged, someone can
  14.186 +then clone the main branch on the server to a new ``stable'' branch,
  14.187 +also on the server.
  14.188 +\interaction{branching.clone}
  14.189 +
  14.190 +Someone who needs to make a change to the stable branch can then clone
  14.191 +\emph{that} repository, make their changes, commit, and push their
  14.192 +changes back there.
  14.193 +\interaction{branching.stable}
  14.194 +Because Mercurial repositories are independent, and Mercurial doesn't
  14.195 +move changes around automatically, the stable and main branches are
  14.196 +\emph{isolated} from each other.  The changes that you made on the
  14.197 +main branch don't ``leak'' to the stable branch, and vice versa.
  14.198 +
  14.199 +You'll often want all of your bugfixes on the stable branch to show up
  14.200 +on the main branch, too.  Rather than rewrite a bugfix on the main
  14.201 +branch, you can simply pull and merge changes from the stable to the
  14.202 +main branch, and Mercurial will bring those bugfixes in for you.
  14.203 +\interaction{branching.merge}
  14.204 +The main branch will still contain changes that are not on the stable
  14.205 +branch, but it will also contain all of the bugfixes from the stable
  14.206 +branch.  The stable branch remains unaffected by these changes.
  14.207 +
  14.208 +\subsection{Feature branches}
  14.209 +
  14.210 +For larger projects, an effective way to manage change is to break up
  14.211 +a team into smaller groups.  Each group has a shared branch of its
  14.212 +own, cloned from a single ``master'' branch used by the entire
  14.213 +project.  People working on an individual branch are typically quite
  14.214 +isolated from developments on other branches.
  14.215 +
  14.216 +\begin{figure}[ht]
  14.217 +  \centering
  14.218 +  \grafix{feature-branches}
  14.219 +  \caption{Feature branches}
  14.220 +  \label{fig:collab:feature-branches}
  14.221 +\end{figure}
  14.222 +
  14.223 +When a particular feature is deemed to be in suitable shape, someone
  14.224 +on that feature team pulls and merges from the master branch into the
  14.225 +feature branch, then pushes back up to the master branch.
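         +
         +Sketched as commands, and assuming a local clone of the feature
         +branch and a hypothetical master URL, that hand-off might run along
         +these lines:
         +\begin{verbatim}
         +cd feature-clone
         +# Bring in everything that has landed on the master branch.
         +hg pull ssh://hg.example.com/master
         +# Combine master's changes with the feature work, then record the merge.
         +hg merge
         +hg commit -m 'Merge master into feature branch'
         +# After testing, send the merged result up to the master branch.
         +hg push ssh://hg.example.com/master
         +\end{verbatim}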
  14.226 +
  14.227 +\subsection{The release train}
  14.228 +
  14.229 +Some projects are organised on a ``train'' basis: a release is
  14.230 +scheduled to happen every few months, and whatever features are ready
  14.231 +when the ``train'' is ready to leave are allowed in.
  14.232 +
  14.233 +This model resembles working with feature branches.  The difference is
  14.234 +that when a feature branch misses a train, someone on the feature team
  14.235 +pulls and merges the changes that went out on that train release into
  14.236 +the feature branch, and the team continues its work on top of that
  14.237 +release so that their feature can make the next release.
  14.238 +
  14.239 +\subsection{The Linux kernel model}
  14.240 +
  14.241 +The development of the Linux kernel has a shallow hierarchical
  14.242 +structure, surrounded by a cloud of apparent chaos.  Because most
  14.243 +Linux developers use \command{git}, a distributed revision control
  14.244 +tool with capabilities similar to Mercurial, it's useful to describe
  14.245 +the way work flows in that environment; if you like the ideas, the
  14.246 +approach translates well across tools.
  14.247 +
  14.248 +At the center of the community sits Linus Torvalds, the creator of
  14.249 +Linux.  He publishes a single source repository that is considered the
  14.250 +``authoritative'' current tree by the entire developer community.
  14.251 +Anyone can clone Linus's tree, but he is very choosy about whose trees
  14.252 +he pulls from.
  14.253 +
  14.254 +Linus has a number of ``trusted lieutenants''.  As a general rule, he
  14.255 +pulls whatever changes they publish, in most cases without even
  14.256 +reviewing those changes.  Some of those lieutenants are generally
  14.257 +agreed to be ``maintainers'', responsible for specific subsystems
  14.258 +within the kernel.  If a random kernel hacker wants to make a change
  14.259 +to a subsystem that they want to end up in Linus's tree, they must
  14.260 +find out who the subsystem's maintainer is, and ask that maintainer to
  14.261 +take their change.  If the maintainer reviews their changes and agrees
  14.262 +to take them, they'll pass them along to Linus in due course.
  14.263 +
  14.264 +Individual lieutenants have their own approaches to reviewing,
  14.265 +accepting, and publishing changes, and to deciding when to feed them
  14.266 +to Linus.  In addition, there are several well known branches that
  14.267 +people use for different purposes.  For example, a few people maintain
  14.268 +``stable'' repositories of older versions of the kernel, to which they
  14.269 +apply critical fixes as needed.  Some maintainers publish multiple
  14.270 +trees: one for experimental changes; one for changes that they are
  14.271 +about to feed upstream; and so on.  Others just publish a single
  14.272 +tree.
  14.273 +
  14.274 +This model has two notable features.  The first is that it's ``pull
  14.275 +only''.  You have to ask, convince, or beg another developer to take a
  14.276 +change from you, because there are almost no trees to which more than
  14.277 +one person can push, and there's no way to push changes into a tree
  14.278 +that someone else controls.
  14.279 +
  14.280 +The second is that it's based on reputation and acclaim.  If you're an
  14.281 +unknown, Linus will probably ignore changes from you without even
  14.282 +responding.  But a subsystem maintainer will probably review them, and
  14.283 +will likely take them if they pass their criteria for suitability.
  14.284 +The more ``good'' changes you contribute to a maintainer, the more
  14.285 +likely they are to trust your judgment and accept your changes.  If
  14.286 +you're well-known and maintain a long-lived branch for something Linus
  14.287 +hasn't yet accepted, people with similar interests may pull your
  14.288 +changes regularly to keep up with your work.
  14.289 +
  14.290 +Reputation and acclaim don't necessarily cross subsystem or ``people''
  14.291 +boundaries.  If you're a respected but specialised storage hacker, and
  14.292 +you try to fix a networking bug, that change will receive a level of
  14.293 +scrutiny from a network maintainer comparable to a change from a
  14.294 +complete stranger.
  14.295 +
  14.296 +To people who come from more orderly project backgrounds, the
  14.297 +comparatively chaotic Linux kernel development process often seems
  14.298 +completely insane.  It's subject to the whims of individuals; people
  14.299 +make sweeping changes whenever they deem it appropriate; and the pace
  14.300 +of development is astounding.  And yet Linux is a highly successful,
  14.301 +well-regarded piece of software.
  14.302 +
  14.303 +\subsection{Pull-only versus shared-push collaboration}
  14.304 +
  14.305 +A perpetual source of heat in the open source community is whether a
  14.306 +development model in which people only ever pull changes from others
  14.307 +is ``better than'' one in which multiple people can push changes to a
  14.308 +shared repository.
  14.309 +
  14.310 +Typically, the backers of the shared-push model use tools that
  14.311 +actively enforce this approach.  If you're using a centralised
  14.312 +revision control tool such as Subversion, there's no way to make a
  14.313 +choice over which model you'll use: the tool gives you shared-push,
  14.314 +and if you want to do anything else, you'll have to roll your own
  14.315 +approach on top (such as applying a patch by hand).
  14.316 +
  14.317 +A good distributed revision control tool, such as Mercurial, will
  14.318 +support both models.  You and your collaborators can then structure
  14.319 +how you work together based on your own needs and preferences, not on
  14.320 +what contortions your tools force you into.
  14.321 +
  14.322 +\subsection{Where collaboration meets branch management}
  14.323 +
  14.324 +Once you and your team set up some shared repositories and start
  14.325 +propagating changes back and forth between local and shared repos, you
  14.326 +begin to face a related, but slightly different challenge: that of
  14.327 +managing the multiple directions in which your team may be moving at
  14.328 +once.  Even though this subject is intimately related to how your team
  14.329 +collaborates, it's dense enough to merit treatment of its own, in
  14.330 +chapter~\ref{chap:branch}.
  14.331 +
  14.332 +\section{The technical side of sharing}
  14.333 +
  14.334 +The remainder of this chapter is devoted to the question of serving
  14.335 +data to your collaborators.
  14.336 +
  14.337 +\section{Informal sharing with \hgcmd{serve}}
  14.338 +\label{sec:collab:serve}
  14.339 +
  14.340 +Mercurial's \hgcmd{serve} command is wonderfully suited to small,
  14.341 +tight-knit, and fast-paced group environments.  It also provides a
  14.342 +great way to get a feel for using Mercurial commands over a network.
  14.343 +
  14.344 +Run \hgcmd{serve} inside a repository, and in under a second it will
  14.345 +bring up a specialised HTTP server; this will accept connections from
  14.346 +any client, and serve up data for that repository until you terminate
  14.347 +it.  Anyone who knows the URL of the server you just started, and can
  14.348 +talk to your computer over the network, can then use a web browser or
  14.349 +Mercurial to read data from that repository.  A URL for a
  14.350 +\hgcmd{serve} instance running on a laptop is likely to look something
  14.351 +like \Verb|http://my-laptop.local:8000/|.
  14.352 +
  14.353 +The \hgcmd{serve} command is \emph{not} a general-purpose web server.
  14.354 +It can do only two things:
  14.355 +\begin{itemize}
  14.356 +\item Allow people to browse the history of the repository it's
  14.357 +  serving, from their normal web browsers.
  14.358 +\item Speak Mercurial's wire protocol, so that people can
  14.359 +  \hgcmd{clone} or \hgcmd{pull} changes from that repository.
  14.360 +\end{itemize}
  14.361 +In particular, \hgcmd{serve} won't allow remote users to \emph{modify}
  14.362 +your repository.  It's intended for read-only use.
  14.363 +
  14.364 +If you're getting started with Mercurial, there's nothing to prevent
  14.365 +you from using \hgcmd{serve} to serve up a repository on your own
  14.366 +computer, then using commands like \hgcmd{clone}, \hgcmd{incoming}, and
  14.367 +so on to talk to that server as if the repository were hosted remotely.
  14.368 +This can help you to quickly get acquainted with using commands on
  14.369 +network-hosted repositories.
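         +
         +A rough sketch of such a practice session, using two terminals on
         +one machine, is shown here.  The directory names are placeholders,
         +and the URL assumes the default port.
         +\begin{verbatim}
         +# Terminal 1: serve the repository in this directory until interrupted.
         +cd my-repo
         +hg serve
         +
         +# Terminal 2: talk to the local server as if it were a remote host.
         +hg clone http://localhost:8000/ my-repo-copy
         +cd my-repo-copy
         +# List changesets on the server that this clone does not yet have.
         +hg incoming
         +\end{verbatim}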
  14.370 +
  14.371 +\subsection{A few things to keep in mind}
  14.372 +
  14.373 +Because it provides unauthenticated read access to all clients, you
  14.374 +should only use \hgcmd{serve} in an environment where you either don't
  14.375 +care about, or have complete control over, who can access your network and
  14.376 +pull data from your repository.
  14.377 +
  14.378 +The \hgcmd{serve} command knows nothing about any firewall software
  14.379 +you might have installed on your system or network.  It cannot detect
  14.380 +or control your firewall software.  If other people are unable to talk
  14.381 +to a running \hgcmd{serve} instance, the second thing you should do
  14.382 +(\emph{after} you make sure that they're using the correct URL) is
  14.383 +check your firewall configuration.
  14.384 +
  14.385 +By default, \hgcmd{serve} listens for incoming connections on
  14.386 +port~8000.  If another process is already listening on the port you
  14.387 +want to use, you can specify a different port to listen on using the
  14.388 +\hgopt{serve}{-p} option.
  14.389 +
  14.390 +Normally, when \hgcmd{serve} starts, it prints no output, which can be
  14.391 +a bit unnerving.  If you'd like to confirm that it is indeed running
  14.392 +correctly, and find out what URL you should send to your
  14.393 +collaborators, start it with the \hggopt{-v} option.
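         +
         +For instance, assuming port~8080 happens to be free on your machine
         +(the number is an arbitrary choice for illustration), the two options
         +can be combined as follows.
         +\begin{codesample2}
         +  hg serve -p 8080 -v
         +\end{codesample2}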
  14.394 +
  14.395 +\section{Using the Secure Shell (ssh) protocol}
  14.396 +\label{sec:collab:ssh}
  14.397 +
  14.398 +You can pull and push changes securely over a network connection using
  14.399 +the Secure Shell (\texttt{ssh}) protocol.  To use this successfully,
  14.400 +you may have to do a little bit of configuration on the client or
  14.401 +server sides.
  14.402 +
  14.403 +If you're not familiar with ssh, it's a network protocol that lets you
  14.404 +securely communicate with another computer.  To use it with Mercurial,
  14.405 +you'll be setting up one or more user accounts on a server so that
  14.406 +remote users can log in and execute commands.
  14.407 +
  14.408 +(If you \emph{are} familiar with ssh, you'll probably find some of the
  14.409 +material that follows to be elementary in nature.)
  14.410 +
  14.411 +\subsection{How to read and write ssh URLs}
  14.412 +
  14.413 +An ssh URL tends to look like this:
  14.414 +\begin{codesample2}
  14.415 +  ssh://bos@hg.serpentine.com:22/hg/hgbook
  14.416 +\end{codesample2}
  14.417 +\begin{enumerate}
  14.418 +\item The ``\texttt{ssh://}'' part tells Mercurial to use the ssh
  14.419 +  protocol.
  14.420 +\item The ``\texttt{bos@}'' component indicates what username to log
  14.421 +  into the server as.  You can leave this out if the remote username
  14.422 +  is the same as your local username.
  14.423 +\item The ``\texttt{hg.serpentine.com}'' gives the hostname of the
  14.424 +  server to log into.
  14.425 +\item The ``\texttt{:22}'' identifies the port number to connect to the server
  14.426 +  on.  The default port is~22, so you only need to specify this part
  14.427 +  if you're \emph{not} using port~22.
  14.428 +\item The remainder of the URL is the local path to the repository on
  14.429 +  the server.
  14.430 +\end{enumerate}
  14.431 +
  14.432 +There's plenty of scope for confusion with the path component of ssh
  14.433 +URLs, as there is no standard way for tools to interpret it.  Some
  14.434 +programs behave differently than others when dealing with these paths.
  14.435 +This isn't an ideal situation, but it's unlikely to change.  Please
  14.436 +read the following paragraphs carefully.
  14.437 +
  14.438 +Mercurial treats the path to a repository on the server as relative to
  14.439 +the remote user's home directory.  For example, if user \texttt{foo}
  14.440 +on the server has a home directory of \dirname{/home/foo}, then an ssh
  14.441 +URL that contains a path component of \dirname{bar}
  14.442 +\emph{really} refers to the directory \dirname{/home/foo/bar}.
  14.443 +
  14.444 +If you want to specify a path relative to another user's home
  14.445 +directory, you can use a path that starts with a tilde character
  14.446 +followed by the user's name (let's call them \texttt{otheruser}), like
  14.447 +this.
  14.448 +\begin{codesample2}
  14.449 +  ssh://server/~otheruser/hg/repo
  14.450 +\end{codesample2}
  14.451 +
  14.452 +And if you really want to specify an \emph{absolute} path on the
  14.453 +server, begin the path component with two slashes, as in this example.
  14.454 +\begin{codesample2}
  14.455 +  ssh://server//absolute/path
  14.456 +\end{codesample2}
  14.457 +
  14.458 +\subsection{Finding an ssh client for your system}
  14.459 +
  14.460 +Almost every Unix-like system comes with OpenSSH preinstalled.  If
  14.461 +you're using such a system, run \Verb|which ssh| to find out if
  14.462 +the \command{ssh} command is installed (it's usually in
  14.463 +\dirname{/usr/bin}).  In the unlikely event that it isn't present,
  14.464 +take a look at your system documentation to figure out how to install
  14.465 +it.
  14.466 +
  14.467 +On Windows, you'll first need to download a suitable ssh
  14.468 +client.  There are two alternatives.
  14.469 +\begin{itemize}
  14.470 +\item Simon Tatham's excellent PuTTY package~\cite{web:putty} provides
  14.471 +  a complete suite of ssh client commands.
  14.472 +\item If you have a high tolerance for pain, you can use the Cygwin
  14.473 +  port of OpenSSH.
  14.474 +\end{itemize}
  14.475 +In either case, you'll need to edit your \hgini\ file to tell
  14.476 +Mercurial where to find the actual client command.  For example, if
  14.477 +you're using PuTTY, you'll need to use the \command{plink} command as
  14.478 +a command-line ssh client.
  14.479 +\begin{codesample2}
  14.480 +  [ui]
  14.481 +  ssh = C:/path/to/plink.exe -ssh -i "C:/path/to/my/private/key"
  14.482 +\end{codesample2}
  14.483 +
  14.484 +\begin{note}
  14.485 +  The path to \command{plink} shouldn't contain any whitespace
  14.486 +  characters, or Mercurial may not be able to run it correctly (so
  14.487 +  putting it in \dirname{C:\\Program Files} is probably not a good
  14.488 +  idea).
  14.489 +\end{note}
  14.490 +
  14.491 +\subsection{Generating a key pair}
  14.492 +
  14.493 +To avoid having to type a password every time you need
  14.494 +to use your ssh client, I recommend generating a key pair.  On a
  14.495 +Unix-like system, the \command{ssh-keygen} command will do the trick.
  14.496 +On Windows, if you're using PuTTY, the \command{puttygen} command is
  14.497 +what you'll need.
  14.498 +
  14.499 +When you generate a key pair, it's usually \emph{highly} advisable to
  14.500 +protect it with a passphrase.  (The only time that you might not want
  14.501 +to do this is when you're using the ssh protocol for automated tasks
  14.502 +on a secure network.)
  14.503 +
  14.504 +Simply generating a key pair isn't enough, however.  You'll need to
  14.505 +add the public key to the set of authorised keys for whatever user
  14.506 +you're logging in remotely as.  For servers using OpenSSH (the vast
  14.507 +majority), this will mean adding the public key to a list in a file
  14.508 +called \sfilename{authorized\_keys} in their \sdirname{.ssh}
  14.509 +directory.
  14.510 +
  14.511 +On a Unix-like system, your public key will have a \filename{.pub}
  14.512 +extension.  If you're using \command{puttygen} on Windows, you can
  14.513 +save the public key to a file of your choosing, or paste it from the
  14.514 +window it's displayed in straight into the
  14.515 +\sfilename{authorized\_keys} file.
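         +
         +On a Unix-like system with OpenSSH, the steps above might look
         +something like the following sketch.  The key type, the file name
         +\filename{id\_rsa.pub} (the usual OpenSSH default for an RSA key), and
         +the names \texttt{user} and \texttt{server} are assumptions made for
         +illustration; substitute your own.
         +\begin{codesample2}
         +  ssh-keygen -t rsa
         +  cat ~/.ssh/id_rsa.pub | ssh user@server 'cat >> .ssh/authorized_keys'
         +\end{codesample2}
         +The second command assumes that the \sdirname{.ssh} directory already
         +exists in the remote user's home directory.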
  14.516 +
  14.517 +\subsection{Using an authentication agent}
  14.518 +
  14.519 +An authentication agent is a daemon that stores passphrases in memory
  14.520 +(so it will forget passphrases if you log out and log back in again).
  14.521 +An ssh client will notice if it's running, and query it for a
  14.522 +passphrase.  If there's no authentication agent running, or the agent
  14.523 +doesn't store the necessary passphrase, you'll have to type your
  14.524 +passphrase every time Mercurial tries to communicate with a server on
  14.525 +your behalf (e.g.~whenever you pull or push changes).
  14.526 +
  14.527 +The downside of storing passphrases in an agent is that it's possible
  14.528 +for a well-prepared attacker to recover the plain text of your
  14.529 +passphrases, in some cases even if your system has been power-cycled.
  14.530 +You should make your own judgment as to whether this is an acceptable
  14.531 +risk.  It certainly saves a lot of repeated typing.
  14.532 +
  14.533 +On Unix-like systems, the agent is called \command{ssh-agent}, and
  14.534 +it's often run automatically for you when you log in.  You'll need to
  14.535 +use the \command{ssh-add} command to add passphrases to the agent's
  14.536 +store.  On Windows, if you're using PuTTY, the \command{pageant}
  14.537 +command acts as the agent.  It adds an icon to your system tray that
  14.538 +will let you manage stored passphrases.
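         +
         +With OpenSSH, a session with the agent might go roughly as follows.
         +The key file name \filename{id\_rsa} is an assumption (it is the usual
         +default for an RSA key); use whichever private key you generated.
         +\begin{codesample2}
         +  ssh-add ~/.ssh/id_rsa
         +  ssh-add -l
         +\end{codesample2}
         +The first command prompts for the key's passphrase; the second lists
         +the keys that the agent currently holds, which is a quick way to check
         +that the right one was added.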
  14.539 +
  14.540 +\subsection{Configuring the server side properly}
  14.541 +
  14.542 +Because ssh can be fiddly to set up if you're new to it, there's a
  14.543 +variety of things that can go wrong.  Add Mercurial on top, and
  14.544 +there's plenty more scope for head-scratching.  Most of these
  14.545 +potential problems occur on the server side, not the client side.  The
  14.546 +good news is that once you've gotten a configuration working, it will
  14.547 +usually continue to work indefinitely.
  14.548 +
  14.549 +Before you try using Mercurial to talk to an ssh server, it's best to
  14.550 +make sure that you can use the normal \command{ssh} or \command{putty}
  14.551 +command to talk to the server first.  If you run into problems with
  14.552 +using these commands directly, Mercurial surely won't work.  Worse, it
  14.553 +will obscure the underlying problem.  Any time you want to debug
  14.554 +ssh-related Mercurial problems, you should drop back to making sure
  14.555 +that plain ssh client commands work first, \emph{before} you worry
  14.556 +about whether there's a problem with Mercurial.
  14.557 +
  14.558 +The first thing to be sure of on the server side is that you can
  14.559 +actually log in from another machine at all.  If you can't use
  14.560 +\command{ssh} or \command{putty} to log in, the error message you get
  14.561 +may give you a few hints as to what's wrong.  The most common problems
  14.562 +are as follows.
  14.563 +\begin{itemize}
  14.564 +\item If you get a ``connection refused'' error, either there isn't an
  14.565 +  SSH daemon running on the server at all, or it's inaccessible due to
  14.566 +  firewall configuration.
  14.567 +\item If you get a ``no route to host'' error, you either have an
  14.568 +  incorrect address for the server or a seriously locked down firewall
  14.569 +  that won't admit its existence at all.
  14.570 +\item If you get a ``permission denied'' error, you may have mistyped
  14.571 +  the username on the server, or you could have mistyped your key's
  14.572 +  passphrase or the remote user's password.
  14.573 +\end{itemize}
  14.574 +In summary, if you're having trouble talking to the server's ssh
  14.575 +daemon, first make sure that one is running at all.  On many systems
  14.576 +it will be installed, but disabled, by default.  Once you're done with
  14.577 +this step, you should then check that the server's firewall is
  14.578 +configured to allow incoming connections on the port the ssh daemon is
  14.579 +listening on (usually~22).  Don't worry about more exotic
  14.580 +possibilities for misconfiguration until you've checked these two
  14.581 +first.
  14.582 +
  14.583 +If you're using an authentication agent on the client side to store
  14.584 +passphrases for your keys, you ought to be able to log into the server
  14.585 +without being prompted for a passphrase or a password.  If you're
  14.586 +prompted for a passphrase, there are a few possible culprits.
  14.587 +\begin{itemize}
  14.588 +\item You might have forgotten to use \command{ssh-add} or
  14.589 +  \command{pageant} to store the passphrase.
  14.590 +\item You might have stored the passphrase for the wrong key.
  14.591 +\end{itemize}
  14.592 +If you're being prompted for the remote user's password, there are
  14.593 +another few possible problems to check.
  14.594 +\begin{itemize}
  14.595 +\item Either the user's home directory or their \sdirname{.ssh}
  14.596 +  directory might have excessively liberal permissions.  As a result,
  14.597 +  the ssh daemon will not trust or read their
  14.598 +  \sfilename{authorized\_keys} file.  For example, a group-writable
  14.599 +  home or \sdirname{.ssh} directory will often cause this symptom.
  14.600 +\item The user's \sfilename{authorized\_keys} file may have a problem.
  14.601 +  If anyone other than the user owns or can write to that file, the
  14.602 +  ssh daemon will not trust or read it.
  14.603 +\end{itemize}
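         +
         +One commonly used set of permissions that avoids both of these
         +problems is sketched below.  These commands are meant to be run on the
         +server as the remote user, and they assume the default OpenSSH file
         +locations; your server's policy may call for something different.
         +\begin{codesample2}
         +  chmod go-w ~
         +  chmod 700 ~/.ssh
         +  chmod 600 ~/.ssh/authorized_keys
         +\end{codesample2}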
  14.604 +
  14.605 +In an ideal world, you should be able to run the following command
  14.606 +successfully, and it should print exactly one line of output, the
  14.607 +current date and time.
  14.608 +\begin{codesample2}
  14.609 +  ssh myserver date
  14.610 +\end{codesample2}
  14.611 +
  14.612 +If, on your server, you have login scripts that print banners or other
  14.613 +junk even when running non-interactive commands like this, you should
  14.614 +fix them before you continue, so that they only print output if
  14.615 +they're run interactively.  Otherwise these banners will at least
  14.616 +clutter up Mercurial's output.  Worse, they could potentially cause
  14.617 +problems with running Mercurial commands remotely.  Mercurial
  14.618 +tries to detect and ignore banners in non-interactive \command{ssh}
  14.619 +sessions, but it is not foolproof.  (If you're editing your login
  14.620 +scripts on your server, the usual way to see if a login script is
  14.621 +running in an interactive shell is to check the return code from the
  14.622 +command \Verb|tty -s|.)
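         +
         +As a minimal sketch, assuming a Bourne-style shell on the server, a
         +login script could guard its banner like this (the banner text is
         +made up for the example).
         +\begin{codesample2}
         +  if tty -s; then
         +    echo "Welcome to myserver"
         +  fi
         +\end{codesample2}
         +Here \Verb|tty -s| prints nothing and succeeds only when standard
         +input is a terminal, so the banner is skipped for non-interactive
         +commands such as those Mercurial runs.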
  14.623 +
  14.624 +Once you've verified that plain old ssh is working with your server,
  14.625 +the next step is to ensure that Mercurial runs on the server.  The
  14.626 +following command should run successfully:
  14.627 +\begin{codesample2}
  14.628 +  ssh myserver hg version
  14.629 +\end{codesample2}
  14.630 +If you see an error message instead of normal \hgcmd{version} output,
  14.631 +this is usually because you haven't installed Mercurial to
  14.632 +\dirname{/usr/bin}.  Don't worry if this is the case; you don't need
  14.633 +to do that.  But you should check for a few possible problems.
  14.634 +\begin{itemize}
  14.635 +\item Is Mercurial really installed on the server at all?  I know this
  14.636 +  sounds trivial, but it's worth checking!
  14.637 +\item Maybe your shell's search path (usually set via the \envar{PATH}
  14.638 +  environment variable) is simply misconfigured.
  14.639 +\item Perhaps your \envar{PATH} environment variable is only being set
  14.640 +  to point to the location of the \command{hg} executable if the login
  14.641 +  session is interactive.  This can happen if you're setting the path
  14.642 +  in the wrong shell login script.  See your shell's documentation for
  14.643 +  details.
  14.644 +\item The \envar{PYTHONPATH} environment variable may need to contain
  14.645 +  the path to the Mercurial Python modules.  It might not be set at
  14.646 +  all; it could be incorrect; or it may be set only if the login is
  14.647 +  interactive.
  14.648 +\end{itemize}
  14.649 +
  14.650 +If you can run \hgcmd{version} over an ssh connection, well done!
  14.651 +You've got the server and client sorted out.  You should now be able
  14.652 +to use Mercurial to access repositories hosted by that username on
  14.653 +that server.  If you run into problems with Mercurial and ssh at this
  14.654 +point, try using the \hggopt{--debug} option to get a clearer picture
  14.655 +of what's going on.
  14.656 +
  14.657 +\subsection{Using compression with ssh}
  14.658 +
  14.659 +Mercurial does not compress data when it uses the ssh protocol,
  14.660 +because the ssh protocol can transparently compress data.  However,
  14.661 +the default behaviour of ssh clients is \emph{not} to request
  14.662 +compression.
  14.663 +
  14.664 +Over any network other than a fast LAN (even a wireless network),
  14.665 +using compression is likely to significantly speed up Mercurial's
  14.666 +network operations.  For example, over a WAN, someone measured
  14.667 +compression as reducing the amount of time required to clone a
  14.668 +particularly large repository from~51 minutes to~17 minutes.
  14.669 +
  14.670 +Both \command{ssh} and \command{plink} accept a \cmdopt{ssh}{-C}
  14.671 +option which turns on compression.  You can easily edit your \hgrc\ to
  14.672 +enable compression for all of Mercurial's uses of the ssh protocol.
  14.673 +\begin{codesample2}
  14.674 +  [ui]
  14.675 +  ssh = ssh -C
  14.676 +\end{codesample2}
  14.677 +
  14.678 +If you use \command{ssh}, you can configure it to always use
  14.679 +compression when talking to your server.  To do this, edit your
  14.680 +\sfilename{.ssh/config} file (which may not yet exist), as follows.
  14.681 +\begin{codesample2}
  14.682 +  Host hg
  14.683 +    Compression yes
  14.684 +    HostName hg.example.com
  14.685 +\end{codesample2}
  14.686 +This defines an alias, \texttt{hg}.  When you use it on the
  14.687 +\command{ssh} command line or in a Mercurial \texttt{ssh}-protocol
  14.688 +URL, it will cause \command{ssh} to connect to \texttt{hg.example.com}
  14.689 +and use compression.  This gives you both a shorter name to type and
  14.690 +compression, each of which is a good thing in its own right.
  14.691 +
  14.692 +\section{Serving over HTTP using CGI}
  14.693 +\label{sec:collab:cgi}
  14.694 +
  14.695 +Depending on how ambitious you are, configuring Mercurial's CGI
  14.696 +interface can take anything from a few moments to several hours.
  14.697 +
  14.698 +We'll begin with the simplest of examples, and work our way towards a
  14.699 +more complex configuration.  Even for the most basic case, you're
  14.700 +almost certainly going to need to read and modify your web server's
  14.701 +configuration.
  14.702 +
  14.703 +\begin{note}
  14.704 +  Configuring a web server is a complex, fiddly, and highly
  14.705 +  system-dependent activity.  I can't possibly give you instructions
  14.706 +  that will cover anything like all of the cases you will encounter.
  14.707 +  Please use your discretion and judgment in following the sections
  14.708 +  below.  Be prepared to make plenty of mistakes, and to spend a lot
  14.709 +  of time reading your server's error logs.
  14.710 +\end{note}
  14.711 +
  14.712 +\subsection{Web server configuration checklist}
  14.713 +
  14.714 +Before you continue, do take a few moments to check a few aspects of
  14.715 +your system's setup.
  14.716 +
  14.717 +\begin{enumerate}
  14.718 +\item Do you have a web server installed at all?  Mac OS X ships with
  14.719 +  Apache, but many other systems may not have a web server installed.
  14.720 +\item If you have a web server installed, is it actually running?  On
  14.721 +  most systems, even if one is present, it will be disabled by
  14.722 +  default.
  14.723 +\item Is your server configured to allow you to run CGI programs in
  14.724 +  the directory where you plan to do so?  Most servers default to
  14.725 +  explicitly disabling the ability to run CGI programs.
  14.726 +\end{enumerate}
  14.727 +
  14.728 +If you don't have a web server installed, and don't have substantial
  14.729 +experience configuring Apache, you should consider using the
  14.730 +\texttt{lighttpd} web server instead of Apache.  Apache has a
  14.731 +well-deserved reputation for baroque and confusing configuration.
  14.732 +While \texttt{lighttpd} is less capable in some ways than Apache, most
  14.733 +of the capabilities it lacks are not relevant to serving Mercurial
  14.734 +repositories.  And \texttt{lighttpd} is undeniably \emph{much} easier
  14.735 +to get started with than Apache.
  14.736 +
  14.737 +\subsection{Basic CGI configuration}
  14.738 +
  14.739 +On Unix-like systems, it's common for users to have a subdirectory
  14.740 +named something like \dirname{public\_html} in their home directory,
  14.741 +from which they can serve up web pages.  A file named \filename{foo}
  14.742 +in this directory will be accessible at a URL of the form
  14.743 +\texttt{http://www.example.com/\~{}username/foo}.
  14.744 +
  14.745 +To get started, find the \sfilename{hgweb.cgi} script that should be
  14.746 +present in your Mercurial installation.  If you can't quickly find a
  14.747 +local copy on your system, simply download one from the master
  14.748 +Mercurial repository at
  14.749 +\url{http://www.selenic.com/repo/hg/raw-file/tip/hgweb.cgi}.
  14.750 +
  14.751 +You'll need to copy this script into your \dirname{public\_html}
  14.752 +directory, and ensure that it's executable.
  14.753 +\begin{codesample2}
  14.754 +  cp .../hgweb.cgi ~/public_html
  14.755 +  chmod 755 ~/public_html/hgweb.cgi
  14.756 +\end{codesample2}
  14.757 +The \texttt{755} argument to \command{chmod} is a little more general
  14.758 +than just making the script executable: it ensures that the script is
  14.759 +executable by anyone, and that ``group'' and ``other'' write
  14.760 +permissions are \emph{not} set.  If you were to leave those write
  14.761 +permissions enabled, Apache's \texttt{suexec} subsystem would likely
  14.762 +refuse to execute the script.  In fact, \texttt{suexec} also insists
  14.763 +that the \emph{directory} in which the script resides must not be
  14.764 +writable by others.
  14.765 +\begin{codesample2}
  14.766 +  chmod 755 ~/public_html
  14.767 +\end{codesample2}
  14.768 +
  14.769 +\subsubsection{What could \emph{possibly} go wrong?}
  14.770 +\label{sec:collab:wtf}
  14.771 +
  14.772 +Once you've copied the CGI script into place, go into a web browser,
  14.773 +and try to open the URL \url{http://myhostname/~myuser/hgweb.cgi},
  14.774 +\emph{but} brace yourself for instant failure.  There's a high
  14.775 +probability that trying to visit this URL will fail, and there are
  14.776 +many possible reasons for this.  In fact, you're likely to stumble
  14.777 +over almost every one of the possible errors below, so please read
  14.778 +carefully.  The following are all of the problems I ran into on a
  14.779 +system running Fedora~7, with a fresh installation of Apache, and a
  14.780 +user account that I created specially to perform this exercise.
  14.781 +
  14.782 +Your web server may have per-user directories disabled.  If you're
  14.783 +using Apache, search your config file for a \texttt{UserDir}
  14.784 +directive.  If there's none present, per-user directories will be
  14.785 +disabled.  If one exists, but its value is \texttt{disabled}, then
  14.786 +per-user directories will be disabled.  Otherwise, the string after
  14.787 +\texttt{UserDir} gives the name of the subdirectory that Apache will
  14.788 +look in under your home directory, for example \dirname{public\_html}.
  14.789 +
  14.790 +Your file access permissions may be too restrictive.  The web server
  14.791 +must be able to traverse your home directory and directories under
  14.792 +your \dirname{public\_html} directory, and read files under the latter
  14.793 +too.  Here's a quick recipe to help you to make your permissions more
  14.794 +appropriate.
  14.795 +\begin{codesample2}
  14.796 +  chmod 755 ~
  14.797 +  find ~/public_html -type d -print0 | xargs -0r chmod 755
  14.798 +  find ~/public_html -type f -print0 | xargs -0r chmod 644
  14.799 +\end{codesample2}
  14.800 +
  14.801 +The other possibility with permissions is that you might get a
  14.802 +completely empty window when you try to load the script.  In this
  14.803 +case, it's likely that your access permissions are \emph{too
  14.804 +  permissive}.  Apache's \texttt{suexec} subsystem won't execute a
  14.805 +script that's group-~or world-writable, for example.
  14.806 +
  14.807 +Your web server may be configured to disallow execution of CGI
  14.808 +programs in your per-user web directory.  Here's Apache's
  14.809 +default per-user configuration from my Fedora system.
  14.810 +\begin{codesample2}
  14.811 +  <Directory /home/*/public_html>
  14.812 +      AllowOverride FileInfo AuthConfig Limit
  14.813 +      Options MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec
  14.814 +      <Limit GET POST OPTIONS>
  14.815 +          Order allow,deny
  14.816 +          Allow from all
  14.817 +      </Limit>
  14.818 +      <LimitExcept GET POST OPTIONS>
  14.819 +          Order deny,allow
  14.820 +          Deny from all
  14.821 +      </LimitExcept>
  14.822 +  </Directory>
  14.823 +\end{codesample2}
  14.824 +If you find a similar-looking \texttt{Directory} group in your Apache
  14.825 +configuration, the directive to look at inside it is \texttt{Options}.
  14.826 +Add \texttt{ExecCGI} to the end of this list if it's missing, and
  14.827 +restart the web server.
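         +Starting from the Fedora configuration shown above, for example, the
         +edited line would read as follows; the list of options on your own
         +system may well differ.
         +\begin{codesample2}
         +  Options MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec ExecCGI
         +\end{codesample2}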
  14.828 +
  14.829 +If you find that Apache serves you the text of the CGI script instead
  14.830 +of executing it, you may need to either uncomment (if already present)
  14.831 +or add a directive like this.
  14.832 +\begin{codesample2}
  14.833 +  AddHandler cgi-script .cgi
  14.834 +\end{codesample2}
  14.835 +
  14.836 +The next possibility is that you might be served with a colourful
  14.837 +Python backtrace claiming that it can't import a
  14.838 +\texttt{mercurial}-related module.  This is actually progress!  The
  14.839 +server is now capable of executing your CGI script.  This error is
  14.840 +only likely to occur if you're running a private installation of
  14.841 +Mercurial, instead of a system-wide version.  Remember that the web
  14.842 +server runs the CGI program without any of the environment variables
  14.843 +that you take for granted in an interactive session.  If this error
  14.844 +happens to you, edit your copy of \sfilename{hgweb.cgi} and follow the
  14.845 +directions inside it to correctly set your \envar{PYTHONPATH}
  14.846 +environment variable.
  14.847 +
  14.848 +Finally, you are \emph{certain} to be served with another colourful
  14.849 +Python backtrace: this one will complain that it can't find
  14.850 +\dirname{/path/to/repository}.  Edit your \sfilename{hgweb.cgi} script
  14.851 +and replace the \dirname{/path/to/repository} string with the complete
  14.852 +path to the repository you want to serve up.
  14.853 +
  14.854 +At this point, when you try to reload the page, you should be
  14.855 +presented with a nice HTML view of your repository's history.  Whew!
  14.856 +
  14.857 +\subsubsection{Configuring lighttpd}
  14.858 +
  14.859 +To be exhaustive in my experiments, I tried configuring the
  14.860 +increasingly popular \texttt{lighttpd} web server to serve the same
  14.861 +repository as I described with Apache above.  I had already overcome
  14.862 +all of the problems I outlined with Apache, many of which are not
  14.863 +server-specific.  As a result, I was fairly sure that my file and
  14.864 +directory permissions were good, and that my \sfilename{hgweb.cgi}
  14.865 +script was properly edited.
  14.866 +
  14.867 +Once I had Apache running, getting \texttt{lighttpd} to serve the
  14.868 +repository was a snap (in other words, even if you're trying to use
  14.869 +\texttt{lighttpd}, you should read the Apache section).  I first had
  14.870 +to edit the \texttt{mod\_access} section of its config file to enable
  14.871 +\texttt{mod\_cgi} and \texttt{mod\_userdir}, both of which were
  14.872 +disabled by default on my system.  I then added a few lines to the end
  14.873 +of the config file, to configure these modules.
  14.874 +\begin{codesample2}
  14.875 +  userdir.path = "public_html"
  14.876 +  cgi.assign = ( ".cgi" => "" )
  14.877 +\end{codesample2}
  14.878 +With this done, \texttt{lighttpd} ran immediately for me.  If I had
  14.879 +configured \texttt{lighttpd} before Apache, I'd almost certainly have
  14.880 +run into many of the same system-level configuration problems as I did
  14.881 +with Apache.  However, I found \texttt{lighttpd} to be noticeably
  14.882 +easier to configure than Apache, even though I've used Apache for over
  14.883 +a decade, and this was my first exposure to \texttt{lighttpd}.
  14.884 +
  14.885 +\subsection{Sharing multiple repositories with one CGI script}
  14.886 +
  14.887 +The \sfilename{hgweb.cgi} script only lets you publish a single
  14.888 +repository, which is an annoying restriction.  If you want to publish
  14.889 +more than one without burdening yourself with multiple copies of the
  14.890 +same script, each with different names, a better choice is to use the
  14.891 +\sfilename{hgwebdir.cgi} script.
  14.892 +
  14.893 +The procedure to configure \sfilename{hgwebdir.cgi} is only a little
  14.894 +more involved than for \sfilename{hgweb.cgi}.  First, you must obtain
  14.895 +a copy of the script.  If you don't have one handy, you can download a
  14.896 +copy from the master Mercurial repository at
  14.897 +\url{http://www.selenic.com/repo/hg/raw-file/tip/hgwebdir.cgi}.
  14.898 +
  14.899 +You'll need to copy this script into your \dirname{public\_html}
  14.900 +directory, and ensure that it's executable.
  14.901 +\begin{codesample2}
  14.902 +  cp .../hgwebdir.cgi ~/public_html
  14.903 +  chmod 755 ~/public_html ~/public_html/hgwebdir.cgi
  14.904 +\end{codesample2}
  14.905 +With basic configuration out of the way, try to visit
  14.906 +\url{http://myhostname/~myuser/hgwebdir.cgi} in your browser.  It
  14.907 +should display an empty list of repositories.  If you get a blank
  14.908 +window or error message, try walking through the list of potential
  14.909 +problems in section~\ref{sec:collab:wtf}.
  14.910 +
  14.911 +The \sfilename{hgwebdir.cgi} script relies on an external
  14.912 +configuration file.  By default, it searches for a file named
  14.913 +\sfilename{hgweb.config} in the same directory as itself.  You'll need
  14.914 +to create this file, and make it world-readable.  The format of the
  14.915 +file is similar to a Windows ``ini'' file, as understood by Python's
  14.916 +\texttt{ConfigParser}~\cite{web:configparser} module.
  14.917 +
  14.918 +The easiest way to configure \sfilename{hgwebdir.cgi} is with a
  14.919 +section named \texttt{collections}.  This will automatically publish
  14.920 +\emph{every} repository under the directories you name.  The section
  14.921 +should look like this:
  14.922 +\begin{codesample2}
  14.923 +  [collections]
  14.924 +  /my/root = /my/root
  14.925 +\end{codesample2}
  14.926 +Mercurial interprets this by looking at the directory name on the
  14.927 +\emph{right} hand side of the ``\texttt{=}'' sign; finding
  14.928 +repositories in that directory hierarchy; and using the text on the
  14.929 +\emph{left} to strip off matching text from the names it will actually
  14.930 +list in the web interface.  The remaining component of a path after
  14.931 +this stripping has occurred is called a ``virtual path''.
  14.932 +
  14.933 +Given the example above, if we have a repository whose local path is
  14.934 +\dirname{/my/root/this/repo}, the CGI script will strip the leading
  14.935 +\dirname{/my/root} from the name, and publish the repository with a
  14.936 +virtual path of \dirname{this/repo}.  If the base URL for our CGI
  14.937 +script is \url{http://myhostname/~myuser/hgwebdir.cgi}, the complete
  14.938 +URL for that repository will be
  14.939 +\url{http://myhostname/~myuser/hgwebdir.cgi/this/repo}.
  14.940 +
  14.941 +If we replace \dirname{/my/root} on the left hand side of this example
  14.942 +with \dirname{/my}, then \sfilename{hgwebdir.cgi} will only strip off
  14.943 +\dirname{/my} from the repository name, and will give us a virtual
  14.944 +path of \dirname{root/this/repo} instead of \dirname{this/repo}.
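
The stripping rule described above can be sketched in a few lines of Python.  This is only an illustration of the rule as stated here, not the code that \sfilename{hgwebdir.cgi} actually runs; the helper names \texttt{virtual\_path} and \texttt{repo\_url} are hypothetical.

```python
def virtual_path(prefix, repo_path):
    """Strip `prefix` (the left hand side of a [collections] entry)
    from `repo_path` and return what is left.

    Hypothetical helper: it mirrors the rule in the text, nothing more.
    """
    prefix = prefix.rstrip("/")
    if not repo_path.startswith(prefix + "/"):
        raise ValueError("%r is not under %r" % (repo_path, prefix))
    return repo_path[len(prefix) + 1:]


def repo_url(base_url, vpath):
    """Join the CGI script's base URL and a virtual path."""
    return base_url.rstrip("/") + "/" + vpath


base = "http://myhostname/~myuser/hgwebdir.cgi"
print(virtual_path("/my/root", "/my/root/this/repo"))
print(virtual_path("/my", "/my/root/this/repo"))
print(repo_url(base, virtual_path("/my/root", "/my/root/this/repo")))
```

The first call strips \dirname{/my/root}, the second strips only \dirname{/my}, matching the two cases discussed above.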
  14.945 +
  14.946 +The \sfilename{hgwebdir.cgi} script will recursively search each
  14.947 +directory listed in the \texttt{collections} section of its
  14.948 +configuration file, but it will \emph{not} recurse into the
  14.949 +repositories it finds.
  14.950 +
  14.951 +The \texttt{collections} mechanism makes it easy to publish many
  14.952 +repositories in a ``fire and forget'' manner.  You only need to set up
  14.953 +the CGI script and configuration file one time.  Afterwards, you can
  14.954 +publish or unpublish a repository at any time by simply moving it
  14.955 +into, or out of, the directory hierarchy in which you've configured
  14.956 +\sfilename{hgwebdir.cgi} to look.
  14.957 +
  14.958 +\subsubsection{Explicitly specifying which repositories to publish}
  14.959 +
  14.960 +In addition to the \texttt{collections} mechanism, the
  14.961 +\sfilename{hgwebdir.cgi} script allows you to publish a specific list
  14.962 +of repositories.  To do so, create a \texttt{paths} section, with
  14.963 +contents of the following form.
  14.964 +\begin{codesample2}
  14.965 +  [paths]
  14.966 +  repo1 = /my/path/to/some/repo
  14.967 +  repo2 = /some/path/to/another
  14.968 +\end{codesample2}
  14.969 +In this case, the virtual path (the component that will appear in a
  14.970 +URL) is on the left hand side of each definition, while the path to
  14.971 +the repository is on the right.  Notice that there does not need to be
  14.972 +any relationship between the virtual path you choose and the location
  14.973 +of a repository in your filesystem.
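
If you want to check what a \texttt{paths} section says before you deploy it, you can read it with Python's own parser.  The sketch below assumes Python~3, where the module is named \texttt{configparser}; the function name \texttt{read\_paths} is hypothetical, and the way \sfilename{hgwebdir.cgi} itself reads the file may differ in detail.

```python
import configparser

# The [paths] example from the text, inlined so the sketch is self-contained.
SAMPLE = """\
[paths]
repo1 = /my/path/to/some/repo
repo2 = /some/path/to/another
"""


def read_paths(text):
    """Return a dict mapping virtual path -> repository path."""
    parser = configparser.ConfigParser(interpolation=None)
    # Keep keys exactly as written; configparser lowercases them by default.
    parser.optionxform = str
    parser.read_string(text)
    if not parser.has_section("paths"):
        return {}
    return dict(parser.items("paths"))


for vpath, location in sorted(read_paths(SAMPLE).items()):
    print(vpath, "->", location)
```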
  14.974 +
  14.975 +If you wish, you can use both the \texttt{collections} and
  14.976 +\texttt{paths} mechanisms simultaneously in a single configuration
  14.977 +file.
  14.978 +
  14.979 +\begin{note}
  14.980 +  If multiple repositories have the same virtual path,
  14.981 +  \sfilename{hgwebdir.cgi} will not report an error.  Instead, it will
  14.982 +  behave unpredictably.
  14.983 +\end{note}
  14.984 +
  14.985 +\subsection{Downloading source archives}
  14.986 +
  14.987 +Mercurial's web interface lets users download an archive of any
  14.988 +revision.  This archive will contain a snapshot of the working
  14.989 +directory as of that revision, but it will not contain a copy of the
  14.990 +repository data.
  14.991 +
  14.992 +By default, this feature is not enabled.  To enable it, you'll need to
  14.993 +add an \rcitem{web}{allow\_archive} item to the \rcsection{web}
  14.994 +section of your \hgrc.
  14.995 +
  14.996 +\subsection{Web configuration options}
  14.997 +
  14.998 +Mercurial's web interfaces (the \hgcmd{serve} command, and the
  14.999 +\sfilename{hgweb.cgi} and \sfilename{hgwebdir.cgi} scripts) have a
 14.1000 +number of configuration options that you can set.  These belong in a
 14.1001 +section named \rcsection{web}.
 14.1002 +\begin{itemize}
 14.1003 +\item[\rcitem{web}{allow\_archive}] Determines which (if any) archive
 14.1004 +  download mechanisms Mercurial supports.  If you enable this
 14.1005 +  feature, users of the web interface will be able to download an
 14.1006 +  archive of whatever revision of a repository they are viewing.
 14.1007 +  To enable the archive feature, this item must take the form of a
 14.1008 +  sequence of words drawn from the list below.
 14.1009 +  \begin{itemize}
 14.1010 +  \item[\texttt{bz2}] A \command{tar} archive, compressed using
 14.1011 +    \texttt{bzip2} compression.  This has the best compression ratio,
 14.1012 +    but uses the most CPU time on the server.
 14.1013 +  \item[\texttt{gz}] A \command{tar} archive, compressed using
 14.1014 +    \texttt{gzip} compression.
 14.1015 +  \item[\texttt{zip}] A \command{zip} archive, compressed using
 14.1016 +    Deflate compression.  This format has the worst compression ratio, but is
 14.1017 +    widely used in the Windows world.
 14.1018 +  \end{itemize}
 14.1019 +  If you provide an empty list, or don't have an
 14.1020 +  \rcitem{web}{allow\_archive} entry at all, this feature will be
 14.1021 +  disabled.  Here is an example of how to enable all three supported
 14.1022 +  formats.
 14.1023 +  \begin{codesample4}
 14.1024 +    [web]
 14.1025 +    allow_archive = bz2 gz zip
 14.1026 +  \end{codesample4}
 14.1027 +\item[\rcitem{web}{allowpull}] Boolean.  Determines whether the web
 14.1028 +  interface allows remote users to \hgcmd{pull} and \hgcmd{clone} this
 14.1029 +  repository over~HTTP.  If set to \texttt{no} or \texttt{false}, only
 14.1030 +  the ``human-oriented'' portion of the web interface is available.
 14.1031 +\item[\rcitem{web}{contact}] String.  A free-form (but preferably
 14.1032 +  brief) string identifying the person or group in charge of the
 14.1033 +  repository.  This often contains the name and email address of a
 14.1034 +  person or mailing list.  It often makes sense to place this entry in
 14.1035 +  a repository's own \sfilename{.hg/hgrc} file, but it can make sense
 14.1036 +  to use in a global \hgrc\ if every repository has a single
 14.1037 +  maintainer.
 14.1038 +\item[\rcitem{web}{maxchanges}] Integer.  The default maximum number
 14.1039 +  of changesets to display in a single page of output.
 14.1040 +\item[\rcitem{web}{maxfiles}] Integer.  The default maximum number
 14.1041 +  of modified files to display in a single page of output.
 14.1042 +\item[\rcitem{web}{stripes}] Integer.  If the web interface displays
 14.1043 +  alternating ``stripes'' to make it easier to visually align rows
 14.1044 +  when you are looking at a table, this number controls the number of
 14.1045 +  rows in each stripe.
 14.1046 +\item[\rcitem{web}{style}] Controls the template Mercurial uses to
 14.1047 +  display the web interface.  Mercurial ships with two web templates,
 14.1048 +  named \texttt{default} and \texttt{gitweb} (the latter is much more
 14.1049 +  visually attractive).  You can also specify a custom template of
 14.1050 +  your own; see chapter~\ref{chap:template} for details.  Here, you
 14.1051 +  can see how to enable the \texttt{gitweb} style.
 14.1052 +  \begin{codesample4}
 14.1053 +    [web]
 14.1054 +    style = gitweb
 14.1055 +  \end{codesample4}
 14.1056 +\item[\rcitem{web}{templates}] Path.  The directory in which to search
 14.1057 +  for template files.  By default, Mercurial searches in the directory
 14.1058 +  in which it was installed.
 14.1059 +\end{itemize}
 14.1060 +If you are using \sfilename{hgwebdir.cgi}, you can place a few
 14.1061 +configuration items in a \rcsection{web} section of the
 14.1062 +\sfilename{hgweb.config} file instead of a \hgrc\ file, for
 14.1063 +convenience.  These items are \rcitem{web}{motd} and
 14.1064 +\rcitem{web}{style}.
 14.1065 +
 14.1066 +\subsubsection{Options specific to an individual repository}
 14.1067 +
 14.1068 +A few \rcsection{web} configuration items ought to be placed in a
 14.1069 +repository's local \sfilename{.hg/hgrc}, rather than a user's or
 14.1070 +global \hgrc.
 14.1071 +\begin{itemize}
 14.1072 +\item[\rcitem{web}{description}] String.  A free-form (but preferably
 14.1073 +  brief) string that describes the contents or purpose of the
 14.1074 +  repository.
 14.1075 +\item[\rcitem{web}{name}] String.  The name to use for the repository
 14.1076 +  in the web interface.  This overrides the default name, which is the
 14.1077 +  last component of the repository's path.
 14.1078 +\end{itemize}
 14.1079 +
 14.1080 +\subsubsection{Options specific to the \hgcmd{serve} command}
 14.1081 +
 14.1082 +Some of the items in the \rcsection{web} section of a \hgrc\ file are
 14.1083 +only for use with the \hgcmd{serve} command.
 14.1084 +\begin{itemize}
 14.1085 +\item[\rcitem{web}{accesslog}] Path.  The name of a file into which to
 14.1086 +  write an access log.  By default, the \hgcmd{serve} command writes
 14.1087 +  this information to standard output, not to a file.  Log entries are
 14.1088 +  written in the standard ``combined'' file format used by almost all
 14.1089 +  web servers.
 14.1090 +\item[\rcitem{web}{address}] String.  The local address on which the
 14.1091 +  server should listen for incoming connections.  By default, the
 14.1092 +  server listens on all addresses.
 14.1093 +\item[\rcitem{web}{errorlog}] Path.  The name of a file into which to
 14.1094 +  write an error log.  By default, the \hgcmd{serve} command writes this
 14.1095 +  information to standard error, not to a file.
 14.1096 +\item[\rcitem{web}{ipv6}] Boolean.  Whether to use the IPv6 protocol.
 14.1097 +  By default, IPv6 is not used. 
 14.1098 +\item[\rcitem{web}{port}] Integer.  The TCP~port number on which the
 14.1099 +  server should listen.  The default port number used is~8000.
 14.1100 +\end{itemize}
 14.1101 +
 14.1102 +\subsubsection{Choosing the right \hgrc\ file to add \rcsection{web}
 14.1103 +  items to}
 14.1104 +
 14.1105 +It is important to remember that a web server like Apache or
 14.1106 +\texttt{lighttpd} will run under a user~ID that is different to yours.
 14.1107 +CGI scripts run by your server, such as \sfilename{hgweb.cgi}, will
 14.1108 +usually also run under that user~ID.
 14.1109 +
 14.1110 +If you add \rcsection{web} items to your own personal \hgrc\ file, CGI
 14.1111 +scripts won't read that \hgrc\ file.  Those settings will thus only
 14.1112 +affect the behaviour of the \hgcmd{serve} command when you run it.  To
 14.1113 +cause CGI scripts to see your settings, either create a \hgrc\ file in
 14.1114 +the home directory of the user ID that runs your web server, or add
 14.1115 +those settings to a system-wide \hgrc\ file.
 14.1116 +
 14.1117 +
 14.1118 +%%% Local Variables: 
 14.1119 +%%% mode: latex
 14.1120 +%%% TeX-master: "00book"
 14.1121 +%%% End: 

    15.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    15.2 +++ b/en/ch07-filenames.tex	Thu Jan 29 22:56:27 2009 -0800
    15.3 @@ -0,0 +1,306 @@
    15.4 +\chapter{File names and pattern matching}
    15.5 +\label{chap:names}
    15.6 +
    15.7 +Mercurial provides mechanisms that let you work with file names in a
    15.8 +consistent and expressive way.
    15.9 +
   15.10 +\section{Simple file naming}
   15.11 +
   15.12 +Mercurial uses a unified piece of machinery ``under the hood'' to
   15.13 +handle file names.  Every command behaves uniformly with respect to
   15.14 +file names.  The way in which commands work with file names is as
   15.15 +follows.
   15.16 +
   15.17 +If you explicitly name real files on the command line, Mercurial works
   15.18 +with exactly those files, as you would expect.
   15.19 +\interaction{filenames.files}
   15.20 +
   15.21 +When you provide a directory name, Mercurial will interpret this as
   15.22 +``operate on every file in this directory and its subdirectories''.
   15.23 +Mercurial traverses the files and subdirectories in a directory in
   15.24 +alphabetical order.  When it encounters a subdirectory, it will
   15.25 +traverse that subdirectory before continuing with the current
   15.26 +directory.
   15.27 +\interaction{filenames.dirs}
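
The traversal order described above is a sorted, depth-first walk.  Here is a minimal Python sketch of that order, assuming plain string sorting of names; \texttt{walk\_sorted} is a hypothetical helper, and the small directory tree is made up for the demonstration.

```python
import os
import tempfile


def walk_sorted(top):
    """Yield file paths under `top`, visiting names in sorted order."""
    for name in sorted(os.listdir(top)):
        path = os.path.join(top, name)
        if os.path.isdir(path):
            # Finish the whole subdirectory before continuing with `top`.
            yield from walk_sorted(path)
        else:
            yield path


# Build a throwaway tree and record the order in which files are visited.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "b", "sub"))
    for rel in ("a.txt", "b/c.txt", "b/sub/d.txt", "e.txt"):
        with open(os.path.join(root, *rel.split("/")), "w") as f:
            f.write("")
    order = [os.path.relpath(p, root).replace(os.sep, "/")
             for p in walk_sorted(root)]

print(order)
```

Everything under \dirname{b} is visited before \filename{e.txt}, even though \filename{e.txt} lives in the top-level directory.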
   15.28 +
   15.29 +\section{Running commands without any file names}
   15.30 +
   15.31 +Mercurial's commands that work with file names have useful default
   15.32 +behaviours when you invoke them without providing any file names or
   15.33 +patterns.  What kind of behaviour you should expect depends on what
   15.34 +the command does.  Here are a few rules of thumb you can use to
   15.35 +predict what a command is likely to do if you don't give it any names
   15.36 +to work with.
   15.37 +\begin{itemize}
   15.38 +\item Most commands will operate on the entire working directory.
   15.39 +  This is what the \hgcmd{add} command does, for example.
   15.40 +\item If the command has effects that are difficult or impossible to
   15.41 +  reverse, it will force you to explicitly provide at least one name
   15.42 +  or pattern (see below).  This protects you from accidentally
   15.43 +  deleting files by running \hgcmd{remove} with no arguments, for
   15.44 +  example.
   15.45 +\end{itemize}
   15.46 +
   15.47 +It's easy to work around these default behaviours if they don't suit
   15.48 +you.  If a command normally operates on the whole working directory,
   15.49 +you can invoke it on just the current directory and its subdirectories
   15.50 +by giving it the name ``\dirname{.}''.
   15.51 +\interaction{filenames.wdir-subdir}
   15.52 +
   15.53 +Along the same lines, some commands normally print file names relative
   15.54 +to the root of the repository, even if you're invoking them from a
   15.55 +subdirectory.  Such a command will print file names relative to your
   15.56 +subdirectory if you give it explicit names.  Here, we're going to run
   15.57 +\hgcmd{status} from a subdirectory, and get it to operate on the
   15.58 +entire working directory while printing file names relative to our
   15.59 +subdirectory, by passing it the output of the \hgcmd{root} command.
   15.60 +\interaction{filenames.wdir-relname}
   15.61 +
   15.62 +\section{Telling you what's going on}
   15.63 +
   15.64 +The \hgcmd{add} example in the preceding section illustrates something
   15.65 +else that's helpful about Mercurial commands.  If a command operates
   15.66 +on a file that you didn't name explicitly on the command line, it will
   15.67 +usually print the name of the file, so that you will not be surprised
   15.68 +by what's going on.
   15.69 +
   15.70 +The principle here is of \emph{least surprise}.  If you've exactly
   15.71 +named a file on the command line, there's no point in repeating it
   15.72 +back at you.  If Mercurial is acting on a file \emph{implicitly},
   15.73 +because you provided no names, or a directory, or a pattern (see
   15.74 +below), it's safest to tell you what it's doing.
   15.75 +
   15.76 +For commands that behave this way, you can silence them using the
   15.77 +\hggopt{-q} option.  You can also get them to print the name of every
   15.78 +file, even those you've named explicitly, using the \hggopt{-v}
   15.79 +option.
   15.80 +
   15.81 +\section{Using patterns to identify files}
   15.82 +
   15.83 +In addition to working with file and directory names, Mercurial lets
   15.84 +you use \emph{patterns} to identify files.  Mercurial's pattern
   15.85 +handling is expressive.
   15.86 +
   15.87 +On Unix-like systems (Linux, MacOS, etc.), the job of matching file
   15.88 +names to patterns normally falls to the shell.  On these systems, you
   15.89 +must explicitly tell Mercurial that a name is a pattern.  On Windows,
   15.90 +the shell does not expand patterns, so Mercurial will automatically
   15.91 +identify names that are patterns, and expand them for you.
   15.92 +
   15.93 +To provide a pattern in place of a regular name on the command line,
   15.94 +the mechanism is simple:
   15.95 +\begin{codesample2}
   15.96 +  syntax:patternbody
   15.97 +\end{codesample2}
   15.98 +That is, a pattern is identified by a short text string that says what
   15.99 +kind of pattern this is, followed by a colon, followed by the actual
  15.100 +pattern.
  15.101 +
  15.102 +Mercurial supports two kinds of pattern syntax.  The most frequently
  15.103 +used is called \texttt{glob}; this is the same kind of pattern
  15.104 +matching used by the Unix shell, and should be familiar to Windows
  15.105 +command prompt users, too.  
  15.106 +
  15.107 +When Mercurial does automatic pattern matching on Windows, it uses
  15.108 +\texttt{glob} syntax.  You can thus omit the ``\texttt{glob:}'' prefix
  15.109 +on Windows, but it's safe to use it, too.
  15.110 +
  15.111 +The \texttt{re} syntax is more powerful; it lets you specify patterns
  15.112 +using regular expressions, also known as regexps.
  15.113 +
  15.114 +By the way, in the examples that follow, notice that I'm careful to
  15.115 +wrap all of my patterns in quote characters, so that they won't get
  15.116 +expanded by the shell before Mercurial sees them.
  15.117 +
  15.118 +\subsection{Shell-style \texttt{glob} patterns}
  15.119 +
  15.120 +This is an overview of the kinds of patterns you can use when you're
  15.121 +matching on glob patterns.
  15.122 +
  15.123 +The ``\texttt{*}'' character matches any string, within a single
  15.124 +directory.
  15.125 +\interaction{filenames.glob.star}
  15.126 +
  15.127 +The ``\texttt{**}'' pattern matches any string, and crosses directory
  15.128 +boundaries.  It's not a standard Unix glob token, but it's accepted by
  15.129 +several popular Unix shells, and is very useful.
  15.130 +\interaction{filenames.glob.starstar}
  15.131 +
  15.132 +The ``\texttt{?}'' pattern matches any single character.
  15.133 +\interaction{filenames.glob.question}
  15.134 +
  15.135 +The ``\texttt{[}'' character begins a \emph{character class}.  This
  15.136 +matches any single character within the class.  The class ends with a
  15.137 +``\texttt{]}'' character.  A class may contain multiple \emph{range}s
  15.138 +of the form ``\texttt{a-f}'', which is shorthand for
  15.139 +``\texttt{abcdef}''.
  15.140 +\interaction{filenames.glob.range}
  15.141 +If the first character after the ``\texttt{[}'' in a character class
  15.142 +is a ``\texttt{!}'', it \emph{negates} the class, making it match any
  15.143 +single character not in the class.
  15.144 +
  15.145 +A ``\texttt{\{}'' begins a group of subpatterns, where the whole group
  15.146 +matches if any subpattern in the group matches.  The ``\texttt{,}''
  15.147 +character separates subpatterns, and ``\texttt{\}}'' ends the group.
  15.148 +\interaction{filenames.glob.group}
  15.149 +
  15.150 +\subsubsection{Watch out!}
  15.151 +
  15.152 +Don't forget that if you want to match a pattern in any directory, you
  15.153 +should not be using the ``\texttt{*}'' match-any token, as this will
  15.154 +only match within one directory.  Instead, use the ``\texttt{**}''
  15.155 +token.  This small example illustrates the difference between the two.
  15.156 +\interaction{filenames.glob.star-starstar}
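
To make the difference between the two tokens concrete, here is a small Python sketch that translates a glob into a regular expression.  It handles only ``\texttt{*}'', ``\texttt{**}'' and ``\texttt{?}'', and it is my own illustration of the rules described in this section, not Mercurial's matcher; \texttt{glob\_to\_regex} is a hypothetical name.

```python
import re


def glob_to_regex(pattern):
    """Translate a glob using only *, ** and ? into a compiled regexp."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")      # any string, crossing directory boundaries
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")   # any string, within a single directory
            i += 1
        elif pattern[i] == "?":
            out.append(".")       # any single character
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")


files = ["a.py", "src/b.py", "src/deep/c.py"]
star = [f for f in files if glob_to_regex("*.py").match(f)]
starstar = [f for f in files if glob_to_regex("**.py").match(f)]
print(star)
print(starstar)
```

With these made-up file names, the single-star pattern finds only the file in the top directory, while the double-star pattern finds all three.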
  15.157 +
  15.158 +\subsection{Regular expression matching with \texttt{re} patterns}
  15.159 +
  15.160 +Mercurial accepts the same regular expression syntax as the Python
  15.161 +programming language (it uses Python's regexp engine internally).
  15.162 +This is based on the Perl language's regexp syntax, which is the most
  15.163 +popular dialect in use (it's also used in Java, for example).
  15.164 +
  15.165 +I won't discuss Mercurial's regexp dialect in any detail here, as
  15.166 +regexps are not often used.  Perl-style regexps are in any case
  15.167 +already exhaustively documented on a multitude of web sites, and in
  15.168 +many books.  Instead, I will focus here on a few things you should
  15.169 +know if you find yourself needing to use regexps with Mercurial.
  15.170 +
  15.171 +A regexp is matched against an entire file name, relative to the root
  15.172 +of the repository.  In other words, even if you're already in
  15.173 +subdirectory \dirname{foo}, if you want to match files under this
  15.174 +directory, your pattern must start with ``\texttt{foo/}''.
  15.175 +
  15.176 +One thing to note, if you're familiar with Perl-style regexps, is that
  15.177 +Mercurial's are \emph{rooted}.  That is, a regexp starts matching
  15.178 +against the beginning of a string; it doesn't look for a match
  15.179 +anywhere within the string.  To match anywhere in a string, start
  15.180 +your pattern with ``\texttt{.*}''.
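
Since Mercurial uses Python's regexp engine, you can see what ``rooted'' means with Python's \texttt{re} module directly: \texttt{re.match} anchors at the start of the string, while \texttt{re.search} looks anywhere.  The file names below are made up for the example.

```python
import re

names = ["foo/bar.c", "lib/foo/baz.c"]

# Rooted: the pattern must match starting at the first character.
rooted = [n for n in names if re.match(r"foo/", n)]

# A leading ".*" lets a rooted pattern match further into the name.
anywhere = [n for n in names if re.match(r".*foo/", n)]

# For comparison, an unrooted search finds the text anywhere.
searched = [n for n in names if re.search(r"foo/", n)]

print(rooted)
print(anywhere)
print(searched)
```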
  15.181 +
  15.182 +\section{Filtering files}
  15.183 +
  15.184 +Not only does Mercurial give you a variety of ways to specify files;
  15.185 +it lets you further winnow those files using \emph{filters}.  Commands
  15.186 +that work with file names accept two filtering options.
  15.187 +\begin{itemize}
  15.188 +\item \hggopt{-I}, or \hggopt{--include}, lets you specify a pattern
  15.189 +  that file names must match in order to be processed.
  15.190 +\item \hggopt{-X}, or \hggopt{--exclude}, gives you a way to
  15.191 +  \emph{avoid} processing files, if they match this pattern.
  15.192 +\end{itemize}
  15.193 +You can provide multiple \hggopt{-I} and \hggopt{-X} options on the
  15.194 +command line, and intermix them as you please.  Mercurial interprets
  15.195 +the patterns you provide using glob syntax by default (but you can use
  15.196 +regexps if you need to).
  15.197 +
  15.198 +You can read a \hggopt{-I} filter as ``process only the files that
  15.199 +match this filter''.
  15.200 +\interaction{filenames.filter.include}
  15.201 +The \hggopt{-X} filter is best read as ``process only the files that
  15.202 +don't match this pattern''.
  15.203 +\interaction{filenames.filter.exclude}
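
The two readings above combine naturally into one test.  The Python sketch below is an assumption about how several filters fit together (a file passes if it matches at least one include, when any are given, and no exclude); \texttt{selected} is a hypothetical helper.  It also uses Python's \texttt{fnmatch}, whose ``\texttt{*}'' does cross directory boundaries, so the example sticks to names without directories.

```python
from fnmatch import fnmatchcase


def selected(name, includes=(), excludes=()):
    """Decide whether `name` survives the include and exclude filters."""
    if includes and not any(fnmatchcase(name, p) for p in includes):
        return False
    return not any(fnmatchcase(name, p) for p in excludes)


files = ["main.c", "main.h", "util.c", "README"]
only_c = [f for f in files if selected(f, includes=["*.c"])]
not_c = [f for f in files if selected(f, excludes=["*.c"])]
both = [f for f in files if selected(f, includes=["*.c"], excludes=["util*"])]
print(only_c)
print(not_c)
print(both)
```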
  15.204 +
  15.205 +\section{Ignoring unwanted files and directories}
  15.206 +
  15.207 +XXX.
  15.208 +
  15.209 +\section{Case sensitivity}
  15.210 +\label{sec:names:case}
  15.211 +
  15.212 +If you're working in a mixed development environment that contains
  15.213 +both Linux (or other Unix) systems and Macs or Windows systems, you
  15.214 +should keep in the back of your mind the knowledge that they treat the
  15.215 +case (``N'' versus ``n'') of file names in incompatible ways.  This is
  15.216 +not very likely to affect you, and it's easy to deal with if it does,
  15.217 +but it could surprise you if you don't know about it.
  15.218 +
  15.219 +Operating systems and filesystems differ in the way they handle the
  15.220 +\emph{case} of characters in file and directory names.  There are
  15.221 +three common ways to handle case in names.
  15.222 +\begin{itemize}
  15.223 +\item Completely case insensitive.  Uppercase and lowercase versions
  15.224 +  of a letter are treated as identical, both when creating a file and
  15.225 +  during subsequent accesses.  This is common on older DOS-based
  15.226 +  systems.
  15.227 +\item Case preserving, but insensitive.  When a file or directory is
  15.228 +  created, the case of its name is stored, and can be retrieved and
  15.229 +  displayed by the operating system.  When an existing file is being
  15.230 +  looked up, its case is ignored.  This is the standard arrangement on
  15.231 +  Windows and MacOS.  The names \filename{foo} and \filename{FoO}
  15.232 +  identify the same file.  This treatment of uppercase and lowercase
  15.233 +  letters as interchangeable is also referred to as \emph{case
  15.234 +    folding}.
  15.235 +\item Case sensitive.  The case of a name is significant at all times.
  15.236 +  The names \filename{foo} and \filename{FoO} identify different files.  This
  15.237 +  is the way Linux and Unix systems normally work.
  15.238 +\end{itemize}
  15.239 +
  15.240 +On Unix-like systems, it is possible to have any or all of the above
  15.241 +ways of handling case in action at once.  For example, if you use a
  15.242 +USB thumb drive formatted with a FAT32 filesystem on a Linux system,
  15.243 +Linux will handle names on that filesystem in a case preserving, but
  15.244 +insensitive, way.
  15.245 +
  15.246 +\subsection{Safe, portable repository storage}
  15.247 +
  15.248 +Mercurial's repository storage mechanism is \emph{case safe}.  It
  15.249 +translates file names so that they can be safely stored on both case
  15.250 +sensitive and case insensitive filesystems.  This means that you can
  15.251 +use normal file copying tools to transfer a Mercurial repository onto,
  15.252 +for example, a USB thumb drive, and safely move that drive and
  15.253 +repository back and forth between a Mac, a PC running Windows, and a
  15.254 +Linux box.
  15.255 +
  15.256 +\subsection{Detecting case conflicts}
  15.257 +
  15.258 +When operating in the working directory, Mercurial honours the naming
  15.259 +policy of the filesystem where the working directory is located.  If
  15.260 +the filesystem is case preserving, but insensitive, Mercurial will
  15.261 +treat names that differ only in case as the same.
  15.262 +
  15.263 +An important aspect of this approach is that it is possible to commit
  15.264 +a changeset on a case sensitive (typically Linux or Unix) filesystem
  15.265 +that will cause trouble for users of case insensitive (usually Windows
  15.266 +and MacOS) filesystems.  If a Linux user commits changes to two files, one
  15.267 +named \filename{myfile.c} and the other named \filename{MyFile.C},
  15.268 +they will be stored correctly in the repository.  And in the working
  15.269 +directories of other Linux users, they will be correctly represented
  15.270 +as separate files.
  15.271 +
  15.272 +If a Windows or Mac user pulls this change, they will not initially
  15.273 +have a problem, because Mercurial's repository storage mechanism is
  15.274 +case safe.  However, once they try to \hgcmd{update} the working
  15.275 +directory to that changeset, or \hgcmd{merge} with that changeset,
  15.276 +Mercurial will spot the conflict between the two file names that the
  15.277 +filesystem would treat as the same, and forbid the update or merge
  15.278 +from occurring.
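
If you'd like to look for such names before a collaborator trips over them, a check along the following lines will do.  It is a rough sketch: it approximates case folding with Python's \texttt{str.lower}, which is not necessarily the rule any particular filesystem applies, and \texttt{case\_conflicts} is a hypothetical helper rather than anything Mercurial provides.

```python
from collections import defaultdict


def case_conflicts(names):
    """Return groups of names that differ only in case."""
    groups = defaultdict(list)
    for name in names:
        # Names with the same lowercase form would collide on a
        # case insensitive filesystem.
        groups[name.lower()].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


print(case_conflicts(["myfile.c", "MyFile.C", "README"]))
```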
  15.279 +
  15.280 +\subsection{Fixing a case conflict}
  15.281 +
  15.282 +If you are using Windows or a Mac in a mixed environment where some of
  15.283 +your collaborators are using Linux or Unix, and Mercurial reports a
  15.284 +case folding conflict when you try to \hgcmd{update} or \hgcmd{merge},
  15.285 +the procedure to fix the problem is simple.
  15.286 +
  15.287 +Just find a nearby Linux or Unix box, clone the problem repository
  15.288 +onto it, and use Mercurial's \hgcmd{rename} command to change the
  15.289 +names of any offending files or directories so that they will no
  15.290 +longer cause case folding conflicts.  Commit this change, \hgcmd{pull}
  15.291 +or \hgcmd{push} it across to your Windows or MacOS system, and
  15.292 +\hgcmd{update} to the revision with the non-conflicting names.
  15.293 +
  15.294 +The changeset with case-conflicting names will remain in your
  15.295 +project's history, and you still won't be able to \hgcmd{update} your
  15.296 +working directory to that changeset on a Windows or MacOS system, but
  15.297 +you can continue development unimpeded.
  15.298 +
  15.299 +\begin{note}
  15.300 +  Prior to version~0.9.3, Mercurial did not use a case safe repository
  15.301 +  storage mechanism, and did not detect case folding conflicts.  If
  15.302 +  you are using an older version of Mercurial on Windows or MacOS, I
  15.303 +  strongly recommend that you upgrade.
  15.304 +\end{note}
  15.305 +
  15.306 +%%% Local Variables: 
  15.307 +%%% mode: latex
  15.308 +%%% TeX-master: "00book"
  15.309 +%%% End: 

    16.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    16.2 +++ b/en/ch08-branch.tex	Thu Jan 29 22:56:27 2009 -0800
    16.3 @@ -0,0 +1,392 @@
    16.4 +\chapter{Managing releases and branchy development}
    16.5 +\label{chap:branch}
    16.6 +
    16.7 +Mercurial provides several mechanisms for you to manage a project that
    16.8 +is making progress on multiple fronts at once.  To understand these
    16.9 +mechanisms, let's first take a brief look at a fairly normal software
   16.10 +project structure.
   16.11 +
   16.12 +Many software projects issue periodic ``major'' releases that contain
   16.13 +substantial new features.  In parallel, they may issue ``minor''
   16.14 +releases.  These are usually identical to the major releases off which
   16.15 +they're based, but with a few bugs fixed.
   16.16 +
   16.17 +In this chapter, we'll start by talking about how to keep records of
   16.18 +project milestones such as releases.  We'll then continue on to talk
   16.19 +about the flow of work between different phases of a project, and how
   16.20 +Mercurial can help you to isolate and manage this work.
   16.21 +
   16.22 +\section{Giving a persistent name to a revision}
   16.23 +
   16.24 +Once you decide that you'd like to call a particular revision a
   16.25 +``release'', it's a good idea to record the identity of that revision.
   16.26 +This will let you reproduce that release at a later date, for whatever
   16.27 +purpose you might need at the time (reproducing a bug, porting to a
   16.28 +new platform, etc).
   16.29 +\interaction{tag.init}
   16.30 +
   16.31 +Mercurial lets you give a permanent name to any revision using the
   16.32 +\hgcmd{tag} command.  Not surprisingly, these names are called
   16.33 +``tags''.
   16.34 +\interaction{tag.tag}
   16.35 +
   16.36 +A tag is nothing more than a ``symbolic name'' for a revision.  Tags
   16.37 +exist purely for your convenience, so that you have a handy permanent
   16.38 +way to refer to a revision; Mercurial doesn't interpret the tag names
   16.39 +you use in any way.  Neither does Mercurial place any restrictions on
   16.40 +the name of a tag, beyond a few that are necessary to ensure that a
   16.41 +tag can be parsed unambiguously.  A tag name cannot contain any of the
   16.42 +following characters:
   16.43 +\begin{itemize}
   16.44 +\item Colon (ASCII 58, ``\texttt{:}'')
   16.45 +\item Carriage return (ASCII 13, ``\Verb+\r+'')
   16.46 +\item Newline (ASCII 10, ``\Verb+\n+'')
   16.47 +\end{itemize}
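
If a script of yours generates tag names, it can check them against this list first.  The Python sketch below tests only for the three characters listed above; \texttt{valid\_tag\_name} is a hypothetical helper, and Mercurial may apply further checks of its own.

```python
# The characters the text lists as forbidden in a tag name.
FORBIDDEN = (":", "\r", "\n")


def valid_tag_name(name):
    """Return True if `name` contains none of the forbidden characters."""
    return not any(ch in name for ch in FORBIDDEN)


print(valid_tag_name("v1.0"))
print(valid_tag_name("release:1.0"))
```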
   16.48 +
   16.49 +You can use the \hgcmd{tags} command to display the tags present in
   16.50 +your repository.  In the output, each tagged revision is identified
   16.51 +first by its name, then by revision number, and finally by the unique
   16.52 +hash of the revision.  
   16.53 +\interaction{tag.tags}
   16.54 +Notice that \texttt{tip} is listed in the output of \hgcmd{tags}.  The
   16.55 +\texttt{tip} tag is a special ``floating'' tag, which always
   16.56 +identifies the newest revision in the repository.
   16.57 +
   16.58 +In the output of the \hgcmd{tags} command, tags are listed in reverse
   16.59 +order, by revision number.  This usually means that recent tags are
   16.60 +listed before older tags.  It also means that \texttt{tip} is always
   16.61 +going to be the first tag listed in the output of \hgcmd{tags}.
   16.62 +
   16.63 +When you run \hgcmd{log}, if it displays a revision that has tags
   16.64 +associated with it, it will print those tags.
   16.65 +\interaction{tag.log}
   16.66 +
   16.67 +Any time you need to provide a revision~ID to a Mercurial command, the
   16.68 +command will accept a tag name in its place.  Internally, Mercurial
   16.69 +will translate your tag name into the corresponding revision~ID, then
   16.70 +use that.
   16.71 +\interaction{tag.log.v1.0}
   16.72 +
   16.73 +There's no limit on the number of tags you can have in a repository,
   16.74 +or on the number of tags that a single revision can have.  As a
   16.75 +practical matter, it's not a great idea to have ``too many'' (a number
   16.76 +which will vary from project to project), simply because tags are
   16.77 +supposed to help you to find revisions.  If you have lots of tags, the
   16.78 +ease of using them to identify revisions diminishes rapidly.
   16.79 +
   16.80 +For example, if your project has milestones as frequent as every few
   16.81 +days, it's perfectly reasonable to tag each one of those.  But if you
   16.82 +have a continuous build system that makes sure every revision can be
   16.83 +built cleanly, you'd be introducing a lot of noise if you were to tag
   16.84 +every clean build.  Instead, you could tag failed builds (on the
   16.85 +assumption that they're rare!), or simply not use tags to track
   16.86 +buildability.
   16.87 +
   16.88 +If you want to remove a tag that you no longer want, use
   16.89 +\hgcmdargs{tag}{--remove}.  
   16.90 +\interaction{tag.remove}
   16.91 +You can also modify a tag at any time, so that it identifies a
   16.92 +different revision, by simply issuing a new \hgcmd{tag} command.
   16.93 +You'll have to use the \hgopt{tag}{-f} option to tell Mercurial that
   16.94 +you \emph{really} want to update the tag.
   16.95 +\interaction{tag.replace}
   16.96 +There will still be a permanent record of the previous identity of the
   16.97 +tag, but Mercurial will no longer use it.  There's thus no penalty to
   16.98 +tagging the wrong revision; all you have to do is turn around and tag
   16.99 +the correct revision once you discover your error.
  16.100 +
  16.101 +Mercurial stores tags in a normal revision-controlled file in your
  16.102 +repository.  If you've created any tags, you'll find them in a file
  16.103 +named \sfilename{.hgtags}.  When you run the \hgcmd{tag} command,
  16.104 +Mercurial modifies this file, then automatically commits the change to
  16.105 +it.  This means that every time you run \hgcmd{tag}, you'll see a
  16.106 +corresponding changeset in the output of \hgcmd{log}.
  16.107 +\interaction{tag.tip}
  16.108 +
  16.109 +\subsection{Handling tag conflicts during a merge}
  16.110 +
  16.111 +You won't often need to care about the \sfilename{.hgtags} file, but
  16.112 +it sometimes makes its presence known during a merge.  The format of
  16.113 +the file is simple: it consists of a series of lines.  Each line
  16.114 +starts with a changeset hash, followed by a space, followed by the
  16.115 +name of a tag.
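
As a rough illustration of the format just described, here is a minimal Python sketch that reads \sfilename{.hgtags}-style text into a dictionary.  The function name \texttt{parse\_hgtags} and the sample hashes are made up for this example, and letting a later line for the same tag replace an earlier one is an assumption of the sketch; this is not Mercurial's own parser.

```python
def parse_hgtags(text):
    """Map each tag name to the changeset hash found on its line."""
    tags = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # Each line is: changeset hash, one space, tag name.
        # Split on the first space only, so the rest of the line is the name.
        changeset_hash, tag_name = line.split(" ", 1)
        # Assumption for this sketch: a later line for a tag replaces an earlier one.
        tags[tag_name] = changeset_hash
    return tags


# Hypothetical file contents; the hashes are invented for illustration.
sample = (
    "4237e45506ee0a2bd7e6dc5e0f9b0f3e6c9b1a2d v2.0.2\n"
    "0123456789abcdef0123456789abcdef01234567 v1.0\n"
)
tags = parse_hgtags(sample)
```

A check like this can only tell you whether each line has the expected shape; it says nothing about whether the hashes name real changesets, which is why running \hgcmd{tags} after you commit is still the real test.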
  16.116 +
  16.117 +If you're resolving a conflict in the \sfilename{.hgtags} file during
  16.118 +a merge, there's one twist to modifying it:
  16.119 +when Mercurial is parsing the tags in a repository, it \emph{never}
  16.120 +reads the working copy of the \sfilename{.hgtags} file.  Instead, it
  16.121 +reads the \emph{most recently committed} revision of the file.
  16.122 +
  16.123 +An unfortunate consequence of this design is that you can't actually
  16.124 +verify that your merged \sfilename{.hgtags} file is correct until
  16.125 +\emph{after} you've committed a change.  So if you find yourself
  16.126 +resolving a conflict on \sfilename{.hgtags} during a merge, be sure to
  16.127 +run \hgcmd{tags} after you commit.  If it finds an error in the
  16.128 +\sfilename{.hgtags} file, it will report the location of the error,
  16.129 +which you can then fix and commit.  You should then run \hgcmd{tags}
  16.130 +again, just to be sure that your fix is correct.
  16.131 +
  16.132 +\subsection{Tags and cloning}
  16.133 +
  16.134 +You may have noticed that the \hgcmd{clone} command has a
  16.135 +\hgopt{clone}{-r} option that lets you clone an exact copy of the
  16.136 +repository as of a particular changeset.  The new clone will not
  16.137 +contain any project history that comes after the revision you
  16.138 +specified.  This has an interaction with tags that can surprise the
  16.139 +unwary.
  16.140 +
  16.141 +Recall that a tag is stored as a revision to the \sfilename{.hgtags}
  16.142 +file, so that when you create a tag, the changeset in which it's
  16.143 +recorded necessarily refers to an older changeset.  When you run
  16.144 +\hgcmdargs{clone}{-r foo} to clone a repository as of tag
  16.145 +\texttt{foo}, the new clone \emph{will not contain the history that
  16.146 +  created the tag} that you used to clone the repository.  The result
  16.147 +is that you'll get exactly the right subset of the project's history
  16.148 +in the new repository, but \emph{not} the tag you might have expected.
  16.149 +
  16.150 +\subsection{When permanent tags are too much}
  16.151 +
  16.152 +Since Mercurial's tags are revision controlled and carried around with
  16.153 +a project's history, everyone you work with will see the tags you
  16.154 +create.  But giving names to revisions has uses beyond simply noting
  16.155 +that revision \texttt{4237e45506ee} is really \texttt{v2.0.2}.  If
  16.156 +you're trying to track down a subtle bug, you might want a tag to
  16.157 +remind you of something like ``Anne saw the symptoms with this
  16.158 +revision''.
  16.159 +
  16.160 +For cases like this, what you might want to use are \emph{local} tags.
  16.161 +You can create a local tag with the \hgopt{tag}{-l} option to the
  16.162 +\hgcmd{tag} command.  This will store the tag in a file called
  16.163 +\sfilename{.hg/localtags}.  Unlike \sfilename{.hgtags},
  16.164 +\sfilename{.hg/localtags} is not revision controlled.  Any tags you
  16.165 +create using \hgopt{tag}{-l} remain strictly local to the repository
  16.166 +you're currently working in.
  16.167 +
  16.168 +\section{The flow of changes---big picture vs. little}
  16.169 +
  16.170 +To return to the outline I sketched at the beginning of the chapter,
  16.171 +let's think about a project that has multiple concurrent pieces of
  16.172 +work under development at once.
  16.173 +
  16.174 +There might be a push for a new ``main'' release; a new minor bugfix
  16.175 +release to the last main release; and an unexpected ``hot fix'' to an
  16.176 +old release that is now in maintenance mode.
  16.177 +
  16.178 +The usual way people refer to these different concurrent directions of
  16.179 +development is as ``branches''.  However, we've already seen numerous
  16.180 +times that Mercurial treats \emph{all of history} as a series of
  16.181 +branches and merges.  Really, what we have here is two ideas that are
  16.182 +peripherally related, but which happen to share a name.
  16.183 +\begin{itemize}
  16.184 +\item ``Big picture'' branches represent the sweep of a project's
  16.185 +  evolution; people give them names, and talk about them in
  16.186 +  conversation.
  16.187 +\item ``Little picture'' branches are artefacts of the day-to-day
  16.188 +  activity of developing and merging changes.  They expose the
  16.189 +  narrative of how the code was developed.
  16.190 +\end{itemize}
  16.191 +
  16.192 +\section{Managing big-picture branches in repositories}
  16.193 +
  16.194 +The easiest way to isolate a ``big picture'' branch in Mercurial is in
  16.195 +a dedicated repository.  If you have an existing shared
  16.196 +repository---let's call it \texttt{myproject}---that reaches a ``1.0''
  16.197 +milestone, you can start to prepare for future maintenance releases on
  16.198 +top of version~1.0 by tagging the revision from which you prepared
  16.199 +the~1.0 release.
  16.200 +\interaction{branch-repo.tag}
  16.201 +You can then clone a new shared \texttt{myproject-1.0.1} repository as
  16.202 +of that tag.
  16.203 +\interaction{branch-repo.clone}
  16.204 +
  16.205 +Afterwards, if someone needs to work on a bug fix that ought to go
  16.206 +into an upcoming~1.0.1 minor release, they clone the
  16.207 +\texttt{myproject-1.0.1} repository, make their changes, and push them
  16.208 +back.
  16.209 +\interaction{branch-repo.bugfix}
  16.210 +Meanwhile, development for the next major release can continue,
  16.211 +isolated and unabated, in the \texttt{myproject} repository.
  16.212 +\interaction{branch-repo.new}
  16.213 +
  16.214 +\section{Don't repeat yourself: merging across branches}
  16.215 +
  16.216 +In many cases, if you have a bug to fix on a maintenance branch, the
  16.217 +chances are good that the bug exists on your project's main branch
  16.218 +(and possibly other maintenance branches, too).  It's a rare developer
  16.219 +who wants to fix the same bug multiple times, so let's look at a few
  16.220 +ways that Mercurial can help you to manage these bugfixes without
  16.221 +duplicating your work.
  16.222 +
  16.223 +In the simplest instance, all you need to do is pull changes from your
  16.224 +maintenance branch into your local clone of the target branch.
  16.225 +\interaction{branch-repo.pull}
  16.226 +You'll then need to merge the heads of the two branches, and push back
  16.227 +to the main branch.
  16.228 +\interaction{branch-repo.merge}
  16.229 +
  16.230 +\section{Naming branches within one repository}
  16.231 +
  16.232 +In most instances, isolating branches in repositories is the right
  16.233 +approach.  Its simplicity makes it easy to understand, and so it's
  16.234 +hard to make mistakes.  There's a one-to-one relationship between
  16.235 +branches you're working in and directories on your system.  This lets
  16.236 +you use normal (non-Mercurial-aware) tools to work on files within a
  16.237 +branch/repository.
  16.238 +
  16.239 +If you're more in the ``power user'' category (\emph{and} your
  16.240 +collaborators are too), there is an alternative way of handling
  16.241 +branches that you can consider.  I've already mentioned the
  16.242 +human-level distinction between ``little picture'' and ``big picture''
  16.243 +branches.  While Mercurial works with multiple ``little picture''
  16.244 +branches in a repository all the time (for example after you pull
  16.245 +changes in, but before you merge them), it can \emph{also} work with
  16.246 +multiple ``big picture'' branches.
  16.247 +
  16.248 +The key to working this way is that Mercurial lets you assign a
  16.249 +persistent \emph{name} to a branch.  There always exists a branch
  16.250 +named \texttt{default}.  Even before you start naming branches
  16.251 +yourself, you can find traces of the \texttt{default} branch if you
  16.252 +look for them.
  16.253 +
  16.254 +As an example, when you run the \hgcmd{commit} command, and it pops up
  16.255 +your editor so that you can enter a commit message, look for a line
  16.256 +that contains the text ``\texttt{HG: branch default}'' at the bottom.
  16.257 +This is telling you that your commit will occur on the branch named
  16.258 +\texttt{default}.
  16.259 +
  16.260 +To start working with named branches, use the \hgcmd{branches}
  16.261 +command.  This command lists the named branches already present in
  16.262 +your repository, telling you which changeset is the tip of each.
  16.263 +\interaction{branch-named.branches}
  16.264 +Since you haven't created any named branches yet, the only one that
  16.265 +exists is \texttt{default}.
  16.266 +
  16.267 +To find out what the ``current'' branch is, run the \hgcmd{branch}
  16.268 +command, giving it no arguments.  This tells you what branch the
  16.269 +parent of the current changeset is on.
  16.270 +\interaction{branch-named.branch}
  16.271 +
  16.272 +To create a new branch, run the \hgcmd{branch} command again.  This
  16.273 +time, give it one argument: the name of the branch you want to create.
  16.274 +\interaction{branch-named.create}
  16.275 +
  16.276 +After you've created a branch, you might wonder what effect the
  16.277 +\hgcmd{branch} command has had.  What do the \hgcmd{status} and
  16.278 +\hgcmd{tip} commands report?
  16.279 +\interaction{branch-named.status}
  16.280 +Nothing has changed in the working directory, and there's been no new
  16.281 +history created.  As this suggests, running the \hgcmd{branch} command
  16.282 +has no permanent effect; it only tells Mercurial what branch name to
  16.283 +use the \emph{next} time you commit a changeset.
  16.284 +
  16.285 +When you commit a change, Mercurial records the name of the branch on
  16.286 +which you committed.  Once you've switched from the \texttt{default}
  16.287 +branch to another and committed, you'll see the name of the new branch
  16.288 +show up in the output of \hgcmd{log}, \hgcmd{tip}, and other commands
  16.289 +that display the same kind of output.
  16.290 +\interaction{branch-named.commit}
  16.291 +The \hgcmd{log}-like commands will print the branch name of every
  16.292 +changeset that's not on the \texttt{default} branch.  As a result, if
  16.293 +you never use named branches, you'll never see this information.
  16.294 +
  16.295 +Once you've named a branch and committed a change with that name,
  16.296 +every subsequent commit that descends from that change will inherit
  16.297 +the same branch name.  You can change the name of a branch at any
  16.298 +time, using the \hgcmd{branch} command.  
  16.299 +\interaction{branch-named.rebranch}
  16.300 +In practice, this is something you won't do very often, as branch
  16.301 +names tend to have fairly long lifetimes.  (This isn't a rule, just an
  16.302 +observation.)
  16.303 +
  16.304 +\section{Dealing with multiple named branches in a repository}
  16.305 +
  16.306 +If you have more than one named branch in a repository, Mercurial will
  16.307 +remember the branch that your working directory is on when you start a
  16.308 +command like \hgcmd{update} or \hgcmdargs{pull}{-u}.  It will update
  16.309 +the working directory to the tip of this branch, no matter what the
  16.310 +``repo-wide'' tip is.  To update to a revision that's on a different
  16.311 +named branch, you may need to use the \hgopt{update}{-C} option to
  16.312 +\hgcmd{update}.
  16.313 +
  16.314 +This behaviour is a little subtle, so let's see it in action.  First,
  16.315 +let's remind ourselves what branch we're currently on, and what
  16.316 +branches are in our repository.
  16.317 +\interaction{branch-named.parents}
  16.318 +We're on the \texttt{bar} branch, but there also exists an older
  16.319 +\texttt{foo} branch.
  16.320 +
  16.321 +We can \hgcmd{update} back and forth between the tips of the
  16.322 +\texttt{foo} and \texttt{bar} branches without needing to use the
  16.323 +\hgopt{update}{-C} option, because this only involves going backwards
  16.324 +and forwards linearly through our change history.
  16.325 +\interaction{branch-named.update-switchy}
  16.326 +
  16.327 +If we go back to the \texttt{foo} branch and then run \hgcmd{update},
  16.328 +it will keep us on \texttt{foo}, not move us to the tip of
  16.329 +\texttt{bar}.
  16.330 +\interaction{branch-named.update-nothing}
  16.331 +
  16.332 +Committing a new change on the \texttt{foo} branch introduces a new
  16.333 +head.
  16.334 +\interaction{branch-named.foo-commit}
  16.335 +
  16.336 +\section{Branch names and merging}
  16.337 +
  16.338 +As you've probably noticed, merges in Mercurial are not symmetrical.
  16.339 +Let's say our repository has two heads, 17 and 23.  If I
  16.340 +\hgcmd{update} to 17 and then \hgcmd{merge} with 23, Mercurial records
  16.341 +17 as the first parent of the merge, and 23 as the second.  Whereas if
  16.342 +I \hgcmd{update} to 23 and then \hgcmd{merge} with 17, it records 23
  16.343 +as the first parent, and 17 as the second.
  16.344 +
  16.345 +This affects Mercurial's choice of branch name when you merge.  After
  16.346 +a merge, Mercurial will retain the branch name of the first parent
  16.347 +when you commit the result of the merge.  If your first parent's
  16.348 +branch name is \texttt{foo}, and you merge with \texttt{bar}, the
  16.349 +branch name will still be \texttt{foo} after you merge.
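
The rule can be stated in a few lines of Python.  This is only a toy model of the behaviour described above: the \texttt{Changeset} tuple and the function \texttt{merge\_branch\_name} are hypothetical names invented for the example, not part of Mercurial.

```python
from collections import namedtuple

# A toy changeset: just a revision number and the branch name it carries.
Changeset = namedtuple("Changeset", ["rev", "branch"])


def merge_branch_name(first_parent, second_parent):
    """Return the branch name a merge commit keeps: that of its first parent."""
    # The second parent's branch name plays no part in the choice.
    return first_parent.branch


foo_head = Changeset(rev=17, branch="foo")
bar_head = Changeset(rev=23, branch="bar")

# Updating to 17 and merging with 23 makes 17 the first parent.
name_when_on_foo = merge_branch_name(foo_head, bar_head)
# Updating to 23 and merging with 17 makes 23 the first parent.
name_when_on_bar = merge_branch_name(bar_head, foo_head)
```

The order of the arguments mirrors the order of the \hgcmd{update} and \hgcmd{merge} commands: whichever head you updated to first decides the name.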
  16.350 +
  16.351 +It's not unusual for a repository to contain multiple heads, each with
  16.352 +the same branch name.  Let's say I'm working on the \texttt{foo}
  16.353 +branch, and so are you.  We commit different changes; I pull your
  16.354 +changes; I now have two heads, each claiming to be on the \texttt{foo}
  16.355 +branch.  The result of a merge will be a single head on the
  16.356 +\texttt{foo} branch, as you might hope.
  16.357 +
  16.358 +But if I'm working on the \texttt{bar} branch, and I merge work from
  16.359 +the \texttt{foo} branch, the result will remain on the \texttt{bar}
  16.360 +branch.
  16.361 +\interaction{branch-named.merge}
  16.362 +
  16.363 +To give a more concrete example, if I'm working on the
  16.364 +\texttt{bleeding-edge} branch, and I want to bring in the latest fixes
  16.365 +from the \texttt{stable} branch, Mercurial will choose the ``right''
  16.366 +(\texttt{bleeding-edge}) branch name when I pull and merge from
  16.367 +\texttt{stable}.
  16.368 +
  16.369 +\section{Branch naming is generally useful}
  16.370 +
  16.371 +You shouldn't think of named branches as applicable only to situations
  16.372 +where you have multiple long-lived branches cohabiting in a single
  16.373 +repository.  They're very useful even in the one-branch-per-repository
  16.374 +case.  
  16.375 +
  16.376 +In the simplest case, giving a name to each branch gives you a
  16.377 +permanent record of which branch a changeset originated on.  This
  16.378 +gives you more context when you're trying to follow the history of a
  16.379 +long-lived branchy project.
  16.380 +
  16.381 +If you're working with shared repositories, you can set up a
  16.382 +\hook{pretxnchangegroup} hook on each that will block incoming changes
  16.383 +that have the ``wrong'' branch name.  This provides a simple, but
  16.384 +effective, defence against people accidentally pushing changes from a
  16.385 +``bleeding edge'' branch to a ``stable'' branch.  Such a hook might
  16.386 +look like this inside the shared repo's \hgrc.
  16.387 +\begin{codesample2}
  16.388 +  [hooks]
  16.389 +  pretxnchangegroup.branch = hg heads --template '{branches} ' | grep mybranch
  16.390 +\end{codesample2}
  16.391 +
  16.392 +%%% Local Variables: 
  16.393 +%%% mode: latex
  16.394 +%%% TeX-master: "00book"
  16.395 +%%% End: 

    17.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    17.2 +++ b/en/ch09-undo.tex	Thu Jan 29 22:56:27 2009 -0800
    17.3 @@ -0,0 +1,767 @@
    17.4 +\chapter{Finding and fixing your mistakes}
    17.5 +\label{chap:undo}
    17.6 +
    17.7 +To err might be human, but to really handle the consequences well
    17.8 +takes a top-notch revision control system.  In this chapter, we'll
    17.9 +discuss some of the techniques you can use when you find that a
   17.10 +problem has crept into your project.  Mercurial has some highly
   17.11 +capable features that will help you to isolate the sources of
   17.12 +problems, and to handle them appropriately.
   17.13 +
   17.14 +\section{Erasing local history}
   17.15 +
   17.16 +\subsection{The accidental commit}
   17.17 +
   17.18 +I have the occasional but persistent problem of typing rather more
   17.19 +quickly than I can think, which sometimes results in me committing a
   17.20 +changeset that is either incomplete or plain wrong.  In my case, the
   17.21 +usual kind of incomplete changeset is one in which I've created a new
   17.22 +source file, but forgotten to \hgcmd{add} it.  A ``plain wrong''
   17.23 +changeset is not as common, but no less annoying.
   17.24 +
   17.25 +\subsection{Rolling back a transaction}
   17.26 +\label{sec:undo:rollback}
   17.27 +
   17.28 +In section~\ref{sec:concepts:txn}, I mentioned that Mercurial treats
   17.29 +each modification of a repository as a \emph{transaction}.  Every time
   17.30 +you commit a changeset or pull changes from another repository,
   17.31 +Mercurial remembers what you did.  You can undo, or \emph{roll back},
   17.32 +exactly one of these actions using the \hgcmd{rollback} command.  (See
   17.33 +section~\ref{sec:undo:rollback-after-push} for an important caveat
   17.34 +about the use of this command.)
   17.35 +
   17.36 +Here's a mistake that I often find myself making: committing a change
   17.37 +in which I've created a new file, but forgotten to \hgcmd{add} it.
   17.38 +\interaction{rollback.commit}
   17.39 +Looking at the output of \hgcmd{status} after the commit immediately
   17.40 +confirms the error.
   17.41 +\interaction{rollback.status}
   17.42 +The commit captured the changes to the file \filename{a}, but not the
   17.43 +new file \filename{b}.  If I were to push this changeset to a
   17.44 +repository that I shared with a colleague, the chances are high that
   17.45 +something in \filename{a} would refer to \filename{b}, which would not
   17.46 +be present in their repository when they pulled my changes.  I would
   17.47 +thus become the object of some indignation.
   17.48 +
   17.49 +However, luck is with me---I've caught my error before I pushed the
   17.50 +changeset.  I use the \hgcmd{rollback} command, and Mercurial makes
   17.51 +that last changeset vanish.
   17.52 +\interaction{rollback.rollback}
   17.53 +Notice that the changeset is no longer present in the repository's
   17.54 +history, and the working directory once again thinks that the file
   17.55 +\filename{a} is modified.  The commit and rollback have left the
   17.56 +working directory exactly as it was prior to the commit; the changeset
   17.57 +has been completely erased.  I can now safely \hgcmd{add} the file
   17.58 +\filename{b}, and rerun my commit.
   17.59 +\interaction{rollback.add}
   17.60 +
   17.61 +\subsection{The erroneous pull}
   17.62 +
   17.63 +It's common practice with Mercurial to maintain separate development
   17.64 +branches of a project in different repositories.  Your development
   17.65 +team might have one shared repository for your project's ``0.9''
   17.66 +release, and another, containing different changes, for the ``1.0''
   17.67 +release.
   17.68 +
   17.69 +Given this, you can imagine that the consequences could be messy if
   17.70 +you had a local ``0.9'' repository, and accidentally pulled changes
   17.71 +from the shared ``1.0'' repository into it.  At worst, you could be
   17.72 +paying insufficient attention, and push those changes into the shared
   17.73 +``0.9'' tree, confusing your entire team (but don't worry, we'll
   17.74 +return to this horror scenario later).  However, it's more likely that
   17.75 +you'll notice immediately, because Mercurial will display the URL it's
   17.76 +pulling from, or you will see it pull a suspiciously large number of
   17.77 +changes into the repository.
   17.78 +
   17.79 +The \hgcmd{rollback} command will work nicely to expunge all of the
   17.80 +changesets that you just pulled.  Mercurial groups all changes from
   17.81 +one \hgcmd{pull} into a single transaction, so one \hgcmd{rollback} is
   17.82 +all you need to undo this mistake.
   17.83 +
   17.84 +\subsection{Rolling back is useless once you've pushed}
   17.85 +\label{sec:undo:rollback-after-push}
   17.86 +
   17.87 +The value of the \hgcmd{rollback} command drops to zero once you've
   17.88 +pushed your changes to another repository.  Rolling back a change
   17.89 +makes it disappear entirely, but \emph{only} in the repository in
   17.90 +which you perform the \hgcmd{rollback}.  Because a rollback eliminates
   17.91 +history, there's no way for the disappearance of a change to propagate
   17.92 +between repositories.
   17.93 +
   17.94 +If you've pushed a change to another repository---particularly if it's
   17.95 +a shared repository---it has essentially ``escaped into the wild,''
   17.96 +and you'll have to recover from your mistake in a different way.  If
   17.97 +you push a changeset somewhere, then roll it back, then pull from the
   17.98 +repository you pushed to, the changeset will simply reappear in your
   17.99 +repository.
  17.100 +
  17.101 +(If you absolutely know for sure that the change you want to roll back
  17.102 +is the most recent change in the repository that you pushed to,
  17.103 +\emph{and} you know that nobody else could have pulled it from that
  17.104 +repository, you can roll back the changeset there, too, but you really
  17.105 +should not rely on this working reliably.  If you do this,
  17.106 +sooner or later a change really will make it into a repository that
  17.107 +you don't directly control (or have forgotten about), and come back to
  17.108 +bite you.)
  17.109 +
  17.110 +\subsection{You can only roll back once}
  17.111 +
  17.112 +Mercurial stores exactly one transaction in its transaction log; that
  17.113 +transaction is the most recent one that occurred in the repository.
  17.114 +This means that you can only roll back one transaction.  If you expect
  17.115 +to be able to roll back one transaction, then its predecessor, this is
  17.116 +not the behaviour you will get.
  17.117 +\interaction{rollback.twice}
  17.118 +Once you've rolled back one transaction in a repository, you can't
  17.119 +roll back again in that repository until you perform another commit or
  17.120 +pull.
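
To make the ``one transaction only'' behaviour concrete, here is a toy Python model.  The class \texttt{ToyRepo} and its methods are invented for this sketch and say nothing about how Mercurial stores its transaction log; the model only captures the two facts described above, namely that a pull of several changesets counts as one transaction, and that only the most recent transaction can be undone.

```python
class ToyRepo:
    """A toy repository that remembers only its most recent transaction."""

    def __init__(self):
        self.history = []
        # Length of the history before the last transaction, or None
        # if there is nothing left to roll back.
        self._last_txn_start = None

    def transact(self, changesets):
        """Record a commit or pull; all its changesets form one transaction."""
        self._last_txn_start = len(self.history)
        self.history.extend(changesets)

    def rollback(self):
        """Undo the most recent transaction; return False if there is none."""
        if self._last_txn_start is None:
            return False
        del self.history[self._last_txn_start:]
        self._last_txn_start = None  # the single slot is now used up
        return True


repo = ToyRepo()
repo.transact(["a"])        # a commit
repo.transact(["b", "c"])   # a pull that brings in two changesets
first = repo.rollback()     # undoes the whole pull
second = repo.rollback()    # nothing left to undo
```

After the first rollback the commit of \texttt{a} is still in the history, but the model has no memory of how it got there, so the second rollback has nothing to work with.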
  17.121 +
  17.122 +\section{Reverting the mistaken change}
  17.123 +
  17.124 +If you make a modification to a file, and decide that you really
  17.125 +didn't want to change the file at all, and you haven't yet committed
  17.126 +your changes, the \hgcmd{revert} command is the one you'll need.  It
  17.127 +looks at the changeset that's the parent of the working directory, and
  17.128 +restores the contents of the file to their state as of that changeset.
  17.129 +(That's a long-winded way of saying that, in the normal case, it
  17.130 +undoes your modifications.)
  17.131 +
  17.132 +Let's illustrate how the \hgcmd{revert} command works with yet another
  17.133 +small example.  We'll begin by modifying a file that Mercurial is
  17.134 +already tracking.
  17.135 +\interaction{daily.revert.modify}
  17.136 +If we don't want that change, we can simply \hgcmd{revert} the file.
  17.137 +\interaction{daily.revert.unmodify}
  17.138 +The \hgcmd{revert} command provides us with an extra degree of safety
  17.139 +by saving our modified file with a \filename{.orig} extension.
  17.140 +\interaction{daily.revert.status}
  17.141 +
  17.142 +Here is a summary of the cases that the \hgcmd{revert} command can
  17.143 +deal with.  We will describe each of these in more detail in the
  17.144 +section that follows.
  17.145 +\begin{itemize}
  17.146 +\item If you modify a file, it will restore the file to its unmodified
  17.147 +  state.
  17.148 +\item If you \hgcmd{add} a file, it will undo the ``added'' state of
  17.149 +  the file, but leave the file itself untouched.
  17.150 +\item If you delete a file without telling Mercurial, it will restore
  17.151 +  the file to its unmodified contents.
  17.152 +\item If you use the \hgcmd{remove} command to remove a file, it will
  17.153 +  undo the ``removed'' state of the file, and restore the file to its
  17.154 +  unmodified contents.
  17.155 +\end{itemize}
  17.156 +
  17.157 +\subsection{File management errors}
  17.158 +\label{sec:undo:mgmt}
  17.159 +
  17.160 +The \hgcmd{revert} command is useful for more than just modified
  17.161 +files.  It lets you reverse the results of all of Mercurial's file
  17.162 +management commands---\hgcmd{add}, \hgcmd{remove}, and so on.
  17.163 +
  17.164 +If you \hgcmd{add} a file, then decide that in fact you don't want
  17.165 +Mercurial to track it, use \hgcmd{revert} to undo the add.  Don't
  17.166 +worry; Mercurial will not modify the file in any way.  It will just
  17.167 +``unmark'' the file.
  17.168 +\interaction{daily.revert.add}
  17.169 +
  17.170 +Similarly, if you ask Mercurial to \hgcmd{remove} a file, you can use
  17.171 +\hgcmd{revert} to restore it to the contents it had as of the parent
  17.172 +of the working directory.
  17.173 +\interaction{daily.revert.remove}
  17.174 +This works just as well for a file that you deleted by hand, without
  17.175 +telling Mercurial (recall that in Mercurial terminology, this kind of
  17.176 +file is called ``missing'').
  17.177 +\interaction{daily.revert.missing}
  17.178 +
  17.179 +If you revert a \hgcmd{copy}, the copied-to file remains in your
  17.180 +working directory afterwards, untracked.  Since a copy doesn't affect
  17.181 +the copied-from file in any way, Mercurial doesn't do anything with
  17.182 +the copied-from file.
  17.183 +\interaction{daily.revert.copy}
  17.184 +
  17.185 +\subsubsection{A slightly special case: reverting a rename}
  17.186 +
  17.187 +If you \hgcmd{rename} a file, there is one small detail that
  17.188 +you should remember.  When you \hgcmd{revert} a rename, it's not
  17.189 +enough to provide the name of the renamed-to file, as you can see
  17.190 +here.
  17.191 +\interaction{daily.revert.rename}
  17.192 +As you can see from the output of \hgcmd{status}, the renamed-to file
  17.193 +is no longer identified as added, but the renamed-\emph{from} file is
  17.194 +still removed!  This is counter-intuitive (at least to me), but
  17.195 +it's easy to deal with.
  17.196 +\interaction{daily.revert.rename-orig}
  17.197 +So remember, to revert a \hgcmd{rename}, you must provide \emph{both}
  17.198 +the source and destination names.  
  17.199 +
  17.200 +% TODO: the output doesn't look like it will be removed!
  17.201 +
  17.202 +(By the way, if you rename a file, then modify the renamed-to file,
  17.203 +then revert both components of the rename, when Mercurial restores the
  17.204 +file that was removed as part of the rename, it will be unmodified.
  17.205 +If you need the modifications in the renamed-to file to show up in the
  17.206 +renamed-from file, don't forget to copy them over.)
  17.207 +
  17.208 +These fiddly aspects of reverting a rename arguably constitute a small
  17.209 +bug in Mercurial.
  17.210 +
  17.211 +\section{Dealing with committed changes}
  17.212 +
  17.213 +Consider a case where you have committed a change $a$, and another
  17.214 +change $b$ on top of it; you then realise that change $a$ was
  17.215 +incorrect.  Mercurial lets you ``back out'' an entire changeset
  17.216 +automatically, and provides building blocks that let you reverse part of a
  17.217 +changeset by hand.
  17.218 +
  17.219 +Before you read this section, here's something to keep in mind: the
  17.220 +\hgcmd{backout} command undoes changes by \emph{adding} history, not
  17.221 +by modifying or erasing it.  It's the right tool to use if you're
  17.222 +fixing bugs, but not if you're trying to undo some change that has
  17.223 +catastrophic consequences.  To deal with those, see
  17.224 +section~\ref{sec:undo:aaaiiieee}.
  17.225 +
  17.226 +\subsection{Backing out a changeset}
  17.227 +
  17.228 +The \hgcmd{backout} command lets you ``undo'' the effects of an entire
  17.229 +changeset in an automated fashion.  Because Mercurial's history is
  17.230 +immutable, this command \emph{does not} get rid of the changeset you
  17.231 +want to undo.  Instead, it creates a new changeset that
  17.232 +\emph{reverses} the effect of the to-be-undone changeset.
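
The idea of undoing a change by \emph{adding} history can be sketched with a toy model in Python.  Everything here is hypothetical: a changeset is reduced to an ``add this line'' or ``remove this line'' pair, and the functions \texttt{replay} and \texttt{backout} are names made up for the example.  The sketch leaves out merging entirely.

```python
def replay(history):
    """Rebuild the file's lines by applying every changeset in order."""
    lines = []
    for action, text in history:
        if action == "add":
            lines.append(text)
        else:  # "remove"
            lines.remove(text)
    return lines


def backout(history, index):
    """Return a new history with one extra changeset reversing history[index]."""
    action, text = history[index]
    inverse = "remove" if action == "add" else "add"
    # Existing entries are never edited or deleted; we only append.
    return history + [(inverse, text)]


history = [
    ("add", "first change"),
    ("add", "second change"),
    ("add", "third change"),
]
backed_out = backout(history, 1)
contents = replay(backed_out)
```

The original three changesets are all still present in \texttt{backed\_out}; the file simply no longer contains the second change once the whole history has been replayed.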
  17.233 +
  17.234 +The operation of the \hgcmd{backout} command is a little intricate, so
  17.235 +let's illustrate it with some examples.  First, we'll create a
  17.236 +repository with some simple changes.
  17.237 +\interaction{backout.init}
  17.238 +
  17.239 +The \hgcmd{backout} command takes a single changeset ID as its
  17.240 +argument; this is the changeset to back out.  Normally,
  17.241 +\hgcmd{backout} will drop you into a text editor to write a commit
  17.242 +message, so you can record why you're backing the change out.  In this
  17.243 +example, we provide a commit message on the command line using the
  17.244 +\hgopt{backout}{-m} option.
  17.245 +
  17.246 +\subsection{Backing out the tip changeset}
  17.247 +
  17.248 +We're going to start by backing out the last changeset we committed.
  17.249 +\interaction{backout.simple}
  17.250 +You can see that the second line from \filename{myfile} is no longer
  17.251 +present.  Taking a look at the output of \hgcmd{log} gives us an idea
  17.252 +of what the \hgcmd{backout} command has done.
  17.253 +\interaction{backout.simple.log}
  17.254 +Notice that the new changeset that \hgcmd{backout} has created is a
  17.255 +child of the changeset we backed out.  It's easier to see this in
  17.256 +figure~\ref{fig:undo:backout}, which presents a graphical view of the
  17.257 +change history.  As you can see, the history is nice and linear.
  17.258 +
  17.259 +\begin{figure}[htb]
  17.260 +  \centering
  17.261 +  \grafix{undo-simple}
  17.262 +  \caption{Backing out a change using the \hgcmd{backout} command}
  17.263 +  \label{fig:undo:backout}
  17.264 +\end{figure}
  17.265 +
  17.266 +\subsection{Backing out a non-tip change}
  17.267 +
  17.268 +If you want to back out a change other than the last one you
  17.269 +committed, pass the \hgopt{backout}{--merge} option to the
  17.270 +\hgcmd{backout} command.
  17.271 +\interaction{backout.non-tip.clone}
  17.272 +This makes backing out any changeset a ``one-shot'' operation that's
  17.273 +usually simple and fast.
  17.274 +\interaction{backout.non-tip.backout}
  17.275 +
  17.276 +If you take a look at the contents of \filename{myfile} after the
  17.277 +backout finishes, you'll see that the first and third changes are
  17.278 +present, but not the second.
  17.279 +\interaction{backout.non-tip.cat}
  17.280 +
  17.281 +As the graphical history in figure~\ref{fig:undo:backout-non-tip}
  17.282 +illustrates, Mercurial actually commits \emph{two} changes in this
  17.283 +kind of situation (the box-shaped nodes are the ones that Mercurial
  17.284 +commits automatically).  Before Mercurial begins the backout process,
  17.285 +it first remembers what the current parent of the working directory
  17.286 +is.  It then backs out the target changeset, and commits that as a
  17.287 +changeset.  Finally, it merges back to the previous parent of the
  17.288 +working directory, and commits the result of the merge.
  17.289 +
  17.290 +% TODO: to me it looks like mercurial doesn't commit the second merge automatically!
  17.291 +
  17.292 +\begin{figure}[htb]
  17.293 +  \centering
  17.294 +  \grafix{undo-non-tip}
  17.295 +  \caption{Automated backout of a non-tip change using the \hgcmd{backout} command}
  17.296 +  \label{fig:undo:backout-non-tip}
  17.297 +\end{figure}
  17.298 +
  17.299 +The result is that you end up ``back where you were'', only with some
  17.300 +extra history that undoes the effect of the changeset you wanted to
  17.301 +back out.
  17.302 +
  17.303 +\subsubsection{Always use the \hgopt{backout}{--merge} option}
  17.304 +
  17.305 +In fact, since the \hgopt{backout}{--merge} option will do the ``right
  17.306 +thing'' whether or not the changeset you're backing out is the tip
  17.307 +(i.e.~it won't try to merge if it's backing out the tip, since there's
  17.308 +no need), you should \emph{always} use this option when you run the
  17.309 +\hgcmd{backout} command.
  17.310 +
  17.311 +\subsection{Gaining more control of the backout process}
  17.312 +
  17.313 +While I've recommended that you always use the
  17.314 +\hgopt{backout}{--merge} option when backing out a change, the
  17.315 +\hgcmd{backout} command lets you decide how to merge a backout
  17.316 +changeset.  Taking control of the backout process by hand is something
  17.317 +you will rarely need to do, but it can be useful to understand what
  17.318 +the \hgcmd{backout} command is doing for you automatically.  To
  17.319 +illustrate this, let's clone our first repository, but omit the
  17.320 +backout change that it contains.
  17.321 +
  17.322 +\interaction{backout.manual.clone}
  17.323 +As with our earlier example, we'll commit a third changeset, then back
  17.324 +out its parent, and see what happens.
  17.325 +\interaction{backout.manual.backout} 
  17.326 +Our new changeset is again a descendant of the changeset we backed
  17.327 +out; it's thus a new head, \emph{not} a descendant of the changeset
  17.328 +that was the tip.  The \hgcmd{backout} command was quite explicit in
  17.329 +telling us this.
  17.330 +\interaction{backout.manual.log}
  17.331 +
  17.332 +Again, it's easier to see what has happened by looking at a graph of
  17.333 +the revision history, in figure~\ref{fig:undo:backout-manual}.  This
  17.334 +makes it clear that when we use \hgcmd{backout} to back out a change
  17.335 +other than the tip, Mercurial adds a new head to the repository (the
  17.336 +change it committed is box-shaped).
  17.337 +
  17.338 +\begin{figure}[htb]
  17.339 +  \centering
  17.340 +  \grafix{undo-manual}
  17.341 +  \caption{Backing out a change using the \hgcmd{backout} command}
  17.342 +  \label{fig:undo:backout-manual}
  17.343 +\end{figure}
  17.344 +
  17.345 +After the \hgcmd{backout} command has completed, it leaves the new
  17.346 +``backout'' changeset as the parent of the working directory.
  17.347 +\interaction{backout.manual.parents}
  17.348 +Now we have two isolated sets of changes.
  17.349 +\interaction{backout.manual.heads}
  17.350 +
  17.351 +Let's think about what we expect to see as the contents of
  17.352 +\filename{myfile} now.  The first change should be present, because
  17.353 +we've never backed it out.  The second change should be missing, as
  17.354 +that's the change we backed out.  Since the history graph shows the
  17.355 +third change as a separate head, we \emph{don't} expect to see the
  17.356 +third change present in \filename{myfile}.
  17.357 +\interaction{backout.manual.cat}
  17.358 +To get the third change back into the file, we just do a normal merge
  17.359 +of our two heads.
  17.360 +\interaction{backout.manual.merge}
  17.361 +Afterwards, the graphical history of our repository looks like
  17.362 +figure~\ref{fig:undo:backout-manual-merge}.
  17.363 +
  17.364 +\begin{figure}[htb]
  17.365 +  \centering
  17.366 +  \grafix{undo-manual-merge}
  17.367 +  \caption{Manually merging a backout change}
  17.368 +  \label{fig:undo:backout-manual-merge}
  17.369 +\end{figure}
  17.370 +
  17.371 +\subsection{Why \hgcmd{backout} works as it does}
  17.372 +
  17.373 +Here's a brief description of how the \hgcmd{backout} command works.
  17.374 +\begin{enumerate}
  17.375 +\item It ensures that the working directory is ``clean'', i.e.~that
  17.376 +  the output of \hgcmd{status} would be empty.
  17.377 +\item It remembers the current parent of the working directory.  Let's
  17.378 +  call this changeset \texttt{orig}.
  17.379 +\item It does the equivalent of a \hgcmd{update} to sync the working
  17.380 +  directory to the changeset you want to back out.  Let's call this
  17.381 +  changeset \texttt{backout}.
  17.382 +\item It finds the parent of that changeset.  Let's call that
  17.383 +  changeset \texttt{parent}.
  17.384 +\item For each file that the \texttt{backout} changeset affected, it
  17.385 +  does the equivalent of a \hgcmdargs{revert}{-r parent} on that file,
  17.386 +  to restore it to the contents it had before that changeset was
  17.387 +  committed.
  17.388 +\item It commits the result as a new changeset.  This changeset has
  17.389 +  \texttt{backout} as its parent.
  17.390 +\item If you specify \hgopt{backout}{--merge} on the command line, it
  17.391 +  merges with \texttt{orig}, and commits the result of the merge.
  17.392 +\end{enumerate}
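
The steps above can be mimicked with a small toy model.  The Python
sketch below is an illustration only, not Mercurial's implementation:
it represents the contents of \filename{myfile} as a set of lines, and
its \texttt{three\_way\_merge} function is a deliberately naive
stand-in for the real merge machinery (no line ordering, no conflict
detection).  The names \texttt{sample\_history}, \texttt{backout} and
\texttt{three\_way\_merge} are invented for this example, and the model
assumes that \texttt{orig} is a descendant of the changeset being
backed out.

```python
def three_way_merge(base, local, other):
    """Naive three-way merge over sets of lines (toy model, no conflicts)."""
    removed = (base - local) | (base - other)   # lines either side deleted
    added = (local - base) | (other - base)     # lines either side introduced
    return (base - removed) | added


def backout(history, orig, target, merge=False):
    """Model the listed steps; return the name of the new working parent."""
    parent = history[target]["parent"]
    # Steps 3-6: start from the target, restore the parent's contents,
    # and commit the result as a child of the target.
    history["backout"] = {"parent": target,
                          "lines": set(history[parent]["lines"])}
    if not merge or orig == target:
        return "backout"
    # Step 7: merge the backout changeset with the original working parent.
    # In this model their common ancestor is the changeset being backed out.
    merged = three_way_merge(history[target]["lines"],
                             history[orig]["lines"],
                             history["backout"]["lines"])
    history["merge"] = {"parents": (orig, "backout"), "lines": merged}
    return "merge"


def sample_history():
    """Three changesets, each adding one line (hypothetical contents)."""
    return {
        "first": {"parent": None, "lines": {"first change"}},
        "second": {"parent": "first",
                   "lines": {"first change", "second change"}},
        "third": {"parent": "second",
                  "lines": {"first change", "second change", "third change"}},
    }


if __name__ == "__main__":
    history = sample_history()
    tip = backout(history, orig="third", target="second", merge=True)
    print(tip, sorted(history[tip]["lines"]))
```

Without \texttt{merge=True}, the model stops at the new head, whose
contents lack both the second and the third change.  With it, the merge
step brings the third change back while leaving the second one out.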
  17.393 +
  17.394 +An alternative way to implement the \hgcmd{backout} command would be
  17.395 +to \hgcmd{export} the to-be-backed-out changeset as a diff, then use
  17.396 +the \cmdopt{patch}{--reverse} option to the \command{patch} command to
  17.397 +reverse the effect of the change without fiddling with the working
  17.398 +directory.  This sounds much simpler, but it would not work nearly as
  17.399 +well.
  17.400 +
  17.401 +The reason that \hgcmd{backout} does an update, a commit, a merge, and
  17.402 +another commit is to give the merge machinery the best chance to do a
  17.403 +good job when dealing with all the changes \emph{between} the change
  17.404 +you're backing out and the current tip.  
  17.405 +
  17.406 +If you're backing out a changeset that's~100 revisions back in your
  17.407 +project's history, the chances that the \command{patch} command will
  17.408 +be able to apply a reverse diff cleanly are not good, because
  17.409 +intervening changes are likely to have ``broken the context'' that
  17.410 +\command{patch} uses to determine whether it can apply a patch (if
  17.411 +this sounds like gibberish, see section~\ref{sec:mq:patch} for a
  17.412 +discussion of the \command{patch} command).  Also, Mercurial's merge
  17.413 +machinery will handle files and directories being renamed, permission
  17.414 +changes, and modifications to binary files, none of which
  17.415 +\command{patch} can deal with.
  17.416 +
  17.417 +\section{Changes that should never have been}
  17.418 +\label{sec:undo:aaaiiieee}
  17.419 +
  17.420 +Most of the time, the \hgcmd{backout} command is exactly what you need
  17.421 +if you want to undo the effects of a change.  It leaves a permanent
  17.422 +record of exactly what you did, both when committing the original
  17.423 +changeset and when you cleaned up after it.
  17.424 +
  17.425 +On rare occasions, though, you may find that you've committed a change
  17.426 +that really should not be present in the repository at all.  For
  17.427 +example, it would be very unusual, and usually considered a mistake,
  17.428 +to commit a software project's object files as well as its source
  17.429 +files.  Object files have almost no intrinsic value, and they're
  17.430 +\emph{big}, so they increase the size of the repository and the amount
  17.431 +of time it takes to clone or pull changes.
  17.432 +
  17.433 +Before I discuss the options that you have if you commit a ``brown
  17.434 +paper bag'' change (the kind that's so bad that you want to pull a
  17.435 +brown paper bag over your head), let me first discuss some approaches
  17.436 +that probably won't work.
  17.437 +
  17.438 +Since Mercurial treats history as accumulative---every change builds
  17.439 +on top of all changes that preceded it---you generally can't just make
  17.440 +disastrous changes disappear.  The one exception is when you've just
  17.441 +committed a change, and it hasn't been pushed or pulled into another
  17.442 +repository.  That's when you can safely use the \hgcmd{rollback}
  17.443 +command, as I detailed in section~\ref{sec:undo:rollback}.
  17.444 +
  17.445 +After you've pushed a bad change to another repository, you
  17.446 +\emph{could} still use \hgcmd{rollback} to make your local copy of the
  17.447 +change disappear, but it won't have the consequences you want.  The
  17.448 +change will still be present in the remote repository, so it will
  17.449 +reappear in your local repository the next time you pull.
  17.450 +
  17.451 +If a situation like this arises, and you know which repositories your
  17.452 +bad change has propagated into, you can \emph{try} to get rid of the
  17.453 +change from \emph{every} one of those repositories.  This is, of
  17.454 +course, not a satisfactory solution: if you miss even a single
  17.455 +repository while you're expunging, the change is still ``in the
  17.456 +wild'', and could propagate further.
  17.457 +
  17.458 +If you've committed one or more changes \emph{after} the change that
  17.459 +you'd like to see disappear, your options are further reduced.
  17.460 +Mercurial doesn't provide a way to ``punch a hole'' in history,
  17.461 +removing one changeset while leaving the changesets around it intact.
  17.462 +
  17.463 +XXX This needs filling out.  The \texttt{hg-replay} script in the
  17.464 +\texttt{examples} directory works, but doesn't handle merge
  17.465 +changesets.  Kind of an important omission.
  17.466 +
  17.467 +\subsection{Protect yourself from ``escaped'' changes}
  17.468 +
  17.469 +If you've committed some changes to your local repository and they've
  17.470 +been pushed or pulled somewhere else, this isn't necessarily a
  17.471 +disaster.  You can protect yourself ahead of time against some classes
  17.472 +of bad changeset.  This is particularly easy if your team usually
  17.473 +pulls changes from a central repository.
  17.474 +
  17.475 +By configuring some hooks on that repository to validate incoming
  17.476 +changesets (see chapter~\ref{chap:hook}), you can automatically
  17.477 +prevent some kinds of bad changeset from being pushed to the central
  17.478 +repository at all.  With such a configuration in place, some kinds of
  17.479 +bad changeset will naturally tend to ``die out'' because they can't
  17.480 +propagate into the central repository.  Better yet, this happens
  17.481 +without any need for explicit intervention.
  17.482 +
  17.483 +For instance, an incoming change hook that verifies that a changeset
  17.484 +will actually compile can prevent people from inadvertently ``breaking
  17.485 +the build''.
  17.486 +
  17.487 +\section{Finding the source of a bug}
  17.488 +\label{sec:undo:bisect}
  17.489 +
  17.490 +While it's all very well to be able to back out a changeset that
  17.491 +introduced a bug, this requires that you know which changeset to back
  17.492 +out.  Mercurial provides an invaluable command, called
  17.493 +\hgcmd{bisect}, that helps you to automate this process and accomplish
  17.494 +it very efficiently.
  17.495 +
  17.496 +The idea behind the \hgcmd{bisect} command is that a changeset has
  17.497 +introduced some change of behaviour that you can identify with a
  17.498 +simple binary test.  You don't know which piece of code introduced the
  17.499 +change, but you know how to test for the presence of the bug.  The
  17.500 +\hgcmd{bisect} command uses your test to direct its search for the
  17.501 +changeset that introduced the code that caused the bug.
  17.502 +
  17.503 +Here are a few scenarios to help you understand how you might apply
  17.504 +this command.
  17.505 +\begin{itemize}
  17.506 +\item The most recent version of your software has a bug that you
  17.507 +  remember wasn't present a few weeks ago, but you don't know when it
  17.508 +  was introduced.  Here, your binary test checks for the presence of
  17.509 +  that bug.
  17.510 +\item You fixed a bug in a rush, and now it's time to close the entry
  17.511 +  in your team's bug database.  The bug database requires a changeset
  17.512 +  ID when you close an entry, but you don't remember which changeset
  17.513 +  you fixed the bug in.  Once again, your binary test checks for the
  17.514 +  presence of the bug.
  17.515 +\item Your software works correctly, but runs~15\% slower than the
  17.516 +  last time you measured it.  You want to know which changeset
  17.517 +  introduced the performance regression.  In this case, your binary
  17.518 +  test measures the performance of your software, to see whether it's
  17.519 +  ``fast'' or ``slow''.
  17.520 +\item The sizes of the components of your project that you ship
  17.521 +  exploded recently, and you suspect that something changed in the way
  17.522 +  you build your project.
  17.523 +\end{itemize}
  17.524 +
  17.525 +From these examples, it should be clear that the \hgcmd{bisect}
  17.526 +command is useful for more than finding the sources of bugs.  You can
  17.527 +use it to find any ``emergent property'' of a repository (anything
  17.528 +that you can't find from a simple text search of the files in the
  17.529 +tree) for which you can write a binary test.
  17.530 +
  17.531 +We'll introduce a little bit of terminology here, just to make it
  17.532 +clear which parts of the search process are your responsibility, and
  17.533 +which are Mercurial's.  A \emph{test} is something that \emph{you} run
  17.534 +when \hgcmd{bisect} chooses a changeset.  A \emph{probe} is what
  17.535 +\hgcmd{bisect} runs to tell whether a revision is good.  Finally,
  17.536 +we'll use the word ``bisect'', as both a noun and a verb, to stand in
  17.537 +for the phrase ``search using the \hgcmd{bisect} command''.
  17.538 +
  17.539 +One simple way to automate the searching process would be simply to
  17.540 +probe every changeset.  However, this scales poorly.  If it took ten
  17.541 +minutes to test a single changeset, and you had 10,000 changesets in
  17.542 +your repository, the exhaustive approach would take on average~35
  17.543 +\emph{days} to find the changeset that introduced a bug.  Even if you
  17.544 +knew that the bug was introduced by one of the last 500 changesets,
  17.545 +and limited your search to those, you'd still be looking at over 40
  17.546 +hours to find the changeset that introduced your bug.
  17.547 +
  17.548 +What the \hgcmd{bisect} command does is use its knowledge of the
  17.549 +``shape'' of your project's revision history to perform a search in
  17.550 +time proportional to the \emph{logarithm} of the number of changesets
  17.551 +to check (the kind of search it performs is called a dichotomic
  17.552 +search).  With this approach, searching through 10,000 changesets will
  17.553 +take less than three hours, even at ten minutes per test (the search
  17.554 +will require about 14 tests).  Limit your search to the last hundred
  17.555 +changesets, and it will take only about an hour (roughly seven tests).
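
The arithmetic behind these estimates is easy to check.  The Python
sketch below assumes ten minutes per test, assumes that the exhaustive
search finds the culprit after testing half of the candidates on
average, and assumes that a bisection over $n$ changesets needs about
$\lceil \log_2 n \rceil$ tests.  The function names are made up for
this illustration.

```python
import math

MINUTES_PER_TEST = 10  # figure used in the text


def exhaustive_minutes(changesets):
    """Average cost of testing changesets one by one (half of them)."""
    return changesets * MINUTES_PER_TEST / 2


def bisect_tests(changesets):
    """Approximate number of tests a binary search needs."""
    return math.ceil(math.log2(changesets))


def bisect_minutes(changesets):
    """Total time for a bisection at MINUTES_PER_TEST per test."""
    return bisect_tests(changesets) * MINUTES_PER_TEST


if __name__ == "__main__":
    print("exhaustive, 10,000 changesets: %.1f days"
          % (exhaustive_minutes(10000) / 60 / 24))
    print("exhaustive, 500 changesets: %.1f hours"
          % (exhaustive_minutes(500) / 60))
    for n in (10000, 100):
        print("bisect, %d changesets: %d tests, %d minutes"
              % (n, bisect_tests(n), bisect_minutes(n)))
```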
  17.556 +
  17.557 +The \hgcmd{bisect} command is aware of the ``branchy'' nature of a
  17.558 +Mercurial project's revision history, so it has no problems dealing
  17.559 +with branches, merges, or multiple heads in a repository.  It can
  17.560 +prune entire branches of history with a single probe, which is how it
  17.561 +operates so efficiently.
  17.562 +
  17.563 +\subsection{Using the \hgcmd{bisect} command}
  17.564 +
  17.565 +Here's an example of \hgcmd{bisect} in action.
  17.566 +
  17.567 +\begin{note}
  17.568 +  In versions 0.9.5 and earlier of Mercurial, \hgcmd{bisect} was not a
  17.569 +  core command: it was distributed with Mercurial as an extension.
  17.570 +  This section describes the built-in command, not the old extension.
  17.571 +\end{note}
  17.572 +
  17.573 +Now let's create a repository, so that we can try out the
  17.574 +\hgcmd{bisect} command in isolation.
  17.575 +\interaction{bisect.init}
  17.576 +We'll simulate a project that has a bug in it in a simple-minded way:
  17.577 +create trivial changes in a loop, and nominate one specific change
  17.578 +that will have the ``bug''.  This loop creates 35 changesets, each
  17.579 +adding a single file to the repository.  We'll represent our ``bug''
  17.580 +with a file that contains the text ``i have a gub''.
  17.581 +\interaction{bisect.commits}
  17.582 +
  17.583 +The next thing that we'd like to do is figure out how to use the
  17.584 +\hgcmd{bisect} command.  We can use Mercurial's normal built-in help
  17.585 +mechanism for this.
  17.586 +\interaction{bisect.help}
  17.587 +
  17.588 +The \hgcmd{bisect} command works in steps.  Each step proceeds as follows.
  17.589 +\begin{enumerate}
  17.590 +\item You run your binary test.
  17.591 +  \begin{itemize}
  17.592 +  \item If the test succeeded, you tell \hgcmd{bisect} by running the
  17.593 +    \hgcmdargs{bisect}{--good} command.
  17.594 +  \item If it failed, run the \hgcmdargs{bisect}{--bad} command.
  17.595 +  \end{itemize}
  17.596 +\item The command uses your information to decide which changeset to
  17.597 +  test next.
  17.598 +\item It updates the working directory to that changeset, and the
  17.599 +  process begins again.
  17.600 +\end{enumerate}
  17.601 +The process ends when \hgcmd{bisect} identifies a unique changeset
  17.602 +that marks the point where your test transitioned from ``succeeding''
  17.603 +to ``failing''.
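
The loop described above can be sketched in a few lines of Python.
This is a simplified model, not the algorithm that \hgcmd{bisect}
actually uses: it assumes a strictly linear history numbered by
integers, whereas the real command also copes with branches and merges.
The function name \texttt{bisect\_linear} and the callback
\texttt{is\_bad} are hypothetical.

```python
def bisect_linear(good, bad, is_bad):
    """Find the first bad revision between a known-good and a known-bad one.

    Assumes a linear history numbered by integers, with good < bad, and a
    test that passes for every revision before the culprit and fails from
    the culprit onwards.  Returns (first_bad_revision, number_of_tests).
    """
    tests = 0
    while bad - good > 1:
        candidate = (good + bad) // 2   # the changeset to test next
        tests += 1
        if is_bad(candidate):
            bad = candidate             # plays the role of "hg bisect --bad"
        else:
            good = candidate            # plays the role of "hg bisect --good"
    return bad, tests


if __name__ == "__main__":
    # Hypothetical scenario: revision 10 is good, revision 34 is bad,
    # and the bug was introduced in revision 22.
    culprit, tests = bisect_linear(10, 34, lambda rev: rev >= 22)
    print("first bad revision:", culprit, "after", tests, "tests")
```

Each pass through the loop halves the range that can still contain the
culprit, which is why the number of tests grows so slowly.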
  17.604 +
  17.605 +To start the search, we must run the \hgcmdargs{bisect}{--reset} command.
  17.606 +\interaction{bisect.search.init}
  17.607 +
  17.608 +In our case, the binary test we use is simple: we check to see if any
  17.609 +file in the repository contains the string ``i have a gub''.  If it
  17.610 +does, this changeset contains the change that ``caused the bug''.  By
  17.611 +convention, a changeset that has the property we're searching for is
  17.612 +``bad'', while one that doesn't is ``good''.
  17.613 +
  17.614 +Most of the time, the revision to which the working directory is
  17.615 +synced (usually the tip) already exhibits the problem introduced by
  17.616 +the buggy change, so we'll mark it as ``bad''.
  17.617 +\interaction{bisect.search.bad-init}
  17.618 +
  17.619 +Our next task is to nominate a changeset that we know \emph{doesn't}
  17.620 +have the bug; the \hgcmd{bisect} command will ``bracket'' its search
  17.621 +between the first pair of good and bad changesets.  In our case, we
  17.622 +know that revision~10 didn't have the bug.  (I'll have more words
  17.623 +about choosing the first ``good'' changeset later.)
  17.624 +\interaction{bisect.search.good-init}
  17.625 +
  17.626 +Notice that this command printed some output.
  17.627 +\begin{itemize}
  17.628 +\item It told us how many changesets it must consider before it can
  17.629 +  identify the one that introduced the bug, and how many tests that
  17.630 +  will require.
  17.631 +\item It updated the working directory to the next changeset to test,
  17.632 +  and told us which changeset it's testing.
  17.633 +\end{itemize}
  17.634 +
  17.635 +We now run our test in the working directory.  We use the
  17.636 +\command{grep} command to see if our ``bad'' file is present in the
  17.637 +working directory.  If it is, this revision is bad; if not, this
  17.638 +revision is good.
  17.639 +\interaction{bisect.search.step1}
  17.640 +
  17.641 +This test looks like a perfect candidate for automation, so let's turn
  17.642 +it into a shell function.
  17.643 +\interaction{bisect.search.mytest}
  17.644 +We can now run an entire test step with a single command,
  17.645 +\texttt{mytest}.
  17.646 +\interaction{bisect.search.step2}
  17.647 +A few more invocations of our canned test step command, and we're
  17.648 +done.
  17.649 +\interaction{bisect.search.rest}
  17.650 +
  17.651 +Even though we had~35 changesets to search through, the \hgcmd{bisect}
  17.652 +command let us find the changeset that introduced our ``bug'' with
  17.653 +only five tests.  Because the number of tests that the \hgcmd{bisect}
  17.654 +command performs grows logarithmically with the number of changesets to
  17.655 +search, the advantage that it has over the ``brute force'' search
  17.656 +approach increases with every changeset you add.
  17.657 +
  17.658 +\subsection{Cleaning up after your search}
  17.659 +
  17.660 +When you're finished using the \hgcmd{bisect} command in a
  17.661 +repository, you can use the \hgcmdargs{bisect}{--reset} command to drop
  17.662 +the information it was using to drive your search.  This information
  17.663 +doesn't take up much space, so it doesn't matter if you forget to run
  17.664 +this command.  However, \hgcmd{bisect} won't let you start a new search in
  17.665 +that repository until you do a \hgcmdargs{bisect}{--reset}.
  17.666 +\interaction{bisect.search.reset}
  17.667 +
  17.668 +\section{Tips for finding bugs effectively}
  17.669 +
  17.670 +\subsection{Give consistent input}
  17.671 +
  17.672 +The \hgcmd{bisect} command requires that you correctly report the
  17.673 +result of every test you perform.  If you tell it that a test failed
  17.674 +when it really succeeded, it \emph{might} be able to detect the
  17.675 +inconsistency.  If it can identify an inconsistency in your reports,
  17.676 +it will tell you that a particular changeset is both good and bad.
  17.677 +However, it can't do this perfectly; it's about as likely to report
  17.678 +the wrong changeset as the source of the bug.
  17.679 +
  17.680 +\subsection{Automate as much as possible}
  17.681 +
  17.682 +When I started using the \hgcmd{bisect} command, I tried a few times
  17.683 +to run my tests by hand, on the command line.  This is an approach
  17.684 +that I, at least, am not suited to.  After a few tries, I found that I
  17.685 +was making enough mistakes that I was having to restart my searches
  17.686 +several times before finally getting correct results.
  17.687 +
  17.688 +My initial problems with driving the \hgcmd{bisect} command by hand
  17.689 +occurred even with simple searches on small repositories; if the
  17.690 +problem you're looking for is more subtle, or the number of tests that
  17.691 +\hgcmd{bisect} must perform increases, the likelihood of operator
  17.692 +error ruining the search is much higher.  Once I started automating my
  17.693 +tests, I had much better results.
  17.694 +
  17.695 +The key to automated testing is twofold:
  17.696 +\begin{itemize}
  17.697 +\item always test for the same symptom, and
  17.698 +\item always feed consistent input to the \hgcmd{bisect} command.
  17.699 +\end{itemize}
  17.700 +In my tutorial example above, the \command{grep} command tests for the
  17.701 +symptom, and the \texttt{if} statement takes the result of this check
  17.702 +and ensures that we always feed the same input to the \hgcmd{bisect}
  17.703 +command.  The \texttt{mytest} function marries these together in a
  17.704 +reproducible way, so that every test is uniform and consistent.
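
The same idea carries over to any scripting language.  As a rough
sketch, here is a Python counterpart of such a test helper.  It assumes
the ``bug'' marker used in the tutorial, looks only at regular files
directly inside the working directory, and merely \emph{builds} the
\hgcmd{bisect} invocation rather than running it.  The names
\texttt{classify} and \texttt{bisect\_command} are invented for this
example.

```python
import os

MARKER = "i have a gub"  # the tutorial's stand-in for a bug


def classify(workdir):
    """Return "bad" if any regular file directly in workdir has MARKER."""
    for name in sorted(os.listdir(workdir)):
        path = os.path.join(workdir, name)
        if not os.path.isfile(path):
            continue                      # skip .hg and other directories
        with open(path, errors="replace") as f:
            if MARKER in f.read():
                return "bad"
    return "good"


def bisect_command(result):
    """Build the hg invocation that reports the result; it is not run here."""
    if result not in ("good", "bad"):
        raise ValueError(result)
    return ["hg", "bisect", "--" + result]


if __name__ == "__main__":
    verdict = classify(".")
    print("this revision is", verdict)
    print("would run:", " ".join(bisect_command(verdict)))
```

Keeping the symptom check and the report to \hgcmd{bisect} in one
place means that the same verdict is always translated into the same
command, which is the consistency this section asks for.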
  17.705 +
  17.706 +\subsection{Check your results}
  17.707 +
  17.708 +Because the output of a \hgcmd{bisect} search is only as good as the
  17.709 +input you give it, don't take the changeset it reports as the
  17.710 +absolute truth.  A simple way to cross-check its report is to manually
  17.711 +run your test at each of the following changesets:
  17.712 +\begin{itemize}
  17.713 +\item The changeset that it reports as the first bad revision.  Your
  17.714 +  test should still report this as bad.
  17.715 +\item The parent of that changeset (either parent, if it's a merge).
  17.716 +  Your test should report this changeset as good.
  17.717 +\item A child of that changeset.  Your test should report this
  17.718 +  changeset as bad.
  17.719 +\end{itemize}
  17.720 +
  17.721 +\subsection{Beware interference between bugs}
  17.722 +
  17.723 +It's possible that your search for one bug could be disrupted by the
  17.724 +presence of another.  For example, let's say your software crashes at
  17.725 +revision 100, and worked correctly at revision 50.  Unknown to you,
  17.726 +someone else introduced a different crashing bug at revision 60, and
  17.727 +fixed it at revision 80.  This could distort your results in one of
  17.728 +several ways.
  17.729 +
  17.730 +It is possible that this other bug completely ``masks'' yours, which
  17.731 +is to say that it occurs before your bug has a chance to manifest
  17.732 +itself.  If you can't avoid that other bug (for example, it prevents
  17.733 +your project from building), and so can't tell whether your bug is
  17.734 +present in a particular changeset, the \hgcmd{bisect} command cannot
  17.735 +help you directly.  Instead, you can mark a changeset as untested by
  17.736 +running \hgcmdargs{bisect}{--skip}.
  17.737 +
  17.738 +A different problem could arise if your test for a bug's presence is
  17.739 +not specific enough.  If you check for ``my program crashes'', then
  17.740 +both your crashing bug and an unrelated crashing bug that masks it
  17.741 +will look like the same thing, and mislead \hgcmd{bisect}.
  17.742 +
  17.743 +Another useful situation in which to use \hgcmdargs{bisect}{--skip} is
  17.744 +if you can't test a revision because your project was in a broken and
  17.745 +hence untestable state at that revision, perhaps because someone
  17.746 +checked in a change that prevented the project from building.
  17.747 +
  17.748 +\subsection{Bracket your search lazily}
  17.749 +
  17.750 +Choosing the first ``good'' and ``bad'' changesets that will mark the
  17.751 +end points of your search is often easy, but it bears a little
  17.752 +discussion nevertheless.  From the perspective of \hgcmd{bisect}, the
  17.753 +``newest'' changeset is conventionally ``bad'', and the older
  17.754 +changeset is ``good''.
  17.755 +
  17.756 +If you're having trouble remembering when a suitable ``good'' change
  17.757 +was, so that you can tell \hgcmd{bisect}, you could do worse than
  17.758 +testing changesets at random.  Just remember to eliminate contenders
  17.759 +that can't possibly exhibit the bug (perhaps because the feature with
  17.760 +the bug isn't present yet) and those where another problem masks the
  17.761 +bug (as I discussed above).
  17.762 +
  17.763 +Even if you end up ``early'' by thousands of changesets or months of
  17.764 +history, you will only add a handful of tests to the total number that
  17.765 +\hgcmd{bisect} must perform, thanks to its logarithmic behaviour.
  17.766 +
  17.767 +%%% Local Variables: 
  17.768 +%%% mode: latex
  17.769 +%%% TeX-master: "00book"
  17.770 +%%% End: 

    18.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    18.2 +++ b/en/ch10-hook.tex	Thu Jan 29 22:56:27 2009 -0800
    18.3 @@ -0,0 +1,1413 @@
    18.4 +\chapter{Handling repository events with hooks}
    18.5 +\label{chap:hook}
    18.6 +
    18.7 +Mercurial offers a powerful mechanism to let you perform automated
    18.8 +actions in response to events that occur in a repository.  In some
    18.9 +cases, you can even control Mercurial's response to those events.
   18.10 +
   18.11 +The name Mercurial uses for one of these actions is a \emph{hook}.
   18.12 +Hooks are called ``triggers'' in some revision control systems, but
   18.13 +the two names refer to the same idea.
   18.14 +
   18.15 +\section{An overview of hooks in Mercurial}
   18.16 +
   18.17 +Here is a brief list of the hooks that Mercurial supports.  We will
   18.18 +revisit each of these hooks in more detail later, in
   18.19 +section~\ref{sec:hook:ref}.
   18.20 +
   18.21 +\begin{itemize}
   18.22 +\item[\small\hook{changegroup}] This is run after a group of
   18.23 +  changesets has been brought into the repository from elsewhere.
   18.24 +\item[\small\hook{commit}] This is run after a new changeset has been
   18.25 +  created in the local repository.
   18.26 +\item[\small\hook{incoming}] This is run once for each new changeset
   18.27 +  that is brought into the repository from elsewhere.  Notice the
   18.28 +  difference from \hook{changegroup}, which is run once per
   18.29 +  \emph{group} of changesets brought in.
   18.30 +\item[\small\hook{outgoing}] This is run after a group of changesets
   18.31 +  has been transmitted from this repository.
   18.32 +\item[\small\hook{prechangegroup}] Controlling. This is run before
   18.33 +  starting to bring a group of changesets into the repository.
   18.34 +\item[\small\hook{precommit}] Controlling. This is run before starting
   18.35 +  a commit.
   18.36 +\item[\small\hook{preoutgoing}] Controlling. This is run before
   18.37 +  starting to transmit a group of changesets from this repository.
   18.38 +\item[\small\hook{pretag}] Controlling. This is run before creating a tag.
   18.39 +\item[\small\hook{pretxnchangegroup}] Controlling. This is run after a
   18.40 +  group of changesets has been brought into the local repository from
   18.41 +  another, but before the transaction completes that will make the
   18.42 +  changes permanent in the repository.
   18.43 +\item[\small\hook{pretxncommit}] Controlling. This is run after a new
   18.44 +  changeset has been created in the local repository, but before the
   18.45 +  transaction completes that will make it permanent.
   18.46 +\item[\small\hook{preupdate}] Controlling. This is run before starting
   18.47 +  an update or merge of the working directory.
   18.48 +\item[\small\hook{tag}] This is run after a tag is created.
   18.49 +\item[\small\hook{update}] This is run after an update or merge of the
   18.50 +  working directory has finished.
   18.51 +\end{itemize}
   18.52 +Each of the hooks whose description begins with the word
   18.53 +``Controlling'' has the ability to determine whether an activity can
   18.54 +proceed.  If the hook succeeds, the activity may proceed; if it fails,
   18.55 +the activity is either not permitted or undone, depending on the hook.
   18.56 +
   18.57 +\section{Hooks and security}
   18.58 +
   18.59 +\subsection{Hooks are run with your privileges}
   18.60 +
   18.61 +When you run a Mercurial command in a repository, and the command
   18.62 +causes a hook to run, that hook runs on \emph{your} system, under
   18.63 +\emph{your} user account, with \emph{your} privilege level.  Since
   18.64 +hooks are arbitrary pieces of executable code, you should treat them
   18.65 +with an appropriate level of suspicion.  Do not install a hook unless
   18.66 +you are confident that you know who created it and what it does.
   18.67 +
   18.68 +In some cases, you may be exposed to hooks that you did not install
   18.69 +yourself.  If you work with Mercurial on an unfamiliar system,
   18.70 +Mercurial will run hooks defined in that system's global \hgrc\ file.
   18.71 +
   18.72 +If you are working with a repository owned by another user, Mercurial
   18.73 +can run hooks defined in that user's repository, but it will still run
   18.74 +them as ``you''.  For example, if you \hgcmd{pull} from that
   18.75 +repository, and its \sfilename{.hg/hgrc} defines a local
   18.76 +\hook{outgoing} hook, that hook will run under your user account, even
   18.77 +though you don't own that repository.
   18.78 +
   18.79 +\begin{note}
   18.80 +  This only applies if you are pulling from a repository on a local or
   18.81 +  network filesystem.  If you're pulling over http or ssh, any
   18.82 +  \hook{outgoing} hook will run under whatever account is executing
   18.83 +  the server process, on the server.
   18.84 +\end{note}
   18.85 +
   18.86 +To see what hooks are defined in a repository, use the
   18.87 +\hgcmdargs{config}{hooks} command.  If you are working in one
   18.88 +repository, but talking to another that you do not own (e.g.~using
   18.89 +\hgcmd{pull} or \hgcmd{incoming}), remember that it is the other
   18.90 +repository's hooks you should be checking, not your own.
   18.91 +
   18.92 +\subsection{Hooks do not propagate}
   18.93 +
   18.94 +In Mercurial, hooks are not revision controlled, and do not propagate
   18.95 +when you clone, or pull from, a repository.  The reason for this is
   18.96 +simple: a hook is a completely arbitrary piece of executable code.  It
   18.97 +runs under your user identity, with your privilege level, on your
   18.98 +machine.
   18.99 +
  18.100 +It would be extremely reckless for any distributed revision control
  18.101 +system to implement revision-controlled hooks, as this would offer an
  18.102 +easily exploitable way to subvert the accounts of users of the
  18.103 +revision control system.
  18.104 +
  18.105 +Since Mercurial does not propagate hooks, if you are collaborating
  18.106 +with other people on a common project, you should not assume that they
  18.107 +are using the same Mercurial hooks as you are, or that theirs are
  18.108 +correctly configured.  You should document the hooks you expect people
  18.109 +to use.
  18.110 +
  18.111 +In a corporate intranet, this is somewhat easier to control, as you
  18.112 +can for example provide a ``standard'' installation of Mercurial on an
  18.113 +NFS filesystem, and use a site-wide \hgrc\ file to define hooks that
  18.114 +all users will see.  However, this too has its limits; see below.
  18.115 +
  18.116 +\subsection{Hooks can be overridden}
  18.117 +
  18.118 +Mercurial allows you to override a hook definition by redefining the
  18.119 +hook.  You can disable it by setting its value to the empty string, or
  18.120 +change its behaviour as you wish.
  18.121 +
  18.122 +If you deploy a system-~or site-wide \hgrc\ file that defines some
  18.123 +hooks, you should thus understand that your users can disable or
  18.124 +override those hooks.
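         +
         +As a minimal sketch, suppose the site-wide \hgrc\ defines a hook with
         +the hypothetical name \texttt{commit.sitepolicy}.  A user could
         +switch it off in their own \hgrc\ by giving it an empty value:
         +\begin{codesample2}
         +  [hooks]
         +  ; commit.sitepolicy is a made-up name, used only for illustration
         +  commit.sitepolicy =
         +\end{codesample2}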
  18.125 +
  18.126 +\subsection{Ensuring that critical hooks are run}
  18.127 +
  18.128 +Sometimes you may want to enforce a policy that you do not want others
  18.129 +to be able to work around.  For example, you may have a requirement
  18.130 +that every changeset must pass a rigorous set of tests.  Defining this
  18.131 +requirement via a hook in a site-wide \hgrc\ won't work for remote
  18.132 +users on laptops, and of course local users can subvert it at will by
  18.133 +overriding the hook.
  18.134 +
  18.135 +Instead, you can set up your policies for use of Mercurial so that
  18.136 +people are expected to propagate changes through a well-known
  18.137 +``canonical'' server that you have locked down and configured
  18.138 +appropriately.
  18.139 +
  18.140 +One way to do this is via a combination of social engineering and
  18.141 +technology.  Set up a restricted-access account; users can push
  18.142 +changes over the network to repositories managed by this account, but
  18.143 +they cannot log into the account and run normal shell commands.  In
  18.144 +this scenario, a user can commit a changeset that contains any old
  18.145 +garbage they want.
  18.146 +
  18.147 +When someone pushes a changeset to the server that everyone pulls
  18.148 +from, the server will test the changeset before it accepts it as
  18.149 +permanent, and reject it if it fails to pass the test suite.  If
  18.150 +people only pull changes from this filtering server, it will serve to
  18.151 +ensure that all changes that people pull have been automatically
  18.152 +vetted.
  18.153 +
  18.154 +\section{Care with \texttt{pretxn} hooks in a shared-access repository}
  18.155 +
  18.156 +If you want to use hooks to do some automated work in a repository
  18.157 +that a number of people have shared access to, you need to be careful
  18.158 +in how you do this.
  18.159 +
  18.160 +Mercurial only locks a repository when it is writing to the
  18.161 +repository, and only the parts of Mercurial that write to the
  18.162 +repository pay attention to locks.  Write locks are necessary to
  18.163 +prevent multiple simultaneous writers from scribbling on each other's
  18.164 +work, corrupting the repository.
  18.165 +
  18.166 +Because Mercurial is careful with the order in which it reads and
  18.167 +writes data, it does not need to acquire a lock when it wants to read
  18.168 +data from the repository.  The parts of Mercurial that read from the
  18.169 +repository never pay attention to locks.  This lockless reading scheme
  18.170 +greatly increases performance and concurrency.
  18.171 +
  18.172 +With great performance comes a trade-off, though, one which has the
  18.173 +potential to cause you trouble unless you're aware of it.  To describe
  18.174 +this requires a little detail about how Mercurial adds changesets to a
  18.175 +repository and reads those changes.
  18.176 +
  18.177 +When Mercurial \emph{writes} metadata, it writes it straight into the
  18.178 +destination file.  It writes file data first, then manifest data
  18.179 +(which contains pointers to the new file data), then changelog data
  18.180 +(which contains pointers to the new manifest data).  Before the first
  18.181 +write to each file, it stores a record of where the end of the file
  18.182 +was in its transaction log.  If the transaction must be rolled back,
  18.183 +Mercurial simply truncates each file back to the size it was before the
  18.184 +transaction began.
  18.185 +
  18.186 +When Mercurial \emph{reads} metadata, it reads the changelog first,
  18.187 +then everything else.  Since a reader will only access parts of the
  18.188 +manifest or file metadata that it can see in the changelog, it can
  18.189 +never see partially written data.
  18.190 +
  18.191 +Some controlling hooks (\hook{pretxncommit} and
  18.192 +\hook{pretxnchangegroup}) run when a transaction is almost complete.
  18.193 +All of the metadata has been written, but Mercurial can still roll the
  18.194 +transaction back and cause the newly-written data to disappear.
  18.195 +
  18.196 +If one of these hooks runs for a long time, it opens a window during
  18.197 +which a reader can see the metadata for changesets that are not yet
  18.198 +permanent, and should not be thought of as ``really there''.  The
  18.199 +longer the hook runs, the longer that window is open.
  18.200 +
  18.201 +\subsection{The problem illustrated}
  18.202 +
  18.203 +In principle, a good use for the \hook{pretxnchangegroup} hook would
  18.204 +be to automatically build and test incoming changes before they are
  18.205 +accepted into a central repository.  This could let you guarantee that
  18.206 +nobody can push changes to this repository that ``break the build''.
  18.207 +But if a client can pull changes while they're being tested, the
  18.208 +usefulness of the test is zero; an unsuspecting someone can pull
  18.209 +untested changes, potentially breaking their build.
  18.210 +
  18.211 +The safest technological answer to this challenge is to set up such a
  18.212 +``gatekeeper'' repository as \emph{unidirectional}.  Let it take
  18.213 +changes pushed in from the outside, but do not allow anyone to pull
  18.214 +changes from it (use the \hook{preoutgoing} hook to lock it down).
  18.215 +Configure a \hook{changegroup} hook so that if a build or test
  18.216 +succeeds, the hook will push the new changes out to another repository
  18.217 +that people \emph{can} pull from.
  18.218 +
  18.219 +In practice, putting a centralised bottleneck like this in place is
  18.220 +not often a good idea, and transaction visibility has nothing to do
  18.221 +with the problem.  As the size of a project---and the time it takes to
  18.222 +build and test---grows, you rapidly run into a wall with this ``try
  18.223 +before you buy'' approach, where you have more changesets to test than
  18.224 +time in which to deal with them.  The inevitable result is frustration
  18.225 +on the part of all involved.
  18.226 +
  18.227 +An approach that scales better is to get people to build and test
  18.228 +before they push, then run automated builds and tests centrally
  18.229 +\emph{after} a push, to be sure all is well.  The advantage of this
  18.230 +approach is that it does not impose a limit on the rate at which the
  18.231 +repository can accept changes.
  18.232 +
  18.233 +\section{A short tutorial on using hooks}
  18.234 +\label{sec:hook:simple}
  18.235 +
  18.236 +It is easy to write a Mercurial hook.  Let's start with a hook that
  18.237 +runs when you finish a \hgcmd{commit}, and simply prints the hash of
  18.238 +the changeset you just created.  The hook is called \hook{commit}.
  18.239 +
  18.240 +\begin{figure}[ht]
  18.241 +  \interaction{hook.simple.init}
  18.242 +  \caption{A simple hook that runs when a changeset is committed}
  18.243 +  \label{ex:hook:init}
  18.244 +\end{figure}
  18.245 +
  18.246 +All hooks follow the pattern in example~\ref{ex:hook:init}.  You add
  18.247 +an entry to the \rcsection{hooks} section of your \hgrc.  On the left
  18.248 +is the name of the event to trigger on; on the right is the action to
  18.249 +take.  As you can see, you can run an arbitrary shell command in a
  18.250 +hook.  Mercurial passes extra information to the hook using
  18.251 +environment variables (look for \envar{HG\_NODE} in the example).
  18.252 +
  18.253 +\subsection{Performing multiple actions per event}
  18.254 +
  18.255 +Quite often, you will want to define more than one hook for a
  18.256 +particular kind of event, as shown in example~\ref{ex:hook:ext}.
  18.257 +Mercurial lets you do this by adding an \emph{extension} to the end of
  18.258 +a hook's name.  You extend a hook's name by giving the name of the
  18.259 +hook, followed by a full stop (the ``\texttt{.}'' character), followed
  18.260 +by some more text of your choosing.  For example, Mercurial will run
  18.261 +both \texttt{commit.foo} and \texttt{commit.bar} when the
  18.262 +\texttt{commit} event occurs.
  18.263 +
  18.264 +\begin{figure}[ht]
  18.265 +  \interaction{hook.simple.ext}
  18.266 +  \caption{Defining a second \hook{commit} hook}
  18.267 +  \label{ex:hook:ext}
  18.268 +\end{figure}
  18.269 +
  18.270 +To give a well-defined order of execution when there are multiple
  18.271 +hooks defined for an event, Mercurial sorts hooks by extension, and
  18.272 +executes the hook commands in this sorted order.  In the above
  18.273 +example, it will execute \texttt{commit.bar} before
  18.274 +\texttt{commit.foo}, and \texttt{commit} before both.
  18.275 +
  18.276 +It is a good idea to use a somewhat descriptive extension when you
  18.277 +define a new hook.  This will help you to remember what the hook was
  18.278 +for.  If the hook fails, you'll get an error message that contains the
  18.279 +hook name and extension, so using a descriptive extension could give
  18.280 +you an immediate hint as to why the hook failed (see
  18.281 +section~\ref{sec:hook:perm} for an example).
  18.282 +
  18.283 +\subsection{Controlling whether an activity can proceed}
  18.284 +\label{sec:hook:perm}
  18.285 +
  18.286 +In our earlier examples, we used the \hook{commit} hook, which is
  18.287 +run after a commit has completed.  This is one of several Mercurial
  18.288 +hooks that run after an activity finishes.  Such hooks have no way of
  18.289 +influencing the activity itself.
  18.290 +
  18.291 +Mercurial defines a number of events that occur before an activity
  18.292 +starts; or after it starts, but before it finishes.  Hooks that
  18.293 +trigger on these events have the added ability to choose whether the
  18.294 +activity can continue, or will abort.  
  18.295 +
  18.296 +The \hook{pretxncommit} hook runs after a commit has all but
  18.297 +completed.  In other words, the metadata representing the changeset
  18.298 +has been written out to disk, but the transaction has not yet been
  18.299 +allowed to complete.  The \hook{pretxncommit} hook has the ability to
  18.300 +decide whether the transaction can complete, or must be rolled back.
  18.301 +
  18.302 +If the \hook{pretxncommit} hook exits with a status code of zero, the
  18.303 +transaction is allowed to complete; the commit finishes; and the
  18.304 +\hook{commit} hook is run.  If the \hook{pretxncommit} hook exits with
  18.305 +a non-zero status code, the transaction is rolled back; the metadata
  18.306 +representing the changeset is erased; and the \hook{commit} hook is
  18.307 +not run.
  18.308 +
  18.309 +\begin{figure}[ht]
  18.310 +  \interaction{hook.simple.pretxncommit}
  18.311 +  \caption{Using the \hook{pretxncommit} hook to control commits}
  18.312 +  \label{ex:hook:pretxncommit}
  18.313 +\end{figure}
  18.314 +
  18.315 +The hook in example~\ref{ex:hook:pretxncommit} checks that a commit
  18.316 +comment contains a bug ID.  If it does, the commit can complete.  If
  18.317 +not, the commit is rolled back.
  18.318 +
  18.319 +\section{Writing your own hooks}
  18.320 +
  18.321 +When you are writing a hook, you might find it useful to run Mercurial
  18.322 +either with the \hggopt{-v} option, or the \rcitem{ui}{verbose} config
  18.323 +item set to ``true''.  When you do so, Mercurial will print a message
  18.324 +before it calls each hook.
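         +
         +A sketch of the corresponding \hgrc\ entry, following the description
         +above, might look like this:
         +\begin{codesample2}
         +  [ui]
         +  verbose = true
         +\end{codesample2}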
  18.325 +
  18.326 +\subsection{Choosing how your hook should run}
  18.327 +\label{sec:hook:lang}
  18.328 +
  18.329 +You can write a hook either as a normal program---typically a shell
  18.330 +script---or as a Python function that is executed within the Mercurial
  18.331 +process.
  18.332 +
  18.333 +Writing a hook as an external program has the advantage that it
  18.334 +requires no knowledge of Mercurial's internals.  You can call normal
  18.335 +Mercurial commands to get any added information you need.  The
  18.336 +trade-off is that external hooks are slower than in-process hooks.
  18.337 +
  18.338 +An in-process Python hook has complete access to the Mercurial API,
  18.339 +and does not ``shell out'' to another process, so it is inherently
  18.340 +faster than an external hook.  It is also easier to obtain much of the
  18.341 +information that a hook requires by using the Mercurial API than by
  18.342 +running Mercurial commands.
  18.343 +
  18.344 +If you are comfortable with Python, or require high performance,
  18.345 +writing your hooks in Python may be a good choice.  However, when you
  18.346 +have a straightforward hook to write and you don't need to care about
  18.347 +performance (probably the majority of hooks), a shell script is
  18.348 +perfectly fine.
  18.349 +
  18.350 +\subsection{Hook parameters}
  18.351 +\label{sec:hook:param}
  18.352 +
  18.353 +Mercurial calls each hook with a set of well-defined parameters.  In
  18.354 +Python, a parameter is passed as a keyword argument to your hook
  18.355 +function.  For an external program, a parameter is passed as an
  18.356 +environment variable.
  18.357 +
  18.358 +Whether your hook is written in Python or as a shell script, the
  18.359 +hook-specific parameter names and values will be the same.  A boolean
  18.360 +parameter will be represented as a boolean value in Python, but as the
  18.361 +number 1 (for ``true'') or 0 (for ``false'') as an environment
  18.362 +variable for an external hook.  If a hook parameter is named
  18.363 +\texttt{foo}, the keyword argument for a Python hook will also be
  18.364 +named \texttt{foo}, while the environment variable for an external
  18.365 +hook will be named \texttt{HG\_FOO}.
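         +
         +To illustrate the naming rule, here is a sketch of an external hook
         +that reads the \texttt{node} parameter of the \hook{commit} hook
         +through its environment variable.  The extension \texttt{shownode}
         +is an arbitrary name chosen for this example.
         +\begin{codesample2}
         +  [hooks]
         +  ; the node parameter arrives as the environment variable HG\_NODE
         +  commit.shownode = echo committed \$HG\_NODE
         +\end{codesample2}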
  18.366 +
  18.367 +\subsection{Hook return values and activity control}
  18.368 +
  18.369 +A hook that executes successfully must exit with a status of zero if
  18.370 +external, or return boolean ``false'' if in-process.  Failure is
  18.371 +indicated with a non-zero exit status from an external hook, or an
  18.372 +in-process hook returning boolean ``true''.  If an in-process hook
  18.373 +raises an exception, the hook is considered to have failed.
  18.374 +
  18.375 +For a hook that controls whether an activity can proceed, zero/false
  18.376 +means ``allow'', while non-zero/true/exception means ``deny''.
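         +
         +As a rough illustration for external hooks, assuming a Unix-like
         +shell that provides the standard \command{true} and \command{false}
         +commands, the following sketch defines one controlling hook that
         +always allows a commit and one that always denies it.  The extensions
         +\texttt{allow} and \texttt{deny} are made-up names.  With both
         +entries present, every commit would be refused, so this is only
         +useful for experimenting.
         +\begin{codesample2}
         +  [hooks]
         +  ; true exits with status zero, so the commit may proceed
         +  precommit.allow = true
         +  ; false exits with a non-zero status, so the commit is rolled back
         +  pretxncommit.deny = false
         +\end{codesample2}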
  18.377 +
  18.378 +\subsection{Writing an external hook}
  18.379 +
  18.380 +When you define an external hook in your \hgrc\ and the hook is run,
  18.381 +its value is passed to your shell, which interprets it.  This means
  18.382 +that you can use normal shell constructs in the body of the hook.
  18.383 +
  18.384 +An executable hook is always run with its current directory set to a
  18.385 +repository's root directory.
  18.386 +
  18.387 +Each hook parameter is passed in as an environment variable; the name
  18.388 +is upper-cased, and prefixed with the string ``\texttt{HG\_}''.
  18.389 +
  18.390 +With the exception of hook parameters, Mercurial does not set or
  18.391 +modify any environment variables when running a hook.  This is useful
  18.392 +to remember if you are writing a site-wide hook that may be run by a
  18.393 +number of different users with differing environment variables set.
  18.394 +In multi-user situations, you should not rely on environment variables
  18.395 +being set to the values you have in your environment when testing the
  18.396 +hook.
  18.397 +
  18.398 +\subsection{Telling Mercurial to use an in-process hook}
  18.399 +
  18.400 +The \hgrc\ syntax for defining an in-process hook is slightly
  18.401 +different than for an executable hook.  The value of the hook must
  18.402 +start with the text ``\texttt{python:}'', and continue with the
  18.403 +fully-qualified name of a callable object to use as the hook's value.
  18.404 +
  18.405 +The module in which a hook lives is automatically imported when a hook
  18.406 +is run.  So long as you have the module name and \envar{PYTHONPATH}
  18.407 +right, it should ``just work''.
  18.408 +
  18.409 +The following \hgrc\ example snippet illustrates the syntax and
  18.410 +meaning of the notions we just described.
  18.411 +\begin{codesample2}
  18.412 +  [hooks]
  18.413 +  commit.example = python:mymodule.submodule.myhook
  18.414 +\end{codesample2}
  18.415 +When Mercurial runs the \texttt{commit.example} hook, it imports
  18.416 +\texttt{mymodule.submodule}, looks for the callable object named
  18.417 +\texttt{myhook}, and calls it.
  18.418 +
  18.419 +\subsection{Writing an in-process hook}
  18.420 +
  18.421 +The simplest in-process hook does nothing, but illustrates the basic
  18.422 +shape of the hook API:
  18.423 +\begin{codesample2}
  18.424 +  def myhook(ui, repo, **kwargs):
  18.425 +      pass
  18.426 +\end{codesample2}
  18.427 +The first argument to a Python hook is always a
  18.428 +\pymodclass{mercurial.ui}{ui} object.  The second is a repository object;
  18.429 +at the moment, it is always an instance of
  18.430 +\pymodclass{mercurial.localrepo}{localrepository}.  Following these two
  18.431 +arguments are other keyword arguments.  Which ones are passed in
  18.432 +depends on the hook being called, but a hook can ignore arguments it
  18.433 +doesn't care about by dropping them into a keyword argument dict, as
  18.434 +with \texttt{**kwargs} above.
  18.435 +
  18.436 +\section{Some hook examples}
  18.437 +
  18.438 +\subsection{Writing meaningful commit messages}
  18.439 +
  18.440 +It's hard to imagine a useful commit message being very short.  The
  18.441 +simple \hook{pretxncommit} hook of figure~\ref{ex:hook:msglen.go}
  18.442 +will prevent you from committing a changeset with a message that is
  18.443 +less than ten bytes long.
  18.444 +
  18.445 +\begin{figure}[ht]
  18.446 +  \interaction{hook.msglen.go}
  18.447 +  \caption{A hook that forbids overly short commit messages}
  18.448 +  \label{ex:hook:msglen.go}
  18.449 +\end{figure}
  18.450 +
  18.451 +\subsection{Checking for trailing whitespace}
  18.452 +
  18.453 +An interesting use of a commit-related hook is to help you to write
  18.454 +cleaner code.  A simple example of ``cleaner code'' is the dictum that
  18.455 +a change should not add any new lines of text that contain ``trailing
  18.456 +whitespace''.  Trailing whitespace is a series of space and tab
  18.457 +characters at the end of a line of text.  In most cases, trailing
  18.458 +whitespace is unnecessary, invisible noise, but it is occasionally
  18.459 +problematic, and people often prefer to get rid of it.
  18.460 +
  18.461 +You can use either the \hook{precommit} or \hook{pretxncommit} hook to
  18.462 +tell whether you have a trailing whitespace problem.  If you use the
  18.463 +\hook{precommit} hook, the hook will not know which files you are
  18.464 +committing, so it will have to check every modified file in the
  18.465 +repository for trailing whitespace.  If you want to commit a change
  18.466 +to just the file \filename{foo}, but the file \filename{bar} contains
  18.467 +trailing whitespace, doing a check in the \hook{precommit} hook will
  18.468 +prevent you from committing \filename{foo} due to the problem with
  18.469 +\filename{bar}.  This doesn't seem right.
  18.470 +
  18.471 +Should you choose the \hook{pretxncommit} hook, the check won't occur
  18.472 +until just before the transaction for the commit completes.  This will
  18.473 +allow you to check for problems in only the exact files that are being
  18.474 +committed.  However, if you entered the commit message interactively
  18.475 +and the hook fails, the transaction will roll back; you'll have to
  18.476 +re-enter the commit message after you fix the trailing whitespace and
  18.477 +run \hgcmd{commit} again.
  18.478 +
  18.479 +\begin{figure}[ht]
  18.480 +  \interaction{hook.ws.simple}
  18.481 +  \caption{A simple hook that checks for trailing whitespace}
  18.482 +  \label{ex:hook:ws.simple}
  18.483 +\end{figure}
  18.484 +
  18.485 +Figure~\ref{ex:hook:ws.simple} introduces a simple \hook{pretxncommit}
  18.486 +hook that checks for trailing whitespace.  This hook is short, but not
  18.487 +very helpful.  It exits with an error status if a change adds a line
  18.488 +with trailing whitespace to any file, but does not print any
  18.489 +information that might help us to identify the offending file or
  18.490 +line.  It does have the nice property of not paying attention to
  18.491 +unmodified lines; only lines that introduce new trailing whitespace
  18.492 +cause problems.
  18.493 +
  18.494 +\begin{figure}[ht]
  18.495 +  \interaction{hook.ws.better}
  18.496 +  \caption{A better trailing whitespace hook}
  18.497 +  \label{ex:hook:ws.better}
  18.498 +\end{figure}
  18.499 +
  18.500 +The example of figure~\ref{ex:hook:ws.better} is much more complex,
  18.501 +but also more useful.  It parses a unified diff to see if any lines
  18.502 +add trailing whitespace, and prints the name of the file and the line
  18.503 +number of each such occurrence.  Even better, if the change adds
  18.504 +trailing whitespace, this hook saves the commit comment and prints the
  18.505 +name of the save file before exiting and telling Mercurial to roll the
  18.506 +transaction back, so you can use
  18.507 +\hgcmdargs{commit}{\hgopt{commit}{-l}~\emph{filename}} to reuse the
  18.508 +saved commit message once you've corrected the problem.
  18.509 +
  18.510 +As a final aside, note in figure~\ref{ex:hook:ws.better} the use of
  18.511 +\command{perl}'s in-place editing feature to get rid of trailing
  18.512 +whitespace from a file.  This is concise and useful enough that I will
  18.513 +reproduce it here.
  18.514 +\begin{codesample2}
  18.515 +  perl -pi -e 's,\textbackslash{}s+\$,,' filename
  18.516 +\end{codesample2}
  18.517 +
  18.518 +\section{Bundled hooks}
  18.519 +
  18.520 +Mercurial ships with several bundled hooks.  You can find them in the
  18.521 +\dirname{hgext} directory of a Mercurial source tree.  If you are
  18.522 +using a Mercurial binary package, the hooks will be located in the
  18.523 +\dirname{hgext} directory of wherever your package installer put
  18.524 +Mercurial.
  18.525 +
  18.526 +\subsection{\hgext{acl}---access control for parts of a repository}
  18.527 +
  18.528 +The \hgext{acl} extension lets you control which remote users are
  18.529 +allowed to push changesets to a networked server.  You can protect any
  18.530 +portion of a repository (including the entire repo), so that a
  18.531 +specific remote user can push only changes that do not affect the protected
  18.532 +portion.
  18.533 +
  18.534 +This extension implements access control based on the identity of the
  18.535 +user performing a push, \emph{not} on who committed the changesets
  18.536 +they're pushing.  It makes sense to use this hook only if you have a
  18.537 +locked-down server environment that authenticates remote users, and
  18.538 +you want to be sure that only specific users are allowed to push
  18.539 +changes to that server.
  18.540 +
  18.541 +\subsubsection{Configuring the \hook{acl} hook}
  18.542 +
  18.543 +In order to manage incoming changesets, the \hgext{acl} hook must be
  18.544 +used as a \hook{pretxnchangegroup} hook.  This lets it see which files
  18.545 +are modified by each incoming changeset, and roll back a group of
  18.546 +changesets if they modify ``forbidden'' files.  Example:
  18.547 +\begin{codesample2}
  18.548 +  [hooks]
  18.549 +  pretxnchangegroup.acl = python:hgext.acl.hook
  18.550 +\end{codesample2}
  18.551 +
  18.552 +The \hgext{acl} extension is configured using three sections.  
  18.553 +
  18.554 +The \rcsection{acl} section has only one entry, \rcitem{acl}{sources},
  18.555 +which lists the sources of incoming changesets that the hook should
  18.556 +pay attention to.  You don't normally need to configure this section.
  18.557 +\begin{itemize}
  18.558 +\item[\rcitem{acl}{serve}] Control incoming changesets that are arriving
  18.559 +  from a remote repository over http or ssh.  This is the default
  18.560 +  value of \rcitem{acl}{sources}, and usually the only setting you'll
  18.561 +  need for this configuration item.
  18.562 +\item[\rcitem{acl}{pull}] Control incoming changesets that are
  18.563 +  arriving via a pull from a local repository.
  18.564 +\item[\rcitem{acl}{push}] Control incoming changesets that are
  18.565 +  arriving via a push from a local repository.
  18.566 +\item[\rcitem{acl}{bundle}] Control incoming changesets that are
  18.567 +  arriving from another repository via a bundle.
  18.568 +\end{itemize}
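         +
         +For instance, the default behaviour could be spelled out explicitly
         +with a sketch such as the following:
         +\begin{codesample2}
         +  [acl]
         +  sources = serve
         +\end{codesample2}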
  18.569 +
  18.570 +The \rcsection{acl.allow} section controls the users that are allowed to
  18.571 +add changesets to the repository.  If this section is not present, all
  18.572 +users that are not explicitly denied are allowed.  If this section is
  18.573 +present, all users that are not explicitly allowed are denied (so an
  18.574 +empty section means that all users are denied).
  18.575 +
  18.576 +The \rcsection{acl.deny} section determines which users are denied
  18.577 +from adding changesets to the repository.  If this section is not
  18.578 +present or is empty, no users are denied.
  18.579 +
  18.580 +The syntaxes for the \rcsection{acl.allow} and \rcsection{acl.deny}
  18.581 +sections are identical.  On the left of each entry is a glob pattern
  18.582 +that matches files or directories, relative to the root of the
  18.583 +repository; on the right, a user name.
  18.584 +
  18.585 +In the following example, the user \texttt{docwriter} can only push
  18.586 +changes to the \dirname{docs} subtree of the repository, while
  18.587 +\texttt{intern} can push changes to any file or directory except
  18.588 +\dirname{source/sensitive}.
  18.589 +\begin{codesample2}
  18.590 +  [acl.allow]
  18.591 +  docs/** = docwriter
  18.592 +
  18.593 +  [acl.deny]
  18.594 +  source/sensitive/** = intern
  18.595 +\end{codesample2}
  18.596 +
  18.597 +\subsubsection{Testing and troubleshooting}
  18.598 +
  18.599 +If you want to test the \hgext{acl} hook, run it with Mercurial's
  18.600 +debugging output enabled.  Since you'll probably be running it on a
  18.601 +server where it's not convenient (or sometimes possible) to pass in
  18.602 +the \hggopt{--debug} option, don't forget that you can enable
  18.603 +debugging output in your \hgrc:
  18.604 +\begin{codesample2}
  18.605 +  [ui]
  18.606 +  debug = true
  18.607 +\end{codesample2}
  18.608 +With this enabled, the \hgext{acl} hook will print enough information
  18.609 +to let you figure out why it is allowing or forbidding pushes from
  18.610 +specific users.
  18.611 +
  18.612 +\subsection{\hgext{bugzilla}---integration with Bugzilla}
  18.613 +
  18.614 +The \hgext{bugzilla} extension adds a comment to a Bugzilla bug
  18.615 +whenever it finds a reference to that bug ID in a commit comment.  You
  18.616 +can install this hook on a shared server, so that any time a remote
  18.617 +user pushes changes to this server, the hook gets run.  
  18.618 +
  18.619 +It adds a comment to the bug that looks like this (you can configure
  18.620 +the contents of the comment---see below):
  18.621 +\begin{codesample2}
  18.622 +  Changeset aad8b264143a, made by Joe User <joe.user@domain.com> in
  18.623 +  the frobnitz repository, refers to this bug.
  18.624 +
  18.625 +  For complete details, see
  18.626 +  http://hg.domain.com/frobnitz?cmd=changeset;node=aad8b264143a
  18.627 +
  18.628 +  Changeset description:
  18.629 +        Fix bug 10483 by guarding against some NULL pointers
  18.630 +\end{codesample2}
  18.631 +The value of this hook is that it automates the process of updating a
  18.632 +bug any time a changeset refers to it.  If you configure the hook
  18.633 +properly, it makes it easy for people to browse straight from a
  18.634 +Bugzilla bug to a changeset that refers to that bug.
  18.635 +
  18.636 +You can use the code in this hook as a starting point for some more
  18.637 +exotic Bugzilla integration recipes.  Here are a few possibilities:
  18.638 +\begin{itemize}
  18.639 +\item Require that every changeset pushed to the server have a valid
  18.640 +  bug~ID in its commit comment.  In this case, you'd want to configure
  18.641 +  the hook as a \hook{pretxnchangegroup} hook.  This would allow the hook
  18.642 +  to reject changes that didn't contain bug IDs.
  18.643 +\item Allow incoming changesets to automatically modify the
  18.644 +  \emph{state} of a bug, as well as simply adding a comment.  For
  18.645 +  example, the hook could recognise the string ``fixed bug 31337'' as
  18.646 +  indicating that it should update the state of bug 31337 to
  18.647 +  ``requires testing''.
  18.648 +\end{itemize}
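As a starting point for recipes like these, the following Python sketch shows one way a hook could pick bug~IDs out of a commit comment.  The pattern and the name \texttt{find\_bug\_ids} are illustrative assumptions, not the expression that the \hgext{bugzilla} extension itself uses.

```python
import re

# Hypothetical pattern: the word "bug", an optional "#", then digits.
# This is an illustration only, not the extension's own expression.
BUG_RE = re.compile(r'\bbug\s*#?\s*(\d+)', re.IGNORECASE)


def find_bug_ids(description):
    """Return the bug IDs mentioned in a commit comment, in order."""
    return [int(number) for number in BUG_RE.findall(description)]
```

A controlling hook built on this could return a true value whenever the list is empty, which would cause the changeset to be rejected.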
  18.649 +
  18.650 +\subsubsection{Configuring the \hgext{bugzilla} hook}
  18.651 +\label{sec:hook:bugzilla:config}
  18.652 +
  18.653 +You should configure this hook in your server's \hgrc\ as an
  18.654 +\hook{incoming} hook, for example as follows:
  18.655 +\begin{codesample2}
  18.656 +  [hooks]
  18.657 +  incoming.bugzilla = python:hgext.bugzilla.hook
  18.658 +\end{codesample2}
  18.659 +
  18.660 +Because of the specialised nature of this hook, and because Bugzilla
  18.661 +was not written with this kind of integration in mind, configuring
  18.662 +this hook is a somewhat involved process.
  18.663 +
  18.664 +Before you begin, you must install the MySQL bindings for Python on
  18.665 +the host(s) where you'll be running the hook.  If they are not
  18.666 +available as a binary package for your system, you can download them
  18.667 +from~\cite{web:mysql-python}.
  18.668 +
  18.669 +Configuration information for this hook lives in the
  18.670 +\rcsection{bugzilla} section of your \hgrc.
  18.671 +\begin{itemize}
  18.672 +\item[\rcitem{bugzilla}{version}] The version of Bugzilla installed on
  18.673 +  the server.  The database schema that Bugzilla uses changes
  18.674 +  occasionally, so this hook has to know exactly which schema to use.
  18.675 +  At the moment, the only version supported is \texttt{2.16}.
  18.676 +\item[\rcitem{bugzilla}{host}] The hostname of the MySQL server that
  18.677 +  stores your Bugzilla data.  The database must be configured to allow
  18.678 +  connections from whatever host you are running the \hook{bugzilla}
  18.679 +  hook on.
  18.680 +\item[\rcitem{bugzilla}{user}] The username with which to connect to
  18.681 +  the MySQL server.  The database must be configured to allow this
  18.682 +  user to connect from whatever host you are running the
  18.683 +  \hook{bugzilla} hook on.  This user must be able to access and
  18.684 +  modify Bugzilla tables.  The default value of this item is
  18.685 +  \texttt{bugs}, which is the standard name of the Bugzilla user in a
  18.686 +  MySQL database.
  18.687 +\item[\rcitem{bugzilla}{password}] The MySQL password for the user you
  18.688 +  configured above.  This is stored as plain text, so you should make
  18.689 +  sure that unauthorised users cannot read the \hgrc\ file where you
  18.690 +  store this information.
  18.691 +\item[\rcitem{bugzilla}{db}] The name of the Bugzilla database on the
  18.692 +  MySQL server.  The default value of this item is \texttt{bugs},
  18.693 +  which is the standard name of the MySQL database where Bugzilla
  18.694 +  stores its data.
  18.695 +\item[\rcitem{bugzilla}{notify}] If you want Bugzilla to send out a
  18.696 +  notification email to subscribers after this hook has added a
  18.697 +  comment to a bug, you will need this hook to run a command whenever
  18.698 +  it updates the database.  The command to run depends on where you
  18.699 +  have installed Bugzilla, but it will typically look something like
  18.700 +  this, if you have Bugzilla installed in
  18.701 +  \dirname{/var/www/html/bugzilla}:
  18.702 +  \begin{codesample4}
  18.703 +    cd /var/www/html/bugzilla && ./processmail %s nobody@nowhere.com
  18.704 +  \end{codesample4}
  18.705 +  The Bugzilla \texttt{processmail} program expects to be given a
  18.706 +  bug~ID (the hook replaces ``\texttt{\%s}'' with the bug~ID) and an
  18.707 +  email address.  It also expects to be able to write to some files in
  18.708 +  the directory that it runs in.  If Bugzilla and this hook are not
  18.709 +  installed on the same machine, you will need to find a way to run
  18.710 +  \texttt{processmail} on the server where Bugzilla is installed.
  18.711 +\end{itemize}
  18.712 +
  18.713 +\subsubsection{Mapping committer names to Bugzilla user names}
  18.714 +
  18.715 +By default, the \hgext{bugzilla} hook tries to use the email address
  18.716 +of a changeset's committer as the Bugzilla user name with which to
  18.717 +update a bug.  If this does not suit your needs, you can map committer
  18.718 +email addresses to Bugzilla user names using a \rcsection{usermap}
  18.719 +section.
  18.720 +
  18.721 +Each item in the \rcsection{usermap} section contains an email address
  18.722 +on the left, and a Bugzilla user name on the right.
  18.723 +\begin{codesample2}
  18.724 +  [usermap]
  18.725 +  jane.user@example.com = jane
  18.726 +\end{codesample2}
  18.727 +You can either keep the \rcsection{usermap} data in a normal \hgrc, or
  18.728 +tell the \hgext{bugzilla} hook to read the information from an
  18.729 +external \filename{usermap} file.  In the latter case, you can store
  18.730 +\filename{usermap} data by itself in (for example) a user-modifiable
  18.731 +repository.  This makes it possible to let your users maintain their
  18.732 +own \rcitem{bugzilla}{usermap} entries.  The main \hgrc\ file might
  18.733 +look like this:
  18.734 +\begin{codesample2}
  18.735 +  # regular hgrc file refers to external usermap file
  18.736 +  [bugzilla]
  18.737 +  usermap = /home/hg/repos/userdata/bugzilla-usermap.conf
  18.738 +\end{codesample2}
  18.739 +While the \filename{usermap} file that it refers to might look like
  18.740 +this:
  18.741 +\begin{codesample2}
  18.742 +  # bugzilla-usermap.conf - inside a hg repository
  18.743 +  [usermap]
  18.744 +  stephanie@example.com = steph
  18.745 +\end{codesample2}
  18.746 +
  18.747 +\subsubsection{Configuring the text that gets added to a bug}
  18.748 +
  18.749 +You can configure the text that this hook adds as a comment; you
  18.750 +specify it in the form of a Mercurial template.  Several \hgrc\
  18.751 +entries (still in the \rcsection{bugzilla} section) control this
  18.752 +behaviour.
  18.753 +\begin{itemize}
  18.754 +\item[\texttt{strip}] The number of leading path elements to strip
  18.755 +  from a repository's path name to construct a partial path for a URL.
  18.756 +  For example, if the repositories on your server live under
  18.757 +  \dirname{/home/hg/repos}, and you have a repository whose path is
  18.758 +  \dirname{/home/hg/repos/app/tests}, then setting \texttt{strip} to
  18.759 +  \texttt{4} will give a partial path of \dirname{app/tests}.  The
  18.760 +  hook will make this partial path available when expanding a
  18.761 +  template, as \texttt{webroot}.
  18.762 +\item[\texttt{template}] The text of the template to use.  In addition
  18.763 +  to the usual changeset-related variables, this template can use
  18.764 +  \texttt{hgweb} (the value of the \texttt{hgweb} configuration item
  18.765 +  above) and \texttt{webroot} (the path constructed using
  18.766 +  \texttt{strip} above).
  18.767 +\end{itemize}
  18.768 +
  18.769 +In addition, you can add a \rcitem{web}{baseurl} item to the
  18.770 +\rcsection{web} section of your \hgrc.  The \hgext{bugzilla} hook will
  18.771 +make this available when expanding a template, as the base string to
  18.772 +use when constructing a URL that will let users browse from a Bugzilla
  18.773 +comment to view a changeset.  Example:
  18.774 +\begin{codesample2}
  18.775 +  [web]
  18.776 +  baseurl = http://hg.domain.com/
  18.777 +\end{codesample2}
  18.778 +
  18.779 +Here is an example set of \hgext{bugzilla} hook config information.
  18.780 +\begin{codesample2}
  18.781 +  [bugzilla]
  18.782 +  host = bugzilla.example.com
  18.783 +  password = mypassword
  18.784 +  version = 2.16
  18.785 +  # server-side repos live in /home/hg/repos, so strip 4 leading
  18.786 +  # separators
  18.787 +  strip = 4
  18.788 +  hgweb = http://hg.example.com/
  18.789 +  usermap = /home/hg/repos/notify/bugzilla.conf
  18.790 +  template = Changeset \{node|short\}, made by \{author\} in the \{webroot\}
  18.791 +    repo, refers to this bug.\\nFor complete details, see 
  18.792 +    \{hgweb\}\{webroot\}?cmd=changeset;node=\{node|short\}\\nChangeset
  18.793 +    description:\\n\\t\{desc|tabindent\}
  18.794 +\end{codesample2}
  18.795 +
  18.796 +\subsubsection{Testing and troubleshooting}
  18.797 +
  18.798 +The most common problems with configuring the \hgext{bugzilla} hook
  18.799 +relate to running Bugzilla's \filename{processmail} script and mapping
  18.800 +committer names to user names.
  18.801 +
  18.802 +Recall from section~\ref{sec:hook:bugzilla:config} above that the user
  18.803 +that runs the Mercurial process on the server is also the one that
  18.804 +will run the \filename{processmail} script.  The
  18.805 +\filename{processmail} script sometimes causes Bugzilla to write to
  18.806 +files in its configuration directory, and Bugzilla's configuration
  18.807 +files are usually owned by the user that your web server runs under.
  18.808 +
  18.809 +You can cause \filename{processmail} to be run under a suitable
  18.810 +user's identity using the \command{sudo} command.  Here is an example
  18.811 +entry for a \filename{sudoers} file.
  18.812 +\begin{codesample2}
  18.813 +  hg_user ALL = (httpd_user) NOPASSWD: /var/www/html/bugzilla/processmail-wrapper %s
  18.814 +\end{codesample2}
  18.815 +This allows the \texttt{hg\_user} user to run a
  18.816 +\filename{processmail-wrapper} program under the identity of
  18.817 +\texttt{httpd\_user}.
  18.818 +
  18.819 +This indirection through a wrapper script is necessary, because
  18.820 +\filename{processmail} expects to be run with its current directory
  18.821 +set to wherever you installed Bugzilla; you can't specify that kind of
  18.822 +constraint in a \filename{sudoers} file.  The contents of the wrapper
  18.823 +script are simple:
  18.824 +\begin{codesample2}
  18.825 +  #!/bin/sh
  18.826 +  cd `dirname $0` && ./processmail "$1" nobody@example.com
  18.827 +\end{codesample2}
  18.828 +It doesn't seem to matter what email address you pass to
  18.829 +\filename{processmail}.
  18.830 +
  18.831 +If your \rcsection{usermap} is not set up correctly, users will see an
  18.832 +error message from the \hgext{bugzilla} hook when they push changes
  18.833 +to the server.  The error message will look like this:
  18.834 +\begin{codesample2}
  18.835 +  cannot find bugzilla user id for john.q.public@example.com
  18.836 +\end{codesample2}
  18.837 +What this means is that the committer's address,
  18.838 +\texttt{john.q.public@example.com}, is not a valid Bugzilla user name,
  18.839 +nor does it have an entry in your \rcsection{usermap} that maps it to
  18.840 +a valid Bugzilla user name.
  18.841 +
  18.842 +\subsection{\hgext{notify}---send email notifications}
  18.843 +
  18.844 +Although Mercurial's built-in web server provides RSS feeds of changes
  18.845 +in every repository, many people prefer to receive change
  18.846 +notifications via email.  The \hgext{notify} hook lets you send out
  18.847 +notifications to a set of email addresses whenever changesets arrive
  18.848 +that those subscribers are interested in.
  18.849 +
  18.850 +As with the \hgext{bugzilla} hook, the \hgext{notify} hook is
  18.851 +template-driven, so you can customise the contents of the notification
  18.852 +messages that it sends.
  18.853 +
  18.854 +By default, the \hgext{notify} hook includes a diff of every changeset
  18.855 +that it sends out; you can limit the size of the diff, or turn this
  18.856 +feature off entirely.  Including the diff lets subscribers review
  18.857 +changes immediately, rather than having to follow a URL.
  18.858 +
  18.859 +\subsubsection{Configuring the \hgext{notify} hook}
  18.860 +
  18.861 +You can set up the \hgext{notify} hook to send one email message per
  18.862 +incoming changeset, or one per incoming group of changesets (all those
  18.863 +that arrived in a single pull or push).
  18.864 +\begin{codesample2}
  18.865 +  [hooks]
  18.866 +  # send one email per group of changes
  18.867 +  changegroup.notify = python:hgext.notify.hook
  18.868 +  # send one email per change
  18.869 +  incoming.notify = python:hgext.notify.hook
  18.870 +\end{codesample2}
  18.871 +
  18.872 +Configuration information for this hook lives in the
  18.873 +\rcsection{notify} section of a \hgrc\ file.
  18.874 +\begin{itemize}
  18.875 +\item[\rcitem{notify}{test}] By default, this hook does not send out
  18.876 +  email at all; instead, it prints the message that it \emph{would}
  18.877 +  send.  Set this item to \texttt{false} to allow email to be sent.
  18.878 +  The reason that sending of email is turned off by default is that it
  18.879 +  takes several tries to configure this extension exactly as you would
  18.880 +  like, and it would be bad form to spam subscribers with a number of
  18.881 +  ``broken'' notifications while you debug your configuration.
  18.882 +\item[\rcitem{notify}{config}] The path to a configuration file that
  18.883 +  contains subscription information.  This is kept separate from the
  18.884 +  main \hgrc\ so that you can maintain it in a repository of its own.
  18.885 +  People can then clone that repository, update their subscriptions,
  18.886 +  and push the changes back to your server.
  18.887 +\item[\rcitem{notify}{strip}] The number of leading path separator
  18.888 +  characters to strip from a repository's path, when deciding whether
  18.889 +  a repository has subscribers.  For example, if the repositories on
  18.890 +  your server live in \dirname{/home/hg/repos}, and \hgext{notify} is
  18.891 +  considering a repository named \dirname{/home/hg/repos/shared/test},
  18.892 +  setting \rcitem{notify}{strip} to \texttt{4} will cause
  18.893 +  \hgext{notify} to trim the path it considers down to
  18.894 +  \dirname{shared/test}, and it will match subscribers against that.
  18.895 +\item[\rcitem{notify}{template}] The template text to use when sending
  18.896 +  messages.  This specifies both the contents of the message header
  18.897 +  and its body.
  18.898 +\item[\rcitem{notify}{maxdiff}] The maximum number of lines of diff
  18.899 +  data to append to the end of a message.  If a diff is longer than
  18.900 +  this, it is truncated.  By default, this is set to 300.  Set this to
  18.901 +  \texttt{0} to omit diffs from notification emails.
  18.902 +\item[\rcitem{notify}{sources}] A list of sources of changesets to
  18.903 +  consider.  This lets you limit \hgext{notify} to only sending out
  18.904 +  email about changes that remote users pushed into this repository
  18.905 +  via a server, for example.  See section~\ref{sec:hook:sources} for
  18.906 +  the sources you can specify here.
  18.907 +\end{itemize}
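The effect of \rcitem{notify}{strip} can be illustrated with a short Python sketch.  The function name \texttt{strip\_path} is hypothetical, and the code only mirrors the behaviour described in the list above; it is not taken from the extension.

```python
def strip_path(path, count):
    """Drop everything up to and including the first `count` '/' characters."""
    while count > 0:
        cut = path.find('/')
        if cut == -1:
            break  # fewer separators than requested: keep what is left
        path = path[cut + 1:]
        count -= 1
    return path
```

With the repository from the list above, \texttt{strip\_path('/home/hg/repos/shared/test', 4)} gives \texttt{shared/test}.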
  18.908 +
  18.909 +If you set the \rcitem{web}{baseurl} item in the \rcsection{web}
  18.910 +section, you can use it in a template; it will be available as
  18.911 +\texttt{baseurl}.
  18.912 +
  18.913 +Here is an example set of \hgext{notify} configuration information.
  18.914 +\begin{codesample2}
  18.915 +  [notify]
  18.916 +  # really send email
  18.917 +  test = false
  18.918 +  # subscriber data lives in the notify repo
  18.919 +  config = /home/hg/repos/notify/notify.conf
  18.920 +  # repos live in /home/hg/repos on server, so strip 4 "/" chars
  18.921 +  strip = 4
  18.922 +  template = X-Hg-Repo: \{webroot\}
  18.923 +    Subject: \{webroot\}: \{desc|firstline|strip\}
  18.924 +    From: \{author\}
  18.925 +
  18.926 +    changeset \{node|short\} in \{root\}
  18.927 +    details: \{baseurl\}\{webroot\}?cmd=changeset;node=\{node|short\}
  18.928 +    description:
  18.929 +      \{desc|tabindent|strip\}
  18.930 +
  18.931 +  [web]
  18.932 +  baseurl = http://hg.example.com/
  18.933 +\end{codesample2}
  18.934 +
  18.935 +This will produce a message that looks like the following:
  18.936 +\begin{codesample2}
  18.937 +  X-Hg-Repo: tests/slave
  18.938 +  Subject: tests/slave: Handle error case when slave has no buffers
  18.939 +  Date: Wed,  2 Aug 2006 15:25:46 -0700 (PDT)
  18.940 +
  18.941 +  changeset 3cba9bfe74b5 in /home/hg/repos/tests/slave
  18.942 +  details: http://hg.example.com/tests/slave?cmd=changeset;node=3cba9bfe74b5
  18.943 +  description:
  18.944 +          Handle error case when slave has no buffers
  18.945 +  diffs (54 lines):
  18.946 +
  18.947 +  diff -r 9d95df7cf2ad -r 3cba9bfe74b5 include/tests.h
  18.948 +  --- a/include/tests.h      Wed Aug 02 15:19:52 2006 -0700
  18.949 +  +++ b/include/tests.h      Wed Aug 02 15:25:26 2006 -0700
  18.950 +  @@ -212,6 +212,15 @@ static __inline__ void test_headers(void *h)
  18.951 +  [...snip...]
  18.952 +\end{codesample2}
  18.953 +
  18.954 +\subsubsection{Testing and troubleshooting}
  18.955 +
  18.956 +Do not forget that by default, the \hgext{notify} extension \emph{will
  18.957 +  not send any mail} until you explicitly configure it to do so, by
  18.958 +setting \rcitem{notify}{test} to \texttt{false}.  Until you do that,
  18.959 +it simply prints the message it \emph{would} send.
  18.960 +
  18.961 +\section{Information for writers of hooks}
  18.962 +\label{sec:hook:ref}
  18.963 +
  18.964 +\subsection{In-process hook execution}
  18.965 +
  18.966 +An in-process hook is called with arguments of the following form:
  18.967 +\begin{codesample2}
  18.968 +  def myhook(ui, repo, **kwargs):
  18.969 +      pass
  18.970 +\end{codesample2}
  18.971 +The \texttt{ui} parameter is a \pymodclass{mercurial.ui}{ui} object.
  18.972 +The \texttt{repo} parameter is a
  18.973 +\pymodclass{mercurial.localrepo}{localrepository} object.  The
  18.974 +names and values of the \texttt{**kwargs} parameters depend on the
  18.975 +hook being invoked, with the following common features:
  18.976 +\begin{itemize}
  18.977 +\item If a parameter is named \texttt{node} or
  18.978 +  \texttt{parent\emph{N}}, it will contain a hexadecimal changeset ID.
  18.979 +  The empty string is used to represent ``null changeset ID'' instead
  18.980 +  of a string of zeroes.
  18.981 +\item If a parameter is named \texttt{url}, it will contain the URL of
  18.982 +  a remote repository, if that can be determined.
  18.983 +\item Boolean-valued parameters are represented as Python
  18.984 +  \texttt{bool} objects.
  18.985 +\end{itemize}
  18.986 +
  18.987 +An in-process hook is called without a change to the process's working
  18.988 +directory (unlike external hooks, which are run in the root of the
  18.989 +repository).  It must not change the process's working directory, or
  18.990 +it will cause any calls it makes into the Mercurial API to fail.
  18.991 +
  18.992 +If a hook returns a boolean ``false'' value, it is considered to have
  18.993 +succeeded.  If it returns a boolean ``true'' value or raises an
  18.994 +exception, it is considered to have failed.  A useful way to think of
  18.995 +the calling convention is ``tell me if you fail''.
  18.996 +
  18.997 +Note that changeset IDs are passed into Python hooks as hexadecimal
  18.998 +strings, not the binary hashes that Mercurial's APIs normally use.  To
  18.999 +convert a hash from hex to binary, use the
 18.1000 +\pymodfunc{mercurial.node}{bin} function.
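
The calling convention above can be sketched as a small in-process hook.  The name \texttt{check\_node} is hypothetical, and the sketch uses \texttt{binascii.unhexlify} from the Python standard library as a stand-in for \pymodfunc{mercurial.node}{bin} so that it runs without Mercurial installed; the only method it assumes on \texttt{ui} is \texttt{warn}.

```python
import binascii


def check_node(ui, repo, node='', **kwargs):
    """Fail (return True) unless `node` is a usable hexadecimal changeset ID."""
    if not node:
        # The empty string stands for the null changeset ID.
        ui.warn('no changeset ID supplied\n')
        return True  # a true value means "the hook failed"
    try:
        # Stand-in for mercurial.node.bin: hex string to binary hash.
        binascii.unhexlify(node)
    except (binascii.Error, ValueError):
        ui.warn('malformed changeset ID: %s\n' % node)
        return True
    return False  # a false value means "the hook succeeded"
```

Note that depending on the Mercurial and Python versions in use, \texttt{ui.warn} may expect byte strings rather than text strings.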
 18.1001 +
 18.1002 +\subsection{External hook execution}
 18.1003 +
 18.1004 +An external hook is passed to the shell of the user running Mercurial.
 18.1005 +Features of that shell, such as variable substitution and command
 18.1006 +redirection, are available.  The hook is run in the root directory of
 18.1007 +the repository (unlike in-process hooks, which are run in the same
 18.1008 +directory that Mercurial was run in).
 18.1009 +
 18.1010 +Hook parameters are passed to the hook as environment variables.  Each
 18.1011 +parameter's name is converted to upper case and prefixed
 18.1012 +with the string ``\texttt{HG\_}''.  For example, if the name of a
 18.1013 +parameter is ``\texttt{node}'', the name of the environment variable
 18.1014 +representing that parameter will be ``\texttt{HG\_NODE}''.
 18.1015 +
 18.1016 +A boolean parameter is represented as the string ``\texttt{1}'' for
 18.1017 +``true'', and ``\texttt{0}'' for ``false''.  If an environment variable is
 18.1018 +named \envar{HG\_NODE}, \envar{HG\_PARENT1} or \envar{HG\_PARENT2}, it
 18.1019 +contains a changeset ID represented as a hexadecimal string.  The
 18.1020 +empty string is used to represent ``null changeset ID'' instead of a
 18.1021 +string of zeroes.  If an environment variable is named
 18.1022 +\envar{HG\_URL}, it will contain the URL of a remote repository, if
 18.1023 +that can be determined.
 18.1024 +
 18.1025 +If a hook exits with a status of zero, it is considered to have
 18.1026 +succeeded.  If it exits with a non-zero status, it is considered to
 18.1027 +have failed.
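
As an illustration of these conventions, here is a sketch of the core of an external hook written in Python.  The function names \texttt{hook\_params} and \texttt{main} are hypothetical; the sketch assumes only the \texttt{HG\_} naming scheme and the exit status rules described above.

```python
import sys


def hook_params(environ):
    """Collect HG_* variables into a dict keyed by lower-case parameter name."""
    return {name[3:].lower(): value
            for name, value in environ.items()
            if name.startswith('HG_')}


def main(environ):
    """Return the exit status: zero for success, non-zero for failure."""
    params = hook_params(environ)
    if not params.get('node'):
        # An unset or empty HG_NODE means there is no changeset to examine.
        sys.stderr.write('no changeset ID in HG_NODE\n')
        return 1
    return 0

# A real script would finish with:  sys.exit(main(os.environ))
```

Passing the environment in as an argument, rather than reading \texttt{os.environ} directly, makes the logic easy to exercise without running Mercurial.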
 18.1028 +
 18.1029 +\subsection{Finding out where changesets come from}
 18.1030 +
 18.1031 +A hook that involves the transfer of changesets between a local
 18.1032 +repository and another may be able to find out information about the
 18.1033 +``far side''.  Mercurial knows \emph{how} changes are being
 18.1034 +transferred, and in many cases \emph{where} they are being transferred
 18.1035 +to or from.
 18.1036 +
 18.1037 +\subsubsection{Sources of changesets}
 18.1038 +\label{sec:hook:sources}
 18.1039 +
 18.1040 +Mercurial will tell a hook what means are, or were, used to transfer
 18.1041 +changesets between repositories.  This is provided by Mercurial in a
 18.1042 +Python parameter named \texttt{source}, or an environment variable named
 18.1043 +\envar{HG\_SOURCE}.
 18.1044 +
 18.1045 +\begin{itemize}
 18.1046 +\item[\texttt{serve}] Changesets are transferred to or from a remote
 18.1047 +  repository over http or ssh.
 18.1048 +\item[\texttt{pull}] Changesets are being transferred via a pull from
 18.1049 +  one repository into another.
 18.1050 +\item[\texttt{push}] Changesets are being transferred via a push from
 18.1051 +  one repository into another.
 18.1052 +\item[\texttt{bundle}] Changesets are being transferred to or from a
 18.1053 +  bundle.
 18.1054 +\end{itemize}
 18.1055 +
 18.1056 +\subsubsection{Where changes are going---remote repository URLs}
 18.1057 +\label{sec:hook:url}
 18.1058 +
 18.1059 +When possible, Mercurial will tell a hook the location of the ``far
 18.1060 +side'' of an activity that transfers changeset data between
 18.1061 +repositories.  This is provided by Mercurial in a Python parameter
 18.1062 +named \texttt{url}, or an environment variable named \envar{HG\_URL}.
 18.1063 +
 18.1064 +This information is not always known.  If a hook is invoked in a
 18.1065 +repository that is being served via http or ssh, Mercurial cannot tell
 18.1066 +where the remote repository is, but it may know where the client is
 18.1067 +connecting from.  In such cases, the URL will take one of the
 18.1068 +following forms:
 18.1069 +\begin{itemize}
 18.1070 +\item \texttt{remote:ssh:\emph{ip-address}}---remote ssh client, at
 18.1071 +  the given IP address.
 18.1072 +\item \texttt{remote:http:\emph{ip-address}}---remote http client, at
 18.1073 +  the given IP address.  If the client is using SSL, this will be of
 18.1074 +  the form \texttt{remote:https:\emph{ip-address}}.
 18.1075 +\item Empty---no information could be discovered about the remote
 18.1076 +  client.
 18.1077 +\end{itemize}
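
A hook that wants to act on these values could take them apart as in the following sketch.  The function name \texttt{parse\_remote\_url} is hypothetical, and the code assumes only the three forms listed above.

```python
def parse_remote_url(url):
    """Split 'remote:<scheme>:<address>' into (scheme, address).

    Returns None for an empty value or for any URL not of that form.
    """
    # Split at most twice, so that an address containing ':' stays intact.
    parts = url.split(':', 2)
    if len(parts) != 3 or parts[0] != 'remote':
        return None
    return parts[1], parts[2]
```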
 18.1078 +
 18.1079 +\section{Hook reference}
 18.1080 +
 18.1081 +\subsection{\hook{changegroup}---after remote changesets added}
 18.1082 +\label{sec:hook:changegroup}
 18.1083 +
 18.1084 +This hook is run after a group of pre-existing changesets has been
 18.1085 +added to the repository, for example via a \hgcmd{pull} or
 18.1086 +\hgcmd{unbundle}.  This hook is run once per operation that added one
 18.1087 +or more changesets.  This is in contrast to the \hook{incoming} hook,
 18.1088 +which is run once per changeset, regardless of whether the changesets
 18.1089 +arrive in a group.
 18.1090 +
 18.1091 +Some possible uses for this hook include kicking off an automated
 18.1092 +build or test of the added changesets, updating a bug database, or
 18.1093 +notifying subscribers that a repository contains new changes.
 18.1094 +
 18.1095 +Parameters to this hook:
 18.1096 +\begin{itemize}
 18.1097 +\item[\texttt{node}] A changeset ID.  The changeset ID of the first
 18.1098 +  changeset in the group that was added.  All changesets between this
 18.1099 +  and \index{tags!\texttt{tip}}\texttt{tip}, inclusive, were added by
 18.1100 +  a single \hgcmd{pull}, \hgcmd{push} or \hgcmd{unbundle}.
 18.1101 +\item[\texttt{source}] A string.  The source of these changes.  See
 18.1102 +  section~\ref{sec:hook:sources} for details.
 18.1103 +\item[\texttt{url}] A URL.  The location of the remote repository, if
 18.1104 +  known.  See section~\ref{sec:hook:url} for more information.
 18.1105 +\end{itemize}
 18.1106 +
 18.1107 +See also: \hook{incoming} (section~\ref{sec:hook:incoming}),
 18.1108 +\hook{prechangegroup} (section~\ref{sec:hook:prechangegroup}),
 18.1109 +\hook{pretxnchangegroup} (section~\ref{sec:hook:pretxnchangegroup})
 18.1110 +
 18.1111 +\subsection{\hook{commit}---after a new changeset is created}
 18.1112 +\label{sec:hook:commit}
 18.1113 +
 18.1114 +This hook is run after a new changeset has been created.
 18.1115 +
 18.1116 +Parameters to this hook:
 18.1117 +\begin{itemize}
 18.1118 +\item[\texttt{node}] A changeset ID.  The changeset ID of the newly
 18.1119 +  committed changeset.
 18.1120 +\item[\texttt{parent1}] A changeset ID.  The changeset ID of the first
 18.1121 +  parent of the newly committed changeset.
 18.1122 +\item[\texttt{parent2}] A changeset ID.  The changeset ID of the second
 18.1123 +  parent of the newly committed changeset.
 18.1124 +\end{itemize}
 18.1125 +
 18.1126 +See also: \hook{precommit} (section~\ref{sec:hook:precommit}),
 18.1127 +\hook{pretxncommit} (section~\ref{sec:hook:pretxncommit})
 18.1128 +
 18.1129 +\subsection{\hook{incoming}---after one remote changeset is added}
 18.1130 +\label{sec:hook:incoming}
 18.1131 +
 18.1132 +This hook is run after a pre-existing changeset has been added to the
 18.1133 +repository, for example via a \hgcmd{push}.  If a group of changesets
 18.1134 +was added in a single operation, this hook is called once for each
 18.1135 +added changeset.
 18.1136 +
 18.1137 +You can use this hook for the same purposes as the \hook{changegroup}
 18.1138 +hook (section~\ref{sec:hook:changegroup}); it's simply more convenient
 18.1139 +sometimes to run a hook once per group of changesets, while other
 18.1140 +times it's handier once per changeset.
 18.1141 +
 18.1142 +Parameters to this hook:
 18.1143 +\begin{itemize}
 18.1144 +\item[\texttt{node}] A changeset ID.  The ID of the newly added
 18.1145 +  changeset.
 18.1146 +\item[\texttt{source}] A string.  The source of these changes.  See
 18.1147 +  section~\ref{sec:hook:sources} for details.
 18.1148 +\item[\texttt{url}] A URL.  The location of the remote repository, if
 18.1149 +  known.  See section~\ref{sec:hook:url} for more information.
 18.1150 +\end{itemize}
 18.1151 +
 18.1152 +See also: \hook{changegroup} (section~\ref{sec:hook:changegroup}), \hook{prechangegroup} (section~\ref{sec:hook:prechangegroup}), \hook{pretxnchangegroup} (section~\ref{sec:hook:pretxnchangegroup})
 18.1153 +
 18.1154 +\subsection{\hook{outgoing}---after changesets are propagated}
 18.1155 +\label{sec:hook:outgoing}
 18.1156 +
 18.1157 +This hook is run after a group of changesets has been propagated out
 18.1158 +of this repository, for example by a \hgcmd{push} or \hgcmd{bundle}
 18.1159 +command.
 18.1160 +
 18.1161 +One possible use for this hook is to notify administrators that
 18.1162 +changes have been pulled.
 18.1163 +
 18.1164 +Parameters to this hook:
 18.1165 +\begin{itemize}
 18.1166 +\item[\texttt{node}] A changeset ID.  The changeset ID of the first
 18.1167 +  changeset of the group that was sent.
 18.1168 +\item[\texttt{source}] A string.  The source of the operation
 18.1169 +  (see section~\ref{sec:hook:sources}).  If a remote client pulled
 18.1170 +  changes from this repository, \texttt{source} will be
 18.1171 +  \texttt{serve}.  If the client that obtained changes from this
 18.1172 +  repository was local, \texttt{source} will be \texttt{bundle},
 18.1173 +  \texttt{pull}, or \texttt{push}, depending on the operation the
 18.1174 +  client performed.
 18.1175 +\item[\texttt{url}] A URL.  The location of the remote repository, if
 18.1176 +  known.  See section~\ref{sec:hook:url} for more information.
 18.1177 +\end{itemize}
 18.1178 +
 18.1179 +See also: \hook{preoutgoing} (section~\ref{sec:hook:preoutgoing})
 18.1180 +
 18.1181 +\subsection{\hook{prechangegroup}---before starting to add remote changesets}
 18.1182 +\label{sec:hook:prechangegroup}
 18.1183 +
 18.1184 +This controlling hook is run before Mercurial begins to add a group of
 18.1185 +changesets from another repository.
 18.1186 +
 18.1187 +This hook does not have any information about the changesets to be
 18.1188 +added, because it is run before transmission of those changesets is
 18.1189 +allowed to begin.  If this hook fails, the changesets will not be
 18.1190 +transmitted.
 18.1191 +
 18.1192 +One use for this hook is to prevent external changes from being added
 18.1193 +to a repository.  For example, you could use this to ``freeze'' a
 18.1194 +server-hosted branch temporarily or permanently so that users cannot
 18.1195 +push to it, while still allowing a local administrator to modify the
 18.1196 +repository.
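
A ``freeze'' of this kind might look like the following in-process hook.  This is a sketch under stated assumptions: the name \texttt{freeze} is hypothetical, and it relies on the \texttt{source} parameter being \texttt{serve} for changes that arrive over http or ssh, as described in section~\ref{sec:hook:sources}.

```python
def freeze(ui, repo, source='', **kwargs):
    """Refuse changesets that arrive from a remote client."""
    if source == 'serve':
        ui.warn('this repository is frozen; remote changes are refused\n')
        return True  # a true value makes the controlling hook fail
    # Local operations, such as a pull run by an administrator, still work.
    return False
```

If this function lived in a module named \texttt{myhooks} (also a hypothetical name), it could be enabled with a \rcsection{hooks} entry of the form \texttt{prechangegroup.freeze = python:myhooks.freeze}.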
 18.1197 +
 18.1198 +Parameters to this hook:
 18.1199 +\begin{itemize}
 18.1200 +\item[\texttt{source}] A string.  The source of these changes.  See
 18.1201 +  section~\ref{sec:hook:sources} for details.
 18.1202 +\item[\texttt{url}] A URL.  The location of the remote repository, if
 18.1203 +  known.  See section~\ref{sec:hook:url} for more information.
 18.1204 +\end{itemize}
 18.1205 +
 18.1206 +See also: \hook{changegroup} (section~\ref{sec:hook:changegroup}),
 18.1207 +\hook{incoming} (section~\ref{sec:hook:incoming}),
 18.1208 +\hook{pretxnchangegroup} (section~\ref{sec:hook:pretxnchangegroup})
 18.1209 +
 18.1210 +\subsection{\hook{precommit}---before starting to commit a changeset}
 18.1211 +\label{sec:hook:precommit}
 18.1212 +
 18.1213 +This hook is run before Mercurial begins to commit a new changeset.
 18.1214 +It is run before Mercurial has any of the metadata for the commit,
 18.1215 +such as the files to be committed, the commit message, or the commit
 18.1216 +date.
 18.1217 +
 18.1218 +One use for this hook is to disable the ability to commit new
 18.1219 +changesets, while still allowing incoming changesets.  Another is to
 18.1220 +run a build or test, and only allow the commit to begin if the build
 18.1221 +or test succeeds.
 18.1222 +
 18.1223 +Parameters to this hook:
 18.1224 +\begin{itemize}
 18.1225 +\item[\texttt{parent1}] A changeset ID.  The changeset ID of the first
 18.1226 +  parent of the working directory.
 18.1227 +\item[\texttt{parent2}] A changeset ID.  The changeset ID of the second
 18.1228 +  parent of the working directory.
 18.1229 +\end{itemize}
 18.1230 +If the commit proceeds, the parents of the working directory will
 18.1231 +become the parents of the new changeset.
 18.1232 +
 18.1233 +See also: \hook{commit} (section~\ref{sec:hook:commit}),
 18.1234 +\hook{pretxncommit} (section~\ref{sec:hook:pretxncommit})
 18.1235 +
 18.1236 +\subsection{\hook{preoutgoing}---before starting to propagate changesets}
 18.1237 +\label{sec:hook:preoutgoing}
 18.1238 +
 18.1239 +This hook is invoked before Mercurial knows the identities of the
 18.1240 +changesets to be transmitted.
 18.1241 +
 18.1242 +One use for this hook is to prevent changes from being transmitted to
 18.1243 +another repository.
 18.1244 +
 18.1245 +Parameters to this hook:
 18.1246 +\begin{itemize}
 18.1247 +\item[\texttt{source}] A string.  The source of the operation that is
 18.1248 +  attempting to obtain changes from this repository (see
 18.1249 +  section~\ref{sec:hook:sources}).  See the documentation for the
 18.1250 +  \texttt{source} parameter to the \hook{outgoing} hook, in
 18.1251 +  section~\ref{sec:hook:outgoing}, for possible values of this
 18.1252 +  parameter.
 18.1253 +\item[\texttt{url}] A URL.  The location of the remote repository, if
 18.1254 +  known.  See section~\ref{sec:hook:url} for more information.
 18.1255 +\end{itemize}
 18.1256 +
 18.1257 +See also: \hook{outgoing} (section~\ref{sec:hook:outgoing})
 18.1258 +
 18.1259 +\subsection{\hook{pretag}---before tagging a changeset}
 18.1260 +\label{sec:hook:pretag}
 18.1261 +
 18.1262 +This controlling hook is run before a tag is created.  If the hook
 18.1263 +succeeds, creation of the tag proceeds.  If the hook fails, the tag is
 18.1264 +not created.
 18.1265 +
 18.1266 +Parameters to this hook:
 18.1267 +\begin{itemize}
 18.1268 +\item[\texttt{local}] A boolean.  Whether the tag is local to this
 18.1269 +  repository instance (i.e.~stored in \sfilename{.hg/localtags}) or
 18.1270 +  managed by Mercurial (stored in \sfilename{.hgtags}).
 18.1271 +\item[\texttt{node}] A changeset ID.  The ID of the changeset to be tagged.
 18.1272 +\item[\texttt{tag}] A string.  The name of the tag to be created.
 18.1273 +\end{itemize}
 18.1274 +
 18.1275 +If the tag to be created is revision-controlled, the \hook{precommit}
 18.1276 +and \hook{pretxncommit} hooks (sections~\ref{sec:hook:precommit}
 18.1277 +and~\ref{sec:hook:pretxncommit}) will also be run.
 18.1278 +
 18.1279 +See also: \hook{tag} (section~\ref{sec:hook:tag})
 18.1280 +
 18.1281 +\subsection{\hook{pretxnchangegroup}---before completing addition of
 18.1282 +  remote changesets}
 18.1283 +\label{sec:hook:pretxnchangegroup}
 18.1284 +
 18.1285 +This controlling hook is run before a transaction---that manages the
 18.1286 +addition of a group of new changesets from outside the
 18.1287 +repository---completes.  If the hook succeeds, the transaction
 18.1288 +completes, and all of the changesets become permanent within this
 18.1289 +repository.  If the hook fails, the transaction is rolled back, and
 18.1290 +the data for the changesets is erased.
 18.1291 +
 18.1292 +This hook can access the metadata associated with the almost-added
 18.1293 +changesets, but it should not do anything permanent with this data.
 18.1294 +It must also not modify the working directory.
 18.1295 +
 18.1296 +While this hook is running, if other Mercurial processes access this
 18.1297 +repository, they will be able to see the almost-added changesets as if
 18.1298 +they are permanent.  This may lead to race conditions if you do not
 18.1299 +take steps to avoid them.
 18.1300 +
 18.1301 +This hook can be used to automatically vet a group of changesets.  If
 18.1302 +the hook fails, all of the changesets are ``rejected'' when the
 18.1303 +transaction rolls back.
 18.1304 +
 18.1305 +Parameters to this hook:
 18.1306 +\begin{itemize}
 18.1307 +\item[\texttt{node}] A changeset ID.  The changeset ID of the first
 18.1308 +  changeset in the group that was added.  All changesets between this
 18.1309 +  and \index{tags!\texttt{tip}}\texttt{tip}, inclusive, were added by
 18.1310 +  a single \hgcmd{pull}, \hgcmd{push} or \hgcmd{unbundle}.
 18.1311 +\item[\texttt{source}] A string.  The source of these changes.  See
 18.1312 +  section~\ref{sec:hook:sources} for details.
 18.1313 +\item[\texttt{url}] A URL.  The location of the remote repository, if
 18.1314 +  known.  See section~\ref{sec:hook:url} for more information.
 18.1315 +\end{itemize}
 18.1316 +
 18.1317 +See also: \hook{changegroup} (section~\ref{sec:hook:changegroup}),
 18.1318 +\hook{incoming} (section~\ref{sec:hook:incoming}),
 18.1319 +\hook{prechangegroup} (section~\ref{sec:hook:prechangegroup})
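
The parameters listed above can be read by an external hook roughly as follows.  This is only a sketch: it assumes that Mercurial passes each parameter to an external hook as an environment variable with an \texttt{HG\_} prefix, and the function name \texttt{summarize\_changegroup} is hypothetical.

```python
import os


def summarize_changegroup(env=None):
    """Describe an incoming changegroup from the hook's environment.

    Returns None when HG_NODE is missing, which would mean the script
    was not started as a changegroup-style hook.
    """
    env = os.environ if env is None else env
    node = env.get("HG_NODE")
    if not node:
        return None
    source = env.get("HG_SOURCE", "unknown")
    # The url parameter is only set when the remote location is known.
    url = env.get("HG_URL") or "unknown"
    # Show the 12-character short form of the first added changeset.
    return "first new changeset %s (source=%s, url=%s)" % (node[:12], source, url)
```

A real hook script would inspect the changesets from this node to \texttt{tip} and exit with a non-zero status to make the transaction roll back.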
 18.1320 +
 18.1321 +\subsection{\hook{pretxncommit}---before completing commit of new changeset}
 18.1322 +\label{sec:hook:pretxncommit}
 18.1323 +
 18.1324 +This controlling hook is run before a transaction---that manages a new
 18.1325 +commit---completes.  If the hook succeeds, the transaction completes
 18.1326 +and the changeset becomes permanent within this repository.  If the
 18.1327 +hook fails, the transaction is rolled back, and the commit data is
 18.1328 +erased.
 18.1329 +
 18.1330 +This hook can access the metadata associated with the almost-new
 18.1331 +changeset, but it should not do anything permanent with this data.  It
 18.1332 +must also not modify the working directory.
 18.1333 +
 18.1334 +While this hook is running, if other Mercurial processes access this
 18.1335 +repository, they will be able to see the almost-new changeset as if it
 18.1336 +is permanent.  This may lead to race conditions if you do not take
 18.1337 +steps to avoid them.
 18.1338 +
 18.1339 +Parameters to this hook:
 18.1340 +\begin{itemize}
 18.1341 +\item[\texttt{node}] A changeset ID.  The changeset ID of the newly
 18.1342 +  committed changeset.
 18.1343 +\item[\texttt{parent1}] A changeset ID.  The changeset ID of the first
 18.1344 +  parent of the newly committed changeset.
 18.1345 +\item[\texttt{parent2}] A changeset ID.  The changeset ID of the second
 18.1346 +  parent of the newly committed changeset.
 18.1347 +\end{itemize}
 18.1348 +
 18.1349 +See also: \hook{precommit} (section~\ref{sec:hook:precommit})
 18.1350 +
 18.1351 +\subsection{\hook{preupdate}---before updating or merging working directory}
 18.1352 +\label{sec:hook:preupdate}
 18.1353 +
 18.1354 +This controlling hook is run before an update or merge of the working
 18.1355 +directory begins.  It is run only if Mercurial's normal pre-update
 18.1356 +checks determine that the update or merge can proceed.  If the hook
 18.1357 +succeeds, the update or merge may proceed; if it fails, the update or
 18.1358 +merge does not start.
 18.1359 +
 18.1360 +Parameters to this hook:
 18.1361 +\begin{itemize}
 18.1362 +\item[\texttt{parent1}] A changeset ID.  The ID of the parent that the
 18.1363 +  working directory is to be updated to.  If the working directory is
 18.1364 +  being merged, it will not change this parent.
 18.1365 +\item[\texttt{parent2}] A changeset ID.  Only set if the working
 18.1366 +  directory is being merged.  The ID of the revision that the working
 18.1367 +  directory is being merged with.
 18.1368 +\end{itemize}
 18.1369 +
 18.1370 +See also: \hook{update} (section~\ref{sec:hook:update})
 18.1371 +
 18.1372 +\subsection{\hook{tag}---after tagging a changeset}
 18.1373 +\label{sec:hook:tag}
 18.1374 +
 18.1375 +This hook is run after a tag has been created.
 18.1376 +
 18.1377 +Parameters to this hook:
 18.1378 +\begin{itemize}
 18.1379 +\item[\texttt{local}] A boolean.  Whether the new tag is local to this
 18.1380 +  repository instance (i.e.~stored in \sfilename{.hg/localtags}) or
 18.1381 +  managed by Mercurial (stored in \sfilename{.hgtags}).
 18.1382 +\item[\texttt{node}] A changeset ID.  The ID of the changeset that was
 18.1383 +  tagged.
 18.1384 +\item[\texttt{tag}] A string.  The name of the tag that was created.
 18.1385 +\end{itemize}
 18.1386 +
 18.1387 +If the created tag is revision-controlled, the \hook{commit} hook
 18.1388 +(section~\ref{sec:hook:commit}) is run before this hook.
 18.1389 +
 18.1390 +See also: \hook{pretag} (section~\ref{sec:hook:pretag})
 18.1391 +
 18.1392 +\subsection{\hook{update}---after updating or merging working directory}
 18.1393 +\label{sec:hook:update}
 18.1394 +
 18.1395 +This hook is run after an update or merge of the working directory
 18.1396 +completes.  Since a merge can fail (if the external \command{hgmerge}
 18.1397 +command fails to resolve conflicts in a file), this hook communicates
 18.1398 +whether the update or merge completed cleanly.
 18.1399 +
 18.1400 +\begin{itemize}
 18.1401 +\item[\texttt{error}] A boolean.  Indicates whether the update or
 18.1402 +  merge completed successfully.
 18.1403 +\item[\texttt{parent1}] A changeset ID.  The ID of the parent that the
 18.1404 +  working directory was updated to.  If the working directory was
 18.1405 +  merged, it will not have changed this parent.
 18.1406 +\item[\texttt{parent2}] A changeset ID.  Only set if the working
 18.1407 +  directory was merged.  The ID of the revision that the working
 18.1408 +  directory was merged with.
 18.1409 +\end{itemize}
 18.1410 +
 18.1411 +See also: \hook{preupdate} (section~\ref{sec:hook:preupdate})
 18.1412 +
 18.1413 +%%% Local Variables: 
 18.1414 +%%% mode: latex
 18.1415 +%%% TeX-master: "00book"
 18.1416 +%%% End: 

    19.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    19.2 +++ b/en/ch11-template.tex	Thu Jan 29 22:56:27 2009 -0800
    19.3 @@ -0,0 +1,475 @@
    19.4 +\chapter{Customising the output of Mercurial}
    19.5 +\label{chap:template}
    19.6 +
    19.7 +Mercurial provides a powerful mechanism to let you control how it
    19.8 +displays information.  The mechanism is based on templates.  You can
    19.9 +use templates to generate specific output for a single command, or to
   19.10 +customise the entire appearance of the built-in web interface.
   19.11 +
   19.12 +\section{Using precanned output styles}
   19.13 +\label{sec:style}
   19.14 +
   19.15 +Packaged with Mercurial are some output styles that you can use
   19.16 +immediately.  A style is simply a precanned template that someone
   19.17 +wrote and installed somewhere that Mercurial can find.
   19.18 +
   19.19 +Before we take a look at Mercurial's bundled styles, let's review its
   19.20 +normal output.
   19.21 +
   19.22 +\interaction{template.simple.normal}
   19.23 +
   19.24 +This is somewhat informative, but it takes up a lot of space---five
   19.25 +lines of output per changeset.  The \texttt{compact} style reduces
   19.26 +this to three lines, presented in a sparse manner.
   19.27 +
   19.28 +\interaction{template.simple.compact}
   19.29 +
   19.30 +The \texttt{changelog} style hints at the expressive power of
   19.31 +Mercurial's templating engine.  This style attempts to follow the GNU
   19.32 +Project's changelog guidelines\cite{web:changelog}.
   19.33 +
   19.34 +\interaction{template.simple.changelog}
   19.35 +
   19.36 +You will not be shocked to learn that Mercurial's default output style
   19.37 +is named \texttt{default}.
   19.38 +
   19.39 +\subsection{Setting a default style}
   19.40 +
   19.41 +You can modify the output style that Mercurial will use for every
   19.42 +command by editing your \hgrc\ file, naming the style you would
   19.43 +prefer to use.
   19.44 +
   19.45 +\begin{codesample2}
   19.46 +  [ui]
   19.47 +  style = compact
   19.48 +\end{codesample2}
   19.49 +
   19.50 +If you write a style of your own, you can use it by either providing
   19.51 +the path to your style file, or copying your style file into a
   19.52 +location where Mercurial can find it (typically the \texttt{templates}
   19.53 +subdirectory of your Mercurial install directory).
   19.54 +
   19.55 +\section{Commands that support styles and templates}
   19.56 +
   19.57 +All of Mercurial's ``\texttt{log}-like'' commands let you use styles
   19.58 +and templates: \hgcmd{incoming}, \hgcmd{log}, \hgcmd{outgoing}, and
   19.59 +\hgcmd{tip}.
   19.60 +
   19.61 +As I write this manual, these are so far the only commands that
   19.62 +support styles and templates.  Since these are the most important
   19.63 +commands that need customisable output, there has been little pressure
   19.64 +from the Mercurial user community to add style and template support to
   19.65 +other commands.
   19.66 +
   19.67 +\section{The basics of templating}
   19.68 +
   19.69 +At its simplest, a Mercurial template is a piece of text.  Some of the
   19.70 +text never changes, while other parts are \emph{expanded}, or replaced
   19.71 +with new text, when necessary.
   19.72 +
   19.73 +Before we continue, let's look again at a simple example of
   19.74 +Mercurial's normal output.
   19.75 +
   19.76 +\interaction{template.simple.normal}
   19.77 +
   19.78 +Now, let's run the same command, but using a template to change its
   19.79 +output.
   19.80 +
   19.81 +\interaction{template.simple.simplest}
   19.82 +
   19.83 +The example above illustrates the simplest possible template; it's
   19.84 +just a piece of static text, printed once for each changeset.  The
   19.85 +\hgopt{log}{--template} option to the \hgcmd{log} command tells
   19.86 +Mercurial to use the given text as the template when printing each
   19.87 +changeset.
   19.88 +
   19.89 +Notice that the template string above ends with the text
   19.90 +``\Verb+\n+''.  This is an \emph{escape sequence}, telling Mercurial
   19.91 +to print a newline at the end of each template item.  If you omit this
   19.92 +newline, Mercurial will run each piece of output together.  See
   19.93 +section~\ref{sec:template:escape} for more details of escape sequences.
   19.94 +
   19.95 +A template that prints a fixed string of text all the time isn't very
   19.96 +useful; let's try something a bit more complex.
   19.97 +
   19.98 +\interaction{template.simple.simplesub}
   19.99 +
  19.100 +As you can see, the string ``\Verb+{desc}+'' in the template has been
  19.101 +replaced in the output with the description of each changeset.  Every
  19.102 +time Mercurial finds text enclosed in curly braces (``\texttt{\{}''
  19.103 +and ``\texttt{\}}''), it will try to replace the braces and text with
  19.104 +the expansion of whatever is inside.  To print a literal curly brace,
  19.105 +you must escape it, as described in section~\ref{sec:template:escape}.
  19.106 +
  19.107 +\section{Common template keywords}
  19.108 +\label{sec:template:keyword}
  19.109 +
  19.110 +You can start writing simple templates immediately using the keywords
  19.111 +below.
  19.112 +
  19.113 +\begin{itemize}
  19.114 +\item[\tplkword{author}] String.  The unmodified author of the changeset.
  19.115 +\item[\tplkword{branches}] String.  The name of the branch on which
  19.116 +  the changeset was committed.  Will be empty if the branch name was
  19.117 +  \texttt{default}.
  19.118 +\item[\tplkword{date}] Date information.  The date when the changeset
  19.119 +  was committed.  This is \emph{not} human-readable; you must pass it
  19.120 +  through a filter that will render it appropriately.  See
  19.121 +  section~\ref{sec:template:filter} for more information on filters.
  19.122 +  The date is expressed as a pair of numbers.  The first number is a
  19.123 +  Unix UTC timestamp (seconds since January 1, 1970); the second is
  19.124 +  the offset of the committer's timezone from UTC, in seconds.
  19.125 +\item[\tplkword{desc}] String.  The text of the changeset description.
  19.126 +\item[\tplkword{files}] List of strings.  All files modified, added, or
  19.127 +  removed by this changeset.
  19.128 +\item[\tplkword{file\_adds}] List of strings.  Files added by this
  19.129 +  changeset.
  19.130 +\item[\tplkword{file\_dels}] List of strings.  Files removed by this
  19.131 +  changeset.
  19.132 +\item[\tplkword{node}] String.  The changeset identification hash, as a
  19.133 +  40-character hexadecimal string.
  19.134 +\item[\tplkword{parents}] List of strings.  The parents of the
  19.135 +  changeset.
  19.136 +\item[\tplkword{rev}] Integer.  The repository-local changeset revision
  19.137 +  number.
  19.138 +\item[\tplkword{tags}] List of strings.  Any tags associated with the
  19.139 +  changeset.
  19.140 +\end{itemize}
  19.141 +
  19.142 +A few simple experiments will show us what to expect when we use these
  19.143 +keywords; you can see the results in
  19.144 +figure~\ref{fig:template:keywords}.
  19.145 +
  19.146 +\begin{figure}
  19.147 +  \interaction{template.simple.keywords}
  19.148 +  \caption{Template keywords in use}
  19.149 +  \label{fig:template:keywords}
  19.150 +\end{figure}
  19.151 +
  19.152 +As we noted above, the date keyword does not produce human-readable
  19.153 +output, so we must treat it specially.  This involves using a
  19.154 +\emph{filter}, about which more in section~\ref{sec:template:filter}.
  19.155 +
  19.156 +\interaction{template.simple.datekeyword}
  19.157 +
  19.158 +\section{Escape sequences}
  19.159 +\label{sec:template:escape}
  19.160 +
  19.161 +Mercurial's templating engine recognises the most commonly used escape
  19.162 +sequences in strings.  When it sees a backslash (``\Verb+\+'')
  19.163 +character, it looks at the following character and substitutes the two
  19.164 +characters with a single replacement, as described below.
  19.165 +
  19.166 +\begin{itemize}
  19.167 +\item[\Verb+\textbackslash\textbackslash+] Backslash, ``\Verb+\+'',
  19.168 +  ASCII~134 (octal).
  19.169 +\item[\Verb+\textbackslash n+] Newline, ASCII~12 (octal).
  19.170 +\item[\Verb+\textbackslash r+] Carriage return, ASCII~15 (octal).
  19.171 +\item[\Verb+\textbackslash t+] Tab, ASCII~11 (octal).
  19.172 +\item[\Verb+\textbackslash v+] Vertical tab, ASCII~13 (octal).
  19.173 +\item[\Verb+\textbackslash \{+] Open curly brace, ``\Verb+{+'', ASCII~173 (octal).
  19.174 +\item[\Verb+\textbackslash \}+] Close curly brace, ``\Verb+}+'', ASCII~175 (octal).
  19.175 +\end{itemize}
  19.176 +
  19.177 +As indicated above, if you want the expansion of a template to contain
  19.178 +a literal ``\Verb+\+'', ``\Verb+{+'', or ``\Verb+}+'' character, you
  19.179 +must escape it.
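
The ASCII codes quoted in the list above are octal values.  The short Python sketch below is not part of Mercurial; it simply prints each replacement character's code in both octal and decimal, so that the numbers can be checked against an ASCII table.

```python
# Characters produced by each escape sequence in the list above.
ESCAPES = {
    r"\\": "\\",
    r"\n": "\n",
    r"\r": "\r",
    r"\t": "\t",
    r"\v": "\v",
    r"\{": "{",
    r"\}": "}",
}


def ascii_codes(char):
    """Return the (octal, decimal) ASCII codes of a single character."""
    code = ord(char)
    return format(code, "o"), code


for escape, char in ESCAPES.items():
    octal, decimal = ascii_codes(char)
    print(f"{escape}  octal {octal:>3}  decimal {decimal:>3}")
```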
  19.180 +
  19.181 +\section{Filtering keywords to change their results}
  19.182 +\label{sec:template:filter}
  19.183 +
  19.184 +Some of the results of template expansion are not immediately easy to
  19.185 +use.  Mercurial lets you specify an optional chain of \emph{filters}
  19.186 +to modify the result of expanding a keyword.  You have already seen a
  19.187 +common filter, \tplkwfilt{date}{isodate}, in action above, to make a
  19.188 +date readable.
  19.189 +
  19.190 +Below is a list of the most commonly used filters that Mercurial
  19.191 +supports.  While some filters can be applied to any text, others can
  19.192 +only be used in specific circumstances.  The name of each filter is
  19.193 +followed first by an indication of where it can be used, then a
  19.194 +description of its effect.
  19.195 +
  19.196 +\begin{itemize}
  19.197 +\item[\tplfilter{addbreaks}] Any text. Add an XHTML ``\Verb+<br/>+''
  19.198 +  tag before the end of every line except the last.  For example,
  19.199 +  ``\Verb+foo\nbar+'' becomes ``\Verb+foo<br/>\nbar+''.
  19.200 +\item[\tplkwfilt{date}{age}] \tplkword{date} keyword.  Render the
  19.201 +  age of the date, relative to the current time.  Yields a string like
  19.202 +  ``\Verb+10 minutes+''.
  19.203 +\item[\tplfilter{basename}] Any text, but most useful for the
  19.204 +  \tplkword{files} keyword and its relatives.  Treat the text as a
  19.205 +  path, and return the basename. For example, ``\Verb+foo/bar/baz+''
  19.206 +  becomes ``\Verb+baz+''.
  19.207 +\item[\tplkwfilt{date}{date}] \tplkword{date} keyword.  Render a date
  19.208 +  in a similar format to the Unix \command{date} command, but with
  19.209 +  timezone included.  Yields a string like
  19.210 +  ``\Verb+Mon Sep 04 15:13:13 2006 -0700+''.
  19.211 +\item[\tplkwfilt{author}{domain}] Any text, but most useful for the
  19.212 +  \tplkword{author} keyword.  Find the first string that looks like
  19.213 +  an email address, and extract just the domain component.  For
  19.214 +  example, ``\Verb+Bryan O'Sullivan <bos@serpentine.com>+'' becomes
  19.215 +  ``\Verb+serpentine.com+''.
  19.216 +\item[\tplkwfilt{author}{email}] Any text, but most useful for the
  19.217 +  \tplkword{author} keyword.  Extract the first string that looks like
  19.218 +  an email address.  For example,
  19.219 +  ``\Verb+Bryan O'Sullivan <bos@serpentine.com>+'' becomes
  19.220 +  ``\Verb+bos@serpentine.com+''.
  19.221 +\item[\tplfilter{escape}] Any text.  Replace the special XML/XHTML
  19.222 +  characters ``\Verb+&+'', ``\Verb+<+'' and ``\Verb+>+'' with
  19.223 +  XML entities.
  19.224 +\item[\tplfilter{fill68}] Any text.  Wrap the text to fit in 68
  19.225 +  columns.  This is useful before you pass text through the
  19.226 +  \tplfilter{tabindent} filter, and still want it to fit in an
  19.227 +  80-column fixed-font window.
  19.228 +\item[\tplfilter{fill76}] Any text.  Wrap the text to fit in 76
  19.229 +  columns.
  19.230 +\item[\tplfilter{firstline}] Any text.  Yield the first line of text,
  19.231 +  without any trailing newlines.
  19.232 +\item[\tplkwfilt{date}{hgdate}] \tplkword{date} keyword.  Render the
  19.233 +  date as a pair of readable numbers.  Yields a string like
  19.234 +  ``\Verb+1157407993 25200+''.
  19.235 +\item[\tplkwfilt{date}{isodate}] \tplkword{date} keyword.  Render the
  19.236 +  date as a text string in ISO~8601 format.  Yields a string like
  19.237 +  ``\Verb+2006-09-04 15:13:13 -0700+''.
  19.238 +\item[\tplfilter{obfuscate}] Any text, but most useful for the
  19.239 +  \tplkword{author} keyword.  Yield the input text rendered as a
  19.240 +  sequence of XML entities.  This helps to defeat some particularly
  19.241 +  stupid screen-scraping email harvesting spambots.
  19.242 +\item[\tplkwfilt{author}{person}] Any text, but most useful for the
  19.243 +  \tplkword{author} keyword.  Yield the text before an email address.
  19.244 +  For example, ``\Verb+Bryan O'Sullivan <bos@serpentine.com>+''
  19.245 +  becomes ``\Verb+Bryan O'Sullivan+''.
  19.246 +\item[\tplkwfilt{date}{rfc822date}] \tplkword{date} keyword.  Render a
  19.247 +  date using the same format used in email headers.  Yields a string
  19.248 +  like ``\Verb+Mon, 04 Sep 2006 15:13:13 -0700+''.
  19.249 +\item[\tplkwfilt{node}{short}] Changeset hash.  Yield the short form
  19.250 +  of a changeset hash, i.e.~a 12-character hexadecimal string.
  19.251 +\item[\tplkwfilt{date}{shortdate}] \tplkword{date} keyword.  Render
  19.252 +  the year, month, and day of the date.  Yields a string like
  19.253 +  ``\Verb+2006-09-04+''.
  19.254 +\item[\tplfilter{strip}] Any text.  Strip all leading and trailing
  19.255 +  whitespace from the string.
  19.256 +\item[\tplfilter{tabindent}] Any text.  Yield the text, with every line
  19.257 +  except the first starting with a tab character.
  19.258 +\item[\tplfilter{urlescape}] Any text.  Escape all characters that are
  19.259 +  considered ``special'' by URL parsers.  For example, \Verb+foo bar+
  19.260 +  becomes \Verb+foo%20bar+.
  19.261 +\item[\tplkwfilt{author}{user}] Any text, but most useful for the
  19.262 +  \tplkword{author} keyword.  Return the ``user'' portion of an email
  19.263 +  address.  For example,
  19.264 +  ``\Verb+Bryan O'Sullivan <bos@serpentine.com>+'' becomes
  19.265 +  ``\Verb+bos+''.
  19.266 +\end{itemize}
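
To see how the raw \tplkword{date} pair relates to the rendered forms, consider the sample values in the list above: \tplkwfilt{date}{hgdate} yields ``\Verb+1157407993 25200+'', while \tplkwfilt{date}{isodate} yields ``\Verb+2006-09-04 15:13:13 -0700+''.  The Python sketch below reproduces that conversion.  It is not Mercurial's own code, and it assumes that a positive offset means a timezone west of UTC, which is what these sample values imply.

```python
from datetime import datetime, timedelta, timezone


def render_isodate(hgdate):
    """Render an "hgdate"-style pair as an ISO 8601-like string.

    hgdate is "<seconds since the epoch, UTC> <offset in seconds>".
    Assumption: a positive offset lies west of UTC, so local time is
    the UTC time minus the offset.
    """
    timestamp, offset = (int(field) for field in hgdate.split())
    tz = timezone(timedelta(seconds=-offset))
    local = datetime.fromtimestamp(timestamp, tz)
    return local.strftime("%Y-%m-%d %H:%M:%S %z")


print(render_isodate("1157407993 25200"))
```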
  19.267 +
  19.268 +\begin{figure}
  19.269 +  \interaction{template.simple.manyfilters}
  19.270 +  \caption{Template filters in action}
  19.271 +  \label{fig:template:filters}
  19.272 +\end{figure}
  19.273 +
  19.274 +\begin{note}
  19.275 +  If you try to apply a filter to a piece of data that it cannot
  19.276 +  process, Mercurial will fail and print a Python exception.  For
  19.277 +  example, trying to run the output of the \tplkword{desc} keyword
  19.278 +  into the \tplkwfilt{date}{isodate} filter is not a good idea.
  19.279 +\end{note}
  19.280 +
  19.281 +\subsection{Combining filters}
  19.282 +
  19.283 +It is easy to combine filters to yield output in the form you would
  19.284 +like.  The following chain of filters tidies up a description, then
  19.285 +makes sure that it fits cleanly into 68 columns, then indents it by a
  19.286 +further 8~characters (at least on Unix-like systems, where a tab is
  19.287 +conventionally 8~characters wide).
  19.288 +
  19.289 +\interaction{template.simple.combine}
  19.290 +
  19.291 +Note the use of ``\Verb+\t+'' (a tab character) in the template to
  19.292 +force the first line to be indented; this is necessary since
  19.293 +\tplfilter{tabindent} indents all lines \emph{except} the first.
  19.294 +
  19.295 +Keep in mind that the order of filters in a chain is significant.  The
  19.296 +first filter is applied to the result of the keyword; the second to
  19.297 +the result of the first filter; and so on.  For example, using
  19.298 +\Verb+fill68|tabindent+ gives very different results from
  19.299 +\Verb+tabindent|fill68+.
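
The effect of filter order can be imitated outside Mercurial.  In the sketch below, \texttt{fill68} and \texttt{tabindent} are rough Python stand-ins for the filters of the same names, written only for illustration; the real filters may differ in details such as paragraph handling.

```python
import textwrap


def fill68(text):
    """Rough stand-in for the fill68 filter: wrap text to 68 columns."""
    return textwrap.fill(text, width=68)


def tabindent(text):
    """Rough stand-in for tabindent: tab before every line but the first."""
    return text.replace("\n", "\n\t")


# A single long line, as a one-paragraph changeset description might be.
desc = "word " * 40

# Wrap first, then indent: every continuation line starts with a tab.
wrapped_then_indented = tabindent(fill68(desc))

# Indent first, then wrap: there is only one line to begin with, so
# tabindent has nothing to indent, and the later wrapping adds no tabs.
indented_then_wrapped = fill68(tabindent(desc))

print(repr(wrapped_then_indented))
print(repr(indented_then_wrapped))
```

In the first ordering the continuation lines are indented; in the second they are not, because the indentation step ran before there were any line breaks to indent after.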
  19.300 +
  19.301 +
  19.302 +\section{From templates to styles}
  19.303 +
  19.304 +A command line template provides a quick and simple way to format some
  19.305 +output.  Templates can become verbose, though, and it's useful to be
  19.306 +able to give a template a name.  A style file is a template with a
  19.307 +name, stored in a file.
  19.308 +
  19.309 +More than that, using a style file unlocks the power of Mercurial's
  19.310 +templating engine in ways that are not possible using the command line
  19.311 +\hgopt{log}{--template} option.
  19.312 +
  19.313 +\subsection{The simplest of style files}
  19.314 +
  19.315 +Our simple style file contains just one line:
  19.316 +
  19.317 +\interaction{template.simple.rev}
  19.318 +
  19.319 +This tells Mercurial, ``if you're printing a changeset, use the text
  19.320 +on the right as the template''.
  19.321 +
  19.322 +\subsection{Style file syntax}
  19.323 +
  19.324 +The syntax rules for a style file are simple.
  19.325 +
  19.326 +\begin{itemize}
  19.327 +\item The file is processed one line at a time.
  19.328 +
  19.329 +\item Leading and trailing white space are ignored.
  19.330 +
  19.331 +\item Empty lines are skipped.
  19.332 +
  19.333 +\item If a line starts with either of the characters ``\texttt{\#}'' or
  19.334 +  ``\texttt{;}'', the entire line is treated as a comment, and skipped
  19.335 +  as if empty.
  19.336 +
  19.337 +\item A line starts with a keyword.  This must start with an
  19.338 +  alphabetic character or underscore, and can subsequently contain any
  19.339 +  alphanumeric character or underscore.  (In regexp notation, a
  19.340 +  keyword must match \Verb+[A-Za-z_][A-Za-z0-9_]*+.)
  19.341 +
  19.342 +\item The next element must be an ``\texttt{=}'' character, which can
  19.343 +  be preceded or followed by an arbitrary amount of white space.
  19.344 +
  19.345 +\item If the rest of the line starts and ends with matching quote
  19.346 +  characters (either single or double quote), it is treated as a
  19.347 +  template body.
  19.348 +
  19.349 +\item If the rest of the line \emph{does not} start with a quote
  19.350 +  character, it is treated as the name of a file; the contents of this
  19.351 +  file will be read and used as a template body.
  19.352 +\end{itemize}
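
The rules above are simple enough to capture in a few lines of Python.  The following is a sketch of a parser that follows them, not Mercurial's actual implementation; the function name \texttt{parse\_style} is hypothetical, and treating a missing value or an unterminated quote as a parse error is an assumption, since the rules above do not say what happens in those cases.

```python
import re

# keyword, optional white space, "=", optional white space, rest of line
ENTRY = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)\s*=\s*(.*)")


def parse_style(text):
    """Parse style-file text into {keyword: (kind, value)}.

    kind is "template" for a quoted template body, or "file" for the
    name of a file whose contents would be used as the template body.
    """
    entries = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()                  # ignore surrounding white space
        if not line or line[0] in "#;":     # skip empty lines and comments
            continue
        match = ENTRY.fullmatch(line)
        if match is None or not match.group(2):
            raise ValueError("%d: parse error" % lineno)
        keyword, rest = match.groups()
        if rest[0] in "'\"":
            # A quoted value must end with the same quote character.
            if len(rest) < 2 or rest[-1] != rest[0]:
                raise ValueError("%d: parse error" % lineno)
            entries[keyword] = ("template", rest[1:-1])
        else:
            entries[keyword] = ("file", rest)
    return entries
```

Note that the sketch leaves escape sequences inside a template body untouched; expanding them is the templating engine's job.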
  19.353 +
  19.354 +\section{Style files by example}
  19.355 +
  19.356 +To illustrate how to write a style file, we will construct a few by
  19.357 +example.  Rather than provide a complete style file and walk through
  19.358 +it, we'll mirror the usual process of developing a style file by
  19.359 +starting with something very simple, and walking through a series of
  19.360 +successively more complete examples.
  19.361 +
  19.362 +\subsection{Identifying mistakes in style files}
  19.363 +
  19.364 +If Mercurial encounters a problem in a style file you are working on,
  19.365 +it prints a terse error message that, once you figure out what it
  19.366 +means, is actually quite useful.
  19.367 +
  19.368 +\interaction{template.svnstyle.syntax.input}
  19.369 +
  19.370 +Notice that \filename{broken.style} attempts to define a
  19.371 +\texttt{changeset} keyword, but forgets to give any content for it.
  19.372 +When instructed to use this style file, Mercurial promptly complains.
  19.373 +
  19.374 +\interaction{template.svnstyle.syntax.error}
  19.375 +
  19.376 +This error message looks intimidating, but it is not too hard to
  19.377 +follow.
  19.378 +
  19.379 +\begin{itemize}
  19.380 +\item The first component is simply Mercurial's way of saying ``I am
  19.381 +  giving up''.
  19.382 +  \begin{codesample4}
  19.383 +    \textbf{abort:} broken.style:1: parse error
  19.384 +  \end{codesample4}
  19.385 +
  19.386 +\item Next comes the name of the style file that contains the error.
  19.387 +  \begin{codesample4}
  19.388 +    abort: \textbf{broken.style}:1: parse error
  19.389 +  \end{codesample4}
  19.390 +
  19.391 +\item Following the file name is the line number where the error was
  19.392 +  encountered.
  19.393 +  \begin{codesample4}
  19.394 +    abort: broken.style:\textbf{1}: parse error
  19.395 +  \end{codesample4}
  19.396 +
  19.397 +\item Finally, a description of what went wrong.
  19.398 +  \begin{codesample4}
  19.399 +    abort: broken.style:1: \textbf{parse error}
  19.400 +  \end{codesample4}
  19.401 +  The description of the problem is not always clear (as in this
  19.402 +  case), but even when it is cryptic, it is almost always trivial to
  19.403 +  visually inspect the offending line in the style file and see what
  19.404 +  is wrong.
  19.405 +\end{itemize}
  19.406 +
  19.407 +\subsection{Uniquely identifying a repository}
  19.408 +
  19.409 +If you would like to be able to identify a Mercurial repository
  19.410 +``fairly uniquely'' using a short string as an identifier, you can
  19.411 +use the first revision in the repository.
  19.412 +\interaction{template.svnstyle.id} 
  19.413 +This is not guaranteed to be unique, but it is nevertheless useful in
  19.414 +many cases.
  19.415 +\begin{itemize}
  19.416 +\item It will not work in a completely empty repository, because such
  19.417 +  a repository does not have a revision~zero.
  19.418 +\item Neither will it work in the (extremely rare) case where a
  19.419 +  repository is a merge of two or more formerly independent
  19.420 +  repositories, and you still have those repositories around.
  19.421 +\end{itemize}
  19.422 +Here are some uses to which you could put this identifier:
  19.423 +\begin{itemize}
  19.424 +\item As a key into a table for a database that manages repositories
  19.425 +  on a server.
  19.426 +\item As half of a \{\emph{repository~ID}, \emph{revision~ID}\} tuple.
  19.427 +  Save this information away when you run an automated build or other
  19.428 +  activity, so that you can ``replay'' the build later if necessary.
  19.429 +\end{itemize}
  19.430 +
  19.431 +\subsection{Mimicking Subversion's output}
  19.432 +
  19.433 +Let's try to emulate the default output format used by another
  19.434 +revision control tool, Subversion.
  19.435 +\interaction{template.svnstyle.short}
  19.436 +
  19.437 +Since Subversion's output style is fairly simple, it is easy to
  19.438 +copy-and-paste a hunk of its output into a file, and replace the text
  19.439 +produced above by Subversion with the template values we'd like to see
  19.440 +expanded.
  19.441 +\interaction{template.svnstyle.template}
  19.442 +
  19.443 +There are a few small ways in which this template deviates from the
  19.444 +output produced by Subversion.
  19.445 +\begin{itemize}
  19.446 +\item Subversion prints a ``readable'' date (the ``\texttt{Wed, 27 Sep
  19.447 +    2006}'' in the example output above) in parentheses.  Mercurial's
  19.448 +  templating engine does not provide a way to display a date in this
  19.449 +  format without also printing the time and time zone.
  19.450 +\item We emulate Subversion's printing of ``separator'' lines full of
  19.451 +  ``\texttt{-}'' characters by ending the template with such a line.
  19.452 +  We use the templating engine's \tplkword{header} keyword to print a
  19.453 +  separator line as the first line of output (see below), thus
  19.454 +  achieving similar output to Subversion.
  19.455 +\item Subversion's output includes a count in the header of the number
  19.456 +  of lines in the commit message.  We cannot replicate this in
  19.457 +  Mercurial; the templating engine does not currently provide a filter
  19.458 +  that counts the number of lines the template generates.
  19.459 +\end{itemize}
  19.460 +It took me no more than a minute or two of work to replace literal
  19.461 +text from an example of Subversion's output with some keywords and
  19.462 +filters to give the template above.  The style file simply refers to
  19.463 +the template.
  19.464 +\interaction{template.svnstyle.style}
  19.465 +
  19.466 +We could have included the text of the template file directly in the
  19.467 +style file by enclosing it in quotes and replacing the newlines with
  19.468 +``\verb!\n!'' sequences, but it would have made the style file too
  19.469 +difficult to read.  Readability is a good guide when you're trying to
  19.470 +decide whether some text belongs in a style file, or in a template
  19.471 +file that the style file points to.  If the style file will look too
  19.472 +big or cluttered if you insert a literal piece of text, drop it into a
  19.473 +template instead.
  19.474 +
  19.475 +%%% Local Variables: 
  19.476 +%%% mode: latex
  19.477 +%%% TeX-master: "00book"
  19.478 +%%% End: 

    20.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    20.2 +++ b/en/ch12-mq.tex	Thu Jan 29 22:56:27 2009 -0800
    20.3 @@ -0,0 +1,1043 @@
    20.4 +\chapter{Managing change with Mercurial Queues}
    20.5 +\label{chap:mq}
    20.6 +
    20.7 +\section{The patch management problem}
    20.8 +\label{sec:mq:patch-mgmt}
    20.9 +
   20.10 +Here is a common scenario: you need to install a software package from
   20.11 +source, but you find a bug that you must fix in the source before you
   20.12 +can start using the package.  You make your changes, forget about the
   20.13 +package for a while, and a few months later you need to upgrade to a
   20.14 +newer version of the package.  If the newer version of the package
   20.15 +still has the bug, you must extract your fix from the older source
   20.16 +tree and apply it against the newer version.  This is a tedious task,
   20.17 +and it's easy to make mistakes.
   20.18 +
   20.19 +This is a simple case of the ``patch management'' problem.  You have
   20.20 +an ``upstream'' source tree that you can't change; you need to make
   20.21 +some local changes on top of the upstream tree; and you'd like to be
   20.22 +able to keep those changes separate, so that you can apply them to
   20.23 +newer versions of the upstream source.
   20.24 +
   20.25 +The patch management problem arises in many situations.  Probably the
   20.26 +most visible is that a user of an open source software project will
   20.27 +contribute a bug fix or new feature to the project's maintainers in the
   20.28 +form of a patch.
   20.29 +
   20.30 +Distributors of operating systems that include open source software
   20.31 +often need to make changes to the packages they distribute so that
   20.32 +they will build properly in their environments.
   20.33 +
   20.34 +When you have few changes to maintain, it is easy to manage a single
   20.35 +patch using the standard \command{diff} and \command{patch} programs
   20.36 +(see section~\ref{sec:mq:patch} for a discussion of these tools).
   20.37 +Once the number of changes grows, it starts to make sense to maintain
   20.38 +patches as discrete ``chunks of work,'' so that for example a single
   20.39 +patch will contain only one bug fix (the patch might modify several
   20.40 +files, but it's doing ``only one thing''), and you may have a number
   20.41 +of such patches for different bugs you need fixed and local changes
   20.42 +you require.  In this situation, if you submit a bug fix patch to the
   20.43 +upstream maintainers of a package and they include your fix in a
   20.44 +subsequent release, you can simply drop that single patch when you're
   20.45 +updating to the newer release.
   20.46 +
   20.47 +Maintaining a single patch against an upstream tree is a little
   20.48 +tedious and error-prone, but not difficult.  However, the complexity
   20.49 +of the problem grows rapidly as the number of patches you have to
   20.50 +maintain increases.  With more than a tiny number of patches in hand,
   20.51 +understanding which ones you have applied and maintaining them moves
   20.52 +from messy to overwhelming.
   20.53 +
   20.54 +Fortunately, Mercurial includes a powerful extension, Mercurial Queues
   20.55 +(or simply ``MQ''), that massively simplifies the patch management
   20.56 +problem.
   20.57 +
   20.58 +\section{The prehistory of Mercurial Queues}
   20.59 +\label{sec:mq:history}
   20.60 +
   20.61 +During the late 1990s, several Linux kernel developers started to
   20.62 +maintain ``patch series'' that modified the behaviour of the Linux
   20.63 +kernel.  Some of these series were focused on stability, some on
   20.64 +feature coverage, and others were more speculative.
   20.65 +
   20.66 +The sizes of these patch series grew rapidly.  In 2002, Andrew Morton
   20.67 +published some shell scripts he had been using to automate the task of
   20.68 +managing his patch queues.  Andrew was successfully using these
   20.69 +scripts to manage hundreds (sometimes thousands) of patches on top of
   20.70 +the Linux kernel.
   20.71 +
   20.72 +\subsection{A patchwork quilt}
   20.73 +\label{sec:mq:quilt}
   20.74 +
   20.75 +In early 2003, Andreas Gruenbacher and Martin Quinson borrowed the
   20.76 +approach of Andrew's scripts and published a tool called ``patchwork
   20.77 +quilt''~\cite{web:quilt}, or simply ``quilt''
   20.78 +(see~\cite{gruenbacher:2005} for a paper describing it).  Because
   20.79 +quilt substantially automated patch management, it rapidly gained a
   20.80 +large following among open source software developers.
   20.81 +
   20.82 +Quilt manages a \emph{stack of patches} on top of a directory tree.
   20.83 +To begin, you tell quilt to manage a directory tree, and tell it which
   20.84 +files you want to manage; it stores away the names and contents of
   20.85 +those files.  To fix a bug, you create a new patch (using a single
   20.86 +command), edit the files you need to fix, then ``refresh'' the patch.
   20.87 +
   20.88 +The refresh step causes quilt to scan the directory tree; it updates
   20.89 +the patch with all of the changes you have made.  You can create
   20.90 +another patch on top of the first, which will track the changes
   20.91 +required to modify the tree from ``tree with one patch applied'' to
   20.92 +``tree with two patches applied''.
   20.93 +
   20.94 +You can \emph{change} which patches are applied to the tree.  If you
   20.95 +``pop'' a patch, the changes made by that patch will vanish from the
   20.96 +directory tree.  Quilt remembers which patches you have popped,
   20.97 +though, so you can ``push'' a popped patch again, and the directory
   20.98 +tree will be restored to contain the modifications in the patch.  Most
   20.99 +importantly, you can run the ``refresh'' command at any time, and the
  20.100 +topmost applied patch will be updated.  This means that you can, at
  20.101 +any time, change both which patches are applied and what
  20.102 +modifications those patches make.
  20.103 +
  20.104 +Quilt knows nothing about revision control tools, so it works equally
  20.105 +well on top of an unpacked tarball or a Subversion working copy.
  20.106 +
  20.107 +\subsection{From patchwork quilt to Mercurial Queues}
  20.108 +\label{sec:mq:quilt-mq}
  20.109 +
  20.110 +In mid-2005, Chris Mason took the features of quilt and wrote an
  20.111 +extension that he called Mercurial Queues, which added quilt-like
  20.112 +behaviour to Mercurial.
  20.113 +
  20.114 +The key difference between quilt and MQ is that quilt knows nothing
  20.115 +about revision control systems, while MQ is \emph{integrated} into
  20.116 +Mercurial.  Each patch that you push is represented as a Mercurial
  20.117 +changeset.  Pop a patch, and the changeset goes away.
  20.118 +
  20.119 +Because quilt does not care about revision control tools, it is still
  20.120 +a tremendously useful piece of software to know about for situations
  20.121 +where you cannot use Mercurial and MQ.
  20.122 +
  20.123 +\section{The huge advantage of MQ}
  20.124 +
  20.125 +I cannot overstate the value that MQ offers through the unification of
  20.126 +patches and revision control.
  20.127 +
  20.128 +A major reason that patches have persisted in the free software and
  20.129 +open source world---in spite of the availability of increasingly
  20.130 +capable revision control tools over the years---is the \emph{agility}
  20.131 +they offer.  
  20.132 +
  20.133 +Traditional revision control tools make a permanent, irreversible
  20.134 +record of everything that you do.  While this has great value, it's
  20.135 +also somewhat stifling.  If you want to perform a wild-eyed
  20.136 +experiment, you have to be careful in how you go about it, or you risk
  20.137 +leaving unneeded---or worse, misleading or destabilising---traces of
  20.138 +your missteps and errors in the permanent revision record.
  20.139 +
  20.140 +By contrast, MQ's marriage of distributed revision control with
  20.141 +patches makes it much easier to isolate your work.  Your patches live
  20.142 +on top of normal revision history, and you can make them disappear or
  20.143 +reappear at will.  If you don't like a patch, you can drop it.  If a
  20.144 +patch isn't quite as you want it to be, simply fix it---as many times
  20.145 +as you need to, until you have refined it into the form you desire.
  20.146 +
  20.147 +As an example, the integration of patches with revision control makes
  20.148 +understanding patches and debugging their effects---and their
  20.149 +interplay with the code they're based on---\emph{enormously} easier.
  20.150 +Since every applied patch has an associated changeset, you can use
  20.151 +\hgcmdargs{log}{\emph{filename}} to see which changesets and patches
   20.152 +affected a file.  You can use the \hgcmd{bisect} command to
  20.153 +binary-search through all changesets and applied patches to see where
  20.154 +a bug got introduced or fixed.  You can use the \hgcmd{annotate}
  20.155 +command to see which changeset or patch modified a particular line of
  20.156 +a source file.  And so on.
  20.157 +
  20.158 +\section{Understanding patches}
  20.159 +\label{sec:mq:patch}
  20.160 +
  20.161 +Because MQ doesn't hide its patch-oriented nature, it is helpful to
  20.162 +understand what patches are, and a little about the tools that work
  20.163 +with them.
  20.164 +
  20.165 +The traditional Unix \command{diff} command compares two files, and
  20.166 +prints a list of differences between them. The \command{patch} command
  20.167 +understands these differences as \emph{modifications} to make to a
  20.168 +file.  Take a look at figure~\ref{ex:mq:diff} for a simple example of
  20.169 +these commands in action.
  20.170 +
  20.171 +\begin{figure}[ht]
  20.172 +  \interaction{mq.dodiff.diff}
  20.173 +  \caption{Simple uses of the \command{diff} and \command{patch} commands}
  20.174 +  \label{ex:mq:diff}
  20.175 +\end{figure}
  20.176 +
  20.177 +The type of file that \command{diff} generates (and \command{patch}
  20.178 +takes as input) is called a ``patch'' or a ``diff''; there is no
  20.179 +difference between a patch and a diff.  (We'll use the term ``patch'',
  20.180 +since it's more commonly used.)
  20.181 +
  20.182 +A patch file can start with arbitrary text; the \command{patch}
  20.183 +command ignores this text, but MQ uses it as the commit message when
  20.184 +creating changesets.  To find the beginning of the patch content,
  20.185 +\command{patch} searches for the first line that starts with the
  20.186 +string ``\texttt{diff~-}''.
  20.187 +
  20.188 +MQ works with \emph{unified} diffs (\command{patch} can accept several
  20.189 +other diff formats, but MQ doesn't).  A unified diff contains two
  20.190 +kinds of header.  The \emph{file header} describes the file being
  20.191 +modified; it contains the name of the file to modify.  When
  20.192 +\command{patch} sees a new file header, it looks for a file with that
  20.193 +name to start modifying.
  20.194 +
  20.195 +After the file header comes a series of \emph{hunks}.  Each hunk
  20.196 +starts with a header; this identifies the range of line numbers within
  20.197 +the file that the hunk should modify.  Following the header, a hunk
  20.198 +starts and ends with a few (usually three) lines of text from the
  20.199 +unmodified file; these are called the \emph{context} for the hunk.  If
  20.200 +there's only a small amount of context between successive hunks,
  20.201 +\command{diff} doesn't print a new hunk header; it just runs the hunks
  20.202 +together, with a few lines of context between modifications.
  20.203 +
  20.204 +Each line of context begins with a space character.  Within the hunk,
  20.205 +a line that begins with ``\texttt{-}'' means ``remove this line,''
  20.206 +while a line that begins with ``\texttt{+}'' means ``insert this
  20.207 +line.''  For example, a line that is modified is represented by one
  20.208 +deletion and one insertion.
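
To see these pieces in a real patch, here is a minimal sketch that uses
Python's standard \texttt{difflib} module instead of the \command{diff}
command.  The file names \texttt{a/example.txt} and
\texttt{b/example.txt} and the three lines of text are made up for
illustration.  The result contains a file header, one hunk header, two
context lines that begin with a space, and one modified line shown as a
deletion followed by an insertion.

```python
import difflib

# Hypothetical "before" and "after" contents of a small file.
old = ["first line\n", "second line\n", "third line\n"]
new = ["first line\n", "second line, edited\n", "third line\n"]

# unified_diff yields the file header, then each hunk header and its lines.
patch_lines = list(
    difflib.unified_diff(
        old, new, fromfile="a/example.txt", tofile="b/example.txt"
    )
)

print("".join(patch_lines), end="")
```

The leading \texttt{a/} and \texttt{b/} components are only there to
mimic the path names that revision control tools usually emit.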
  20.209 +
  20.210 +We will return to some of the more subtle aspects of patches later (in
  20.211 +section~\ref{sec:mq:adv-patch}), but you should have enough information
  20.212 +now to use MQ.
  20.213 +
  20.214 +\section{Getting started with Mercurial Queues}
  20.215 +\label{sec:mq:start}
  20.216 +
   20.217 +Because MQ is implemented as an extension, you must explicitly enable
   20.218 +it before you can use it.  (You don't need to download anything; MQ ships
  20.219 +with the standard Mercurial distribution.)  To enable MQ, edit your
  20.220 +\tildefile{.hgrc} file, and add the lines in figure~\ref{ex:mq:config}.
  20.221 +
  20.222 +\begin{figure}[ht]
  20.223 +  \begin{codesample4}
  20.224 +    [extensions]
  20.225 +    hgext.mq =
  20.226 +  \end{codesample4}
   20.227 +  \caption{Contents to add to \tildefile{.hgrc} to enable the MQ extension}
   20.228 +  \label{ex:mq:config}
  20.229 +\end{figure}
  20.230 +
  20.231 +Once the extension is enabled, it will make a number of new commands
  20.232 +available.  To verify that the extension is working, you can use
  20.233 +\hgcmd{help} to see if the \hgxcmd{mq}{qinit} command is now available; see
  20.234 +the example in figure~\ref{ex:mq:enabled}.
  20.235 +
  20.236 +\begin{figure}[ht]
  20.237 +  \interaction{mq.qinit-help.help}
  20.238 +  \caption{How to verify that MQ is enabled}
  20.239 +  \label{ex:mq:enabled}
  20.240 +\end{figure}
  20.241 +
  20.242 +You can use MQ with \emph{any} Mercurial repository, and its commands
  20.243 +only operate within that repository.  To get started, simply prepare
  20.244 +the repository using the \hgxcmd{mq}{qinit} command (see
  20.245 +figure~\ref{ex:mq:qinit}).  This command creates an empty directory
  20.246 +called \sdirname{.hg/patches}, where MQ will keep its metadata.  As
  20.247 +with many Mercurial commands, the \hgxcmd{mq}{qinit} command prints nothing
  20.248 +if it succeeds.
  20.249 +
  20.250 +\begin{figure}[ht]
  20.251 +  \interaction{mq.tutorial.qinit}
  20.252 +  \caption{Preparing a repository for use with MQ}
  20.253 +  \label{ex:mq:qinit}
  20.254 +\end{figure}
  20.255 +
  20.256 +\begin{figure}[ht]
  20.257 +  \interaction{mq.tutorial.qnew}
  20.258 +  \caption{Creating a new patch}
  20.259 +  \label{ex:mq:qnew}
  20.260 +\end{figure}
  20.261 +
  20.262 +\subsection{Creating a new patch}
  20.263 +
  20.264 +To begin work on a new patch, use the \hgxcmd{mq}{qnew} command.  This
  20.265 +command takes one argument, the name of the patch to create.  MQ will
  20.266 +use this as the name of an actual file in the \sdirname{.hg/patches}
  20.267 +directory, as you can see in figure~\ref{ex:mq:qnew}.
  20.268 +
  20.269 +Also newly present in the \sdirname{.hg/patches} directory are two
  20.270 +other files, \sfilename{series} and \sfilename{status}.  The
  20.271 +\sfilename{series} file lists all of the patches that MQ knows about
   20.272 +for this repository, with one patch per line.  MQ uses the
  20.273 +\sfilename{status} file for internal book-keeping; it tracks all of the
  20.274 +patches that MQ has \emph{applied} in this repository.
  20.275 +
  20.276 +\begin{note}
  20.277 +  You may sometimes want to edit the \sfilename{series} file by hand;
  20.278 +  for example, to change the sequence in which some patches are
  20.279 +  applied.  However, manually editing the \sfilename{status} file is
  20.280 +  almost always a bad idea, as it's easy to corrupt MQ's idea of what
  20.281 +  is happening.
  20.282 +\end{note}
  20.283 +
  20.284 +Once you have created your new patch, you can edit files in the
  20.285 +working directory as you usually would.  All of the normal Mercurial
  20.286 +commands, such as \hgcmd{diff} and \hgcmd{annotate}, work exactly as
  20.287 +they did before.
  20.288 +
  20.289 +\subsection{Refreshing a patch}
  20.290 +
  20.291 +When you reach a point where you want to save your work, use the
   20.292 +\hgxcmd{mq}{qrefresh} command (figure~\ref{ex:mq:qrefresh}) to update the patch
  20.293 +you are working on.  This command folds the changes you have made in
  20.294 +the working directory into your patch, and updates its corresponding
  20.295 +changeset to contain those changes.
  20.296 +
  20.297 +\begin{figure}[ht]
  20.298 +  \interaction{mq.tutorial.qrefresh}
  20.299 +  \caption{Refreshing a patch}
  20.300 +  \label{ex:mq:qrefresh}
  20.301 +\end{figure}
  20.302 +
  20.303 +You can run \hgxcmd{mq}{qrefresh} as often as you like, so it's a good way
  20.304 +to ``checkpoint'' your work.  Refresh your patch at an opportune
  20.305 +time; try an experiment; and if the experiment doesn't work out,
  20.306 +\hgcmd{revert} your modifications back to the last time you refreshed.
  20.307 +
  20.308 +\begin{figure}[ht]
  20.309 +  \interaction{mq.tutorial.qrefresh2}
  20.310 +  \caption{Refresh a patch many times to accumulate changes}
  20.311 +  \label{ex:mq:qrefresh2}
  20.312 +\end{figure}
  20.313 +
  20.314 +\subsection{Stacking and tracking patches}
  20.315 +
  20.316 +Once you have finished working on a patch, or need to work on another,
  20.317 +you can use the \hgxcmd{mq}{qnew} command again to create a new patch.
  20.318 +Mercurial will apply this patch on top of your existing patch.  See
  20.319 +figure~\ref{ex:mq:qnew2} for an example.  Notice that the patch
  20.320 +contains the changes in our prior patch as part of its context (you
  20.321 +can see this more clearly in the output of \hgcmd{annotate}).
  20.322 +
  20.323 +\begin{figure}[ht]
  20.324 +  \interaction{mq.tutorial.qnew2}
  20.325 +  \caption{Stacking a second patch on top of the first}
  20.326 +  \label{ex:mq:qnew2}
  20.327 +\end{figure}
  20.328 +
  20.329 +So far, with the exception of \hgxcmd{mq}{qnew} and \hgxcmd{mq}{qrefresh}, we've
  20.330 +been careful to only use regular Mercurial commands.  However, MQ
  20.331 +provides many commands that are easier to use when you are thinking
  20.332 +about patches, as illustrated in figure~\ref{ex:mq:qseries}:
  20.333 +
  20.334 +\begin{itemize}
  20.335 +\item The \hgxcmd{mq}{qseries} command lists every patch that MQ knows
  20.336 +  about in this repository, from oldest to newest (most recently
  20.337 +  \emph{created}).
  20.338 +\item The \hgxcmd{mq}{qapplied} command lists every patch that MQ has
  20.339 +  \emph{applied} in this repository, again from oldest to newest (most
  20.340 +  recently applied).
  20.341 +\end{itemize}
  20.342 +
  20.343 +\begin{figure}[ht]
  20.344 +  \interaction{mq.tutorial.qseries}
  20.345 +  \caption{Understanding the patch stack with \hgxcmd{mq}{qseries} and
  20.346 +    \hgxcmd{mq}{qapplied}}
  20.347 +  \label{ex:mq:qseries}
  20.348 +\end{figure}
  20.349 +
  20.350 +\subsection{Manipulating the patch stack}
  20.351 +
  20.352 +The previous discussion implied that there must be a difference
  20.353 +between ``known'' and ``applied'' patches, and there is.  MQ can
  20.354 +manage a patch without it being applied in the repository.
  20.355 +
  20.356 +An \emph{applied} patch has a corresponding changeset in the
  20.357 +repository, and the effects of the patch and changeset are visible in
  20.358 +the working directory.  You can undo the application of a patch using
  20.359 +the \hgxcmd{mq}{qpop} command.  MQ still \emph{knows about}, or manages, a
  20.360 +popped patch, but the patch no longer has a corresponding changeset in
  20.361 +the repository, and the working directory does not contain the changes
  20.362 +made by the patch.  Figure~\ref{fig:mq:stack} illustrates the
   20.363 +difference between applied and unapplied patches.
  20.364 +
  20.365 +\begin{figure}[ht]
  20.366 +  \centering
  20.367 +  \grafix{mq-stack}
  20.368 +  \caption{Applied and unapplied patches in the MQ patch stack}
  20.369 +  \label{fig:mq:stack}
  20.370 +\end{figure}
  20.371 +
  20.372 +You can reapply an unapplied, or popped, patch using the \hgxcmd{mq}{qpush}
  20.373 +command.  This creates a new changeset to correspond to the patch, and
  20.374 +the patch's changes once again become present in the working
  20.375 +directory.  See figure~\ref{ex:mq:qpop} for examples of \hgxcmd{mq}{qpop}
   20.376 +and \hgxcmd{mq}{qpush} in action.  Notice that once we have popped one
   20.377 +or two patches, the output of \hgxcmd{mq}{qseries} remains the same, while
  20.378 +that of \hgxcmd{mq}{qapplied} has changed.
  20.379 +
  20.380 +\begin{figure}[ht]
  20.381 +  \interaction{mq.tutorial.qpop}
  20.382 +  \caption{Modifying the stack of applied patches}
  20.383 +  \label{ex:mq:qpop}
  20.384 +\end{figure}
  20.385 +
  20.386 +\subsection{Pushing and popping many patches}
  20.387 +
  20.388 +While \hgxcmd{mq}{qpush} and \hgxcmd{mq}{qpop} each operate on a single patch at
  20.389 +a time by default, you can push and pop many patches in one go.  The
  20.390 +\hgxopt{mq}{qpush}{-a} option to \hgxcmd{mq}{qpush} causes it to push all
  20.391 +unapplied patches, while the \hgxopt{mq}{qpop}{-a} option to \hgxcmd{mq}{qpop}
  20.392 +causes it to pop all applied patches.  (For some more ways to push and
  20.393 +pop many patches, see section~\ref{sec:mq:perf} below.)
  20.394 +
  20.395 +\begin{figure}[ht]
  20.396 +  \interaction{mq.tutorial.qpush-a}
  20.397 +  \caption{Pushing all unapplied patches}
  20.398 +  \label{ex:mq:qpush-a}
  20.399 +\end{figure}
  20.400 +
  20.401 +\subsection{Safety checks, and overriding them}
  20.402 +
  20.403 +Several MQ commands check the working directory before they do
  20.404 +anything, and fail if they find any modifications.  They do this to
  20.405 +ensure that you won't lose any changes that you have made, but not yet
  20.406 +incorporated into a patch.  Figure~\ref{ex:mq:add} illustrates this;
  20.407 +the \hgxcmd{mq}{qnew} command will not create a new patch if there are
  20.408 +outstanding changes, caused in this case by the \hgcmd{add} of
  20.409 +\filename{file3}.
  20.410 +
  20.411 +\begin{figure}[ht]
  20.412 +  \interaction{mq.tutorial.add}
  20.413 +  \caption{Forcibly creating a patch}
  20.414 +  \label{ex:mq:add}
  20.415 +\end{figure}
  20.416 +
  20.417 +Commands that check the working directory all take an ``I know what
  20.418 +I'm doing'' option, which is always named \option{-f}.  The exact
  20.419 +meaning of \option{-f} depends on the command.  For example,
  20.420 +\hgcmdargs{qnew}{\hgxopt{mq}{qnew}{-f}} will incorporate any outstanding
  20.421 +changes into the new patch it creates, but
  20.422 +\hgcmdargs{qpop}{\hgxopt{mq}{qpop}{-f}} will revert modifications to any
  20.423 +files affected by the patch that it is popping.  Be sure to read the
  20.424 +documentation for a command's \option{-f} option before you use it!
  20.425 +
  20.426 +\subsection{Working on several patches at once}
  20.427 +
  20.428 +The \hgxcmd{mq}{qrefresh} command always refreshes the \emph{topmost}
  20.429 +applied patch.  This means that you can suspend work on one patch (by
  20.430 +refreshing it), pop or push to make a different patch the top, and
  20.431 +work on \emph{that} patch for a while.
  20.432 +
  20.433 +Here's an example that illustrates how you can use this ability.
  20.434 +Let's say you're developing a new feature as two patches.  The first
  20.435 +is a change to the core of your software, and the second---layered on
  20.436 +top of the first---changes the user interface to use the code you just
  20.437 +added to the core.  If you notice a bug in the core while you're
  20.438 +working on the UI patch, it's easy to fix the core.  Simply
  20.439 +\hgxcmd{mq}{qrefresh} the UI patch to save your in-progress changes, and
  20.440 +\hgxcmd{mq}{qpop} down to the core patch.  Fix the core bug,
  20.441 +\hgxcmd{mq}{qrefresh} the core patch, and \hgxcmd{mq}{qpush} back to the UI
  20.442 +patch to continue where you left off.
  20.443 +
  20.444 +\section{More about patches}
  20.445 +\label{sec:mq:adv-patch}
  20.446 +
  20.447 +MQ uses the GNU \command{patch} command to apply patches, so it's
  20.448 +helpful to know a few more detailed aspects of how \command{patch}
  20.449 +works, and about patches themselves.
  20.450 +
  20.451 +\subsection{The strip count}
  20.452 +
  20.453 +If you look at the file headers in a patch, you will notice that the
  20.454 +pathnames usually have an extra component on the front that isn't
  20.455 +present in the actual path name.  This is a holdover from the way that
  20.456 +people used to generate patches (people still do this, but it's
  20.457 +somewhat rare with modern revision control tools).  
  20.458 +
  20.459 +Alice would unpack a tarball, edit her files, then decide that she
  20.460 +wanted to create a patch.  So she'd rename her working directory,
  20.461 +unpack the tarball again (hence the need for the rename), and use the
  20.462 +\cmdopt{diff}{-r} and \cmdopt{diff}{-N} options to \command{diff} to
  20.463 +recursively generate a patch between the unmodified directory and the
  20.464 +modified one.  The result would be that the name of the unmodified
  20.465 +directory would be at the front of the left-hand path in every file
  20.466 +header, and the name of the modified directory would be at the front
  20.467 +of the right-hand path.
  20.468 +
  20.469 +Since someone receiving a patch from the Alices of the net would be
  20.470 +unlikely to have unmodified and modified directories with exactly the
  20.471 +same names, the \command{patch} command has a \cmdopt{patch}{-p}
  20.472 +option that indicates the number of leading path name components to
  20.473 +strip when trying to apply a patch.  This number is called the
  20.474 +\emph{strip count}.
  20.475 +
  20.476 +An option of ``\texttt{-p1}'' means ``use a strip count of one''.  If
  20.477 +\command{patch} sees a file name \filename{foo/bar/baz} in a file
  20.478 +header, it will strip \filename{foo} and try to patch a file named
  20.479 +\filename{bar/baz}.  (Strictly speaking, the strip count refers to the
   20.480 +number of \emph{path separators} (and the components that go with them)
   20.481 +to strip.  A strip count of one will turn \filename{foo/bar} into
  20.482 +\filename{bar}, but \filename{/foo/bar} (notice the extra leading
  20.483 +slash) into \filename{foo/bar}.)
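
The rule is easy to express in code.  The following is a minimal sketch,
not the code that \command{patch} itself uses; the function name
\texttt{strip\_components} is made up for illustration.  It removes the
shortest prefix of a path that contains the requested number of
separators, and, as GNU \command{patch} documents, counts a run of
adjacent slashes as a single separator.

```python
def strip_components(path, count):
    """Drop the shortest prefix of `path` that contains `count` separators.

    Illustrative helper only; it mimics the effect of patch's -p option.
    """
    rest = path
    for _ in range(count):
        _head, sep, rest = rest.partition("/")
        if not sep:
            raise ValueError(
                "%r has fewer than %d path separators" % (path, count)
            )
        # A run of adjacent slashes counts as one separator.
        rest = rest.lstrip("/")
    return rest


print(strip_components("foo/bar/baz", 1))
print(strip_components("/foo/bar", 1))
```

The first call corresponds to the \filename{foo/bar/baz} example, and the
second shows why a leading slash changes the result.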
  20.484 +
  20.485 +The ``standard'' strip count for patches is one; almost all patches
  20.486 +contain one leading path name component that needs to be stripped.
  20.487 +Mercurial's \hgcmd{diff} command generates path names in this form,
  20.488 +and the \hgcmd{import} command and MQ expect patches to have a strip
  20.489 +count of one.
  20.490 +
  20.491 +If you receive a patch from someone that you want to add to your patch
  20.492 +queue, and the patch needs a strip count other than one, you cannot
  20.493 +just \hgxcmd{mq}{qimport} the patch, because \hgxcmd{mq}{qimport} does not yet
   20.494 +have a \texttt{-p} option.  Your best bet is to
  20.495 +\hgxcmd{mq}{qnew} a patch of your own, then use \cmdargs{patch}{-p\emph{N}}
  20.496 +to apply their patch, followed by \hgcmd{addremove} to pick up any
  20.497 +files added or removed by the patch, followed by \hgxcmd{mq}{qrefresh}.
  20.498 +This complexity may become unnecessary; see~\bug{311} for details.
  20.499 +\subsection{Strategies for applying a patch}
  20.500 +
  20.501 +When \command{patch} applies a hunk, it tries a handful of
  20.502 +successively less accurate strategies to try to make the hunk apply.
  20.503 +This falling-back technique often makes it possible to take a patch
  20.504 +that was generated against an old version of a file, and apply it
  20.505 +against a newer version of that file.
  20.506 +
  20.507 +First, \command{patch} tries an exact match, where the line numbers,
  20.508 +the context, and the text to be modified must apply exactly.  If it
  20.509 +cannot make an exact match, it tries to find an exact match for the
  20.510 +context, without honouring the line numbering information.  If this
  20.511 +succeeds, it prints a line of output saying that the hunk was applied,
  20.512 +but at some \emph{offset} from the original line number.
  20.513 +
  20.514 +If a context-only match fails, \command{patch} removes the first and
  20.515 +last lines of the context, and tries a \emph{reduced} context-only
  20.516 +match.  If the hunk with reduced context succeeds, it prints a message
  20.517 +saying that it applied the hunk with a \emph{fuzz factor} (the number
  20.518 +after the fuzz factor indicates how many lines of context
  20.519 +\command{patch} had to trim before the patch applied).
  20.520 +
   20.521 +When none of these techniques works, \command{patch} prints a
  20.522 +message saying that the hunk in question was rejected.  It saves
  20.523 +rejected hunks (also simply called ``rejects'') to a file with the
  20.524 +same name, and an added \sfilename{.rej} extension.  It also saves an
  20.525 +unmodified copy of the file with a \sfilename{.orig} extension; the
  20.526 +copy of the file without any extensions will contain any changes made
  20.527 +by hunks that \emph{did} apply cleanly.  If you have a patch that
  20.528 +modifies \filename{foo} with six hunks, and one of them fails to
  20.529 +apply, you will have: an unmodified \filename{foo.orig}, a
  20.530 +\filename{foo.rej} containing one hunk, and \filename{foo}, containing
  20.531 +the changes made by the five successful hunks.
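
The fallback sequence can be sketched in a few lines of Python.  This is
a simplified model under stated assumptions, not the algorithm that GNU
\command{patch} implements: the function \texttt{locate\_hunk} and its
parameters are invented for illustration, a hunk is reduced to the list
of lines it expects to find (context plus lines to remove), and the
default limit of two trimmed lines follows the documented default of
\command{patch}'s \texttt{-F} option.

```python
def locate_hunk(file_lines, old_block, expected, lead_ctx, trail_ctx,
                max_fuzz=2):
    """Find where a hunk's expected lines occur in a file.

    old_block  -- the lines the hunk expects (context and removed lines)
    expected   -- zero-based index where the hunk header says it starts
    lead_ctx   -- number of context lines at the start of old_block
    trail_ctx  -- number of context lines at the end of old_block

    Returns (start, offset, fuzz), or None if the hunk is rejected.
    """
    for fuzz in range(max_fuzz + 1):
        # With fuzz > 0, trim context lines from both ends of the hunk.
        drop_head = min(fuzz, lead_ctx)
        drop_tail = min(fuzz, trail_ctx)
        block = old_block[drop_head:len(old_block) - drop_tail]
        if not block:
            break
        want = expected + drop_head
        last_start = len(file_lines) - len(block)
        # Try the expected position first, then ever larger offsets.
        candidates = sorted(range(last_start + 1),
                            key=lambda i: abs(i - want))
        for start in candidates:
            if file_lines[start:start + len(block)] == block:
                return start, start - want, fuzz
    return None


hunk = ["b", "c", "d"]  # one line of context on each side of "c"
print(locate_hunk(["a", "b", "c", "d", "e"], hunk, 1, 1, 1))
print(locate_hunk(["x", "y", "a", "b", "c", "d", "e"], hunk, 1, 1, 1))
print(locate_hunk(["a", "B", "c", "D", "e"], hunk, 1, 1, 1))
print(locate_hunk(["a", "b", "X", "d"], hunk, 1, 1, 1))
```

The four calls model, in order, an exact match, a match at an offset, a
match that needs a fuzz factor of one, and a rejected hunk.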
  20.532 +
  20.533 +\subsection{Some quirks of patch representation}
  20.534 +
  20.535 +There are a few useful things to know about how \command{patch} works
  20.536 +with files.
  20.537 +\begin{itemize}
  20.538 +\item This should already be obvious, but \command{patch} cannot
  20.539 +  handle binary files.
  20.540 +\item Neither does it care about the executable bit; it creates new
  20.541 +  files as readable, but not executable.
  20.542 +\item \command{patch} treats the removal of a file as a diff between
  20.543 +  the file to be removed and the empty file.  So your idea of ``I
  20.544 +  deleted this file'' looks like ``every line of this file was
  20.545 +  deleted'' in a patch.
  20.546 +\item It treats the addition of a file as a diff between the empty
  20.547 +  file and the file to be added.  So in a patch, your idea of ``I
  20.548 +  added this file'' looks like ``every line of this file was added''.
  20.549 +\item It treats a renamed file as the removal of the old name, and the
  20.550 +  addition of the new name.  This means that renamed files have a big
  20.551 +  footprint in patches.  (Note also that Mercurial does not currently
  20.552 +  try to infer when files have been renamed or copied in a patch.)
  20.553 +\item \command{patch} cannot represent empty files, so you cannot use
  20.554 +  a patch to represent the notion ``I added this empty file to the
  20.555 +  tree''.
  20.556 +\end{itemize}
  20.557 +\subsection{Beware the fuzz}
  20.558 +
  20.559 +While applying a hunk at an offset, or with a fuzz factor, will often
  20.560 +be completely successful, these inexact techniques naturally leave
  20.561 +open the possibility of corrupting the patched file.  The most common
  20.562 +cases typically involve applying a patch twice, or at an incorrect
  20.563 +location in the file.  If \command{patch} or \hgxcmd{mq}{qpush} ever
  20.564 +mentions an offset or fuzz factor, you should make sure that the
  20.565 +modified files are correct afterwards.  
  20.566 +
  20.567 +It's often a good idea to refresh a patch that has applied with an
  20.568 +offset or fuzz factor; refreshing the patch generates new context
  20.569 +information that will make it apply cleanly.  I say ``often,'' not
  20.570 +``always,'' because sometimes refreshing a patch will make it fail to
  20.571 +apply against a different revision of the underlying files.  In some
  20.572 +cases, such as when you're maintaining a patch that must sit on top of
  20.573 +multiple versions of a source tree, it's acceptable to have a patch
  20.574 +apply with some fuzz, provided you've verified the results of the
  20.575 +patching process in such cases.
  20.576 +
  20.577 +\subsection{Handling rejection}
  20.578 +
  20.579 +If \hgxcmd{mq}{qpush} fails to apply a patch, it will print an error
  20.580 +message and exit.  If it has left \sfilename{.rej} files behind, it is
  20.581 +usually best to fix up the rejected hunks before you push more patches
  20.582 +or do any further work.
  20.583 +
  20.584 +If your patch \emph{used to} apply cleanly, and no longer does because
  20.585 +you've changed the underlying code that your patches are based on,
  20.586 +Mercurial Queues can help; see section~\ref{sec:mq:merge} for details.
  20.587 +
  20.588 +Unfortunately, there aren't any great techniques for dealing with
  20.589 +rejected hunks.  Most often, you'll need to view the \sfilename{.rej}
  20.590 +file and edit the target file, applying the rejected hunks by hand.
  20.591 +
  20.592 +If you're feeling adventurous, Neil Brown, a Linux kernel hacker,
  20.593 +wrote a tool called \command{wiggle}~\cite{web:wiggle}, which is more
  20.594 +vigorous than \command{patch} in its attempts to make a patch apply.
  20.595 +
  20.596 +Another Linux kernel hacker, Chris Mason (the author of Mercurial
  20.597 +Queues), wrote a similar tool called
  20.598 +\command{mpatch}~\cite{web:mpatch}, which takes a simple approach to
  20.599 +automating the application of hunks rejected by \command{patch}.  The
  20.600 +\command{mpatch} command can help with four common reasons that a hunk
  20.601 +may be rejected:
  20.602 +
  20.603 +\begin{itemize}
  20.604 +\item The context in the middle of a hunk has changed.
  20.605 +\item A hunk is missing some context at the beginning or end.
  20.606 +\item A large hunk might apply better---either entirely or in
  20.607 +  part---if it was broken up into smaller hunks.
  20.608 +\item A hunk removes lines with slightly different content than those
  20.609 +  currently present in the file.
  20.610 +\end{itemize}
  20.611 +
  20.612 +If you use \command{wiggle} or \command{mpatch}, you should be doubly
  20.613 +careful to check your results when you're done.  In fact,
  20.614 +\command{mpatch} enforces this method of double-checking the tool's
  20.615 +output, by automatically dropping you into a merge program when it has
  20.616 +done its job, so that you can verify its work and finish off any
  20.617 +remaining merges.
  20.618 +
  20.619 +\section{Getting the best performance out of MQ}
  20.620 +\label{sec:mq:perf}
  20.621 +
  20.622 +MQ is very efficient at handling a large number of patches.  I ran
  20.623 +some performance experiments in mid-2006 for a talk that I gave at the
  20.624 +2006 EuroPython conference~\cite{web:europython}.  I used as my data
  20.625 +set the Linux 2.6.17-mm1 patch series, which consists of 1,738
  20.626 +patches.  I applied these on top of a Linux kernel repository
  20.627 +containing all 27,472 revisions between Linux 2.6.12-rc2 and Linux
  20.628 +2.6.17.
  20.629 +
  20.630 +On my old, slow laptop, I was able to
  20.631 +\hgcmdargs{qpush}{\hgxopt{mq}{qpush}{-a}} all 1,738 patches in 3.5 minutes,
  20.632 +and \hgcmdargs{qpop}{\hgxopt{mq}{qpop}{-a}} them all in 30 seconds.  (On a
  20.633 +newer laptop, the time to push all patches dropped to two minutes.)  I
  20.634 +could \hgxcmd{mq}{qrefresh} one of the biggest patches (which made 22,779
  20.635 +lines of changes to 287 files) in 6.6 seconds.
  20.636 +
  20.637 +Clearly, MQ is well suited to working in large trees, but there are a
  20.638 +few tricks you can use to get the best performance out of it.
  20.639 +
  20.640 +First of all, try to ``batch'' operations together.  Every time you
  20.641 +run \hgxcmd{mq}{qpush} or \hgxcmd{mq}{qpop}, these commands scan the working
  20.642 +directory once to make sure you haven't made some changes and then
  20.643 +forgotten to run \hgxcmd{mq}{qrefresh}.  On a small tree, the time that
  20.644 +this scan takes is unnoticeable.  However, on a medium-sized tree
  20.645 +(containing tens of thousands of files), it can take a second or more.
  20.646 +
  20.647 +The \hgxcmd{mq}{qpush} and \hgxcmd{mq}{qpop} commands allow you to push and pop
  20.648 +multiple patches at a time.  You can identify the ``destination
  20.649 +patch'' that you want to end up at.  When you \hgxcmd{mq}{qpush} with a
  20.650 +destination specified, it will push patches until that patch is at the
  20.651 +top of the applied stack.  When you \hgxcmd{mq}{qpop} to a destination, MQ
  20.652 +will pop patches until the destination patch is at the top.
  20.653 +
  20.654 +You can identify a destination patch using either the name of the
  20.655 +patch, or by number.  If you use numeric addressing, patches are
  20.656 +counted from zero; this means that the first patch is zero, the second
  20.657 +is one, and so on.
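         +
         +As a rough sketch of batching, the following pops and pushes to a
         +named destination with one command each, so that the working
         +directory is scanned once per command rather than once per patch.
         +The patch names are hypothetical.
         +
         +\begin{codesample2}
         +  \# hypothetical patch names; substitute names from your own series
         +  hg qpop first-fix.patch
         +  hg qpush rework-device-alloc.patch
         +\end{codesample2}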
  20.658 +
  20.659 +\section{Updating your patches when the underlying code changes}
  20.660 +\label{sec:mq:merge}
  20.661 +
  20.662 +It's common to have a stack of patches on top of an underlying
  20.663 +repository that you don't modify directly.  If you're working on
  20.664 +changes to third-party code, or on a feature that takes long enough to
  20.665 +develop that the code beneath it changes in the meantime, you will often
  20.666 +need to sync up with the underlying code, and fix up any hunks in your
  20.667 +patches that no longer apply.  This is called \emph{rebasing} your
  20.668 +patch series.
  20.669 +
  20.670 +The simplest way to do this is to \hgcmdargs{qpop}{\hgxopt{mq}{qpop}{-a}}
  20.671 +your patches, then \hgcmd{pull} changes into the underlying
  20.672 +repository, and finally \hgcmdargs{qpush}{\hgxopt{mq}{qpush}{-a}} your
  20.673 +patches again.  MQ will stop pushing any time it runs across a patch
  20.674 +that fails to apply because of conflicts, allowing you to fix your
  20.675 +conflicts, \hgxcmd{mq}{qrefresh} the affected patch, and continue pushing
  20.676 +until you have fixed your entire stack.
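         +
         +The sequence above might look like the following sketch.  The
         +\hgcmd{update} step is an assumption added here: \hgcmd{pull} on its
         +own does not move the working directory to the newly pulled changes.
         +
         +\begin{codesample2}
         +  hg qpop -a
         +  hg pull
         +  \# assumed step: bring the working directory up to the new tip
         +  hg update
         +  hg qpush -a
         +  \# if the push stops on rejected hunks, fix them by hand, then:
         +  hg qrefresh
         +  hg qpush -a
         +\end{codesample2}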
  20.677 +
  20.678 +This approach is easy to use and works well if you don't expect
  20.679 +changes to the underlying code to affect how well your patches apply.
  20.680 +If your patch stack touches code that is modified frequently or
  20.681 +invasively in the underlying repository, however, fixing up rejected
  20.682 +hunks by hand quickly becomes tiresome.
  20.683 +
  20.684 +It's possible to partially automate the rebasing process.  If your
  20.685 +patches apply cleanly against some revision of the underlying repo, MQ
  20.686 +can use this information to help you to resolve conflicts between your
  20.687 +patches and a different revision.
  20.688 +
  20.689 +The process is a little involved.
  20.690 +\begin{enumerate}
  20.691 +\item To begin, \hgcmdargs{qpush}{-a} all of your patches on top of
  20.692 +  the revision where you know that they apply cleanly.
  20.693 +\item Save a backup copy of your patch directory using
  20.694 +  \hgcmdargs{qsave}{\hgxopt{mq}{qsave}{-e} \hgxopt{mq}{qsave}{-c}}.  This prints
  20.695 +  the name of the directory that it has saved the patches in.  It will
  20.696 +  save the patches to a directory called
  20.697 +  \sdirname{.hg/patches.\emph{N}}, where \texttt{\emph{N}} is a small
  20.698 +  integer.  It also commits a ``save changeset'' on top of your
  20.699 +  applied patches; this is for internal book-keeping, and records the
  20.700 +  states of the \sfilename{series} and \sfilename{status} files.
  20.701 +\item Use \hgcmd{pull} to bring new changes into the underlying
  20.702 +  repository.  (Don't run \hgcmdargs{pull}{-u}; see below for why.)
  20.703 +\item Update to the new tip revision, using
  20.704 +  \hgcmdargs{update}{\hgopt{update}{-C}} to override the patches you
  20.705 +  have pushed.
  20.706 +\item Merge all patches using \hgcmdargs{qpush}{\hgxopt{mq}{qpush}{-m}
  20.707 +    \hgxopt{mq}{qpush}{-a}}.  The \hgxopt{mq}{qpush}{-m} option to \hgxcmd{mq}{qpush}
  20.708 +  tells MQ to perform a three-way merge if the patch fails to apply.
  20.709 +\end{enumerate}
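         +
         +Put together, the steps above amount to roughly this sequence of
         +commands.  Treat it as a sketch, and expect to intervene whenever a
         +merge needs your attention.
         +
         +\begin{codesample2}
         +  hg qpush -a
         +  hg qsave -e -c
         +  hg pull
         +  hg update -C
         +  hg qpush -m -a
         +\end{codesample2}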
  20.710 +
  20.711 +During the \hgcmdargs{qpush}{\hgxopt{mq}{qpush}{-m}}, each patch in the
  20.712 +\sfilename{series} file is applied normally.  If a patch applies with
  20.713 +fuzz or rejects, MQ looks at the queue you \hgxcmd{mq}{qsave}d, and
  20.714 +performs a three-way merge with the corresponding changeset.  This
  20.715 +merge uses Mercurial's normal merge machinery, so it may pop up a GUI
  20.716 +merge tool to help you to resolve problems.
  20.717 +
  20.718 +When you finish resolving the effects of a patch, MQ refreshes your
  20.719 +patch based on the result of the merge.
  20.720 +
  20.721 +At the end of this process, your repository will have one extra head
  20.722 +from the old patch queue, and a copy of the old patch queue will be in
  20.723 +\sdirname{.hg/patches.\emph{N}}. You can remove the extra head using
  20.724 +\hgcmdargs{qpop}{\hgxopt{mq}{qpop}{-a} \hgxopt{mq}{qpop}{-n} patches.\emph{N}}
  20.725 +or \hgcmd{strip}.  You can delete \sdirname{.hg/patches.\emph{N}} once
  20.726 +you are sure that you no longer need it as a backup.
  20.727 +
  20.728 +\section{Identifying patches}
  20.729 +
  20.730 +MQ commands that work with patches let you refer to a patch either by
  20.731 +using its name or by a number.  By name is obvious enough; pass the
  20.732 +name \filename{foo.patch} to \hgxcmd{mq}{qpush}, for example, and it will
  20.733 +push patches until \filename{foo.patch} is applied.  
  20.734 +
  20.735 +As a shortcut, you can refer to a patch using both a name and a
  20.736 +numeric offset; \texttt{foo.patch-2} means ``two patches before
  20.737 +\texttt{foo.patch}'', while \texttt{bar.patch+4} means ``four patches
  20.738 +after \texttt{bar.patch}''.
  20.739 +
  20.740 +Referring to a patch by index isn't much different.  The first patch
  20.741 +printed in the output of \hgxcmd{mq}{qseries} is patch zero (yes, it's one
  20.742 +of those start-at-zero counting systems); the second is patch one; and
  20.743 +so on.
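         +
         +For example, assuming a series that contains the hypothetical patch
         +\filename{foo.patch}, each of the following commands, taken on its
         +own, names a destination in one of the forms described above.
         +
         +\begin{codesample2}
         +  \# by name
         +  hg qpush foo.patch
         +  \# by index: stop once the second patch in the series is on top
         +  hg qpush 1
         +  \# by name and offset: two patches before foo.patch
         +  hg qpop foo.patch-2
         +\end{codesample2}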
  20.744 +
  20.745 +MQ also makes it easy to work with patches when you are using normal
  20.746 +Mercurial commands.  Every command that accepts a changeset ID will
  20.747 +also accept the name of an applied patch.  MQ augments the tags
  20.748 +normally in the repository with an eponymous one for each applied
  20.749 +patch.  In addition, the special tags \index{tags!special tag
  20.750 +  names!\texttt{qbase}}\texttt{qbase} and \index{tags!special tag
  20.751 +  names!\texttt{qtip}}\texttt{qtip} identify the ``bottom-most'' and
  20.752 +topmost applied patches, respectively.
  20.753 +
  20.754 +These additions to Mercurial's normal tagging capabilities make
  20.755 +dealing with patches even more of a breeze.
  20.756 +\begin{itemize}
  20.757 +\item Want to patchbomb a mailing list with your latest series of
  20.758 +  changes?
  20.759 +  \begin{codesample4}
  20.760 +    hg email qbase:qtip
  20.761 +  \end{codesample4}
  20.762 +  (Don't know what ``patchbombing'' is?  See
  20.763 +  section~\ref{sec:hgext:patchbomb}.)
  20.764 +\item Need to see all of the patches since \texttt{foo.patch} that
  20.765 +  have touched files in a subdirectory of your tree?
  20.766 +  \begin{codesample4}
  20.767 +    hg log -r foo.patch:qtip \emph{subdir}
  20.768 +  \end{codesample4}
  20.769 +\end{itemize}
  20.770 +
  20.771 +Because MQ makes the names of patches available to the rest of
  20.772 +Mercurial through its normal internal tag machinery, you don't need to
  20.773 +type in the entire name of a patch when you want to identify it by
  20.774 +name.
  20.775 +
  20.776 +\begin{figure}[ht]
  20.777 +  \interaction{mq.id.output}
  20.778 +  \caption{Using MQ's tag features to work with patches}
  20.779 +  \label{ex:mq:id}
  20.780 +\end{figure}
  20.781 +
  20.782 +Another nice consequence of representing patch names as tags is that
  20.783 +when you run the \hgcmd{log} command, it will display a patch's name
  20.784 +as a tag, simply as part of its normal output.  This makes it easy to
  20.785 +visually distinguish applied patches from underlying ``normal''
  20.786 +revisions.  Figure~\ref{ex:mq:id} shows a few normal Mercurial
  20.787 +commands in use with applied patches.
  20.788 +
  20.789 +\section{Useful things to know about}
  20.790 +
  20.791 +There are a number of aspects of MQ usage that don't fit tidily into
  20.792 +sections of their own, but that are good to know.  Here they are, in
  20.793 +one place.
  20.794 +
  20.795 +\begin{itemize}
  20.796 +\item Normally, when you \hgxcmd{mq}{qpop} a patch and \hgxcmd{mq}{qpush} it
  20.797 +  again, the changeset that represents the patch after the pop/push
  20.798 +  will have a \emph{different identity} than the changeset that
  20.799 +  represented the patch beforehand.  See
  20.800 +  section~\ref{sec:mqref:cmd:qpush} for information as to why this is.
  20.801 +\item It's not a good idea to \hgcmd{merge} changes from another
  20.802 +  branch with a patch changeset, at least if you want to maintain the
  20.803 +  ``patchiness'' of that changeset and changesets below it on the
  20.804 +  patch stack.  If you try to do this, it will appear to succeed, but
  20.805 +  MQ will become confused.
  20.806 +\end{itemize}
  20.807 +
  20.808 +\section{Managing patches in a repository}
  20.809 +\label{sec:mq:repo}
  20.810 +
  20.811 +Because MQ's \sdirname{.hg/patches} directory resides outside a
  20.812 +Mercurial repository's working directory, the ``underlying'' Mercurial
  20.813 +repository knows nothing about the management or presence of patches.
  20.814 +
  20.815 +This presents the interesting possibility of managing the contents of
  20.816 +the patch directory as a Mercurial repository in its own right.  This
  20.817 +can be a useful way to work.  For example, you can work on a patch for
  20.818 +a while, \hgxcmd{mq}{qrefresh} it, then \hgcmd{commit} the current state of
  20.819 +the patch.  This lets you ``roll back'' to that version of the patch
  20.820 +later on.
  20.821 +
  20.822 +You can then share different versions of the same patch stack among
  20.823 +multiple underlying repositories.  I use this when I am developing a
  20.824 +Linux kernel feature.  I have a pristine copy of my kernel sources for
  20.825 +each of several CPU architectures, and a cloned repository under each
  20.826 +that contains the patches I am working on.  When I want to test a
  20.827 +change on a different architecture, I push my current patches to the
  20.828 +patch repository associated with that kernel tree, pop and push all of
  20.829 +my patches, and build and test that kernel.
  20.830 +
  20.831 +Managing patches in a repository makes it possible for multiple
  20.832 +developers to work on the same patch series without colliding with
  20.833 +each other, all on top of an underlying source base that they may or
  20.834 +may not control.
  20.835 +
  20.836 +\subsection{MQ support for patch repositories}
  20.837 +
  20.838 +MQ helps you to work with the \sdirname{.hg/patches} directory as a
  20.839 +repository; when you prepare a repository for working with patches
  20.840 +using \hgxcmd{mq}{qinit}, you can pass the \hgxopt{mq}{qinit}{-c} option to
  20.841 +create the \sdirname{.hg/patches} directory as a Mercurial repository.
  20.842 +
  20.843 +\begin{note}
  20.844 +  If you forget to use the \hgxopt{mq}{qinit}{-c} option, you can simply go
  20.845 +  into the \sdirname{.hg/patches} directory at any time and run
  20.846 +  \hgcmd{init}.  Don't forget to add an entry for the
  20.847 +  \sfilename{status} file to the \sfilename{.hgignore} file, though
  20.848 +%
  20.849 +  (\hgcmdargs{qinit}{\hgxopt{mq}{qinit}{-c}} does this for you
  20.850 +  automatically); you \emph{really} don't want to manage the
  20.851 +  \sfilename{status} file.
  20.852 +\end{note}
  20.853 +
  20.854 +As a convenience, if MQ notices that the \dirname{.hg/patches}
  20.855 +directory is a repository, it will automatically \hgcmd{add} every
  20.856 +patch that you create and import.
  20.857 +
  20.858 +MQ provides a shortcut command, \hgxcmd{mq}{qcommit}, that runs
  20.859 +\hgcmd{commit} in the \sdirname{.hg/patches} directory.  This saves
  20.860 +some bothersome typing.
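         +
         +A typical session with a versioned patch directory might run along
         +these lines.  The patch name and commit message are made up for
         +illustration, and \hgxcmd{mq}{qnew} is used here simply to create a
         +patch to work on.
         +
         +\begin{codesample2}
         +  hg qinit -c
         +  \# hypothetical patch name
         +  hg qnew rework-device-alloc.patch
         +  \# ... edit some files ...
         +  hg qrefresh
         +  hg qcommit -m 'First cut of device allocation rework'
         +\end{codesample2}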
  20.861 +
  20.862 +Finally, as a convenience to manage the patch directory, you can
  20.863 +define the alias \command{mq} on Unix systems. For example, on Linux
  20.864 +systems using the \command{bash} shell, you can include the following
  20.865 +snippet in your \tildefile{.bashrc}.
  20.866 +
  20.867 +\begin{codesample2}
  20.868 +  alias mq='hg -R \$(hg root)/.hg/patches'
  20.869 +\end{codesample2}
  20.870 +
  20.871 +You can then issue commands of the form \cmdargs{mq}{pull} from
  20.872 +the main repository.
  20.873 +
  20.874 +\subsection{A few things to watch out for}
  20.875 +
  20.876 +MQ's support for working with a repository full of patches is limited
  20.877 +in a few small respects.
  20.878 +
  20.879 +MQ cannot automatically detect changes that you make to the patch
  20.880 +directory.  If you \hgcmd{pull}, manually edit, or \hgcmd{update}
  20.881 +changes to patches or the \sfilename{series} file, you will have to
  20.882 +\hgcmdargs{qpop}{\hgxopt{mq}{qpop}{-a}} and then
  20.883 +\hgcmdargs{qpush}{\hgxopt{mq}{qpush}{-a}} in the underlying repository to
  20.884 +see those changes show up there.  If you forget to do this, you can
  20.885 +confuse MQ's idea of which patches are applied.
  20.886 +
  20.887 +\section{Third party tools for working with patches}
  20.888 +\label{sec:mq:tools}
  20.889 +
  20.890 +Once you've been working with patches for a while, you'll find
  20.891 +yourself hungry for tools that will help you to understand and
  20.892 +manipulate the patches you're dealing with.
  20.893 +
  20.894 +The \command{diffstat} command~\cite{web:diffstat} generates a
  20.895 +histogram of the modifications made to each file in a patch.  It
  20.896 +provides a good way to ``get a sense of'' a patch---which files it
  20.897 +affects, and how much change it introduces to each file and as a
  20.898 +whole.  (I find that it's a good idea to use \command{diffstat}'s
  20.899 +\cmdopt{diffstat}{-p} option as a matter of course, as otherwise it
  20.900 +will try to do clever things with prefixes of file names that
  20.901 +inevitably confuse at least me.)
  20.902 +
  20.903 +\begin{figure}[ht]
  20.904 +  \interaction{mq.tools.tools}
  20.905 +  \caption{The \command{diffstat}, \command{filterdiff}, and \command{lsdiff} commands}
  20.906 +  \label{ex:mq:tools}
  20.907 +\end{figure}
  20.908 +
  20.909 +The \package{patchutils} package~\cite{web:patchutils} is invaluable.
  20.910 +It provides a set of small utilities that follow the ``Unix
  20.911 +philosophy;'' each does one useful thing with a patch.  The
  20.912 +\package{patchutils} command I use most is \command{filterdiff}, which
  20.913 +extracts subsets from a patch file.  For example, given a patch that
  20.914 +modifies hundreds of files across dozens of directories, a single
  20.915 +invocation of \command{filterdiff} can generate a smaller patch that
  20.916 +only touches files whose names match a particular glob pattern.  See
  20.917 +section~\ref{mq-collab:tips:interdiff} for another example.
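         +
         +As a small illustration, here is how these tools might be invoked on
         +a patch.  The file name \filename{big.patch} and the glob pattern are
         +placeholders.
         +
         +\begin{codesample2}
         +  \# summarise the changes, stripping one leading path component
         +  diffstat -p1 big.patch
         +  \# list the files that the patch touches
         +  lsdiff big.patch
         +  \# keep only the changes to files matching a glob pattern
         +  filterdiff -i '*/drivers/*' big.patch > drivers.patch
         +\end{codesample2}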
  20.918 +
  20.919 +\section{Good ways to work with patches}
  20.920 +
  20.921 +Whether you are working on a patch series to submit to a free software
  20.922 +or open source project, or a series that you intend to treat as a
  20.923 +sequence of regular changesets when you're done, you can use some
  20.924 +simple techniques to keep your work well organised.
  20.925 +
  20.926 +Give your patches descriptive names.  A good name for a patch might be
  20.927 +\filename{rework-device-alloc.patch}, because it will immediately give
  20.928 +you a hint as to the purpose of the patch.  Long names shouldn't be
  20.929 +a problem; you won't be typing the names often, but you \emph{will} be
  20.930 +running commands like \hgxcmd{mq}{qapplied} and \hgxcmd{mq}{qtop} over and over.
  20.931 +Good naming becomes especially important when you have a number of
  20.932 +patches to work with, or if you are juggling a number of different
  20.933 +tasks and your patches only get a fraction of your attention.
  20.934 +
  20.935 +Be aware of what patch you're working on.  Use the \hgxcmd{mq}{qtop}
  20.936 +command and skim over the text of your patches frequently---for
  20.937 +example, using \hgcmdargs{tip}{\hgopt{tip}{-p}}---to be sure of where
  20.938 +you stand.  I have several times worked on and \hgxcmd{mq}{qrefresh}ed a
  20.939 +patch other than the one I intended, and it's often tricky to migrate
  20.940 +changes into the right patch after making them in the wrong one.
  20.941 +
  20.942 +For this reason, it is very much worth investing a little time to
  20.943 +learn how to use some of the third-party tools I described in
  20.944 +section~\ref{sec:mq:tools}, particularly \command{diffstat} and
  20.945 +\command{filterdiff}.  The former will give you a quick idea of what
  20.946 +changes your patch is making, while the latter makes it easy to splice
  20.947 +hunks selectively out of one patch and into another.
  20.948 +
  20.949 +\section{MQ cookbook}
  20.950 +
  20.951 +\subsection{Manage ``trivial'' patches}
  20.952 +
  20.953 +Because the overhead of dropping files into a new Mercurial repository
  20.954 +is so low, it makes a lot of sense to manage patches this way even if
  20.955 +you simply want to make a few changes to a source tarball that you
  20.956 +downloaded.
  20.957 +
  20.958 +Begin by downloading and unpacking the source tarball,
  20.959 +and turning it into a Mercurial repository.
  20.960 +\interaction{mq.tarball.download}
  20.961 +
  20.962 +Continue by creating a patch stack and making your changes.
  20.963 +\interaction{mq.tarball.qinit}
  20.964 +
  20.965 +Let's say a few weeks or months pass, and your package author releases
  20.966 +a new version.  First, bring their changes into the repository.
  20.967 +\interaction{mq.tarball.newsource}
  20.968 +The pipeline starting with \hgcmd{locate} above deletes all files in
  20.969 +the working directory, so that \hgcmd{commit}'s
  20.970 +\hgopt{commit}{--addremove} option can actually tell which files have
  20.971 +really been removed in the newer version of the source.
  20.972 +
  20.973 +Finally, you can apply your patches on top of the new tree.
  20.974 +\interaction{mq.tarball.repush}
  20.975 +
  20.976 +\subsection{Combining entire patches}
  20.977 +\label{sec:mq:combine}
  20.978 +
  20.979 +MQ provides a command, \hgxcmd{mq}{qfold}, that lets you combine entire
  20.980 +patches.  This ``folds'' the patches you name, in the order you name
  20.981 +them, into the topmost applied patch, and concatenates their
  20.982 +descriptions onto the end of its description.  The patches that you
  20.983 +fold must be unapplied before you fold them.
  20.984 +
  20.985 +The order in which you fold patches matters.  If your topmost applied
  20.986 +patch is \texttt{foo}, and you \hgxcmd{mq}{qfold} \texttt{bar} and
  20.987 +\texttt{quux} into it, you will end up with a patch that has the same
  20.988 +effect as if you applied first \texttt{foo}, then \texttt{bar},
  20.989 +followed by \texttt{quux}.
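         +
         +In command form, with \texttt{foo} as the topmost applied patch and
         +\texttt{bar} and \texttt{quux} not yet applied, that fold is simply:
         +
         +\begin{codesample2}
         +  hg qfold bar quux
         +\end{codesample2}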
  20.990 +
  20.991 +\subsection{Merging part of one patch into another}
  20.992 +
  20.993 +Merging \emph{part} of one patch into another is more difficult than
  20.994 +combining entire patches.
  20.995 +
  20.996 +If you want to move changes to entire files, you can use
  20.997 +\command{filterdiff}'s \cmdopt{filterdiff}{-i} and
  20.998 +\cmdopt{filterdiff}{-x} options to choose the modifications to snip
  20.999 +out of one patch, concatenating its output onto the end of the patch
 20.1000 +you want to merge into.  You usually won't need to modify the patch
 20.1001 +you've merged the changes from.  Instead, MQ will report some rejected
 20.1002 +hunks when you \hgxcmd{mq}{qpush} it (from the hunks you moved into the
 20.1003 +other patch), and you can simply \hgxcmd{mq}{qrefresh} the patch to drop
 20.1004 +the duplicate hunks.
 20.1005 +
 20.1006 +If you have a patch that has multiple hunks modifying a file, and you
 20.1007 +only want to move a few of those hunks, the job becomes messier,
 20.1008 +but you can still partly automate it.  Use \cmdargs{lsdiff}{-nvv} to
 20.1009 +print some metadata about the patch.
 20.1010 +\interaction{mq.tools.lsdiff}
 20.1011 +
 20.1012 +This command prints three different kinds of number:
 20.1013 +\begin{itemize}
 20.1014 +\item (in the first column) a \emph{file number} to identify each file
 20.1015 +  modified in the patch;
 20.1016 +\item (on the next line, indented) the line number within a modified
 20.1017 +  file where a hunk starts; and
 20.1018 +\item (on the same line) a \emph{hunk number} to identify that hunk.
 20.1019 +\end{itemize}
 20.1020 +
 20.1021 +You'll have to use some visual inspection, and reading of the patch,
 20.1022 +to identify the file and hunk numbers you'll want, but you can then
 20.1023 +pass them to \command{filterdiff}'s \cmdopt{filterdiff}{--files}
 20.1024 +and \cmdopt{filterdiff}{--hunks} options, to select exactly the file
 20.1025 +and hunk you want to extract.
 20.1026 +
 20.1027 +Once you have this hunk, you can concatenate it onto the end of your
 20.1028 +destination patch and continue with the remainder of
 20.1029 +section~\ref{sec:mq:combine}.
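         +
         +As a sketch of both cases, the commands below append selected changes
         +from one patch file to another.  The patch names, the glob pattern,
         +and the file and hunk numbers are all placeholders, and the sketch
         +assumes that you run it from the \sdirname{.hg/patches} directory
         +while neither patch is applied.
         +
         +\begin{codesample2}
         +  \# whole files, chosen by glob pattern
         +  filterdiff -i '*/drivers/*' source.patch >> dest.patch
         +  \# a single hunk: first look up the file and hunk numbers
         +  lsdiff -nvv source.patch
         +  \# then extract, say, hunk 1 of file 2
         +  filterdiff --files=2 --hunks=1 source.patch >> dest.patch
         +\end{codesample2}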
 20.1030 +
 20.1031 +\section{Differences between quilt and MQ}
 20.1032 +
 20.1033 +If you are already familiar with quilt, MQ provides a similar command
 20.1034 +set.  There are a few differences in the way that it works.
 20.1035 +
 20.1036 +You will already have noticed that most quilt commands have MQ
 20.1037 +counterparts that simply begin with a ``\texttt{q}''.  The exceptions
 20.1038 +are quilt's \texttt{add} and \texttt{remove} commands, the
 20.1039 +counterparts for which are the normal Mercurial \hgcmd{add} and
 20.1040 +\hgcmd{remove} commands.  There is no MQ equivalent of the quilt
 20.1041 +\texttt{edit} command.
 20.1042 +
 20.1043 +%%% Local Variables: 
 20.1044 +%%% mode: latex
 20.1045 +%%% TeX-master: "00book"
 20.1046 +%%% End: 

    21.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    21.2 +++ b/en/ch13-mq-collab.tex	Thu Jan 29 22:56:27 2009 -0800
    21.3 @@ -0,0 +1,393 @@
    21.4 +\chapter{Advanced uses of Mercurial Queues}
    21.5 +\label{chap:mq-collab}
    21.6 +
    21.7 +While it's easy to pick up straightforward uses of Mercurial Queues,
    21.8 +use of a little discipline and some of MQ's less frequently used
    21.9 +capabilities makes it possible to work in complicated development
   21.10 +environments.
   21.11 +
   21.12 +In this chapter, I will use as an example a technique I have used to
   21.13 +manage the development of an Infiniband device driver for the Linux
   21.14 +kernel.  The driver in question is large (at least as drivers go),
   21.15 +with 25,000 lines of code spread across 35 source files.  It is
   21.16 +maintained by a small team of developers.
   21.17 +
   21.18 +While much of the material in this chapter is specific to Linux, the
   21.19 +same principles apply to any code base for which you're not the
   21.20 +primary owner, and upon which you need to do a lot of development.
   21.21 +
   21.22 +\section{The problem of many targets}
   21.23 +
   21.24 +The Linux kernel changes rapidly, and has never been internally
   21.25 +stable; developers frequently make drastic changes between releases.
   21.26 +This means that a version of the driver that works well with a
   21.27 +particular released version of the kernel will not even \emph{compile}
   21.28 +correctly against, typically, any other version.
   21.29 +
   21.30 +To maintain a driver, we have to keep a number of distinct versions of
   21.31 +Linux in mind.
   21.32 +\begin{itemize}
   21.33 +\item One target is the main Linux kernel development tree.
   21.34 +  Maintenance of the code is in this case partly shared by other
   21.35 +  developers in the kernel community, who make ``drive-by''
   21.36 +  modifications to the driver as they develop and refine kernel
   21.37 +  subsystems.
   21.38 +\item We also maintain a number of ``backports'' to older versions of
   21.39 +  the Linux kernel, to support the needs of customers who are running
   21.40 +  older Linux distributions that do not incorporate our drivers.  (To
   21.41 +  \emph{backport} a piece of code is to modify it to work in an older
   21.42 +  version of its target environment than the version it was developed
   21.43 +  for.)
   21.44 +\item Finally, we make software releases on a schedule that is
   21.45 +  necessarily not aligned with those used by Linux distributors and
   21.46 +  kernel developers, so that we can deliver new features to customers
   21.47 +  without forcing them to upgrade their entire kernels or
   21.48 +  distributions.
   21.49 +\end{itemize}
   21.50 +
   21.51 +\subsection{Tempting approaches that don't work well}
   21.52 +
   21.53 +There are two ``standard'' ways to maintain a piece of software that
   21.54 +has to target many different environments.
   21.55 +
   21.56 +The first is to maintain a number of branches, each intended for a
   21.57 +single target.  The trouble with this approach is that you must
   21.58 +maintain iron discipline in the flow of changes between repositories.
   21.59 +A new feature or bug fix must start life in a ``pristine'' repository,
   21.60 +then percolate out to every backport repository.  Backport changes are
   21.61 +more limited in the branches they should propagate to; a backport
   21.62 +change that is applied to a branch where it doesn't belong will
   21.63 +probably stop the driver from compiling.
   21.64 +
   21.65 +The second is to maintain a single source tree filled with conditional
   21.66 +statements that turn chunks of code on or off depending on the
   21.67 +intended target.  Because these ``ifdefs'' are not allowed in the
   21.68 +Linux kernel tree, a manual or automatic process must be followed to
   21.69 +strip them out and yield a clean tree.  A code base maintained in this
   21.70 +fashion rapidly becomes a rat's nest of conditional blocks that are
   21.71 +difficult to understand and maintain.
   21.72 +
   21.73 +Neither of these approaches is well suited to a situation where you
   21.74 +don't ``own'' the canonical copy of a source tree.  In the case of a
   21.75 +Linux driver that is distributed with the standard kernel, Linus's
   21.76 +tree contains the copy of the code that will be treated by the world
   21.77 +as canonical.  The upstream version of ``my'' driver can be modified
   21.78 +by people I don't know, without me even finding out about it until
   21.79 +after the changes show up in Linus's tree.  
   21.80 +
   21.81 +These approaches have the added weakness of making it difficult to
   21.82 +generate well-formed patches to submit upstream.
   21.83 +
   21.84 +In principle, Mercurial Queues seems like a good candidate to manage a
   21.85 +development scenario such as the above.  While this is indeed the
   21.86 +case, MQ contains a few added features that make the job more
   21.87 +pleasant.
   21.88 +
   21.89 +\section{Conditionally applying patches with 
   21.90 +  guards}
   21.91 +
   21.92 +Perhaps the best way to maintain sanity with so many targets is to be
   21.93 +able to choose specific patches to apply for a given situation.  MQ
   21.94 +provides a feature called ``guards'' (which originates with quilt's
   21.95 +\texttt{guards} command) that does just this.  To start off, let's
   21.96 +create a simple repository for experimenting in.
   21.97 +\interaction{mq.guards.init}
   21.98 +This gives us a tiny repository that contains two patches that don't
   21.99 +have any dependencies on each other, because they touch different files.
  21.100 +
  21.101 +The idea behind conditional application is that you can ``tag'' a
  21.102 +patch with a \emph{guard}, which is simply a text string of your
  21.103 +choosing, then tell MQ to select specific guards to use when applying
  21.104 +patches.  MQ will then either apply, or skip over, a guarded patch,
  21.105 +depending on the guards that you have selected.
  21.106 +
  21.107 +A patch can have an arbitrary number of guards;
  21.108 +each one is \emph{positive} (``apply this patch if this guard is
  21.109 +selected'') or \emph{negative} (``skip this patch if this guard is
  21.110 +selected'').  A patch with no guards is always applied.
  21.111 +
  21.112 +\section{Controlling the guards on a patch}
  21.113 +
  21.114 +The \hgxcmd{mq}{qguard} command lets you determine which guards should
  21.115 +apply to a patch, or display the guards that are already in effect.
  21.116 +Without any arguments, it displays the guards on the current topmost
  21.117 +patch.
  21.118 +\interaction{mq.guards.qguard}
  21.119 +To set a positive guard on a patch, prefix the name of the guard with
  21.120 +a ``\texttt{+}''.
  21.121 +\interaction{mq.guards.qguard.pos}
  21.122 +To set a negative guard on a patch, prefix the name of the guard with
  21.123 +a ``\texttt{-}''.
  21.124 +\interaction{mq.guards.qguard.neg}
  21.125 +
  21.126 +\begin{note}
  21.127 +  The \hgxcmd{mq}{qguard} command \emph{sets} the guards on a patch; it
  21.128 +  doesn't \emph{modify} them.  What this means is that if you run
  21.129 +  \hgcmdargs{qguard}{+a +b} on a patch, then \hgcmdargs{qguard}{+c} on
  21.130 +  the same patch, the \emph{only} guard that will be set on it
  21.131 +  afterwards is \texttt{+c}.
  21.132 +\end{note}
  21.133 +
  21.134 +Mercurial stores guards in the \sfilename{series} file; the form in
  21.135 +which they are stored is easy both to understand and to edit by hand.
  21.136 +(In other words, you don't have to use the \hgxcmd{mq}{qguard} command if
  21.137 +you don't want to; it's okay to simply edit the \sfilename{series}
  21.138 +file.)
  21.139 +\interaction{mq.guards.series}
  21.140 +
  21.141 +\section{Selecting the guards to use}
  21.142 +
  21.143 +The \hgxcmd{mq}{qselect} command determines which guards are active at a
  21.144 +given time.  The effect of this is to determine which patches MQ will
  21.145 +apply the next time you run \hgxcmd{mq}{qpush}.  It has no other effect; in
  21.146 +particular, it doesn't do anything to patches that are already
  21.147 +applied.
  21.148 +
  21.149 +With no arguments, the \hgxcmd{mq}{qselect} command lists the guards
  21.150 +currently in effect, one per line of output.  Each argument is treated
  21.151 +as the name of a guard to apply.
  21.152 +\interaction{mq.guards.qselect.foo}
  21.153 +In case you're interested, the currently selected guards are stored in
  21.154 +the \sfilename{guards} file.
  21.155 +\interaction{mq.guards.qselect.cat}
  21.156 +We can see the effect the selected guards have when we run
  21.157 +\hgxcmd{mq}{qpush}.
  21.158 +\interaction{mq.guards.qselect.qpush}
  21.159 +
  21.160 +A guard cannot start with a ``\texttt{+}'' or ``\texttt{-}''
  21.161 +character.  The name of a guard must not contain white space, but most
  21.162 +other characters are acceptable.  If you try to use a guard with an
  21.163 +invalid name, MQ will complain:
  21.164 +\interaction{mq.guards.qselect.error} 
  21.165 +Changing the selected guards changes the patches that are applied.
  21.166 +\interaction{mq.guards.qselect.quux} 
  21.167 +You can see in the example below that negative guards take precedence
  21.168 +over positive guards.
  21.169 +\interaction{mq.guards.qselect.foobar}
  21.170 +
  21.171 +\section{MQ's rules for applying patches}
  21.172 +
  21.173 +The rules that MQ uses when deciding whether to apply a patch
  21.174 +are as follows.
  21.175 +\begin{itemize}
  21.176 +\item A patch that has no guards is always applied.
  21.177 +\item If the patch has any negative guard that matches any currently
  21.178 +  selected guard, the patch is skipped.
  21.179 +\item If the patch has any positive guard that matches any currently
  21.180 +  selected guard, the patch is applied.
  21.181 +\item If the patch has positive guards, but none matches any
  21.182 +  currently selected guard, the patch is skipped; a patch with only negative guards, none of them selected, is applied.
  21.183 +\end{itemize}
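The rules can be sketched as a small Python function.  This is an
illustration only, not MQ's own code: the name \texttt{should\_apply}
and its arguments are hypothetical.  The sketch also assumes that a
patch whose guards are all negative, with none of them selected, is
applied; check that case against your version of MQ.

```python
def should_apply(patch_guards, selected):
    """Decide whether a guarded patch would be applied.

    patch_guards -- guards on the patch, each written with its sign,
                    for example ["+foo", "-bar"]
    selected     -- names of the currently selected guards, unsigned
    """
    selected = set(selected)
    guards = list(patch_guards)

    # A patch with no guards is always applied.
    if not guards:
        return True

    # A matching negative guard takes precedence over everything else.
    if any(g.startswith("-") and g[1:] in selected for g in guards):
        return False

    positive = [g[1:] for g in guards if g.startswith("+")]

    # Assumption: only negative guards, none of them selected -> apply.
    if not positive:
        return True

    # With positive guards present, at least one must be selected.
    return any(name in selected for name in positive)
```

Calling \texttt{should\_apply(["+foo", "-bar"], ["foo", "bar"])}
exercises the precedence rule: both guards match, and the negative one
decides the outcome.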
  21.184 +
  21.185 +\section{Trimming the work environment}
  21.186 +
  21.187 +In working on the device driver I mentioned earlier, I don't apply the
  21.188 +patches to a normal Linux kernel tree.  Instead, I use a repository
  21.189 +that contains only a snapshot of the source files and headers that are
  21.190 +relevant to Infiniband development.  This repository is~1\% the size
  21.191 +of a kernel repository, so it's easier to work with.
  21.192 +
  21.193 +I then choose a ``base'' version on top of which the patches are
  21.194 +applied.  This is a snapshot of the Linux kernel tree as of a revision
  21.195 +of my choosing.  When I take the snapshot, I record the changeset ID
  21.196 +from the kernel repository in the commit message.  Since the snapshot
  21.197 +preserves the ``shape'' and content of the relevant parts of the
  21.198 +kernel tree, I can apply my patches on top of either my tiny
  21.199 +repository or a normal kernel tree.
  21.200 +
  21.201 +Normally, the base tree atop which the patches apply should be a
  21.202 +snapshot of a very recent upstream tree.  This best facilitates the
  21.203 +development of patches that can easily be submitted upstream with few
  21.204 +or no modifications.
  21.205 +
  21.206 +\section{Dividing up the \sfilename{series} file}
  21.207 +
  21.208 +I categorise the patches in the \sfilename{series} file into a number
  21.209 +of logical groups.  Each section of like patches begins with a block
  21.210 +of comments that describes the purpose of the patches that follow.
  21.211 +
  21.212 +The sequence of patch groups that I maintain follows.  The ordering of
  21.213 +these groups is important; I'll describe why after I introduce the
  21.214 +groups.
  21.215 +\begin{itemize}
  21.216 +\item The ``accepted'' group.  Patches that the development team has
  21.217 +  submitted to the maintainer of the Infiniband subsystem, and which
  21.218 +  he has accepted, but which are not present in the snapshot that the
  21.219 +  tiny repository is based on.  These are ``read only'' patches,
  21.220 +  present only to transform the tree into a state similar to that of
  21.221 +  the upstream maintainer's repository.
  21.222 +\item The ``rework'' group.  Patches that I have submitted, but that
  21.223 +  the upstream maintainer has requested modifications to before he
  21.224 +  will accept them.
  21.225 +\item The ``pending'' group.  Patches that I have not yet submitted to
  21.226 +  the upstream maintainer, but which we have finished working on.
  21.227 +  These will be ``read only'' for a while.  If the upstream maintainer
  21.228 +  accepts them upon submission, I'll move them to the end of the
  21.229 +  ``accepted'' group.  If he requests that I modify any, I'll move
  21.230 +  them to the beginning of the ``rework'' group.
  21.231 +\item The ``in progress'' group.  Patches that are actively being
  21.232 +  developed, and should not be submitted anywhere yet.
  21.233 +\item The ``backport'' group.  Patches that adapt the source tree to
  21.234 +  older versions of the kernel tree.
  21.235 +\item The ``do not ship'' group.  Patches that for some reason should
  21.236 +  never be submitted upstream.  For example, one such patch might
  21.237 +  change embedded driver identification strings to make it easier to
  21.238 +  distinguish, in the field, between an out-of-tree version of the
  21.239 +  driver and a version shipped by a distribution vendor.
  21.240 +\end{itemize}
  21.241 +
  21.242 +Now to return to the reasons for ordering groups of patches in this
  21.243 +way.  We would like the lowest patches in the stack to be as stable as
  21.244 +possible, so that we will not need to rework higher patches due to
  21.245 +changes in context.  Putting patches that will never be changed first
  21.246 +in the \sfilename{series} file serves this purpose.
  21.247 +
  21.248 +We would also like the patches that we know we'll need to modify to be
  21.249 +applied on top of a source tree that resembles the upstream tree as
  21.250 +closely as possible.  This is why we keep accepted patches around for
  21.251 +a while.
  21.252 +
  21.253 +The ``backport'' and ``do not ship'' patches float at the end of the
  21.254 +\sfilename{series} file.  The backport patches must be applied on top
  21.255 +of all other patches, and the ``do not ship'' patches might as well
  21.256 +stay out of harm's way.
  21.257 +
  21.258 +\section{Maintaining the patch series}
  21.259 +
  21.260 +In my work, I use a number of guards to control which patches are to
  21.261 +be applied.
  21.262 +
  21.263 +\begin{itemize}
  21.264 +\item ``Accepted'' patches are guarded with \texttt{accepted}.  I
  21.265 +  enable this guard most of the time.  When I'm applying the patches
  21.266 +  on top of a tree where the patches are already present, I can turn
  21.267 +  this guard off, and the patches that follow it will apply cleanly.
  21.268 +\item Patches that are ``finished'', but not yet submitted, have no
  21.269 +  guards.  If I'm applying the patch stack to a copy of the upstream
  21.270 +  tree, I don't need to enable any guards in order to get a reasonably
  21.271 +  safe source tree.
  21.272 +\item Those patches that need reworking before being resubmitted are
  21.273 +  guarded with \texttt{rework}.
  21.274 +\item For those patches that are still under development, I use
  21.275 +  \texttt{devel}.
  21.276 +\item A backport patch may have several guards, one for each version
  21.277 +  of the kernel to which it applies.  For example, a patch that
  21.278 +  backports a piece of code to~2.6.9 will have a~\texttt{2.6.9} guard.
  21.279 +\end{itemize}
  21.280 +This variety of guards gives me considerable flexibility in
  21.281 +determining what kind of source tree I want to end up with.  For most
  21.282 +situations, the selection of appropriate guards is automated during
  21.283 +the build process, but I can manually tune the guards to use for less
  21.284 +common circumstances.
  21.285 +
  21.286 +\subsection{The art of writing backport patches}
  21.287 +
  21.288 +Using MQ, writing a backport patch is a simple process.  All such a
  21.289 +patch has to do is modify a piece of code that uses a kernel feature
  21.290 +not present in the older version of the kernel, so that the driver
  21.291 +continues to work correctly under that older version.
  21.292 +
  21.293 +A useful goal when writing a good backport patch is to make your code
  21.294 +look as if it was written for the older version of the kernel you're
  21.295 +targeting.  The less obtrusive the patch, the easier it will be to
  21.296 +understand and maintain.  If you're writing a collection of backport
  21.297 +patches to avoid the ``rat's nest'' effect of lots of
  21.298 +\texttt{\#ifdef}s (hunks of source code that are only used
  21.299 +conditionally) in your code, don't introduce version-dependent
  21.300 +\texttt{\#ifdef}s into the patches.  Instead, write several patches,
  21.301 +each of which makes unconditional changes, and control their
  21.302 +application using guards.
  21.303 +
  21.304 +There are two reasons to divide backport patches into a distinct
  21.305 +group, away from the ``regular'' patches whose effects they modify.
  21.306 +The first is that intermingling the two makes it more difficult to use
  21.307 +a tool like the \hgext{patchbomb} extension to automate the process of
  21.308 +submitting the patches to an upstream maintainer.  The second is that
  21.309 +a backport patch could perturb the context in which a subsequent
  21.310 +regular patch is applied, making it impossible to apply the regular
  21.311 +patch cleanly \emph{without} the earlier backport patch already being
  21.312 +applied.
  21.313 +
  21.314 +\section{Useful tips for developing with MQ}
  21.315 +
  21.316 +\subsection{Organising patches in directories}
  21.317 +
  21.318 +If you're working on a substantial project with MQ, it's not difficult
  21.319 +to accumulate a large number of patches.  For example, I have one
  21.320 +patch repository that contains over 250 patches.
  21.321 +
  21.322 +If you can group these patches into separate logical categories, you
  21.323 +can, if you like, store them in different directories; MQ has no
  21.324 +problems with patch names that contain path separators.
  21.325 +
  21.326 +\subsection{Viewing the history of a patch}
  21.327 +\label{mq-collab:tips:interdiff}
  21.328 +
  21.329 +If you're developing a set of patches over a long time, it's a good
  21.330 +idea to maintain them in a repository, as discussed in
  21.331 +section~\ref{sec:mq:repo}.  If you do so, you'll quickly discover that
  21.332 +using the \hgcmd{diff} command to look at the history of changes to a
  21.333 +patch is unworkable.  This is in part because you're looking at the
  21.334 +second derivative of the real code (a diff of a diff), but also
  21.335 +because MQ adds noise to the process by modifying time stamps and
  21.336 +directory names when it updates a patch.
  21.337 +
  21.338 +However, you can use the \hgext{extdiff} extension, which is bundled
  21.339 +with Mercurial, to turn a diff of two versions of a patch into
  21.340 +something readable.  To do this, you will need a third-party package
  21.341 +called \package{patchutils}~\cite{web:patchutils}.  This provides a
  21.342 +command named \command{interdiff}, which shows the differences between
  21.343 +two diffs as a diff.  Used on two versions of the same diff, it
  21.344 +generates a diff that represents the diff from the first to the second
  21.345 +version.
  21.346 +
  21.347 +You can enable the \hgext{extdiff} extension in the usual way, by
  21.348 +adding a line to the \rcsection{extensions} section of your \hgrc.
  21.349 +\begin{codesample2}
  21.350 +  [extensions]
  21.351 +  extdiff =
  21.352 +\end{codesample2}
  21.353 +The \command{interdiff} command expects to be passed the names of two
  21.354 +files, but the \hgext{extdiff} extension passes the program it runs a
  21.355 +pair of directories, each of which can contain an arbitrary number of
  21.356 +files.  We thus need a small program that will run \command{interdiff}
  21.357 +on each pair of files in these two directories.  This program is
  21.358 +available as \sfilename{hg-interdiff} in the \dirname{examples}
  21.359 +directory of the source code repository that accompanies this book.
  21.360 +\excode{hg-interdiff}
  21.361 +
  21.362 +With the \sfilename{hg-interdiff} program in your shell's search path,
  21.363 +you can run it as follows, from inside an MQ patch directory:
  21.364 +\begin{codesample2}
  21.365 +  hg extdiff -p hg-interdiff -r A:B my-change.patch
  21.366 +\end{codesample2}
  21.367 +Since you'll probably want to use this long-winded command a lot, you
  21.368 +can get \hgext{extdiff} to make it available as a normal Mercurial
  21.369 +command, again by editing your \hgrc.
  21.370 +\begin{codesample2}
  21.371 +  [extdiff]
  21.372 +  cmd.interdiff = hg-interdiff
  21.373 +\end{codesample2}
  21.374 +This directs \hgext{extdiff} to make an \texttt{interdiff} command
  21.375 +available, so you can now shorten the previous invocation of
  21.376 +\hgxcmd{extdiff}{extdiff} to something a little less unwieldy.
  21.377 +\begin{codesample2}
  21.378 +  hg interdiff -r A:B my-change.patch
  21.379 +\end{codesample2}
  21.380 +
  21.381 +\begin{note}
  21.382 +  The \command{interdiff} command works well only if the underlying
  21.383 +  files against which versions of a patch are generated remain the
  21.384 +  same.  If you create a patch, modify the underlying files, and then
  21.385 +  regenerate the patch, \command{interdiff} may not produce useful
  21.386 +  output.
  21.387 +\end{note}
  21.388 +
  21.389 +The \hgext{extdiff} extension is useful for more than merely improving
  21.390 +the presentation of MQ~patches.  To read more about it, go to
  21.391 +section~\ref{sec:hgext:extdiff}.
  21.392 +
  21.393 +%%% Local Variables: 
  21.394 +%%% mode: latex
  21.395 +%%% TeX-master: "00book"
  21.396 +%%% End: 

    22.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    22.2 +++ b/en/ch14-hgext.tex	Thu Jan 29 22:56:27 2009 -0800
    22.3 @@ -0,0 +1,429 @@
    22.4 +\chapter{Adding functionality with extensions}
    22.5 +\label{chap:hgext}
    22.6 +
    22.7 +While the core of Mercurial is quite complete from a functionality
    22.8 +standpoint, it's deliberately shorn of fancy features.  This approach
    22.9 +of preserving simplicity keeps the software easy to deal with for both
   22.10 +maintainers and users.
   22.11 +
   22.12 +However, Mercurial doesn't box you in with an inflexible command set:
   22.13 +you can add features to it as \emph{extensions} (sometimes known as
   22.14 +\emph{plugins}).  We've already discussed a few of these extensions in
   22.15 +earlier chapters.
   22.16 +\begin{itemize}
   22.17 +\item Section~\ref{sec:tour-merge:fetch} covers the \hgext{fetch}
   22.18 +  extension; this combines pulling new changes and merging them with
   22.19 +  local changes into a single command, \hgxcmd{fetch}{fetch}.
   22.20 +\item In chapter~\ref{chap:hook}, we covered several extensions that
   22.21 +  are useful for hook-related functionality: \hgext{acl} adds access
   22.22 +  control lists; \hgext{bugzilla} adds integration with the Bugzilla
   22.23 +  bug tracking system; and \hgext{notify} sends notification emails on
   22.24 +  new changes.
   22.25 +\item The Mercurial Queues patch management extension is so invaluable
   22.26 +  that it merits two chapters and an appendix all to itself.
   22.27 +  Chapter~\ref{chap:mq} covers the basics;
   22.28 +  chapter~\ref{chap:mq-collab} discusses advanced topics; and
   22.29 +  appendix~\ref{chap:mqref} goes into detail on each command.
   22.30 +\end{itemize}
   22.31 +
   22.32 +In this chapter, we'll cover some of the other extensions that are
   22.33 +available for Mercurial, and briefly touch on some of the machinery
   22.34 +you'll need to know about if you want to write an extension of your
   22.35 +own.
   22.36 +\begin{itemize}
   22.37 +\item In section~\ref{sec:hgext:inotify}, we'll discuss the
   22.38 +  possibility of \emph{huge} performance improvements using the
   22.39 +  \hgext{inotify} extension.
   22.40 +\end{itemize}
   22.41 +
   22.42 +\section{Improve performance with the \hgext{inotify} extension}
   22.43 +\label{sec:hgext:inotify}
   22.44 +
   22.45 +Are you interested in having some of the most common Mercurial
   22.46 +operations run as much as a hundred times faster?  Read on!
   22.47 +
   22.48 +Mercurial has great performance under normal circumstances, but some
   22.49 +operations slow down as a repository grows: to run \hgcmd{status}, it has to
   22.50 +scan almost every directory and file in your repository so that it can
   22.51 +display file status.  Many other Mercurial commands need to do the
   22.52 +same work behind the scenes; for example, the \hgcmd{diff} command
   22.53 +uses the status machinery to avoid doing an expensive comparison
   22.54 +operation on files that obviously haven't changed.
   22.55 +
   22.56 +Because obtaining file status is crucial to good performance, the
   22.57 +authors of Mercurial have optimised this code to within an inch of its
   22.58 +life.  However, there's no avoiding the fact that when you run
   22.59 +\hgcmd{status}, Mercurial is going to have to perform at least one
   22.60 +expensive system call for each managed file to determine whether it's
   22.61 +changed since the last time Mercurial checked.  For a sufficiently
   22.62 +large repository, this can take a long time.
   22.63 +
   22.64 +To put a number on the magnitude of this effect, I created a
   22.65 +repository containing 150,000 managed files.  I timed \hgcmd{status}
   22.66 +as taking ten seconds to run, even when \emph{none} of those files had
   22.67 +been modified.
   22.68 +
   22.69 +Many modern operating systems contain a file notification facility.
   22.70 +If a program signs up to an appropriate service, the operating system
   22.71 +will notify it every time a file of interest is created, modified, or
   22.72 +deleted.  On Linux systems, the kernel component that does this is
   22.73 +called \texttt{inotify}.
   22.74 +
   22.75 +Mercurial's \hgext{inotify} extension talks to the kernel's
   22.76 +\texttt{inotify} component to optimise \hgcmd{status} commands.  The
   22.77 +extension has two components.  A daemon sits in the background and
   22.78 +receives notifications from the \texttt{inotify} subsystem.  It also
   22.79 +listens for connections from a regular Mercurial command.  The
   22.80 +extension modifies Mercurial's behaviour so that instead of scanning
   22.81 +the filesystem, it queries the daemon.  Since the daemon has perfect
   22.82 +information about the state of the repository, it can respond with a
   22.83 +result instantaneously, avoiding the need to scan every directory and
   22.84 +file in the repository.
   22.85 +
   22.86 +Recall the ten seconds that I measured plain Mercurial as taking to
   22.87 +run \hgcmd{status} on a 150,000 file repository.  With the
   22.88 +\hgext{inotify} extension enabled, the time dropped to 0.1~seconds, a
   22.89 +factor of \emph{one hundred} faster.
   22.90 +
   22.91 +Before we continue, please pay attention to some caveats.
   22.92 +\begin{itemize}
   22.93 +\item The \hgext{inotify} extension is Linux-specific.  Because it
   22.94 +  interfaces directly to the Linux kernel's \texttt{inotify}
   22.95 +  subsystem, it does not work on other operating systems.
   22.96 +\item It should work on any Linux distribution that was released after
   22.97 +  early~2005.  Older distributions are likely to have a kernel that
   22.98 +  lacks \texttt{inotify}, or a version of \texttt{glibc} that does not
   22.99 +  have the necessary interfacing support.
  22.100 +\item Not all filesystems are suitable for use with the
  22.101 +  \hgext{inotify} extension.  Network filesystems such as NFS are a
  22.102 +  non-starter, for example, particularly if you're running Mercurial
  22.103 +  on several systems, all mounting the same network filesystem.  The
  22.104 +  kernel's \texttt{inotify} system has no way of knowing about changes
  22.105 +  made on another system.  Most local filesystems (e.g.~ext3, XFS,
  22.106 +  ReiserFS) should work fine.
  22.107 +\end{itemize}
  22.108 +
  22.109 +The \hgext{inotify} extension is not yet shipped with Mercurial as of
  22.110 +May~2007, so it's a little more involved to set up than other
  22.111 +extensions.  But the performance improvement is worth it!
  22.112 +
  22.113 +The extension currently comes in two parts: a set of patches to the
  22.114 +Mercurial source code, and a library of Python bindings to the
  22.115 +\texttt{inotify} subsystem.
  22.116 +\begin{note}
  22.117 +  There are \emph{two} Python \texttt{inotify} binding libraries.  One
  22.118 +  of them is called \texttt{pyinotify}, and is packaged by some Linux
  22.119 +  distributions as \texttt{python-inotify}.  This is \emph{not} the
  22.120 +  one you'll need, as it is too buggy and inefficient to be practical.
  22.121 +\end{note}
  22.122 +To get going, it's best to already have a functioning copy of
  22.123 +Mercurial installed.
  22.124 +\begin{note}
  22.125 +  If you follow the instructions below, you'll be \emph{replacing} and
  22.126 +  overwriting any existing installation of Mercurial that you might
  22.127 +  already have, using the latest ``bleeding edge'' Mercurial code.
  22.128 +  Don't say you weren't warned!
  22.129 +\end{note}
  22.130 +\begin{enumerate}
  22.131 +\item Clone the Python \texttt{inotify} binding repository.  Build and
  22.132 +  install it.
  22.133 +  \begin{codesample4}
  22.134 +    hg clone http://hg.kublai.com/python/inotify
  22.135 +    cd inotify
  22.136 +    python setup.py build --force
  22.137 +    sudo python setup.py install --skip-build
  22.138 +  \end{codesample4}
  22.139 +\item Clone the \dirname{crew} Mercurial repository.  Clone the
  22.140 +  \hgext{inotify} patch repository so that Mercurial Queues will be
  22.141 +  able to apply patches to your copy of the \dirname{crew} repository.
  22.142 +  \begin{codesample4}
  22.143 +    hg clone http://hg.intevation.org/mercurial/crew
  22.144 +    hg clone crew inotify
  22.145 +    hg clone http://hg.kublai.com/mercurial/patches/inotify inotify/.hg/patches
  22.146 +  \end{codesample4}
  22.147 +\item Make sure that you have the Mercurial Queues extension,
  22.148 +  \hgext{mq}, enabled.  If you've never used MQ, read
  22.149 +  section~\ref{sec:mq:start} to get started quickly.
  22.150 +\item Go into the \dirname{inotify} repo, and apply all of the
  22.151 +  \hgext{inotify} patches using the \hgxopt{mq}{qpush}{-a} option to
  22.152 +  the \hgxcmd{mq}{qpush} command.
  22.153 +  \begin{codesample4}
  22.154 +    cd inotify
  22.155 +    hg qpush -a
  22.156 +  \end{codesample4}
  22.157 +  If you get an error message from \hgxcmd{mq}{qpush}, you should not
  22.158 +  continue.  Instead, ask for help.
  22.159 +\item Build and install the patched version of Mercurial.
  22.160 +  \begin{codesample4}
  22.161 +    python setup.py build --force
  22.162 +    sudo python setup.py install --skip-build
  22.163 +  \end{codesample4}
  22.164 +\end{enumerate}
  22.165 +Once you've built a suitably patched version of Mercurial, all you
  22.166 +need to do to enable the \hgext{inotify} extension is add an entry to
  22.167 +your \hgrc.
  22.168 +\begin{codesample2}
  22.169 +  [extensions]
  22.170 +  inotify =
  22.171 +\end{codesample2}
  22.172 +When the \hgext{inotify} extension is enabled, Mercurial will
  22.173 +automatically and transparently start the status daemon the first time
  22.174 +you run a command that needs status in a repository.  It runs one
  22.175 +status daemon per repository.
  22.176 +
  22.177 +The status daemon is started silently, and runs in the background.  If
  22.178 +you look at a list of running processes after you've enabled the
  22.179 +\hgext{inotify} extension and run a few commands in different
  22.180 +repositories, you'll thus see a few \texttt{hg} processes sitting
  22.181 +around, waiting for updates from the kernel and queries from
  22.182 +Mercurial.
  22.183 +
  22.184 +The first time you run a Mercurial command in a repository when you
  22.185 +have the \hgext{inotify} extension enabled, it will run with about the
  22.186 +same performance as a normal Mercurial command.  This is because the
  22.187 +status daemon needs to perform a normal status scan so that it has a
  22.188 +baseline against which to apply later updates from the kernel.
  22.189 +However, \emph{every} subsequent command that does any kind of status
  22.190 +check should be noticeably faster on repositories of even fairly
  22.191 +modest size.  Better yet, the bigger your repository is, the greater a
  22.192 +performance advantage you'll see.  The \hgext{inotify} daemon makes
  22.193 +status operations almost instantaneous on repositories of all sizes!
  22.194 +
  22.195 +If you like, you can manually start a status daemon using the
  22.196 +\hgxcmd{inotify}{inserve} command.  This gives you slightly finer
  22.197 +control over how the daemon ought to run.  This command will of course
  22.198 +only be available when the \hgext{inotify} extension is enabled.
  22.199 +
  22.200 +When you're using the \hgext{inotify} extension, you should notice
  22.201 +\emph{no difference at all} in Mercurial's behaviour, with the sole
  22.202 +exception of status-related commands running a whole lot faster than
  22.203 +they used to.  You should specifically expect that commands will not
  22.204 +print different output; neither should they give different results.
  22.205 +If either of these situations occurs, please report a bug.
  22.206 +
  22.207 +\section{Flexible diff support with the \hgext{extdiff} extension}
  22.208 +\label{sec:hgext:extdiff}
  22.209 +
  22.210 +Mercurial's built-in \hgcmd{diff} command outputs plaintext unified
  22.211 +diffs.
  22.212 +\interaction{extdiff.diff}
  22.213 +If you would like to use an external tool to display modifications,
  22.214 +you'll want to use the \hgext{extdiff} extension.  This will let you
  22.215 +use, for example, a graphical diff tool.
  22.216 +
  22.217 +The \hgext{extdiff} extension is bundled with Mercurial, so it's easy
  22.218 +to set up.  In the \rcsection{extensions} section of your \hgrc,
  22.219 +simply add a one-line entry to enable the extension.
  22.220 +\begin{codesample2}
  22.221 +  [extensions]
  22.222 +  extdiff =
  22.223 +\end{codesample2}
  22.224 +This introduces a command named \hgxcmd{extdiff}{extdiff}, which by
  22.225 +default uses your system's \command{diff} command to generate a
  22.226 +unified diff in the same form as the built-in \hgcmd{diff} command.
  22.227 +\interaction{extdiff.extdiff}
  22.228 +The result won't be exactly the same as with the built-in \hgcmd{diff}
  22.229 +command, because the output of \command{diff} varies from one
  22.230 +system to another, even when passed the same options.
  22.231 +
  22.232 +As the ``\texttt{making snapshot}'' lines of output above imply, the
  22.233 +\hgxcmd{extdiff}{extdiff} command works by creating two snapshots of
  22.234 +your source tree.  The first snapshot is of the source revision; the
  22.235 +second, of the target revision or working directory.  The
  22.236 +\hgxcmd{extdiff}{extdiff} command generates these snapshots in a
  22.237 +temporary directory, passes the name of each directory to an external
  22.238 +diff viewer, then deletes the temporary directory.  For efficiency, it
  22.239 +only snapshots the directories and files that have changed between the
  22.240 +two revisions.
  22.241 +
  22.242 +Snapshot directory names have the same base name as your repository.
  22.243 +If your repository path is \dirname{/quux/bar/foo}, then \dirname{foo}
  22.244 +will be the name of each snapshot directory.  Each snapshot directory
  22.245 +name has its changeset ID appended, if appropriate.  If a snapshot is
  22.246 +of revision \texttt{a631aca1083f}, the directory will be named
  22.247 +\dirname{foo.a631aca1083f}.  A snapshot of the working directory won't
  22.248 +have a changeset ID appended, so it would just be \dirname{foo} in
  22.249 +this example.  To see what this looks like in practice, look again at
  22.250 +the \hgxcmd{extdiff}{extdiff} example above.  Notice that the diff has
  22.251 +the snapshot directory names embedded in its header.
  22.252 +
  22.253 +The \hgxcmd{extdiff}{extdiff} command accepts two important options.
  22.254 +The \hgxopt{extdiff}{extdiff}{-p} option lets you choose a program to
  22.255 +view differences with, instead of \command{diff}.  With the
  22.256 +\hgxopt{extdiff}{extdiff}{-o} option, you can change the options that
  22.257 +\hgxcmd{extdiff}{extdiff} passes to the program (by default, these
  22.258 +options are ``\texttt{-Npru}'', which only make sense if you're
  22.259 +running \command{diff}).  In other respects, the
  22.260 +\hgxcmd{extdiff}{extdiff} command acts similarly to the built-in
  22.261 +\hgcmd{diff} command: you use the same option names, syntax, and
  22.262 +arguments to specify the revisions you want, the files you want, and
  22.263 +so on.
  22.264 +
  22.265 +As an example, here's how to run the normal system \command{diff}
  22.266 +command, getting it to generate context diffs (using the
  22.267 +\cmdopt{diff}{-c} option) instead of unified diffs, and five lines of
  22.268 +context instead of the default three (passing \texttt{5} as the
  22.269 +argument to the \cmdopt{diff}{-C} option).
  22.270 +\interaction{extdiff.extdiff-ctx}
  22.271 +
  22.272 +Launching a visual diff tool is just as easy.  Here's how to launch
  22.273 +the \command{kdiff3} viewer.
  22.274 +\begin{codesample2}
  22.275 +  hg extdiff -p kdiff3 -o ''
  22.276 +\end{codesample2}
  22.277 +
  22.278 +If your diff viewing command can't deal with directories, you can
  22.279 +easily work around this with a little scripting.  For an example of
  22.280 +such scripting in action with the \hgext{mq} extension and the
  22.281 +\command{interdiff} command, see
  22.282 +section~\ref{mq-collab:tips:interdiff}.
  22.283 +
  22.284 +\subsection{Defining command aliases}
  22.285 +
  22.286 +It can be cumbersome to remember the options to both the
  22.287 +\hgxcmd{extdiff}{extdiff} command and the diff viewer you want to use,
  22.288 +so the \hgext{extdiff} extension lets you define \emph{new} commands
  22.289 +that will invoke your diff viewer with exactly the right options.
  22.290 +
  22.291 +All you need to do is edit your \hgrc, and add a section named
  22.292 +\rcsection{extdiff}.  Inside this section, you can define multiple
  22.293 +commands.  Here's how to add a \texttt{kdiff3} command.  Once you've
  22.294 +defined this, you can type ``\texttt{hg kdiff3}'' and the
  22.295 +\hgext{extdiff} extension will run \command{kdiff3} for you.
  22.296 +\begin{codesample2}
  22.297 +  [extdiff]
  22.298 +  cmd.kdiff3 =
  22.299 +\end{codesample2}
  22.300 +If you leave the right hand side of the definition empty, as above,
  22.301 +the \hgext{extdiff} extension uses the name of the command you defined
  22.302 +as the name of the external program to run.  But these names don't
  22.303 +have to be the same.  Here, we define a command named ``\texttt{hg
  22.304 +  wibble}'', which runs \command{kdiff3}.
  22.305 +\begin{codesample2}
  22.306 +  [extdiff]
  22.307 +  cmd.wibble = kdiff3
  22.308 +\end{codesample2}
  22.309 +
  22.310 +You can also specify the default options that you want to invoke your
  22.311 +diff viewing program with.  The prefix to use is ``\texttt{opts.}'',
  22.312 +followed by the name of the command to which the options apply.  This
  22.313 +example defines a ``\texttt{hg vimdiff}'' command that runs the
  22.314 +\command{vim} editor's \texttt{DirDiff} extension.
  22.315 +\begin{codesample2}
  22.316 +  [extdiff]  
  22.317 +  cmd.vimdiff = vim
  22.318 +  opts.vimdiff = -f '+next' '+execute "DirDiff" argv(0) argv(1)'
  22.319 +\end{codesample2}
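         +
         +Commands defined this way should accept the same revision and file
         +arguments as \hgxcmd{extdiff}{extdiff} itself.  As a sketch, assuming
         +the \texttt{vimdiff} definition above and a repository that contains
         +revisions 10 and 20 (both numbers are purely illustrative), the
         +following would compare those two revisions in \command{vim}.
         +\begin{codesample2}
         +  hg vimdiff -r 10 -r 20
         +\end{codesample2}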
  22.320 +
  22.321 +\section{Cherrypicking changes with the \hgext{transplant} extension}
  22.322 +\label{sec:hgext:transplant}
  22.323 +
  22.324 +Need to have a long chat with Brendan about this.
  22.325 +
  22.326 +\section{Send changes via email with the \hgext{patchbomb} extension}
  22.327 +\label{sec:hgext:patchbomb}
  22.328 +
  22.329 +Many projects have a culture of ``change review'', in which people
  22.330 +send their modifications to a mailing list for others to read and
  22.331 +comment on before they commit the final version to a shared
  22.332 +repository.  Some projects have people who act as gatekeepers; they
  22.333 +apply changes from other people to a repository to which those others
  22.334 +don't have access.
  22.335 +
  22.336 +Mercurial makes it easy to send changes over email for review or
  22.337 +application, via its \hgext{patchbomb} extension.  The extension is so
  22.338 +named because changes are formatted as patches, and it's usual to send
  22.339 +one changeset per email message.  Sending a long series of changes by
  22.340 +email is thus much like ``bombing'' the recipient's inbox, hence
  22.341 +``patchbomb''.
  22.342 +
  22.343 +As usual, the basic configuration of the \hgext{patchbomb} extension
  22.344 +takes just one or two lines in your \hgrc.
  22.345 +\begin{codesample2}
  22.346 +  [extensions]
  22.347 +  patchbomb =
  22.348 +\end{codesample2}
  22.349 +Once you've enabled the extension, you will have a new command
  22.350 +available, named \hgxcmd{patchbomb}{email}.
  22.351 +
  22.352 +The safest and best way to invoke the \hgxcmd{patchbomb}{email}
  22.353 +command is to \emph{always} run it first with the
  22.354 +\hgxopt{patchbomb}{email}{-n} option.  This will show you what the
  22.355 +command \emph{would} send, without actually sending anything.  Once
  22.356 +you've had a quick glance over the changes and verified that you are
  22.357 +sending the right ones, you can rerun the same command, with the
  22.358 +\hgxopt{patchbomb}{email}{-n} option removed.
  22.359 +
  22.360 +The \hgxcmd{patchbomb}{email} command accepts the same kind of
  22.361 +revision syntax as every other Mercurial command.  For example, this
  22.362 +command will send every revision between 7 and \texttt{tip},
  22.363 +inclusive.
  22.364 +\begin{codesample2}
  22.365 +  hg email -n 7:tip
  22.366 +\end{codesample2}
  22.367 +You can also specify a \emph{repository} to compare with.  If you
  22.368 +provide a repository but no revisions, the \hgxcmd{patchbomb}{email}
  22.369 +command will send all revisions in the local repository that are not
  22.370 +present in the remote repository.  If you additionally specify
  22.371 +revisions or a branch name (the latter using the
  22.372 +\hgxopt{patchbomb}{email}{-b} option), this will constrain the
  22.373 +revisions sent.
  22.374 +
  22.375 +It's perfectly safe to run the \hgxcmd{patchbomb}{email} command
  22.376 +without the names of the people you want to send to: if you do this,
  22.377 +it will just prompt you for those values interactively.  (If you're
  22.378 +using a Linux or Unix-like system, you should have enhanced
  22.379 +\texttt{readline}-style editing capabilities when entering those
  22.380 +headers, too, which is useful.)
  22.381 +
  22.382 +When you are sending just one revision, the \hgxcmd{patchbomb}{email}
  22.383 +command will by default use the first line of the changeset
  22.384 +description as the subject of the single email message it sends.
  22.385 +
  22.386 +If you send multiple revisions, the \hgxcmd{patchbomb}{email} command
  22.387 +will usually send one message per changeset.  It will preface the
  22.388 +series with an introductory message, in which you should describe the
  22.389 +purpose of the series of changes you're sending.
  22.390 +
  22.391 +\subsection{Changing the behaviour of patchbombs}
  22.392 +
  22.393 +Not every project has exactly the same conventions for sending changes
  22.394 +in email; the \hgext{patchbomb} extension tries to accommodate a
  22.395 +number of variations through command line options.
  22.396 +\begin{itemize}
  22.397 +\item You can write a subject for the introductory message on the
  22.398 +  command line using the \hgxopt{patchbomb}{email}{-s} option.  This
  22.399 +  takes one argument, the text of the subject to use.
  22.400 +\item To change the email address from which the messages originate,
  22.401 +  use the \hgxopt{patchbomb}{email}{-f} option.  This takes one
  22.402 +  argument, the email address to use.
  22.403 +\item The default behaviour is to send unified diffs (see
  22.404 +  section~\ref{sec:mq:patch} for a description of the format), one per
  22.405 +  message.  You can send a binary bundle instead with the
  22.406 +  \hgxopt{patchbomb}{email}{-b} option.  
  22.407 +\item Unified diffs are normally prefaced with a metadata header.  You
  22.408 +  can omit this, and send unadorned diffs, with the
  22.409 +  \hgxopt{patchbomb}{email}{--plain} option.
  22.410 +\item Diffs are normally sent ``inline'', in the same body part as the
  22.411 +  description of a patch.  This makes it easiest for the largest
  22.412 +  number of readers to quote and respond to parts of a diff, as some
  22.413 +  mail clients will only quote the first MIME body part in a message.
  22.414 +  If you'd prefer to send the description and the diff in separate
  22.415 +  body parts, use the \hgxopt{patchbomb}{email}{-a} option.
  22.416 +\item Instead of sending mail messages, you can write them to an
  22.417 +  \texttt{mbox}-format mail folder using the
  22.418 +  \hgxopt{patchbomb}{email}{-m} option.  That option takes one
  22.419 +  argument, the name of the file to write to.
  22.420 +\item If you would like to add a \command{diffstat}-format summary to
  22.421 +  each patch, and one to the introductory message, use the
  22.422 +  \hgxopt{patchbomb}{email}{-d} option.  The \command{diffstat}
  22.423 +  command displays a table containing the name of each file patched,
  22.424 +  the number of lines affected, and a histogram showing how much each
  22.425 +  file is modified.  This gives readers a qualitative glance at how
  22.426 +  complex a patch is.
  22.427 +\end{itemize}
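         +
         +As a sketch of how these options combine, the first command below
         +previews a series with \command{diffstat} summaries, a subject for the
         +introductory message, and an explicit sender; the second writes the
         +same series to an \texttt{mbox} file instead of sending it.  The
         +subject text, the address, the file name, and the revision range are
         +all made up for illustration.
         +\begin{codesample2}
         +  hg email -n -d -s 'Widget cleanups' -f 'alice@example.com' 7:tip
         +  hg email -m widget-series.mbox 7:tip
         +\end{codesample2}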
  22.428 +
  22.429 +%%% Local Variables: 
  22.430 +%%% mode: latex
  22.431 +%%% TeX-master: "00book"
  22.432 +%%% End: 

    23.1 --- a/en/cmdref.tex	Thu Jan 29 22:47:34 2009 -0800
    23.2 +++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
    23.3 @@ -1,176 +0,0 @@
    23.4 -\chapter{Command reference}
    23.5 -\label{cmdref}
    23.6 -
    23.7 -\cmdref{add}{add files at the next commit}
    23.8 -\optref{add}{I}{include}
    23.9 -\optref{add}{X}{exclude}
   23.10 -\optref{add}{n}{dry-run}
   23.11 -
   23.12 -\cmdref{diff}{print changes in history or working directory}
   23.13 -
   23.14 -Show differences between revisions for the specified files or
   23.15 -directories, using the unified diff format.  For a description of the
   23.16 -unified diff format, see section~\ref{sec:mq:patch}.
   23.17 -
   23.18 -By default, this command does not print diffs for files that Mercurial
   23.19 -considers to contain binary data.  To control this behaviour, see the
   23.20 -\hgopt{diff}{-a} and \hgopt{diff}{--git} options.
   23.21 -
   23.22 -\subsection{Options}
   23.23 -
   23.24 -\loptref{diff}{nodates}
   23.25 -
   23.26 -Omit date and time information when printing diff headers.
   23.27 -
   23.28 -\optref{diff}{B}{ignore-blank-lines}
   23.29 -
   23.30 -Do not print changes that only insert or delete blank lines.  A line
   23.31 -that contains only whitespace is not considered blank.
   23.32 -
   23.33 -\optref{diff}{I}{include}
   23.34 -
   23.35 -Include files and directories whose names match the given patterns.
   23.36 -
   23.37 -\optref{diff}{X}{exclude}
   23.38 -
   23.39 -Exclude files and directories whose names match the given patterns.
   23.40 -
   23.41 -\optref{diff}{a}{text}
   23.42 -
   23.43 -If this option is not specified, \hgcmd{diff} will refuse to print
   23.44 -diffs for files that it detects as binary. Specifying \hgopt{diff}{-a}
   23.45 -forces \hgcmd{diff} to treat all files as text, and generate diffs for
   23.46 -all of them.
   23.47 -
   23.48 -This option is useful for files that are ``mostly text'' but have a
   23.49 -few embedded NUL characters.  If you use it on files that contain a
   23.50 -lot of binary data, its output will be incomprehensible.
   23.51 -
   23.52 -\optref{diff}{b}{ignore-space-change}
   23.53 -
   23.54 -Do not print a line if the only change to that line is in the amount
   23.55 -of white space it contains.
   23.56 -
   23.57 -\optref{diff}{g}{git}
   23.58 -
   23.59 -Print \command{git}-compatible diffs.  XXX reference a format
   23.60 -description.
   23.61 -
   23.62 -\optref{diff}{p}{show-function}
   23.63 -
   23.64 -Display the name of the enclosing function in a hunk header, using a
   23.65 -simple heuristic.  This functionality is enabled by default, so the
   23.66 -\hgopt{diff}{-p} option has no effect unless you change the value of
   23.67 -the \rcitem{diff}{showfunc} config item, as in the following example.
   23.68 -\interaction{cmdref.diff-p}
   23.69 -
   23.70 -\optref{diff}{r}{rev}
   23.71 -
   23.72 -Specify one or more revisions to compare.  The \hgcmd{diff} command
   23.73 -accepts up to two \hgopt{diff}{-r} options to specify the revisions to
   23.74 -compare.
   23.75 -
   23.76 -\begin{enumerate}
   23.77 -\setcounter{enumi}{0}
   23.78 -\item Display the differences between the parent revision of the
   23.79 -  working directory and the working directory.
   23.80 -\item Display the differences between the specified changeset and the
   23.81 -  working directory.
   23.82 -\item Display the differences between the two specified changesets.
   23.83 -\end{enumerate}
   23.84 -
   23.85 -You can specify two revisions using either two \hgopt{diff}{-r}
   23.86 -options or revision range notation.  For example, the two revision
   23.87 -specifications below are equivalent.
   23.88 -\begin{codesample2}
   23.89 -  hg diff -r 10 -r 20
   23.90 -  hg diff -r10:20
   23.91 -\end{codesample2}
   23.92 -
   23.93 -When you provide two revisions, Mercurial treats the order of those
   23.94 -revisions as significant.  Thus, \hgcmdargs{diff}{-r10:20} will
   23.95 -produce a diff that will transform files from their contents as of
   23.96 -revision~10 to their contents as of revision~20, while
   23.97 -\hgcmdargs{diff}{-r20:10} means the opposite: the diff that will
   23.98 -transform files from their revision~20 contents to their revision~10
   23.99 -contents.  You cannot reverse the ordering in this way if you are
  23.100 -diffing against the working directory.
  23.101 -
  23.102 -\optref{diff}{w}{ignore-all-space}
  23.103 -
  23.104 -\cmdref{version}{print version and copyright information}
  23.105 -
  23.106 -This command displays the version of Mercurial you are running, and
  23.107 -its copyright license.  There are four kinds of version string that
  23.108 -you may see.
  23.109 -\begin{itemize}
  23.110 -\item The string ``\texttt{unknown}''. This version of Mercurial was
  23.111 -  not built in a Mercurial repository, and cannot determine its own
  23.112 -  version.
  23.113 -\item A short numeric string, such as ``\texttt{1.1}''. This is a
  23.114 -  build of a revision of Mercurial that was identified by a specific
  23.115 -  tag in the repository where it was built.  (This doesn't necessarily
  23.116 -  mean that you're running an official release; someone else could
  23.117 -  have added that tag to any revision in the repository where they
  23.118 -  built Mercurial.)
  23.119 -\item A hexadecimal string, such as ``\texttt{875489e31abe}''.  This
  23.120 -  is a build of the given revision of Mercurial.
  23.121 -\item A hexadecimal string followed by a date, such as
  23.122 -  ``\texttt{875489e31abe+20070205}''.  This is a build of the given
  23.123 -  revision of Mercurial, where the build repository contained some
  23.124 -  local changes that had not been committed.
  23.125 -\end{itemize}
  23.126 -
  23.127 -\subsection{Tips and tricks}
  23.128 -
  23.129 -\subsubsection{Why do the results of \hgcmd{diff} and \hgcmd{status}
  23.130 -  differ?}
  23.131 -\label{cmdref:diff-vs-status}
  23.132 -
  23.133 -When you run the \hgcmd{status} command, you'll see a list of files
  23.134 -that Mercurial will record changes for the next time you perform a
  23.135 -commit.  If you run the \hgcmd{diff} command, you may notice that it
  23.136 -prints diffs for only a \emph{subset} of the files that \hgcmd{status}
  23.137 -listed.  There are two possible reasons for this.
  23.138 -
  23.139 -The first is that \hgcmd{status} prints some kinds of modifications
  23.140 -that \hgcmd{diff} doesn't normally display.  The \hgcmd{diff} command
  23.141 -normally outputs unified diffs, which don't have the ability to
  23.142 -represent some changes that Mercurial can track.  Most notably,
  23.143 -traditional diffs can't represent a change in whether or not a file is
  23.144 -executable, but Mercurial records this information.
  23.145 -
  23.146 -If you use the \hgopt{diff}{--git} option to \hgcmd{diff}, it will
  23.147 -display \command{git}-compatible diffs that \emph{can} display this
  23.148 -extra information.
  23.149 -
  23.150 -The second possible reason that \hgcmd{diff} might be printing diffs
  23.151 -for a subset of the files displayed by \hgcmd{status} is that if you
  23.152 -invoke it without any arguments, \hgcmd{diff} prints diffs against the
  23.153 -first parent of the working directory.  If you have run \hgcmd{merge}
  23.154 -to merge two changesets, but you haven't yet committed the results of
  23.155 -the merge, your working directory has two parents (use \hgcmd{parents}
  23.156 -to see them).  While \hgcmd{status} prints modifications relative to
  23.157 -\emph{both} parents after an uncommitted merge, \hgcmd{diff} still
  23.158 -operates relative only to the first parent.  You can get it to print
  23.159 -diffs relative to the second parent by specifying that parent with the
  23.160 -\hgopt{diff}{-r} option.  There is no way to print diffs relative to
  23.161 -both parents.
  23.162 -
  23.163 -\subsubsection{Generating safe binary diffs}
  23.164 -
  23.165 -If you use the \hgopt{diff}{-a} option to force Mercurial to print
  23.166 -diffs of files that are either ``mostly text'' or contain lots of
  23.167 -binary data, those diffs cannot subsequently be applied by either
  23.168 -Mercurial's \hgcmd{import} command or the system's \command{patch}
  23.169 -command.  
  23.170 -
  23.171 -If you want to generate a diff of a binary file that is safe to use as
  23.172 -input for \hgcmd{import}, use the \hgopt{diff}{--git} option when you
  23.173 -generate the patch.  The system \command{patch} command cannot handle
  23.174 -binary patches at all.
  23.175 -
  23.176 -%%% Local Variables: 
  23.177 -%%% mode: latex
  23.178 -%%% TeX-master: "00book"
  23.179 -%%% End: 

    24.1 --- a/en/collab.tex	Thu Jan 29 22:47:34 2009 -0800
    24.2 +++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
    24.3 @@ -1,1118 +0,0 @@
    24.4 -\chapter{Collaborating with other people}
    24.5 -\label{cha:collab}
    24.6 -
    24.7 -As a completely decentralised tool, Mercurial doesn't impose any
    24.8 -policy on how people ought to work with each other.  However, if
    24.9 -you're new to distributed revision control, it helps to have some
   24.10 -tools and examples in mind when you're thinking about possible
   24.11 -workflow models.
   24.12 -
   24.13 -\section{Mercurial's web interface}
   24.14 -
   24.15 -Mercurial has a powerful web interface that provides several 
   24.16 -useful capabilities.
   24.17 -
   24.18 -For interactive use, the web interface lets you browse a single
   24.19 -repository or a collection of repositories.  You can view the history
   24.20 -of a repository, examine each change (comments and diffs), and view
   24.21 -the contents of each directory and file.
   24.22 -
   24.23 -Also for human consumption, the web interface provides an RSS feed of
   24.24 -the changes in a repository.  This lets you ``subscribe'' to a
   24.25 -repository using your favourite feed reader, and be automatically
   24.26 -notified of activity in that repository as soon as it happens.  I find
   24.27 -this capability much more convenient than the model of subscribing to
   24.28 -a mailing list to which notifications are sent, as it requires no
   24.29 -additional configuration on the part of whoever is serving the
   24.30 -repository.
   24.31 -
   24.32 -The web interface also lets remote users clone a repository, pull
   24.33 -changes from it, and (when the server is configured to permit it) push
   24.34 -changes back to it.  Mercurial's HTTP tunneling protocol aggressively
   24.35 -compresses data, so that it works efficiently even over low-bandwidth
   24.36 -network connections.
   24.37 -
   24.38 -The easiest way to get started with the web interface is to use your
   24.39 -web browser to visit an existing repository, such as the master
   24.40 -Mercurial repository at
   24.41 -\url{http://www.selenic.com/repo/hg?style=gitweb}.
   24.42 -
   24.43 -If you're interested in providing a web interface to your own
   24.44 -repositories, Mercurial provides two ways to do this.  The first is
   24.45 -using the \hgcmd{serve} command, which is best suited to short-term
   24.46 -``lightweight'' serving.  See section~\ref{sec:collab:serve} below for
   24.47 -details of how to use this command.  If you have a long-lived
   24.48 -repository that you'd like to make permanently available, Mercurial
   24.49 -has built-in support for the CGI (Common Gateway Interface) standard,
   24.50 -which all common web servers support.  See
   24.51 -section~\ref{sec:collab:cgi} for details of CGI configuration.
   24.52 -
   24.53 -\section{Collaboration models}
   24.54 -
   24.55 -With a suitably flexible tool, making decisions about workflow is much
   24.56 -more of a social engineering challenge than a technical one.
   24.57 -Mercurial imposes few limitations on how you can structure the flow of
   24.58 -work in a project, so it's up to you and your group to set up and live
   24.59 -with a model that matches your own particular needs.
   24.60 -
   24.61 -\subsection{Factors to keep in mind}
   24.62 -
   24.63 -The most important aspect of any model that you must keep in mind is
   24.64 -how well it matches the needs and capabilities of the people who will
   24.65 -be using it.  This might seem self-evident; even so, you still can't
   24.66 -afford to forget it for a moment.
   24.67 -
   24.68 -I once put together a workflow model that seemed to make perfect sense
   24.69 -to me, but that caused a considerable amount of consternation and
   24.70 -strife within my development team.  In spite of my attempts to explain
   24.71 -why we needed a complex set of branches, and how changes ought to flow
   24.72 -between them, a few team members revolted.  Even though they were
   24.73 -smart people, they didn't want to pay attention to the constraints we
   24.74 -were operating under, or face the consequences of those constraints in
   24.75 -the details of the model that I was advocating.
   24.76 -
   24.77 -Don't sweep foreseeable social or technical problems under the rug.
   24.78 -Whatever scheme you put into effect, you should plan for mistakes and
   24.79 -problem scenarios.  Consider adding automated machinery to prevent, or
   24.80 -quickly recover from, trouble that you can anticipate.  As an example,
   24.81 -if you intend to have a branch with not-for-release changes in it,
   24.82 -you'd do well to think early about the possibility that someone might
   24.83 -accidentally merge those changes into a release branch.  You could
   24.84 -avoid this particular problem by writing a hook that prevents changes
   24.85 -from being merged from an inappropriate branch.
   24.86 -
   24.87 -\subsection{Informal anarchy}
   24.88 -
   24.89 -I wouldn't suggest an ``anything goes'' approach as something
   24.90 -sustainable, but it's a model that's easy to grasp, and it works
   24.91 -perfectly well in a few unusual situations.
   24.92 -
   24.93 -As one example, many projects have a loose-knit group of collaborators
   24.94 -who rarely physically meet each other.  Some groups like to overcome
   24.95 -the isolation of working at a distance by organising occasional
   24.96 -``sprints''.  In a sprint, a number of people get together in a single
   24.97 -location (a company's conference room, a hotel meeting room, that kind
   24.98 -of place) and spend several days more or less locked in there, hacking
   24.99 -intensely on a handful of projects.
  24.100 -
  24.101 -A sprint is the perfect place to use the \hgcmd{serve} command, since
  24.102 -\hgcmd{serve} does not require any fancy server infrastructure.  You
  24.103 -can get started with \hgcmd{serve} in moments, by reading
  24.104 -section~\ref{sec:collab:serve} below.  Then simply tell the person
  24.105 -next to you that you're running a server, send the URL to them in an
  24.106 -instant message, and you immediately have a quick-turnaround way to
  24.107 -work together.  They can type your URL into their web browser and
  24.108 -quickly review your changes; or they can pull a bugfix from you and
  24.109 -verify it; or they can clone a branch containing a new feature and try
  24.110 -it out.
  24.111 -
  24.112 -The charm, and the problem, with doing things in an ad hoc fashion
  24.113 -like this is that only people who know about your changes, and where
  24.114 -they are, can see them.  Such an informal approach simply doesn't
  24.115 -scale beyond a handful of people, because each individual needs to know
  24.116 -about $n$ different repositories to pull from.
  24.117 -
  24.118 -\subsection{A single central repository}
  24.119 -
  24.120 -For smaller projects migrating from a centralised revision control
  24.121 -tool, perhaps the easiest way to get started is to have changes flow
  24.122 -through a single shared central repository.  This is also the
  24.123 -most common ``building block'' for more ambitious workflow schemes.
  24.124 -
  24.125 -Contributors start by cloning a copy of this repository.  They can
  24.126 -pull changes from it whenever they need to, and some (perhaps all)
  24.127 -developers have permission to push a change back when they're ready
  24.128 -for other people to see it.
  24.129 -
  24.130 -Under this model, it can still often make sense for people to pull
  24.131 -changes directly from each other, without going through the central
  24.132 -repository.  Consider a case in which I have a tentative bug fix, but
  24.133 -I am worried that if I were to publish it to the central repository,
  24.134 -it might subsequently break everyone else's trees as they pull it.  To
  24.135 -reduce the potential for damage, I can ask you to clone my repository
  24.136 -into a temporary repository of your own and test it.  This lets us put
  24.137 -off publishing the potentially unsafe change until it has had a little
  24.138 -testing.
  24.139 -
  24.140 -In this kind of scenario, people usually use the \command{ssh}
  24.141 -protocol to securely push changes to the central repository, as
  24.142 -documented in section~\ref{sec:collab:ssh}.  It's also usual to
  24.143 -publish a read-only copy of the repository over HTTP using CGI, as in
  24.144 -section~\ref{sec:collab:cgi}.  Publishing over HTTP satisfies the
  24.145 -needs of people who don't have push access, and those who want to use
  24.146 -web browsers to browse the repository's history.
  24.147 -
  24.148 -\subsection{Working with multiple branches}
  24.149 -
  24.150 -Projects of any significant size naturally tend to make progress on
  24.151 -several fronts simultaneously.  In the case of software, it's common
  24.152 -for a project to go through periodic official releases.  A release
  24.153 -might then go into ``maintenance mode'' for a while after its first
  24.154 -publication; maintenance releases tend to contain only bug fixes, not
  24.155 -new features.  In parallel with these maintenance releases, one or
  24.156 -more future releases may be under development.  People normally use
  24.157 -the word ``branch'' to refer to one of these many slightly different
  24.158 -directions in which development is proceeding.
  24.159 -
  24.160 -Mercurial is particularly well suited to managing a number of
  24.161 -simultaneous, but not identical, branches.  Each ``development
  24.162 -direction'' can live in its own central repository, and you can merge
  24.163 -changes from one to another as the need arises.  Because repositories
  24.164 -are independent of each other, unstable changes in a development
  24.165 -branch will never affect a stable branch unless someone explicitly
  24.166 -merges those changes in.
  24.167 -
  24.168 -Here's an example of how this can work in practice.  Let's say you
  24.169 -have one ``main branch'' on a central server.
  24.170 -\interaction{branching.init}
  24.171 -People clone it, make changes locally, test them, and push them back.
  24.172 -
  24.173 -Once the main branch reaches a release milestone, you can use the
  24.174 -\hgcmd{tag} command to give a permanent name to the milestone
  24.175 -revision.
  24.176 -\interaction{branching.tag}
  24.177 -Let's say some ongoing development occurs on the main branch.
  24.178 -\interaction{branching.main}
  24.179 -Using the tag that was recorded at the milestone, people who clone
  24.180 -that repository at any time in the future can use \hgcmd{update} to
  24.181 -get a copy of the working directory exactly as it was when that tagged
  24.182 -revision was committed.  
  24.183 -\interaction{branching.update}
  24.184 -
  24.185 -In addition, immediately after the main branch is tagged, someone can
  24.186 -then clone the main branch on the server to a new ``stable'' branch,
  24.187 -also on the server.
  24.188 -\interaction{branching.clone}
  24.189 -
  24.190 -Someone who needs to make a change to the stable branch can then clone
  24.191 -\emph{that} repository, make their changes, commit, and push their
  24.192 -changes back there.
  24.193 -\interaction{branching.stable}
  24.194 -Because Mercurial repositories are independent, and Mercurial doesn't
  24.195 -move changes around automatically, the stable and main branches are
  24.196 -\emph{isolated} from each other.  The changes that you made on the
  24.197 -main branch don't ``leak'' to the stable branch, and vice versa.
  24.198 -
  24.199 -You'll often want all of your bugfixes on the stable branch to show up
  24.200 -on the main branch, too.  Rather than rewrite a bugfix on the main
  24.201 -branch, you can simply pull and merge changes from the stable to the
  24.202 -main branch, and Mercurial will bring those bugfixes in for you.
  24.203 -\interaction{branching.merge}
  24.204 -The main branch will still contain changes that are not on the stable
  24.205 -branch, but it will also contain all of the bugfixes from the stable
  24.206 -branch.  The stable branch remains unaffected by these changes.
  24.207 -
  24.208 -\subsection{Feature branches}
  24.209 -
  24.210 -For larger projects, an effective way to manage change is to break up
  24.211 -a team into smaller groups.  Each group has a shared branch of its
  24.212 -own, cloned from a single ``master'' branch used by the entire
  24.213 -project.  People working on an individual branch are typically quite
  24.214 -isolated from developments on other branches.
  24.215 -
  24.216 -\begin{figure}[ht]
  24.217 -  \centering
  24.218 -  \grafix{feature-branches}
  24.219 -  \caption{Feature branches}
  24.220 -  \label{fig:collab:feature-branches}
  24.221 -\end{figure}
  24.222 -
  24.223 -When a particular feature is deemed to be in suitable shape, someone
  24.224 -on that feature team pulls and merges from the master branch into the
  24.225 -feature branch, then pushes back up to the master branch.
  24.226 -
  24.227 -\subsection{The release train}
  24.228 -
  24.229 -Some projects are organised on a ``train'' basis: a release is
  24.230 -scheduled to happen every few months, and whatever features are ready
  24.231 -when the ``train'' is ready to leave are allowed in.
  24.232 -
  24.233 -This model resembles working with feature branches.  The difference is
  24.234 -that when a feature branch misses a train, someone on the feature team
  24.235 -pulls and merges the changes that went out on that train release into
  24.236 -the feature branch, and the team continues its work on top of that
  24.237 -release so that their feature can make the next release.
  24.238 -
  24.239 -\subsection{The Linux kernel model}
  24.240 -
  24.241 -The development of the Linux kernel has a shallow hierarchical
  24.242 -structure, surrounded by a cloud of apparent chaos.  Because most
  24.243 -Linux developers use \command{git}, a distributed revision control
  24.244 -tool with capabilities similar to Mercurial, it's useful to describe
  24.245 -the way work flows in that environment; if you like the ideas, the
  24.246 -approach translates well across tools.
  24.247 -
  24.248 -At the center of the community sits Linus Torvalds, the creator of
  24.249 -Linux.  He publishes a single source repository that is considered the
  24.250 -``authoritative'' current tree by the entire developer community.
  24.251 -Anyone can clone Linus's tree, but he is very choosy about whose trees
  24.252 -he pulls from.
  24.253 -
  24.254 -Linus has a number of ``trusted lieutenants''.  As a general rule, he
  24.255 -pulls whatever changes they publish, in most cases without even
  24.256 -reviewing those changes.  Some of those lieutenants are generally
  24.257 -agreed to be ``maintainers'', responsible for specific subsystems
  24.258 -within the kernel.  If a random kernel hacker wants to make a change
  24.259 -to a subsystem that they want to end up in Linus's tree, they must
  24.260 -find out who the subsystem's maintainer is, and ask that maintainer to
  24.261 -take their change.  If the maintainer reviews their changes and agrees
  24.262 -to take them, they'll pass them along to Linus in due course.
  24.263 -
  24.264 -Individual lieutenants have their own approaches to reviewing,
  24.265 -accepting, and publishing changes; and for deciding when to feed them
  24.266 -to Linus.  In addition, there are several well known branches that
  24.267 -people use for different purposes.  For example, a few people maintain
  24.268 -``stable'' repositories of older versions of the kernel, to which they
  24.269 -apply critical fixes as needed.  Some maintainers publish multiple
  24.270 -trees: one for experimental changes; one for changes that they are
  24.271 -about to feed upstream; and so on.  Others just publish a single
  24.272 -tree.
  24.273 -
  24.274 -This model has two notable features.  The first is that it's ``pull
  24.275 -only''.  You have to ask, convince, or beg another developer to take a
  24.276 -change from you, because there are almost no trees to which more than
  24.277 -one person can push, and there's no way to push changes into a tree
  24.278 -that someone else controls.
  24.279 -
  24.280 -The second is that it's based on reputation and acclaim.  If you're an
  24.281 -unknown, Linus will probably ignore changes from you without even
  24.282 -responding.  But a subsystem maintainer will probably review them, and
  24.283 -will likely take them if they pass their criteria for suitability.
  24.284 -The more ``good'' changes you contribute to a maintainer, the more
  24.285 -likely they are to trust your judgment and accept your changes.  If
  24.286 -you're well-known and maintain a long-lived branch for something Linus
  24.287 -hasn't yet accepted, people with similar interests may pull your
  24.288 -changes regularly to keep up with your work.
  24.289 -
  24.290 -Reputation and acclaim don't necessarily cross subsystem or ``people''
  24.291 -boundaries.  If you're a respected but specialised storage hacker, and
  24.292 -you try to fix a networking bug, that change will receive a level of
  24.293 -scrutiny from a network maintainer comparable to a change from a
  24.294 -complete stranger.
  24.295 -
  24.296 -To people who come from more orderly project backgrounds, the
  24.297 -comparatively chaotic Linux kernel development process often seems
  24.298 -completely insane.  It's subject to the whims of individuals; people
  24.299 -make sweeping changes whenever they deem it appropriate; and the pace
  24.300 -of development is astounding.  And yet Linux is a highly successful,
  24.301 -well-regarded piece of software.
  24.302 -
  24.303 -\subsection{Pull-only versus shared-push collaboration}
  24.304 -
  24.305 -A perpetual source of heat in the open source community is whether a
  24.306 -development model in which people only ever pull changes from others
  24.307 -is ``better than'' one in which multiple people can push changes to a
  24.308 -shared repository.
  24.309 -
  24.310 -Typically, the backers of the shared-push model use tools that
  24.311 -actively enforce this approach.  If you're using a centralised
  24.312 -revision control tool such as Subversion, there's no way to make a
  24.313 -choice over which model you'll use: the tool gives you shared-push,
  24.314 -and if you want to do anything else, you'll have to roll your own
  24.315 -approach on top (such as applying a patch by hand).
  24.316 -
  24.317 -A good distributed revision control tool, such as Mercurial, will
  24.318 -support both models.  You and your collaborators can then structure
  24.319 -how you work together based on your own needs and preferences, not on
  24.320 -what contortions your tools force you into.
  24.321 -
  24.322 -\subsection{Where collaboration meets branch management}
  24.323 -
  24.324 -Once you and your team set up some shared repositories and start
  24.325 -propagating changes back and forth between local and shared repos, you
  24.326 -begin to face a related, but slightly different challenge: that of
  24.327 -managing the multiple directions in which your team may be moving at
  24.328 -once.  Even though this subject is intimately related to how your team
  24.329 -collaborates, it's dense enough to merit treatment of its own, in
  24.330 -chapter~\ref{chap:branch}.
  24.331 -
  24.332 -\section{The technical side of sharing}
  24.333 -
  24.334 -The remainder of this chapter is devoted to the question of serving
  24.335 -data to your collaborators.
  24.336 -
  24.337 -\section{Informal sharing with \hgcmd{serve}}
  24.338 -\label{sec:collab:serve}
  24.339 -
  24.340 -Mercurial's \hgcmd{serve} command is wonderfully suited to small,
  24.341 -tight-knit, and fast-paced group environments.  It also provides a
  24.342 -great way to get a feel for using Mercurial commands over a network.
  24.343 -
  24.344 -Run \hgcmd{serve} inside a repository, and in under a second it will
  24.345 -bring up a specialised HTTP server; this will accept connections from
  24.346 -any client, and serve up data for that repository until you terminate
  24.347 -it.  Anyone who knows the URL of the server you just started, and can
  24.348 -talk to your computer over the network, can then use a web browser or
  24.349 -Mercurial to read data from that repository.  A URL for a
  24.350 -\hgcmd{serve} instance running on a laptop is likely to look something
  24.351 -like \Verb|http://my-laptop.local:8000/|.
  24.352 -
  24.353 -The \hgcmd{serve} command is \emph{not} a general-purpose web server.
  24.354 -It can do only two things:
  24.355 -\begin{itemize}
  24.356 -\item Allow people to browse the history of the repository it's
  24.357 -  serving, from their normal web browsers.
  24.358 -\item Speak Mercurial's wire protocol, so that people can
  24.359 -  \hgcmd{clone} or \hgcmd{pull} changes from that repository.
  24.360 -\end{itemize}
  24.361 -In particular, \hgcmd{serve} won't allow remote users to \emph{modify}
  24.362 -your repository.  It's intended for read-only use.
  24.363 -
  24.364 -If you're getting started with Mercurial, there's nothing to prevent
  24.365 -you from using \hgcmd{serve} to serve up a repository on your own
  24.366 -computer, then using commands like \hgcmd{clone}, \hgcmd{incoming}, and
  24.367 -so on to talk to that server as if the repository were hosted remotely.
  24.368 -This can help you to quickly get acquainted with using commands on
  24.369 -network-hosted repositories.
  24.370 -
  24.371 -\subsection{A few things to keep in mind}
  24.372 -
  24.373 -Because it provides unauthenticated read access to all clients, you
  24.374 -should only use \hgcmd{serve} in an environment where you either don't
  24.375 -care, or have complete control over, who can access your network and
  24.376 -pull data from your repository.
  24.377 -
  24.378 -The \hgcmd{serve} command knows nothing about any firewall software
  24.379 -you might have installed on your system or network.  It cannot detect
  24.380 -or control your firewall software.  If other people are unable to talk
  24.381 -to a running \hgcmd{serve} instance, the second thing you should do
  24.382 -(\emph{after} you make sure that they're using the correct URL) is
  24.383 -check your firewall configuration.
  24.384 -
  24.385 -By default, \hgcmd{serve} listens for incoming connections on
  24.386 -port~8000.  If another process is already listening on the port you
  24.387 -want to use, you can specify a different port to listen on using the
  24.388 -\hgopt{serve}{-p} option.
  24.389 -
  24.390 -Normally, when \hgcmd{serve} starts, it prints no output, which can be
  24.391 -a bit unnerving.  If you'd like to confirm that it is indeed running
  24.392 -correctly, and find out what URL you should send to your
  24.393 -collaborators, start it with the \hggopt{-v} option.
  24.394 -
  24.395 -\section{Using the Secure Shell (ssh) protocol}
  24.396 -\label{sec:collab:ssh}
  24.397 -
  24.398 -You can pull and push changes securely over a network connection using
  24.399 -the Secure Shell (\texttt{ssh}) protocol.  To use this successfully,
  24.400 -you may have to do a little bit of configuration on the client or
  24.401 -server sides.
  24.402 -
  24.403 -If you're not familiar with ssh, it's a network protocol that lets you
  24.404 -securely communicate with another computer.  To use it with Mercurial,
  24.405 -you'll be setting up one or more user accounts on a server so that
  24.406 -remote users can log in and execute commands.
  24.407 -
  24.408 -(If you \emph{are} familiar with ssh, you'll probably find some of the
  24.409 -material that follows to be elementary in nature.)
  24.410 -
  24.411 -\subsection{How to read and write ssh URLs}
  24.412 -
  24.413 -An ssh URL tends to look like this:
  24.414 -\begin{codesample2}
  24.415 -  ssh://bos@hg.serpentine.com:22/hg/hgbook
  24.416 -\end{codesample2}
  24.417 -\begin{enumerate}
  24.418 -\item The ``\texttt{ssh://}'' part tells Mercurial to use the ssh
  24.419 -  protocol.
  24.420 -\item The ``\texttt{bos@}'' component indicates what username to log
  24.421 -  into the server as.  You can leave this out if the remote username
  24.422 -  is the same as your local username.
  24.423 -\item The ``\texttt{hg.serpentine.com}'' gives the hostname of the
  24.424 -  server to log into.
  24.425 -\item The ``\texttt{:22}'' identifies the port number to connect to the server
  24.426 -  on.  The default port is~22, so you only need to specify this part
  24.427 -  if you're \emph{not} using port~22.
  24.428 -\item The remainder of the URL is the local path to the repository on
  24.429 -  the server.
  24.430 -\end{enumerate}
  24.431 -
  24.432 -There's plenty of scope for confusion with the path component of ssh
  24.433 -URLs, as there is no standard way for tools to interpret it.  Some
  24.434 -programs behave differently than others when dealing with these paths.
  24.435 -This isn't an ideal situation, but it's unlikely to change.  Please
  24.436 -read the following paragraphs carefully.
  24.437 -
  24.438 -Mercurial treats the path to a repository on the server as relative to
  24.439 -the remote user's home directory.  For example, if user \texttt{foo}
  24.440 -on the server has a home directory of \dirname{/home/foo}, then an ssh
  24.441 -URL that contains a path component of \dirname{bar}
  24.442 -\emph{really} refers to the directory \dirname{/home/foo/bar}.
  24.443 -
  24.444 -If you want to specify a path relative to another user's home
  24.445 -directory, you can use a path that starts with a tilde character
  24.446 -followed by the user's name (let's call them \texttt{otheruser}), like
  24.447 -this.
  24.448 -\begin{codesample2}
  24.449 -  ssh://server/~otheruser/hg/repo
  24.450 -\end{codesample2}
  24.451 -
  24.452 -And if you really want to specify an \emph{absolute} path on the
  24.453 -server, begin the path component with two slashes, as in this example.
  24.454 -\begin{codesample2}
  24.455 -  ssh://server//absolute/path
  24.456 -\end{codesample2}
  24.457 -
  24.458 -\subsection{Finding an ssh client for your system}
  24.459 -
  24.460 -Almost every Unix-like system comes with OpenSSH preinstalled.  If
  24.461 -you're using such a system, run \Verb|which ssh| to find out if
  24.462 -the \command{ssh} command is installed (it's usually in
  24.463 -\dirname{/usr/bin}).  In the unlikely event that it isn't present,
  24.464 -take a look at your system documentation to figure out how to install
  24.465 -it.
  24.466 -
  24.467 -On Windows, you'll first need to download a suitable ssh
  24.468 -client.  There are two alternatives.
  24.469 -\begin{itemize}
  24.470 -\item Simon Tatham's excellent PuTTY package~\cite{web:putty} provides
  24.471 -  a complete suite of ssh client commands.
  24.472 -\item If you have a high tolerance for pain, you can use the Cygwin
  24.473 -  port of OpenSSH.
  24.474 -\end{itemize}
  24.475 -In either case, you'll need to edit your \hgini\ file to tell
  24.476 -Mercurial where to find the actual client command.  For example, if
  24.477 -you're using PuTTY, you'll need to use the \command{plink} command as
  24.478 -a command-line ssh client.
  24.479 -\begin{codesample2}
  24.480 -  [ui]
  24.481 -  ssh = C:/path/to/plink.exe -ssh -i "C:/path/to/my/private/key"
  24.482 -\end{codesample2}
  24.483 -
  24.484 -\begin{note}
  24.485 -  The path to \command{plink} shouldn't contain any whitespace
  24.486 -  characters, or Mercurial may not be able to run it correctly (so
  24.487 -  putting it in \dirname{C:\\Program Files} is probably not a good
  24.488 -  idea).
  24.489 -\end{note}
  24.490 -
  24.491 -\subsection{Generating a key pair}
  24.492 -
  24.493 -To avoid the need to repeatedly type a password every time you need
  24.494 -to use your ssh client, I recommend generating a key pair.  On a
  24.495 -Unix-like system, the \command{ssh-keygen} command will do the trick.
  24.496 -On Windows, if you're using PuTTY, the \command{puttygen} command is
  24.497 -what you'll need.
  24.498 -
  24.499 -When you generate a key pair, it's usually \emph{highly} advisable to
  24.500 -protect it with a passphrase.  (The only time that you might not want
  24.501 -to do this is when you're using the ssh protocol for automated tasks
  24.502 -on a secure network.)
  24.503 -
  24.504 -Simply generating a key pair isn't enough, however.  You'll need to
  24.505 -add the public key to the set of authorised keys for whatever user
  24.506 -you're logging in remotely as.  For servers using OpenSSH (the vast
  24.507 -majority), this will mean adding the public key to a list in a file
  24.508 -called \sfilename{authorized\_keys} in their \sdirname{.ssh}
  24.509 -directory.
  24.510 -
  24.511 -On a Unix-like system, your public key will have a \filename{.pub}
  24.512 -extension.  If you're using \command{puttygen} on Windows, you can
  24.513 -save the public key to a file of your choosing, or paste it from the
  24.514 -window it's displayed in straight into the
  24.515 -\sfilename{authorized\_keys} file.
  24.516 -
  24.517 -\subsection{Using an authentication agent}
  24.518 -
  24.519 -An authentication agent is a daemon that stores passphrases in memory
  24.520 -(so it will forget passphrases if you log out and log back in again).
  24.521 -An ssh client will notice if it's running, and query it for a
  24.522 -passphrase.  If there's no authentication agent running, or the agent
  24.523 -doesn't store the necessary passphrase, you'll have to type your
  24.524 -passphrase every time Mercurial tries to communicate with a server on
  24.525 -your behalf (e.g.~whenever you pull or push changes).
  24.526 -
  24.527 -The downside of storing passphrases in an agent is that it's possible
  24.528 -for a well-prepared attacker to recover the plain text of your
  24.529 -passphrases, in some cases even if your system has been power-cycled.
  24.530 -You should make your own judgment as to whether this is an acceptable
  24.531 -risk.  It certainly saves a lot of repeated typing.
  24.532 -
  24.533 -On Unix-like systems, the agent is called \command{ssh-agent}, and
  24.534 -it's often run automatically for you when you log in.  You'll need to
  24.535 -use the \command{ssh-add} command to add passphrases to the agent's
  24.536 -store.  On Windows, if you're using PuTTY, the \command{pageant}
  24.537 -command acts as the agent.  It adds an icon to your system tray that
  24.538 -will let you manage stored passphrases.
  24.539 -
  24.540 -\subsection{Configuring the server side properly}
  24.541 -
  24.542 -Because ssh can be fiddly to set up if you're new to it, there's a
  24.543 -variety of things that can go wrong.  Add Mercurial on top, and
  24.544 -there's plenty more scope for head-scratching.  Most of these
  24.545 -potential problems occur on the server side, not the client side.  The
  24.546 -good news is that once you've gotten a configuration working, it will
  24.547 -usually continue to work indefinitely.
  24.548 -
  24.549 -Before you try using Mercurial to talk to an ssh server, it's best to
  24.550 -make sure that you can use the normal \command{ssh} or \command{putty}
  24.551 -command to talk to the server first.  If you run into problems with
  24.552 -using these commands directly, Mercurial surely won't work.  Worse, it
  24.553 -will obscure the underlying problem.  Any time you want to debug
  24.554 -ssh-related Mercurial problems, you should drop back to making sure
  24.555 -that plain ssh client commands work first, \emph{before} you worry
  24.556 -about whether there's a problem with Mercurial.
  24.557 -
  24.558 -The first thing to be sure of on the server side is that you can
  24.559 -actually log in from another machine at all.  If you can't use
  24.560 -\command{ssh} or \command{putty} to log in, the error message you get
  24.561 -may give you a few hints as to what's wrong.  The most common problems
  24.562 -are as follows.
  24.563 -\begin{itemize}
  24.564 -\item If you get a ``connection refused'' error, either there isn't an
  24.565 -  SSH daemon running on the server at all, or it's inaccessible due to
  24.566 -  firewall configuration.
  24.567 -\item If you get a ``no route to host'' error, you either have an
  24.568 -  incorrect address for the server or a seriously locked down firewall
  24.569 -  that won't admit its existence at all.
  24.570 -\item If you get a ``permission denied'' error, you may have mistyped
  24.571 -  the username on the server, or you could have mistyped your key's
  24.572 -  passphrase or the remote user's password.
  24.573 -\end{itemize}
  24.574 -In summary, if you're having trouble talking to the server's ssh
  24.575 -daemon, first make sure that one is running at all.  On many systems
  24.576 -it will be installed, but disabled, by default.  Once you're done with
  24.577 -this step, you should then check that the server's firewall is
  24.578 -configured to allow incoming connections on the port the ssh daemon is
  24.579 -listening on (usually~22).  Don't worry about more exotic
  24.580 -possibilities for misconfiguration until you've checked these two
  24.581 -first.
  24.582 -
  24.583 -If you're using an authentication agent on the client side to store
  24.584 -passphrases for your keys, you ought to be able to log into the server
  24.585 -without being prompted for a passphrase or a password.  If you're
  24.586 -prompted for a passphrase, there are a few possible culprits.
  24.587 -\begin{itemize}
  24.588 -\item You might have forgotten to use \command{ssh-add} or
  24.589 -  \command{pageant} to store the passphrase.
  24.590 -\item You might have stored the passphrase for the wrong key.
  24.591 -\end{itemize}
  24.592 -If you're being prompted for the remote user's password, there are
  24.593 -another few possible problems to check.
  24.594 -\begin{itemize}
  24.595 -\item Either the user's home directory or their \sdirname{.ssh}
  24.596 -  directory might have excessively liberal permissions.  As a result,
  24.597 -  the ssh daemon will not trust or read their
  24.598 -  \sfilename{authorized\_keys} file.  For example, a group-writable
  24.599 -  home or \sdirname{.ssh} directory will often cause this symptom.
  24.600 -\item The user's \sfilename{authorized\_keys} file may have a problem.
  24.601 -  If anyone other than the user owns or can write to that file, the
  24.602 -  ssh daemon will not trust or read it.
  24.603 -\end{itemize}
  24.604 -
  24.605 -In an ideal world, you should be able to run the following command
  24.606 -successfully, and it should print exactly one line of output, the
  24.607 -current date and time.
  24.608 -\begin{codesample2}
  24.609 -  ssh myserver date
  24.610 -\end{codesample2}
  24.611 -
  24.612 -If, on your server, you have login scripts that print banners or other
  24.613 -junk even when running non-interactive commands like this, you should
  24.614 -fix them before you continue, so that they only print output if
  24.615 -they're run interactively.  Otherwise these banners will at least
  24.616 -clutter up Mercurial's output.  Worse, they could potentially cause
  24.617 -problems with running Mercurial commands remotely.  Mercurial
  24.618 -tries to detect and ignore banners in non-interactive \command{ssh}
  24.619 -sessions, but it is not foolproof.  (If you're editing your login
  24.620 -scripts on your server, the usual way to see if a login script is
  24.621 -running in an interactive shell is to check the return code from the
  24.622 -command \Verb|tty -s|.)
  24.623 -
  24.624 -Once you've verified that plain old ssh is working with your server,
  24.625 -the next step is to ensure that Mercurial runs on the server.  The
  24.626 -following command should run successfully:
  24.627 -\begin{codesample2}
  24.628 -  ssh myserver hg version
  24.629 -\end{codesample2}
  24.630 -If you see an error message instead of normal \hgcmd{version} output,
  24.631 -this is usually because you haven't installed Mercurial to
  24.632 -\dirname{/usr/bin}.  Don't worry if this is the case; you don't need
  24.633 -to do that.  But you should check for a few possible problems.
  24.634 -\begin{itemize}
  24.635 -\item Is Mercurial really installed on the server at all?  I know this
  24.636 -  sounds trivial, but it's worth checking!
  24.637 -\item Maybe your shell's search path (usually set via the \envar{PATH}
  24.638 -  environment variable) is simply misconfigured.
  24.639 -\item Perhaps your \envar{PATH} environment variable is only being set
  24.640 -  to point to the location of the \command{hg} executable if the login
  24.641 -  session is interactive.  This can happen if you're setting the path
  24.642 -  in the wrong shell login script.  See your shell's documentation for
  24.643 -  details.
  24.644 -\item The \envar{PYTHONPATH} environment variable may need to contain
  24.645 -  the path to the Mercurial Python modules.  It might not be set at
  24.646 -  all; it could be incorrect; or it may be set only if the login is
  24.647 -  interactive.
  24.648 -\end{itemize}
  24.649 -
  24.650 -If you can run \hgcmd{version} over an ssh connection, well done!
  24.651 -You've got the server and client sorted out.  You should now be able
  24.652 -to use Mercurial to access repositories hosted by that username on
  24.653 -that server.  If you run into problems with Mercurial and ssh at this
  24.654 -point, try using the \hggopt{--debug} option to get a clearer picture
  24.655 -of what's going on.
  24.656 -
  24.657 -\subsection{Using compression with ssh}
  24.658 -
  24.659 -Mercurial does not compress data when it uses the ssh protocol,
  24.660 -because the ssh protocol can transparently compress data.  However,
  24.661 -the default behaviour of ssh clients is \emph{not} to request
  24.662 -compression.
  24.663 -
  24.664 -Over any network other than a fast LAN (even a wireless network),
  24.665 -using compression is likely to significantly speed up Mercurial's
  24.666 -network operations.  For example, over a WAN, someone measured
  24.667 -compression as reducing the amount of time required to clone a
  24.668 -particularly large repository from~51 minutes to~17 minutes.
  24.669 -
  24.670 -Both \command{ssh} and \command{plink} accept a \cmdopt{ssh}{-C}
  24.671 -option which turns on compression.  You can easily edit your \hgrc\ to
  24.672 -enable compression for all of Mercurial's uses of the ssh protocol.
  24.673 -\begin{codesample2}
  24.674 -  [ui]
  24.675 -  ssh = ssh -C
  24.676 -\end{codesample2}
  24.677 -
  24.678 -If you use \command{ssh}, you can configure it to always use
  24.679 -compression when talking to your server.  To do this, edit your
  24.680 -\sfilename{.ssh/config} file (which may not yet exist), as follows.
  24.681 -\begin{codesample2}
  24.682 -  Host hg
  24.683 -    Compression yes
  24.684 -    HostName hg.example.com
  24.685 -\end{codesample2}
  24.686 -This defines an alias, \texttt{hg}.  When you use it on the
  24.687 -\command{ssh} command line or in a Mercurial \texttt{ssh}-protocol
  24.688 -URL, it will cause \command{ssh} to connect to \texttt{hg.example.com}
  24.689 -and use compression.  This gives you both a shorter name to type and
  24.690 -compression, each of which is a good thing in its own right.
  24.691 -
  24.692 -\section{Serving over HTTP using CGI}
  24.693 -\label{sec:collab:cgi}
  24.694 -
  24.695 -Depending on how ambitious you are, configuring Mercurial's CGI
  24.696 -interface can take anything from a few moments to several hours.
  24.697 -
  24.698 -We'll begin with the simplest of examples, and work our way towards a
  24.699 -more complex configuration.  Even for the most basic case, you're
  24.700 -almost certainly going to need to read and modify your web server's
  24.701 -configuration.
  24.702 -
  24.703 -\begin{note}
  24.704 -  Configuring a web server is a complex, fiddly, and highly
  24.705 -  system-dependent activity.  I can't possibly give you instructions
  24.706 -  that will cover anything like all of the cases you will encounter.
  24.707 -  Please use your discretion and judgment in following the sections
  24.708 -  below.  Be prepared to make plenty of mistakes, and to spend a lot
  24.709 -  of time reading your server's error logs.
  24.710 -\end{note}
  24.711 -
  24.712 -\subsection{Web server configuration checklist}
  24.713 -
  24.714 -Before you continue, do take a few moments to check a few aspects of
  24.715 -your system's setup.
  24.716 -
  24.717 -\begin{enumerate}
  24.718 -\item Do you have a web server installed at all?  Mac OS X ships with
  24.719 -  Apache, but many other systems may not have a web server installed.
  24.720 -\item If you have a web server installed, is it actually running?  On
  24.721 -  most systems, even if one is present, it will be disabled by
  24.722 -  default.
  24.723 -\item Is your server configured to allow you to run CGI programs in
  24.724 -  the directory where you plan to do so?  Most servers default to
  24.725 -  explicitly disabling the ability to run CGI programs.
  24.726 -\end{enumerate}
  24.727 -
  24.728 -If you don't have a web server installed, and don't have substantial
  24.729 -experience configuring Apache, you should consider using the
  24.730 -\texttt{lighttpd} web server instead of Apache.  Apache has a
  24.731 -well-deserved reputation for baroque and confusing configuration.
  24.732 -While \texttt{lighttpd} is less capable in some ways than Apache, most
  24.733 -of these capabilities are not relevant to serving Mercurial
  24.734 -repositories.  And \texttt{lighttpd} is undeniably \emph{much} easier
  24.735 -to get started with than Apache.
  24.736 -
  24.737 -\subsection{Basic CGI configuration}
  24.738 -
  24.739 -On Unix-like systems, it's common for users to have a subdirectory
  24.740 -named something like \dirname{public\_html} in their home directory,
  24.741 -from which they can serve up web pages.  A file named \filename{foo}
  24.742 -in this directory will be accessible at a URL of the form
  24.743 -\texttt{http://www.example.com/\~{}username/foo}.
  24.744 -
  24.745 -To get started, find the \sfilename{hgweb.cgi} script that should be
  24.746 -present in your Mercurial installation.  If you can't quickly find a
  24.747 -local copy on your system, simply download one from the master
  24.748 -Mercurial repository at
  24.749 -\url{http://www.selenic.com/repo/hg/raw-file/tip/hgweb.cgi}.
  24.750 -
  24.751 -You'll need to copy this script into your \dirname{public\_html}
  24.752 -directory, and ensure that it's executable.
  24.753 -\begin{codesample2}
  24.754 -  cp .../hgweb.cgi ~/public_html
  24.755 -  chmod 755 ~/public_html/hgweb.cgi
  24.756 -\end{codesample2}
  24.757 -The \texttt{755} argument to \command{chmod} is a little more general
  24.758 -than just making the script executable: it ensures that the script is
  24.759 -executable by anyone, and that ``group'' and ``other'' write
  24.760 -permissions are \emph{not} set.  If you were to leave those write
  24.761 -permissions enabled, Apache's \texttt{suexec} subsystem would likely
  24.762 -refuse to execute the script.  In fact, \texttt{suexec} also insists
  24.763 -that the \emph{directory} in which the script resides must not be
  24.764 -writable by others.
  24.765 -\begin{codesample2}
  24.766 -  chmod 755 ~/public_html
  24.767 -\end{codesample2}
  24.768 -
  24.769 -\subsubsection{What could \emph{possibly} go wrong?}
  24.770 -\label{sec:collab:wtf}
  24.771 -
  24.772 -Once you've copied the CGI script into place, go into a web browser,
  24.773 -and try to open the URL \url{http://myhostname/~myuser/hgweb.cgi},
  24.774 -\emph{but} brace yourself for instant failure.  There's a high
  24.775 -probability that trying to visit this URL will fail, and there are
  24.776 -many possible reasons for this.  In fact, you're likely to stumble
  24.777 -over almost every one of the possible errors below, so please read
  24.778 -carefully.  The following are all of the problems I ran into on a
  24.779 -system running Fedora~7, with a fresh installation of Apache, and a
  24.780 -user account that I created specially to perform this exercise.
  24.781 -
  24.782 -Your web server may have per-user directories disabled.  If you're
  24.783 -using Apache, search your config file for a \texttt{UserDir}
  24.784 -directive.  If there's none present, per-user directories will be
  24.785 -disabled.  If one exists, but its value is \texttt{disabled}, then
  24.786 -per-user directories will be disabled.  Otherwise, the string after
  24.787 -\texttt{UserDir} gives the name of the subdirectory that Apache will
  24.788 -look in under your home directory, for example \dirname{public\_html}.
  24.789 -
  24.790 -Your file access permissions may be too restrictive.  The web server
  24.791 -must be able to traverse your home directory and directories under
  24.792 -your \dirname{public\_html} directory, and read files under the latter
  24.793 -too.  Here's a quick recipe to help you to make your permissions more
  24.794 -appropriate.
  24.795 -\begin{codesample2}
  24.796 -  chmod 755 ~
  24.797 -  find ~/public_html -type d -print0 | xargs -0r chmod 755
  24.798 -  find ~/public_html -type f -print0 | xargs -0r chmod 644
  24.799 -\end{codesample2}
  24.800 -
  24.801 -The other possibility with permissions is that you might get a
  24.802 -completely empty window when you try to load the script.  In this
  24.803 -case, it's likely that your access permissions are \emph{too
  24.804 -  permissive}.  Apache's \texttt{suexec} subsystem won't execute a
  24.805 -script that's group-~or world-writable, for example.
  24.806 -
  24.807 -Your web server may be configured to disallow execution of CGI
  24.808 -programs in your per-user web directory.  Here's Apache's
  24.809 -default per-user configuration from my Fedora system.
  24.810 -\begin{codesample2}
  24.811 -  <Directory /home/*/public_html>
  24.812 -      AllowOverride FileInfo AuthConfig Limit
  24.813 -      Options MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec
  24.814 -      <Limit GET POST OPTIONS>
  24.815 -          Order allow,deny
  24.816 -          Allow from all
  24.817 -      </Limit>
  24.818 -      <LimitExcept GET POST OPTIONS>
  24.819 -          Order deny,allow
  24.820 -          Deny from all
  24.821 -      </LimitExcept>
  24.822 -  </Directory>
  24.823 -\end{codesample2}
  24.824 -If you find a similar-looking \texttt{Directory} group in your Apache
  24.825 -configuration, the directive to look at inside it is \texttt{Options}.
  24.826 -Add \texttt{ExecCGI} to the end of this list if it's missing, and
  24.827 -restart the web server.
  24.828 -
  24.829 -If you find that Apache serves you the text of the CGI script instead
  24.830 -of executing it, you may need to either uncomment (if already present)
  24.831 -or add a directive like this.
  24.832 -\begin{codesample2}
  24.833 -  AddHandler cgi-script .cgi
  24.834 -\end{codesample2}
  24.835 -
  24.836 -The next possibility is that you might be served with a colourful
  24.837 -Python backtrace claiming that it can't import a
  24.838 -\texttt{mercurial}-related module.  This is actually progress!  The
  24.839 -server is now capable of executing your CGI script.  This error is
  24.840 -only likely to occur if you're running a private installation of
  24.841 -Mercurial, instead of a system-wide version.  Remember that the web
  24.842 -server runs the CGI program without any of the environment variables
  24.843 -that you take for granted in an interactive session.  If this error
  24.844 -happens to you, edit your copy of \sfilename{hgweb.cgi} and follow the
  24.845 -directions inside it to correctly set your \envar{PYTHONPATH}
  24.846 -environment variable.
  24.847 -
  24.848 -Finally, you are \emph{certain} to be served with another colourful
  24.849 -Python backtrace: this one will complain that it can't find
  24.850 -\dirname{/path/to/repository}.  Edit your \sfilename{hgweb.cgi} script
  24.851 -and replace the \dirname{/path/to/repository} string with the complete
  24.852 -path to the repository you want to serve up.
  24.853 -
  24.854 -At this point, when you try to reload the page, you should be
  24.855 -presented with a nice HTML view of your repository's history.  Whew!
  24.856 -
  24.857 -\subsubsection{Configuring lighttpd}
  24.858 -
  24.859 -To be exhaustive in my experiments, I tried configuring the
  24.860 -increasingly popular \texttt{lighttpd} web server to serve the same
  24.861 -repository as I described with Apache above.  I had already overcome
  24.862 -all of the problems I outlined with Apache, many of which are not
  24.863 -server-specific.  As a result, I was fairly sure that my file and
  24.864 -directory permissions were good, and that my \sfilename{hgweb.cgi}
  24.865 -script was properly edited.
  24.866 -
  24.867 -Once I had Apache running, getting \texttt{lighttpd} to serve the
  24.868 -repository was a snap (in other words, even if you're trying to use
  24.869 -\texttt{lighttpd}, you should read the Apache section).  I first had
  24.870 -to edit the \texttt{mod\_access} section of its config file to enable
  24.871 -\texttt{mod\_cgi} and \texttt{mod\_userdir}, both of which were
  24.872 -disabled by default on my system.  I then added a few lines to the end
  24.873 -of the config file, to configure these modules.
  24.874 -\begin{codesample2}
  24.875 -  userdir.path = "public_html"
  24.876 -  cgi.assign = ( ".cgi" => "" )
  24.877 -\end{codesample2}
  24.878 -With this done, \texttt{lighttpd} ran immediately for me.  If I had
  24.879 -configured \texttt{lighttpd} before Apache, I'd almost certainly have
  24.880 -run into many of the same system-level configuration problems as I did
  24.881 -with Apache.  However, I found \texttt{lighttpd} to be noticeably
  24.882 -easier to configure than Apache, even though I've used Apache for over
  24.883 -a decade, and this was my first exposure to \texttt{lighttpd}.
  24.884 -
  24.885 -\subsection{Sharing multiple repositories with one CGI script}
  24.886 -
  24.887 -The \sfilename{hgweb.cgi} script only lets you publish a single
  24.888 -repository, which is an annoying restriction.  If you want to publish
  24.889 -more than one without burdening yourself with multiple copies of the
  24.890 -same script, each with different names, a better choice is to use the
  24.891 -\sfilename{hgwebdir.cgi} script.
  24.892 -
  24.893 -The procedure to configure \sfilename{hgwebdir.cgi} is only a little
  24.894 -more involved than for \sfilename{hgweb.cgi}.  First, you must obtain
  24.895 -a copy of the script.  If you don't have one handy, you can download a
  24.896 -copy from the master Mercurial repository at
  24.897 -\url{http://www.selenic.com/repo/hg/raw-file/tip/hgwebdir.cgi}.
  24.898 -
  24.899 -You'll need to copy this script into your \dirname{public\_html}
  24.900 -directory, and ensure that it's executable.
  24.901 -\begin{codesample2}
  24.902 -  cp .../hgwebdir.cgi ~/public_html
  24.903 -  chmod 755 ~/public_html ~/public_html/hgwebdir.cgi
  24.904 -\end{codesample2}
  24.905 -With basic configuration out of the way, try to visit
  24.906 -\url{http://myhostname/~myuser/hgwebdir.cgi} in your browser.  It
  24.907 -should display an empty list of repositories.  If you get a blank
  24.908 -window or error message, try walking through the list of potential
  24.909 -problems in section~\ref{sec:collab:wtf}.
  24.910 -
  24.911 -The \sfilename{hgwebdir.cgi} script relies on an external
  24.912 -configuration file.  By default, it searches for a file named
  24.913 -\sfilename{hgweb.config} in the same directory as itself.  You'll need
  24.914 -to create this file, and make it world-readable.  The format of the
  24.915 -file is similar to a Windows ``ini'' file, as understood by Python's
  24.916 -\texttt{ConfigParser}~\cite{web:configparser} module.
  24.917 -
  24.918 -The easiest way to configure \sfilename{hgwebdir.cgi} is with a
  24.919 -section named \texttt{collections}.  This will automatically publish
  24.920 -\emph{every} repository under the directories you name.  The section
  24.921 -should look like this:
  24.922 -\begin{codesample2}
  24.923 -  [collections]
  24.924 -  /my/root = /my/root
  24.925 -\end{codesample2}
  24.926 -Mercurial interprets this by looking at the directory name on the
  24.927 -\emph{right} hand side of the ``\texttt{=}'' sign; finding
  24.928 -repositories in that directory hierarchy; and using the text on the
  24.929 -\emph{left} to strip off matching text from the names it will actually
  24.930 -list in the web interface.  The remaining component of a path after
  24.931 -this stripping has occurred is called a ``virtual path''.
  24.932 -
  24.933 -Given the example above, if we have a repository whose local path is
  24.934 -\dirname{/my/root/this/repo}, the CGI script will strip the leading
  24.935 -\dirname{/my/root} from the name, and publish the repository with a
  24.936 -virtual path of \dirname{this/repo}.  If the base URL for our CGI
  24.937 -script is \url{http://myhostname/~myuser/hgwebdir.cgi}, the complete
  24.938 -URL for that repository will be
  24.939 -\url{http://myhostname/~myuser/hgwebdir.cgi/this/repo}.
  24.940 -
  24.941 -If we replace \dirname{/my/root} on the left hand side of this example
  24.942 -with \dirname{/my}, then \sfilename{hgwebdir.cgi} will only strip off
  24.943 -\dirname{/my} from the repository name, and will give us a virtual
  24.944 -path of \dirname{root/this/repo} instead of \dirname{this/repo}.
  24.945 -
  24.946 -The \sfilename{hgwebdir.cgi} script will recursively search each
  24.947 -directory listed in the \texttt{collections} section of its
  24.948 -configuration file, but it will \emph{not} recurse into the
  24.949 -repositories it finds.
  24.950 -
  24.951 -The \texttt{collections} mechanism makes it easy to publish many
  24.952 -repositories in a ``fire and forget'' manner.  You only need to set up
  24.953 -the CGI script and configuration file one time.  Afterwards, you can
  24.954 -publish or unpublish a repository at any time by simply moving it
  24.955 -into, or out of, the directory hierarchy in which you've configured
  24.956 -\sfilename{hgwebdir.cgi} to look.
  24.957 -
  24.958 -\subsubsection{Explicitly specifying which repositories to publish}
  24.959 -
  24.960 -In addition to the \texttt{collections} mechanism, the
  24.961 -\sfilename{hgwebdir.cgi} script allows you to publish a specific list
  24.962 -of repositories.  To do so, create a \texttt{paths} section, with
  24.963 -contents of the following form.
  24.964 -\begin{codesample2}
  24.965 -  [paths]
  24.966 -  repo1 = /my/path/to/some/repo
  24.967 -  repo2 = /some/path/to/another
  24.968 -\end{codesample2}
  24.969 -In this case, the virtual path (the component that will appear in a
  24.970 -URL) is on the left hand side of each definition, while the path to
  24.971 -the repository is on the right.  Notice that there does not need to be
  24.972 -any relationship between the virtual path you choose and the location
  24.973 -of a repository in your filesystem.
  24.974 -
  24.975 -If you wish, you can use both the \texttt{collections} and
  24.976 -\texttt{paths} mechanisms simultaneously in a single configuration
  24.977 -file.
  24.978 -
  24.979 -\begin{note}
  24.980 -  If multiple repositories have the same virtual path,
  24.981 -  \sfilename{hgwebdir.cgi} will not report an error.  Instead, it will
  24.982 -  behave unpredictably.
  24.983 -\end{note}
  24.984 -
  24.985 -\subsection{Downloading source archives}
  24.986 -
  24.987 -Mercurial's web interface lets users download an archive of any
  24.988 -revision.  This archive will contain a snapshot of the working
  24.989 -directory as of that revision, but it will not contain a copy of the
  24.990 -repository data.
  24.991 -
  24.992 -By default, this feature is not enabled.  To enable it, you'll need to
  24.993 -add an \rcitem{web}{allow\_archive} item to the \rcsection{web}
  24.994 -section of your \hgrc.
  24.995 -
  24.996 -\subsection{Web configuration options}
  24.997 -
  24.998 -Mercurial's web interfaces (the \hgcmd{serve} command, and the
  24.999 -\sfilename{hgweb.cgi} and \sfilename{hgwebdir.cgi} scripts) have a
 24.1000 -number of configuration options that you can set.  These belong in a
 24.1001 -section named \rcsection{web}.
 24.1002 -\begin{itemize}
 24.1003 -\item[\rcitem{web}{allow\_archive}] Determines which (if any) archive
 24.1004 -  download mechanisms Mercurial supports.  If you enable this
 24.1005 -  feature, users of the web interface will be able to download an
 24.1006 -  archive of whatever revision of a repository they are viewing.
 24.1007 -  To enable the archive feature, this item must take the form of a
 24.1008 -  sequence of words drawn from the list below.
 24.1009 -  \begin{itemize}
 24.1010 -  \item[\texttt{bz2}] A \command{tar} archive, compressed using
 24.1011 -    \texttt{bzip2} compression.  This has the best compression ratio,
 24.1012 -    but uses the most CPU time on the server.
 24.1013 -  \item[\texttt{gz}] A \command{tar} archive, compressed using
 24.1014 -    \texttt{gzip} compression.
 24.1015 -  \item[\texttt{zip}] A \command{zip} archive, compressed using DEFLATE
 24.1016 -    compression.  This format has the worst compression ratio, but is
 24.1017 -    widely used in the Windows world.
 24.1018 -  \end{itemize}
 24.1019 -  If you provide an empty list, or don't have an
 24.1020 -  \rcitem{web}{allow\_archive} entry at all, this feature will be
 24.1021 -  disabled.  Here is an example of how to enable all three supported
 24.1022 -  formats.
 24.1023 -  \begin{codesample4}
 24.1024 -    [web]
 24.1025 -    allow_archive = bz2 gz zip
 24.1026 -  \end{codesample4}
 24.1027 -\item[\rcitem{web}{allowpull}] Boolean.  Determines whether the web
 24.1028 -  interface allows remote users to \hgcmd{pull} and \hgcmd{clone} this
 24.1029 -  repository over~HTTP.  If set to \texttt{no} or \texttt{false}, only
 24.1030 -  the ``human-oriented'' portion of the web interface is available.
 24.1031 -\item[\rcitem{web}{contact}] String.  A free-form (but preferably
 24.1032 -  brief) string identifying the person or group in charge of the
 24.1033 -  repository.  This often contains the name and email address of a
 24.1034 -  person or mailing list.  It often makes sense to place this entry in
 24.1035 -  a repository's own \sfilename{.hg/hgrc} file, but it can make sense
 24.1036 -  to use in a global \hgrc\ if every repository has a single
 24.1037 -  maintainer.
 24.1038 -\item[\rcitem{web}{maxchanges}] Integer.  The default maximum number
 24.1039 -  of changesets to display in a single page of output.
 24.1040 -\item[\rcitem{web}{maxfiles}] Integer.  The default maximum number
 24.1041 -  of modified files to display in a single page of output.
 24.1042 -\item[\rcitem{web}{stripes}] Integer.  If the web interface displays
 24.1043 -  alternating ``stripes'' to make it easier to visually align rows
 24.1044 -  when you are looking at a table, this number controls the number of
 24.1045 -  rows in each stripe.
 24.1046 -\item[\rcitem{web}{style}] Controls the template Mercurial uses to
 24.1047 -  display the web interface.  Mercurial ships with two web templates,
 24.1048 -  named \texttt{default} and \texttt{gitweb} (the latter is much more
 24.1049 -  visually attractive).  You can also specify a custom template of
 24.1050 -  your own; see chapter~\ref{chap:template} for details.  Here, you
 24.1051 -  can see how to enable the \texttt{gitweb} style.
 24.1052 -  \begin{codesample4}
 24.1053 -    [web]
 24.1054 -    style = gitweb
 24.1055 -  \end{codesample4}
 24.1056 -\item[\rcitem{web}{templates}] Path.  The directory in which to search
 24.1057 -  for template files.  By default, Mercurial searches in the directory
 24.1058 -  in which it was installed.
 24.1059 -\end{itemize}
 24.1060 -If you are using \sfilename{hgwebdir.cgi}, you can place a few
 24.1061 -configuration items in a \rcsection{web} section of the
 24.1062 -\sfilename{hgweb.config} file instead of a \hgrc\ file, for
 24.1063 -convenience.  These items are \rcitem{web}{motd} and
 24.1064 -\rcitem{web}{style}.
 24.1065 -
 24.1066 -\subsubsection{Options specific to an individual repository}
 24.1067 -
 24.1068 -A few \rcsection{web} configuration items ought to be placed in a
 24.1069 -repository's local \sfilename{.hg/hgrc}, rather than a user's or
 24.1070 -global \hgrc.
 24.1071 -\begin{itemize}
 24.1072 -\item[\rcitem{web}{description}] String.  A free-form (but preferably
 24.1073 -  brief) string that describes the contents or purpose of the
 24.1074 -  repository.
 24.1075 -\item[\rcitem{web}{name}] String.  The name to use for the repository
 24.1076 -  in the web interface.  This overrides the default name, which is the
 24.1077 -  last component of the repository's path.
 24.1078 -\end{itemize}
 24.1079 -
 24.1080 -\subsubsection{Options specific to the \hgcmd{serve} command}
 24.1081 -
 24.1082 -Some of the items in the \rcsection{web} section of a \hgrc\ file are
 24.1083 -only for use with the \hgcmd{serve} command.
 24.1084 -\begin{itemize}
 24.1085 -\item[\rcitem{web}{accesslog}] Path.  The name of a file into which to
 24.1086 -  write an access log.  By default, the \hgcmd{serve} command writes
 24.1087 -  this information to standard output, not to a file.  Log entries are
 24.1088 -  written in the standard ``combined'' file format used by almost all
 24.1089 -  web servers.
 24.1090 -\item[\rcitem{web}{address}] String.  The local address on which the
 24.1091 -  server should listen for incoming connections.  By default, the
 24.1092 -  server listens on all addresses.
 24.1093 -\item[\rcitem{web}{errorlog}] Path.  The name of a file into which to
 24.1094 -  write an error log.  By default, the \hgcmd{serve} command writes this
 24.1095 -  information to standard error, not to a file.
 24.1096 -\item[\rcitem{web}{ipv6}] Boolean.  Whether to use the IPv6 protocol.
 24.1097 -  By default, IPv6 is not used. 
 24.1098 -\item[\rcitem{web}{port}] Integer.  The TCP~port number on which the
 24.1099 -  server should listen.  The default port number used is~8000.
 24.1100 -\end{itemize}
 24.1101 -
 24.1102 -\subsubsection{Choosing the right \hgrc\ file to add \rcsection{web}
 24.1103 -  items to}
 24.1104 -
 24.1105 -It is important to remember that a web server like Apache or
 24.1106 -\texttt{lighttpd} will run under a user~ID that is different to yours.
 24.1107 -CGI scripts run by your server, such as \sfilename{hgweb.cgi}, will
 24.1108 -usually also run under that user~ID.
 24.1109 -
 24.1110 -If you add \rcsection{web} items to your own personal \hgrc\ file, CGI
 24.1111 -scripts won't read that \hgrc\ file.  Those settings will thus only
 24.1112 -affect the behaviour of the \hgcmd{serve} command when you run it.  To
 24.1113 -cause CGI scripts to see your settings, either create a \hgrc\ file in
 24.1114 -the home directory of the user ID that runs your web server, or add
 24.1115 -those settings to a system-wide \hgrc\ file.
 24.1116 -
 24.1117 -
 24.1118 -%%% Local Variables: 
 24.1119 -%%% mode: latex
 24.1120 -%%% TeX-master: "00book"
 24.1121 -%%% End: 

    25.1 --- a/en/concepts.tex	Thu Jan 29 22:47:34 2009 -0800
    25.2 +++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
    25.3 @@ -1,577 +0,0 @@
    25.4 -\chapter{Behind the scenes}
    25.5 -\label{chap:concepts}
    25.6 -
    25.7 -Unlike many revision control systems, the concepts upon which
    25.8 -Mercurial is built are simple enough that it's easy to understand how
    25.9 -the software really works.  Knowing this certainly isn't necessary,
   25.10 -but I find it useful to have a ``mental model'' of what's going on.
   25.11 -
   25.12 -This understanding gives me confidence that Mercurial has been
   25.13 -carefully designed to be both \emph{safe} and \emph{efficient}.  And
   25.14 -just as importantly, if it's easy for me to retain a good idea of what
   25.15 -the software is doing when I perform a revision control task, I'm less
   25.16 -likely to be surprised by its behaviour.
   25.17 -
   25.18 -In this chapter, we'll initially cover the core concepts behind
   25.19 -Mercurial's design, then continue to discuss some of the interesting
   25.20 -details of its implementation.
   25.21 -
   25.22 -\section{Mercurial's historical record}
   25.23 -
   25.24 -\subsection{Tracking the history of a single file}
   25.25 -
   25.26 -When Mercurial tracks modifications to a file, it stores the history
   25.27 -of that file in a metadata object called a \emph{filelog}.  Each entry
   25.28 -in the filelog contains enough information to reconstruct one revision
   25.29 -of the file that is being tracked.  Filelogs are stored as files in
   25.30 -the \sdirname{.hg/store/data} directory.  A filelog contains two kinds
   25.31 -of information: revision data, and an index to help Mercurial to find
   25.32 -a revision efficiently.
   25.33 -
   25.34 -A file that is large, or has a lot of history, has its filelog stored
   25.35 -in separate data (``\texttt{.d}'' suffix) and index (``\texttt{.i}''
   25.36 -suffix) files.  For small files without much history, the revision
   25.37 -data and index are combined in a single ``\texttt{.i}'' file.  The
   25.38 -correspondence between a file in the working directory and the filelog
   25.39 -that tracks its history in the repository is illustrated in
   25.40 -figure~\ref{fig:concepts:filelog}.
   25.41 -
   25.42 -\begin{figure}[ht]
   25.43 -  \centering
   25.44 -  \grafix{filelog}
   25.45 -  \caption{Relationships between files in working directory and
   25.46 -    filelogs in repository}
   25.47 -  \label{fig:concepts:filelog}
   25.48 -\end{figure}
   25.49 -
   25.50 -\subsection{Managing tracked files}
   25.51 -
   25.52 -Mercurial uses a structure called a \emph{manifest} to collect
   25.53 -together information about the files that it tracks.  Each entry in
   25.54 -the manifest contains information about the files present in a single
   25.55 -changeset.  An entry records which files are present in the changeset,
   25.56 -the revision of each file, and a few other pieces of file metadata.
   25.57 -
   25.58 -\subsection{Recording changeset information}
   25.59 -
   25.60 -The \emph{changelog} contains information about each changeset.  Each
   25.61 -revision records who committed a change, the changeset comment, other
   25.62 -pieces of changeset-related information, and the revision of the
   25.63 -manifest to use.
   25.64 -
   25.65 -\subsection{Relationships between revisions}
   25.66 -
   25.67 -Within a changelog, a manifest, or a filelog, each revision stores a
   25.68 -pointer to its immediate parent (or to its two parents, if it's a
   25.69 -merge revision).  As I mentioned above, there are also relationships
   25.70 -between revisions \emph{across} these structures, and they are
   25.71 -hierarchical in nature.
   25.72 -
   25.73 -For every changeset in a repository, there is exactly one revision
   25.74 -stored in the changelog.  Each revision of the changelog contains a
   25.75 -pointer to a single revision of the manifest.  A revision of the
   25.76 -manifest stores a pointer to a single revision of each filelog tracked
   25.77 -when that changeset was created.  These relationships are illustrated
   25.78 -in figure~\ref{fig:concepts:metadata}.
   25.79 -
   25.80 -\begin{figure}[ht]
   25.81 -  \centering
   25.82 -  \grafix{metadata}
   25.83 -  \caption{Metadata relationships}
   25.84 -  \label{fig:concepts:metadata}
   25.85 -\end{figure}
   25.86 -
   25.87 -As the illustration shows, there is \emph{not} a ``one to one''
   25.88 -relationship between revisions in the changelog, manifest, or filelog.
   25.89 -If the manifest hasn't changed between two changesets, the changelog
   25.90 -entries for those changesets will point to the same revision of the
   25.91 -manifest.  If a file that Mercurial tracks hasn't changed between two
   25.92 -changesets, the entry for that file in the two revisions of the
   25.93 -manifest will point to the same revision of its filelog.
   25.94 -
   25.95 -\section{Safe, efficient storage}
   25.96 -
   25.97 -The underpinnings of changelogs, manifests, and filelogs are provided
   25.98 -by a single structure called the \emph{revlog}.
   25.99 -
  25.100 -\subsection{Efficient storage}
  25.101 -
  25.102 -The revlog provides efficient storage of revisions using a
  25.103 -\emph{delta} mechanism.  Instead of storing a complete copy of a file
  25.104 -for each revision, it stores the changes needed to transform an older
  25.105 -revision into the new revision.  For many kinds of file data, these
  25.106 -deltas are typically a fraction of a percent of the size of a full
  25.107 -copy of a file.
  25.108 -
  25.109 -Some obsolete revision control systems can only work with deltas of
  25.110 -text files.  They must either store binary files as complete snapshots
  25.111 -or encoded into a text representation, both of which are wasteful
  25.112 -approaches.  Mercurial can efficiently handle deltas of files with
  25.113 -arbitrary binary contents; it doesn't need to treat text as special.
  25.114 -
  25.115 -\subsection{Safe operation}
  25.116 -\label{sec:concepts:txn}
  25.117 -
  25.118 -Mercurial only ever \emph{appends} data to the end of a revlog file.
  25.119 -It never modifies a section of a file after it has written it.  This
  25.120 -is both more robust and efficient than schemes that need to modify or
  25.121 -rewrite data.
  25.122 -
  25.123 -In addition, Mercurial treats every write as part of a
  25.124 -\emph{transaction} that can span a number of files.  A transaction is
  25.125 -\emph{atomic}: either the entire transaction succeeds and its effects
  25.126 -are all visible to readers in one go, or the whole thing is undone.
  25.127 -This guarantee of atomicity means that if you're running two copies of
  25.128 -Mercurial, where one is reading data and one is writing it, the reader
  25.129 -will never see a partially written result that might confuse it.
  25.130 -
  25.131 -The fact that Mercurial only appends to files makes it easier to
  25.132 -provide this transactional guarantee.  The easier it is to do stuff
  25.133 -like this, the more confident you should be that it's done correctly.
  25.134 -
  25.135 -\subsection{Fast retrieval}
  25.136 -
  25.137 -Mercurial cleverly avoids a pitfall common to many earlier
  25.138 -revision control systems: the problem of \emph{inefficient retrieval}.
  25.139 -Most revision control systems store the contents of a revision as an
  25.140 -incremental series of modifications against a ``snapshot''.  To
  25.141 -reconstruct a specific revision, you must first read the snapshot, and
  25.142 -then every one of the revisions between the snapshot and your target
  25.143 -revision.  The more history that a file accumulates, the more
  25.144 -revisions you must read, hence the longer it takes to reconstruct a
  25.145 -particular revision.
  25.146 -
  25.147 -\begin{figure}[ht]
  25.148 -  \centering
  25.149 -  \grafix{snapshot}
  25.150 -  \caption{Snapshot of a revlog, with incremental deltas}
  25.151 -  \label{fig:concepts:snapshot}
  25.152 -\end{figure}
  25.153 -
  25.154 -The innovation that Mercurial applies to this problem is simple but
  25.155 -effective.  Once the cumulative amount of delta information stored
  25.156 -since the last snapshot exceeds a fixed threshold, it stores a new
  25.157 -snapshot (compressed, of course), instead of another delta.  This
  25.158 -makes it possible to reconstruct \emph{any} revision of a file
  25.159 -quickly.  This approach works so well that it has since been copied by
  25.160 -several other revision control systems.
  25.161 -
  25.162 -Figure~\ref{fig:concepts:snapshot} illustrates the idea.  In an entry
  25.163 -in a revlog's index file, Mercurial stores the range of entries from
  25.164 -the data file that it must read to reconstruct a particular revision.
  25.165 -
  25.166 -\subsubsection{Aside: the influence of video compression}
  25.167 -
  25.168 -If you're familiar with video compression or have ever watched a TV
  25.169 -feed through a digital cable or satellite service, you may know that
  25.170 -most video compression schemes store each frame of video as a delta
  25.171 -against its predecessor frame.  In addition, these schemes use
  25.172 -``lossy'' compression techniques to increase the compression ratio, so
  25.173 -visual errors accumulate over the course of a number of inter-frame
  25.174 -deltas.
  25.175 -
  25.176 -Because it's possible for a video stream to ``drop out'' occasionally
  25.177 -due to signal glitches, and to limit the accumulation of artefacts
  25.178 -introduced by the lossy compression process, video encoders
  25.179 -periodically insert a complete frame (called a ``key frame'') into the
  25.180 -video stream; the next delta is generated against that frame.  This
  25.181 -means that if the video signal gets interrupted, it will resume once
  25.182 -the next key frame is received.  Also, the accumulation of encoding
  25.183 -errors restarts anew with each key frame.
  25.184 -
  25.185 -\subsection{Identification and strong integrity}
  25.186 -
  25.187 -Along with delta or snapshot information, a revlog entry contains a
  25.188 -cryptographic hash of the data that it represents.  This makes it
  25.189 -difficult to forge the contents of a revision, and easy to detect
  25.190 -accidental corruption.  
  25.191 -
  25.192 -Hashes provide more than a mere check against corruption; they are
  25.193 -used as the identifiers for revisions.  The changeset identification
  25.194 -hashes that you see as an end user are from revisions of the
  25.195 -changelog.  Although filelogs and the manifest also use hashes,
  25.196 -Mercurial only uses these behind the scenes.
  25.197 -
  25.198 -Mercurial verifies that hashes are correct when it retrieves file
  25.199 -revisions and when it pulls changes from another repository.  If it
  25.200 -encounters an integrity problem, it will complain and stop whatever
  25.201 -it's doing.
  25.202 -
  25.203 -In addition to the effect it has on retrieval efficiency, Mercurial's
  25.204 -use of periodic snapshots makes it more robust against partial data
  25.205 -corruption.  If a revlog becomes partly corrupted due to a hardware
  25.206 -error or system bug, it's often possible to reconstruct some or most
  25.207 -revisions from the uncorrupted sections of the revlog, both before and
  25.208 -after the corrupted section.  This would not be possible with a
  25.209 -delta-only storage model.
  25.210 -
  25.211 -\section{Revision history, branching,
  25.212 -  and merging}
  25.213 -
  25.214 -Every entry in a Mercurial revlog knows the identity of its immediate
  25.215 -ancestor revision, usually referred to as its \emph{parent}.  In fact,
  25.216 -a revision contains room for not one parent, but two.  Mercurial uses
  25.217 -a special hash, called the ``null ID'', to represent the idea ``there
  25.218 -is no parent here''.  This hash is simply a string of zeroes.
  25.219 -
  25.220 -In figure~\ref{fig:concepts:revlog}, you can see an example of the
  25.221 -conceptual structure of a revlog.  Filelogs, manifests, and changelogs
  25.222 -all have this same structure; they differ only in the kind of data
  25.223 -stored in each delta or snapshot.
  25.224 -
  25.225 -The first revision in a revlog (at the bottom of the image) has the
  25.226 -null ID in both of its parent slots.  For a ``normal'' revision, its
  25.227 -first parent slot contains the ID of its parent revision, and its
  25.228 -second contains the null ID, indicating that the revision has only one
  25.229 -real parent.  Any two revisions that have the same parent ID are
  25.230 -branches.  A revision that represents a merge between branches has two
  25.231 -normal revision IDs in its parent slots.
  25.232 -
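As a rough illustration of the parent-slot rules above, the following Python sketch classifies a revision by looking at its two parent slots. It is not Mercurial's own code: the function name is made up, and the 20-byte all-zero value used for the null ID is an assumption of this sketch.

```python
# Assumption for this sketch: hashes are 20 bytes long, so the null ID
# is 20 zero bytes.
NULL_ID = b"\0" * 20


def classify(p1, p2):
    """Describe a revision based on its two parent slots."""
    if p1 == NULL_ID and p2 == NULL_ID:
        return "root"      # first revision: no parents at all
    if p2 == NULL_ID:
        return "normal"    # one real parent in the first slot
    return "merge"         # two real parents


if __name__ == "__main__":
    a = b"\x01" * 20
    b = b"\x02" * 20
    print(classify(NULL_ID, NULL_ID))
    print(classify(a, NULL_ID))
    print(classify(a, b))
```
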
  25.233 -\begin{figure}[ht]
  25.234 -  \centering
  25.235 -  \grafix{revlog}
  25.236 -  \caption{}
  25.237 -  \label{fig:concepts:revlog}
  25.238 -\end{figure}
  25.239 -
  25.240 -\section{The working directory}
  25.241 -
  25.242 -In the working directory, Mercurial stores a snapshot of the files
  25.243 -from the repository as of a particular changeset.
  25.244 -
  25.245 -The working directory ``knows'' which changeset it contains.  When you
  25.246 -update the working directory to contain a particular changeset,
  25.247 -Mercurial looks up the appropriate revision of the manifest to find
  25.248 -out which files it was tracking at the time that changeset was
  25.249 -committed, and which revision of each file was then current.  It then
  25.250 -recreates a copy of each of those files, with the same contents it had
  25.251 -when the changeset was committed.
  25.252 -
  25.253 -The \emph{dirstate} contains Mercurial's knowledge of the working
  25.254 -directory.  This details which changeset the working directory is
  25.255 -updated to, and all of the files that Mercurial is tracking in the
  25.256 -working directory.
  25.257 -
  25.258 -Just as a revision of a revlog has room for two parents, so that it
  25.259 -can represent either a normal revision (with one parent) or a merge of
  25.260 -two earlier revisions, the dirstate has slots for two parents.  When
  25.261 -you use the \hgcmd{update} command, the changeset that you update to
  25.262 -is stored in the ``first parent'' slot, and the null ID in the second.
  25.263 -When you \hgcmd{merge} with another changeset, the first parent
  25.264 -remains unchanged, and the second parent is filled in with the
  25.265 -changeset you're merging with.  The \hgcmd{parents} command tells you
  25.266 -what the parents of the dirstate are.
  25.267 -
  25.268 -\subsection{What happens when you commit}
  25.269 -
  25.270 -The dirstate stores parent information for more than just book-keeping
  25.271 -purposes.  Mercurial uses the parents of the dirstate as \emph{the
  25.272 -  parents of a new changeset} when you perform a commit.
  25.273 -
  25.274 -\begin{figure}[ht]
  25.275 -  \centering
  25.276 -  \grafix{wdir}
  25.277 -  \caption{The working directory can have two parents}
  25.278 -  \label{fig:concepts:wdir}
  25.279 -\end{figure}
  25.280 -
  25.281 -Figure~\ref{fig:concepts:wdir} shows the normal state of the working
  25.282 -directory, where it has a single changeset as parent.  That changeset
  25.283 -is the \emph{tip}, the newest changeset in the repository that has no
  25.284 -children.
  25.285 -
  25.286 -\begin{figure}[ht]
  25.287 -  \centering
  25.288 -  \grafix{wdir-after-commit}
  25.289 -  \caption{The working directory gains new parents after a commit}
  25.290 -  \label{fig:concepts:wdir-after-commit}
  25.291 -\end{figure}
  25.292 -
  25.293 -It's useful to think of the working directory as ``the changeset I'm
  25.294 -about to commit''.  Any files that you tell Mercurial that you've
  25.295 -added, removed, renamed, or copied will be reflected in that
  25.296 -changeset, as will modifications to any files that Mercurial is
  25.297 -already tracking; the new changeset will have the parents of the
  25.298 -working directory as its parents.
  25.299 -
  25.300 -After a commit, Mercurial will update the parents of the working
  25.301 -directory, so that the first parent is the ID of the new changeset,
  25.302 -and the second is the null ID.  This is shown in
  25.303 -figure~\ref{fig:concepts:wdir-after-commit}.  Mercurial doesn't touch
  25.304 -any of the files in the working directory when you commit; it just
  25.305 -modifies the dirstate to note its new parents.
  25.306 -
  25.307 -\subsection{Creating a new head}
  25.308 -
  25.309 -It's perfectly normal to update the working directory to a changeset
  25.310 -other than the current tip.  For example, you might want to know what
  25.311 -your project looked like last Tuesday, or you could be looking through
  25.312 -changesets to see which one introduced a bug.  In cases like this, the
  25.313 -natural thing to do is update the working directory to the changeset
  25.314 -you're interested in, and then examine the files in the working
  25.315 -directory directly to see their contents as they were when you
  25.316 -committed that changeset.  The effect of this is shown in
  25.317 -figure~\ref{fig:concepts:wdir-pre-branch}.
  25.318 -
  25.319 -\begin{figure}[ht]
  25.320 -  \centering
  25.321 -  \grafix{wdir-pre-branch}
  25.322 -  \caption{The working directory, updated to an older changeset}
  25.323 -  \label{fig:concepts:wdir-pre-branch}
  25.324 -\end{figure}
  25.325 -
  25.326 -Having updated the working directory to an older changeset, what
  25.327 -happens if you make some changes, and then commit?  Mercurial behaves
  25.328 -in the same way as I outlined above.  The parents of the working
  25.329 -directory become the parents of the new changeset.  This new changeset
  25.330 -has no children, so it becomes the new tip.  And the repository now
  25.331 -contains two changesets that have no children; we call these
  25.332 -\emph{heads}.  You can see the structure that this creates in
  25.333 -figure~\ref{fig:concepts:wdir-branch}.
  25.334 -
  25.335 -\begin{figure}[ht]
  25.336 -  \centering
  25.337 -  \grafix{wdir-branch}
  25.338 -  \caption{After a commit made while synced to an older changeset}
  25.339 -  \label{fig:concepts:wdir-branch}
  25.340 -\end{figure}
  25.341 -
  25.342 -\begin{note}
  25.343 -  If you're new to Mercurial, you should keep in mind a common
  25.344 -  ``error'', which is to use the \hgcmd{pull} command without any
  25.345 -  options.  By default, the \hgcmd{pull} command \emph{does not}
  25.346 -  update the working directory, so you'll bring new changesets into
  25.347 -  your repository, but the working directory will stay synced at the
  25.348 -  same changeset as before the pull.  If you make some changes and
  25.349 -  commit afterwards, you'll thus create a new head, because your
  25.350 -  working directory isn't synced to whatever the current tip is.
  25.351 -
  25.352 -  I put the word ``error'' in quotes because all that you need to do
  25.353 -  to rectify this situation is \hgcmd{merge}, then \hgcmd{commit}.  In
  25.354 -  other words, this almost never has negative consequences; it just
  25.355 -  surprises people.  I'll discuss other ways to avoid this behaviour,
  25.356 -  and why Mercurial behaves in this initially surprising way, later
  25.357 -  on.
  25.358 -\end{note}
  25.359 -
  25.360 -\subsection{Merging heads}
  25.361 -
  25.362 -When you run the \hgcmd{merge} command, Mercurial leaves the first
  25.363 -parent of the working directory unchanged, and sets the second parent
  25.364 -to the changeset you're merging with, as shown in
  25.365 -figure~\ref{fig:concepts:wdir-merge}.
  25.366 -
  25.367 -\begin{figure}[ht]
  25.368 -  \centering
  25.369 -  \grafix{wdir-merge}
  25.370 -  \caption{Merging two heads}
  25.371 -  \label{fig:concepts:wdir-merge}
  25.372 -\end{figure}
  25.373 -
  25.374 -Mercurial also has to modify the working directory, to merge the files
  25.375 -managed in the two changesets.  Simplified a little, the merging
  25.376 -process goes like this, for every file in the manifests of both
  25.377 -changesets.
  25.378 -\begin{itemize}
  25.379 -\item If neither changeset has modified a file, do nothing with that
  25.380 -  file.
  25.381 -\item If one changeset has modified a file, and the other hasn't,
  25.382 -  create the modified copy of the file in the working directory.
  25.383 -\item If one changeset has removed a file, and the other hasn't (or
  25.384 -  has also deleted it), delete the file from the working directory.
  25.385 -\item If one changeset has removed a file, but the other has modified
  25.386 -  the file, ask the user what to do: keep the modified file, or remove
  25.387 -  it?
  25.388 -\item If both changesets have modified a file, invoke an external
  25.389 -  merge program to choose the new contents for the merged file.  This
  25.390 -  may require input from the user.
  25.391 -\item If one changeset has modified a file, and the other has renamed
  25.392 -  or copied the file, make sure that the changes follow the new name
  25.393 -  of the file.
  25.394 -\end{itemize}
  25.395 -There are more details---merging has plenty of corner cases---but
  25.396 -these are the most common choices that are involved in a merge.  As
  25.397 -you can see, most cases are completely automatic, and indeed most
  25.398 -merges finish automatically, without requiring your input to resolve
  25.399 -any conflicts.
  25.400 -
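The per-file choices listed above can be written down as a small decision function. This Python sketch is only a simplified model: the state names and returned action strings are invented for illustration, and the rename and copy case is deliberately left out.

```python
def merge_action(local, other):
    """Pick a merge action for one file.

    Each argument is that changeset's view of the file relative to the
    common ancestor: "unchanged", "modified", or "removed".
    (Hypothetical state names; renames and copies are not modelled.)
    """
    states = {local, other}
    if states == {"unchanged"}:
        return "do nothing"
    if "removed" in states:
        if "modified" in states:
            return "ask user"          # removed on one side, modified on the other
        return "remove file"           # removed on one or both sides
    if states == {"modified"}:
        return "run merge program"     # both sides changed the file
    return "take modified copy"        # exactly one side changed the file


if __name__ == "__main__":
    for pair in [("unchanged", "modified"), ("removed", "modified"),
                 ("modified", "modified")]:
        print(pair, "->", merge_action(*pair))
```
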
  25.401 -When you're thinking about what happens when you commit after a merge,
  25.402 -once again the working directory is ``the changeset I'm about to
  25.403 -commit''.  After the \hgcmd{merge} command completes, the working
  25.404 -directory has two parents; these will become the parents of the new
  25.405 -changeset.
  25.406 -
  25.407 -Mercurial lets you perform multiple merges, but you must commit the
  25.408 -results of each individual merge as you go.  This is necessary because
  25.409 -Mercurial only tracks two parents for both revisions and the working
  25.410 -directory.  While it would be technically possible to merge multiple
  25.411 -changesets at once, the prospect of user confusion and making a
  25.412 -terrible mess of a merge immediately becomes overwhelming.
  25.413 -
  25.414 -\section{Other interesting design features}
  25.415 -
  25.416 -In the sections above, I've tried to highlight some of the most
  25.417 -important aspects of Mercurial's design, to illustrate that it pays
  25.418 -careful attention to reliability and performance.  However, the
  25.419 -attention to detail doesn't stop there.  There are a number of other
  25.420 -aspects of Mercurial's construction that I personally find
  25.421 -interesting.  I'll detail a few of them here, separate from the ``big
  25.422 -ticket'' items above, so that if you're interested, you can gain a
  25.423 -better idea of the amount of thinking that goes into a well-designed
  25.424 -system.
  25.425 -
  25.426 -\subsection{Clever compression}
  25.427 -
  25.428 -When appropriate, Mercurial will store both snapshots and deltas in
  25.429 -compressed form.  It does this by always \emph{trying to} compress a
  25.430 -snapshot or delta, but only storing the compressed version if it's
  25.431 -smaller than the uncompressed version.
  25.432 -
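The ``try to compress, keep whichever is smaller'' idea can be sketched in a few lines of Python using the standard \texttt{zlib} module. The boolean flag that records which form was stored is an assumption of this sketch, not a description of Mercurial's on-disk format.

```python
import os
import zlib
from typing import Tuple


def encode(data: bytes) -> Tuple[bool, bytes]:
    """Return (is_compressed, payload), keeping the smaller form."""
    compressed = zlib.compress(data)
    if len(compressed) < len(data):
        return True, compressed
    return False, data  # compression did not help; store as-is


def decode(is_compressed: bool, payload: bytes) -> bytes:
    """Undo encode()."""
    return zlib.decompress(payload) if is_compressed else payload


if __name__ == "__main__":
    text = b"a line of source code\n" * 200
    noise = os.urandom(4096)  # stands in for already-compressed data
    for sample in (text, noise):
        flag, payload = encode(sample)
        print(flag, len(sample), len(payload))
```

Highly repetitive text ends up stored compressed, while data that is already dense (random bytes here, standing in for a \texttt{zip} or JPEG file) is stored unchanged.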
  25.433 -This means that Mercurial does ``the right thing'' when storing a file
  25.434 -whose native form is compressed, such as a \texttt{zip} archive or a
  25.435 -JPEG image.  When these types of files are compressed a second time,
  25.436 -the resulting file is usually bigger than the once-compressed form,
  25.437 -and so Mercurial will store the plain \texttt{zip} or JPEG.
  25.438 -
  25.439 -Deltas between revisions of a compressed file are usually larger than
  25.440 -snapshots of the file, and Mercurial again does ``the right thing'' in
  25.441 -these cases.  It finds that such a delta exceeds the threshold at
  25.442 -which it should store a complete snapshot of the file, so it stores
  25.443 -the snapshot, again saving space compared to a naive delta-only
  25.444 -approach.
  25.445 -
  25.446 -\subsubsection{Network recompression}
  25.447 -
  25.448 -When storing revisions on disk, Mercurial uses the ``deflate''
  25.449 -compression algorithm (the same one used by the popular \texttt{zip}
  25.450 -archive format), which balances good speed with a respectable
  25.451 -compression ratio.  However, when transmitting revision data over a
  25.452 -network connection, Mercurial uncompresses the compressed revision
  25.453 -data.
  25.454 -
  25.455 -If the connection is over HTTP, Mercurial recompresses the entire
  25.456 -stream of data using a compression algorithm that gives a better
  25.457 -compression ratio (the Burrows-Wheeler algorithm from the widely used
  25.458 -\texttt{bzip2} compression package).  This combination of algorithm
  25.459 -and compression of the entire stream (instead of a revision at a time)
  25.460 -substantially reduces the number of bytes to be transferred, yielding
  25.461 -better network performance over almost all kinds of network.
  25.462 -
  25.463 -(If the connection is over \command{ssh}, Mercurial \emph{doesn't}
  25.464 -recompress the stream, because \command{ssh} can already do this
  25.465 -itself.)
  25.466 -
  25.467 -\subsection{Read/write ordering and atomicity}
  25.468 -
  25.469 -Appending to files isn't the whole story when it comes to guaranteeing
  25.470 -that a reader won't see a partial write.  If you recall
  25.471 -figure~\ref{fig:concepts:metadata}, revisions in the changelog point to
  25.472 -revisions in the manifest, and revisions in the manifest point to
  25.473 -revisions in filelogs.  This hierarchy is deliberate.
  25.474 -
  25.475 -A writer starts a transaction by writing filelog and manifest data,
  25.476 -and doesn't write any changelog data until those are finished.  A
  25.477 -reader starts by reading changelog data, then manifest data, followed
  25.478 -by filelog data.
  25.479 -
  25.480 -Since the writer has always finished writing filelog and manifest data
  25.481 -before it writes to the changelog, a reader will never read a pointer
  25.482 -to a partially written manifest revision from the changelog, and it will
  25.483 -never read a pointer to a partially written filelog revision from the
  25.484 -manifest.
  25.485 -
  25.486 -\subsection{Concurrent access}
  25.487 -
  25.488 -The read/write ordering and atomicity guarantees mean that Mercurial
  25.489 -never needs to \emph{lock} a repository when it's reading data, even
  25.490 -if the repository is being written to while the read is occurring.
  25.491 -This has a big effect on scalability; you can have an arbitrary number
  25.492 -of Mercurial processes safely reading data from a repository
  25.493 -all at once, whether or not it's being written to.
  25.494 -
  25.495 -The lockless nature of reading means that if you're sharing a
  25.496 -repository on a multi-user system, you don't need to grant other local
  25.497 -users permission to \emph{write} to your repository in order for them
  25.498 -to be able to clone it or pull changes from it; they only need
  25.499 -\emph{read} permission.  (This is \emph{not} a common feature among
  25.500 -revision control systems, so don't take it for granted!  Most require
  25.501 -readers to be able to lock a repository to access it safely, and this
  25.502 -requires write permission on at least one directory, which of course
  25.503 -makes for all kinds of nasty and annoying security and administrative
  25.504 -problems.)
  25.505 -
  25.506 -Mercurial uses locks to ensure that only one process can write to a
  25.507 -repository at a time (the locking mechanism is safe even over
  25.508 -filesystems that are notoriously hostile to locking, such as NFS).  If
  25.509 -a repository is locked, a writer will wait for a while and retry once
  25.510 -the repository becomes unlocked, but if the repository remains locked
  25.511 -for too long, the process attempting to write will time out.
  25.512 -This means that your daily automated scripts won't get stuck forever
  25.513 -and pile up if a system crashes unnoticed, for example.  (Yes, the
  25.514 -timeout is configurable, from zero to infinity.)
  25.515 -
  25.516 -\subsubsection{Safe dirstate access}
  25.517 -
  25.518 -As with revision data, Mercurial doesn't take a lock to read the
  25.519 -dirstate file; it does acquire a lock to write it.  To avoid the
  25.520 -possibility of reading a partially written copy of the dirstate file,
  25.521 -Mercurial writes to a file with a unique name in the same directory as
  25.522 -the dirstate file, then renames the temporary file atomically to
  25.523 -\filename{dirstate}.  The file named \filename{dirstate} is thus
  25.524 -guaranteed to be complete, not partially written.
  25.525 -
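The write-then-rename pattern described above can be sketched in Python as follows. The function name and the temporary file prefix are made up for this example; it shows the general technique, not Mercurial's implementation.

```python
import os
import tempfile


def write_atomically(path, data):
    """Replace the file at `path` with `data` without exposing a partial write."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so that the final
    # rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        os.replace(tmp_path, path)  # readers see the old or new file, never a mix
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as repo:
        target = os.path.join(repo, "dirstate")
        write_atomically(target, b"first version")
        write_atomically(target, b"second version")
        with open(target, "rb") as f:
            print(f.read())
```
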
  25.526 -\subsection{Avoiding seeks}
  25.527 -
  25.528 -Critical to Mercurial's performance is the avoidance of seeks of the
  25.529 -disk head, since any seek is far more expensive than even a
  25.530 -comparatively large read operation.
  25.531 -
  25.532 -This is why, for example, the dirstate is stored in a single file.  If
  25.533 -there were a dirstate file per directory that Mercurial tracked, the
  25.534 -disk would seek once per directory.  Instead, Mercurial reads the
  25.535 -entire single dirstate file in one step.
  25.536 -
  25.537 -Mercurial also uses a ``copy on write'' scheme when cloning a
  25.538 -repository on local storage.  Instead of copying every revlog file
  25.539 -from the old repository into the new repository, it makes a ``hard
  25.540 -link'', which is a shorthand way to say ``these two names point to the
  25.541 -same file''.  When Mercurial is about to write to one of a revlog's
  25.542 -files, it checks to see if the number of names pointing at the file is
  25.543 -greater than one.  If it is, more than one repository is using the
  25.544 -file, so Mercurial makes a new copy of the file that is private to
  25.545 -this repository.
  25.546 -
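A minimal Python sketch of this ``check the link count before writing'' step is shown below. The function name and the temporary file suffix are hypothetical, and the sketch assumes a filesystem that supports hard links.

```python
import os
import shutil
import tempfile


def make_private(path):
    """Give `path` its own copy of the data if other names share the file.

    Returns True if a private copy was made, False if none was needed.
    """
    if os.stat(path).st_nlink <= 1:
        return False                   # only one name: safe to write in place
    tmp_path = path + ".private-copy"  # hypothetical temporary name
    shutil.copy2(path, tmp_path)       # full copy of the shared data
    os.replace(tmp_path, path)         # this name now points at the copy
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "original.i")
        clone = os.path.join(d, "clone.i")
        with open(original, "wb") as f:
            f.write(b"revision data")
        os.link(original, clone)       # what a local clone would do
        print(make_private(clone))     # a copy is needed before writing
        print(make_private(clone))     # already private the second time
```
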
  25.547 -A few revision control developers have pointed out that this idea of
  25.548 -making a complete private copy of a file is not very efficient in its
  25.549 -use of storage.  While this is true, storage is cheap, and this method
  25.550 -gives the highest performance while deferring most book-keeping to the
  25.551 -operating system.  An alternative scheme would most likely reduce
  25.552 -performance and increase the complexity of the software, each of which
  25.553 -is much more important to the ``feel'' of day-to-day use.
  25.554 -
  25.555 -\subsection{Other contents of the dirstate}
  25.556 -
  25.557 -Because Mercurial doesn't force you to tell it when you're modifying a
  25.558 -file, it uses the dirstate to store some extra information so it can
  25.559 -determine efficiently whether you have modified a file.  For each file
  25.560 -in the working directory, it stores the time that it last modified the
  25.561 -file itself, and the size of the file at that time.  
  25.562 -
  25.563 -When you explicitly \hgcmd{add}, \hgcmd{remove}, \hgcmd{rename} or
  25.564 -\hgcmd{copy} files, Mercurial updates the dirstate so that it knows
  25.565 -what to do with those files when you commit.
  25.566 -
  25.567 -When Mercurial is checking the states of files in the working
  25.568 -directory, it first checks a file's modification time.  If that has
  25.569 -not changed, the file must not have been modified.  If the file's size
  25.570 -has changed, the file must have been modified.  If the modification
  25.571 -time has changed, but the size has not, only then does Mercurial need
  25.572 -to read the actual contents of the file to see if they've changed.
  25.573 -Storing these few extra pieces of information dramatically reduces the
  25.574 -amount of data that Mercurial needs to read, which yields large
  25.575 -performance improvements compared to other revision control systems.
  25.576 -
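The order of checks described above can be sketched in Python like this. The \texttt{Entry} record and the way the committed contents are supplied are assumptions made for the example; the real dirstate format is not shown here.

```python
import os
import tempfile
from typing import NamedTuple


class Entry(NamedTuple):
    """Hypothetical per-file record: what was noted at the last update."""
    mtime: int
    size: int


def is_modified(path, entry, committed):
    """Decide whether `path` differs from `committed`, reading it only if needed."""
    st = os.stat(path)
    if int(st.st_mtime) == entry.mtime:
        return False                 # timestamp untouched: assume unmodified
    if st.st_size != entry.size:
        return True                  # size differs: certainly modified
    with open(path, "rb") as f:      # same size, new timestamp: compare contents
        return f.read() != committed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "file.txt")
        with open(path, "wb") as f:
            f.write(b"abc")
        os.utime(path, (1000, 1000))
        print(is_modified(path, Entry(1000, 3), b"abc"))
```

Only the last branch opens the file, which is where the savings come from.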
  25.577 -%%% Local Variables: 
  25.578 -%%% mode: latex
  25.579 -%%% TeX-master: "00book"
  25.580 -%%% End:

    26.1 --- a/en/daily.tex	Thu Jan 29 22:47:34 2009 -0800
    26.2 +++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
    26.3 @@ -1,381 +0,0 @@
    26.4 -\chapter{Mercurial in daily use}
    26.5 -\label{chap:daily}
    26.6 -
    26.7 -\section{Telling Mercurial which files to track}
    26.8 -
    26.9 -Mercurial does not work with files in your repository unless you tell
   26.10 -it to manage them.  The \hgcmd{status} command will tell you which
   26.11 -files Mercurial doesn't know about; it uses a ``\texttt{?}'' to
   26.12 -display such files.
   26.13 -
   26.14 -To tell Mercurial to track a file, use the \hgcmd{add} command.  Once
   26.15 -you have added a file, the entry in the output of \hgcmd{status} for
   26.16 -that file changes from ``\texttt{?}'' to ``\texttt{A}''.
   26.17 -\interaction{daily.files.add}
   26.18 -
   26.19 -After you run a \hgcmd{commit}, the files that you added before the
   26.20 -commit will no longer be listed in the output of \hgcmd{status}.  The
   26.21 -reason for this is that \hgcmd{status} only tells you about
   26.22 -``interesting'' files---those that you have modified or told Mercurial
   26.23 -to do something with---by default.  If you have a repository that
   26.24 -contains thousands of files, you will rarely want to know about files
   26.25 -that Mercurial is tracking, but that have not changed.  (You can still
   26.26 -get this information; we'll return to this later.)
   26.27 -
   26.28 -Once you add a file, Mercurial doesn't do anything with it
   26.29 -immediately.  Instead, it will take a snapshot of the file's state the
   26.30 -next time you perform a commit.  It will then continue to track the
   26.31 -changes you make to the file every time you commit, until you remove
   26.32 -the file.
   26.33 -
   26.34 -\subsection{Explicit versus implicit file naming}
   26.35 -
   26.36 -A useful behaviour that Mercurial has is that if you pass the name of
   26.37 -a directory to a command, every Mercurial command will treat this as
   26.38 -``I want to operate on every file in this directory and its
   26.39 -subdirectories''.
   26.40 -\interaction{daily.files.add-dir}
   26.41 -Notice in this example that Mercurial printed the names of the files
   26.42 -it added, whereas it didn't do so when we added the file named
   26.43 -\filename{a} in the earlier example.
   26.44 -
   26.45 -What's going on is that in the former case, we explicitly named the
   26.46 -file to add on the command line, so the assumption that Mercurial
   26.47 -makes in such cases is that you know what you are doing, and it
   26.48 -doesn't print any output.
   26.49 -
   26.50 -However, when we \emph{imply} the names of files by giving the name of
   26.51 -a directory, Mercurial takes the extra step of printing the name of
   26.52 -each file that it does something with.  This makes it clearer what
   26.53 -is happening, and reduces the likelihood of a silent and nasty
   26.54 -surprise.  This behaviour is common to most Mercurial commands.
   26.55 -
   26.56 -\subsection{Aside: Mercurial tracks files, not directories}
   26.57 -
   26.58 -Mercurial does not track directory information.  Instead, it tracks
   26.59 -the path to a file.  Before creating a file, it first creates any
   26.60 -missing directory components of the path.  After it deletes a file, it
   26.61 -then deletes any empty directories that were in the deleted file's
   26.62 -path.  This sounds like a trivial distinction, but it has one minor
   26.63 -practical consequence: it is not possible to represent a completely
   26.64 -empty directory in Mercurial.
   26.65 -
   26.66 -Empty directories are rarely useful, and there are unintrusive
   26.67 -workarounds that you can use to achieve an appropriate effect.  The
   26.68 -developers of Mercurial thus felt that the complexity that would be
   26.69 -required to manage empty directories was not worth the limited benefit
   26.70 -this feature would bring.
   26.71 -
   26.72 -If you need an empty directory in your repository, there are a few
   26.73 -ways to achieve this. One is to create a directory, then \hgcmd{add} a
   26.74 -``hidden'' file to that directory.  On Unix-like systems, any file
   26.75 -name that begins with a period (``\texttt{.}'') is treated as hidden
   26.76 -by most commands and GUI tools.  This approach is illustrated in
   26.77 -figure~\ref{ex:daily:hidden}.
   26.78 -
   26.79 -\begin{figure}[ht]
   26.80 -  \interaction{daily.files.hidden}
   26.81 -  \caption{Simulating an empty directory using a hidden file}
   26.82 -  \label{ex:daily:hidden}
   26.83 -\end{figure}
   26.84 -
   26.85 -Another way to tackle a need for an empty directory is to simply
   26.86 -create one in your automated build scripts before they need it.
   26.87 -
   26.88 -\section{How to stop tracking a file}
   26.89 -
   26.90 -Once you decide that a file no longer belongs in your repository, use
   26.91 -the \hgcmd{remove} command; this deletes the file, and tells Mercurial
   26.92 -to stop tracking it.  A removed file is represented in the output of
   26.93 -\hgcmd{status} with an ``\texttt{R}''.
   26.94 -\interaction{daily.files.remove}
   26.95 -
   26.96 -After you \hgcmd{remove} a file, Mercurial will no longer track
   26.97 -changes to that file, even if you recreate a file with the same name
   26.98 -in your working directory.  If you do recreate a file with the same
   26.99 -name and want Mercurial to track the new file, simply \hgcmd{add} it.
  26.100 -Mercurial will know that the newly added file is not related to the
  26.101 -old file of the same name.
  26.102 -
  26.103 -\subsection{Removing a file does not affect its history}
  26.104 -
  26.105 -It is important to understand that removing a file has only two
  26.106 -effects.
  26.107 -\begin{itemize}
  26.108 -\item It removes the current version of the file from the working
  26.109 -  directory.
  26.110 -\item It stops Mercurial from tracking changes to the file, from the
  26.111 -  time of the next commit.
  26.112 -\end{itemize}
  26.113 -Removing a file \emph{does not} in any way alter the \emph{history} of
  26.114 -the file.
  26.115 -
  26.116 -If you update the working directory to a changeset in which a file
  26.117 -that you have removed was still tracked, it will reappear in the
  26.118 -working directory, with the contents it had when you committed that
  26.119 -changeset.  If you then update the working directory to a later
  26.120 -changeset, in which the file had been removed, Mercurial will once
  26.121 -again remove the file from the working directory.
  26.122 -
  26.123 -\subsection{Missing files}
  26.124 -
  26.125 -Mercurial considers a file that you have deleted without using
  26.126 -\hgcmd{remove} to be \emph{missing}.  A missing file is
  26.127 -represented with ``\texttt{!}'' in the output of \hgcmd{status}.
  26.128 -Mercurial commands will not generally do anything with missing files.
  26.129 -\interaction{daily.files.missing}
  26.130 -
  26.131 -If your repository contains a file that \hgcmd{status} reports as
  26.132 -missing, and you want the file to stay gone, you can run
  26.133 -\hgcmdargs{remove}{\hgopt{remove}{--after}} at any time later on, to
  26.134 -tell Mercurial that you really did mean to remove the file.
  26.135 -\interaction{daily.files.remove-after}
  26.136 -
  26.137 -On the other hand, if you deleted the missing file by accident, use
  26.138 -\hgcmdargs{revert}{\emph{filename}} to recover the file.  It will
  26.139 -reappear, in unmodified form.
  26.140 -\interaction{daily.files.recover-missing}
  26.141 -
  26.142 -\subsection{Aside: why tell Mercurial explicitly to 
  26.143 -  remove a file?}
  26.144 -
  26.145 -You might wonder why Mercurial requires you to explicitly tell it that
  26.146 -you are deleting a file.  Early during the development of Mercurial,
  26.147 -it let you delete a file however you pleased; Mercurial would notice
  26.148 -the absence of the file automatically when you next ran a
  26.149 -\hgcmd{commit}, and stop tracking the file.  In practice, this made it
  26.150 -too easy to accidentally remove a file without noticing.
  26.151 -
  26.152 -\subsection{Useful shorthand---adding and removing files
  26.153 -  in one step}
  26.154 -
  26.155 -Mercurial offers a combination command, \hgcmd{addremove}, that adds
  26.156 -untracked files and marks missing files as removed.  
  26.157 -\interaction{daily.files.addremove}
  26.158 -The \hgcmd{commit} command also provides a \hgopt{commit}{-A} option
  26.159 -that performs this same add-and-remove, immediately followed by a
  26.160 -commit.
  26.161 -\interaction{daily.files.commit-addremove}
  26.162 -
  26.163 -\section{Copying files}
  26.164 -
  26.165 -Mercurial provides a \hgcmd{copy} command that lets you make a new
  26.166 -copy of a file.  When you copy a file using this command, Mercurial
  26.167 -makes a record of the fact that the new file is a copy of the original
  26.168 -file.  It treats these copied files specially when you merge your work
  26.169 -with someone else's.
  26.170 -
  26.171 -\subsection{The results of copying during a merge}
  26.172 -
  26.173 -What happens during a merge is that changes ``follow'' a copy.  To
  26.174 -best illustrate what this means, let's create an example.  We'll start
  26.175 -with the usual tiny repository that contains a single file.
  26.176 -\interaction{daily.copy.init}
  26.177 -We need to do some work in parallel, so that we'll have something to
  26.178 -merge.  So let's clone our repository.
  26.179 -\interaction{daily.copy.clone}
  26.180 -Back in our initial repository, let's use the \hgcmd{copy} command to
  26.181 -make a copy of the first file we created.
  26.182 -\interaction{daily.copy.copy}
  26.183 -
  26.184 -If we look at the output of the \hgcmd{status} command afterwards, the
  26.185 -copied file looks just like a normal added file.
  26.186 -\interaction{daily.copy.status}
  26.187 -But if we pass the \hgopt{status}{-C} option to \hgcmd{status}, it
  26.188 -prints another line of output: this is the file that our newly-added
  26.189 -file was copied \emph{from}.
  26.190 -\interaction{daily.copy.status-copy}
  26.191 -
  26.192 -Now, back in the repository we cloned, let's make a change in
  26.193 -parallel.  We'll add a line of content to the original file that we
  26.194 -created.
  26.195 -\interaction{daily.copy.other}
  26.196 -Now we have a modified \filename{file} in this repository.  When we
  26.197 -pull the changes from the first repository, and merge the two heads,
  26.198 -Mercurial will propagate the changes that we made locally to
  26.199 -\filename{file} into its copy, \filename{new-file}.
  26.200 -\interaction{daily.copy.merge}
  26.201 -
  26.202 -\subsection{Why should changes follow copies?}
  26.203 -\label{sec:daily:why-copy}
  26.204 -
  26.205 -This behaviour, of changes to a file propagating out to copies of the
  26.206 -file, might seem esoteric, but in most cases it's highly desirable.
  26.207 -
  26.208 -First of all, remember that this propagation \emph{only} happens when
  26.209 -you merge.  So if you \hgcmd{copy} a file, and subsequently modify the
  26.210 -original file during the normal course of your work, nothing will
  26.211 -happen.
  26.212 -
  26.213 -The second thing to know is that modifications will only propagate
  26.214 -across a copy as long as the repository that you're pulling changes
  26.215 -from \emph{doesn't know} about the copy.
  26.216 -
  26.217 -The reason that Mercurial does this is as follows.  Let's say I make
  26.218 -an important bug fix in a source file, and commit my changes.
  26.219 -Meanwhile, you've decided to \hgcmd{copy} the file in your repository,
  26.220 -without knowing about the bug or having seen the fix, and you have
  26.221 -started hacking on your copy of the file.
  26.222 -
  26.223 -If you pulled and merged my changes, and Mercurial \emph{didn't}
  26.224 -propagate changes across copies, your copy of the file would still
  26.225 -contain the bug, and unless you remembered to propagate the bug fix by
  26.226 -hand, the bug would \emph{remain} there.
  26.227 -
  26.228 -By automatically propagating the change that fixed the bug from the
  26.229 -original file to the copy, Mercurial prevents this class of problem.
  26.230 -To my knowledge, Mercurial is the \emph{only} revision control system
  26.231 -that propagates changes across copies like this.
  26.232 -
  26.233 -Once your change history has a record that the copy and subsequent
  26.234 -merge occurred, there's usually no further need to propagate changes
  26.235 -from the original file to the copied file, and that's why Mercurial
  26.236 -only propagates changes across copies until this point, and no
  26.237 -further.
  26.238 -
  26.239 -\subsection{How to make changes \emph{not} follow a copy}
  26.240 -
  26.241 -If, for some reason, you decide that this business of automatically
  26.242 -propagating changes across copies is not for you, simply use your
  26.243 -system's normal file copy command (on Unix-like systems, that's
  26.244 -\command{cp}) to make a copy of a file, then \hgcmd{add} the new copy
  26.245 -by hand.  Before you do so, though, please do reread
  26.246 -section~\ref{sec:daily:why-copy}, and make an informed decision that
  26.247 -this behaviour is not appropriate to your specific case.
  26.248 -
  26.249 -\subsection{Behaviour of the \hgcmd{copy} command}
  26.250 -
  26.251 -When you use the \hgcmd{copy} command, Mercurial makes a copy of each
  26.252 -source file as it currently stands in the working directory.  This
  26.253 -means that if you make some modifications to a file, then \hgcmd{copy}
  26.254 -it without first having committed those changes, the new copy will
  26.255 -also contain the modifications you have made up until that point.  (I
  26.256 -find this behaviour a little counterintuitive, which is why I mention
  26.257 -it here.)
  26.258 -
  26.259 -The \hgcmd{copy} command acts similarly to the Unix \command{cp}
  26.260 -command (you can use the \hgcmd{cp} alias if you prefer).  The last
  26.261 -argument is the \emph{destination}, and all prior arguments are
  26.262 -\emph{sources}.  If you pass it a single file as the source, and the
  26.263 -destination does not exist, it creates a new file with that name.
  26.264 -\interaction{daily.copy.simple}
  26.265 -If the destination is a directory, Mercurial copies its sources into
  26.266 -that directory.
  26.267 -\interaction{daily.copy.dir-dest}
  26.268 -Copying a directory is recursive, and preserves the directory
  26.269 -structure of the source.
  26.270 -\interaction{daily.copy.dir-src}
  26.271 -If the source and destination are both directories, the source tree is
  26.272 -recreated in the destination directory.
  26.273 -\interaction{daily.copy.dir-src-dest}
  26.274 -
  26.275 -As with the \hgcmd{rename} command, if you copy a file manually and
  26.276 -then want Mercurial to know that you've copied the file, simply use
  26.277 -the \hgopt{copy}{--after} option to \hgcmd{copy}.
  26.278 -\interaction{daily.copy.after}
  26.279 -
  26.280 -\section{Renaming files}
  26.281 -
  26.282 -It's rather more common to need to rename a file than to make a copy
  26.283 -of it.  The reason I discussed the \hgcmd{copy} command before talking
  26.284 -about renaming files is that Mercurial treats a rename in essentially
  26.285 -the same way as a copy.  Therefore, knowing what Mercurial does when
  26.286 -you copy a file tells you what to expect when you rename a file.
  26.287 -
  26.288 -When you use the \hgcmd{rename} command, Mercurial makes a copy of
  26.289 -each source file, then deletes the original and marks it as removed.
  26.290 -\interaction{daily.rename.rename}
  26.291 -The \hgcmd{status} command shows the newly copied file as added, and
  26.292 -the copied-from file as removed.
  26.293 -\interaction{daily.rename.status}
  26.294 -As with the results of a \hgcmd{copy}, we must use the
  26.295 -\hgopt{status}{-C} option to \hgcmd{status} to see that the added file
  26.296 -is really being tracked by Mercurial as a copy of the original, now
  26.297 -removed, file.
  26.298 -\interaction{daily.rename.status-copy}
  26.299 -
  26.300 -As with \hgcmd{remove} and \hgcmd{copy}, you can tell Mercurial about
  26.301 -a rename after the fact using the \hgopt{rename}{--after} option.  In
  26.302 -most other respects, the behaviour of the \hgcmd{rename} command, and
  26.303 -the options it accepts, are similar to the \hgcmd{copy} command.
  26.304 -
  26.305 -\subsection{Renaming files and merging changes}
  26.306 -
  26.307 -Since Mercurial's rename is implemented as copy-and-remove, the same
  26.308 -propagation of changes happens when you merge after a rename as after
  26.309 -a copy.
  26.310 -
  26.311 -If I modify a file, and you rename it to a new name, and then we merge
  26.312 -our respective changes, my modifications to the file under its
  26.313 -original name will be propagated into the file under its new name.
  26.314 -(This is something you might expect to ``simply work,'' but not all
  26.315 -revision control systems actually do this.)
  26.316 -
  26.317 -Whereas having changes follow a copy is a feature where you can
  26.318 -perhaps nod and say ``yes, that might be useful,'' it should be clear
  26.319 -that having them follow a rename is definitely important.  Without
  26.320 -this facility, it would simply be too easy for changes to become
  26.321 -orphaned when files are renamed.
  26.322 -
  26.323 -\subsection{Divergent renames and merging}
  26.324 -
  26.325 -The case of diverging names occurs when two developers start with a
  26.326 -file---let's call it \filename{foo}---in their respective
  26.327 -repositories.
  26.328 -
  26.329 -\interaction{rename.divergent.clone}
  26.330 -Anne renames the file to \filename{bar}.
  26.331 -\interaction{rename.divergent.rename.anne}
  26.332 -Meanwhile, Bob renames it to \filename{quux}.
  26.333 -\interaction{rename.divergent.rename.bob}
  26.334 -
  26.335 -I like to think of this as a conflict because each developer has
  26.336 -expressed different intentions about what the file ought to be named.
  26.337 -
  26.338 -What do you think should happen when they merge their work?
  26.339 -Mercurial's actual behaviour is that it always preserves \emph{both}
  26.340 -names when it merges changesets that contain divergent renames.
  26.341 -\interaction{rename.divergent.merge}
  26.342 -
  26.343 -Notice that Mercurial does warn about the divergent renames, but it
  26.344 -leaves it up to you to do something about the divergence after the merge.
  26.345 -
  26.346 -\subsection{Convergent renames and merging}
  26.347 -
  26.348 -Another kind of rename conflict occurs when two people choose to
  26.349 -rename different \emph{source} files to the same \emph{destination}.
  26.350 -In this case, Mercurial runs its normal merge machinery, and lets you
  26.351 -guide it to a suitable resolution.
  26.352 -
  26.353 -\subsection{Other name-related corner cases}
  26.354 -
  26.355 -Mercurial has a longstanding bug in which it fails to handle a merge
  26.356 -where one side has a file with a given name, while another has a
  26.357 -directory with the same name.  This is documented as~\bug{29}.
  26.358 -\interaction{issue29.go}
  26.359 -
  26.360 -\section{Recovering from mistakes}
  26.361 -
  26.362 -Mercurial has some useful commands that will help you to recover from
  26.363 -some common mistakes.
  26.364 -
  26.365 -The \hgcmd{revert} command lets you undo changes that you have made to
  26.366 -your working directory.  For example, if you \hgcmd{add} a file by
  26.367 -accident, just run \hgcmd{revert} with the name of the file you added,
  26.368 -and while the file won't be touched in any way, it won't be tracked
  26.369 -for adding by Mercurial any longer, either.  You can also use
  26.370 -\hgcmd{revert} to get rid of erroneous changes to a file.
  26.371 -
  26.372 -It's worth remembering that the \hgcmd{revert} command is useful for
  26.373 -changes that you have not yet committed.  Once you've committed a
  26.374 -change, if you decide it was a mistake, you can still do something
  26.375 -about it, though your options may be more limited.
  26.376 -
  26.377 -For more information about the \hgcmd{revert} command, and details
  26.378 -about how to deal with changes you have already committed, see
  26.379 -chapter~\ref{chap:undo}.
  26.380 -
  26.381 -%%% Local Variables: 
  26.382 -%%% mode: latex
  26.383 -%%% TeX-master: "00book"
  26.384 -%%% End: 

    27.1 --- a/en/filenames.tex	Thu Jan 29 22:47:34 2009 -0800
    27.2 +++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
    27.3 @@ -1,306 +0,0 @@
    27.4 -\chapter{File names and pattern matching}
    27.5 -\label{chap:names}
    27.6 -
    27.7 -Mercurial provides mechanisms that let you work with file names in a
    27.8 -consistent and expressive way.
    27.9 -
   27.10 -\section{Simple file naming}
   27.11 -
   27.12 -Mercurial uses a unified piece of machinery ``under the hood'' to
   27.13 -handle file names.  Every command behaves uniformly with respect to
   27.14 -file names.  The way in which commands work with file names is as
   27.15 -follows.
   27.16 -
   27.17 -If you explicitly name real files on the command line, Mercurial works
   27.18 -with exactly those files, as you would expect.
   27.19 -\interaction{filenames.files}
   27.20 -
   27.21 -When you provide a directory name, Mercurial will interpret this as
   27.22 -``operate on every file in this directory and its subdirectories''.
   27.23 -Mercurial traverses the files and subdirectories in a directory in
   27.24 -alphabetical order.  When it encounters a subdirectory, it will
   27.25 -traverse that subdirectory before continuing with the current
   27.26 -directory.
   27.27 -\interaction{filenames.dirs}
   27.28 -
   27.29 -\section{Running commands without any file names}
   27.30 -
   27.31 -Mercurial's commands that work with file names have useful default
   27.32 -behaviours when you invoke them without providing any file names or
   27.33 -patterns.  What kind of behaviour you should expect depends on what
   27.34 -the command does.  Here are a few rules of thumb you can use to
   27.35 -predict what a command is likely to do if you don't give it any names
   27.36 -to work with.
   27.37 -\begin{itemize}
   27.38 -\item Most commands will operate on the entire working directory.
   27.39 -  This is what the \hgcmd{add} command does, for example.
   27.40 -\item If the command has effects that are difficult or impossible to
   27.41 -  reverse, it will force you to explicitly provide at least one name
   27.42 -  or pattern (see below).  This protects you from accidentally
   27.43 -  deleting files by running \hgcmd{remove} with no arguments, for
   27.44 -  example.
   27.45 -\end{itemize}
   27.46 -
   27.47 -It's easy to work around these default behaviours if they don't suit
   27.48 -you.  If a command normally operates on the whole working directory,
   27.49 -you can invoke it on just the current directory and its subdirectories
   27.50 -by giving it the name ``\dirname{.}''.
   27.51 -\interaction{filenames.wdir-subdir}
   27.52 -
   27.53 -Along the same lines, some commands normally print file names relative
   27.54 -to the root of the repository, even if you're invoking them from a
   27.55 -subdirectory.  Such a command will print file names relative to your
   27.56 -subdirectory if you give it explicit names.  Here, we're going to run
   27.57 -\hgcmd{status} from a subdirectory, and get it to operate on the
   27.58 -entire working directory while printing file names relative to our
   27.59 -subdirectory, by passing it the output of the \hgcmd{root} command.
   27.60 -\interaction{filenames.wdir-relname}
   27.61 -
   27.62 -\section{Telling you what's going on}
   27.63 -
   27.64 -The \hgcmd{add} example in the preceding section illustrates something
   27.65 -else that's helpful about Mercurial commands.  If a command operates
   27.66 -on a file that you didn't name explicitly on the command line, it will
   27.67 -usually print the name of the file, so that you will not be surprised
   27.68 -by what's going on.
   27.69 -
   27.70 -The principle here is of \emph{least surprise}.  If you've exactly
   27.71 -named a file on the command line, there's no point in repeating it
   27.72 -back at you.  If Mercurial is acting on a file \emph{implicitly},
   27.73 -because you provided no names, or a directory, or a pattern (see
   27.74 -below), it's safest to tell you what it's doing.
   27.75 -
   27.76 -For commands that behave this way, you can silence them using the
   27.77 -\hggopt{-q} option.  You can also get them to print the name of every
   27.78 -file, even those you've named explicitly, using the \hggopt{-v}
   27.79 -option.
   27.80 -
   27.81 -\section{Using patterns to identify files}
   27.82 -
   27.83 -In addition to working with file and directory names, Mercurial lets
   27.84 -you use \emph{patterns} to identify files.  Mercurial's pattern
   27.85 -handling is expressive.
   27.86 -
   27.87 -On Unix-like systems (Linux, MacOS, etc.), the job of matching file
   27.88 -names to patterns normally falls to the shell.  On these systems, you
   27.89 -must explicitly tell Mercurial that a name is a pattern.  On Windows,
   27.90 -the shell does not expand patterns, so Mercurial will automatically
   27.91 -identify names that are patterns, and expand them for you.
   27.92 -
   27.93 -To provide a pattern in place of a regular name on the command line,
   27.94 -the mechanism is simple:
   27.95 -\begin{codesample2}
   27.96 -  syntax:patternbody
   27.97 -\end{codesample2}
   27.98 -That is, a pattern is identified by a short text string that says what
   27.99 -kind of pattern this is, followed by a colon, followed by the actual
  27.100 -pattern.
  27.101 -
  27.102 -Mercurial supports two kinds of pattern syntax.  The most frequently
  27.103 -used is called \texttt{glob}; this is the same kind of pattern
  27.104 -matching used by the Unix shell, and should be familiar to Windows
  27.105 -command prompt users, too.  
  27.106 -
  27.107 -When Mercurial does automatic pattern matching on Windows, it uses
  27.108 -\texttt{glob} syntax.  You can thus omit the ``\texttt{glob:}'' prefix
  27.109 -on Windows, but it's safe to use it, too.
  27.110 -
  27.111 -The \texttt{re} syntax is more powerful; it lets you specify patterns
  27.112 -using regular expressions, also known as regexps.
  27.113 -
  27.114 -By the way, in the examples that follow, notice that I'm careful to
  27.115 -wrap all of my patterns in quote characters, so that they won't get
  27.116 -expanded by the shell before Mercurial sees them.
  27.117 -
  27.118 -\subsection{Shell-style \texttt{glob} patterns}
  27.119 -
  27.120 -This is an overview of the kinds of patterns you can use when you're
  27.121 -matching on glob patterns.
  27.122 -
  27.123 -The ``\texttt{*}'' character matches any string, within a single
  27.124 -directory.
  27.125 -\interaction{filenames.glob.star}
  27.126 -
  27.127 -The ``\texttt{**}'' pattern matches any string, and crosses directory
  27.128 -boundaries.  It's not a standard Unix glob token, but it's accepted by
  27.129 -several popular Unix shells, and is very useful.
  27.130 -\interaction{filenames.glob.starstar}
  27.131 -
  27.132 -The ``\texttt{?}'' pattern matches any single character.
  27.133 -\interaction{filenames.glob.question}
  27.134 -
  27.135 -The ``\texttt{[}'' character begins a \emph{character class}.  This
  27.136 -matches any single character within the class.  The class ends with a
  27.137 -``\texttt{]}'' character.  A class may contain multiple \emph{range}s
  27.138 -of the form ``\texttt{a-f}'', which is shorthand for
  27.139 -``\texttt{abcdef}''.
  27.140 -\interaction{filenames.glob.range}
  27.141 -If the first character after the ``\texttt{[}'' in a character class
  27.142 -is a ``\texttt{!}'', it \emph{negates} the class, making it match any
  27.143 -single character not in the class.
  27.144 -
  27.145 -A ``\texttt{\{}'' begins a group of subpatterns, where the whole group
  27.146 -matches if any subpattern in the group matches.  The ``\texttt{,}''
  27.147 -character separates subpatterns, and ``\texttt{\}}'' ends the group.
  27.148 -\interaction{filenames.glob.group}
  27.149 -
  27.150 -\subsubsection{Watch out!}
  27.151 -
  27.152 -Don't forget that if you want to match a pattern in any directory, you
  27.153 -should not be using the ``\texttt{*}'' match-any token, as this will
  27.154 -only match within one directory.  Instead, use the ``\texttt{**}''
  27.155 -token.  This small example illustrates the difference between the two.
  27.156 -\interaction{filenames.glob.star-starstar}
  27.157 -
  27.158 -\subsection{Regular expression matching with \texttt{re} patterns}
  27.159 -
  27.160 -Mercurial accepts the same regular expression syntax as the Python
  27.161 -programming language (it uses Python's regexp engine internally).
  27.162 -This is based on the Perl language's regexp syntax, which is the most
  27.163 -popular dialect in use (it's also used in Java, for example).
  27.164 -
  27.165 -I won't discuss Mercurial's regexp dialect in any detail here, as
  27.166 -regexps are not often used.  Perl-style regexps are in any case
  27.167 -already exhaustively documented on a multitude of web sites, and in
  27.168 -many books.  Instead, I will focus here on a few things you should
  27.169 -know if you find yourself needing to use regexps with Mercurial.
  27.170 -
  27.171 -A regexp is matched against an entire file name, relative to the root
  27.172 -of the repository.  In other words, even if you're already in
  27.173 -subdirectory \dirname{foo}, if you want to match files under this
  27.174 -directory, your pattern must start with ``\texttt{foo/}''.
  27.175 -
  27.176 -One thing to note, if you're familiar with Perl-style regexps, is that
  27.177 -Mercurial's are \emph{rooted}.  That is, a regexp starts matching
  27.178 -against the beginning of a string; it doesn't look for a match
  27.179 -anywhere within the string.  To match anywhere in a string, start
  27.180 -your pattern with ``\texttt{.*}''.
  27.181 -
  27.182 -\section{Filtering files}
  27.183 -
  27.184 -Not only does Mercurial give you a variety of ways to specify files;
  27.185 -it lets you further winnow those files using \emph{filters}.  Commands
  27.186 -that work with file names accept two filtering options.
  27.187 -\begin{itemize}
  27.188 -\item \hggopt{-I}, or \hggopt{--include}, lets you specify a pattern
  27.189 -  that file names must match in order to be processed.
  27.190 -\item \hggopt{-X}, or \hggopt{--exclude}, gives you a way to
  27.191 -  \emph{avoid} processing files, if they match this pattern.
  27.192 -\end{itemize}
  27.193 -You can provide multiple \hggopt{-I} and \hggopt{-X} options on the
  27.194 -command line, and intermix them as you please.  Mercurial interprets
  27.195 -the patterns you provide using glob syntax by default (but you can use
  27.196 -regexps if you need to).
  27.197 -
  27.198 -You can read a \hggopt{-I} filter as ``process only the files that
  27.199 -match this filter''.
  27.200 -\interaction{filenames.filter.include}
  27.201 -The \hggopt{-X} filter is best read as ``process only the files that
  27.202 -don't match this pattern''.
  27.203 -\interaction{filenames.filter.exclude}
  27.204 -
  27.205 -\section{Ignoring unwanted files and directories}
  27.206 -
  27.207 -XXX.
  27.208 -
  27.209 -\section{Case sensitivity}
  27.210 -\label{sec:names:case}
  27.211 -
  27.212 -If you're working in a mixed development environment that contains
  27.213 -both Linux (or other Unix) systems and Macs or Windows systems, you
  27.214 -should keep in the back of your mind the knowledge that they treat the
  27.215 -case (``N'' versus ``n'') of file names in incompatible ways.  This is
  27.216 -not very likely to affect you, and it's easy to deal with if it does,
  27.217 -but it could surprise you if you don't know about it.
  27.218 -
  27.219 -Operating systems and filesystems differ in the way they handle the
  27.220 -\emph{case} of characters in file and directory names.  There are
  27.221 -three common ways to handle case in names.
  27.222 -\begin{itemize}
  27.223 -\item Completely case insensitive.  Uppercase and lowercase versions
  27.224 -  of a letter are treated as identical, both when creating a file and
  27.225 -  during subsequent accesses.  This is common on older DOS-based
  27.226 -  systems.
  27.227 -\item Case preserving, but insensitive.  When a file or directory is
  27.228 -  created, the case of its name is stored, and can be retrieved and
  27.229 -  displayed by the operating system.  When an existing file is being
  27.230 -  looked up, its case is ignored.  This is the standard arrangement on
  27.231 -  Windows and MacOS.  The names \filename{foo} and \filename{FoO}
  27.232 -  identify the same file.  This treatment of uppercase and lowercase
  27.233 -  letters as interchangeable is also referred to as \emph{case
  27.234 -    folding}.
  27.235 -\item Case sensitive.  The case of a name is significant at all times.
  27.236 -  The names \filename{foo} and \filename{FoO} identify different files.  This
  27.237 -  is the way Linux and Unix systems normally work.
  27.238 -\end{itemize}
  27.239 -
  27.240 -On Unix-like systems, it is possible to have any or all of the above
  27.241 -ways of handling case in action at once.  For example, if you use a
  27.242 -USB thumb drive formatted with a FAT32 filesystem on a Linux system,
  27.243 -Linux will handle names on that filesystem in a case preserving, but
  27.244 -insensitive, way.
  27.245 -
  27.246 -\subsection{Safe, portable repository storage}
  27.247 -
  27.248 -Mercurial's repository storage mechanism is \emph{case safe}.  It
  27.249 -translates file names so that they can be safely stored on both case
  27.250 -sensitive and case insensitive filesystems.  This means that you can
  27.251 -use normal file copying tools to transfer a Mercurial repository onto,
  27.252 -for example, a USB thumb drive, and safely move that drive and
  27.253 -repository back and forth between a Mac, a PC running Windows, and a
  27.254 -Linux box.
  27.255 -
  27.256 -\subsection{Detecting case conflicts}
  27.257 -
  27.258 -When operating in the working directory, Mercurial honours the naming
  27.259 -policy of the filesystem where the working directory is located.  If
  27.260 -the filesystem is case preserving, but insensitive, Mercurial will
  27.261 -treat names that differ only in case as the same.
  27.262 -
  27.263 -An important aspect of this approach is that it is possible to commit
  27.264 -a changeset on a case sensitive (typically Linux or Unix) filesystem
  27.265 -that will cause trouble for users on case insensitive (usually Windows
  27.266 -and MacOS) filesystems.  If a Linux user commits changes to two files, one
  27.267 -named \filename{myfile.c} and the other named \filename{MyFile.C},
  27.268 -they will be stored correctly in the repository.  And in the working
  27.269 -directories of other Linux users, they will be correctly represented
  27.270 -as separate files.
  27.271 -
  27.272 -If a Windows or Mac user pulls this change, they will not initially
  27.273 -have a problem, because Mercurial's repository storage mechanism is
  27.274 -case safe.  However, once they try to \hgcmd{update} the working
  27.275 -directory to that changeset, or \hgcmd{merge} with that changeset,
  27.276 -Mercurial will spot the conflict between the two file names that the
  27.277 -filesystem would treat as the same, and forbid the update or merge
  27.278 -from occurring.
  27.279 -
  27.280 -\subsection{Fixing a case conflict}
  27.281 -
  27.282 -If you are using Windows or a Mac in a mixed environment where some of
  27.283 -your collaborators are using Linux or Unix, and Mercurial reports a
  27.284 -case folding conflict when you try to \hgcmd{update} or \hgcmd{merge},
  27.285 -the procedure to fix the problem is simple.
  27.286 -
  27.287 -Just find a nearby Linux or Unix box, clone the problem repository
  27.288 -onto it, and use Mercurial's \hgcmd{rename} command to change the
  27.289 -names of any offending files or directories so that they will no
  27.290 -longer cause case folding conflicts.  Commit this change, \hgcmd{pull}
  27.291 -or \hgcmd{push} it across to your Windows or MacOS system, and
  27.292 -\hgcmd{update} to the revision with the non-conflicting names.
  27.293 -
  27.294 -The changeset with case-conflicting names will remain in your
  27.295 -project's history, and you still won't be able to \hgcmd{update} your
  27.296 -working directory to that changeset on a Windows or MacOS system, but
  27.297 -you can continue development unimpeded.
  27.298 -
  27.299 -\begin{note}
  27.300 -  Prior to version~0.9.3, Mercurial did not use a case safe repository
  27.301 -  storage mechanism, and did not detect case folding conflicts.  If
  27.302 -  you are using an older version of Mercurial on Windows or MacOS, I
  27.303 -  strongly recommend that you upgrade.
  27.304 -\end{note}
  27.305 -
  27.306 -%%% Local Variables: 
  27.307 -%%% mode: latex
  27.308 -%%% TeX-master: "00book"
  27.309 -%%% End: 

    28.1 --- a/en/hgext.tex	Thu Jan 29 22:47:34 2009 -0800
    28.2 +++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
    28.3 @@ -1,429 +0,0 @@
    28.4 -\chapter{Adding functionality with extensions}
    28.5 -\label{chap:hgext}
    28.6 -
    28.7 -While the core of Mercurial is quite complete from a functionality
    28.8 -standpoint, it's deliberately shorn of fancy features.  This approach
    28.9 -of preserving simplicity keeps the software easy to deal with for both
   28.10 -maintainers and users.
   28.11 -
   28.12 -However, Mercurial doesn't box you in with an inflexible command set:
   28.13 -you can add features to it as \emph{extensions} (sometimes known as
   28.14 -\emph{plugins}).  We've already discussed a few of these extensions in
   28.15 -earlier chapters.
   28.16 -\begin{itemize}
   28.17 -\item Section~\ref{sec:tour-merge:fetch} covers the \hgext{fetch}
   28.18 -  extension; this combines pulling new changes and merging them with
   28.19 -  local changes into a single command, \hgxcmd{fetch}{fetch}.
   28.20 -\item In chapter~\ref{chap:hook}, we covered several extensions that
   28.21 -  are useful for hook-related functionality: \hgext{acl} adds access
   28.22 -  control lists; \hgext{bugzilla} adds integration with the Bugzilla
   28.23 -  bug tracking system; and \hgext{notify} sends notification emails on
   28.24 -  new changes.
   28.25 -\item The Mercurial Queues patch management extension is so invaluable
   28.26 -  that it merits two chapters and an appendix all to itself.
   28.27 -  Chapter~\ref{chap:mq} covers the basics;
   28.28 -  chapter~\ref{chap:mq-collab} discusses advanced topics; and
   28.29 -  appendix~\ref{chap:mqref} goes into detail on each command.
   28.30 -\end{itemize}
   28.31 -
   28.32 -In this chapter, we'll cover some of the other extensions that are
   28.33 -available for Mercurial, and briefly touch on some of the machinery
   28.34 -you'll need to know about if you want to write an extension of your
   28.35 -own.
   28.36 -\begin{itemize}
   28.37 -\item In section~\ref{sec:hgext:inotify}, we'll discuss the
   28.38 -  possibility of \emph{huge} performance improvements using the
   28.39 -  \hgext{inotify} extension.
   28.40 -\end{itemize}
   28.41 -
   28.42 -\section{Improve performance with the \hgext{inotify} extension}
   28.43 -\label{sec:hgext:inotify}
   28.44 -
   28.45 -Are you interested in having some of the most common Mercurial
   28.46 -operations run as much as a hundred times faster?  Read on!
   28.47 -
   28.48 -Mercurial has great performance under normal circumstances.  For
   28.49 -example, when you run the \hgcmd{status} command, Mercurial has to
   28.50 -scan almost every directory and file in your repository so that it can
   28.51 -display file status.  Many other Mercurial commands need to do the
   28.52 -same work behind the scenes; for example, the \hgcmd{diff} command
   28.53 -uses the status machinery to avoid doing an expensive comparison
   28.54 -operation on files that obviously haven't changed.
   28.55 -
   28.56 -Because obtaining file status is crucial to good performance, the
   28.57 -authors of Mercurial have optimised this code to within an inch of its
   28.58 -life.  However, there's no avoiding the fact that when you run
   28.59 -\hgcmd{status}, Mercurial is going to have to perform at least one
   28.60 -expensive system call for each managed file to determine whether it's
   28.61 -changed since the last time Mercurial checked.  For a sufficiently
   28.62 -large repository, this can take a long time.
   28.63 -
   28.64 -To put a number on the magnitude of this effect, I created a
   28.65 -repository containing 150,000 managed files.  I timed \hgcmd{status}
   28.66 -as taking ten seconds to run, even when \emph{none} of those files had
   28.67 -been modified.
   28.68 -
   28.69 -Many modern operating systems contain a file notification facility.
   28.70 -If a program signs up to an appropriate service, the operating system
   28.71 -will notify it every time a file of interest is created, modified, or
   28.72 -deleted.  On Linux systems, the kernel component that does this is
   28.73 -called \texttt{inotify}.
   28.74 -
   28.75 -Mercurial's \hgext{inotify} extension talks to the kernel's
   28.76 -\texttt{inotify} component to optimise \hgcmd{status} commands.  The
   28.77 -extension has two components.  A daemon sits in the background and
   28.78 -receives notifications from the \texttt{inotify} subsystem.  It also
   28.79 -listens for connections from a regular Mercurial command.  The
   28.80 -extension modifies Mercurial's behaviour so that instead of scanning
   28.81 -the filesystem, it queries the daemon.  Since the daemon has perfect
   28.82 -information about the state of the repository, it can respond with a
   28.83 -result instantaneously, avoiding the need to scan every directory and
   28.84 -file in the repository.
   28.85 -
   28.86 -Recall the ten seconds that I measured plain Mercurial as taking to
   28.87 -run \hgcmd{status} on a 150,000 file repository.  With the
   28.88 -\hgext{inotify} extension enabled, the time dropped to 0.1~seconds, a
   28.89 -factor of \emph{one hundred} faster.
   28.90 -
   28.91 -Before we continue, please pay attention to some caveats.
   28.92 -\begin{itemize}
   28.93 -\item The \hgext{inotify} extension is Linux-specific.  Because it
   28.94 -  interfaces directly to the Linux kernel's \texttt{inotify}
   28.95 -  subsystem, it does not work on other operating systems.
   28.96 -\item It should work on any Linux distribution that was released after
   28.97 -  early~2005.  Older distributions are likely to have a kernel that
   28.98 -  lacks \texttt{inotify}, or a version of \texttt{glibc} that does not
   28.99 -  have the necessary interfacing support.
  28.100 -\item Not all filesystems are suitable for use with the
  28.101 -  \hgext{inotify} extension.  Network filesystems such as NFS are a
  28.102 -  non-starter, for example, particularly if you're running Mercurial
  28.103 -  on several systems, all mounting the same network filesystem.  The
  28.104 -  kernel's \texttt{inotify} system has no way of knowing about changes
  28.105 -  made on another system.  Most local filesystems (e.g.~ext3, XFS,
  28.106 -  ReiserFS) should work fine.
  28.107 -\end{itemize}
  28.108 -
  28.109 -The \hgext{inotify} extension is not yet shipped with Mercurial as of
  28.110 -May~2007, so it's a little more involved to set up than other
  28.111 -extensions.  But the performance improvement is worth it!
  28.112 -
  28.113 -The extension currently comes in two parts: a set of patches to the
  28.114 -Mercurial source code, and a library of Python bindings to the
  28.115 -\texttt{inotify} subsystem.
  28.116 -\begin{note}
  28.117 -  There are \emph{two} Python \texttt{inotify} binding libraries.  One
  28.118 -  of them is called \texttt{pyinotify}, and is packaged by some Linux
  28.119 -  distributions as \texttt{python-inotify}.  This is \emph{not} the
  28.120 -  one you'll need, as it is too buggy and inefficient to be practical.
  28.121 -\end{note}
  28.122 -To get going, it's best to already have a functioning copy of
  28.123 -Mercurial installed.
  28.124 -\begin{note}
  28.125 -  If you follow the instructions below, you'll be \emph{replacing} and
  28.126 -  overwriting any existing installation of Mercurial that you might
  28.127 -  already have, using the latest ``bleeding edge'' Mercurial code.
  28.128 -  Don't say you weren't warned!
  28.129 -\end{note}
  28.130 -\begin{enumerate}
  28.131 -\item Clone the Python \texttt{inotify} binding repository.  Build and
  28.132 -  install it.
  28.133 -  \begin{codesample4}
  28.134 -    hg clone http://hg.kublai.com/python/inotify
  28.135 -    cd inotify
  28.136 -    python setup.py build --force
  28.137 -    sudo python setup.py install --skip-build
  28.138 -  \end{codesample4}
  28.139 -\item Clone the \dirname{crew} Mercurial repository.  Clone the
  28.140 -  \hgext{inotify} patch repository so that Mercurial Queues will be
  28.141 -  able to apply patches to your copy of the \dirname{crew} repository.
  28.142 -  \begin{codesample4}
  28.143 -    hg clone http://hg.intevation.org/mercurial/crew
  28.144 -    hg clone crew inotify
  28.145 -    hg clone http://hg.kublai.com/mercurial/patches/inotify inotify/.hg/patches
  28.146 -  \end{codesample4}
  28.147 -\item Make sure that you have the Mercurial Queues extension,
  28.148 -  \hgext{mq}, enabled.  If you've never used MQ, read
  28.149 -  section~\ref{sec:mq:start} to get started quickly.
  28.150 -\item Go into the \dirname{inotify} repo, and apply all of the
  28.151 -  \hgext{inotify} patches using the \hgxopt{mq}{qpush}{-a} option to
  28.152 -  the \hgxcmd{mq}{qpush} command.
  28.153 -  \begin{codesample4}
  28.154 -    cd inotify
  28.155 -    hg qpush -a
  28.156 -  \end{codesample4}
  28.157 -  If you get an error message from \hgxcmd{mq}{qpush}, you should not
  28.158 -  continue.  Instead, ask for help.
  28.159 -\item Build and install the patched version of Mercurial.
  28.160 -  \begin{codesample4}
  28.161 -    python setup.py build --force
  28.162 -    sudo python setup.py install --skip-build
  28.163 -  \end{codesample4}
  28.164 -\end{enumerate}
  28.165 -Once you've built a suitably patched version of Mercurial, all you
  28.166 -need to do to enable the \hgext{inotify} extension is add an entry to
  28.167 -your \hgrc.
  28.168 -\begin{codesample2}
  28.169 -  [extensions]
  28.170 -  inotify =
  28.171 -\end{codesample2}
  28.172 -When the \hgext{inotify} extension is enabled, Mercurial will
  28.173 -automatically and transparently start the status daemon the first time
  28.174 -you run a command that needs status in a repository.  It runs one
  28.175 -status daemon per repository.
  28.176 -
  28.177 -The status daemon is started silently, and runs in the background.  If
  28.178 -you look at a list of running processes after you've enabled the
  28.179 -\hgext{inotify} extension and run a few commands in different
  28.180 -repositories, you'll thus see a few \texttt{hg} processes sitting
  28.181 -around, waiting for updates from the kernel and queries from
  28.182 -Mercurial.
  28.183 -
  28.184 -The first time you run a Mercurial command in a repository when you
  28.185 -have the \hgext{inotify} extension enabled, it will run with about the
  28.186 -same performance as a normal Mercurial command.  This is because the
  28.187 -status daemon needs to perform a normal status scan so that it has a
  28.188 -baseline against which to apply later updates from the kernel.
  28.189 -However, \emph{every} subsequent command that does any kind of status
  28.190 -check should be noticeably faster on repositories of even fairly
  28.191 -modest size.  Better yet, the bigger your repository is, the greater a
  28.192 -performance advantage you'll see.  The \hgext{inotify} daemon makes
  28.193 -status operations almost instantaneous on repositories of all sizes!
  28.194 -
  28.195 -If you like, you can manually start a status daemon using the
  28.196 -\hgxcmd{inotify}{inserve} command.  This gives you slightly finer
  28.197 -control over how the daemon ought to run.  This command will of course
  28.198 -only be available when the \hgext{inotify} extension is enabled.
  28.199 -
  28.200 -When you're using the \hgext{inotify} extension, you should notice
  28.201 -\emph{no difference at all} in Mercurial's behaviour, with the sole
  28.202 -exception of status-related commands running a whole lot faster than
  28.203 -they used to.  You should specifically expect that commands will not
  28.204 -print different output; neither should they give different results.
  28.205 -If either of these situations occurs, please report a bug.
  28.206 -
  28.207 -\section{Flexible diff support with the \hgext{extdiff} extension}
  28.208 -\label{sec:hgext:extdiff}
  28.209 -
  28.210 -Mercurial's built-in \hgcmd{diff} command outputs plaintext unified
  28.211 -diffs.
  28.212 -\interaction{extdiff.diff}
  28.213 -If you would like to use an external tool to display modifications,
  28.214 -you'll want to use the \hgext{extdiff} extension.  This will let you
  28.215 -use, for example, a graphical diff tool.
  28.216 -
  28.217 -The \hgext{extdiff} extension is bundled with Mercurial, so it's easy
  28.218 -to set up.  In the \rcsection{extensions} section of your \hgrc,
  28.219 -simply add a one-line entry to enable the extension.
  28.220 -\begin{codesample2}
  28.221 -  [extensions]
  28.222 -  extdiff =
  28.223 -\end{codesample2}
  28.224 -This introduces a command named \hgxcmd{extdiff}{extdiff}, which by
  28.225 -default uses your system's \command{diff} command to generate a
  28.226 -unified diff in the same form as the built-in \hgcmd{diff} command.
  28.227 -\interaction{extdiff.extdiff}
  28.228 -The result won't be exactly the same as with the built-in \hgcmd{diff}
  28.229 -variations, because the output of \command{diff} varies from one
  28.230 -system to another, even when passed the same options.
  28.231 -
  28.232 -As the ``\texttt{making snapshot}'' lines of output above imply, the
  28.233 -\hgxcmd{extdiff}{extdiff} command works by creating two snapshots of
  28.234 -your source tree.  The first snapshot is of the source revision; the
  28.235 -second, of the target revision or working directory.  The
  28.236 -\hgxcmd{extdiff}{extdiff} command generates these snapshots in a
  28.237 -temporary directory, passes the name of each directory to an external
  28.238 -diff viewer, then deletes the temporary directory.  For efficiency, it
  28.239 -only snapshots the directories and files that have changed between the
  28.240 -two revisions.
  28.241 -
  28.242 -Snapshot directory names have the same base name as your repository.
  28.243 -If your repository path is \dirname{/quux/bar/foo}, then \dirname{foo}
  28.244 -will be the name of each snapshot directory.  Each snapshot directory
  28.245 -name has its changeset ID appended, if appropriate.  If a snapshot is
  28.246 -of revision \texttt{a631aca1083f}, the directory will be named
  28.247 -\dirname{foo.a631aca1083f}.  A snapshot of the working directory won't
  28.248 -have a changeset ID appended, so it would just be \dirname{foo} in
  28.249 -this example.  To see what this looks like in practice, look again at
  28.250 -the \hgxcmd{extdiff}{extdiff} example above.  Notice that the diff has
  28.251 -the snapshot directory names embedded in its header.
  28.252 -
  28.253 -The \hgxcmd{extdiff}{extdiff} command accepts two important options.
  28.254 -The \hgxopt{extdiff}{extdiff}{-p} option lets you choose a program to
  28.255 -view differences with, instead of \command{diff}.  With the
  28.256 -\hgxopt{extdiff}{extdiff}{-o} option, you can change the options that
  28.257 -\hgxcmd{extdiff}{extdiff} passes to the program (by default, these
  28.258 -options are ``\texttt{-Npru}'', which only make sense if you're
  28.259 -running \command{diff}).  In other respects, the
  28.260 -\hgxcmd{extdiff}{extdiff} command acts similarly to the built-in
  28.261 -\hgcmd{diff} command: you use the same option names, syntax, and
  28.262 -arguments to specify the revisions you want, the files you want, and
  28.263 -so on.
  28.264 -
  28.265 -As an example, here's how to run the normal system \command{diff}
  28.266 -command, getting it to generate context diffs (using the
  28.267 -\cmdopt{diff}{-c} option) instead of unified diffs, and five lines of
  28.268 -context instead of the default three (passing \texttt{5} as the
  28.269 -argument to the \cmdopt{diff}{-C} option).
  28.270 -\interaction{extdiff.extdiff-ctx}
  28.271 -
  28.272 -Launching a visual diff tool is just as easy.  Here's how to launch
  28.273 -the \command{kdiff3} viewer.
  28.274 -\begin{codesample2}
  28.275 -  hg extdiff -p kdiff3 -o ''
  28.276 -\end{codesample2}
  28.277 -
  28.278 -If your diff viewing command can't deal with directories, you can
  28.279 -easily work around this with a little scripting.  For an example of
  28.280 -such scripting in action with the \hgext{mq} extension and the
  28.281 -\command{interdiff} command, see
  28.282 -section~\ref{mq-collab:tips:interdiff}.
  28.283 -
  28.284 -\subsection{Defining command aliases}
  28.285 -
  28.286 -It can be cumbersome to remember the options to both the
  28.287 -\hgxcmd{extdiff}{extdiff} command and the diff viewer you want to use,
  28.288 -so the \hgext{extdiff} extension lets you define \emph{new} commands
  28.289 -that will invoke your diff viewer with exactly the right options.
  28.290 -
  28.291 -All you need to do is edit your \hgrc, and add a section named
  28.292 -\rcsection{extdiff}.  Inside this section, you can define multiple
  28.293 -commands.  Here's how to add a \texttt{kdiff3} command.  Once you've
  28.294 -defined this, you can type ``\texttt{hg kdiff3}'' and the
  28.295 -\hgext{extdiff} extension will run \command{kdiff3} for you.
  28.296 -\begin{codesample2}
  28.297 -  [extdiff]
  28.298 -  cmd.kdiff3 =
  28.299 -\end{codesample2}
  28.300 -If you leave the right hand side of the definition empty, as above,
  28.301 -the \hgext{extdiff} extension uses the name of the command you defined
  28.302 -as the name of the external program to run.  But these names don't
  28.303 -have to be the same.  Here, we define a command named ``\texttt{hg
  28.304 -  wibble}'', which runs \command{kdiff3}.
  28.305 -\begin{codesample2}
  28.306 -  [extdiff]
  28.307 -  cmd.wibble = kdiff3
  28.308 -\end{codesample2}
  28.309 -
  28.310 -You can also specify the default options that you want to invoke your
  28.311 -diff viewing program with.  The prefix to use is ``\texttt{opts.}'',
  28.312 -followed by the name of the command to which the options apply.  This
  28.313 -example defines a ``\texttt{hg vimdiff}'' command that runs the
  28.314 -\command{vim} editor's \texttt{DirDiff} extension.
  28.315 -\begin{codesample2}
  28.316 -  [extdiff]  
  28.317 -  cmd.vimdiff = vim
  28.318 -  opts.vimdiff = -f '+next' '+execute "DirDiff" argv(0) argv(1)'
  28.319 -\end{codesample2}
  28.320 -
  28.321 -\section{Cherrypicking changes with the \hgext{transplant} extension}
  28.322 -\label{sec:hgext:transplant}
  28.323 -
  28.324 -Need to have a long chat with Brendan about this.
  28.325 -
  28.326 -\section{Send changes via email with the \hgext{patchbomb} extension}
  28.327 -\label{sec:hgext:patchbomb}
  28.328 -
  28.329 -Many projects have a culture of ``change review'', in which people
  28.330 -send their modifications to a mailing list for others to read and
  28.331 -comment on before they commit the final version to a shared
  28.332 -repository.  Some projects have people who act as gatekeepers; they
  28.333 -apply changes from other people to a repository to which those others
  28.334 -don't have access.
  28.335 -
  28.336 -Mercurial makes it easy to send changes over email for review or
  28.337 -application, via its \hgext{patchbomb} extension.  The extension is so
  28.338 -named because changes are formatted as patches, and it's usual to send
  28.339 -one changeset per email message.  Sending a long series of changes by
  28.340 -email is thus much like ``bombing'' the recipient's inbox, hence
  28.341 -``patchbomb''.
  28.342 -
  28.343 -As usual, the basic configuration of the \hgext{patchbomb} extension
  28.344 -takes just one or two lines in your \hgrc.
  28.345 -\begin{codesample2}
  28.346 -  [extensions]
  28.347 -  patchbomb =
  28.348 -\end{codesample2}
  28.349 -Once you've enabled the extension, you will have a new command
  28.350 -available, named \hgxcmd{patchbomb}{email}.
  28.351 -
  28.352 -The safest and best way to invoke the \hgxcmd{patchbomb}{email}
  28.353 -command is to \emph{always} run it first with the
  28.354 -\hgxopt{patchbomb}{email}{-n} option.  This will show you what the
  28.355 -command \emph{would} send, without actually sending anything.  Once
  28.356 -you've had a quick glance over the changes and verified that you are
  28.357 -sending the right ones, you can rerun the same command, with the
  28.358 -\hgxopt{patchbomb}{email}{-n} option removed.
  28.359 -
  28.360 -The \hgxcmd{patchbomb}{email} command accepts the same kind of
  28.361 -revision syntax as every other Mercurial command.  For example, this
  28.362 -command will send every revision between 7 and \texttt{tip},
  28.363 -inclusive.
  28.364 -\begin{codesample2}
  28.365 -  hg email -n 7:tip
  28.366 -\end{codesample2}
  28.367 -You can also specify a \emph{repository} to compare with.  If you
  28.368 -provide a repository but no revisions, the \hgxcmd{patchbomb}{email}
  28.369 -command will send all revisions in the local repository that are not
  28.370 -present in the remote repository.  If you additionally specify
  28.371 -revisions or a branch name (the latter using the
  28.372 -\hgxopt{patchbomb}{email}{-b} option), this will constrain the
  28.373 -revisions sent.
  28.374 -
  28.375 -It's perfectly safe to run the \hgxcmd{patchbomb}{email} command
  28.376 -without the names of the people you want to send to: if you do this,
  28.377 -it will just prompt you for those values interactively.  (If you're
  28.378 -using a Linux or Unix-like system, you should have enhanced
  28.379 -\texttt{readline}-style editing capabilities when entering those
  28.380 -headers, too, which is useful.)
  28.381 -
  28.382 -When you are sending just one revision, the \hgxcmd{patchbomb}{email}
  28.383 -command will by default use the first line of the changeset
  28.384 -description as the subject of the single email message it sends.
  28.385 -
  28.386 -If you send multiple revisions, the \hgxcmd{patchbomb}{email} command
  28.387 -will usually send one message per changeset.  It will preface the
  28.388 -series with an introductory message, in which you should describe the
  28.389 -purpose of the series of changes you're sending.
  28.390 -
  28.391 -\subsection{Changing the behaviour of patchbombs}
  28.392 -
  28.393 -Not every project has exactly the same conventions for sending changes
  28.394 -in email; the \hgext{patchbomb} extension tries to accommodate a
  28.395 -number of variations through command line options.
  28.396 -\begin{itemize}
  28.397 -\item You can write a subject for the introductory message on the
  28.398 -  command line using the \hgxopt{patchbomb}{email}{-s} option.  This
  28.399 -  takes one argument, the text of the subject to use.
  28.400 -\item To change the email address from which the messages originate,
  28.401 -  use the \hgxopt{patchbomb}{email}{-f} option.  This takes one
  28.402 -  argument, the email address to use.
  28.403 -\item The default behaviour is to send unified diffs (see
  28.404 -  section~\ref{sec:mq:patch} for a description of the format), one per
  28.405 -  message.  You can send a binary bundle instead with the
  28.406 -  \hgxopt{patchbomb}{email}{-b} option.  
  28.407 -\item Unified diffs are normally prefaced with a metadata header.  You
  28.408 -  can omit this, and send unadorned diffs, with the
  28.409 -  \hgxopt{patchbomb}{email}{--plain} option.
  28.410 -\item Diffs are normally sent ``inline'', in the same body part as the
  28.411 -  description of a patch.  This makes it easiest for the largest
  28.412 -  number of readers to quote and respond to parts of a diff, as some
  28.413 -  mail clients will only quote the first MIME body part in a message.
  28.414 -  If you'd prefer to send the description and the diff in separate
  28.415 -  body parts, use the \hgxopt{patchbomb}{email}{-a} option.
  28.416 -\item Instead of sending mail messages, you can write them to an
  28.417 -  \texttt{mbox}-format mail folder using the
  28.418 -  \hgxopt{patchbomb}{email}{-m} option.  That option takes one
  28.419 -  argument, the name of the file to write to.
  28.420 -\item If you would like to add a \command{diffstat}-format summary to
  28.421 -  each patch, and one to the introductory message, use the
  28.422 -  \hgxopt{patchbomb}{email}{-d} option.  The \command{diffstat}
  28.423 -  command displays a table containing the name of each file patched,
  28.424 -  the number of lines affected, and a histogram showing how much each
  28.425 -  file is modified.  This gives readers a qualitative glance at how
  28.426 -  complex a patch is.
  28.427 -\end{itemize}
  28.428 -
  28.429 -%%% Local Variables: 
  28.430 -%%% mode: latex
  28.431 -%%% TeX-master: "00book"
  28.432 -%%% End: 

    29.1 --- a/en/hook.tex	Thu Jan 29 22:47:34 2009 -0800
    29.2 +++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
    29.3 @@ -1,1413 +0,0 @@
    29.4 -\chapter{Handling repository events with hooks}
    29.5 -\label{chap:hook}
    29.6 -
    29.7 -Mercurial offers a powerful mechanism to let you perform automated
    29.8 -actions in response to events that occur in a repository.  In some
    29.9 -cases, you can even control Mercurial's response to those events.
   29.10 -
   29.11 -The name Mercurial uses for one of these actions is a \emph{hook}.
   29.12 -Hooks are called ``triggers'' in some revision control systems, but
   29.13 -the two names refer to the same idea.
   29.14 -
   29.15 -\section{An overview of hooks in Mercurial}
   29.16 -
   29.17 -Here is a brief list of the hooks that Mercurial supports.  We will
   29.18 -revisit each of these hooks in more detail later, in
   29.19 -section~\ref{sec:hook:ref}.
   29.20 -
   29.21 -\begin{itemize}
   29.22 -\item[\small\hook{changegroup}] This is run after a group of
   29.23 -  changesets has been brought into the repository from elsewhere.
   29.24 -\item[\small\hook{commit}] This is run after a new changeset has been
   29.25 -  created in the local repository.
   29.26 -\item[\small\hook{incoming}] This is run once for each new changeset
   29.27 -  that is brought into the repository from elsewhere.  Notice the
   29.28 -  difference from \hook{changegroup}, which is run once per
   29.29 -  \emph{group} of changesets brought in.
   29.30 -\item[\small\hook{outgoing}] This is run after a group of changesets
   29.31 -  has been transmitted from this repository.
   29.32 -\item[\small\hook{prechangegroup}] Controlling. This is run before
   29.33 -  starting to bring a group of changesets into the repository.
   29.34 -\item[\small\hook{precommit}] Controlling. This is run before starting
   29.35 -  a commit.
   29.36 -\item[\small\hook{preoutgoing}] Controlling. This is run before
   29.37 -  starting to transmit a group of changesets from this repository.
   29.38 -\item[\small\hook{pretag}] Controlling. This is run before creating a tag.
   29.39 -\item[\small\hook{pretxnchangegroup}] Controlling. This is run after a
   29.40 -  group of changesets has been brought into the local repository from
   29.41 -  another, but before the transaction completes that will make the
   29.42 -  changes permanent in the repository.
   29.43 -\item[\small\hook{pretxncommit}] Controlling. This is run after a new
   29.44 -  changeset has been created in the local repository, but before the
   29.45 -  transaction completes that will make it permanent.
   29.46 -\item[\small\hook{preupdate}] Controlling. This is run before starting
   29.47 -  an update or merge of the working directory.
   29.48 -\item[\small\hook{tag}] This is run after a tag is created.
   29.49 -\item[\small\hook{update}] This is run after an update or merge of the
   29.50 -  working directory has finished.
   29.51 -\end{itemize}
   29.52 -Each of the hooks whose description begins with the word
   29.53 -``Controlling'' has the ability to determine whether an activity can
   29.54 -proceed.  If the hook succeeds, the activity may proceed; if it fails,
   29.55 -the activity is either not permitted or undone, depending on the hook.
   29.56 -
   29.57 -\section{Hooks and security}
   29.58 -
   29.59 -\subsection{Hooks are run with your privileges}
   29.60 -
   29.61 -When you run a Mercurial command in a repository, and the command
   29.62 -causes a hook to run, that hook runs on \emph{your} system, under
   29.63 -\emph{your} user account, with \emph{your} privilege level.  Since
   29.64 -hooks are arbitrary pieces of executable code, you should treat them
   29.65 -with an appropriate level of suspicion.  Do not install a hook unless
   29.66 -you are confident that you know who created it and what it does.
   29.67 -
   29.68 -In some cases, you may be exposed to hooks that you did not install
   29.69 -yourself.  If you work with Mercurial on an unfamiliar system,
   29.70 -Mercurial will run hooks defined in that system's global \hgrc\ file.
   29.71 -
   29.72 -If you are working with a repository owned by another user, Mercurial
   29.73 -can run hooks defined in that user's repository, but it will still run
   29.74 -them as ``you''.  For example, if you \hgcmd{pull} from that
   29.75 -repository, and its \sfilename{.hg/hgrc} defines a local
   29.76 -\hook{outgoing} hook, that hook will run under your user account, even
   29.77 -though you don't own that repository.
   29.78 -
   29.79 -\begin{note}
   29.80 -  This only applies if you are pulling from a repository on a local or
   29.81 -  network filesystem.  If you're pulling over http or ssh, any
   29.82 -  \hook{outgoing} hook will run under whatever account is executing
   29.83 -  the server process, on the server.
   29.84 -\end{note}
   29.85 -
   29.86 -XXX To see what hooks are defined in a repository, use the
   29.87 -\hgcmdargs{config}{hooks} command.  If you are working in one
   29.88 -repository, but talking to another that you do not own (e.g.~using
   29.89 -\hgcmd{pull} or \hgcmd{incoming}), remember that it is the other
   29.90 -repository's hooks you should be checking, not your own.
   29.91 -
   29.92 -\subsection{Hooks do not propagate}
   29.93 -
   29.94 -In Mercurial, hooks are not revision controlled, and do not propagate
   29.95 -when you clone, or pull from, a repository.  The reason for this is
   29.96 -simple: a hook is a completely arbitrary piece of executable code.  It
   29.97 -runs under your user identity, with your privilege level, on your
   29.98 -machine.
   29.99 -
  29.100 -It would be extremely reckless for any distributed revision control
  29.101 -system to implement revision-controlled hooks, as this would offer an
  29.102 -easily exploitable way to subvert the accounts of users of the
  29.103 -revision control system.
  29.104 -
  29.105 -Since Mercurial does not propagate hooks, if you are collaborating
  29.106 -with other people on a common project, you should not assume that they
  29.107 -are using the same Mercurial hooks as you are, or that theirs are
  29.108 -correctly configured.  You should document the hooks you expect people
  29.109 -to use.
  29.110 -
  29.111 -In a corporate intranet, this is somewhat easier to control, as you
  29.112 -can for example provide a ``standard'' installation of Mercurial on an
  29.113 -NFS filesystem, and use a site-wide \hgrc\ file to define hooks that
  29.114 -all users will see.  However, this too has its limits; see below.
  29.115 -
  29.116 -\subsection{Hooks can be overridden}
  29.117 -
  29.118 -Mercurial allows you to override a hook definition by redefining the
  29.119 -hook.  You can disable it by setting its value to the empty string, or
  29.120 -change its behaviour as you wish.
  29.121 -
  29.122 -If you deploy a system-~or site-wide \hgrc\ file that defines some
  29.123 -hooks, you should thus understand that your users can disable or
  29.124 -override those hooks.
  29.125 -
  29.126 -\subsection{Ensuring that critical hooks are run}
  29.127 -
  29.128 -Sometimes you may want to enforce a policy that you do not want others
  29.129 -to be able to work around.  For example, you may have a requirement
  29.130 -that every changeset must pass a rigorous set of tests.  Defining this
  29.131 -requirement via a hook in a site-wide \hgrc\ won't work for remote
  29.132 -users on laptops, and of course local users can subvert it at will by
  29.133 -overriding the hook.
  29.134 -
  29.135 -Instead, you can set up your policies for use of Mercurial so that
  29.136 -people are expected to propagate changes through a well-known
  29.137 -``canonical'' server that you have locked down and configured
  29.138 -appropriately.
  29.139 -
  29.140 -One way to do this is via a combination of social engineering and
  29.141 -technology.  Set up a restricted-access account; users can push
  29.142 -changes over the network to repositories managed by this account, but
  29.143 -they cannot log into the account and run normal shell commands.  In
  29.144 -this scenario, a user can commit a changeset that contains any old
  29.145 -garbage they want.
  29.146 -
  29.147 -When someone pushes a changeset to the server that everyone pulls
  29.148 -from, the server will test the changeset before it accepts it as
  29.149 -permanent, and reject it if it fails to pass the test suite.  If
  29.150 -people only pull changes from this filtering server, it will serve to
  29.151 -ensure that all changes that people pull have been automatically
  29.152 -vetted.
  29.153 -
  29.154 -\section{Care with \texttt{pretxn} hooks in a shared-access repository}
  29.155 -
  29.156 -If you want to use hooks to do some automated work in a repository
  29.157 -that a number of people have shared access to, you need to be careful
  29.158 -in how you do this.
  29.159 -
  29.160 -Mercurial only locks a repository when it is writing to the
  29.161 -repository, and only the parts of Mercurial that write to the
  29.162 -repository pay attention to locks.  Write locks are necessary to
  29.163 -prevent multiple simultaneous writers from scribbling on each other's
  29.164 -work, corrupting the repository.
  29.165 -
  29.166 -Because Mercurial is careful with the order in which it reads and
  29.167 -writes data, it does not need to acquire a lock when it wants to read
  29.168 -data from the repository.  The parts of Mercurial that read from the
  29.169 -repository never pay attention to locks.  This lockless reading scheme
  29.170 -greatly increases performance and concurrency.
  29.171 -
  29.172 -With great performance comes a trade-off, though, one which has the
  29.173 -potential to cause you trouble unless you're aware of it.  To describe
  29.174 -this requires a little detail about how Mercurial adds changesets to a
  29.175 -repository and reads those changes.
  29.176 -
  29.177 -When Mercurial \emph{writes} metadata, it writes it straight into the
  29.178 -destination file.  It writes file data first, then manifest data
  29.179 -(which contains pointers to the new file data), then changelog data
  29.180 -(which contains pointers to the new manifest data).  Before the first
  29.181 -write to each file, it stores a record of where the end of the file
  29.182 -was in its transaction log.  If the transaction must be rolled back,
  29.183 -Mercurial simply truncates each file back to the size it was before the
  29.184 -transaction began.
  29.185 -
  29.186 -When Mercurial \emph{reads} metadata, it reads the changelog first,
  29.187 -then everything else.  Since a reader will only access parts of the
  29.188 -manifest or file metadata that it can see in the changelog, it can
  29.189 -never see partially written data.
  29.190 -
  29.191 -Some controlling hooks (\hook{pretxncommit} and
  29.192 -\hook{pretxnchangegroup}) run when a transaction is almost complete.
  29.193 -All of the metadata has been written, but Mercurial can still roll the
  29.194 -transaction back and cause the newly-written data to disappear.
  29.195 -
  29.196 -If one of these hooks runs for a long time, it opens a window during
  29.197 -which a reader can see the metadata for changesets that are not yet
  29.198 -permanent, and should not be thought of as ``really there''.  The
  29.199 -longer the hook runs, the longer that window is open.
  29.200 -
  29.201 -\subsection{The problem illustrated}
  29.202 -
  29.203 -In principle, a good use for the \hook{pretxnchangegroup} hook would
  29.204 -be to automatically build and test incoming changes before they are
  29.205 -accepted into a central repository.  This could let you guarantee that
  29.206 -nobody can push changes to this repository that ``break the build''.
  29.207 -But if a client can pull changes while they're being tested, the
  29.208 -usefulness of the test is zero; an unsuspecting someone can pull
  29.209 -untested changes, potentially breaking their build.
  29.210 -
  29.211 -The safest technological answer to this challenge is to set up such a
  29.212 -``gatekeeper'' repository as \emph{unidirectional}.  Let it take
  29.213 -changes pushed in from the outside, but do not allow anyone to pull
  29.214 -changes from it (use the \hook{preoutgoing} hook to lock it down).
  29.215 -Configure a \hook{changegroup} hook so that if a build or test
  29.216 -succeeds, the hook will push the new changes out to another repository
  29.217 -that people \emph{can} pull from.
  29.218 -
  29.219 -In practice, putting a centralised bottleneck like this in place is
  29.220 -not often a good idea, and transaction visibility has nothing to do
  29.221 -with the problem.  As the size of a project---and the time it takes to
  29.222 -build and test---grows, you rapidly run into a wall with this ``try
  29.223 -before you buy'' approach, where you have more changesets to test than
  29.224 -time in which to deal with them.  The inevitable result is frustration
  29.225 -on the part of all involved.
  29.226 -
  29.227 -An approach that scales better is to get people to build and test
  29.228 -before they push, then run automated builds and tests centrally
  29.229 -\emph{after} a push, to be sure all is well.  The advantage of this
  29.230 -approach is that it does not impose a limit on the rate at which the
  29.231 -repository can accept changes.
  29.232 -
  29.233 -\section{A short tutorial on using hooks}
  29.234 -\label{sec:hook:simple}
  29.235 -
  29.236 -It is easy to write a Mercurial hook.  Let's start with a hook that
  29.237 -runs when you finish a \hgcmd{commit}, and simply prints the hash of
  29.238 -the changeset you just created.  The hook is called \hook{commit}.
  29.239 -
  29.240 -\begin{figure}[ht]
  29.241 -  \interaction{hook.simple.init}
  29.242 -  \caption{A simple hook that runs when a changeset is committed}
  29.243 -  \label{ex:hook:init}
  29.244 -\end{figure}
  29.245 -
  29.246 -All hooks follow the pattern in example~\ref{ex:hook:init}.  You add
  29.247 -an entry to the \rcsection{hooks} section of your \hgrc.  On the left
  29.248 -is the name of the event to trigger on; on the right is the action to
  29.249 -take.  As you can see, you can run an arbitrary shell command in a
  29.250 -hook.  Mercurial passes extra information to the hook using
  29.251 -environment variables (look for \envar{HG\_NODE} in the example).
  29.252 -
  29.253 -\subsection{Performing multiple actions per event}
  29.254 -
  29.255 -Quite often, you will want to define more than one hook for a
  29.256 -particular kind of event, as shown in example~\ref{ex:hook:ext}.
  29.257 -Mercurial lets you do this by adding an \emph{extension} to the end of
  29.258 -a hook's name.  You extend a hook's name by giving the name of the
  29.259 -hook, followed by a full stop (the ``\texttt{.}'' character), followed
  29.260 -by some more text of your choosing.  For example, Mercurial will run
  29.261 -both \texttt{commit.foo} and \texttt{commit.bar} when the
  29.262 -\texttt{commit} event occurs.
  29.263 -
  29.264 -\begin{figure}[ht]
  29.265 -  \interaction{hook.simple.ext}
  29.266 -  \caption{Defining a second \hook{commit} hook}
  29.267 -  \label{ex:hook:ext}
  29.268 -\end{figure}
  29.269 -
  29.270 -To give a well-defined order of execution when there are multiple
  29.271 -hooks defined for an event, Mercurial sorts hooks by extension, and
  29.272 -executes the hook commands in this sorted order.  In the above
  29.273 -example, it will execute \texttt{commit.bar} before
  29.274 -\texttt{commit.foo}, and \texttt{commit} before both.
  29.275 -
  29.276 -It is a good idea to use a somewhat descriptive extension when you
  29.277 -define a new hook.  This will help you to remember what the hook was
  29.278 -for.  If the hook fails, you'll get an error message that contains the
  29.279 -hook name and extension, so using a descriptive extension could give
  29.280 -you an immediate hint as to why the hook failed (see
  29.281 -section~\ref{sec:hook:perm} for an example).
  29.282 -
  29.283 -\subsection{Controlling whether an activity can proceed}
  29.284 -\label{sec:hook:perm}
  29.285 -
  29.286 -In our earlier examples, we used the \hook{commit} hook, which is
  29.287 -run after a commit has completed.  This is one of several Mercurial
  29.288 -hooks that run after an activity finishes.  Such hooks have no way of
  29.289 -influencing the activity itself.
  29.290 -
  29.291 -Mercurial defines a number of events that occur before an activity
  29.292 -starts; or after it starts, but before it finishes.  Hooks that
  29.293 -trigger on these events have the added ability to choose whether the
  29.294 -activity can continue, or will abort.  
  29.295 -
  29.296 -The \hook{pretxncommit} hook runs after a commit has all but
  29.297 -completed.  In other words, the metadata representing the changeset
  29.298 -has been written out to disk, but the transaction has not yet been
  29.299 -allowed to complete.  The \hook{pretxncommit} hook has the ability to
  29.300 -decide whether the transaction can complete, or must be rolled back.
  29.301 -
  29.302 -If the \hook{pretxncommit} hook exits with a status code of zero, the
  29.303 -transaction is allowed to complete; the commit finishes; and the
  29.304 -\hook{commit} hook is run.  If the \hook{pretxncommit} hook exits with
  29.305 -a non-zero status code, the transaction is rolled back; the metadata
  29.306 -representing the changeset is erased; and the \hook{commit} hook is
  29.307 -not run.
  29.308 -
  29.309 -\begin{figure}[ht]
  29.310 -  \interaction{hook.simple.pretxncommit}
  29.311 -  \caption{Using the \hook{pretxncommit} hook to control commits}
  29.312 -  \label{ex:hook:pretxncommit}
  29.313 -\end{figure}
  29.314 -
  29.315 -The hook in example~\ref{ex:hook:pretxncommit} checks that a commit
  29.316 -comment contains a bug ID.  If it does, the commit can complete.  If
  29.317 -not, the commit is rolled back.
  29.318 -
  29.319 -\section{Writing your own hooks}
  29.320 -
  29.321 -When you are writing a hook, you might find it useful to run Mercurial
  29.322 -either with the \hggopt{-v} option, or the \rcitem{ui}{verbose} config
  29.323 -item set to ``true''.  When you do so, Mercurial will print a message
  29.324 -before it calls each hook.
  29.325 -
  29.326 -\subsection{Choosing how your hook should run}
  29.327 -\label{sec:hook:lang}
  29.328 -
  29.329 -You can write a hook either as a normal program---typically a shell
  29.330 -script---or as a Python function that is executed within the Mercurial
  29.331 -process.
  29.332 -
  29.333 -Writing a hook as an external program has the advantage that it
  29.334 -requires no knowledge of Mercurial's internals.  You can call normal
  29.335 -Mercurial commands to get any added information you need.  The
  29.336 -trade-off is that external hooks are slower than in-process hooks.
  29.337 -
  29.338 -An in-process Python hook has complete access to the Mercurial API,
  29.339 -and does not ``shell out'' to another process, so it is inherently
  29.340 -faster than an external hook.  It is also easier to obtain much of the
  29.341 -information that a hook requires by using the Mercurial API than by
  29.342 -running Mercurial commands.
  29.343 -
  29.344 -If you are comfortable with Python, or require high performance,
  29.345 -writing your hooks in Python may be a good choice.  However, when you
  29.346 -have a straightforward hook to write and you don't need to care about
  29.347 -performance (probably the majority of hooks), a shell script is
  29.348 -perfectly fine.
  29.349 -
  29.350 -\subsection{Hook parameters}
  29.351 -\label{sec:hook:param}
  29.352 -
  29.353 -Mercurial calls each hook with a set of well-defined parameters.  In
  29.354 -Python, a parameter is passed as a keyword argument to your hook
  29.355 -function.  For an external program, a parameter is passed as an
  29.356 -environment variable.
  29.357 -
  29.358 -Whether your hook is written in Python or as a shell script, the
  29.359 -hook-specific parameter names and values will be the same.  A boolean
  29.360 -parameter will be represented as a boolean value in Python, but as the
  29.361 -number 1 (for ``true'') or 0 (for ``false'') as an environment
  29.362 -variable for an external hook.  If a hook parameter is named
  29.363 -\texttt{foo}, the keyword argument for a Python hook will also be
  29.364 -named \texttt{foo}, while the environment variable for an external
  29.365 -hook will be named \texttt{HG\_FOO}.
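The naming and boolean conventions just described can be sketched in a few lines of Python. This is only an illustration of the rules stated above, not Mercurial's own code; the function name `hook_environment` and the sample parameter values are hypothetical.

```python
def hook_environment(params):
    """Map hook parameters to the environment variables an external hook sees.

    Illustrative sketch only: it mirrors the conventions described in the
    text and is not taken from Mercurial itself.
    """
    env = {}
    for name, value in params.items():
        if isinstance(value, bool):
            # A boolean is passed as "1" (true) or "0" (false).
            value = "1" if value else "0"
        # The name is upper-cased and prefixed with "HG_".
        env["HG_" + name.upper()] = str(value)
    return env
```

For a parameter named `foo`, an in-process hook receives the keyword argument `foo`, while an external hook reads `HG_FOO` from its environment.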
  29.366 -
  29.367 -\subsection{Hook return values and activity control}
  29.368 -
  29.369 -A hook that executes successfully must exit with a status of zero if
  29.370 -external, or return boolean ``false'' if in-process.  Failure is
  29.371 -indicated with a non-zero exit status from an external hook, or an
  29.372 -in-process hook returning boolean ``true''.  If an in-process hook
  29.373 -raises an exception, the hook is considered to have failed.
  29.374 -
  29.375 -For a hook that controls whether an activity can proceed, zero/false
  29.376 -means ``allow'', while non-zero/true/exception means ``deny''.
  29.377 -
  29.378 -\subsection{Writing an external hook}
  29.379 -
  29.380 -When you define an external hook in your \hgrc\ and the hook is run,
  29.381 -its value is passed to your shell, which interprets it.  This means
  29.382 -that you can use normal shell constructs in the body of the hook.
  29.383 -
  29.384 -An executable hook is always run with its current directory set to a
  29.385 -repository's root directory.
  29.386 -
  29.387 -Each hook parameter is passed in as an environment variable; the name
  29.388 -is upper-cased, and prefixed with the string ``\texttt{HG\_}''.
  29.389 -
  29.390 -With the exception of hook parameters, Mercurial does not set or
  29.391 -modify any environment variables when running a hook.  This is useful
  29.392 -to remember if you are writing a site-wide hook that may be run by a
  29.393 -number of different users with differing environment variables set.
  29.394 -In multi-user situations, you should not rely on environment variables
  29.395 -being set to the values you have in your environment when testing the
  29.396 -hook.
  29.397 -
  29.398 -\subsection{Telling Mercurial to use an in-process hook}
  29.399 -
  29.400 -The \hgrc\ syntax for defining an in-process hook is slightly
  29.401 -different than for an executable hook.  The value of the hook must
  29.402 -start with the text ``\texttt{python:}'', and continue with the
  29.403 -fully-qualified name of a callable object to use as the hook's value.
  29.404 -
  29.405 -The module in which a hook lives is automatically imported when a hook
  29.406 -is run.  So long as you have the module name and \envar{PYTHONPATH}
  29.407 -right, it should ``just work''.
  29.408 -
  29.409 -The following \hgrc\ example snippet illustrates the syntax and
  29.410 -meaning of the notions we just described.
  29.411 -\begin{codesample2}
  29.412 -  [hooks]
  29.413 -  commit.example = python:mymodule.submodule.myhook
  29.414 -\end{codesample2}
  29.415 -When Mercurial runs the \texttt{commit.example} hook, it imports
  29.416 -\texttt{mymodule.submodule}, looks for the callable object named
  29.417 -\texttt{myhook}, and calls it.
  29.418 -
  29.419 -\subsection{Writing an in-process hook}
  29.420 -
  29.421 -The simplest in-process hook does nothing, but illustrates the basic
  29.422 -shape of the hook API:
  29.423 -\begin{codesample2}
  29.424 -  def myhook(ui, repo, **kwargs):
  29.425 -      pass
  29.426 -\end{codesample2}
  29.427 -The first argument to a Python hook is always a
  29.428 -\pymodclass{mercurial.ui}{ui} object.  The second is a repository object;
  29.429 -at the moment, it is always an instance of
  29.430 -\pymodclass{mercurial.localrepo}{localrepository}.  Following these two
  29.431 -arguments are other keyword arguments.  Which ones are passed in
  29.432 -depends on the hook being called, but a hook can ignore arguments it
  29.433 -doesn't care about by dropping them into a keyword argument dict, as
  29.434 -with \texttt{**kwargs} above.
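As a slightly less trivial sketch, the hook below shows the argument convention and the return-value convention together. The name `check_message_length` is hypothetical. The calls `repo[node].description()` and `ui.warn()`, and the use of byte strings, assume a reasonably recent Mercurial running on Python 3; older releases may need a different way to look up the changeset.

```python
def check_message_length(ui, repo, node=None, **kwargs):
    """Deny a commit whose message is shorter than ten bytes."""
    # Assumption: indexing the repository by node yields a changeset
    # context whose description() method returns the commit message.
    message = repo[node].description()
    if len(message) < 10:
        ui.warn(b"commit message is too short\n")
        return True   # a true value means the hook failed: deny
    return False      # a false value means the hook succeeded: allow
```

If this function lived in a hypothetical module named `mymodule`, it could be attached with a `pretxncommit` entry whose value is `python:mymodule.check_message_length`.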
  29.435 -
  29.436 -\section{Some hook examples}
  29.437 -
  29.438 -\subsection{Writing meaningful commit messages}
  29.439 -
  29.440 -It's hard to imagine a useful commit message being very short.  The
  29.441 -simple \hook{pretxncommit} hook of figure~\ref{ex:hook:msglen.go}
  29.442 -will prevent you from committing a changeset with a message that is
  29.443 -fewer than ten bytes long.
  29.444 -
  29.445 -\begin{figure}[ht]
  29.446 -  \interaction{hook.msglen.go}
  29.447 -  \caption{A hook that forbids overly short commit messages}
  29.448 -  \label{ex:hook:msglen.go}
  29.449 -\end{figure}
  29.450 -
  29.451 -\subsection{Checking for trailing whitespace}
  29.452 -
  29.453 -An interesting use of a commit-related hook is to help you to write
  29.454 -cleaner code.  A simple example of ``cleaner code'' is the dictum that
  29.455 -a change should not add any new lines of text that contain ``trailing
  29.456 -whitespace''.  Trailing whitespace is a series of space and tab
  29.457 -characters at the end of a line of text.  In most cases, trailing
  29.458 -whitespace is unnecessary, invisible noise, but it is occasionally
  29.459 -problematic, and people often prefer to get rid of it.
  29.460 -
  29.461 -You can use either the \hook{precommit} or \hook{pretxncommit} hook to
  29.462 -tell whether you have a trailing whitespace problem.  If you use the
  29.463 -\hook{precommit} hook, the hook will not know which files you are
  29.464 -committing, so it will have to check every modified file in the
  29.465 -repository for trailing white space.  If you want to commit a change
  29.466 -to just the file \filename{foo}, but the file \filename{bar} contains
  29.467 -trailing whitespace, doing a check in the \hook{precommit} hook will
  29.468 -prevent you from committing \filename{foo} due to the problem with
  29.469 -\filename{bar}.  This doesn't seem right.
  29.470 -
  29.471 -Should you choose the \hook{pretxncommit} hook, the check won't occur
  29.472 -until just before the transaction for the commit completes.  This will
  29.473 -allow you to check for problems in only the exact files that are being
  29.474 -committed.  However, if you entered the commit message interactively
  29.475 -and the hook fails, the transaction will roll back; you'll have to
  29.476 -re-enter the commit message after you fix the trailing whitespace and
  29.477 -run \hgcmd{commit} again.
  29.478 -
  29.479 -\begin{figure}[ht]
  29.480 -  \interaction{hook.ws.simple}
  29.481 -  \caption{A simple hook that checks for trailing whitespace}
  29.482 -  \label{ex:hook:ws.simple}
  29.483 -\end{figure}
  29.484 -
  29.485 -Figure~\ref{ex:hook:ws.simple} introduces a simple \hook{pretxncommit}
  29.486 -hook that checks for trailing whitespace.  This hook is short, but not
  29.487 -very helpful.  It exits with an error status if a change adds a line
  29.488 -with trailing whitespace to any file, but does not print any
  29.489 -information that might help us to identify the offending file or
  29.490 -line.  It does, however, have the nice property of not paying attention to
  29.491 -unmodified lines; only lines that introduce new trailing whitespace
  29.492 -cause problems.
  29.493 -
  29.494 -\begin{figure}[ht]
  29.495 -  \interaction{hook.ws.better}
  29.496 -  \caption{A better trailing whitespace hook}
  29.497 -  \label{ex:hook:ws.better}
  29.498 -\end{figure}
  29.499 -
  29.500 -The example of figure~\ref{ex:hook:ws.better} is much more complex,
  29.501 -but also more useful.  It parses a unified diff to see if any lines
  29.502 -add trailing whitespace, and prints the name of the file and the line
  29.503 -number of each such occurrence.  Even better, if the change adds
  29.504 -trailing whitespace, this hook saves the commit comment and prints the
  29.505 -name of the save file before exiting and telling Mercurial to roll the
  29.506 -transaction back, so you can use
  29.507 -\hgcmdargs{commit}{\hgopt{commit}{-l}~\emph{filename}} to reuse the
  29.508 -saved commit message once you've corrected the problem.
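The diff-parsing step described above can be sketched in Python as follows. This is not the hook from the figure, only an illustration of the same idea under stated assumptions: the input is ordinary unified diff text, file names are taken from the `+++` header lines, and the name `find_trailing_whitespace` is hypothetical.

```python
import re

# Matches a hunk header such as "@@ -1,2 +1,3 @@".
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def find_trailing_whitespace(diff_text):
    """Return (filename, line_number) pairs for added lines that end in
    spaces or tabs.  Line numbers refer to the new version of each file."""
    problems = []
    filename = None
    old_left = new_left = 0  # lines still expected in the current hunk
    line_number = 0          # current line in the new version of the file
    for line in diff_text.split("\n"):
        if old_left > 0 or new_left > 0:
            if line.startswith("+"):
                # An added line: this is the only kind we complain about.
                if re.search(r"[ \t]+$", line[1:]):
                    problems.append((filename, line_number))
                new_left -= 1
                line_number += 1
            elif line.startswith("-"):
                old_left -= 1
            elif line.startswith("\\"):
                pass  # "\ No newline at end of file" marker
            else:
                # A context line counts towards both sides of the hunk.
                old_left -= 1
                new_left -= 1
                line_number += 1
            continue
        if line.startswith("+++ "):
            # Drop any timestamp after a tab and the conventional "b/" prefix.
            filename = line[4:].split("\t")[0]
            if filename.startswith("b/"):
                filename = filename[2:]
            continue
        hunk = HUNK_HEADER.match(line)
        if hunk:
            old_left = int(hunk.group(1) or 1)
            line_number = int(hunk.group(2))
            new_left = int(hunk.group(3) or 1)
    return problems
```

Counting the lines that each hunk header announces lets the sketch tell a real `+++` file header from an added line whose text happens to begin with `++`.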
  29.509 -
  29.510 -As a final aside, note in figure~\ref{ex:hook:ws.better} the use of
  29.511 -\command{perl}'s in-place editing feature to get rid of trailing
  29.512 -whitespace from a file.  This is concise and useful enough that I will
  29.513 -reproduce it here.
  29.514 -\begin{codesample2}
  29.515 -  perl -pi -e 's,\textbackslash{}s+\$,,' filename
  29.516 -\end{codesample2}
  29.517 -
  29.518 -\section{Bundled hooks}
  29.519 -
  29.520 -Mercurial ships with several bundled hooks.  You can find them in the
  29.521 -\dirname{hgext} directory of a Mercurial source tree.  If you are
  29.522 -using a Mercurial binary package, the hooks will be located in the
  29.523 -\dirname{hgext} directory of wherever your package installer put
  29.524 -Mercurial.
  29.525 -
  29.526 -\subsection{\hgext{acl}---access control for parts of a repository}
  29.527 -
  29.528 -The \hgext{acl} extension lets you control which remote users are
  29.529 -allowed to push changesets to a networked server.  You can protect any
  29.530 -portion of a repository (including the entire repo), so that a
  29.531 -specific remote user can push only changes that do not affect the protected
  29.532 -portion.
  29.533 -
  29.534 -This extension implements access control based on the identity of the
  29.535 -user performing a push, \emph{not} on who committed the changesets
  29.536 -they're pushing.  It makes sense to use this hook only if you have a
  29.537 -locked-down server environment that authenticates remote users, and
  29.538 -you want to be sure that only specific users are allowed to push
  29.539 -changes to that server.
  29.540 -
  29.541 -\subsubsection{Configuring the \hook{acl} hook}
  29.542 -
  29.543 -In order to manage incoming changesets, the \hgext{acl} hook must be
  29.544 -used as a \hook{pretxnchangegroup} hook.  This lets it see which files
  29.545 -are modified by each incoming changeset, and roll back a group of
  29.546 -changesets if they modify ``forbidden'' files.  Example:
  29.547 -\begin{codesample2}
  29.548 -  [hooks]
  29.549 -  pretxnchangegroup.acl = python:hgext.acl.hook
  29.550 -\end{codesample2}
  29.551 -
  29.552 -The \hgext{acl} extension is configured using three sections.  
  29.553 -
  29.554 -The \rcsection{acl} section has only one entry, \rcitem{acl}{sources},
  29.555 -which lists the sources of incoming changesets that the hook should
  29.556 -pay attention to.  You don't normally need to configure this section.
  29.557 -\begin{itemize}
  29.558 -\item[\rcitem{acl}{serve}] Control incoming changesets that are arriving
  29.559 -  from a remote repository over http or ssh.  This is the default
  29.560 -  value of \rcitem{acl}{sources}, and usually the only setting you'll
  29.561 -  need for this configuration item.
  29.562 -\item[\rcitem{acl}{pull}] Control incoming changesets that are
  29.563 -  arriving via a pull from a local repository.
  29.564 -\item[\rcitem{acl}{push}] Control incoming changesets that are
  29.565 -  arriving via a push from a local repository.
  29.566 -\item[\rcitem{acl}{bundle}] Control incoming changesets that are
  29.567 -  arriving from another repository via a bundle.
  29.568 -\end{itemize}
  29.569 -
  29.570 -The \rcsection{acl.allow} section controls the users that are allowed to
  29.571 -add changesets to the repository.  If this section is not present, all
  29.572 -users that are not explicitly denied are allowed.  If this section is
  29.573 -present, all users that are not explicitly allowed are denied (so an
  29.574 -empty section means that all users are denied).
  29.575 -
  29.576 -The \rcsection{acl.deny} section determines which users are denied
  29.577 -from adding changesets to the repository.  If this section is not
  29.578 -present or is empty, no users are denied.
  29.579 -
  29.580 -The syntaxes for the \rcsection{acl.allow} and \rcsection{acl.deny}
  29.581 -sections are identical.  On the left of each entry is a glob pattern
  29.582 -that matches files or directories, relative to the root of the
  29.583 -repository; on the right, a user name.
  29.584 -
  29.585 -In the following example, the user \texttt{docwriter} can only push
  29.586 -changes to the \dirname{docs} subtree of the repository, while
  29.587 -\texttt{intern} can push changes to any file or directory except
  29.588 -\dirname{source/sensitive}.
  29.589 -\begin{codesample2}
  29.590 -  [acl.allow]
  29.591 -  docs/** = docwriter
  29.592 -
  29.593 -  [acl.deny]
  29.594 -  source/sensitive/** = intern
  29.595 -\end{codesample2}
  29.596 -
  29.597 -\subsubsection{Testing and troubleshooting}
  29.598 -
  29.599 -If you want to test the \hgext{acl} hook, run it with Mercurial's
  29.600 -debugging output enabled.  Since you'll probably be running it on a
  29.601 -server where it's not convenient (or sometimes possible) to pass in
  29.602 -the \hggopt{--debug} option, don't forget that you can enable
  29.603 -debugging output in your \hgrc:
  29.604 -\begin{codesample2}
  29.605 -  [ui]
  29.606 -  debug = true
  29.607 -\end{codesample2}
  29.608 -With this enabled, the \hgext{acl} hook will print enough information
  29.609 -to let you figure out why it is allowing or forbidding pushes from
  29.610 -specific users.
  29.611 -
  29.612 -\subsection{\hgext{bugzilla}---integration with Bugzilla}
  29.613 -
  29.614 -The \hgext{bugzilla} extension adds a comment to a Bugzilla bug
  29.615 -whenever it finds a reference to that bug ID in a commit comment.  You
  29.616 -can install this hook on a shared server, so that any time a remote
  29.617 -user pushes changes to this server, the hook gets run.  
  29.618 -
  29.619 -It adds a comment to the bug that looks like this (you can configure
  29.620 -the contents of the comment---see below):
  29.621 -\begin{codesample2}
  29.622 -  Changeset aad8b264143a, made by Joe User <joe.user@domain.com> in
  29.623 -  the frobnitz repository, refers to this bug.
  29.624 -
  29.625 -  For complete details, see
  29.626 -  http://hg.domain.com/frobnitz?cmd=changeset;node=aad8b264143a
  29.627 -
  29.628 -  Changeset description:
  29.629 -        Fix bug 10483 by guarding against some NULL pointers
  29.630 -\end{codesample2}
  29.631 -The value of this hook is that it automates the process of updating a
  29.632 -bug any time a changeset refers to it.  If you configure the hook
  29.633 -properly, it makes it easy for people to browse straight from a
  29.634 -Bugzilla bug to a changeset that refers to that bug.
  29.635 -
  29.636 -You can use the code in this hook as a starting point for some more
  29.637 -exotic Bugzilla integration recipes.  Here are a few possibilities:
  29.638 -\begin{itemize}
  29.639 -\item Require that every changeset pushed to the server have a valid
  29.640 -  bug~ID in its commit comment.  In this case, you'd want to configure
  29.641 -  the hook as a \hook{pretxnchangegroup} hook.  This would allow the hook
  29.642 -  to reject changes that didn't contain bug IDs.
  29.643 -\item Allow incoming changesets to automatically modify the
  29.644 -  \emph{state} of a bug, as well as simply adding a comment.  For
  29.645 -  example, the hook could recognise the string ``fixed bug 31337'' as
  29.646 -  indicating that it should update the state of bug 31337 to
  29.647 -  ``requires testing''.
  29.648 -\end{itemize}
  29.649 -
  29.650 -\subsubsection{Configuring the \hook{bugzilla} hook}
  29.651 -\label{sec:hook:bugzilla:config}
  29.652 -
  29.653 -You should configure this hook in your server's \hgrc\ as an
  29.654 -\hook{incoming} hook, for example as follows:
  29.655 -\begin{codesample2}
  29.656 -  [hooks]
  29.657 -  incoming.bugzilla = python:hgext.bugzilla.hook
  29.658 -\end{codesample2}
  29.659 -
  29.660 -Because of the specialised nature of this hook, and because Bugzilla
  29.661 -was not written with this kind of integration in mind, configuring
  29.662 -this hook is a somewhat involved process.
  29.663 -
  29.664 -Before you begin, you must install the MySQL bindings for Python on
  29.665 -the host(s) where you'll be running the hook.  If this is not
  29.666 -available as a binary package for your system, you can download it
  29.667 -from~\cite{web:mysql-python}.
  29.668 -
  29.669 -Configuration information for this hook lives in the
  29.670 -\rcsection{bugzilla} section of your \hgrc.
  29.671 -\begin{itemize}
  29.672 -\item[\rcitem{bugzilla}{version}] The version of Bugzilla installed on
  29.673 -  the server.  The database schema that Bugzilla uses changes
  29.674 -  occasionally, so this hook has to know exactly which schema to use.
  29.675 -  At the moment, the only version supported is \texttt{2.16}.
  29.676 -\item[\rcitem{bugzilla}{host}] The hostname of the MySQL server that
  29.677 -  stores your Bugzilla data.  The database must be configured to allow
  29.678 -  connections from whatever host you are running the \hook{bugzilla}
  29.679 -  hook on.
  29.680 -\item[\rcitem{bugzilla}{user}] The username with which to connect to
  29.681 -  the MySQL server.  The database must be configured to allow this
  29.682 -  user to connect from whatever host you are running the
  29.683 -  \hook{bugzilla} hook on.  This user must be able to access and
  29.684 -  modify Bugzilla tables.  The default value of this item is
  29.685 -  \texttt{bugs}, which is the standard name of the Bugzilla user in a
  29.686 -  MySQL database.
  29.687 -\item[\rcitem{bugzilla}{password}] The MySQL password for the user you
  29.688 -  configured above.  This is stored as plain text, so you should make
  29.689 -  sure that unauthorised users cannot read the \hgrc\ file where you
  29.690 -  store this information.
  29.691 -\item[\rcitem{bugzilla}{db}] The name of the Bugzilla database on the
  29.692 -  MySQL server.  The default value of this item is \texttt{bugs},
  29.693 -  which is the standard name of the MySQL database where Bugzilla
  29.694 -  stores its data.
  29.695 -\item[\rcitem{bugzilla}{notify}] If you want Bugzilla to send out a
  29.696 -  notification email to subscribers after this hook has added a
  29.697 -  comment to a bug, you will need this hook to run a command whenever
  29.698 -  it updates the database.  The command to run depends on where you
  29.699 -  have installed Bugzilla, but it will typically look something like
  29.700 -  this, if you have Bugzilla installed in
  29.701 -  \dirname{/var/www/html/bugzilla}:
  29.702 -  \begin{codesample4}
  29.703 -    cd /var/www/html/bugzilla && ./processmail %s nobody@nowhere.com
  29.704 -  \end{codesample4}
  29.705 -  The Bugzilla \texttt{processmail} program expects to be given a
  29.706 -  bug~ID (the hook replaces ``\texttt{\%s}'' with the bug~ID) and an
  29.707 -  email address.  It also expects to be able to write to some files in
  29.708 -  the directory that it runs in.  If Bugzilla and this hook are not
  29.709 -  installed on the same machine, you will need to find a way to run
  29.710 -  \texttt{processmail} on the server where Bugzilla is installed.
  29.711 -\end{itemize}
  29.712 -
  29.713 -\subsubsection{Mapping committer names to Bugzilla user names}
  29.714 -
  29.715 -By default, the \hgext{bugzilla} hook tries to use the email address
  29.716 -of a changeset's committer as the Bugzilla user name with which to
  29.717 -update a bug.  If this does not suit your needs, you can map committer
  29.718 -email addresses to Bugzilla user names using a \rcsection{usermap}
  29.719 -section.
  29.720 -
  29.721 -Each item in the \rcsection{usermap} section contains an email address
  29.722 -on the left, and a Bugzilla user name on the right.
  29.723 -\begin{codesample2}
  29.724 -  [usermap]
  29.725 -  jane.user@example.com = jane
  29.726 -\end{codesample2}
  29.727 -You can either keep the \rcsection{usermap} data in a normal \hgrc, or
  29.728 -tell the \hgext{bugzilla} hook to read the information from an
  29.729 -external \filename{usermap} file.  In the latter case, you can store
  29.730 -\filename{usermap} data by itself in (for example) a user-modifiable
  29.731 -repository.  This makes it possible to let your users maintain their
  29.732 -own \rcitem{bugzilla}{usermap} entries.  The main \hgrc\ file might
  29.733 -look like this:
  29.734 -\begin{codesample2}
  29.735 -  # regular hgrc file refers to external usermap file
  29.736 -  [bugzilla]
  29.737 -  usermap = /home/hg/repos/userdata/bugzilla-usermap.conf
  29.738 -\end{codesample2}
  29.739 -While the \filename{usermap} file that it refers to might look like
  29.740 -this:
  29.741 -\begin{codesample2}
  29.742 -  # bugzilla-usermap.conf - inside a hg repository
  29.743 -  [usermap]
  29.744 -  stephanie@example.com = steph
  29.745 -\end{codesample2}
  29.746 -
  29.747 -\subsubsection{Configuring the text that gets added to a bug}
  29.748 -
  29.749 -You can configure the text that this hook adds as a comment; you
  29.750 -specify it in the form of a Mercurial template.  Several \hgrc\
  29.751 -entries (still in the \rcsection{bugzilla} section) control this
  29.752 -behaviour.
  29.753 -\begin{itemize}
  29.754 -\item[\texttt{strip}] The number of leading path elements to strip
  29.755 -  from a repository's path name to construct a partial path for a URL.
  29.756 -  For example, if the repositories on your server live under
  29.757 -  \dirname{/home/hg/repos}, and you have a repository whose path is
  29.758 -  \dirname{/home/hg/repos/app/tests}, then setting \texttt{strip} to
  29.759 -  \texttt{4} will give a partial path of \dirname{app/tests}.  The
  29.760 -  hook will make this partial path available when expanding a
  29.761 -  template, as \texttt{webroot}.
  29.762 -\item[\texttt{template}] The text of the template to use.  In addition
  29.763 -  to the usual changeset-related variables, this template can use
  29.764 -  \texttt{hgweb} (the value of the \texttt{hgweb} configuration item,
  29.765 -  shown in the example below) and \texttt{webroot} (the path constructed using
  29.766 -  \texttt{strip} above).
  29.767 -\end{itemize}
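The effect of the `strip` setting on a repository path can be sketched as below. The helper name `strip_leading_elements` is hypothetical, and the code only illustrates the behaviour described in the list above; it is not presented as the extension's own implementation.

```python
def strip_leading_elements(path, count):
    """Remove `count` leading separators, and the text before each one,
    from a repository path."""
    for _ in range(count):
        slash = path.find("/")
        if slash == -1:
            break  # fewer separators than requested: stop early
        path = path[slash + 1:]
    return path


# The example from the text: four leading separators are stripped.
print(strip_leading_elements("/home/hg/repos/app/tests", 4))  # prints "app/tests"
```

Note that the count covers the separator at the very start of an absolute path, which is why `/home/hg/repos` needs a value of 4 rather than 3.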
  29.768 -
  29.769 -In addition, you can add a \rcitem{web}{baseurl} item to the
  29.770 -\rcsection{web} section of your \hgrc.  The \hgext{bugzilla} hook will
  29.771 -make this available when expanding a template, as the base string to
  29.772 -use when constructing a URL that will let users browse from a Bugzilla
  29.773 -comment to view a changeset.  Example:
  29.774 -\begin{codesample2}
  29.775 -  [web]
  29.776 -  baseurl = http://hg.domain.com/
  29.777 -\end{codesample2}
  29.778 -
  29.779 -Here is an example set of \hgext{bugzilla} hook config information.
  29.780 -\begin{codesample2}
  29.781 -  [bugzilla]
  29.782 -  host = bugzilla.example.com
  29.783 -  password = mypassword
  29.784 -  version = 2.16
  29.785 -  # server-side repos live in /home/hg/repos, so strip 4 leading
  29.786 -  # separators
  29.787 -  strip = 4
  29.788 -  hgweb = http://hg.example.com/
  29.789 -  usermap = /home/hg/repos/notify/bugzilla.conf
  29.790 -  template = Changeset \{node|short\}, made by \{author\} in the \{webroot\}
  29.791 -    repo, refers to this bug.\\nFor complete details, see 
  29.792 -    \{hgweb\}\{webroot\}?cmd=changeset;node=\{node|short\}\\nChangeset
  29.793 -    description:\\n\\t\{desc|tabindent\}
  29.794 -\end{codesample2}
  29.795 -
  29.796 -\subsubsection{Testing and troubleshooting}
  29.797 -
  29.798 -The most common problems with configuring the \hgext{bugzilla} hook
  29.799 -relate to running Bugzilla's \filename{processmail} script and mapping
  29.800 -committer names to user names.
  29.801 -
  29.802 -Recall from section~\ref{sec:hook:bugzilla:config} above that the user
  29.803 -that runs the Mercurial process on the server is also the one that
  29.804 -will run the \filename{processmail} script.  The
  29.805 -\filename{processmail} script sometimes causes Bugzilla to write to
  29.806 -files in its configuration directory, and Bugzilla's configuration
  29.807 -files are usually owned by the user that your web server runs under.
  29.808 -
  29.809 -You can cause \filename{processmail} to be run with a suitable
  29.810 -user's identity using the \command{sudo} command.  Here is an example
  29.811 -entry for a \filename{sudoers} file.
  29.812 -\begin{codesample2}
  29.813 -  hg_user = (httpd_user) NOPASSWD: /var/www/html/bugzilla/processmail-wrapper %s
  29.814 -\end{codesample2}
  29.815 -This allows the \texttt{hg\_user} user to run a
  29.816 -\filename{processmail-wrapper} program under the identity of
  29.817 -\texttt{httpd\_user}.
  29.818 -
  29.819 -This indirection through a wrapper script is necessary, because
  29.820 -\filename{processmail} expects to be run with its current directory
  29.821 -set to wherever you installed Bugzilla; you can't specify that kind of
  29.822 -constraint in a \filename{sudoers} file.  The contents of the wrapper
  29.823 -script are simple:
  29.824 -\begin{codesample2}
  29.825 -  #!/bin/sh
  29.826 -  cd "$(dirname "$0")" && ./processmail "$1" nobody@example.com
  29.827 -\end{codesample2}
  29.828 -It doesn't seem to matter what email address you pass to
  29.829 -\filename{processmail}.
  29.830 -
  29.831 -If your \rcsection{usermap} is not set up correctly, users will see an
  29.832 -error message from the \hgext{bugzilla} hook when they push changes
  29.833 -to the server.  The error message will look like this:
  29.834 -\begin{codesample2}
  29.835 -  cannot find bugzilla user id for john.q.public@example.com
  29.836 -\end{codesample2}
  29.837 -What this means is that the committer's address,
  29.838 -\texttt{john.q.public@example.com}, is not a valid Bugzilla user name,
  29.839 -nor does it have an entry in your \rcsection{usermap} that maps it to
  29.840 -a valid Bugzilla user name.
  29.841 -
  29.842 -\subsection{\hgext{notify}---send email notifications}
  29.843 -
  29.844 -Although Mercurial's built-in web server provides RSS feeds of changes
  29.845 -in every repository, many people prefer to receive change
  29.846 -notifications via email.  The \hgext{notify} hook lets you send out
  29.847 -notifications to a set of email addresses whenever changesets arrive
  29.848 -that those subscribers are interested in.
  29.849 -
  29.850 -As with the \hgext{bugzilla} hook, the \hgext{notify} hook is
  29.851 -template-driven, so you can customise the contents of the notification
  29.852 -messages that it sends.
  29.853 -
  29.854 -By default, the \hgext{notify} hook includes a diff of every changeset
  29.855 -that it sends out; you can limit the size of the diff, or turn this
  29.856 -feature off entirely.  It is useful for letting subscribers review
  29.857 -changes immediately, rather than clicking to follow a URL.
  29.858 -
  29.859 -\subsubsection{Configuring the \hgext{notify} hook}
  29.860 -
  29.861 -You can set up the \hgext{notify} hook to send one email message per
  29.862 -incoming changeset, or one per incoming group of changesets (all those
  29.863 -that arrived in a single pull or push).
  29.864 -\begin{codesample2}
  29.865 -  [hooks]
  29.866 -  # send one email per group of changes
  29.867 -  changegroup.notify = python:hgext.notify.hook
  29.868 -  # send one email per change
  29.869 -  incoming.notify = python:hgext.notify.hook
  29.870 -\end{codesample2}
  29.871 -
  29.872 -Configuration information for this hook lives in the
  29.873 -\rcsection{notify} section of a \hgrc\ file.
  29.874 -\begin{itemize}
  29.875 -\item[\rcitem{notify}{test}] By default, this hook does not send out
  29.876 -  email at all; instead, it prints the message that it \emph{would}
  29.877 -  send.  Set this item to \texttt{false} to allow email to be sent.
  29.878 -  The reason that sending of email is turned off by default is that it
  29.879 -  takes several tries to configure this extension exactly as you would
  29.880 -  like, and it would be bad form to spam subscribers with a number of
  29.881 -  ``broken'' notifications while you debug your configuration.
  29.882 -\item[\rcitem{notify}{config}] The path to a configuration file that
  29.883 -  contains subscription information.  This is kept separate from the
  29.884 -  main \hgrc\ so that you can maintain it in a repository of its own.
  29.885 -  People can then clone that repository, update their subscriptions,
  29.886 -  and push the changes back to your server.
  29.887 -\item[\rcitem{notify}{strip}] The number of leading path separator
  29.888 -  characters to strip from a repository's path, when deciding whether
  29.889 -  a repository has subscribers.  For example, if the repositories on
  29.890 -  your server live in \dirname{/home/hg/repos}, and \hgext{notify} is
  29.891 -  considering a repository named \dirname{/home/hg/repos/shared/test},
  29.892 -  setting \rcitem{notify}{strip} to \texttt{4} will cause
  29.893 -  \hgext{notify} to trim the path it considers down to
  29.894 -  \dirname{shared/test}, and it will match subscribers against that.
  29.895 -\item[\rcitem{notify}{template}] The template text to use when sending
  29.896 -  messages.  This specifies both the contents of the message header
  29.897 -  and its body.
  29.898 -\item[\rcitem{notify}{maxdiff}] The maximum number of lines of diff
  29.899 -  data to append to the end of a message.  If a diff is longer than
  29.900 -  this, it is truncated.  By default, this is set to 300.  Set this to
  29.901 -  \texttt{0} to omit diffs from notification emails.
  29.902 -\item[\rcitem{notify}{sources}] A list of sources of changesets to
  29.903 -  consider.  This lets you limit \hgext{notify} to only sending out
  29.904 -  email about changes that remote users pushed into this repository
  29.905 -  via a server, for example.  See section~\ref{sec:hook:sources} for
  29.906 -  the sources you can specify here.
  29.907 -\end{itemize}
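The effect of \rcitem{notify}{strip} can be sketched in a few lines of Python. This is only an illustration of the rule described above, not the extension's own code; the function name \texttt{strip\_leading\_components} is made up for the example.

```python
def strip_leading_components(path, count):
    """Drop up to `count` leading '/'-terminated pieces from `path`."""
    for _ in range(count):
        cut = path.find("/")
        if cut == -1:
            break  # no separators left, so stop early
        path = path[cut + 1:]
    return path


# The repository from the example above, with strip = 4.
print(strip_leading_components("/home/hg/repos/shared/test", 4))
```

With the path and setting from the example, the sketch prints \texttt{shared/test}, the name that is then matched against subscriptions.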
  29.908 -
  29.909 -If you set the \rcitem{web}{baseurl} item in the \rcsection{web}
  29.910 -section, you can use it in a template; it will be available as
  29.911 -\texttt{webroot}.
  29.912 -
  29.913 -Here is an example set of \hgext{notify} configuration information.
  29.914 -\begin{codesample2}
  29.915 -  [notify]
  29.916 -  # really send email
  29.917 -  test = false
  29.918 -  # subscriber data lives in the notify repo
  29.919 -  config = /home/hg/repos/notify/notify.conf
  29.920 -  # repos live in /home/hg/repos on server, so strip 4 "/" chars
  29.921 -  strip = 4
  29.922 -  template = X-Hg-Repo: \{webroot\}
  29.923 -    Subject: \{webroot\}: \{desc|firstline|strip\}
  29.924 -    From: \{author\}
  29.925 -
  29.926 -    changeset \{node|short\} in \{root\}
  29.927 -    details: \{baseurl\}\{webroot\}?cmd=changeset;node=\{node|short\}
  29.928 -    description:
  29.929 -      \{desc|tabindent|strip\}
  29.930 -
  29.931 -  [web]
  29.932 -  baseurl = http://hg.example.com/
  29.933 -\end{codesample2}
  29.934 -
  29.935 -This will produce a message that looks like the following:
  29.936 -\begin{codesample2}
  29.937 -  X-Hg-Repo: tests/slave
  29.938 -  Subject: tests/slave: Handle error case when slave has no buffers
  29.939 -  Date: Wed,  2 Aug 2006 15:25:46 -0700 (PDT)
  29.940 -
  29.941 -  changeset 3cba9bfe74b5 in /home/hg/repos/tests/slave
  29.942 -  details: http://hg.example.com/tests/slave?cmd=changeset;node=3cba9bfe74b5
  29.943 -  description:
  29.944 -          Handle error case when slave has no buffers
  29.945 -  diffs (54 lines):
  29.946 -
  29.947 -  diff -r 9d95df7cf2ad -r 3cba9bfe74b5 include/tests.h
  29.948 -  --- a/include/tests.h      Wed Aug 02 15:19:52 2006 -0700
  29.949 -  +++ b/include/tests.h      Wed Aug 02 15:25:26 2006 -0700
  29.950 -  @@ -212,6 +212,15 @@ static __inline__ void test_headers(void *h)
  29.951 -  [...snip...]
  29.952 -\end{codesample2}
  29.953 -
  29.954 -\subsubsection{Testing and troubleshooting}
  29.955 -
  29.956 -Do not forget that by default, the \hgext{notify} extension \emph{will
  29.957 -  not send any mail} until you explicitly configure it to do so, by
  29.958 -setting \rcitem{notify}{test} to \texttt{false}.  Until you do that,
  29.959 -it simply prints the message it \emph{would} send.
  29.960 -
  29.961 -\section{Information for writers of hooks}
  29.962 -\label{sec:hook:ref}
  29.963 -
  29.964 -\subsection{In-process hook execution}
  29.965 -
  29.966 -An in-process hook is called with arguments of the following form:
  29.967 -\begin{codesample2}
  29.968 -  def myhook(ui, repo, **kwargs):
  29.969 -      pass
  29.970 -\end{codesample2}
  29.971 -The \texttt{ui} parameter is a \pymodclass{mercurial.ui}{ui} object.
  29.972 -The \texttt{repo} parameter is a
  29.973 -\pymodclass{mercurial.localrepo}{localrepository} object.  The
  29.974 -names and values of the \texttt{**kwargs} parameters depend on the
  29.975 -hook being invoked, with the following common features:
  29.976 -\begin{itemize}
  29.977 -\item If a parameter is named \texttt{node} or
  29.978 -  \texttt{parent\emph{N}}, it will contain a hexadecimal changeset ID.
  29.979 -  The empty string is used to represent ``null changeset ID'' instead
  29.980 -  of a string of zeroes.
  29.981 -\item If a parameter is named \texttt{url}, it will contain the URL of
  29.982 -  a remote repository, if that can be determined.
  29.983 -\item Boolean-valued parameters are represented as Python
  29.984 -  \texttt{bool} objects.
  29.985 -\end{itemize}
  29.986 -
  29.987 -An in-process hook is called without a change to the process's working
  29.988 -directory (unlike external hooks, which are run in the root of the
  29.989 -repository).  It must not change the process's working directory, or
  29.990 -it will cause any calls it makes into the Mercurial API to fail.
  29.991 -
  29.992 -If a hook returns a boolean ``false'' value, it is considered to have
  29.993 -succeeded.  If it returns a boolean ``true'' value or raises an
  29.994 -exception, it is considered to have failed.  A useful way to think of
  29.995 -the calling convention is ``tell me if you fail''.
  29.996 -
  29.997 -Note that changeset IDs are passed into Python hooks as hexadecimal
  29.998 -strings, not the binary hashes that Mercurial's APIs normally use.  To
  29.999 -convert a hash from hex to binary, use the
 29.1000 -\pymodfunc{mercurial.node}{bin} function.
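As a minimal sketch of these conventions, here is a hypothetical in-process hook that rejects a malformed \texttt{node} value. The name \texttt{check\_node} is invented for the example, and \texttt{binascii.unhexlify} from the standard library stands in for \pymodfunc{mercurial.node}{bin} (assumed here to behave the same way) so that the sketch runs without Mercurial installed.

```python
import binascii


def check_node(ui, repo, node="", **kwargs):
    """Hypothetical hook: fail unless `node` is a 40-digit hex changeset ID."""
    if not node:
        # The empty string represents the null changeset ID.
        ui.warn("no changeset ID supplied\n")
        return True  # a true value means "I failed"
    try:
        # Stand-in for mercurial.node.bin (an assumption of this sketch).
        binary = binascii.unhexlify(node)
    except (binascii.Error, ValueError):
        ui.warn("malformed changeset ID: %s\n" % node)
        return True
    if len(binary) != 20:
        ui.warn("changeset ID has the wrong length: %s\n" % node)
        return True
    return False  # a false value means success
```

The hook reports problems through \texttt{ui.warn} and signals failure only through its return value, following the ``tell me if you fail'' convention.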
 29.1001 -
 29.1002 -\subsection{External hook execution}
 29.1003 -
 29.1004 -An external hook is passed to the shell of the user running Mercurial.
 29.1005 -Features of that shell, such as variable substitution and command
 29.1006 -redirection, are available.  The hook is run in the root directory of
 29.1007 -the repository (unlike in-process hooks, which are run in the same
 29.1008 -directory that Mercurial was run in).
 29.1009 -
 29.1010 -Hook parameters are passed to the hook as environment variables.  Each
 29.1011 -environment variable's name is converted to upper case and prefixed
 29.1012 -with the string ``\texttt{HG\_}''.  For example, if the name of a
 29.1013 -parameter is ``\texttt{node}'', the name of the environment variable
 29.1014 -representing that parameter will be ``\texttt{HG\_NODE}''.
 29.1015 -
 29.1016 -A boolean parameter is represented as the string ``\texttt{1}'' for
 29.1017 -``true'', ``\texttt{0}'' for ``false''.  If an environment variable is
 29.1018 -named \envar{HG\_NODE}, \envar{HG\_PARENT1} or \envar{HG\_PARENT2}, it
 29.1019 -contains a changeset ID represented as a hexadecimal string.  The
 29.1020 -empty string is used to represent ``null changeset ID'' instead of a
 29.1021 -string of zeroes.  If an environment variable is named
 29.1022 -\envar{HG\_URL}, it will contain the URL of a remote repository, if
 29.1023 -that can be determined.
 29.1024 -
 29.1025 -If a hook exits with a status of zero, it is considered to have
 29.1026 -succeeded.  If it exits with a non-zero status, it is considered to
 29.1027 -have failed.
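The naming and boolean rules above can be sketched as a small Python function. This illustrates the convention only; it is not how Mercurial builds the environment internally, and \texttt{hook\_environment} is a made-up name.

```python
def hook_environment(params):
    """Map hook parameters to the environment variable names described above."""
    env = {}
    for name, value in params.items():
        if isinstance(value, bool):
            # Booleans become "1" for true and "0" for false.
            value = "1" if value else "0"
        env["HG_" + name.upper()] = str(value)
    return env


print(hook_environment({"node": "3cba9bfe74b5", "local": True}))
```

An external hook would then read these values from its environment, for example as \texttt{\$HG\_NODE} in a shell command.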
 29.1028 -
 29.1029 -\subsection{Finding out where changesets come from}
 29.1030 -
 29.1031 -A hook that involves the transfer of changesets between a local
 29.1032 -repository and another may be able to find out information about the
 29.1033 -``far side''.  Mercurial knows \emph{how} changes are being
 29.1034 -transferred, and in many cases \emph{where} they are being transferred
 29.1035 -to or from.
 29.1036 -
 29.1037 -\subsubsection{Sources of changesets}
 29.1038 -\label{sec:hook:sources}
 29.1039 -
 29.1040 -Mercurial will tell a hook what means are, or were, used to transfer
 29.1041 -changesets between repositories.  This is provided by Mercurial in a
 29.1042 -Python parameter named \texttt{source}, or an environment variable named
 29.1043 -\envar{HG\_SOURCE}.
 29.1044 -
 29.1045 -\begin{itemize}
 29.1046 -\item[\texttt{serve}] Changesets are transferred to or from a remote
 29.1047 -  repository over http or ssh.
 29.1048 -\item[\texttt{pull}] Changesets are being transferred via a pull from
 29.1049 -  one repository into another.
 29.1050 -\item[\texttt{push}] Changesets are being transferred via a push from
 29.1051 -  one repository into another.
 29.1052 -\item[\texttt{bundle}] Changesets are being transferred to or from a
 29.1053 -  bundle.
 29.1054 -\end{itemize}
 29.1055 -
 29.1056 -\subsubsection{Where changes are going---remote repository URLs}
 29.1057 -\label{sec:hook:url}
 29.1058 -
 29.1059 -When possible, Mercurial will tell a hook the location of the ``far
 29.1060 -side'' of an activity that transfers changeset data between
 29.1061 -repositories.  This is provided by Mercurial in a Python parameter
 29.1062 -named \texttt{url}, or an environment variable named \envar{HG\_URL}.
 29.1063 -
 29.1064 -This information is not always known.  If a hook is invoked in a
 29.1065 -repository that is being served via http or ssh, Mercurial cannot tell
 29.1066 -where the remote repository is, but it may know where the client is
 29.1067 -connecting from.  In such cases, the URL will take one of the
 29.1068 -following forms:
 29.1069 -\begin{itemize}
 29.1070 -\item \texttt{remote:ssh:\emph{ip-address}}---remote ssh client, at
 29.1071 -  the given IP address.
 29.1072 -\item \texttt{remote:http:\emph{ip-address}}---remote http client, at
 29.1073 -  the given IP address.  If the client is using SSL, this will be of
 29.1074 -  the form \texttt{remote:https:\emph{ip-address}}.
 29.1075 -\item Empty---no information could be discovered about the remote
 29.1076 -  client.
 29.1077 -\end{itemize}
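A hook that wants to act on these forms has to take the value apart itself. The following sketch shows one way to do so; the function name \texttt{parse\_remote\_client} and its return convention are assumptions of the example, not part of Mercurial.

```python
def parse_remote_client(url):
    """Return (transport, address) for a remote:... value, else None."""
    if not url:
        return None  # nothing could be discovered about the client
    parts = url.split(":", 2)  # split at most twice; the address may contain ':'
    if len(parts) == 3 and parts[0] == "remote":
        return parts[1], parts[2]
    return None  # some other kind of URL, such as a repository location


print(parse_remote_client("remote:https:192.0.2.7"))
```

Limiting the split to two separators keeps an address that itself contains colons in one piece.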
 29.1078 -
 29.1079 -\section{Hook reference}
 29.1080 -
 29.1081 -\subsection{\hook{changegroup}---after remote changesets added}
 29.1082 -\label{sec:hook:changegroup}
 29.1083 -
 29.1084 -This hook is run after a group of pre-existing changesets has been
 29.1085 -added to the repository, for example via a \hgcmd{pull} or
 29.1086 -\hgcmd{unbundle}.  This hook is run once per operation that added one
 29.1087 -or more changesets.  This is in contrast to the \hook{incoming} hook,
 29.1088 -which is run once per changeset, regardless of whether the changesets
 29.1089 -arrive in a group.
 29.1090 -
 29.1091 -Some possible uses for this hook include kicking off an automated
 29.1092 -build or test of the added changesets, updating a bug database, or
 29.1093 -notifying subscribers that a repository contains new changes.
 29.1094 -
 29.1095 -Parameters to this hook:
 29.1096 -\begin{itemize}
 29.1097 -\item[\texttt{node}] A changeset ID.  The changeset ID of the first
 29.1098 -  changeset in the group that was added.  All changesets between this
 29.1099 -  and \index{tags!\texttt{tip}}\texttt{tip}, inclusive, were added by
 29.1100 -  a single \hgcmd{pull}, \hgcmd{push} or \hgcmd{unbundle}.
 29.1101 -\item[\texttt{source}] A string.  The source of these changes.  See
 29.1102 -  section~\ref{sec:hook:sources} for details.
 29.1103 -\item[\texttt{url}] A URL.  The location of the remote repository, if
 29.1104 -  known.  See section~\ref{sec:hook:url} for more information.
 29.1105 -\end{itemize}
 29.1106 -
 29.1107 -See also: \hook{incoming} (section~\ref{sec:hook:incoming}),
 29.1108 -\hook{prechangegroup} (section~\ref{sec:hook:prechangegroup}),
 29.1109 -\hook{pretxnchangegroup} (section~\ref{sec:hook:pretxnchangegroup})
 29.1110 -
 29.1111 -\subsection{\hook{commit}---after a new changeset is created}
 29.1112 -\label{sec:hook:commit}
 29.1113 -
 29.1114 -This hook is run after a new changeset has been created.
 29.1115 -
 29.1116 -Parameters to this hook:
 29.1117 -\begin{itemize}
 29.1118 -\item[\texttt{node}] A changeset ID.  The changeset ID of the newly
 29.1119 -  committed changeset.
 29.1120 -\item[\texttt{parent1}] A changeset ID.  The changeset ID of the first
 29.1121 -  parent of the newly committed changeset.
 29.1122 -\item[\texttt{parent2}] A changeset ID.  The changeset ID of the second
 29.1123 -  parent of the newly committed changeset.
 29.1124 -\end{itemize}
 29.1125 -
 29.1126 -See also: \hook{precommit} (section~\ref{sec:hook:precommit}),
 29.1127 -\hook{pretxncommit} (section~\ref{sec:hook:pretxncommit})
 29.1128 -
 29.1129 -\subsection{\hook{incoming}---after one remote changeset is added}
 29.1130 -\label{sec:hook:incoming}
 29.1131 -
 29.1132 -This hook is run after a pre-existing changeset has been added to the
 29.1133 -repository, for example via a \hgcmd{push}.  If a group of changesets
 29.1134 -was added in a single operation, this hook is called once for each
 29.1135 -added changeset.
 29.1136 -
 29.1137 -You can use this hook for the same purposes as the \hook{changegroup}
 29.1138 -hook (section~\ref{sec:hook:changegroup}); it's simply more convenient
 29.1139 -sometimes to run a hook once per group of changesets, while other
 29.1140 -times it's handier once per changeset.
 29.1141 -
 29.1142 -Parameters to this hook:
 29.1143 -\begin{itemize}
 29.1144 -\item[\texttt{node}] A changeset ID.  The ID of the newly added
 29.1145 -  changeset.
 29.1146 -\item[\texttt{source}] A string.  The source of these changes.  See
 29.1147 -  section~\ref{sec:hook:sources} for details.
 29.1148 -\item[\texttt{url}] A URL.  The location of the remote repository, if
 29.1149 -  known.  See section~\ref{sec:hook:url} for more information.
 29.1150 -\end{itemize}
 29.1151 -
 29.1152 -See also: \hook{changegroup} (section~\ref{sec:hook:changegroup}), \hook{prechangegroup} (section~\ref{sec:hook:prechangegroup}), \hook{pretxnchangegroup} (section~\ref{sec:hook:pretxnchangegroup})
 29.1153 -
 29.1154 -\subsection{\hook{outgoing}---after changesets are propagated}
 29.1155 -\label{sec:hook:outgoing}
 29.1156 -
 29.1157 -This hook is run after a group of changesets has been propagated out
 29.1158 -of this repository, for example by a \hgcmd{push} or \hgcmd{bundle}
 29.1159 -command.
 29.1160 -
 29.1161 -One possible use for this hook is to notify administrators that
 29.1162 -changes have been pulled.
 29.1163 -
 29.1164 -Parameters to this hook:
 29.1165 -\begin{itemize}
 29.1166 -\item[\texttt{node}] A changeset ID.  The changeset ID of the first
 29.1167 -  changeset of the group that was sent.
 29.1168 -\item[\texttt{source}] A string.  The source of the operation
 29.1169 -  (see section~\ref{sec:hook:sources}).  If a remote client pulled
 29.1170 -  changes from this repository, \texttt{source} will be
 29.1171 -  \texttt{serve}.  If the client that obtained changes from this
 29.1172 -  repository was local, \texttt{source} will be \texttt{bundle},
 29.1173 -  \texttt{pull}, or \texttt{push}, depending on the operation the
 29.1174 -  client performed.
 29.1175 -\item[\texttt{url}] A URL.  The location of the remote repository, if
 29.1176 -  known.  See section~\ref{sec:hook:url} for more information.
 29.1177 -\end{itemize}
 29.1178 -
 29.1179 -See also: \hook{preoutgoing} (section~\ref{sec:hook:preoutgoing})
 29.1180 -
 29.1181 -\subsection{\hook{prechangegroup}---before starting to add remote changesets}
 29.1182 -\label{sec:hook:prechangegroup}
 29.1183 -
 29.1184 -This controlling hook is run before Mercurial begins to add a group of
 29.1185 -changesets from another repository.
 29.1186 -
 29.1187 -This hook does not have any information about the changesets to be
 29.1188 -added, because it is run before transmission of those changesets is
 29.1189 -allowed to begin.  If this hook fails, the changesets will not be
 29.1190 -transmitted.
 29.1191 -
 29.1192 -One use for this hook is to prevent external changes from being added
 29.1193 -to a repository.  For example, you could use this to ``freeze'' a
 29.1194 -server-hosted branch temporarily or permanently so that users cannot
 29.1195 -push to it, while still allowing a local administrator to modify the
 29.1196 -repository.
 29.1197 -
 29.1198 -Parameters to this hook:
 29.1199 -\begin{itemize}
 29.1200 -\item[\texttt{source}] A string.  The source of these changes.  See
 29.1201 -  section~\ref{sec:hook:sources} for details.
 29.1202 -\item[\texttt{url}] A URL.  The location of the remote repository, if
 29.1203 -  known.  See section~\ref{sec:hook:url} for more information.
 29.1204 -\end{itemize}
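The ``freeze'' idea can be sketched as an in-process hook that fails whenever changesets arrive through a server. The function name \texttt{freeze} is hypothetical, as is the module you would put it in; you might register it with an entry such as \texttt{prechangegroup.freeze = python:mymodule.freeze}, where \texttt{mymodule} is a placeholder.

```python
def freeze(ui, repo, source="", **kwargs):
    """Hypothetical prechangegroup hook: refuse changesets sent via a server."""
    if source == "serve":
        ui.warn("this repository is frozen; pushes are not accepted\n")
        return True  # failing here stops the changesets from being transmitted
    # Any other source, such as a local pull or unbundle, is allowed through.
    return False
```

Because the check looks only at \texttt{source}, a local administrator working directly in the repository is unaffected.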
 29.1205 -
 29.1206 -See also: \hook{changegroup} (section~\ref{sec:hook:changegroup}),
 29.1207 -\hook{incoming} (section~\ref{sec:hook:incoming}),
 29.1208 -\hook{pretxnchangegroup} (section~\ref{sec:hook:pretxnchangegroup})
 29.1209 -
 29.1210 -\subsection{\hook{precommit}---before starting to commit a changeset}
 29.1211 -\label{sec:hook:precommit}
 29.1212 -
 29.1213 -This hook is run before Mercurial begins to commit a new changeset.
 29.1214 -It is run before Mercurial has any of the metadata for the commit,
 29.1215 -such as the files to be committed, the commit message, or the commit
 29.1216 -date.
 29.1217 -
 29.1218 -One use for this hook is to disable the ability to commit new
 29.1219 -changesets, while still allowing incoming changesets.  Another is to
 29.1220 -run a build or test, and only allow the commit to begin if the build
 29.1221 -or test succeeds.
 29.1222 -
 29.1223 -Parameters to this hook:
 29.1224 -\begin{itemize}
 29.1225 -\item[\texttt{parent1}] A changeset ID.  The changeset ID of the first
 29.1226 -  parent of the working directory.
 29.1227 -\item[\texttt{parent2}] A changeset ID.  The changeset ID of the second
 29.1228 -  parent of the working directory.
 29.1229 -\end{itemize}
 29.1230 -If the commit proceeds, the parents of the working directory will
 29.1231 -become the parents of the new changeset.
 29.1232 -
 29.1233 -See also: \hook{commit} (section~\ref{sec:hook:commit}),
 29.1234 -\hook{pretxncommit} (section~\ref{sec:hook:pretxncommit})
 29.1235 -
 29.1236 -\subsection{\hook{preoutgoing}---before starting to propagate changesets}
 29.1237 -\label{sec:hook:preoutgoing}
 29.1238 -
 29.1239 -This hook is invoked before Mercurial knows the identities of the
 29.1240 -changesets to be transmitted.
 29.1241 -
 29.1242 -One use for this hook is to prevent changes from being transmitted to
 29.1243 -another repository.
 29.1244 -
 29.1245 -Parameters to this hook:
 29.1246 -\begin{itemize}
 29.1247 -\item[\texttt{source}] A string.  The source of the operation that is
 29.1248 -  attempting to obtain changes from this repository (see
 29.1249 -  section~\ref{sec:hook:sources}).  See the documentation for the
 29.1250 -  \texttt{source} parameter to the \hook{outgoing} hook, in
 29.1251 -  section~\ref{sec:hook:outgoing}, for possible values of this
 29.1252 -  parameter.
 29.1253 -\item[\texttt{url}] A URL.  The location of the remote repository, if
 29.1254 -  known.  See section~\ref{sec:hook:url} for more information.
 29.1255 -\end{itemize}
 29.1256 -
 29.1257 -See also: \hook{outgoing} (section~\ref{sec:hook:outgoing})
 29.1258 -
 29.1259 -\subsection{\hook{pretag}---before tagging a changeset}
 29.1260 -\label{sec:hook:pretag}
 29.1261 -
 29.1262 -This controlling hook is run before a tag is created.  If the hook
 29.1263 -succeeds, creation of the tag proceeds.  If the hook fails, the tag is
 29.1264 -not created.
 29.1265 -
 29.1266 -Parameters to this hook:
 29.1267 -\begin{itemize}
 29.1268 -\item[\texttt{local}] A boolean.  Whether the tag is local to this
 29.1269 -  repository instance (i.e.~stored in \sfilename{.hg/localtags}) or
 29.1270 -  managed by Mercurial (stored in \sfilename{.hgtags}).
 29.1271 -\item[\texttt{node}] A changeset ID.  The ID of the changeset to be tagged.
 29.1272 -\item[\texttt{tag}] A string.  The name of the tag to be created.
 29.1273 -\end{itemize}
 29.1274 -
 29.1275 -If the tag to be created is revision-controlled, the \hook{precommit}
 29.1276 -and \hook{pretxncommit} hooks (sections~\ref{sec:hook:precommit}
 29.1277 -and~\ref{sec:hook:pretxncommit}) will also be run.
 29.1278 -
 29.1279 -See also: \hook{tag} (section~\ref{sec:hook:tag})
 29.1280 -
 29.1281 -\subsection{\hook{pretxnchangegroup}---before completing addition of
 29.1282 -  remote changesets}
 29.1283 -\label{sec:hook:pretxnchangegroup}
 29.1284 -
 29.1285 -This controlling hook is run before a transaction---that manages the
 29.1286 -addition of a group of new changesets from outside the
 29.1287 -repository---completes.  If the hook succeeds, the transaction
 29.1288 -completes, and all of the changesets become permanent within this
 29.1289 -repository.  If the hook fails, the transaction is rolled back, and
 29.1290 -the data for the changesets is erased.
 29.1291 -
 29.1292 -This hook can access the metadata associated with the almost-added
 29.1293 -changesets, but it should not do anything permanent with this data.
 29.1294 -It must also not modify the working directory.
 29.1295 -
 29.1296 -While this hook is running, if other Mercurial processes access this
 29.1297 -repository, they will be able to see the almost-added changesets as if
 29.1298 -they are permanent.  This may lead to race conditions if you do not
 29.1299 -take steps to avoid them.
 29.1300 -
 29.1301 -This hook can be used to automatically vet a group of changesets.  If
 29.1302 -the hook fails, all of the changesets are ``rejected'' when the
 29.1303 -transaction rolls back.
 29.1304 -
 29.1305 -Parameters to this hook:
 29.1306 -\begin{itemize}
 29.1307 -\item[\texttt{node}] A changeset ID.  The changeset ID of the first
 29.1308 -  changeset in the group that was added.  All changesets between this
 29.1309 -  and \index{tags!\texttt{tip}}\texttt{tip}, inclusive, were added by
 29.1310 -  a single \hgcmd{pull}, \hgcmd{push} or \hgcmd{unbundle}.
 29.1311 -\item[\texttt{source}] A string.  The source of these changes.  See
 29.1312 -  section~\ref{sec:hook:sources} for details.
 29.1313 -\item[\texttt{url}] A URL.  The location of the remote repository, if
 29.1314 -  known.  See section~\ref{sec:hook:url} for more information.
 29.1315 -\end{itemize}
 29.1316 -
 29.1317 -See also: \hook{changegroup} (section~\ref{sec:hook:changegroup}),
 29.1318 -\hook{incoming} (section~\ref{sec:hook:incoming}),
 29.1319 -\hook{prechangegroup} (section~\ref{sec:hook:prechangegroup})
 29.1320 -
 29.1321 -\subsection{\hook{pretxncommit}---before completing commit of new changeset}
 29.1322 -\label{sec:hook:pretxncommit}
 29.1323 -
 29.1324 -This controlling hook is run before a transaction---that manages a new
 29.1325 -commit---completes.  If the hook succeeds, the transaction completes
 29.1326 -and the changeset becomes permanent within this repository.  If the
 29.1327 -hook fails, the transaction is rolled back, and the commit data is
 29.1328 -erased.
 29.1329 -
 29.1330 -This hook can access the metadata associated with the almost-new
 29.1331 -changeset, but it should not do anything permanent with this data.  It
 29.1332 -must also not modify the working directory.
 29.1333 -
 29.1334 -While this hook is running, if other Mercurial processes access this
 29.1335 -repository, they will be able to see the almost-new changeset as if it
 29.1336 -is permanent.  This may lead to race conditions if you do not take
 29.1337 -steps to avoid them.
 29.1338 -
 29.1339 -Parameters to this hook:
 29.1340 -\begin{itemize}
 29.1341 -\item[\texttt{node}] A changeset ID.  The changeset ID of the newly
 29.1342 -  committed changeset.
 29.1343 -\item[\texttt{parent1}] A changeset ID.  The changeset ID of the first
 29.1344 -  parent of the newly committed changeset.
 29.1345 -\item[\texttt{parent2}] A changeset ID.  The changeset ID of the second
 29.1346 -  parent of the newly committed changeset.
 29.1347 -\end{itemize}
 29.1348 -
 29.1349 -See also: \hook{precommit} (section~\ref{sec:hook:precommit})
 29.1350 -
 29.1351 -\subsection{\hook{preupdate}---before updating or merging working directory}
 29.1352 -\label{sec:hook:preupdate}
 29.1353 -
 29.1354 -This controlling hook is run before an update or merge of the working
 29.1355 -directory begins.  It is run only if Mercurial's normal pre-update
 29.1356 -checks determine that the update or merge can proceed.  If the hook
 29.1357 -succeeds, the update or merge may proceed; if it fails, the update or
 29.1358 -merge does not start.
 29.1359 -
 29.1360 -Parameters to this hook:
 29.1361 -\begin{itemize}
 29.1362 -\item[\texttt{parent1}] A changeset ID.  The ID of the parent that the
 29.1363 -  working directory is to be updated to.  If the working directory is
 29.1364 -  being merged, it will not change this parent.
 29.1365 -\item[\texttt{parent2}] A changeset ID.  Only set if the working
 29.1366 -  directory is being merged.  The ID of the revision that the working
 29.1367 -  directory is being merged with.
 29.1368 -\end{itemize}
 29.1369 -
 29.1370 -See also: \hook{update} (section~\ref{sec:hook:update})
 29.1371 -
 29.1372 -\subsection{\hook{tag}---after tagging a changeset}
 29.1373 -\label{sec:hook:tag}
 29.1374 -
 29.1375 -This hook is run after a tag has been created.
 29.1376 -
 29.1377 -Parameters to this hook:
 29.1378 -\begin{itemize}
 29.1379 -\item[\texttt{local}] A boolean.  Whether the new tag is local to this
 29.1380 -  repository instance (i.e.~stored in \sfilename{.hg/localtags}) or
 29.1381 -  managed by Mercurial (stored in \sfilename{.hgtags}).
 29.1382 -\item[\texttt{node}] A changeset ID.  The ID of the changeset that was
 29.1383 -  tagged.
 29.1384 -\item[\texttt{tag}] A string.  The name of the tag that was created.
 29.1385 -\end{itemize}
 29.1386 -
 29.1387 -If the created tag is revision-controlled, the \hook{commit} hook
 29.1388 -(section~\ref{sec:hook:commit}) is run before this hook.
 29.1389 -
 29.1390 -See also: \hook{pretag} (section~\ref{sec:hook:pretag})
 29.1391 -
 29.1392 -\subsection{\hook{update}---after updating or merging working directory}
 29.1393 -\label{sec:hook:update}
 29.1394 -
 29.1395 -This hook is run after an update or merge of the working directory
 29.1396 -completes.  Since a merge can fail (if the external \command{hgmerge}
 29.1397 -command fails to resolve conflicts in a file), this hook communicates
 29.1398 -whether the update or merge completed cleanly.
 29.1399 -
 29.1400 -\begin{itemize}
 29.1401 -\item[\texttt{error}] A boolean.  True if the update or merge
 29.1402 -  failed; false if it completed successfully.
 29.1403 -\item[\texttt{parent1}] A changeset ID.  The ID of the parent that the
 29.1404 -  working directory was updated to.  If the working directory was
 29.1405 -  merged, it will not have changed this parent.
 29.1406 -\item[\texttt{parent2}] A changeset ID.  Only set if the working
 29.1407 -  directory was merged.  The ID of the revision that the working
 29.1408 -  directory was merged with.
 29.1409 -\end{itemize}
 29.1410 -
 29.1411 -See also: \hook{preupdate} (section~\ref{sec:hook:preupdate})
 29.1412 -
 29.1413 -%%% Local Variables: 
 29.1414 -%%% mode: latex
 29.1415 -%%% TeX-master: "00book"
 29.1416 -%%% End: 

    30.1 --- a/en/intro.tex	Thu Jan 29 22:47:34 2009 -0800
    30.2 +++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
    30.3 @@ -1,561 +0,0 @@
    30.4 -\chapter{Introduction}
    30.5 -\label{chap:intro}
    30.6 -
    30.7 -\section{About revision control}
    30.8 -
    30.9 -Revision control is the process of managing multiple versions of a
   30.10 -piece of information.  In its simplest form, this is something that
   30.11 -many people do by hand: every time you modify a file, save it under a
   30.12 -new name that contains a number, each one higher than the number of
   30.13 -the preceding version.
   30.14 -
   30.15 -Manually managing multiple versions of even a single file is an
   30.16 -error-prone task, though, so software tools to help automate this
   30.17 -process have long been available.  The earliest automated revision
   30.18 -control tools were intended to help a single user to manage revisions
   30.19 -of a single file.  Over the past few decades, the scope of revision
   30.20 -control tools has expanded greatly; they now manage multiple files,
   30.21 -and help multiple people to work together.  The best modern revision
   30.22 -control tools have no problem coping with thousands of people working
   30.23 -together on projects that consist of hundreds of thousands of files.
   30.24 -
   30.25 -\subsection{Why use revision control?}
   30.26 -
   30.27 -There are a number of reasons why you or your team might want to use
   30.28 -an automated revision control tool for a project.
   30.29 -\begin{itemize}
   30.30 -\item It will track the history and evolution of your project, so you
   30.31 -  don't have to.  For every change, you'll have a log of \emph{who}
   30.32 -  made it; \emph{why} they made it; \emph{when} they made it; and
   30.33 -  \emph{what} the change was.
   30.34 -\item When you're working with other people, revision control software
   30.35 -  makes it easier for you to collaborate.  For example, when people
   30.36 -  more or less simultaneously make potentially incompatible changes,
   30.37 -  the software will help you to identify and resolve those conflicts.
   30.38 -\item It can help you to recover from mistakes.  If you make a change
   30.39 -  that later turns out to be in error, you can revert to an earlier
   30.40 -  version of one or more files.  In fact, a \emph{really} good
   30.41 -  revision control tool will even help you to efficiently figure out
   30.42 -  exactly when a problem was introduced (see
   30.43 -  section~\ref{sec:undo:bisect} for details).
   30.44 -\item It will help you to work simultaneously on, and manage the drift
   30.45 -  between, multiple versions of your project.
   30.46 -\end{itemize}
   30.47 -Most of these reasons are equally valid---at least in theory---whether
   30.48 -you're working on a project by yourself, or with a hundred other
   30.49 -people.
   30.50 -
   30.51 -A key question about the practicality of revision control at these two
   30.52 -different scales (``lone hacker'' and ``huge team'') is how its
   30.53 -\emph{benefits} compare to its \emph{costs}.  A revision control tool
   30.54 -that's difficult to understand or use is going to impose a high cost.
   30.55 -
   30.56 -A five-hundred-person project is likely to collapse under its own
   30.57 -weight almost immediately without a revision control tool and process.
   30.58 -In this case, the cost of using revision control might hardly seem
   30.59 -worth considering, since \emph{without} it, failure is almost
   30.60 -guaranteed.
   30.61 -
   30.62 -On the other hand, a one-person ``quick hack'' might seem like a poor
   30.63 -place to use a revision control tool, because surely the cost of using
   30.64 -one must be close to the overall cost of the project.  Right?
   30.65 -
   30.66 -Mercurial uniquely supports \emph{both} of these scales of
   30.67 -development.  You can learn the basics in just a few minutes, and due
   30.68 -to its low overhead, you can apply revision control to the smallest of
   30.69 -projects with ease.  Its simplicity means you won't have a lot of
   30.70 -abstruse concepts or command sequences competing for mental space with
   30.71 -whatever you're \emph{really} trying to do.  At the same time,
   30.72 -Mercurial's high performance and peer-to-peer nature let you scale
   30.73 -painlessly to handle large projects.
   30.74 -
   30.75 -No revision control tool can rescue a poorly run project, but a good
   30.76 -choice of tools can make a huge difference to the fluidity with which
   30.77 -you can work on a project.
   30.78 -
   30.79 -\subsection{The many names of revision control}
   30.80 -
   30.81 -Revision control is a diverse field, so much so that it doesn't
   30.82 -actually have a single name or acronym.  Here are a few of the more
   30.83 -common names and acronyms you'll encounter:
   30.84 -\begin{itemize}
   30.85 -\item Revision control (RCS)
   30.86 -\item Software configuration management (SCM), or configuration management
   30.87 -\item Source code management
   30.88 -\item Source code control, or source control
   30.89 -\item Version control (VCS)
   30.90 -\end{itemize}
   30.91 -Some people claim that these terms actually have different meanings,
   30.92 -but in practice they overlap so much that there's no agreed or even
   30.93 -useful way to tease them apart.
   30.94 -
   30.95 -\section{A short history of revision control}
   30.96 -
   30.97 -The best known of the old-time revision control tools is SCCS (Source
   30.98 -Code Control System), which Marc Rochkind wrote at Bell Labs, in the
   30.99 -early 1970s.  SCCS operated on individual files, and required every
  30.100 -person working on a project to have access to a shared workspace on a
  30.101 -single system.  Only one person could modify a file at any time;
  30.102 -arbitration for access to files was via locks.  It was common for
  30.103 -people to lock files, and later forget to unlock them, preventing
  30.104 -anyone else from modifying those files without the help of an
  30.105 -administrator.  
  30.106 -
  30.107 -Walter Tichy developed a free alternative to SCCS in the early 1980s;
  30.108 -he called his program RCS (Revision Control System).  Like SCCS, RCS
  30.109 -required developers to work in a single shared workspace, and to lock
  30.110 -files to prevent multiple people from modifying them simultaneously.
  30.111 -
  30.112 -Later in the 1980s, Dick Grune used RCS as a building block for a set
  30.113 -of shell scripts he initially called cmt, but then renamed to CVS
  30.114 -(Concurrent Versions System).  The big innovation of CVS was that it
  30.115 -let developers work simultaneously and somewhat independently in their
  30.116 -own personal workspaces.  The personal workspaces prevented developers
  30.117 -from stepping on each other's toes all the time, as was common with
  30.118 -SCCS and RCS.  Each developer had a copy of every project file, and
  30.119 -could modify their copies independently.  They had to merge their
  30.120 -edits prior to committing changes to the central repository.
  30.121 -
  30.122 -Brian Berliner took Grune's original scripts and rewrote them in~C,
  30.123 -releasing in 1989 the code that has since developed into the modern
  30.124 -version of CVS.  CVS subsequently acquired the ability to operate over
  30.125 -a network connection, giving it a client/server architecture.  CVS's
  30.126 -architecture is centralised; only the server has a copy of the history
  30.127 -of the project.  Client workspaces just contain copies of recent
  30.128 -versions of the project's files, and a little metadata to tell them
  30.129 -where the server is.  CVS has been enormously successful; it is
  30.130 -probably the world's most widely used revision control system.
  30.131 -
  30.132 -In the early 1990s, Sun Microsystems developed an early distributed
  30.133 -revision control system, called TeamWare.  A TeamWare workspace
  30.134 -contains a complete copy of the project's history.  TeamWare has no
  30.135 -notion of a central repository.  (CVS relied upon RCS for its history
  30.136 -storage; TeamWare used SCCS.)
  30.137 -
  30.138 -As the 1990s progressed, awareness grew of a number of problems with
  30.139 -CVS.  It records simultaneous changes to multiple files individually,
  30.140 -instead of grouping them together as a single logically atomic
  30.141 -operation.  It does not manage its file hierarchy well; it is easy to
  30.142 -make a mess of a repository by renaming files and directories.  Worse,
  30.143 -its source code is difficult to read and maintain, which made the
  30.144 -``pain level'' of fixing these architectural problems prohibitive.
  30.145 -
  30.146 -In 2001, Jim Blandy and Karl Fogel, two developers who had worked on
  30.147 -CVS, started a project to replace it with a tool that would have a
  30.148 -better architecture and cleaner code.  The result, Subversion, does
  30.149 -not stray from CVS's centralised client/server model, but it adds
  30.150 -multi-file atomic commits, better namespace management, and a number
  30.151 -of other features that make it a generally better tool than CVS.
  30.152 -Since its initial release, it has rapidly grown in popularity.
  30.153 -
  30.154 -More or less simultaneously, Graydon Hoare began working on an
  30.155 -ambitious distributed revision control system that he named Monotone.
  30.156 -While Monotone addresses many of CVS's design flaws and has a
  30.157 -peer-to-peer architecture, it goes beyond earlier (and subsequent)
  30.158 -revision control tools in a number of innovative ways.  It uses
  30.159 -cryptographic hashes as identifiers, and has an integral notion of
  30.160 -``trust'' for code from different sources.
  30.161 -
  30.162 -Mercurial began life in 2005.  While a few aspects of its design are
  30.163 -influenced by Monotone, Mercurial focuses on ease of use, high
  30.164 -performance, and scalability to very large projects.
  30.165 -
  30.166 -\section{Trends in revision control}
  30.167 -
  30.168 -There has been an unmistakable trend in the development and use of
  30.169 -revision control tools over the past four decades, as people have
  30.170 -become familiar with the capabilities of their tools and constrained
  30.171 -by their limitations.
  30.172 -
  30.173 -The first generation began by managing single files on individual
  30.174 -computers.  Although these tools represented a huge advance over
  30.175 -ad-hoc manual revision control, their locking model and reliance on a
  30.176 -single computer limited them to small, tightly-knit teams.
  30.177 -
  30.178 -The second generation loosened these constraints by moving to
  30.179 -network-centered architectures, and managing entire projects at a
  30.180 -time.  As projects grew larger, they ran into new problems.  With
  30.181 -clients needing to talk to servers very frequently, server scaling
  30.182 -became an issue for large projects.  An unreliable network connection
  30.183 -could prevent remote users from being able to talk to the server at
  30.184 -all.  As open source projects started making read-only access
  30.185 -available anonymously to anyone, people without commit privileges
  30.186 -found that they could not use the tools to interact with a project in
  30.187 -a natural way, as they could not record their changes.
  30.188 -
  30.189 -The current generation of revision control tools is peer-to-peer in
  30.190 -nature.  All of these systems have dropped the dependency on a single
  30.191 -central server, and allow people to distribute their revision control
  30.192 -data to where it's actually needed.  Collaboration over the Internet
  30.193 -has moved from being constrained by technology to a matter of choice and
  30.194 -consensus.  Modern tools can operate offline indefinitely and
  30.195 -autonomously, with a network connection only needed when syncing
  30.196 -changes with another repository.
  30.197 -
  30.198 -\section{A few of the advantages of distributed revision control}
  30.199 -
  30.200 -Even though distributed revision control tools have for several years
  30.201 -been as robust and usable as their previous-generation counterparts,
  30.202 -people using older tools have not yet necessarily woken up to their
  30.203 -advantages.  There are a number of ways in which distributed tools
  30.204 -shine relative to centralised ones.
  30.205 -
  30.206 -For an individual developer, distributed tools are almost always much
  30.207 -faster than centralised tools.  This is for a simple reason: a
  30.208 -centralised tool needs to talk over the network for many common
  30.209 -operations, because most metadata is stored in a single copy on the
  30.210 -central server.  A distributed tool stores all of its metadata
  30.211 -locally.  All else being equal, talking over the network adds overhead
  30.212 -to a centralised tool.  Don't underestimate the value of a snappy,
  30.213 -responsive tool: you're going to spend a lot of time interacting with
  30.214 -your revision control software.
  30.215 -
  30.216 -Distributed tools are indifferent to the vagaries of your server
  30.217 -infrastructure, again because they replicate metadata to so many
  30.218 -locations.  If you use a centralised system and your server catches
  30.219 -fire, you'd better hope that your backup media are reliable, and that
  30.220 -your last backup was recent and actually worked.  With a distributed
  30.221 -tool, you have many backups available on every contributor's computer.
  30.222 -
  30.223 -The reliability of your network will affect distributed tools far less
  30.224 -than it will centralised tools.  You can't even use a centralised tool
  30.225 -without a network connection, except for a few highly constrained
  30.226 -commands.  With a distributed tool, if your network connection goes
  30.227 -down while you're working, you may not even notice.  The only thing
  30.228 -you won't be able to do is talk to repositories on other computers,
  30.229 -something that is relatively rare compared with local operations.  If
  30.230 -you have a far-flung team of collaborators, this may be significant.
  30.231 -
  30.232 -\subsection{Advantages for open source projects}
  30.233 -
  30.234 -If you take a shine to an open source project and decide that you
  30.235 -would like to start hacking on it, and that project uses a distributed
  30.236 -revision control tool, you are at once a peer with the people who
  30.237 -consider themselves the ``core'' of that project.  If they publish
  30.238 -their repositories, you can immediately copy their project history,
  30.239 -start making changes, and record your work, using the same tools in
  30.240 -the same ways as insiders.  By contrast, with a centralised tool, you
  30.241 -must use the software in a ``read only'' mode unless someone grants
  30.242 -you permission to commit changes to their central server.  Until then,
  30.243 -you won't be able to record changes, and your local modifications will
  30.244 -be at risk of corruption any time you try to update your client's view
  30.245 -of the repository.
  30.246 -
  30.247 -\subsubsection{The forking non-problem}
  30.248 -
  30.249 -It has been suggested that distributed revision control tools pose
  30.250 -some sort of risk to open source projects because they make it easy to
  30.251 -``fork'' the development of a project.  A fork happens when there are
  30.252 -differences in opinion or attitude between groups of developers that
  30.253 -cause them to decide that they can't work together any longer.  Each
  30.254 -side takes a more or less complete copy of the project's source code,
  30.255 -and goes off in its own direction.
  30.256 -
  30.257 -Sometimes the camps in a fork decide to reconcile their differences.
  30.258 -With a centralised revision control system, the \emph{technical}
  30.259 -process of reconciliation is painful, and has to be performed largely
  30.260 -by hand.  You have to decide whose revision history is going to
  30.261 -``win'', and graft the other team's changes into the tree somehow.
  30.262 -This usually loses some or all of one side's revision history.
  30.263 -
  30.264 -What distributed tools do with respect to forking is they make forking
  30.265 -the \emph{only} way to develop a project.  Every single change that
  30.266 -you make is potentially a fork point.  The great strength of this
  30.267 -approach is that a distributed revision control tool has to be really
  30.268 -good at \emph{merging} forks, because forks are absolutely
  30.269 -fundamental: they happen all the time.  
  30.270 -
  30.271 -If every piece of work that everybody does, all the time, is framed in
  30.272 -terms of forking and merging, then what the open source world refers
  30.273 -to as a ``fork'' becomes \emph{purely} a social issue.  If anything,
  30.274 -distributed tools \emph{lower} the likelihood of a fork:
  30.275 -\begin{itemize}
  30.276 -\item They eliminate the social distinction that centralised tools
  30.277 -  impose: that between insiders (people with commit access) and
  30.278 -  outsiders (people without).
  30.279 -\item They make it easier to reconcile after a social fork, because
  30.280 -  all that's involved from the perspective of the revision control
  30.281 -  software is just another merge.
  30.282 -\end{itemize}
  30.283 -
  30.284 -Some people resist distributed tools because they want to retain tight
  30.285 -control over their projects, and they believe that centralised tools
  30.286 -give them this control.  However, if you're of this belief, and you
  30.287 -publish your CVS or Subversion repositories publicly, there are
  30.288 -plenty of tools available that can pull out your entire project's
  30.289 -history (albeit slowly) and recreate it somewhere that you don't
  30.290 -control.  So while your control in this case is illusory, you are
  30.291 -forgoing the ability to fluidly collaborate with whatever people feel
  30.292 -compelled to mirror and fork your history.
  30.293 -
  30.294 -\subsection{Advantages for commercial projects}
  30.295 -
  30.296 -Many commercial projects are undertaken by teams that are scattered
  30.297 -across the globe.  Contributors who are far from a central server will
  30.298 -see slower command execution and perhaps less reliability.  Commercial
  30.299 -revision control systems attempt to ameliorate these problems with
  30.300 -remote-site replication add-ons that are typically expensive to buy
  30.301 -and cantankerous to administer.  A distributed system doesn't suffer
  30.302 -from these problems in the first place.  Better yet, you can easily
  30.303 -set up multiple authoritative servers, say one per site, so that
  30.304 -there's no redundant communication between repositories over expensive
  30.305 -long-haul network links.
  30.306 -
  30.307 -Centralised revision control systems tend to have relatively low
  30.308 -scalability.  It's not unusual for an expensive centralised system to
  30.309 -fall over under the combined load of just a few dozen concurrent
  30.310 -users.  Once again, the typical response tends to be an expensive and
  30.311 -clunky replication facility.  Since the load on a central server---if
  30.312 -you have one at all---is many times lower with a distributed
  30.313 -tool (because all of the data is replicated everywhere), a single
  30.314 -cheap server can handle the needs of a much larger team, and
  30.315 -replication to balance load becomes a simple matter of scripting.
  30.316 -
  30.317 -If you have an employee in the field, troubleshooting a problem at a
  30.318 -customer's site, they'll benefit from distributed revision control.
  30.319 -The tool will let them generate custom builds, try different fixes in
  30.320 -isolation from each other, and search efficiently through history for
  30.321 -the sources of bugs and regressions in the customer's environment, all
  30.322 -without needing to connect to your company's network.
  30.323 -
  30.324 -\section{Why choose Mercurial?}
  30.325 -
  30.326 -Mercurial has a unique set of properties that make it a particularly
  30.327 -good choice as a revision control system.
  30.328 -\begin{itemize}
  30.329 -\item It is easy to learn and use.
  30.330 -\item It is lightweight.
  30.331 -\item It scales excellently.
  30.332 -\item It is easy to customise.
  30.333 -\end{itemize}
  30.334 -
  30.335 -If you are at all familiar with revision control systems, you should
  30.336 -be able to get up and running with Mercurial in less than five
  30.337 -minutes.  Even if not, it will take no more than a few minutes
  30.338 -longer.  Mercurial's command and feature sets are generally uniform
  30.339 -and consistent, so you can keep track of a few general rules instead
  30.340 -of a host of exceptions.
  30.341 -
  30.342 -On a small project, you can start working with Mercurial in moments.
  30.343 -Creating new changes and branches; transferring changes around
  30.344 -(whether locally or over a network); and history and status operations
  30.345 -are all fast.  Mercurial attempts to stay nimble and largely out of
  30.346 -your way by combining low cognitive overhead with blazingly fast
  30.347 -operations.
  30.348 -
  30.349 -The usefulness of Mercurial is not limited to small projects: it is
  30.350 -used by projects with hundreds to thousands of contributors, each
  30.351 -containing tens of thousands of files and hundreds of megabytes of
  30.352 -source code.
  30.353 -
  30.354 -If the core functionality of Mercurial is not enough for you, it's
  30.355 -easy to build on.  Mercurial is well suited to scripting tasks, and
  30.356 -its clean internals and implementation in Python make it easy to add
  30.357 -features in the form of extensions.  There are a number of popular and
  30.358 -useful extensions already available, ranging from helping to identify
  30.359 -bugs to improving performance.
  30.360 -
  30.361 -\section{Mercurial compared with other tools}
  30.362 -
  30.363 -Before you read on, please understand that this section necessarily
  30.364 -reflects my own experiences, interests, and (dare I say it) biases.  I
  30.365 -have used every one of the revision control tools listed below, in
  30.366 -most cases for several years at a time.
  30.367 -
  30.368 -
  30.369 -\subsection{Subversion}
  30.370 -
  30.371 -Subversion is a popular revision control tool, developed to replace
  30.372 -CVS.  It has a centralised client/server architecture.
  30.373 -
  30.374 -Subversion and Mercurial have similarly named commands for performing
  30.375 -the same operations, so if you're familiar with one, it is easy to
  30.376 -learn to use the other.  Both tools are portable to all popular
  30.377 -operating systems.
  30.378 -
  30.379 -Prior to version 1.5, Subversion had no useful support for merges.
  30.380 -At the time of writing, its merge tracking capability is new, and known to be
  30.381 -\href{http://svnbook.red-bean.com/nightly/en/svn.branchmerge.advanced.html#svn.branchmerge.advanced.finalword}{complicated
  30.382 -  and buggy}.
  30.383 -
  30.384 -Mercurial has a substantial performance advantage over Subversion on
  30.385 -every revision control operation I have benchmarked.  I have measured
  30.386 -its advantage as ranging from a factor of two to a factor of six when
  30.387 -compared with Subversion~1.4.3's \emph{ra\_local} file store, which is
  30.388 -the fastest access method available.  In more realistic deployments
  30.389 -involving a network-based store, Subversion will be at a substantially
  30.390 -larger disadvantage.  Because many Subversion commands must talk to
  30.391 -the server and Subversion does not have useful replication facilities,
  30.392 -server capacity and network bandwidth become bottlenecks for modestly
  30.393 -large projects.
  30.394 -
  30.395 -Additionally, Subversion incurs substantial storage overhead to avoid
  30.396 -network transactions for a few common operations, such as finding
  30.397 -modified files (\texttt{status}) and displaying modifications against
  30.398 -the current revision (\texttt{diff}).  As a result, a Subversion
  30.399 -working copy is often the same size as, or larger than, a Mercurial
  30.400 -repository and working directory, even though the Mercurial repository
  30.401 -contains a complete history of the project.
  30.402 -
  30.403 -Subversion is widely supported by third party tools.  Mercurial
  30.404 -currently lags considerably in this area.  This gap is closing,
  30.405 -however, and indeed some of Mercurial's GUI tools now outshine their
  30.406 -Subversion equivalents.  Like Mercurial, Subversion has an excellent
  30.407 -user manual.
  30.408 -
  30.409 -Because Subversion doesn't store revision history on the client, it is
  30.410 -well suited to managing projects that deal with lots of large, opaque
  30.411 -binary files.  If you check in fifty revisions to an incompressible
  30.412 -10MB file, Subversion's client-side space usage stays constant.  The
  30.413 -space used by any distributed SCM will grow rapidly in proportion to
  30.414 -the number of revisions, because the differences between each revision
  30.415 -are large.
  30.416 -
  30.417 -In addition, it's often difficult or, more usually, impossible to
  30.418 -merge different versions of a binary file.  Subversion's ability to
  30.419 -let a user lock a file, so that they temporarily have the exclusive
  30.420 -right to commit changes to it, can be a significant advantage to a
  30.421 -project where binary files are widely used.
  30.422 -
  30.423 -Mercurial can import revision history from a Subversion repository.
  30.424 -It can also export revision history to a Subversion repository.  This
  30.425 -makes it easy to ``test the waters'' and use Mercurial and Subversion
  30.426 -in parallel before deciding to switch.  History conversion is
  30.427 -incremental, so you can perform an initial conversion, then small
  30.428 -additional conversions afterwards to bring in new changes.
  30.429 -
  30.430 -
  30.431 -\subsection{Git}
  30.432 -
  30.433 -Git is a distributed revision control tool that was developed for
  30.434 -managing the Linux kernel source tree.  Like Mercurial, its early
  30.435 -design was somewhat influenced by Monotone.
  30.436 -
  30.437 -Git has a very large command set, with version~1.5.0 providing~139
  30.438 -individual commands.  It has something of a reputation for being
  30.439 -difficult to learn.  Compared to Git, Mercurial has a strong focus on
  30.440 -simplicity.
  30.441 -
  30.442 -In terms of performance, Git is extremely fast.  In several cases, it
  30.443 -is faster than Mercurial, at least on Linux, while Mercurial performs
  30.444 -better on other operations.  However, on Windows, the performance and
  30.445 -general level of support that Git provides is, at the time of writing,
  30.446 -far behind that of Mercurial.
  30.447 -
  30.448 -While a Mercurial repository needs no maintenance, a Git repository
  30.449 -requires frequent manual ``repacks'' of its metadata.  Without these,
  30.450 -performance degrades, while space usage grows rapidly.  A server that
  30.451 -contains many Git repositories that are not rigorously and frequently
  30.452 -repacked will become heavily disk-bound during backups, and there have
  30.453 -been instances of daily backups taking far longer than~24 hours as a
  30.454 -result.  A freshly packed Git repository is slightly smaller than a
  30.455 -Mercurial repository, but an unpacked repository is several orders of
  30.456 -magnitude larger.
  30.457 -
  30.458 -The core of Git is written in C.  Many Git commands are implemented as
  30.459 -shell or Perl scripts, and the quality of these scripts varies widely.
  30.460 -I have encountered several instances where scripts charged along
  30.461 -blindly in the presence of errors that should have been fatal.
  30.462 -
  30.463 -Mercurial can import revision history from a Git repository.
  30.464 -
  30.465 -
  30.466 -\subsection{CVS}
  30.467 -
  30.468 -CVS is probably the most widely used revision control tool in the
  30.469 -world.  Due to its age and internal untidiness, it has been only
  30.470 -lightly maintained for many years.
  30.471 -
  30.472 -It has a centralised client/server architecture.  It does not group
  30.473 -related file changes into atomic commits, making it easy for people to
  30.474 -``break the build'': one person can successfully commit part of a
  30.475 -change and then be blocked by the need for a merge, causing other
  30.476 -people to see only a portion of the work they intended to do.  This
  30.477 -also affects how you work with project history.  If you want to see
  30.478 -all of the modifications someone made as part of a task, you will need
  30.479 -to manually inspect the descriptions and timestamps of the changes
  30.480 -made to each file involved (if you even know what those files were).
  30.481 -
  30.482 -CVS has a muddled notion of tags and branches that I will not attempt
  30.483 -to even describe.  It does not support renaming of files or
  30.484 -directories well, making it easy to corrupt a repository.  It has
  30.485 -almost no internal consistency checking capabilities, so it is usually
  30.486 -not even possible to tell whether or how a repository is corrupt.  I
  30.487 -would not recommend CVS for any project, existing or new.
  30.488 -
  30.489 -Mercurial can import CVS revision history.  However, there are a few
  30.490 -caveats that apply; these are true of every other revision control
  30.491 -tool's CVS importer, too.  Due to CVS's lack of atomic changes and
  30.492 -unversioned filesystem hierarchy, it is not possible to reconstruct
  30.493 -CVS history completely accurately; some guesswork is involved, and
  30.494 -renames will usually not show up.  Because a lot of advanced CVS
  30.495 -administration has to be done by hand and is hence error-prone, it's
  30.496 -common for CVS importers to run into multiple problems with corrupted
  30.497 -repositories (completely bogus revision timestamps and files that have
  30.498 -remained locked for over a decade are just two of the less interesting
  30.499 -problems I can recall from personal experience).
  30.500 -
  30.501 -Mercurial can import revision history from a CVS repository.
  30.502 -
  30.503 -
  30.504 -\subsection{Commercial tools}
  30.505 -
  30.506 -Perforce has a centralised client/server architecture, with no
  30.507 -client-side caching of any data.  Unlike modern revision control
  30.508 -tools, Perforce requires that a user run a command to inform the
  30.509 -server about every file they intend to edit.
  30.510 -
  30.511 -The performance of Perforce is quite good for small teams, but it
  30.512 -falls off rapidly as the number of users grows beyond a few dozen.
  30.513 -Modestly large Perforce installations require the deployment of
  30.514 -proxies to cope with the load their users generate.
  30.515 -
  30.516 -
  30.517 -\subsection{Choosing a revision control tool}
  30.518 -
  30.519 -With the exception of CVS, all of the tools listed above have unique
  30.520 -strengths that suit them to particular styles of work.  There is no
  30.521 -single revision control tool that is best in all situations.
  30.522 -
  30.523 -As an example, Subversion is a good choice for working with frequently
  30.524 -edited binary files, due to its centralised nature and support for
  30.525 -file locking.
  30.526 -
  30.527 -I personally find Mercurial's properties of simplicity, performance,
  30.528 -and good merge support to be a compelling combination that has served
  30.529 -me well for several years.
  30.530 -
  30.531 -
  30.532 -\section{Switching from another tool to Mercurial}
  30.533 -
  30.534 -Mercurial is bundled with an extension named \hgext{convert}, which
  30.535 -can incrementally import revision history from several other revision
  30.536 -control tools.  By ``incremental'', I mean that you can convert all of
  30.537 -a project's history to date in one go, then rerun the conversion later
  30.538 -to obtain new changes that happened after the initial conversion.
  30.539 -
  30.540 -The revision control tools supported by \hgext{convert} are as
  30.541 -follows:
  30.542 -\begin{itemize}
  30.543 -\item Subversion
  30.544 -\item CVS
  30.545 -\item Git
  30.546 -\item Darcs
  30.547 -\end{itemize}
  30.548 -
  30.549 -In addition, \hgext{convert} can export changes from Mercurial to
  30.550 -Subversion.  This makes it possible to try Subversion and Mercurial in
  30.551 -parallel before committing to a switchover, without risking the loss
  30.552 -of any work.
  30.553 -
  30.554 -The \hgxcmd{convert}{convert} command is easy to use.  Simply point it
  30.555 -at the path or URL of the source repository, optionally give it the
  30.556 -name of the destination repository, and it will start working.  After
  30.557 -the initial conversion, just run the same command again to import new
  30.558 -changes.
  30.559 -
  30.560 -
  30.561 -%%% Local Variables: 
  30.562 -%%% mode: latex
  30.563 -%%% TeX-master: "00book"
  30.564 -%%% End: 

    31.1 --- a/en/license.tex	Thu Jan 29 22:47:34 2009 -0800
    31.2 +++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
    31.3 @@ -1,138 +0,0 @@
    31.4 -\chapter{Open Publication License}
    31.5 -\label{cha:opl}
    31.6 -
    31.7 -Version 1.0, 8 June 1999
    31.8 -
    31.9 -\section{Requirements on both unmodified and modified versions}
   31.10 -
   31.11 -The Open Publication works may be reproduced and distributed in whole
   31.12 -or in part, in any medium physical or electronic, provided that the
   31.13 -terms of this license are adhered to, and that this license or an
   31.14 -incorporation of it by reference (with any options elected by the
   31.15 -author(s) and/or publisher) is displayed in the reproduction.
   31.16 -
   31.17 -Proper form for an incorporation by reference is as follows:
   31.18 -
   31.19 -\begin{quote}
   31.20 -  Copyright (c) \emph{year} by \emph{author's name or designee}. This
   31.21 -  material may be distributed only subject to the terms and conditions
   31.22 -  set forth in the Open Publication License, v\emph{x.y} or later (the
   31.23 -  latest version is presently available at
   31.24 -  \url{http://www.opencontent.org/openpub/}).
   31.25 -\end{quote}
   31.26 -
   31.27 -The reference must be immediately followed with any options elected by
   31.28 -the author(s) and/or publisher of the document (see
   31.29 -section~\ref{sec:opl:options}).
   31.30 -
   31.31 -Commercial redistribution of Open Publication-licensed material is
   31.32 -permitted.
   31.33 -
   31.34 -Any publication in standard (paper) book form shall require the
   31.35 -citation of the original publisher and author. The publisher and
   31.36 -author's names shall appear on all outer surfaces of the book. On all
   31.37 -outer surfaces of the book the original publisher's name shall be as
   31.38 -large as the title of the work and cited as possessive with respect to
   31.39 -the title.
   31.40 -
   31.41 -\section{Copyright}
   31.42 -
   31.43 -The copyright to each Open Publication is owned by its author(s) or
   31.44 -designee.
   31.45 -
   31.46 -\section{Scope of license}
   31.47 -
   31.48 -The following license terms apply to all Open Publication works,
   31.49 -unless otherwise explicitly stated in the document.
   31.50 -
   31.51 -Mere aggregation of Open Publication works or a portion of an Open
   31.52 -Publication work with other works or programs on the same media shall
   31.53 -not cause this license to apply to those other works. The aggregate
   31.54 -work shall contain a notice specifying the inclusion of the Open
   31.55 -Publication material and appropriate copyright notice.
   31.56 -
   31.57 -\textbf{Severability}. If any part of this license is found to be
   31.58 -unenforceable in any jurisdiction, the remaining portions of the
   31.59 -license remain in force.
   31.60 -
   31.61 -\textbf{No warranty}. Open Publication works are licensed and provided
   31.62 -``as is'' without warranty of any kind, express or implied, including,
   31.63 -but not limited to, the implied warranties of merchantability and
   31.64 -fitness for a particular purpose or a warranty of non-infringement.
   31.65 -
   31.66 -\section{Requirements on modified works}
   31.67 -
   31.68 -All modified versions of documents covered by this license, including
   31.69 -translations, anthologies, compilations and partial documents, must
   31.70 -meet the following requirements:
   31.71 -
   31.72 -\begin{enumerate}
   31.73 -\item The modified version must be labeled as such.
   31.74 -\item The person making the modifications must be identified and the
   31.75 -  modifications dated.
   31.76 -\item Acknowledgement of the original author and publisher if
   31.77 -  applicable must be retained according to normal academic citation
   31.78 -  practices.
   31.79 -\item The location of the original unmodified document must be
   31.80 -  identified.
   31.81 -\item The original author's (or authors') name(s) may not be used to
   31.82 -  assert or imply endorsement of the resulting document without the
   31.83 -  original author's (or authors') permission.
   31.84 -\end{enumerate}
   31.85 -
   31.86 -\section{Good-practice recommendations}
   31.87 -
   31.88 -In addition to the requirements of this license, it is requested from
   31.89 -and strongly recommended of redistributors that:
   31.90 -
   31.91 -\begin{enumerate}
   31.92 -\item If you are distributing Open Publication works on hardcopy or
   31.93 -  CD-ROM, you provide email notification to the authors of your intent
   31.94 -  to redistribute at least thirty days before your manuscript or media
   31.95 -  freeze, to give the authors time to provide updated documents. This
   31.96 -  notification should describe modifications, if any, made to the
   31.97 -  document.
   31.98 -\item All substantive modifications (including deletions) be either
   31.99 -  clearly marked up in the document or else described in an attachment
  31.100 -  to the document.
  31.101 -\item Finally, while it is not mandatory under this license, it is
  31.102 -  considered good form to offer a free copy of any hardcopy and CD-ROM
  31.103 -  expression of an Open Publication-licensed work to its author(s).
  31.104 -\end{enumerate}
  31.105 -
  31.106 -\section{License options}
  31.107 -\label{sec:opl:options}
  31.108 -
  31.109 -The author(s) and/or publisher of an Open Publication-licensed
  31.110 -document may elect certain options by appending language to the
  31.111 -reference to or copy of the license. These options are considered part
  31.112 -of the license instance and must be included with the license (or its
  31.113 -incorporation by reference) in derived works.
  31.114 -
  31.115 -\begin{enumerate}[A]
  31.116 -\item To prohibit distribution of substantively modified versions
  31.117 -  without the explicit permission of the author(s). ``Substantive
  31.118 -  modification'' is defined as a change to the semantic content of the
  31.119 -  document, and excludes mere changes in format or typographical
  31.120 -  corrections.
  31.121 -
  31.122 -  To accomplish this, add the phrase ``Distribution of substantively
  31.123 -  modified versions of this document is prohibited without the
  31.124 -  explicit permission of the copyright holder.'' to the license
  31.125 -  reference or copy.
  31.126 -
  31.127 -\item To prohibit any publication of this work or derivative works in
  31.128 -  whole or in part in standard (paper) book form for commercial
  31.129 -  purposes is prohibited unless prior permission is obtained from the
  31.130 -  copyright holder.
  31.131 -
  31.132 -  To accomplish this, add the phrase ``Distribution of the work or
  31.133 -  derivative of the work in any standard (paper) book form is
  31.134 -  prohibited unless prior permission is obtained from the copyright
  31.135 -  holder.'' to the license reference or copy.
  31.136 -\end{enumerate}
  31.137 -
  31.138 -%%% Local Variables: 
  31.139 -%%% mode: latex
  31.140 -%%% TeX-master: "00book"
  31.141 -%%% End: 

    32.1 --- a/en/mq-collab.tex	Thu Jan 29 22:47:34 2009 -0800
    32.2 +++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
    32.3 @@ -1,393 +0,0 @@
    32.4 -\chapter{Advanced uses of Mercurial Queues}
    32.5 -\label{chap:mq-collab}
    32.6 -
    32.7 -While it's easy to pick up straightforward uses of Mercurial Queues,
    32.8 -use of a little discipline and some of MQ's less frequently used
    32.9 -capabilities makes it possible to work in complicated development
   32.10 -environments.
   32.11 -
   32.12 -In this chapter, I will use as an example a technique I have used to
   32.13 -manage the development of an Infiniband device driver for the Linux
   32.14 -kernel.  The driver in question is large (at least as drivers go),
   32.15 -with 25,000 lines of code spread across 35 source files.  It is
   32.16 -maintained by a small team of developers.
   32.17 -
   32.18 -While much of the material in this chapter is specific to Linux, the
   32.19 -same principles apply to any code base for which you're not the
   32.20 -primary owner, and upon which you need to do a lot of development.
   32.21 -
   32.22 -\section{The problem of many targets}
   32.23 -
   32.24 -The Linux kernel changes rapidly, and has never been internally
   32.25 -stable; developers frequently make drastic changes between releases.
   32.26 -This means that a version of the driver that works well with a
   32.27 -particular released version of the kernel will not even \emph{compile}
   32.28 -correctly against, typically, any other version.
   32.29 -
   32.30 -To maintain a driver, we have to keep a number of distinct versions of
   32.31 -Linux in mind.
   32.32 -\begin{itemize}
   32.33 -\item One target is the main Linux kernel development tree.
   32.34 -  Maintenance of the code is in this case partly shared by other
   32.35 -  developers in the kernel community, who make ``drive-by''
   32.36 -  modifications to the driver as they develop and refine kernel
   32.37 -  subsystems.
   32.38 -\item We also maintain a number of ``backports'' to older versions of
   32.39 -  the Linux kernel, to support the needs of customers who are running
   32.40 -  older Linux distributions that do not incorporate our drivers.  (To
   32.41 -  \emph{backport} a piece of code is to modify it to work in an older
   32.42 -  version of its target environment than the version it was developed
   32.43 -  for.)
   32.44 -\item Finally, we make software releases on a schedule that is
   32.45 -  necessarily not aligned with those used by Linux distributors and
   32.46 -  kernel developers, so that we can deliver new features to customers
   32.47 -  without forcing them to upgrade their entire kernels or
   32.48 -  distributions.
   32.49 -\end{itemize}
   32.50 -
   32.51 -\subsection{Tempting approaches that don't work well}
   32.52 -
   32.53 -There are two ``standard'' ways to maintain a piece of software that
   32.54 -has to target many different environments.
   32.55 -
   32.56 -The first is to maintain a number of branches, each intended for a
   32.57 -single target.  The trouble with this approach is that you must
   32.58 -maintain iron discipline in the flow of changes between repositories.
   32.59 -A new feature or bug fix must start life in a ``pristine'' repository,
   32.60 -then percolate out to every backport repository.  Backport changes are
   32.61 -more limited in the branches they should propagate to; a backport
   32.62 -change that is applied to a branch where it doesn't belong will
   32.63 -probably stop the driver from compiling.
   32.64 -
   32.65 -The second is to maintain a single source tree filled with conditional
   32.66 -statements that turn chunks of code on or off depending on the
   32.67 -intended target.  Because these ``ifdefs'' are not allowed in the
   32.68 -Linux kernel tree, a manual or automatic process must be followed to
   32.69 -strip them out and yield a clean tree.  A code base maintained in this
   32.70 -fashion rapidly becomes a rat's nest of conditional blocks that are
   32.71 -difficult to understand and maintain.
   32.72 -
   32.73 -Neither of these approaches is well suited to a situation where you
   32.74 -don't ``own'' the canonical copy of a source tree.  In the case of a
   32.75 -Linux driver that is distributed with the standard kernel, Linus's
   32.76 -tree contains the copy of the code that will be treated by the world
   32.77 -as canonical.  The upstream version of ``my'' driver can be modified
   32.78 -by people I don't know, without me even finding out about it until
   32.79 -after the changes show up in Linus's tree.  
   32.80 -
   32.81 -These approaches have the added weakness of making it difficult to
   32.82 -generate well-formed patches to submit upstream.
   32.83 -
   32.84 -In principle, Mercurial Queues seems like a good candidate to manage a
   32.85 -development scenario such as the above.  While this is indeed the
   32.86 -case, MQ contains a few added features that make the job more
   32.87 -pleasant.
   32.88 -
   32.89 -\section{Conditionally applying patches with 
   32.90 -  guards}
   32.91 -
   32.92 -Perhaps the best way to maintain sanity with so many targets is to be
   32.93 -able to choose specific patches to apply for a given situation.  MQ
   32.94 -provides a feature called ``guards'' (which originates with quilt's
   32.95 -\texttt{guards} command) that does just this.  To start off, let's
   32.96 -create a simple repository for experimenting in.
   32.97 -\interaction{mq.guards.init}
   32.98 -This gives us a tiny repository that contains two patches that don't
   32.99 -have any dependencies on each other, because they touch different files.
  32.100 -
  32.101 -The idea behind conditional application is that you can ``tag'' a
  32.102 -patch with a \emph{guard}, which is simply a text string of your
  32.103 -choosing, then tell MQ to select specific guards to use when applying
  32.104 -patches.  MQ will then either apply, or skip over, a guarded patch,
  32.105 -depending on the guards that you have selected.
  32.106 -
  32.107 -A patch can have an arbitrary number of guards;
  32.108 -each one is \emph{positive} (``apply this patch if this guard is
  32.109 -selected'') or \emph{negative} (``skip this patch if this guard is
  32.110 -selected'').  A patch with no guards is always applied.
  32.111 -
  32.112 -\section{Controlling the guards on a patch}
  32.113 -
  32.114 -The \hgxcmd{mq}{qguard} command lets you determine which guards should
  32.115 -apply to a patch, or display the guards that are already in effect.
  32.116 -Without any arguments, it displays the guards on the current topmost
  32.117 -patch.
  32.118 -\interaction{mq.guards.qguard}
  32.119 -To set a positive guard on a patch, prefix the name of the guard with
  32.120 -a ``\texttt{+}''.
  32.121 -\interaction{mq.guards.qguard.pos}
  32.122 -To set a negative guard on a patch, prefix the name of the guard with
  32.123 -a ``\texttt{-}''.
  32.124 -\interaction{mq.guards.qguard.neg}
  32.125 -
  32.126 -\begin{note}
  32.127 -  The \hgxcmd{mq}{qguard} command \emph{sets} the guards on a patch; it
  32.128 -  doesn't \emph{modify} them.  What this means is that if you run
  32.129 -  \hgcmdargs{qguard}{+a +b} on a patch, then \hgcmdargs{qguard}{+c} on
  32.130 -  the same patch, the \emph{only} guard that will be set on it
  32.131 -  afterwards is \texttt{+c}.
  32.132 -\end{note}
  32.133 -
  32.134 -Mercurial stores guards in the \sfilename{series} file; the form in
  32.135 -which they are stored is easy both to understand and to edit by hand.
  32.136 -(In other words, you don't have to use the \hgxcmd{mq}{qguard} command if
  32.137 -you don't want to; it's okay to simply edit the \sfilename{series}
  32.138 -file.)
  32.139 -\interaction{mq.guards.series}
  32.140 -
  32.141 -\section{Selecting the guards to use}
  32.142 -
  32.143 -The \hgxcmd{mq}{qselect} command determines which guards are active at a
  32.144 -given time.  The effect of this is to determine which patches MQ will
  32.145 -apply the next time you run \hgxcmd{mq}{qpush}.  It has no other effect; in
  32.146 -particular, it doesn't do anything to patches that are already
  32.147 -applied.
  32.148 -
  32.149 -With no arguments, the \hgxcmd{mq}{qselect} command lists the guards
  32.150 -currently in effect, one per line of output.  Each argument is treated
  32.151 -as the name of a guard to apply.
  32.152 -\interaction{mq.guards.qselect.foo}
  32.153 -In case you're interested, the currently selected guards are stored in
  32.154 -the \sfilename{guards} file.
  32.155 -\interaction{mq.guards.qselect.cat}
  32.156 -We can see the effect the selected guards have when we run
  32.157 -\hgxcmd{mq}{qpush}.
  32.158 -\interaction{mq.guards.qselect.qpush}
  32.159 -
  32.160 -A guard cannot start with a ``\texttt{+}'' or ``\texttt{-}''
  32.161 -character.  The name of a guard must not contain white space, but most
  32.162 -other characters are acceptable.  If you try to use a guard with an
  32.163 -invalid name, MQ will complain:
  32.164 -\interaction{mq.guards.qselect.error} 
  32.165 -Changing the selected guards changes the patches that are applied.
  32.166 -\interaction{mq.guards.qselect.quux} 
  32.167 -You can see in the example below that negative guards take precedence
  32.168 -over positive guards.
  32.169 -\interaction{mq.guards.qselect.foobar}
  32.170 -
  32.171 -\section{MQ's rules for applying patches}
  32.172 -
  32.173 -The rules that MQ uses when deciding whether to apply a patch
  32.174 -are as follows.
  32.175 -\begin{itemize}
  32.176 -\item A patch that has no guards is always applied.
  32.177 -\item If the patch has any negative guard that matches any currently
  32.178 -  selected guard, the patch is skipped.
  32.179 -\item If the patch has any positive guard that matches any currently
  32.180 -  selected guard, the patch is applied.
  32.181 -\item If the patch has positive or negative guards, but none matches
  32.182 -  any currently selected guard, the patch is skipped.
  32.183 -\end{itemize}
  32.184 -
  32.185 -\section{Trimming the work environment}
  32.186 -
  32.187 -In working on the device driver I mentioned earlier, I don't apply the
  32.188 -patches to a normal Linux kernel tree.  Instead, I use a repository
  32.189 -that contains only a snapshot of the source files and headers that are
  32.190 -relevant to Infiniband development.  This repository is~1\% the size
  32.191 -of a kernel repository, so it's easier to work with.
  32.192 -
  32.193 -I then choose a ``base'' version on top of which the patches are
  32.194 -applied.  This is a snapshot of the Linux kernel tree as of a revision
  32.195 -of my choosing.  When I take the snapshot, I record the changeset ID
  32.196 -from the kernel repository in the commit message.  Since the snapshot
  32.197 -preserves the ``shape'' and content of the relevant parts of the
  32.198 -kernel tree, I can apply my patches on top of either my tiny
  32.199 -repository or a normal kernel tree.
  32.200 -
  32.201 -Normally, the base tree atop which the patches apply should be a
  32.202 -snapshot of a very recent upstream tree.  This best facilitates the
  32.203 -development of patches that can easily be submitted upstream with few
  32.204 -or no modifications.
  32.205 -
  32.206 -\section{Dividing up the \sfilename{series} file}
  32.207 -
  32.208 -I categorise the patches in the \sfilename{series} file into a number
  32.209 -of logical groups.  Each section of like patches begins with a block
  32.210 -of comments that describes the purpose of the patches that follow.
  32.211 -
  32.212 -The sequence of patch groups that I maintain follows.  The ordering of
  32.213 -these groups is important; I'll describe why after I introduce the
  32.214 -groups.
  32.215 -\begin{itemize}
  32.216 -\item The ``accepted'' group.  Patches that the development team has
  32.217 -  submitted to the maintainer of the Infiniband subsystem, and which
  32.218 -  he has accepted, but which are not present in the snapshot that the
  32.219 -  tiny repository is based on.  These are ``read only'' patches,
  32.220 -  present only to transform the tree into a state similar to that of
  32.221 -  the upstream maintainer's repository.
  32.222 -\item The ``rework'' group.  Patches that I have submitted, but that
  32.223 -  the upstream maintainer has requested modifications to before he
  32.224 -  will accept them.
  32.225 -\item The ``pending'' group.  Patches that I have not yet submitted to
  32.226 -  the upstream maintainer, but which we have finished working on.
  32.227 -  These will be ``read only'' for a while.  If the upstream maintainer
  32.228 -  accepts them upon submission, I'll move them to the end of the
  32.229 -  ``accepted'' group.  If he requests that I modify any, I'll move
  32.230 -  them to the beginning of the ``rework'' group.
  32.231 -\item The ``in progress'' group.  Patches that are actively being
  32.232 -  developed, and should not be submitted anywhere yet.
  32.233 -\item The ``backport'' group.  Patches that adapt the source tree to
  32.234 -  older versions of the kernel tree.
  32.235 -\item The ``do not ship'' group.  Patches that for some reason should
  32.236 -  never be submitted upstream.  For example, one such patch might
  32.237 -  change embedded driver identification strings to make it easier to
  32.238 -  distinguish, in the field, between an out-of-tree version of the
  32.239 -  driver and a version shipped by a distribution vendor.
  32.240 -\end{itemize}
  32.241 -
  32.242 -Now to return to the reasons for ordering groups of patches in this
  32.243 -way.  We would like the lowest patches in the stack to be as stable as
  32.244 -possible, so that we will not need to rework higher patches due to
  32.245 -changes in context.  Putting patches that will never be changed first
  32.246 -in the \sfilename{series} file serves this purpose.
  32.247 -
  32.248 -We would also like the patches that we know we'll need to modify to be
  32.249 -applied on top of a source tree that resembles the upstream tree as
  32.250 -closely as possible.  This is why we keep accepted patches around for
  32.251 -a while.
  32.252 -
  32.253 -The ``backport'' and ``do not ship'' patches float at the end of the
  32.254 -\sfilename{series} file.  The backport patches must be applied on top
  32.255 -of all other patches, and the ``do not ship'' patches might as well
  32.256 -stay out of harm's way.
  32.257 -
  32.258 -\section{Maintaining the patch series}
  32.259 -
  32.260 -In my work, I use a number of guards to control which patches are to
  32.261 -be applied.
  32.262 -
  32.263 -\begin{itemize}
  32.264 -\item ``Accepted'' patches are guarded with \texttt{accepted}.  I
  32.265 -  enable this guard most of the time.  When I'm applying the patches
  32.266 -  on top of a tree where the patches are already present, I can turn
  32.267 -  this guard off, and the patches that follow it will apply cleanly.
  32.268 -\item Patches that are ``finished'', but not yet submitted, have no
  32.269 -  guards.  If I'm applying the patch stack to a copy of the upstream
  32.270 -  tree, I don't need to enable any guards in order to get a reasonably
  32.271 -  safe source tree.
  32.272 -\item Those patches that need reworking before being resubmitted are
  32.273 -  guarded with \texttt{rework}.
  32.274 -\item For those patches that are still under development, I use
  32.275 -  \texttt{devel}.
  32.276 -\item A backport patch may have several guards, one for each version
  32.277 -  of the kernel to which it applies.  For example, a patch that
  32.278 -  backports a piece of code to~2.6.9 will have a~\texttt{2.6.9} guard.
  32.279 -\end{itemize}
  32.280 -This variety of guards gives me considerable flexibility in
  32.281 -determining what kind of source tree I want to end up with.  For most
  32.282 -situations, the selection of appropriate guards is automated during
  32.283 -the build process, but I can manually tune the guards to use for less
  32.284 -common circumstances.
  32.285 -
  32.286 -\subsection{The art of writing backport patches}
  32.287 -
  32.288 -Using MQ, writing a backport patch is a simple process.  All such a
  32.289 -patch has to do is modify a piece of code that uses a kernel feature
  32.290 -not present in the older version of the kernel, so that the driver
  32.291 -continues to work correctly under that older version.
  32.292 -
  32.293 -A useful goal when writing a good backport patch is to make your code
  32.294 -look as if it was written for the older version of the kernel you're
  32.295 -targeting.  The less obtrusive the patch, the easier it will be to
  32.296 -understand and maintain.  If you're writing a collection of backport
  32.297 -patches to avoid the ``rat's nest'' effect of lots of
  32.298 -\texttt{\#ifdef}s (hunks of source code that are only used
  32.299 -conditionally) in your code, don't introduce version-dependent
  32.300 -\texttt{\#ifdef}s into the patches.  Instead, write several patches,
  32.301 -each of which makes unconditional changes, and control their
  32.302 -application using guards.
  32.303 -
  32.304 -There are two reasons to divide backport patches into a distinct
  32.305 -group, away from the ``regular'' patches whose effects they modify.
  32.306 -The first is that intermingling the two makes it more difficult to use
  32.307 -a tool like the \hgext{patchbomb} extension to automate the process of
  32.308 -submitting the patches to an upstream maintainer.  The second is that
  32.309 -a backport patch could perturb the context in which a subsequent
  32.310 -regular patch is applied, making it impossible to apply the regular
  32.311 -patch cleanly \emph{without} the earlier backport patch already being
  32.312 -applied.
  32.313 -
  32.314 -\section{Useful tips for developing with MQ}
  32.315 -
  32.316 -\subsection{Organising patches in directories}
  32.317 -
  32.318 -If you're working on a substantial project with MQ, it's not difficult
  32.319 -to accumulate a large number of patches.  For example, I have one
  32.320 -patch repository that contains over 250 patches.
  32.321 -
  32.322 -If you can group these patches into separate logical categories, you
  32.323 -can if you like store them in different directories; MQ has no
  32.324 -problems with patch names that contain path separators.
  32.325 -
  32.326 -\subsection{Viewing the history of a patch}
  32.327 -\label{mq-collab:tips:interdiff}
  32.328 -
  32.329 -If you're developing a set of patches over a long time, it's a good
  32.330 -idea to maintain them in a repository, as discussed in
  32.331 -section~\ref{sec:mq:repo}.  If you do so, you'll quickly discover that
  32.332 -using the \hgcmd{diff} command to look at the history of changes to a
  32.333 -patch is unworkable.  This is in part because you're looking at the
  32.334 -second derivative of the real code (a diff of a diff), but also
  32.335 -because MQ adds noise to the process by modifying time stamps and
  32.336 -directory names when it updates a patch.
  32.337 -
  32.338 -However, you can use the \hgext{extdiff} extension, which is bundled
  32.339 -with Mercurial, to turn a diff of two versions of a patch into
  32.340 -something readable.  To do this, you will need a third-party package
  32.341 -called \package{patchutils}~\cite{web:patchutils}.  This provides a
  32.342 -command named \command{interdiff}, which shows the differences between
  32.343 -two diffs as a diff.  Used on two versions of the same diff, it
  32.344 -generates a diff that represents the diff from the first to the second
  32.345 -version.
  32.346 -
  32.347 -You can enable the \hgext{extdiff} extension in the usual way, by
  32.348 -adding a line to the \rcsection{extensions} section of your \hgrc.
  32.349 -\begin{codesample2}
  32.350 -  [extensions]
  32.351 -  extdiff =
  32.352 -\end{codesample2}
  32.353 -The \command{interdiff} command expects to be passed the names of two
  32.354 -files, but the \hgext{extdiff} extension passes the program it runs a
  32.355 -pair of directories, each of which can contain an arbitrary number of
  32.356 -files.  We thus need a small program that will run \command{interdiff}
  32.357 -on each pair of files in these two directories.  This program is
  32.358 -available as \sfilename{hg-interdiff} in the \dirname{examples}
  32.359 -directory of the source code repository that accompanies this book.
  32.360 -\excode{hg-interdiff}
  32.361 -
  32.362 -With the \sfilename{hg-interdiff} program in your shell's search path,
  32.363 -you can run it as follows, from inside an MQ patch directory:
  32.364 -\begin{codesample2}
  32.365 -  hg extdiff -p hg-interdiff -r A:B my-change.patch
  32.366 -\end{codesample2}
  32.367 -Since you'll probably want to use this long-winded command a lot, you
  32.368 -can get \hgext{extdiff} to make it available as a normal Mercurial
  32.369 -command, again by editing your \hgrc.
  32.370 -\begin{codesample2}
  32.371 -  [extdiff]
  32.372 -  cmd.interdiff = hg-interdiff
  32.373 -\end{codesample2}
  32.374 -This directs \hgext{extdiff} to make an \texttt{interdiff} command
  32.375 -available, so you can now shorten the previous invocation of
  32.376 -\hgxcmd{extdiff}{extdiff} to something a little more wieldy.
  32.377 -\begin{codesample2}
  32.378 -  hg interdiff -r A:B my-change.patch
  32.379 -\end{codesample2}
  32.380 -
  32.381 -\begin{note}
  32.382 -  The \command{interdiff} command works well only if the underlying
  32.383 -  files against which versions of a patch are generated remain the
  32.384 -  same.  If you create a patch, modify the underlying files, and then
  32.385 -  regenerate the patch, \command{interdiff} may not produce useful
  32.386 -  output.
  32.387 -\end{note}
  32.388 -
  32.389 -The \hgext{extdiff} extension is useful for more than merely improving
  32.390 -the presentation of MQ~patches.  To read more about it, go to
  32.391 -section~\ref{sec:hgext:extdiff}.
  32.392 -
  32.393 -%%% Local Variables: 
  32.394 -%%% mode: latex
  32.395 -%%% TeX-master: "00book"
  32.396 -%%% End: 

    33.1 --- a/en/mq-ref.tex	Thu Jan 29 22:47:34 2009 -0800
    33.2 +++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
    33.3 @@ -1,349 +0,0 @@
    33.4 -\chapter{Mercurial Queues reference}
    33.5 -\label{chap:mqref}
    33.6 -
    33.7 -\section{MQ command reference}
    33.8 -\label{sec:mqref:cmdref}
    33.9 -
   33.10 -For an overview of the commands provided by MQ, use the command
   33.11 -\hgcmdargs{help}{mq}.
   33.12 -
   33.13 -\subsection{\hgxcmd{mq}{qapplied}---print applied patches}
   33.14 -
   33.15 -The \hgxcmd{mq}{qapplied} command prints the current stack of applied
   33.16 -patches.  Patches are printed in oldest-to-newest order, so the last
   33.17 -patch in the list is the ``top'' patch.
   33.18 -
   33.19 -\subsection{\hgxcmd{mq}{qcommit}---commit changes in the queue repository}
   33.20 -
   33.21 -The \hgxcmd{mq}{qcommit} command commits any outstanding changes in the
   33.22 -\sdirname{.hg/patches} repository.  This command only works if the
   33.23 -\sdirname{.hg/patches} directory is a repository, i.e.~you created the
   33.24 -directory using \hgcmdargs{qinit}{\hgxopt{mq}{qinit}{-c}} or ran
   33.25 -\hgcmd{init} in the directory after running \hgxcmd{mq}{qinit}.
   33.26 -
   33.27 -This command is shorthand for \hgcmdargs{commit}{--cwd .hg/patches}.
   33.28 -
   33.29 -\subsection{\hgxcmd{mq}{qdelete}---delete a patch from the
   33.30 -  \sfilename{series} file}
   33.31 -
   33.32 -The \hgxcmd{mq}{qdelete} command removes the entry for a patch from the
   33.33 -\sfilename{series} file in the \sdirname{.hg/patches} directory.  It
   33.34 -does not pop the patch if the patch is already applied.  By default,
   33.35 -it does not delete the patch file; use the \hgxopt{mq}{qdel}{-f} option to
   33.36 -do that.
   33.37 -
   33.38 -Options:
   33.39 -\begin{itemize}
   33.40 -\item[\hgxopt{mq}{qdel}{-f}] Delete the patch file.
   33.41 -\end{itemize}
   33.42 -
   33.43 -\subsection{\hgxcmd{mq}{qdiff}---print a diff of the topmost applied patch}
   33.44 -
   33.45 -The \hgxcmd{mq}{qdiff} command prints a diff of the topmost applied patch.
   33.46 -It is equivalent to \hgcmdargs{diff}{-r-2:-1}.
   33.47 -
   33.48 -\subsection{\hgxcmd{mq}{qfold}---merge (``fold'') several patches into one}
   33.49 -
   33.50 -The \hgxcmd{mq}{qfold} command merges multiple patches into the topmost
   33.51 -applied patch, so that the topmost applied patch makes the union of
   33.52 -all of the changes in the patches in question.
   33.53 -
   33.54 -The patches to fold must not be applied; \hgxcmd{mq}{qfold} will exit with
   33.55 -an error if any is.  The order in which patches are folded is
   33.56 -significant; \hgcmdargs{qfold}{a b} means ``apply the current topmost
   33.57 -patch, followed by \texttt{a}, followed by \texttt{b}''.
   33.58 -
   33.59 -The comments from the folded patches are appended to the comments of
   33.60 -the destination patch, with each block of comments separated by three
   33.61 -asterisk (``\texttt{*}'') characters.  Use the \hgxopt{mq}{qfold}{-e}
   33.62 -option to edit the commit message for the combined patch/changeset
   33.63 -after the folding has completed.
   33.64 -
   33.65 -Options:
   33.66 -\begin{itemize}
   33.67 -\item[\hgxopt{mq}{qfold}{-e}] Edit the commit message and patch description
   33.68 -  for the newly folded patch.
   33.69 -\item[\hgxopt{mq}{qfold}{-l}] Use the contents of the given file as the new
   33.70 -  commit message and patch description for the folded patch.
   33.71 -\item[\hgxopt{mq}{qfold}{-m}] Use the given text as the new commit message
   33.72 -  and patch description for the folded patch.
   33.73 -\end{itemize}
   33.74 -
   33.75 -\subsection{\hgxcmd{mq}{qheader}---display the header/description of a patch}
   33.76 -
   33.77 -The \hgxcmd{mq}{qheader} command prints the header, or description, of a
   33.78 -patch.  By default, it prints the header of the topmost applied patch.
   33.79 -Given an argument, it prints the header of the named patch.
   33.80 -
   33.81 -\subsection{\hgxcmd{mq}{qimport}---import a third-party patch into the queue}
   33.82 -
   33.83 -The \hgxcmd{mq}{qimport} command adds an entry for an external patch to the
   33.84 -\sfilename{series} file, and copies the patch into the
   33.85 -\sdirname{.hg/patches} directory.  It adds the entry immediately after
   33.86 -the topmost applied patch, but does not push the patch.
   33.87 -
   33.88 -If the \sdirname{.hg/patches} directory is a repository,
   33.89 -\hgxcmd{mq}{qimport} automatically does an \hgcmd{add} of the imported
   33.90 -patch.
   33.91 -
   33.92 -\subsection{\hgxcmd{mq}{qinit}---prepare a repository to work with MQ}
   33.93 -
   33.94 -The \hgxcmd{mq}{qinit} command prepares a repository to work with MQ.  It
   33.95 -creates a directory called \sdirname{.hg/patches}.
   33.96 -
   33.97 -Options:
   33.98 -\begin{itemize}
   33.99 -\item[\hgxopt{mq}{qinit}{-c}] Create \sdirname{.hg/patches} as a repository
  33.100 -  in its own right.  Also creates a \sfilename{.hgignore} file that
  33.101 -  will ignore the \sfilename{status} file.
  33.102 -\end{itemize}
  33.103 -
  33.104 -When the \sdirname{.hg/patches} directory is a repository, the
  33.105 -\hgxcmd{mq}{qimport} and \hgxcmd{mq}{qnew} commands automatically \hgcmd{add}
  33.106 -new patches.
  33.107 -
  33.108 -\subsection{\hgxcmd{mq}{qnew}---create a new patch}
  33.109 -
  33.110 -The \hgxcmd{mq}{qnew} command creates a new patch.  It takes one mandatory
  33.111 -argument, the name to use for the patch file.  The newly created patch
  33.112 -is created empty by default.  It is added to the \sfilename{series}
  33.113 -file after the current topmost applied patch, and is immediately
  33.114 -pushed on top of that patch.
  33.115 -
  33.116 -If \hgxcmd{mq}{qnew} finds modified files in the working directory, it will
  33.117 -refuse to create a new patch unless the \hgxopt{mq}{qnew}{-f} option is
  33.118 -used (see below).  This behaviour allows you to \hgxcmd{mq}{qrefresh} your
  33.119 -topmost applied patch before you apply a new patch on top of it.
  33.120 -
  33.121 -Options:
  33.122 -\begin{itemize}
  33.123 -\item[\hgxopt{mq}{qnew}{-f}] Create a new patch if the contents of the
  33.124 -  working directory are modified.  Any outstanding modifications are
  33.125 -  added to the newly created patch, so after this command completes,
  33.126 -  the working directory will no longer be modified.
  33.127 -\item[\hgxopt{mq}{qnew}{-m}] Use the given text as the commit message.
  33.128 -  This text will be stored at the beginning of the patch file, before
  33.129 -  the patch data.
  33.130 -\end{itemize}
  33.131 -
  33.132 -\subsection{\hgxcmd{mq}{qnext}---print the name of the next patch}
  33.133 -
  33.134 -The \hgxcmd{mq}{qnext} command prints the name of the next patch in
  33.135 -the \sfilename{series} file after the topmost applied patch.  This
  33.136 -patch will become the topmost applied patch if you run \hgxcmd{mq}{qpush}.
  33.137 -
  33.138 -\subsection{\hgxcmd{mq}{qpop}---pop patches off the stack}
  33.139 -
  33.140 -The \hgxcmd{mq}{qpop} command removes applied patches from the top of the
  33.141 -stack of applied patches.  By default, it removes only one patch.
  33.142 -
  33.143 -This command removes the changesets that represent the popped patches
  33.144 -from the repository, and updates the working directory to undo the
  33.145 -effects of the patches.
  33.146 -
  33.147 -This command takes an optional argument, which it uses as the name or
  33.148 -index of the patch to pop to.  If given a name, it will pop patches
  33.149 -until the named patch is the topmost applied patch.  If given a
  33.150 -number, \hgxcmd{mq}{qpop} treats the number as an index into the entries in
  33.151 -the series file, counting from zero (empty lines and lines containing
  33.152 -only comments do not count).  It pops patches until the patch
  33.153 -identified by the given index is the topmost applied patch.
  33.154 -
  33.155 -The \hgxcmd{mq}{qpop} command does not read or write patches or the
  33.156 -\sfilename{series} file.  It is thus safe to \hgxcmd{mq}{qpop} a patch that
  33.157 -you have removed from the \sfilename{series} file, or a patch that you
  33.158 -have renamed or deleted entirely.  In the latter two cases, use the
  33.159 -name of the patch as it was when you applied it.
  33.160 -
  33.161 -By default, the \hgxcmd{mq}{qpop} command will not pop any patches if the
  33.162 -working directory has been modified.  You can override this behaviour
  33.163 -using the \hgxopt{mq}{qpop}{-f} option, which reverts all modifications in
  33.164 -the working directory.
  33.165 -
  33.166 -Options:
  33.167 -\begin{itemize}
  33.168 -\item[\hgxopt{mq}{qpop}{-a}] Pop all applied patches.  This returns the
  33.169 -  repository to its state before you applied any patches.
  33.170 -\item[\hgxopt{mq}{qpop}{-f}] Forcibly revert any modifications to the
  33.171 -  working directory when popping.
  33.172 -\item[\hgxopt{mq}{qpop}{-n}] Pop a patch from the named queue.
  33.173 -\end{itemize}
  33.174 -
  33.175 -The \hgxcmd{mq}{qpop} command removes one line from the end of the
  33.176 -\sfilename{status} file for each patch that it pops.
  33.177 -
  33.178 -\subsection{\hgxcmd{mq}{qprev}---print the name of the previous patch}
  33.179 -
  33.180 -The \hgxcmd{mq}{qprev} command prints the name of the patch in the
  33.181 -\sfilename{series} file that comes before the topmost applied patch.
  33.182 -This will become the topmost applied patch if you run \hgxcmd{mq}{qpop}.
  33.183 -
  33.184 -\subsection{\hgxcmd{mq}{qpush}---push patches onto the stack}
  33.185 -\label{sec:mqref:cmd:qpush}
  33.186 -
  33.187 -The \hgxcmd{mq}{qpush} command adds patches onto the applied stack.  By
  33.188 -default, it adds only one patch.
  33.189 -
  33.190 -This command creates a new changeset to represent each applied patch,
  33.191 -and updates the working directory to apply the effects of the patches.
  33.192 -
  33.193 -The default data used when creating a changeset are as follows:
  33.194 -\begin{itemize}
  33.195 -\item The commit date and time zone are the current date and time
  33.196 -  zone.  Because these data are used to compute the identity of a
  33.197 -  changeset, this means that if you \hgxcmd{mq}{qpop} a patch and
  33.198 -  \hgxcmd{mq}{qpush} it again, the changeset that you push will have a
  33.199 -  different identity than the changeset you popped.
  33.200 -\item The author is the same as the default used by the \hgcmd{commit}
  33.201 -  command.
  33.202 -\item The commit message is any text from the patch file that comes
  33.203 -  before the first diff header.  If there is no such text, a default
  33.204 -  commit message is used that identifies the name of the patch.
  33.205 -\end{itemize}
  33.206 -If a patch contains a Mercurial patch header (XXX add link), the
  33.207 -information in the patch header overrides these defaults.
  33.208 -
  33.209 -Options:
  33.210 -\begin{itemize}
  33.211 -\item[\hgxopt{mq}{qpush}{-a}] Push all unapplied patches from the
  33.212 -  \sfilename{series} file until there are none left to push.
  33.213 -\item[\hgxopt{mq}{qpush}{-l}] Add the name of the patch to the end
  33.214 -  of the commit message.
  33.215 -\item[\hgxopt{mq}{qpush}{-m}] If a patch fails to apply cleanly, use the
  33.216 -  entry for the patch in another saved queue to compute the parameters
  33.217 -  for a three-way merge, and perform a three-way merge using the
  33.218 -  normal Mercurial merge machinery.  Use the resolution of the merge
  33.219 -  as the new patch content.
  33.220 -\item[\hgxopt{mq}{qpush}{-n}] Use the named queue if merging while pushing.
  33.221 -\end{itemize}
  33.222 -
  33.223 -The \hgxcmd{mq}{qpush} command reads, but does not modify, the
  33.224 -\sfilename{series} file.  It appends one line to the \sfilename{status}
  33.225 -file for each patch that it pushes.
  33.226 -
  33.227 -\subsection{\hgxcmd{mq}{qrefresh}---update the topmost applied patch}
  33.228 -
  33.229 -The \hgxcmd{mq}{qrefresh} command updates the topmost applied patch.  It
  33.230 -modifies the patch, removes the old changeset that represented the
  33.231 -patch, and creates a new changeset to represent the modified patch.
  33.232 -
  33.233 -The \hgxcmd{mq}{qrefresh} command looks for the following modifications:
  33.234 -\begin{itemize}
  33.235 -\item Changes to the commit message, i.e.~the text before the first
  33.236 -  diff header in the patch file, are reflected in the new changeset
  33.237 -  that represents the patch.
  33.238 -\item Modifications to tracked files in the working directory are
  33.239 -  added to the patch.
  33.240 -\item Changes to the files tracked using \hgcmd{add}, \hgcmd{copy},
  33.241 -  \hgcmd{remove}, or \hgcmd{rename}.  Added files and copy and rename
  33.242 -  destinations are added to the patch, while removed files and rename
  33.243 -  sources are removed.
  33.244 -\end{itemize}
  33.245 -
  33.246 -Even if \hgxcmd{mq}{qrefresh} detects no changes, it still recreates the
  33.247 -changeset that represents the patch.  This causes the identity of the
  33.248 -changeset to differ from the previous changeset that identified the
  33.249 -patch.
  33.250 -
  33.251 -Options:
  33.252 -\begin{itemize}
  33.253 -\item[\hgxopt{mq}{qrefresh}{-e}] Modify the commit message and patch
  33.254 -  description, using the preferred text editor.
  33.255 -\item[\hgxopt{mq}{qrefresh}{-m}] Modify the commit message and patch
  33.256 -  description, using the given text.
  33.257 -\item[\hgxopt{mq}{qrefresh}{-l}] Modify the commit message and patch
  33.258 -  description, using text from the given file.
  33.259 -\end{itemize}
  33.260 -
  33.261 -\subsection{\hgxcmd{mq}{qrename}---rename a patch}
  33.262 -
  33.263 -The \hgxcmd{mq}{qrename} command renames a patch, and changes the entry for
  33.264 -the patch in the \sfilename{series} file.
  33.265 -
  33.266 -With a single argument, \hgxcmd{mq}{qrename} renames the topmost applied
  33.267 -patch.  With two arguments, it renames its first argument to its
  33.268 -second.
  33.269 -
  33.270 -\subsection{\hgxcmd{mq}{qrestore}---restore saved queue state}
  33.271 -
  33.272 -XXX No idea what this does.
  33.273 -
  33.274 -\subsection{\hgxcmd{mq}{qsave}---save current queue state}
  33.275 -
  33.276 -XXX Likewise.
  33.277 -
  33.278 -\subsection{\hgxcmd{mq}{qseries}---print the entire patch series}
  33.279 -
  33.280 -The \hgxcmd{mq}{qseries} command prints the entire patch series from the
  33.281 -\sfilename{series} file.  It prints only patch names, not empty lines
  33.282 -or comments.  It prints in order from first to be applied to last.
  33.283 -
  33.284 -\subsection{\hgxcmd{mq}{qtop}---print the name of the current patch}
  33.285 -
  33.286 -The \hgxcmd{mq}{qtop} command prints the name of the topmost currently applied
  33.287 -patch.
  33.288 -
  33.289 -\subsection{\hgxcmd{mq}{qunapplied}---print patches not yet applied}
  33.290 -
  33.291 -The \hgxcmd{mq}{qunapplied} command prints the names of patches from the
  33.292 -\sfilename{series} file that are not yet applied.  It prints them in
  33.293 -order from the next patch that will be pushed to the last.
  33.294 -
  33.295 -\subsection{\hgcmd{strip}---remove a revision and descendants}
  33.296 -
  33.297 -The \hgcmd{strip} command removes a revision, and all of its
  33.298 -descendants, from the repository.  It undoes the effects of the
  33.299 -removed revisions from the repository, and updates the working
  33.300 -directory to the first parent of the removed revision.
  33.301 -
  33.302 -The \hgcmd{strip} command saves a backup of the removed changesets in
  33.303 -a bundle, so that they can be reapplied if removed in error.
  33.304 -
  33.305 -Options:
  33.306 -\begin{itemize}
  33.307 -\item[\hgopt{strip}{-b}] Save unrelated changesets that are intermixed
  33.308 -  with the stripped changesets in the backup bundle.
  33.309 -\item[\hgopt{strip}{-f}] If a branch has multiple heads, remove all
  33.310 -  heads. XXX This should be renamed, and use \texttt{-f} to strip revs
  33.311 -  when there are pending changes.
  33.312 -\item[\hgopt{strip}{-n}] Do not save a backup bundle.
  33.313 -\end{itemize}
  33.314 -
  33.315 -\section{MQ file reference}
  33.316 -
  33.317 -\subsection{The \sfilename{series} file}
  33.318 -
  33.319 -The \sfilename{series} file contains a list of the names of all
  33.320 -patches that MQ can apply.  It is represented as a list of names, with
  33.321 -one name saved per line.  Leading and trailing white space in each
  33.322 -line are ignored.
  33.323 -
  33.324 -Lines may contain comments.  A comment begins with the ``\texttt{\#}''
  33.325 -character, and extends to the end of the line.  Empty lines, and lines
  33.326 -that contain only comments, are ignored.
  33.327 -
  33.328 -You will often need to edit the \sfilename{series} file by hand, hence
  33.329 -the support for comments and empty lines noted above.  For example,
  33.330 -you can comment out a patch temporarily, and \hgxcmd{mq}{qpush} will skip
  33.331 -over that patch when applying patches.  You can also change the order
  33.332 -in which patches are applied by reordering their entries in the
  33.333 -\sfilename{series} file.
  33.334 -
  33.335 -Placing the \sfilename{series} file under revision control is also
  33.336 -supported; it is a good idea to place all of the patches that it
  33.337 -refers to under revision control, as well.  If you create a patch
  33.338 -directory using the \hgxopt{mq}{qinit}{-c} option to \hgxcmd{mq}{qinit}, this
  33.339 -will be done for you automatically.
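
The parsing rules described above can be sketched in a few lines of Python. This is only an illustration of the rules as stated in this section (strip white space, drop comments, skip empty lines), not MQ's actual implementation; the function name \texttt{parse\_series} and the sample patch names are hypothetical.

```python
def parse_series(text):
    """Return patch names from series-file text, in application order.

    Illustrative sketch of the rules in this section, not MQ's own code.
    """
    names = []
    for line in text.splitlines():
        # A comment starts at '#' and extends to the end of the line.
        entry = line.split("#", 1)[0].strip()
        if entry:  # skip empty lines and comment-only lines
            names.append(entry)
    return names


# Hypothetical series file: one commented-out patch, one trailing comment.
example = """\
# fixes waiting on upstream
fix-build.patch
  add-logging.patch   # still rough

#broken.patch
"""
print(parse_series(example))
```

Because the commented-out entry never reaches the resulting list, a sketch like this also shows why commenting out a patch makes it invisible to commands that walk the series, and why indexes into the series count only real entries.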
  33.340 -
  33.341 -\subsection{The \sfilename{status} file}
  33.342 -
  33.343 -The \sfilename{status} file contains the names and changeset hashes of
  33.344 -all patches that MQ currently has applied.  Unlike the
  33.345 -\sfilename{series} file, this file is not intended for editing.  You
  33.346 -should not place this file under revision control, or modify it in any
  33.347 -way.  It is used by MQ strictly for internal book-keeping.
  33.348 -
  33.349 -%%% Local Variables: 
  33.350 -%%% mode: latex
  33.351 -%%% TeX-master: "00book"
  33.352 -%%% End: 

    34.1 --- a/en/mq.tex	Thu Jan 29 22:47:34 2009 -0800
    34.2 +++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
    34.3 @@ -1,1043 +0,0 @@
    34.4 -\chapter{Managing change with Mercurial Queues}
    34.5 -\label{chap:mq}
    34.6 -
    34.7 -\section{The patch management problem}
    34.8 -\label{sec:mq:patch-mgmt}
    34.9 -
   34.10 -Here is a common scenario: you need to install a software package from
   34.11 -source, but you find a bug that you must fix in the source before you
   34.12 -can start using the package.  You make your changes, forget about the
   34.13 -package for a while, and a few months later you need to upgrade to a
   34.14 -newer version of the package.  If the newer version of the package
   34.15 -still has the bug, you must extract your fix from the older source
   34.16 -tree and apply it against the newer version.  This is a tedious task,
   34.17 -and it's easy to make mistakes.
   34.18 -
   34.19 -This is a simple case of the ``patch management'' problem.  You have
   34.20 -an ``upstream'' source tree that you can't change; you need to make
   34.21 -some local changes on top of the upstream tree; and you'd like to be
   34.22 -able to keep those changes separate, so that you can apply them to
   34.23 -newer versions of the upstream source.
   34.24 -
   34.25 -The patch management problem arises in many situations.  Probably the
   34.26 -most visible is that a user of an open source software project will
   34.27 -contribute a bug fix or new feature to the project's maintainers in the
   34.28 -form of a patch.
   34.29 -
   34.30 -Distributors of operating systems that include open source software
   34.31 -often need to make changes to the packages they distribute so that
   34.32 -they will build properly in their environments.
   34.33 -
   34.34 -When you have few changes to maintain, it is easy to manage a single
   34.35 -patch using the standard \command{diff} and \command{patch} programs
   34.36 -(see section~\ref{sec:mq:patch} for a discussion of these tools).
   34.37 -Once the number of changes grows, it starts to make sense to maintain
   34.38 -patches as discrete ``chunks of work,'' so that for example a single
   34.39 -patch will contain only one bug fix (the patch might modify several
   34.40 -files, but it's doing ``only one thing''), and you may have a number
   34.41 -of such patches for different bugs you need fixed and local changes
   34.42 -you require.  In this situation, if you submit a bug fix patch to the
   34.43 -upstream maintainers of a package and they include your fix in a
   34.44 -subsequent release, you can simply drop that single patch when you're
   34.45 -updating to the newer release.
   34.46 -
   34.47 -Maintaining a single patch against an upstream tree is a little
   34.48 -tedious and error-prone, but not difficult.  However, the complexity
   34.49 -of the problem grows rapidly as the number of patches you have to
   34.50 -maintain increases.  With more than a tiny number of patches in hand,
   34.51 -understanding which ones you have applied and maintaining them moves
   34.52 -from messy to overwhelming.
   34.53 -
   34.54 -Fortunately, Mercurial includes a powerful extension, Mercurial Queues
   34.55 -(or simply ``MQ''), that massively simplifies the patch management
   34.56 -problem.
   34.57 -
   34.58 -\section{The prehistory of Mercurial Queues}
   34.59 -\label{sec:mq:history}
   34.60 -
   34.61 -During the late 1990s, several Linux kernel developers started to
   34.62 -maintain ``patch series'' that modified the behaviour of the Linux
   34.63 -kernel.  Some of these series were focused on stability, some on
   34.64 -feature coverage, and others were more speculative.
   34.65 -
   34.66 -The sizes of these patch series grew rapidly.  In 2002, Andrew Morton
   34.67 -published some shell scripts he had been using to automate the task of
   34.68 -managing his patch queues.  Andrew was successfully using these
   34.69 -scripts to manage hundreds (sometimes thousands) of patches on top of
   34.70 -the Linux kernel.
   34.71 -
   34.72 -\subsection{A patchwork quilt}
   34.73 -\label{sec:mq:quilt}
   34.74 -
   34.75 -In early 2003, Andreas Gruenbacher and Martin Quinson borrowed the
   34.76 -approach of Andrew's scripts and published a tool called ``patchwork
   34.77 -quilt''~\cite{web:quilt}, or simply ``quilt''
   34.78 -(see~\cite{gruenbacher:2005} for a paper describing it).  Because
   34.79 -quilt substantially automated patch management, it rapidly gained a
   34.80 -large following among open source software developers.
   34.81 -
   34.82 -Quilt manages a \emph{stack of patches} on top of a directory tree.
   34.83 -To begin, you tell quilt to manage a directory tree, and tell it which
   34.84 -files you want to manage; it stores away the names and contents of
   34.85 -those files.  To fix a bug, you create a new patch (using a single
   34.86 -command), edit the files you need to fix, then ``refresh'' the patch.
   34.87 -
   34.88 -The refresh step causes quilt to scan the directory tree; it updates
   34.89 -the patch with all of the changes you have made.  You can create
   34.90 -another patch on top of the first, which will track the changes
   34.91 -required to modify the tree from ``tree with one patch applied'' to
   34.92 -``tree with two patches applied''.
   34.93 -
   34.94 -You can \emph{change} which patches are applied to the tree.  If you
   34.95 -``pop'' a patch, the changes made by that patch will vanish from the
   34.96 -directory tree.  Quilt remembers which patches you have popped,
   34.97 -though, so you can ``push'' a popped patch again, and the directory
   34.98 -tree will be restored to contain the modifications in the patch.  Most
   34.99 -importantly, you can run the ``refresh'' command at any time, and the
  34.100 -topmost applied patch will be updated.  This means that you can, at
  34.101 -any time, change both which patches are applied and what
  34.102 -modifications those patches make.
  34.103 -
  34.104 -Quilt knows nothing about revision control tools, so it works equally
  34.105 -well on top of an unpacked tarball or a Subversion working copy.
  34.106 -
  34.107 -\subsection{From patchwork quilt to Mercurial Queues}
  34.108 -\label{sec:mq:quilt-mq}
  34.109 -
  34.110 -In mid-2005, Chris Mason took the features of quilt and wrote an
  34.111 -extension that he called Mercurial Queues, which added quilt-like
  34.112 -behaviour to Mercurial.
  34.113 -
  34.114 -The key difference between quilt and MQ is that quilt knows nothing
  34.115 -about revision control systems, while MQ is \emph{integrated} into
  34.116 -Mercurial.  Each patch that you push is represented as a Mercurial
  34.117 -changeset.  Pop a patch, and the changeset goes away.
  34.118 -
  34.119 -Because quilt does not care about revision control tools, it is still
  34.120 -a tremendously useful piece of software to know about for situations
  34.121 -where you cannot use Mercurial and MQ.
  34.122 -
  34.123 -\section{The huge advantage of MQ}
  34.124 -
  34.125 -I cannot overstate the value that MQ offers through the unification of
  34.126 -patches and revision control.
  34.127 -
  34.128 -A major reason that patches have persisted in the free software and
  34.129 -open source world---in spite of the availability of increasingly
  34.130 -capable revision control tools over the years---is the \emph{agility}
  34.131 -they offer.  
  34.132 -
  34.133 -Traditional revision control tools make a permanent, irreversible
  34.134 -record of everything that you do.  While this has great value, it's
  34.135 -also somewhat stifling.  If you want to perform a wild-eyed
  34.136 -experiment, you have to be careful in how you go about it, or you risk
  34.137 -leaving unneeded---or worse, misleading or destabilising---traces of
  34.138 -your missteps and errors in the permanent revision record.
  34.139 -
  34.140 -By contrast, MQ's marriage of distributed revision control with
  34.141 -patches makes it much easier to isolate your work.  Your patches live
  34.142 -on top of normal revision history, and you can make them disappear or
  34.143 -reappear at will.  If you don't like a patch, you can drop it.  If a
  34.144 -patch isn't quite as you want it to be, simply fix it---as many times
  34.145 -as you need to, until you have refined it into the form you desire.
  34.146 -
  34.147 -As an example, the integration of patches with revision control makes
  34.148 -understanding patches and debugging their effects---and their
  34.149 -interplay with the code they're based on---\emph{enormously} easier.
  34.150 -Since every applied patch has an associated changeset, you can use
  34.151 -\hgcmdargs{log}{\emph{filename}} to see which changesets and patches
  34.152 -affected a file.  You can use the \hgext{bisect} command to
  34.153 -binary-search through all changesets and applied patches to see where
  34.154 -a bug got introduced or fixed.  You can use the \hgcmd{annotate}
  34.155 -command to see which changeset or patch modified a particular line of
  34.156 -a source file.  And so on.
  34.157 -
  34.158 -\section{Understanding patches}
  34.159 -\label{sec:mq:patch}
  34.160 -
  34.161 -Because MQ doesn't hide its patch-oriented nature, it is helpful to
  34.162 -understand what patches are, and a little about the tools that work
  34.163 -with them.
  34.164 -
  34.165 -The traditional Unix \command{diff} command compares two files, and
  34.166 -prints a list of differences between them. The \command{patch} command
  34.167 -understands these differences as \emph{modifications} to make to a
  34.168 -file.  Take a look at figure~\ref{ex:mq:diff} for a simple example of
  34.169 -these commands in action.
  34.170 -
  34.171 -\begin{figure}[ht]
  34.172 -  \interaction{mq.dodiff.diff}
  34.173 -  \caption{Simple uses of the \command{diff} and \command{patch} commands}
  34.174 -  \label{ex:mq:diff}
  34.175 -\end{figure}
  34.176 -
  34.177 -The type of file that \command{diff} generates (and \command{patch}
  34.178 -takes as input) is called a ``patch'' or a ``diff''; there is no
  34.179 -difference between a patch and a diff.  (We'll use the term ``patch'',
  34.180 -since it's more commonly used.)
  34.181 -
  34.182 -A patch file can start with arbitrary text; the \command{patch}
  34.183 -command ignores this text, but MQ uses it as the commit message when
  34.184 -creating changesets.  To find the beginning of the patch content,
  34.185 -\command{patch} searches for the first line that starts with the
  34.186 -string ``\texttt{diff~-}''.
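
As a rough illustration of this rule, the following Python sketch splits the text of a patch file into its leading free-form text and its patch content. It assumes only what is stated above, namely that the patch content begins at the first line starting with ``\texttt{diff~-}''; the function name \texttt{split\_patch} and the sample patch are hypothetical, and real tools handle more cases than this.

```python
def split_patch(text):
    """Split patch-file text into (leading message, patch content).

    Sketch only: the patch content is assumed to begin at the first
    line that starts with "diff -", as described in this section.
    """
    lines = text.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if line.startswith("diff -"):
            return "".join(lines[:i]).strip(), "".join(lines[i:])
    # No diff header found: treat everything as message text.
    return text.strip(), ""


# Hypothetical patch file with a one-line message before the diff.
sample = (
    "Fix the frobnicator\n"
    "\n"
    "diff -r 000000000000 file.txt\n"
    "--- a/file.txt\n"
    "+++ b/file.txt\n"
)
message, body = split_patch(sample)
print(repr(message))
```

The first element of the result is the text that MQ would treat as the commit message; the second is what \command{patch} would act on.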
  34.187 -
  34.188 -MQ works with \emph{unified} diffs (\command{patch} can accept several
  34.189 -other diff formats, but MQ doesn't).  A unified diff contains two
  34.190 -kinds of header.  The \emph{file header} describes the file being
  34.191 -modified; it contains the name of the file to modify.  When
  34.192 -\command{patch} sees a new file header, it looks for a file with that
  34.193 -name to start modifying.
  34.194 -
  34.195 -After the file header comes a series of \emph{hunks}.  Each hunk
  34.196 -starts with a header; this identifies the range of line numbers within
  34.197 -the file that the hunk should modify.  Following the header, a hunk
  34.198 -starts and ends with a few (usually three) lines of text from the
  34.199 -unmodified file; these are called the \emph{context} for the hunk.  If
  34.200 -there's only a small amount of context between successive hunks,
  34.201 -\command{diff} doesn't print a new hunk header; it just runs the hunks
  34.202 -together, with a few lines of context between modifications.
  34.203 -
  34.204 -Each line of context begins with a space character.  Within the hunk,
  34.205 -a line that begins with ``\texttt{-}'' means ``remove this line,''
  34.206 -while a line that begins with ``\texttt{+}'' means ``insert this
  34.207 -line.''  For example, a line that is modified is represented by one
  34.208 -deletion and one insertion.
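
To make the meaning of the three line prefixes concrete, here is a small Python sketch that applies the body of a single hunk to a list of lines. It is a simplification made for illustration: it takes the starting line number as a plain argument rather than parsing a hunk header, and it insists on an exact match with no fuzz or offset search. The name \texttt{apply\_hunk} is hypothetical.

```python
def apply_hunk(old_lines, start, hunk):
    """Apply one hunk body to old_lines and return the new list of lines.

    start is the 1-based number of the first line the hunk covers.
    Sketch only: requires an exact match, with no fuzz or offsets.
    """
    pos = start - 1
    result = old_lines[:pos]
    for line in hunk:
        tag, text = line[0], line[1:]
        if tag in (" ", "-"):
            # Context and removed lines must both match the old file.
            if pos >= len(old_lines) or old_lines[pos] != text:
                raise ValueError("hunk does not apply at line %d" % (pos + 1))
            if tag == " ":
                result.append(text)  # context: keep the line unchanged
            pos += 1                 # removal: skip over the old line
        elif tag == "+":
            result.append(text)      # insertion: add the new line
        else:
            raise ValueError("unexpected hunk line: %r" % line)
    return result + old_lines[pos:]


old = ["one", "two", "three", "four"]
# A modified line appears as one deletion followed by one insertion.
hunk = [" one", "-two", "+TWO", " three"]
print(apply_hunk(old, 1, hunk))
```

Note that the context lines are checked against the old file but otherwise pass through untouched; they exist so that a tool can confirm it is modifying the right place.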
  34.209 -
  34.210 -We will return to some of the more subtle aspects of patches later (in
  34.211 -section~\ref{sec:mq:adv-patch}), but you should have enough information
  34.212 -now to use MQ.
  34.213 -
  34.214 -\section{Getting started with Mercurial Queues}
  34.215 -\label{sec:mq:start}
  34.216 -
  34.217 -Because MQ is implemented as an extension, you must explicitly enable
  34.218 -it before you can use it.  (You don't need to download anything; MQ ships
  34.219 -with the standard Mercurial distribution.)  To enable MQ, edit your
  34.220 -\tildefile{.hgrc} file, and add the lines in figure~\ref{ex:mq:config}.
  34.221 -
  34.222 -\begin{figure}[ht]
  34.223 -  \begin{codesample4}
  34.224 -    [extensions]
  34.225 -    hgext.mq =
  34.226 -  \end{codesample4}
  34.227 -  \caption{Contents to add to \tildefile{.hgrc} to enable the MQ extension}
  34.228 -  \label{ex:mq:config}
  34.229 -\end{figure}
  34.230 -
  34.231 -Once the extension is enabled, it will make a number of new commands
  34.232 -available.  To verify that the extension is working, you can use
  34.233 -\hgcmd{help} to see if the \hgxcmd{mq}{qinit} command is now available; see
  34.234 -the example in figure~\ref{ex:mq:enabled}.
  34.235 -
  34.236 -\begin{figure}[ht]
  34.237 -  \interaction{mq.qinit-help.help}
  34.238 -  \caption{How to verify that MQ is enabled}
  34.239 -  \label{ex:mq:enabled}
  34.240 -\end{figure}
  34.241 -
  34.242 -You can use MQ with \emph{any} Mercurial repository, and its commands
  34.243 -only operate within that repository.  To get started, simply prepare
  34.244 -the repository using the \hgxcmd{mq}{qinit} command (see
  34.245 -figure~\ref{ex:mq:qinit}).  This command creates an empty directory
  34.246 -called \sdirname{.hg/patches}, where MQ will keep its metadata.  As
  34.247 -with many Mercurial commands, the \hgxcmd{mq}{qinit} command prints nothing
  34.248 -if it succeeds.
  34.249 -
  34.250 -\begin{figure}[ht]
  34.251 -  \interaction{mq.tutorial.qinit}
  34.252 -  \caption{Preparing a repository for use with MQ}
  34.253 -  \label{ex:mq:qinit}
  34.254 -\end{figure}
  34.255 -
  34.256 -\begin{figure}[ht]
  34.257 -  \interaction{mq.tutorial.qnew}
  34.258 -  \caption{Creating a new patch}
  34.259 -  \label{ex:mq:qnew}
  34.260 -\end{figure}
  34.261 -
  34.262 -\subsection{Creating a new patch}
  34.263 -
  34.264 -To begin work on a new patch, use the \hgxcmd{mq}{qnew} command.  This
  34.265 -command takes one argument, the name of the patch to create.  MQ will
  34.266 -use this as the name of an actual file in the \sdirname{.hg/patches}
  34.267 -directory, as you can see in figure~\ref{ex:mq:qnew}.
  34.268 -
  34.269 -Also newly present in the \sdirname{.hg/patches} directory are two
  34.270 -other files, \sfilename{series} and \sfilename{status}.  The
  34.271 -\sfilename{series} file lists all of the patches that MQ knows about
  34.272 -for this repository, with one patch per line.  Mercurial uses the
  34.273 -\sfilename{status} file for internal book-keeping; it tracks all of the
  34.274 -patches that MQ has \emph{applied} in this repository.
  34.275 -
  34.276 -\begin{note}
  34.277 -  You may sometimes want to edit the \sfilename{series} file by hand;
  34.278 -  for example, to change the sequence in which some patches are
  34.279 -  applied.  However, manually editing the \sfilename{status} file is
  34.280 -  almost always a bad idea, as it's easy to corrupt MQ's idea of what
  34.281 -  is happening.
  34.282 -\end{note}
  34.283 -
  34.284 -Once you have created your new patch, you can edit files in the
  34.285 -working directory as you usually would.  All of the normal Mercurial
  34.286 -commands, such as \hgcmd{diff} and \hgcmd{annotate}, work exactly as
  34.287 -they did before.
  34.288 -
  34.289 -\subsection{Refreshing a patch}
  34.290 -
  34.291 -When you reach a point where you want to save your work, use the
  34.292 -\hgxcmd{mq}{qrefresh} command (figure~\ref{ex:mq:qrefresh}) to update the patch
  34.293 -you are working on.  This command folds the changes you have made in
  34.294 -the working directory into your patch, and updates its corresponding
  34.295 -changeset to contain those changes.
  34.296 -
  34.297 -\begin{figure}[ht]
  34.298 -  \interaction{mq.tutorial.qrefresh}
  34.299 -  \caption{Refreshing a patch}
  34.300 -  \label{ex:mq:qrefresh}
  34.301 -\end{figure}
  34.302 -
  34.303 -You can run \hgxcmd{mq}{qrefresh} as often as you like, so it's a good way
  34.304 -to ``checkpoint'' your work.  Refresh your patch at an opportune
  34.305 -time; try an experiment; and if the experiment doesn't work out,
  34.306 -\hgcmd{revert} your modifications back to the last time you refreshed.
  34.307 -
  34.308 -\begin{figure}[ht]
  34.309 -  \interaction{mq.tutorial.qrefresh2}
  34.310 -  \caption{Refresh a patch many times to accumulate changes}
  34.311 -  \label{ex:mq:qrefresh2}
  34.312 -\end{figure}
  34.313 -
  34.314 -\subsection{Stacking and tracking patches}
  34.315 -
  34.316 -Once you have finished working on a patch, or need to work on another,
  34.317 -you can use the \hgxcmd{mq}{qnew} command again to create a new patch.
  34.318 -Mercurial will apply this patch on top of your existing patch.  See
  34.319 -figure~\ref{ex:mq:qnew2} for an example.  Notice that the patch
  34.320 -contains the changes in our prior patch as part of its context (you
  34.321 -can see this more clearly in the output of \hgcmd{annotate}).
  34.322 -
  34.323 -\begin{figure}[ht]
  34.324 -  \interaction{mq.tutorial.qnew2}
  34.325 -  \caption{Stacking a second patch on top of the first}
  34.326 -  \label{ex:mq:qnew2}
  34.327 -\end{figure}
  34.328 -
  34.329 -So far, with the exception of \hgxcmd{mq}{qnew} and \hgxcmd{mq}{qrefresh}, we've
  34.330 -been careful to only use regular Mercurial commands.  However, MQ
  34.331 -provides many commands that are easier to use when you are thinking
  34.332 -about patches, as illustrated in figure~\ref{ex:mq:qseries}:
  34.333 -
  34.334 -\begin{itemize}
  34.335 -\item The \hgxcmd{mq}{qseries} command lists every patch that MQ knows
  34.336 -  about in this repository, from oldest to newest (most recently
  34.337 -  \emph{created}).
  34.338 -\item The \hgxcmd{mq}{qapplied} command lists every patch that MQ has
  34.339 -  \emph{applied} in this repository, again from oldest to newest (most
  34.340 -  recently applied).
  34.341 -\end{itemize}
  34.342 -
  34.343 -\begin{figure}[ht]
  34.344 -  \interaction{mq.tutorial.qseries}
  34.345 -  \caption{Understanding the patch stack with \hgxcmd{mq}{qseries} and
  34.346 -    \hgxcmd{mq}{qapplied}}
  34.347 -  \label{ex:mq:qseries}
  34.348 -\end{figure}
  34.349 -
  34.350 -\subsection{Manipulating the patch stack}
  34.351 -
  34.352 -The previous discussion implied that there must be a difference
  34.353 -between ``known'' and ``applied'' patches, and there is.  MQ can
  34.354 -manage a patch without it being applied in the repository.
  34.355 -
  34.356 -An \emph{applied} patch has a corresponding changeset in the
  34.357 -repository, and the effects of the patch and changeset are visible in
  34.358 -the working directory.  You can undo the application of a patch using
  34.359 -the \hgxcmd{mq}{qpop} command.  MQ still \emph{knows about}, or manages, a
  34.360 -popped patch, but the patch no longer has a corresponding changeset in
  34.361 -the repository, and the working directory does not contain the changes
  34.362 -made by the patch.  Figure~\ref{fig:mq:stack} illustrates the
  34.363 -difference between applied and tracked patches.
  34.364 -
  34.365 -\begin{figure}[ht]
  34.366 -  \centering
  34.367 -  \grafix{mq-stack}
  34.368 -  \caption{Applied and unapplied patches in the MQ patch stack}
  34.369 -  \label{fig:mq:stack}
  34.370 -\end{figure}
  34.371 -
  34.372 -You can reapply an unapplied, or popped, patch using the \hgxcmd{mq}{qpush}
  34.373 -command.  This creates a new changeset to correspond to the patch, and
  34.374 -the patch's changes once again become present in the working
  34.375 -directory.  See figure~\ref{ex:mq:qpop} for examples of \hgxcmd{mq}{qpop}
  34.376 -and \hgxcmd{mq}{qpush} in action.  Notice that once we have popped one
  34.377 -or two patches, the output of \hgxcmd{mq}{qseries} remains the same, while
  34.378 -that of \hgxcmd{mq}{qapplied} has changed.
  34.379 -
  34.380 -\begin{figure}[ht]
  34.381 -  \interaction{mq.tutorial.qpop}
  34.382 -  \caption{Modifying the stack of applied patches}
  34.383 -  \label{ex:mq:qpop}
  34.384 -\end{figure}
  34.385 -
  34.386 -\subsection{Pushing and popping many patches}
  34.387 -
  34.388 -While \hgxcmd{mq}{qpush} and \hgxcmd{mq}{qpop} each operate on a single patch at
  34.389 -a time by default, you can push and pop many patches in one go.  The
  34.390 -\hgxopt{mq}{qpush}{-a} option to \hgxcmd{mq}{qpush} causes it to push all
  34.391 -unapplied patches, while the \hgxopt{mq}{qpop}{-a} option to \hgxcmd{mq}{qpop}
  34.392 -causes it to pop all applied patches.  (For some more ways to push and
  34.393 -pop many patches, see section~\ref{sec:mq:perf} below.)
  34.394 -
  34.395 -\begin{figure}[ht]
  34.396 -  \interaction{mq.tutorial.qpush-a}
  34.397 -  \caption{Pushing all unapplied patches}
  34.398 -  \label{ex:mq:qpush-a}
  34.399 -\end{figure}
  34.400 -
  34.401 -\subsection{Safety checks, and overriding them}
  34.402 -
  34.403 -Several MQ commands check the working directory before they do
  34.404 -anything, and fail if they find any modifications.  They do this to
  34.405 -ensure that you won't lose any changes that you have made, but not yet
  34.406 -incorporated into a patch.  Figure~\ref{ex:mq:add} illustrates this;
  34.407 -the \hgxcmd{mq}{qnew} command will not create a new patch if there are
  34.408 -outstanding changes, caused in this case by the \hgcmd{add} of
  34.409 -\filename{file3}.
  34.410 -
  34.411 -\begin{figure}[ht]
  34.412 -  \interaction{mq.tutorial.add}
  34.413 -  \caption{Forcibly creating a patch}
  34.414 -  \label{ex:mq:add}
  34.415 -\end{figure}
  34.416 -
  34.417 -Commands that check the working directory all take an ``I know what
  34.418 -I'm doing'' option, which is always named \option{-f}.  The exact
  34.419 -meaning of \option{-f} depends on the command.  For example,
  34.420 -\hgcmdargs{qnew}{\hgxopt{mq}{qnew}{-f}} will incorporate any outstanding
  34.421 -changes into the new patch it creates, but
  34.422 -\hgcmdargs{qpop}{\hgxopt{mq}{qpop}{-f}} will revert modifications to any
  34.423 -files affected by the patch that it is popping.  Be sure to read the
  34.424 -documentation for a command's \option{-f} option before you use it!
  34.425 -
  34.426 -\subsection{Working on several patches at once}
  34.427 -
  34.428 -The \hgxcmd{mq}{qrefresh} command always refreshes the \emph{topmost}
  34.429 -applied patch.  This means that you can suspend work on one patch (by
  34.430 -refreshing it), pop or push to make a different patch the top, and
  34.431 -work on \emph{that} patch for a while.
  34.432 -
  34.433 -Here's an example that illustrates how you can use this ability.
  34.434 -Let's say you're developing a new feature as two patches.  The first
  34.435 -is a change to the core of your software, and the second---layered on
  34.436 -top of the first---changes the user interface to use the code you just
  34.437 -added to the core.  If you notice a bug in the core while you're
  34.438 -working on the UI patch, it's easy to fix the core.  Simply
  34.439 -\hgxcmd{mq}{qrefresh} the UI patch to save your in-progress changes, and
  34.440 -\hgxcmd{mq}{qpop} down to the core patch.  Fix the core bug,
  34.441 -\hgxcmd{mq}{qrefresh} the core patch, and \hgxcmd{mq}{qpush} back to the UI
  34.442 -patch to continue where you left off.
  34.443 -
  34.444 -\section{More about patches}
  34.445 -\label{sec:mq:adv-patch}
  34.446 -
  34.447 -MQ uses the GNU \command{patch} command to apply patches, so it's
  34.448 -helpful to know a few more detailed aspects of how \command{patch}
  34.449 -works, and about patches themselves.
  34.450 -
  34.451 -\subsection{The strip count}
  34.452 -
  34.453 -If you look at the file headers in a patch, you will notice that the
  34.454 -pathnames usually have an extra component on the front that isn't
  34.455 -present in the actual path name.  This is a holdover from the way that
  34.456 -people used to generate patches (people still do this, but it's
  34.457 -somewhat rare with modern revision control tools).  
  34.458 -
  34.459 -Alice would unpack a tarball, edit her files, then decide that she
  34.460 -wanted to create a patch.  So she'd rename her working directory,
  34.461 -unpack the tarball again (hence the need for the rename), and use the
  34.462 -\cmdopt{diff}{-r} and \cmdopt{diff}{-N} options to \command{diff} to
  34.463 -recursively generate a patch between the unmodified directory and the
  34.464 -modified one.  The result would be that the name of the unmodified
  34.465 -directory would be at the front of the left-hand path in every file
  34.466 -header, and the name of the modified directory would be at the front
  34.467 -of the right-hand path.
  34.468 -
  34.469 -Since someone receiving a patch from the Alices of the net would be
  34.470 -unlikely to have unmodified and modified directories with exactly the
  34.471 -same names, the \command{patch} command has a \cmdopt{patch}{-p}
  34.472 -option that indicates the number of leading path name components to
  34.473 -strip when trying to apply a patch.  This number is called the
  34.474 -\emph{strip count}.
  34.475 -
  34.476 -An option of ``\texttt{-p1}'' means ``use a strip count of one''.  If
  34.477 -\command{patch} sees a file name \filename{foo/bar/baz} in a file
  34.478 -header, it will strip \filename{foo} and try to patch a file named
  34.479 -\filename{bar/baz}.  (Strictly speaking, the strip count refers to the
  34.480 -number of \emph{path separators} (and the components that go with
  34.481 -them) to strip.  A strip count of one will turn \filename{foo/bar} into
  34.482 -\filename{bar}, but \filename{/foo/bar} (notice the extra leading
  34.483 -slash) into \filename{foo/bar}.)
  34.484 -
  34.485 -The ``standard'' strip count for patches is one; almost all patches
  34.486 -contain one leading path name component that needs to be stripped.
  34.487 -Mercurial's \hgcmd{diff} command generates path names in this form,
  34.488 -and the \hgcmd{import} command and MQ expect patches to have a strip
  34.489 -count of one.
  34.490 -
  34.491 -If you receive a patch from someone that you want to add to your patch
  34.492 -queue, and the patch needs a strip count other than one, you cannot
  34.493 -just \hgxcmd{mq}{qimport} the patch, because \hgxcmd{mq}{qimport} does not yet
  34.494 -have a \texttt{-p} option (see~\bug{311}).  Your best bet is to
  34.495 -\hgxcmd{mq}{qnew} a patch of your own, then use \cmdargs{patch}{-p\emph{N}}
  34.496 -to apply their patch, followed by \hgcmd{addremove} to pick up any
  34.497 -files added or removed by the patch, followed by \hgxcmd{mq}{qrefresh}.
  34.498 -This complexity may become unnecessary; see~\bug{311} for details.
  34.499 -\subsection{Strategies for applying a patch}
  34.500 -
  34.501 -When \command{patch} applies a hunk, it tries a handful of
  34.502 -successively less accurate strategies to try to make the hunk apply.
  34.503 -This falling-back technique often makes it possible to take a patch
  34.504 -that was generated against an old version of a file, and apply it
  34.505 -against a newer version of that file.
  34.506 -
  34.507 -First, \command{patch} tries an exact match, where the line numbers,
  34.508 -the context, and the text to be modified must apply exactly.  If it
  34.509 -cannot make an exact match, it tries to find an exact match for the
  34.510 -context, without honouring the line numbering information.  If this
  34.511 -succeeds, it prints a line of output saying that the hunk was applied,
  34.512 -but at some \emph{offset} from the original line number.
  34.513 -
  34.514 -If a context-only match fails, \command{patch} removes the first and
  34.515 -last lines of the context, and tries a \emph{reduced} context-only
  34.516 -match.  If the hunk with reduced context succeeds, it prints a message
  34.517 -saying that it applied the hunk with a \emph{fuzz factor} (the number
  34.518 -after the fuzz factor indicates how many lines of context
  34.519 -\command{patch} had to trim before the patch applied).
  34.520 -
  34.521 -When neither of these techniques works, \command{patch} prints a
  34.522 -message saying that the hunk in question was rejected.  It saves
  34.523 -rejected hunks (also simply called ``rejects'') to a file with the
  34.524 -same name, and an added \sfilename{.rej} extension.  It also saves an
  34.525 -unmodified copy of the file with a \sfilename{.orig} extension; the
  34.526 -copy of the file without any extensions will contain any changes made
  34.527 -by hunks that \emph{did} apply cleanly.  If you have a patch that
  34.528 -modifies \filename{foo} with six hunks, and one of them fails to
  34.529 -apply, you will have: an unmodified \filename{foo.orig}, a
  34.530 -\filename{foo.rej} containing one hunk, and \filename{foo}, containing
  34.531 -the changes made by the five successful hunks.
  34.532 -
  34.533 -\subsection{Some quirks of patch representation}
  34.534 -
  34.535 -There are a few useful things to know about how \command{patch} works
  34.536 -with files.
  34.537 -\begin{itemize}
  34.538 -\item This should already be obvious, but \command{patch} cannot
  34.539 -  handle binary files.
  34.540 -\item Neither does it care about the executable bit; it creates new
  34.541 -  files as readable, but not executable.
  34.542 -\item \command{patch} treats the removal of a file as a diff between
  34.543 -  the file to be removed and the empty file.  So your idea of ``I
  34.544 -  deleted this file'' looks like ``every line of this file was
  34.545 -  deleted'' in a patch.
  34.546 -\item It treats the addition of a file as a diff between the empty
  34.547 -  file and the file to be added.  So in a patch, your idea of ``I
  34.548 -  added this file'' looks like ``every line of this file was added''.
  34.549 -\item It treats a renamed file as the removal of the old name, and the
  34.550 -  addition of the new name.  This means that renamed files have a big
  34.551 -  footprint in patches.  (Note also that Mercurial does not currently
  34.552 -  try to infer when files have been renamed or copied in a patch.)
  34.553 -\item \command{patch} cannot represent empty files, so you cannot use
  34.554 -  a patch to represent the notion ``I added this empty file to the
  34.555 -  tree''.
  34.556 -\end{itemize}
  34.557 -\subsection{Beware the fuzz}
  34.558 -
  34.559 -While applying a hunk at an offset, or with a fuzz factor, will often
  34.560 -be completely successful, these inexact techniques naturally leave
  34.561 -open the possibility of corrupting the patched file.  The most common
  34.562 -cases typically involve applying a patch twice, or at an incorrect
  34.563 -location in the file.  If \command{patch} or \hgxcmd{mq}{qpush} ever
  34.564 -mentions an offset or fuzz factor, you should make sure that the
  34.565 -modified files are correct afterwards.  
  34.566 -
  34.567 -It's often a good idea to refresh a patch that has applied with an
  34.568 -offset or fuzz factor; refreshing the patch generates new context
  34.569 -information that will make it apply cleanly.  I say ``often,'' not
  34.570 -``always,'' because sometimes refreshing a patch will make it fail to
  34.571 -apply against a different revision of the underlying files.  In some
  34.572 -cases, such as when you're maintaining a patch that must sit on top of
  34.573 -multiple versions of a source tree, it's acceptable to have a patch
  34.574 -apply with some fuzz, provided you've verified the results of the
  34.575 -patching process in such cases.
  34.576 -
  34.577 -\subsection{Handling rejection}
  34.578 -
  34.579 -If \hgxcmd{mq}{qpush} fails to apply a patch, it will print an error
  34.580 -message and exit.  If it has left \sfilename{.rej} files behind, it is
  34.581 -usually best to fix up the rejected hunks before you push more patches
  34.582 -or do any further work.
  34.583 -
  34.584 -If your patch \emph{used to} apply cleanly, and no longer does because
  34.585 -you've changed the underlying code that your patches are based on,
  34.586 -Mercurial Queues can help; see section~\ref{sec:mq:merge} for details.
  34.587 -
  34.588 -Unfortunately, there aren't any great techniques for dealing with
  34.589 -rejected hunks.  Most often, you'll need to view the \sfilename{.rej}
  34.590 -file and edit the target file, applying the rejected hunks by hand.
  34.591 -
  34.592 -If you're feeling adventurous, Neil Brown, a Linux kernel hacker,
  34.593 -wrote a tool called \command{wiggle}~\cite{web:wiggle}, which is more
  34.594 -vigorous than \command{patch} in its attempts to make a patch apply.
  34.595 -
  34.596 -Another Linux kernel hacker, Chris Mason (the author of Mercurial
  34.597 -Queues), wrote a similar tool called
  34.598 -\command{mpatch}~\cite{web:mpatch}, which takes a simple approach to
  34.599 -automating the application of hunks rejected by \command{patch}.  The
  34.600 -\command{mpatch} command can help with four common reasons that a hunk
  34.601 -may be rejected:
  34.602 -
  34.603 -\begin{itemize}
  34.604 -\item The context in the middle of a hunk has changed.
  34.605 -\item A hunk is missing some context at the beginning or end.
  34.606 -\item A large hunk might apply better---either entirely or in
  34.607 -  part---if it was broken up into smaller hunks.
  34.608 -\item A hunk removes lines with slightly different content than those
  34.609 -  currently present in the file.
  34.610 -\end{itemize}
  34.611 -
  34.612 -If you use \command{wiggle} or \command{mpatch}, you should be doubly
  34.613 -careful to check your results when you're done.  In fact,
  34.614 -\command{mpatch} enforces this method of double-checking the tool's
  34.615 -output, by automatically dropping you into a merge program when it has
  34.616 -done its job, so that you can verify its work and finish off any
  34.617 -remaining merges.
  34.618 -
  34.619 -\section{Getting the best performance out of MQ}
  34.620 -\label{sec:mq:perf}
  34.621 -
  34.622 -MQ is very efficient at handling a large number of patches.  I ran
  34.623 -some performance experiments in mid-2006 for a talk that I gave at the
  34.624 -2006 EuroPython conference~\cite{web:europython}.  I used as my data
  34.625 -set the Linux 2.6.17-mm1 patch series, which consists of 1,738
  34.626 -patches.  I applied these on top of a Linux kernel repository
  34.627 -containing all 27,472 revisions between Linux 2.6.12-rc2 and Linux
  34.628 -2.6.17.
  34.629 -
  34.630 -On my old, slow laptop, I was able to
  34.631 -\hgcmdargs{qpush}{\hgxopt{mq}{qpush}{-a}} all 1,738 patches in 3.5 minutes,
  34.632 -and \hgcmdargs{qpop}{\hgxopt{mq}{qpop}{-a}} them all in 30 seconds.  (On a
  34.633 -newer laptop, the time to push all patches dropped to two minutes.)  I
  34.634 -could \hgxcmd{mq}{qrefresh} one of the biggest patches (which made 22,779
  34.635 -lines of changes to 287 files) in 6.6 seconds.
  34.636 -
  34.637 -Clearly, MQ is well suited to working in large trees, but there are a
  34.638 -few tricks you can use to get the best performance out of it.
  34.639 -
  34.640 -First of all, try to ``batch'' operations together.  Every time you
  34.641 -run \hgxcmd{mq}{qpush} or \hgxcmd{mq}{qpop}, these commands scan the working
  34.642 -directory once to make sure you haven't made some changes and then
  34.643 -forgotten to run \hgxcmd{mq}{qrefresh}.  On a small tree, the time that
  34.644 -this scan takes is unnoticeable.  However, on a medium-sized tree
  34.645 -(containing tens of thousands of files), it can take a second or more.
  34.646 -
  34.647 -The \hgxcmd{mq}{qpush} and \hgxcmd{mq}{qpop} commands allow you to push and pop
  34.648 -multiple patches at a time.  You can identify the ``destination
  34.649 -patch'' that you want to end up at.  When you \hgxcmd{mq}{qpush} with a
  34.650 -destination specified, it will push patches until that patch is at the
  34.651 -top of the applied stack.  When you \hgxcmd{mq}{qpop} to a destination, MQ
  34.652 -will pop patches until the destination patch is at the top.
  34.653 -
  34.654 -You can identify a destination patch either by name or
  34.655 -by number.  If you use numeric addressing, patches are
  34.656 -counted from zero; this means that the first patch is zero, the second
  34.657 -is one, and so on.
  34.658 -
  34.659 -\section{Updating your patches when the underlying code changes}
  34.660 -\label{sec:mq:merge}
  34.661 -
  34.662 -It's common to have a stack of patches on top of an underlying
  34.663 -repository that you don't modify directly.  If you're working on
  34.664 -changes to third-party code, or on a feature that is taking long
  34.665 -enough to develop that the code beneath keeps changing, you will often
  34.666 -need to sync up with the underlying code, and fix up any hunks in your
  34.667 -patches that no longer apply.  This is called \emph{rebasing} your
  34.668 -patch series.
  34.669 -
  34.670 -The simplest way to do this is to \hgcmdargs{qpop}{\hgxopt{mq}{qpop}{-a}}
  34.671 -your patches, then \hgcmd{pull} changes into the underlying
  34.672 -repository, and finally \hgcmdargs{qpush}{\hgxopt{mq}{qpush}{-a}} your
  34.673 -patches again.  MQ will stop pushing any time it runs across a patch
  34.674 -that fails to apply because of conflicts, allowing you to fix your
  34.675 -conflicts, \hgxcmd{mq}{qrefresh} the affected patch, and continue pushing
  34.676 -until you have fixed your entire stack.
  34.677 -
  34.678 -This approach is easy to use and works well if you don't expect
  34.679 -changes to the underlying code to affect how well your patches apply.
  34.680 -If your patch stack touches code that is modified frequently or
  34.681 -invasively in the underlying repository, however, fixing up rejected
  34.682 -hunks by hand quickly becomes tiresome.
  34.683 -
  34.684 -It's possible to partially automate the rebasing process.  If your
  34.685 -patches apply cleanly against some revision of the underlying repo, MQ
  34.686 -can use this information to help you to resolve conflicts between your
  34.687 -patches and a different revision.
  34.688 -
  34.689 -The process is a little involved.
  34.690 -\begin{enumerate}
  34.691 -\item To begin, \hgcmdargs{qpush}{-a} all of your patches on top of
  34.692 -  the revision where you know that they apply cleanly.
  34.693 -\item Save a backup copy of your patch directory using
  34.694 -  \hgcmdargs{qsave}{\hgxopt{mq}{qsave}{-e} \hgxopt{mq}{qsave}{-c}}.  This prints
  34.695 -  the name of the directory that it has saved the patches in.  It will
  34.696 -  save the patches to a directory called
  34.697 -  \sdirname{.hg/patches.\emph{N}}, where \texttt{\emph{N}} is a small
  34.698 -  integer.  It also commits a ``save changeset'' on top of your
  34.699 -  applied patches; this is for internal book-keeping, and records the
  34.700 -  states of the \sfilename{series} and \sfilename{status} files.
  34.701 -\item Use \hgcmd{pull} to bring new changes into the underlying
  34.702 -  repository.  (Don't run \hgcmdargs{pull}{-u}; see below for why.)
  34.703 -\item Update to the new tip revision, using
  34.704 -  \hgcmdargs{update}{\hgopt{update}{-C}} to override the patches you
  34.705 -  have pushed.
  34.706 -\item Merge all patches using \hgcmdargs{qpush}{\hgxopt{mq}{qpush}{-m}
  34.707 -    \hgxopt{mq}{qpush}{-a}}.  The \hgxopt{mq}{qpush}{-m} option to \hgxcmd{mq}{qpush}
  34.708 -  tells MQ to perform a three-way merge if the patch fails to apply.
  34.709 -\end{enumerate}
  34.710 -
  34.711 -During the \hgcmdargs{qpush}{\hgxopt{mq}{qpush}{-m}}, each patch in the
  34.712 -\sfilename{series} file is applied normally.  If a patch applies with
  34.713 -fuzz or rejects, MQ looks at the queue you \hgxcmd{mq}{qsave}d, and
  34.714 -performs a three-way merge with the corresponding changeset.  This
  34.715 -merge uses Mercurial's normal merge machinery, so it may pop up a GUI
  34.716 -merge tool to help you to resolve problems.
  34.717 -
  34.718 -When you finish resolving the effects of a patch, MQ refreshes your
  34.719 -patch based on the result of the merge.
  34.720 -
  34.721 -At the end of this process, your repository will have one extra head
  34.722 -from the old patch queue, and a copy of the old patch queue will be in
  34.723 -\sdirname{.hg/patches.\emph{N}}. You can remove the extra head using
  34.724 -\hgcmdargs{qpop}{\hgxopt{mq}{qpop}{-a} \hgxopt{mq}{qpop}{-n} patches.\emph{N}}
  34.725 -or \hgcmd{strip}.  You can delete \sdirname{.hg/patches.\emph{N}} once
  34.726 -you are sure that you no longer need it as a backup.
  34.727 -
  34.728 -\section{Identifying patches}
  34.729 -
  34.730 -MQ commands that work with patches let you refer to a patch either by
  34.731 -using its name or by a number.  By name is obvious enough; pass the
  34.732 -name \filename{foo.patch} to \hgxcmd{mq}{qpush}, for example, and it will
  34.733 -push patches until \filename{foo.patch} is applied.  
  34.734 -
  34.735 -As a shortcut, you can refer to a patch using both a name and a
  34.736 -numeric offset; \texttt{foo.patch-2} means ``two patches before
  34.737 -\texttt{foo.patch}'', while \texttt{bar.patch+4} means ``four patches
  34.738 -after \texttt{bar.patch}''.
  34.739 -
  34.740 -Referring to a patch by index isn't much different.  The first patch
  34.741 -printed in the output of \hgxcmd{mq}{qseries} is patch zero (yes, it's one
  34.742 -of those start-at-zero counting systems); the second is patch one; and
  34.743 -so on.
  34.744 -
  34.745 -MQ also makes it easy to work with patches when you are using normal
  34.746 -Mercurial commands.  Every command that accepts a changeset ID will
  34.747 -also accept the name of an applied patch.  MQ augments the tags
  34.748 -normally in the repository with an eponymous one for each applied
  34.749 -patch.  In addition, the special tags \index{tags!special tag
  34.750 -  names!\texttt{qbase}}\texttt{qbase} and \index{tags!special tag
  34.751 -  names!\texttt{qtip}}\texttt{qtip} identify the ``bottom-most'' and
  34.752 -topmost applied patches, respectively.
  34.753 -
  34.754 -These additions to Mercurial's normal tagging capabilities make
  34.755 -dealing with patches even more of a breeze.
  34.756 -\begin{itemize}
  34.757 -\item Want to patchbomb a mailing list with your latest series of
  34.758 -  changes?
  34.759 -  \begin{codesample4}
  34.760 -    hg email qbase:qtip
  34.761 -  \end{codesample4}
  34.762 -  (Don't know what ``patchbombing'' is?  See
  34.763 -  section~\ref{sec:hgext:patchbomb}.)
  34.764 -\item Need to see all of the patches since \texttt{foo.patch} that
  34.765 -  have touched files in a subdirectory of your tree?
  34.766 -  \begin{codesample4}
  34.767 -    hg log -r foo.patch:qtip \emph{subdir}
  34.768 -  \end{codesample4}
  34.769 -\end{itemize}
  34.770 -
  34.771 -Because MQ makes the names of patches available to the rest of
  34.772 -Mercurial through its normal internal tag machinery, you don't need to
  34.773 -type in the entire name of a patch when you want to identify it by
  34.774 -name.
  34.775 -
  34.776 -\begin{figure}[ht]
  34.777 -  \interaction{mq.id.output}
  34.778 -  \caption{Using MQ's tag features to work with patches}
  34.779 -  \label{ex:mq:id}
  34.780 -\end{figure}
  34.781 -
  34.782 -Another nice consequence of representing patch names as tags is that
  34.783 -when you run the \hgcmd{log} command, it will display a patch's name
  34.784 -as a tag, simply as part of its normal output.  This makes it easy to
  34.785 -visually distinguish applied patches from underlying ``normal''
  34.786 -revisions.  Figure~\ref{ex:mq:id} shows a few normal Mercurial
  34.787 -commands in use with applied patches.
  34.788 -
  34.789 -\section{Useful things to know about}
  34.790 -
  34.791 -There are a number of aspects of MQ usage that don't fit tidily into
  34.792 -sections of their own, but that are good to know.  Here they are, in
  34.793 -one place.
  34.794 -
  34.795 -\begin{itemize}
  34.796 -\item Normally, when you \hgxcmd{mq}{qpop} a patch and \hgxcmd{mq}{qpush} it
  34.797 -  again, the changeset that represents the patch after the pop/push
  34.798 -  will have a \emph{different identity} than the changeset that
  34.799 -  represented the patch beforehand.  See
  34.800 -  section~\ref{sec:mqref:cmd:qpush} for information as to why this is.
  34.801 -\item It's not a good idea to \hgcmd{merge} changes from another
  34.802 -  branch with a patch changeset, at least if you want to maintain the
  34.803 -  ``patchiness'' of that changeset and changesets below it on the
  34.804 -  patch stack.  If you try to do this, it will appear to succeed, but
  34.805 -  MQ will become confused.
  34.806 -\end{itemize}
  34.807 -
  34.808 -\section{Managing patches in a repository}
  34.809 -\label{sec:mq:repo}
  34.810 -
  34.811 -Because MQ's \sdirname{.hg/patches} directory resides outside a
  34.812 -Mercurial repository's working directory, the ``underlying'' Mercurial
  34.813 -repository knows nothing about the management or presence of patches.
  34.814 -
  34.815 -This presents the interesting possibility of managing the contents of
  34.816 -the patch directory as a Mercurial repository in its own right.  This
  34.817 -can be a useful way to work.  For example, you can work on a patch for
  34.818 -a while, \hgxcmd{mq}{qrefresh} it, then \hgcmd{commit} the current state of
  34.819 -the patch.  This lets you ``roll back'' to that version of the patch
  34.820 -later on.
  34.821 -
  34.822 -You can then share different versions of the same patch stack among
  34.823 -multiple underlying repositories.  I use this when I am developing a
  34.824 -Linux kernel feature.  I have a pristine copy of my kernel sources for
  34.825 -each of several CPU architectures, and a cloned repository under each
  34.826 -that contains the patches I am working on.  When I want to test a
  34.827 -change on a different architecture, I push my current patches to the
  34.828 -patch repository associated with that kernel tree, pop and push all of
  34.829 -my patches, and build and test that kernel.
  34.830 -
  34.831 -Managing patches in a repository makes it possible for multiple
  34.832 -developers to work on the same patch series without colliding with
  34.833 -each other, all on top of an underlying source base that they may or
  34.834 -may not control.
  34.835 -
  34.836 -\subsection{MQ support for patch repositories}
  34.837 -
  34.838 -MQ helps you to work with the \sdirname{.hg/patches} directory as a
  34.839 -repository; when you prepare a repository for working with patches
  34.840 -using \hgxcmd{mq}{qinit}, you can pass the \hgxopt{mq}{qinit}{-c} option to
  34.841 -create the \sdirname{.hg/patches} directory as a Mercurial repository.
  34.842 -
  34.843 -\begin{note}
  34.844 -  If you forget to use the \hgxopt{mq}{qinit}{-c} option, you can simply go
  34.845 -  into the \sdirname{.hg/patches} directory at any time and run
  34.846 -  \hgcmd{init}.  Don't forget to add an entry for the
  34.847 -  \sfilename{status} file to the \sfilename{.hgignore} file, though
  34.849 -  (\hgcmdargs{qinit}{\hgxopt{mq}{qinit}{-c}} does this for you
  34.850 -  automatically); you \emph{really} don't want to manage the
  34.851 -  \sfilename{status} file.
  34.852 -\end{note}
  34.853 -
  34.854 -As a convenience, if MQ notices that the \dirname{.hg/patches}
  34.855 -directory is a repository, it will automatically \hgcmd{add} every
  34.856 -patch that you create and import.
  34.857 -
  34.858 -MQ provides a shortcut command, \hgxcmd{mq}{qcommit}, that runs
  34.859 -\hgcmd{commit} in the \sdirname{.hg/patches} directory.  This saves
  34.860 -some bothersome typing.
  34.861 -
  34.862 -Finally, as a convenience to manage the patch directory, you can
  34.863 -define the alias \command{mq} on Unix systems. For example, on Linux
  34.864 -systems using the \command{bash} shell, you can include the following
  34.865 -snippet in your \tildefile{.bashrc}.
  34.866 -
  34.867 -\begin{codesample2}
  34.868 -  alias mq=`hg -R \$(hg root)/.hg/patches'
  34.869 -\end{codesample2}
  34.870 -
  34.871 -You can then issue commands of the form \cmdargs{mq}{pull} from
  34.872 -the main repository.
  34.873 -
  34.874 -\subsection{A few things to watch out for}
  34.875 -
  34.876 -MQ's support for working with a repository full of patches is limited
  34.877 -in a few small respects.
  34.878 -
  34.879 -MQ cannot automatically detect changes that you make to the patch
  34.880 -directory.  If you \hgcmd{pull}, manually edit, or \hgcmd{update}
  34.881 -changes to patches or the \sfilename{series} file, you will have to
  34.882 -\hgcmdargs{qpop}{\hgxopt{mq}{qpop}{-a}} and then
  34.883 -\hgcmdargs{qpush}{\hgxopt{mq}{qpush}{-a}} in the underlying repository to
  34.884 -see those changes show up there.  If you forget to do this, you can
  34.885 -confuse MQ's idea of which patches are applied.
  34.886 -
  34.887 -\section{Third party tools for working with patches}
  34.888 -\label{sec:mq:tools}
  34.889 -
  34.890 -Once you've been working with patches for a while, you'll find
  34.891 -yourself hungry for tools that will help you to understand and
  34.892 -manipulate the patches you're dealing with.
  34.893 -
  34.894 -The \command{diffstat} command~\cite{web:diffstat} generates a
  34.895 -histogram of the modifications made to each file in a patch.  It
  34.896 -provides a good way to ``get a sense of'' a patch---which files it
  34.897 -affects, and how much change it introduces to each file and as a
  34.898 -whole.  (I find that it's a good idea to use \command{diffstat}'s
  34.899 -\cmdopt{diffstat}{-p} option as a matter of course, as otherwise it
  34.900 -will try to do clever things with prefixes of file names that
  34.901 -inevitably confuse at least me.)
  34.902 -
  34.903 -\begin{figure}[ht]
  34.904 -  \interaction{mq.tools.tools}
  34.905 -  \caption{The \command{diffstat}, \command{filterdiff}, and \command{lsdiff} commands}
  34.906 -  \label{ex:mq:tools}
  34.907 -\end{figure}
  34.908 -
  34.909 -The \package{patchutils} package~\cite{web:patchutils} is invaluable.
  34.910 -It provides a set of small utilities that follow the ``Unix
  34.911 -philosophy;'' each does one useful thing with a patch.  The
  34.912 -\package{patchutils} command I use most is \command{filterdiff}, which
  34.913 -extracts subsets from a patch file.  For example, given a patch that
  34.914 -modifies hundreds of files across dozens of directories, a single
  34.915 -invocation of \command{filterdiff} can generate a smaller patch that
  34.916 -only touches files whose names match a particular glob pattern.  See
  34.917 -section~\ref{mq-collab:tips:interdiff} for another example.
  34.918 -
  34.919 -\section{Good ways to work with patches}
  34.920 -
  34.921 -Whether you are working on a patch series to submit to a free software
  34.922 -or open source project, or a series that you intend to treat as a
  34.923 -sequence of regular changesets when you're done, you can use some
  34.924 -simple techniques to keep your work well organised.
  34.925 -
  34.926 -Give your patches descriptive names.  A good name for a patch might be
  34.927 -\filename{rework-device-alloc.patch}, because it will immediately give
  34.928 -you a hint what the purpose of the patch is.  Long names shouldn't be
  34.929 -a problem; you won't be typing the names often, but you \emph{will} be
  34.930 -running commands like \hgxcmd{mq}{qapplied} and \hgxcmd{mq}{qtop} over and over.
  34.931 -Good naming becomes especially important when you have a number of
  34.932 -patches to work with, or if you are juggling a number of different
  34.933 -tasks and your patches only get a fraction of your attention.
  34.934 -
  34.935 -Be aware of what patch you're working on.  Use the \hgxcmd{mq}{qtop}
  34.936 -command and skim over the text of your patches frequently---for
  34.937 -example, using \hgcmdargs{tip}{\hgopt{tip}{-p}}---to be sure of where
  34.938 -you stand.  I have several times worked on and \hgxcmd{mq}{qrefresh}ed a
  34.939 -patch other than the one I intended, and it's often tricky to migrate
  34.940 -changes into the right patch after making them in the wrong one.
  34.941 -
  34.942 -For this reason, it is very much worth investing a little time to
  34.943 -learn how to use some of the third-party tools I described in
  34.944 -section~\ref{sec:mq:tools}, particularly \command{diffstat} and
  34.945 -\command{filterdiff}.  The former will give you a quick idea of what
  34.946 -changes your patch is making, while the latter makes it easy to splice
  34.947 -hunks selectively out of one patch and into another.
  34.948 -
  34.949 -\section{MQ cookbook}
  34.950 -
  34.951 -\subsection{Manage ``trivial'' patches}
  34.952 -
  34.953 -Because the overhead of dropping files into a new Mercurial repository
  34.954 -is so low, it makes a lot of sense to manage patches this way even if
  34.955 -you simply want to make a few changes to a source tarball that you
  34.956 -downloaded.
  34.957 -
  34.958 -Begin by downloading and unpacking the source tarball,
  34.959 -and turning it into a Mercurial repository.
  34.960 -\interaction{mq.tarball.download}
  34.961 -
  34.962 -Continue by creating a patch stack and making your changes.
  34.963 -\interaction{mq.tarball.qinit}
  34.964 -
  34.965 -Let's say a few weeks or months pass, and your package author releases
  34.966 -a new version.  First, bring their changes into the repository.
  34.967 -\interaction{mq.tarball.newsource}
  34.968 -The pipeline starting with \hgcmd{locate} above deletes all files in
  34.969 -the working directory, so that \hgcmd{commit}'s
  34.970 -\hgopt{commit}{--addremove} option can actually tell which files have
  34.971 -really been removed in the newer version of the source.
  34.972 -
  34.973 -Finally, you can apply your patches on top of the new tree.
  34.974 -\interaction{mq.tarball.repush}
  34.975 -
  34.976 -\subsection{Combining entire patches}
  34.977 -\label{sec:mq:combine}
  34.978 -
  34.979 -MQ provides a command, \hgxcmd{mq}{qfold}, that lets you combine entire
  34.980 -patches.  This ``folds'' the patches you name, in the order you name
  34.981 -them, into the topmost applied patch, and concatenates their
  34.982 -descriptions onto the end of its description.  The patches that you
  34.983 -fold must be unapplied before you fold them.
  34.984 -
  34.985 -The order in which you fold patches matters.  If your topmost applied
  34.986 -patch is \texttt{foo}, and you \hgxcmd{mq}{qfold} \texttt{bar} and
  34.987 -\texttt{quux} into it, you will end up with a patch that has the same
  34.988 -effect as if you applied first \texttt{foo}, then \texttt{bar},
  34.989 -followed by \texttt{quux}.
  34.990 -
  34.991 -\subsection{Merging part of one patch into another}
  34.992 -
  34.993 -Merging \emph{part} of one patch into another is more difficult than
  34.994 -combining entire patches.
  34.995 -
  34.996 -If you want to move changes to entire files, you can use
  34.997 -\command{filterdiff}'s \cmdopt{filterdiff}{-i} and
  34.998 -\cmdopt{filterdiff}{-x} options to choose the modifications to snip
  34.999 -out of one patch, concatenating its output onto the end of the patch
 34.1000 -you want to merge into.  You usually won't need to modify the patch
 34.1001 -you've merged the changes from.  Instead, MQ will report some rejected
 34.1002 -hunks when you \hgxcmd{mq}{qpush} it (from the hunks you moved into the
 34.1003 -other patch), and you can simply \hgxcmd{mq}{qrefresh} the patch to drop
 34.1004 -the duplicate hunks.
 34.1005 -
 34.1006 -If you have a patch that has multiple hunks modifying a file, and you
 34.1007 -only want to move a few of those hunks, the job becomes more messy,
 34.1008 -but you can still partly automate it.  Use \cmdargs{lsdiff}{-nvv} to
 34.1009 -print some metadata about the patch.
 34.1010 -\interaction{mq.tools.lsdiff}
 34.1011 -
 34.1012 -This command prints three different kinds of number:
 34.1013 -\begin{itemize}
 34.1014 -\item (in the first column) a \emph{file number} to identify each file
 34.1015 -  modified in the patch;
 34.1016 -\item (on the next line, indented) the line number within a modified
 34.1017 -  file where a hunk starts; and
 34.1018 -\item (on the same line) a \emph{hunk number} to identify that hunk.
 34.1019 -\end{itemize}
 34.1020 -
 34.1021 -You'll have to use some visual inspection, and reading of the patch,
 34.1022 -to identify the file and hunk numbers you'll want, but you can then
 34.1023 -pass them to \command{filterdiff}'s \cmdopt{filterdiff}{--files}
 34.1024 -and \cmdopt{filterdiff}{--hunks} options, to select exactly the file
 34.1025 -and hunk you want to extract.
 34.1026 -
 34.1027 -Once you have this hunk, you can concatenate it onto the end of your
 34.1028 -destination patch and continue with the remainder of
 34.1029 -section~\ref{sec:mq:combine}.
 34.1030 -
 34.1031 -\section{Differences between quilt and MQ}
 34.1032 -
 34.1033 -If you are already familiar with quilt, MQ provides a similar command
 34.1034 -set.  There are a few differences in the way that it works.
 34.1035 -
 34.1036 -You will already have noticed that most quilt commands have MQ
 34.1037 -counterparts that simply begin with a ``\texttt{q}''.  The exceptions
 34.1038 -are quilt's \texttt{add} and \texttt{remove} commands, the
 34.1039 -counterparts for which are the normal Mercurial \hgcmd{add} and
 34.1040 -\hgcmd{remove} commands.  There is no MQ equivalent of the quilt
 34.1041 -\texttt{edit} command.
 34.1042 -
 34.1043 -%%% Local Variables: 
 34.1044 -%%% mode: latex
 34.1045 -%%% TeX-master: "00book"
 34.1046 -%%% End: 

    35.1 --- a/en/preface.tex	Thu Jan 29 22:47:34 2009 -0800
    35.2 +++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
    35.3 @@ -1,67 +0,0 @@
    35.4 -\chapter*{Preface}
    35.5 -\addcontentsline{toc}{chapter}{Preface}
    35.6 -\label{chap:preface}
    35.7 -
    35.8 -Distributed revision control is a relatively new territory, and has
    35.9 -thus far grown due to people's willingness to strike out into
   35.10 -ill-charted territory.
   35.11 -
   35.12 -I am writing a book about distributed revision control because I
   35.13 -believe that it is an important subject that deserves a field guide.
   35.14 -I chose to write about Mercurial because it is the easiest tool to
   35.15 -learn the terrain with, and yet it scales to the demands of real,
   35.16 -challenging environments where many other revision control tools fail.
   35.17 -
   35.18 -\section{This book is a work in progress}
   35.19 -
   35.20 -I am releasing this book while I am still writing it, in the hope that
   35.21 -it will prove useful to others.  I also hope that readers will
   35.22 -contribute as they see fit.
   35.23 -
   35.24 -\section{About the examples in this book}
   35.25 -
   35.26 -This book takes an unusual approach to code samples.  Every example is
   35.27 -``live''---each one is actually the result of a shell script that
   35.28 -executes the Mercurial commands you see.  Every time an image of the
   35.29 -book is built from its sources, all the example scripts are
   35.30 -automatically run, and their current results compared against their
   35.31 -expected results.
   35.32 -
   35.33 -The advantage of this approach is that the examples are always
   35.34 -accurate; they describe \emph{exactly} the behaviour of the version of
   35.35 -Mercurial that's mentioned at the front of the book.  If I update the
   35.36 -version of Mercurial that I'm documenting, and the output of some
   35.37 -command changes, the build fails.
   35.38 -
   35.39 -There is a small disadvantage to this approach, which is that the
   35.40 -dates and times you'll see in examples tend to be ``squashed''
   35.41 -together in a way that they wouldn't be if the same commands were
   35.42 -being typed by a human.  Where a human can issue no more than one
   35.43 -command every few seconds, with any resulting timestamps
   35.44 -correspondingly spread out, my automated example scripts run many
   35.45 -commands in one second.
   35.46 -
   35.47 -As an instance of this, several consecutive commits in an example can
   35.48 -show up as having occurred during the same second.  You can see this
   35.49 -occur in the \hgext{bisect} example in section~\ref{sec:undo:bisect},
   35.50 -for instance.
   35.51 -
   35.52 -So when you're reading examples, don't place too much weight on the
   35.53 -dates or times you see in the output of commands.  But \emph{do} be
   35.54 -confident that the behaviour you're seeing is consistent and
   35.55 -reproducible.
   35.56 -
   35.57 -\section{Colophon---this book is Free}
   35.58 -
   35.59 -This book is licensed under the Open Publication License, and is
   35.60 -produced entirely using Free Software tools.  It is typeset with
   35.61 -\LaTeX{}; illustrations are drawn and rendered with
   35.62 -\href{http://www.inkscape.org/}{Inkscape}.
   35.63 -
   35.64 -The complete source code for this book is published as a Mercurial
   35.65 -repository, at \url{http://hg.serpentine.com/mercurial/book}.
   35.66 -
   35.67 -%%% Local Variables: 
   35.68 -%%% mode: latex
   35.69 -%%% TeX-master: "00book"
   35.70 -%%% End: 

    36.1 --- a/en/srcinstall.tex	Thu Jan 29 22:47:34 2009 -0800
    36.2 +++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
    36.3 @@ -1,53 +0,0 @@
    36.4 -\chapter{Installing Mercurial from source}
    36.5 -\label{chap:srcinstall}
    36.6 -
    36.7 -\section{On a Unix-like system}
    36.8 -\label{sec:srcinstall:unixlike}
    36.9 -
   36.10 -If you are using a Unix-like system that has a sufficiently recent
   36.11 -version of Python (2.3~or newer) available, it is easy to install
   36.12 -Mercurial from source.
   36.13 -\begin{enumerate}
   36.14 -\item Download a recent source tarball from
   36.15 -  \url{http://www.selenic.com/mercurial/download}.
   36.16 -\item Unpack the tarball:
   36.17 -  \begin{codesample4}
   36.18 -    gzip -dc mercurial-\emph{version}.tar.gz | tar xf -
   36.19 -  \end{codesample4}
   36.20 -\item Go into the source directory and run the installer script.  This
   36.21 -  will build Mercurial and install it in your home directory.
   36.22 -  \begin{codesample4}
   36.23 -    cd mercurial-\emph{version}
   36.24 -    python setup.py install --force --home=\$HOME
   36.25 -  \end{codesample4}
   36.26 -\end{enumerate}
   36.27 -Once the install finishes, Mercurial will be in the \texttt{bin}
   36.28 -subdirectory of your home directory.  Don't forget to make sure that
   36.29 -this directory is present in your shell's search path.
   36.30 -
   36.31 -You will probably need to set the \envar{PYTHONPATH} environment
   36.32 -variable so that the Mercurial executable can find the rest of the
   36.33 -Mercurial packages.  For example, on my laptop, I have set it to
   36.34 -\texttt{/home/bos/lib/python}.  The exact path that you will need to
   36.35 -use depends on how Python was built for your system, but should be
   36.36 -easy to figure out.  If you're uncertain, look through the output of
   36.37 -the installer script above, and see where the contents of the
   36.38 -\texttt{mercurial} directory were installed to.
   36.39 -
   36.40 -\section{On Windows}
   36.41 -
   36.42 -Building and installing Mercurial on Windows requires a variety of
   36.43 -tools, a fair amount of technical knowledge, and considerable
   36.44 -patience.  I very much \emph{do not recommend} this route if you are a
   36.45 -``casual user''.  Unless you intend to hack on Mercurial, I strongly
   36.46 -suggest that you use a binary package instead.
   36.47 -
   36.48 -If you are intent on building Mercurial from source on Windows, follow
   36.49 -the ``hard way'' directions on the Mercurial wiki at
   36.50 -\url{http://www.selenic.com/mercurial/wiki/index.cgi/WindowsInstall},
   36.51 -and expect the process to involve a lot of fiddly work.
   36.52 -
   36.53 -%%% Local Variables: 
   36.54 -%%% mode: latex
   36.55 -%%% TeX-master: "00book"
   36.56 -%%% End: 

    37.1 --- a/en/template.tex	Thu Jan 29 22:47:34 2009 -0800
    37.2 +++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
    37.3 @@ -1,475 +0,0 @@
    37.4 -\chapter{Customising the output of Mercurial}
    37.5 -\label{chap:template}
    37.6 -
    37.7 -Mercurial provides a powerful mechanism to let you control how it
    37.8 -displays information.  The mechanism is based on templates.  You can
    37.9 -use templates to generate specific output for a single command, or to
   37.10 -customise the entire appearance of the built-in web interface.
   37.11 -
   37.12 -\section{Using precanned output styles}
   37.13 -\label{sec:style}
   37.14 -
   37.15 -Packaged with Mercurial are some output styles that you can use
   37.16 -immediately.  A style is simply a precanned template that someone
   37.17 -wrote and installed somewhere that Mercurial can find.
   37.18 -
   37.19 -Before we take a look at Mercurial's bundled styles, let's review its
   37.20 -normal output.
   37.21 -
   37.22 -\interaction{template.simple.normal}
   37.23 -
   37.24 -This is somewhat informative, but it takes up a lot of space---five
   37.25 -lines of output per changeset.  The \texttt{compact} style reduces
   37.26 -this to three lines, presented in a sparse manner.
   37.27 -
   37.28 -\interaction{template.simple.compact}
   37.29 -
   37.30 -The \texttt{changelog} style hints at the expressive power of
   37.31 -Mercurial's templating engine.  This style attempts to follow the GNU
   37.32 -Project's changelog guidelines~\cite{web:changelog}.
   37.33 -
   37.34 -\interaction{template.simple.changelog}
   37.35 -
   37.36 -You will not be shocked to learn that Mercurial's default output style
   37.37 -is named \texttt{default}.
   37.38 -
   37.39 -\subsection{Setting a default style}
   37.40 -
   37.41 -You can modify the output style that Mercurial will use for every
   37.42 -command by editing your \hgrc\ file, naming the style you would
   37.43 -prefer to use.
   37.44 -
   37.45 -\begin{codesample2}
   37.46 -  [ui]
   37.47 -  style = compact
   37.48 -\end{codesample2}
   37.49 -
   37.50 -If you write a style of your own, you can use it by either providing
   37.51 -the path to your style file, or copying your style file into a
   37.52 -location where Mercurial can find it (typically the \texttt{templates}
   37.53 -subdirectory of your Mercurial install directory).
   37.54 -
   37.55 -\section{Commands that support styles and templates}
   37.56 -
   37.57 -All of Mercurial's ``\texttt{log}-like'' commands let you use styles
   37.58 -and templates: \hgcmd{incoming}, \hgcmd{log}, \hgcmd{outgoing}, and
   37.59 -\hgcmd{tip}.
   37.60 -
   37.61 -As I write this manual, these are so far the only commands that
   37.62 -support styles and templates.  Since these are the most important
   37.63 -commands that need customisable output, there has been little pressure
   37.64 -from the Mercurial user community to add style and template support to
   37.65 -other commands.
   37.66 -
   37.67 -\section{The basics of templating}
   37.68 -
   37.69 -At its simplest, a Mercurial template is a piece of text.  Some of the
   37.70 -text never changes, while other parts are \emph{expanded}, or replaced
   37.71 -with new text, when necessary.
   37.72 -
   37.73 -Before we continue, let's look again at a simple example of
   37.74 -Mercurial's normal output.
   37.75 -
   37.76 -\interaction{template.simple.normal}
   37.77 -
   37.78 -Now, let's run the same command, but using a template to change its
   37.79 -output.
   37.80 -
   37.81 -\interaction{template.simple.simplest}
   37.82 -
   37.83 -The example above illustrates the simplest possible template; it's
   37.84 -just a piece of static text, printed once for each changeset.  The
   37.85 -\hgopt{log}{--template} option to the \hgcmd{log} command tells
   37.86 -Mercurial to use the given text as the template when printing each
   37.87 -changeset.
   37.88 -
   37.89 -Notice that the template string above ends with the text
   37.90 -``\Verb+\n+''.  This is an \emph{escape sequence}, telling Mercurial
   37.91 -to print a newline at the end of each template item.  If you omit this
   37.92 -newline, Mercurial will run each piece of output together.  See
   37.93 -section~\ref{sec:template:escape} for more details of escape sequences.
   37.94 -
   37.95 -A template that prints a fixed string of text all the time isn't very
   37.96 -useful; let's try something a bit more complex.
   37.97 -
   37.98 -\interaction{template.simple.simplesub}
   37.99 -
  37.100 -As you can see, the string ``\Verb+{desc}+'' in the template has been
  37.101 -replaced in the output with the description of each changeset.  Every
  37.102 -time Mercurial finds text enclosed in curly braces (``\texttt{\{}''
  37.103 -and ``\texttt{\}}''), it will try to replace the braces and text with
  37.104 -the expansion of whatever is inside.  To print a literal curly brace,
  37.105 -you must escape it, as described in section~\ref{sec:template:escape}.
  37.106 -
  37.107 -\section{Common template keywords}
  37.108 -\label{sec:template:keyword}
  37.109 -
  37.110 -You can start writing simple templates immediately using the keywords
  37.111 -below.
  37.112 -
  37.113 -\begin{itemize}
  37.114 -\item[\tplkword{author}] String.  The unmodified author of the changeset.
  37.115 -\item[\tplkword{branches}] String.  The name of the branch on which
  37.116 -  the changeset was committed.  Will be empty if the branch name was
  37.117 -  \texttt{default}.
  37.118 -\item[\tplkword{date}] Date information.  The date when the changeset
  37.119 -  was committed.  This is \emph{not} human-readable; you must pass it
  37.120 -  through a filter that will render it appropriately.  See
  37.121 -  section~\ref{sec:template:filter} for more information on filters.
  37.122 -  The date is expressed as a pair of numbers.  The first number is a
  37.123 -  Unix UTC timestamp (seconds since January 1, 1970); the second is
  37.124 -  the offset of the committer's timezone from UTC, in seconds.
  37.125 -\item[\tplkword{desc}] String.  The text of the changeset description.
  37.126 -\item[\tplkword{files}] List of strings.  All files modified, added, or
  37.127 -  removed by this changeset.
  37.128 -\item[\tplkword{file\_adds}] List of strings.  Files added by this
  37.129 -  changeset.
  37.130 -\item[\tplkword{file\_dels}] List of strings.  Files removed by this
  37.131 -  changeset.
  37.132 -\item[\tplkword{node}] String.  The changeset identification hash, as a
  37.133 -  40-character hexadecimal string.
  37.134 -\item[\tplkword{parents}] List of strings.  The parents of the
  37.135 -  changeset.
  37.136 -\item[\tplkword{rev}] Integer.  The repository-local changeset revision
  37.137 -  number.
  37.138 -\item[\tplkword{tags}] List of strings.  Any tags associated with the
  37.139 -  changeset.
  37.140 -\end{itemize}
  37.141 -
  37.142 -A few simple experiments will show us what to expect when we use these
  37.143 -keywords; you can see the results in
  37.144 -figure~\ref{fig:template:keywords}.
  37.145 -
  37.146 -\begin{figure}
  37.147 -  \interaction{template.simple.keywords}
  37.148 -  \caption{Template keywords in use}
  37.149 -  \label{fig:template:keywords}
  37.150 -\end{figure}
  37.151 -
  37.152 -As we noted above, the date keyword does not produce human-readable
  37.153 -output, so we must treat it specially.  This involves using a
  37.154 -\emph{filter}, about which more in section~\ref{sec:template:filter}.
  37.155 -
  37.156 -\interaction{template.simple.datekeyword}
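The pair of numbers behind the date keyword can also be rendered by hand. The following is a minimal sketch in Python, not Mercurial's own code. The function name `render_isodate` and the sample values are chosen for illustration, and the sketch assumes that a positive offset means a timezone west of UTC, so that local time is the UTC timestamp minus the offset.

```python
from datetime import datetime, timedelta, timezone


def render_isodate(timestamp, offset):
    """Render a (Unix timestamp, offset in seconds) pair as readable text.

    Assumption: a positive offset means "west of UTC", so the local
    timezone is UTC minus the offset.
    """
    tz = timezone(timedelta(seconds=-offset))
    local = datetime.fromtimestamp(timestamp, tz)
    return local.strftime("%Y-%m-%d %H:%M:%S %z")


# Sample pair: a timestamp in September 2006, seven hours west of UTC.
print(render_isodate(1157407993, 25200))
```

Under that assumption, an offset of 25200 seconds (seven hours) is printed as the zone `-0700`.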
  37.157 -
  37.158 -\section{Escape sequences}
  37.159 -\label{sec:template:escape}
  37.160 -
  37.161 -Mercurial's templating engine recognises the most commonly used escape
  37.162 -sequences in strings.  When it sees a backslash (``\Verb+\+'')
  37.163 -character, it looks at the following character and substitutes the two
  37.164 -characters with a single replacement, as described below.  The ASCII codes are given in octal.
  37.165 -
  37.166 -\begin{itemize}
  37.167 -\item[\Verb+\textbackslash\textbackslash+] Backslash, ``\Verb+\+'',
  37.168 -  ASCII~134.
  37.169 -\item[\Verb+\textbackslash n+] Newline, ASCII~12.
  37.170 -\item[\Verb+\textbackslash r+] Carriage return, ASCII~15.
  37.171 -\item[\Verb+\textbackslash t+] Tab, ASCII~11.
  37.172 -\item[\Verb+\textbackslash v+] Vertical tab, ASCII~13.
  37.173 -\item[\Verb+\textbackslash \{+] Open curly brace, ``\Verb+{+'', ASCII~173.
  37.174 -\item[\Verb+\textbackslash \}+] Close curly brace, ``\Verb+}+'', ASCII~175.
  37.175 -\end{itemize}
  37.176 -
  37.177 -As indicated above, if you want the expansion of a template to contain
  37.178 -a literal ``\Verb+\+'', ``\Verb+{+'', or ``\Verb+}+'' character, you
  37.179 -must escape it.
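The ASCII codes in the list above are octal values. A short Python sketch can confirm this by printing each replacement character's code in octal and in decimal. The table `ESCAPES` and the helper `octal_code` are names invented for this illustration and are not part of Mercurial.

```python
# Template escape sequences (as typed) mapped to the character they produce.
ESCAPES = {
    r"\\": "\\",
    r"\n": "\n",
    r"\r": "\r",
    r"\t": "\t",
    r"\v": "\v",
    r"\{": "{",
    r"\}": "}",
}


def octal_code(ch):
    """Return the character's code point written in octal, without a prefix."""
    return format(ord(ch), "o")


for seq, ch in ESCAPES.items():
    # Columns: escape sequence, octal code, decimal code.
    print(seq, octal_code(ch), ord(ch))
```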
  37.180 -
  37.181 -\section{Filtering keywords to change their results}
  37.182 -\label{sec:template:filter}
  37.183 -
  37.184 -Some of the results of template expansion are not immediately easy to
  37.185 -use.  Mercurial lets you specify an optional chain of \emph{filters}
  37.186 -to modify the result of expanding a keyword.  You have already seen a
  37.187 -common filter, \tplkwfilt{date}{isodate}, in action above, to make a
  37.188 -date readable.
  37.189 -
  37.190 -Below is a list of the most commonly used filters that Mercurial
  37.191 -supports.  While some filters can be applied to any text, others can
  37.192 -only be used in specific circumstances.  The name of each filter is
  37.193 -followed first by an indication of where it can be used, then a
  37.194 -description of its effect.
  37.195 -
  37.196 -\begin{itemize}
  37.197 -\item[\tplfilter{addbreaks}] Any text. Add an XHTML ``\Verb+<br/>+''
  37.198 -  tag before the end of every line except the last.  For example,
  37.199 -  ``\Verb+foo\nbar+'' becomes ``\Verb+foo<br/>\nbar+''.
  37.200 -\item[\tplkwfilt{date}{age}] \tplkword{date} keyword.  Render the
  37.201 -  age of the date, relative to the current time.  Yields a string like
  37.202 -  ``\Verb+10 minutes+''.
  37.203 -\item[\tplfilter{basename}] Any text, but most useful for the
  37.204 -  \tplkword{files} keyword and its relatives.  Treat the text as a
  37.205 -  path, and return the basename. For example, ``\Verb+foo/bar/baz+''
  37.206 -  becomes ``\Verb+baz+''.
  37.207 -\item[\tplkwfilt{date}{date}] \tplkword{date} keyword.  Render a date
  37.208 -  in a similar format to the Unix \command{date} command, but with
  37.209 -  timezone included.  Yields a string like
  37.210 -  ``\Verb+Mon Sep 04 15:13:13 2006 -0700+''.
  37.211 -\item[\tplkwfilt{author}{domain}] Any text, but most useful for the
  37.212 -  \tplkword{author} keyword.  Find the first string that looks like
  37.213 -  an email address, and extract just the domain component.  For
  37.214 -  example, ``\Verb+Bryan O'Sullivan <bos@serpentine.com>+'' becomes
  37.215 -  ``\Verb+serpentine.com+''.
  37.216 -\item[\tplkwfilt{author}{email}] Any text, but most useful for the
  37.217 -  \tplkword{author} keyword.  Extract the first string that looks like
  37.218 -  an email address.  For example,
  37.219 -  ``\Verb+Bryan O'Sullivan <bos@serpentine.com>+'' becomes
  37.220 -  ``\Verb+bos@serpentine.com+''.
  37.221 -\item[\tplfilter{escape}] Any text.  Replace the special XML/XHTML
  37.222 -  characters ``\Verb+&+'', ``\Verb+<+'' and ``\Verb+>+'' with
  37.223 -  XML entities.
  37.224 -\item[\tplfilter{fill68}] Any text.  Wrap the text to fit in 68
  37.225 -  columns.  This is useful before you pass text through the
  37.226 -  \tplfilter{tabindent} filter, and still want it to fit in an
  37.227 -  80-column fixed-font window.
  37.228 -\item[\tplfilter{fill76}] Any text.  Wrap the text to fit in 76
  37.229 -  columns.
  37.230 -\item[\tplfilter{firstline}] Any text.  Yield the first line of text,
  37.231 -  without any trailing newlines.
  37.232 -\item[\tplkwfilt{date}{hgdate}] \tplkword{date} keyword.  Render the
  37.233 -  date as a pair of readable numbers.  Yields a string like
  37.234 -  ``\Verb+1157407993 25200+''.
  37.235 -\item[\tplkwfilt{date}{isodate}] \tplkword{date} keyword.  Render the
  37.236 -  date as a text string in ISO~8601 format.  Yields a string like
  37.237 -  ``\Verb+2006-09-04 15:13:13 -0700+''.
  37.238 -\item[\tplfilter{obfuscate}] Any text, but most useful for the
  37.239 -  \tplkword{author} keyword.  Yield the input text rendered as a
  37.240 -  sequence of XML entities.  This helps to defeat some particularly
  37.241 -  stupid screen-scraping email harvesting spambots.
  37.242 -\item[\tplkwfilt{author}{person}] Any text, but most useful for the
  37.243 -  \tplkword{author} keyword.  Yield the text before an email address.
  37.244 -  For example, ``\Verb+Bryan O'Sullivan <bos@serpentine.com>+''
  37.245 -  becomes ``\Verb+Bryan O'Sullivan+''.
  37.246 -\item[\tplkwfilt{date}{rfc822date}] \tplkword{date} keyword.  Render a
  37.247 -  date using the same format used in email headers.  Yields a string
  37.248 -  like ``\Verb+Mon, 04 Sep 2006 15:13:13 -0700+''.
  37.249 -\item[\tplkwfilt{node}{short}] Changeset hash.  Yield the short form
  37.250 -  of a changeset hash, i.e.~a 12-character hexadecimal string.
  37.251 -\item[\tplkwfilt{date}{shortdate}] \tplkword{date} keyword.  Render
  37.252 -  the year, month, and day of the date.  Yields a string like
  37.253 -  ``\Verb+2006-09-04+''.
  37.254 -\item[\tplfilter{strip}] Any text.  Strip all leading and trailing
  37.255 -  whitespace from the string.
  37.256 -\item[\tplfilter{tabindent}] Any text.  Yield the text, with every line
  37.257 -  except the first starting with a tab character.
  37.258 -\item[\tplfilter{urlescape}] Any text.  Escape all characters that are
  37.259 -  considered ``special'' by URL parsers.  For example, \Verb+foo bar+
  37.260 -  becomes \Verb+foo%20bar+.
  37.261 -\item[\tplkwfilt{author}{user}] Any text, but most useful for the
  37.262 -  \tplkword{author} keyword.  Return the ``user'' portion of an email
  37.263 -  address.  For example,
  37.264 -  ``\Verb+Bryan O'Sullivan <bos@serpentine.com>+'' becomes
  37.265 -  ``\Verb+bos+''.
  37.266 -\end{itemize}
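To make the four author-related filters concrete, here is a rough Python approximation of what they extract. It is a sketch only: the regular expression is a simplification chosen for this example, and Mercurial's own implementation may handle unusual author strings differently.

```python
import re

# Simplified pattern for "something that looks like an email address".
# This is an assumption for illustration, not Mercurial's actual rule.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w.-]+")


def email(author):
    """Return the first email-like string, or "" if there is none."""
    match = EMAIL_RE.search(author)
    return match.group(0) if match else ""


def person(author):
    """Return the text before the email address."""
    return author.split("<", 1)[0].strip()


def user(author):
    """Return the part of the email address before the '@'."""
    return email(author).split("@", 1)[0]


def domain(author):
    """Return the part of the email address after the '@'."""
    addr = email(author)
    return addr.split("@", 1)[1] if "@" in addr else ""


author = "Bryan O'Sullivan <bos@serpentine.com>"
for f in (person, email, user, domain):
    print(f.__name__, "->", f(author))
```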
  37.267 -
  37.268 -\begin{figure}
  37.269 -  \interaction{template.simple.manyfilters}
  37.270 -  \caption{Template filters in action}
  37.271 -  \label{fig:template:filters}
  37.272 -\end{figure}
  37.273 -
  37.274 -\begin{note}
  37.275 -  If you try to apply a filter to a piece of data that it cannot
  37.276 -  process, Mercurial will fail and print a Python exception.  For
  37.277 -  example, trying to run the output of the \tplkword{desc} keyword
  37.278 -  into the \tplkwfilt{date}{isodate} filter is not a good idea.
  37.279 -\end{note}
  37.280 -
  37.281 -\subsection{Combining filters}
  37.282 -
  37.283 -It is easy to combine filters to yield output in the form you would
  37.284 -like.  The following chain of filters tidies up a description, then
  37.285 -makes sure that it fits cleanly into 68 columns, then indents it by a
  37.286 -further 8~characters (at least on Unix-like systems, where a tab is
  37.287 -conventionally 8~characters wide).
  37.288 -
  37.289 -\interaction{template.simple.combine}
  37.290 -
  37.291 -Note the use of ``\Verb+\t+'' (a tab character) in the template to
  37.292 -force the first line to be indented; this is necessary since
  37.293 -\tplfilter{tabindent} indents all lines \emph{except} the first.
  37.294 -
  37.295 -Keep in mind that the order of filters in a chain is significant.  The
  37.296 -first filter is applied to the result of the keyword; the second to
  37.297 -the result of the first filter; and so on.  For example, using
  37.298 -\Verb+fill68|tabindent+ gives very different results from
  37.299 -\Verb+tabindent|fill68+.
  37.300 -
  37.301 -
  37.302 -\section{From templates to styles}
  37.303 -
  37.304 -A command line template provides a quick and simple way to format some
  37.305 -output.  Templates can become verbose, though, and it's useful to be
  37.306 -able to give a template a name.  A style file is a template with a
  37.307 -name, stored in a file.
  37.308 -
  37.309 -More than that, using a style file unlocks the power of Mercurial's
  37.310 -templating engine in ways that are not possible using the command line
  37.311 -\hgopt{log}{--template} option.
  37.312 -
  37.313 -\subsection{The simplest of style files}
  37.314 -
  37.315 -Our simple style file contains just one line:
  37.316 -
  37.317 -\interaction{template.simple.rev}
  37.318 -
  37.319 -This tells Mercurial, ``if you're printing a changeset, use the text
  37.320 -on the right as the template''.
  37.321 -
  37.322 -\subsection{Style file syntax}
  37.323 -
  37.324 -The syntax rules for a style file are simple.
  37.325 -
  37.326 -\begin{itemize}
  37.327 -\item The file is processed one line at a time.
  37.328 -
  37.329 -\item Leading and trailing white space is ignored.
  37.330 -
  37.331 -\item Empty lines are skipped.
  37.332 -
  37.333 -\item If a line starts with either of the characters ``\texttt{\#}'' or
  37.334 -  ``\texttt{;}'', the entire line is treated as a comment, and skipped
  37.335 -  as if empty.
  37.336 -
  37.337 -\item A line starts with a keyword.  This must start with an
  37.338 -  alphabetic character or underscore, and can subsequently contain any
  37.339 -  alphanumeric character or underscore.  (In regexp notation, a
  37.340 -  keyword must match \Verb+[A-Za-z_][A-Za-z0-9_]*+.)
  37.341 -
  37.342 -\item The next element must be an ``\texttt{=}'' character, which can
  37.343 -  be preceded or followed by an arbitrary amount of white space.
  37.344 -
  37.345 -\item If the rest of the line starts and ends with matching quote
  37.346 -  characters (either single or double quote), it is treated as a
  37.347 -  template body.
  37.348 -
  37.349 -\item If the rest of the line \emph{does not} start with a quote
  37.350 -  character, it is treated as the name of a file; the contents of this
  37.351 -  file will be read and used as a template body.
  37.352 -\end{itemize}
  37.353 -
  37.354 -\section{Style files by example}
  37.355 -
  37.356 -To illustrate how to write a style file, we will construct a few by
  37.357 -example.  Rather than provide a complete style file and walk through
  37.358 -it, we'll mirror the usual process of developing a style file by
  37.359 -starting with something very simple, and walking through a series of
  37.360 -successively more complete examples.
  37.361 -
  37.362 -\subsection{Identifying mistakes in style files}
  37.363 -
  37.364 -If Mercurial encounters a problem in a style file you are working on,
  37.365 -it prints a terse error message that, once you figure out what it
  37.366 -means, is actually quite useful.
  37.367 -
  37.368 -\interaction{template.svnstyle.syntax.input}
  37.369 -
  37.370 -Notice that \filename{broken.style} attempts to define a
  37.371 -\texttt{changeset} keyword, but forgets to give any content for it.
  37.372 -When instructed to use this style file, Mercurial promptly complains.
  37.373 -
  37.374 -\interaction{template.svnstyle.syntax.error}
  37.375 -
  37.376 -This error message looks intimidating, but it is not too hard to
  37.377 -follow.
  37.378 -
  37.379 -\begin{itemize}
  37.380 -\item The first component is simply Mercurial's way of saying ``I am
  37.381 -  giving up''.
  37.382 -  \begin{codesample4}
  37.383 -    \textbf{abort:} broken.style:1: parse error
  37.384 -  \end{codesample4}
  37.385 -
  37.386 -\item Next comes the name of the style file that contains the error.
  37.387 -  \begin{codesample4}
  37.388 -    abort: \textbf{broken.style}:1: parse error
  37.389 -  \end{codesample4}
  37.390 -
  37.391 -\item Following the file name is the line number where the error was
  37.392 -  encountered.
  37.393 -  \begin{codesample4}
  37.394 -    abort: broken.style:\textbf{1}: parse error
  37.395 -  \end{codesample4}
  37.396 -
  37.397 -\item Finally, a description of what went wrong.
  37.398 -  \begin{codesample4}
  37.399 -    abort: broken.style:1: \textbf{parse error}
  37.400 -  \end{codesample4}
  37.401 -  The description of the problem is not always clear (as in this
  37.402 -  case), but even when it is cryptic, it is almost always trivial to
  37.403 -  visually inspect the offending line in the style file and see what
  37.404 -  is wrong.
  37.405 -\end{itemize}
  37.406 -
  37.407 -\subsection{Uniquely identifying a repository}
  37.408 -
  37.409 -If you would like to be able to identify a Mercurial repository
  37.410 -``fairly uniquely'' using a short string as an identifier, you can
  37.411 -use the first revision in the repository.
  37.412 -\interaction{template.svnstyle.id} 
  37.413 -This is not guaranteed to be unique, but it is nevertheless useful in
  37.414 -many cases.
  37.415 -\begin{itemize}
  37.416 -\item It will not work in a completely empty repository, because such
  37.417 -  a repository does not have a revision~zero.
  37.418 -\item Neither will it work in the (extremely rare) case where a
  37.419 -  repository is a merge of two or more formerly independent
  37.420 -  repositories, and you still have those repositories around.
  37.421 -\end{itemize}
  37.422 -Here are some uses to which you could put this identifier:
  37.423 -\begin{itemize}
  37.424 -\item As a key into a table for a database that manages repositories
  37.425 -  on a server.
  37.426 -\item As half of a \{\emph{repository~ID}, \emph{revision~ID}\} tuple.
  37.427 -  Save this information away when you run an automated build or other
  37.428 -  activity, so that you can ``replay'' the build later if necessary.
  37.429 -\end{itemize}
  37.430 -
  37.431 -\subsection{Mimicking Subversion's output}
  37.432 -
  37.433 -Let's try to emulate the default output format used by another
  37.434 -revision control tool, Subversion.
  37.435 -\interaction{template.svnstyle.short}
  37.436 -
  37.437 -Since Subversion's output style is fairly simple, it is easy to
  37.438 -copy-and-paste a hunk of its output into a file, and replace the text
  37.439 -produced above by Subversion with the template values we'd like to see
  37.440 -expanded.
  37.441 -\interaction{template.svnstyle.template}
  37.442 -
  37.443 -There are a few small ways in which this template deviates from the
  37.444 -output produced by Subversion.
  37.445 -\begin{itemize}
  37.446 -\item Subversion prints a ``readable'' date (the ``\texttt{Wed, 27 Sep
  37.447 -    2006}'' in the example output above) in parentheses.  Mercurial's
  37.448 -  templating engine does not provide a way to display a date in this
  37.449 -  format without also printing the time and time zone.
  37.450 -\item We emulate Subversion's printing of ``separator'' lines full of
  37.451 -  ``\texttt{-}'' characters by ending the template with such a line.
  37.452 -  We use the templating engine's \tplkword{header} keyword to print a
  37.453 -  separator line as the first line of output (see below), thus
  37.454 -  achieving similar output to Subversion.
  37.455 -\item Subversion's output includes a count in the header of the number
  37.456 -  of lines in the commit message.  We cannot replicate this in
  37.457 -  Mercurial; the templating engine does not currently provide a filter
  37.458 -  that counts the number of lines the template generates.
  37.459 -\end{itemize}
  37.460 -It took me no more than a minute or two of work to replace literal
  37.461 -text from an example of Subversion's output with some keywords and
  37.462 -filters to give the template above.  The style file simply refers to
  37.463 -the template.
  37.464 -\interaction{template.svnstyle.style}
  37.465 -
  37.466 -We could have included the text of the template file directly in the
  37.467 -style file by enclosing it in quotes and replacing the newlines with
  37.468 -``\verb!\n!'' sequences, but it would have made the style file too
  37.469 -difficult to read.  Readability is a good guide when you're trying to
  37.470 -decide whether some text belongs in a style file, or in a template
  37.471 -file that the style file points to.  If the style file will look too
  37.472 -big or cluttered if you insert a literal piece of text, drop it into a
  37.473 -template instead.
  37.474 -
  37.475 -%%% Local Variables: 
  37.476 -%%% mode: latex
  37.477 -%%% TeX-master: "00book"
  37.478 -%%% End: 

    38.1 --- a/en/tour-basic.tex	Thu Jan 29 22:47:34 2009 -0800
    38.2 +++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
    38.3 @@ -1,624 +0,0 @@
    38.4 -\chapter{A tour of Mercurial: the basics}
    38.5 -\label{chap:tour-basic}
    38.6 -
    38.7 -\section{Installing Mercurial on your system}
    38.8 -\label{sec:tour:install}
    38.9 -
   38.10 -Prebuilt binary packages of Mercurial are available for every popular
   38.11 -operating system.  These make it easy to start using Mercurial on your
   38.12 -computer immediately.
   38.13 -
   38.14 -\subsection{Linux}
   38.15 -
   38.16 -Because each Linux distribution has its own packaging tools, policies,
   38.17 -and rate of development, it's difficult to give a comprehensive set of
   38.18 -instructions on how to install Mercurial binaries.  The version of
   38.19 -Mercurial that you will end up with can vary depending on how active
   38.20 -the person is who maintains the package for your distribution.
   38.21 -
   38.22 -To keep things simple, I will focus on installing Mercurial from the
   38.23 -command line under the most popular Linux distributions.  Most of
   38.24 -these distributions provide graphical package managers that will let
   38.25 -you install Mercurial with a single click; the package name to look
   38.26 -for is \texttt{mercurial}.
   38.27 -
   38.28 -\begin{itemize}
   38.29 -\item[Debian]
   38.30 -  \begin{codesample4}
   38.31 -    apt-get install mercurial
   38.32 -  \end{codesample4}
   38.33 -
   38.34 -\item[Fedora Core]
   38.35 -  \begin{codesample4}
   38.36 -    yum install mercurial
   38.37 -  \end{codesample4}
   38.38 -
   38.39 -\item[Gentoo]
   38.40 -  \begin{codesample4}
   38.41 -    emerge mercurial
   38.42 -  \end{codesample4}
   38.43 -
   38.44 -\item[OpenSUSE]
   38.45 -  \begin{codesample4}
   38.46 -    zypper install mercurial
   38.47 -  \end{codesample4}
   38.48 -
   38.49 -\item[Ubuntu] Ubuntu's Mercurial package is based on Debian's.  To
   38.50 -  install it, run the following command.
   38.51 -  \begin{codesample4}
   38.52 -    apt-get install mercurial
   38.53 -  \end{codesample4}
   38.54 -  The Ubuntu package for Mercurial tends to lag behind the Debian
   38.55 -  version by a considerable time margin (at the time of writing, seven
   38.56 -  months), which in some cases will mean that on Ubuntu, you may run
   38.57 -  into problems that have since been fixed in the Debian package.
   38.58 -\end{itemize}
   38.59 -
   38.60 -\subsection{Solaris}
   38.61 -
   38.62 -SunFreeWare, at \url{http://www.sunfreeware.com}, is a good source for a
   38.63 -large number of pre-built Solaris packages for 32 and 64 bit Intel and
   38.64 -Sparc architectures, including current versions of Mercurial.
   38.65 -
   38.66 -\subsection{Mac OS X}
   38.67 -
   38.68 -Lee Cantey publishes an installer of Mercurial for Mac OS~X at
   38.69 -\url{http://mercurial.berkwood.com}.  This package works on both
   38.70 -Intel-~and Power-based Macs.  Before you can use it, you must install
   38.71 -a compatible version of Universal MacPython~\cite{web:macpython}.  This
   38.72 -is easy to do; simply follow the instructions on Lee's site.
   38.73 -
   38.74 -It's also possible to install Mercurial using Fink or MacPorts,
   38.75 -two popular free package managers for Mac OS X.  If you have Fink,
   38.76 -use \command{sudo apt-get install mercurial-py25}.  If you have MacPorts,
   38.77 -use \command{sudo port install mercurial}.
   38.78 -
   38.79 -\subsection{Windows}
   38.80 -
   38.81 -Lee Cantey publishes an installer of Mercurial for Windows at
   38.82 -\url{http://mercurial.berkwood.com}.  This package has no external
   38.83 -dependencies; it ``just works''.
   38.84 -
   38.85 -\begin{note}
   38.86 -  The Windows version of Mercurial does not automatically convert line
   38.87 -  endings between Windows and Unix styles.  If you want to share work
   38.88 -  with Unix users, you must do a little additional configuration
   38.89 -  work. XXX Flesh this out.
   38.90 -\end{note}
   38.91 -
   38.92 -\section{Getting started}
   38.93 -
   38.94 -To begin, we'll use the \hgcmd{version} command to find out whether
   38.95 -Mercurial is actually installed properly.  The actual version
   38.96 -information that it prints isn't so important; it's whether it prints
   38.97 -anything at all that we care about.
   38.98 -\interaction{tour.version}
   38.99 -
  38.100 -\subsection{Built-in help}
  38.101 -
  38.102 -Mercurial provides a built-in help system.  This is invaluable for those
  38.103 -times when you find yourself stuck trying to remember how to run a
  38.104 -command.  If you are completely stuck, simply run \hgcmd{help}; it
  38.105 -will print a brief list of commands, along with a description of what
  38.106 -each does.  If you ask for help on a specific command (as below), it
  38.107 -prints more detailed information.
  38.108 -\interaction{tour.help}
  38.109 -For a more impressive level of detail (which you won't usually need)
  38.110 -run \hgcmdargs{help}{\hggopt{-v}}.  The \hggopt{-v} option is short
  38.111 -for \hggopt{--verbose}, and tells Mercurial to print more information
  38.112 -than it usually would.
  38.113 -
  38.114 -\section{Working with a repository}
  38.115 -
  38.116 -In Mercurial, everything happens inside a \emph{repository}.  The
  38.117 -repository for a project contains all of the files that ``belong to''
  38.118 -that project, along with a historical record of the project's files.
  38.119 -
  38.120 -There's nothing particularly magical about a repository; it is simply
  38.121 -a directory tree in your filesystem that Mercurial treats as special.
  38.122 -You can rename or delete a repository any time you like, using either the
  38.123 -command line or your file browser.
  38.124 -
  38.125 -\subsection{Making a local copy of a repository}
  38.126 -
  38.127 -\emph{Copying} a repository is just a little bit special.  While you
  38.128 -could use a normal file copying command to make a copy of a
  38.129 -repository, it's best to use a built-in command that Mercurial
  38.130 -provides.  This command is called \hgcmd{clone}, because it creates an
  38.131 -identical copy of an existing repository.
  38.132 -\interaction{tour.clone}
  38.133 -If our clone succeeded, we should now have a local directory called
  38.134 -\dirname{hello}.  This directory will contain some files.
  38.135 -\interaction{tour.ls}
  38.136 -These files have the same contents and history in our repository as
  38.137 -they do in the repository we cloned.
  38.138 -
  38.139 -Every Mercurial repository is complete, self-contained, and
  38.140 -independent.  It contains its own private copy of a project's files
  38.141 -and history.  A cloned repository remembers the location of the
  38.142 -repository it was cloned from, but it does not communicate with that
  38.143 -repository, or any other, unless you tell it to.
  38.144 -
  38.145 -What this means for now is that we're free to experiment with our
  38.146 -repository, safe in the knowledge that it's a private ``sandbox'' that
  38.147 -won't affect anyone else.
  38.148 -
  38.149 -\subsection{What's in a repository?}
  38.150 -
  38.151 -When we take a more detailed look inside a repository, we can see that
  38.152 -it contains a directory named \dirname{.hg}.  This is where Mercurial
  38.153 -keeps all of its metadata for the repository.
  38.154 -\interaction{tour.ls-a}
  38.155 -
  38.156 -The contents of the \dirname{.hg} directory and its subdirectories are
  38.157 -private to Mercurial.  Every other file and directory in the
  38.158 -repository is yours to do with as you please.
  38.159 -
  38.160 -To introduce a little terminology, the \dirname{.hg} directory is the
  38.161 -``real'' repository, and all of the files and directories that coexist
  38.162 -with it are said to live in the \emph{working directory}.  An easy way
  38.163 -to remember the distinction is that the \emph{repository} contains the
  38.164 -\emph{history} of your project, while the \emph{working directory}
  38.165 -contains a \emph{snapshot} of your project at a particular point in
  38.166 -history.
  38.167 -
  38.168 -\section{A tour through history}
  38.169 -
  38.170 -One of the first things we might want to do with a new, unfamiliar
  38.171 -repository is understand its history.  The \hgcmd{log} command gives
  38.172 -us a view of history.
  38.173 -\interaction{tour.log}
  38.174 -By default, this command prints a brief paragraph of output for each
  38.175 -change to the project that was recorded.  In Mercurial terminology, we
  38.176 -call each of these recorded events a \emph{changeset}, because it can
  38.177 -contain a record of changes to several files.
  38.178 -
  38.179 -The fields in a record of output from \hgcmd{log} are as follows.
  38.180 -\begin{itemize}
  38.181 -\item[\texttt{changeset}] This field has the format of a number,
  38.182 -  followed by a colon, followed by a hexadecimal string.  These are
  38.183 -  \emph{identifiers} for the changeset.  There are two identifiers
  38.184 -  because the number is shorter and easier to type than the hex
  38.185 -  string.
  38.186 -\item[\texttt{user}] The identity of the person who created the
  38.187 -  changeset.  This is a free-form field, but it most often contains a
  38.188 -  person's name and email address.
  38.189 -\item[\texttt{date}] The date and time on which the changeset was
  38.190 -  created, and the timezone in which it was created.  (The date and
  38.191 -  time are local to that timezone; they display what time and date it
  38.192 -  was for the person who created the changeset.)
  38.193 -\item[\texttt{summary}] The first line of the text message that the
  38.194 -  creator of the changeset entered to describe the changeset.
  38.195 -\end{itemize}
  38.196 -The default output printed by \hgcmd{log} is purely a summary; it is
  38.197 -missing a lot of detail.
  38.198 -
  38.199 -Figure~\ref{fig:tour-basic:history} provides a graphical representation of
  38.200 -the history of the \dirname{hello} repository, to make it a little
  38.201 -easier to see which direction history is ``flowing'' in.  We'll be
  38.202 -returning to this figure several times in this chapter and the chapter
  38.203 -that follows.
  38.204 -
  38.205 -\begin{figure}[ht]
  38.206 -  \centering
  38.207 -  \grafix{tour-history}
  38.208 -  \caption{Graphical history of the \dirname{hello} repository}
  38.209 -  \label{fig:tour-basic:history}
  38.210 -\end{figure}
  38.211 -
  38.212 -\subsection{Changesets, revisions, and talking to other 
  38.213 -  people}
  38.214 -
  38.215 -As English is a notoriously sloppy language, and computer science has
  38.216 -a hallowed history of terminological confusion (why use one term when
  38.217 -four will do?), revision control has a variety of words and phrases
  38.218 -that mean the same thing.  If you are talking about Mercurial history
  38.219 -with other people, you will find that the word ``changeset'' is often
  38.220 -compressed to ``change'' or (when written) ``cset'', and sometimes a
  38.221 -changeset is referred to as a ``revision'' or a ``rev''.
  38.222 -
  38.223 -While it doesn't matter what \emph{word} you use to refer to the
  38.224 -concept of ``a~changeset'', the \emph{identifier} that you use to
  38.225 -refer to ``a~\emph{specific} changeset'' is of great importance.
  38.226 -Recall that the \texttt{changeset} field in the output from
  38.227 -\hgcmd{log} identifies a changeset using both a number and a
  38.228 -hexadecimal string.
  38.229 -\begin{itemize}
  38.230 -\item The revision number is \emph{only valid in that repository},
  38.231 -\item while the hex string is the \emph{permanent, unchanging
  38.232 -    identifier} that will always identify that exact changeset in
  38.233 -  \emph{every} copy of the repository.
  38.234 -\end{itemize}
  38.235 -This distinction is important.  If you send someone an email talking
  38.236 -about ``revision~33'', there's a high likelihood that their
  38.237 -revision~33 will \emph{not be the same} as yours.  The reason for this
  38.238 -is that a revision number depends on the order in which changes
  38.239 -arrived in a repository, and there is no guarantee that the same
  38.240 -changes will happen in the same order in different repositories.
  38.241 -Three changes $a,b,c$ can easily appear in one repository as $0,1,2$,
  38.242 -while in another as $1,0,2$.
  38.243 -
  38.244 -Mercurial uses revision numbers purely as a convenient shorthand.  If
  38.245 -you need to discuss a changeset with someone, or make a record of a
  38.246 -changeset for some other reason (for example, in a bug report), use
  38.247 -the hexadecimal identifier.
  38.248 -
  38.249 -\subsection{Viewing specific revisions}
  38.250 -
  38.251 -To narrow the output of \hgcmd{log} down to a single revision, use the
  38.252 -\hgopt{log}{-r} (or \hgopt{log}{--rev}) option.  You can use either a
  38.253 -revision number or a long-form changeset identifier, and you can
  38.254 -provide as many revisions as you want.  \interaction{tour.log-r}
  38.255 -
  38.256 -If you want to see the history of several revisions without having to
  38.257 -list each one, you can use \emph{range notation}; this lets you
  38.258 -express the idea ``I want all revisions between $a$ and $b$,
  38.259 -inclusive''.
  38.260 -\interaction{tour.log.range}
  38.261 -Mercurial also honours the order in which you specify revisions, so
  38.262 -\hgcmdargs{log}{-r 2:4} prints $2,3,4$ while \hgcmdargs{log}{-r 4:2}
  38.263 -prints $4,3,2$.
  38.264 -
  38.265 -\subsection{More detailed information}
  38.266 -
  38.267 -While the summary information printed by \hgcmd{log} is useful if you
  38.268 -already know what you're looking for, you may need to see a complete
  38.269 -description of the change, or a list of the files changed, if you're
  38.270 -trying to decide whether a changeset is the one you're looking for.
  38.271 -The \hgcmd{log} command's \hggopt{-v} (or \hggopt{--verbose})
  38.272 -option gives you this extra detail.
  38.273 -\interaction{tour.log-v}
  38.274 -
  38.275 -If you want to see both the description and content of a change, add
  38.276 -the \hgopt{log}{-p} (or \hgopt{log}{--patch}) option.  This displays
  38.277 -the content of a change as a \emph{unified diff} (if you've never seen
  38.278 -a unified diff before, see section~\ref{sec:mq:patch} for an overview).
  38.279 -\interaction{tour.log-vp}
  38.280 -
  38.281 -\section{All about command options}
  38.282 -
  38.283 -Let's take a brief break from exploring Mercurial commands to discuss
  38.284 -a pattern in the way that they work; you may find this useful to keep
  38.285 -in mind as we continue our tour.
  38.286 -
  38.287 -Mercurial has a consistent and straightforward approach to dealing
  38.288 -with the options that you can pass to commands.  It follows the
  38.289 -conventions for options that are common to modern Linux and Unix
  38.290 -systems.
  38.291 -\begin{itemize}
  38.292 -\item Every option has a long name.  For example, as we've already
  38.293 -  seen, the \hgcmd{log} command accepts a \hgopt{log}{--rev} option.
  38.294 -\item Most options have short names, too.  Instead of
  38.295 -  \hgopt{log}{--rev}, we can use \hgopt{log}{-r}.  (The reason that
  38.296 -  some options don't have short names is that the options in question
  38.297 -  are rarely used.)
  38.298 -\item Long options start with two dashes (e.g.~\hgopt{log}{--rev}),
  38.299 -  while short options start with one (e.g.~\hgopt{log}{-r}).
  38.300 -\item Option naming and usage is consistent across commands.  For
  38.301 -  example, every command that lets you specify a changeset~ID or
  38.302 -  revision number accepts both \hgopt{log}{-r} and \hgopt{log}{--rev}
  38.303 -  arguments.
  38.304 -\end{itemize}
  38.305 -In the examples throughout this book, I use short options instead of
  38.306 -long.  This just reflects my own preference, so don't read anything
  38.307 -significant into it.
  38.308 -
  38.309 -Most commands that print output of some kind will print more output
  38.310 -when passed a \hggopt{-v} (or \hggopt{--verbose}) option, and less
  38.311 -when passed \hggopt{-q} (or \hggopt{--quiet}).
  38.312 -
  38.313 -\section{Making and reviewing changes}
  38.314 -
  38.315 -Now that we have a grasp of viewing history in Mercurial, let's take a
  38.316 -look at making some changes and examining them.
  38.317 -
  38.318 -The first thing we'll do is isolate our experiment in a repository of
  38.319 -its own.  We use the \hgcmd{clone} command, but we don't need to
  38.320 -clone a copy of the remote repository.  Since we already have a copy
  38.321 -of it locally, we can just clone that instead.  This is much faster
  38.322 -than cloning over the network, and cloning a local repository uses
  38.323 -less disk space in most cases, too.
  38.324 -\interaction{tour.reclone}
  38.325 -As an aside, it's often good practice to keep a ``pristine'' copy of a
  38.326 -remote repository around, which you can then make temporary clones of
  38.327 -to create sandboxes for each task you want to work on.  This lets you
  38.328 -work on multiple tasks in parallel, each isolated from the others
  38.329 -until it's complete and you're ready to integrate it back.  Because
  38.330 -local clones are so cheap, there's almost no overhead to cloning and
  38.331 -destroying repositories whenever you want.
  38.332 -
  38.333 -In our \dirname{my-hello} repository, we have a file
  38.334 -\filename{hello.c} that contains the classic ``hello, world'' program.
  38.335 -Let's use the ancient and venerable \command{sed} command to edit this
  38.336 -file so that it prints a second line of output.  (I'm only using
  38.337 -\command{sed} to do this because it's easy to write a scripted example
  38.338 -this way.  Since you're not under the same constraint, you probably
  38.339 -won't want to use \command{sed}; simply use your preferred text editor to
  38.340 -do the same thing.)
  38.341 -\interaction{tour.sed}
  38.342 -
  38.343 -Mercurial's \hgcmd{status} command will tell us what Mercurial knows
  38.344 -about the files in the repository.
  38.345 -\interaction{tour.status}
  38.346 -The \hgcmd{status} command prints no output for some files, but a line
  38.347 -starting with ``\texttt{M}'' for \filename{hello.c}.  Unless you tell
  38.348 -it to, \hgcmd{status} will not print any output for files that have
  38.349 -not been modified.  
  38.350 -
  38.351 -The ``\texttt{M}'' indicates that Mercurial has noticed that we
  38.352 -modified \filename{hello.c}.  We didn't need to \emph{inform}
  38.353 -Mercurial that we were going to modify the file before we started, or
  38.354 -that we had modified the file after we were done; it was able to
  38.355 -figure this out itself.
  38.356 -
  38.357 -It's a little bit helpful to know that we've modified
  38.358 -\filename{hello.c}, but we might prefer to know exactly \emph{what}
  38.359 -changes we've made to it.  To do this, we use the \hgcmd{diff}
  38.360 -command.
  38.361 -\interaction{tour.diff}
  38.362 -
  38.363 -\section{Recording changes in a new changeset}
  38.364 -
  38.365 -We can modify files, build and test our changes, and use
  38.366 -\hgcmd{status} and \hgcmd{diff} to review our changes, until we're
  38.367 -satisfied with what we've done and arrive at a natural stopping point
  38.368 -where we want to record our work in a new changeset.
  38.369 -
  38.370 -The \hgcmd{commit} command lets us create a new changeset; we'll
  38.371 -usually refer to this as ``making a commit'' or ``committing''.  
  38.372 -
  38.373 -\subsection{Setting up a username}
  38.374 -
  38.375 -When you try to run \hgcmd{commit} for the first time, it is not
  38.376 -guaranteed to succeed.  Mercurial records your name and address with
  38.377 -each change that you commit, so that you and others will later be able
  38.378 -to tell who made each change.  Mercurial tries to automatically figure
  38.379 -out a sensible username to commit the change with.  It will attempt
  38.380 -each of the following methods, in order:
  38.381 -\begin{enumerate}
  38.382 -\item If you specify a \hgopt{commit}{-u} option to the \hgcmd{commit}
  38.383 -  command on the command line, followed by a username, this is always
  38.384 -  given the highest precedence.
  38.385 -\item If you have set the \envar{HGUSER} environment variable, this is
  38.386 -  checked next.
  38.387 -\item If you create a file in your home directory called
  38.388 -  \sfilename{.hgrc}, with a \rcitem{ui}{username} entry, that will be
  38.389 -  used next.  To see what the contents of this file should look like,
  38.390 -  refer to section~\ref{sec:tour-basic:username} below.
  38.391 -\item If you have set the \envar{EMAIL} environment variable, this
  38.392 -  will be used next.
  38.393 -\item Mercurial will query your system to find out your local user
  38.394 -  name and host name, and construct a username from these components.
  38.395 -  Since this often results in a username that is not very useful, it
  38.396 -  will print a warning if it has to do this.
  38.397 -\end{enumerate}
  38.398 -If all of these mechanisms fail, Mercurial will fail, printing an
  38.399 -error message.  In this case, it will not let you commit until you set
  38.400 -up a username.
  38.401 -
  38.402 -You should think of the \envar{HGUSER} environment variable and the
  38.403 -\hgopt{commit}{-u} option to the \hgcmd{commit} command as ways to
  38.404 -\emph{override} Mercurial's default selection of username.  For normal
  38.405 -use, the simplest and most robust way to set a username for yourself
  38.406 -is by creating a \sfilename{.hgrc} file; see below for details.
  38.407 -
  38.408 -\subsubsection{Creating a Mercurial configuration file}
  38.409 -\label{sec:tour-basic:username}
  38.410 -
  38.411 -To set a user name, use your favourite editor to create a file called
  38.412 -\sfilename{.hgrc} in your home directory.  Mercurial will use this
  38.413 -file to look up your personalised configuration settings.  The initial
  38.414 -contents of your \sfilename{.hgrc} should look like this.
  38.415 -\begin{codesample2}
  38.416 -  # This is a Mercurial configuration file.
  38.417 -  [ui]
  38.418 -  username = Firstname Lastname <email.address@domain.net>
  38.419 -\end{codesample2}
  38.420 -The ``\texttt{[ui]}'' line begins a \emph{section} of the config file,
  38.421 -so you can read the ``\texttt{username = ...}'' line as meaning ``set
  38.422 -the value of the \texttt{username} item in the \texttt{ui} section''.
  38.423 -A section continues until a new section begins, or the end of the
  38.424 -file.  Mercurial ignores empty lines and treats any text from
  38.425 -``\texttt{\#}'' to the end of a line as a comment.
  38.426 -
  38.427 -\subsubsection{Choosing a user name}
  38.428 -
  38.429 -You can use any text you like as the value of the \texttt{username}
  38.430 -config item, since this information is for reading by other people,
  38.431 -not for interpreting by Mercurial.  The convention that most people
  38.432 -follow is to use their name and email address, as in the example
  38.433 -above.
  38.434 -
  38.435 -\begin{note}
  38.436 -  Mercurial's built-in web server obfuscates email addresses, to make
  38.437 -  it more difficult for the email harvesting tools that spammers use.
  38.438 -  This reduces the likelihood that you'll start receiving more junk
  38.439 -  email if you publish a Mercurial repository on the web.
  38.440 -\end{note}
  38.441 -
  38.442 -\subsection{Writing a commit message}
  38.443 -
  38.444 -When we commit a change, Mercurial drops us into a text editor, to
  38.445 -enter a message that will describe the modifications we've made in
  38.446 -this changeset.  This is called the \emph{commit message}.  It will be
  38.447 -a record for readers of what we did and why, and it will be printed by
  38.448 -\hgcmd{log} after we've finished committing.
  38.449 -\interaction{tour.commit}
  38.450 -
  38.451 -The editor that the \hgcmd{commit} command drops us into will contain
  38.452 -an empty line, followed by a number of lines starting with
  38.453 -``\texttt{HG:}''.
  38.454 -\begin{codesample2}
  38.455 -  \emph{empty line}
  38.456 -  HG: changed hello.c
  38.457 -\end{codesample2}
  38.458 -Mercurial ignores the lines that start with ``\texttt{HG:}''; it uses
  38.459 -them only to tell us which files it's recording changes to.  Modifying
  38.460 -or deleting these lines has no effect.
  38.461 -
  38.462 -\subsection{Writing a good commit message}
  38.463 -
  38.464 -Since \hgcmd{log} only prints the first line of a commit message by
  38.465 -default, it's best to write a commit message whose first line stands
  38.466 -alone.  Here's a real example of a commit message that \emph{doesn't}
  38.467 -follow this guideline, and hence has a summary that is not readable.
  38.468 -\begin{codesample2}
  38.469 -  changeset:   73:584af0e231be
  38.470 -  user:        Censored Person <censored.person@example.org>
  38.471 -  date:        Tue Sep 26 21:37:07 2006 -0700
  38.472 -  summary:     include buildmeister/commondefs.   Add an exports and install
  38.473 -\end{codesample2}
  38.474 -
  38.475 -As far as the remainder of the contents of the commit message are
  38.476 -concerned, there are no hard-and-fast rules.  Mercurial itself doesn't
  38.477 -interpret or care about the contents of the commit message, though
  38.478 -your project may have policies that dictate a certain kind of
  38.479 -formatting.
  38.480 -
  38.481 -My personal preference is for short, but informative, commit messages
  38.482 -that tell me something that I can't figure out with a quick glance at
  38.483 -the output of \hgcmdargs{log}{--patch}.
  38.484 -
  38.485 -\subsection{Aborting a commit}
  38.486 -
  38.487 -If you decide that you don't want to commit while in the middle of
  38.488 -editing a commit message, simply exit from your editor without saving
  38.489 -the file that it's editing.  This will cause nothing to happen to
  38.490 -either the repository or the working directory.
  38.491 -
  38.492 -If we run the \hgcmd{commit} command without any arguments, it records
  38.493 -all of the changes we've made, as reported by \hgcmd{status} and
  38.494 -\hgcmd{diff}.
  38.495 -
  38.496 -\subsection{Admiring our new handiwork}
  38.497 -
  38.498 -Once we've finished the commit, we can use the \hgcmd{tip} command to
  38.499 -display the changeset we just created.  This command produces output
  38.500 -that is identical to \hgcmd{log}, but it only displays the newest
  38.501 -revision in the repository.
  38.502 -\interaction{tour.tip}
  38.503 -We refer to the newest revision in the repository as the tip revision,
  38.504 -or simply the tip.
  38.505 -
  38.506 -\section{Sharing changes}
  38.507 -
  38.508 -We mentioned earlier that repositories in Mercurial are
  38.509 -self-contained.  This means that the changeset we just created exists
  38.510 -only in our \dirname{my-hello} repository.  Let's look at a few ways
  38.511 -that we can propagate this change into other repositories.
  38.512 -
  38.513 -\subsection{Pulling changes from another repository}
  38.514 -\label{sec:tour:pull}
  38.515 -
  38.516 -To get started, let's clone our original \dirname{hello} repository,
  38.517 -which does not contain the change we just committed.  We'll call our
  38.518 -temporary repository \dirname{hello-pull}.
  38.519 -\interaction{tour.clone-pull}
  38.520 -
  38.521 -We'll use the \hgcmd{pull} command to bring changes from
  38.522 -\dirname{my-hello} into \dirname{hello-pull}.  However, blindly
  38.523 -pulling unknown changes into a repository is a somewhat scary
  38.524 -prospect.  Mercurial provides the \hgcmd{incoming} command to tell us
  38.525 -what changes the \hgcmd{pull} command \emph{would} pull into the
  38.526 -repository, without actually pulling the changes in.
  38.527 -\interaction{tour.incoming}
  38.528 -(Of course, someone could cause more changesets to appear in the
  38.529 -repository that we ran \hgcmd{incoming} in, before we get a chance to
  38.530 -\hgcmd{pull} the changes, so that we could end up pulling changes that we
  38.531 -didn't expect.)
  38.532 -
  38.533 -Bringing changes into a repository is a simple matter of running the
  38.534 -\hgcmd{pull} command, and telling it which repository to pull from.
  38.535 -\interaction{tour.pull}
  38.536 -As you can see from the before-and-after output of \hgcmd{tip}, we
  38.537 -have successfully pulled changes into our repository.  There remains
  38.538 -one step before we can see these changes in the working directory.
  38.539 -
  38.540 -\subsection{Updating the working directory}
  38.541 -
  38.542 -We have so far glossed over the relationship between a repository and
  38.543 -its working directory.  The \hgcmd{pull} command that we ran in
  38.544 -section~\ref{sec:tour:pull} brought changes into the repository, but
  38.545 -if we check, there's no sign of those changes in the working
  38.546 -directory.  This is because \hgcmd{pull} does not (by default) touch
  38.547 -the working directory.  Instead, we use the \hgcmd{update} command to
  38.548 -do this.
  38.549 -\interaction{tour.update}
  38.550 -
  38.551 -It might seem a bit strange that \hgcmd{pull} doesn't update the
  38.552 -working directory automatically.  There's actually a good reason for
  38.553 -this: you can use \hgcmd{update} to update the working directory to
  38.554 -the state it was in at \emph{any revision} in the history of the
  38.555 -repository.  If you had the working directory updated to an old
  38.556 -revision---to hunt down the origin of a bug, say---and ran a
  38.557 -\hgcmd{pull} which automatically updated the working directory to a
  38.558 -new revision, you might not be terribly happy.
  38.559 -
  38.560 -However, since pull-then-update is such a common thing to do,
  38.561 -Mercurial lets you combine the two by passing the \hgopt{pull}{-u}
  38.562 -option to \hgcmd{pull}.
  38.563 -\begin{codesample2}
  38.564 -  hg pull -u
  38.565 -\end{codesample2}
  38.566 -If you look back at the output of \hgcmd{pull} in
  38.567 -section~\ref{sec:tour:pull} when we ran it without \hgopt{pull}{-u},
  38.568 -you can see that it printed a helpful reminder that we'd have to take
  38.569 -an explicit step to update the working directory:
  38.570 -\begin{codesample2}
  38.571 -  (run 'hg update' to get a working copy)
  38.572 -\end{codesample2}
  38.573 -
  38.574 -To find out what revision the working directory is at, use the
  38.575 -\hgcmd{parents} command.
  38.576 -\interaction{tour.parents}
  38.577 -If you look back at figure~\ref{fig:tour-basic:history}, you'll see
  38.578 -arrows connecting each changeset.  The node that the arrow leads
  38.579 -\emph{from} in each case is a parent, and the node that the arrow
  38.580 -leads \emph{to} is its child.  The working directory has a parent in
  38.581 -just the same way; this is the changeset that the working directory
  38.582 -currently contains.
  38.583 -
  38.584 -To update the working directory to a particular revision, give a
  38.585 -revision number or changeset~ID to the \hgcmd{update} command.
  38.586 -\interaction{tour.older}
  38.587 -If you omit an explicit revision, \hgcmd{update} will update to the
  38.588 -tip revision, as shown by the second call to \hgcmd{update} in the
  38.589 -example above.
  38.590 -
  38.591 -\subsection{Pushing changes to another repository}
  38.592 -
  38.593 -Mercurial lets us push changes to another repository, from the
  38.594 -repository we're currently visiting.  As with the example of
  38.595 -\hgcmd{pull} above, we'll create a temporary repository to push our
  38.596 -changes into.
  38.597 -\interaction{tour.clone-push}
  38.598 -The \hgcmd{outgoing} command tells us what changes would be pushed
  38.599 -into another repository.
  38.600 -\interaction{tour.outgoing}
  38.601 -And the \hgcmd{push} command does the actual push.
  38.602 -\interaction{tour.push}
  38.603 -As with \hgcmd{pull}, the \hgcmd{push} command does not update the
  38.604 -working directory in the repository that it's pushing changes into.
  38.605 -(Unlike \hgcmd{pull}, \hgcmd{push} does not provide a \texttt{-u}
  38.606 -option that updates the other repository's working directory.)
  38.607 -
  38.608 -What happens if we try to pull or push changes and the receiving
  38.609 -repository already has those changes?  Nothing too exciting.
  38.610 -\interaction{tour.push.nothing}
  38.611 -
  38.612 -\subsection{Sharing changes over a network}
  38.613 -
  38.614 -The commands we have covered in the previous few sections are not
  38.615 -limited to working with local repositories.  Each works in exactly the
  38.616 -same fashion over a network connection; simply pass in a URL instead
  38.617 -of a local path.
  38.618 -\interaction{tour.outgoing.net}
  38.619 -In this example, we can see what changes we could push to the remote
  38.620 -repository, but the repository is understandably not set up to let
  38.621 -anonymous users push to it.
  38.622 -\interaction{tour.push.net}
  38.623 -
  38.624 -%%% Local Variables: 
  38.625 -%%% mode: latex
  38.626 -%%% TeX-master: "00book"
  38.627 -%%% End: 

    39.1 --- a/en/tour-merge.tex	Thu Jan 29 22:47:34 2009 -0800
    39.2 +++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
    39.3 @@ -1,283 +0,0 @@
    39.4 -\chapter{A tour of Mercurial: merging work}
    39.5 -\label{chap:tour-merge}
    39.6 -
    39.7 -We've now covered cloning a repository, making changes in a
    39.8 -repository, and pulling or pushing changes from one repository into
    39.9 -another.  Our next step is \emph{merging} changes from separate
   39.10 -repositories.
   39.11 -
   39.12 -\section{Merging streams of work}
   39.13 -
   39.14 -Merging is a fundamental part of working with a distributed revision
   39.15 -control tool.
   39.16 -\begin{itemize}
   39.17 -\item Alice and Bob each have a personal copy of a repository for a
   39.18 -  project they're collaborating on.  Alice fixes a bug in her
   39.19 -  repository; Bob adds a new feature in his.  They want the shared
   39.20 -  repository to contain both the bug fix and the new feature.
   39.21 -\item I frequently work on several different tasks for a single
   39.22 -  project at once, each safely isolated in its own repository.
   39.23 -  Working this way means that I often need to merge one piece of my
   39.24 -  own work with another.
   39.25 -\end{itemize}
   39.26 -
   39.27 -Because merging is such a common thing to need to do, Mercurial makes
   39.28 -it easy.  Let's walk through the process.  We'll begin by cloning yet
   39.29 -another repository (see how often they spring up?) and making a change
   39.30 -in it.
   39.31 -\interaction{tour.merge.clone}
   39.32 -We should now have two copies of \filename{hello.c} with different
   39.33 -contents.  The histories of the two repositories have also diverged,
   39.34 -as illustrated in figure~\ref{fig:tour-merge:sep-repos}.
   39.35 -\interaction{tour.merge.cat}
   39.36 -
   39.37 -\begin{figure}[ht]
   39.38 -  \centering
   39.39 -  \grafix{tour-merge-sep-repos}
   39.40 -  \caption{Divergent recent histories of the \dirname{my-hello} and
   39.41 -    \dirname{my-new-hello} repositories}
   39.42 -  \label{fig:tour-merge:sep-repos}
   39.43 -\end{figure}
   39.44 -
   39.45 -We already know that pulling changes from our \dirname{my-hello}
   39.46 -repository will have no effect on the working directory.
   39.47 -\interaction{tour.merge.pull}
   39.48 -However, the \hgcmd{pull} command says something about ``heads''.  
   39.49 -
   39.50 -\subsection{Head changesets}
   39.51 -
   39.52 -A head is a change that has no descendants, or children, as they're
   39.53 -also known.  The tip revision is thus a head, because the newest
   39.54 -revision in a repository doesn't have any children, but a repository
   39.55 -can contain more than one head.
   39.56 -
   39.57 -\begin{figure}[ht]
   39.58 -  \centering
   39.59 -  \grafix{tour-merge-pull}
   39.60 -  \caption{Repository contents after pulling from \dirname{my-hello} into
   39.61 -    \dirname{my-new-hello}}
   39.62 -  \label{fig:tour-merge:pull}
   39.63 -\end{figure}
   39.64 -
   39.65 -In figure~\ref{fig:tour-merge:pull}, you can see the effect of the
   39.66 -pull from \dirname{my-hello} into \dirname{my-new-hello}.  The history
   39.67 -that was already present in \dirname{my-new-hello} is untouched, but a
   39.68 -new revision has been added.  By referring to
   39.69 -figure~\ref{fig:tour-merge:sep-repos}, we can see that the
   39.70 -\emph{changeset ID} remains the same in the new repository, but the
   39.71 -\emph{revision number} has changed.  (This, incidentally, is a fine
   39.72 -example of why it's not safe to use revision numbers when discussing
   39.73 -changesets.)  We can view the heads in a repository using the
   39.74 -\hgcmd{heads} command.
   39.75 -\interaction{tour.merge.heads}
   39.76 -
   39.77 -\subsection{Performing the merge}
   39.78 -
   39.79 -What happens if we try to use the normal \hgcmd{update} command to
   39.80 -update to the new tip?
   39.81 -\interaction{tour.merge.update}
   39.82 -Mercurial is telling us that the \hgcmd{update} command won't do a
   39.83 -merge; it won't update the working directory when it thinks we might
   39.84 -want to do a merge, unless we force it to do so.  Instead, we
   39.85 -use the \hgcmd{merge} command to merge the two heads.
   39.86 -\interaction{tour.merge.merge}
   39.87 -
   39.88 -\begin{figure}[ht]
   39.89 -  \centering
   39.90 -  \grafix{tour-merge-merge}
   39.91 -  \caption{Working directory and repository during merge, and
   39.92 -    following commit}
   39.93 -  \label{fig:tour-merge:merge}
   39.94 -\end{figure}
   39.95 -
   39.96 -This updates the working directory so that it contains changes from
   39.97 -\emph{both} heads, which is reflected in both the output of
   39.98 -\hgcmd{parents} and the contents of \filename{hello.c}.
   39.99 -\interaction{tour.merge.parents}
  39.100 -
  39.101 -\subsection{Committing the results of the merge}
  39.102 -
  39.103 -Whenever we've done a merge, \hgcmd{parents} will display two parents
  39.104 -until we \hgcmd{commit} the results of the merge.
  39.105 -\interaction{tour.merge.commit}
  39.106 -We now have a new tip revision; notice that it has \emph{both} of
  39.107 -our former heads as its parents.  These are the same revisions that
  39.108 -were previously displayed by \hgcmd{parents}.
  39.109 -\interaction{tour.merge.tip}
  39.110 -In figure~\ref{fig:tour-merge:merge}, you can see a representation of
  39.111 -what happens to the working directory during the merge, and how this
  39.112 -affects the repository when the commit happens.  During the merge, the
  39.113 -working directory has two parent changesets, and these become the
  39.114 -parents of the new changeset.
  39.115 -
  39.116 -\section{Merging conflicting changes}
  39.117 -
  39.118 -Most merges are simple affairs, but sometimes you'll find yourself
  39.119 -merging changes where each modifies the same portions of the same
  39.120 -files.  Unless both modifications are identical, this results in a
  39.121 -\emph{conflict}, where you have to decide how to reconcile the
  39.122 -different changes into something coherent.
  39.123 -
  39.124 -\begin{figure}[ht]
  39.125 -  \centering
  39.126 -  \grafix{tour-merge-conflict}
  39.127 -  \caption{Conflicting changes to a document}
  39.128 -  \label{fig:tour-merge:conflict}
  39.129 -\end{figure}
  39.130 -
  39.131 -Figure~\ref{fig:tour-merge:conflict} illustrates an instance of two
  39.132 -conflicting changes to a document.  We started with a single version
  39.133 -of the file; then we made some changes, while someone else made
  39.134 -different changes to the same text.  Our task in resolving the
  39.135 -conflicting changes is to decide what the file should look like.
  39.136 -
  39.137 -Mercurial doesn't have a built-in facility for handling conflicts.
  39.138 -Instead, it runs an external program called \command{hgmerge}.  This
  39.139 -is a shell script that is bundled with Mercurial; you can change it to
  39.140 -behave however you please.  What it does by default is try to find one
  39.141 -of several different merging tools that are likely to be installed on
  39.142 -your system.  It first tries a few fully automatic merging tools; if
  39.143 -these don't succeed (because the resolution process requires human
  39.144 -guidance) or aren't present, the script tries a few different
  39.145 -graphical merging tools.
  39.146 -
  39.147 -It's also possible to get Mercurial to run another program or script
  39.148 -instead of \command{hgmerge}, by setting the \envar{HGMERGE}
  39.149 -environment variable to the name of your preferred program.
  39.150 -
  39.151 -\subsection{Using a graphical merge tool}
  39.152 -
  39.153 -My preferred graphical merge tool is \command{kdiff3}, which I'll use
  39.154 -to describe the features that are common to graphical file merging
  39.155 -tools.  You can see a screenshot of \command{kdiff3} in action in
  39.156 -figure~\ref{fig:tour-merge:kdiff3}.  The kind of merge it is
  39.157 -performing is called a \emph{three-way merge}, because there are three
  39.158 -different versions of the file of interest to us.  The tool thus
  39.159 -splits the upper portion of the window into three panes:
  39.160 -\begin{itemize}
  39.161 -\item At the left is the \emph{base} version of the file, i.e.~the
  39.162 -  most recent version from which the two versions we're trying to
  39.163 -  merge are descended.
  39.164 -\item In the middle is ``our'' version of the file, with the contents
  39.165 -  that we modified.
  39.166 -\item On the right is ``their'' version of the file, the one
  39.167 -  from the changeset that we're trying to merge with.
  39.168 -\end{itemize}
  39.169 -In the pane below these is the current \emph{result} of the merge.
  39.170 -Our task is to replace all of the red text, which indicates unresolved
  39.171 -conflicts, with some sensible merger of the ``ours'' and ``theirs''
  39.172 -versions of the file.
  39.173 -
  39.174 -All four of these panes are \emph{locked together}; if we scroll
  39.175 -vertically or horizontally in any of them, the others are updated to
  39.176 -display the corresponding sections of their respective files.
  39.177 -
  39.178 -\begin{figure}[ht]
  39.179 -  \centering
  39.180 -  \grafix{kdiff3}
  39.181 -  \caption{Using \command{kdiff3} to merge versions of a file}
  39.182 -  \label{fig:tour-merge:kdiff3}
  39.183 -\end{figure}
  39.184 -
  39.185 -For each conflicting portion of the file, we can choose to resolve
  39.186 -the conflict using some combination of text from the base version,
  39.187 -ours, or theirs.  We can also manually edit the merged file at any
  39.188 -time, in case we need to make further modifications.
  39.189 -
  39.190 -There are \emph{many} file merging tools available, too many to cover
  39.191 -here.  They vary in which platforms they are available for, and in
  39.192 -their particular strengths and weaknesses.  Most are tuned for merging
  39.193 -files containing plain text, while a few are aimed at specialised file
  39.194 -formats (generally XML).
  39.195 -
  39.196 -\subsection{A worked example}
  39.197 -
  39.198 -In this example, we will reproduce the file modification history of
  39.199 -figure~\ref{fig:tour-merge:conflict} above.  Let's begin by creating a
  39.200 -repository with a base version of our document.
  39.201 -\interaction{tour-merge-conflict.wife}
  39.202 -We'll clone the repository and make a change to the file.
  39.203 -\interaction{tour-merge-conflict.cousin}
  39.204 -And another clone, to simulate someone else making a change to the
  39.205 -file.  (This hints at the idea that it's not all that unusual to merge
  39.206 -with yourself when you isolate tasks in separate repositories, and
  39.207 -indeed to find and resolve conflicts while doing so.)
  39.208 -\interaction{tour-merge-conflict.son}
  39.209 -Having created two different versions of the file, we'll set up an
  39.210 -environment suitable for running our merge.
  39.211 -\interaction{tour-merge-conflict.pull}
  39.212 -
  39.213 -In this example, I won't use Mercurial's normal \command{hgmerge}
  39.214 -program to do the merge, because it would drop my nice automated
  39.215 -example-running tool into a graphical user interface.  Instead, I'll
  39.216 -set \envar{HGMERGE} to tell Mercurial to use the non-interactive
  39.217 -\command{merge} command.  This is bundled with many Unix-like systems.
  39.218 -If you're following this example on your computer, don't bother
  39.219 -setting \envar{HGMERGE}.
  39.220 -\interaction{tour-merge-conflict.merge}
  39.221 -Because \command{merge} can't resolve the conflicting changes, it
  39.222 -leaves \emph{merge markers} inside the file that has conflicts,
  39.223 -indicating which lines have conflicts, and whether they came from our
  39.224 -version of the file or theirs.
  39.225 -
  39.226 -Mercurial can tell from the way \command{merge} exits that it wasn't
  39.227 -able to merge successfully, so it tells us what commands we'll need to
  39.228 -run if we want to redo the merging operation.  This could be useful
  39.229 -if, for example, we were running a graphical merge tool and quit
  39.230 -because we were confused or realised we had made a mistake.
  39.231 -
  39.232 -If automatic or manual merges fail, there's nothing to prevent us from
  39.233 -``fixing up'' the affected files ourselves, and committing the results
  39.234 -of our merge:
  39.235 -\interaction{tour-merge-conflict.commit}
  39.236 -
  39.237 -\section{Simplifying the pull-merge-commit sequence}
  39.238 -\label{sec:tour-merge:fetch}
  39.239 -
  39.240 -The process of merging changes as outlined above is straightforward,
  39.241 -but requires running three commands in sequence.
  39.242 -\begin{codesample2}
  39.243 -  hg pull
  39.244 -  hg merge
  39.245 -  hg commit -m 'Merged remote changes'
  39.246 -\end{codesample2}
  39.247 -In the case of the final commit, you also need to enter a commit
  39.248 -message, which is almost always going to be a piece of uninteresting
  39.249 -``boilerplate'' text.
  39.250 -
  39.251 -It would be nice to reduce the number of steps needed, if this were
  39.252 -possible.  Indeed, Mercurial is distributed with an extension called
  39.253 -\hgext{fetch} that does just this.
  39.254 -
  39.255 -Mercurial provides a flexible extension mechanism that lets people
  39.256 -extend its functionality, while keeping the core of Mercurial small
  39.257 -and easy to deal with.  Some extensions add new commands that you can
  39.258 -use from the command line, while others work ``behind the scenes,''
  39.259 -for example adding capabilities to the server.
  39.260 -
  39.261 -The \hgext{fetch} extension adds a new command called, not
  39.262 -surprisingly, \hgcmd{fetch}.  This extension acts as a combination of
  39.263 -\hgcmd{pull}, \hgcmd{update} and \hgcmd{merge}.  It begins by pulling
  39.264 -changes from another repository into the current repository.  If it
  39.265 -finds that the changes added a new head to the repository, it begins a
  39.266 -merge, then commits the result of the merge with an
  39.267 -automatically-generated commit message.  If no new heads were added,
  39.268 -it updates the working directory to the new tip changeset.
  39.269 -
  39.270 -Enabling the \hgext{fetch} extension is easy.  Edit your
  39.271 -\sfilename{.hgrc}, and either go to the \rcsection{extensions} section
  39.272 -or create an \rcsection{extensions} section.  Then add a line that
  39.273 -simply reads ``\Verb+fetch =+''.
  39.274 -\begin{codesample2}
  39.275 -  [extensions]
  39.276 -  fetch =
  39.277 -\end{codesample2}
  39.278 -(Normally, on the right-hand side of the ``\texttt{=}'' would appear
  39.279 -the location of the extension, but since the \hgext{fetch} extension
  39.280 -is in the standard distribution, Mercurial knows where to search for
  39.281 -it.)
  39.282 -
  39.283 -%%% Local Variables: 
  39.284 -%%% mode: latex
  39.285 -%%% TeX-master: "00book"
  39.286 -%%% End: 

    40.1 --- a/en/undo.tex	Thu Jan 29 22:47:34 2009 -0800
    40.2 +++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
    40.3 @@ -1,767 +0,0 @@
    40.4 -\chapter{Finding and fixing your mistakes}
    40.5 -\label{chap:undo}
    40.6 -
    40.7 -To err might be human, but to really handle the consequences well
    40.8 -takes a top-notch revision control system.  In this chapter, we'll
    40.9 -discuss some of the techniques you can use when you find that a
   40.10 -problem has crept into your project.  Mercurial has some highly
   40.11 -capable features that will help you to isolate the sources of
   40.12 -problems, and to handle them appropriately.
   40.13 -
   40.14 -\section{Erasing local history}
   40.15 -
   40.16 -\subsection{The accidental commit}
   40.17 -
   40.18 -I have the occasional but persistent problem of typing rather more
   40.19 -quickly than I can think, which sometimes results in me committing a
   40.20 -changeset that is either incomplete or plain wrong.  In my case, the
   40.21 -usual kind of incomplete changeset is one in which I've created a new
   40.22 -source file, but forgotten to \hgcmd{add} it.  A ``plain wrong''
   40.23 -changeset is not as common, but no less annoying.
   40.24 -
   40.25 -\subsection{Rolling back a transaction}
   40.26 -\label{sec:undo:rollback}
   40.27 -
   40.28 -In section~\ref{sec:concepts:txn}, I mentioned that Mercurial treats
   40.29 -each modification of a repository as a \emph{transaction}.  Every time
   40.30 -you commit a changeset or pull changes from another repository,
   40.31 -Mercurial remembers what you did.  You can undo, or \emph{roll back},
   40.32 -exactly one of these actions using the \hgcmd{rollback} command.  (See
   40.33 -section~\ref{sec:undo:rollback-after-push} for an important caveat
   40.34 -about the use of this command.)
   40.35 -
   40.36 -Here's a mistake that I often find myself making: committing a change
   40.37 -in which I've created a new file, but forgotten to \hgcmd{add} it.
   40.38 -\interaction{rollback.commit}
   40.39 -Looking at the output of \hgcmd{status} after the commit immediately
   40.40 -confirms the error.
   40.41 -\interaction{rollback.status}
   40.42 -The commit captured the changes to the file \filename{a}, but not the
   40.43 -new file \filename{b}.  If I were to push this changeset to a
   40.44 -repository that I shared with a colleague, the chances are high that
   40.45 -something in \filename{a} would refer to \filename{b}, which would not
   40.46 -be present in their repository when they pulled my changes.  I would
   40.47 -thus become the object of some indignation.
   40.48 -
   40.49 -However, luck is with me---I've caught my error before I pushed the
   40.50 -changeset.  I use the \hgcmd{rollback} command, and Mercurial makes
   40.51 -that last changeset vanish.
   40.52 -\interaction{rollback.rollback}
   40.53 -Notice that the changeset is no longer present in the repository's
   40.54 -history, and the working directory once again thinks that the file
   40.55 -\filename{a} is modified.  The commit and rollback have left the
   40.56 -working directory exactly as it was prior to the commit; the changeset
   40.57 -has been completely erased.  I can now safely \hgcmd{add} the file
   40.58 -\filename{b}, and rerun my commit.
   40.59 -\interaction{rollback.add}
   40.60 -
   40.61 -\subsection{The erroneous pull}
   40.62 -
   40.63 -It's common practice with Mercurial to maintain separate development
   40.64 -branches of a project in different repositories.  Your development
   40.65 -team might have one shared repository for your project's ``0.9''
   40.66 -release, and another, containing different changes, for the ``1.0''
   40.67 -release.
   40.68 -
   40.69 -Given this, you can imagine that the consequences could be messy if
   40.70 -you had a local ``0.9'' repository, and accidentally pulled changes
   40.71 -from the shared ``1.0'' repository into it.  At worst, you could be
   40.72 -paying insufficient attention, and push those changes into the shared
   40.73 -``0.9'' tree, confusing your entire team (but don't worry, we'll
   40.74 -return to this horror scenario later).  However, it's more likely that
   40.75 -you'll notice immediately, because Mercurial will display the URL it's
   40.76 -pulling from, or you will see it pull a suspiciously large number of
   40.77 -changes into the repository.
   40.78 -
   40.79 -The \hgcmd{rollback} command will work nicely to expunge all of the
   40.80 -changesets that you just pulled.  Mercurial groups all changes from
   40.81 -one \hgcmd{pull} into a single transaction, so one \hgcmd{rollback} is
   40.82 -all you need to undo this mistake.
   40.83 -
   40.84 -\subsection{Rolling back is useless once you've pushed}
   40.85 -\label{sec:undo:rollback-after-push}
   40.86 -
   40.87 -The value of the \hgcmd{rollback} command drops to zero once you've
   40.88 -pushed your changes to another repository.  Rolling back a change
   40.89 -makes it disappear entirely, but \emph{only} in the repository in
   40.90 -which you perform the \hgcmd{rollback}.  Because a rollback eliminates
   40.91 -history, there's no way for the disappearance of a change to propagate
   40.92 -between repositories.
   40.93 -
   40.94 -If you've pushed a change to another repository---particularly if it's
   40.95 -a shared repository---it has essentially ``escaped into the wild,''
   40.96 -and you'll have to recover from your mistake in a different way.  If
   40.97 -you push a changeset somewhere, then roll it back, then pull from the
   40.98 -repository you pushed to, the changeset will reappear in your
   40.99 -repository.
  40.100 -
  40.101 -(If you absolutely know for sure that the change you want to roll back
  40.102 -is the most recent change in the repository that you pushed to,
  40.103 -\emph{and} you know that nobody else could have pulled it from that
  40.104 -repository, you can roll back the changeset there, too, but you really
  40.105 -should not rely on this working reliably.  If you do this,
  40.106 -sooner or later a change really will make it into a repository that
  40.107 -you don't directly control (or have forgotten about), and come back to
  40.108 -bite you.)
  40.109 -
  40.110 -\subsection{You can only roll back once}
  40.111 -
  40.112 -Mercurial stores exactly one transaction in its transaction log; that
  40.113 -transaction is the most recent one that occurred in the repository.
  40.114 -This means that you can only roll back one transaction.  If you expect
  40.115 -to be able to roll back one transaction, then its predecessor, this is
  40.116 -not the behaviour you will get.
  40.117 -\interaction{rollback.twice}
  40.118 -Once you've rolled back one transaction in a repository, you can't
  40.119 -roll back again in that repository until you perform another commit or
  40.120 -pull.
  40.121 -
  40.122 -\section{Reverting the mistaken change}
  40.123 -
  40.124 -If you make a modification to a file, and decide that you really
  40.125 -didn't want to change the file at all, and you haven't yet committed
  40.126 -your changes, the \hgcmd{revert} command is the one you'll need.  It
  40.127 -looks at the changeset that's the parent of the working directory, and
  40.128 -restores the contents of the file to their state as of that changeset.
  40.129 -(That's a long-winded way of saying that, in the normal case, it
  40.130 -undoes your modifications.)
  40.131 -
  40.132 -Let's illustrate how the \hgcmd{revert} command works with yet another
  40.133 -small example.  We'll begin by modifying a file that Mercurial is
  40.134 -already tracking.
  40.135 -\interaction{daily.revert.modify}
  40.136 -If we don't want that change, we can simply \hgcmd{revert} the file.
  40.137 -\interaction{daily.revert.unmodify}
  40.138 -The \hgcmd{revert} command provides us with an extra degree of safety
  40.139 -by saving our modified file with a \filename{.orig} extension.
  40.140 -\interaction{daily.revert.status}
  40.141 -
  40.142 -Here is a summary of the cases that the \hgcmd{revert} command can
  40.143 -deal with.  We will describe each of these in more detail in the
  40.144 -section that follows.
  40.145 -\begin{itemize}
  40.146 -\item If you modify a file, it will restore the file to its unmodified
  40.147 -  state.
  40.148 -\item If you \hgcmd{add} a file, it will undo the ``added'' state of
  40.149 -  the file, but leave the file itself untouched.
  40.150 -\item If you delete a file without telling Mercurial, it will restore
  40.151 -  the file to its unmodified contents.
  40.152 -\item If you use the \hgcmd{remove} command to remove a file, it will
  40.153 -  undo the ``removed'' state of the file, and restore the file to its
  40.154 -  unmodified contents.
  40.155 -\end{itemize}
  40.156 -
  40.157 -\subsection{File management errors}
  40.158 -\label{sec:undo:mgmt}
  40.159 -
  40.160 -The \hgcmd{revert} command is useful for more than just modified
  40.161 -files.  It lets you reverse the results of all of Mercurial's file
  40.162 -management commands---\hgcmd{add}, \hgcmd{remove}, and so on.
  40.163 -
  40.164 -If you \hgcmd{add} a file, then decide that in fact you don't want
  40.165 -Mercurial to track it, use \hgcmd{revert} to undo the add.  Don't
  40.166 -worry; Mercurial will not modify the file in any way.  It will just
  40.167 -``unmark'' the file.
  40.168 -\interaction{daily.revert.add}
  40.169 -
  40.170 -Similarly, if you ask Mercurial to \hgcmd{remove} a file, you can use
  40.171 -\hgcmd{revert} to restore it to the contents it had as of the parent
  40.172 -of the working directory.
  40.173 -\interaction{daily.revert.remove}
  40.174 -This works just as well for a file that you deleted by hand, without
  40.175 -telling Mercurial (recall that in Mercurial terminology, this kind of
  40.176 -file is called ``missing'').
  40.177 -\interaction{daily.revert.missing}
  40.178 -
  40.179 -If you revert a \hgcmd{copy}, the copied-to file remains in your
  40.180 -working directory afterwards, untracked.  Since a copy doesn't affect
  40.181 -the copied-from file in any way, Mercurial doesn't do anything with
  40.182 -the copied-from file.
  40.183 -\interaction{daily.revert.copy}
  40.184 -
  40.185 -\subsubsection{A slightly special case: reverting a rename}
  40.186 -
  40.187 -If you \hgcmd{rename} a file, there is one small detail that
  40.188 -you should remember.  When you \hgcmd{revert} a rename, it's not
  40.189 -enough to provide the name of the renamed-to file, as you can see
  40.190 -here.
  40.191 -\interaction{daily.revert.rename}
  40.192 -As you can see from the output of \hgcmd{status}, the renamed-to file
  40.193 -is no longer identified as added, but the renamed-\emph{from} file is
  40.194 -still removed!  This is counter-intuitive (at least to me), but at
  40.195 -least it's easy to deal with.
  40.196 -\interaction{daily.revert.rename-orig}
  40.197 -So remember, to revert a \hgcmd{rename}, you must provide \emph{both}
  40.198 -the source and destination names.  
  40.199 -
  40.200 -% TODO: the output doesn't look like it will be removed!
  40.201 -
  40.202 -(By the way, if you rename a file, then modify the renamed-to file,
  40.203 -then revert both components of the rename, when Mercurial restores the
  40.204 -file that was removed as part of the rename, it will be unmodified.
  40.205 -If you need the modifications in the renamed-to file to show up in the
  40.206 -renamed-from file, don't forget to copy them over.)
  40.207 -
  40.208 -These fiddly aspects of reverting a rename arguably constitute a small
  40.209 -bug in Mercurial.
  40.210 -
  40.211 -\section{Dealing with committed changes}
  40.212 -
  40.213 -Consider a case where you have committed a change $a$, and another
  40.214 -change $b$ on top of it; you then realise that change $a$ was
  40.215 -incorrect.  Mercurial lets you ``back out'' an entire changeset
  40.216 -automatically, and provides building blocks that let you reverse part of a
  40.217 -changeset by hand.
  40.218 -
  40.219 -Before you read this section, here's something to keep in mind: the
  40.220 -\hgcmd{backout} command undoes changes by \emph{adding} history, not
  40.221 -by modifying or erasing it.  It's the right tool to use if you're
  40.222 -fixing bugs, but not if you're trying to undo some change that has
  40.223 -catastrophic consequences.  To deal with those, see
  40.224 -section~\ref{sec:undo:aaaiiieee}.
  40.225 -
  40.226 -\subsection{Backing out a changeset}
  40.227 -
  40.228 -The \hgcmd{backout} command lets you ``undo'' the effects of an entire
  40.229 -changeset in an automated fashion.  Because Mercurial's history is
  40.230 -immutable, this command \emph{does not} get rid of the changeset you
  40.231 -want to undo.  Instead, it creates a new changeset that
  40.232 -\emph{reverses} the effect of the to-be-undone changeset.
  40.233 -
  40.234 -The operation of the \hgcmd{backout} command is a little intricate, so
  40.235 -let's illustrate it with some examples.  First, we'll create a
  40.236 -repository with some simple changes.
  40.237 -\interaction{backout.init}
  40.238 -
  40.239 -The \hgcmd{backout} command takes a single changeset ID as its
  40.240 -argument; this is the changeset to back out.  Normally,
  40.241 -\hgcmd{backout} will drop you into a text editor to write a commit
  40.242 -message, so you can record why you're backing the change out.  In this
  40.243 -example, we provide a commit message on the command line using the
  40.244 -\hgopt{backout}{-m} option.
  40.245 -
  40.246 -\subsection{Backing out the tip changeset}
  40.247 -
  40.248 -We're going to start by backing out the last changeset we committed.
  40.249 -\interaction{backout.simple}
  40.250 -You can see that the second line from \filename{myfile} is no longer
  40.251 -present.  Taking a look at the output of \hgcmd{log} gives us an idea
  40.252 -of what the \hgcmd{backout} command has done.
  40.253 -\interaction{backout.simple.log}
  40.254 -Notice that the new changeset that \hgcmd{backout} has created is a
  40.255 -child of the changeset we backed out.  It's easier to see this in
  40.256 -figure~\ref{fig:undo:backout}, which presents a graphical view of the
  40.257 -change history.  As you can see, the history is nice and linear.
  40.258 -
  40.259 -\begin{figure}[htb]
  40.260 -  \centering
  40.261 -  \grafix{undo-simple}
  40.262 -  \caption{Backing out a change using the \hgcmd{backout} command}
  40.263 -  \label{fig:undo:backout}
  40.264 -\end{figure}
  40.265 -
  40.266 -\subsection{Backing out a non-tip change}
  40.267 -
  40.268 -If you want to back out a change other than the last one you
  40.269 -committed, pass the \hgopt{backout}{--merge} option to the
  40.270 -\hgcmd{backout} command.
  40.271 -\interaction{backout.non-tip.clone}
  40.272 -This makes backing out any changeset a ``one-shot'' operation that's
  40.273 -usually simple and fast.
  40.274 -\interaction{backout.non-tip.backout}
  40.275 -
  40.276 -If you take a look at the contents of \filename{myfile} after the
  40.277 -backout finishes, you'll see that the first and third changes are
  40.278 -present, but not the second.
  40.279 -\interaction{backout.non-tip.cat}
  40.280 -
  40.281 -As the graphical history in figure~\ref{fig:undo:backout-non-tip}
  40.282 -illustrates, Mercurial actually commits \emph{two} changes in this
  40.283 -kind of situation (the box-shaped nodes are the ones that Mercurial
  40.284 -commits automatically).  Before Mercurial begins the backout process,
  40.285 -it first remembers what the current parent of the working directory
  40.286 -is.  It then backs out the target changeset, and commits that as a
  40.287 -changeset.  Finally, it merges back to the previous parent of the
  40.288 -working directory, and commits the result of the merge.
  40.289 -
  40.290 -% TODO: to me it looks like mercurial doesn't commit the second merge automatically!
  40.291 -
  40.292 -\begin{figure}[htb]
  40.293 -  \centering
  40.294 -  \grafix{undo-non-tip}
  40.295 -  \caption{Automated backout of a non-tip change using the \hgcmd{backout} command}
  40.296 -  \label{fig:undo:backout-non-tip}
  40.297 -\end{figure}
  40.298 -
  40.299 -The result is that you end up ``back where you were'', only with some
  40.300 -extra history that undoes the effect of the changeset you wanted to
  40.301 -back out.
  40.302 -
  40.303 -\subsubsection{Always use the \hgopt{backout}{--merge} option}
  40.304 -
  40.305 -In fact, since the \hgopt{backout}{--merge} option will do the ``right
  40.306 -thing'' whether or not the changeset you're backing out is the tip
  40.307 -(i.e.~it won't try to merge if it's backing out the tip, since there's
  40.308 -no need), you should \emph{always} use this option when you run the
  40.309 -\hgcmd{backout} command.
  40.310 -
  40.311 -\subsection{Gaining more control of the backout process}
  40.312 -
  40.313 -While I've recommended that you always use the
  40.314 -\hgopt{backout}{--merge} option when backing out a change, the
  40.315 -\hgcmd{backout} command lets you decide how to merge a backout
  40.316 -changeset.  Taking control of the backout process by hand is something
  40.317 -you will rarely need to do, but it can be useful to understand what
  40.318 -the \hgcmd{backout} command is doing for you automatically.  To
  40.319 -illustrate this, let's clone our first repository, but omit the
  40.320 -backout change that it contains.
  40.321 -
  40.322 -\interaction{backout.manual.clone}
  40.323 -As with our earlier example, we'll commit a third changeset, then back
  40.324 -out its parent, and see what happens.
  40.325 -\interaction{backout.manual.backout} 
  40.326 -Our new changeset is again a descendant of the changeset we backed
  40.327 -out; it's thus a new head, \emph{not} a descendant of the changeset
  40.328 -that was the tip.  The \hgcmd{backout} command was quite explicit in
  40.329 -telling us this.
  40.330 -\interaction{backout.manual.log}
  40.331 -
  40.332 -Again, it's easier to see what has happened by looking at a graph of
  40.333 -the revision history, in figure~\ref{fig:undo:backout-manual}.  This
  40.334 -makes it clear that when we use \hgcmd{backout} to back out a change
  40.335 -other than the tip, Mercurial adds a new head to the repository (the
  40.336 -change it committed is box-shaped).
  40.337 -
  40.338 -\begin{figure}[htb]
  40.339 -  \centering
  40.340 -  \grafix{undo-manual}
  40.341 -  \caption{Backing out a change using the \hgcmd{backout} command}
  40.342 -  \label{fig:undo:backout-manual}
  40.343 -\end{figure}
  40.344 -
  40.345 -After the \hgcmd{backout} command has completed, it leaves the new
  40.346 -``backout'' changeset as the parent of the working directory.
  40.347 -\interaction{backout.manual.parents}
  40.348 -Now we have two isolated sets of changes.
  40.349 -\interaction{backout.manual.heads}
  40.350 -
  40.351 -Let's think about what we expect to see as the contents of
  40.352 -\filename{myfile} now.  The first change should be present, because
  40.353 -we've never backed it out.  The second change should be missing, as
  40.354 -that's the change we backed out.  Since the history graph shows the
  40.355 -third change as a separate head, we \emph{don't} expect to see the
  40.356 -third change present in \filename{myfile}.
  40.357 -\interaction{backout.manual.cat}
  40.358 -To get the third change back into the file, we just do a normal merge
  40.359 -of our two heads.
  40.360 -\interaction{backout.manual.merge}
  40.361 -Afterwards, the graphical history of our repository looks like
  40.362 -figure~\ref{fig:undo:backout-manual-merge}.
  40.363 -
  40.364 -\begin{figure}[htb]
  40.365 -  \centering
  40.366 -  \grafix{undo-manual-merge}
  40.367 -  \caption{Manually merging a backout change}
  40.368 -  \label{fig:undo:backout-manual-merge}
  40.369 -\end{figure}
  40.370 -
  40.371 -\subsection{Why \hgcmd{backout} works as it does}
  40.372 -
  40.373 -Here's a brief description of how the \hgcmd{backout} command works.
  40.374 -\begin{enumerate}
  40.375 -\item It ensures that the working directory is ``clean'', i.e.~that
  40.376 -  the output of \hgcmd{status} would be empty.
  40.377 -\item It remembers the current parent of the working directory.  Let's
  40.378 -  call this changeset \texttt{orig}.
  40.379 -\item It does the equivalent of a \hgcmd{update} to sync the working
  40.380 -  directory to the changeset you want to back out.  Let's call this
  40.381 -  changeset \texttt{backout}.
  40.382 -\item It finds the parent of that changeset.  Let's call that
  40.383 -  changeset \texttt{parent}.
  40.384 -\item For each file that the \texttt{backout} changeset affected, it
  40.385 -  does the equivalent of a \hgcmdargs{revert}{-r parent} on that file,
  40.386 -  to restore it to the contents it had before that changeset was
  40.387 -  committed.
  40.388 -\item It commits the result as a new changeset.  This changeset has
  40.389 -  \texttt{backout} as its parent.
  40.390 -\item If you specify \hgopt{backout}{--merge} on the command line, it
  40.391 -  merges with \texttt{orig}, and commits the result of the merge.
  40.392 -\end{enumerate}
  40.393 -
  40.394 -An alternative way to implement the \hgcmd{backout} command would be
  40.395 -to \hgcmd{export} the to-be-backed-out changeset as a diff, then use
  40.396 -the \cmdopt{patch}{--reverse} option to the \command{patch} command to
  40.397 -reverse the effect of the change without fiddling with the working
  40.398 -directory.  This sounds much simpler, but it would not work nearly as
  40.399 -well.
  40.400 -
  40.401 -The reason that \hgcmd{backout} does an update, a commit, a merge, and
  40.402 -another commit is to give the merge machinery the best chance to do a
  40.403 -good job when dealing with all the changes \emph{between} the change
  40.404 -you're backing out and the current tip.  
  40.405 -
  40.406 -If you're backing out a changeset that's~100 revisions back in your
  40.407 -project's history, the chances that the \command{patch} command will
  40.408 -be able to apply a reverse diff cleanly are not good, because
  40.409 -intervening changes are likely to have ``broken the context'' that
  40.410 -\command{patch} uses to determine whether it can apply a patch (if
  40.411 -this sounds like gibberish, see section~\ref{sec:mq:patch} for a
  40.412 -discussion of the \command{patch} command).  Also, Mercurial's merge
  40.413 -machinery will handle files and directories being renamed, permission
  40.414 -changes, and modifications to binary files, none of which
  40.415 -\command{patch} can deal with.
  40.416 -
  40.417 -\section{Changes that should never have been}
  40.418 -\label{sec:undo:aaaiiieee}
  40.419 -
  40.420 -Most of the time, the \hgcmd{backout} command is exactly what you need
  40.421 -if you want to undo the effects of a change.  It leaves a permanent
  40.422 -record of exactly what you did, both when committing the original
  40.423 -changeset and when you cleaned up after it.
  40.424 -
  40.425 -On rare occasions, though, you may find that you've committed a change
  40.426 -that really should not be present in the repository at all.  For
  40.427 -example, it would be very unusual, and usually considered a mistake,
  40.428 -to commit a software project's object files as well as its source
  40.429 -files.  Object files have almost no intrinsic value, and they're
  40.430 -\emph{big}, so they increase the size of the repository and the amount
  40.431 -of time it takes to clone or pull changes.
  40.432 -
  40.433 -Before I discuss the options that you have if you commit a ``brown
  40.434 -paper bag'' change (the kind that's so bad that you want to pull a
  40.435 -brown paper bag over your head), let me first discuss some approaches
  40.436 -that probably won't work.
  40.437 -
  40.438 -Since Mercurial treats history as accumulative---every change builds
  40.439 -on top of all changes that preceded it---you generally can't just make
  40.440 -disastrous changes disappear.  The one exception is when you've just
  40.441 -committed a change, and it hasn't been pushed or pulled into another
  40.442 -repository.  That's when you can safely use the \hgcmd{rollback}
  40.443 -command, as I detailed in section~\ref{sec:undo:rollback}.
  40.444 -
  40.445 -After you've pushed a bad change to another repository, you
  40.446 -\emph{could} still use \hgcmd{rollback} to make your local copy of the
  40.447 -change disappear, but it won't have the consequences you want.  The
  40.448 -change will still be present in the remote repository, so it will
  40.449 -reappear in your local repository the next time you pull.
  40.450 -
  40.451 -If a situation like this arises, and you know which repositories your
  40.452 -bad change has propagated into, you can \emph{try} to get rid of the
  40.453 -change from \emph{every} one of those repositories.  This is, of
  40.454 -course, not a satisfactory solution: if you miss even a single
  40.455 -repository while you're expunging, the change is still ``in the
  40.456 -wild'', and could propagate further.
  40.457 -
  40.458 -If you've committed one or more changes \emph{after} the change that
  40.459 -you'd like to see disappear, your options are further reduced.
  40.460 -Mercurial doesn't provide a way to ``punch a hole'' in history,
  40.461 -leaving changesets intact.
  40.462 -
  40.463 -XXX This needs filling out.  The \texttt{hg-replay} script in the
  40.464 -\texttt{examples} directory works, but doesn't handle merge
  40.465 -changesets.  Kind of an important omission.
  40.466 -
  40.467 -\subsection{Protect yourself from ``escaped'' changes}
  40.468 -
  40.469 -If you've committed some changes to your local repository and they've
  40.470 -been pushed or pulled somewhere else, this isn't necessarily a
  40.471 -disaster.  You can protect yourself ahead of time against some classes
  40.472 -of bad changeset.  This is particularly easy if your team usually
  40.473 -pulls changes from a central repository.
  40.474 -
  40.475 -By configuring some hooks on that repository to validate incoming
  40.476 -changesets (see chapter~\ref{chap:hook}), you can automatically
  40.477 -prevent some kinds of bad changeset from being pushed to the central
  40.478 -repository at all.  With such a configuration in place, some kinds of
  40.479 -bad changeset will naturally tend to ``die out'' because they can't
  40.480 -propagate into the central repository.  Better yet, this happens
  40.481 -without any need for explicit intervention.
  40.482 -
  40.483 -For instance, an incoming change hook that verifies that a changeset
  40.484 -will actually compile can prevent people from inadvertently ``breaking
  40.485 -the build''.
  40.486 -
  40.487 -\section{Finding the source of a bug}
  40.488 -\label{sec:undo:bisect}
  40.489 -
  40.490 -While it's all very well to be able to back out a changeset that
  40.491 -introduced a bug, this requires that you know which changeset to back
  40.492 -out.  Mercurial provides an invaluable command, called
  40.493 -\hgcmd{bisect}, that helps you to automate this process and accomplish
  40.494 -it very efficiently.
  40.495 -
  40.496 -The idea behind the \hgcmd{bisect} command is that a changeset has
  40.497 -introduced some change of behaviour that you can identify with a
  40.498 -simple binary test.  You don't know which piece of code introduced the
  40.499 -change, but you know how to test for the presence of the bug.  The
  40.500 -\hgcmd{bisect} command uses your test to direct its search for the
  40.501 -changeset that introduced the code that caused the bug.
  40.502 -
  40.503 -Here are a few scenarios to help you understand how you might apply
  40.504 -this command.
  40.505 -\begin{itemize}
  40.506 -\item The most recent version of your software has a bug that you
  40.507 -  remember wasn't present a few weeks ago, but you don't know when it
  40.508 -  was introduced.  Here, your binary test checks for the presence of
  40.509 -  that bug.
  40.510 -\item You fixed a bug in a rush, and now it's time to close the entry
  40.511 -  in your team's bug database.  The bug database requires a changeset
  40.512 -  ID when you close an entry, but you don't remember which changeset
  40.513 -  you fixed the bug in.  Once again, your binary test checks for the
  40.514 -  presence of the bug.
  40.515 -\item Your software works correctly, but runs~15\% slower than the
  40.516 -  last time you measured it.  You want to know which changeset
  40.517 -  introduced the performance regression.  In this case, your binary
  40.518 -  test measures the performance of your software, to see whether it's
  40.519 -  ``fast'' or ``slow''.
  40.520 -\item The sizes of the components of your project that you ship
  40.521 -  exploded recently, and you suspect that something changed in the way
  40.522 -  you build your project.
  40.523 -\end{itemize}
  40.524 -
  40.525 -From these examples, it should be clear that the \hgcmd{bisect}
  40.526 -command is not useful only for finding the sources of bugs.  You can
  40.527 -use it to find any ``emergent property'' of a repository (anything
  40.528 -that you can't find from a simple text search of the files in the
  40.529 -tree) for which you can write a binary test.
  40.530 -
  40.531 -We'll introduce a little bit of terminology here, just to make it
  40.532 -clear which parts of the search process are your responsibility, and
  40.533 -which are Mercurial's.  A \emph{test} is something that \emph{you} run
  40.534 -when \hgcmd{bisect} chooses a changeset.  A \emph{probe} is what
  40.535 -\hgcmd{bisect} runs to tell whether a revision is good.  Finally,
  40.536 -we'll use the word ``bisect'', as both a noun and a verb, to stand in
  40.537 -for the phrase ``search using the \hgcmd{bisect} command''.
  40.538 -
  40.539 -One simple way to automate the searching process would be simply to
  40.540 -probe every changeset.  However, this scales poorly.  If it took ten
  40.541 -minutes to test a single changeset, and you had 10,000 changesets in
  40.542 -your repository, the exhaustive approach would take on average~35
  40.543 -\emph{days} to find the changeset that introduced a bug.  Even if you
  40.544 -knew that the bug was introduced by one of the last 500 changesets,
  40.545 -and limited your search to those, you'd still be looking at over 40
  40.546 -hours to find the changeset that introduced your bug.
  40.547 -
  40.548 -What the \hgcmd{bisect} command does is use its knowledge of the
  40.549 -``shape'' of your project's revision history to perform a search in
  40.550 -time proportional to the \emph{logarithm} of the number of changesets
  40.551 -to check (the kind of search it performs is called a dichotomic
  40.552 -search).  With this approach, searching through 10,000 changesets will
  40.553 -take less than three hours, even at ten minutes per test (the search
  40.554 -will require about 14 tests).  Limit your search to the last hundred
  40.555 -changesets, and it will take only about an hour (roughly seven tests).
  40.556 -
  40.557 -The \hgcmd{bisect} command is aware of the ``branchy'' nature of a
  40.558 -Mercurial project's revision history, so it has no problems dealing
  40.559 -with branches, merges, or multiple heads in a repository.  It can
  40.560 -prune entire branches of history with a single probe, which is how it
  40.561 -operates so efficiently.
  40.562 -
  40.563 -\subsection{Using the \hgcmd{bisect} command}
  40.564 -
  40.565 -Here's an example of \hgcmd{bisect} in action.
  40.566 -
  40.567 -\begin{note}
  40.568 -  In versions 0.9.5 and earlier of Mercurial, \hgcmd{bisect} was not a
  40.569 -  core command: it was distributed with Mercurial as an extension.
  40.570 -  This section describes the built-in command, not the old extension.
  40.571 -\end{note}
  40.572 -
  40.573 -Now let's create a repository, so that we can try out the
  40.574 -\hgcmd{bisect} command in isolation.
  40.575 -\interaction{bisect.init}
  40.576 -We'll simulate a project that has a bug in it in a simple-minded way:
  40.577 -create trivial changes in a loop, and nominate one specific change
  40.578 -that will have the ``bug''.  This loop creates 35 changesets, each
  40.579 -adding a single file to the repository.  We'll represent our ``bug''
  40.580 -with a file that contains the text ``i have a gub''.
  40.581 -\interaction{bisect.commits}
  40.582 -
  40.583 -The next thing that we'd like to do is figure out how to use the
  40.584 -\hgcmd{bisect} command.  We can use Mercurial's normal built-in help
  40.585 -mechanism for this.
  40.586 -\interaction{bisect.help}
  40.587 -
  40.588 -The \hgcmd{bisect} command works in steps.  Each step proceeds as follows.
  40.589 -\begin{enumerate}
  40.590 -\item You run your binary test.
  40.591 -  \begin{itemize}
  40.592 -  \item If the test succeeded, you tell \hgcmd{bisect} by running the
  40.593 -    \hgcmdargs{bisect}{--good} command.
  40.594 -  \item If it failed, run the \hgcmdargs{bisect}{--bad} command.
  40.595 -  \end{itemize}
  40.596 -\item The command uses your information to decide which changeset to
  40.597 -  test next.
  40.598 -\item It updates the working directory to that changeset, and the
  40.599 -  process begins again.
  40.600 -\end{enumerate}
  40.601 -The process ends when \hgcmd{bisect} identifies a unique changeset
  40.602 -that marks the point where your test transitioned from ``succeeding''
  40.603 -to ``failing''.
  40.604 -
  40.605 -To start the search, we must run the \hgcmdargs{bisect}{--reset} command.
  40.606 -\interaction{bisect.search.init}
  40.607 -
  40.608 -In our case, the binary test we use is simple: we check to see if any
  40.609 -file in the repository contains the string ``i have a gub''.  If it
  40.610 -does, this changeset contains the change that ``caused the bug''.  By
  40.611 -convention, a changeset that has the property we're searching for is
  40.612 -``bad'', while one that doesn't is ``good''.
  40.613 -
  40.614 -Most of the time, the revision to which the working directory is
  40.615 -synced (usually the tip) already exhibits the problem introduced by
  40.616 -the buggy change, so we'll mark it as ``bad''.
  40.617 -\interaction{bisect.search.bad-init}
  40.618 -
  40.619 -Our next task is to nominate a changeset that we know \emph{doesn't}
  40.620 -have the bug; the \hgcmd{bisect} command will ``bracket'' its search
  40.621 -between the first pair of good and bad changesets.  In our case, we
  40.622 -know that revision~10 didn't have the bug.  (I'll have more words
  40.623 -about choosing the first ``good'' changeset later.)
  40.624 -\interaction{bisect.search.good-init}
  40.625 -
  40.626 -Notice that this command printed some output.
  40.627 -\begin{itemize}
  40.628 -\item It told us how many changesets it must consider before it can
  40.629 -  identify the one that introduced the bug, and how many tests that
  40.630 -  will require.
  40.631 -\item It updated the working directory to the next changeset to test,
  40.632 -  and told us which changeset it's testing.
  40.633 -\end{itemize}
  40.634 -
  40.635 -We now run our test in the working directory.  We use the
  40.636 -\command{grep} command to see if our ``bad'' file is present in the
  40.637 -working directory.  If it is, this revision is bad; if not, this
  40.638 -revision is good.
  40.639 -\interaction{bisect.search.step1}
  40.640 -
  40.641 -This test looks like a perfect candidate for automation, so let's turn
  40.642 -it into a shell function.
  40.643 -\interaction{bisect.search.mytest}
  40.644 -We can now run an entire test step with a single command,
  40.645 -\texttt{mytest}.
  40.646 -\interaction{bisect.search.step2}
  40.647 -A few more invocations of our canned test step command, and we're
  40.648 -done.
  40.649 -\interaction{bisect.search.rest}
  40.650 -
  40.651 -Even though we had~35 changesets to search through, the \hgcmd{bisect}
  40.652 -command let us find the changeset that introduced our ``bug'' with
  40.653 -only five tests.  Because the number of tests that the \hgcmd{bisect}
  40.654 -command performs grows logarithmically with the number of changesets to
  40.655 -search, the advantage that it has over the ``brute force'' search
  40.656 -approach increases with every changeset you add.
  40.657 -
  40.658 -\subsection{Cleaning up after your search}
  40.659 -
  40.660 -When you're finished using the \hgcmd{bisect} command in a
  40.661 -repository, you can use the \hgcmdargs{bisect}{--reset} command to drop
  40.662 -the information it was using to drive your search.  The command
  40.663 -doesn't use much space, so it doesn't matter if you forget to run this
  40.664 -command.  However, \hgcmd{bisect} won't let you start a new search in
  40.665 -that repository until you do a \hgcmdargs{bisect}{--reset}.
  40.666 -\interaction{bisect.search.reset}
  40.667 -
  40.668 -\section{Tips for finding bugs effectively}
  40.669 -
  40.670 -\subsection{Give consistent input}
  40.671 -
  40.672 -The \hgcmd{bisect} command requires that you correctly report the
  40.673 -result of every test you perform.  If you tell it that a test failed
  40.674 -when it really succeeded, it \emph{might} be able to detect the
  40.675 -inconsistency.  If it can identify an inconsistency in your reports,
  40.676 -it will tell you that a particular changeset is both good and bad.
  40.677 -However, it can't do this perfectly; it's about as likely to report
  40.678 -the wrong changeset as the source of the bug.
  40.679 -
  40.680 -\subsection{Automate as much as possible}
  40.681 -
  40.682 -When I started using the \hgcmd{bisect} command, I tried a few times
  40.683 -to run my tests by hand, on the command line.  This is an approach
  40.684 -that I, at least, am not suited to.  After a few tries, I found that I
  40.685 -was making enough mistakes that I was having to restart my searches
  40.686 -several times before finally getting correct results.
  40.687 -
  40.688 -My initial problems with driving the \hgcmd{bisect} command by hand
  40.689 -occurred even with simple searches on small repositories; if the
  40.690 -problem you're looking for is more subtle, or the number of tests that
  40.691 -\hgcmd{bisect} must perform increases, the likelihood of operator
  40.692 -error ruining the search is much higher.  Once I started automating my
  40.693 -tests, I had much better results.
  40.694 -
  40.695 -The key to automated testing is twofold:
  40.696 -\begin{itemize}
  40.697 -\item always test for the same symptom, and
  40.698 -\item always feed consistent input to the \hgcmd{bisect} command.
  40.699 -\end{itemize}
  40.700 -In my tutorial example above, the \command{grep} command tests for the
  40.701 -symptom, and the \texttt{if} statement takes the result of this check
  40.702 -and ensures that we always feed the same input to the \hgcmd{bisect}
  40.703 -command.  The \texttt{mytest} function marries these together in a
  40.704 -reproducible way, so that every test is uniform and consistent.
  40.705 -
  40.706 -\subsection{Check your results}
  40.707 -
  40.708 -Because the output of a \hgcmd{bisect} search is only as good as the
  40.709 -input you give it, don't take the changeset it reports as the
  40.710 -absolute truth.  A simple way to cross-check its report is to manually
  40.711 -run your test at each of the following changesets:
  40.712 -\begin{itemize}
  40.713 -\item The changeset that it reports as the first bad revision.  Your
  40.714 -  test should still report this as bad.
  40.715 -\item The parent of that changeset (either parent, if it's a merge).
  40.716 -  Your test should report this changeset as good.
  40.717 -\item A child of that changeset.  Your test should report this
  40.718 -  changeset as bad.
  40.719 -\end{itemize}
  40.720 -
  40.721 -\subsection{Beware interference between bugs}
  40.722 -
  40.723 -It's possible that your search for one bug could be disrupted by the
  40.724 -presence of another.  For example, let's say your software crashes at
  40.725 -revision 100, and worked correctly at revision 50.  Unknown to you,
  40.726 -someone else introduced a different crashing bug at revision 60, and
  40.727 -fixed it at revision 80.  This could distort your results in one of
  40.728 -several ways.
  40.729 -
  40.730 -It is possible that this other bug completely ``masks'' yours, which
  40.731 -is to say that it occurs before your bug has a chance to manifest
  40.732 -itself.  If you can't avoid that other bug (for example, it prevents
  40.733 -your project from building), and so can't tell whether your bug is
  40.734 -present in a particular changeset, the \hgcmd{bisect} command cannot
  40.735 -help you directly.  Instead, you can mark a changeset as untested by
  40.736 -running \hgcmdargs{bisect}{--skip}.
  40.737 -
  40.738 -A different problem could arise if your test for a bug's presence is
  40.739 -not specific enough.  If you check for ``my program crashes'', then
  40.740 -both your crashing bug and an unrelated crashing bug that masks it
  40.741 -will look like the same thing, and mislead \hgcmd{bisect}.
  40.742 -
  40.743 -Another useful situation in which to use \hgcmdargs{bisect}{--skip} is
  40.744 -if you can't test a revision because your project was in a broken and
  40.745 -hence untestable state at that revision, perhaps because someone
  40.746 -checked in a change that prevented the project from building.
  40.747 -
  40.748 -\subsection{Bracket your search lazily}
  40.749 -
  40.750 -Choosing the first ``good'' and ``bad'' changesets that will mark the
  40.751 -end points of your search is often easy, but it bears a little
  40.752 -discussion nevertheless.  From the perspective of \hgcmd{bisect}, the
  40.753 -``newest'' changeset is conventionally ``bad'', and the older
  40.754 -changeset is ``good''.
  40.755 -
  40.756 -If you're having trouble remembering when a suitable ``good'' change
  40.757 -was, so that you can tell \hgcmd{bisect}, you could do worse than
  40.758 -testing changesets at random.  Just remember to eliminate contenders
  40.759 -that can't possibly exhibit the bug (perhaps because the feature with
  40.760 -the bug isn't present yet) and those where another problem masks the
  40.761 -bug (as I discussed above).
  40.762 -
  40.763 -Even if you end up ``early'' by thousands of changesets or months of
  40.764 -history, you will only add a handful of tests to the total number that
  40.765 -\hgcmd{bisect} must perform, thanks to its logarithmic behaviour.
  40.766 -
  40.767 -%%% Local Variables: 
  40.768 -%%% mode: latex
  40.769 -%%% TeX-master: "00book"
  40.770 -%%% End: