hgbook
changeset 550:5cd47f721686
Rename LaTeX input files to have numeric prefixes
author | Bryan O'Sullivan <bos@serpentine.com> |
---|---|
date | Thu Jan 29 22:56:27 2009 -0800 (2009-01-29) |
parents | bc14f94e726a |
children | f72b7e6cbe90 |
files | en/00book.tex en/Makefile en/appA-cmdref.tex en/appB-mq-ref.tex en/appC-srcinstall.tex en/appD-license.tex en/branch.tex en/ch00-preface.tex en/ch01-intro.tex en/ch02-tour-basic.tex en/ch03-tour-merge.tex en/ch04-concepts.tex en/ch05-daily.tex en/ch06-collab.tex en/ch07-filenames.tex en/ch08-branch.tex en/ch09-undo.tex en/ch10-hook.tex en/ch11-template.tex en/ch12-mq.tex en/ch13-mq-collab.tex en/ch14-hgext.tex en/cmdref.tex en/collab.tex en/concepts.tex en/daily.tex en/filenames.tex en/hgext.tex en/hook.tex en/intro.tex en/license.tex en/mq-collab.tex en/mq-ref.tex en/mq.tex en/preface.tex en/srcinstall.tex en/template.tex en/tour-basic.tex en/tour-merge.tex en/undo.tex |
line diff
--- a/en/00book.tex Thu Jan 29 22:47:34 2009 -0800
+++ b/en/00book.tex Thu Jan 29 22:56:27 2009 -0800
@@ -40,27 +40,27 @@

 \pagenumbering{arabic}

-\include{preface}
-\include{intro}
-\include{tour-basic}
-\include{tour-merge}
-\include{concepts}
-\include{daily}
-\include{collab}
-\include{filenames}
-\include{branch}
-\include{undo}
-\include{hook}
-\include{template}
-\include{mq}
-\include{mq-collab}
-\include{hgext}
+\include{ch00-preface}
+\include{ch01-intro}
+\include{ch02-tour-basic}
+\include{ch03-tour-merge}
+\include{ch04-concepts}
+\include{ch05-daily}
+\include{ch06-collab}
+\include{ch07-filenames}
+\include{ch08-branch}
+\include{ch09-undo}
+\include{ch10-hook}
+\include{ch11-template}
+\include{ch12-mq}
+\include{ch13-mq-collab}
+\include{ch14-hgext}

 \appendix
-\include{cmdref}
-\include{mq-ref}
-\include{srcinstall}
-\include{license}
+\include{appA-cmdref}
+\include{appB-mq-ref}
+\include{appC-srcinstall}
+\include{appD-license}
 \addcontentsline{toc}{chapter}{Bibliography}
 \bibliographystyle{alpha}
 \bibliography{99book}
--- a/en/Makefile Thu Jan 29 22:47:34 2009 -0800
+++ b/en/Makefile Thu Jan 29 22:56:27 2009 -0800
@@ -4,26 +4,8 @@
 	00book.tex \
 	99book.bib \
 	99defs.tex \
-	build_id.tex \
-	branch.tex \
-	cmdref.tex \
-	collab.tex \
-	concepts.tex \
-	daily.tex \
-	filenames.tex \
-	hg_id.tex \
-	hgext.tex \
-	hook.tex \
-	intro.tex \
-	mq.tex \
-	mq-collab.tex \
-	mq-ref.tex \
-	preface.tex \
-	srcinstall.tex \
-	template.tex \
-	tour-basic.tex \
-	tour-merge.tex \
-	undo.tex
+	app*.tex \
+	ch*.tex

 image-sources := \
 	feature-branches.dot \
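The Makefile hunk above swaps an explicit list of sources for the two wildcard patterns `app*.tex` and `ch*.tex`. The following Python sketch is not part of the changeset; it is only a quick sanity check, under the assumption that `fnmatch`-style matching is a fair approximation of how make expands these patterns. The names `PATTERNS`, `NEW_NAMES`, `OLD_NAMES`, and `covered` are hypothetical. It checks that every renamed file from the changeset's file list matches one of the patterns, and that none of the names removed from the Makefile do.

```python
from fnmatch import fnmatchcase

# The two patterns introduced by the Makefile hunk.
PATTERNS = ["app*.tex", "ch*.tex"]

# New file names, copied from the changeset's "files" row.
NEW_NAMES = [
    "appA-cmdref.tex", "appB-mq-ref.tex", "appC-srcinstall.tex",
    "appD-license.tex", "ch00-preface.tex", "ch01-intro.tex",
    "ch02-tour-basic.tex", "ch03-tour-merge.tex", "ch04-concepts.tex",
    "ch05-daily.tex", "ch06-collab.tex", "ch07-filenames.tex",
    "ch08-branch.tex", "ch09-undo.tex", "ch10-hook.tex",
    "ch11-template.tex", "ch12-mq.tex", "ch13-mq-collab.tex",
    "ch14-hgext.tex",
]

# Names on the lines that the Makefile hunk removes.
OLD_NAMES = [
    "build_id.tex", "branch.tex", "cmdref.tex", "collab.tex",
    "concepts.tex", "daily.tex", "filenames.tex", "hg_id.tex",
    "hgext.tex", "hook.tex", "intro.tex", "mq.tex", "mq-collab.tex",
    "mq-ref.tex", "preface.tex", "srcinstall.tex", "template.tex",
    "tour-basic.tex", "tour-merge.tex", "undo.tex",
]


def covered(name):
    """Return True if name matches at least one wildcard pattern."""
    return any(fnmatchcase(name, pattern) for pattern in PATTERNS)


# Renamed files that the patterns would miss (expected to be empty).
uncovered_new = [name for name in NEW_NAMES if not covered(name)]
# Removed names that the patterns would still pick up (expected to be empty).
matched_old = [name for name in OLD_NAMES if covered(name)]

print("new names not covered:", uncovered_new)
print("old names still matched:", matched_old)
```

One consequence visible in this check: `build_id.tex` and `hg_id.tex`, which were in the old list, match neither pattern.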
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/en/appA-cmdref.tex Thu Jan 29 22:56:27 2009 -0800
@@ -0,0 +1,176 @@
+\chapter{Command reference}
+\label{cmdref}
+
+\cmdref{add}{add files at the next commit}
+\optref{add}{I}{include}
+\optref{add}{X}{exclude}
+\optref{add}{n}{dry-run}
+
+\cmdref{diff}{print changes in history or working directory}
+
+Show differences between revisions for the specified files or
+directories, using the unified diff format. For a description of the
+unified diff format, see section~\ref{sec:mq:patch}.
+
+By default, this command does not print diffs for files that Mercurial
+considers to contain binary data. To control this behaviour, see the
+\hgopt{diff}{-a} and \hgopt{diff}{--git} options.
+
+\subsection{Options}
+
+\loptref{diff}{nodates}
+
+Omit date and time information when printing diff headers.
+
+\optref{diff}{B}{ignore-blank-lines}
+
+Do not print changes that only insert or delete blank lines. A line
+that contains only whitespace is not considered blank.
+
+\optref{diff}{I}{include}
+
+Include files and directories whose names match the given patterns.
+
+\optref{diff}{X}{exclude}
+
+Exclude files and directories whose names match the given patterns.
+
+\optref{diff}{a}{text}
+
+If this option is not specified, \hgcmd{diff} will refuse to print
+diffs for files that it detects as binary. Specifying \hgopt{diff}{-a}
+forces \hgcmd{diff} to treat all files as text, and generate diffs for
+all of them.
+
+This option is useful for files that are ``mostly text'' but have a
+few embedded NUL characters. If you use it on files that contain a
+lot of binary data, its output will be incomprehensible.
+
+\optref{diff}{b}{ignore-space-change}
+
+Do not print a line if the only change to that line is in the amount
+of white space it contains.
+
+\optref{diff}{g}{git}
+
+Print \command{git}-compatible diffs. XXX reference a format
+description.
+
+\optref{diff}{p}{show-function}
+
+Display the name of the enclosing function in a hunk header, using a
+simple heuristic. This functionality is enabled by default, so the
+\hgopt{diff}{-p} option has no effect unless you change the value of
+the \rcitem{diff}{showfunc} config item, as in the following example.
+\interaction{cmdref.diff-p}
+
+\optref{diff}{r}{rev}
+
+Specify one or more revisions to compare. The \hgcmd{diff} command
+accepts up to two \hgopt{diff}{-r} options to specify the revisions to
+compare.
+
+\begin{enumerate}
+\setcounter{enumi}{0}
+\item Display the differences between the parent revision of the
+  working directory and the working directory.
+\item Display the differences between the specified changeset and the
+  working directory.
+\item Display the differences between the two specified changesets.
+\end{enumerate}
+
+You can specify two revisions using either two \hgopt{diff}{-r}
+options or revision range notation. For example, the two revision
+specifications below are equivalent.
+\begin{codesample2}
+  hg diff -r 10 -r 20
+  hg diff -r10:20
+\end{codesample2}
+
+When you provide two revisions, Mercurial treats the order of those
+revisions as significant. Thus, \hgcmdargs{diff}{-r10:20} will
+produce a diff that will transform files from their contents as of
+revision~10 to their contents as of revision~20, while
+\hgcmdargs{diff}{-r20:10} means the opposite: the diff that will
+transform files from their revision~20 contents to their revision~10
+contents. You cannot reverse the ordering in this way if you are
+diffing against the working directory.
+
+\optref{diff}{w}{ignore-all-space}
+
+\cmdref{version}{print version and copyright information}
+
+This command displays the version of Mercurial you are running, and
+its copyright license. There are four kinds of version string that
+you may see.
+\begin{itemize}
+\item The string ``\texttt{unknown}''. This version of Mercurial was
+  not built in a Mercurial repository, and cannot determine its own
+  version.
+\item A short numeric string, such as ``\texttt{1.1}''. This is a
+  build of a revision of Mercurial that was identified by a specific
+  tag in the repository where it was built. (This doesn't necessarily
+  mean that you're running an official release; someone else could
+  have added that tag to any revision in the repository where they
+  built Mercurial.)
+\item A hexadecimal string, such as ``\texttt{875489e31abe}''. This
+  is a build of the given revision of Mercurial.
+\item A hexadecimal string followed by a date, such as
+  ``\texttt{875489e31abe+20070205}''. This is a build of the given
+  revision of Mercurial, where the build repository contained some
+  local changes that had not been committed.
+\end{itemize}
+
+\subsection{Tips and tricks}
+
+\subsubsection{Why do the results of \hgcmd{diff} and \hgcmd{status}
+  differ?}
+\label{cmdref:diff-vs-status}
+
+When you run the \hgcmd{status} command, you'll see a list of files
+that Mercurial will record changes for the next time you perform a
+commit. If you run the \hgcmd{diff} command, you may notice that it
+prints diffs for only a \emph{subset} of the files that \hgcmd{status}
+listed. There are two possible reasons for this.
+
+The first is that \hgcmd{status} prints some kinds of modifications
+that \hgcmd{diff} doesn't normally display. The \hgcmd{diff} command
+normally outputs unified diffs, which don't have the ability to
+represent some changes that Mercurial can track. Most notably,
+traditional diffs can't represent a change in whether or not a file is
+executable, but Mercurial records this information.
+
+If you use the \hgopt{diff}{--git} option to \hgcmd{diff}, it will
+display \command{git}-compatible diffs that \emph{can} display this
+extra information.
+
+The second possible reason that \hgcmd{diff} might be printing diffs
+for a subset of the files displayed by \hgcmd{status} is that if you
+invoke it without any arguments, \hgcmd{diff} prints diffs against the
+first parent of the working directory. If you have run \hgcmd{merge}
+to merge two changesets, but you haven't yet committed the results of
+the merge, your working directory has two parents (use \hgcmd{parents}
+to see them). While \hgcmd{status} prints modifications relative to
+\emph{both} parents after an uncommitted merge, \hgcmd{diff} still
+operates relative only to the first parent. You can get it to print
+diffs relative to the second parent by specifying that parent with the
+\hgopt{diff}{-r} option. There is no way to print diffs relative to
+both parents.
+
+\subsubsection{Generating safe binary diffs}
+
+If you use the \hgopt{diff}{-a} option to force Mercurial to print
+diffs of files that are either ``mostly text'' or contain lots of
+binary data, those diffs cannot subsequently be applied by either
+Mercurial's \hgcmd{import} command or the system's \command{patch}
+command.
+
+If you want to generate a diff of a binary file that is safe to use as
+input for \hgcmd{import}, use the \hgopt{diff}{--git} option when you
+generate the patch. The system \command{patch} command cannot handle
+binary patches at all.
+
+%%% Local Variables:
+%%% mode: latex
+%%% TeX-master: "00book"
+%%% End:
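The \hgcmd{version} entry in the appendix above lists four kinds of version string. As an illustration only, the Python sketch below sorts a string into one of those four kinds. It is not how Mercurial itself builds or parses version strings. The function name `classify_version` is hypothetical, and two assumptions are made that the text does not state: a revision hash is exactly 12 lowercase hexadecimal digits (as in the example `875489e31abe`), and anything that is neither `unknown` nor a hash is treated as a tag name.

```python
import re

# Assumption: 12 lowercase hex digits, matching the example in the text.
HEX_REVISION = r"[0-9a-f]{12}"


def classify_version(version):
    """Sort a version string into one of the four kinds described above."""
    if version == "unknown":
        return "unknown"
    # Hash followed by "+" and an 8-digit date: build with local changes.
    if re.fullmatch(HEX_REVISION + r"\+\d{8}", version):
        return "revision with uncommitted changes"
    # Bare hash: build of exactly that revision.
    if re.fullmatch(HEX_REVISION, version):
        return "revision"
    # Assumption: everything else is a tag name such as "1.1".
    return "tag"


for sample in ["unknown", "1.1", "875489e31abe", "875489e31abe+20070205"]:
    print(sample, "->", classify_version(sample))
```

The checks run from most to least specific, so a hash with a date suffix is never mistaken for a tag.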
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/en/appB-mq-ref.tex Thu Jan 29 22:56:27 2009 -0800
@@ -0,0 +1,349 @@
+\chapter{Mercurial Queues reference}
+\label{chap:mqref}
+
+\section{MQ command reference}
+\label{sec:mqref:cmdref}
+
+For an overview of the commands provided by MQ, use the command
+\hgcmdargs{help}{mq}.
+
+\subsection{\hgxcmd{mq}{qapplied}---print applied patches}
+
+The \hgxcmd{mq}{qapplied} command prints the current stack of applied
+patches. Patches are printed in oldest-to-newest order, so the last
+patch in the list is the ``top'' patch.
+
+\subsection{\hgxcmd{mq}{qcommit}---commit changes in the queue repository}
+
+The \hgxcmd{mq}{qcommit} command commits any outstanding changes in the
+\sdirname{.hg/patches} repository. This command only works if the
+\sdirname{.hg/patches} directory is a repository, i.e.~you created the
+directory using \hgcmdargs{qinit}{\hgxopt{mq}{qinit}{-c}} or ran
+\hgcmd{init} in the directory after running \hgxcmd{mq}{qinit}.
+
+This command is shorthand for \hgcmdargs{commit}{--cwd .hg/patches}.
+
+\subsection{\hgxcmd{mq}{qdelete}---delete a patch from the
+  \sfilename{series} file}
+
+The \hgxcmd{mq}{qdelete} command removes the entry for a patch from the
+\sfilename{series} file in the \sdirname{.hg/patches} directory. It
+does not pop the patch if the patch is already applied. By default,
+it does not delete the patch file; use the \hgxopt{mq}{qdel}{-f} option to
+do that.
+
+Options:
+\begin{itemize}
+\item[\hgxopt{mq}{qdel}{-f}] Delete the patch file.
+\end{itemize}
+
+\subsection{\hgxcmd{mq}{qdiff}---print a diff of the topmost applied patch}
+
+The \hgxcmd{mq}{qdiff} command prints a diff of the topmost applied patch.
+It is equivalent to \hgcmdargs{diff}{-r-2:-1}.
+
+\subsection{\hgxcmd{mq}{qfold}---merge (``fold'') several patches into one}
+
+The \hgxcmd{mq}{qfold} command merges multiple patches into the topmost
+applied patch, so that the topmost applied patch makes the union of
+all of the changes in the patches in question.
+
+The patches to fold must not be applied; \hgxcmd{mq}{qfold} will exit with
+an error if any is. The order in which patches are folded is
+significant; \hgcmdargs{qfold}{a b} means ``apply the current topmost
+patch, followed by \texttt{a}, followed by \texttt{b}''.
+
+The comments from the folded patches are appended to the comments of
+the destination patch, with each block of comments separated by three
+asterisk (``\texttt{*}'') characters. Use the \hgxopt{mq}{qfold}{-e}
+option to edit the commit message for the combined patch/changeset
+after the folding has completed.
+
+Options:
+\begin{itemize}
+\item[\hgxopt{mq}{qfold}{-e}] Edit the commit message and patch description
+  for the newly folded patch.
+\item[\hgxopt{mq}{qfold}{-l}] Use the contents of the given file as the new
+  commit message and patch description for the folded patch.
+\item[\hgxopt{mq}{qfold}{-m}] Use the given text as the new commit message
+  and patch description for the folded patch.
+\end{itemize}
+
+\subsection{\hgxcmd{mq}{qheader}---display the header/description of a patch}
+
+The \hgxcmd{mq}{qheader} command prints the header, or description, of a
+patch. By default, it prints the header of the topmost applied patch.
+Given an argument, it prints the header of the named patch.
+
+\subsection{\hgxcmd{mq}{qimport}---import a third-party patch into the queue}
+
+The \hgxcmd{mq}{qimport} command adds an entry for an external patch to the
+\sfilename{series} file, and copies the patch into the
+\sdirname{.hg/patches} directory. It adds the entry immediately after
+the topmost applied patch, but does not push the patch.
+
+If the \sdirname{.hg/patches} directory is a repository,
+\hgxcmd{mq}{qimport} automatically does an \hgcmd{add} of the imported
+patch.
+
+\subsection{\hgxcmd{mq}{qinit}---prepare a repository to work with MQ}
+
+The \hgxcmd{mq}{qinit} command prepares a repository to work with MQ. It
+creates a directory called \sdirname{.hg/patches}.
+
+Options:
+\begin{itemize}
+\item[\hgxopt{mq}{qinit}{-c}] Create \sdirname{.hg/patches} as a repository
+  in its own right. Also creates a \sfilename{.hgignore} file that
+  will ignore the \sfilename{status} file.
+\end{itemize}
+
+When the \sdirname{.hg/patches} directory is a repository, the
+\hgxcmd{mq}{qimport} and \hgxcmd{mq}{qnew} commands automatically \hgcmd{add}
+new patches.
+
+\subsection{\hgxcmd{mq}{qnew}---create a new patch}
+
+The \hgxcmd{mq}{qnew} command creates a new patch. It takes one mandatory
+argument, the name to use for the patch file. The newly created patch
+is created empty by default. It is added to the \sfilename{series}
+file after the current topmost applied patch, and is immediately
+pushed on top of that patch.
+
+If \hgxcmd{mq}{qnew} finds modified files in the working directory, it will
+refuse to create a new patch unless the \hgxopt{mq}{qnew}{-f} option is
+used (see below). This behaviour allows you to \hgxcmd{mq}{qrefresh} your
+topmost applied patch before you apply a new patch on top of it.
+
+Options:
+\begin{itemize}
+\item[\hgxopt{mq}{qnew}{-f}] Create a new patch if the contents of the
+  working directory are modified. Any outstanding modifications are
+  added to the newly created patch, so after this command completes,
+  the working directory will no longer be modified.
+\item[\hgxopt{mq}{qnew}{-m}] Use the given text as the commit message.
+  This text will be stored at the beginning of the patch file, before
+  the patch data.
+\end{itemize}
+
+\subsection{\hgxcmd{mq}{qnext}---print the name of the next patch}
+
+The \hgxcmd{mq}{qnext} command prints the name of the next patch in
+the \sfilename{series} file after the topmost applied patch. This
+patch will become the topmost applied patch if you run \hgxcmd{mq}{qpush}.
+
+\subsection{\hgxcmd{mq}{qpop}---pop patches off the stack}
+
+The \hgxcmd{mq}{qpop} command removes applied patches from the top of the
+stack of applied patches. By default, it removes only one patch.
+
+This command removes the changesets that represent the popped patches
+from the repository, and updates the working directory to undo the
+effects of the patches.
+
+This command takes an optional argument, which it uses as the name or
+index of the patch to pop to. If given a name, it will pop patches
+until the named patch is the topmost applied patch. If given a
+number, \hgxcmd{mq}{qpop} treats the number as an index into the entries in
+the series file, counting from zero (empty lines and lines containing
+only comments do not count). It pops patches until the patch
+identified by the given index is the topmost applied patch.
+
+The \hgxcmd{mq}{qpop} command does not read or write patches or the
+\sfilename{series} file. It is thus safe to \hgxcmd{mq}{qpop} a patch that
+you have removed from the \sfilename{series} file, or a patch that you
+have renamed or deleted entirely. In the latter two cases, use the
+name of the patch as it was when you applied it.
+
+By default, the \hgxcmd{mq}{qpop} command will not pop any patches if the
+working directory has been modified. You can override this behaviour
+using the \hgxopt{mq}{qpop}{-f} option, which reverts all modifications in
+the working directory.
+
+Options:
+\begin{itemize}
+\item[\hgxopt{mq}{qpop}{-a}] Pop all applied patches. This returns the
+  repository to its state before you applied any patches.
+\item[\hgxopt{mq}{qpop}{-f}] Forcibly revert any modifications to the
+  working directory when popping.
+\item[\hgxopt{mq}{qpop}{-n}] Pop a patch from the named queue.
+\end{itemize}
+
+The \hgxcmd{mq}{qpop} command removes one line from the end of the
+\sfilename{status} file for each patch that it pops.
+
+\subsection{\hgxcmd{mq}{qprev}---print the name of the previous patch}
+
+The \hgxcmd{mq}{qprev} command prints the name of the patch in the
+\sfilename{series} file that comes before the topmost applied patch.
+This will become the topmost applied patch if you run \hgxcmd{mq}{qpop}.
+
+\subsection{\hgxcmd{mq}{qpush}---push patches onto the stack}
+\label{sec:mqref:cmd:qpush}
+
+The \hgxcmd{mq}{qpush} command adds patches onto the applied stack. By
+default, it adds only one patch.
+
+This command creates a new changeset to represent each applied patch,
+and updates the working directory to apply the effects of the patches.
+
+The default data used when creating a changeset are as follows:
+\begin{itemize}
+\item The commit date and time zone are the current date and time
+  zone. Because these data are used to compute the identity of a
+  changeset, this means that if you \hgxcmd{mq}{qpop} a patch and
+  \hgxcmd{mq}{qpush} it again, the changeset that you push will have a
+  different identity than the changeset you popped.
+\item The author is the same as the default used by the \hgcmd{commit}
+  command.
+\item The commit message is any text from the patch file that comes
+  before the first diff header. If there is no such text, a default
+  commit message is used that identifies the name of the patch.
+\end{itemize}
+If a patch contains a Mercurial patch header (XXX add link), the
+information in the patch header overrides these defaults.
+
+Options:
+\begin{itemize}
+\item[\hgxopt{mq}{qpush}{-a}] Push all unapplied patches from the
+  \sfilename{series} file until there are none left to push.
+\item[\hgxopt{mq}{qpush}{-l}] Add the name of the patch to the end
+  of the commit message.
+\item[\hgxopt{mq}{qpush}{-m}] If a patch fails to apply cleanly, use the
+  entry for the patch in another saved queue to compute the parameters
+  for a three-way merge, and perform a three-way merge using the
+  normal Mercurial merge machinery. Use the resolution of the merge
+  as the new patch content.
+\item[\hgxopt{mq}{qpush}{-n}] Use the named queue if merging while pushing.
+\end{itemize}
+
+The \hgxcmd{mq}{qpush} command reads, but does not modify, the
+\sfilename{series} file. It appends one line to the \sfilename{status}
+file for each patch that it pushes.
+
+\subsection{\hgxcmd{mq}{qrefresh}---update the topmost applied patch}
+
+The \hgxcmd{mq}{qrefresh} command updates the topmost applied patch. It
+modifies the patch, removes the old changeset that represented the
+patch, and creates a new changeset to represent the modified patch.
+
+The \hgxcmd{mq}{qrefresh} command looks for the following modifications:
+\begin{itemize}
+\item Changes to the commit message, i.e.~the text before the first
+  diff header in the patch file, are reflected in the new changeset
+  that represents the patch.
+\item Modifications to tracked files in the working directory are
+  added to the patch.
+\item Changes to the files tracked using \hgcmd{add}, \hgcmd{copy},
+  \hgcmd{remove}, or \hgcmd{rename}. Added files and copy and rename
+  destinations are added to the patch, while removed files and rename
+  sources are removed.
+\end{itemize}
+
+Even if \hgxcmd{mq}{qrefresh} detects no changes, it still recreates the
+changeset that represents the patch. This causes the identity of the
+changeset to differ from the previous changeset that identified the
+patch.
+
+Options:
+\begin{itemize}
+\item[\hgxopt{mq}{qrefresh}{-e}] Modify the commit message and patch
+  description, using the preferred text editor.
+\item[\hgxopt{mq}{qrefresh}{-m}] Modify the commit message and patch
+  description, using the given text.
+\item[\hgxopt{mq}{qrefresh}{-l}] Modify the commit message and patch
+  description, using text from the given file.
+\end{itemize}
+
+\subsection{\hgxcmd{mq}{qrename}---rename a patch}
+
+The \hgxcmd{mq}{qrename} command renames a patch, and changes the entry for
+the patch in the \sfilename{series} file.
+
+With a single argument, \hgxcmd{mq}{qrename} renames the topmost applied
+patch. With two arguments, it renames its first argument to its
+second.
+
+\subsection{\hgxcmd{mq}{qrestore}---restore saved queue state}
+
+XXX No idea what this does.
+
+\subsection{\hgxcmd{mq}{qsave}---save current queue state}
+
+XXX Likewise.
+
+\subsection{\hgxcmd{mq}{qseries}---print the entire patch series}
+
+The \hgxcmd{mq}{qseries} command prints the entire patch series from the
+\sfilename{series} file. It prints only patch names, not empty lines
+or comments. It prints in order from first to be applied to last.
+
+\subsection{\hgxcmd{mq}{qtop}---print the name of the current patch}
+
+The \hgxcmd{mq}{qtop} command prints the name of the topmost currently
+applied patch.
+
+\subsection{\hgxcmd{mq}{qunapplied}---print patches not yet applied}
+
+The \hgxcmd{mq}{qunapplied} command prints the names of patches from the
+\sfilename{series} file that are not yet applied. It prints them in
+order from the next patch that will be pushed to the last.
+
+\subsection{\hgcmd{strip}---remove a revision and descendants}
+
+The \hgcmd{strip} command removes a revision, and all of its
+descendants, from the repository. It undoes the effects of the
+removed revisions from the repository, and updates the working
+directory to the first parent of the removed revision.
+
+The \hgcmd{strip} command saves a backup of the removed changesets in
+a bundle, so that they can be reapplied if removed in error.
+
+Options:
+\begin{itemize}
+\item[\hgopt{strip}{-b}] Save unrelated changesets that are intermixed
+  with the stripped changesets in the backup bundle.
+\item[\hgopt{strip}{-f}] If a branch has multiple heads, remove all
+  heads. XXX This should be renamed, and use \texttt{-f} to strip revs
+  when there are pending changes.
+\item[\hgopt{strip}{-n}] Do not save a backup bundle.
+\end{itemize}
+
+\section{MQ file reference}
+
+\subsection{The \sfilename{series} file}
+
+The \sfilename{series} file contains a list of the names of all
+patches that MQ can apply. It is represented as a list of names, with
+one name saved per line. Leading and trailing white space in each
+line are ignored.
+
+Lines may contain comments. A comment begins with the ``\texttt{\#}''
+character, and extends to the end of the line. Empty lines, and lines
+that contain only comments, are ignored.
+
+You will often need to edit the \sfilename{series} file by hand, hence
+the support for comments and empty lines noted above. For example,
+you can comment out a patch temporarily, and \hgxcmd{mq}{qpush} will skip
+over that patch when applying patches. You can also change the order
+in which patches are applied by reordering their entries in the
+\sfilename{series} file.
+
+Placing the \sfilename{series} file under revision control is also
+supported; it is a good idea to place all of the patches that it
+refers to under revision control, as well. If you create a patch
+directory using the \hgxopt{mq}{qinit}{-c} option to \hgxcmd{mq}{qinit}, this
+will be done for you automatically.
+
+\subsection{The \sfilename{status} file}
+
+The \sfilename{status} file contains the names and changeset hashes of
+all patches that MQ currently has applied. Unlike the
+\sfilename{series} file, this file is not intended for editing. You
+should not place this file under revision control, or modify it in any
+way. It is used by MQ strictly for internal book-keeping.
+
+%%% Local Variables:
+%%% mode: latex
+%%% TeX-master: "00book"
+%%% End:
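The rules for the \sfilename{series} file given in the MQ file reference above are simple enough to sketch in a few lines of Python. The function `parse_series` and the sample patch names are hypothetical, and this is not MQ's own parser. It implements only the rules stated in the text: one name per line, surrounding white space ignored, `#` starts a comment that runs to the end of the line, and empty or comment-only lines are skipped. Any other meaning that MQ may attach to comments is not modelled.

```python
def parse_series(text):
    """Return the patch names in a series file, in application order."""
    names = []
    for line in text.splitlines():
        # Drop everything from the first "#" onward, then trim white space.
        entry = line.split("#", 1)[0].strip()
        # Empty lines and comment-only lines contribute no entry.
        if entry:
            names.append(entry)
    return names


# Hypothetical series file contents, for illustration only.
example = """\
# patches queued for review
fix-build.patch

  add-feature.patch   # still experimental
#broken.patch
"""

patches = parse_series(example)
print(patches)     # ['fix-build.patch', 'add-feature.patch']
print(patches[1])  # add-feature.patch
```

Because skipped lines produce no entry, an index into the returned list counts from zero over real patch entries only, which matches the indexing that the \hgxcmd{mq}{qpop} description gives for a numeric argument. The commented-out `broken.patch` is absent from the list, which corresponds to the text's remark that \hgxcmd{mq}{qpush} skips a patch that has been commented out.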
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/en/appC-srcinstall.tex Thu Jan 29 22:56:27 2009 -0800
@@ -0,0 +1,53 @@
+\chapter{Installing Mercurial from source}
+\label{chap:srcinstall}
+
+\section{On a Unix-like system}
+\label{sec:srcinstall:unixlike}
+
+If you are using a Unix-like system that has a sufficiently recent
+version of Python (2.3~or newer) available, it is easy to install
+Mercurial from source.
+\begin{enumerate}
+\item Download a recent source tarball from
+  \url{http://www.selenic.com/mercurial/download}.
+\item Unpack the tarball:
+  \begin{codesample4}
+    gzip -dc mercurial-\emph{version}.tar.gz | tar xf -
+  \end{codesample4}
+\item Go into the source directory and run the installer script. This
+  will build Mercurial and install it in your home directory.
+  \begin{codesample4}
+    cd mercurial-\emph{version}
+    python setup.py install --force --home=\$HOME
+  \end{codesample4}
+\end{enumerate}
+Once the install finishes, Mercurial will be in the \texttt{bin}
+subdirectory of your home directory. Don't forget to make sure that
+this directory is present in your shell's search path.
+
+You will probably need to set the \envar{PYTHONPATH} environment
+variable so that the Mercurial executable can find the rest of the
+Mercurial packages. For example, on my laptop, I have set it to
+\texttt{/home/bos/lib/python}. The exact path that you will need to
+use depends on how Python was built for your system, but should be
+easy to figure out. If you're uncertain, look through the output of
+the installer script above, and see where the contents of the
+\texttt{mercurial} directory were installed to.
+
+\section{On Windows}
+
+Building and installing Mercurial on Windows requires a variety of
+tools, a fair amount of technical knowledge, and considerable
+patience. I very much \emph{do not recommend} this route if you are a
+``casual user''. Unless you intend to hack on Mercurial, I strongly
+suggest that you use a binary package instead.
+
+If you are intent on building Mercurial from source on Windows, follow
+the ``hard way'' directions on the Mercurial wiki at
+\url{http://www.selenic.com/mercurial/wiki/index.cgi/WindowsInstall},
+and expect the process to involve a lot of fiddly work.
+
+%%% Local Variables:
+%%% mode: latex
+%%% TeX-master: "00book"
+%%% End:
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/en/appD-license.tex Thu Jan 29 22:56:27 2009 -0800
@@ -0,0 +1,138 @@
+\chapter{Open Publication License}
+\label{cha:opl}
+
+Version 1.0, 8 June 1999
+
+\section{Requirements on both unmodified and modified versions}
+
+The Open Publication works may be reproduced and distributed in whole
+or in part, in any medium physical or electronic, provided that the
+terms of this license are adhered to, and that this license or an
+incorporation of it by reference (with any options elected by the
+author(s) and/or publisher) is displayed in the reproduction.
+
+Proper form for an incorporation by reference is as follows:
+
+\begin{quote}
+  Copyright (c) \emph{year} by \emph{author's name or designee}. This
+  material may be distributed only subject to the terms and conditions
+  set forth in the Open Publication License, v\emph{x.y} or later (the
+  latest version is presently available at
+  \url{http://www.opencontent.org/openpub/}).
+\end{quote}
+
+The reference must be immediately followed with any options elected by
+the author(s) and/or publisher of the document (see
+section~\ref{sec:opl:options}).
+
+Commercial redistribution of Open Publication-licensed material is
+permitted.
+
+Any publication in standard (paper) book form shall require the
+citation of the original publisher and author. The publisher and
+author's names shall appear on all outer surfaces of the book. On all
+outer surfaces of the book the original publisher's name shall be as
+large as the title of the work and cited as possessive with respect to
+the title.
+
+\section{Copyright}
+
+The copyright to each Open Publication is owned by its author(s) or
+designee.
+
+\section{Scope of license}
+
+The following license terms apply to all Open Publication works,
+unless otherwise explicitly stated in the document.
+
+Mere aggregation of Open Publication works or a portion of an Open
+Publication work with other works or programs on the same media shall
+not cause this license to apply to those other works. The aggregate
+work shall contain a notice specifying the inclusion of the Open
+Publication material and appropriate copyright notice.
+
+\textbf{Severability}. If any part of this license is found to be
+unenforceable in any jurisdiction, the remaining portions of the
+license remain in force.
+
+\textbf{No warranty}. Open Publication works are licensed and provided
+``as is'' without warranty of any kind, express or implied, including,
+but not limited to, the implied warranties of merchantability and
+fitness for a particular purpose or a warranty of non-infringement.
+
+\section{Requirements on modified works}
+
+All modified versions of documents covered by this license, including
+translations, anthologies, compilations and partial documents, must
+meet the following requirements:
+
+\begin{enumerate}
+\item The modified version must be labeled as such.
+\item The person making the modifications must be identified and the
+  modifications dated.
+\item Acknowledgement of the original author and publisher if
+  applicable must be retained according to normal academic citation
+  practices.
+\item The location of the original unmodified document must be
+  identified.
+\item The original author's (or authors') name(s) may not be used to
+  assert or imply endorsement of the resulting document without the
+  original author's (or authors') permission.
+\end{enumerate}
+
+\section{Good-practice recommendations}
+
+In addition to the requirements of this license, it is requested from
+and strongly recommended of redistributors that:
+
+\begin{enumerate}
+\item If you are distributing Open Publication works on hardcopy or
+  CD-ROM, you provide email notification to the authors of your intent
+  to redistribute at least thirty days before your manuscript or media
+  freeze, to give the authors time to provide updated documents. This
+  notification should describe modifications, if any, made to the
+  document.
+\item All substantive modifications (including deletions) be either
+  clearly marked up in the document or else described in an attachment
+  to the document.
+\item Finally, while it is not mandatory under this license, it is
+  considered good form to offer a free copy of any hardcopy and CD-ROM
+  expression of an Open Publication-licensed work to its author(s).
+\end{enumerate}
+
+\section{License options}
+\label{sec:opl:options}
+
+The author(s) and/or publisher of an Open Publication-licensed
+document may elect certain options by appending language to the
+reference to or copy of the license. These options are considered part
+of the license instance and must be included with the license (or its
+incorporation by reference) in derived works.
+
+\begin{enumerate}[A]
+\item To prohibit distribution of substantively modified versions
+  without the explicit permission of the author(s). ``Substantive
+  modification'' is defined as a change to the semantic content of the
+  document, and excludes mere changes in format or typographical
+  corrections.
6.121 + 6.122 + To accomplish this, add the phrase ``Distribution of substantively 6.123 + modified versions of this document is prohibited without the 6.124 + explicit permission of the copyright holder.'' to the license 6.125 + reference or copy. 6.126 + 6.127 +\item To prohibit any publication of this work or derivative works in 6.128 + whole or in part in standard (paper) book form for commercial 6.129 + purposes is prohibited unless prior permission is obtained from the 6.130 + copyright holder. 6.131 + 6.132 + To accomplish this, add the phrase ``Distribution of the work or 6.133 + derivative of the work in any standard (paper) book form is 6.134 + prohibited unless prior permission is obtained from the copyright 6.135 + holder.'' to the license reference or copy. 6.136 +\end{enumerate} 6.137 + 6.138 +%%% Local Variables: 6.139 +%%% mode: latex 6.140 +%%% TeX-master: "00book" 6.141 +%%% End:
7.1 --- a/en/branch.tex Thu Jan 29 22:47:34 2009 -0800 7.2 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 7.3 @@ -1,392 +0,0 @@ 7.4 -\chapter{Managing releases and branchy development} 7.5 -\label{chap:branch} 7.6 - 7.7 -Mercurial provides several mechanisms for you to manage a project that 7.8 -is making progress on multiple fronts at once. To understand these 7.9 -mechanisms, let's first take a brief look at a fairly normal software 7.10 -project structure. 7.11 - 7.12 -Many software projects issue periodic ``major'' releases that contain 7.13 -substantial new features. In parallel, they may issue ``minor'' 7.14 -releases. These are usually identical to the major releases off which 7.15 -they're based, but with a few bugs fixed. 7.16 - 7.17 -In this chapter, we'll start by talking about how to keep records of 7.18 -project milestones such as releases. We'll then continue on to talk 7.19 -about the flow of work between different phases of a project, and how 7.20 -Mercurial can help you to isolate and manage this work. 7.21 - 7.22 -\section{Giving a persistent name to a revision} 7.23 - 7.24 -Once you decide that you'd like to call a particular revision a 7.25 -``release'', it's a good idea to record the identity of that revision. 7.26 -This will let you reproduce that release at a later date, for whatever 7.27 -purpose you might need at the time (reproducing a bug, porting to a 7.28 -new platform, etc). 7.29 -\interaction{tag.init} 7.30 - 7.31 -Mercurial lets you give a permanent name to any revision using the 7.32 -\hgcmd{tag} command. Not surprisingly, these names are called 7.33 -``tags''. 7.34 -\interaction{tag.tag} 7.35 - 7.36 -A tag is nothing more than a ``symbolic name'' for a revision. Tags 7.37 -exist purely for your convenience, so that you have a handy permanent 7.38 -way to refer to a revision; Mercurial doesn't interpret the tag names 7.39 -you use in any way. 
Neither does Mercurial place any restrictions on 7.40 -the name of a tag, beyond a few that are necessary to ensure that a 7.41 -tag can be parsed unambiguously. A tag name cannot contain any of the 7.42 -following characters: 7.43 -\begin{itemize} 7.44 -\item Colon (ASCII 58, ``\texttt{:}'') 7.45 -\item Carriage return (ASCII 13, ``\Verb+\r+'') 7.46 -\item Newline (ASCII 10, ``\Verb+\n+'') 7.47 -\end{itemize} 7.48 - 7.49 -You can use the \hgcmd{tags} command to display the tags present in 7.50 -your repository. In the output, each tagged revision is identified 7.51 -first by its name, then by revision number, and finally by the unique 7.52 -hash of the revision. 7.53 -\interaction{tag.tags} 7.54 -Notice that \texttt{tip} is listed in the output of \hgcmd{tags}. The 7.55 -\texttt{tip} tag is a special ``floating'' tag, which always 7.56 -identifies the newest revision in the repository. 7.57 - 7.58 -In the output of the \hgcmd{tags} command, tags are listed in reverse 7.59 -order, by revision number. This usually means that recent tags are 7.60 -listed before older tags. It also means that \texttt{tip} is always 7.61 -going to be the first tag listed in the output of \hgcmd{tags}. 7.62 - 7.63 -When you run \hgcmd{log}, if it displays a revision that has tags 7.64 -associated with it, it will print those tags. 7.65 -\interaction{tag.log} 7.66 - 7.67 -Any time you need to provide a revision~ID to a Mercurial command, the 7.68 -command will accept a tag name in its place. Internally, Mercurial 7.69 -will translate your tag name into the corresponding revision~ID, then 7.70 -use that. 7.71 -\interaction{tag.log.v1.0} 7.72 - 7.73 -There's no limit on the number of tags you can have in a repository, 7.74 -or on the number of tags that a single revision can have. As a 7.75 -practical matter, it's not a great idea to have ``too many'' (a number 7.76 -which will vary from project to project), simply because tags are 7.77 -supposed to help you to find revisions. 
If you have lots of tags, the 7.78 -ease of using them to identify revisions diminishes rapidly. 7.79 - 7.80 -For example, if your project has milestones as frequent as every few 7.81 -days, it's perfectly reasonable to tag each one of those. But if you 7.82 -have a continuous build system that makes sure every revision can be 7.83 -built cleanly, you'd be introducing a lot of noise if you were to tag 7.84 -every clean build. Instead, you could tag failed builds (on the 7.85 -assumption that they're rare!), or simply not use tags to track 7.86 -buildability. 7.87 - 7.88 -If you want to remove a tag that you no longer want, use 7.89 -\hgcmdargs{tag}{--remove}. 7.90 -\interaction{tag.remove} 7.91 -You can also modify a tag at any time, so that it identifies a 7.92 -different revision, by simply issuing a new \hgcmd{tag} command. 7.93 -You'll have to use the \hgopt{tag}{-f} option to tell Mercurial that 7.94 -you \emph{really} want to update the tag. 7.95 -\interaction{tag.replace} 7.96 -There will still be a permanent record of the previous identity of the 7.97 -tag, but Mercurial will no longer use it. There's thus no penalty to 7.98 -tagging the wrong revision; all you have to do is turn around and tag 7.99 -the correct revision once you discover your error. 7.100 - 7.101 -Mercurial stores tags in a normal revision-controlled file in your 7.102 -repository. If you've created any tags, you'll find them in a file 7.103 -named \sfilename{.hgtags}. When you run the \hgcmd{tag} command, 7.104 -Mercurial modifies this file, then automatically commits the change to 7.105 -it. This means that every time you run \hgcmd{tag}, you'll see a 7.106 -corresponding changeset in the output of \hgcmd{log}. 7.107 -\interaction{tag.tip} 7.108 - 7.109 -\subsection{Handling tag conflicts during a merge} 7.110 - 7.111 -You won't often need to care about the \sfilename{.hgtags} file, but 7.112 -it sometimes makes its presence known during a merge. 
The format of 7.113 -the file is simple: it consists of a series of lines. Each line 7.114 -starts with a changeset hash, followed by a space, followed by the 7.115 -name of a tag. 7.116 - 7.117 -If you're resolving a conflict in the \sfilename{.hgtags} file during 7.118 -a merge, there's one twist to modifying the \sfilename{.hgtags} file: 7.119 -when Mercurial is parsing the tags in a repository, it \emph{never} 7.120 -reads the working copy of the \sfilename{.hgtags} file. Instead, it 7.121 -reads the \emph{most recently committed} revision of the file. 7.122 - 7.123 -An unfortunate consequence of this design is that you can't actually 7.124 -verify that your merged \sfilename{.hgtags} file is correct until 7.125 -\emph{after} you've committed a change. So if you find yourself 7.126 -resolving a conflict on \sfilename{.hgtags} during a merge, be sure to 7.127 -run \hgcmd{tags} after you commit. If it finds an error in the 7.128 -\sfilename{.hgtags} file, it will report the location of the error, 7.129 -which you can then fix and commit. You should then run \hgcmd{tags} 7.130 -again, just to be sure that your fix is correct. 7.131 - 7.132 -\subsection{Tags and cloning} 7.133 - 7.134 -You may have noticed that the \hgcmd{clone} command has a 7.135 -\hgopt{clone}{-r} option that lets you clone an exact copy of the 7.136 -repository as of a particular changeset. The new clone will not 7.137 -contain any project history that comes after the revision you 7.138 -specified. This has an interaction with tags that can surprise the 7.139 -unwary. 7.140 - 7.141 -Recall that a tag is stored as a revision to the \sfilename{.hgtags} 7.142 -file, so that when you create a tag, the changeset in which it's 7.143 -recorded necessarily refers to an older changeset. When you run 7.144 -\hgcmdargs{clone}{-r foo} to clone a repository as of tag 7.145 -\texttt{foo}, the new clone \emph{will not contain the history that 7.146 - created the tag} that you used to clone the repository. 
The result 7.147 -is that you'll get exactly the right subset of the project's history 7.148 -in the new repository, but \emph{not} the tag you might have expected. 7.149 - 7.150 -\subsection{When permanent tags are too much} 7.151 - 7.152 -Since Mercurial's tags are revision controlled and carried around with 7.153 -a project's history, everyone you work with will see the tags you 7.154 -create. But giving names to revisions has uses beyond simply noting 7.155 -that revision \texttt{4237e45506ee} is really \texttt{v2.0.2}. If 7.156 -you're trying to track down a subtle bug, you might want a tag to 7.157 -remind you of something like ``Anne saw the symptoms with this 7.158 -revision''. 7.159 - 7.160 -For cases like this, what you might want to use are \emph{local} tags. 7.161 -You can create a local tag with the \hgopt{tag}{-l} option to the 7.162 -\hgcmd{tag} command. This will store the tag in a file called 7.163 -\sfilename{.hg/localtags}. Unlike \sfilename{.hgtags}, 7.164 -\sfilename{.hg/localtags} is not revision controlled. Any tags you 7.165 -create using \hgopt{tag}{-l} remain strictly local to the repository 7.166 -you're currently working in. 7.167 - 7.168 -\section{The flow of changes---big picture vs. little} 7.169 - 7.170 -To return to the outline I sketched at the beginning of the chapter, 7.171 -let's think about a project that has multiple concurrent pieces of 7.172 -work under development at once. 7.173 - 7.174 -There might be a push for a new ``main'' release; a new minor bugfix 7.175 -release to the last main release; and an unexpected ``hot fix'' to an 7.176 -old release that is now in maintenance mode. 7.177 - 7.178 -The usual way people refer to these different concurrent directions of 7.179 -development is as ``branches''. However, we've already seen numerous 7.180 -times that Mercurial treats \emph{all of history} as a series of 7.181 -branches and merges. 
Really, what we have here is two ideas that are 7.182 -peripherally related, but which happen to share a name. 7.183 -\begin{itemize} 7.184 -\item ``Big picture'' branches represent the sweep of a project's 7.185 - evolution; people give them names, and talk about them in 7.186 - conversation. 7.187 -\item ``Little picture'' branches are artefacts of the day-to-day 7.188 - activity of developing and merging changes. They expose the 7.189 - narrative of how the code was developed. 7.190 -\end{itemize} 7.191 - 7.192 -\section{Managing big-picture branches in repositories} 7.193 - 7.194 -The easiest way to isolate a ``big picture'' branch in Mercurial is in 7.195 -a dedicated repository. If you have an existing shared 7.196 -repository---let's call it \texttt{myproject}---that reaches a ``1.0'' 7.197 -milestone, you can start to prepare for future maintenance releases on 7.198 -top of version~1.0 by tagging the revision from which you prepared 7.199 -the~1.0 release. 7.200 -\interaction{branch-repo.tag} 7.201 -You can then clone a new shared \texttt{myproject-1.0.1} repository as 7.202 -of that tag. 7.203 -\interaction{branch-repo.clone} 7.204 - 7.205 -Afterwards, if someone needs to work on a bug fix that ought to go 7.206 -into an upcoming~1.0.1 minor release, they clone the 7.207 -\texttt{myproject-1.0.1} repository, make their changes, and push them 7.208 -back. 7.209 -\interaction{branch-repo.bugfix} 7.210 -Meanwhile, development for the next major release can continue, 7.211 -isolated and unabated, in the \texttt{myproject} repository. 7.212 -\interaction{branch-repo.new} 7.213 - 7.214 -\section{Don't repeat yourself: merging across branches} 7.215 - 7.216 -In many cases, if you have a bug to fix on a maintenance branch, the 7.217 -chances are good that the bug exists on your project's main branch 7.218 -(and possibly other maintenance branches, too). 
It's a rare developer 7.219 -who wants to fix the same bug multiple times, so let's look at a few 7.220 -ways that Mercurial can help you to manage these bugfixes without 7.221 -duplicating your work. 7.222 - 7.223 -In the simplest instance, all you need to do is pull changes from your 7.224 -maintenance branch into your local clone of the target branch. 7.225 -\interaction{branch-repo.pull} 7.226 -You'll then need to merge the heads of the two branches, and push back 7.227 -to the main branch. 7.228 -\interaction{branch-repo.merge} 7.229 - 7.230 -\section{Naming branches within one repository} 7.231 - 7.232 -In most instances, isolating branches in repositories is the right 7.233 -approach. Its simplicity makes it easy to understand; and so it's 7.234 -hard to make mistakes. There's a one-to-one relationship between 7.235 -branches you're working in and directories on your system. This lets 7.236 -you use normal (non-Mercurial-aware) tools to work on files within a 7.237 -branch/repository. 7.238 - 7.239 -If you're more in the ``power user'' category (\emph{and} your 7.240 -collaborators are too), there is an alternative way of handling 7.241 -branches that you can consider. I've already mentioned the 7.242 -human-level distinction between ``small picture'' and ``big picture'' 7.243 -branches. While Mercurial works with multiple ``small picture'' 7.244 -branches in a repository all the time (for example after you pull 7.245 -changes in, but before you merge them), it can \emph{also} work with 7.246 -multiple ``big picture'' branches. 7.247 - 7.248 -The key to working this way is that Mercurial lets you assign a 7.249 -persistent \emph{name} to a branch. There always exists a branch 7.250 -named \texttt{default}. Even before you start naming branches 7.251 -yourself, you can find traces of the \texttt{default} branch if you 7.252 -look for them. 
7.253 - 7.254 -As an example, when you run the \hgcmd{commit} command, and it pops up 7.255 -your editor so that you can enter a commit message, look for a line 7.256 -that contains the text ``\texttt{HG: branch default}'' at the bottom. 7.257 -This is telling you that your commit will occur on the branch named 7.258 -\texttt{default}. 7.259 - 7.260 -To start working with named branches, use the \hgcmd{branches} 7.261 -command. This command lists the named branches already present in 7.262 -your repository, telling you which changeset is the tip of each. 7.263 -\interaction{branch-named.branches} 7.264 -Since you haven't created any named branches yet, the only one that 7.265 -exists is \texttt{default}. 7.266 - 7.267 -To find out what the ``current'' branch is, run the \hgcmd{branch} 7.268 -command, giving it no arguments. This tells you what branch the 7.269 -parent of the current changeset is on. 7.270 -\interaction{branch-named.branch} 7.271 - 7.272 -To create a new branch, run the \hgcmd{branch} command again. This 7.273 -time, give it one argument: the name of the branch you want to create. 7.274 -\interaction{branch-named.create} 7.275 - 7.276 -After you've created a branch, you might wonder what effect the 7.277 -\hgcmd{branch} command has had. What do the \hgcmd{status} and 7.278 -\hgcmd{tip} commands report? 7.279 -\interaction{branch-named.status} 7.280 -Nothing has changed in the working directory, and there's been no new 7.281 -history created. As this suggests, running the \hgcmd{branch} command 7.282 -has no permanent effect; it only tells Mercurial what branch name to 7.283 -use the \emph{next} time you commit a changeset. 7.284 - 7.285 -When you commit a change, Mercurial records the name of the branch on 7.286 -which you committed. 
Once you've switched from the \texttt{default} 7.287 -branch to another and committed, you'll see the name of the new branch 7.288 -show up in the output of \hgcmd{log}, \hgcmd{tip}, and other commands 7.289 -that display the same kind of output. 7.290 -\interaction{branch-named.commit} 7.291 -The \hgcmd{log}-like commands will print the branch name of every 7.292 -changeset that's not on the \texttt{default} branch. As a result, if 7.293 -you never use named branches, you'll never see this information. 7.294 - 7.295 -Once you've named a branch and committed a change with that name, 7.296 -every subsequent commit that descends from that change will inherit 7.297 -the same branch name. You can change the name of a branch at any 7.298 -time, using the \hgcmd{branch} command. 7.299 -\interaction{branch-named.rebranch} 7.300 -In practice, this is something you won't do very often, as branch 7.301 -names tend to have fairly long lifetimes. (This isn't a rule, just an 7.302 -observation.) 7.303 - 7.304 -\section{Dealing with multiple named branches in a repository} 7.305 - 7.306 -If you have more than one named branch in a repository, Mercurial will 7.307 -remember the branch that your working directory is on when you start a 7.308 -command like \hgcmd{update} or \hgcmdargs{pull}{-u}. It will update 7.309 -the working directory to the tip of this branch, no matter what the 7.310 -``repo-wide'' tip is. To update to a revision that's on a different 7.311 -named branch, you may need to use the \hgopt{update}{-C} option to 7.312 -\hgcmd{update}. 7.313 - 7.314 -This behaviour is a little subtle, so let's see it in action. First, 7.315 -let's remind ourselves what branch we're currently on, and what 7.316 -branches are in our repository. 7.317 -\interaction{branch-named.parents} 7.318 -We're on the \texttt{bar} branch, but there also exists an older 7.319 -\texttt{foo} branch. 
7.320 - 7.321 -We can \hgcmd{update} back and forth between the tips of the 7.322 -\texttt{foo} and \texttt{bar} branches without needing to use the 7.323 -\hgopt{update}{-C} option, because this only involves going backwards 7.324 -and forwards linearly through our change history. 7.325 -\interaction{branch-named.update-switchy} 7.326 - 7.327 -If we go back to the \texttt{foo} branch and then run \hgcmd{update}, 7.328 -it will keep us on \texttt{foo}, not move us to the tip of 7.329 -\texttt{bar}. 7.330 -\interaction{branch-named.update-nothing} 7.331 - 7.332 -Committing a new change on the \texttt{foo} branch introduces a new 7.333 -head. 7.334 -\interaction{branch-named.foo-commit} 7.335 - 7.336 -\section{Branch names and merging} 7.337 - 7.338 -As you've probably noticed, merges in Mercurial are not symmetrical. 7.339 -Let's say our repository has two heads, 17 and 23. If I 7.340 -\hgcmd{update} to 17 and then \hgcmd{merge} with 23, Mercurial records 7.341 -17 as the first parent of the merge, and 23 as the second. Whereas if 7.342 -I \hgcmd{update} to 23 and then \hgcmd{merge} with 17, it records 23 7.343 -as the first parent, and 17 as the second. 7.344 - 7.345 -This affects Mercurial's choice of branch name when you merge. After 7.346 -a merge, Mercurial will retain the branch name of the first parent 7.347 -when you commit the result of the merge. If your first parent's 7.348 -branch name is \texttt{foo}, and you merge with \texttt{bar}, the 7.349 -branch name will still be \texttt{foo} after you merge. 7.350 - 7.351 -It's not unusual for a repository to contain multiple heads, each with 7.352 -the same branch name. Let's say I'm working on the \texttt{foo} 7.353 -branch, and so are you. We commit different changes; I pull your 7.354 -changes; I now have two heads, each claiming to be on the \texttt{foo} 7.355 -branch. The result of a merge will be a single head on the 7.356 -\texttt{foo} branch, as you might hope. 
7.357 - 7.358 -But if I'm working on the \texttt{bar} branch, and I merge work from 7.359 -the \texttt{foo} branch, the result will remain on the \texttt{bar} 7.360 -branch. 7.361 -\interaction{branch-named.merge} 7.362 - 7.363 -To give a more concrete example, if I'm working on the 7.364 -\texttt{bleeding-edge} branch, and I want to bring in the latest fixes 7.365 -from the \texttt{stable} branch, Mercurial will choose the ``right'' 7.366 -(\texttt{bleeding-edge}) branch name when I pull and merge from 7.367 -\texttt{stable}. 7.368 - 7.369 -\section{Branch naming is generally useful} 7.370 - 7.371 -You shouldn't think of named branches as applicable only to situations 7.372 -where you have multiple long-lived branches cohabiting in a single 7.373 -repository. They're very useful even in the one-branch-per-repository 7.374 -case. 7.375 - 7.376 -In the simplest case, giving a name to each branch gives you a 7.377 -permanent record of which branch a changeset originated on. This 7.378 -gives you more context when you're trying to follow the history of a 7.379 -long-lived branchy project. 7.380 - 7.381 -If you're working with shared repositories, you can set up a 7.382 -\hook{pretxnchangegroup} hook on each that will block incoming changes 7.383 -that have the ``wrong'' branch name. This provides a simple, but 7.384 -effective, defence against people accidentally pushing changes from a 7.385 -``bleeding edge'' branch to a ``stable'' branch. Such a hook might 7.386 -look like this inside the shared repo's \hgrc. 7.387 -\begin{codesample2} 7.388 - [hooks] 7.389 - pretxnchangegroup.branch = hg heads --template '{branches} ' | grep mybranch 7.390 -\end{codesample2} 7.391 - 7.392 -%%% Local Variables: 7.393 -%%% mode: latex 7.394 -%%% TeX-master: "00book" 7.395 -%%% End:
8.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 8.2 +++ b/en/ch00-preface.tex Thu Jan 29 22:56:27 2009 -0800 8.3 @@ -0,0 +1,67 @@ 8.4 +\chapter*{Preface} 8.5 +\addcontentsline{toc}{chapter}{Preface} 8.6 +\label{chap:preface} 8.7 + 8.8 +Distributed revision control is a relatively new territory, and has 8.9 +thus far grown due to people's willingness to strike out into 8.10 +ill-charted territory. 8.11 + 8.12 +I am writing a book about distributed revision control because I 8.13 +believe that it is an important subject that deserves a field guide. 8.14 +I chose to write about Mercurial because it is the easiest tool to 8.15 +learn the terrain with, and yet it scales to the demands of real, 8.16 +challenging environments where many other revision control tools fail. 8.17 + 8.18 +\section{This book is a work in progress} 8.19 + 8.20 +I am releasing this book while I am still writing it, in the hope that 8.21 +it will prove useful to others. I also hope that readers will 8.22 +contribute as they see fit. 8.23 + 8.24 +\section{About the examples in this book} 8.25 + 8.26 +This book takes an unusual approach to code samples. Every example is 8.27 +``live''---each one is actually the result of a shell script that 8.28 +executes the Mercurial commands you see. Every time an image of the 8.29 +book is built from its sources, all the example scripts are 8.30 +automatically run, and their current results compared against their 8.31 +expected results. 8.32 + 8.33 +The advantage of this approach is that the examples are always 8.34 +accurate; they describe \emph{exactly} the behaviour of the version of 8.35 +Mercurial that's mentioned at the front of the book. If I update the 8.36 +version of Mercurial that I'm documenting, and the output of some 8.37 +command changes, the build fails. 
8.38 + 8.39 +There is a small disadvantage to this approach, which is that the 8.40 +dates and times you'll see in examples tend to be ``squashed'' 8.41 +together in a way that they wouldn't be if the same commands were 8.42 +being typed by a human. Where a human can issue no more than one 8.43 +command every few seconds, with any resulting timestamps 8.44 +correspondingly spread out, my automated example scripts run many 8.45 +commands in one second. 8.46 + 8.47 +As an instance of this, several consecutive commits in an example can 8.48 +show up as having occurred during the same second. You can see this 8.49 +occur in the \hgext{bisect} example in section~\ref{sec:undo:bisect}, 8.50 +for instance. 8.51 + 8.52 +So when you're reading examples, don't place too much weight on the 8.53 +dates or times you see in the output of commands. But \emph{do} be 8.54 +confident that the behaviour you're seeing is consistent and 8.55 +reproducible. 8.56 + 8.57 +\section{Colophon---this book is Free} 8.58 + 8.59 +This book is licensed under the Open Publication License, and is 8.60 +produced entirely using Free Software tools. It is typeset with 8.61 +\LaTeX{}; illustrations are drawn and rendered with 8.62 +\href{http://www.inkscape.org/}{Inkscape}. 8.63 + 8.64 +The complete source code for this book is published as a Mercurial 8.65 +repository, at \url{http://hg.serpentine.com/mercurial/book}. 8.66 + 8.67 +%%% Local Variables: 8.68 +%%% mode: latex 8.69 +%%% TeX-master: "00book" 8.70 +%%% End:
9.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 9.2 +++ b/en/ch01-intro.tex Thu Jan 29 22:56:27 2009 -0800 9.3 @@ -0,0 +1,561 @@ 9.4 +\chapter{Introduction} 9.5 +\label{chap:intro} 9.6 + 9.7 +\section{About revision control} 9.8 + 9.9 +Revision control is the process of managing multiple versions of a 9.10 +piece of information. In its simplest form, this is something that 9.11 +many people do by hand: every time you modify a file, save it under a 9.12 +new name that contains a number, each one higher than the number of 9.13 +the preceding version. 9.14 + 9.15 +Manually managing multiple versions of even a single file is an 9.16 +error-prone task, though, so software tools to help automate this 9.17 +process have long been available. The earliest automated revision 9.18 +control tools were intended to help a single user to manage revisions 9.19 +of a single file. Over the past few decades, the scope of revision 9.20 +control tools has expanded greatly; they now manage multiple files, 9.21 +and help multiple people to work together. The best modern revision 9.22 +control tools have no problem coping with thousands of people working 9.23 +together on projects that consist of hundreds of thousands of files. 9.24 + 9.25 +\subsection{Why use revision control?} 9.26 + 9.27 +There are a number of reasons why you or your team might want to use 9.28 +an automated revision control tool for a project. 9.29 +\begin{itemize} 9.30 +\item It will track the history and evolution of your project, so you 9.31 + don't have to. For every change, you'll have a log of \emph{who} 9.32 + made it; \emph{why} they made it; \emph{when} they made it; and 9.33 + \emph{what} the change was. 9.34 +\item When you're working with other people, revision control software 9.35 + makes it easier for you to collaborate. For example, when people 9.36 + more or less simultaneously make potentially incompatible changes, 9.37 + the software will help you to identify and resolve those conflicts. 
9.38 +\item It can help you to recover from mistakes. If you make a change 9.39 + that later turns out to be in error, you can revert to an earlier 9.40 + version of one or more files. In fact, a \emph{really} good 9.41 + revision control tool will even help you to efficiently figure out 9.42 + exactly when a problem was introduced (see 9.43 + section~\ref{sec:undo:bisect} for details). 9.44 +\item It will help you to work simultaneously on, and manage the drift 9.45 + between, multiple versions of your project. 9.46 +\end{itemize} 9.47 +Most of these reasons are equally valid---at least in theory---whether 9.48 +you're working on a project by yourself, or with a hundred other 9.49 +people. 9.50 + 9.51 +A key question about the practicality of revision control at these two 9.52 +different scales (``lone hacker'' and ``huge team'') is how its 9.53 +\emph{benefits} compare to its \emph{costs}. A revision control tool 9.54 +that's difficult to understand or use is going to impose a high cost. 9.55 + 9.56 +A five-hundred-person project is likely to collapse under its own 9.57 +weight almost immediately without a revision control tool and process. 9.58 +In this case, the cost of using revision control might hardly seem 9.59 +worth considering, since \emph{without} it, failure is almost 9.60 +guaranteed. 9.61 + 9.62 +On the other hand, a one-person ``quick hack'' might seem like a poor 9.63 +place to use a revision control tool, because surely the cost of using 9.64 +one must be close to the overall cost of the project. Right? 9.65 + 9.66 +Mercurial uniquely supports \emph{both} of these scales of 9.67 +development. You can learn the basics in just a few minutes, and due 9.68 +to its low overhead, you can apply revision control to the smallest of 9.69 +projects with ease. Its simplicity means you won't have a lot of 9.70 +abstruse concepts or command sequences competing for mental space with 9.71 +whatever you're \emph{really} trying to do. 
At the same time, 9.72 +Mercurial's high performance and peer-to-peer nature let you scale 9.73 +painlessly to handle large projects. 9.74 + 9.75 +No revision control tool can rescue a poorly run project, but a good 9.76 +choice of tools can make a huge difference to the fluidity with which 9.77 +you can work on a project. 9.78 + 9.79 +\subsection{The many names of revision control} 9.80 + 9.81 +Revision control is a diverse field, so much so that it doesn't 9.82 +actually have a single name or acronym. Here are a few of the more 9.83 +common names and acronyms you'll encounter: 9.84 +\begin{itemize} 9.85 +\item Revision control (RCS) 9.86 +\item Software configuration management (SCM), or configuration management 9.87 +\item Source code management 9.88 +\item Source code control, or source control 9.89 +\item Version control (VCS) 9.90 +\end{itemize} 9.91 +Some people claim that these terms actually have different meanings, 9.92 +but in practice they overlap so much that there's no agreed or even 9.93 +useful way to tease them apart. 9.94 + 9.95 +\section{A short history of revision control} 9.96 + 9.97 +The best known of the old-time revision control tools is SCCS (Source 9.98 +Code Control System), which Marc Rochkind wrote at Bell Labs, in the 9.99 +early 1970s. SCCS operated on individual files, and required every 9.100 +person working on a project to have access to a shared workspace on a 9.101 +single system. Only one person could modify a file at any time; 9.102 +arbitration for access to files was via locks. It was common for 9.103 +people to lock files, and later forget to unlock them, preventing 9.104 +anyone else from modifying those files without the help of an 9.105 +administrator. 9.106 + 9.107 +Walter Tichy developed a free alternative to SCCS in the early 1980s; 9.108 +he called his program RCS (Revision Control System). 
Like SCCS, RCS 9.109 +required developers to work in a single shared workspace, and to lock 9.110 +files to prevent multiple people from modifying them simultaneously. 9.111 + 9.112 +Later in the 1980s, Dick Grune used RCS as a building block for a set 9.113 +of shell scripts he initially called cmt, but then renamed to CVS 9.114 +(Concurrent Versions System). The big innovation of CVS was that it 9.115 +let developers work simultaneously and somewhat independently in their 9.116 +own personal workspaces. The personal workspaces prevented developers 9.117 +from stepping on each other's toes all the time, as was common with 9.118 +SCCS and RCS. Each developer had a copy of every project file, and 9.119 +could modify their copies independently. They had to merge their 9.120 +edits prior to committing changes to the central repository. 9.121 + 9.122 +Brian Berliner took Grune's original scripts and rewrote them in~C, 9.123 +releasing in 1989 the code that has since developed into the modern 9.124 +version of CVS. CVS subsequently acquired the ability to operate over 9.125 +a network connection, giving it a client/server architecture. CVS's 9.126 +architecture is centralised; only the server has a copy of the history 9.127 +of the project. Client workspaces just contain copies of recent 9.128 +versions of the project's files, and a little metadata to tell them 9.129 +where the server is. CVS has been enormously successful; it is 9.130 +probably the world's most widely used revision control system. 9.131 + 9.132 +In the early 1990s, Sun Microsystems developed an early distributed 9.133 +revision control system, called TeamWare. A TeamWare workspace 9.134 +contains a complete copy of the project's history. TeamWare has no 9.135 +notion of a central repository. (CVS relied upon RCS for its history 9.136 +storage; TeamWare used SCCS.) 9.137 + 9.138 +As the 1990s progressed, awareness grew of a number of problems with 9.139 +CVS. 
It records simultaneous changes to multiple files individually, 9.140 +instead of grouping them together as a single logically atomic 9.141 +operation. It does not manage its file hierarchy well; it is easy to 9.142 +make a mess of a repository by renaming files and directories. Worse, 9.143 +its source code is difficult to read and maintain, which made the 9.144 +``pain level'' of fixing these architectural problems prohibitive. 9.145 + 9.146 +In 2001, Jim Blandy and Karl Fogel, two developers who had worked on 9.147 +CVS, started a project to replace it with a tool that would have a 9.148 +better architecture and cleaner code. The result, Subversion, does 9.149 +not stray from CVS's centralised client/server model, but it adds 9.150 +multi-file atomic commits, better namespace management, and a number 9.151 +of other features that make it a generally better tool than CVS. 9.152 +Since its initial release, it has rapidly grown in popularity. 9.153 + 9.154 +More or less simultaneously, Graydon Hoare began working on an 9.155 +ambitious distributed revision control system that he named Monotone. 9.156 +While Monotone addresses many of CVS's design flaws and has a 9.157 +peer-to-peer architecture, it goes beyond earlier (and subsequent) 9.158 +revision control tools in a number of innovative ways. It uses 9.159 +cryptographic hashes as identifiers, and has an integral notion of 9.160 +``trust'' for code from different sources. 9.161 + 9.162 +Mercurial began life in 2005. While a few aspects of its design are 9.163 +influenced by Monotone, Mercurial focuses on ease of use, high 9.164 +performance, and scalability to very large projects. 9.165 + 9.166 +\section{Trends in revision control} 9.167 + 9.168 +There has been an unmistakable trend in the development and use of 9.169 +revision control tools over the past four decades, as people have 9.170 +become familiar with the capabilities of their tools and constrained 9.171 +by their limitations. 
9.172 + 9.173 +The first generation began by managing single files on individual 9.174 +computers. Although these tools represented a huge advance over 9.175 +ad-hoc manual revision control, their locking model and reliance on a 9.176 +single computer limited them to small, tightly-knit teams. 9.177 + 9.178 +The second generation loosened these constraints by moving to 9.179 +network-centered architectures, and managing entire projects at a 9.180 +time. As projects grew larger, they ran into new problems. With 9.181 +clients needing to talk to servers very frequently, server scaling 9.182 +became an issue for large projects. An unreliable network connection 9.183 +could prevent remote users from being able to talk to the server at 9.184 +all. As open source projects started making read-only access 9.185 +available anonymously to anyone, people without commit privileges 9.186 +found that they could not use the tools to interact with a project in 9.187 +a natural way, as they could not record their changes. 9.188 + 9.189 +The current generation of revision control tools is peer-to-peer in 9.190 +nature. All of these systems have dropped the dependency on a single 9.191 +central server, and allow people to distribute their revision control 9.192 +data to where it's actually needed. Collaboration over the Internet 9.193 +has moved from being constrained by technology to a matter of choice and 9.194 +consensus. Modern tools can operate offline indefinitely and 9.195 +autonomously, with a network connection only needed when syncing 9.196 +changes with another repository. 9.197 + 9.198 +\section{A few of the advantages of distributed revision control} 9.199 + 9.200 +Even though distributed revision control tools have for several years 9.201 +been as robust and usable as their previous-generation counterparts, 9.202 +people using older tools have not yet necessarily woken up to their 9.203 +advantages. 
There are a number of ways in which distributed tools 9.204 +shine relative to centralised ones. 9.205 + 9.206 +For an individual developer, distributed tools are almost always much 9.207 +faster than centralised tools. This is for a simple reason: a 9.208 +centralised tool needs to talk over the network for many common 9.209 +operations, because most metadata is stored in a single copy on the 9.210 +central server. A distributed tool stores all of its metadata 9.211 +locally. All else being equal, talking over the network adds overhead 9.212 +to a centralised tool. Don't underestimate the value of a snappy, 9.213 +responsive tool: you're going to spend a lot of time interacting with 9.214 +your revision control software. 9.215 + 9.216 +Distributed tools are indifferent to the vagaries of your server 9.217 +infrastructure, again because they replicate metadata to so many 9.218 +locations. If you use a centralised system and your server catches 9.219 +fire, you'd better hope that your backup media are reliable, and that 9.220 +your last backup was recent and actually worked. With a distributed 9.221 +tool, you have many backups available on every contributor's computer. 9.222 + 9.223 +The reliability of your network will affect distributed tools far less 9.224 +than it will centralised tools. You can't even use a centralised tool 9.225 +without a network connection, except for a few highly constrained 9.226 +commands. With a distributed tool, if your network connection goes 9.227 +down while you're working, you may not even notice. The only thing 9.228 +you won't be able to do is talk to repositories on other computers, 9.229 +something that is relatively rare compared with local operations. If 9.230 +you have a far-flung team of collaborators, this may be significant. 
9.231 + 9.232 +\subsection{Advantages for open source projects} 9.233 + 9.234 +If you take a shine to an open source project and decide that you 9.235 +would like to start hacking on it, and that project uses a distributed 9.236 +revision control tool, you are at once a peer with the people who 9.237 +consider themselves the ``core'' of that project. If they publish 9.238 +their repositories, you can immediately copy their project history, 9.239 +start making changes, and record your work, using the same tools in 9.240 +the same ways as insiders. By contrast, with a centralised tool, you 9.241 +must use the software in a ``read only'' mode unless someone grants 9.242 +you permission to commit changes to their central server. Until then, 9.243 +you won't be able to record changes, and your local modifications will 9.244 +be at risk of corruption any time you try to update your client's view 9.245 +of the repository. 9.246 + 9.247 +\subsubsection{The forking non-problem} 9.248 + 9.249 +It has been suggested that distributed revision control tools pose 9.250 +some sort of risk to open source projects because they make it easy to 9.251 +``fork'' the development of a project. A fork happens when there are 9.252 +differences in opinion or attitude between groups of developers that 9.253 +cause them to decide that they can't work together any longer. Each 9.254 +side takes a more or less complete copy of the project's source code, 9.255 +and goes off in its own direction. 9.256 + 9.257 +Sometimes the camps in a fork decide to reconcile their differences. 9.258 +With a centralised revision control system, the \emph{technical} 9.259 +process of reconciliation is painful, and has to be performed largely 9.260 +by hand. You have to decide whose revision history is going to 9.261 +``win'', and graft the other team's changes into the tree somehow. 9.262 +This usually loses some or all of one side's revision history. 
9.263 + 9.264 +What distributed tools do with respect to forking is they make forking 9.265 +the \emph{only} way to develop a project. Every single change that 9.266 +you make is potentially a fork point. The great strength of this 9.267 +approach is that a distributed revision control tool has to be really 9.268 +good at \emph{merging} forks, because forks are absolutely 9.269 +fundamental: they happen all the time. 9.270 + 9.271 +If every piece of work that everybody does, all the time, is framed in 9.272 +terms of forking and merging, then what the open source world refers 9.273 +to as a ``fork'' becomes \emph{purely} a social issue. If anything, 9.274 +distributed tools \emph{lower} the likelihood of a fork: 9.275 +\begin{itemize} 9.276 +\item They eliminate the social distinction that centralised tools 9.277 + impose: that between insiders (people with commit access) and 9.278 + outsiders (people without). 9.279 +\item They make it easier to reconcile after a social fork, because 9.280 + all that's involved from the perspective of the revision control 9.281 + software is just another merge. 9.282 +\end{itemize} 9.283 + 9.284 +Some people resist distributed tools because they want to retain tight 9.285 +control over their projects, and they believe that centralised tools 9.286 +give them this control. However, if you're of this belief, and you 9.287 +publish your CVS or Subversion repositories publicly, there are 9.288 +plenty of tools available that can pull out your entire project's 9.289 +history (albeit slowly) and recreate it somewhere that you don't 9.290 +control. So while your control in this case is illusory, you are 9.291 +forgoing the ability to fluidly collaborate with whatever people feel 9.292 +compelled to mirror and fork your history. 9.293 + 9.294 +\subsection{Advantages for commercial projects} 9.295 + 9.296 +Many commercial projects are undertaken by teams that are scattered 9.297 +across the globe. 
Contributors who are far from a central server will 9.298 +see slower command execution and perhaps less reliability. Commercial 9.299 +revision control systems attempt to ameliorate these problems with 9.300 +remote-site replication add-ons that are typically expensive to buy 9.301 +and cantankerous to administer. A distributed system doesn't suffer 9.302 +from these problems in the first place. Better yet, you can easily 9.303 +set up multiple authoritative servers, say one per site, so that 9.304 +there's no redundant communication between repositories over expensive 9.305 +long-haul network links. 9.306 + 9.307 +Centralised revision control systems tend to have relatively low 9.308 +scalability. It's not unusual for an expensive centralised system to 9.309 +fall over under the combined load of just a few dozen concurrent 9.310 +users. Once again, the typical response tends to be an expensive and 9.311 +clunky replication facility. Since the load on a central server---if 9.312 +you have one at all---is many times lower with a distributed 9.313 +tool (because all of the data is replicated everywhere), a single 9.314 +cheap server can handle the needs of a much larger team, and 9.315 +replication to balance load becomes a simple matter of scripting. 9.316 + 9.317 +If you have an employee in the field, troubleshooting a problem at a 9.318 +customer's site, they'll benefit from distributed revision control. 9.319 +The tool will let them generate custom builds, try different fixes in 9.320 +isolation from each other, and search efficiently through history for 9.321 +the sources of bugs and regressions in the customer's environment, all 9.322 +without needing to connect to your company's network. 9.323 + 9.324 +\section{Why choose Mercurial?} 9.325 + 9.326 +Mercurial has a unique set of properties that make it a particularly 9.327 +good choice as a revision control system. 9.328 +\begin{itemize} 9.329 +\item It is easy to learn and use. 9.330 +\item It is lightweight. 
9.331 +\item It scales excellently. 9.332 +\item It is easy to customise. 9.333 +\end{itemize} 9.334 + 9.335 +If you are at all familiar with revision control systems, you should 9.336 +be able to get up and running with Mercurial in less than five 9.337 +minutes. Even if not, it will take no more than a few minutes 9.338 +longer. Mercurial's command and feature sets are generally uniform 9.339 +and consistent, so you can keep track of a few general rules instead 9.340 +of a host of exceptions. 9.341 + 9.342 +On a small project, you can start working with Mercurial in moments. 9.343 +Creating new changes and branches; transferring changes around 9.344 +(whether locally or over a network); and history and status operations 9.345 +are all fast. Mercurial attempts to stay nimble and largely out of 9.346 +your way by combining low cognitive overhead with blazingly fast 9.347 +operations. 9.348 + 9.349 +The usefulness of Mercurial is not limited to small projects: it is 9.350 +used by projects with hundreds to thousands of contributors, each 9.351 +containing tens of thousands of files and hundreds of megabytes of 9.352 +source code. 9.353 + 9.354 +If the core functionality of Mercurial is not enough for you, it's 9.355 +easy to build on. Mercurial is well suited to scripting tasks, and 9.356 +its clean internals and implementation in Python make it easy to add 9.357 +features in the form of extensions. There are a number of popular and 9.358 +useful extensions already available, ranging from helping to identify 9.359 +bugs to improving performance. 9.360 + 9.361 +\section{Mercurial compared with other tools} 9.362 + 9.363 +Before you read on, please understand that this section necessarily 9.364 +reflects my own experiences, interests, and (dare I say it) biases. I 9.365 +have used every one of the revision control tools listed below, in 9.366 +most cases for several years at a time. 
9.367 + 9.368 + 9.369 +\subsection{Subversion} 9.370 + 9.371 +Subversion is a popular revision control tool, developed to replace 9.372 +CVS. It has a centralised client/server architecture. 9.373 + 9.374 +Subversion and Mercurial have similarly named commands for performing 9.375 +the same operations, so if you're familiar with one, it is easy to 9.376 +learn to use the other. Both tools are portable to all popular 9.377 +operating systems. 9.378 + 9.379 +Prior to version 1.5, Subversion had no useful support for merges. 9.380 +At the time of writing, its merge tracking capability is new, and known to be 9.381 +\href{http://svnbook.red-bean.com/nightly/en/svn.branchmerge.advanced.html#svn.branchmerge.advanced.finalword}{complicated 9.382 + and buggy}. 9.383 + 9.384 +Mercurial has a substantial performance advantage over Subversion on 9.385 +every revision control operation I have benchmarked. I have measured 9.386 +its advantage as ranging from a factor of two to a factor of six when 9.387 +compared with Subversion~1.4.3's \emph{ra\_local} file store, which is 9.388 +the fastest access method available. In more realistic deployments 9.389 +involving a network-based store, Subversion will be at a substantially 9.390 +larger disadvantage. Because many Subversion commands must talk to 9.391 +the server and Subversion does not have useful replication facilities, 9.392 +server capacity and network bandwidth become bottlenecks for modestly 9.393 +large projects. 9.394 + 9.395 +Additionally, Subversion incurs substantial storage overhead to avoid 9.396 +network transactions for a few common operations, such as finding 9.397 +modified files (\texttt{status}) and displaying modifications against 9.398 +the current revision (\texttt{diff}). As a result, a Subversion 9.399 +working copy is often the same size as, or larger than, a Mercurial 9.400 +repository and working directory, even though the Mercurial repository 9.401 +contains a complete history of the project. 
9.402 + 9.403 +Subversion is widely supported by third party tools. Mercurial 9.404 +currently lags considerably in this area. This gap is closing, 9.405 +however, and indeed some of Mercurial's GUI tools now outshine their 9.406 +Subversion equivalents. Like Mercurial, Subversion has an excellent 9.407 +user manual. 9.408 + 9.409 +Because Subversion doesn't store revision history on the client, it is 9.410 +well suited to managing projects that deal with lots of large, opaque 9.411 +binary files. If you check in fifty revisions to an incompressible 9.412 +10MB file, Subversion's client-side space usage stays constant. The 9.413 +space used by any distributed SCM will grow rapidly in proportion to 9.414 +the number of revisions, because the differences between each revision 9.415 +are large. 9.416 + 9.417 +In addition, it's often difficult or, more usually, impossible to 9.418 +merge different versions of a binary file. Subversion's ability to 9.419 +let a user lock a file, so that they temporarily have the exclusive 9.420 +right to commit changes to it, can be a significant advantage to a 9.421 +project where binary files are widely used. 9.422 + 9.423 +Mercurial can import revision history from a Subversion repository. 9.424 +It can also export revision history to a Subversion repository. This 9.425 +makes it easy to ``test the waters'' and use Mercurial and Subversion 9.426 +in parallel before deciding to switch. History conversion is 9.427 +incremental, so you can perform an initial conversion, then small 9.428 +additional conversions afterwards to bring in new changes. 9.429 + 9.430 + 9.431 +\subsection{Git} 9.432 + 9.433 +Git is a distributed revision control tool that was developed for 9.434 +managing the Linux kernel source tree. Like Mercurial, its early 9.435 +design was somewhat influenced by Monotone. 9.436 + 9.437 +Git has a very large command set, with version~1.5.0 providing~139 9.438 +individual commands. 
It has something of a reputation for being 9.439 +difficult to learn. Compared to Git, Mercurial has a strong focus on 9.440 +simplicity. 9.441 + 9.442 +In terms of performance, Git is extremely fast. In several cases, it 9.443 +is faster than Mercurial, at least on Linux, while Mercurial performs 9.444 +better on other operations. However, on Windows, the performance and 9.445 +general level of support that Git provides are, at the time of writing, 9.446 +far behind those of Mercurial. 9.447 + 9.448 +While a Mercurial repository needs no maintenance, a Git repository 9.449 +requires frequent manual ``repacks'' of its metadata. Without these, 9.450 +performance degrades, while space usage grows rapidly. A server that 9.451 +contains many Git repositories that are not rigorously and frequently 9.452 +repacked will become heavily disk-bound during backups, and there have 9.453 +been instances of daily backups taking far longer than~24 hours as a 9.454 +result. A freshly packed Git repository is slightly smaller than a 9.455 +Mercurial repository, but an unpacked repository is several orders of 9.456 +magnitude larger. 9.457 + 9.458 +The core of Git is written in C. Many Git commands are implemented as 9.459 +shell or Perl scripts, and the quality of these scripts varies widely. 9.460 +I have encountered several instances where scripts charged along 9.461 +blindly in the presence of errors that should have been fatal. 9.462 + 9.463 +Mercurial can import revision history from a Git repository. 9.464 + 9.465 + 9.466 +\subsection{CVS} 9.467 + 9.468 +CVS is probably the most widely used revision control tool in the 9.469 +world. Due to its age and internal untidiness, it has been only 9.470 +lightly maintained for many years. 9.471 + 9.472 +It has a centralised client/server architecture. 
It does not group 9.473 +related file changes into atomic commits, making it easy for people to 9.474 +``break the build'': one person can successfully commit part of a 9.475 +change and then be blocked by the need for a merge, causing other 9.476 +people to see only a portion of the work they intended to do. This 9.477 +also affects how you work with project history. If you want to see 9.478 +all of the modifications someone made as part of a task, you will need 9.479 +to manually inspect the descriptions and timestamps of the changes 9.480 +made to each file involved (if you even know what those files were). 9.481 + 9.482 +CVS has a muddled notion of tags and branches that I will not attempt 9.483 +to even describe. It does not support renaming of files or 9.484 +directories well, making it easy to corrupt a repository. It has 9.485 +almost no internal consistency checking capabilities, so it is usually 9.486 +not even possible to tell whether or how a repository is corrupt. I 9.487 +would not recommend CVS for any project, existing or new. 9.488 + 9.489 +Mercurial can import CVS revision history. However, there are a few 9.490 +caveats that apply; these are true of every other revision control 9.491 +tool's CVS importer, too. Due to CVS's lack of atomic changes and 9.492 +unversioned filesystem hierarchy, it is not possible to reconstruct 9.493 +CVS history completely accurately; some guesswork is involved, and 9.494 +renames will usually not show up. Because a lot of advanced CVS 9.495 +administration has to be done by hand and is hence error-prone, it's 9.496 +common for CVS importers to run into multiple problems with corrupted 9.497 +repositories (completely bogus revision timestamps and files that have 9.498 +remained locked for over a decade are just two of the less interesting 9.499 +problems I can recall from personal experience). 9.500 + 9.501 +Mercurial can import revision history from a CVS repository. 
9.502 + 9.503 + 9.504 +\subsection{Commercial tools} 9.505 + 9.506 +Perforce has a centralised client/server architecture, with no 9.507 +client-side caching of any data. Unlike modern revision control 9.508 +tools, Perforce requires that a user run a command to inform the 9.509 +server about every file they intend to edit. 9.510 + 9.511 +The performance of Perforce is quite good for small teams, but it 9.512 +falls off rapidly as the number of users grows beyond a few dozen. 9.513 +Modestly large Perforce installations require the deployment of 9.514 +proxies to cope with the load their users generate. 9.515 + 9.516 + 9.517 +\subsection{Choosing a revision control tool} 9.518 + 9.519 +With the exception of CVS, all of the tools listed above have unique 9.520 +strengths that suit them to particular styles of work. There is no 9.521 +single revision control tool that is best in all situations. 9.522 + 9.523 +As an example, Subversion is a good choice for working with frequently 9.524 +edited binary files, due to its centralised nature and support for 9.525 +file locking. 9.526 + 9.527 +I personally find Mercurial's properties of simplicity, performance, 9.528 +and good merge support to be a compelling combination that has served 9.529 +me well for several years. 9.530 + 9.531 + 9.532 +\section{Switching from another tool to Mercurial} 9.533 + 9.534 +Mercurial is bundled with an extension named \hgext{convert}, which 9.535 +can incrementally import revision history from several other revision 9.536 +control tools. By ``incremental'', I mean that you can convert all of 9.537 +a project's history to date in one go, then rerun the conversion later 9.538 +to obtain new changes that happened after the initial conversion. 
9.539 + 9.540 +The revision control tools supported by \hgext{convert} are as 9.541 +follows: 9.542 +\begin{itemize} 9.543 +\item Subversion 9.544 +\item CVS 9.545 +\item Git 9.546 +\item Darcs 9.547 +\end{itemize} 9.548 + 9.549 +In addition, \hgext{convert} can export changes from Mercurial to 9.550 +Subversion. This makes it possible to try Subversion and Mercurial in 9.551 +parallel before committing to a switchover, without risking the loss 9.552 +of any work. 9.553 + 9.554 +The \hgxcmd{convert}{convert} command is easy to use. Simply point it 9.555 +at the path or URL of the source repository, optionally give it the 9.556 +name of the destination repository, and it will start working. After 9.557 +the initial conversion, just run the same command again to import new 9.558 +changes. 9.559 + 9.560 + 9.561 +%%% Local Variables: 9.562 +%%% mode: latex 9.563 +%%% TeX-master: "00book" 9.564 +%%% End:
10.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 10.2 +++ b/en/ch02-tour-basic.tex Thu Jan 29 22:56:27 2009 -0800 10.3 @@ -0,0 +1,624 @@ 10.4 +\chapter{A tour of Mercurial: the basics} 10.5 +\label{chap:tour-basic} 10.6 + 10.7 +\section{Installing Mercurial on your system} 10.8 +\label{sec:tour:install} 10.9 + 10.10 +Prebuilt binary packages of Mercurial are available for every popular 10.11 +operating system. These make it easy to start using Mercurial on your 10.12 +computer immediately. 10.13 + 10.14 +\subsection{Linux} 10.15 + 10.16 +Because each Linux distribution has its own packaging tools, policies, 10.17 +and rate of development, it's difficult to give a comprehensive set of 10.18 +instructions on how to install Mercurial binaries. The version of 10.19 +Mercurial that you will end up with can vary depending on how active 10.20 +the person is who maintains the package for your distribution. 10.21 + 10.22 +To keep things simple, I will focus on installing Mercurial from the 10.23 +command line under the most popular Linux distributions. Most of 10.24 +these distributions provide graphical package managers that will let 10.25 +you install Mercurial with a single click; the package name to look 10.26 +for is \texttt{mercurial}. 10.27 + 10.28 +\begin{itemize} 10.29 +\item[Debian] 10.30 + \begin{codesample4} 10.31 + apt-get install mercurial 10.32 + \end{codesample4} 10.33 + 10.34 +\item[Fedora Core] 10.35 + \begin{codesample4} 10.36 + yum install mercurial 10.37 + \end{codesample4} 10.38 + 10.39 +\item[Gentoo] 10.40 + \begin{codesample4} 10.41 + emerge mercurial 10.42 + \end{codesample4} 10.43 + 10.44 +\item[OpenSUSE] 10.45 + \begin{codesample4} 10.46 + zypper install mercurial 10.47 + \end{codesample4} 10.48 + 10.49 +\item[Ubuntu] Ubuntu's Mercurial package is based on Debian's. To 10.50 + install it, run the following command. 
10.51 + \begin{codesample4} 10.52 + apt-get install mercurial 10.53 + \end{codesample4} 10.54 + The Ubuntu package for Mercurial tends to lag behind the Debian 10.55 + version by a considerable time margin (at the time of writing, seven 10.56 + months), which in some cases will mean that on Ubuntu, you may run 10.57 + into problems that have since been fixed in the Debian package. 10.58 +\end{itemize} 10.59 + 10.60 +\subsection{Solaris} 10.61 + 10.62 +SunFreeWare, at \url{http://www.sunfreeware.com}, is a good source for a 10.63 +large number of pre-built Solaris packages for 32 and 64 bit Intel and 10.64 +Sparc architectures, including current versions of Mercurial. 10.65 + 10.66 +\subsection{Mac OS X} 10.67 + 10.68 +Lee Cantey publishes an installer of Mercurial for Mac OS~X at 10.69 +\url{http://mercurial.berkwood.com}. This package works on both 10.70 +Intel-~and Power-based Macs. Before you can use it, you must install 10.71 +a compatible version of Universal MacPython~\cite{web:macpython}. This 10.72 +is easy to do; simply follow the instructions on Lee's site. 10.73 + 10.74 +It's also possible to install Mercurial using Fink or MacPorts, 10.75 +two popular free package managers for Mac OS X. If you have Fink, 10.76 +use \command{sudo apt-get install mercurial-py25}. If MacPorts, 10.77 +\command{sudo port install mercurial}. 10.78 + 10.79 +\subsection{Windows} 10.80 + 10.81 +Lee Cantey publishes an installer of Mercurial for Windows at 10.82 +\url{http://mercurial.berkwood.com}. This package has no external 10.83 +dependencies; it ``just works''. 10.84 + 10.85 +\begin{note} 10.86 + The Windows version of Mercurial does not automatically convert line 10.87 + endings between Windows and Unix styles. If you want to share work 10.88 + with Unix users, you must do a little additional configuration 10.89 + work. XXX Flesh this out. 
10.90 +\end{note} 10.91 + 10.92 +\section{Getting started} 10.93 + 10.94 +To begin, we'll use the \hgcmd{version} command to find out whether 10.95 +Mercurial is actually installed properly. The actual version 10.96 +information that it prints isn't so important; it's whether it prints 10.97 +anything at all that we care about. 10.98 +\interaction{tour.version} 10.99 + 10.100 +\subsection{Built-in help} 10.101 + 10.102 +Mercurial provides a built-in help system. This is invaluable for those 10.103 +times when you find yourself stuck trying to remember how to run a 10.104 +command. If you are completely stuck, simply run \hgcmd{help}; it 10.105 +will print a brief list of commands, along with a description of what 10.106 +each does. If you ask for help on a specific command (as below), it 10.107 +prints more detailed information. 10.108 +\interaction{tour.help} 10.109 +For a more impressive level of detail (which you won't usually need) 10.110 +run \hgcmdargs{help}{\hggopt{-v}}. The \hggopt{-v} option is short 10.111 +for \hggopt{--verbose}, and tells Mercurial to print more information 10.112 +than it usually would. 10.113 + 10.114 +\section{Working with a repository} 10.115 + 10.116 +In Mercurial, everything happens inside a \emph{repository}. The 10.117 +repository for a project contains all of the files that ``belong to'' 10.118 +that project, along with a historical record of the project's files. 10.119 + 10.120 +There's nothing particularly magical about a repository; it is simply 10.121 +a directory tree in your filesystem that Mercurial treats as special. 10.122 +You can rename or delete a repository any time you like, using either the 10.123 +command line or your file browser. 10.124 + 10.125 +\subsection{Making a local copy of a repository} 10.126 + 10.127 +\emph{Copying} a repository is just a little bit special. 
While you 10.128 +could use a normal file copying command to make a copy of a 10.129 +repository, it's best to use a built-in command that Mercurial 10.130 +provides. This command is called \hgcmd{clone}, because it creates an 10.131 +identical copy of an existing repository. 10.132 +\interaction{tour.clone} 10.133 +If our clone succeeded, we should now have a local directory called 10.134 +\dirname{hello}. This directory will contain some files. 10.135 +\interaction{tour.ls} 10.136 +These files have the same contents and history in our repository as 10.137 +they do in the repository we cloned. 10.138 + 10.139 +Every Mercurial repository is complete, self-contained, and 10.140 +independent. It contains its own private copy of a project's files 10.141 +and history. A cloned repository remembers the location of the 10.142 +repository it was cloned from, but it does not communicate with that 10.143 +repository, or any other, unless you tell it to. 10.144 + 10.145 +What this means for now is that we're free to experiment with our 10.146 +repository, safe in the knowledge that it's a private ``sandbox'' that 10.147 +won't affect anyone else. 10.148 + 10.149 +\subsection{What's in a repository?} 10.150 + 10.151 +When we take a more detailed look inside a repository, we can see that 10.152 +it contains a directory named \dirname{.hg}. This is where Mercurial 10.153 +keeps all of its metadata for the repository. 10.154 +\interaction{tour.ls-a} 10.155 + 10.156 +The contents of the \dirname{.hg} directory and its subdirectories are 10.157 +private to Mercurial. Every other file and directory in the 10.158 +repository is yours to do with as you please. 10.159 + 10.160 +To introduce a little terminology, the \dirname{.hg} directory is the 10.161 +``real'' repository, and all of the files and directories that coexist 10.162 +with it are said to live in the \emph{working directory}. 
An easy way 10.163 +to remember the distinction is that the \emph{repository} contains the 10.164 +\emph{history} of your project, while the \emph{working directory} 10.165 +contains a \emph{snapshot} of your project at a particular point in 10.166 +history. 10.167 + 10.168 +\section{A tour through history} 10.169 + 10.170 +One of the first things we might want to do with a new, unfamiliar 10.171 +repository is understand its history. The \hgcmd{log} command gives 10.172 +us a view of history. 10.173 +\interaction{tour.log} 10.174 +By default, this command prints a brief paragraph of output for each 10.175 +change to the project that was recorded. In Mercurial terminology, we 10.176 +call each of these recorded events a \emph{changeset}, because it can 10.177 +contain a record of changes to several files. 10.178 + 10.179 +The fields in a record of output from \hgcmd{log} are as follows. 10.180 +\begin{itemize} 10.181 +\item[\texttt{changeset}] This field has the format of a number, 10.182 + followed by a colon, followed by a hexadecimal string. These are 10.183 + \emph{identifiers} for the changeset. There are two identifiers 10.184 + because the number is shorter and easier to type than the hex 10.185 + string. 10.186 +\item[\texttt{user}] The identity of the person who created the 10.187 + changeset. This is a free-form field, but it most often contains a 10.188 + person's name and email address. 10.189 +\item[\texttt{date}] The date and time on which the changeset was 10.190 + created, and the timezone in which it was created. (The date and 10.191 + time are local to that timezone; they display what time and date it 10.192 + was for the person who created the changeset.) 10.193 +\item[\texttt{summary}] The first line of the text message that the 10.194 + creator of the changeset entered to describe the changeset. 10.195 +\end{itemize} 10.196 +The default output printed by \hgcmd{log} is purely a summary; it is 10.197 +missing a lot of detail. 
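To make the layout of a record concrete, here is a minimal sketch in Python that pulls the fields out of one record and separates the two identifiers in the \texttt{changeset} field. The function name \texttt{parse\_log\_record} is my own invention for illustration, not part of Mercurial, and the sample record is the one quoted later in this chapter. The default output is meant for people to read, so treat this as an illustration of the format rather than a robust way to script Mercurial.

```python
def parse_log_record(text):
    """Split one record of default `hg log` output into a dict of fields.

    Hypothetical helper for illustration only; not a Mercurial API.
    """
    fields = {}
    for line in text.strip().splitlines():
        # Split on the first colon only: the date and the changeset
        # field contain colons of their own.
        name, _, value = line.partition(":")
        fields[name.strip()] = value.strip()
    # The changeset field is "<revision number>:<hexadecimal string>".
    rev, _, node = fields["changeset"].partition(":")
    fields["rev"] = int(rev)   # short, but only valid in this repository
    fields["node"] = node      # identifies the changeset in every copy
    return fields


record = """\
changeset: 73:584af0e231be
user: Censored Person <censored.person@example.org>
date: Tue Sep 26 21:37:07 2006 -0700
summary: include buildmeister/commondefs. Add an exports and install
"""

fields = parse_log_record(record)
print(fields["rev"], fields["node"])
```

The sketch keeps the revision number and the hexadecimal string apart because they are used differently: the number is a local shorthand, while the hexadecimal string is the identifier to quote to other people.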
10.198 + 10.199 +Figure~\ref{fig:tour-basic:history} provides a graphical representation of 10.200 +the history of the \dirname{hello} repository, to make it a little 10.201 +easier to see which direction history is ``flowing'' in. We'll be 10.202 +returning to this figure several times in this chapter and the chapter 10.203 +that follows. 10.204 + 10.205 +\begin{figure}[ht] 10.206 + \centering 10.207 + \grafix{tour-history} 10.208 + \caption{Graphical history of the \dirname{hello} repository} 10.209 + \label{fig:tour-basic:history} 10.210 +\end{figure} 10.211 + 10.212 +\subsection{Changesets, revisions, and talking to other 10.213 + people} 10.214 + 10.215 +As English is a notoriously sloppy language, and computer science has 10.216 +a hallowed history of terminological confusion (why use one term when 10.217 +four will do?), revision control has a variety of words and phrases 10.218 +that mean the same thing. If you are talking about Mercurial history 10.219 +with other people, you will find that the word ``changeset'' is often 10.220 +compressed to ``change'' or (when written) ``cset'', and sometimes a 10.221 +changeset is referred to as a ``revision'' or a ``rev''. 10.222 + 10.223 +While it doesn't matter what \emph{word} you use to refer to the 10.224 +concept of ``a~changeset'', the \emph{identifier} that you use to 10.225 +refer to ``a~\emph{specific} changeset'' is of great importance. 10.226 +Recall that the \texttt{changeset} field in the output from 10.227 +\hgcmd{log} identifies a changeset using both a number and a 10.228 +hexadecimal string. 10.229 +\begin{itemize} 10.230 +\item The revision number is \emph{only valid in that repository}, 10.231 +\item while the hex string is the \emph{permanent, unchanging 10.232 + identifier} that will always identify that exact changeset in 10.233 + \emph{every} copy of the repository. 10.234 +\end{itemize} 10.235 +This distinction is important. 
If you send someone an email talking 10.236 +about ``revision~33'', there's a high likelihood that their 10.237 +revision~33 will \emph{not be the same} as yours. The reason for this 10.238 +is that a revision number depends on the order in which changes 10.239 +arrived in a repository, and there is no guarantee that the same 10.240 +changes will happen in the same order in different repositories. 10.241 +Three changes $a,b,c$ can easily appear in one repository as $0,1,2$, 10.242 +while in another as $1,0,2$. 10.243 + 10.244 +Mercurial uses revision numbers purely as a convenient shorthand. If 10.245 +you need to discuss a changeset with someone, or make a record of a 10.246 +changeset for some other reason (for example, in a bug report), use 10.247 +the hexadecimal identifier. 10.248 + 10.249 +\subsection{Viewing specific revisions} 10.250 + 10.251 +To narrow the output of \hgcmd{log} down to a single revision, use the 10.252 +\hgopt{log}{-r} (or \hgopt{log}{--rev}) option. You can use either a 10.253 +revision number or a long-form changeset identifier, and you can 10.254 +provide as many revisions as you want. \interaction{tour.log-r} 10.255 + 10.256 +If you want to see the history of several revisions without having to 10.257 +list each one, you can use \emph{range notation}; this lets you 10.258 +express the idea ``I want all revisions between $a$ and $b$, 10.259 +inclusive''. 10.260 +\interaction{tour.log.range} 10.261 +Mercurial also honours the order in which you specify revisions, so 10.262 +\hgcmdargs{log}{-r 2:4} prints $2,3,4$ while \hgcmdargs{log}{-r 4:2} 10.263 +prints $4,3,2$. 10.264 + 10.265 +\subsection{More detailed information} 10.266 + 10.267 +While the summary information printed by \hgcmd{log} is useful if you 10.268 +already know what you're looking for, you may need to see a complete 10.269 +description of the change, or a list of the files changed, if you're 10.270 +trying to decide whether a changeset is the one you're looking for. 
10.271 +The \hgcmd{log} command's \hggopt{-v} (or \hggopt{--verbose}) 10.272 +option gives you this extra detail. 10.273 +\interaction{tour.log-v} 10.274 + 10.275 +If you want to see both the description and content of a change, add 10.276 +the \hgopt{log}{-p} (or \hgopt{log}{--patch}) option. This displays 10.277 +the content of a change as a \emph{unified diff} (if you've never seen 10.278 +a unified diff before, see section~\ref{sec:mq:patch} for an overview). 10.279 +\interaction{tour.log-vp} 10.280 + 10.281 +\section{All about command options} 10.282 + 10.283 +Let's take a brief break from exploring Mercurial commands to discuss 10.284 +a pattern in the way that they work; you may find this useful to keep 10.285 +in mind as we continue our tour. 10.286 + 10.287 +Mercurial has a consistent and straightforward approach to dealing 10.288 +with the options that you can pass to commands. It follows the 10.289 +conventions for options that are common to modern Linux and Unix 10.290 +systems. 10.291 +\begin{itemize} 10.292 +\item Every option has a long name. For example, as we've already 10.293 + seen, the \hgcmd{log} command accepts a \hgopt{log}{--rev} option. 10.294 +\item Most options have short names, too. Instead of 10.295 + \hgopt{log}{--rev}, we can use \hgopt{log}{-r}. (The reason that 10.296 + some options don't have short names is that the options in question 10.297 + are rarely used.) 10.298 +\item Long options start with two dashes (e.g.~\hgopt{log}{--rev}), 10.299 + while short options start with one (e.g.~\hgopt{log}{-r}). 10.300 +\item Option naming and usage is consistent across commands. For 10.301 + example, every command that lets you specify a changeset~ID or 10.302 + revision number accepts both \hgopt{log}{-r} and \hgopt{log}{--rev} 10.303 + arguments. 10.304 +\end{itemize} 10.305 +In the examples throughout this book, I use short options instead of 10.306 +long. 
This just reflects my own preference, so don't read anything 10.307 +significant into it. 10.308 + 10.309 +Most commands that print output of some kind will print more output 10.310 +when passed a \hggopt{-v} (or \hggopt{--verbose}) option, and less 10.311 +when passed \hggopt{-q} (or \hggopt{--quiet}). 10.312 + 10.313 +\section{Making and reviewing changes} 10.314 + 10.315 +Now that we have a grasp of viewing history in Mercurial, let's take a 10.316 +look at making some changes and examining them. 10.317 + 10.318 +The first thing we'll do is isolate our experiment in a repository of 10.319 +its own. We use the \hgcmd{clone} command, but we don't need to 10.320 +clone a copy of the remote repository. Since we already have a copy 10.321 +of it locally, we can just clone that instead. This is much faster 10.322 +than cloning over the network, and cloning a local repository uses 10.323 +less disk space in most cases, too. 10.324 +\interaction{tour.reclone} 10.325 +As an aside, it's often good practice to keep a ``pristine'' copy of a 10.326 +remote repository around, which you can then make temporary clones of 10.327 +to create sandboxes for each task you want to work on. This lets you 10.328 +work on multiple tasks in parallel, each isolated from the others 10.329 +until it's complete and you're ready to integrate it back. Because 10.330 +local clones are so cheap, there's almost no overhead to cloning and 10.331 +destroying repositories whenever you want. 10.332 + 10.333 +In our \dirname{my-hello} repository, we have a file 10.334 +\filename{hello.c} that contains the classic ``hello, world'' program. 10.335 +Let's use the ancient and venerable \command{sed} command to edit this 10.336 +file so that it prints a second line of output. (I'm only using 10.337 +\command{sed} to do this because it's easy to write a scripted example 10.338 +this way. 
Since you're not under the same constraint, you probably 10.339 +won't want to use \command{sed}; simply use your preferred text editor to 10.340 +do the same thing.) 10.341 +\interaction{tour.sed} 10.342 + 10.343 +Mercurial's \hgcmd{status} command will tell us what Mercurial knows 10.344 +about the files in the repository. 10.345 +\interaction{tour.status} 10.346 +The \hgcmd{status} command prints no output for some files, but a line 10.347 +starting with ``\texttt{M}'' for \filename{hello.c}. Unless you tell 10.348 +it to, \hgcmd{status} will not print any output for files that have 10.349 +not been modified. 10.350 + 10.351 +The ``\texttt{M}'' indicates that Mercurial has noticed that we 10.352 +modified \filename{hello.c}. We didn't need to \emph{inform} 10.353 +Mercurial that we were going to modify the file before we started, or 10.354 +that we had modified the file after we were done; it was able to 10.355 +figure this out itself. 10.356 + 10.357 +It's a little bit helpful to know that we've modified 10.358 +\filename{hello.c}, but we might prefer to know exactly \emph{what} 10.359 +changes we've made to it. To do this, we use the \hgcmd{diff} 10.360 +command. 10.361 +\interaction{tour.diff} 10.362 + 10.363 +\section{Recording changes in a new changeset} 10.364 + 10.365 +We can modify files, build and test our changes, and use 10.366 +\hgcmd{status} and \hgcmd{diff} to review our changes, until we're 10.367 +satisfied with what we've done and arrive at a natural stopping point 10.368 +where we want to record our work in a new changeset. 10.369 + 10.370 +The \hgcmd{commit} command lets us create a new changeset; we'll 10.371 +usually refer to this as ``making a commit'' or ``committing''. 10.372 + 10.373 +\subsection{Setting up a username} 10.374 + 10.375 +When you try to run \hgcmd{commit} for the first time, it is not 10.376 +guaranteed to succeed. 
Mercurial records your name and address with 10.377 +each change that you commit, so that you and others will later be able 10.378 +to tell who made each change. Mercurial tries to automatically figure 10.379 +out a sensible username to commit the change with. It will attempt 10.380 +each of the following methods, in order: 10.381 +\begin{enumerate} 10.382 +\item If you specify a \hgopt{commit}{-u} option to the \hgcmd{commit} 10.383 + command on the command line, followed by a username, this is always 10.384 + given the highest precedence. 10.385 +\item If you have set the \envar{HGUSER} environment variable, this is 10.386 + checked next. 10.387 +\item If you create a file in your home directory called 10.388 + \sfilename{.hgrc}, with a \rcitem{ui}{username} entry, that will be 10.389 + used next. To see what the contents of this file should look like, 10.390 + refer to section~\ref{sec:tour-basic:username} below. 10.391 +\item If you have set the \envar{EMAIL} environment variable, this 10.392 + will be used next. 10.393 +\item Mercurial will query your system to find out your local user 10.394 + name and host name, and construct a username from these components. 10.395 + Since this often results in a username that is not very useful, it 10.396 + will print a warning if it has to do this. 10.397 +\end{enumerate} 10.398 +If all of these mechanisms fail, Mercurial will fail, printing an 10.399 +error message. In this case, it will not let you commit until you set 10.400 +up a username. 10.401 + 10.402 +You should think of the \envar{HGUSER} environment variable and the 10.403 +\hgopt{commit}{-u} option to the \hgcmd{commit} command as ways to 10.404 +\emph{override} Mercurial's default selection of username. For normal 10.405 +use, the simplest and most robust way to set a username for yourself 10.406 +is by creating a \sfilename{.hgrc} file; see below for details. 
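The order of precedence described above can be sketched in a few lines of Python. This is an illustration of the lookup order only, not Mercurial's own code: the function \texttt{resolve\_username} and its parameters are hypothetical, and joining the local user name and host name with an ``at'' sign is an assumption, since the text above does not say how the two are combined.

```python
def resolve_username(cli_user=None, environ=None, hgrc_username=None,
                     local_user=None, local_host=None):
    """Return (username, source), trying each method in the order listed above.

    Hypothetical sketch; every parameter stands in for something Mercurial
    would look up itself.
    """
    environ = environ or {}
    candidates = [
        ("-u option", cli_user),                 # 1. command line
        ("HGUSER", environ.get("HGUSER")),       # 2. environment variable
        (".hgrc ui.username", hgrc_username),    # 3. configuration file
        ("EMAIL", environ.get("EMAIL")),         # 4. environment variable
    ]
    for source, value in candidates:
        if value:
            return value, source
    if local_user and local_host:
        # 5. Last resort. The "user@host" form is an assumption.
        return f"{local_user}@{local_host}", "local user and host name"
    raise ValueError("no username available; set one up before committing")


env = {"HGUSER": "Alice <alice@example.org>", "EMAIL": "alice@example.org"}
print(resolve_username(environ=env, hgrc_username="Alice A. <a@example.org>"))
```

In the usage line, \envar{HGUSER} wins over the \sfilename{.hgrc} entry, which is why it is best thought of as an override rather than as the normal place to keep your name.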
10.407 + 10.408 +\subsubsection{Creating a Mercurial configuration file} 10.409 +\label{sec:tour-basic:username} 10.410 + 10.411 +To set a user name, use your favourite editor to create a file called 10.412 +\sfilename{.hgrc} in your home directory. Mercurial will use this 10.413 +file to look up your personalised configuration settings. The initial 10.414 +contents of your \sfilename{.hgrc} should look like this. 10.415 +\begin{codesample2} 10.416 + # This is a Mercurial configuration file. 10.417 + [ui] 10.418 + username = Firstname Lastname <email.address@domain.net> 10.419 +\end{codesample2} 10.420 +The ``\texttt{[ui]}'' line begins a \emph{section} of the config file, 10.421 +so you can read the ``\texttt{username = ...}'' line as meaning ``set 10.422 +the value of the \texttt{username} item in the \texttt{ui} section''. 10.423 +A section continues until a new section begins, or the end of the 10.424 +file. Mercurial ignores empty lines and treats any text from 10.425 +``\texttt{\#}'' to the end of a line as a comment. 10.426 + 10.427 +\subsubsection{Choosing a user name} 10.428 + 10.429 +You can use any text you like as the value of the \texttt{username} 10.430 +config item, since this information is for reading by other people, 10.431 +not for interpreting by Mercurial. The convention that most people 10.432 +follow is to use their name and email address, as in the example 10.433 +above. 10.434 + 10.435 +\begin{note} 10.436 + Mercurial's built-in web server obfuscates email addresses, to make 10.437 + it more difficult for the email harvesting tools that spammers use. 10.438 + This reduces the likelihood that you'll start receiving more junk 10.439 + email if you publish a Mercurial repository on the web. 10.440 +\end{note} 10.441 + 10.442 +\subsection{Writing a commit message} 10.443 + 10.444 +When we commit a change, Mercurial drops us into a text editor, to 10.445 +enter a message that will describe the modifications we've made in 10.446 +this changeset. 
This is called the \emph{commit message}. It will be 10.447 +a record for readers of what we did and why, and it will be printed by 10.448 +\hgcmd{log} after we've finished committing. 10.449 +\interaction{tour.commit} 10.450 + 10.451 +The editor that the \hgcmd{commit} command drops us into will contain 10.452 +an empty line, followed by a number of lines starting with 10.453 +``\texttt{HG:}''. 10.454 +\begin{codesample2} 10.455 + \emph{empty line} 10.456 + HG: changed hello.c 10.457 +\end{codesample2} 10.458 +Mercurial ignores the lines that start with ``\texttt{HG:}''; it uses 10.459 +them only to tell us which files it's recording changes to. Modifying 10.460 +or deleting these lines has no effect. 10.461 + 10.462 +\subsection{Writing a good commit message} 10.463 + 10.464 +Since \hgcmd{log} only prints the first line of a commit message by 10.465 +default, it's best to write a commit message whose first line stands 10.466 +alone. Here's a real example of a commit message that \emph{doesn't} 10.467 +follow this guideline, and hence has a summary that is not readable. 10.468 +\begin{codesample2} 10.469 + changeset: 73:584af0e231be 10.470 + user: Censored Person <censored.person@example.org> 10.471 + date: Tue Sep 26 21:37:07 2006 -0700 10.472 + summary: include buildmeister/commondefs. Add an exports and install 10.473 +\end{codesample2} 10.474 + 10.475 +As far as the remainder of the contents of the commit message are 10.476 +concerned, there are no hard-and-fast rules. Mercurial itself doesn't 10.477 +interpret or care about the contents of the commit message, though 10.478 +your project may have policies that dictate a certain kind of 10.479 +formatting. 10.480 + 10.481 +My personal preference is for short, but informative, commit messages 10.482 +that tell me something that I can't figure out with a quick glance at 10.483 +the output of \hgcmdargs{log}{--patch}. 
10.484 + 10.485 +\subsection{Aborting a commit} 10.486 + 10.487 +If you decide that you don't want to commit while in the middle of 10.488 +editing a commit message, simply exit from your editor without saving 10.489 +the file that it's editing. This will cause nothing to happen to 10.490 +either the repository or the working directory. 10.491 + 10.492 +If we run the \hgcmd{commit} command without any arguments, it records 10.493 +all of the changes we've made, as reported by \hgcmd{status} and 10.494 +\hgcmd{diff}. 10.495 + 10.496 +\subsection{Admiring our new handiwork} 10.497 + 10.498 +Once we've finished the commit, we can use the \hgcmd{tip} command to 10.499 +display the changeset we just created. This command produces output 10.500 +that is identical to \hgcmd{log}, but it only displays the newest 10.501 +revision in the repository. 10.502 +\interaction{tour.tip} 10.503 +We refer to the newest revision in the repository as the tip revision, 10.504 +or simply the tip. 10.505 + 10.506 +\section{Sharing changes} 10.507 + 10.508 +We mentioned earlier that repositories in Mercurial are 10.509 +self-contained. This means that the changeset we just created exists 10.510 +only in our \dirname{my-hello} repository. Let's look at a few ways 10.511 +that we can propagate this change into other repositories. 10.512 + 10.513 +\subsection{Pulling changes from another repository} 10.514 +\label{sec:tour:pull} 10.515 + 10.516 +To get started, let's clone our original \dirname{hello} repository, 10.517 +which does not contain the change we just committed. We'll call our 10.518 +temporary repository \dirname{hello-pull}. 10.519 +\interaction{tour.clone-pull} 10.520 + 10.521 +We'll use the \hgcmd{pull} command to bring changes from 10.522 +\dirname{my-hello} into \dirname{hello-pull}. However, blindly 10.523 +pulling unknown changes into a repository is a somewhat scary 10.524 +prospect. 
Mercurial provides the \hgcmd{incoming} command to tell us 10.525 +what changes the \hgcmd{pull} command \emph{would} pull into the 10.526 +repository, without actually pulling the changes in. 10.527 +\interaction{tour.incoming} 10.528 +(Of course, someone could cause more changesets to appear in the 10.529 +repository that we ran \hgcmd{incoming} in, before we get a chance to 10.530 +\hgcmd{pull} the changes, so that we could end up pulling changes that we 10.531 +didn't expect.) 10.532 + 10.533 +Bringing changes into a repository is a simple matter of running the 10.534 +\hgcmd{pull} command, and telling it which repository to pull from. 10.535 +\interaction{tour.pull} 10.536 +As you can see from the before-and-after output of \hgcmd{tip}, we 10.537 +have successfully pulled changes into our repository. There remains 10.538 +one step before we can see these changes in the working directory. 10.539 + 10.540 +\subsection{Updating the working directory} 10.541 + 10.542 +We have so far glossed over the relationship between a repository and 10.543 +its working directory. The \hgcmd{pull} command that we ran in 10.544 +section~\ref{sec:tour:pull} brought changes into the repository, but 10.545 +if we check, there's no sign of those changes in the working 10.546 +directory. This is because \hgcmd{pull} does not (by default) touch 10.547 +the working directory. Instead, we use the \hgcmd{update} command to 10.548 +do this. 10.549 +\interaction{tour.update} 10.550 + 10.551 +It might seem a bit strange that \hgcmd{pull} doesn't update the 10.552 +working directory automatically. There's actually a good reason for 10.553 +this: you can use \hgcmd{update} to update the working directory to 10.554 +the state it was in at \emph{any revision} in the history of the 10.555 +repository. 
If you had the working directory updated to an old 10.556 +revision---to hunt down the origin of a bug, say---and ran a 10.557 +\hgcmd{pull} which automatically updated the working directory to a 10.558 +new revision, you might not be terribly happy. 10.559 + 10.560 +However, since pull-then-update is such a common thing to do, 10.561 +Mercurial lets you combine the two by passing the \hgopt{pull}{-u} 10.562 +option to \hgcmd{pull}. 10.563 +\begin{codesample2} 10.564 + hg pull -u 10.565 +\end{codesample2} 10.566 +If you look back at the output of \hgcmd{pull} in 10.567 +section~\ref{sec:tour:pull} when we ran it without \hgopt{pull}{-u}, 10.568 +you can see that it printed a helpful reminder that we'd have to take 10.569 +an explicit step to update the working directory: 10.570 +\begin{codesample2} 10.571 + (run 'hg update' to get a working copy) 10.572 +\end{codesample2} 10.573 + 10.574 +To find out what revision the working directory is at, use the 10.575 +\hgcmd{parents} command. 10.576 +\interaction{tour.parents} 10.577 +If you look back at figure~\ref{fig:tour-basic:history}, you'll see 10.578 +arrows connecting each changeset. The node that the arrow leads 10.579 +\emph{from} in each case is a parent, and the node that the arrow 10.580 +leads \emph{to} is its child. The working directory has a parent in 10.581 +just the same way; this is the changeset that the working directory 10.582 +currently contains. 10.583 + 10.584 +To update the working directory to a particular revision, give a 10.585 +revision number or changeset~ID to the \hgcmd{update} command. 10.586 +\interaction{tour.older} 10.587 +If you omit an explicit revision, \hgcmd{update} will update to the 10.588 +tip revision, as shown by the second call to \hgcmd{update} in the 10.589 +example above. 10.590 + 10.591 +\subsection{Pushing changes to another repository} 10.592 + 10.593 +Mercurial lets us push changes to another repository, from the 10.594 +repository we're currently visiting. 
As with the example of 10.595 +\hgcmd{pull} above, we'll create a temporary repository to push our 10.596 +changes into. 10.597 +\interaction{tour.clone-push} 10.598 +The \hgcmd{outgoing} command tells us what changes would be pushed 10.599 +into another repository. 10.600 +\interaction{tour.outgoing} 10.601 +And the \hgcmd{push} command does the actual push. 10.602 +\interaction{tour.push} 10.603 +As with \hgcmd{pull}, the \hgcmd{push} command does not update the 10.604 +working directory in the repository that it's pushing changes into. 10.605 +(Unlike \hgcmd{pull}, \hgcmd{push} does not provide a \texttt{-u} 10.606 +option that updates the other repository's working directory.) 10.607 + 10.608 +What happens if we try to pull or push changes and the receiving 10.609 +repository already has those changes? Nothing too exciting. 10.610 +\interaction{tour.push.nothing} 10.611 + 10.612 +\subsection{Sharing changes over a network} 10.613 + 10.614 +The commands we have covered in the previous few sections are not 10.615 +limited to working with local repositories. Each works in exactly the 10.616 +same fashion over a network connection; simply pass in a URL instead 10.617 +of a local path. 10.618 +\interaction{tour.outgoing.net} 10.619 +In this example, we can see what changes we could push to the remote 10.620 +repository, but the repository is understandably not set up to let 10.621 +anonymous users push to it. 10.622 +\interaction{tour.push.net} 10.623 + 10.624 +%%% Local Variables: 10.625 +%%% mode: latex 10.626 +%%% TeX-master: "00book" 10.627 +%%% End:
11.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 11.2 +++ b/en/ch03-tour-merge.tex Thu Jan 29 22:56:27 2009 -0800 11.3 @@ -0,0 +1,286 @@ 11.4 +\chapter{A tour of Mercurial: merging work} 11.5 +\label{chap:tour-merge} 11.6 + 11.7 +We've now covered cloning a repository, making changes in a 11.8 +repository, and pulling or pushing changes from one repository into 11.9 +another. Our next step is \emph{merging} changes from separate 11.10 +repositories. 11.11 + 11.12 +\section{Merging streams of work} 11.13 + 11.14 +Merging is a fundamental part of working with a distributed revision 11.15 +control tool. 11.16 +\begin{itemize} 11.17 +\item Alice and Bob each have a personal copy of a repository for a 11.18 + project they're collaborating on. Alice fixes a bug in her 11.19 + repository; Bob adds a new feature in his. They want the shared 11.20 + repository to contain both the bug fix and the new feature. 11.21 +\item I frequently work on several different tasks for a single 11.22 + project at once, each safely isolated in its own repository. 11.23 + Working this way means that I often need to merge one piece of my 11.24 + own work with another. 11.25 +\end{itemize} 11.26 + 11.27 +Because merging is such a common thing to need to do, Mercurial makes 11.28 +it easy. Let's walk through the process. We'll begin by cloning yet 11.29 +another repository (see how often they spring up?) and making a change 11.30 +in it. 11.31 +\interaction{tour.merge.clone} 11.32 +We should now have two copies of \filename{hello.c} with different 11.33 +contents. The histories of the two repositories have also diverged, 11.34 +as illustrated in figure~\ref{fig:tour-merge:sep-repos}. 
11.35 +\interaction{tour.merge.cat} 11.36 + 11.37 +\begin{figure}[ht] 11.38 + \centering 11.39 + \grafix{tour-merge-sep-repos} 11.40 + \caption{Divergent recent histories of the \dirname{my-hello} and 11.41 + \dirname{my-new-hello} repositories} 11.42 + \label{fig:tour-merge:sep-repos} 11.43 +\end{figure} 11.44 + 11.45 +We already know that pulling changes from our \dirname{my-hello} 11.46 +repository will have no effect on the working directory. 11.47 +\interaction{tour.merge.pull} 11.48 +However, the \hgcmd{pull} command says something about ``heads''. 11.49 + 11.50 +\subsection{Head changesets} 11.51 + 11.52 +A head is a change that has no descendants, or children, as they're 11.53 +also known. The tip revision is thus a head, because the newest 11.54 +revision in a repository doesn't have any children, but a repository 11.55 +can contain more than one head. 11.56 + 11.57 +\begin{figure}[ht] 11.58 + \centering 11.59 + \grafix{tour-merge-pull} 11.60 + \caption{Repository contents after pulling from \dirname{my-hello} into 11.61 + \dirname{my-new-hello}} 11.62 + \label{fig:tour-merge:pull} 11.63 +\end{figure} 11.64 + 11.65 +In figure~\ref{fig:tour-merge:pull}, you can see the effect of the 11.66 +pull from \dirname{my-hello} into \dirname{my-new-hello}. The history 11.67 +that was already present in \dirname{my-new-hello} is untouched, but a 11.68 +new revision has been added. By referring to 11.69 +figure~\ref{fig:tour-merge:sep-repos}, we can see that the 11.70 +\emph{changeset ID} remains the same in the new repository, but the 11.71 +\emph{revision number} has changed. (This, incidentally, is a fine 11.72 +example of why it's not safe to use revision numbers when discussing 11.73 +changesets.) We can view the heads in a repository using the 11.74 +\hgcmd{heads} command. 
11.75 +\interaction{tour.merge.heads} 11.76 + 11.77 +\subsection{Performing the merge} 11.78 + 11.79 +What happens if we try to use the normal \hgcmd{update} command to 11.80 +update to the new tip? 11.81 +\interaction{tour.merge.update} 11.82 +Mercurial is telling us that the \hgcmd{update} command won't do a 11.83 +merge; it won't update the working directory when it thinks we might 11.84 +be wanting to do a merge, unless we force it to do so. Instead, we 11.85 +use the \hgcmd{merge} command to merge the two heads. 11.86 +\interaction{tour.merge.merge} 11.87 + 11.88 +\begin{figure}[ht] 11.89 + \centering 11.90 + \grafix{tour-merge-merge} 11.91 + \caption{Working directory and repository during merge, and 11.92 + following commit} 11.93 + \label{fig:tour-merge:merge} 11.94 +\end{figure} 11.95 + 11.96 +This updates the working directory so that it contains changes from 11.97 +\emph{both} heads, which is reflected in both the output of 11.98 +\hgcmd{parents} and the contents of \filename{hello.c}. 11.99 +\interaction{tour.merge.parents} 11.100 + 11.101 +\subsection{Committing the results of the merge} 11.102 + 11.103 +Whenever we've done a merge, \hgcmd{parents} will display two parents 11.104 +until we \hgcmd{commit} the results of the merge. 11.105 +\interaction{tour.merge.commit} 11.106 +We now have a new tip revision; notice that it has \emph{both} of 11.107 +our former heads as its parents. These are the same revisions that 11.108 +were previously displayed by \hgcmd{parents}. 11.109 +\interaction{tour.merge.tip} 11.110 +In figure~\ref{fig:tour-merge:merge}, you can see a representation of 11.111 +what happens to the working directory during the merge, and how this 11.112 +affects the repository when the commit happens. During the merge, the 11.113 +working directory has two parent changesets, and these become the 11.114 +parents of the new changeset. 
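If it helps to see the bookkeeping spelled out, here is a minimal sketch in Python of what ``head'' means and of how committing a merge changes the set of heads. The history is modelled as a plain mapping from each changeset to its parents. The labels \texttt{base}, \texttt{ours} and \texttt{theirs} are hypothetical stand-ins rather than real changeset IDs, and the \texttt{heads} function is my own, not the implementation behind \hgcmd{heads}.

```python
def heads(parents):
    """Return the changesets that are not a parent of any other changeset."""
    has_child = {p for ps in parents.values() for p in ps}
    return sorted(node for node in parents if node not in has_child)


# Hypothetical history after the pull: two changesets were committed
# independently on top of the same parent, so neither has a child yet.
history = {
    "base": (),
    "ours": ("base",),
    "theirs": ("base",),
}
before = heads(history)  # both "ours" and "theirs" are heads

# Committing the merge records both working directory parents as the
# parents of the new changeset, which leaves a single head.
history["merge"] = ("ours", "theirs")
after = heads(history)

print(before, after)
```

The model leaves out revision numbers on purpose: they only record the order in which changesets arrived in one particular repository, whereas the parent links are the same in every copy.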
11.115 + 11.116 +\section{Merging conflicting changes} 11.117 + 11.118 +Most merges are simple affairs, but sometimes you'll find yourself 11.119 +merging changes where each modifies the same portions of the same 11.120 +files. Unless both modifications are identical, this results in a 11.121 +\emph{conflict}, where you have to decide how to reconcile the 11.122 +different changes into something coherent. 11.123 + 11.124 +\begin{figure}[ht] 11.125 + \centering 11.126 + \grafix{tour-merge-conflict} 11.127 + \caption{Conflicting changes to a document} 11.128 + \label{fig:tour-merge:conflict} 11.129 +\end{figure} 11.130 + 11.131 +Figure~\ref{fig:tour-merge:conflict} illustrates an instance of two 11.132 +conflicting changes to a document. We started with a single version 11.133 +of the file; then we made some changes, while someone else made 11.134 +different changes to the same text. Our task in resolving the 11.135 +conflicting changes is to decide what the file should look like. 11.136 + 11.137 +Mercurial doesn't have a built-in facility for handling conflicts. 11.138 +Instead, it runs an external program called \command{hgmerge}. This 11.139 +is a shell script that is bundled with Mercurial; you can change it to 11.140 +behave however you please. What it does by default is try to find one 11.141 +of several different merging tools that are likely to be installed on 11.142 +your system. It first tries a few fully automatic merging tools; if 11.143 +these don't succeed (because the resolution process requires human 11.144 +guidance) or aren't present, the script tries a few different 11.145 +graphical merging tools. 11.146 + 11.147 +It's also possible to get Mercurial to run another program or script 11.148 +instead of \command{hgmerge}, by setting the \envar{HGMERGE} 11.149 +environment variable to the name of your preferred program. 
11.150 + 11.151 +\subsection{Using a graphical merge tool} 11.152 + 11.153 +My preferred graphical merge tool is \command{kdiff3}, which I'll use 11.154 +to describe the features that are common to graphical file merging 11.155 +tools. You can see a screenshot of \command{kdiff3} in action in 11.156 +figure~\ref{fig:tour-merge:kdiff3}. The kind of merge it is 11.157 +performing is called a \emph{three-way merge}, because there are three 11.158 +different versions of the file of interest to us. The tool thus 11.159 +splits the upper portion of the window into three panes: 11.160 +\begin{itemize} 11.161 +\item At the left is the \emph{base} version of the file, i.e.~the 11.162 + most recent version from which the two versions we're trying to 11.163 + merge are descended. 11.164 +\item In the middle is ``our'' version of the file, with the contents 11.165 + that we modified. 11.166 +\item On the right is ``their'' version of the file, the one 11.167 + from the changeset that we're trying to merge with. 11.168 +\end{itemize} 11.169 +In the pane below these is the current \emph{result} of the merge. 11.170 +Our task is to replace all of the red text, which indicates unresolved 11.171 +conflicts, with some sensible merger of the ``ours'' and ``theirs'' 11.172 +versions of the file. 11.173 + 11.174 +All four of these panes are \emph{locked together}; if we scroll 11.175 +vertically or horizontally in any of them, the others are updated to 11.176 +display the corresponding sections of their respective files. 11.177 + 11.178 +\begin{figure}[ht] 11.179 + \centering 11.180 + \grafix{kdiff3} 11.181 + \caption{Using \command{kdiff3} to merge versions of a file} 11.182 + \label{fig:tour-merge:kdiff3} 11.183 +\end{figure} 11.184 + 11.185 +For each conflicting portion of the file, we can choose to resolve 11.186 +the conflict using some combination of text from the base version, 11.187 +ours, or theirs.
We can also manually edit the merged file at any 11.188 +time, in case we need to make further modifications. 11.189 + 11.190 +There are \emph{many} file merging tools available, too many to cover 11.191 +here. They vary in which platforms they are available for, and in 11.192 +their particular strengths and weaknesses. Most are tuned for merging 11.193 +files containing plain text, while a few are aimed at specialised file 11.194 +formats (generally XML). 11.195 + 11.196 +\subsection{A worked example} 11.197 + 11.198 +In this example, we will reproduce the file modification history of 11.199 +figure~\ref{fig:tour-merge:conflict} above. Let's begin by creating a 11.200 +repository with a base version of our document. 11.201 +\interaction{tour-merge-conflict.wife} 11.202 +We'll clone the repository and make a change to the file. 11.203 +\interaction{tour-merge-conflict.cousin} 11.204 +And another clone, to simulate someone else making a change to the 11.205 +file. (This hints at the idea that it's not all that unusual to merge 11.206 +with yourself when you isolate tasks in separate repositories, and 11.207 +indeed to find and resolve conflicts while doing so.) 11.208 +\interaction{tour-merge-conflict.son} 11.209 +Having created two different versions of the file, we'll set up an 11.210 +environment suitable for running our merge. 11.211 +\interaction{tour-merge-conflict.pull} 11.212 + 11.213 +In this example, I won't use Mercurial's normal \command{hgmerge} 11.214 +program to do the merge, because it would drop my nice automated 11.215 +example-running tool into a graphical user interface. Instead, I'll 11.216 +set \envar{HGMERGE} to tell Mercurial to use the non-interactive 11.217 +\command{merge} command. This is bundled with many Unix-like systems. 11.218 +If you're following this example on your computer, don't bother 11.219 +setting \envar{HGMERGE}. 
11.220 + 11.221 +\textbf{XXX FIX THIS EXAMPLE.} 11.222 + 11.223 +\interaction{tour-merge-conflict.merge} 11.224 +Because \command{merge} can't resolve the conflicting changes, it 11.225 +leaves \emph{merge markers} inside the file that has conflicts, 11.226 +indicating which lines have conflicts, and whether they came from our 11.227 +version of the file or theirs. 11.228 + 11.229 +Mercurial can tell from the way \command{merge} exits that it wasn't 11.230 +able to merge successfully, so it tells us what commands we'll need to 11.231 +run if we want to redo the merging operation. This could be useful 11.232 +if, for example, we were running a graphical merge tool and quit 11.233 +because we were confused or realised we had made a mistake. 11.234 + 11.235 +If automatic or manual merges fail, there's nothing to prevent us from 11.236 +``fixing up'' the affected files ourselves, and committing the results 11.237 +of our merge: 11.238 +\interaction{tour-merge-conflict.commit} 11.239 + 11.240 +\section{Simplifying the pull-merge-commit sequence} 11.241 +\label{sec:tour-merge:fetch} 11.242 + 11.243 +The process of merging changes as outlined above is straightforward, 11.244 +but requires running three commands in sequence. 11.245 +\begin{codesample2} 11.246 + hg pull 11.247 + hg merge 11.248 + hg commit -m 'Merged remote changes' 11.249 +\end{codesample2} 11.250 +In the case of the final commit, you also need to enter a commit 11.251 +message, which is almost always going to be a piece of uninteresting 11.252 +``boilerplate'' text. 11.253 + 11.254 +It would be nice to reduce the number of steps needed, if this were 11.255 +possible. Indeed, Mercurial is distributed with an extension called 11.256 +\hgext{fetch} that does just this. 11.257 + 11.258 +Mercurial provides a flexible extension mechanism that lets people 11.259 +extend its functionality, while keeping the core of Mercurial small 11.260 +and easy to deal with. 
Some extensions add new commands that you can 11.261 +use from the command line, while others work ``behind the scenes,'' 11.262 +for example adding capabilities to the server. 11.263 + 11.264 +The \hgext{fetch} extension adds a new command called, not 11.265 +surprisingly, \hgcmd{fetch}. This extension acts as a combination of 11.266 +\hgcmd{pull}, \hgcmd{update} and \hgcmd{merge}. It begins by pulling 11.267 +changes from another repository into the current repository. If it 11.268 +finds that the changes added a new head to the repository, it begins a 11.269 +merge, then commits the result of the merge with an 11.270 +automatically-generated commit message. If no new heads were added, 11.271 +it updates the working directory to the new tip changeset. 11.272 + 11.273 +Enabling the \hgext{fetch} extension is easy. Edit your 11.274 +\sfilename{.hgrc}, and either go to the \rcsection{extensions} section 11.275 +or create an \rcsection{extensions} section. Then add a line that 11.276 +simply reads ``\Verb+fetch =+''. 11.277 +\begin{codesample2} 11.278 + [extensions] 11.279 + fetch = 11.280 +\end{codesample2} 11.281 +(Normally, on the right-hand side of the ``\texttt{=}'' would appear 11.282 +the location of the extension, but since the \hgext{fetch} extension 11.283 +is in the standard distribution, Mercurial knows where to search for 11.284 +it.) 11.285 + 11.286 +%%% Local Variables: 11.287 +%%% mode: latex 11.288 +%%% TeX-master: "00book" 11.289 +%%% End:
12.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 12.2 +++ b/en/ch04-concepts.tex Thu Jan 29 22:56:27 2009 -0800 12.3 @@ -0,0 +1,577 @@ 12.4 +\chapter{Behind the scenes} 12.5 +\label{chap:concepts} 12.6 + 12.7 +Unlike many revision control systems, the concepts upon which 12.8 +Mercurial is built are simple enough that it's easy to understand how 12.9 +the software really works. Knowing this certainly isn't necessary, 12.10 +but I find it useful to have a ``mental model'' of what's going on. 12.11 + 12.12 +This understanding gives me confidence that Mercurial has been 12.13 +carefully designed to be both \emph{safe} and \emph{efficient}. And 12.14 +just as importantly, if it's easy for me to retain a good idea of what 12.15 +the software is doing when I perform a revision control task, I'm less 12.16 +likely to be surprised by its behaviour. 12.17 + 12.18 +In this chapter, we'll initially cover the core concepts behind 12.19 +Mercurial's design, then continue to discuss some of the interesting 12.20 +details of its implementation. 12.21 + 12.22 +\section{Mercurial's historical record} 12.23 + 12.24 +\subsection{Tracking the history of a single file} 12.25 + 12.26 +When Mercurial tracks modifications to a file, it stores the history 12.27 +of that file in a metadata object called a \emph{filelog}. Each entry 12.28 +in the filelog contains enough information to reconstruct one revision 12.29 +of the file that is being tracked. Filelogs are stored as files in 12.30 +the \sdirname{.hg/store/data} directory. A filelog contains two kinds 12.31 +of information: revision data, and an index to help Mercurial to find 12.32 +a revision efficiently. 12.33 + 12.34 +A file that is large, or has a lot of history, has its filelog stored 12.35 +in separate data (``\texttt{.d}'' suffix) and index (``\texttt{.i}'' 12.36 +suffix) files. For small files without much history, the revision 12.37 +data and index are combined in a single ``\texttt{.i}'' file. 
The 12.38 +correspondence between a file in the working directory and the filelog 12.39 +that tracks its history in the repository is illustrated in 12.40 +figure~\ref{fig:concepts:filelog}. 12.41 + 12.42 +\begin{figure}[ht] 12.43 + \centering 12.44 + \grafix{filelog} 12.45 + \caption{Relationships between files in working directory and 12.46 + filelogs in repository} 12.47 + \label{fig:concepts:filelog} 12.48 +\end{figure} 12.49 + 12.50 +\subsection{Managing tracked files} 12.51 + 12.52 +Mercurial uses a structure called a \emph{manifest} to collect 12.53 +together information about the files that it tracks. Each entry in 12.54 +the manifest contains information about the files present in a single 12.55 +changeset. An entry records which files are present in the changeset, 12.56 +the revision of each file, and a few other pieces of file metadata. 12.57 + 12.58 +\subsection{Recording changeset information} 12.59 + 12.60 +The \emph{changelog} contains information about each changeset. Each 12.61 +revision records who committed a change, the changeset comment, other 12.62 +pieces of changeset-related information, and the revision of the 12.63 +manifest to use. 12.64 + 12.65 +\subsection{Relationships between revisions} 12.66 + 12.67 +Within a changelog, a manifest, or a filelog, each revision stores a 12.68 +pointer to its immediate parent (or to its two parents, if it's a 12.69 +merge revision). As I mentioned above, there are also relationships 12.70 +between revisions \emph{across} these structures, and they are 12.71 +hierarchical in nature. 12.72 + 12.73 +For every changeset in a repository, there is exactly one revision 12.74 +stored in the changelog. Each revision of the changelog contains a 12.75 +pointer to a single revision of the manifest. A revision of the 12.76 +manifest stores a pointer to a single revision of each filelog tracked 12.77 +when that changeset was created. These relationships are illustrated 12.78 +in figure~\ref{fig:concepts:metadata}. 
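As a rough illustration of these pointers, here is a toy model in Python. It is only a sketch of the relationships described above, not Mercurial's storage format; the names ChangelogEntry, filelogs, manifest, changelog and files_at, and the sample data, are all made up for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ChangelogEntry:
    """One changelog revision (hypothetical, simplified)."""
    user: str
    description: str
    manifest_rev: int  # pointer to a single revision of the manifest


# Each filelog holds every stored revision of one tracked file.
filelogs: Dict[str, List[str]] = {
    "hello.c": ["int main() {}", "int main() { return 0; }"],
    "README": ["A demo project"],
}

# Each manifest revision maps a file name to one revision of its filelog.
manifest: List[Dict[str, int]] = [
    {"hello.c": 0, "README": 0},
    {"hello.c": 1, "README": 0},  # README unchanged: same filelog revision
]

changelog: List[ChangelogEntry] = [
    ChangelogEntry("alice", "Initial import", manifest_rev=0),
    ChangelogEntry("bob", "Return a status code", manifest_rev=1),
]


def files_at(changeset_rev: int) -> Dict[str, str]:
    """Follow changelog -> manifest -> filelog to rebuild a changeset's files."""
    entry = changelog[changeset_rev]
    return {
        name: filelogs[name][file_rev]
        for name, file_rev in manifest[entry.manifest_rev].items()
    }
```

Calling files_at(1) walks from the second changelog entry to its manifest revision, and from there to one revision of each filelog. Both manifest revisions point at the same README revision, because that file did not change.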
12.79 + 12.80 +\begin{figure}[ht] 12.81 + \centering 12.82 + \grafix{metadata} 12.83 + \caption{Metadata relationships} 12.84 + \label{fig:concepts:metadata} 12.85 +\end{figure} 12.86 + 12.87 +As the illustration shows, there is \emph{not} a ``one to one'' 12.88 +relationship between revisions in the changelog, manifest, or filelog. 12.89 +If the manifest hasn't changed between two changesets, the changelog 12.90 +entries for those changesets will point to the same revision of the 12.91 +manifest. If a file that Mercurial tracks hasn't changed between two 12.92 +changesets, the entry for that file in the two revisions of the 12.93 +manifest will point to the same revision of its filelog. 12.94 + 12.95 +\section{Safe, efficient storage} 12.96 + 12.97 +The underpinnings of changelogs, manifests, and filelogs are provided 12.98 +by a single structure called the \emph{revlog}. 12.99 + 12.100 +\subsection{Efficient storage} 12.101 + 12.102 +The revlog provides efficient storage of revisions using a 12.103 +\emph{delta} mechanism. Instead of storing a complete copy of a file 12.104 +for each revision, it stores the changes needed to transform an older 12.105 +revision into the new revision. For many kinds of file data, these 12.106 +deltas are typically a fraction of a percent of the size of a full 12.107 +copy of a file. 12.108 + 12.109 +Some obsolete revision control systems can only work with deltas of 12.110 +text files. They must either store binary files as complete snapshots 12.111 +or encoded into a text representation, both of which are wasteful 12.112 +approaches. Mercurial can efficiently handle deltas of files with 12.113 +arbitrary binary contents; it doesn't need to treat text as special. 12.114 + 12.115 +\subsection{Safe operation} 12.116 +\label{sec:concepts:txn} 12.117 + 12.118 +Mercurial only ever \emph{appends} data to the end of a revlog file. 12.119 +It never modifies a section of a file after it has written it. 
This 12.120 +is both more robust and efficient than schemes that need to modify or 12.121 +rewrite data. 12.122 + 12.123 +In addition, Mercurial treats every write as part of a 12.124 +\emph{transaction} that can span a number of files. A transaction is 12.125 +\emph{atomic}: either the entire transaction succeeds and its effects 12.126 +are all visible to readers in one go, or the whole thing is undone. 12.127 +This guarantee of atomicity means that if you're running two copies of 12.128 +Mercurial, where one is reading data and one is writing it, the reader 12.129 +will never see a partially written result that might confuse it. 12.130 + 12.131 +The fact that Mercurial only appends to files makes it easier to 12.132 +provide this transactional guarantee. The easier it is to do stuff 12.133 +like this, the more confident you should be that it's done correctly. 12.134 + 12.135 +\subsection{Fast retrieval} 12.136 + 12.137 +Mercurial cleverly avoids a pitfall common to all earlier 12.138 +revision control systems: the problem of \emph{inefficient retrieval}. 12.139 +Most revision control systems store the contents of a revision as an 12.140 +incremental series of modifications against a ``snapshot''. To 12.141 +reconstruct a specific revision, you must first read the snapshot, and 12.142 +then every one of the revisions between the snapshot and your target 12.143 +revision. The more history that a file accumulates, the more 12.144 +revisions you must read, hence the longer it takes to reconstruct a 12.145 +particular revision. 12.146 + 12.147 +\begin{figure}[ht] 12.148 + \centering 12.149 + \grafix{snapshot} 12.150 + \caption{Snapshot of a revlog, with incremental deltas} 12.151 + \label{fig:concepts:snapshot} 12.152 +\end{figure} 12.153 + 12.154 +The innovation that Mercurial applies to this problem is simple but 12.155 +effective. 
Once the cumulative amount of delta information stored 12.156 +since the last snapshot exceeds a fixed threshold, it stores a new 12.157 +snapshot (compressed, of course), instead of another delta. This 12.158 +makes it possible to reconstruct \emph{any} revision of a file 12.159 +quickly. This approach works so well that it has since been copied by 12.160 +several other revision control systems. 12.161 + 12.162 +Figure~\ref{fig:concepts:snapshot} illustrates the idea. In an entry 12.163 +in a revlog's index file, Mercurial stores the range of entries from 12.164 +the data file that it must read to reconstruct a particular revision. 12.165 + 12.166 +\subsubsection{Aside: the influence of video compression} 12.167 + 12.168 +If you're familiar with video compression or have ever watched a TV 12.169 +feed through a digital cable or satellite service, you may know that 12.170 +most video compression schemes store each frame of video as a delta 12.171 +against its predecessor frame. In addition, these schemes use 12.172 +``lossy'' compression techniques to increase the compression ratio, so 12.173 +visual errors accumulate over the course of a number of inter-frame 12.174 +deltas. 12.175 + 12.176 +Because it's possible for a video stream to ``drop out'' occasionally 12.177 +due to signal glitches, and to limit the accumulation of artefacts 12.178 +introduced by the lossy compression process, video encoders 12.179 +periodically insert a complete frame (called a ``key frame'') into the 12.180 +video stream; the next delta is generated against that frame. This 12.181 +means that if the video signal gets interrupted, it will resume once 12.182 +the next key frame is received. Also, the accumulation of encoding 12.183 +errors restarts anew with each key frame. 12.184 + 12.185 +\subsection{Identification and strong integrity} 12.186 + 12.187 +Along with delta or snapshot information, a revlog entry contains a 12.188 +cryptographic hash of the data that it represents. 
This makes it 12.189 +difficult to forge the contents of a revision, and easy to detect 12.190 +accidental corruption. 12.191 + 12.192 +Hashes provide more than a mere check against corruption; they are 12.193 +used as the identifiers for revisions. The changeset identification 12.194 +hashes that you see as an end user are from revisions of the 12.195 +changelog. Although filelogs and the manifest also use hashes, 12.196 +Mercurial only uses these behind the scenes. 12.197 + 12.198 +Mercurial verifies that hashes are correct when it retrieves file 12.199 +revisions and when it pulls changes from another repository. If it 12.200 +encounters an integrity problem, it will complain and stop whatever 12.201 +it's doing. 12.202 + 12.203 +In addition to the effect it has on retrieval efficiency, Mercurial's 12.204 +use of periodic snapshots makes it more robust against partial data 12.205 +corruption. If a revlog becomes partly corrupted due to a hardware 12.206 +error or system bug, it's often possible to reconstruct some or most 12.207 +revisions from the uncorrupted sections of the revlog, both before and 12.208 +after the corrupted section. This would not be possible with a 12.209 +delta-only storage model. 12.210 + 12.211 +\section{Revision history, branching, 12.212 + and merging} 12.213 + 12.214 +Every entry in a Mercurial revlog knows the identity of its immediate 12.215 +ancestor revision, usually referred to as its \emph{parent}. In fact, 12.216 +a revision contains room for not one parent, but two. Mercurial uses 12.217 +a special hash, called the ``null ID'', to represent the idea ``there 12.218 +is no parent here''. This hash is simply a string of zeroes. 12.219 + 12.220 +In figure~\ref{fig:concepts:revlog}, you can see an example of the 12.221 +conceptual structure of a revlog. Filelogs, manifests, and changelogs 12.222 +all have this same structure; they differ only in the kind of data 12.223 +stored in each delta or snapshot. 
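The two parent slots and the null ID can be sketched in a few lines of Python. This is a simplified model and not Mercurial's own code. It assumes a SHA-1 digest, so the null ID is 20 zero bytes, and it assumes the hash covers both parent IDs (in sorted order) followed by the revision data; the names node_id, Revision and make_revision are invented for the example.

```python
import hashlib
from dataclasses import dataclass

# Assumption: IDs are SHA-1 digests, so the null ID is 20 zero bytes.
NULL_ID = b"\0" * 20


def node_id(text: bytes, p1: bytes = NULL_ID, p2: bytes = NULL_ID) -> bytes:
    """Hash the revision data together with both parent IDs.

    Sorting the parents makes the ID independent of their order.
    """
    first, second = sorted([p1, p2])
    return hashlib.sha1(first + second + text).digest()


@dataclass
class Revision:
    node: bytes
    p1: bytes
    p2: bytes

    def is_root(self) -> bool:
        """True when both parent slots hold the null ID."""
        return self.p1 == NULL_ID and self.p2 == NULL_ID

    def is_merge(self) -> bool:
        """True when both parent slots hold real revision IDs."""
        return self.p1 != NULL_ID and self.p2 != NULL_ID


def make_revision(text: bytes, p1: bytes = NULL_ID, p2: bytes = NULL_ID) -> Revision:
    return Revision(node_id(text, p1, p2), p1, p2)
```

A first revision is built with make_revision(data) and has the null ID in both slots. A normal revision passes one parent, and a merge passes two.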
12.224 + 12.225 +The first revision in a revlog (at the bottom of the image) has the 12.226 +null ID in both of its parent slots. For a ``normal'' revision, its 12.227 +first parent slot contains the ID of its parent revision, and its 12.228 +second contains the null ID, indicating that the revision has only one 12.229 +real parent. Any two revisions that have the same parent ID are 12.230 +branches. A revision that represents a merge between branches has two 12.231 +normal revision IDs in its parent slots. 12.232 + 12.233 +\begin{figure}[ht] 12.234 + \centering 12.235 + \grafix{revlog} 12.236 + \caption{} 12.237 + \label{fig:concepts:revlog} 12.238 +\end{figure} 12.239 + 12.240 +\section{The working directory} 12.241 + 12.242 +In the working directory, Mercurial stores a snapshot of the files 12.243 +from the repository as of a particular changeset. 12.244 + 12.245 +The working directory ``knows'' which changeset it contains. When you 12.246 +update the working directory to contain a particular changeset, 12.247 +Mercurial looks up the appropriate revision of the manifest to find 12.248 +out which files it was tracking at the time that changeset was 12.249 +committed, and which revision of each file was then current. It then 12.250 +recreates a copy of each of those files, with the same contents it had 12.251 +when the changeset was committed. 12.252 + 12.253 +The \emph{dirstate} contains Mercurial's knowledge of the working 12.254 +directory. This details which changeset the working directory is 12.255 +updated to, and all of the files that Mercurial is tracking in the 12.256 +working directory. 12.257 + 12.258 +Just as a revision of a revlog has room for two parents, so that it 12.259 +can represent either a normal revision (with one parent) or a merge of 12.260 +two earlier revisions, the dirstate has slots for two parents. 
When 12.261 +you use the \hgcmd{update} command, the changeset that you update to 12.262 +is stored in the ``first parent'' slot, and the null ID in the second. 12.263 +When you \hgcmd{merge} with another changeset, the first parent 12.264 +remains unchanged, and the second parent is filled in with the 12.265 +changeset you're merging with. The \hgcmd{parents} command tells you 12.266 +what the parents of the dirstate are. 12.267 + 12.268 +\subsection{What happens when you commit} 12.269 + 12.270 +The dirstate stores parent information for more than just book-keeping 12.271 +purposes. Mercurial uses the parents of the dirstate as \emph{the 12.272 + parents of a new changeset} when you perform a commit. 12.273 + 12.274 +\begin{figure}[ht] 12.275 + \centering 12.276 + \grafix{wdir} 12.277 + \caption{The working directory can have two parents} 12.278 + \label{fig:concepts:wdir} 12.279 +\end{figure} 12.280 + 12.281 +Figure~\ref{fig:concepts:wdir} shows the normal state of the working 12.282 +directory, where it has a single changeset as parent. That changeset 12.283 +is the \emph{tip}, the newest changeset in the repository that has no 12.284 +children. 12.285 + 12.286 +\begin{figure}[ht] 12.287 + \centering 12.288 + \grafix{wdir-after-commit} 12.289 + \caption{The working directory gains new parents after a commit} 12.290 + \label{fig:concepts:wdir-after-commit} 12.291 +\end{figure} 12.292 + 12.293 +It's useful to think of the working directory as ``the changeset I'm 12.294 +about to commit''. Any files that you tell Mercurial that you've 12.295 +added, removed, renamed, or copied will be reflected in that 12.296 +changeset, as will modifications to any files that Mercurial is 12.297 +already tracking; the new changeset will have the parents of the 12.298 +working directory as its parents. 
12.299 + 12.300 +After a commit, Mercurial will update the parents of the working 12.301 +directory, so that the first parent is the ID of the new changeset, 12.302 +and the second is the null ID. This is shown in 12.303 +figure~\ref{fig:concepts:wdir-after-commit}. Mercurial doesn't touch 12.304 +any of the files in the working directory when you commit; it just 12.305 +modifies the dirstate to note its new parents. 12.306 + 12.307 +\subsection{Creating a new head} 12.308 + 12.309 +It's perfectly normal to update the working directory to a changeset 12.310 +other than the current tip. For example, you might want to know what 12.311 +your project looked like last Tuesday, or you could be looking through 12.312 +changesets to see which one introduced a bug. In cases like this, the 12.313 +natural thing to do is update the working directory to the changeset 12.314 +you're interested in, and then examine the files in the working 12.315 +directory directly to see their contents as they were when you 12.316 +committed that changeset. The effect of this is shown in 12.317 +figure~\ref{fig:concepts:wdir-pre-branch}. 12.318 + 12.319 +\begin{figure}[ht] 12.320 + \centering 12.321 + \grafix{wdir-pre-branch} 12.322 + \caption{The working directory, updated to an older changeset} 12.323 + \label{fig:concepts:wdir-pre-branch} 12.324 +\end{figure} 12.325 + 12.326 +Having updated the working directory to an older changeset, what 12.327 +happens if you make some changes, and then commit? Mercurial behaves 12.328 +in the same way as I outlined above. The parents of the working 12.329 +directory become the parents of the new changeset. This new changeset 12.330 +has no children, so it becomes the new tip. And the repository now 12.331 +contains two changesets that have no children; we call these 12.332 +\emph{heads}. You can see the structure that this creates in 12.333 +figure~\ref{fig:concepts:wdir-branch}. 
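The way the parents of the working directory turn into the parents of a new changeset, and how committing from an older changeset creates a second head, can be modelled in a small Python sketch. It tracks only parent pointers and ignores file contents entirely. The Repo class and its methods are hypothetical names for this illustration, with None standing in for the null ID.

```python
from typing import Dict, List, Optional, Tuple

Parents = Tuple[Optional[int], Optional[int]]


class Repo:
    """Toy repository that records only parent relationships."""

    def __init__(self) -> None:
        self.parents_of: Dict[int, Parents] = {}
        # Parents of the working directory; None plays the role of the null ID.
        self.dirstate_parents: Parents = (None, None)

    def update(self, rev: int) -> None:
        """Sync the working directory to a single changeset."""
        self.dirstate_parents = (rev, None)

    def commit(self) -> int:
        """The dirstate parents become the parents of the new changeset."""
        rev = len(self.parents_of)
        self.parents_of[rev] = self.dirstate_parents
        # After the commit, the new changeset is the sole parent.
        self.dirstate_parents = (rev, None)
        return rev

    def heads(self) -> List[int]:
        """Changesets that have no children."""
        has_child = {
            p for ps in self.parents_of.values() for p in ps if p is not None
        }
        return [rev for rev in self.parents_of if rev not in has_child]
```

Committing twice, updating back to revision 0 and committing again leaves revisions 1 and 2 both without children, so heads() reports two heads.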
12.334 + 12.335 +\begin{figure}[ht] 12.336 + \centering 12.337 + \grafix{wdir-branch} 12.338 + \caption{After a commit made while synced to an older changeset} 12.339 + \label{fig:concepts:wdir-branch} 12.340 +\end{figure} 12.341 + 12.342 +\begin{note} 12.343 + If you're new to Mercurial, you should keep in mind a common 12.344 + ``error'', which is to use the \hgcmd{pull} command without any 12.345 + options. By default, the \hgcmd{pull} command \emph{does not} 12.346 + update the working directory, so you'll bring new changesets into 12.347 + your repository, but the working directory will stay synced at the 12.348 + same changeset as before the pull. If you make some changes and 12.349 + commit afterwards, you'll thus create a new head, because your 12.350 + working directory isn't synced to whatever the current tip is. 12.351 + 12.352 + I put the word ``error'' in quotes because all that you need to do 12.353 + to rectify this situation is \hgcmd{merge}, then \hgcmd{commit}. In 12.354 + other words, this almost never has negative consequences; it just 12.355 + surprises people. I'll discuss other ways to avoid this behaviour, 12.356 + and why Mercurial behaves in this initially surprising way, later 12.357 + on. 12.358 +\end{note} 12.359 + 12.360 +\subsection{Merging heads} 12.361 + 12.362 +When you run the \hgcmd{merge} command, Mercurial leaves the first 12.363 +parent of the working directory unchanged, and sets the second parent 12.364 +to the changeset you're merging with, as shown in 12.365 +figure~\ref{fig:concepts:wdir-merge}. 12.366 + 12.367 +\begin{figure}[ht] 12.368 + \centering 12.369 + \grafix{wdir-merge} 12.370 + \caption{Merging two heads} 12.371 + \label{fig:concepts:wdir-merge} 12.372 +\end{figure} 12.373 + 12.374 +Mercurial also has to modify the working directory, to merge the files 12.375 +managed in the two changesets. 
Simplified a little, the merging 12.376 +process goes like this, for every file in the manifests of both 12.377 +changesets. 12.378 +\begin{itemize} 12.379 +\item If neither changeset has modified a file, do nothing with that 12.380 + file. 12.381 +\item If one changeset has modified a file, and the other hasn't, 12.382 + create the modified copy of the file in the working directory. 12.383 +\item If one changeset has removed a file, and the other hasn't (or 12.384 + has also deleted it), delete the file from the working directory. 12.385 +\item If one changeset has removed a file, but the other has modified 12.386 + the file, ask the user what to do: keep the modified file, or remove 12.387 + it? 12.388 +\item If both changesets have modified a file, invoke an external 12.389 + merge program to choose the new contents for the merged file. This 12.390 + may require input from the user. 12.391 +\item If one changeset has modified a file, and the other has renamed 12.392 + or copied the file, make sure that the changes follow the new name 12.393 + of the file. 12.394 +\end{itemize} 12.395 +There are more details---merging has plenty of corner cases---but 12.396 +these are the most common choices that are involved in a merge. As 12.397 +you can see, most cases are completely automatic, and indeed most 12.398 +merges finish automatically, without requiring your input to resolve 12.399 +any conflicts. 12.400 + 12.401 +When you're thinking about what happens when you commit after a merge, 12.402 +once again the working directory is ``the changeset I'm about to 12.403 +commit''. After the \hgcmd{merge} command completes, the working 12.404 +directory has two parents; these will become the parents of the new 12.405 +changeset. 12.406 + 12.407 +Mercurial lets you perform multiple merges, but you must commit the 12.408 +results of each individual merge as you go. 
This is necessary because 12.409 +Mercurial only tracks two parents for both revisions and the working 12.410 +directory. While it would be technically possible to merge multiple 12.411 +changesets at once, the prospect of user confusion and making a 12.412 +terrible mess of a merge immediately becomes overwhelming. 12.413 + 12.414 +\section{Other interesting design features} 12.415 + 12.416 +In the sections above, I've tried to highlight some of the most 12.417 +important aspects of Mercurial's design, to illustrate that it pays 12.418 +careful attention to reliability and performance. However, the 12.419 +attention to detail doesn't stop there. There are a number of other 12.420 +aspects of Mercurial's construction that I personally find 12.421 +interesting. I'll detail a few of them here, separate from the ``big 12.422 +ticket'' items above, so that if you're interested, you can gain a 12.423 +better idea of the amount of thinking that goes into a well-designed 12.424 +system. 12.425 + 12.426 +\subsection{Clever compression} 12.427 + 12.428 +When appropriate, Mercurial will store both snapshots and deltas in 12.429 +compressed form. It does this by always \emph{trying to} compress a 12.430 +snapshot or delta, but only storing the compressed version if it's 12.431 +smaller than the uncompressed version. 12.432 + 12.433 +This means that Mercurial does ``the right thing'' when storing a file 12.434 +whose native form is compressed, such as a \texttt{zip} archive or a 12.435 +JPEG image. When these types of files are compressed a second time, 12.436 +the resulting file is usually bigger than the once-compressed form, 12.437 +and so Mercurial will store the plain \texttt{zip} or JPEG. 12.438 + 12.439 +Deltas between revisions of a compressed file are usually larger than 12.440 +snapshots of the file, and Mercurial again does ``the right thing'' in 12.441 +these cases. 
It finds that such a delta exceeds the threshold at 12.442 +which it should store a complete snapshot of the file, so it stores 12.443 +the snapshot, again saving space compared to a naive delta-only 12.444 +approach. 12.445 + 12.446 +\subsubsection{Network recompression} 12.447 + 12.448 +When storing revisions on disk, Mercurial uses the ``deflate'' 12.449 +compression algorithm (the same one used by the popular \texttt{zip} 12.450 +archive format), which balances good speed with a respectable 12.451 +compression ratio. However, when transmitting revision data over a 12.452 +network connection, Mercurial uncompresses the compressed revision 12.453 +data. 12.454 + 12.455 +If the connection is over HTTP, Mercurial recompresses the entire 12.456 +stream of data using a compression algorithm that gives a better 12.457 +compression ratio (the Burrows-Wheeler algorithm from the widely used 12.458 +\texttt{bzip2} compression package). This combination of algorithm 12.459 +and compression of the entire stream (instead of a revision at a time) 12.460 +substantially reduces the number of bytes to be transferred, yielding 12.461 +better network performance over almost all kinds of network. 12.462 + 12.463 +(If the connection is over \command{ssh}, Mercurial \emph{doesn't} 12.464 +recompress the stream, because \command{ssh} can already do this 12.465 +itself.) 12.466 + 12.467 +\subsection{Read/write ordering and atomicity} 12.468 + 12.469 +Appending to files isn't the whole story when it comes to guaranteeing 12.470 +that a reader won't see a partial write. If you recall 12.471 +figure~\ref{fig:concepts:metadata}, revisions in the changelog point to 12.472 +revisions in the manifest, and revisions in the manifest point to 12.473 +revisions in filelogs. This hierarchy is deliberate. 12.474 + 12.475 +A writer starts a transaction by writing filelog and manifest data, 12.476 +and doesn't write any changelog data until those are finished. 
A 12.477 +reader starts by reading changelog data, then manifest data, followed 12.478 +by filelog data. 12.479 + 12.480 +Since the writer has always finished writing filelog and manifest data 12.481 +before it writes to the changelog, a reader will never read a pointer 12.482 +to a partially written manifest revision from the changelog, and it will 12.483 +never read a pointer to a partially written filelog revision from the 12.484 +manifest. 12.485 + 12.486 +\subsection{Concurrent access} 12.487 + 12.488 +The read/write ordering and atomicity guarantees mean that Mercurial 12.489 +never needs to \emph{lock} a repository when it's reading data, even 12.490 +if the repository is being written to while the read is occurring. 12.491 +This has a big effect on scalability; you can have an arbitrary number 12.492 +of Mercurial processes safely reading data from a repository 12.493 +all at once, no matter whether it's being written to or not. 12.494 + 12.495 +The lockless nature of reading means that if you're sharing a 12.496 +repository on a multi-user system, you don't need to grant other local 12.497 +users permission to \emph{write} to your repository in order for them 12.498 +to be able to clone it or pull changes from it; they only need 12.499 +\emph{read} permission. (This is \emph{not} a common feature among 12.500 +revision control systems, so don't take it for granted! Most require 12.501 +readers to be able to lock a repository to access it safely, and this 12.502 +requires write permission on at least one directory, which of course 12.503 +makes for all kinds of nasty and annoying security and administrative 12.504 +problems.) 12.505 + 12.506 +Mercurial uses locks to ensure that only one process can write to a 12.507 +repository at a time (the locking mechanism is safe even over 12.508 +filesystems that are notoriously hostile to locking, such as NFS).
If 12.509 +a repository is locked, a writer will wait for a while, retrying in 12.510 +case the repository becomes unlocked, but if the repository remains locked for 12.511 +too long, the process attempting to write will time out. 12.512 +This means that your daily automated scripts won't get stuck forever 12.513 +and pile up if a system crashes unnoticed, for example. (Yes, the 12.514 +timeout is configurable, from zero to infinity.) 12.515 + 12.516 +\subsubsection{Safe dirstate access} 12.517 + 12.518 +As with revision data, Mercurial doesn't take a lock to read the 12.519 +dirstate file; it does acquire a lock to write it. To avoid the 12.520 +possibility of reading a partially written copy of the dirstate file, 12.521 +Mercurial writes to a file with a unique name in the same directory as 12.522 +the dirstate file, then renames the temporary file atomically to 12.523 +\filename{dirstate}. The file named \filename{dirstate} is thus 12.524 +guaranteed to be complete, not partially written. 12.525 + 12.526 +\subsection{Avoiding seeks} 12.527 + 12.528 +Critical to Mercurial's performance is the avoidance of seeks of the 12.529 +disk head, since any seek is far more expensive than even a 12.530 +comparatively large read operation. 12.531 + 12.532 +This is why, for example, the dirstate is stored in a single file. If 12.533 +there were a dirstate file per directory that Mercurial tracked, the 12.534 +disk would seek once per directory. Instead, Mercurial reads the 12.535 +entire single dirstate file in one step. 12.536 + 12.537 +Mercurial also uses a ``copy on write'' scheme when cloning a 12.538 +repository on local storage. Instead of copying every revlog file 12.539 +from the old repository into the new repository, it makes a ``hard 12.540 +link'', which is a shorthand way to say ``these two names point to the 12.541 +same file''.
When Mercurial is about to write to one of a revlog's 12.542 +files, it checks to see if the number of names pointing at the file is 12.543 +greater than one. If it is, more than one repository is using the 12.544 +file, so Mercurial makes a new copy of the file that is private to 12.545 +this repository. 12.546 + 12.547 +A few revision control developers have pointed out that this idea of 12.548 +making a complete private copy of a file is not very efficient in its 12.549 +use of storage. While this is true, storage is cheap, and this method 12.550 +gives the highest performance while deferring most book-keeping to the 12.551 +operating system. An alternative scheme would most likely reduce 12.552 +performance and increase the complexity of the software, each of which 12.553 +is much more important to the ``feel'' of day-to-day use. 12.554 + 12.555 +\subsection{Other contents of the dirstate} 12.556 + 12.557 +Because Mercurial doesn't force you to tell it when you're modifying a 12.558 +file, it uses the dirstate to store some extra information so it can 12.559 +determine efficiently whether you have modified a file. For each file 12.560 +in the working directory, it stores the time that it last modified the 12.561 +file itself, and the size of the file at that time. 12.562 + 12.563 +When you explicitly \hgcmd{add}, \hgcmd{remove}, \hgcmd{rename} or 12.564 +\hgcmd{copy} files, Mercurial updates the dirstate so that it knows 12.565 +what to do with those files when you commit. 12.566 + 12.567 +When Mercurial is checking the states of files in the working 12.568 +directory, it first checks a file's modification time. If that has 12.569 +not changed, the file must not have been modified. If the file's size 12.570 +has changed, the file must have been modified. If the modification 12.571 +time has changed, but the size has not, only then does Mercurial need 12.572 +to read the actual contents of the file to see if they've changed. 
12.573 +Storing these few extra pieces of information dramatically reduces the 12.574 +amount of data that Mercurial needs to read, which yields large 12.575 +performance improvements compared to other revision control systems. 12.576 + 12.577 +%%% Local Variables: 12.578 +%%% mode: latex 12.579 +%%% TeX-master: "00book" 12.580 +%%% End:
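The modification check described under ``Other contents of the dirstate'' can be sketched in a few lines of Python. This is only an illustration of the order of tests given in the text, not Mercurial's actual code: the \texttt{Entry} record, the \texttt{record} and \texttt{is\_modified} helpers, and the use of a SHA-1 digest as a stand-in for ``the contents Mercurial already has stored'' are all assumptions made for the example.

```python
import hashlib
import os
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical per-file record; not Mercurial's real dirstate layout."""
    mtime_ns: int  # modification time when the file was last recorded
    size: int      # file size at that time
    digest: str    # assumption: stands in for the stored file contents


def _digest(path):
    # Reading the whole file is the expensive step the cheap checks avoid.
    with open(path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()


def record(path):
    """Remember the state of a file, as if it had just been written."""
    st = os.stat(path)
    return Entry(st.st_mtime_ns, st.st_size, _digest(path))


def is_modified(path, entry):
    """Apply the checks in the order the text gives them."""
    st = os.stat(path)
    if st.st_mtime_ns == entry.mtime_ns:
        return False  # time unchanged: treat the file as unmodified
    if st.st_size != entry.size:
        return True   # size changed: must be modified, no need to read it
    # Time changed but size did not: only now compare the contents.
    return _digest(path) != entry.digest
```

In this sketch the file is read only in the last branch, so a status check over a large tree mostly costs one \texttt{stat} call per file.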
13.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 13.2 +++ b/en/ch05-daily.tex Thu Jan 29 22:56:27 2009 -0800 13.3 @@ -0,0 +1,381 @@ 13.4 +\chapter{Mercurial in daily use} 13.5 +\label{chap:daily} 13.6 + 13.7 +\section{Telling Mercurial which files to track} 13.8 + 13.9 +Mercurial does not work with files in your repository unless you tell 13.10 +it to manage them. The \hgcmd{status} command will tell you which 13.11 +files Mercurial doesn't know about; it uses a ``\texttt{?}'' to 13.12 +display such files. 13.13 + 13.14 +To tell Mercurial to track a file, use the \hgcmd{add} command. Once 13.15 +you have added a file, the entry in the output of \hgcmd{status} for 13.16 +that file changes from ``\texttt{?}'' to ``\texttt{A}''. 13.17 +\interaction{daily.files.add} 13.18 + 13.19 +After you run a \hgcmd{commit}, the files that you added before the 13.20 +commit will no longer be listed in the output of \hgcmd{status}. The 13.21 +reason for this is that \hgcmd{status} only tells you about 13.22 +``interesting'' files---those that you have modified or told Mercurial 13.23 +to do something with---by default. If you have a repository that 13.24 +contains thousands of files, you will rarely want to know about files 13.25 +that Mercurial is tracking, but that have not changed. (You can still 13.26 +get this information; we'll return to this later.) 13.27 + 13.28 +Once you add a file, Mercurial doesn't do anything with it 13.29 +immediately. Instead, it will take a snapshot of the file's state the 13.30 +next time you perform a commit. It will then continue to track the 13.31 +changes you make to the file every time you commit, until you remove 13.32 +the file. 13.33 + 13.34 +\subsection{Explicit versus implicit file naming} 13.35 + 13.36 +A useful behaviour that Mercurial has is that if you pass the name of 13.37 +a directory to a command, every Mercurial command will treat this as 13.38 +``I want to operate on every file in this directory and its 13.39 +subdirectories''. 
13.40 +\interaction{daily.files.add-dir} 13.41 +Notice in this example that Mercurial printed the names of the files 13.42 +it added, whereas it didn't do so when we added the file named 13.43 +\filename{a} in the earlier example. 13.44 + 13.45 +What's going on is that in the earlier case, we explicitly named the 13.46 +file to add on the command line, so Mercurial assumes that you know 13.47 +what you are doing, and it 13.48 +doesn't print any output. 13.49 + 13.50 +However, when we \emph{imply} the names of files by giving the name of 13.51 +a directory, Mercurial takes the extra step of printing the name of 13.52 +each file that it does something with. This makes it clearer what 13.53 +is happening, and reduces the likelihood of a silent and nasty 13.54 +surprise. This behaviour is common to most Mercurial commands. 13.55 + 13.56 +\subsection{Aside: Mercurial tracks files, not directories} 13.57 + 13.58 +Mercurial does not track directory information. Instead, it tracks 13.59 +the path to a file. Before creating a file, it first creates any 13.60 +missing directory components of the path. After it deletes a file, it 13.61 +then deletes any empty directories that were in the deleted file's 13.62 +path. This sounds like a trivial distinction, but it has one minor 13.63 +practical consequence: it is not possible to represent a completely 13.64 +empty directory in Mercurial. 13.65 + 13.66 +Empty directories are rarely useful, and there are unintrusive 13.67 +workarounds that you can use to achieve an appropriate effect. The 13.68 +developers of Mercurial thus felt that the complexity that would be 13.69 +required to manage empty directories was not worth the limited benefit 13.70 +this feature would bring. 13.71 + 13.72 +If you need an empty directory in your repository, there are a few 13.73 +ways to achieve this. One is to create a directory, then \hgcmd{add} a 13.74 +``hidden'' file to that directory.
On Unix-like systems, any file 13.75 +name that begins with a period (``\texttt{.}'') is treated as hidden 13.76 +by most commands and GUI tools. This approach is illustrated in 13.77 +figure~\ref{ex:daily:hidden}. 13.78 + 13.79 +\begin{figure}[ht] 13.80 + \interaction{daily.files.hidden} 13.81 + \caption{Simulating an empty directory using a hidden file} 13.82 + \label{ex:daily:hidden} 13.83 +\end{figure} 13.84 + 13.85 +Another way to tackle a need for an empty directory is to simply 13.86 +create one in your automated build scripts before they will need it. 13.87 + 13.88 +\section{How to stop tracking a file} 13.89 + 13.90 +Once you decide that a file no longer belongs in your repository, use 13.91 +the \hgcmd{remove} command; this deletes the file, and tells Mercurial 13.92 +to stop tracking it. A removed file is represented in the output of 13.93 +\hgcmd{status} with a ``\texttt{R}''. 13.94 +\interaction{daily.files.remove} 13.95 + 13.96 +After you \hgcmd{remove} a file, Mercurial will no longer track 13.97 +changes to that file, even if you recreate a file with the same name 13.98 +in your working directory. If you do recreate a file with the same 13.99 +name and want Mercurial to track the new file, simply \hgcmd{add} it. 13.100 +Mercurial will know that the newly added file is not related to the 13.101 +old file of the same name. 13.102 + 13.103 +\subsection{Removing a file does not affect its history} 13.104 + 13.105 +It is important to understand that removing a file has only two 13.106 +effects. 13.107 +\begin{itemize} 13.108 +\item It removes the current version of the file from the working 13.109 + directory. 13.110 +\item It stops Mercurial from tracking changes to the file, from the 13.111 + time of the next commit. 13.112 +\end{itemize} 13.113 +Removing a file \emph{does not} in any way alter the \emph{history} of 13.114 +the file. 
13.115 + 13.116 +If you update the working directory to a changeset in which a file 13.117 +that you have removed was still tracked, it will reappear in the 13.118 +working directory, with the contents it had when you committed that 13.119 +changeset. If you then update the working directory to a later 13.120 +changeset, in which the file had been removed, Mercurial will once 13.121 +again remove the file from the working directory. 13.122 + 13.123 +\subsection{Missing files} 13.124 + 13.125 +Mercurial considers a file that you have deleted, but not used 13.126 +\hgcmd{remove} to delete, to be \emph{missing}. A missing file is 13.127 +represented with ``\texttt{!}'' in the output of \hgcmd{status}. 13.128 +Mercurial commands will not generally do anything with missing files. 13.129 +\interaction{daily.files.missing} 13.130 + 13.131 +If your repository contains a file that \hgcmd{status} reports as 13.132 +missing, and you want the file to stay gone, you can run 13.133 +\hgcmdargs{remove}{\hgopt{remove}{--after}} at any time later on, to 13.134 +tell Mercurial that you really did mean to remove the file. 13.135 +\interaction{daily.files.remove-after} 13.136 + 13.137 +On the other hand, if you deleted the missing file by accident, use 13.138 +\hgcmdargs{revert}{\emph{filename}} to recover the file. It will 13.139 +reappear, in unmodified form. 13.140 +\interaction{daily.files.recover-missing} 13.141 + 13.142 +\subsection{Aside: why tell Mercurial explicitly to 13.143 + remove a file?} 13.144 + 13.145 +You might wonder why Mercurial requires you to explicitly tell it that 13.146 +you are deleting a file. Early during the development of Mercurial, 13.147 +it let you delete a file however you pleased; Mercurial would notice 13.148 +the absence of the file automatically when you next ran a 13.149 +\hgcmd{commit}, and stop tracking the file. In practice, this made it 13.150 +too easy to accidentally remove a file without noticing. 
13.151 + 13.152 +\subsection{Useful shorthand---adding and removing files 13.153 + in one step} 13.154 + 13.155 +Mercurial offers a combination command, \hgcmd{addremove}, that adds 13.156 +untracked files and marks missing files as removed. 13.157 +\interaction{daily.files.addremove} 13.158 +The \hgcmd{commit} command also provides a \hgopt{commit}{-A} option 13.159 +that performs this same add-and-remove, immediately followed by a 13.160 +commit. 13.161 +\interaction{daily.files.commit-addremove} 13.162 + 13.163 +\section{Copying files} 13.164 + 13.165 +Mercurial provides a \hgcmd{copy} command that lets you make a new 13.166 +copy of a file. When you copy a file using this command, Mercurial 13.167 +makes a record of the fact that the new file is a copy of the original 13.168 +file. It treats these copied files specially when you merge your work 13.169 +with someone else's. 13.170 + 13.171 +\subsection{The results of copying during a merge} 13.172 + 13.173 +What happens during a merge is that changes ``follow'' a copy. To 13.174 +best illustrate what this means, let's create an example. We'll start 13.175 +with the usual tiny repository that contains a single file. 13.176 +\interaction{daily.copy.init} 13.177 +We need to do some work in parallel, so that we'll have something to 13.178 +merge. So let's clone our repository. 13.179 +\interaction{daily.copy.clone} 13.180 +Back in our initial repository, let's use the \hgcmd{copy} command to 13.181 +make a copy of the first file we created. 13.182 +\interaction{daily.copy.copy} 13.183 + 13.184 +If we look at the output of the \hgcmd{status} command afterwards, the 13.185 +copied file looks just like a normal added file. 13.186 +\interaction{daily.copy.status} 13.187 +But if we pass the \hgopt{status}{-C} option to \hgcmd{status}, it 13.188 +prints another line of output: this is the file that our newly-added 13.189 +file was copied \emph{from}. 
13.190 +\interaction{daily.copy.status-copy} 13.191 + 13.192 +Now, back in the repository we cloned, let's make a change in 13.193 +parallel. We'll add a line of content to the original file that we 13.194 +created. 13.195 +\interaction{daily.copy.other} 13.196 +Now we have a modified \filename{file} in this repository. When we 13.197 +pull the changes from the first repository, and merge the two heads, 13.198 +Mercurial will propagate the changes that we made locally to 13.199 +\filename{file} into its copy, \filename{new-file}. 13.200 +\interaction{daily.copy.merge} 13.201 + 13.202 +\subsection{Why should changes follow copies?} 13.203 +\label{sec:daily:why-copy} 13.204 + 13.205 +This behaviour, of changes to a file propagating out to copies of the 13.206 +file, might seem esoteric, but in most cases it's highly desirable. 13.207 + 13.208 +First of all, remember that this propagation \emph{only} happens when 13.209 +you merge. So if you \hgcmd{copy} a file, and subsequently modify the 13.210 +original file during the normal course of your work, nothing will 13.211 +happen. 13.212 + 13.213 +The second thing to know is that modifications will only propagate 13.214 +across a copy as long as the repository that you're pulling changes 13.215 +from \emph{doesn't know} about the copy. 13.216 + 13.217 +The reason that Mercurial does this is as follows. Let's say I make 13.218 +an important bug fix in a source file, and commit my changes. 13.219 +Meanwhile, you've decided to \hgcmd{copy} the file in your repository, 13.220 +without knowing about the bug or having seen the fix, and you have 13.221 +started hacking on your copy of the file. 13.222 + 13.223 +If you pulled and merged my changes, and Mercurial \emph{didn't} 13.224 +propagate changes across copies, your source file would now contain 13.225 +the bug, and unless you remembered to propagate the bug fix by hand, 13.226 +the bug would \emph{remain} in your copy of the file. 
13.227 + 13.228 +By automatically propagating the change that fixed the bug from the 13.229 +original file to the copy, Mercurial prevents this class of problem. 13.230 +To my knowledge, Mercurial is the \emph{only} revision control system 13.231 +that propagates changes across copies like this. 13.232 + 13.233 +Once your change history has a record that the copy and subsequent 13.234 +merge occurred, there's usually no further need to propagate changes 13.235 +from the original file to the copied file, and that's why Mercurial 13.236 +only propagates changes across copies until this point, and no 13.237 +further. 13.238 + 13.239 +\subsection{How to make changes \emph{not} follow a copy} 13.240 + 13.241 +If, for some reason, you decide that this business of automatically 13.242 +propagating changes across copies is not for you, simply use your 13.243 +system's normal file copy command (on Unix-like systems, that's 13.244 +\command{cp}) to make a copy of a file, then \hgcmd{add} the new copy 13.245 +by hand. Before you do so, though, please do reread 13.246 +section~\ref{sec:daily:why-copy}, and make an informed decision that 13.247 +this behaviour is not appropriate to your specific case. 13.248 + 13.249 +\subsection{Behaviour of the \hgcmd{copy} command} 13.250 + 13.251 +When you use the \hgcmd{copy} command, Mercurial makes a copy of each 13.252 +source file as it currently stands in the working directory. This 13.253 +means that if you make some modifications to a file, then \hgcmd{copy} 13.254 +it without first having committed those changes, the new copy will 13.255 +also contain the modifications you have made up until that point. (I 13.256 +find this behaviour a little counterintuitive, which is why I mention 13.257 +it here.) 13.258 + 13.259 +The \hgcmd{copy} command acts similarly to the Unix \command{cp} 13.260 +command (you can use the \hgcmd{cp} alias if you prefer). 
The last 13.261 +argument is the \emph{destination}, and all prior arguments are 13.262 +\emph{sources}. If you pass it a single file as the source, and the 13.263 +destination does not exist, it creates a new file with that name. 13.264 +\interaction{daily.copy.simple} 13.265 +If the destination is a directory, Mercurial copies its sources into 13.266 +that directory. 13.267 +\interaction{daily.copy.dir-dest} 13.268 +Copying a directory is recursive, and preserves the directory 13.269 +structure of the source. 13.270 +\interaction{daily.copy.dir-src} 13.271 +If the source and destination are both directories, the source tree is 13.272 +recreated in the destination directory. 13.273 +\interaction{daily.copy.dir-src-dest} 13.274 + 13.275 +As with the \hgcmd{rename} command, if you copy a file manually and 13.276 +then want Mercurial to know that you've copied the file, simply use 13.277 +the \hgopt{copy}{--after} option to \hgcmd{copy}. 13.278 +\interaction{daily.copy.after} 13.279 + 13.280 +\section{Renaming files} 13.281 + 13.282 +It's rather more common to need to rename a file than to make a copy 13.283 +of it. The reason I discussed the \hgcmd{copy} command before talking 13.284 +about renaming files is that Mercurial treats a rename in essentially 13.285 +the same way as a copy. Therefore, knowing what Mercurial does when 13.286 +you copy a file tells you what to expect when you rename a file. 13.287 + 13.288 +When you use the \hgcmd{rename} command, Mercurial makes a copy of 13.289 +each source file, then deletes it and marks the file as removed. 13.290 +\interaction{daily.rename.rename} 13.291 +The \hgcmd{status} command shows the newly copied file as added, and 13.292 +the copied-from file as removed. 
13.293 +\interaction{daily.rename.status} 13.294 +As with the results of a \hgcmd{copy}, we must use the 13.295 +\hgopt{status}{-C} option to \hgcmd{status} to see that the added file 13.296 +is really being tracked by Mercurial as a copy of the original, now 13.297 +removed, file. 13.298 +\interaction{daily.rename.status-copy} 13.299 + 13.300 +As with \hgcmd{remove} and \hgcmd{copy}, you can tell Mercurial about 13.301 +a rename after the fact using the \hgopt{rename}{--after} option. In 13.302 +most other respects, the behaviour of the \hgcmd{rename} command, and 13.303 +the options it accepts, are similar to the \hgcmd{copy} command. 13.304 + 13.305 +\subsection{Renaming files and merging changes} 13.306 + 13.307 +Since Mercurial's rename is implemented as copy-and-remove, the same 13.308 +propagation of changes happens when you merge after a rename as after 13.309 +a copy. 13.310 + 13.311 +If I modify a file, and you rename it to a new name, and then we merge 13.312 +our respective changes, my modifications to the file under its 13.313 +original name will be propagated into the file under its new name. 13.314 +(This is something you might expect to ``simply work,'' but not all 13.315 +revision control systems actually do this.) 13.316 + 13.317 +Whereas having changes follow a copy is a feature where you can 13.318 +perhaps nod and say ``yes, that might be useful,'' it should be clear 13.319 +that having them follow a rename is definitely important. Without 13.320 +this facility, it would simply be too easy for changes to become 13.321 +orphaned when files are renamed. 13.322 + 13.323 +\subsection{Divergent renames and merging} 13.324 + 13.325 +The case of diverging names occurs when two developers start with a 13.326 +file---let's call it \filename{foo}---in their respective 13.327 +repositories. 13.328 + 13.329 +\interaction{rename.divergent.clone} 13.330 +Anne renames the file to \filename{bar}. 
13.331 +\interaction{rename.divergent.rename.anne} 13.332 +Meanwhile, Bob renames it to \filename{quux}. 13.333 +\interaction{rename.divergent.rename.bob} 13.334 + 13.335 +I like to think of this as a conflict because each developer has 13.336 +expressed different intentions about what the file ought to be named. 13.337 + 13.338 +What do you think should happen when they merge their work? 13.339 +Mercurial's actual behaviour is that it always preserves \emph{both} 13.340 +names when it merges changesets that contain divergent renames. 13.341 +\interaction{rename.divergent.merge} 13.342 + 13.343 +Notice that Mercurial does warn about the divergent renames, but it 13.344 +leaves it up to you to do something about the divergence after the merge. 13.345 + 13.346 +\subsection{Convergent renames and merging} 13.347 + 13.348 +Another kind of rename conflict occurs when two people choose to 13.349 +rename different \emph{source} files to the same \emph{destination}. 13.350 +In this case, Mercurial runs its normal merge machinery, and lets you 13.351 +guide it to a suitable resolution. 13.352 + 13.353 +\subsection{Other name-related corner cases} 13.354 + 13.355 +Mercurial has a longstanding bug in which it fails to handle a merge 13.356 +where one side has a file with a given name, while another has a 13.357 +directory with the same name. This is documented as~\bug{29}. 13.358 +\interaction{issue29.go} 13.359 + 13.360 +\section{Recovering from mistakes} 13.361 + 13.362 +Mercurial has some useful commands that will help you to recover from 13.363 +some common mistakes. 13.364 + 13.365 +The \hgcmd{revert} command lets you undo changes that you have made to 13.366 +your working directory. For example, if you \hgcmd{add} a file by 13.367 +accident, just run \hgcmd{revert} with the name of the file you added, 13.368 +and while the file won't be touched in any way, it won't be tracked 13.369 +for adding by Mercurial any longer, either. 
You can also use 13.370 +\hgcmd{revert} to get rid of erroneous changes to a file. 13.371 + 13.372 +Keep in mind that the \hgcmd{revert} command only helps with 13.373 +changes that you have not yet committed. Once you've committed a 13.374 +change, if you decide it was a mistake, you can still do something 13.375 +about it, though your options may be more limited. 13.376 + 13.377 +For more information about the \hgcmd{revert} command, and details 13.378 +about how to deal with changes you have already committed, see 13.379 +chapter~\ref{chap:undo}. 13.380 + 13.381 +%%% Local Variables: 13.382 +%%% mode: latex 13.383 +%%% TeX-master: "00book" 13.384 +%%% End:
14.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 14.2 +++ b/en/ch06-collab.tex Thu Jan 29 22:56:27 2009 -0800 14.3 @@ -0,0 +1,1118 @@ 14.4 +\chapter{Collaborating with other people} 14.5 +\label{cha:collab} 14.6 + 14.7 +As a completely decentralised tool, Mercurial doesn't impose any 14.8 +policy on how people ought to work with each other. However, if 14.9 +you're new to distributed revision control, it helps to have some 14.10 +tools and examples in mind when you're thinking about possible 14.11 +workflow models. 14.12 + 14.13 +\section{Mercurial's web interface} 14.14 + 14.15 +Mercurial has a powerful web interface that provides several 14.16 +useful capabilities. 14.17 + 14.18 +For interactive use, the web interface lets you browse a single 14.19 +repository or a collection of repositories. You can view the history 14.20 +of a repository, examine each change (comments and diffs), and view 14.21 +the contents of each directory and file. 14.22 + 14.23 +Also for human consumption, the web interface provides an RSS feed of 14.24 +the changes in a repository. This lets you ``subscribe'' to a 14.25 +repository using your favourite feed reader, and be automatically 14.26 +notified of activity in that repository as soon as it happens. I find 14.27 +this capability much more convenient than the model of subscribing to 14.28 +a mailing list to which notifications are sent, as it requires no 14.29 +additional configuration on the part of whoever is serving the 14.30 +repository. 14.31 + 14.32 +The web interface also lets remote users clone a repository, pull 14.33 +changes from it, and (when the server is configured to permit it) push 14.34 +changes back to it. Mercurial's HTTP tunneling protocol aggressively 14.35 +compresses data, so that it works efficiently even over low-bandwidth 14.36 +network connections. 
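To get a feel for why compressing a whole stream helps on slow links, here is a rough Python sketch that compares deflate applied to each chunk separately with bzip2 applied to the concatenated stream, the combination that the concepts chapter describes for HTTP. It uses Python's standard \texttt{zlib} and \texttt{bz2} modules on made-up sample data; the function names are hypothetical and nothing here reflects how Mercurial's wire protocol is actually implemented.

```python
import bz2
import zlib


def per_revision_size(revisions):
    """Total bytes if every revision is deflate-compressed on its own."""
    return sum(len(zlib.compress(r)) for r in revisions)


def whole_stream_size(revisions):
    """Bytes if the concatenated revisions are bzip2-compressed as one stream."""
    return len(bz2.compress(b"".join(revisions)))


def sample_revisions(count=200):
    # Made-up data: many small, similar chunks, loosely imitating deltas.
    return [
        f"changeset {i}: fix typo in chapter {i % 7}\n".encode() * 2
        for i in range(count)
    ]
```

Calling \texttt{per\_revision\_size(sample\_revisions())} and \texttt{whole\_stream\_size(sample\_revisions())} shows the gap for this toy input. Small chunks compressed one at a time cannot share redundancy with their neighbours, while a single stream can, and real repository data will give different ratios.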
14.37 + 14.38 +The easiest way to get started with the web interface is to use your 14.39 +web browser to visit an existing repository, such as the master 14.40 +Mercurial repository at 14.41 +\url{http://www.selenic.com/repo/hg?style=gitweb}. 14.42 + 14.43 +If you're interested in providing a web interface to your own 14.44 +repositories, Mercurial provides two ways to do this. The first is 14.45 +using the \hgcmd{serve} command, which is best suited to short-term 14.46 +``lightweight'' serving. See section~\ref{sec:collab:serve} below for 14.47 +details of how to use this command. If you have a long-lived 14.48 +repository that you'd like to make permanently available, Mercurial 14.49 +has built-in support for the CGI (Common Gateway Interface) standard, 14.50 +which all common web servers support. See 14.51 +section~\ref{sec:collab:cgi} for details of CGI configuration. 14.52 + 14.53 +\section{Collaboration models} 14.54 + 14.55 +With a suitably flexible tool, making decisions about workflow is much 14.56 +more of a social engineering challenge than a technical one. 14.57 +Mercurial imposes few limitations on how you can structure the flow of 14.58 +work in a project, so it's up to you and your group to set up and live 14.59 +with a model that matches your own particular needs. 14.60 + 14.61 +\subsection{Factors to keep in mind} 14.62 + 14.63 +The most important aspect of any model that you must keep in mind is 14.64 +how well it matches the needs and capabilities of the people who will 14.65 +be using it. This might seem self-evident; even so, you still can't 14.66 +afford to forget it for a moment. 14.67 + 14.68 +I once put together a workflow model that seemed to make perfect sense 14.69 +to me, but that caused a considerable amount of consternation and 14.70 +strife within my development team. In spite of my attempts to explain 14.71 +why we needed a complex set of branches, and how changes ought to flow 14.72 +between them, a few team members revolted. 
Even though they were 14.73 +smart people, they didn't want to pay attention to the constraints we 14.74 +were operating under, or face the consequences of those constraints in 14.75 +the details of the model that I was advocating. 14.76 + 14.77 +Don't sweep foreseeable social or technical problems under the rug. 14.78 +Whatever scheme you put into effect, you should plan for mistakes and 14.79 +problem scenarios. Consider adding automated machinery to prevent, or 14.80 +quickly recover from, trouble that you can anticipate. As an example, 14.81 +if you intend to have a branch with not-for-release changes in it, 14.82 +you'd do well to think early about the possibility that someone might 14.83 +accidentally merge those changes into a release branch. You could 14.84 +avoid this particular problem by writing a hook that prevents changes 14.85 +from being merged from an inappropriate branch. 14.86 + 14.87 +\subsection{Informal anarchy} 14.88 + 14.89 +I wouldn't suggest an ``anything goes'' approach as something 14.90 +sustainable, but it's a model that's easy to grasp, and it works 14.91 +perfectly well in a few unusual situations. 14.92 + 14.93 +As one example, many projects have a loose-knit group of collaborators 14.94 +who rarely physically meet each other. Some groups like to overcome 14.95 +the isolation of working at a distance by organising occasional 14.96 +``sprints''. In a sprint, a number of people get together in a single 14.97 +location (a company's conference room, a hotel meeting room, that kind 14.98 +of place) and spend several days more or less locked in there, hacking 14.99 +intensely on a handful of projects. 14.100 + 14.101 +A sprint is the perfect place to use the \hgcmd{serve} command, since 14.102 +\hgcmd{serve} does not require any fancy server infrastructure. You 14.103 +can get started with \hgcmd{serve} in moments, by reading 14.104 +section~\ref{sec:collab:serve} below.
Then simply tell the person 14.105 +next to you that you're running a server, send the URL to them in an 14.106 +instant message, and you immediately have a quick-turnaround way to 14.107 +work together. They can type your URL into their web browser and 14.108 +quickly review your changes; or they can pull a bugfix from you and 14.109 +verify it; or they can clone a branch containing a new feature and try 14.110 +it out. 14.111 + 14.112 +The charm, and the problem, with doing things in an ad hoc fashion 14.113 +like this is that only people who know about your changes, and where 14.114 +they are, can see them. Such an informal approach simply doesn't 14.115 +scale beyond a handful of people, because each individual needs to know 14.116 +about $n$ different repositories to pull from. 14.117 + 14.118 +\subsection{A single central repository} 14.119 + 14.120 +For smaller projects migrating from a centralised revision control 14.121 +tool, perhaps the easiest way to get started is to have changes flow 14.122 +through a single shared central repository. This is also the 14.123 +most common ``building block'' for more ambitious workflow schemes. 14.124 + 14.125 +Contributors start by cloning a copy of this repository. They can 14.126 +pull changes from it whenever they need to, and some (perhaps all) 14.127 +developers have permission to push a change back when they're ready 14.128 +for other people to see it. 14.129 + 14.130 +Under this model, it can still often make sense for people to pull 14.131 +changes directly from each other, without going through the central 14.132 +repository. Consider a case in which I have a tentative bug fix, but 14.133 +I am worried that if I were to publish it to the central repository, 14.134 +it might subsequently break everyone else's trees as they pull it. To 14.135 +reduce the potential for damage, I can ask you to clone my repository 14.136 +into a temporary repository of your own and test it.
This lets us put 14.137 +off publishing the potentially unsafe change until it has had a little 14.138 +testing. 14.139 + 14.140 +In this kind of scenario, people usually use the \command{ssh} 14.141 +protocol to securely push changes to the central repository, as 14.142 +documented in section~\ref{sec:collab:ssh}. It's also usual to 14.143 +publish a read-only copy of the repository over HTTP using CGI, as in 14.144 +section~\ref{sec:collab:cgi}. Publishing over HTTP satisfies the 14.145 +needs of people who don't have push access, and those who want to use 14.146 +web browsers to browse the repository's history. 14.147 + 14.148 +\subsection{Working with multiple branches} 14.149 + 14.150 +Projects of any significant size naturally tend to make progress on 14.151 +several fronts simultaneously. In the case of software, it's common 14.152 +for a project to go through periodic official releases. A release 14.153 +might then go into ``maintenance mode'' for a while after its first 14.154 +publication; maintenance releases tend to contain only bug fixes, not 14.155 +new features. In parallel with these maintenance releases, one or 14.156 +more future releases may be under development. People normally use 14.157 +the word ``branch'' to refer to one of these many slightly different 14.158 +directions in which development is proceeding. 14.159 + 14.160 +Mercurial is particularly well suited to managing a number of 14.161 +simultaneous, but not identical, branches. Each ``development 14.162 +direction'' can live in its own central repository, and you can merge 14.163 +changes from one to another as the need arises. Because repositories 14.164 +are independent of each other, unstable changes in a development 14.165 +branch will never affect a stable branch unless someone explicitly 14.166 +merges those changes in. 14.167 + 14.168 +Here's an example of how this can work in practice. Let's say you 14.169 +have one ``main branch'' on a central server. 
14.170 +\interaction{branching.init} 14.171 +People clone it, make changes locally, test them, and push them back. 14.172 + 14.173 +Once the main branch reaches a release milestone, you can use the 14.174 +\hgcmd{tag} command to give a permanent name to the milestone 14.175 +revision. 14.176 +\interaction{branching.tag} 14.177 +Let's say some ongoing development occurs on the main branch. 14.178 +\interaction{branching.main} 14.179 +Using the tag that was recorded at the milestone, people who clone 14.180 +that repository at any time in the future can use \hgcmd{update} to 14.181 +get a copy of the working directory exactly as it was when that tagged 14.182 +revision was committed. 14.183 +\interaction{branching.update} 14.184 + 14.185 +In addition, immediately after the main branch is tagged, someone can 14.186 +then clone the main branch on the server to a new ``stable'' branch, 14.187 +also on the server. 14.188 +\interaction{branching.clone} 14.189 + 14.190 +Someone who needs to make a change to the stable branch can then clone 14.191 +\emph{that} repository, make their changes, commit, and push their 14.192 +changes back there. 14.193 +\interaction{branching.stable} 14.194 +Because Mercurial repositories are independent, and Mercurial doesn't 14.195 +move changes around automatically, the stable and main branches are 14.196 +\emph{isolated} from each other. The changes that you made on the 14.197 +main branch don't ``leak'' to the stable branch, and vice versa. 14.198 + 14.199 +You'll often want all of your bugfixes on the stable branch to show up 14.200 +on the main branch, too. Rather than rewrite a bugfix on the main 14.201 +branch, you can simply pull and merge changes from the stable to the 14.202 +main branch, and Mercurial will bring those bugfixes in for you. 
14.203 +\interaction{branching.merge} 14.204 +The main branch will still contain changes that are not on the stable 14.205 +branch, but it will also contain all of the bugfixes from the stable 14.206 +branch. The stable branch remains unaffected by these changes. 14.207 + 14.208 +\subsection{Feature branches} 14.209 + 14.210 +For larger projects, an effective way to manage change is to break up 14.211 +a team into smaller groups. Each group has a shared branch of its 14.212 +own, cloned from a single ``master'' branch used by the entire 14.213 +project. People working on an individual branch are typically quite 14.214 +isolated from developments on other branches. 14.215 + 14.216 +\begin{figure}[ht] 14.217 + \centering 14.218 + \grafix{feature-branches} 14.219 + \caption{Feature branches} 14.220 + \label{fig:collab:feature-branches} 14.221 +\end{figure} 14.222 + 14.223 +When a particular feature is deemed to be in suitable shape, someone 14.224 +on that feature team pulls and merges from the master branch into the 14.225 +feature branch, then pushes back up to the master branch. 14.226 + 14.227 +\subsection{The release train} 14.228 + 14.229 +Some projects are organised on a ``train'' basis: a release is 14.230 +scheduled to happen every few months, and whatever features are ready 14.231 +when the ``train'' is ready to leave are allowed in. 14.232 + 14.233 +This model resembles working with feature branches. The difference is 14.234 +that when a feature branch misses a train, someone on the feature team 14.235 +pulls and merges the changes that went out on that train release into 14.236 +the feature branch, and the team continues its work on top of that 14.237 +release so that their feature can make the next release. 14.238 + 14.239 +\subsection{The Linux kernel model} 14.240 + 14.241 +The development of the Linux kernel has a shallow hierarchical 14.242 +structure, surrounded by a cloud of apparent chaos. 
Because most 14.243 +Linux developers use \command{git}, a distributed revision control 14.244 +tool with capabilities similar to Mercurial, it's useful to describe 14.245 +the way work flows in that environment; if you like the ideas, the 14.246 +approach translates well across tools. 14.247 + 14.248 +At the center of the community sits Linus Torvalds, the creator of 14.249 +Linux. He publishes a single source repository that is considered the 14.250 +``authoritative'' current tree by the entire developer community. 14.251 +Anyone can clone Linus's tree, but he is very choosy about whose trees 14.252 +he pulls from. 14.253 + 14.254 +Linus has a number of ``trusted lieutenants''. As a general rule, he 14.255 +pulls whatever changes they publish, in most cases without even 14.256 +reviewing those changes. Some of those lieutenants are generally 14.257 +agreed to be ``maintainers'', responsible for specific subsystems 14.258 +within the kernel. If a random kernel hacker wants to make a change 14.259 +to a subsystem that they want to end up in Linus's tree, they must 14.260 +find out who the subsystem's maintainer is, and ask that maintainer to 14.261 +take their change. If the maintainer reviews their changes and agrees 14.262 +to take them, they'll pass them along to Linus in due course. 14.263 + 14.264 +Individual lieutenants have their own approaches to reviewing, 14.265 +accepting, and publishing changes; and for deciding when to feed them 14.266 +to Linus. In addition, there are several well known branches that 14.267 +people use for different purposes. For example, a few people maintain 14.268 +``stable'' repositories of older versions of the kernel, to which they 14.269 +apply critical fixes as needed. Some maintainers publish multiple 14.270 +trees: one for experimental changes; one for changes that they are 14.271 +about to feed upstream; and so on. Others just publish a single 14.272 +tree. 14.273 + 14.274 +This model has two notable features. 
The first is that it's ``pull 14.275 +only''. You have to ask, convince, or beg another developer to take a 14.276 +change from you, because there are almost no trees to which more than 14.277 +one person can push, and there's no way to push changes into a tree 14.278 +that someone else controls. 14.279 + 14.280 +The second is that it's based on reputation and acclaim. If you're an 14.281 +unknown, Linus will probably ignore changes from you without even 14.282 +responding. But a subsystem maintainer will probably review them, and 14.283 +will likely take them if they pass their criteria for suitability. 14.284 +The more ``good'' changes you contribute to a maintainer, the more 14.285 +likely they are to trust your judgment and accept your changes. If 14.286 +you're well-known and maintain a long-lived branch for something Linus 14.287 +hasn't yet accepted, people with similar interests may pull your 14.288 +changes regularly to keep up with your work. 14.289 + 14.290 +Reputation and acclaim don't necessarily cross subsystem or ``people'' 14.291 +boundaries. If you're a respected but specialised storage hacker, and 14.292 +you try to fix a networking bug, that change will receive a level of 14.293 +scrutiny from a network maintainer comparable to a change from a 14.294 +complete stranger. 14.295 + 14.296 +To people who come from more orderly project backgrounds, the 14.297 +comparatively chaotic Linux kernel development process often seems 14.298 +completely insane. It's subject to the whims of individuals; people 14.299 +make sweeping changes whenever they deem it appropriate; and the pace 14.300 +of development is astounding. And yet Linux is a highly successful, 14.301 +well-regarded piece of software. 
14.302 + 14.303 +\subsection{Pull-only versus shared-push collaboration} 14.304 + 14.305 +A perpetual source of heat in the open source community is whether a 14.306 +development model in which people only ever pull changes from others 14.307 +is ``better than'' one in which multiple people can push changes to a 14.308 +shared repository. 14.309 + 14.310 +Typically, the backers of the shared-push model use tools that 14.311 +actively enforce this approach. If you're using a centralised 14.312 +revision control tool such as Subversion, there's no way to make a 14.313 +choice over which model you'll use: the tool gives you shared-push, 14.314 +and if you want to do anything else, you'll have to roll your own 14.315 +approach on top (such as applying a patch by hand). 14.316 + 14.317 +A good distributed revision control tool, such as Mercurial, will 14.318 +support both models. You and your collaborators can then structure 14.319 +how you work together based on your own needs and preferences, not on 14.320 +what contortions your tools force you into. 14.321 + 14.322 +\subsection{Where collaboration meets branch management} 14.323 + 14.324 +Once you and your team set up some shared repositories and start 14.325 +propagating changes back and forth between local and shared repos, you 14.326 +begin to face a related, but slightly different challenge: that of 14.327 +managing the multiple directions in which your team may be moving at 14.328 +once. Even though this subject is intimately related to how your team 14.329 +collaborates, it's dense enough to merit treatment of its own, in 14.330 +chapter~\ref{chap:branch}. 14.331 + 14.332 +\section{The technical side of sharing} 14.333 + 14.334 +The remainder of this chapter is devoted to the question of serving 14.335 +data to your collaborators. 
14.336 + 14.337 +\section{Informal sharing with \hgcmd{serve}} 14.338 +\label{sec:collab:serve} 14.339 + 14.340 +Mercurial's \hgcmd{serve} command is wonderfully suited to small, 14.341 +tight-knit, and fast-paced group environments. It also provides a 14.342 +great way to get a feel for using Mercurial commands over a network. 14.343 + 14.344 +Run \hgcmd{serve} inside a repository, and in under a second it will 14.345 +bring up a specialised HTTP server; this will accept connections from 14.346 +any client, and serve up data for that repository until you terminate 14.347 +it. Anyone who knows the URL of the server you just started, and can 14.348 +talk to your computer over the network, can then use a web browser or 14.349 +Mercurial to read data from that repository. A URL for a 14.350 +\hgcmd{serve} instance running on a laptop is likely to look something 14.351 +like \Verb|http://my-laptop.local:8000/|. 14.352 + 14.353 +The \hgcmd{serve} command is \emph{not} a general-purpose web server. 14.354 +It can do only two things: 14.355 +\begin{itemize} 14.356 +\item Allow people to browse the history of the repository it's 14.357 + serving, from their normal web browsers. 14.358 +\item Speak Mercurial's wire protocol, so that people can 14.359 + \hgcmd{clone} or \hgcmd{pull} changes from that repository. 14.360 +\end{itemize} 14.361 +In particular, \hgcmd{serve} won't allow remote users to \emph{modify} 14.362 +your repository. It's intended for read-only use. 14.363 + 14.364 +If you're getting started with Mercurial, there's nothing to prevent 14.365 +you from using \hgcmd{serve} to serve up a repository on your own 14.366 +computer, then use commands like \hgcmd{clone}, \hgcmd{incoming}, and 14.367 +so on to talk to that server as if the repository was hosted remotely. 14.368 +This can help you to quickly get acquainted with using commands on 14.369 +network-hosted repositories. 
14.370 + 14.371 +\subsection{A few things to keep in mind} 14.372 + 14.373 +Because it provides unauthenticated read access to all clients, you 14.374 +should only use \hgcmd{serve} in an environment where you either don't 14.375 +care, or have complete control over, who can access your network and 14.376 +pull data from your repository. 14.377 + 14.378 +The \hgcmd{serve} command knows nothing about any firewall software 14.379 +you might have installed on your system or network. It cannot detect 14.380 +or control your firewall software. If other people are unable to talk 14.381 +to a running \hgcmd{serve} instance, the second thing you should do 14.382 +(\emph{after} you make sure that they're using the correct URL) is 14.383 +check your firewall configuration. 14.384 + 14.385 +By default, \hgcmd{serve} listens for incoming connections on 14.386 +port~8000. If another process is already listening on the port you 14.387 +want to use, you can specify a different port to listen on using the 14.388 +\hgopt{serve}{-p} option. 14.389 + 14.390 +Normally, when \hgcmd{serve} starts, it prints no output, which can be 14.391 +a bit unnerving. If you'd like to confirm that it is indeed running 14.392 +correctly, and find out what URL you should send to your 14.393 +collaborators, start it with the \hggopt{-v} option. 14.394 + 14.395 +\section{Using the Secure Shell (ssh) protocol} 14.396 +\label{sec:collab:ssh} 14.397 + 14.398 +You can pull and push changes securely over a network connection using 14.399 +the Secure Shell (\texttt{ssh}) protocol. To use this successfully, 14.400 +you may have to do a little bit of configuration on the client or 14.401 +server sides. 14.402 + 14.403 +If you're not familiar with ssh, it's a network protocol that lets you 14.404 +securely communicate with another computer. To use it with Mercurial, 14.405 +you'll be setting up one or more user accounts on a server so that 14.406 +remote users can log in and execute commands. 
14.407 + 14.408 +(If you \emph{are} familiar with ssh, you'll probably find some of the 14.409 +material that follows to be elementary in nature.) 14.410 + 14.411 +\subsection{How to read and write ssh URLs} 14.412 + 14.413 +An ssh URL tends to look like this: 14.414 +\begin{codesample2} 14.415 + ssh://bos@hg.serpentine.com:22/hg/hgbook 14.416 +\end{codesample2} 14.417 +\begin{enumerate} 14.418 +\item The ``\texttt{ssh://}'' part tells Mercurial to use the ssh 14.419 + protocol. 14.420 +\item The ``\texttt{bos@}'' component indicates what username to log 14.421 + into the server as. You can leave this out if the remote username 14.422 + is the same as your local username. 14.423 +\item The ``\texttt{hg.serpentine.com}'' gives the hostname of the 14.424 + server to log into. 14.425 +\item The ``:22'' identifies the port number to connect to the server 14.426 + on. The default port is~22, so you only need to specify this part 14.427 + if you're \emph{not} using port~22. 14.428 +\item The remainder of the URL is the local path to the repository on 14.429 + the server. 14.430 +\end{enumerate} 14.431 + 14.432 +There's plenty of scope for confusion with the path component of ssh 14.433 +URLs, as there is no standard way for tools to interpret it. Some 14.434 +programs behave differently than others when dealing with these paths. 14.435 +This isn't an ideal situation, but it's unlikely to change. Please 14.436 +read the following paragraphs carefully. 14.437 + 14.438 +Mercurial treats the path to a repository on the server as relative to 14.439 +the remote user's home directory. For example, if user \texttt{foo} 14.440 +on the server has a home directory of \dirname{/home/foo}, then an ssh 14.441 +URL that contains a path component of \dirname{bar} 14.442 +\emph{really} refers to the directory \dirname{/home/foo/bar}. 
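As a rough illustration of the URL components and the home-relative path rule described above, here is a minimal Python sketch. It is not how Mercurial itself parses URLs; the function name \texttt{describe\_ssh\_url} and the \texttt{remote\_home} parameter are hypothetical, introduced only for this example, and the sketch covers only paths that are relative to the login user's home directory.

```python
import posixpath
from urllib.parse import urlsplit


def describe_ssh_url(url, remote_home):
    """Split an ssh URL into user, host, port, and resolved path.

    remote_home is a hypothetical stand-in for the remote user's home
    directory. A real client never needs it, because the server is the
    one that resolves the path; it is here only to show the result.
    """
    parts = urlsplit(url)
    if parts.scheme != "ssh":
        raise ValueError("not an ssh URL: %r" % url)
    # Drop the single slash that separates the host from the path.
    # What remains is interpreted relative to the home directory.
    path = parts.path
    relative_path = path[1:] if path.startswith("/") else path
    return {
        "user": parts.username,    # None when the URL has no "user@" part
        "host": parts.hostname,
        "port": parts.port or 22,  # 22 is the default ssh port
        "path": posixpath.join(remote_home, relative_path),
    }


# The example from the text: user foo, home /home/foo, path component bar.
print(describe_ssh_url("ssh://foo@server/bar", "/home/foo")["path"])
# prints /home/foo/bar
```

The point of the sketch is simply that the first slash after the hostname is a separator, not the root of the server's filesystem.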
14.443 + 14.444 +If you want to specify a path relative to another user's home 14.445 +directory, you can use a path that starts with a tilde character 14.446 +followed by the user's name (let's call them \texttt{otheruser}), like 14.447 +this. 14.448 +\begin{codesample2} 14.449 + ssh://server/~otheruser/hg/repo 14.450 +\end{codesample2} 14.451 + 14.452 +And if you really want to specify an \emph{absolute} path on the 14.453 +server, begin the path component with two slashes, as in this example. 14.454 +\begin{codesample2} 14.455 + ssh://server//absolute/path 14.456 +\end{codesample2} 14.457 + 14.458 +\subsection{Finding an ssh client for your system} 14.459 + 14.460 +Almost every Unix-like system comes with OpenSSH preinstalled. If 14.461 +you're using such a system, run \Verb|which ssh| to find out if 14.462 +the \command{ssh} command is installed (it's usually in 14.463 +\dirname{/usr/bin}). In the unlikely event that it isn't present, 14.464 +take a look at your system documentation to figure out how to install 14.465 +it. 14.466 + 14.467 +On Windows, you'll first need to download a suitable ssh 14.468 +client. There are two alternatives. 14.469 +\begin{itemize} 14.470 +\item Simon Tatham's excellent PuTTY package~\cite{web:putty} provides 14.471 + a complete suite of ssh client commands. 14.472 +\item If you have a high tolerance for pain, you can use the Cygwin 14.473 + port of OpenSSH. 14.474 +\end{itemize} 14.475 +In either case, you'll need to edit your \hgini\ file to tell 14.476 +Mercurial where to find the actual client command. For example, if 14.477 +you're using PuTTY, you'll need to use the \command{plink} command as 14.478 +a command-line ssh client. 
14.479 +\begin{codesample2} 14.480 + [ui] 14.481 + ssh = C:/path/to/plink.exe -ssh -i "C:/path/to/my/private/key" 14.482 +\end{codesample2} 14.483 + 14.484 +\begin{note} 14.485 + The path to \command{plink} shouldn't contain any whitespace 14.486 + characters, or Mercurial may not be able to run it correctly (so 14.487 + putting it in \dirname{C:\\Program Files} is probably not a good 14.488 + idea). 14.489 +\end{note} 14.490 + 14.491 +\subsection{Generating a key pair} 14.492 + 14.493 +To avoid the need to repetitively type a password every time you need 14.494 +to use your ssh client, I recommend generating a key pair. On a 14.495 +Unix-like system, the \command{ssh-keygen} command will do the trick. 14.496 +On Windows, if you're using PuTTY, the \command{puttygen} command is 14.497 +what you'll need. 14.498 + 14.499 +When you generate a key pair, it's usually \emph{highly} advisable to 14.500 +protect it with a passphrase. (The only time that you might not want 14.501 +to do this is when you're using the ssh protocol for automated tasks 14.502 +on a secure network.) 14.503 + 14.504 +Simply generating a key pair isn't enough, however. You'll need to 14.505 +add the public key to the set of authorised keys for whatever user 14.506 +you're logging in remotely as. For servers using OpenSSH (the vast 14.507 +majority), this will mean adding the public key to a list in a file 14.508 +called \sfilename{authorized\_keys} in their \sdirname{.ssh} 14.509 +directory. 14.510 + 14.511 +On a Unix-like system, your public key will have a \filename{.pub} 14.512 +extension. If you're using \command{puttygen} on Windows, you can 14.513 +save the public key to a file of your choosing, or paste it from the 14.514 +window it's displayed in straight into the 14.515 +\sfilename{authorized\_keys} file. 
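The authorisation step described above can be sketched in Python, to be run on the server as the user who will be logged into. This is an illustration under stated assumptions, not a recommended tool: the function name \texttt{authorize\_key} and its parameters are hypothetical, and the permission modes (\texttt{700} for the \sdirname{.ssh} directory, \texttt{600} for the key file) are the conventional choices for OpenSSH servers rather than anything Mercurial requires.

```python
import os


def authorize_key(public_key_line, home_dir):
    """Append one public key to home_dir/.ssh/authorized_keys.

    Hypothetical helper for illustration. The modes used below are the
    conventional OpenSSH-friendly ones and are an assumption here.
    """
    ssh_dir = os.path.join(home_dir, ".ssh")
    os.makedirs(ssh_dir, exist_ok=True)
    os.chmod(ssh_dir, 0o700)  # only the owner may enter or modify

    keys_path = os.path.join(ssh_dir, "authorized_keys")
    key = public_key_line.strip()

    content = ""
    if os.path.exists(keys_path):
        with open(keys_path) as f:
            content = f.read()

    # Add the key only once, and never glue it onto an unterminated line.
    if key not in [line.strip() for line in content.splitlines()]:
        with open(keys_path, "a") as f:
            if content and not content.endswith("\n"):
                f.write("\n")
            f.write(key + "\n")

    os.chmod(keys_path, 0o600)  # only the owner may read or write
    return keys_path
```

You would pass it the single line of text from your \filename{.pub} file (or the text that \command{puttygen} displays) together with the remote user's home directory.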
14.516 + 14.517 +\subsection{Using an authentication agent} 14.518 + 14.519 +An authentication agent is a daemon that stores passphrases in memory 14.520 +(so it will forget passphrases if you log out and log back in again). 14.521 +An ssh client will notice if it's running, and query it for a 14.522 +passphrase. If there's no authentication agent running, or the agent 14.523 +doesn't store the necessary passphrase, you'll have to type your 14.524 +passphrase every time Mercurial tries to communicate with a server on 14.525 +your behalf (e.g.~whenever you pull or push changes). 14.526 + 14.527 +The downside of storing passphrases in an agent is that it's possible 14.528 +for a well-prepared attacker to recover the plain text of your 14.529 +passphrases, in some cases even if your system has been power-cycled. 14.530 +You should make your own judgment as to whether this is an acceptable 14.531 +risk. It certainly saves a lot of repeated typing. 14.532 + 14.533 +On Unix-like systems, the agent is called \command{ssh-agent}, and 14.534 +it's often run automatically for you when you log in. You'll need to 14.535 +use the \command{ssh-add} command to add passphrases to the agent's 14.536 +store. On Windows, if you're using PuTTY, the \command{pageant} 14.537 +command acts as the agent. It adds an icon to your system tray that 14.538 +will let you manage stored passphrases. 14.539 + 14.540 +\subsection{Configuring the server side properly} 14.541 + 14.542 +Because ssh can be fiddly to set up if you're new to it, there's a 14.543 +variety of things that can go wrong. Add Mercurial on top, and 14.544 +there's plenty more scope for head-scratching. Most of these 14.545 +potential problems occur on the server side, not the client side. The 14.546 +good news is that once you've gotten a configuration working, it will 14.547 +usually continue to work indefinitely. 
14.548 + 14.549 +Before you try using Mercurial to talk to an ssh server, it's best to 14.550 +make sure that you can use the normal \command{ssh} or \command{putty} 14.551 +command to talk to the server first. If you run into problems with 14.552 +using these commands directly, Mercurial surely won't work. Worse, it 14.553 +will obscure the underlying problem. Any time you want to debug 14.554 +ssh-related Mercurial problems, you should drop back to making sure 14.555 +that plain ssh client commands work first, \emph{before} you worry 14.556 +about whether there's a problem with Mercurial. 14.557 + 14.558 +The first thing to be sure of on the server side is that you can 14.559 +actually log in from another machine at all. If you can't use 14.560 +\command{ssh} or \command{putty} to log in, the error message you get 14.561 +may give you a few hints as to what's wrong. The most common problems 14.562 +are as follows. 14.563 +\begin{itemize} 14.564 +\item If you get a ``connection refused'' error, either there isn't an 14.565 + SSH daemon running on the server at all, or it's inaccessible due to 14.566 + firewall configuration. 14.567 +\item If you get a ``no route to host'' error, you either have an 14.568 + incorrect address for the server or a seriously locked down firewall 14.569 + that won't admit its existence at all. 14.570 +\item If you get a ``permission denied'' error, you may have mistyped 14.571 + the username on the server, or you could have mistyped your key's 14.572 + passphrase or the remote user's password. 14.573 +\end{itemize} 14.574 +In summary, if you're having trouble talking to the server's ssh 14.575 +daemon, first make sure that one is running at all. On many systems 14.576 +it will be installed, but disabled, by default. Once you're done with 14.577 +this step, you should then check that the server's firewall is 14.578 +configured to allow incoming connections on the port the ssh daemon is 14.579 +listening on (usually~22). 
Don't worry about more exotic 14.580 +possibilities for misconfiguration until you've checked these two 14.581 +first. 14.582 + 14.583 +If you're using an authentication agent on the client side to store 14.584 +passphrases for your keys, you ought to be able to log into the server 14.585 +without being prompted for a passphrase or a password. If you're 14.586 +prompted for a passphrase, there are a few possible culprits. 14.587 +\begin{itemize} 14.588 +\item You might have forgotten to use \command{ssh-add} or 14.589 + \command{pageant} to store the passphrase. 14.590 +\item You might have stored the passphrase for the wrong key. 14.591 +\end{itemize} 14.592 +If you're being prompted for the remote user's password, there are 14.593 +another few possible problems to check. 14.594 +\begin{itemize} 14.595 +\item Either the user's home directory or their \sdirname{.ssh} 14.596 + directory might have excessively liberal permissions. As a result, 14.597 + the ssh daemon will not trust or read their 14.598 + \sfilename{authorized\_keys} file. For example, a group-writable 14.599 + home or \sdirname{.ssh} directory will often cause this symptom. 14.600 +\item The user's \sfilename{authorized\_keys} file may have a problem. 14.601 + If anyone other than the user owns or can write to that file, the 14.602 + ssh daemon will not trust or read it. 14.603 +\end{itemize} 14.604 + 14.605 +In the ideal world, you should be able to run the following command 14.606 +successfully, and it should print exactly one line of output, the 14.607 +current date and time. 14.608 +\begin{codesample2} 14.609 + ssh myserver date 14.610 +\end{codesample2} 14.611 + 14.612 +If, on your server, you have login scripts that print banners or other 14.613 +junk even when running non-interactive commands like this, you should 14.614 +fix them before you continue, so that they only print output if 14.615 +they're run interactively. 
Otherwise these banners will at least 14.616 +clutter up Mercurial's output. Worse, they could potentially cause 14.617 +problems with running Mercurial commands remotely. Mercurial 14.618 +tries to detect and ignore banners in non-interactive \command{ssh} 14.619 +sessions, but it is not foolproof. (If you're editing your login 14.620 +scripts on your server, the usual way to see if a login script is 14.621 +running in an interactive shell is to check the return code from the 14.622 +command \Verb|tty -s|.) 14.623 + 14.624 +Once you've verified that plain old ssh is working with your server, 14.625 +the next step is to ensure that Mercurial runs on the server. The 14.626 +following command should run successfully: 14.627 +\begin{codesample2} 14.628 + ssh myserver hg version 14.629 +\end{codesample2} 14.630 +If you see an error message instead of normal \hgcmd{version} output, 14.631 +this is usually because you haven't installed Mercurial to 14.632 +\dirname{/usr/bin}. Don't worry if this is the case; you don't need 14.633 +to do that. But you should check for a few possible problems. 14.634 +\begin{itemize} 14.635 +\item Is Mercurial really installed on the server at all? I know this 14.636 + sounds trivial, but it's worth checking! 14.637 +\item Maybe your shell's search path (usually set via the \envar{PATH} 14.638 + environment variable) is simply misconfigured. 14.639 +\item Perhaps your \envar{PATH} environment variable is only being set 14.640 + to point to the location of the \command{hg} executable if the login 14.641 + session is interactive. This can happen if you're setting the path 14.642 + in the wrong shell login script. See your shell's documentation for 14.643 + details. 14.644 +\item The \envar{PYTHONPATH} environment variable may need to contain 14.645 + the path to the Mercurial Python modules. It might not be set at 14.646 + all; it could be incorrect; or it may be set only if the login is 14.647 + interactive.
14.648 +\end{itemize} 14.649 + 14.650 +If you can run \hgcmd{version} over an ssh connection, well done! 14.651 +You've got the server and client sorted out. You should now be able 14.652 +to use Mercurial to access repositories hosted by that username on 14.653 +that server. If you run into problems with Mercurial and ssh at this 14.654 +point, try using the \hggopt{--debug} option to get a clearer picture 14.655 +of what's going on. 14.656 + 14.657 +\subsection{Using compression with ssh} 14.658 + 14.659 +Mercurial does not compress data when it uses the ssh protocol, 14.660 +because the ssh protocol can transparently compress data. However, 14.661 +the default behaviour of ssh clients is \emph{not} to request 14.662 +compression. 14.663 + 14.664 +Over any network other than a fast LAN (even a wireless network), 14.665 +using compression is likely to significantly speed up Mercurial's 14.666 +network operations. For example, over a WAN, someone measured 14.667 +compression as reducing the amount of time required to clone a 14.668 +particularly large repository from~51 minutes to~17 minutes. 14.669 + 14.670 +Both \command{ssh} and \command{plink} accept a \cmdopt{ssh}{-C} 14.671 +option which turns on compression. You can easily edit your \hgrc\ to 14.672 +enable compression for all of Mercurial's uses of the ssh protocol. 14.673 +\begin{codesample2} 14.674 + [ui] 14.675 + ssh = ssh -C 14.676 +\end{codesample2} 14.677 + 14.678 +If you use \command{ssh}, you can configure it to always use 14.679 +compression when talking to your server. To do this, edit your 14.680 +\sfilename{.ssh/config} file (which may not yet exist), as follows. 14.681 +\begin{codesample2} 14.682 + Host hg 14.683 + Compression yes 14.684 + HostName hg.example.com 14.685 +\end{codesample2} 14.686 +This defines an alias, \texttt{hg}. 
When you use it on the 14.687 +\command{ssh} command line or in a Mercurial \texttt{ssh}-protocol 14.688 +URL, it will cause \command{ssh} to connect to \texttt{hg.example.com} 14.689 +and use compression. This gives you both a shorter name to type and 14.690 +compression, each of which is a good thing in its own right. 14.691 + 14.692 +\section{Serving over HTTP using CGI} 14.693 +\label{sec:collab:cgi} 14.694 + 14.695 +Depending on how ambitious you are, configuring Mercurial's CGI 14.696 +interface can take anything from a few moments to several hours. 14.697 + 14.698 +We'll begin with the simplest of examples, and work our way towards a 14.699 +more complex configuration. Even for the most basic case, you're 14.700 +almost certainly going to need to read and modify your web server's 14.701 +configuration. 14.702 + 14.703 +\begin{note} 14.704 + Configuring a web server is a complex, fiddly, and highly 14.705 + system-dependent activity. I can't possibly give you instructions 14.706 + that will cover anything like all of the cases you will encounter. 14.707 + Please use your discretion and judgment in following the sections 14.708 + below. Be prepared to make plenty of mistakes, and to spend a lot 14.709 + of time reading your server's error logs. 14.710 +\end{note} 14.711 + 14.712 +\subsection{Web server configuration checklist} 14.713 + 14.714 +Before you continue, do take a few moments to check a few aspects of 14.715 +your system's setup. 14.716 + 14.717 +\begin{enumerate} 14.718 +\item Do you have a web server installed at all? Mac OS X ships with 14.719 + Apache, but many other systems may not have a web server installed. 14.720 +\item If you have a web server installed, is it actually running? On 14.721 + most systems, even if one is present, it will be disabled by 14.722 + default. 14.723 +\item Is your server configured to allow you to run CGI programs in 14.724 + the directory where you plan to do so? 
Most servers default to 14.725 + explicitly disabling the ability to run CGI programs. 14.726 +\end{enumerate} 14.727 + 14.728 +If you don't have a web server installed, and don't have substantial 14.729 +experience configuring Apache, you should consider using the 14.730 +\texttt{lighttpd} web server instead of Apache. Apache has a 14.731 +well-deserved reputation for baroque and confusing configuration. 14.732 +While \texttt{lighttpd} is less capable in some ways than Apache, most 14.733 +of these capabilities are not relevant to serving Mercurial 14.734 +repositories. And \texttt{lighttpd} is undeniably \emph{much} easier 14.735 +to get started with than Apache. 14.736 + 14.737 +\subsection{Basic CGI configuration} 14.738 + 14.739 +On Unix-like systems, it's common for users to have a subdirectory 14.740 +named something like \dirname{public\_html} in their home directory, 14.741 +from which they can serve up web pages. A file named \filename{foo} 14.742 +in this directory will be accessible at a URL of the form 14.743 +\texttt{http://www.example.com/\~{}username/foo}. 14.744 + 14.745 +To get started, find the \sfilename{hgweb.cgi} script that should be 14.746 +present in your Mercurial installation. If you can't quickly find a 14.747 +local copy on your system, simply download one from the master 14.748 +Mercurial repository at 14.749 +\url{http://www.selenic.com/repo/hg/raw-file/tip/hgweb.cgi}. 14.750 + 14.751 +You'll need to copy this script into your \dirname{public\_html} 14.752 +directory, and ensure that it's executable. 14.753 +\begin{codesample2} 14.754 + cp .../hgweb.cgi ~/public_html 14.755 + chmod 755 ~/public_html/hgweb.cgi 14.756 +\end{codesample2} 14.757 +The \texttt{755} argument to \command{chmod} is a little more general 14.758 +than just making the script executable: it ensures that the script is 14.759 +executable by anyone, and that ``group'' and ``other'' write 14.760 +permissions are \emph{not} set. 
If you were to leave those write 14.761 +permissions enabled, Apache's \texttt{suexec} subsystem would likely 14.762 +refuse to execute the script. In fact, \texttt{suexec} also insists 14.763 +that the \emph{directory} in which the script resides must not be 14.764 +writable by others. 14.765 +\begin{codesample2} 14.766 + chmod 755 ~/public_html 14.767 +\end{codesample2} 14.768 + 14.769 +\subsubsection{What could \emph{possibly} go wrong?} 14.770 +\label{sec:collab:wtf} 14.771 + 14.772 +Once you've copied the CGI script into place, go into a web browser, 14.773 +and try to open the URL \url{http://myhostname/~myuser/hgweb.cgi}, 14.774 +\emph{but} brace yourself for instant failure. There's a high 14.775 +probability that trying to visit this URL will fail, and there are 14.776 +many possible reasons for this. In fact, you're likely to stumble 14.777 +over almost every one of the possible errors below, so please read 14.778 +carefully. The following are all of the problems I ran into on a 14.779 +system running Fedora~7, with a fresh installation of Apache, and a 14.780 +user account that I created specially to perform this exercise. 14.781 + 14.782 +Your web server may have per-user directories disabled. If you're 14.783 +using Apache, search your config file for a \texttt{UserDir} 14.784 +directive. If there's none present, per-user directories will be 14.785 +disabled. If one exists, but its value is \texttt{disabled}, then 14.786 +per-user directories will be disabled. Otherwise, the string after 14.787 +\texttt{UserDir} gives the name of the subdirectory that Apache will 14.788 +look in under your home directory, for example \dirname{public\_html}. 14.789 + 14.790 +Your file access permissions may be too restrictive. The web server 14.791 +must be able to traverse your home directory and directories under 14.792 +your \dirname{public\_html} directory, and read files under the latter 14.793 +too. 
Here's a quick recipe to help you to make your permissions more 14.794 +appropriate. 14.795 +\begin{codesample2} 14.796 + chmod 755 ~ 14.797 + find ~/public_html -type d -print0 | xargs -0r chmod 755 14.798 + find ~/public_html -type f -print0 | xargs -0r chmod 644 14.799 +\end{codesample2} 14.800 + 14.801 +The other possibility with permissions is that you might get a 14.802 +completely empty window when you try to load the script. In this 14.803 +case, it's likely that your access permissions are \emph{too 14.804 + permissive}. Apache's \texttt{suexec} subsystem won't execute a 14.805 +script that's group-~or world-writable, for example. 14.806 + 14.807 +Your web server may be configured to disallow execution of CGI 14.808 +programs in your per-user web directory. Here's Apache's 14.809 +default per-user configuration from my Fedora system. 14.810 +\begin{codesample2} 14.811 + <Directory /home/*/public_html> 14.812 + AllowOverride FileInfo AuthConfig Limit 14.813 + Options MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec 14.814 + <Limit GET POST OPTIONS> 14.815 + Order allow,deny 14.816 + Allow from all 14.817 + </Limit> 14.818 + <LimitExcept GET POST OPTIONS> 14.819 + Order deny,allow 14.820 + Deny from all 14.821 + </LimitExcept> 14.822 + </Directory> 14.823 +\end{codesample2} 14.824 +If you find a similar-looking \texttt{Directory} group in your Apache 14.825 +configuration, the directive to look at inside it is \texttt{Options}. 14.826 +Add \texttt{ExecCGI} to the end of this list if it's missing, and 14.827 +restart the web server. 14.828 + 14.829 +If you find that Apache serves you the text of the CGI script instead 14.830 +of executing it, you may need to either uncomment (if already present) 14.831 +or add a directive like this. 
14.832 +\begin{codesample2} 14.833 + AddHandler cgi-script .cgi 14.834 +\end{codesample2} 14.835 + 14.836 +The next possibility is that you might be served with a colourful 14.837 +Python backtrace claiming that it can't import a 14.838 +\texttt{mercurial}-related module. This is actually progress! The 14.839 +server is now capable of executing your CGI script. This error is 14.840 +only likely to occur if you're running a private installation of 14.841 +Mercurial, instead of a system-wide version. Remember that the web 14.842 +server runs the CGI program without any of the environment variables 14.843 +that you take for granted in an interactive session. If this error 14.844 +happens to you, edit your copy of \sfilename{hgweb.cgi} and follow the 14.845 +directions inside it to correctly set your \envar{PYTHONPATH} 14.846 +environment variable. 14.847 + 14.848 +Finally, you are \emph{certain} to be served with another colourful 14.849 +Python backtrace: this one will complain that it can't find 14.850 +\dirname{/path/to/repository}. Edit your \sfilename{hgweb.cgi} script 14.851 +and replace the \dirname{/path/to/repository} string with the complete 14.852 +path to the repository you want to serve up. 14.853 + 14.854 +At this point, when you try to reload the page, you should be 14.855 +presented with a nice HTML view of your repository's history. Whew! 14.856 + 14.857 +\subsubsection{Configuring lighttpd} 14.858 + 14.859 +To be exhaustive in my experiments, I tried configuring the 14.860 +increasingly popular \texttt{lighttpd} web server to serve the same 14.861 +repository as I described with Apache above. I had already overcome 14.862 +all of the problems I outlined with Apache, many of which are not 14.863 +server-specific. As a result, I was fairly sure that my file and 14.864 +directory permissions were good, and that my \sfilename{hgweb.cgi} 14.865 +script was properly edited. 
14.866 + 14.867 +Once I had Apache running, getting \texttt{lighttpd} to serve the 14.868 +repository was a snap (in other words, even if you're trying to use 14.869 +\texttt{lighttpd}, you should read the Apache section). I first had 14.870 +to edit the \texttt{mod\_access} section of its config file to enable 14.871 +\texttt{mod\_cgi} and \texttt{mod\_userdir}, both of which were 14.872 +disabled by default on my system. I then added a few lines to the end 14.873 +of the config file, to configure these modules. 14.874 +\begin{codesample2} 14.875 + userdir.path = "public_html" 14.876 + cgi.assign = ( ".cgi" => "" ) 14.877 +\end{codesample2} 14.878 +With this done, \texttt{lighttpd} ran immediately for me. If I had 14.879 +configured \texttt{lighttpd} before Apache, I'd almost certainly have 14.880 +run into many of the same system-level configuration problems as I did 14.881 +with Apache. However, I found \texttt{lighttpd} to be noticeably 14.882 +easier to configure than Apache, even though I've used Apache for over 14.883 +a decade, and this was my first exposure to \texttt{lighttpd}. 14.884 + 14.885 +\subsection{Sharing multiple repositories with one CGI script} 14.886 + 14.887 +The \sfilename{hgweb.cgi} script only lets you publish a single 14.888 +repository, which is an annoying restriction. If you want to publish 14.889 +more than one without wracking yourself with multiple copies of the 14.890 +same script, each with different names, a better choice is to use the 14.891 +\sfilename{hgwebdir.cgi} script. 14.892 + 14.893 +The procedure to configure \sfilename{hgwebdir.cgi} is only a little 14.894 +more involved than for \sfilename{hgweb.cgi}. First, you must obtain 14.895 +a copy of the script. If you don't have one handy, you can download a 14.896 +copy from the master Mercurial repository at 14.897 +\url{http://www.selenic.com/repo/hg/raw-file/tip/hgwebdir.cgi}. 
14.898 + 14.899 +You'll need to copy this script into your \dirname{public\_html} 14.900 +directory, and ensure that it's executable. 14.901 +\begin{codesample2} 14.902 + cp .../hgwebdir.cgi ~/public_html 14.903 + chmod 755 ~/public_html ~/public_html/hgwebdir.cgi 14.904 +\end{codesample2} 14.905 +With basic configuration out of the way, try to visit 14.906 +\url{http://myhostname/~myuser/hgwebdir.cgi} in your browser. It 14.907 +should display an empty list of repositories. If you get a blank 14.908 +window or error message, try walking through the list of potential 14.909 +problems in section~\ref{sec:collab:wtf}. 14.910 + 14.911 +The \sfilename{hgwebdir.cgi} script relies on an external 14.912 +configuration file. By default, it searches for a file named 14.913 +\sfilename{hgweb.config} in the same directory as itself. You'll need 14.914 +to create this file, and make it world-readable. The format of the 14.915 +file is similar to a Windows ``ini'' file, as understood by Python's 14.916 +\texttt{ConfigParser}~\cite{web:configparser} module. 14.917 + 14.918 +The easiest way to configure \sfilename{hgwebdir.cgi} is with a 14.919 +section named \texttt{collections}. This will automatically publish 14.920 +\emph{every} repository under the directories you name. The section 14.921 +should look like this: 14.922 +\begin{codesample2} 14.923 + [collections] 14.924 + /my/root = /my/root 14.925 +\end{codesample2} 14.926 +Mercurial interprets this by looking at the directory name on the 14.927 +\emph{right} hand side of the ``\texttt{=}'' sign; finding 14.928 +repositories in that directory hierarchy; and using the text on the 14.929 +\emph{left} to strip off matching text from the names it will actually 14.930 +list in the web interface. The remaining component of a path after 14.931 +this stripping has occurred is called a ``virtual path''. 
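As a rough illustration of the stripping described above, here is a small Python sketch. It is not Mercurial's actual implementation: the \texttt{virtual\_path} helper and its arguments are names made up for this example, and it assumes Unix-style paths separated by ``\texttt{/}''.

```python
def virtual_path(repo_path, prefix):
    """Hypothetical helper: strip the leading `prefix` from `repo_path`.

    This mirrors the description in the text only; the real
    hgwebdir.cgi code may differ in details such as error handling.
    """
    prefix = prefix.rstrip("/")
    if not repo_path.startswith(prefix + "/"):
        raise ValueError("%s is not under %s" % (repo_path, prefix))
    # Drop the prefix and the slash that follows it.
    return repo_path[len(prefix) + 1:]


if __name__ == "__main__":
    print(virtual_path("/my/root/this/repo", "/my/root"))  # this/repo
    print(virtual_path("/my/root/this/repo", "/my"))       # root/this/repo
```

Under these assumptions, the longer the text on the left hand side of the ``\texttt{=}'' sign, the shorter the virtual path that remains.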
14.932 + 14.933 +Given the example above, if we have a repository whose local path is 14.934 +\dirname{/my/root/this/repo}, the CGI script will strip the leading 14.935 +\dirname{/my/root} from the name, and publish the repository with a 14.936 +virtual path of \dirname{this/repo}. If the base URL for our CGI 14.937 +script is \url{http://myhostname/~myuser/hgwebdir.cgi}, the complete 14.938 +URL for that repository will be 14.939 +\url{http://myhostname/~myuser/hgwebdir.cgi/this/repo}. 14.940 + 14.941 +If we replace \dirname{/my/root} on the left hand side of this example 14.942 +with \dirname{/my}, then \sfilename{hgwebdir.cgi} will only strip off 14.943 +\dirname{/my} from the repository name, and will give us a virtual 14.944 +path of \dirname{root/this/repo} instead of \dirname{this/repo}. 14.945 + 14.946 +The \sfilename{hgwebdir.cgi} script will recursively search each 14.947 +directory listed in the \texttt{collections} section of its 14.948 +configuration file, but it will \emph{not} recurse into the 14.949 +repositories it finds. 14.950 + 14.951 +The \texttt{collections} mechanism makes it easy to publish many 14.952 +repositories in a ``fire and forget'' manner. You only need to set up 14.953 +the CGI script and configuration file one time. Afterwards, you can 14.954 +publish or unpublish a repository at any time by simply moving it 14.955 +into, or out of, the directory hierarchy in which you've configured 14.956 +\sfilename{hgwebdir.cgi} to look. 14.957 + 14.958 +\subsubsection{Explicitly specifying which repositories to publish} 14.959 + 14.960 +In addition to the \texttt{collections} mechanism, the 14.961 +\sfilename{hgwebdir.cgi} script allows you to publish a specific list 14.962 +of repositories. To do so, create a \texttt{paths} section, with 14.963 +contents of the following form. 
14.964 +\begin{codesample2} 14.965 + [paths] 14.966 + repo1 = /my/path/to/some/repo 14.967 + repo2 = /some/path/to/another 14.968 +\end{codesample2} 14.969 +In this case, the virtual path (the component that will appear in a 14.970 +URL) is on the left hand side of each definition, while the path to 14.971 +the repository is on the right. Notice that there does not need to be 14.972 +any relationship between the virtual path you choose and the location 14.973 +of a repository in your filesystem. 14.974 + 14.975 +If you wish, you can use both the \texttt{collections} and 14.976 +\texttt{paths} mechanisms simultaneously in a single configuration 14.977 +file. 14.978 + 14.979 +\begin{note} 14.980 + If multiple repositories have the same virtual path, 14.981 + \sfilename{hgwebdir.cgi} will not report an error. Instead, it will 14.982 + behave unpredictably. 14.983 +\end{note} 14.984 + 14.985 +\subsection{Downloading source archives} 14.986 + 14.987 +Mercurial's web interface lets users download an archive of any 14.988 +revision. This archive will contain a snapshot of the working 14.989 +directory as of that revision, but it will not contain a copy of the 14.990 +repository data. 14.991 + 14.992 +By default, this feature is not enabled. To enable it, you'll need to 14.993 +add an \rcitem{web}{allow\_archive} item to the \rcsection{web} 14.994 +section of your \hgrc. 14.995 + 14.996 +\subsection{Web configuration options} 14.997 + 14.998 +Mercurial's web interfaces (the \hgcmd{serve} command, and the 14.999 +\sfilename{hgweb.cgi} and \sfilename{hgwebdir.cgi} scripts) have a 14.1000 +number of configuration options that you can set. These belong in a 14.1001 +section named \rcsection{web}. 14.1002 +\begin{itemize} 14.1003 +\item[\rcitem{web}{allow\_archive}] Determines which (if any) archive 14.1004 + download mechanisms Mercurial supports. 
If you enable this 14.1005 + feature, users of the web interface will be able to download an 14.1006 + archive of whatever revision of a repository they are viewing. 14.1007 + To enable the archive feature, this item must take the form of a 14.1008 + sequence of words drawn from the list below. 14.1009 + \begin{itemize} 14.1010 + \item[\texttt{bz2}] A \command{tar} archive, compressed using 14.1011 + \texttt{bzip2} compression. This has the best compression ratio, 14.1012 + but uses the most CPU time on the server. 14.1013 + \item[\texttt{gz}] A \command{tar} archive, compressed using 14.1014 + \texttt{gzip} compression. 14.1015 + \item[\texttt{zip}] A \command{zip} archive, compressed using Deflate 14.1016 + compression. This format has the worst compression ratio, but is 14.1017 + widely used in the Windows world. 14.1018 + \end{itemize} 14.1019 + If you provide an empty list, or don't have an 14.1020 + \rcitem{web}{allow\_archive} entry at all, this feature will be 14.1021 + disabled. Here is an example of how to enable all three supported 14.1022 + formats. 14.1023 + \begin{codesample4} 14.1024 + [web] 14.1025 + allow_archive = bz2 gz zip 14.1026 + \end{codesample4} 14.1027 +\item[\rcitem{web}{allowpull}] Boolean. Determines whether the web 14.1028 + interface allows remote users to \hgcmd{pull} and \hgcmd{clone} this 14.1029 + repository over~HTTP. If set to \texttt{no} or \texttt{false}, only 14.1030 + the ``human-oriented'' portion of the web interface is available. 14.1031 +\item[\rcitem{web}{contact}] String. A free-form (but preferably 14.1032 + brief) string identifying the person or group in charge of the 14.1033 + repository. This often contains the name and email address of a 14.1034 + person or mailing list. It often makes sense to place this entry in 14.1035 + a repository's own \sfilename{.hg/hgrc} file, but it can make sense 14.1036 + to use in a global \hgrc\ if every repository has a single 14.1037 + maintainer. 
14.1038 +\item[\rcitem{web}{maxchanges}] Integer. The default maximum number 14.1039 + of changesets to display in a single page of output. 14.1040 +\item[\rcitem{web}{maxfiles}] Integer. The default maximum number 14.1041 + of modified files to display in a single page of output. 14.1042 +\item[\rcitem{web}{stripes}] Integer. If the web interface displays 14.1043 + alternating ``stripes'' to make it easier to visually align rows 14.1044 + when you are looking at a table, this number controls the number of 14.1045 + rows in each stripe. 14.1046 +\item[\rcitem{web}{style}] Controls the template Mercurial uses to 14.1047 + display the web interface. Mercurial ships with two web templates, 14.1048 + named \texttt{default} and \texttt{gitweb} (the latter is much more 14.1049 + visually attractive). You can also specify a custom template of 14.1050 + your own; see chapter~\ref{chap:template} for details. Here, you 14.1051 + can see how to enable the \texttt{gitweb} style. 14.1052 + \begin{codesample4} 14.1053 + [web] 14.1054 + style = gitweb 14.1055 + \end{codesample4} 14.1056 +\item[\rcitem{web}{templates}] Path. The directory in which to search 14.1057 + for template files. By default, Mercurial searches in the directory 14.1058 + in which it was installed. 14.1059 +\end{itemize} 14.1060 +If you are using \sfilename{hgwebdir.cgi}, you can place a few 14.1061 +configuration items in a \rcsection{web} section of the 14.1062 +\sfilename{hgweb.config} file instead of a \hgrc\ file, for 14.1063 +convenience. These items are \rcitem{web}{motd} and 14.1064 +\rcitem{web}{style}. 14.1065 + 14.1066 +\subsubsection{Options specific to an individual repository} 14.1067 + 14.1068 +A few \rcsection{web} configuration items ought to be placed in a 14.1069 +repository's local \sfilename{.hg/hgrc}, rather than a user's or 14.1070 +global \hgrc. 14.1071 +\begin{itemize} 14.1072 +\item[\rcitem{web}{description}] String. 
A free-form (but preferably 14.1073 + brief) string that describes the contents or purpose of the 14.1074 + repository. 14.1075 +\item[\rcitem{web}{name}] String. The name to use for the repository 14.1076 + in the web interface. This overrides the default name, which is the 14.1077 + last component of the repository's path. 14.1078 +\end{itemize} 14.1079 + 14.1080 +\subsubsection{Options specific to the \hgcmd{serve} command} 14.1081 + 14.1082 +Some of the items in the \rcsection{web} section of a \hgrc\ file are 14.1083 +only for use with the \hgcmd{serve} command. 14.1084 +\begin{itemize} 14.1085 +\item[\rcitem{web}{accesslog}] Path. The name of a file into which to 14.1086 + write an access log. By default, the \hgcmd{serve} command writes 14.1087 + this information to standard output, not to a file. Log entries are 14.1088 + written in the standard ``combined'' file format used by almost all 14.1089 + web servers. 14.1090 +\item[\rcitem{web}{address}] String. The local address on which the 14.1091 + server should listen for incoming connections. By default, the 14.1092 + server listens on all addresses. 14.1093 +\item[\rcitem{web}{errorlog}] Path. The name of a file into which to 14.1094 + write an error log. By default, the \hgcmd{serve} command writes this 14.1095 + information to standard error, not to a file. 14.1096 +\item[\rcitem{web}{ipv6}] Boolean. Whether to use the IPv6 protocol. 14.1097 + By default, IPv6 is not used. 14.1098 +\item[\rcitem{web}{port}] Integer. The TCP~port number on which the 14.1099 + server should listen. The default port number used is~8000. 14.1100 +\end{itemize} 14.1101 + 14.1102 +\subsubsection{Choosing the right \hgrc\ file to add \rcsection{web} 14.1103 + items to} 14.1104 + 14.1105 +It is important to remember that a web server like Apache or 14.1106 +\texttt{lighttpd} will run under a user~ID that is different to yours. 
14.1107 +CGI scripts run by your server, such as \sfilename{hgweb.cgi}, will 14.1108 +usually also run under that user~ID. 14.1109 + 14.1110 +If you add \rcsection{web} items to your own personal \hgrc\ file, CGI 14.1111 +scripts won't read that \hgrc\ file. Those settings will thus only 14.1112 +affect the behaviour of the \hgcmd{serve} command when you run it. To 14.1113 +cause CGI scripts to see your settings, either create a \hgrc\ file in 14.1114 +the home directory of the user ID that runs your web server, or add 14.1115 +those settings to a system-wide \hgrc\ file. 14.1116 + 14.1117 + 14.1118 +%%% Local Variables: 14.1119 +%%% mode: latex 14.1120 +%%% TeX-master: "00book" 14.1121 +%%% End:
15.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 15.2 +++ b/en/ch07-filenames.tex Thu Jan 29 22:56:27 2009 -0800 15.3 @@ -0,0 +1,306 @@ 15.4 +\chapter{File names and pattern matching} 15.5 +\label{chap:names} 15.6 + 15.7 +Mercurial provides mechanisms that let you work with file names in a 15.8 +consistent and expressive way. 15.9 + 15.10 +\section{Simple file naming} 15.11 + 15.12 +Mercurial uses a unified piece of machinery ``under the hood'' to 15.13 +handle file names. Every command behaves uniformly with respect to 15.14 +file names. The way in which commands work with file names is as 15.15 +follows. 15.16 + 15.17 +If you explicitly name real files on the command line, Mercurial works 15.18 +with exactly those files, as you would expect. 15.19 +\interaction{filenames.files} 15.20 + 15.21 +When you provide a directory name, Mercurial will interpret this as 15.22 +``operate on every file in this directory and its subdirectories''. 15.23 +Mercurial traverses the files and subdirectories in a directory in 15.24 +alphabetical order. When it encounters a subdirectory, it will 15.25 +traverse that subdirectory before continuing with the current 15.26 +directory. 15.27 +\interaction{filenames.dirs} 15.28 + 15.29 +\section{Running commands without any file names} 15.30 + 15.31 +Mercurial's commands that work with file names have useful default 15.32 +behaviours when you invoke them without providing any file names or 15.33 +patterns. What kind of behaviour you should expect depends on what 15.34 +the command does. Here are a few rules of thumb you can use to 15.35 +predict what a command is likely to do if you don't give it any names 15.36 +to work with. 15.37 +\begin{itemize} 15.38 +\item Most commands will operate on the entire working directory. 15.39 + This is what the \hgcmd{add} command does, for example. 
15.40 +\item If the command has effects that are difficult or impossible to 15.41 + reverse, it will force you to explicitly provide at least one name 15.42 + or pattern (see below). This protects you from accidentally 15.43 + deleting files by running \hgcmd{remove} with no arguments, for 15.44 + example. 15.45 +\end{itemize} 15.46 + 15.47 +It's easy to work around these default behaviours if they don't suit 15.48 +you. If a command normally operates on the whole working directory, 15.49 +you can invoke it on just the current directory and its subdirectories 15.50 +by giving it the name ``\dirname{.}''. 15.51 +\interaction{filenames.wdir-subdir} 15.52 + 15.53 +Along the same lines, some commands normally print file names relative 15.54 +to the root of the repository, even if you're invoking them from a 15.55 +subdirectory. Such a command will print file names relative to your 15.56 +subdirectory if you give it explicit names. Here, we're going to run 15.57 +\hgcmd{status} from a subdirectory, and get it to operate on the 15.58 +entire working directory while printing file names relative to our 15.59 +subdirectory, by passing it the output of the \hgcmd{root} command. 15.60 +\interaction{filenames.wdir-relname} 15.61 + 15.62 +\section{Telling you what's going on} 15.63 + 15.64 +The \hgcmd{add} example in the preceding section illustrates something 15.65 +else that's helpful about Mercurial commands. If a command operates 15.66 +on a file that you didn't name explicitly on the command line, it will 15.67 +usually print the name of the file, so that you will not be surprised 15.68 +by what's going on. 15.69 + 15.70 +The principle here is of \emph{least surprise}. If you've exactly 15.71 +named a file on the command line, there's no point in repeating it 15.72 +back at you. If Mercurial is acting on a file \emph{implicitly}, 15.73 +because you provided no names, or a directory, or a pattern (see 15.74 +below), it's safest to tell you what it's doing. 
15.75 + 15.76 +For commands that behave this way, you can silence them using the 15.77 +\hggopt{-q} option. You can also get them to print the name of every 15.78 +file, even those you've named explicitly, using the \hggopt{-v} 15.79 +option. 15.80 + 15.81 +\section{Using patterns to identify files} 15.82 + 15.83 +In addition to working with file and directory names, Mercurial lets 15.84 +you use \emph{patterns} to identify files. Mercurial's pattern 15.85 +handling is expressive. 15.86 + 15.87 +On Unix-like systems (Linux, MacOS, etc.), the job of matching file 15.88 +names to patterns normally falls to the shell. On these systems, you 15.89 +must explicitly tell Mercurial that a name is a pattern. On Windows, 15.90 +the shell does not expand patterns, so Mercurial will automatically 15.91 +identify names that are patterns, and expand them for you. 15.92 + 15.93 +To provide a pattern in place of a regular name on the command line, 15.94 +the mechanism is simple: 15.95 +\begin{codesample2} 15.96 + syntax:patternbody 15.97 +\end{codesample2} 15.98 +That is, a pattern is identified by a short text string that says what 15.99 +kind of pattern this is, followed by a colon, followed by the actual 15.100 +pattern. 15.101 + 15.102 +Mercurial supports two kinds of pattern syntax. The most frequently 15.103 +used is called \texttt{glob}; this is the same kind of pattern 15.104 +matching used by the Unix shell, and should be familiar to Windows 15.105 +command prompt users, too. 15.106 + 15.107 +When Mercurial does automatic pattern matching on Windows, it uses 15.108 +\texttt{glob} syntax. You can thus omit the ``\texttt{glob:}'' prefix 15.109 +on Windows, but it's safe to use it, too. 15.110 + 15.111 +The \texttt{re} syntax is more powerful; it lets you specify patterns 15.112 +using regular expressions, also known as regexps. 
15.113 + 15.114 +By the way, in the examples that follow, notice that I'm careful to 15.115 +wrap all of my patterns in quote characters, so that they won't get 15.116 +expanded by the shell before Mercurial sees them. 15.117 + 15.118 +\subsection{Shell-style \texttt{glob} patterns} 15.119 + 15.120 +This is an overview of the kinds of patterns you can use when you're 15.121 +matching on glob patterns. 15.122 + 15.123 +The ``\texttt{*}'' character matches any string, within a single 15.124 +directory. 15.125 +\interaction{filenames.glob.star} 15.126 + 15.127 +The ``\texttt{**}'' pattern matches any string, and crosses directory 15.128 +boundaries. It's not a standard Unix glob token, but it's accepted by 15.129 +several popular Unix shells, and is very useful. 15.130 +\interaction{filenames.glob.starstar} 15.131 + 15.132 +The ``\texttt{?}'' pattern matches any single character. 15.133 +\interaction{filenames.glob.question} 15.134 + 15.135 +The ``\texttt{[}'' character begins a \emph{character class}. This 15.136 +matches any single character within the class. The class ends with a 15.137 +``\texttt{]}'' character. A class may contain multiple \emph{range}s 15.138 +of the form ``\texttt{a-f}'', which is shorthand for 15.139 +``\texttt{abcdef}''. 15.140 +\interaction{filenames.glob.range} 15.141 +If the first character after the ``\texttt{[}'' in a character class 15.142 +is a ``\texttt{!}'', it \emph{negates} the class, making it match any 15.143 +single character not in the class. 15.144 + 15.145 +A ``\texttt{\{}'' begins a group of subpatterns, where the whole group 15.146 +matches if any subpattern in the group matches. The ``\texttt{,}'' 15.147 +character separates subpatterns, and ``\texttt{\}}'' ends the group. 
15.148 +\interaction{filenames.glob.group} 15.149 + 15.150 +\subsubsection{Watch out!} 15.151 + 15.152 +Don't forget that if you want to match a pattern in any directory, you 15.153 +should not be using the ``\texttt{*}'' match-any token, as this will 15.154 +only match within one directory. Instead, use the ``\texttt{**}'' 15.155 +token. This small example illustrates the difference between the two. 15.156 +\interaction{filenames.glob.star-starstar} 15.157 + 15.158 +\subsection{Regular expression matching with \texttt{re} patterns} 15.159 + 15.160 +Mercurial accepts the same regular expression syntax as the Python 15.161 +programming language (it uses Python's regexp engine internally). 15.162 +This is based on the Perl language's regexp syntax, which is the most 15.163 +popular dialect in use (it's also used in Java, for example). 15.164 + 15.165 +I won't discuss Mercurial's regexp dialect in any detail here, as 15.166 +regexps are not often used. Perl-style regexps are in any case 15.167 +already exhaustively documented on a multitude of web sites, and in 15.168 +many books. Instead, I will focus here on a few things you should 15.169 +know if you find yourself needing to use regexps with Mercurial. 15.170 + 15.171 +A regexp is matched against an entire file name, relative to the root 15.172 +of the repository. In other words, even if you're already in 15.173 +subdirectory \dirname{foo}, if you want to match files under this 15.174 +directory, your pattern must start with ``\texttt{foo/}''. 15.175 + 15.176 +One thing to note, if you're familiar with Perl-style regexps, is that 15.177 +Mercurial's are \emph{rooted}. That is, a regexp starts matching 15.178 +against the beginning of a string; it doesn't look for a match 15.179 +anywhere within the string. To match anywhere in a string, start 15.180 +your pattern with ``\texttt{.*}''. 
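Since Mercurial uses Python's regexp engine, the rooted behaviour described above can be sketched with Python's own \texttt{re.match} function, which only tries to match at the beginning of a string. This is an illustration only, on the assumption that Mercurial's matching behaves like \texttt{re.match}; the \texttt{rooted\_match} helper and the sample path are made up for this example.

```python
import re


def rooted_match(pattern, path):
    """Return True if `pattern` matches at the very start of `path`.

    `rooted_match` is a hypothetical name; it simply wraps Python's
    re.match, which never searches for a match later in the string.
    """
    return re.match(pattern, path) is not None


if __name__ == "__main__":
    path = "foo/src/main.c"  # a name relative to the repository root
    print(rooted_match(r"src/", path))    # False: "src/" is not at the start
    print(rooted_match(r"foo/", path))    # True
    print(rooted_match(r".*src/", path))  # True: ".*" absorbs the "foo/" prefix
```

The last call shows why a leading ``\texttt{.*}'' is needed when the interesting part of the name is not at the start.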
15.181 + 15.182 +\section{Filtering files} 15.183 + 15.184 +Not only does Mercurial give you a variety of ways to specify files; 15.185 +it lets you further winnow those files using \emph{filters}. Commands 15.186 +that work with file names accept two filtering options. 15.187 +\begin{itemize} 15.188 +\item \hggopt{-I}, or \hggopt{--include}, lets you specify a pattern 15.189 + that file names must match in order to be processed. 15.190 +\item \hggopt{-X}, or \hggopt{--exclude}, gives you a way to 15.191 + \emph{avoid} processing files, if they match this pattern. 15.192 +\end{itemize} 15.193 +You can provide multiple \hggopt{-I} and \hggopt{-X} options on the 15.194 +command line, and intermix them as you please. Mercurial interprets 15.195 +the patterns you provide using glob syntax by default (but you can use 15.196 +regexps if you need to). 15.197 + 15.198 +You can read a \hggopt{-I} filter as ``process only the files that 15.199 +match this filter''. 15.200 +\interaction{filenames.filter.include} 15.201 +The \hggopt{-X} filter is best read as ``process only the files that 15.202 +don't match this pattern''. 15.203 +\interaction{filenames.filter.exclude} 15.204 + 15.205 +\section{Ignoring unwanted files and directories} 15.206 + 15.207 +XXX. 15.208 + 15.209 +\section{Case sensitivity} 15.210 +\label{sec:names:case} 15.211 + 15.212 +If you're working in a mixed development environment that contains 15.213 +both Linux (or other Unix) systems and Macs or Windows systems, you 15.214 +should keep in the back of your mind the knowledge that they treat the 15.215 +case (``N'' versus ``n'') of file names in incompatible ways. This is 15.216 +not very likely to affect you, and it's easy to deal with if it does, 15.217 +but it could surprise you if you don't know about it. 15.218 + 15.219 +Operating systems and filesystems differ in the way they handle the 15.220 +\emph{case} of characters in file and directory names. 
There are 15.221 +three common ways to handle case in names. 15.222 +\begin{itemize} 15.223 +\item Completely case insensitive. Uppercase and lowercase versions 15.224 + of a letter are treated as identical, both when creating a file and 15.225 + during subsequent accesses. This is common on older DOS-based 15.226 + systems. 15.227 +\item Case preserving, but insensitive. When a file or directory is 15.228 + created, the case of its name is stored, and can be retrieved and 15.229 + displayed by the operating system. When an existing file is being 15.230 + looked up, its case is ignored. This is the standard arrangement on 15.231 + Windows and MacOS. The names \filename{foo} and \filename{FoO} 15.232 + identify the same file. This treatment of uppercase and lowercase 15.233 + letters as interchangeable is also referred to as \emph{case 15.234 + folding}. 15.235 +\item Case sensitive. The case of a name is significant at all times. 15.236 + The names \filename{foo} and \filename{FoO} identify different files. This 15.237 + is the way Linux and Unix systems normally work. 15.238 +\end{itemize} 15.239 + 15.240 +On Unix-like systems, it is possible to have any or all of the above 15.241 +ways of handling case in action at once. For example, if you use a 15.242 +USB thumb drive formatted with a FAT32 filesystem on a Linux system, 15.243 +Linux will handle names on that filesystem in a case preserving, but 15.244 +insensitive, way. 15.245 + 15.246 +\subsection{Safe, portable repository storage} 15.247 + 15.248 +Mercurial's repository storage mechanism is \emph{case safe}. It 15.249 +translates file names so that they can be safely stored on both case 15.250 +sensitive and case insensitive filesystems. This means that you can 15.251 +use normal file copying tools to transfer a Mercurial repository onto, 15.252 +for example, a USB thumb drive, and safely move that drive and 15.253 +repository back and forth between a Mac, a PC running Windows, and a 15.254 +Linux box.
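The difference between the case handling schemes described above can be sketched in a few lines of Python. This is only an illustration, not Mercurial's own code: the function name \texttt{case\_collisions} is made up, and it uses \texttt{str.lower()} as a rough stand-in for a filesystem's case folding rules, which differ in detail from one filesystem to another.

```python
from collections import defaultdict


def case_collisions(names):
    """Group names that a case insensitive filesystem would treat as one.

    Assumption: str.lower() approximates the filesystem's case folding.
    Returns a sorted list of groups; each group has two or more names.
    """
    groups = defaultdict(set)
    for name in names:
        groups[name.lower()].add(name)
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)


# On a case sensitive filesystem these are three distinct files.  On a
# case preserving, but insensitive, one, foo and FoO name the same file.
print(case_collisions(["foo", "FoO", "bar"]))
```

A list of names with no collisions produces an empty result, which is what you would want to see before copying a set of files onto a case insensitive filesystem.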
15.255 + 15.256 +\subsection{Detecting case conflicts} 15.257 + 15.258 +When operating in the working directory, Mercurial honours the naming 15.259 +policy of the filesystem where the working directory is located. If 15.260 +the filesystem is case preserving, but insensitive, Mercurial will 15.261 +treat names that differ only in case as the same. 15.262 + 15.263 +An important aspect of this approach is that it is possible to commit 15.264 +a changeset on a case sensitive (typically Linux or Unix) filesystem 15.265 +that will cause trouble for users on case insensitive (usually Windows 15.266 +and MacOS) filesystems. If a Linux user commits changes to two files, one 15.267 +named \filename{myfile.c} and the other named \filename{MyFile.C}, 15.268 +they will be stored correctly in the repository. And in the working 15.269 +directories of other Linux users, they will be correctly represented 15.270 +as separate files. 15.271 + 15.272 +If a Windows or Mac user pulls this change, they will not initially 15.273 +have a problem, because Mercurial's repository storage mechanism is 15.274 +case safe. However, once they try to \hgcmd{update} the working 15.275 +directory to that changeset, or \hgcmd{merge} with that changeset, 15.276 +Mercurial will spot the conflict between the two file names that the 15.277 +filesystem would treat as the same, and forbid the update or merge 15.278 +from occurring. 15.279 + 15.280 +\subsection{Fixing a case conflict} 15.281 + 15.282 +If you are using Windows or a Mac in a mixed environment where some of 15.283 +your collaborators are using Linux or Unix, and Mercurial reports a 15.284 +case folding conflict when you try to \hgcmd{update} or \hgcmd{merge}, 15.285 +the procedure to fix the problem is simple.
15.286 + 15.287 +Just find a nearby Linux or Unix box, clone the problem repository 15.288 +onto it, and use Mercurial's \hgcmd{rename} command to change the 15.289 +names of any offending files or directories so that they will no 15.290 +longer cause case folding conflicts. Commit this change, \hgcmd{pull} 15.291 +or \hgcmd{push} it across to your Windows or MacOS system, and 15.292 +\hgcmd{update} to the revision with the non-conflicting names. 15.293 + 15.294 +The changeset with case-conflicting names will remain in your 15.295 +project's history, and you still won't be able to \hgcmd{update} your 15.296 +working directory to that changeset on a Windows or MacOS system, but 15.297 +you can continue development unimpeded. 15.298 + 15.299 +\begin{note} 15.300 + Prior to version~0.9.3, Mercurial did not use a case safe repository 15.301 + storage mechanism, and did not detect case folding conflicts. If 15.302 + you are using an older version of Mercurial on Windows or MacOS, I 15.303 + strongly recommend that you upgrade. 15.304 +\end{note} 15.305 + 15.306 +%%% Local Variables: 15.307 +%%% mode: latex 15.308 +%%% TeX-master: "00book" 15.309 +%%% End:
16.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 16.2 +++ b/en/ch08-branch.tex Thu Jan 29 22:56:27 2009 -0800 16.3 @@ -0,0 +1,392 @@ 16.4 +\chapter{Managing releases and branchy development} 16.5 +\label{chap:branch} 16.6 + 16.7 +Mercurial provides several mechanisms for you to manage a project that 16.8 +is making progress on multiple fronts at once. To understand these 16.9 +mechanisms, let's first take a brief look at a fairly normal software 16.10 +project structure. 16.11 + 16.12 +Many software projects issue periodic ``major'' releases that contain 16.13 +substantial new features. In parallel, they may issue ``minor'' 16.14 +releases. These are usually identical to the major releases off which 16.15 +they're based, but with a few bugs fixed. 16.16 + 16.17 +In this chapter, we'll start by talking about how to keep records of 16.18 +project milestones such as releases. We'll then continue on to talk 16.19 +about the flow of work between different phases of a project, and how 16.20 +Mercurial can help you to isolate and manage this work. 16.21 + 16.22 +\section{Giving a persistent name to a revision} 16.23 + 16.24 +Once you decide that you'd like to call a particular revision a 16.25 +``release'', it's a good idea to record the identity of that revision. 16.26 +This will let you reproduce that release at a later date, for whatever 16.27 +purpose you might need at the time (reproducing a bug, porting to a 16.28 +new platform, etc). 16.29 +\interaction{tag.init} 16.30 + 16.31 +Mercurial lets you give a permanent name to any revision using the 16.32 +\hgcmd{tag} command. Not surprisingly, these names are called 16.33 +``tags''. 16.34 +\interaction{tag.tag} 16.35 + 16.36 +A tag is nothing more than a ``symbolic name'' for a revision. Tags 16.37 +exist purely for your convenience, so that you have a handy permanent 16.38 +way to refer to a revision; Mercurial doesn't interpret the tag names 16.39 +you use in any way. 
Neither does Mercurial place any restrictions on 16.40 +the name of a tag, beyond a few that are necessary to ensure that a 16.41 +tag can be parsed unambiguously. A tag name cannot contain any of the 16.42 +following characters: 16.43 +\begin{itemize} 16.44 +\item Colon (ASCII 58, ``\texttt{:}'') 16.45 +\item Carriage return (ASCII 13, ``\Verb+\r+'') 16.46 +\item Newline (ASCII 10, ``\Verb+\n+'') 16.47 +\end{itemize} 16.48 + 16.49 +You can use the \hgcmd{tags} command to display the tags present in 16.50 +your repository. In the output, each tagged revision is identified 16.51 +first by its name, then by revision number, and finally by the unique 16.52 +hash of the revision. 16.53 +\interaction{tag.tags} 16.54 +Notice that \texttt{tip} is listed in the output of \hgcmd{tags}. The 16.55 +\texttt{tip} tag is a special ``floating'' tag, which always 16.56 +identifies the newest revision in the repository. 16.57 + 16.58 +In the output of the \hgcmd{tags} command, tags are listed in reverse 16.59 +order, by revision number. This usually means that recent tags are 16.60 +listed before older tags. It also means that \texttt{tip} is always 16.61 +going to be the first tag listed in the output of \hgcmd{tags}. 16.62 + 16.63 +When you run \hgcmd{log}, if it displays a revision that has tags 16.64 +associated with it, it will print those tags. 16.65 +\interaction{tag.log} 16.66 + 16.67 +Any time you need to provide a revision~ID to a Mercurial command, the 16.68 +command will accept a tag name in its place. Internally, Mercurial 16.69 +will translate your tag name into the corresponding revision~ID, then 16.70 +use that. 16.71 +\interaction{tag.log.v1.0} 16.72 + 16.73 +There's no limit on the number of tags you can have in a repository, 16.74 +or on the number of tags that a single revision can have. 
As a 16.75 +practical matter, it's not a great idea to have ``too many'' (a number 16.76 +which will vary from project to project), simply because tags are 16.77 +supposed to help you to find revisions. If you have lots of tags, the 16.78 +ease of using them to identify revisions diminishes rapidly. 16.79 + 16.80 +For example, if your project has milestones as frequent as every few 16.81 +days, it's perfectly reasonable to tag each one of those. But if you 16.82 +have a continuous build system that makes sure every revision can be 16.83 +built cleanly, you'd be introducing a lot of noise if you were to tag 16.84 +every clean build. Instead, you could tag failed builds (on the 16.85 +assumption that they're rare!), or simply not use tags to track 16.86 +buildability. 16.87 + 16.88 +If you want to remove a tag that you no longer want, use 16.89 +\hgcmdargs{tag}{--remove}. 16.90 +\interaction{tag.remove} 16.91 +You can also modify a tag at any time, so that it identifies a 16.92 +different revision, by simply issuing a new \hgcmd{tag} command. 16.93 +You'll have to use the \hgopt{tag}{-f} option to tell Mercurial that 16.94 +you \emph{really} want to update the tag. 16.95 +\interaction{tag.replace} 16.96 +There will still be a permanent record of the previous identity of the 16.97 +tag, but Mercurial will no longer use it. There's thus no penalty to 16.98 +tagging the wrong revision; all you have to do is turn around and tag 16.99 +the correct revision once you discover your error. 16.100 + 16.101 +Mercurial stores tags in a normal revision-controlled file in your 16.102 +repository. If you've created any tags, you'll find them in a file 16.103 +named \sfilename{.hgtags}. When you run the \hgcmd{tag} command, 16.104 +Mercurial modifies this file, then automatically commits the change to 16.105 +it. This means that every time you run \hgcmd{tag}, you'll see a 16.106 +corresponding changeset in the output of \hgcmd{log}. 
16.107 +\interaction{tag.tip} 16.108 + 16.109 +\subsection{Handling tag conflicts during a merge} 16.110 + 16.111 +You won't often need to care about the \sfilename{.hgtags} file, but 16.112 +it sometimes makes its presence known during a merge. The format of 16.113 +the file is simple: it consists of a series of lines. Each line 16.114 +starts with a changeset hash, followed by a space, followed by the 16.115 +name of a tag. 16.116 + 16.117 +If you're resolving a conflict in the \sfilename{.hgtags} file during 16.118 +a merge, there's one twist to modifying the \sfilename{.hgtags} file: 16.119 +when Mercurial is parsing the tags in a repository, it \emph{never} 16.120 +reads the working copy of the \sfilename{.hgtags} file. Instead, it 16.121 +reads the \emph{most recently committed} revision of the file. 16.122 + 16.123 +An unfortunate consequence of this design is that you can't actually 16.124 +verify that your merged \sfilename{.hgtags} file is correct until 16.125 +\emph{after} you've committed a change. So if you find yourself 16.126 +resolving a conflict on \sfilename{.hgtags} during a merge, be sure to 16.127 +run \hgcmd{tags} after you commit. If it finds an error in the 16.128 +\sfilename{.hgtags} file, it will report the location of the error, 16.129 +which you can then fix and commit. You should then run \hgcmd{tags} 16.130 +again, just to be sure that your fix is correct. 16.131 + 16.132 +\subsection{Tags and cloning} 16.133 + 16.134 +You may have noticed that the \hgcmd{clone} command has a 16.135 +\hgopt{clone}{-r} option that lets you clone an exact copy of the 16.136 +repository as of a particular changeset. The new clone will not 16.137 +contain any project history that comes after the revision you 16.138 +specified. This has an interaction with tags that can surprise the 16.139 +unwary. 
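As an aside, the \sfilename{.hgtags} format described in the previous subsection is simple enough to read with a few lines of Python. The sketch below is my own illustration, not Mercurial's parser: the function name \texttt{parse\_hgtags} and the sample hashes are made up, and it assumes that when a tag name appears more than once, the later line is the one that counts.

```python
def parse_hgtags(text):
    """Parse .hgtags-style text into a dict mapping tag name to hash.

    Each non-empty line is expected to hold a changeset hash, a single
    space, and then the tag name.  Assumption: a later line for the
    same tag name replaces an earlier one.
    """
    tags = {}
    for lineno, line in enumerate(text.split("\n"), start=1):
        if not line:
            continue
        node, sep, name = line.partition(" ")
        if not sep or not name:
            # Report where the problem is, so that it is easy to fix.
            raise ValueError("line %d: expected '<hash> <tag name>'" % lineno)
        tags[name] = node
    return tags


# Made-up, shortened hashes; the second v1.0 line retags the release.
sample = "aaa111 v1.0\nbbb222 v1.1\nccc333 v1.0\n"
print(parse_hgtags(sample))
```

Splitting on the first space only means that the remainder of the line, whatever it contains, is taken as the tag name.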
16.140 + 16.141 +Recall that a tag is stored as a revision to the \sfilename{.hgtags} 16.142 +file, so that when you create a tag, the changeset in which it's 16.143 +recorded necessarily refers to an older changeset. When you run 16.144 +\hgcmdargs{clone}{-r foo} to clone a repository as of tag 16.145 +\texttt{foo}, the new clone \emph{will not contain the history that 16.146 + created the tag} that you used to clone the repository. The result 16.147 +is that you'll get exactly the right subset of the project's history 16.148 +in the new repository, but \emph{not} the tag you might have expected. 16.149 + 16.150 +\subsection{When permanent tags are too much} 16.151 + 16.152 +Since Mercurial's tags are revision controlled and carried around with 16.153 +a project's history, everyone you work with will see the tags you 16.154 +create. But giving names to revisions has uses beyond simply noting 16.155 +that revision \texttt{4237e45506ee} is really \texttt{v2.0.2}. If 16.156 +you're trying to track down a subtle bug, you might want a tag to 16.157 +remind you of something like ``Anne saw the symptoms with this 16.158 +revision''. 16.159 + 16.160 +For cases like this, what you might want to use are \emph{local} tags. 16.161 +You can create a local tag with the \hgopt{tag}{-l} option to the 16.162 +\hgcmd{tag} command. This will store the tag in a file called 16.163 +\sfilename{.hg/localtags}. Unlike \sfilename{.hgtags}, 16.164 +\sfilename{.hg/localtags} is not revision controlled. Any tags you 16.165 +create using \hgopt{tag}{-l} remain strictly local to the repository 16.166 +you're currently working in. 16.167 + 16.168 +\section{The flow of changes---big picture vs. little} 16.169 + 16.170 +To return to the outline I sketched at the beginning of the chapter, 16.171 +let's think about a project that has multiple concurrent pieces of 16.172 +work under development at once.
16.173 + 16.174 +There might be a push for a new ``main'' release; a new minor bugfix 16.175 +release to the last main release; and an unexpected ``hot fix'' to an 16.176 +old release that is now in maintenance mode. 16.177 + 16.178 +The usual way people refer to these different concurrent directions of 16.179 +development is as ``branches''. However, we've already seen numerous 16.180 +times that Mercurial treats \emph{all of history} as a series of 16.181 +branches and merges. Really, what we have here is two ideas that are 16.182 +peripherally related, but which happen to share a name. 16.183 +\begin{itemize} 16.184 +\item ``Big picture'' branches represent the sweep of a project's 16.185 + evolution; people give them names, and talk about them in 16.186 + conversation. 16.187 +\item ``Little picture'' branches are artefacts of the day-to-day 16.188 + activity of developing and merging changes. They expose the 16.189 + narrative of how the code was developed. 16.190 +\end{itemize} 16.191 + 16.192 +\section{Managing big-picture branches in repositories} 16.193 + 16.194 +The easiest way to isolate a ``big picture'' branch in Mercurial is in 16.195 +a dedicated repository. If you have an existing shared 16.196 +repository---let's call it \texttt{myproject}---that reaches a ``1.0'' 16.197 +milestone, you can start to prepare for future maintenance releases on 16.198 +top of version~1.0 by tagging the revision from which you prepared 16.199 +the~1.0 release. 16.200 +\interaction{branch-repo.tag} 16.201 +You can then clone a new shared \texttt{myproject-1.0.1} repository as 16.202 +of that tag. 16.203 +\interaction{branch-repo.clone} 16.204 + 16.205 +Afterwards, if someone needs to work on a bug fix that ought to go 16.206 +into an upcoming~1.0.1 minor release, they clone the 16.207 +\texttt{myproject-1.0.1} repository, make their changes, and push them 16.208 +back. 
16.209 +\interaction{branch-repo.bugfix} 16.210 +Meanwhile, development for the next major release can continue, 16.211 +isolated and unabated, in the \texttt{myproject} repository. 16.212 +\interaction{branch-repo.new} 16.213 + 16.214 +\section{Don't repeat yourself: merging across branches} 16.215 + 16.216 +In many cases, if you have a bug to fix on a maintenance branch, the 16.217 +chances are good that the bug exists on your project's main branch 16.218 +(and possibly other maintenance branches, too). It's a rare developer 16.219 +who wants to fix the same bug multiple times, so let's look at a few 16.220 +ways that Mercurial can help you to manage these bugfixes without 16.221 +duplicating your work. 16.222 + 16.223 +In the simplest instance, all you need to do is pull changes from your 16.224 +maintenance branch into your local clone of the target branch. 16.225 +\interaction{branch-repo.pull} 16.226 +You'll then need to merge the heads of the two branches, and push back 16.227 +to the main branch. 16.228 +\interaction{branch-repo.merge} 16.229 + 16.230 +\section{Naming branches within one repository} 16.231 + 16.232 +In most instances, isolating branches in repositories is the right 16.233 +approach. Its simplicity makes it easy to understand; and so it's 16.234 +hard to make mistakes. There's a one-to-one relationship between 16.235 +branches you're working in and directories on your system. This lets 16.236 +you use normal (non-Mercurial-aware) tools to work on files within a 16.237 +branch/repository. 16.238 + 16.239 +If you're more in the ``power user'' category (\emph{and} your 16.240 +collaborators are too), there is an alternative way of handling 16.241 +branches that you can consider. I've already mentioned the 16.242 +human-level distinction between ``small picture'' and ``big picture'' 16.243 +branches. 
While Mercurial works with multiple ``small picture'' 16.244 +branches in a repository all the time (for example after you pull 16.245 +changes in, but before you merge them), it can \emph{also} work with 16.246 +multiple ``big picture'' branches. 16.247 + 16.248 +The key to working this way is that Mercurial lets you assign a 16.249 +persistent \emph{name} to a branch. There always exists a branch 16.250 +named \texttt{default}. Even before you start naming branches 16.251 +yourself, you can find traces of the \texttt{default} branch if you 16.252 +look for them. 16.253 + 16.254 +As an example, when you run the \hgcmd{commit} command, and it pops up 16.255 +your editor so that you can enter a commit message, look for a line 16.256 +that contains the text ``\texttt{HG: branch default}'' at the bottom. 16.257 +This is telling you that your commit will occur on the branch named 16.258 +\texttt{default}. 16.259 + 16.260 +To start working with named branches, use the \hgcmd{branches} 16.261 +command. This command lists the named branches already present in 16.262 +your repository, telling you which changeset is the tip of each. 16.263 +\interaction{branch-named.branches} 16.264 +Since you haven't created any named branches yet, the only one that 16.265 +exists is \texttt{default}. 16.266 + 16.267 +To find out what the ``current'' branch is, run the \hgcmd{branch} 16.268 +command, giving it no arguments. This tells you what branch the 16.269 +parent of the current changeset is on. 16.270 +\interaction{branch-named.branch} 16.271 + 16.272 +To create a new branch, run the \hgcmd{branch} command again. This 16.273 +time, give it one argument: the name of the branch you want to create. 16.274 +\interaction{branch-named.create} 16.275 + 16.276 +After you've created a branch, you might wonder what effect the 16.277 +\hgcmd{branch} command has had. What do the \hgcmd{status} and 16.278 +\hgcmd{tip} commands report? 
16.279 +\interaction{branch-named.status} 16.280 +Nothing has changed in the working directory, and there's been no new 16.281 +history created. As this suggests, running the \hgcmd{branch} command 16.282 +has no permanent effect; it only tells Mercurial what branch name to 16.283 +use the \emph{next} time you commit a changeset. 16.284 + 16.285 +When you commit a change, Mercurial records the name of the branch on 16.286 +which you committed. Once you've switched from the \texttt{default} 16.287 +branch to another and committed, you'll see the name of the new branch 16.288 +show up in the output of \hgcmd{log}, \hgcmd{tip}, and other commands 16.289 +that display the same kind of output. 16.290 +\interaction{branch-named.commit} 16.291 +The \hgcmd{log}-like commands will print the branch name of every 16.292 +changeset that's not on the \texttt{default} branch. As a result, if 16.293 +you never use named branches, you'll never see this information. 16.294 + 16.295 +Once you've named a branch and committed a change with that name, 16.296 +every subsequent commit that descends from that change will inherit 16.297 +the same branch name. You can change the name of a branch at any 16.298 +time, using the \hgcmd{branch} command. 16.299 +\interaction{branch-named.rebranch} 16.300 +In practice, this is something you won't do very often, as branch 16.301 +names tend to have fairly long lifetimes. (This isn't a rule, just an 16.302 +observation.) 16.303 + 16.304 +\section{Dealing with multiple named branches in a repository} 16.305 + 16.306 +If you have more than one named branch in a repository, Mercurial will 16.307 +remember the branch that your working directory is on when you start a 16.308 +command like \hgcmd{update} or \hgcmdargs{pull}{-u}. It will update 16.309 +the working directory to the tip of this branch, no matter what the 16.310 +``repo-wide'' tip is.
To update to a revision that's on a different 16.311 +named branch, you may need to use the \hgopt{update}{-C} option to 16.312 +\hgcmd{update}. 16.313 + 16.314 +This behaviour is a little subtle, so let's see it in action. First, 16.315 +let's remind ourselves what branch we're currently on, and what 16.316 +branches are in our repository. 16.317 +\interaction{branch-named.parents} 16.318 +We're on the \texttt{bar} branch, but there also exists an older 16.319 +\texttt{foo} branch. 16.320 + 16.321 +We can \hgcmd{update} back and forth between the tips of the 16.322 +\texttt{foo} and \texttt{bar} branches without needing to use the 16.323 +\hgopt{update}{-C} option, because this only involves going backwards 16.324 +and forwards linearly through our change history. 16.325 +\interaction{branch-named.update-switchy} 16.326 + 16.327 +If we go back to the \texttt{foo} branch and then run \hgcmd{update}, 16.328 +it will keep us on \texttt{foo}, not move us to the tip of 16.329 +\texttt{bar}. 16.330 +\interaction{branch-named.update-nothing} 16.331 + 16.332 +Committing a new change on the \texttt{foo} branch introduces a new 16.333 +head. 16.334 +\interaction{branch-named.foo-commit} 16.335 + 16.336 +\section{Branch names and merging} 16.337 + 16.338 +As you've probably noticed, merges in Mercurial are not symmetrical. 16.339 +Let's say our repository has two heads, 17 and 23. If I 16.340 +\hgcmd{update} to 17 and then \hgcmd{merge} with 23, Mercurial records 16.341 +17 as the first parent of the merge, and 23 as the second. Whereas if 16.342 +I \hgcmd{update} to 23 and then \hgcmd{merge} with 17, it records 23 16.343 +as the first parent, and 17 as the second. 16.344 + 16.345 +This affects Mercurial's choice of branch name when you merge. After 16.346 +a merge, Mercurial will retain the branch name of the first parent 16.347 +when you commit the result of the merge.
If your first parent's 16.348 +branch name is \texttt{foo}, and you merge with \texttt{bar}, the 16.349 +branch name will still be \texttt{foo} after you merge. 16.350 + 16.351 +It's not unusual for a repository to contain multiple heads, each with 16.352 +the same branch name. Let's say I'm working on the \texttt{foo} 16.353 +branch, and so are you. We commit different changes; I pull your 16.354 +changes; I now have two heads, each claiming to be on the \texttt{foo} 16.355 +branch. The result of a merge will be a single head on the 16.356 +\texttt{foo} branch, as you might hope. 16.357 + 16.358 +But if I'm working on the \texttt{bar} branch, and I merge work from 16.359 +the \texttt{foo} branch, the result will remain on the \texttt{bar} 16.360 +branch. 16.361 +\interaction{branch-named.merge} 16.362 + 16.363 +To give a more concrete example, if I'm working on the 16.364 +\texttt{bleeding-edge} branch, and I want to bring in the latest fixes 16.365 +from the \texttt{stable} branch, Mercurial will choose the ``right'' 16.366 +(\texttt{bleeding-edge}) branch name when I pull and merge from 16.367 +\texttt{stable}. 16.368 + 16.369 +\section{Branch naming is generally useful} 16.370 + 16.371 +You shouldn't think of named branches as applicable only to situations 16.372 +where you have multiple long-lived branches cohabiting in a single 16.373 +repository. They're very useful even in the one-branch-per-repository 16.374 +case. 16.375 + 16.376 +In the simplest case, giving a name to each branch gives you a 16.377 +permanent record of which branch a changeset originated on. This 16.378 +gives you more context when you're trying to follow the history of a 16.379 +long-lived branchy project. 16.380 + 16.381 +If you're working with shared repositories, you can set up a 16.382 +\hook{pretxnchangegroup} hook on each that will block incoming changes 16.383 +that have the ``wrong'' branch name. 
This provides a simple, but 16.384 +effective, defence against people accidentally pushing changes from a 16.385 +``bleeding edge'' branch to a ``stable'' branch. Such a hook might 16.386 +look like this inside the shared repo's \hgrc. 16.387 +\begin{codesample2} 16.388 + [hooks] 16.389 + pretxnchangegroup.branch = hg heads --template '{branches} ' | grep mybranch 16.390 +\end{codesample2} 16.391 + 16.392 +%%% Local Variables: 16.393 +%%% mode: latex 16.394 +%%% TeX-master: "00book" 16.395 +%%% End:
17.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 17.2 +++ b/en/ch09-undo.tex Thu Jan 29 22:56:27 2009 -0800 17.3 @@ -0,0 +1,767 @@ 17.4 +\chapter{Finding and fixing your mistakes} 17.5 +\label{chap:undo} 17.6 + 17.7 +To err might be human, but to really handle the consequences well 17.8 +takes a top-notch revision control system. In this chapter, we'll 17.9 +discuss some of the techniques you can use when you find that a 17.10 +problem has crept into your project. Mercurial has some highly 17.11 +capable features that will help you to isolate the sources of 17.12 +problems, and to handle them appropriately. 17.13 + 17.14 +\section{Erasing local history} 17.15 + 17.16 +\subsection{The accidental commit} 17.17 + 17.18 +I have the occasional but persistent problem of typing rather more 17.19 +quickly than I can think, which sometimes results in me committing a 17.20 +changeset that is either incomplete or plain wrong. In my case, the 17.21 +usual kind of incomplete changeset is one in which I've created a new 17.22 +source file, but forgotten to \hgcmd{add} it. A ``plain wrong'' 17.23 +changeset is not as common, but no less annoying. 17.24 + 17.25 +\subsection{Rolling back a transaction} 17.26 +\label{sec:undo:rollback} 17.27 + 17.28 +In section~\ref{sec:concepts:txn}, I mentioned that Mercurial treats 17.29 +each modification of a repository as a \emph{transaction}. Every time 17.30 +you commit a changeset or pull changes from another repository, 17.31 +Mercurial remembers what you did. You can undo, or \emph{roll back}, 17.32 +exactly one of these actions using the \hgcmd{rollback} command. (See 17.33 +section~\ref{sec:undo:rollback-after-push} for an important caveat 17.34 +about the use of this command.) 17.35 + 17.36 +Here's a mistake that I often find myself making: committing a change 17.37 +in which I've created a new file, but forgotten to \hgcmd{add} it. 
17.38 +\interaction{rollback.commit} 17.39 +Looking at the output of \hgcmd{status} after the commit immediately 17.40 +confirms the error. 17.41 +\interaction{rollback.status} 17.42 +The commit captured the changes to the file \filename{a}, but not the 17.43 +new file \filename{b}. If I were to push this changeset to a 17.44 +repository that I shared with a colleague, the chances are high that 17.45 +something in \filename{a} would refer to \filename{b}, which would not 17.46 +be present in their repository when they pulled my changes. I would 17.47 +thus become the object of some indignation. 17.48 + 17.49 +However, luck is with me---I've caught my error before I pushed the 17.50 +changeset. I use the \hgcmd{rollback} command, and Mercurial makes 17.51 +that last changeset vanish. 17.52 +\interaction{rollback.rollback} 17.53 +Notice that the changeset is no longer present in the repository's 17.54 +history, and the working directory once again thinks that the file 17.55 +\filename{a} is modified. The commit and rollback have left the 17.56 +working directory exactly as it was prior to the commit; the changeset 17.57 +has been completely erased. I can now safely \hgcmd{add} the file 17.58 +\filename{b}, and rerun my commit. 17.59 +\interaction{rollback.add} 17.60 + 17.61 +\subsection{The erroneous pull} 17.62 + 17.63 +It's common practice with Mercurial to maintain separate development 17.64 +branches of a project in different repositories. Your development 17.65 +team might have one shared repository for your project's ``0.9'' 17.66 +release, and another, containing different changes, for the ``1.0'' 17.67 +release. 17.68 + 17.69 +Given this, you can imagine that the consequences could be messy if 17.70 +you had a local ``0.9'' repository, and accidentally pulled changes 17.71 +from the shared ``1.0'' repository into it. 
At worst, you could be 17.72 +paying insufficient attention, and push those changes into the shared 17.73 +``0.9'' tree, confusing your entire team (but don't worry, we'll 17.74 +return to this horror scenario later). However, it's more likely that 17.75 +you'll notice immediately, because Mercurial will display the URL it's 17.76 +pulling from, or you will see it pull a suspiciously large number of 17.77 +changes into the repository. 17.78 + 17.79 +The \hgcmd{rollback} command will work nicely to expunge all of the 17.80 +changesets that you just pulled. Mercurial groups all changes from 17.81 +one \hgcmd{pull} into a single transaction, so one \hgcmd{rollback} is 17.82 +all you need to undo this mistake. 17.83 + 17.84 +\subsection{Rolling back is useless once you've pushed} 17.85 +\label{sec:undo:rollback-after-push} 17.86 + 17.87 +The value of the \hgcmd{rollback} command drops to zero once you've 17.88 +pushed your changes to another repository. Rolling back a change 17.89 +makes it disappear entirely, but \emph{only} in the repository in 17.90 +which you perform the \hgcmd{rollback}. Because a rollback eliminates 17.91 +history, there's no way for the disappearance of a change to propagate 17.92 +between repositories. 17.93 + 17.94 +If you've pushed a change to another repository---particularly if it's 17.95 +a shared repository---it has essentially ``escaped into the wild,'' 17.96 +and you'll have to recover from your mistake in a different way. What 17.97 +will happen if you push a changeset somewhere, then roll it back, then 17.98 +pull from the repository you pushed to, is that the changeset will 17.99 +reappear in your repository. 
17.100 + 17.101 +(If you absolutely know for sure that the change you want to roll back 17.102 +is the most recent change in the repository that you pushed to, 17.103 +\emph{and} you know that nobody else could have pulled it from that 17.104 +repository, you can roll back the changeset there, too, but you really 17.105 +should not rely on this working reliably. If you do this, 17.106 +sooner or later a change really will make it into a repository that 17.107 +you don't directly control (or have forgotten about), and come back to 17.108 +bite you.) 17.109 + 17.110 +\subsection{You can only roll back once} 17.111 + 17.112 +Mercurial stores exactly one transaction in its transaction log; that 17.113 +transaction is the most recent one that occurred in the repository. 17.114 +This means that you can only roll back one transaction. If you expect 17.115 +to be able to roll back one transaction, then its predecessor, this is 17.116 +not the behaviour you will get. 17.117 +\interaction{rollback.twice} 17.118 +Once you've rolled back one transaction in a repository, you can't 17.119 +roll back again in that repository until you perform another commit or 17.120 +pull. 17.121 + 17.122 +\section{Reverting the mistaken change} 17.123 + 17.124 +If you make a modification to a file, and decide that you really 17.125 +didn't want to change the file at all, and you haven't yet committed 17.126 +your changes, the \hgcmd{revert} command is the one you'll need. It 17.127 +looks at the changeset that's the parent of the working directory, and 17.128 +restores the contents of the file to their state as of that changeset. 17.129 +(That's a long-winded way of saying that, in the normal case, it 17.130 +undoes your modifications.) 17.131 + 17.132 +Let's illustrate how the \hgcmd{revert} command works with yet another 17.133 +small example. We'll begin by modifying a file that Mercurial is 17.134 +already tracking.
17.135 +\interaction{daily.revert.modify} 17.136 +If we don't want that change, we can simply \hgcmd{revert} the file. 17.137 +\interaction{daily.revert.unmodify} 17.138 +The \hgcmd{revert} command provides us with an extra degree of safety 17.139 +by saving our modified file with a \filename{.orig} extension. 17.140 +\interaction{daily.revert.status} 17.141 + 17.142 +Here is a summary of the cases that the \hgcmd{revert} command can 17.143 +deal with. We will describe each of these in more detail in the 17.144 +section that follows. 17.145 +\begin{itemize} 17.146 +\item If you modify a file, it will restore the file to its unmodified 17.147 + state. 17.148 +\item If you \hgcmd{add} a file, it will undo the ``added'' state of 17.149 + the file, but leave the file itself untouched. 17.150 +\item If you delete a file without telling Mercurial, it will restore 17.151 + the file to its unmodified contents. 17.152 +\item If you use the \hgcmd{remove} command to remove a file, it will 17.153 + undo the ``removed'' state of the file, and restore the file to its 17.154 + unmodified contents. 17.155 +\end{itemize} 17.156 + 17.157 +\subsection{File management errors} 17.158 +\label{sec:undo:mgmt} 17.159 + 17.160 +The \hgcmd{revert} command is useful for more than just modified 17.161 +files. It lets you reverse the results of all of Mercurial's file 17.162 +management commands---\hgcmd{add}, \hgcmd{remove}, and so on. 17.163 + 17.164 +If you \hgcmd{add} a file, then decide that in fact you don't want 17.165 +Mercurial to track it, use \hgcmd{revert} to undo the add. Don't 17.166 +worry; Mercurial will not modify the file in any way. It will just 17.167 +``unmark'' the file. 17.168 +\interaction{daily.revert.add} 17.169 + 17.170 +Similarly, if you ask Mercurial to \hgcmd{remove} a file, you can use 17.171 +\hgcmd{revert} to restore it to the contents it had as of the parent 17.172 +of the working directory. 
17.173 +\interaction{daily.revert.remove} 17.174 +This works just as well for a file that you deleted by hand, without 17.175 +telling Mercurial (recall that in Mercurial terminology, this kind of 17.176 +file is called ``missing''). 17.177 +\interaction{daily.revert.missing} 17.178 + 17.179 +If you revert a \hgcmd{copy}, the copied-to file remains in your 17.180 +working directory afterwards, untracked. Since a copy doesn't affect 17.181 +the copied-from file in any way, Mercurial doesn't do anything with 17.182 +the copied-from file. 17.183 +\interaction{daily.revert.copy} 17.184 + 17.185 +\subsubsection{A slightly special case: reverting a rename} 17.186 + 17.187 +If you \hgcmd{rename} a file, there is one small detail that 17.188 +you should remember. When you \hgcmd{revert} a rename, it's not 17.189 +enough to provide the name of the renamed-to file, as you can see 17.190 +here. 17.191 +\interaction{daily.revert.rename} 17.192 +As you can see from the output of \hgcmd{status}, the renamed-to file 17.193 +is no longer identified as added, but the renamed-\emph{from} file is 17.194 +still removed! This is counter-intuitive (at least to me), but at 17.195 +least it's easy to deal with. 17.196 +\interaction{daily.revert.rename-orig} 17.197 +So remember, to revert a \hgcmd{rename}, you must provide \emph{both} 17.198 +the source and destination names. 17.199 + 17.200 +% TODO: the output doesn't look like it will be removed! 17.201 + 17.202 +(By the way, if you rename a file, then modify the renamed-to file, 17.203 +then revert both components of the rename, when Mercurial restores the 17.204 +file that was removed as part of the rename, it will be unmodified. 17.205 +If you need the modifications in the renamed-to file to show up in the 17.206 +renamed-from file, don't forget to copy them over.) 17.207 + 17.208 +These fiddly aspects of reverting a rename arguably constitute a small 17.209 +bug in Mercurial. 
17.210 + 17.211 +\section{Dealing with committed changes} 17.212 + 17.213 +Consider a case where you have committed a change $a$, and another 17.214 +change $b$ on top of it; you then realise that change $a$ was 17.215 +incorrect. Mercurial lets you ``back out'' an entire changeset 17.216 +automatically, and provides building blocks that let you reverse part of a 17.217 +changeset by hand. 17.218 + 17.219 +Before you read this section, here's something to keep in mind: the 17.220 +\hgcmd{backout} command undoes changes by \emph{adding} history, not 17.221 +by modifying or erasing it. It's the right tool to use if you're 17.222 +fixing bugs, but not if you're trying to undo some change that has 17.223 +catastrophic consequences. To deal with those, see 17.224 +section~\ref{sec:undo:aaaiiieee}. 17.225 + 17.226 +\subsection{Backing out a changeset} 17.227 + 17.228 +The \hgcmd{backout} command lets you ``undo'' the effects of an entire 17.229 +changeset in an automated fashion. Because Mercurial's history is 17.230 +immutable, this command \emph{does not} get rid of the changeset you 17.231 +want to undo. Instead, it creates a new changeset that 17.232 +\emph{reverses} the effect of the to-be-undone changeset. 17.233 + 17.234 +The operation of the \hgcmd{backout} command is a little intricate, so 17.235 +let's illustrate it with some examples. First, we'll create a 17.236 +repository with some simple changes. 17.237 +\interaction{backout.init} 17.238 + 17.239 +The \hgcmd{backout} command takes a single changeset ID as its 17.240 +argument; this is the changeset to back out. Normally, 17.241 +\hgcmd{backout} will drop you into a text editor to write a commit 17.242 +message, so you can record why you're backing the change out. In this 17.243 +example, we provide a commit message on the command line using the 17.244 +\hgopt{backout}{-m} option. 
17.245 + 17.246 +\subsection{Backing out the tip changeset} 17.247 + 17.248 +We're going to start by backing out the last changeset we committed. 17.249 +\interaction{backout.simple} 17.250 +You can see that the second line from \filename{myfile} is no longer 17.251 +present. Taking a look at the output of \hgcmd{log} gives us an idea 17.252 +of what the \hgcmd{backout} command has done. 17.253 +\interaction{backout.simple.log} 17.254 +Notice that the new changeset that \hgcmd{backout} has created is a 17.255 +child of the changeset we backed out. It's easier to see this in 17.256 +figure~\ref{fig:undo:backout}, which presents a graphical view of the 17.257 +change history. As you can see, the history is nice and linear. 17.258 + 17.259 +\begin{figure}[htb] 17.260 + \centering 17.261 + \grafix{undo-simple} 17.262 + \caption{Backing out a change using the \hgcmd{backout} command} 17.263 + \label{fig:undo:backout} 17.264 +\end{figure} 17.265 + 17.266 +\subsection{Backing out a non-tip change} 17.267 + 17.268 +If you want to back out a change other than the last one you 17.269 +committed, pass the \hgopt{backout}{--merge} option to the 17.270 +\hgcmd{backout} command. 17.271 +\interaction{backout.non-tip.clone} 17.272 +This makes backing out any changeset a ``one-shot'' operation that's 17.273 +usually simple and fast. 17.274 +\interaction{backout.non-tip.backout} 17.275 + 17.276 +If you take a look at the contents of \filename{myfile} after the 17.277 +backout finishes, you'll see that the first and third changes are 17.278 +present, but not the second. 17.279 +\interaction{backout.non-tip.cat} 17.280 + 17.281 +As the graphical history in figure~\ref{fig:undo:backout-non-tip} 17.282 +illustrates, Mercurial actually commits \emph{two} changes in this 17.283 +kind of situation (the box-shaped nodes are the ones that Mercurial 17.284 +commits automatically). 
Before Mercurial begins the backout process, 17.285 +it first remembers what the current parent of the working directory 17.286 +is. It then backs out the target changeset, and commits that as a 17.287 +changeset. Finally, it merges back to the previous parent of the 17.288 +working directory, and commits the result of the merge. 17.289 + 17.290 +% TODO: to me it looks like mercurial doesn't commit the second merge automatically! 17.291 + 17.292 +\begin{figure}[htb] 17.293 + \centering 17.294 + \grafix{undo-non-tip} 17.295 + \caption{Automated backout of a non-tip change using the \hgcmd{backout} command} 17.296 + \label{fig:undo:backout-non-tip} 17.297 +\end{figure} 17.298 + 17.299 +The result is that you end up ``back where you were'', only with some 17.300 +extra history that undoes the effect of the changeset you wanted to 17.301 +back out. 17.302 + 17.303 +\subsubsection{Always use the \hgopt{backout}{--merge} option} 17.304 + 17.305 +In fact, since the \hgopt{backout}{--merge} option will do the ``right 17.306 +thing'' whether or not the changeset you're backing out is the tip 17.307 +(i.e.~it won't try to merge if it's backing out the tip, since there's 17.308 +no need), you should \emph{always} use this option when you run the 17.309 +\hgcmd{backout} command. 17.310 + 17.311 +\subsection{Gaining more control of the backout process} 17.312 + 17.313 +While I've recommended that you always use the 17.314 +\hgopt{backout}{--merge} option when backing out a change, the 17.315 +\hgcmd{backout} command lets you decide how to merge a backout 17.316 +changeset. Taking control of the backout process by hand is something 17.317 +you will rarely need to do, but it can be useful to understand what 17.318 +the \hgcmd{backout} command is doing for you automatically. To 17.319 +illustrate this, let's clone our first repository, but omit the 17.320 +backout change that it contains. 
17.321 + 17.322 +\interaction{backout.manual.clone} 17.323 +As with our earlier example, we'll commit a third changeset, then back 17.324 +out its parent, and see what happens. 17.325 +\interaction{backout.manual.backout} 17.326 +Our new changeset is again a descendant of the changeset we backed 17.327 +out; it's thus a new head, \emph{not} a descendant of the changeset 17.328 +that was the tip. The \hgcmd{backout} command was quite explicit in 17.329 +telling us this. 17.330 +\interaction{backout.manual.log} 17.331 + 17.332 +Again, it's easier to see what has happened by looking at a graph of 17.333 +the revision history, in figure~\ref{fig:undo:backout-manual}. This 17.334 +makes it clear that when we use \hgcmd{backout} to back out a change 17.335 +other than the tip, Mercurial adds a new head to the repository (the 17.336 +change it committed is box-shaped). 17.337 + 17.338 +\begin{figure}[htb] 17.339 + \centering 17.340 + \grafix{undo-manual} 17.341 + \caption{Backing out a change using the \hgcmd{backout} command} 17.342 + \label{fig:undo:backout-manual} 17.343 +\end{figure} 17.344 + 17.345 +After the \hgcmd{backout} command has completed, it leaves the new 17.346 +``backout'' changeset as the parent of the working directory. 17.347 +\interaction{backout.manual.parents} 17.348 +Now we have two isolated sets of changes. 17.349 +\interaction{backout.manual.heads} 17.350 + 17.351 +Let's think about what we expect to see as the contents of 17.352 +\filename{myfile} now. The first change should be present, because 17.353 +we've never backed it out. The second change should be missing, as 17.354 +that's the change we backed out. Since the history graph shows the 17.355 +third change as a separate head, we \emph{don't} expect to see the 17.356 +third change present in \filename{myfile}. 17.357 +\interaction{backout.manual.cat} 17.358 +To get the third change back into the file, we just do a normal merge 17.359 +of our two heads. 
17.360 +\interaction{backout.manual.merge} 17.361 +Afterwards, the graphical history of our repository looks like 17.362 +figure~\ref{fig:undo:backout-manual-merge}. 17.363 + 17.364 +\begin{figure}[htb] 17.365 + \centering 17.366 + \grafix{undo-manual-merge} 17.367 + \caption{Manually merging a backout change} 17.368 + \label{fig:undo:backout-manual-merge} 17.369 +\end{figure} 17.370 + 17.371 +\subsection{Why \hgcmd{backout} works as it does} 17.372 + 17.373 +Here's a brief description of how the \hgcmd{backout} command works. 17.374 +\begin{enumerate} 17.375 +\item It ensures that the working directory is ``clean'', i.e.~that 17.376 + the output of \hgcmd{status} would be empty. 17.377 +\item It remembers the current parent of the working directory. Let's 17.378 + call this changeset \texttt{orig}. 17.379 +\item It does the equivalent of a \hgcmd{update} to sync the working 17.380 + directory to the changeset you want to back out. Let's call this 17.381 + changeset \texttt{backout}. 17.382 +\item It finds the parent of that changeset. Let's call that 17.383 + changeset \texttt{parent}. 17.384 +\item For each file that the \texttt{backout} changeset affected, it 17.385 + does the equivalent of a \hgcmdargs{revert}{-r parent} on that file, 17.386 + to restore it to the contents it had before that changeset was 17.387 + committed. 17.388 +\item It commits the result as a new changeset. This changeset has 17.389 + \texttt{backout} as its parent. 17.390 +\item If you specify \hgopt{backout}{--merge} on the command line, it 17.391 + merges with \texttt{orig}, and commits the result of the merge. 17.392 +\end{enumerate} 17.393 + 17.394 +An alternative way to implement the \hgcmd{backout} command would be 17.395 +to \hgcmd{export} the to-be-backed-out changeset as a diff, then use 17.396 +the \cmdopt{patch}{--reverse} option to the \command{patch} command to 17.397 +reverse the effect of the change without fiddling with the working 17.398 +directory. 
This sounds much simpler, but it would not work nearly as 17.399 +well. 17.400 + 17.401 +The reason that \hgcmd{backout} does an update, a commit, a merge, and 17.402 +another commit is to give the merge machinery the best chance to do a 17.403 +good job when dealing with all the changes \emph{between} the change 17.404 +you're backing out and the current tip. 17.405 + 17.406 +If you're backing out a changeset that's~100 revisions back in your 17.407 +project's history, the chances that the \command{patch} command will 17.408 +be able to apply a reverse diff cleanly are not good, because 17.409 +intervening changes are likely to have ``broken the context'' that 17.410 +\command{patch} uses to determine whether it can apply a patch (if 17.411 +this sounds like gibberish, see section~\ref{sec:mq:patch} for a 17.412 +discussion of the \command{patch} command). Also, Mercurial's merge 17.413 +machinery will handle files and directories being renamed, permission 17.414 +changes, and modifications to binary files, none of which 17.415 +\command{patch} can deal with. 17.416 + 17.417 +\section{Changes that should never have been} 17.418 +\label{sec:undo:aaaiiieee} 17.419 + 17.420 +Most of the time, the \hgcmd{backout} command is exactly what you need 17.421 +if you want to undo the effects of a change. It leaves a permanent 17.422 +record of exactly what you did, both when committing the original 17.423 +changeset and when you cleaned up after it. 17.424 + 17.425 +On rare occasions, though, you may find that you've committed a change 17.426 +that really should not be present in the repository at all. For 17.427 +example, it would be very unusual, and usually considered a mistake, 17.428 +to commit a software project's object files as well as its source 17.429 +files. Object files have almost no intrinsic value, and they're 17.430 +\emph{big}, so they increase the size of the repository and the amount 17.431 +of time it takes to clone or pull changes. 
17.432 + 17.433 +Before I discuss the options that you have if you commit a ``brown 17.434 +paper bag'' change (the kind that's so bad that you want to pull a 17.435 +brown paper bag over your head), let me first discuss some approaches 17.436 +that probably won't work. 17.437 + 17.438 +Since Mercurial treats history as accumulative---every change builds 17.439 +on top of all changes that preceded it---you generally can't just make 17.440 +disastrous changes disappear. The one exception is when you've just 17.441 +committed a change, and it hasn't been pushed or pulled into another 17.442 +repository. That's when you can safely use the \hgcmd{rollback} 17.443 +command, as I detailed in section~\ref{sec:undo:rollback}. 17.444 + 17.445 +After you've pushed a bad change to another repository, you 17.446 +\emph{could} still use \hgcmd{rollback} to make your local copy of the 17.447 +change disappear, but it won't have the consequences you want. The 17.448 +change will still be present in the remote repository, so it will 17.449 +reappear in your local repository the next time you pull. 17.450 + 17.451 +If a situation like this arises, and you know which repositories your 17.452 +bad change has propagated into, you can \emph{try} to get rid of the 17.453 +change from \emph{every} one of those repositories. This is, of 17.454 +course, not a satisfactory solution: if you miss even a single 17.455 +repository while you're expunging, the change is still ``in the 17.456 +wild'', and could propagate further. 17.457 + 17.458 +If you've committed one or more changes \emph{after} the change that 17.459 +you'd like to see disappear, your options are further reduced. 17.460 +Mercurial doesn't provide a way to ``punch a hole'' in history, 17.461 +leaving changesets intact. 17.462 + 17.463 +XXX This needs filling out. The \texttt{hg-replay} script in the 17.464 +\texttt{examples} directory works, but doesn't handle merge 17.465 +changesets. Kind of an important omission. 
17.466 + 17.467 +\subsection{Protect yourself from ``escaped'' changes} 17.468 + 17.469 +If you've committed some changes to your local repository and they've 17.470 +been pushed or pulled somewhere else, this isn't necessarily a 17.471 +disaster. You can protect yourself ahead of time against some classes 17.472 +of bad changeset. This is particularly easy if your team usually 17.473 +pulls changes from a central repository. 17.474 + 17.475 +By configuring some hooks on that repository to validate incoming 17.476 +changesets (see chapter~\ref{chap:hook}), you can automatically 17.477 +prevent some kinds of bad changeset from being pushed to the central 17.478 +repository at all. With such a configuration in place, some kinds of 17.479 +bad changeset will naturally tend to ``die out'' because they can't 17.480 +propagate into the central repository. Better yet, this happens 17.481 +without any need for explicit intervention. 17.482 + 17.483 +For instance, an incoming change hook that verifies that a changeset 17.484 +will actually compile can prevent people from inadvertently ``breaking 17.485 +the build''. 17.486 + 17.487 +\section{Finding the source of a bug} 17.488 +\label{sec:undo:bisect} 17.489 + 17.490 +While it's all very well to be able to back out a changeset that 17.491 +introduced a bug, this requires that you know which changeset to back 17.492 +out. Mercurial provides an invaluable command, called 17.493 +\hgcmd{bisect}, that helps you to automate this process and accomplish 17.494 +it very efficiently. 17.495 + 17.496 +The idea behind the \hgcmd{bisect} command is that a changeset has 17.497 +introduced some change of behaviour that you can identify with a 17.498 +simple binary test. You don't know which piece of code introduced the 17.499 +change, but you know how to test for the presence of the bug. The 17.500 +\hgcmd{bisect} command uses your test to direct its search for the 17.501 +changeset that introduced the code that caused the bug. 
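The search idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the history is a plain linear list, its first entry is known to be good and its last is known to be bad, and the names \texttt{first\_bad} and \texttt{is\_bad} are made up for this sketch. It is not how Mercurial implements \hgcmd{bisect}, which also has to cope with branches and merges.

```python
def first_bad(revisions, is_bad):
    """Return (revision, tests_run) for the first revision where is_bad is true.

    Assumes revisions[0] is good, revisions[-1] is bad, and that every
    revision after the first bad one is also bad.
    """
    good, bad = 0, len(revisions) - 1   # indexes of a known-good and a known-bad revision
    tests_run = 0
    while bad - good > 1:
        middle = (good + bad) // 2
        tests_run += 1
        if is_bad(revisions[middle]):
            bad = middle      # the culprit is at or before the midpoint
        else:
            good = middle     # the culprit is after the midpoint
    return revisions[bad], tests_run


if __name__ == "__main__":
    history = list(range(35))   # stand-in for 35 changesets
    culprit, tests = first_bad(history, lambda rev: rev >= 22)
    print(culprit, tests)
```

Each pass through the loop discards half of the remaining candidates, so the number of tests grows slowly even when the history is long.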
17.502 + 17.503 +Here are a few scenarios to help you understand how you might apply 17.504 +this command. 17.505 +\begin{itemize} 17.506 +\item The most recent version of your software has a bug that you 17.507 + remember wasn't present a few weeks ago, but you don't know when it 17.508 + was introduced. Here, your binary test checks for the presence of 17.509 + that bug. 17.510 +\item You fixed a bug in a rush, and now it's time to close the entry 17.511 + in your team's bug database. The bug database requires a changeset 17.512 + ID when you close an entry, but you don't remember which changeset 17.513 + you fixed the bug in. Once again, your binary test checks for the 17.514 + presence of the bug. 17.515 +\item Your software works correctly, but runs~15\% slower than the 17.516 + last time you measured it. You want to know which changeset 17.517 + introduced the performance regression. In this case, your binary 17.518 + test measures the performance of your software, to see whether it's 17.519 + ``fast'' or ``slow''. 17.520 +\item The sizes of the components of your project that you ship 17.521 + exploded recently, and you suspect that something changed in the way 17.522 + you build your project. 17.523 +\end{itemize} 17.524 + 17.525 +From these examples, it should be clear that the \hgcmd{bisect} 17.526 +command is not useful only for finding the sources of bugs. You can 17.527 +use it to find any ``emergent property'' of a repository (anything 17.528 +that you can't find from a simple text search of the files in the 17.529 +tree) for which you can write a binary test. 17.530 + 17.531 +We'll introduce a little bit of terminology here, just to make it 17.532 +clear which parts of the search process are your responsibility, and 17.533 +which are Mercurial's. A \emph{test} is something that \emph{you} run 17.534 +when \hgcmd{bisect} chooses a changeset. A \emph{probe} is what 17.535 +\hgcmd{bisect} runs to tell whether a revision is good. 
Finally, 17.536 +we'll use the word ``bisect'', as both a noun and a verb, to stand in 17.537 +for the phrase ``search using the \hgcmd{bisect} command''. 17.538 + 17.539 +One simple way to automate the searching process would be simply to 17.540 +probe every changeset. However, this scales poorly. If it took ten 17.541 +minutes to test a single changeset, and you had 10,000 changesets in 17.542 +your repository, the exhaustive approach would take on average~35 17.543 +\emph{days} to find the changeset that introduced a bug. Even if you 17.544 +knew that the bug was introduced by one of the last 500 changesets, 17.545 +and limited your search to those, you'd still be looking at over 40 17.546 +hours to find the changeset that introduced your bug. 17.547 + 17.548 +What the \hgcmd{bisect} command does is use its knowledge of the 17.549 +``shape'' of your project's revision history to perform a search in 17.550 +time proportional to the \emph{logarithm} of the number of changesets 17.551 +to check (the kind of search it performs is called a dichotomic 17.552 +search). With this approach, searching through 10,000 changesets will 17.553 +take less than three hours, even at ten minutes per test (the search 17.554 +will require about 14 tests). Limit your search to the last hundred 17.555 +changesets, and it will take only about an hour (roughly seven tests). 17.556 + 17.557 +The \hgcmd{bisect} command is aware of the ``branchy'' nature of a 17.558 +Mercurial project's revision history, so it has no problems dealing 17.559 +with branches, merges, or multiple heads in a repository. It can 17.560 +prune entire branches of history with a single probe, which is how it 17.561 +operates so efficiently. 17.562 + 17.563 +\subsection{Using the \hgcmd{bisect} command} 17.564 + 17.565 +Here's an example of \hgcmd{bisect} in action. 
17.566 + 17.567 +\begin{note} 17.568 + In versions 0.9.5 and earlier of Mercurial, \hgcmd{bisect} was not a 17.569 + core command: it was distributed with Mercurial as an extension. 17.570 + This section describes the built-in command, not the old extension. 17.571 +\end{note} 17.572 + 17.573 +Now let's create a repository, so that we can try out the 17.574 +\hgcmd{bisect} command in isolation. 17.575 +\interaction{bisect.init} 17.576 +We'll simulate a project that has a bug in it in a simple-minded way: 17.577 +create trivial changes in a loop, and nominate one specific change 17.578 +that will have the ``bug''. This loop creates 35 changesets, each 17.579 +adding a single file to the repository. We'll represent our ``bug'' 17.580 +with a file that contains the text ``i have a gub''. 17.581 +\interaction{bisect.commits} 17.582 + 17.583 +The next thing that we'd like to do is figure out how to use the 17.584 +\hgcmd{bisect} command. We can use Mercurial's normal built-in help 17.585 +mechanism for this. 17.586 +\interaction{bisect.help} 17.587 + 17.588 +The \hgcmd{bisect} command works in steps. Each step proceeds as follows. 17.589 +\begin{enumerate} 17.590 +\item You run your binary test. 17.591 + \begin{itemize} 17.592 + \item If the test succeeded, you tell \hgcmd{bisect} by running the 17.593 + \hgcmdargs{bisect}{--good} command. 17.594 + \item If it failed, run the \hgcmdargs{bisect}{--bad} command. 17.595 + \end{itemize} 17.596 +\item The command uses your information to decide which changeset to 17.597 + test next. 17.598 +\item It updates the working directory to that changeset, and the 17.599 + process begins again. 17.600 +\end{enumerate} 17.601 +The process ends when \hgcmd{bisect} identifies a unique changeset 17.602 +that marks the point where your test transitioned from ``succeeding'' 17.603 +to ``failing''. 17.604 + 17.605 +To start the search, we must run the \hgcmdargs{bisect}{--reset} command. 
17.606 +\interaction{bisect.search.init} 17.607 + 17.608 +In our case, the binary test we use is simple: we check to see if any 17.609 +file in the repository contains the string ``i have a gub''. If it 17.610 +does, this changeset contains the change that ``caused the bug''. By 17.611 +convention, a changeset that has the property we're searching for is 17.612 +``bad'', while one that doesn't is ``good''. 17.613 + 17.614 +Most of the time, the revision to which the working directory is 17.615 +synced (usually the tip) already exhibits the problem introduced by 17.616 +the buggy change, so we'll mark it as ``bad''. 17.617 +\interaction{bisect.search.bad-init} 17.618 + 17.619 +Our next task is to nominate a changeset that we know \emph{doesn't} 17.620 +have the bug; the \hgcmd{bisect} command will ``bracket'' its search 17.621 +between the first pair of good and bad changesets. In our case, we 17.622 +know that revision~10 didn't have the bug. (I'll have more words 17.623 +about choosing the first ``good'' changeset later.) 17.624 +\interaction{bisect.search.good-init} 17.625 + 17.626 +Notice that this command printed some output. 17.627 +\begin{itemize} 17.628 +\item It told us how many changesets it must consider before it can 17.629 + identify the one that introduced the bug, and how many tests that 17.630 + will require. 17.631 +\item It updated the working directory to the next changeset to test, 17.632 + and told us which changeset it's testing. 17.633 +\end{itemize} 17.634 + 17.635 +We now run our test in the working directory. We use the 17.636 +\command{grep} command to see if our ``bad'' file is present in the 17.637 +working directory. If it is, this revision is bad; if not, this 17.638 +revision is good. 17.639 +\interaction{bisect.search.step1} 17.640 + 17.641 +This test looks like a perfect candidate for automation, so let's turn 17.642 +it into a shell function. 
17.643 +\interaction{bisect.search.mytest} 17.644 +We can now run an entire test step with a single command, 17.645 +\texttt{mytest}. 17.646 +\interaction{bisect.search.step2} 17.647 +A few more invocations of our canned test step command, and we're 17.648 +done. 17.649 +\interaction{bisect.search.rest} 17.650 + 17.651 +Even though we had~35 changesets to search through, the \hgcmd{bisect} 17.652 +command let us find the changeset that introduced our ``bug'' with 17.653 +only five tests. Because the number of tests that the \hgcmd{bisect} 17.654 +command performs grows logarithmically with the number of changesets to 17.655 +search, the advantage that it has over the ``brute force'' search 17.656 +approach increases with every changeset you add. 17.657 + 17.658 +\subsection{Cleaning up after your search} 17.659 + 17.660 +When you're finished using the \hgcmd{bisect} command in a 17.661 +repository, you can use the \hgcmdargs{bisect}{--reset} command to drop 17.662 +the information it was using to drive your search. The command 17.663 +doesn't use much space, so it doesn't matter if you forget to run this 17.664 +command. However, \hgcmd{bisect} won't let you start a new search in 17.665 +that repository until you do a \hgcmdargs{bisect}{--reset}. 17.666 +\interaction{bisect.search.reset} 17.667 + 17.668 +\section{Tips for finding bugs effectively} 17.669 + 17.670 +\subsection{Give consistent input} 17.671 + 17.672 +The \hgcmd{bisect} command requires that you correctly report the 17.673 +result of every test you perform. If you tell it that a test failed 17.674 +when it really succeeded, it \emph{might} be able to detect the 17.675 +inconsistency. If it can identify an inconsistency in your reports, 17.676 +it will tell you that a particular changeset is both good and bad. 17.677 +However, it can't do this perfectly; it's about as likely to report 17.678 +the wrong changeset as the source of the bug. 
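One way to keep your reports consistent is to let a small script make the decision for you. The following Python sketch mirrors the text-search test from the walkthrough above. The function names \texttt{verdict} and \texttt{report} are hypothetical, and the script assumes that \command{hg} is on your path and that a search is already in progress. Treat it as a starting point rather than a finished tool.

```python
import pathlib
import subprocess

MARKER = "i have a gub"   # the text that stands in for our "bug"


def verdict(directory="."):
    """Return "bad" if any regular file directly inside directory contains MARKER."""
    for path in sorted(pathlib.Path(directory).iterdir()):
        if not path.is_file():
            continue          # skips subdirectories such as .hg
        if MARKER in path.read_text(errors="replace"):
            return "bad"
    return "good"


def report(directory="."):
    """Tell hg bisect the verdict for the revision checked out in directory."""
    result = verdict(directory)
    print("this revision is", result)
    # hg bisect accepts --good and --bad to record the result of a test.
    subprocess.run(["hg", "bisect", "--" + result], cwd=directory, check=True)
    return result
```

Because the same code decides and reports every time, a slip of the fingers can no longer mark a good revision as bad.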
17.679 + 17.680 +\subsection{Automate as much as possible} 17.681 + 17.682 +When I started using the \hgcmd{bisect} command, I tried a few times 17.683 +to run my tests by hand, on the command line. This is an approach 17.684 +that I, at least, am not suited to. After a few tries, I found that I 17.685 +was making enough mistakes that I was having to restart my searches 17.686 +several times before finally getting correct results. 17.687 + 17.688 +My initial problems with driving the \hgcmd{bisect} command by hand 17.689 +occurred even with simple searches on small repositories; if the 17.690 +problem you're looking for is more subtle, or the number of tests that 17.691 +\hgcmd{bisect} must perform increases, the likelihood of operator 17.692 +error ruining the search is much higher. Once I started automating my 17.693 +tests, I had much better results. 17.694 + 17.695 +The key to automated testing is twofold: 17.696 +\begin{itemize} 17.697 +\item always test for the same symptom, and 17.698 +\item always feed consistent input to the \hgcmd{bisect} command. 17.699 +\end{itemize} 17.700 +In my tutorial example above, the \command{grep} command tests for the 17.701 +symptom, and the \texttt{if} statement takes the result of this check 17.702 +and ensures that we always feed the same input to the \hgcmd{bisect} 17.703 +command. The \texttt{mytest} function marries these together in a 17.704 +reproducible way, so that every test is uniform and consistent. 17.705 + 17.706 +\subsection{Check your results} 17.707 + 17.708 +Because the output of a \hgcmd{bisect} search is only as good as the 17.709 +input you give it, don't take the changeset it reports as the 17.710 +absolute truth. A simple way to cross-check its report is to manually 17.711 +run your test at each of the following changesets: 17.712 +\begin{itemize} 17.713 +\item The changeset that it reports as the first bad revision. Your 17.714 + test should still report this as bad. 
17.715 +\item The parent of that changeset (either parent, if it's a merge). 17.716 + Your test should report this changeset as good. 17.717 +\item A child of that changeset. Your test should report this 17.718 + changeset as bad. 17.719 +\end{itemize} 17.720 + 17.721 +\subsection{Beware interference between bugs} 17.722 + 17.723 +It's possible that your search for one bug could be disrupted by the 17.724 +presence of another. For example, let's say your software crashes at 17.725 +revision 100, and worked correctly at revision 50. Unknown to you, 17.726 +someone else introduced a different crashing bug at revision 60, and 17.727 +fixed it at revision 80. This could distort your results in one of 17.728 +several ways. 17.729 + 17.730 +It is possible that this other bug completely ``masks'' yours, which 17.731 +is to say that it occurs before your bug has a chance to manifest 17.732 +itself. If you can't avoid that other bug (for example, it prevents 17.733 +your project from building), and so can't tell whether your bug is 17.734 +present in a particular changeset, the \hgcmd{bisect} command cannot 17.735 +help you directly. Instead, you can mark a changeset as untested by 17.736 +running \hgcmdargs{bisect}{--skip}. 17.737 + 17.738 +A different problem could arise if your test for a bug's presence is 17.739 +not specific enough. If you check for ``my program crashes'', then 17.740 +both your crashing bug and an unrelated crashing bug that masks it 17.741 +will look like the same thing, and mislead \hgcmd{bisect}. 17.742 + 17.743 +Another useful situation in which to use \hgcmdargs{bisect}{--skip} is 17.744 +if you can't test a revision because your project was in a broken and 17.745 +hence untestable state at that revision, perhaps because someone 17.746 +checked in a change that prevented the project from building. 
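The advice in this section amounts to a three-way decision for each changeset. Here is a hedged sketch in Python. The function \texttt{classify} and its two arguments are invented for illustration, and how you detect a failed build or the specific symptom depends entirely on your project.

```python
def classify(build_ok, symptom_seen):
    """Map one test run to the hg bisect flag that should record it.

    build_ok     -- False if the revision could not be built or tested at all
    symptom_seen -- True only if the specific bug being hunted showed up,
                    not merely "something crashed"
    """
    if not build_ok:
        return "--skip"   # untestable revision: leave it out of the search
    if symptom_seen:
        return "--bad"
    return "--good"
```

Keeping the ``could not test'' case separate from ``bad'' stops an unrelated breakage from being blamed for your bug.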
17.747 + 17.748 +\subsection{Bracket your search lazily} 17.749 + 17.750 +Choosing the first ``good'' and ``bad'' changesets that will mark the 17.751 +end points of your search is often easy, but it bears a little 17.752 +discussion nevertheless. From the perspective of \hgcmd{bisect}, the 17.753 +``newest'' changeset is conventionally ``bad'', and the older 17.754 +changeset is ``good''. 17.755 + 17.756 +If you're having trouble remembering when a suitable ``good'' change 17.757 +was, so that you can tell \hgcmd{bisect}, you could do worse than 17.758 +testing changesets at random. Just remember to eliminate contenders 17.759 +that can't possibly exhibit the bug (perhaps because the feature with 17.760 +the bug isn't present yet) and those where another problem masks the 17.761 +bug (as I discussed above). 17.762 + 17.763 +Even if you end up ``early'' by thousands of changesets or months of 17.764 +history, you will only add a handful of tests to the total number that 17.765 +\hgcmd{bisect} must perform, thanks to its logarithmic behaviour. 17.766 + 17.767 +%%% Local Variables: 17.768 +%%% mode: latex 17.769 +%%% TeX-master: "00book" 17.770 +%%% End:
18.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 18.2 +++ b/en/ch10-hook.tex Thu Jan 29 22:56:27 2009 -0800 18.3 @@ -0,0 +1,1413 @@ 18.4 +\chapter{Handling repository events with hooks} 18.5 +\label{chap:hook} 18.6 + 18.7 +Mercurial offers a powerful mechanism to let you perform automated 18.8 +actions in response to events that occur in a repository. In some 18.9 +cases, you can even control Mercurial's response to those events. 18.10 + 18.11 +The name Mercurial uses for one of these actions is a \emph{hook}. 18.12 +Hooks are called ``triggers'' in some revision control systems, but 18.13 +the two names refer to the same idea. 18.14 + 18.15 +\section{An overview of hooks in Mercurial} 18.16 + 18.17 +Here is a brief list of the hooks that Mercurial supports. We will 18.18 +revisit each of these hooks in more detail later, in 18.19 +section~\ref{sec:hook:ref}. 18.20 + 18.21 +\begin{itemize} 18.22 +\item[\small\hook{changegroup}] This is run after a group of 18.23 + changesets has been brought into the repository from elsewhere. 18.24 +\item[\small\hook{commit}] This is run after a new changeset has been 18.25 + created in the local repository. 18.26 +\item[\small\hook{incoming}] This is run once for each new changeset 18.27 + that is brought into the repository from elsewhere. Notice the 18.28 + difference from \hook{changegroup}, which is run once per 18.29 + \emph{group} of changesets brought in. 18.30 +\item[\small\hook{outgoing}] This is run after a group of changesets 18.31 + has been transmitted from this repository. 18.32 +\item[\small\hook{prechangegroup}] This is run before starting to 18.33 + bring a group of changesets into the repository. 18.34 +\item[\small\hook{precommit}] Controlling. This is run before starting 18.35 + a commit. 18.36 +\item[\small\hook{preoutgoing}] Controlling. This is run before 18.37 + starting to transmit a group of changesets from this repository. 18.38 +\item[\small\hook{pretag}] Controlling. 
This is run before creating a tag. 18.39 +\item[\small\hook{pretxnchangegroup}] Controlling. This is run after a 18.40 + group of changesets has been brought into the local repository from 18.41 + another, but before the transaction completes that will make the 18.42 + changes permanent in the repository. 18.43 +\item[\small\hook{pretxncommit}] Controlling. This is run after a new 18.44 + changeset has been created in the local repository, but before the 18.45 + transaction completes that will make it permanent. 18.46 +\item[\small\hook{preupdate}] Controlling. This is run before starting 18.47 + an update or merge of the working directory. 18.48 +\item[\small\hook{tag}] This is run after a tag is created. 18.49 +\item[\small\hook{update}] This is run after an update or merge of the 18.50 + working directory has finished. 18.51 +\end{itemize} 18.52 +Each of the hooks whose description begins with the word 18.53 +``Controlling'' has the ability to determine whether an activity can 18.54 +proceed. If the hook succeeds, the activity may proceed; if it fails, 18.55 +the activity is either not permitted or undone, depending on the hook. 18.56 + 18.57 +\section{Hooks and security} 18.58 + 18.59 +\subsection{Hooks are run with your privileges} 18.60 + 18.61 +When you run a Mercurial command in a repository, and the command 18.62 +causes a hook to run, that hook runs on \emph{your} system, under 18.63 +\emph{your} user account, with \emph{your} privilege level. Since 18.64 +hooks are arbitrary pieces of executable code, you should treat them 18.65 +with an appropriate level of suspicion. Do not install a hook unless 18.66 +you are confident that you know who created it and what it does. 18.67 + 18.68 +In some cases, you may be exposed to hooks that you did not install 18.69 +yourself. If you work with Mercurial on an unfamiliar system, 18.70 +Mercurial will run hooks defined in that system's global \hgrc\ file. 
18.71 + 18.72 +If you are working with a repository owned by another user, Mercurial 18.73 +can run hooks defined in that user's repository, but it will still run 18.74 +them as ``you''. For example, if you \hgcmd{pull} from that 18.75 +repository, and its \sfilename{.hg/hgrc} defines a local 18.76 +\hook{outgoing} hook, that hook will run under your user account, even 18.77 +though you don't own that repository. 18.78 + 18.79 +\begin{note} 18.80 + This only applies if you are pulling from a repository on a local or 18.81 + network filesystem. If you're pulling over http or ssh, any 18.82 + \hook{outgoing} hook will run under whatever account is executing 18.83 + the server process, on the server. 18.84 +\end{note} 18.85 + 18.86 +XXX To see what hooks are defined in a repository, use the 18.87 +\hgcmdargs{config}{hooks} command. If you are working in one 18.88 +repository, but talking to another that you do not own (e.g.~using 18.89 +\hgcmd{pull} or \hgcmd{incoming}), remember that it is the other 18.90 +repository's hooks you should be checking, not your own. 18.91 + 18.92 +\subsection{Hooks do not propagate} 18.93 + 18.94 +In Mercurial, hooks are not revision controlled, and do not propagate 18.95 +when you clone, or pull from, a repository. The reason for this is 18.96 +simple: a hook is a completely arbitrary piece of executable code. It 18.97 +runs under your user identity, with your privilege level, on your 18.98 +machine. 18.99 + 18.100 +It would be extremely reckless for any distributed revision control 18.101 +system to implement revision-controlled hooks, as this would offer an 18.102 +easily exploitable way to subvert the accounts of users of the 18.103 +revision control system. 18.104 + 18.105 +Since Mercurial does not propagate hooks, if you are collaborating 18.106 +with other people on a common project, you should not assume that they 18.107 +are using the same Mercurial hooks as you are, or that theirs are 18.108 +correctly configured. 
You should document the hooks you expect people 18.109 +to use. 18.110 + 18.111 +In a corporate intranet, this is somewhat easier to control, as you 18.112 +can for example provide a ``standard'' installation of Mercurial on an 18.113 +NFS filesystem, and use a site-wide \hgrc\ file to define hooks that 18.114 +all users will see. However, this too has its limits; see below. 18.115 + 18.116 +\subsection{Hooks can be overridden} 18.117 + 18.118 +Mercurial allows you to override a hook definition by redefining the 18.119 +hook. You can disable it by setting its value to the empty string, or 18.120 +change its behaviour as you wish. 18.121 + 18.122 +If you deploy a system-~or site-wide \hgrc\ file that defines some 18.123 +hooks, you should thus understand that your users can disable or 18.124 +override those hooks. 18.125 + 18.126 +\subsection{Ensuring that critical hooks are run} 18.127 + 18.128 +Sometimes you may want to enforce a policy that you do not want others 18.129 +to be able to work around. For example, you may have a requirement 18.130 +that every changeset must pass a rigorous set of tests. Defining this 18.131 +requirement via a hook in a site-wide \hgrc\ won't work for remote 18.132 +users on laptops, and of course local users can subvert it at will by 18.133 +overriding the hook. 18.134 + 18.135 +Instead, you can set up your policies for use of Mercurial so that 18.136 +people are expected to propagate changes through a well-known 18.137 +``canonical'' server that you have locked down and configured 18.138 +appropriately. 18.139 + 18.140 +One way to do this is via a combination of social engineering and 18.141 +technology. Set up a restricted-access account; users can push 18.142 +changes over the network to repositories managed by this account, but 18.143 +they cannot log into the account and run normal shell commands. In 18.144 +this scenario, a user can commit a changeset that contains any old 18.145 +garbage they want. 
18.146 + 18.147 +When someone pushes a changeset to the server that everyone pulls 18.148 +from, the server will test the changeset before it accepts it as 18.149 +permanent, and reject it if it fails to pass the test suite. If 18.150 +people only pull changes from this filtering server, it will serve to 18.151 +ensure that all changes that people pull have been automatically 18.152 +vetted. 18.153 + 18.154 +\section{Care with \texttt{pretxn} hooks in a shared-access repository} 18.155 + 18.156 +If you want to use hooks to do some automated work in a repository 18.157 +that a number of people have shared access to, you need to be careful 18.158 +in how you do this. 18.159 + 18.160 +Mercurial only locks a repository when it is writing to the 18.161 +repository, and only the parts of Mercurial that write to the 18.162 +repository pay attention to locks. Write locks are necessary to 18.163 +prevent multiple simultaneous writers from scribbling on each other's 18.164 +work, corrupting the repository. 18.165 + 18.166 +Because Mercurial is careful with the order in which it reads and 18.167 +writes data, it does not need to acquire a lock when it wants to read 18.168 +data from the repository. The parts of Mercurial that read from the 18.169 +repository never pay attention to locks. This lockless reading scheme 18.170 +greatly increases performance and concurrency. 18.171 + 18.172 +With great performance comes a trade-off, though, one which has the 18.173 +potential to cause you trouble unless you're aware of it. To describe 18.174 +this requires a little detail about how Mercurial adds changesets to a 18.175 +repository and reads those changes. 18.176 + 18.177 +When Mercurial \emph{writes} metadata, it writes it straight into the 18.178 +destination file. It writes file data first, then manifest data 18.179 +(which contains pointers to the new file data), then changelog data 18.180 +(which contains pointers to the new manifest data). 
Before the first 18.181 +write to each file, it stores a record of where the end of the file 18.182 +was in its transaction log. If the transaction must be rolled back, 18.183 +Mercurial simply truncates each file back to the size it was before the 18.184 +transaction began. 18.185 + 18.186 +When Mercurial \emph{reads} metadata, it reads the changelog first, 18.187 +then everything else. Since a reader will only access parts of the 18.188 +manifest or file metadata that it can see in the changelog, it can 18.189 +never see partially written data. 18.190 + 18.191 +Some controlling hooks (\hook{pretxncommit} and 18.192 +\hook{pretxnchangegroup}) run when a transaction is almost complete. 18.193 +All of the metadata has been written, but Mercurial can still roll the 18.194 +transaction back and cause the newly-written data to disappear. 18.195 + 18.196 +If one of these hooks runs for long, it opens a window of time during 18.197 +which a reader can see the metadata for changesets that are not yet 18.198 +permanent, and should not be thought of as ``really there''. The 18.199 +longer the hook runs, the longer that window is open. 18.200 + 18.201 +\subsection{The problem illustrated} 18.202 + 18.203 +In principle, a good use for the \hook{pretxnchangegroup} hook would 18.204 +be to automatically build and test incoming changes before they are 18.205 +accepted into a central repository. This could let you guarantee that 18.206 +nobody can push changes to this repository that ``break the build''. 18.207 +But if a client can pull changes while they're being tested, the 18.208 +usefulness of the test is zero; an unsuspecting someone can pull 18.209 +untested changes, potentially breaking their build. 18.210 + 18.211 +The safest technological answer to this challenge is to set up such a 18.212 +``gatekeeper'' repository as \emph{unidirectional}. 
Let it take 18.213 +changes pushed in from the outside, but do not allow anyone to pull 18.214 +changes from it (use the \hook{preoutgoing} hook to lock it down). 18.215 +Configure a \hook{changegroup} hook so that if a build or test 18.216 +succeeds, the hook will push the new changes out to another repository 18.217 +that people \emph{can} pull from. 18.218 + 18.219 +In practice, putting a centralised bottleneck like this in place is 18.220 +not often a good idea, and transaction visibility has nothing to do 18.221 +with the problem. As the size of a project---and the time it takes to 18.222 +build and test---grows, you rapidly run into a wall with this ``try 18.223 +before you buy'' approach, where you have more changesets to test than 18.224 +time in which to deal with them. The inevitable result is frustration 18.225 +on the part of all involved. 18.226 + 18.227 +An approach that scales better is to get people to build and test 18.228 +before they push, then run automated builds and tests centrally 18.229 +\emph{after} a push, to be sure all is well. The advantage of this 18.230 +approach is that it does not impose a limit on the rate at which the 18.231 +repository can accept changes. 18.232 + 18.233 +\section{A short tutorial on using hooks} 18.234 +\label{sec:hook:simple} 18.235 + 18.236 +It is easy to write a Mercurial hook. Let's start with a hook that 18.237 +runs when you finish a \hgcmd{commit}, and simply prints the hash of 18.238 +the changeset you just created. The hook is called \hook{commit}. 18.239 + 18.240 +\begin{figure}[ht] 18.241 + \interaction{hook.simple.init} 18.242 + \caption{A simple hook that runs when a changeset is committed} 18.243 + \label{ex:hook:init} 18.244 +\end{figure} 18.245 + 18.246 +All hooks follow the pattern in example~\ref{ex:hook:init}. You add 18.247 +an entry to the \rcsection{hooks} section of your \hgrc. On the left 18.248 +is the name of the event to trigger on; on the right is the action to 18.249 +take. 
As you can see, you can run an arbitrary shell command in a 18.250 +hook. Mercurial passes extra information to the hook using 18.251 +environment variables (look for \envar{HG\_NODE} in the example). 18.252 + 18.253 +\subsection{Performing multiple actions per event} 18.254 + 18.255 +Quite often, you will want to define more than one hook for a 18.256 +particular kind of event, as shown in example~\ref{ex:hook:ext}. 18.257 +Mercurial lets you do this by adding an \emph{extension} to the end of 18.258 +a hook's name. You extend a hook's name by giving the name of the 18.259 +hook, followed by a full stop (the ``\texttt{.}'' character), followed 18.260 +by some more text of your choosing. For example, Mercurial will run 18.261 +both \texttt{commit.foo} and \texttt{commit.bar} when the 18.262 +\texttt{commit} event occurs. 18.263 + 18.264 +\begin{figure}[ht] 18.265 + \interaction{hook.simple.ext} 18.266 + \caption{Defining a second \hook{commit} hook} 18.267 + \label{ex:hook:ext} 18.268 +\end{figure} 18.269 + 18.270 +To give a well-defined order of execution when there are multiple 18.271 +hooks defined for an event, Mercurial sorts hooks by extension, and 18.272 +executes the hook commands in this sorted order. In the above 18.273 +example, it will execute \texttt{commit.bar} before 18.274 +\texttt{commit.foo}, and \texttt{commit} before both. 18.275 + 18.276 +It is a good idea to use a somewhat descriptive extension when you 18.277 +define a new hook. This will help you to remember what the hook was 18.278 +for. If the hook fails, you'll get an error message that contains the 18.279 +hook name and extension, so using a descriptive extension could give 18.280 +you an immediate hint as to why the hook failed (see 18.281 +section~\ref{sec:hook:perm} for an example). 
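The ordering rule just described can be modelled in a few lines of Python. This is only a sketch of the behaviour the text describes (a plain lexical sort of the full hook names), not Mercurial's own code; the function name \texttt{hook\_order} is made up for illustration.

```python
def hook_order(names):
    """Return hook names in the order the text describes.

    Assumption: a plain lexical sort of the full names, so the bare
    event name ("commit") sorts ahead of any extended name
    ("commit.bar", "commit.foo").
    """
    return sorted(names)


if __name__ == "__main__":
    for name in hook_order(["commit.foo", "commit", "commit.bar"]):
        print(name)
```

Because the extension decides the position, a prefix such as \texttt{commit.10-check} and \texttt{commit.20-notify} is one possible way to make the intended order obvious to the next reader of your \hgrc.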
18.282 + 18.283 +\subsection{Controlling whether an activity can proceed} 18.284 +\label{sec:hook:perm} 18.285 + 18.286 +In our earlier examples, we used the \hook{commit} hook, which is 18.287 +run after a commit has completed. This is one of several Mercurial 18.288 +hooks that run after an activity finishes. Such hooks have no way of 18.289 +influencing the activity itself. 18.290 + 18.291 +Mercurial defines a number of events that occur before an activity 18.292 +starts; or after it starts, but before it finishes. Hooks that 18.293 +trigger on these events have the added ability to choose whether the 18.294 +activity can continue, or will abort. 18.295 + 18.296 +The \hook{pretxncommit} hook runs after a commit has all but 18.297 +completed. In other words, the metadata representing the changeset 18.298 +has been written out to disk, but the transaction has not yet been 18.299 +allowed to complete. The \hook{pretxncommit} hook has the ability to 18.300 +decide whether the transaction can complete, or must be rolled back. 18.301 + 18.302 +If the \hook{pretxncommit} hook exits with a status code of zero, the 18.303 +transaction is allowed to complete; the commit finishes; and the 18.304 +\hook{commit} hook is run. If the \hook{pretxncommit} hook exits with 18.305 +a non-zero status code, the transaction is rolled back; the metadata 18.306 +representing the changeset is erased; and the \hook{commit} hook is 18.307 +not run. 18.308 + 18.309 +\begin{figure}[ht] 18.310 + \interaction{hook.simple.pretxncommit} 18.311 + \caption{Using the \hook{pretxncommit} hook to control commits} 18.312 + \label{ex:hook:pretxncommit} 18.313 +\end{figure} 18.314 + 18.315 +The hook in example~\ref{ex:hook:pretxncommit} checks that a commit 18.316 +comment contains a bug ID. If it does, the commit can complete. If 18.317 +not, the commit is rolled back. 
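The decision such a hook makes can be sketched in Python as follows. The pattern used for a bug ID (the word ``bug'' followed by a number) and the function names are assumptions made for this illustration; the hook in example~\ref{ex:hook:pretxncommit} remains the reference.

```python
import re

# Assumed convention: a bug ID looks like "bug 1234" or "Bug #1234".
BUG_ID = re.compile(r"\bbug\s*#?\d+", re.IGNORECASE)


def has_bug_id(message):
    """Return True if the commit message mentions a bug ID."""
    return BUG_ID.search(message) is not None


def exit_status(message):
    """Exit status a controlling hook would report for this message.

    Zero lets the transaction complete; non-zero rolls it back.
    """
    return 0 if has_bug_id(message) else 1


if __name__ == "__main__":
    print(exit_status("Fix bug 10483 by guarding against some NULL pointers"))
    print(exit_status("fix typo"))
```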
18.318 + 18.319 +\section{Writing your own hooks} 18.320 + 18.321 +When you are writing a hook, you might find it useful to run Mercurial 18.322 +either with the \hggopt{-v} option, or the \rcitem{ui}{verbose} config 18.323 +item set to ``true''. When you do so, Mercurial will print a message 18.324 +before it calls each hook. 18.325 + 18.326 +\subsection{Choosing how your hook should run} 18.327 +\label{sec:hook:lang} 18.328 + 18.329 +You can write a hook either as a normal program---typically a shell 18.330 +script---or as a Python function that is executed within the Mercurial 18.331 +process. 18.332 + 18.333 +Writing a hook as an external program has the advantage that it 18.334 +requires no knowledge of Mercurial's internals. You can call normal 18.335 +Mercurial commands to get any added information you need. The 18.336 +trade-off is that external hooks are slower than in-process hooks. 18.337 + 18.338 +An in-process Python hook has complete access to the Mercurial API, 18.339 +and does not ``shell out'' to another process, so it is inherently 18.340 +faster than an external hook. It is also easier to obtain much of the 18.341 +information that a hook requires by using the Mercurial API than by 18.342 +running Mercurial commands. 18.343 + 18.344 +If you are comfortable with Python, or require high performance, 18.345 +writing your hooks in Python may be a good choice. However, when you 18.346 +have a straightforward hook to write and you don't need to care about 18.347 +performance (probably the majority of hooks), a shell script is 18.348 +perfectly fine. 18.349 + 18.350 +\subsection{Hook parameters} 18.351 +\label{sec:hook:param} 18.352 + 18.353 +Mercurial calls each hook with a set of well-defined parameters. In 18.354 +Python, a parameter is passed as a keyword argument to your hook 18.355 +function. For an external program, a parameter is passed as an 18.356 +environment variable. 
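As a rough illustration of the two conventions, the following sketch converts the keyword arguments a Python hook would receive into the environment variables an external hook would see. The helper name \texttt{to\_environment} is hypothetical, and the sample values are made up; the upper-casing, the \texttt{HG\_} prefix, and the 1/0 encoding of booleans follow the rules this section goes on to describe.

```python
def to_environment(params):
    """Map hook parameters to the environment seen by an external hook.

    Sketch only: each name is upper-cased and prefixed with "HG_",
    and boolean values become "1" (true) or "0" (false).
    """
    env = {}
    for name, value in params.items():
        if isinstance(value, bool):
            value = "1" if value else "0"
        env["HG_" + name.upper()] = str(value)
    return env


if __name__ == "__main__":
    # Example values only; which parameters exist depends on the hook.
    print(to_environment({"node": "aad8b264143a", "local": True}))
```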
18.357 + 18.358 +Whether your hook is written in Python or as a shell script, the 18.359 +hook-specific parameter names and values will be the same. A boolean 18.360 +parameter will be represented as a boolean value in Python, but as the 18.361 +number 1 (for ``true'') or 0 (for ``false'') as an environment 18.362 +variable for an external hook. If a hook parameter is named 18.363 +\texttt{foo}, the keyword argument for a Python hook will also be 18.364 +named \texttt{foo}, while the environment variable for an external 18.365 +hook will be named \texttt{HG\_FOO}. 18.366 + 18.367 +\subsection{Hook return values and activity control} 18.368 + 18.369 +A hook that executes successfully must exit with a status of zero if 18.370 +external, or return boolean ``false'' if in-process. Failure is 18.371 +indicated with a non-zero exit status from an external hook, or an 18.372 +in-process hook returning boolean ``true''. If an in-process hook 18.373 +raises an exception, the hook is considered to have failed. 18.374 + 18.375 +For a hook that controls whether an activity can proceed, zero/false 18.376 +means ``allow'', while non-zero/true/exception means ``deny''. 18.377 + 18.378 +\subsection{Writing an external hook} 18.379 + 18.380 +When you define an external hook in your \hgrc\ and the hook is run, 18.381 +its value is passed to your shell, which interprets it. This means 18.382 +that you can use normal shell constructs in the body of the hook. 18.383 + 18.384 +An executable hook is always run with its current directory set to a 18.385 +repository's root directory. 18.386 + 18.387 +Each hook parameter is passed in as an environment variable; the name 18.388 +is upper-cased, and prefixed with the string ``\texttt{HG\_}''. 18.389 + 18.390 +With the exception of hook parameters, Mercurial does not set or 18.391 +modify any environment variables when running a hook. 
This is useful 18.392 +to remember if you are writing a site-wide hook that may be run by a 18.393 +number of different users with differing environment variables set. 18.394 +In multi-user situations, you should not rely on environment variables 18.395 +being set to the values you have in your environment when testing the 18.396 +hook. 18.397 + 18.398 +\subsection{Telling Mercurial to use an in-process hook} 18.399 + 18.400 +The \hgrc\ syntax for defining an in-process hook is slightly 18.401 +different than for an executable hook. The value of the hook must 18.402 +start with the text ``\texttt{python:}'', and continue with the 18.403 +fully-qualified name of a callable object to use as the hook's value. 18.404 + 18.405 +The module in which a hook lives is automatically imported when a hook 18.406 +is run. So long as you have the module name and \envar{PYTHONPATH} 18.407 +right, it should ``just work''. 18.408 + 18.409 +The following \hgrc\ example snippet illustrates the syntax and 18.410 +meaning of the notions we just described. 18.411 +\begin{codesample2} 18.412 + [hooks] 18.413 + commit.example = python:mymodule.submodule.myhook 18.414 +\end{codesample2} 18.415 +When Mercurial runs the \texttt{commit.example} hook, it imports 18.416 +\texttt{mymodule.submodule}, looks for the callable object named 18.417 +\texttt{myhook}, and calls it. 18.418 + 18.419 +\subsection{Writing an in-process hook} 18.420 + 18.421 +The simplest in-process hook does nothing, but illustrates the basic 18.422 +shape of the hook API: 18.423 +\begin{codesample2} 18.424 + def myhook(ui, repo, **kwargs): 18.425 + pass 18.426 +\end{codesample2} 18.427 +The first argument to a Python hook is always a 18.428 +\pymodclass{mercurial.ui}{ui} object. The second is a repository object; 18.429 +at the moment, it is always an instance of 18.430 +\pymodclass{mercurial.localrepo}{localrepository}. Following these two 18.431 +arguments are other keyword arguments. 
Which ones are passed in 18.432 +depends on the hook being called, but a hook can ignore arguments it 18.433 +doesn't care about by dropping them into a keyword argument dict, as 18.434 +with \texttt{**kwargs} above. 18.435 + 18.436 +\section{Some hook examples} 18.437 + 18.438 +\subsection{Writing meaningful commit messages} 18.439 + 18.440 +It's hard to imagine a useful commit message being very short. The 18.441 +simple \hook{pretxncommit} hook of figure~\ref{ex:hook:msglen.go} 18.442 +will prevent you from committing a changeset with a message that is 18.443 +fewer than ten bytes long. 18.444 + 18.445 +\begin{figure}[ht] 18.446 + \interaction{hook.msglen.go} 18.447 + \caption{A hook that forbids overly short commit messages} 18.448 + \label{ex:hook:msglen.go} 18.449 +\end{figure} 18.450 + 18.451 +\subsection{Checking for trailing whitespace} 18.452 + 18.453 +An interesting use of a commit-related hook is to help you to write 18.454 +cleaner code. A simple example of ``cleaner code'' is the dictum that 18.455 +a change should not add any new lines of text that contain ``trailing 18.456 +whitespace''. Trailing whitespace is a series of space and tab 18.457 +characters at the end of a line of text. In most cases, trailing 18.458 +whitespace is unnecessary, invisible noise, but it is occasionally 18.459 +problematic, and people often prefer to get rid of it. 18.460 + 18.461 +You can use either the \hook{precommit} or \hook{pretxncommit} hook to 18.462 +tell whether you have a trailing whitespace problem. If you use the 18.463 +\hook{precommit} hook, the hook will not know which files you are 18.464 +committing, so it will have to check every modified file in the 18.465 +repository for trailing whitespace.
If you want to commit a change 18.466 +to just the file \filename{foo}, but the file \filename{bar} contains 18.467 +trailing whitespace, doing a check in the \hook{precommit} hook will 18.468 +prevent you from committing \filename{foo} due to the problem with 18.469 +\filename{bar}. This doesn't seem right. 18.470 + 18.471 +Should you choose the \hook{pretxncommit} hook, the check won't occur 18.472 +until just before the transaction for the commit completes. This will 18.473 +allow you to check for problems in only the exact files that are being 18.474 +committed. However, if you entered the commit message interactively 18.475 +and the hook fails, the transaction will roll back; you'll have to 18.476 +re-enter the commit message after you fix the trailing whitespace and 18.477 +run \hgcmd{commit} again. 18.478 + 18.479 +\begin{figure}[ht] 18.480 + \interaction{hook.ws.simple} 18.481 + \caption{A simple hook that checks for trailing whitespace} 18.482 + \label{ex:hook:ws.simple} 18.483 +\end{figure} 18.484 + 18.485 +Figure~\ref{ex:hook:ws.simple} introduces a simple \hook{pretxncommit} 18.486 +hook that checks for trailing whitespace. This hook is short, but not 18.487 +very helpful. It exits with an error status if a change adds a line 18.488 +with trailing whitespace to any file, but does not print any 18.489 +information that might help us to identify the offending file or 18.490 +line. On the other hand, it has the nice property of not paying attention to 18.491 +unmodified lines; only lines that introduce new trailing whitespace 18.492 +cause problems. 18.493 + 18.494 +\begin{figure}[ht] 18.495 + \interaction{hook.ws.better} 18.496 + \caption{A better trailing whitespace hook} 18.497 + \label{ex:hook:ws.better} 18.498 +\end{figure} 18.499 + 18.500 +The example of figure~\ref{ex:hook:ws.better} is much more complex, 18.501 +but also more useful.
It parses a unified diff to see if any lines 18.502 +add trailing whitespace, and prints the name of the file and the line 18.503 +number of each such occurrence. Even better, if the change adds 18.504 +trailing whitespace, this hook saves the commit comment and prints the 18.505 +name of the save file before exiting and telling Mercurial to roll the 18.506 +transaction back, so you can use 18.507 +\hgcmdargs{commit}{\hgopt{commit}{-l}~\emph{filename}} to reuse the 18.508 +saved commit message once you've corrected the problem. 18.509 + 18.510 +As a final aside, note in figure~\ref{ex:hook:ws.better} the use of 18.511 +\command{perl}'s in-place editing feature to get rid of trailing 18.512 +whitespace from a file. This is concise and useful enough that I will 18.513 +reproduce it here. 18.514 +\begin{codesample2} 18.515 + perl -pi -e 's,\textbackslash{}s+\$,,' filename 18.516 +\end{codesample2} 18.517 + 18.518 +\section{Bundled hooks} 18.519 + 18.520 +Mercurial ships with several bundled hooks. You can find them in the 18.521 +\dirname{hgext} directory of a Mercurial source tree. If you are 18.522 +using a Mercurial binary package, the hooks will be located in the 18.523 +\dirname{hgext} directory of wherever your package installer put 18.524 +Mercurial. 18.525 + 18.526 +\subsection{\hgext{acl}---access control for parts of a repository} 18.527 + 18.528 +The \hgext{acl} extension lets you control which remote users are 18.529 +allowed to push changesets to a networked server. You can protect any 18.530 +portion of a repository (including the entire repo), so that a 18.531 +specific remote user can push changes that do not affect the protected 18.532 +portion. 18.533 + 18.534 +This extension implements access control based on the identity of the 18.535 +user performing a push, \emph{not} on who committed the changesets 18.536 +they're pushing. 
It makes sense to use this hook only if you have a 18.537 +locked-down server environment that authenticates remote users, and 18.538 +you want to be sure that only specific users are allowed to push 18.539 +changes to that server. 18.540 + 18.541 +\subsubsection{Configuring the \hook{acl} hook} 18.542 + 18.543 +In order to manage incoming changesets, the \hgext{acl} hook must be 18.544 +used as a \hook{pretxnchangegroup} hook. This lets it see which files 18.545 +are modified by each incoming changeset, and roll back a group of 18.546 +changesets if they modify ``forbidden'' files. Example: 18.547 +\begin{codesample2} 18.548 + [hooks] 18.549 + pretxnchangegroup.acl = python:hgext.acl.hook 18.550 +\end{codesample2} 18.551 + 18.552 +The \hgext{acl} extension is configured using three sections. 18.553 + 18.554 +The \rcsection{acl} section has only one entry, \rcitem{acl}{sources}, 18.555 +which lists the sources of incoming changesets that the hook should 18.556 +pay attention to. You don't normally need to configure this section. 18.557 +\begin{itemize} 18.558 +\item[\rcitem{acl}{serve}] Control incoming changesets that are arriving 18.559 + from a remote repository over http or ssh. This is the default 18.560 + value of \rcitem{acl}{sources}, and usually the only setting you'll 18.561 + need for this configuration item. 18.562 +\item[\rcitem{acl}{pull}] Control incoming changesets that are 18.563 + arriving via a pull from a local repository. 18.564 +\item[\rcitem{acl}{push}] Control incoming changesets that are 18.565 + arriving via a push from a local repository. 18.566 +\item[\rcitem{acl}{bundle}] Control incoming changesets that are 18.567 + arriving from another repository via a bundle. 18.568 +\end{itemize} 18.569 + 18.570 +The \rcsection{acl.allow} section controls the users that are allowed to 18.571 +add changesets to the repository. If this section is not present, all 18.572 +users that are not explicitly denied are allowed. 
If this section is 18.573 +present, all users that are not explicitly allowed are denied (so an 18.574 +empty section means that all users are denied). 18.575 + 18.576 +The \rcsection{acl.deny} section determines which users are denied 18.577 +from adding changesets to the repository. If this section is not 18.578 +present or is empty, no users are denied. 18.579 + 18.580 +The syntaxes for the \rcsection{acl.allow} and \rcsection{acl.deny} 18.581 +sections are identical. On the left of each entry is a glob pattern 18.582 +that matches files or directories, relative to the root of the 18.583 +repository; on the right, a user name. 18.584 + 18.585 +In the following example, the user \texttt{docwriter} can only push 18.586 +changes to the \dirname{docs} subtree of the repository, while 18.587 +\texttt{intern} can push changes to any file or directory except 18.588 +\dirname{source/sensitive}. 18.589 +\begin{codesample2} 18.590 + [acl.allow] 18.591 + docs/** = docwriter 18.592 + 18.593 + [acl.deny] 18.594 + source/sensitive/** = intern 18.595 +\end{codesample2} 18.596 + 18.597 +\subsubsection{Testing and troubleshooting} 18.598 + 18.599 +If you want to test the \hgext{acl} hook, run it with Mercurial's 18.600 +debugging output enabled. Since you'll probably be running it on a 18.601 +server where it's not convenient (or sometimes possible) to pass in 18.602 +the \hggopt{--debug} option, don't forget that you can enable 18.603 +debugging output in your \hgrc: 18.604 +\begin{codesample2} 18.605 + [ui] 18.606 + debug = true 18.607 +\end{codesample2} 18.608 +With this enabled, the \hgext{acl} hook will print enough information 18.609 +to let you figure out why it is allowing or forbidding pushes from 18.610 +specific users. 18.611 + 18.612 +\subsection{\hgext{bugzilla}---integration with Bugzilla} 18.613 + 18.614 +The \hgext{bugzilla} extension adds a comment to a Bugzilla bug 18.615 +whenever it finds a reference to that bug ID in a commit comment. 
You 18.616 +can install this hook on a shared server, so that any time a remote 18.617 +user pushes changes to this server, the hook gets run. 18.618 + 18.619 +It adds a comment to the bug that looks like this (you can configure 18.620 +the contents of the comment---see below): 18.621 +\begin{codesample2} 18.622 + Changeset aad8b264143a, made by Joe User <joe.user@domain.com> in 18.623 + the frobnitz repository, refers to this bug. 18.624 + 18.625 + For complete details, see 18.626 + http://hg.domain.com/frobnitz?cmd=changeset;node=aad8b264143a 18.627 + 18.628 + Changeset description: 18.629 + Fix bug 10483 by guarding against some NULL pointers 18.630 +\end{codesample2} 18.631 +The value of this hook is that it automates the process of updating a 18.632 +bug any time a changeset refers to it. If you configure the hook 18.633 +properly, it makes it easy for people to browse straight from a 18.634 +Bugzilla bug to a changeset that refers to that bug. 18.635 + 18.636 +You can use the code in this hook as a starting point for some more 18.637 +exotic Bugzilla integration recipes. Here are a few possibilities: 18.638 +\begin{itemize} 18.639 +\item Require that every changeset pushed to the server have a valid 18.640 + bug~ID in its commit comment. In this case, you'd want to configure 18.641 + the hook as a \hook{pretxncommit} hook. This would allow the hook 18.642 + to reject changes that didn't contain bug IDs. 18.643 +\item Allow incoming changesets to automatically modify the 18.644 + \emph{state} of a bug, as well as simply adding a comment. For 18.645 + example, the hook could recognise the string ``fixed bug 31337'' as 18.646 + indicating that it should update the state of bug 31337 to 18.647 + ``requires testing''. 
18.648 +\end{itemize} 18.649 + 18.650 +\subsubsection{Configuring the \hook{bugzilla} hook} 18.651 +\label{sec:hook:bugzilla:config} 18.652 + 18.653 +You should configure this hook in your server's \hgrc\ as an 18.654 +\hook{incoming} hook, for example as follows: 18.655 +\begin{codesample2} 18.656 + [hooks] 18.657 + incoming.bugzilla = python:hgext.bugzilla.hook 18.658 +\end{codesample2} 18.659 + 18.660 +Because of the specialised nature of this hook, and because Bugzilla 18.661 +was not written with this kind of integration in mind, configuring 18.662 +this hook is a somewhat involved process. 18.663 + 18.664 +Before you begin, you must install the MySQL bindings for Python on 18.665 +the host(s) where you'll be running the hook. If this is not 18.666 +available as a binary package for your system, you can download it 18.667 +from~\cite{web:mysql-python}. 18.668 + 18.669 +Configuration information for this hook lives in the 18.670 +\rcsection{bugzilla} section of your \hgrc. 18.671 +\begin{itemize} 18.672 +\item[\rcitem{bugzilla}{version}] The version of Bugzilla installed on 18.673 + the server. The database schema that Bugzilla uses changes 18.674 + occasionally, so this hook has to know exactly which schema to use. 18.675 + At the moment, the only version supported is \texttt{2.16}. 18.676 +\item[\rcitem{bugzilla}{host}] The hostname of the MySQL server that 18.677 + stores your Bugzilla data. The database must be configured to allow 18.678 + connections from whatever host you are running the \hook{bugzilla} 18.679 + hook on. 18.680 +\item[\rcitem{bugzilla}{user}] The username with which to connect to 18.681 + the MySQL server. The database must be configured to allow this 18.682 + user to connect from whatever host you are running the 18.683 + \hook{bugzilla} hook on. This user must be able to access and 18.684 + modify Bugzilla tables. 
The default value of this item is 18.685 + \texttt{bugs}, which is the standard name of the Bugzilla user in a 18.686 + MySQL database. 18.687 +\item[\rcitem{bugzilla}{password}] The MySQL password for the user you 18.688 + configured above. This is stored as plain text, so you should make 18.689 + sure that unauthorised users cannot read the \hgrc\ file where you 18.690 + store this information. 18.691 +\item[\rcitem{bugzilla}{db}] The name of the Bugzilla database on the 18.692 + MySQL server. The default value of this item is \texttt{bugs}, 18.693 + which is the standard name of the MySQL database where Bugzilla 18.694 + stores its data. 18.695 +\item[\rcitem{bugzilla}{notify}] If you want Bugzilla to send out a 18.696 + notification email to subscribers after this hook has added a 18.697 + comment to a bug, you will need this hook to run a command whenever 18.698 + it updates the database. The command to run depends on where you 18.699 + have installed Bugzilla, but it will typically look something like 18.700 + this, if you have Bugzilla installed in 18.701 + \dirname{/var/www/html/bugzilla}: 18.702 + \begin{codesample4} 18.703 + cd /var/www/html/bugzilla && ./processmail %s nobody@nowhere.com 18.704 + \end{codesample4} 18.705 + The Bugzilla \texttt{processmail} program expects to be given a 18.706 + bug~ID (the hook replaces ``\texttt{\%s}'' with the bug~ID) and an 18.707 + email address. It also expects to be able to write to some files in 18.708 + the directory that it runs in. If Bugzilla and this hook are not 18.709 + installed on the same machine, you will need to find a way to run 18.710 + \texttt{processmail} on the server where Bugzilla is installed. 18.711 +\end{itemize} 18.712 + 18.713 +\subsubsection{Mapping committer names to Bugzilla user names} 18.714 + 18.715 +By default, the \hgext{bugzilla} hook tries to use the email address 18.716 +of a changeset's committer as the Bugzilla user name with which to 18.717 +update a bug. 
If this does not suit your needs, you can map committer 18.718 +email addresses to Bugzilla user names using a \rcsection{usermap} 18.719 +section. 18.720 + 18.721 +Each item in the \rcsection{usermap} section contains an email address 18.722 +on the left, and a Bugzilla user name on the right. 18.723 +\begin{codesample2} 18.724 + [usermap] 18.725 + jane.user@example.com = jane 18.726 +\end{codesample2} 18.727 +You can either keep the \rcsection{usermap} data in a normal \hgrc, or 18.728 +tell the \hgext{bugzilla} hook to read the information from an 18.729 +external \filename{usermap} file. In the latter case, you can store 18.730 +\filename{usermap} data by itself in (for example) a user-modifiable 18.731 +repository. This makes it possible to let your users maintain their 18.732 +own \rcitem{bugzilla}{usermap} entries. The main \hgrc\ file might 18.733 +look like this: 18.734 +\begin{codesample2} 18.735 + # regular hgrc file refers to external usermap file 18.736 + [bugzilla] 18.737 + usermap = /home/hg/repos/userdata/bugzilla-usermap.conf 18.738 +\end{codesample2} 18.739 +While the \filename{usermap} file that it refers to might look like 18.740 +this: 18.741 +\begin{codesample2} 18.742 + # bugzilla-usermap.conf - inside a hg repository 18.743 + [usermap] 18.744 + stephanie@example.com = steph 18.745 +\end{codesample2} 18.746 + 18.747 +\subsubsection{Configuring the text that gets added to a bug} 18.748 + 18.749 +You can configure the text that this hook adds as a comment; you 18.750 +specify it in the form of a Mercurial template. Several \hgrc\ 18.751 +entries (still in the \rcsection{bugzilla} section) control this 18.752 +behaviour. 18.753 +\begin{itemize} 18.754 +\item[\texttt{strip}] The number of leading path elements to strip 18.755 + from a repository's path name to construct a partial path for a URL. 
18.756 + For example, if the repositories on your server live under 18.757 + \dirname{/home/hg/repos}, and you have a repository whose path is 18.758 + \dirname{/home/hg/repos/app/tests}, then setting \texttt{strip} to 18.759 + \texttt{4} will give a partial path of \dirname{app/tests}. The 18.760 + hook will make this partial path available when expanding a 18.761 + template, as \texttt{webroot}. 18.762 +\item[\texttt{template}] The text of the template to use. In addition 18.763 + to the usual changeset-related variables, this template can use 18.764 + \texttt{hgweb} (the value of the \texttt{hgweb} configuration item 18.765 + above) and \texttt{webroot} (the path constructed using 18.766 + \texttt{strip} above). 18.767 +\end{itemize} 18.768 + 18.769 +In addition, you can add a \rcitem{web}{baseurl} item to the 18.770 +\rcsection{web} section of your \hgrc. The \hgext{bugzilla} hook will 18.771 +make this available when expanding a template, as the base string to 18.772 +use when constructing a URL that will let users browse from a Bugzilla 18.773 +comment to view a changeset. Example: 18.774 +\begin{codesample2} 18.775 + [web] 18.776 + baseurl = http://hg.domain.com/ 18.777 +\end{codesample2} 18.778 + 18.779 +Here is an example set of \hgext{bugzilla} hook config information. 
18.780 +\begin{codesample2} 18.781 + [bugzilla] 18.782 + host = bugzilla.example.com 18.783 + password = mypassword 18.784 + version = 2.16 18.785 + # server-side repos live in /home/hg/repos, so strip 4 leading 18.786 + # separators 18.787 + strip = 4 18.788 + hgweb = http://hg.example.com/ 18.789 + usermap = /home/hg/repos/notify/bugzilla.conf 18.790 + template = Changeset \{node|short\}, made by \{author\} in the \{webroot\} 18.791 + repo, refers to this bug.\\nFor complete details, see 18.792 + \{hgweb\}\{webroot\}?cmd=changeset;node=\{node|short\}\\nChangeset 18.793 + description:\\n\\t\{desc|tabindent\} 18.794 +\end{codesample2} 18.795 + 18.796 +\subsubsection{Testing and troubleshooting} 18.797 + 18.798 +The most common problems with configuring the \hgext{bugzilla} hook 18.799 +relate to running Bugzilla's \filename{processmail} script and mapping 18.800 +committer names to user names. 18.801 + 18.802 +Recall from section~\ref{sec:hook:bugzilla:config} above that the user 18.803 +that runs the Mercurial process on the server is also the one that 18.804 +will run the \filename{processmail} script. The 18.805 +\filename{processmail} script sometimes causes Bugzilla to write to 18.806 +files in its configuration directory, and Bugzilla's configuration 18.807 +files are usually owned by the user that your web server runs under. 18.808 + 18.809 +You can cause \filename{processmail} to be run with the suitable 18.810 +user's identity using the \command{sudo} command. Here is an example 18.811 +entry for a \filename{sudoers} file. 18.812 +\begin{codesample2} 18.813 + hg_user = (httpd_user) NOPASSWD: /var/www/html/bugzilla/processmail-wrapper %s 18.814 +\end{codesample2} 18.815 +This allows the \texttt{hg\_user} user to run a 18.816 +\filename{processmail-wrapper} program under the identity of 18.817 +\texttt{httpd\_user}. 
18.818 + 18.819 +This indirection through a wrapper script is necessary, because 18.820 +\filename{processmail} expects to be run with its current directory 18.821 +set to wherever you installed Bugzilla; you can't specify that kind of 18.822 +constraint in a \filename{sudoers} file. The contents of the wrapper 18.823 +script are simple: 18.824 +\begin{codesample2} 18.825 + #!/bin/sh 18.826 + cd `dirname $0` && ./processmail "$1" nobody@example.com 18.827 +\end{codesample2} 18.828 +It doesn't seem to matter what email address you pass to 18.829 +\filename{processmail}. 18.830 + 18.831 +If your \rcsection{usermap} is not set up correctly, users will see an 18.832 +error message from the \hgext{bugzilla} hook when they push changes 18.833 +to the server. The error message will look like this: 18.834 +\begin{codesample2} 18.835 + cannot find bugzilla user id for john.q.public@example.com 18.836 +\end{codesample2} 18.837 +What this means is that the committer's address, 18.838 +\texttt{john.q.public@example.com}, is not a valid Bugzilla user name, 18.839 +nor does it have an entry in your \rcsection{usermap} that maps it to 18.840 +a valid Bugzilla user name. 18.841 + 18.842 +\subsection{\hgext{notify}---send email notifications} 18.843 + 18.844 +Although Mercurial's built-in web server provides RSS feeds of changes 18.845 +in every repository, many people prefer to receive change 18.846 +notifications via email. The \hgext{notify} hook lets you send out 18.847 +notifications to a set of email addresses whenever changesets arrive 18.848 +that those subscribers are interested in. 18.849 + 18.850 +As with the \hgext{bugzilla} hook, the \hgext{notify} hook is 18.851 +template-driven, so you can customise the contents of the notification 18.852 +messages that it sends. 18.853 + 18.854 +By default, the \hgext{notify} hook includes a diff of every changeset 18.855 +that it sends out; you can limit the size of the diff, or turn this 18.856 +feature off entirely. 
It is useful for letting subscribers review 18.857 +changes immediately, rather than clicking to follow a URL. 18.858 + 18.859 +\subsubsection{Configuring the \hgext{notify} hook} 18.860 + 18.861 +You can set up the \hgext{notify} hook to send one email message per 18.862 +incoming changeset, or one per incoming group of changesets (all those 18.863 +that arrived in a single pull or push). 18.864 +\begin{codesample2} 18.865 + [hooks] 18.866 + # send one email per group of changes 18.867 + changegroup.notify = python:hgext.notify.hook 18.868 + # send one email per change 18.869 + incoming.notify = python:hgext.notify.hook 18.870 +\end{codesample2} 18.871 + 18.872 +Configuration information for this hook lives in the 18.873 +\rcsection{notify} section of a \hgrc\ file. 18.874 +\begin{itemize} 18.875 +\item[\rcitem{notify}{test}] By default, this hook does not send out 18.876 + email at all; instead, it prints the message that it \emph{would} 18.877 + send. Set this item to \texttt{false} to allow email to be sent. 18.878 + The reason that sending of email is turned off by default is that it 18.879 + takes several tries to configure this extension exactly as you would 18.880 + like, and it would be bad form to spam subscribers with a number of 18.881 + ``broken'' notifications while you debug your configuration. 18.882 +\item[\rcitem{notify}{config}] The path to a configuration file that 18.883 + contains subscription information. This is kept separate from the 18.884 + main \hgrc\ so that you can maintain it in a repository of its own. 18.885 + People can then clone that repository, update their subscriptions, 18.886 + and push the changes back to your server. 18.887 +\item[\rcitem{notify}{strip}] The number of leading path separator 18.888 + characters to strip from a repository's path, when deciding whether 18.889 + a repository has subscribers. 
For example, if the repositories on 18.890 + your server live in \dirname{/home/hg/repos}, and \hgext{notify} is 18.891 + considering a repository named \dirname{/home/hg/repos/shared/test}, 18.892 + setting \rcitem{notify}{strip} to \texttt{4} will cause 18.893 + \hgext{notify} to trim the path it considers down to 18.894 + \dirname{shared/test}, and it will match subscribers against that. 18.895 +\item[\rcitem{notify}{template}] The template text to use when sending 18.896 + messages. This specifies both the contents of the message header 18.897 + and its body. 18.898 +\item[\rcitem{notify}{maxdiff}] The maximum number of lines of diff 18.899 + data to append to the end of a message. If a diff is longer than 18.900 + this, it is truncated. By default, this is set to 300. Set this to 18.901 + \texttt{0} to omit diffs from notification emails. 18.902 +\item[\rcitem{notify}{sources}] A list of sources of changesets to 18.903 + consider. This lets you limit \hgext{notify} to only sending out 18.904 + email about changes that remote users pushed into this repository 18.905 + via a server, for example. See section~\ref{sec:hook:sources} for 18.906 + the sources you can specify here. 18.907 +\end{itemize} 18.908 + 18.909 +If you set the \rcitem{web}{baseurl} item in the \rcsection{web} 18.910 +section, you can use it in a template; it will be available as 18.911 +\texttt{webroot}. 18.912 + 18.913 +Here is an example set of \hgext{notify} configuration information. 
18.914 +\begin{codesample2} 18.915 + [notify] 18.916 + # really send email 18.917 + test = false 18.918 + # subscriber data lives in the notify repo 18.919 + config = /home/hg/repos/notify/notify.conf 18.920 + # repos live in /home/hg/repos on server, so strip 4 "/" chars 18.921 + strip = 4 18.922 + template = X-Hg-Repo: \{webroot\} 18.923 + Subject: \{webroot\}: \{desc|firstline|strip\} 18.924 + From: \{author\} 18.925 + 18.926 + changeset \{node|short\} in \{root\} 18.927 + details: \{baseurl\}\{webroot\}?cmd=changeset;node=\{node|short\} 18.928 + description: 18.929 + \{desc|tabindent|strip\} 18.930 + 18.931 + [web] 18.932 + baseurl = http://hg.example.com/ 18.933 +\end{codesample2} 18.934 + 18.935 +This will produce a message that looks like the following: 18.936 +\begin{codesample2} 18.937 + X-Hg-Repo: tests/slave 18.938 + Subject: tests/slave: Handle error case when slave has no buffers 18.939 + Date: Wed, 2 Aug 2006 15:25:46 -0700 (PDT) 18.940 + 18.941 + changeset 3cba9bfe74b5 in /home/hg/repos/tests/slave 18.942 + details: http://hg.example.com/tests/slave?cmd=changeset;node=3cba9bfe74b5 18.943 + description: 18.944 + Handle error case when slave has no buffers 18.945 + diffs (54 lines): 18.946 + 18.947 + diff -r 9d95df7cf2ad -r 3cba9bfe74b5 include/tests.h 18.948 + --- a/include/tests.h Wed Aug 02 15:19:52 2006 -0700 18.949 + +++ b/include/tests.h Wed Aug 02 15:25:26 2006 -0700 18.950 + @@ -212,6 +212,15 @@ static __inline__ void test_headers(void *h) 18.951 + [...snip...] 18.952 +\end{codesample2} 18.953 + 18.954 +\subsubsection{Testing and troubleshooting} 18.955 + 18.956 +Do not forget that by default, the \hgext{notify} extension \emph{will 18.957 + not send any mail} until you explicitly configure it to do so, by 18.958 +setting \rcitem{notify}{test} to \texttt{false}. Until you do that, 18.959 +it simply prints the message it \emph{would} send. 
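
The effect of the \rcitem{notify}{strip} item described above (and of the \texttt{strip} item of the \hgext{bugzilla} hook) can be sketched in a few lines of Python. This is only an illustration, assuming that paths use \texttt{/} as the separator; the function name \texttt{strip\_leading} is hypothetical and is not taken from either extension.

```python
def strip_leading(path, count):
    """Drop `count` leading '/'-terminated pieces from `path`.

    Hypothetical helper for illustration only; it is not the code
    that hgext.notify or hgext.bugzilla actually runs.
    """
    while count > 0:
        index = path.find("/")
        if index == -1:
            break  # fewer separators than requested: stop early
        path = path[index + 1:]
        count -= 1
    return path


# Repositories live under /home/hg/repos, so four separators are removed.
print(strip_leading("/home/hg/repos/shared/test", 4))  # prints shared/test
```

Counting separators rather than directory names explains why a value of \texttt{4} is needed for \dirname{/home/hg/repos}: the leading \texttt{/} counts as the first separator.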
18.960 + 18.961 +\section{Information for writers of hooks} 18.962 +\label{sec:hook:ref} 18.963 + 18.964 +\subsection{In-process hook execution} 18.965 + 18.966 +An in-process hook is called with arguments of the following form: 18.967 +\begin{codesample2} 18.968 + def myhook(ui, repo, **kwargs): 18.969 + pass 18.970 +\end{codesample2} 18.971 +The \texttt{ui} parameter is a \pymodclass{mercurial.ui}{ui} object. 18.972 +The \texttt{repo} parameter is a 18.973 +\pymodclass{mercurial.localrepo}{localrepository} object. The 18.974 +names and values of the \texttt{**kwargs} parameters depend on the 18.975 +hook being invoked, with the following common features: 18.976 +\begin{itemize} 18.977 +\item If a parameter is named \texttt{node} or 18.978 + \texttt{parent\emph{N}}, it will contain a hexadecimal changeset ID. 18.979 + The empty string is used to represent ``null changeset ID'' instead 18.980 + of a string of zeroes. 18.981 +\item If a parameter is named \texttt{url}, it will contain the URL of 18.982 + a remote repository, if that can be determined. 18.983 +\item Boolean-valued parameters are represented as Python 18.984 + \texttt{bool} objects. 18.985 +\end{itemize} 18.986 + 18.987 +An in-process hook is called without a change to the process's working 18.988 +directory (unlike external hooks, which are run in the root of the 18.989 +repository). It must not change the process's working directory, or 18.990 +it will cause any calls it makes into the Mercurial API to fail. 18.991 + 18.992 +If a hook returns a boolean ``false'' value, it is considered to have 18.993 +succeeded. If it returns a boolean ``true'' value or raises an 18.994 +exception, it is considered to have failed. A useful way to think of 18.995 +the calling convention is ``tell me if you fail''. 18.996 + 18.997 +Note that changeset IDs are passed into Python hooks as hexadecimal 18.998 +strings, not the binary hashes that Mercurial's APIs normally use. 
To 18.999 +convert a hash from hex to binary, use the 18.1000 +\pymodfunc{mercurial.node}{bin} function. 18.1001 + 18.1002 +\subsection{External hook execution} 18.1003 + 18.1004 +An external hook is passed to the shell of the user running Mercurial. 18.1005 +Features of that shell, such as variable substitution and command 18.1006 +redirection, are available. The hook is run in the root directory of 18.1007 +the repository (unlike in-process hooks, which are run in the same 18.1008 +directory that Mercurial was run in). 18.1009 + 18.1010 +Hook parameters are passed to the hook as environment variables. Each 18.1011 +parameter's name is converted to upper case and prefixed 18.1012 +with the string ``\texttt{HG\_}''. For example, if the name of a 18.1013 +parameter is ``\texttt{node}'', the name of the environment variable 18.1014 +representing that parameter will be ``\texttt{HG\_NODE}''. 18.1015 + 18.1016 +A boolean parameter is represented as the string ``\texttt{1}'' for 18.1017 +``true'' and ``\texttt{0}'' for ``false''. If an environment variable is 18.1018 +named \envar{HG\_NODE}, \envar{HG\_PARENT1} or \envar{HG\_PARENT2}, it 18.1019 +contains a changeset ID represented as a hexadecimal string. The 18.1020 +empty string is used to represent ``null changeset ID'' instead of a 18.1021 +string of zeroes. If an environment variable is named 18.1022 +\envar{HG\_URL}, it will contain the URL of a remote repository, if 18.1023 +that can be determined. 18.1024 + 18.1025 +If a hook exits with a status of zero, it is considered to have 18.1026 +succeeded. If it exits with a non-zero status, it is considered to 18.1027 +have failed. 18.1028 + 18.1029 +\subsection{Finding out where changesets come from} 18.1030 + 18.1031 +A hook that involves the transfer of changesets between a local 18.1032 +repository and another may be able to find out information about the 18.1033 +``far side''. 
Mercurial knows \emph{how} changes are being 18.1034 +transferred, and in many cases \emph{where} they are being transferred 18.1035 +to or from. 18.1036 + 18.1037 +\subsubsection{Sources of changesets} 18.1038 +\label{sec:hook:sources} 18.1039 + 18.1040 +Mercurial will tell a hook what means are, or were, used to transfer 18.1041 +changesets between repositories. This is provided by Mercurial in a 18.1042 +Python parameter named \texttt{source}, or an environment variable named 18.1043 +\envar{HG\_SOURCE}. 18.1044 + 18.1045 +\begin{itemize} 18.1046 +\item[\texttt{serve}] Changesets are transferred to or from a remote 18.1047 + repository over http or ssh. 18.1048 +\item[\texttt{pull}] Changesets are being transferred via a pull from 18.1049 + one repository into another. 18.1050 +\item[\texttt{push}] Changesets are being transferred via a push from 18.1051 + one repository into another. 18.1052 +\item[\texttt{bundle}] Changesets are being transferred to or from a 18.1053 + bundle. 18.1054 +\end{itemize} 18.1055 + 18.1056 +\subsubsection{Where changes are going---remote repository URLs} 18.1057 +\label{sec:hook:url} 18.1058 + 18.1059 +When possible, Mercurial will tell a hook the location of the ``far 18.1060 +side'' of an activity that transfers changeset data between 18.1061 +repositories. This is provided by Mercurial in a Python parameter 18.1062 +named \texttt{url}, or an environment variable named \envar{HG\_URL}. 18.1063 + 18.1064 +This information is not always known. If a hook is invoked in a 18.1065 +repository that is being served via http or ssh, Mercurial cannot tell 18.1066 +where the remote repository is, but it may know where the client is 18.1067 +connecting from. In such cases, the URL will take one of the 18.1068 +following forms: 18.1069 +\begin{itemize} 18.1070 +\item \texttt{remote:ssh:\emph{ip-address}}---remote ssh client, at 18.1071 + the given IP address. 
18.1072 +\item \texttt{remote:http:\emph{ip-address}}---remote http client, at 18.1073 + the given IP address. If the client is using SSL, this will be of 18.1074 + the form \texttt{remote:https:\emph{ip-address}}. 18.1075 +\item Empty---no information could be discovered about the remote 18.1076 + client. 18.1077 +\end{itemize} 18.1078 + 18.1079 +\section{Hook reference} 18.1080 + 18.1081 +\subsection{\hook{changegroup}---after remote changesets added} 18.1082 +\label{sec:hook:changegroup} 18.1083 + 18.1084 +This hook is run after a group of pre-existing changesets has been 18.1085 +added to the repository, for example via a \hgcmd{pull} or 18.1086 +\hgcmd{unbundle}. This hook is run once per operation that added one 18.1087 +or more changesets. This is in contrast to the \hook{incoming} hook, 18.1088 +which is run once per changeset, regardless of whether the changesets 18.1089 +arrive in a group. 18.1090 + 18.1091 +Some possible uses for this hook include kicking off an automated 18.1092 +build or test of the added changesets, updating a bug database, or 18.1093 +notifying subscribers that a repository contains new changes. 18.1094 + 18.1095 +Parameters to this hook: 18.1096 +\begin{itemize} 18.1097 +\item[\texttt{node}] A changeset ID. The changeset ID of the first 18.1098 + changeset in the group that was added. All changesets between this 18.1099 + and \index{tags!\texttt{tip}}\texttt{tip}, inclusive, were added by 18.1100 + a single \hgcmd{pull}, \hgcmd{push} or \hgcmd{unbundle}. 18.1101 +\item[\texttt{source}] A string. The source of these changes. See 18.1102 + section~\ref{sec:hook:sources} for details. 18.1103 +\item[\texttt{url}] A URL. The location of the remote repository, if 18.1104 + known. See section~\ref{sec:hook:url} for more information. 
18.1105 +\end{itemize} 18.1106 + 18.1107 +See also: \hook{incoming} (section~\ref{sec:hook:incoming}), 18.1108 +\hook{prechangegroup} (section~\ref{sec:hook:prechangegroup}), 18.1109 +\hook{pretxnchangegroup} (section~\ref{sec:hook:pretxnchangegroup}) 18.1110 + 18.1111 +\subsection{\hook{commit}---after a new changeset is created} 18.1112 +\label{sec:hook:commit} 18.1113 + 18.1114 +This hook is run after a new changeset has been created. 18.1115 + 18.1116 +Parameters to this hook: 18.1117 +\begin{itemize} 18.1118 +\item[\texttt{node}] A changeset ID. The changeset ID of the newly 18.1119 + committed changeset. 18.1120 +\item[\texttt{parent1}] A changeset ID. The changeset ID of the first 18.1121 + parent of the newly committed changeset. 18.1122 +\item[\texttt{parent2}] A changeset ID. The changeset ID of the second 18.1123 + parent of the newly committed changeset. 18.1124 +\end{itemize} 18.1125 + 18.1126 +See also: \hook{precommit} (section~\ref{sec:hook:precommit}), 18.1127 +\hook{pretxncommit} (section~\ref{sec:hook:pretxncommit}) 18.1128 + 18.1129 +\subsection{\hook{incoming}---after one remote changeset is added} 18.1130 +\label{sec:hook:incoming} 18.1131 + 18.1132 +This hook is run after a pre-existing changeset has been added to the 18.1133 +repository, for example via a \hgcmd{push}. If a group of changesets 18.1134 +was added in a single operation, this hook is called once for each 18.1135 +added changeset. 18.1136 + 18.1137 +You can use this hook for the same purposes as the \hook{changegroup} 18.1138 +hook (section~\ref{sec:hook:changegroup}); it's simply more convenient 18.1139 +sometimes to run a hook once per group of changesets, while other 18.1140 +times it's handier once per changeset. 18.1141 + 18.1142 +Parameters to this hook: 18.1143 +\begin{itemize} 18.1144 +\item[\texttt{node}] A changeset ID. The ID of the newly added 18.1145 + changeset. 18.1146 +\item[\texttt{source}] A string. The source of these changes. 
See 18.1147 + section~\ref{sec:hook:sources} for details. 18.1148 +\item[\texttt{url}] A URL. The location of the remote repository, if 18.1149 + known. See section~\ref{sec:hook:url} for more information. 18.1150 +\end{itemize} 18.1151 + 18.1152 +See also: \hook{changegroup} (section~\ref{sec:hook:changegroup}), \hook{prechangegroup} (section~\ref{sec:hook:prechangegroup}), \hook{pretxnchangegroup} (section~\ref{sec:hook:pretxnchangegroup}) 18.1153 + 18.1154 +\subsection{\hook{outgoing}---after changesets are propagated} 18.1155 +\label{sec:hook:outgoing} 18.1156 + 18.1157 +This hook is run after a group of changesets has been propagated out 18.1158 +of this repository, for example by a \hgcmd{push} or \hgcmd{bundle} 18.1159 +command. 18.1160 + 18.1161 +One possible use for this hook is to notify administrators that 18.1162 +changes have been pulled. 18.1163 + 18.1164 +Parameters to this hook: 18.1165 +\begin{itemize} 18.1166 +\item[\texttt{node}] A changeset ID. The changeset ID of the first 18.1167 + changeset of the group that was sent. 18.1168 +\item[\texttt{source}] A string. The source of the operation 18.1169 + (see section~\ref{sec:hook:sources}). If a remote client pulled 18.1170 + changes from this repository, \texttt{source} will be 18.1171 + \texttt{serve}. If the client that obtained changes from this 18.1172 + repository was local, \texttt{source} will be \texttt{bundle}, 18.1173 + \texttt{pull}, or \texttt{push}, depending on the operation the 18.1174 + client performed. 18.1175 +\item[\texttt{url}] A URL. The location of the remote repository, if 18.1176 + known. See section~\ref{sec:hook:url} for more information. 
18.1177 +\end{itemize} 18.1178 + 18.1179 +See also: \hook{preoutgoing} (section~\ref{sec:hook:preoutgoing}) 18.1180 + 18.1181 +\subsection{\hook{prechangegroup}---before starting to add remote changesets} 18.1182 +\label{sec:hook:prechangegroup} 18.1183 + 18.1184 +This controlling hook is run before Mercurial begins to add a group of 18.1185 +changesets from another repository. 18.1186 + 18.1187 +This hook does not have any information about the changesets to be 18.1188 +added, because it is run before transmission of those changesets is 18.1189 +allowed to begin. If this hook fails, the changesets will not be 18.1190 +transmitted. 18.1191 + 18.1192 +One use for this hook is to prevent external changes from being added 18.1193 +to a repository. For example, you could use this to ``freeze'' a 18.1194 +server-hosted branch temporarily or permanently so that users cannot 18.1195 +push to it, while still allowing a local administrator to modify the 18.1196 +repository. 18.1197 + 18.1198 +Parameters to this hook: 18.1199 +\begin{itemize} 18.1200 +\item[\texttt{source}] A string. The source of these changes. See 18.1201 + section~\ref{sec:hook:sources} for details. 18.1202 +\item[\texttt{url}] A URL. The location of the remote repository, if 18.1203 + known. See section~\ref{sec:hook:url} for more information. 18.1204 +\end{itemize} 18.1205 + 18.1206 +See also: \hook{changegroup} (section~\ref{sec:hook:changegroup}), 18.1207 +\hook{incoming} (section~\ref{sec:hook:incoming}), 18.1208 +\hook{pretxnchangegroup} (section~\ref{sec:hook:pretxnchangegroup}) 18.1209 + 18.1210 +\subsection{\hook{precommit}---before starting to commit a changeset} 18.1211 +\label{sec:hook:precommit} 18.1212 + 18.1213 +This hook is run before Mercurial begins to commit a new changeset. 18.1214 +It is run before Mercurial has any of the metadata for the commit, 18.1215 +such as the files to be committed, the commit message, or the commit 18.1216 +date. 
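
Because a \hook{precommit} hook sees only the parents of the working directory, any policy it enforces has to be based on them. The following is a minimal sketch of an in-process hook that vetoes merge commits, relying on the convention that a null changeset ID is passed as an empty string. The function name and the policy are hypothetical, and the sketch assumes that messages are passed to \texttt{ui.warn} as ordinary strings, which may not hold for every Mercurial version.

```python
def refuse_merges(ui, repo, parent1=None, parent2=None, **kwargs):
    """In-process precommit hook that vetoes merge commits.

    Hypothetical example: the name and the policy are made up for
    illustration and are not part of Mercurial.
    """
    # A null changeset ID is passed as an empty string, so a non-empty
    # parent2 means the working directory currently has two parents.
    if parent2:
        ui.warn("merge commits are not allowed in this repository\n")
        return True   # true value: the hook failed, the commit is refused
    return False      # false value: the hook succeeded, the commit goes on
```

Such a hook would be enabled with a \rcsection{hooks} entry of the form \texttt{precommit.nomerge = python:mymodule.refuse\_merges}, where \texttt{mymodule} is a placeholder for a module on Mercurial's Python path.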
18.1217 + 18.1218 +One use for this hook is to disable the ability to commit new 18.1219 +changesets, while still allowing incoming changesets. Another is to 18.1220 +run a build or test, and only allow the commit to begin if the build 18.1221 +or test succeeds. 18.1222 + 18.1223 +Parameters to this hook: 18.1224 +\begin{itemize} 18.1225 +\item[\texttt{parent1}] A changeset ID. The changeset ID of the first 18.1226 + parent of the working directory. 18.1227 +\item[\texttt{parent2}] A changeset ID. The changeset ID of the second 18.1228 + parent of the working directory. 18.1229 +\end{itemize} 18.1230 +If the commit proceeds, the parents of the working directory will 18.1231 +become the parents of the new changeset. 18.1232 + 18.1233 +See also: \hook{commit} (section~\ref{sec:hook:commit}), 18.1234 +\hook{pretxncommit} (section~\ref{sec:hook:pretxncommit}) 18.1235 + 18.1236 +\subsection{\hook{preoutgoing}---before starting to propagate changesets} 18.1237 +\label{sec:hook:preoutgoing} 18.1238 + 18.1239 +This hook is invoked before Mercurial knows the identities of the 18.1240 +changesets to be transmitted. 18.1241 + 18.1242 +One use for this hook is to prevent changes from being transmitted to 18.1243 +another repository. 18.1244 + 18.1245 +Parameters to this hook: 18.1246 +\begin{itemize} 18.1247 +\item[\texttt{source}] A string. The source of the operation that is 18.1248 + attempting to obtain changes from this repository (see 18.1249 + section~\ref{sec:hook:sources}). See the documentation for the 18.1250 + \texttt{source} parameter to the \hook{outgoing} hook, in 18.1251 + section~\ref{sec:hook:outgoing}, for possible values of this 18.1252 + parameter. 18.1253 +\item[\texttt{url}] A URL. The location of the remote repository, if 18.1254 + known. See section~\ref{sec:hook:url} for more information. 
18.1255 +\end{itemize} 18.1256 + 18.1257 +See also: \hook{outgoing} (section~\ref{sec:hook:outgoing}) 18.1258 + 18.1259 +\subsection{\hook{pretag}---before tagging a changeset} 18.1260 +\label{sec:hook:pretag} 18.1261 + 18.1262 +This controlling hook is run before a tag is created. If the hook 18.1263 +succeeds, creation of the tag proceeds. If the hook fails, the tag is 18.1264 +not created. 18.1265 + 18.1266 +Parameters to this hook: 18.1267 +\begin{itemize} 18.1268 +\item[\texttt{local}] A boolean. Whether the tag is local to this 18.1269 + repository instance (i.e.~stored in \sfilename{.hg/localtags}) or 18.1270 + managed by Mercurial (stored in \sfilename{.hgtags}). 18.1271 +\item[\texttt{node}] A changeset ID. The ID of the changeset to be tagged. 18.1272 +\item[\texttt{tag}] A string. The name of the tag to be created. 18.1273 +\end{itemize} 18.1274 + 18.1275 +If the tag to be created is revision-controlled, the \hook{precommit} 18.1276 +and \hook{pretxncommit} hooks (sections~\ref{sec:hook:precommit} 18.1277 +and~\ref{sec:hook:pretxncommit}) will also be run. 18.1278 + 18.1279 +See also: \hook{tag} (section~\ref{sec:hook:tag}) 18.1280 + 18.1281 +\subsection{\hook{pretxnchangegroup}---before completing addition of 18.1282 + remote changesets} 18.1283 +\label{sec:hook:pretxnchangegroup} 18.1284 + 18.1285 +This controlling hook is run before a transaction---that manages the 18.1286 +addition of a group of new changesets from outside the 18.1287 +repository---completes. If the hook succeeds, the transaction 18.1288 +completes, and all of the changesets become permanent within this 18.1289 +repository. If the hook fails, the transaction is rolled back, and 18.1290 +the data for the changesets is erased. 18.1291 + 18.1292 +This hook can access the metadata associated with the almost-added 18.1293 +changesets, but it should not do anything permanent with this data. 18.1294 +It must also not modify the working directory. 
18.1295 + 18.1296 +While this hook is running, if other Mercurial processes access this 18.1297 +repository, they will be able to see the almost-added changesets as if 18.1298 +they are permanent. This may lead to race conditions if you do not 18.1299 +take steps to avoid them. 18.1300 + 18.1301 +This hook can be used to automatically vet a group of changesets. If 18.1302 +the hook fails, all of the changesets are ``rejected'' when the 18.1303 +transaction rolls back. 18.1304 + 18.1305 +Parameters to this hook: 18.1306 +\begin{itemize} 18.1307 +\item[\texttt{node}] A changeset ID. The changeset ID of the first 18.1308 + changeset in the group that was added. All changesets between this 18.1309 + and \index{tags!\texttt{tip}}\texttt{tip}, inclusive, were added by 18.1310 + a single \hgcmd{pull}, \hgcmd{push} or \hgcmd{unbundle}. 18.1311 +\item[\texttt{source}] A string. The source of these changes. See 18.1312 + section~\ref{sec:hook:sources} for details. 18.1313 +\item[\texttt{url}] A URL. The location of the remote repository, if 18.1314 + known. See section~\ref{sec:hook:url} for more information. 18.1315 +\end{itemize} 18.1316 + 18.1317 +See also: \hook{changegroup} (section~\ref{sec:hook:changegroup}), 18.1318 +\hook{incoming} (section~\ref{sec:hook:incoming}), 18.1319 +\hook{prechangegroup} (section~\ref{sec:hook:prechangegroup}) 18.1320 + 18.1321 +\subsection{\hook{pretxncommit}---before completing commit of new changeset} 18.1322 +\label{sec:hook:pretxncommit} 18.1323 + 18.1324 +This controlling hook is run before a transaction---that manages a new 18.1325 +commit---completes. If the hook succeeds, the transaction completes 18.1326 +and the changeset becomes permanent within this repository. If the 18.1327 +hook fails, the transaction is rolled back, and the commit data is 18.1328 +erased. 18.1329 + 18.1330 +This hook can access the metadata associated with the almost-new 18.1331 +changeset, but it should not do anything permanent with this data. 
It 18.1332 +must also not modify the working directory. 18.1333 + 18.1334 +While this hook is running, if other Mercurial processes access this 18.1335 +repository, they will be able to see the almost-new changeset as if it 18.1336 +is permanent. This may lead to race conditions if you do not take 18.1337 +steps to avoid them. 18.1338 + 18.1339 +Parameters to this hook: 18.1340 +\begin{itemize} 18.1341 +\item[\texttt{node}] A changeset ID. The changeset ID of the newly 18.1342 + committed changeset. 18.1343 +\item[\texttt{parent1}] A changeset ID. The changeset ID of the first 18.1344 + parent of the newly committed changeset. 18.1345 +\item[\texttt{parent2}] A changeset ID. The changeset ID of the second 18.1346 + parent of the newly committed changeset. 18.1347 +\end{itemize} 18.1348 + 18.1349 +See also: \hook{precommit} (section~\ref{sec:hook:precommit}) 18.1350 + 18.1351 +\subsection{\hook{preupdate}---before updating or merging working directory} 18.1352 +\label{sec:hook:preupdate} 18.1353 + 18.1354 +This controlling hook is run before an update or merge of the working 18.1355 +directory begins. It is run only if Mercurial's normal pre-update 18.1356 +checks determine that the update or merge can proceed. If the hook 18.1357 +succeeds, the update or merge may proceed; if it fails, the update or 18.1358 +merge does not start. 18.1359 + 18.1360 +Parameters to this hook: 18.1361 +\begin{itemize} 18.1362 +\item[\texttt{parent1}] A changeset ID. The ID of the parent that the 18.1363 + working directory is to be updated to. If the working directory is 18.1364 + being merged, it will not change this parent. 18.1365 +\item[\texttt{parent2}] A changeset ID. Only set if the working 18.1366 + directory is being merged. The ID of the revision that the working 18.1367 + directory is being merged with. 
18.1368 +\end{itemize} 18.1369 + 18.1370 +See also: \hook{update} (section~\ref{sec:hook:update}) 18.1371 + 18.1372 +\subsection{\hook{tag}---after tagging a changeset} 18.1373 +\label{sec:hook:tag} 18.1374 + 18.1375 +This hook is run after a tag has been created. 18.1376 + 18.1377 +Parameters to this hook: 18.1378 +\begin{itemize} 18.1379 +\item[\texttt{local}] A boolean. Whether the new tag is local to this 18.1380 + repository instance (i.e.~stored in \sfilename{.hg/localtags}) or 18.1381 + managed by Mercurial (stored in \sfilename{.hgtags}). 18.1382 +\item[\texttt{node}] A changeset ID. The ID of the changeset that was 18.1383 + tagged. 18.1384 +\item[\texttt{tag}] A string. The name of the tag that was created. 18.1385 +\end{itemize} 18.1386 + 18.1387 +If the created tag is revision-controlled, the \hook{commit} hook 18.1388 +(section~\ref{sec:hook:commit}) is run before this hook. 18.1389 + 18.1390 +See also: \hook{pretag} (section~\ref{sec:hook:pretag}) 18.1391 + 18.1392 +\subsection{\hook{update}---after updating or merging working directory} 18.1393 +\label{sec:hook:update} 18.1394 + 18.1395 +This hook is run after an update or merge of the working directory 18.1396 +completes. Since a merge can fail (if the external \command{hgmerge} 18.1397 +command fails to resolve conflicts in a file), this hook communicates 18.1398 +whether the update or merge completed cleanly. 18.1399 + 18.1400 +\begin{itemize} 18.1401 +\item[\texttt{error}] A boolean. Indicates whether the update or 18.1402 + merge completed successfully. 18.1403 +\item[\texttt{parent1}] A changeset ID. The ID of the parent that the 18.1404 + working directory was updated to. If the working directory was 18.1405 + merged, it will not have changed this parent. 18.1406 +\item[\texttt{parent2}] A changeset ID. Only set if the working 18.1407 + directory was merged. The ID of the revision that the working 18.1408 + directory was merged with. 
18.1409 +\end{itemize} 18.1410 + 18.1411 +See also: \hook{preupdate} (section~\ref{sec:hook:preupdate}) 18.1412 + 18.1413 +%%% Local Variables: 18.1414 +%%% mode: latex 18.1415 +%%% TeX-master: "00book" 18.1416 +%%% End:
19.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 19.2 +++ b/en/ch11-template.tex Thu Jan 29 22:56:27 2009 -0800 19.3 @@ -0,0 +1,475 @@ 19.4 +\chapter{Customising the output of Mercurial} 19.5 +\label{chap:template} 19.6 + 19.7 +Mercurial provides a powerful mechanism to let you control how it 19.8 +displays information. The mechanism is based on templates. You can 19.9 +use templates to generate specific output for a single command, or to 19.10 +customise the entire appearance of the built-in web interface. 19.11 + 19.12 +\section{Using precanned output styles} 19.13 +\label{sec:style} 19.14 + 19.15 +Packaged with Mercurial are some output styles that you can use 19.16 +immediately. A style is simply a precanned template that someone 19.17 +wrote and installed somewhere that Mercurial can find. 19.18 + 19.19 +Before we take a look at Mercurial's bundled styles, let's review its 19.20 +normal output. 19.21 + 19.22 +\interaction{template.simple.normal} 19.23 + 19.24 +This is somewhat informative, but it takes up a lot of space---five 19.25 +lines of output per changeset. The \texttt{compact} style reduces 19.26 +this to three lines, presented in a sparse manner. 19.27 + 19.28 +\interaction{template.simple.compact} 19.29 + 19.30 +The \texttt{changelog} style hints at the expressive power of 19.31 +Mercurial's templating engine. This style attempts to follow the GNU 19.32 +Project's changelog guidelines\cite{web:changelog}. 19.33 + 19.34 +\interaction{template.simple.changelog} 19.35 + 19.36 +You will not be shocked to learn that Mercurial's default output style 19.37 +is named \texttt{default}. 19.38 + 19.39 +\subsection{Setting a default style} 19.40 + 19.41 +You can modify the output style that Mercurial will use for every 19.42 +command by editing your \hgrc\ file, naming the style you would 19.43 +prefer to use. 
19.44 + 19.45 +\begin{codesample2} 19.46 + [ui] 19.47 + style = compact 19.48 +\end{codesample2} 19.49 + 19.50 +If you write a style of your own, you can use it by either providing 19.51 +the path to your style file, or copying your style file into a 19.52 +location where Mercurial can find it (typically the \texttt{templates} 19.53 +subdirectory of your Mercurial install directory). 19.54 + 19.55 +\section{Commands that support styles and templates} 19.56 + 19.57 +All of Mercurial's ``\texttt{log}-like'' commands let you use styles 19.58 +and templates: \hgcmd{incoming}, \hgcmd{log}, \hgcmd{outgoing}, and 19.59 +\hgcmd{tip}. 19.60 + 19.61 +As I write this manual, these are so far the only commands that 19.62 +support styles and templates. Since these are the most important 19.63 +commands that need customisable output, there has been little pressure 19.64 +from the Mercurial user community to add style and template support to 19.65 +other commands. 19.66 + 19.67 +\section{The basics of templating} 19.68 + 19.69 +At its simplest, a Mercurial template is a piece of text. Some of the 19.70 +text never changes, while other parts are \emph{expanded}, or replaced 19.71 +with new text, when necessary. 19.72 + 19.73 +Before we continue, let's look again at a simple example of 19.74 +Mercurial's normal output. 19.75 + 19.76 +\interaction{template.simple.normal} 19.77 + 19.78 +Now, let's run the same command, but using a template to change its 19.79 +output. 19.80 + 19.81 +\interaction{template.simple.simplest} 19.82 + 19.83 +The example above illustrates the simplest possible template; it's 19.84 +just a piece of static text, printed once for each changeset. The 19.85 +\hgopt{log}{--template} option to the \hgcmd{log} command tells 19.86 +Mercurial to use the given text as the template when printing each 19.87 +changeset. 19.88 + 19.89 +Notice that the template string above ends with the text 19.90 +``\Verb+\n+''. 
This is an \emph{escape sequence}, telling Mercurial 19.91 +to print a newline at the end of each template item. If you omit this 19.92 +newline, Mercurial will run each piece of output together. See 19.93 +section~\ref{sec:template:escape} for more details of escape sequences. 19.94 + 19.95 +A template that prints a fixed string of text all the time isn't very 19.96 +useful; let's try something a bit more complex. 19.97 + 19.98 +\interaction{template.simple.simplesub} 19.99 + 19.100 +As you can see, the string ``\Verb+{desc}+'' in the template has been 19.101 +replaced in the output with the description of each changeset. Every 19.102 +time Mercurial finds text enclosed in curly braces (``\texttt{\{}'' 19.103 +and ``\texttt{\}}''), it will try to replace the braces and text with 19.104 +the expansion of whatever is inside. To print a literal curly brace, 19.105 +you must escape it, as described in section~\ref{sec:template:escape}. 19.106 + 19.107 +\section{Common template keywords} 19.108 +\label{sec:template:keyword} 19.109 + 19.110 +You can start writing simple templates immediately using the keywords 19.111 +below. 19.112 + 19.113 +\begin{itemize} 19.114 +\item[\tplkword{author}] String. The unmodified author of the changeset. 19.115 +\item[\tplkword{branches}] String. The name of the branch on which 19.116 + the changeset was committed. Will be empty if the branch name was 19.117 + \texttt{default}. 19.118 +\item[\tplkword{date}] Date information. The date when the changeset 19.119 + was committed. This is \emph{not} human-readable; you must pass it 19.120 + through a filter that will render it appropriately. See 19.121 + section~\ref{sec:template:filter} for more information on filters. 19.122 + The date is expressed as a pair of numbers. The first number is a 19.123 + Unix UTC timestamp (seconds since January 1, 1970); the second is 19.124 + the offset of the committer's timezone from UTC, in seconds. 19.125 +\item[\tplkword{desc}] String. 
The text of the changeset description. 19.126 +\item[\tplkword{files}] List of strings. All files modified, added, or 19.127 + removed by this changeset. 19.128 +\item[\tplkword{file\_adds}] List of strings. Files added by this 19.129 + changeset. 19.130 +\item[\tplkword{file\_dels}] List of strings. Files removed by this 19.131 + changeset. 19.132 +\item[\tplkword{node}] String. The changeset identification hash, as a 19.133 + 40-character hexadecimal string. 19.134 +\item[\tplkword{parents}] List of strings. The parents of the 19.135 + changeset. 19.136 +\item[\tplkword{rev}] Integer. The repository-local changeset revision 19.137 + number. 19.138 +\item[\tplkword{tags}] List of strings. Any tags associated with the 19.139 + changeset. 19.140 +\end{itemize} 19.141 + 19.142 +A few simple experiments will show us what to expect when we use these 19.143 +keywords; you can see the results in 19.144 +figure~\ref{fig:template:keywords}. 19.145 + 19.146 +\begin{figure} 19.147 + \interaction{template.simple.keywords} 19.148 + \caption{Template keywords in use} 19.149 + \label{fig:template:keywords} 19.150 +\end{figure} 19.151 + 19.152 +As we noted above, the date keyword does not produce human-readable 19.153 +output, so we must treat it specially. This involves using a 19.154 +\emph{filter}, about which more in section~\ref{sec:template:filter}. 19.155 + 19.156 +\interaction{template.simple.datekeyword} 19.157 + 19.158 +\section{Escape sequences} 19.159 +\label{sec:template:escape} 19.160 + 19.161 +Mercurial's templating engine recognises the most commonly used escape 19.162 +sequences in strings. When it sees a backslash (``\Verb+\+'') 19.163 +character, it looks at the following character and substitutes the two 19.164 +characters with a single replacement, as described below (the ASCII codes are given in octal). 19.165 + 19.166 +\begin{itemize} 19.167 +\item[\Verb+\textbackslash\textbackslash+] Backslash, ``\Verb+\+'', 19.168 + ASCII~134. 19.169 +\item[\Verb+\textbackslash n+] Newline, ASCII~12. 
19.170 +\item[\Verb+\textbackslash r+] Carriage return, ASCII~15. 19.171 +\item[\Verb+\textbackslash t+] Tab, ASCII~11. 19.172 +\item[\Verb+\textbackslash v+] Vertical tab, ASCII~13. 19.173 +\item[\Verb+\textbackslash \{+] Open curly brace, ``\Verb+{+'', ASCII~173. 19.174 +\item[\Verb+\textbackslash \}+] Close curly brace, ``\Verb+}+'', ASCII~175. 19.175 +\end{itemize} 19.176 + 19.177 +As indicated above, if you want the expansion of a template to contain 19.178 +a literal ``\Verb+\+'', ``\Verb+{+'', or ``\Verb+}+'' character, you 19.179 +must escape it. 19.180 + 19.181 +\section{Filtering keywords to change their results} 19.182 +\label{sec:template:filter} 19.183 + 19.184 +Some of the results of template expansion are not immediately easy to 19.185 +use. Mercurial lets you specify an optional chain of \emph{filters} 19.186 +to modify the result of expanding a keyword. You have already seen a 19.187 +common filter, \tplkwfilt{date}{isodate}, in action above, to make a 19.188 +date readable. 19.189 + 19.190 +Below is a list of the most commonly used filters that Mercurial 19.191 +supports. While some filters can be applied to any text, others can 19.192 +only be used in specific circumstances. The name of each filter is 19.193 +followed first by an indication of where it can be used, then a 19.194 +description of its effect. 19.195 + 19.196 +\begin{itemize} 19.197 +\item[\tplfilter{addbreaks}] Any text. Add an XHTML ``\Verb+<br/>+'' 19.198 + tag before the end of every line except the last. For example, 19.199 + ``\Verb+foo\nbar+'' becomes ``\Verb+foo<br/>\nbar+''. 19.200 +\item[\tplkwfilt{date}{age}] \tplkword{date} keyword. Render the 19.201 + age of the date, relative to the current time. Yields a string like 19.202 + ``\Verb+10 minutes+''. 19.203 +\item[\tplfilter{basename}] Any text, but most useful for the 19.204 + \tplkword{files} keyword and its relatives. Treat the text as a 19.205 + path, and return the basename. 
For example, ``\Verb+foo/bar/baz+'' 19.206 + becomes ``\Verb+baz+''. 19.207 +\item[\tplkwfilt{date}{date}] \tplkword{date} keyword. Render a date 19.208 + in a similar format to the Unix \command{date} command, but with 19.209 + timezone included. Yields a string like 19.210 + ``\Verb+Mon Sep 04 15:13:13 2006 -0700+''. 19.211 +\item[\tplkwfilt{author}{domain}] Any text, but most useful for the 19.212 + \tplkword{author} keyword. Find the first string that looks like 19.213 + an email address, and extract just the domain component. For 19.214 + example, ``\Verb+Bryan O'Sullivan <bos@serpentine.com>+'' becomes 19.215 + ``\Verb+serpentine.com+''. 19.216 +\item[\tplkwfilt{author}{email}] Any text, but most useful for the 19.217 + \tplkword{author} keyword. Extract the first string that looks like 19.218 + an email address. For example, 19.219 + ``\Verb+Bryan O'Sullivan <bos@serpentine.com>+'' becomes 19.220 + ``\Verb+bos@serpentine.com+''. 19.221 +\item[\tplfilter{escape}] Any text. Replace the special XML/XHTML 19.222 + characters ``\Verb+&+'', ``\Verb+<+'' and ``\Verb+>+'' with 19.223 + XML entities. 19.224 +\item[\tplfilter{fill68}] Any text. Wrap the text to fit in 68 19.225 + columns. This is useful before you pass text through the 19.226 + \tplfilter{tabindent} filter, and still want it to fit in an 19.227 + 80-column fixed-font window. 19.228 +\item[\tplfilter{fill76}] Any text. Wrap the text to fit in 76 19.229 + columns. 19.230 +\item[\tplfilter{firstline}] Any text. Yield the first line of text, 19.231 + without any trailing newlines. 19.232 +\item[\tplkwfilt{date}{hgdate}] \tplkword{date} keyword. Render the 19.233 + date as a pair of readable numbers. Yields a string like 19.234 + ``\Verb+1157407993 25200+''. 19.235 +\item[\tplkwfilt{date}{isodate}] \tplkword{date} keyword. Render the 19.236 + date as a text string in ISO~8601 format. Yields a string like 19.237 + ``\Verb+2006-09-04 15:13:13 -0700+''. 
19.238 +\item[\tplfilter{obfuscate}] Any text, but most useful for the 19.239 + \tplkword{author} keyword. Yield the input text rendered as a 19.240 + sequence of XML entities. This helps to defeat some particularly 19.241 + stupid screen-scraping email harvesting spambots. 19.242 +\item[\tplkwfilt{author}{person}] Any text, but most useful for the 19.243 + \tplkword{author} keyword. Yield the text before an email address. 19.244 + For example, ``\Verb+Bryan O'Sullivan <bos@serpentine.com>+'' 19.245 + becomes ``\Verb+Bryan O'Sullivan+''. 19.246 +\item[\tplkwfilt{date}{rfc822date}] \tplkword{date} keyword. Render a 19.247 + date using the same format used in email headers. Yields a string 19.248 + like ``\Verb+Mon, 04 Sep 2006 15:13:13 -0700+''. 19.249 +\item[\tplkwfilt{node}{short}] Changeset hash. Yield the short form 19.250 + of a changeset hash, i.e.~a 12-character hexadecimal string. 19.251 +\item[\tplkwfilt{date}{shortdate}] \tplkword{date} keyword. Render 19.252 + the year, month, and day of the date. Yields a string like 19.253 + ``\Verb+2006-09-04+''. 19.254 +\item[\tplfilter{strip}] Any text. Strip all leading and trailing 19.255 + whitespace from the string. 19.256 +\item[\tplfilter{tabindent}] Any text. Yield the text, with every line 19.257 + except the first starting with a tab character. 19.258 +\item[\tplfilter{urlescape}] Any text. Escape all characters that are 19.259 + considered ``special'' by URL parsers. For example, \Verb+foo bar+ 19.260 + becomes \Verb+foo%20bar+. 19.261 +\item[\tplkwfilt{author}{user}] Any text, but most useful for the 19.262 + \tplkword{author} keyword. Return the ``user'' portion of an email 19.263 + address. For example, 19.264 + ``\Verb+Bryan O'Sullivan <bos@serpentine.com>+'' becomes 19.265 + ``\Verb+bos+''. 
19.266 +\end{itemize} 19.267 + 19.268 +\begin{figure} 19.269 + \interaction{template.simple.manyfilters} 19.270 + \caption{Template filters in action} 19.271 + \label{fig:template:filters} 19.272 +\end{figure} 19.273 + 19.274 +\begin{note} 19.275 + If you try to apply a filter to a piece of data that it cannot 19.276 + process, Mercurial will fail and print a Python exception. For 19.277 + example, trying to run the output of the \tplkword{desc} keyword 19.278 + into the \tplkwfilt{date}{isodate} filter is not a good idea. 19.279 +\end{note} 19.280 + 19.281 +\subsection{Combining filters} 19.282 + 19.283 +It is easy to combine filters to yield output in the form you would 19.284 +like. The following chain of filters tidies up a description, then 19.285 +makes sure that it fits cleanly into 68 columns, then indents it by a 19.286 +further 8~characters (at least on Unix-like systems, where a tab is 19.287 +conventionally 8~characters wide). 19.288 + 19.289 +\interaction{template.simple.combine} 19.290 + 19.291 +Note the use of ``\Verb+\t+'' (a tab character) in the template to 19.292 +force the first line to be indented; this is necessary since 19.293 +\tplfilter{tabindent} indents all lines \emph{except} the first. 19.294 + 19.295 +Keep in mind that the order of filters in a chain is significant. The 19.296 +first filter is applied to the result of the keyword; the second to 19.297 +the result of the first filter; and so on. For example, using 19.298 +\Verb+fill68|tabindent+ gives very different results from 19.299 +\Verb+tabindent|fill68+. 19.300 + 19.301 + 19.302 +\section{From templates to styles} 19.303 + 19.304 +A command line template provides a quick and simple way to format some 19.305 +output. Templates can become verbose, though, and it's useful to be 19.306 +able to give a template a name. A style file is a template with a 19.307 +name, stored in a file. 
19.308 + 19.309 +More than that, using a style file unlocks the power of Mercurial's 19.310 +templating engine in ways that are not possible using the command line 19.311 +\hgopt{log}{--template} option. 19.312 + 19.313 +\subsection{The simplest of style files} 19.314 + 19.315 +Our simple style file contains just one line: 19.316 + 19.317 +\interaction{template.simple.rev} 19.318 + 19.319 +This tells Mercurial, ``if you're printing a changeset, use the text 19.320 +on the right as the template''. 19.321 + 19.322 +\subsection{Style file syntax} 19.323 + 19.324 +The syntax rules for a style file are simple. 19.325 + 19.326 +\begin{itemize} 19.327 +\item The file is processed one line at a time. 19.328 + 19.329 +\item Leading and trailing white space are ignored. 19.330 + 19.331 +\item Empty lines are skipped. 19.332 + 19.333 +\item If a line starts with either of the characters ``\texttt{\#}'' or 19.334 + ``\texttt{;}'', the entire line is treated as a comment, and skipped 19.335 + as if empty. 19.336 + 19.337 +\item A line starts with a keyword. This must start with an 19.338 + alphabetic character or underscore, and can subsequently contain any 19.339 + alphanumeric character or underscore. (In regexp notation, a 19.340 + keyword must match \Verb+[A-Za-z_][A-Za-z0-9_]*+.) 19.341 + 19.342 +\item The next element must be an ``\texttt{=}'' character, which can 19.343 + be preceded or followed by an arbitrary amount of white space. 19.344 + 19.345 +\item If the rest of the line starts and ends with matching quote 19.346 + characters (either single or double quote), it is treated as a 19.347 + template body. 19.348 + 19.349 +\item If the rest of the line \emph{does not} start with a quote 19.350 + character, it is treated as the name of a file; the contents of this 19.351 + file will be read and used as a template body. 
19.352 +\end{itemize} 19.353 + 19.354 +\section{Style files by example} 19.355 + 19.356 +To illustrate how to write a style file, we will construct a few by 19.357 +example. Rather than provide a complete style file and walk through 19.358 +it, we'll mirror the usual process of developing a style file by 19.359 +starting with something very simple, and walking through a series of 19.360 +successively more complete examples. 19.361 + 19.362 +\subsection{Identifying mistakes in style files} 19.363 + 19.364 +If Mercurial encounters a problem in a style file you are working on, 19.365 +it prints a terse error message that, once you figure out what it 19.366 +means, is actually quite useful. 19.367 + 19.368 +\interaction{template.svnstyle.syntax.input} 19.369 + 19.370 +Notice that \filename{broken.style} attempts to define a 19.371 +\texttt{changeset} keyword, but forgets to give any content for it. 19.372 +When instructed to use this style file, Mercurial promptly complains. 19.373 + 19.374 +\interaction{template.svnstyle.syntax.error} 19.375 + 19.376 +This error message looks intimidating, but it is not too hard to 19.377 +follow. 19.378 + 19.379 +\begin{itemize} 19.380 +\item The first component is simply Mercurial's way of saying ``I am 19.381 + giving up''. 19.382 + \begin{codesample4} 19.383 + \textbf{abort:} broken.style:1: parse error 19.384 + \end{codesample4} 19.385 + 19.386 +\item Next comes the name of the style file that contains the error. 19.387 + \begin{codesample4} 19.388 + abort: \textbf{broken.style}:1: parse error 19.389 + \end{codesample4} 19.390 + 19.391 +\item Following the file name is the line number where the error was 19.392 + encountered. 19.393 + \begin{codesample4} 19.394 + abort: broken.style:\textbf{1}: parse error 19.395 + \end{codesample4} 19.396 + 19.397 +\item Finally, a description of what went wrong. 
19.398 + \begin{codesample4} 19.399 + abort: broken.style:1: \textbf{parse error} 19.400 + \end{codesample4} 19.401 + The description of the problem is not always clear (as in this 19.402 + case), but even when it is cryptic, it is almost always trivial to 19.403 + visually inspect the offending line in the style file and see what 19.404 + is wrong. 19.405 +\end{itemize} 19.406 + 19.407 +\subsection{Uniquely identifying a repository} 19.408 + 19.409 +If you would like to be able to identify a Mercurial repository 19.410 +``fairly uniquely'' using a short string as an identifier, you can 19.411 +use the first revision in the repository. 19.412 +\interaction{template.svnstyle.id} 19.413 +This is not guaranteed to be unique, but it is nevertheless useful in 19.414 +many cases. 19.415 +\begin{itemize} 19.416 +\item It will not work in a completely empty repository, because such 19.417 + a repository does not have a revision~zero. 19.418 +\item Neither will it work in the (extremely rare) case where a 19.419 + repository is a merge of two or more formerly independent 19.420 + repositories, and you still have those repositories around. 19.421 +\end{itemize} 19.422 +Here are some uses to which you could put this identifier: 19.423 +\begin{itemize} 19.424 +\item As a key into a table for a database that manages repositories 19.425 + on a server. 19.426 +\item As half of a \{\emph{repository~ID}, \emph{revision~ID}\} tuple. 19.427 + Save this information away when you run an automated build or other 19.428 + activity, so that you can ``replay'' the build later if necessary. 19.429 +\end{itemize} 19.430 + 19.431 +\subsection{Mimicking Subversion's output} 19.432 + 19.433 +Let's try to emulate the default output format used by another 19.434 +revision control tool, Subversion. 
19.435 +\interaction{template.svnstyle.short} 19.436 + 19.437 +Since Subversion's output style is fairly simple, it is easy to 19.438 +copy-and-paste a hunk of its output into a file, and replace the text 19.439 +produced above by Subversion with the template values we'd like to see 19.440 +expanded. 19.441 +\interaction{template.svnstyle.template} 19.442 + 19.443 +There are a few small ways in which this template deviates from the 19.444 +output produced by Subversion. 19.445 +\begin{itemize} 19.446 +\item Subversion prints a ``readable'' date (the ``\texttt{Wed, 27 Sep 19.447 + 2006}'' in the example output above) in parentheses. Mercurial's 19.448 + templating engine does not provide a way to display a date in this 19.449 + format without also printing the time and time zone. 19.450 +\item We emulate Subversion's printing of ``separator'' lines full of 19.451 + ``\texttt{-}'' characters by ending the template with such a line. 19.452 + We use the templating engine's \tplkword{header} keyword to print a 19.453 + separator line as the first line of output (see below), thus 19.454 + achieving similar output to Subversion. 19.455 +\item Subversion's output includes a count in the header of the number 19.456 + of lines in the commit message. We cannot replicate this in 19.457 + Mercurial; the templating engine does not currently provide a filter 19.458 + that counts the number of lines the template generates. 19.459 +\end{itemize} 19.460 +It took me no more than a minute or two of work to replace literal 19.461 +text from an example of Subversion's output with some keywords and 19.462 +filters to give the template above. The style file simply refers to 19.463 +the template. 
19.464 +\interaction{template.svnstyle.style} 19.465 + 19.466 +We could have included the text of the template file directly in the 19.467 +style file by enclosing it in quotes and replacing the newlines with 19.468 +``\verb!\n!'' sequences, but it would have made the style file too 19.469 +difficult to read. Readability is a good guide when you're trying to 19.470 +decide whether some text belongs in a style file, or in a template 19.471 +file that the style file points to. If the style file will look too 19.472 +big or cluttered if you insert a literal piece of text, drop it into a 19.473 +template instead. 19.474 + 19.475 +%%% Local Variables: 19.476 +%%% mode: latex 19.477 +%%% TeX-master: "00book" 19.478 +%%% End:
20.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 20.2 +++ b/en/ch12-mq.tex Thu Jan 29 22:56:27 2009 -0800 20.3 @@ -0,0 +1,1043 @@ 20.4 +\chapter{Managing change with Mercurial Queues} 20.5 +\label{chap:mq} 20.6 + 20.7 +\section{The patch management problem} 20.8 +\label{sec:mq:patch-mgmt} 20.9 + 20.10 +Here is a common scenario: you need to install a software package from 20.11 +source, but you find a bug that you must fix in the source before you 20.12 +can start using the package. You make your changes, forget about the 20.13 +package for a while, and a few months later you need to upgrade to a 20.14 +newer version of the package. If the newer version of the package 20.15 +still has the bug, you must extract your fix from the older source 20.16 +tree and apply it against the newer version. This is a tedious task, 20.17 +and it's easy to make mistakes. 20.18 + 20.19 +This is a simple case of the ``patch management'' problem. You have 20.20 +an ``upstream'' source tree that you can't change; you need to make 20.21 +some local changes on top of the upstream tree; and you'd like to be 20.22 +able to keep those changes separate, so that you can apply them to 20.23 +newer versions of the upstream source. 20.24 + 20.25 +The patch management problem arises in many situations. Probably the 20.26 +most visible is that a user of an open source software project will 20.27 +contribute a bug fix or new feature to the project's maintainers in the 20.28 +form of a patch. 20.29 + 20.30 +Distributors of operating systems that include open source software 20.31 +often need to make changes to the packages they distribute so that 20.32 +they will build properly in their environments. 20.33 + 20.34 +When you have few changes to maintain, it is easy to manage a single 20.35 +patch using the standard \command{diff} and \command{patch} programs 20.36 +(see section~\ref{sec:mq:patch} for a discussion of these tools). 
20.37 +Once the number of changes grows, it starts to make sense to maintain 20.38 +patches as discrete ``chunks of work,'' so that for example a single 20.39 +patch will contain only one bug fix (the patch might modify several 20.40 +files, but it's doing ``only one thing''), and you may have a number 20.41 +of such patches for different bugs you need fixed and local changes 20.42 +you require. In this situation, if you submit a bug fix patch to the 20.43 +upstream maintainers of a package and they include your fix in a 20.44 +subsequent release, you can simply drop that single patch when you're 20.45 +updating to the newer release. 20.46 + 20.47 +Maintaining a single patch against an upstream tree is a little 20.48 +tedious and error-prone, but not difficult. However, the complexity 20.49 +of the problem grows rapidly as the number of patches you have to 20.50 +maintain increases. With more than a tiny number of patches in hand, 20.51 +understanding which ones you have applied and maintaining them moves 20.52 +from messy to overwhelming. 20.53 + 20.54 +Fortunately, Mercurial includes a powerful extension, Mercurial Queues 20.55 +(or simply ``MQ''), that massively simplifies the patch management 20.56 +problem. 20.57 + 20.58 +\section{The prehistory of Mercurial Queues} 20.59 +\label{sec:mq:history} 20.60 + 20.61 +During the late 1990s, several Linux kernel developers started to 20.62 +maintain ``patch series'' that modified the behaviour of the Linux 20.63 +kernel. Some of these series were focused on stability, some on 20.64 +feature coverage, and others were more speculative. 20.65 + 20.66 +The sizes of these patch series grew rapidly. In 2002, Andrew Morton 20.67 +published some shell scripts he had been using to automate the task of 20.68 +managing his patch queues. Andrew was successfully using these 20.69 +scripts to manage hundreds (sometimes thousands) of patches on top of 20.70 +the Linux kernel. 
20.71 + 20.72 +\subsection{A patchwork quilt} 20.73 +\label{sec:mq:quilt} 20.74 + 20.75 +In early 2003, Andreas Gruenbacher and Martin Quinson borrowed the 20.76 +approach of Andrew's scripts and published a tool called ``patchwork 20.77 +quilt''~\cite{web:quilt}, or simply ``quilt'' 20.78 +(see~\cite{gruenbacher:2005} for a paper describing it). Because 20.79 +quilt substantially automated patch management, it rapidly gained a 20.80 +large following among open source software developers. 20.81 + 20.82 +Quilt manages a \emph{stack of patches} on top of a directory tree. 20.83 +To begin, you tell quilt to manage a directory tree, and tell it which 20.84 +files you want to manage; it stores away the names and contents of 20.85 +those files. To fix a bug, you create a new patch (using a single 20.86 +command), edit the files you need to fix, then ``refresh'' the patch. 20.87 + 20.88 +The refresh step causes quilt to scan the directory tree; it updates 20.89 +the patch with all of the changes you have made. You can create 20.90 +another patch on top of the first, which will track the changes 20.91 +required to modify the tree from ``tree with one patch applied'' to 20.92 +``tree with two patches applied''. 20.93 + 20.94 +You can \emph{change} which patches are applied to the tree. If you 20.95 +``pop'' a patch, the changes made by that patch will vanish from the 20.96 +directory tree. Quilt remembers which patches you have popped, 20.97 +though, so you can ``push'' a popped patch again, and the directory 20.98 +tree will be restored to contain the modifications in the patch. Most 20.99 +importantly, you can run the ``refresh'' command at any time, and the 20.100 +topmost applied patch will be updated. This means that you can, at 20.101 +any time, change both which patches are applied and what 20.102 +modifications those patches make. 
20.103 + 20.104 +Quilt knows nothing about revision control tools, so it works equally 20.105 +well on top of an unpacked tarball or a Subversion working copy. 20.106 + 20.107 +\subsection{From patchwork quilt to Mercurial Queues} 20.108 +\label{sec:mq:quilt-mq} 20.109 + 20.110 +In mid-2005, Chris Mason took the features of quilt and wrote an 20.111 +extension that he called Mercurial Queues, which added quilt-like 20.112 +behaviour to Mercurial. 20.113 + 20.114 +The key difference between quilt and MQ is that quilt knows nothing 20.115 +about revision control systems, while MQ is \emph{integrated} into 20.116 +Mercurial. Each patch that you push is represented as a Mercurial 20.117 +changeset. Pop a patch, and the changeset goes away. 20.118 + 20.119 +Because quilt does not care about revision control tools, it is still 20.120 +a tremendously useful piece of software to know about for situations 20.121 +where you cannot use Mercurial and MQ. 20.122 + 20.123 +\section{The huge advantage of MQ} 20.124 + 20.125 +I cannot overstate the value that MQ offers through the unification of 20.126 +patches and revision control. 20.127 + 20.128 +A major reason that patches have persisted in the free software and 20.129 +open source world---in spite of the availability of increasingly 20.130 +capable revision control tools over the years---is the \emph{agility} 20.131 +they offer. 20.132 + 20.133 +Traditional revision control tools make a permanent, irreversible 20.134 +record of everything that you do. While this has great value, it's 20.135 +also somewhat stifling. If you want to perform a wild-eyed 20.136 +experiment, you have to be careful in how you go about it, or you risk 20.137 +leaving unneeded---or worse, misleading or destabilising---traces of 20.138 +your missteps and errors in the permanent revision record. 20.139 + 20.140 +By contrast, MQ's marriage of distributed revision control with 20.141 +patches makes it much easier to isolate your work. 
Your patches live 20.142 +on top of normal revision history, and you can make them disappear or 20.143 +reappear at will. If you don't like a patch, you can drop it. If a 20.144 +patch isn't quite as you want it to be, simply fix it---as many times 20.145 +as you need to, until you have refined it into the form you desire. 20.146 + 20.147 +As an example, the integration of patches with revision control makes 20.148 +understanding patches and debugging their effects---and their 20.149 +interplay with the code they're based on---\emph{enormously} easier. 20.150 +Since every applied patch has an associated changeset, you can use 20.151 +\hgcmdargs{log}{\emph{filename}} to see which changesets and patches 20.152 +affected a file. You can use the \hgext{bisect} command to 20.153 +binary-search through all changesets and applied patches to see where 20.154 +a bug got introduced or fixed. You can use the \hgcmd{annotate} 20.155 +command to see which changeset or patch modified a particular line of 20.156 +a source file. And so on. 20.157 + 20.158 +\section{Understanding patches} 20.159 +\label{sec:mq:patch} 20.160 + 20.161 +Because MQ doesn't hide its patch-oriented nature, it is helpful to 20.162 +understand what patches are, and a little about the tools that work 20.163 +with them. 20.164 + 20.165 +The traditional Unix \command{diff} command compares two files, and 20.166 +prints a list of differences between them. The \command{patch} command 20.167 +understands these differences as \emph{modifications} to make to a 20.168 +file. Take a look at figure~\ref{ex:mq:diff} for a simple example of 20.169 +these commands in action. 
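As a rough illustration of what such a list of differences looks like, here is a minimal sketch that uses Python's standard \texttt{difflib} module in place of the \command{diff} program. The file name \texttt{example.txt}, the sample lines, and the helper \texttt{make\_patch} are made up for this example; the output is in the unified format.

```python
import difflib

# Two versions of a small, hypothetical file, one string per line.
old = ["first line\n", "second line\n", "third line\n"]
new = ["first line\n", "second line, edited\n", "third line\n"]


def make_patch(old_lines, new_lines, name="example.txt"):
    """Return the lines of a unified diff that turns old_lines into new_lines."""
    return list(difflib.unified_diff(
        old_lines, new_lines,
        fromfile="a/" + name,  # the "a/" and "b/" prefixes are illustrative
        tofile="b/" + name))


patch_lines = make_patch(old, new)
print("".join(patch_lines), end="")
```

The printed text starts with two lines naming the old and new files, followed by the changed region; the modified line appears once with a leading minus sign and once with a leading plus sign.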
20.170 + 20.171 +\begin{figure}[ht] 20.172 + \interaction{mq.dodiff.diff} 20.173 + \caption{Simple uses of the \command{diff} and \command{patch} commands} 20.174 + \label{ex:mq:diff} 20.175 +\end{figure} 20.176 + 20.177 +The type of file that \command{diff} generates (and \command{patch} 20.178 +takes as input) is called a ``patch'' or a ``diff''; there is no 20.179 +difference between a patch and a diff. (We'll use the term ``patch'', 20.180 +since it's more commonly used.) 20.181 + 20.182 +A patch file can start with arbitrary text; the \command{patch} 20.183 +command ignores this text, but MQ uses it as the commit message when 20.184 +creating changesets. To find the beginning of the patch content, 20.185 +\command{patch} searches for the first line that starts with the 20.186 +string ``\texttt{diff~-}''. 20.187 + 20.188 +MQ works with \emph{unified} diffs (\command{patch} can accept several 20.189 +other diff formats, but MQ doesn't). A unified diff contains two 20.190 +kinds of header. The \emph{file header} describes the file being 20.191 +modified; it contains the name of the file to modify. When 20.192 +\command{patch} sees a new file header, it looks for a file with that 20.193 +name to start modifying. 20.194 + 20.195 +After the file header comes a series of \emph{hunks}. Each hunk 20.196 +starts with a header; this identifies the range of line numbers within 20.197 +the file that the hunk should modify. Following the header, a hunk 20.198 +starts and ends with a few (usually three) lines of text from the 20.199 +unmodified file; these are called the \emph{context} for the hunk. If 20.200 +there's only a small amount of context between successive hunks, 20.201 +\command{diff} doesn't print a new hunk header; it just runs the hunks 20.202 +together, with a few lines of context between modifications. 20.203 + 20.204 +Each line of context begins with a space character. 
Within the hunk, 20.205 +a line that begins with ``\texttt{-}'' means ``remove this line,'' 20.206 +while a line that begins with ``\texttt{+}'' means ``insert this 20.207 +line.'' For example, a line that is modified is represented by one 20.208 +deletion and one insertion. 20.209 + 20.210 +We will return to some of the more subtle aspects of patches later (in 20.211 +section~\ref{sec:mq:adv-patch}), but you should have enough information 20.212 +now to use MQ. 20.213 + 20.214 +\section{Getting started with Mercurial Queues} 20.215 +\label{sec:mq:start} 20.216 + 20.217 +Because MQ is implemented as an extension, you must explicitly enable it 20.218 +before you can use it. (You don't need to download anything; MQ ships 20.219 +with the standard Mercurial distribution.) To enable MQ, edit your 20.220 +\tildefile{.hgrc} file, and add the lines in figure~\ref{ex:mq:config}. 20.221 + 20.222 +\begin{figure}[ht] 20.223 + \begin{codesample4} 20.224 + [extensions] 20.225 + hgext.mq = 20.226 + \end{codesample4} 20.227 + \caption{Contents to add to \tildefile{.hgrc} to enable the MQ extension} 20.228 + \label{ex:mq:config} 20.229 +\end{figure} 20.230 + 20.231 +Once the extension is enabled, it will make a number of new commands 20.232 +available. To verify that the extension is working, you can use 20.233 +\hgcmd{help} to see if the \hgxcmd{mq}{qinit} command is now available; see 20.234 +the example in figure~\ref{ex:mq:enabled}. 20.235 + 20.236 +\begin{figure}[ht] 20.237 + \interaction{mq.qinit-help.help} 20.238 + \caption{How to verify that MQ is enabled} 20.239 + \label{ex:mq:enabled} 20.240 +\end{figure} 20.241 + 20.242 +You can use MQ with \emph{any} Mercurial repository, and its commands 20.243 +only operate within that repository. To get started, simply prepare 20.244 +the repository using the \hgxcmd{mq}{qinit} command (see 20.245 +figure~\ref{ex:mq:qinit}). This command creates an empty directory 20.246 +called \sdirname{.hg/patches}, where MQ will keep its metadata. 
As 20.247 +with many Mercurial commands, the \hgxcmd{mq}{qinit} command prints nothing 20.248 +if it succeeds. 20.249 + 20.250 +\begin{figure}[ht] 20.251 + \interaction{mq.tutorial.qinit} 20.252 + \caption{Preparing a repository for use with MQ} 20.253 + \label{ex:mq:qinit} 20.254 +\end{figure} 20.255 + 20.256 +\begin{figure}[ht] 20.257 + \interaction{mq.tutorial.qnew} 20.258 + \caption{Creating a new patch} 20.259 + \label{ex:mq:qnew} 20.260 +\end{figure} 20.261 + 20.262 +\subsection{Creating a new patch} 20.263 + 20.264 +To begin work on a new patch, use the \hgxcmd{mq}{qnew} command. This 20.265 +command takes one argument, the name of the patch to create. MQ will 20.266 +use this as the name of an actual file in the \sdirname{.hg/patches} 20.267 +directory, as you can see in figure~\ref{ex:mq:qnew}. 20.268 + 20.269 +Also newly present in the \sdirname{.hg/patches} directory are two 20.270 +other files, \sfilename{series} and \sfilename{status}. The 20.271 +\sfilename{series} file lists all of the patches that MQ knows about 20.272 +for this repository, with one patch per line. Mercurial uses the 20.273 +\sfilename{status} file for internal book-keeping; it tracks all of the 20.274 +patches that MQ has \emph{applied} in this repository. 20.275 + 20.276 +\begin{note} 20.277 + You may sometimes want to edit the \sfilename{series} file by hand; 20.278 + for example, to change the sequence in which some patches are 20.279 + applied. However, manually editing the \sfilename{status} file is 20.280 + almost always a bad idea, as it's easy to corrupt MQ's idea of what 20.281 + is happening. 20.282 +\end{note} 20.283 + 20.284 +Once you have created your new patch, you can edit files in the 20.285 +working directory as you usually would. All of the normal Mercurial 20.286 +commands, such as \hgcmd{diff} and \hgcmd{annotate}, work exactly as 20.287 +they did before. 
20.288 + 20.289 +\subsection{Refreshing a patch} 20.290 + 20.291 +When you reach a point where you want to save your work, use the 20.292 +\hgxcmd{mq}{qrefresh} command (figure~\ref{ex:mq:qrefresh}) to update the patch 20.293 +you are working on. This command folds the changes you have made in 20.294 +the working directory into your patch, and updates its corresponding 20.295 +changeset to contain those changes. 20.296 + 20.297 +\begin{figure}[ht] 20.298 + \interaction{mq.tutorial.qrefresh} 20.299 + \caption{Refreshing a patch} 20.300 + \label{ex:mq:qrefresh} 20.301 +\end{figure} 20.302 + 20.303 +You can run \hgxcmd{mq}{qrefresh} as often as you like, so it's a good way 20.304 +to ``checkpoint'' your work. Refresh your patch at an opportune 20.305 +time; try an experiment; and if the experiment doesn't work out, 20.306 +\hgcmd{revert} your modifications back to the last time you refreshed. 20.307 + 20.308 +\begin{figure}[ht] 20.309 + \interaction{mq.tutorial.qrefresh2} 20.310 + \caption{Refresh a patch many times to accumulate changes} 20.311 + \label{ex:mq:qrefresh2} 20.312 +\end{figure} 20.313 + 20.314 +\subsection{Stacking and tracking patches} 20.315 + 20.316 +Once you have finished working on a patch, or need to work on another, 20.317 +you can use the \hgxcmd{mq}{qnew} command again to create a new patch. 20.318 +Mercurial will apply this patch on top of your existing patch. See 20.319 +figure~\ref{ex:mq:qnew2} for an example. Notice that the patch 20.320 +contains the changes in our prior patch as part of its context (you 20.321 +can see this more clearly in the output of \hgcmd{annotate}). 20.322 + 20.323 +\begin{figure}[ht] 20.324 + \interaction{mq.tutorial.qnew2} 20.325 + \caption{Stacking a second patch on top of the first} 20.326 + \label{ex:mq:qnew2} 20.327 +\end{figure} 20.328 + 20.329 +So far, with the exception of \hgxcmd{mq}{qnew} and \hgxcmd{mq}{qrefresh}, we've 20.330 +been careful to only use regular Mercurial commands. 
However, MQ 20.331 +provides many commands that are easier to use when you are thinking 20.332 +about patches, as illustrated in figure~\ref{ex:mq:qseries}: 20.333 + 20.334 +\begin{itemize} 20.335 +\item The \hgxcmd{mq}{qseries} command lists every patch that MQ knows 20.336 + about in this repository, from oldest to newest (most recently 20.337 + \emph{created}). 20.338 +\item The \hgxcmd{mq}{qapplied} command lists every patch that MQ has 20.339 + \emph{applied} in this repository, again from oldest to newest (most 20.340 + recently applied). 20.341 +\end{itemize} 20.342 + 20.343 +\begin{figure}[ht] 20.344 + \interaction{mq.tutorial.qseries} 20.345 + \caption{Understanding the patch stack with \hgxcmd{mq}{qseries} and 20.346 + \hgxcmd{mq}{qapplied}} 20.347 + \label{ex:mq:qseries} 20.348 +\end{figure} 20.349 + 20.350 +\subsection{Manipulating the patch stack} 20.351 + 20.352 +The previous discussion implied that there must be a difference 20.353 +between ``known'' and ``applied'' patches, and there is. MQ can 20.354 +manage a patch without it being applied in the repository. 20.355 + 20.356 +An \emph{applied} patch has a corresponding changeset in the 20.357 +repository, and the effects of the patch and changeset are visible in 20.358 +the working directory. You can undo the application of a patch using 20.359 +the \hgxcmd{mq}{qpop} command. MQ still \emph{knows about}, or manages, a 20.360 +popped patch, but the patch no longer has a corresponding changeset in 20.361 +the repository, and the working directory does not contain the changes 20.362 +made by the patch. Figure~\ref{fig:mq:stack} illustrates the 20.363 +difference between applied and tracked patches. 
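The distinction between known and applied patches can be modelled in a few lines of Python. This is a toy sketch, not MQ's implementation: the class \texttt{PatchStack} and its methods are hypothetical names borrowed from the MQ commands, and the model only tracks patch names, not changesets or file contents. Its \texttt{qpush} method stands for reapplying a popped patch.

```python
class PatchStack:
    """Toy model: a list of known patches, of which a prefix is applied."""

    def __init__(self):
        self.series = []   # every known patch, oldest first
        self.applied = 0   # number of patches, counted from the front, that are applied

    def qnew(self, name):
        # In this toy model a new patch lands directly on top of the applied ones.
        self.series.insert(self.applied, name)
        self.applied += 1

    def qpop(self):
        # Popping makes the topmost applied patch unapplied, but keeps it known.
        if self.applied == 0:
            raise RuntimeError("no patches applied")
        self.applied -= 1

    def qpush(self):
        # Pushing reapplies the next known patch that is not yet applied.
        if self.applied == len(self.series):
            raise RuntimeError("all known patches are already applied")
        self.applied += 1

    def qseries(self):
        return list(self.series)

    def qapplied(self):
        return self.series[:self.applied]


stack = PatchStack()
stack.qnew("first.patch")
stack.qnew("second.patch")
stack.qpop()
print(stack.qseries())   # both patches are still known
print(stack.qapplied())  # only first.patch remains applied
```

Popping changes only the applied count, which is why the list of known patches stays the same while the list of applied patches shrinks.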
20.364 + 20.365 +\begin{figure}[ht] 20.366 + \centering 20.367 + \grafix{mq-stack} 20.368 + \caption{Applied and unapplied patches in the MQ patch stack} 20.369 + \label{fig:mq:stack} 20.370 +\end{figure} 20.371 + 20.372 +You can reapply an unapplied, or popped, patch using the \hgxcmd{mq}{qpush} 20.373 +command. This creates a new changeset to correspond to the patch, and 20.374 +the patch's changes once again become present in the working 20.375 +directory. See figure~\ref{ex:mq:qpop} for examples of \hgxcmd{mq}{qpop} 20.376 +and \hgxcmd{mq}{qpush} in action. Notice that once we have popped a patch 20.377 +or two patches, the output of \hgxcmd{mq}{qseries} remains the same, while 20.378 +that of \hgxcmd{mq}{qapplied} has changed. 20.379 + 20.380 +\begin{figure}[ht] 20.381 + \interaction{mq.tutorial.qpop} 20.382 + \caption{Modifying the stack of applied patches} 20.383 + \label{ex:mq:qpop} 20.384 +\end{figure} 20.385 + 20.386 +\subsection{Pushing and popping many patches} 20.387 + 20.388 +While \hgxcmd{mq}{qpush} and \hgxcmd{mq}{qpop} each operate on a single patch at 20.389 +a time by default, you can push and pop many patches in one go. The 20.390 +\hgxopt{mq}{qpush}{-a} option to \hgxcmd{mq}{qpush} causes it to push all 20.391 +unapplied patches, while the \hgxopt{mq}{qpop}{-a} option to \hgxcmd{mq}{qpop} 20.392 +causes it to pop all applied patches. (For some more ways to push and 20.393 +pop many patches, see section~\ref{sec:mq:perf} below.) 20.394 + 20.395 +\begin{figure}[ht] 20.396 + \interaction{mq.tutorial.qpush-a} 20.397 + \caption{Pushing all unapplied patches} 20.398 + \label{ex:mq:qpush-a} 20.399 +\end{figure} 20.400 + 20.401 +\subsection{Safety checks, and overriding them} 20.402 + 20.403 +Several MQ commands check the working directory before they do 20.404 +anything, and fail if they find any modifications. They do this to 20.405 +ensure that you won't lose any changes that you have made, but not yet 20.406 +incorporated into a patch. 
Figure~\ref{ex:mq:add} illustrates this; 20.407 +the \hgxcmd{mq}{qnew} command will not create a new patch if there are 20.408 +outstanding changes, caused in this case by the \hgcmd{add} of 20.409 +\filename{file3}. 20.410 + 20.411 +\begin{figure}[ht] 20.412 + \interaction{mq.tutorial.add} 20.413 + \caption{Forcibly creating a patch} 20.414 + \label{ex:mq:add} 20.415 +\end{figure} 20.416 + 20.417 +Commands that check the working directory all take an ``I know what 20.418 +I'm doing'' option, which is always named \option{-f}. The exact 20.419 +meaning of \option{-f} depends on the command. For example, 20.420 +\hgcmdargs{qnew}{\hgxopt{mq}{qnew}{-f}} will incorporate any outstanding 20.421 +changes into the new patch it creates, but 20.422 +\hgcmdargs{qpop}{\hgxopt{mq}{qpop}{-f}} will revert modifications to any 20.423 +files affected by the patch that it is popping. Be sure to read the 20.424 +documentation for a command's \option{-f} option before you use it! 20.425 + 20.426 +\subsection{Working on several patches at once} 20.427 + 20.428 +The \hgxcmd{mq}{qrefresh} command always refreshes the \emph{topmost} 20.429 +applied patch. This means that you can suspend work on one patch (by 20.430 +refreshing it), pop or push to make a different patch the top, and 20.431 +work on \emph{that} patch for a while. 20.432 + 20.433 +Here's an example that illustrates how you can use this ability. 20.434 +Let's say you're developing a new feature as two patches. The first 20.435 +is a change to the core of your software, and the second---layered on 20.436 +top of the first---changes the user interface to use the code you just 20.437 +added to the core. If you notice a bug in the core while you're 20.438 +working on the UI patch, it's easy to fix the core. Simply 20.439 +\hgxcmd{mq}{qrefresh} the UI patch to save your in-progress changes, and 20.440 +\hgxcmd{mq}{qpop} down to the core patch. 
Fix the core bug, 20.441 +\hgxcmd{mq}{qrefresh} the core patch, and \hgxcmd{mq}{qpush} back to the UI 20.442 +patch to continue where you left off. 20.443 + 20.444 +\section{More about patches} 20.445 +\label{sec:mq:adv-patch} 20.446 + 20.447 +MQ uses the GNU \command{patch} command to apply patches, so it's 20.448 +helpful to know a few more detailed aspects of how \command{patch} 20.449 +works, and about patches themselves. 20.450 + 20.451 +\subsection{The strip count} 20.452 + 20.453 +If you look at the file headers in a patch, you will notice that the 20.454 +pathnames usually have an extra component on the front that isn't 20.455 +present in the actual path name. This is a holdover from the way that 20.456 +people used to generate patches (people still do this, but it's 20.457 +somewhat rare with modern revision control tools). 20.458 + 20.459 +Alice would unpack a tarball, edit her files, then decide that she 20.460 +wanted to create a patch. So she'd rename her working directory, 20.461 +unpack the tarball again (hence the need for the rename), and use the 20.462 +\cmdopt{diff}{-r} and \cmdopt{diff}{-N} options to \command{diff} to 20.463 +recursively generate a patch between the unmodified directory and the 20.464 +modified one. The result would be that the name of the unmodified 20.465 +directory would be at the front of the left-hand path in every file 20.466 +header, and the name of the modified directory would be at the front 20.467 +of the right-hand path. 20.468 + 20.469 +Since someone receiving a patch from the Alices of the net would be 20.470 +unlikely to have unmodified and modified directories with exactly the 20.471 +same names, the \command{patch} command has a \cmdopt{patch}{-p} 20.472 +option that indicates the number of leading path name components to 20.473 +strip when trying to apply a patch. This number is called the 20.474 +\emph{strip count}. 20.475 + 20.476 +An option of ``\texttt{-p1}'' means ``use a strip count of one''. 
If 20.477 +\command{patch} sees a file name \filename{foo/bar/baz} in a file 20.478 +header, it will strip \filename{foo} and try to patch a file named 20.479 +\filename{bar/baz}. (Strictly speaking, the strip count refers to the 20.480 +number of \emph{path separators} (and the components that go with them) 20.481 +to strip. A strip count of one will turn \filename{foo/bar} into 20.482 +\filename{bar}, but \filename{/foo/bar} (notice the extra leading 20.483 +slash) into \filename{foo/bar}.) 20.484 + 20.485 +The ``standard'' strip count for patches is one; almost all patches 20.486 +contain one leading path name component that needs to be stripped. 20.487 +Mercurial's \hgcmd{diff} command generates path names in this form, 20.488 +and the \hgcmd{import} command and MQ expect patches to have a strip 20.489 +count of one. 20.490 + 20.491 +If you receive a patch from someone that you want to add to your patch 20.492 +queue, and the patch needs a strip count other than one, you cannot 20.493 +just \hgxcmd{mq}{qimport} the patch, because \hgxcmd{mq}{qimport} does not yet 20.494 +have a \texttt{-p} option (see~\bug{311}). Your best bet is to 20.495 +\hgxcmd{mq}{qnew} a patch of your own, then use \cmdargs{patch}{-p\emph{N}} 20.496 +to apply their patch, followed by \hgcmd{addremove} to pick up any 20.497 +files added or removed by the patch, followed by \hgxcmd{mq}{qrefresh}. 20.498 +This complexity may become unnecessary; see~\bug{311} for details. 20.499 +\subsection{Strategies for applying a patch} 20.500 + 20.501 +When \command{patch} applies a hunk, it tries a handful of 20.502 +successively less accurate strategies to try to make the hunk apply. 20.503 +This falling-back technique often makes it possible to take a patch 20.504 +that was generated against an old version of a file, and apply it 20.505 +against a newer version of that file. 
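The falling-back idea can be sketched in Python. This is a simplified illustration, not the code of GNU \command{patch}: the functions \texttt{trim\_context} and \texttt{apply\_hunk} are hypothetical, a hunk is represented as a list of (tag, text) pairs, and the limit of two trimmed context lines is an assumption. The sketch first tries the position named in the hunk header, then other positions in the file (an offset), and finally retries with outer context lines trimmed (fuzz).

```python
def trim_context(hunk, fuzz):
    """Drop up to `fuzz` context lines from each end of a hunk.

    Returns the trimmed hunk and the number of leading lines dropped.
    """
    lead = 0
    while lead < len(hunk) and hunk[lead][0] == " ":
        lead += 1
    trail = 0
    while trail < len(hunk) and hunk[-1 - trail][0] == " ":
        trail += 1
    drop_lead = min(fuzz, lead)
    drop_trail = min(fuzz, trail)
    return hunk[drop_lead:len(hunk) - drop_trail], drop_lead


def apply_hunk(lines, hunk, start, max_fuzz=2):
    """Apply one hunk to `lines`; return (patched, offset, fuzz) or None.

    `hunk` is a list of (tag, text) pairs with tag " ", "-" or "+".
    `start` is the 0-based index at which the hunk claims to begin.
    """
    for fuzz in range(max_fuzz + 1):
        body, dropped = trim_context(hunk, fuzz)
        old = [text for tag, text in body if tag != "+"]  # what must be in the file
        new = [text for tag, text in body if tag != "-"]  # what replaces it
        expected = start + dropped
        # Try the expected position first, then positions ever further away.
        positions = sorted(range(len(lines) - len(old) + 1),
                           key=lambda pos: abs(pos - expected))
        for pos in positions:
            if lines[pos:pos + len(old)] == old:
                patched = lines[:pos] + new + lines[pos + len(old):]
                return patched, pos - expected, fuzz
    return None  # the hunk is rejected


hunk = [(" ", "b"), (" ", "c"), ("-", "d"), ("+", "D"), (" ", "e"), (" ", "f")]
base = list("abcdefg")

exact = apply_hunk(base, hunk, 1)
shifted = apply_hunk(["x", "y"] + base, hunk, 1)
fuzzy = apply_hunk(["a", "B", "c", "d", "e", "f", "g"], hunk, 1)
rejected = apply_hunk(["a", "b", "c", "X", "e"], hunk, 1)
```

In this sketch, \texttt{exact} applies with no offset and no fuzz, \texttt{shifted} applies two lines further down, \texttt{fuzzy} applies only after one context line is trimmed from each end, and \texttt{rejected} is \texttt{None}.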
20.506 + 20.507 +First, \command{patch} tries an exact match, where the line numbers, 20.508 +the context, and the text to be modified must apply exactly. If it 20.509 +cannot make an exact match, it tries to find an exact match for the 20.510 +context, without honouring the line numbering information. If this 20.511 +succeeds, it prints a line of output saying that the hunk was applied, 20.512 +but at some \emph{offset} from the original line number. 20.513 + 20.514 +If a context-only match fails, \command{patch} removes the first and 20.515 +last lines of the context, and tries a \emph{reduced} context-only 20.516 +match. If the hunk with reduced context succeeds, it prints a message 20.517 +saying that it applied the hunk with a \emph{fuzz factor} (the number 20.518 +after the fuzz factor indicates how many lines of context 20.519 +\command{patch} had to trim before the patch applied). 20.520 + 20.521 +When neither of these techniques works, \command{patch} prints a 20.522 +message saying that the hunk in question was rejected. It saves 20.523 +rejected hunks (also simply called ``rejects'') to a file with the 20.524 +same name, and an added \sfilename{.rej} extension. It also saves an 20.525 +unmodified copy of the file with a \sfilename{.orig} extension; the 20.526 +copy of the file without any extensions will contain any changes made 20.527 +by hunks that \emph{did} apply cleanly. If you have a patch that 20.528 +modifies \filename{foo} with six hunks, and one of them fails to 20.529 +apply, you will have: an unmodified \filename{foo.orig}, a 20.530 +\filename{foo.rej} containing one hunk, and \filename{foo}, containing 20.531 +the changes made by the five successful hunks. 20.532 + 20.533 +\subsection{Some quirks of patch representation} 20.534 + 20.535 +There are a few useful things to know about how \command{patch} works 20.536 +with files. 
20.537 +\begin{itemize} 20.538 +\item This should already be obvious, but \command{patch} cannot 20.539 + handle binary files. 20.540 +\item Neither does it care about the executable bit; it creates new 20.541 + files as readable, but not executable. 20.542 +\item \command{patch} treats the removal of a file as a diff between 20.543 + the file to be removed and the empty file. So your idea of ``I 20.544 + deleted this file'' looks like ``every line of this file was 20.545 + deleted'' in a patch. 20.546 +\item It treats the addition of a file as a diff between the empty 20.547 + file and the file to be added. So in a patch, your idea of ``I 20.548 + added this file'' looks like ``every line of this file was added''. 20.549 +\item It treats a renamed file as the removal of the old name, and the 20.550 + addition of the new name. This means that renamed files have a big 20.551 + footprint in patches. (Note also that Mercurial does not currently 20.552 + try to infer when files have been renamed or copied in a patch.) 20.553 +\item \command{patch} cannot represent empty files, so you cannot use 20.554 + a patch to represent the notion ``I added this empty file to the 20.555 + tree''. 20.556 +\end{itemize} 20.557 +\subsection{Beware the fuzz} 20.558 + 20.559 +While applying a hunk at an offset, or with a fuzz factor, will often 20.560 +be completely successful, these inexact techniques naturally leave 20.561 +open the possibility of corrupting the patched file. The most common 20.562 +cases typically involve applying a patch twice, or at an incorrect 20.563 +location in the file. If \command{patch} or \hgxcmd{mq}{qpush} ever 20.564 +mentions an offset or fuzz factor, you should make sure that the 20.565 +modified files are correct afterwards. 20.566 + 20.567 +It's often a good idea to refresh a patch that has applied with an 20.568 +offset or fuzz factor; refreshing the patch generates new context 20.569 +information that will make it apply cleanly. 
I say ``often,'' not 20.570 +``always,'' because sometimes refreshing a patch will make it fail to 20.571 +apply against a different revision of the underlying files. In some 20.572 +cases, such as when you're maintaining a patch that must sit on top of 20.573 +multiple versions of a source tree, it's acceptable to have a patch 20.574 +apply with some fuzz, provided you've verified the results of the 20.575 +patching process in such cases. 20.576 + 20.577 +\subsection{Handling rejection} 20.578 + 20.579 +If \hgxcmd{mq}{qpush} fails to apply a patch, it will print an error 20.580 +message and exit. If it has left \sfilename{.rej} files behind, it is 20.581 +usually best to fix up the rejected hunks before you push more patches 20.582 +or do any further work. 20.583 + 20.584 +If your patch \emph{used to} apply cleanly, and no longer does because 20.585 +you've changed the underlying code that your patches are based on, 20.586 +Mercurial Queues can help; see section~\ref{sec:mq:merge} for details. 20.587 + 20.588 +Unfortunately, there aren't any great techniques for dealing with 20.589 +rejected hunks. Most often, you'll need to view the \sfilename{.rej} 20.590 +file and edit the target file, applying the rejected hunks by hand. 20.591 + 20.592 +If you're feeling adventurous, Neil Brown, a Linux kernel hacker, 20.593 +wrote a tool called \command{wiggle}~\cite{web:wiggle}, which is more 20.594 +vigorous than \command{patch} in its attempts to make a patch apply. 20.595 + 20.596 +Another Linux kernel hacker, Chris Mason (the author of Mercurial 20.597 +Queues), wrote a similar tool called 20.598 +\command{mpatch}~\cite{web:mpatch}, which takes a simple approach to 20.599 +automating the application of hunks rejected by \command{patch}. The 20.600 +\command{mpatch} command can help with four common reasons that a hunk 20.601 +may be rejected: 20.602 + 20.603 +\begin{itemize} 20.604 +\item The context in the middle of a hunk has changed. 
20.605 +\item A hunk is missing some context at the beginning or end. 20.606 +\item A large hunk might apply better---either entirely or in 20.607 + part---if it were broken up into smaller hunks. 20.608 +\item A hunk removes lines with slightly different content than those 20.609 + currently present in the file. 20.610 +\end{itemize} 20.611 + 20.612 +If you use \command{wiggle} or \command{mpatch}, you should be doubly 20.613 +careful to check your results when you're done. In fact, 20.614 +\command{mpatch} enforces this method of double-checking the tool's 20.615 +output, by automatically dropping you into a merge program when it has 20.616 +done its job, so that you can verify its work and finish off any 20.617 +remaining merges. 20.618 + 20.619 +\section{Getting the best performance out of MQ} 20.620 +\label{sec:mq:perf} 20.621 + 20.622 +MQ is very efficient at handling a large number of patches. I ran 20.623 +some performance experiments in mid-2006 for a talk that I gave at the 20.624 +2006 EuroPython conference~\cite{web:europython}. I used as my data 20.625 +set the Linux 2.6.17-mm1 patch series, which consists of 1,738 20.626 +patches. I applied these on top of a Linux kernel repository 20.627 +containing all 27,472 revisions between Linux 2.6.12-rc2 and Linux 20.628 +2.6.17. 20.629 + 20.630 +On my old, slow laptop, I was able to 20.631 +\hgcmdargs{qpush}{\hgxopt{mq}{qpush}{-a}} all 1,738 patches in 3.5 minutes, 20.632 +and \hgcmdargs{qpop}{\hgxopt{mq}{qpop}{-a}} them all in 30 seconds. (On a 20.633 +newer laptop, the time to push all patches dropped to two minutes.) I 20.634 +could \hgxcmd{mq}{qrefresh} one of the biggest patches (which made 22,779 20.635 +lines of changes to 287 files) in 6.6 seconds. 20.636 + 20.637 +Clearly, MQ is well suited to working in large trees, but there are a 20.638 +few tricks you can use to get the best performance out of it. 20.639 + 20.640 +First of all, try to ``batch'' operations together. 
Every time you 20.641 +run \hgxcmd{mq}{qpush} or \hgxcmd{mq}{qpop}, these commands scan the working 20.642 +directory once to make sure you haven't made some changes and then 20.643 +forgotten to run \hgxcmd{mq}{qrefresh}. On a small tree, the time that 20.644 +this scan takes is unnoticeable. However, on a medium-sized tree 20.645 +(containing tens of thousands of files), it can take a second or more. 20.646 + 20.647 +The \hgxcmd{mq}{qpush} and \hgxcmd{mq}{qpop} commands allow you to push and pop 20.648 +multiple patches at a time. You can identify the ``destination 20.649 +patch'' that you want to end up at. When you \hgxcmd{mq}{qpush} with a 20.650 +destination specified, it will push patches until that patch is at the 20.651 +top of the applied stack. When you \hgxcmd{mq}{qpop} to a destination, MQ 20.652 +will pop patches until the destination patch is at the top. 20.653 + 20.654 +You can identify a destination patch either by the name of the 20.655 +patch or by number. If you use numeric addressing, patches are 20.656 +counted from zero; this means that the first patch is zero, the second 20.657 +is one, and so on. 20.658 + 20.659 +\section{Updating your patches when the underlying code changes} 20.660 +\label{sec:mq:merge} 20.661 + 20.662 +It's common to have a stack of patches on top of an underlying 20.663 +repository that you don't modify directly. If you're working on 20.664 +changes to third-party code, or on a feature that is taking longer to 20.665 +develop than the rate of change of the code beneath, you will often 20.666 +need to sync up with the underlying code, and fix up any hunks in your 20.667 +patches that no longer apply. This is called \emph{rebasing} your 20.668 +patch series. 20.669 + 20.670 +The simplest way to do this is to \hgcmdargs{qpop}{\hgxopt{mq}{qpop}{-a}} 20.671 +your patches, then \hgcmd{pull} changes into the underlying 20.672 +repository, and finally \hgcmdargs{qpush}{\hgxopt{mq}{qpush}{-a}} your 20.673 +patches again. 
MQ will stop pushing any time it runs across a patch 20.674 +that fails to apply because of conflicts, allowing you to fix your 20.675 +conflicts, \hgxcmd{mq}{qrefresh} the affected patch, and continue pushing 20.676 +until you have fixed your entire stack. 20.677 + 20.678 +This approach is easy to use and works well if you don't expect 20.679 +changes to the underlying code to affect how well your patches apply. 20.680 +If your patch stack touches code that is modified frequently or 20.681 +invasively in the underlying repository, however, fixing up rejected 20.682 +hunks by hand quickly becomes tiresome. 20.683 + 20.684 +It's possible to partially automate the rebasing process. If your 20.685 +patches apply cleanly against some revision of the underlying repo, MQ 20.686 +can use this information to help you to resolve conflicts between your 20.687 +patches and a different revision. 20.688 + 20.689 +The process is a little involved. 20.690 +\begin{enumerate} 20.691 +\item To begin, \hgcmdargs{qpush}{-a} all of your patches on top of 20.692 + the revision where you know that they apply cleanly. 20.693 +\item Save a backup copy of your patch directory using 20.694 + \hgcmdargs{qsave}{\hgxopt{mq}{qsave}{-e} \hgxopt{mq}{qsave}{-c}}. This prints 20.695 + the name of the directory that it has saved the patches in. It will 20.696 + save the patches to a directory called 20.697 + \sdirname{.hg/patches.\emph{N}}, where \texttt{\emph{N}} is a small 20.698 + integer. It also commits a ``save changeset'' on top of your 20.699 + applied patches; this is for internal book-keeping, and records the 20.700 + states of the \sfilename{series} and \sfilename{status} files. 20.701 +\item Use \hgcmd{pull} to bring new changes into the underlying 20.702 + repository. (Don't run \hgcmdargs{pull}{-u}; see below for why.) 20.703 +\item Update to the new tip revision, using 20.704 + \hgcmdargs{update}{\hgopt{update}{-C}} to override the patches you 20.705 + have pushed. 
20.706 +\item Merge all patches using \hgcmdargs{qpush}{\hgxopt{mq}{qpush}{-m} 20.707 + \hgxopt{mq}{qpush}{-a}}. The \hgxopt{mq}{qpush}{-m} option to \hgxcmd{mq}{qpush} 20.708 + tells MQ to perform a three-way merge if the patch fails to apply. 20.709 +\end{enumerate} 20.710 + 20.711 +During the \hgcmdargs{qpush}{\hgxopt{mq}{qpush}{-m}}, each patch in the 20.712 +\sfilename{series} file is applied normally. If a patch applies with 20.713 +fuzz or rejects, MQ looks at the queue you \hgxcmd{mq}{qsave}d, and 20.714 +performs a three-way merge with the corresponding changeset. This 20.715 +merge uses Mercurial's normal merge machinery, so it may pop up a GUI 20.716 +merge tool to help you to resolve problems. 20.717 + 20.718 +When you finish resolving the effects of a patch, MQ refreshes your 20.719 +patch based on the result of the merge. 20.720 + 20.721 +At the end of this process, your repository will have one extra head 20.722 +from the old patch queue, and a copy of the old patch queue will be in 20.723 +\sdirname{.hg/patches.\emph{N}}. You can remove the extra head using 20.724 +\hgcmdargs{qpop}{\hgxopt{mq}{qpop}{-a} \hgxopt{mq}{qpop}{-n} patches.\emph{N}} 20.725 +or \hgcmd{strip}. You can delete \sdirname{.hg/patches.\emph{N}} once 20.726 +you are sure that you no longer need it as a backup. 20.727 + 20.728 +\section{Identifying patches} 20.729 + 20.730 +MQ commands that work with patches let you refer to a patch either by 20.731 +using its name or by a number. By name is obvious enough; pass the 20.732 +name \filename{foo.patch} to \hgxcmd{mq}{qpush}, for example, and it will 20.733 +push patches until \filename{foo.patch} is applied. 20.734 + 20.735 +As a shortcut, you can refer to a patch using both a name and a 20.736 +numeric offset; \texttt{foo.patch-2} means ``two patches before 20.737 +\texttt{foo.patch}'', while \texttt{bar.patch+4} means ``four patches 20.738 +after \texttt{bar.patch}''. 
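As a rough illustration of these naming rules, here is a minimal Python sketch that resolves a patch reference against the contents of the series. It is not MQ's actual lookup code: the helper's name, its error handling, and the sample series are all assumptions made for this example, and it only covers exact names, zero-based numbers, and name-plus-offset references as described in this chapter.

```python
import re


def resolve_patch(series, ref):
    """Return the zero-based position in `series` that `ref` names.

    Hypothetical helper, not part of MQ. `series` is the list of patch
    names in series order; `ref` is a name, a number, or name+N / name-N.
    """
    # An exact patch name wins, so names that contain '-' or '+' still work.
    if ref in series:
        return series.index(ref)
    if ref.isdigit():
        # A plain number counts patches from zero.
        index = int(ref)
    else:
        # name-N means N patches before `name`; name+N means N after it.
        match = re.fullmatch(r"(.+)([+-])(\d+)", ref)
        if match is None or match.group(1) not in series:
            raise ValueError("unknown patch: %s" % ref)
        base = series.index(match.group(1))
        offset = int(match.group(3))
        index = base - offset if match.group(2) == "-" else base + offset
    if not 0 <= index < len(series):
        raise ValueError("patch reference out of range: %s" % ref)
    return index


# Sample series, invented for the example.
series = ["first.patch", "second.patch", "foo.patch", "bar.patch"]
print(series[resolve_patch(series, "foo.patch-2")])
print(series[resolve_patch(series, "first.patch+3")])
```

Checking for an exact name before trying to split off an offset is a deliberate choice in this sketch, since a descriptive patch name may itself contain a hyphen followed by digits.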
20.739 + 20.740 +Referring to a patch by index isn't much different. The first patch 20.741 +printed in the output of \hgxcmd{mq}{qseries} is patch zero (yes, it's one 20.742 +of those start-at-zero counting systems); the second is patch one; and 20.743 +so on. 20.744 + 20.745 +MQ also makes it easy to work with patches when you are using normal 20.746 +Mercurial commands. Every command that accepts a changeset ID will 20.747 +also accept the name of an applied patch. MQ augments the tags 20.748 +normally in the repository with an eponymous one for each applied 20.749 +patch. In addition, the special tags \index{tags!special tag 20.750 + names!\texttt{qbase}}\texttt{qbase} and \index{tags!special tag 20.751 + names!\texttt{qtip}}\texttt{qtip} identify the ``bottom-most'' and 20.752 +topmost applied patches, respectively. 20.753 + 20.754 +These additions to Mercurial's normal tagging capabilities make 20.755 +dealing with patches even more of a breeze. 20.756 +\begin{itemize} 20.757 +\item Want to patchbomb a mailing list with your latest series of 20.758 + changes? 20.759 + \begin{codesample4} 20.760 + hg email qbase:qtip 20.761 + \end{codesample4} 20.762 + (Don't know what ``patchbombing'' is? See 20.763 + section~\ref{sec:hgext:patchbomb}.) 20.764 +\item Need to see all of the patches since \texttt{foo.patch} that 20.765 + have touched files in a subdirectory of your tree? 20.766 + \begin{codesample4} 20.767 + hg log -r foo.patch:qtip \emph{subdir} 20.768 + \end{codesample4} 20.769 +\end{itemize} 20.770 + 20.771 +Because MQ makes the names of patches available to the rest of 20.772 +Mercurial through its normal internal tag machinery, you don't need to 20.773 +type in the entire name of a patch when you want to identify it by 20.774 +name. 
20.775 + 20.776 +\begin{figure}[ht] 20.777 + \interaction{mq.id.output} 20.778 + \caption{Using MQ's tag features to work with patches} 20.779 + \label{ex:mq:id} 20.780 +\end{figure} 20.781 + 20.782 +Another nice consequence of representing patch names as tags is that 20.783 +when you run the \hgcmd{log} command, it will display a patch's name 20.784 +as a tag, simply as part of its normal output. This makes it easy to 20.785 +visually distinguish applied patches from underlying ``normal'' 20.786 +revisions. Figure~\ref{ex:mq:id} shows a few normal Mercurial 20.787 +commands in use with applied patches. 20.788 + 20.789 +\section{Useful things to know about} 20.790 + 20.791 +There are a number of aspects of MQ usage that don't fit tidily into 20.792 +sections of their own, but that are good to know. Here they are, in 20.793 +one place. 20.794 + 20.795 +\begin{itemize} 20.796 +\item Normally, when you \hgxcmd{mq}{qpop} a patch and \hgxcmd{mq}{qpush} it 20.797 + again, the changeset that represents the patch after the pop/push 20.798 + will have a \emph{different identity} than the changeset that 20.799 + represented the patch beforehand. See 20.800 + section~\ref{sec:mqref:cmd:qpush} for information as to why this is. 20.801 +\item It's not a good idea to \hgcmd{merge} changes from another 20.802 + branch with a patch changeset, at least if you want to maintain the 20.803 + ``patchiness'' of that changeset and changesets below it on the 20.804 + patch stack. If you try to do this, it will appear to succeed, but 20.805 + MQ will become confused. 20.806 +\end{itemize} 20.807 + 20.808 +\section{Managing patches in a repository} 20.809 +\label{sec:mq:repo} 20.810 + 20.811 +Because MQ's \sdirname{.hg/patches} directory resides outside a 20.812 +Mercurial repository's working directory, the ``underlying'' Mercurial 20.813 +repository knows nothing about the management or presence of patches. 
20.814 + 20.815 +This presents the interesting possibility of managing the contents of 20.816 +the patch directory as a Mercurial repository in its own right. This 20.817 +can be a useful way to work. For example, you can work on a patch for 20.818 +a while, \hgxcmd{mq}{qrefresh} it, then \hgcmd{commit} the current state of 20.819 +the patch. This lets you ``roll back'' to that version of the patch 20.820 +later on. 20.821 + 20.822 +You can then share different versions of the same patch stack among 20.823 +multiple underlying repositories. I use this when I am developing a 20.824 +Linux kernel feature. I have a pristine copy of my kernel sources for 20.825 +each of several CPU architectures, and a cloned repository under each 20.826 +that contains the patches I am working on. When I want to test a 20.827 +change on a different architecture, I push my current patches to the 20.828 +patch repository associated with that kernel tree, pop and push all of 20.829 +my patches, and build and test that kernel. 20.830 + 20.831 +Managing patches in a repository makes it possible for multiple 20.832 +developers to work on the same patch series without colliding with 20.833 +each other, all on top of an underlying source base that they may or 20.834 +may not control. 20.835 + 20.836 +\subsection{MQ support for patch repositories} 20.837 + 20.838 +MQ helps you to work with the \sdirname{.hg/patches} directory as a 20.839 +repository; when you prepare a repository for working with patches 20.840 +using \hgxcmd{mq}{qinit}, you can pass the \hgxopt{mq}{qinit}{-c} option to 20.841 +create the \sdirname{.hg/patches} directory as a Mercurial repository. 20.842 + 20.843 +\begin{note} 20.844 + If you forget to use the \hgxopt{mq}{qinit}{-c} option, you can simply go 20.845 + into the \sdirname{.hg/patches} directory at any time and run 20.846 + \hgcmd{init}. 
Don't forget to add an entry for the 20.847 + \sfilename{status} file to the \sfilename{.hgignore} file, though 20.848 + 20.849 + (\hgcmdargs{qinit}{\hgxopt{mq}{qinit}{-c}} does this for you 20.850 + automatically); you \emph{really} don't want to manage the 20.851 + \sfilename{status} file. 20.852 +\end{note} 20.853 + 20.854 +As a convenience, if MQ notices that the \dirname{.hg/patches} 20.855 +directory is a repository, it will automatically \hgcmd{add} every 20.856 +patch that you create and import. 20.857 + 20.858 +MQ provides a shortcut command, \hgxcmd{mq}{qcommit}, that runs 20.859 +\hgcmd{commit} in the \sdirname{.hg/patches} directory. This saves 20.860 +some bothersome typing. 20.861 + 20.862 +Finally, as a convenience to manage the patch directory, you can 20.863 +define the alias \command{mq} on Unix systems. For example, on Linux 20.864 +systems using the \command{bash} shell, you can include the following 20.865 +snippet in your \tildefile{.bashrc}. 20.866 + 20.867 +\begin{codesample2} 20.868 + alias mq=`hg -R \$(hg root)/.hg/patches' 20.869 +\end{codesample2} 20.870 + 20.871 +You can then issue commands of the form \cmdargs{mq}{pull} from 20.872 +the main repository. 20.873 + 20.874 +\subsection{A few things to watch out for} 20.875 + 20.876 +MQ's support for working with a repository full of patches is limited 20.877 +in a few small respects. 20.878 + 20.879 +MQ cannot automatically detect changes that you make to the patch 20.880 +directory. If you \hgcmd{pull}, manually edit, or \hgcmd{update} 20.881 +changes to patches or the \sfilename{series} file, you will have to 20.882 +\hgcmdargs{qpop}{\hgxopt{mq}{qpop}{-a}} and then 20.883 +\hgcmdargs{qpush}{\hgxopt{mq}{qpush}{-a}} in the underlying repository to 20.884 +see those changes show up there. If you forget to do this, you can 20.885 +confuse MQ's idea of which patches are applied. 
20.886 + 20.887 +\section{Third party tools for working with patches} 20.888 +\label{sec:mq:tools} 20.889 + 20.890 +Once you've been working with patches for a while, you'll find 20.891 +yourself hungry for tools that will help you to understand and 20.892 +manipulate the patches you're dealing with. 20.893 + 20.894 +The \command{diffstat} command~\cite{web:diffstat} generates a 20.895 +histogram of the modifications made to each file in a patch. It 20.896 +provides a good way to ``get a sense of'' a patch---which files it 20.897 +affects, and how much change it introduces to each file and as a 20.898 +whole. (I find that it's a good idea to use \command{diffstat}'s 20.899 +\cmdopt{diffstat}{-p} option as a matter of course, as otherwise it 20.900 +will try to do clever things with prefixes of file names that 20.901 +inevitably confuse at least me.) 20.902 + 20.903 +\begin{figure}[ht] 20.904 + \interaction{mq.tools.tools} 20.905 + \caption{The \command{diffstat}, \command{filterdiff}, and \command{lsdiff} commands} 20.906 + \label{ex:mq:tools} 20.907 +\end{figure} 20.908 + 20.909 +The \package{patchutils} package~\cite{web:patchutils} is invaluable. 20.910 +It provides a set of small utilities that follow the ``Unix 20.911 +philosophy;'' each does one useful thing with a patch. The 20.912 +\package{patchutils} command I use most is \command{filterdiff}, which 20.913 +extracts subsets from a patch file. For example, given a patch that 20.914 +modifies hundreds of files across dozens of directories, a single 20.915 +invocation of \command{filterdiff} can generate a smaller patch that 20.916 +only touches files whose names match a particular glob pattern. See 20.917 +section~\ref{mq-collab:tips:interdiff} for another example. 
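To make concrete what a per-file summary of a patch involves, here is a small Python sketch that counts the lines added and removed for each file in a unified diff. It is an illustration only, not how \command{diffstat} or \package{patchutils} is implemented; the function name and the sample patch are invented for the example, and the sketch assumes well-formed unified diffs.

```python
import re

# Matches a unified-diff hunk header such as "@@ -1,3 +1,4 @@".
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")


def change_counts(patch_text):
    """Map each target file name to [lines added, lines removed].

    Hypothetical helper for illustration; assumes unified diff format.
    """
    counts = {}
    current = None
    old_left = new_left = 0  # lines still expected in the current hunk
    for line in patch_text.splitlines():
        if old_left > 0 or new_left > 0:
            # Inside a hunk: classify the line by its first character.
            if line.startswith("+"):
                counts[current][0] += 1
                new_left -= 1
            elif line.startswith("-"):
                counts[current][1] += 1
                old_left -= 1
            elif line.startswith("\\"):
                pass  # "\ No newline at end of file" marker
            else:
                old_left -= 1  # a context line appears on both sides
                new_left -= 1
        elif line.startswith("+++ "):
            # File header: the name ends at an optional tab-separated date.
            current = line[4:].split("\t")[0]
            counts.setdefault(current, [0, 0])
        else:
            header = HUNK_HEADER.match(line)
            if header and current is not None:
                old_left = int(header.group(1) or 1)
                new_left = int(header.group(2) or 1)
    return counts


sample = "\n".join([
    "--- a/foo.c",
    "+++ b/foo.c",
    "@@ -1,3 +1,4 @@",
    " int x;",
    "-int y;",
    "+int y = 0;",
    "+int z;",
    " return;",
])
for name, (added, removed) in change_counts(sample).items():
    print(name, "+" * added + "-" * removed)
```

Tracking the line counts from each hunk header, rather than looking only at the first character of every line, keeps a removed line whose text happens to begin with dashes from being mistaken for a file header.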
20.918 + 20.919 +\section{Good ways to work with patches} 20.920 + 20.921 +Whether you are working on a patch series to submit to a free software 20.922 +or open source project, or a series that you intend to treat as a 20.923 +sequence of regular changesets when you're done, you can use some 20.924 +simple techniques to keep your work well organised. 20.925 + 20.926 +Give your patches descriptive names. A good name for a patch might be 20.927 +\filename{rework-device-alloc.patch}, because it will immediately give 20.928 +you a hint what the purpose of the patch is. Long names shouldn't be 20.929 +a problem; you won't be typing the names often, but you \emph{will} be 20.930 +running commands like \hgxcmd{mq}{qapplied} and \hgxcmd{mq}{qtop} over and over. 20.931 +Good naming becomes especially important when you have a number of 20.932 +patches to work with, or if you are juggling a number of different 20.933 +tasks and your patches only get a fraction of your attention. 20.934 + 20.935 +Be aware of what patch you're working on. Use the \hgxcmd{mq}{qtop} 20.936 +command and skim over the text of your patches frequently---for 20.937 +example, using \hgcmdargs{tip}{\hgopt{tip}{-p}}---to be sure of where 20.938 +you stand. I have several times worked on and \hgxcmd{mq}{qrefresh}ed a 20.939 +patch other than the one I intended, and it's often tricky to migrate 20.940 +changes into the right patch after making them in the wrong one. 20.941 + 20.942 +For this reason, it is very much worth investing a little time to 20.943 +learn how to use some of the third-party tools I described in 20.944 +section~\ref{sec:mq:tools}, particularly \command{diffstat} and 20.945 +\command{filterdiff}. The former will give you a quick idea of what 20.946 +changes your patch is making, while the latter makes it easy to splice 20.947 +hunks selectively out of one patch and into another. 
20.948 + 20.949 +\section{MQ cookbook} 20.950 + 20.951 +\subsection{Manage ``trivial'' patches} 20.952 + 20.953 +Because the overhead of dropping files into a new Mercurial repository 20.954 +is so low, it makes a lot of sense to manage patches this way even if 20.955 +you simply want to make a few changes to a source tarball that you 20.956 +downloaded. 20.957 + 20.958 +Begin by downloading and unpacking the source tarball, 20.959 +and turning it into a Mercurial repository. 20.960 +\interaction{mq.tarball.download} 20.961 + 20.962 +Continue by creating a patch stack and making your changes. 20.963 +\interaction{mq.tarball.qinit} 20.964 + 20.965 +Let's say a few weeks or months pass, and your package author releases 20.966 +a new version. First, bring their changes into the repository. 20.967 +\interaction{mq.tarball.newsource} 20.968 +The pipeline starting with \hgcmd{locate} above deletes all files in 20.969 +the working directory, so that \hgcmd{commit}'s 20.970 +\hgopt{commit}{--addremove} option can actually tell which files have 20.971 +really been removed in the newer version of the source. 20.972 + 20.973 +Finally, you can apply your patches on top of the new tree. 20.974 +\interaction{mq.tarball.repush} 20.975 + 20.976 +\subsection{Combining entire patches} 20.977 +\label{sec:mq:combine} 20.978 + 20.979 +MQ provides a command, \hgxcmd{mq}{qfold}, that lets you combine entire 20.980 +patches. This ``folds'' the patches you name, in the order you name 20.981 +them, into the topmost applied patch, and concatenates their 20.982 +descriptions onto the end of its description. The patches that you 20.983 +fold must be unapplied before you fold them. 20.984 + 20.985 +The order in which you fold patches matters. 
If your topmost applied 20.986 +patch is \texttt{foo}, and you \hgxcmd{mq}{qfold} \texttt{bar} and 20.987 +\texttt{quux} into it, you will end up with a patch that has the same 20.988 +effect as if you applied first \texttt{foo}, then \texttt{bar}, 20.989 +followed by \texttt{quux}. 20.990 + 20.991 +\subsection{Merging part of one patch into another} 20.992 + 20.993 +Merging \emph{part} of one patch into another is more difficult than 20.994 +combining entire patches. 20.995 + 20.996 +If you want to move changes to entire files, you can use 20.997 +\command{filterdiff}'s \cmdopt{filterdiff}{-i} and 20.998 +\cmdopt{filterdiff}{-x} options to choose the modifications to snip 20.999 +out of one patch, concatenating its output onto the end of the patch 20.1000 +you want to merge into. You usually won't need to modify the patch 20.1001 +you've merged the changes from. Instead, MQ will report some rejected 20.1002 +hunks when you \hgxcmd{mq}{qpush} it (from the hunks you moved into the 20.1003 +other patch), and you can simply \hgxcmd{mq}{qrefresh} the patch to drop 20.1004 +the duplicate hunks. 20.1005 + 20.1006 +If you have a patch that has multiple hunks modifying a file, and you 20.1007 +only want to move a few of those hunks, the job becomes more messy, 20.1008 +but you can still partly automate it. Use \cmdargs{lsdiff}{-nvv} to 20.1009 +print some metadata about the patch. 20.1010 +\interaction{mq.tools.lsdiff} 20.1011 + 20.1012 +This command prints three different kinds of number: 20.1013 +\begin{itemize} 20.1014 +\item (in the first column) a \emph{file number} to identify each file 20.1015 + modified in the patch; 20.1016 +\item (on the next line, indented) the line number within a modified 20.1017 + file where a hunk starts; and 20.1018 +\item (on the same line) a \emph{hunk number} to identify that hunk. 
20.1019 +\end{itemize} 20.1020 + 20.1021 +You'll have to use some visual inspection, and reading of the patch, 20.1022 +to identify the file and hunk numbers you'll want, but you can then 20.1023 +pass them to \command{filterdiff}'s \cmdopt{filterdiff}{--files} 20.1024 +and \cmdopt{filterdiff}{--hunks} options, to select exactly the file 20.1025 +and hunk you want to extract. 20.1026 + 20.1027 +Once you have this hunk, you can concatenate it onto the end of your 20.1028 +destination patch and continue with the remainder of 20.1029 +section~\ref{sec:mq:combine}. 20.1030 + 20.1031 +\section{Differences between quilt and MQ} 20.1032 + 20.1033 +If you are already familiar with quilt, MQ provides a similar command 20.1034 +set. There are a few differences in the way that it works. 20.1035 + 20.1036 +You will already have noticed that most quilt commands have MQ 20.1037 +counterparts that simply begin with a ``\texttt{q}''. The exceptions 20.1038 +are quilt's \texttt{add} and \texttt{remove} commands, the 20.1039 +counterparts for which are the normal Mercurial \hgcmd{add} and 20.1040 +\hgcmd{remove} commands. There is no MQ equivalent of the quilt 20.1041 +\texttt{edit} command. 20.1042 + 20.1043 +%%% Local Variables: 20.1044 +%%% mode: latex 20.1045 +%%% TeX-master: "00book" 20.1046 +%%% End:
21.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 21.2 +++ b/en/ch13-mq-collab.tex Thu Jan 29 22:56:27 2009 -0800 21.3 @@ -0,0 +1,393 @@ 21.4 +\chapter{Advanced uses of Mercurial Queues} 21.5 +\label{chap:mq-collab} 21.6 + 21.7 +While it's easy to pick up straightforward uses of Mercurial Queues, 21.8 +use of a little discipline and some of MQ's less frequently used 21.9 +capabilities makes it possible to work in complicated development 21.10 +environments. 21.11 + 21.12 +In this chapter, I will use as an example a technique I have used to 21.13 +manage the development of an Infiniband device driver for the Linux 21.14 +kernel. The driver in question is large (at least as drivers go), 21.15 +with 25,000 lines of code spread across 35 source files. It is 21.16 +maintained by a small team of developers. 21.17 + 21.18 +While much of the material in this chapter is specific to Linux, the 21.19 +same principles apply to any code base for which you're not the 21.20 +primary owner, and upon which you need to do a lot of development. 21.21 + 21.22 +\section{The problem of many targets} 21.23 + 21.24 +The Linux kernel changes rapidly, and has never been internally 21.25 +stable; developers frequently make drastic changes between releases. 21.26 +This means that a version of the driver that works well with a 21.27 +particular released version of the kernel will not even \emph{compile} 21.28 +correctly against, typically, any other version. 21.29 + 21.30 +To maintain a driver, we have to keep a number of distinct versions of 21.31 +Linux in mind. 21.32 +\begin{itemize} 21.33 +\item One target is the main Linux kernel development tree. 21.34 + Maintenance of the code is in this case partly shared by other 21.35 + developers in the kernel community, who make ``drive-by'' 21.36 + modifications to the driver as they develop and refine kernel 21.37 + subsystems. 
21.38 +\item We also maintain a number of ``backports'' to older versions of 21.39 + the Linux kernel, to support the needs of customers who are running 21.40 + older Linux distributions that do not incorporate our drivers. (To 21.41 + \emph{backport} a piece of code is to modify it to work in an older 21.42 + version of its target environment than the version it was developed 21.43 + for.) 21.44 +\item Finally, we make software releases on a schedule that is 21.45 + necessarily not aligned with those used by Linux distributors and 21.46 + kernel developers, so that we can deliver new features to customers 21.47 + without forcing them to upgrade their entire kernels or 21.48 + distributions. 21.49 +\end{itemize} 21.50 + 21.51 +\subsection{Tempting approaches that don't work well} 21.52 + 21.53 +There are two ``standard'' ways to maintain a piece of software that 21.54 +has to target many different environments. 21.55 + 21.56 +The first is to maintain a number of branches, each intended for a 21.57 +single target. The trouble with this approach is that you must 21.58 +maintain iron discipline in the flow of changes between repositories. 21.59 +A new feature or bug fix must start life in a ``pristine'' repository, 21.60 +then percolate out to every backport repository. Backport changes are 21.61 +more limited in the branches they should propagate to; a backport 21.62 +change that is applied to a branch where it doesn't belong will 21.63 +probably stop the driver from compiling. 21.64 + 21.65 +The second is to maintain a single source tree filled with conditional 21.66 +statements that turn chunks of code on or off depending on the 21.67 +intended target. Because these ``ifdefs'' are not allowed in the 21.68 +Linux kernel tree, a manual or automatic process must be followed to 21.69 +strip them out and yield a clean tree. 
A code base maintained in this 21.70 +fashion rapidly becomes a rat's nest of conditional blocks that are 21.71 +difficult to understand and maintain. 21.72 + 21.73 +Neither of these approaches is well suited to a situation where you 21.74 +don't ``own'' the canonical copy of a source tree. In the case of a 21.75 +Linux driver that is distributed with the standard kernel, Linus's 21.76 +tree contains the copy of the code that will be treated by the world 21.77 +as canonical. The upstream version of ``my'' driver can be modified 21.78 +by people I don't know, without me even finding out about it until 21.79 +after the changes show up in Linus's tree. 21.80 + 21.81 +These approaches have the added weakness of making it difficult to 21.82 +generate well-formed patches to submit upstream. 21.83 + 21.84 +In principle, Mercurial Queues seems like a good candidate to manage a 21.85 +development scenario such as the above. While this is indeed the 21.86 +case, MQ contains a few added features that make the job more 21.87 +pleasant. 21.88 + 21.89 +\section{Conditionally applying patches with 21.90 + guards} 21.91 + 21.92 +Perhaps the best way to maintain sanity with so many targets is to be 21.93 +able to choose specific patches to apply for a given situation. MQ 21.94 +provides a feature called ``guards'' (which originates with quilt's 21.95 +\texttt{guards} command) that does just this. To start off, let's 21.96 +create a simple repository for experimenting in. 21.97 +\interaction{mq.guards.init} 21.98 +This gives us a tiny repository that contains two patches that don't 21.99 +have any dependencies on each other, because they touch different files. 21.100 + 21.101 +The idea behind conditional application is that you can ``tag'' a 21.102 +patch with a \emph{guard}, which is simply a text string of your 21.103 +choosing, then tell MQ to select specific guards to use when applying 21.104 +patches. 
MQ will then either apply, or skip over, a guarded patch, 21.105 +depending on the guards that you have selected. 21.106 + 21.107 +A patch can have an arbitrary number of guards; 21.108 +each one is \emph{positive} (``apply this patch if this guard is 21.109 +selected'') or \emph{negative} (``skip this patch if this guard is 21.110 +selected''). A patch with no guards is always applied. 21.111 + 21.112 +\section{Controlling the guards on a patch} 21.113 + 21.114 +The \hgxcmd{mq}{qguard} command lets you determine which guards should 21.115 +apply to a patch, or display the guards that are already in effect. 21.116 +Without any arguments, it displays the guards on the current topmost 21.117 +patch. 21.118 +\interaction{mq.guards.qguard} 21.119 +To set a positive guard on a patch, prefix the name of the guard with 21.120 +a ``\texttt{+}''. 21.121 +\interaction{mq.guards.qguard.pos} 21.122 +To set a negative guard on a patch, prefix the name of the guard with 21.123 +a ``\texttt{-}''. 21.124 +\interaction{mq.guards.qguard.neg} 21.125 + 21.126 +\begin{note} 21.127 + The \hgxcmd{mq}{qguard} command \emph{sets} the guards on a patch; it 21.128 + doesn't \emph{modify} them. What this means is that if you run 21.129 + \hgcmdargs{qguard}{+a +b} on a patch, then \hgcmdargs{qguard}{+c} on 21.130 + the same patch, the \emph{only} guard that will be set on it 21.131 + afterwards is \texttt{+c}. 21.132 +\end{note} 21.133 + 21.134 +Mercurial stores guards in the \sfilename{series} file; the form in 21.135 +which they are stored is easy both to understand and to edit by hand. 21.136 +(In other words, you don't have to use the \hgxcmd{mq}{qguard} command if 21.137 +you don't want to; it's okay to simply edit the \sfilename{series} 21.138 +file.) 21.139 +\interaction{mq.guards.series} 21.140 + 21.141 +\section{Selecting the guards to use} 21.142 + 21.143 +The \hgxcmd{mq}{qselect} command determines which guards are active at a 21.144 +given time. 
The effect of this is to determine which patches MQ will 21.145 +apply the next time you run \hgxcmd{mq}{qpush}. It has no other effect; in 21.146 +particular, it doesn't do anything to patches that are already 21.147 +applied. 21.148 + 21.149 +With no arguments, the \hgxcmd{mq}{qselect} command lists the guards 21.150 +currently in effect, one per line of output. Each argument is treated 21.151 +as the name of a guard to apply. 21.152 +\interaction{mq.guards.qselect.foo} 21.153 +In case you're interested, the currently selected guards are stored in 21.154 +the \sfilename{guards} file. 21.155 +\interaction{mq.guards.qselect.cat} 21.156 +We can see the effect the selected guards have when we run 21.157 +\hgxcmd{mq}{qpush}. 21.158 +\interaction{mq.guards.qselect.qpush} 21.159 + 21.160 +A guard cannot start with a ``\texttt{+}'' or ``\texttt{-}'' 21.161 +character. The name of a guard must not contain white space, but most 21.162 +other characters are acceptable. If you try to use a guard with an 21.163 +invalid name, MQ will complain: 21.164 +\interaction{mq.guards.qselect.error} 21.165 +Changing the selected guards changes the patches that are applied. 21.166 +\interaction{mq.guards.qselect.quux} 21.167 +You can see in the example below that negative guards take precedence 21.168 +over positive guards. 21.169 +\interaction{mq.guards.qselect.foobar} 21.170 + 21.171 +\section{MQ's rules for applying patches} 21.172 + 21.173 +The rules that MQ uses when deciding whether to apply a patch 21.174 +are as follows. 21.175 +\begin{itemize} 21.176 +\item A patch that has no guards is always applied. 21.177 +\item If the patch has any negative guard that matches any currently 21.178 + selected guard, the patch is skipped. 21.179 +\item If the patch has any positive guard that matches any currently 21.180 + selected guard, the patch is applied. 
21.181 +\item If the patch has positive or negative guards, but none matches 21.182 + any currently selected guard, the patch is skipped. 21.183 +\end{itemize} 21.184 + 21.185 +\section{Trimming the work environment} 21.186 + 21.187 +In working on the device driver I mentioned earlier, I don't apply the 21.188 +patches to a normal Linux kernel tree. Instead, I use a repository 21.189 +that contains only a snapshot of the source files and headers that are 21.190 +relevant to Infiniband development. This repository is~1\% the size 21.191 +of a kernel repository, so it's easier to work with. 21.192 + 21.193 +I then choose a ``base'' version on top of which the patches are 21.194 +applied. This is a snapshot of the Linux kernel tree as of a revision 21.195 +of my choosing. When I take the snapshot, I record the changeset ID 21.196 +from the kernel repository in the commit message. Since the snapshot 21.197 +preserves the ``shape'' and content of the relevant parts of the 21.198 +kernel tree, I can apply my patches on top of either my tiny 21.199 +repository or a normal kernel tree. 21.200 + 21.201 +Normally, the base tree atop which the patches apply should be a 21.202 +snapshot of a very recent upstream tree. This best facilitates the 21.203 +development of patches that can easily be submitted upstream with few 21.204 +or no modifications. 21.205 + 21.206 +\section{Dividing up the \sfilename{series} file} 21.207 + 21.208 +I categorise the patches in the \sfilename{series} file into a number 21.209 +of logical groups. Each section of like patches begins with a block 21.210 +of comments that describes the purpose of the patches that follow. 21.211 + 21.212 +The sequence of patch groups that I maintain follows. The ordering of 21.213 +these groups is important; I'll describe why after I introduce the 21.214 +groups. 21.215 +\begin{itemize} 21.216 +\item The ``accepted'' group. 
Patches that the development team has 21.217 + submitted to the maintainer of the Infiniband subsystem, and which 21.218 + he has accepted, but which are not present in the snapshot that the 21.219 + tiny repository is based on. These are ``read only'' patches, 21.220 + present only to transform the tree into a similar state as it is in 21.221 + the upstream maintainer's repository. 21.222 +\item The ``rework'' group. Patches that I have submitted, but that 21.223 + the upstream maintainer has requested modifications to before he 21.224 + will accept them. 21.225 +\item The ``pending'' group. Patches that I have not yet submitted to 21.226 + the upstream maintainer, but which we have finished working on. 21.227 + These will be ``read only'' for a while. If the upstream maintainer 21.228 + accepts them upon submission, I'll move them to the end of the 21.229 + ``accepted'' group. If he requests that I modify any, I'll move 21.230 + them to the beginning of the ``rework'' group. 21.231 +\item The ``in progress'' group. Patches that are actively being 21.232 + developed, and should not be submitted anywhere yet. 21.233 +\item The ``backport'' group. Patches that adapt the source tree to 21.234 + older versions of the kernel tree. 21.235 +\item The ``do not ship'' group. Patches that for some reason should 21.236 + never be submitted upstream. For example, one such patch might 21.237 + change embedded driver identification strings to make it easier to 21.238 + distinguish, in the field, between an out-of-tree version of the 21.239 + driver and a version shipped by a distribution vendor. 21.240 +\end{itemize} 21.241 + 21.242 +Now to return to the reasons for ordering groups of patches in this 21.243 +way. We would like the lowest patches in the stack to be as stable as 21.244 +possible, so that we will not need to rework higher patches due to 21.245 +changes in context. 
Putting patches that will never be changed first 21.246 +in the \sfilename{series} file serves this purpose. 21.247 + 21.248 +We would also like the patches that we know we'll need to modify to be 21.249 +applied on top of a source tree that resembles the upstream tree as 21.250 +closely as possible. This is why we keep accepted patches around for 21.251 +a while. 21.252 + 21.253 +The ``backport'' and ``do not ship'' patches float at the end of the 21.254 +\sfilename{series} file. The backport patches must be applied on top 21.255 +of all other patches, and the ``do not ship'' patches might as well 21.256 +stay out of harm's way. 21.257 + 21.258 +\section{Maintaining the patch series} 21.259 + 21.260 +In my work, I use a number of guards to control which patches are to 21.261 +be applied. 21.262 + 21.263 +\begin{itemize} 21.264 +\item ``Accepted'' patches are guarded with \texttt{accepted}. I 21.265 + enable this guard most of the time. When I'm applying the patches 21.266 + on top of a tree where the patches are already present, I can turn 21.267 + this guard off, and the patches that follow it will apply cleanly. 21.268 +\item Patches that are ``finished'', but not yet submitted, have no 21.269 + guards. If I'm applying the patch stack to a copy of the upstream 21.270 + tree, I don't need to enable any guards in order to get a reasonably 21.271 + safe source tree. 21.272 +\item Those patches that need reworking before being resubmitted are 21.273 + guarded with \texttt{rework}. 21.274 +\item For those patches that are still under development, I use 21.275 + \texttt{devel}. 21.276 +\item A backport patch may have several guards, one for each version 21.277 + of the kernel to which it applies. For example, a patch that 21.278 + backports a piece of code to~2.6.9 will have a~\texttt{2.6.9} guard. 21.279 +\end{itemize} 21.280 +This variety of guards gives me considerable flexibility in 21.281 +determining what kind of source tree I want to end up with. 
For most 21.282 +situations, the selection of appropriate guards is automated during 21.283 +the build process, but I can manually tune the guards to use for less 21.284 +common circumstances. 21.285 + 21.286 +\subsection{The art of writing backport patches} 21.287 + 21.288 +Using MQ, writing a backport patch is a simple process. All such a 21.289 +patch has to do is modify a piece of code that uses a kernel feature 21.290 +not present in the older version of the kernel, so that the driver 21.291 +continues to work correctly under that older version. 21.292 + 21.293 +A useful goal when writing a good backport patch is to make your code 21.294 +look as if it was written for the older version of the kernel you're 21.295 +targeting. The less obtrusive the patch, the easier it will be to 21.296 +understand and maintain. If you're writing a collection of backport 21.297 +patches to avoid the ``rat's nest'' effect of lots of 21.298 +\texttt{\#ifdef}s (hunks of source code that are only used 21.299 +conditionally) in your code, don't introduce version-dependent 21.300 +\texttt{\#ifdef}s into the patches. Instead, write several patches, 21.301 +each of which makes unconditional changes, and control their 21.302 +application using guards. 21.303 + 21.304 +There are two reasons to divide backport patches into a distinct 21.305 +group, away from the ``regular'' patches whose effects they modify. 21.306 +The first is that intermingling the two makes it more difficult to use 21.307 +a tool like the \hgext{patchbomb} extension to automate the process of 21.308 +submitting the patches to an upstream maintainer. The second is that 21.309 +a backport patch could perturb the context in which a subsequent 21.310 +regular patch is applied, making it impossible to apply the regular 21.311 +patch cleanly \emph{without} the earlier backport patch already being 21.312 +applied. 
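As a rough illustration of how guard selection might be automated, the following Python sketch assembles an \hgxcmd{mq}{qselect} command line from a description of the target tree. It is not the build tooling referred to above: the function name and its parameters are hypothetical, and only the guard names (\texttt{accepted}, \texttt{rework}, \texttt{devel}, and per-version guards such as \texttt{2.6.9}) come from the text.

```python
def qselect_command(kernel_version=None, tree_has_accepted=False,
                    include_rework=False, include_devel=False):
    """Build an ``hg qselect`` command line for the guard scheme above.

    Hypothetical helper: the parameters are illustrative assumptions,
    not part of MQ or of any real build system.
    """
    guards = []
    if not tree_has_accepted:
        # The base tree lacks the accepted patches, so select their guard.
        guards.append("accepted")
    if include_rework:
        guards.append("rework")
    if include_devel:
        guards.append("devel")
    if kernel_version is not None:
        # Backport patches carry one guard per kernel version they target.
        guards.append(kernel_version)
    if not guards:
        # Without guard names, qselect only lists the selected guards;
        # the --none option deselects all of them.
        return ["hg", "qselect", "--none"]
    return ["hg", "qselect"] + guards
```

The returned list could be passed to \texttt{subprocess.call} before running \hgxcmd{mq}{qpush} with the \hgxopt{mq}{qpush}{-a} option. Keeping the decision in a pure function makes it easy to check which guards a given configuration would select without touching a repository.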
21.313 + 21.314 +\section{Useful tips for developing with MQ} 21.315 + 21.316 +\subsection{Organising patches in directories} 21.317 + 21.318 +If you're working on a substantial project with MQ, it's not difficult 21.319 +to accumulate a large number of patches. For example, I have one 21.320 +patch repository that contains over 250 patches. 21.321 + 21.322 +If you can group these patches into separate logical categories, you 21.323 +can if you like store them in different directories; MQ has no 21.324 +problems with patch names that contain path separators. 21.325 + 21.326 +\subsection{Viewing the history of a patch} 21.327 +\label{mq-collab:tips:interdiff} 21.328 + 21.329 +If you're developing a set of patches over a long time, it's a good 21.330 +idea to maintain them in a repository, as discussed in 21.331 +section~\ref{sec:mq:repo}. If you do so, you'll quickly discover that 21.332 +using the \hgcmd{diff} command to look at the history of changes to a 21.333 +patch is unworkable. This is in part because you're looking at the 21.334 +second derivative of the real code (a diff of a diff), but also 21.335 +because MQ adds noise to the process by modifying time stamps and 21.336 +directory names when it updates a patch. 21.337 + 21.338 +However, you can use the \hgext{extdiff} extension, which is bundled 21.339 +with Mercurial, to turn a diff of two versions of a patch into 21.340 +something readable. To do this, you will need a third-party package 21.341 +called \package{patchutils}~\cite{web:patchutils}. This provides a 21.342 +command named \command{interdiff}, which shows the differences between 21.343 +two diffs as a diff. Used on two versions of the same diff, it 21.344 +generates a diff that represents the diff from the first to the second 21.345 +version. 21.346 + 21.347 +You can enable the \hgext{extdiff} extension in the usual way, by 21.348 +adding a line to the \rcsection{extensions} section of your \hgrc. 
21.349 +\begin{codesample2} 21.350 + [extensions] 21.351 + extdiff = 21.352 +\end{codesample2} 21.353 +The \command{interdiff} command expects to be passed the names of two 21.354 +files, but the \hgext{extdiff} extension passes the program it runs a 21.355 +pair of directories, each of which can contain an arbitrary number of 21.356 +files. We thus need a small program that will run \command{interdiff} 21.357 +on each pair of files in these two directories. This program is 21.358 +available as \sfilename{hg-interdiff} in the \dirname{examples} 21.359 +directory of the source code repository that accompanies this book. 21.360 +\excode{hg-interdiff} 21.361 + 21.362 +With the \sfilename{hg-interdiff} program in your shell's search path, 21.363 +you can run it as follows, from inside an MQ patch directory: 21.364 +\begin{codesample2} 21.365 + hg extdiff -p hg-interdiff -r A:B my-change.patch 21.366 +\end{codesample2} 21.367 +Since you'll probably want to use this long-winded command a lot, you 21.368 +can get \hgext{extdiff} to make it available as a normal Mercurial 21.369 +command, again by editing your \hgrc. 21.370 +\begin{codesample2} 21.371 + [extdiff] 21.372 + cmd.interdiff = hg-interdiff 21.373 +\end{codesample2} 21.374 +This directs \hgext{extdiff} to make an \texttt{interdiff} command 21.375 +available, so you can now shorten the previous invocation of 21.376 +\hgxcmd{extdiff}{extdiff} to something a little more manageable. 21.377 +\begin{codesample2} 21.378 + hg interdiff -r A:B my-change.patch 21.379 +\end{codesample2} 21.380 + 21.381 +\begin{note} 21.382 + The \command{interdiff} command works well only if the underlying 21.383 + files against which versions of a patch are generated remain the 21.384 + same. If you create a patch, modify the underlying files, and then 21.385 + regenerate the patch, \command{interdiff} may not produce useful 21.386 + output. 
21.387 +\end{note} 21.388 + 21.389 +The \hgext{extdiff} extension is useful for more than merely improving 21.390 +the presentation of MQ~patches. To read more about it, go to 21.391 +section~\ref{sec:hgext:extdiff}. 21.392 + 21.393 +%%% Local Variables: 21.394 +%%% mode: latex 21.395 +%%% TeX-master: "00book" 21.396 +%%% End:
22.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 22.2 +++ b/en/ch14-hgext.tex Thu Jan 29 22:56:27 2009 -0800 22.3 @@ -0,0 +1,429 @@ 22.4 +\chapter{Adding functionality with extensions} 22.5 +\label{chap:hgext} 22.6 + 22.7 +While the core of Mercurial is quite complete from a functionality 22.8 +standpoint, it's deliberately shorn of fancy features. This approach 22.9 +of preserving simplicity keeps the software easy to deal with for both 22.10 +maintainers and users. 22.11 + 22.12 +However, Mercurial doesn't box you in with an inflexible command set: 22.13 +you can add features to it as \emph{extensions} (sometimes known as 22.14 +\emph{plugins}). We've already discussed a few of these extensions in 22.15 +earlier chapters. 22.16 +\begin{itemize} 22.17 +\item Section~\ref{sec:tour-merge:fetch} covers the \hgext{fetch} 22.18 + extension; this combines pulling new changes and merging them with 22.19 + local changes into a single command, \hgxcmd{fetch}{fetch}. 22.20 +\item In chapter~\ref{chap:hook}, we covered several extensions that 22.21 + are useful for hook-related functionality: \hgext{acl} adds access 22.22 + control lists; \hgext{bugzilla} adds integration with the Bugzilla 22.23 + bug tracking system; and \hgext{notify} sends notification emails on 22.24 + new changes. 22.25 +\item The Mercurial Queues patch management extension is so invaluable 22.26 + that it merits two chapters and an appendix all to itself. 22.27 + Chapter~\ref{chap:mq} covers the basics; 22.28 + chapter~\ref{chap:mq-collab} discusses advanced topics; and 22.29 + appendix~\ref{chap:mqref} goes into detail on each command. 22.30 +\end{itemize} 22.31 + 22.32 +In this chapter, we'll cover some of the other extensions that are 22.33 +available for Mercurial, and briefly touch on some of the machinery 22.34 +you'll need to know about if you want to write an extension of your 22.35 +own. 
22.36 +\begin{itemize} 22.37 +\item In section~\ref{sec:hgext:inotify}, we'll discuss the 22.38 + possibility of \emph{huge} performance improvements using the 22.39 + \hgext{inotify} extension. 22.40 +\end{itemize} 22.41 + 22.42 +\section{Improve performance with the \hgext{inotify} extension} 22.43 +\label{sec:hgext:inotify} 22.44 + 22.45 +Are you interested in having some of the most common Mercurial 22.46 +operations run as much as a hundred times faster? Read on! 22.47 + 22.48 +Mercurial has great performance under normal circumstances. For 22.49 +example, when you run the \hgcmd{status} command, Mercurial has to 22.50 +scan almost every directory and file in your repository so that it can 22.51 +display file status. Many other Mercurial commands need to do the 22.52 +same work behind the scenes; for example, the \hgcmd{diff} command 22.53 +uses the status machinery to avoid doing an expensive comparison 22.54 +operation on files that obviously haven't changed. 22.55 + 22.56 +Because obtaining file status is crucial to good performance, the 22.57 +authors of Mercurial have optimised this code to within an inch of its 22.58 +life. However, there's no avoiding the fact that when you run 22.59 +\hgcmd{status}, Mercurial is going to have to perform at least one 22.60 +expensive system call for each managed file to determine whether it's 22.61 +changed since the last time Mercurial checked. For a sufficiently 22.62 +large repository, this can take a long time. 22.63 + 22.64 +To put a number on the magnitude of this effect, I created a 22.65 +repository containing 150,000 managed files. I timed \hgcmd{status} 22.66 +as taking ten seconds to run, even when \emph{none} of those files had 22.67 +been modified. 22.68 + 22.69 +Many modern operating systems contain a file notification facility. 22.70 +If a program signs up to an appropriate service, the operating system 22.71 +will notify it every time a file of interest is created, modified, or 22.72 +deleted. 
On Linux systems, the kernel component that does this is 22.73 +called \texttt{inotify}. 22.74 + 22.75 +Mercurial's \hgext{inotify} extension talks to the kernel's 22.76 +\texttt{inotify} component to optimise \hgcmd{status} commands. The 22.77 +extension has two components. A daemon sits in the background and 22.78 +receives notifications from the \texttt{inotify} subsystem. It also 22.79 +listens for connections from a regular Mercurial command. The 22.80 +extension modifies Mercurial's behaviour so that instead of scanning 22.81 +the filesystem, it queries the daemon. Since the daemon has perfect 22.82 +information about the state of the repository, it can respond with a 22.83 +result instantaneously, avoiding the need to scan every directory and 22.84 +file in the repository. 22.85 + 22.86 +Recall the ten seconds that I measured plain Mercurial as taking to 22.87 +run \hgcmd{status} on a 150,000 file repository. With the 22.88 +\hgext{inotify} extension enabled, the time dropped to 0.1~seconds, a 22.89 +factor of \emph{one hundred} faster. 22.90 + 22.91 +Before we continue, please pay attention to some caveats. 22.92 +\begin{itemize} 22.93 +\item The \hgext{inotify} extension is Linux-specific. Because it 22.94 + interfaces directly to the Linux kernel's \texttt{inotify} 22.95 + subsystem, it does not work on other operating systems. 22.96 +\item It should work on any Linux distribution that was released after 22.97 + early~2005. Older distributions are likely to have a kernel that 22.98 + lacks \texttt{inotify}, or a version of \texttt{glibc} that does not 22.99 + have the necessary interfacing support. 22.100 +\item Not all filesystems are suitable for use with the 22.101 + \hgext{inotify} extension. Network filesystems such as NFS are a 22.102 + non-starter, for example, particularly if you're running Mercurial 22.103 + on several systems, all mounting the same network filesystem. 
The 22.104 + kernel's \texttt{inotify} system has no way of knowing about changes 22.105 + made on another system. Most local filesystems (e.g.~ext3, XFS, 22.106 + ReiserFS) should work fine. 22.107 +\end{itemize} 22.108 + 22.109 +The \hgext{inotify} extension is not yet shipped with Mercurial as of 22.110 +May~2007, so it's a little more involved to set up than other 22.111 +extensions. But the performance improvement is worth it! 22.112 + 22.113 +The extension currently comes in two parts: a set of patches to the 22.114 +Mercurial source code, and a library of Python bindings to the 22.115 +\texttt{inotify} subsystem. 22.116 +\begin{note} 22.117 + There are \emph{two} Python \texttt{inotify} binding libraries. One 22.118 + of them is called \texttt{pyinotify}, and is packaged by some Linux 22.119 + distributions as \texttt{python-inotify}. This is \emph{not} the 22.120 + one you'll need, as it is too buggy and inefficient to be practical. 22.121 +\end{note} 22.122 +To get going, it's best to already have a functioning copy of 22.123 +Mercurial installed. 22.124 +\begin{note} 22.125 + If you follow the instructions below, you'll be \emph{replacing} and 22.126 + overwriting any existing installation of Mercurial that you might 22.127 + already have, using the latest ``bleeding edge'' Mercurial code. 22.128 + Don't say you weren't warned! 22.129 +\end{note} 22.130 +\begin{enumerate} 22.131 +\item Clone the Python \texttt{inotify} binding repository. Build and 22.132 + install it. 22.133 + \begin{codesample4} 22.134 + hg clone http://hg.kublai.com/python/inotify 22.135 + cd inotify 22.136 + python setup.py build --force 22.137 + sudo python setup.py install --skip-build 22.138 + \end{codesample4} 22.139 +\item Clone the \dirname{crew} Mercurial repository. Clone the 22.140 + \hgext{inotify} patch repository so that Mercurial Queues will be 22.141 + able to apply patches to your copy of the \dirname{crew} repository. 
22.142 + \begin{codesample4} 22.143 + hg clone http://hg.intevation.org/mercurial/crew 22.144 + hg clone crew inotify 22.145 + hg clone http://hg.kublai.com/mercurial/patches/inotify inotify/.hg/patches 22.146 + \end{codesample4} 22.147 +\item Make sure that you have the Mercurial Queues extension, 22.148 + \hgext{mq}, enabled. If you've never used MQ, read 22.149 + section~\ref{sec:mq:start} to get started quickly. 22.150 +\item Go into the \dirname{inotify} repo, and apply all of the 22.151 + \hgext{inotify} patches using the \hgxopt{mq}{qpush}{-a} option to 22.152 + the \hgxcmd{mq}{qpush} command. 22.153 + \begin{codesample4} 22.154 + cd inotify 22.155 + hg qpush -a 22.156 + \end{codesample4} 22.157 + If you get an error message from \hgxcmd{mq}{qpush}, you should not 22.158 + continue. Instead, ask for help. 22.159 +\item Build and install the patched version of Mercurial. 22.160 + \begin{codesample4} 22.161 + python setup.py build --force 22.162 + sudo python setup.py install --skip-build 22.163 + \end{codesample4} 22.164 +\end{enumerate} 22.165 +Once you've built a suitably patched version of Mercurial, all you 22.166 +need to do to enable the \hgext{inotify} extension is add an entry to 22.167 +your \hgrc. 22.168 +\begin{codesample2} 22.169 + [extensions] 22.170 + inotify = 22.171 +\end{codesample2} 22.172 +When the \hgext{inotify} extension is enabled, Mercurial will 22.173 +automatically and transparently start the status daemon the first time 22.174 +you run a command that needs status in a repository. It runs one 22.175 +status daemon per repository. 22.176 + 22.177 +The status daemon is started silently, and runs in the background. If 22.178 +you look at a list of running processes after you've enabled the 22.179 +\hgext{inotify} extension and run a few commands in different 22.180 +repositories, you'll thus see a few \texttt{hg} processes sitting 22.181 +around, waiting for updates from the kernel and queries from 22.182 +Mercurial. 
22.183 + 22.184 +The first time you run a Mercurial command in a repository when you 22.185 +have the \hgext{inotify} extension enabled, it will run with about the 22.186 +same performance as a normal Mercurial command. This is because the 22.187 +status daemon needs to perform a normal status scan so that it has a 22.188 +baseline against which to apply later updates from the kernel. 22.189 +However, \emph{every} subsequent command that does any kind of status 22.190 +check should be noticeably faster on repositories of even fairly 22.191 +modest size. Better yet, the bigger your repository is, the greater a 22.192 +performance advantage you'll see. The \hgext{inotify} daemon makes 22.193 +status operations almost instantaneous on repositories of all sizes! 22.194 + 22.195 +If you like, you can manually start a status daemon using the 22.196 +\hgxcmd{inotify}{inserve} command. This gives you slightly finer 22.197 +control over how the daemon ought to run. This command will of course 22.198 +only be available when the \hgext{inotify} extension is enabled. 22.199 + 22.200 +When you're using the \hgext{inotify} extension, you should notice 22.201 +\emph{no difference at all} in Mercurial's behaviour, with the sole 22.202 +exception of status-related commands running a whole lot faster than 22.203 +they used to. You should specifically expect that commands will not 22.204 +print different output; neither should they give different results. 22.205 +If either of these situations occurs, please report a bug. 22.206 + 22.207 +\section{Flexible diff support with the \hgext{extdiff} extension} 22.208 +\label{sec:hgext:extdiff} 22.209 + 22.210 +Mercurial's built-in \hgcmd{diff} command outputs plaintext unified 22.211 +diffs. 22.212 +\interaction{extdiff.diff} 22.213 +If you would like to use an external tool to display modifications, 22.214 +you'll want to use the \hgext{extdiff} extension. This will let you 22.215 +use, for example, a graphical diff tool. 
22.216 + 22.217 +The \hgext{extdiff} extension is bundled with Mercurial, so it's easy 22.218 +to set up. In the \rcsection{extensions} section of your \hgrc, 22.219 +simply add a one-line entry to enable the extension. 22.220 +\begin{codesample2} 22.221 + [extensions] 22.222 + extdiff = 22.223 +\end{codesample2} 22.224 +This introduces a command named \hgxcmd{extdiff}{extdiff}, which by 22.225 +default uses your system's \command{diff} command to generate a 22.226 +unified diff in the same form as the built-in \hgcmd{diff} command. 22.227 +\interaction{extdiff.extdiff} 22.228 +The result won't be exactly the same as with the built-in \hgcmd{diff} 22.229 +command, because the output of \command{diff} varies from one 22.230 +system to another, even when passed the same options. 22.231 + 22.232 +As the ``\texttt{making snapshot}'' lines of output above imply, the 22.233 +\hgxcmd{extdiff}{extdiff} command works by creating two snapshots of 22.234 +your source tree. The first snapshot is of the source revision; the 22.235 +second, of the target revision or working directory. The 22.236 +\hgxcmd{extdiff}{extdiff} command generates these snapshots in a 22.237 +temporary directory, passes the name of each directory to an external 22.238 +diff viewer, then deletes the temporary directory. For efficiency, it 22.239 +only snapshots the directories and files that have changed between the 22.240 +two revisions. 22.241 + 22.242 +Snapshot directory names have the same base name as your repository. 22.243 +If your repository path is \dirname{/quux/bar/foo}, then \dirname{foo} 22.244 +will be the name of each snapshot directory. Each snapshot directory 22.245 +name has its changeset ID appended, if appropriate. If a snapshot is 22.246 +of revision \texttt{a631aca1083f}, the directory will be named 22.247 +\dirname{foo.a631aca1083f}. A snapshot of the working directory won't 22.248 +have a changeset ID appended, so it would just be \dirname{foo} in 22.249 +this example. 
To see what this looks like in practice, look again at 22.250 +the \hgxcmd{extdiff}{extdiff} example above. Notice that the diff has 22.251 +the snapshot directory names embedded in its header. 22.252 + 22.253 +The \hgxcmd{extdiff}{extdiff} command accepts two important options. 22.254 +The \hgxopt{extdiff}{extdiff}{-p} option lets you choose a program to 22.255 +view differences with, instead of \command{diff}. With the 22.256 +\hgxopt{extdiff}{extdiff}{-o} option, you can change the options that 22.257 +\hgxcmd{extdiff}{extdiff} passes to the program (by default, these 22.258 +options are ``\texttt{-Npru}'', which only make sense if you're 22.259 +running \command{diff}). In other respects, the 22.260 +\hgxcmd{extdiff}{extdiff} command acts similarly to the built-in 22.261 +\hgcmd{diff} command: you use the same option names, syntax, and 22.262 +arguments to specify the revisions you want, the files you want, and 22.263 +so on. 22.264 + 22.265 +As an example, here's how to run the normal system \command{diff} 22.266 +command, getting it to generate context diffs (using the 22.267 +\cmdopt{diff}{-c} option) instead of unified diffs, and five lines of 22.268 +context instead of the default three (passing \texttt{5} as the 22.269 +argument to the \cmdopt{diff}{-C} option). 22.270 +\interaction{extdiff.extdiff-ctx} 22.271 + 22.272 +Launching a visual diff tool is just as easy. Here's how to launch 22.273 +the \command{kdiff3} viewer. 22.274 +\begin{codesample2} 22.275 + hg extdiff -p kdiff3 -o '' 22.276 +\end{codesample2} 22.277 + 22.278 +If your diff viewing command can't deal with directories, you can 22.279 +easily work around this with a little scripting. For an example of 22.280 +such scripting in action with the \hgext{mq} extension and the 22.281 +\command{interdiff} command, see 22.282 +section~\ref{mq-collab:tips:interdiff}. 
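The scripting mentioned above can be quite small. The following Python sketch is one possible shape for such a wrapper, under the assumption that it is handed a file-based diff program and the two snapshot directories. It is not the \sfilename{hg-interdiff} program that accompanies the book, and the function names \texttt{paired\_files} and \texttt{run\_on\_pairs} are made up for illustration.

```python
import os
import subprocess


def paired_files(old_dir, new_dir):
    """Yield (old_path, new_path) for every file found in either tree."""
    relative = set()
    for root in (old_dir, new_dir):
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                full = os.path.join(dirpath, name)
                # Record the path relative to the snapshot root so that
                # the same file can be located in the other snapshot.
                relative.add(os.path.relpath(full, root))
    for rel in sorted(relative):
        yield os.path.join(old_dir, rel), os.path.join(new_dir, rel)


def run_on_pairs(tool, old_dir, new_dir):
    """Run a two-file diff program once per pair; return its exit codes."""
    codes = []
    for old_path, new_path in paired_files(old_dir, new_dir):
        # A file present on only one side is compared with the null device.
        left = old_path if os.path.exists(old_path) else os.devnull
        right = new_path if os.path.exists(new_path) else os.devnull
        codes.append(subprocess.call([tool, left, right]))
    return codes
```

A small script could call \texttt{run\_on\_pairs} with the program name and the two directory names taken from its command line. Walking both trees, instead of only the first, means that files added or removed between the two revisions are still shown.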
22.283 + 22.284 +\subsection{Defining command aliases} 22.285 + 22.286 +It can be cumbersome to remember the options to both the 22.287 +\hgxcmd{extdiff}{extdiff} command and the diff viewer you want to use, 22.288 +so the \hgext{extdiff} extension lets you define \emph{new} commands 22.289 +that will invoke your diff viewer with exactly the right options. 22.290 + 22.291 +All you need to do is edit your \hgrc, and add a section named 22.292 +\rcsection{extdiff}. Inside this section, you can define multiple 22.293 +commands. Here's how to add a \texttt{kdiff3} command. Once you've 22.294 +defined this, you can type ``\texttt{hg kdiff3}'' and the 22.295 +\hgext{extdiff} extension will run \command{kdiff3} for you. 22.296 +\begin{codesample2} 22.297 + [extdiff] 22.298 + cmd.kdiff3 = 22.299 +\end{codesample2} 22.300 +If you leave the right hand side of the definition empty, as above, 22.301 +the \hgext{extdiff} extension uses the name of the command you defined 22.302 +as the name of the external program to run. But these names don't 22.303 +have to be the same. Here, we define a command named ``\texttt{hg 22.304 + wibble}'', which runs \command{kdiff3}. 22.305 +\begin{codesample2} 22.306 + [extdiff] 22.307 + cmd.wibble = kdiff3 22.308 +\end{codesample2} 22.309 + 22.310 +You can also specify the default options that you want to invoke your 22.311 +diff viewing program with. The prefix to use is ``\texttt{opts.}'', 22.312 +followed by the name of the command to which the options apply. This 22.313 +example defines a ``\texttt{hg vimdiff}'' command that runs the 22.314 +\command{vim} editor's \texttt{DirDiff} extension. 
22.315 +\begin{codesample2} 22.316 + [extdiff] 22.317 + cmd.vimdiff = vim 22.318 + opts.vimdiff = -f '+next' '+execute "DirDiff" argv(0) argv(1)' 22.319 +\end{codesample2} 22.320 + 22.321 +\section{Cherrypicking changes with the \hgext{transplant} extension} 22.322 +\label{sec:hgext:transplant} 22.323 + 22.324 +Need to have a long chat with Brendan about this. 22.325 + 22.326 +\section{Send changes via email with the \hgext{patchbomb} extension} 22.327 +\label{sec:hgext:patchbomb} 22.328 + 22.329 +Many projects have a culture of ``change review'', in which people 22.330 +send their modifications to a mailing list for others to read and 22.331 +comment on before they commit the final version to a shared 22.332 +repository. Some projects have people who act as gatekeepers; they 22.333 +apply changes from other people to a repository to which those others 22.334 +don't have access. 22.335 + 22.336 +Mercurial makes it easy to send changes over email for review or 22.337 +application, via its \hgext{patchbomb} extension. The extension is so 22.338 +named because changes are formatted as patches, and it's usual to send 22.339 +one changeset per email message. Sending a long series of changes by 22.340 +email is thus much like ``bombing'' the recipient's inbox, hence 22.341 +``patchbomb''. 22.342 + 22.343 +As usual, the basic configuration of the \hgext{patchbomb} extension 22.344 +takes just one or two lines in your \hgrc. 22.345 +\begin{codesample2} 22.346 + [extensions] 22.347 + patchbomb = 22.348 +\end{codesample2} 22.349 +Once you've enabled the extension, you will have a new command 22.350 +available, named \hgxcmd{patchbomb}{email}. 22.351 + 22.352 +The safest and best way to invoke the \hgxcmd{patchbomb}{email} 22.353 +command is to \emph{always} run it first with the 22.354 +\hgxopt{patchbomb}{email}{-n} option. This will show you what the 22.355 +command \emph{would} send, without actually sending anything. 
Once 22.356 +you've had a quick glance over the changes and verified that you are 22.357 +sending the right ones, you can rerun the same command, with the 22.358 +\hgxopt{patchbomb}{email}{-n} option removed. 22.359 + 22.360 +The \hgxcmd{patchbomb}{email} command accepts the same kind of 22.361 +revision syntax as every other Mercurial command. For example, this 22.362 +command will send every revision between 7 and \texttt{tip}, 22.363 +inclusive. 22.364 +\begin{codesample2} 22.365 + hg email -n 7:tip 22.366 +\end{codesample2} 22.367 +You can also specify a \emph{repository} to compare with. If you 22.368 +provide a repository but no revisions, the \hgxcmd{patchbomb}{email} 22.369 +command will send all revisions in the local repository that are not 22.370 +present in the remote repository. If you additionally specify 22.371 +revisions or a branch name (the latter using the 22.372 +\hgxopt{patchbomb}{email}{-b} option), this will constrain the 22.373 +revisions sent. 22.374 + 22.375 +It's perfectly safe to run the \hgxcmd{patchbomb}{email} command 22.376 +without the names of the people you want to send to: if you do this, 22.377 +it will just prompt you for those values interactively. (If you're 22.378 +using a Linux or Unix-like system, you should have enhanced 22.379 +\texttt{readline}-style editing capabilities when entering those 22.380 +headers, too, which is useful.) 22.381 + 22.382 +When you are sending just one revision, the \hgxcmd{patchbomb}{email} 22.383 +command will by default use the first line of the changeset 22.384 +description as the subject of the single email message it sends. 22.385 + 22.386 +If you send multiple revisions, the \hgxcmd{patchbomb}{email} command 22.387 +will usually send one message per changeset. It will preface the 22.388 +series with an introductory message, in which you should describe the 22.389 +purpose of the series of changes you're sending. 
22.390 + 22.391 +\subsection{Changing the behaviour of patchbombs} 22.392 + 22.393 +Not every project has exactly the same conventions for sending changes 22.394 +in email; the \hgext{patchbomb} extension tries to accommodate a 22.395 +number of variations through command line options. 22.396 +\begin{itemize} 22.397 +\item You can write a subject for the introductory message on the 22.398 + command line using the \hgxopt{patchbomb}{email}{-s} option. This 22.399 + takes one argument, the text of the subject to use. 22.400 +\item To change the email address from which the messages originate, 22.401 + use the \hgxopt{patchbomb}{email}{-f} option. This takes one 22.402 + argument, the email address to use. 22.403 +\item The default behaviour is to send unified diffs (see 22.404 + section~\ref{sec:mq:patch} for a description of the format), one per 22.405 + message. You can send a binary bundle instead with the 22.406 + \hgxopt{patchbomb}{email}{-b} option. 22.407 +\item Unified diffs are normally prefaced with a metadata header. You 22.408 + can omit this, and send unadorned diffs, with the 22.409 + \hgxopt{patchbomb}{email}{--plain} option. 22.410 +\item Diffs are normally sent ``inline'', in the same body part as the 22.411 + description of a patch. This makes it easiest for the largest 22.412 + number of readers to quote and respond to parts of a diff, as some 22.413 + mail clients will only quote the first MIME body part in a message. 22.414 + If you'd prefer to send the description and the diff in separate 22.415 + body parts, use the \hgxopt{patchbomb}{email}{-a} option. 22.416 +\item Instead of sending mail messages, you can write them to an 22.417 + \texttt{mbox}-format mail folder using the 22.418 + \hgxopt{patchbomb}{email}{-m} option. That option takes one 22.419 + argument, the name of the file to write to. 
22.420 +\item If you would like to add a \command{diffstat}-format summary to 22.421 + each patch, and one to the introductory message, use the 22.422 + \hgxopt{patchbomb}{email}{-d} option. The \command{diffstat} 22.423 + command displays a table containing the name of each file patched, 22.424 + the number of lines affected, and a histogram showing how much each 22.425 + file is modified. This gives readers a qualitative glance at how 22.426 + complex a patch is. 22.427 +\end{itemize} 22.428 + 22.429 +%%% Local Variables: 22.430 +%%% mode: latex 22.431 +%%% TeX-master: "00book" 22.432 +%%% End:
23.1 --- a/en/cmdref.tex Thu Jan 29 22:47:34 2009 -0800 23.2 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 23.3 @@ -1,176 +0,0 @@ 23.4 -\chapter{Command reference} 23.5 -\label{cmdref} 23.6 - 23.7 -\cmdref{add}{add files at the next commit} 23.8 -\optref{add}{I}{include} 23.9 -\optref{add}{X}{exclude} 23.10 -\optref{add}{n}{dry-run} 23.11 - 23.12 -\cmdref{diff}{print changes in history or working directory} 23.13 - 23.14 -Show differences between revisions for the specified files or 23.15 -directories, using the unified diff format. For a description of the 23.16 -unified diff format, see section~\ref{sec:mq:patch}. 23.17 - 23.18 -By default, this command does not print diffs for files that Mercurial 23.19 -considers to contain binary data. To control this behaviour, see the 23.20 -\hgopt{diff}{-a} and \hgopt{diff}{--git} options. 23.21 - 23.22 -\subsection{Options} 23.23 - 23.24 -\loptref{diff}{nodates} 23.25 - 23.26 -Omit date and time information when printing diff headers. 23.27 - 23.28 -\optref{diff}{B}{ignore-blank-lines} 23.29 - 23.30 -Do not print changes that only insert or delete blank lines. A line 23.31 -that contains only whitespace is not considered blank. 23.32 - 23.33 -\optref{diff}{I}{include} 23.34 - 23.35 -Include files and directories whose names match the given patterns. 23.36 - 23.37 -\optref{diff}{X}{exclude} 23.38 - 23.39 -Exclude files and directories whose names match the given patterns. 23.40 - 23.41 -\optref{diff}{a}{text} 23.42 - 23.43 -If this option is not specified, \hgcmd{diff} will refuse to print 23.44 -diffs for files that it detects as binary. Specifying \hgopt{diff}{-a} 23.45 -forces \hgcmd{diff} to treat all files as text, and generate diffs for 23.46 -all of them. 23.47 - 23.48 -This option is useful for files that are ``mostly text'' but have a 23.49 -few embedded NUL characters. If you use it on files that contain a 23.50 -lot of binary data, its output will be incomprehensible. 
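The difference between ``mostly text'' and binary content can be made concrete with a short Python sketch. It assumes that a file counts as binary as soon as it contains a NUL byte; that assumption, and the function name \texttt{looks\_binary}, are illustrative only, and the check Mercurial actually performs may differ.

```python
def looks_binary(data):
    """Return True if the bytes contain a NUL byte.

    Assumed heuristic for illustration; not taken from Mercurial's source.
    """
    return b"\0" in data


# A text file with a single stray NUL is still flagged as binary, which
# is the situation in which forcing text mode is worth trying.
mostly_text = b"line one\nline two\0\nline three\n"
plain_text = b"line one\nline two\n"

print(looks_binary(mostly_text))
print(looks_binary(plain_text))
```

Under this assumption, one embedded NUL is enough to suppress the diff, even if every other byte in the file is readable text.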
23.51 - 23.52 -\optref{diff}{b}{ignore-space-change} 23.53 - 23.54 -Do not print a line if the only change to that line is in the amount 23.55 -of white space it contains. 23.56 - 23.57 -\optref{diff}{g}{git} 23.58 - 23.59 -Print \command{git}-compatible diffs. XXX reference a format 23.60 -description. 23.61 - 23.62 -\optref{diff}{p}{show-function} 23.63 - 23.64 -Display the name of the enclosing function in a hunk header, using a 23.65 -simple heuristic. This functionality is enabled by default, so the 23.66 -\hgopt{diff}{-p} option has no effect unless you change the value of 23.67 -the \rcitem{diff}{showfunc} config item, as in the following example. 23.68 -\interaction{cmdref.diff-p} 23.69 - 23.70 -\optref{diff}{r}{rev} 23.71 - 23.72 -Specify one or more revisions to compare. The \hgcmd{diff} command 23.73 -accepts up to two \hgopt{diff}{-r} options to specify the revisions to 23.74 -compare. 23.75 - 23.76 -\begin{enumerate} 23.77 -\setcounter{enumi}{0} 23.78 -\item Display the differences between the parent revision of the 23.79 - working directory and the working directory. 23.80 -\item Display the differences between the specified changeset and the 23.81 - working directory. 23.82 -\item Display the differences between the two specified changesets. 23.83 -\end{enumerate} 23.84 - 23.85 -You can specify two revisions using either two \hgopt{diff}{-r} 23.86 -options or revision range notation. For example, the two revision 23.87 -specifications below are equivalent. 23.88 -\begin{codesample2} 23.89 - hg diff -r 10 -r 20 23.90 - hg diff -r10:20 23.91 -\end{codesample2} 23.92 - 23.93 -When you provide two revisions, Mercurial treats the order of those 23.94 -revisions as significant. 
Thus, \hgcmdargs{diff}{-r10:20} will 23.95 -produce a diff that will transform files from their contents as of 23.96 -revision~10 to their contents as of revision~20, while 23.97 -\hgcmdargs{diff}{-r20:10} means the opposite: the diff that will 23.98 -transform files from their revision~20 contents to their revision~10 23.99 -contents. You cannot reverse the ordering in this way if you are 23.100 -diffing against the working directory. 23.101 - 23.102 -\optref{diff}{w}{ignore-all-space} 23.103 - 23.104 -\cmdref{version}{print version and copyright information} 23.105 - 23.106 -This command displays the version of Mercurial you are running, and 23.107 -its copyright license. There are four kinds of version string that 23.108 -you may see. 23.109 -\begin{itemize} 23.110 -\item The string ``\texttt{unknown}''. This version of Mercurial was 23.111 - not built in a Mercurial repository, and cannot determine its own 23.112 - version. 23.113 -\item A short numeric string, such as ``\texttt{1.1}''. This is a 23.114 - build of a revision of Mercurial that was identified by a specific 23.115 - tag in the repository where it was built. (This doesn't necessarily 23.116 - mean that you're running an official release; someone else could 23.117 - have added that tag to any revision in the repository where they 23.118 - built Mercurial.) 23.119 -\item A hexadecimal string, such as ``\texttt{875489e31abe}''. This 23.120 - is a build of the given revision of Mercurial. 23.121 -\item A hexadecimal string followed by a date, such as 23.122 - ``\texttt{875489e31abe+20070205}''. This is a build of the given 23.123 - revision of Mercurial, where the build repository contained some 23.124 - local changes that had not been committed. 
23.125 -\end{itemize} 23.126 - 23.127 -\subsection{Tips and tricks} 23.128 - 23.129 -\subsubsection{Why do the results of \hgcmd{diff} and \hgcmd{status} 23.130 - differ?} 23.131 -\label{cmdref:diff-vs-status} 23.132 - 23.133 -When you run the \hgcmd{status} command, you'll see a list of files 23.134 -that Mercurial will record changes for the next time you perform a 23.135 -commit. If you run the \hgcmd{diff} command, you may notice that it 23.136 -prints diffs for only a \emph{subset} of the files that \hgcmd{status} 23.137 -listed. There are two possible reasons for this. 23.138 - 23.139 -The first is that \hgcmd{status} prints some kinds of modifications 23.140 -that \hgcmd{diff} doesn't normally display. The \hgcmd{diff} command 23.141 -normally outputs unified diffs, which don't have the ability to 23.142 -represent some changes that Mercurial can track. Most notably, 23.143 -traditional diffs can't represent a change in whether or not a file is 23.144 -executable, but Mercurial records this information. 23.145 - 23.146 -If you use the \hgopt{diff}{--git} option to \hgcmd{diff}, it will 23.147 -display \command{git}-compatible diffs that \emph{can} display this 23.148 -extra information. 23.149 - 23.150 -The second possible reason that \hgcmd{diff} might be printing diffs 23.151 -for a subset of the files displayed by \hgcmd{status} is that if you 23.152 -invoke it without any arguments, \hgcmd{diff} prints diffs against the 23.153 -first parent of the working directory. If you have run \hgcmd{merge} 23.154 -to merge two changesets, but you haven't yet committed the results of 23.155 -the merge, your working directory has two parents (use \hgcmd{parents} 23.156 -to see them). While \hgcmd{status} prints modifications relative to 23.157 -\emph{both} parents after an uncommitted merge, \hgcmd{diff} still 23.158 -operates relative only to the first parent. 
You can get it to print 23.159 -diffs relative to the second parent by specifying that parent with the 23.160 -\hgopt{diff}{-r} option. There is no way to print diffs relative to 23.161 -both parents. 23.162 - 23.163 -\subsubsection{Generating safe binary diffs} 23.164 - 23.165 -If you use the \hgopt{diff}{-a} option to force Mercurial to print 23.166 -diffs of files that are either ``mostly text'' or contain lots of 23.167 -binary data, those diffs cannot subsequently be applied by either 23.168 -Mercurial's \hgcmd{import} command or the system's \command{patch} 23.169 -command. 23.170 - 23.171 -If you want to generate a diff of a binary file that is safe to use as 23.172 -input for \hgcmd{import}, use the \hgopt{diff}{--git} option when you 23.173 -generate the patch. The system \command{patch} command cannot handle 23.174 -binary patches at all. 23.175 - 23.176 -%%% Local Variables: 23.177 -%%% mode: latex 23.178 -%%% TeX-master: "00book" 23.179 -%%% End:
24.1 --- a/en/collab.tex Thu Jan 29 22:47:34 2009 -0800 24.2 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 24.3 @@ -1,1118 +0,0 @@ 24.4 -\chapter{Collaborating with other people} 24.5 -\label{cha:collab} 24.6 - 24.7 -As a completely decentralised tool, Mercurial doesn't impose any 24.8 -policy on how people ought to work with each other. However, if 24.9 -you're new to distributed revision control, it helps to have some 24.10 -tools and examples in mind when you're thinking about possible 24.11 -workflow models. 24.12 - 24.13 -\section{Mercurial's web interface} 24.14 - 24.15 -Mercurial has a powerful web interface that provides several 24.16 -useful capabilities. 24.17 - 24.18 -For interactive use, the web interface lets you browse a single 24.19 -repository or a collection of repositories. You can view the history 24.20 -of a repository, examine each change (comments and diffs), and view 24.21 -the contents of each directory and file. 24.22 - 24.23 -Also for human consumption, the web interface provides an RSS feed of 24.24 -the changes in a repository. This lets you ``subscribe'' to a 24.25 -repository using your favourite feed reader, and be automatically 24.26 -notified of activity in that repository as soon as it happens. I find 24.27 -this capability much more convenient than the model of subscribing to 24.28 -a mailing list to which notifications are sent, as it requires no 24.29 -additional configuration on the part of whoever is serving the 24.30 -repository. 24.31 - 24.32 -The web interface also lets remote users clone a repository, pull 24.33 -changes from it, and (when the server is configured to permit it) push 24.34 -changes back to it. Mercurial's HTTP tunneling protocol aggressively 24.35 -compresses data, so that it works efficiently even over low-bandwidth 24.36 -network connections. 
24.37 - 24.38 -The easiest way to get started with the web interface is to use your 24.39 -web browser to visit an existing repository, such as the master 24.40 -Mercurial repository at 24.41 -\url{http://www.selenic.com/repo/hg?style=gitweb}. 24.42 - 24.43 -If you're interested in providing a web interface to your own 24.44 -repositories, Mercurial provides two ways to do this. The first is 24.45 -using the \hgcmd{serve} command, which is best suited to short-term 24.46 -``lightweight'' serving. See section~\ref{sec:collab:serve} below for 24.47 -details of how to use this command. If you have a long-lived 24.48 -repository that you'd like to make permanently available, Mercurial 24.49 -has built-in support for the CGI (Common Gateway Interface) standard, 24.50 -which all common web servers support. See 24.51 -section~\ref{sec:collab:cgi} for details of CGI configuration. 24.52 - 24.53 -\section{Collaboration models} 24.54 - 24.55 -With a suitably flexible tool, making decisions about workflow is much 24.56 -more of a social engineering challenge than a technical one. 24.57 -Mercurial imposes few limitations on how you can structure the flow of 24.58 -work in a project, so it's up to you and your group to set up and live 24.59 -with a model that matches your own particular needs. 24.60 - 24.61 -\subsection{Factors to keep in mind} 24.62 - 24.63 -The most important aspect of any model that you must keep in mind is 24.64 -how well it matches the needs and capabilities of the people who will 24.65 -be using it. This might seem self-evident; even so, you still can't 24.66 -afford to forget it for a moment. 24.67 - 24.68 -I once put together a workflow model that seemed to make perfect sense 24.69 -to me, but that caused a considerable amount of consternation and 24.70 -strife within my development team. In spite of my attempts to explain 24.71 -why we needed a complex set of branches, and how changes ought to flow 24.72 -between them, a few team members revolted. 
Even though they were 24.73 -smart people, they didn't want to pay attention to the constraints we 24.74 -were operating under, or face the consequences of those constraints in 24.75 -the details of the model that I was advocating. 24.76 - 24.77 -Don't sweep foreseeable social or technical problems under the rug. 24.78 -Whatever scheme you put into effect, you should plan for mistakes and 24.79 -problem scenarios. Consider adding automated machinery to prevent, or 24.80 -quickly recover from, trouble that you can anticipate. As an example, 24.81 -if you intend to have a branch with not-for-release changes in it, 24.82 -you'd do well to think early about the possibility that someone might 24.83 -accidentally merge those changes into a release branch. You could 24.84 -avoid this particular problem by writing a hook that prevents changes 24.85 -from being merged from an inappropriate branch. 24.86 - 24.87 -\subsection{Informal anarchy} 24.88 - 24.89 -I wouldn't suggest an ``anything goes'' approach as something 24.90 -sustainable, but it's a model that's easy to grasp, and it works 24.91 -perfectly well in a few unusual situations. 24.92 - 24.93 -As one example, many projects have a loose-knit group of collaborators 24.94 -who rarely physically meet each other. Some groups like to overcome 24.95 -the isolation of working at a distance by organising occasional 24.96 -``sprints''. In a sprint, a number of people get together in a single 24.97 -location (a company's conference room, a hotel meeting room, that kind 24.98 -of place) and spend several days more or less locked in there, hacking 24.99 -intensely on a handful of projects. 24.100 - 24.101 -A sprint is the perfect place to use the \hgcmd{serve} command, since 24.102 -\hgcmd{serve} does not require any fancy server infrastructure. You 24.103 -can get started with \hgcmd{serve} in moments, by reading 24.104 -section~\ref{sec:collab:serve} below. 
Then simply tell the person 24.105 -next to you that you're running a server, send the URL to them in an 24.106 -instant message, and you immediately have a quick-turnaround way to 24.107 -work together. They can type your URL into their web browser and 24.108 -quickly review your changes; or they can pull a bugfix from you and 24.109 -verify it; or they can clone a branch containing a new feature and try 24.110 -it out. 24.111 - 24.112 -The charm, and the problem, with doing things in an ad hoc fashion 24.113 -like this is that only people who know about your changes, and where 24.114 -they are, can see them. Such an informal approach simply doesn't 24.115 -scale beyond a handful of people, because each individual needs to know 24.116 -about $n$ different repositories to pull from. 24.117 - 24.118 -\subsection{A single central repository} 24.119 - 24.120 -For smaller projects migrating from a centralised revision control 24.121 -tool, perhaps the easiest way to get started is to have changes flow 24.122 -through a single shared central repository. This is also the 24.123 -most common ``building block'' for more ambitious workflow schemes. 24.124 - 24.125 -Contributors start by cloning a copy of this repository. They can 24.126 -pull changes from it whenever they need to, and some (perhaps all) 24.127 -developers have permission to push a change back when they're ready 24.128 -for other people to see it. 24.129 - 24.130 -Under this model, it can still often make sense for people to pull 24.131 -changes directly from each other, without going through the central 24.132 -repository. Consider a case in which I have a tentative bug fix, but 24.133 -I am worried that if I were to publish it to the central repository, 24.134 -it might subsequently break everyone else's trees as they pull it. To 24.135 -reduce the potential for damage, I can ask you to clone my repository 24.136 -into a temporary repository of your own and test it. 
This lets us put 24.137 -off publishing the potentially unsafe change until it has had a little 24.138 -testing. 24.139 - 24.140 -In this kind of scenario, people usually use the \command{ssh} 24.141 -protocol to securely push changes to the central repository, as 24.142 -documented in section~\ref{sec:collab:ssh}. It's also usual to 24.143 -publish a read-only copy of the repository over HTTP using CGI, as in 24.144 -section~\ref{sec:collab:cgi}. Publishing over HTTP satisfies the 24.145 -needs of people who don't have push access, and those who want to use 24.146 -web browsers to browse the repository's history. 24.147 - 24.148 -\subsection{Working with multiple branches} 24.149 - 24.150 -Projects of any significant size naturally tend to make progress on 24.151 -several fronts simultaneously. In the case of software, it's common 24.152 -for a project to go through periodic official releases. A release 24.153 -might then go into ``maintenance mode'' for a while after its first 24.154 -publication; maintenance releases tend to contain only bug fixes, not 24.155 -new features. In parallel with these maintenance releases, one or 24.156 -more future releases may be under development. People normally use 24.157 -the word ``branch'' to refer to one of these many slightly different 24.158 -directions in which development is proceeding. 24.159 - 24.160 -Mercurial is particularly well suited to managing a number of 24.161 -simultaneous, but not identical, branches. Each ``development 24.162 -direction'' can live in its own central repository, and you can merge 24.163 -changes from one to another as the need arises. Because repositories 24.164 -are independent of each other, unstable changes in a development 24.165 -branch will never affect a stable branch unless someone explicitly 24.166 -merges those changes in. 24.167 - 24.168 -Here's an example of how this can work in practice. Let's say you 24.169 -have one ``main branch'' on a central server. 
24.170 -\interaction{branching.init} 24.171 -People clone it, make changes locally, test them, and push them back. 24.172 - 24.173 -Once the main branch reaches a release milestone, you can use the 24.174 -\hgcmd{tag} command to give a permanent name to the milestone 24.175 -revision. 24.176 -\interaction{branching.tag} 24.177 -Let's say some ongoing development occurs on the main branch. 24.178 -\interaction{branching.main} 24.179 -Using the tag that was recorded at the milestone, people who clone 24.180 -that repository at any time in the future can use \hgcmd{update} to 24.181 -get a copy of the working directory exactly as it was when that tagged 24.182 -revision was committed. 24.183 -\interaction{branching.update} 24.184 - 24.185 -In addition, immediately after the main branch is tagged, someone can 24.186 -then clone the main branch on the server to a new ``stable'' branch, 24.187 -also on the server. 24.188 -\interaction{branching.clone} 24.189 - 24.190 -Someone who needs to make a change to the stable branch can then clone 24.191 -\emph{that} repository, make their changes, commit, and push their 24.192 -changes back there. 24.193 -\interaction{branching.stable} 24.194 -Because Mercurial repositories are independent, and Mercurial doesn't 24.195 -move changes around automatically, the stable and main branches are 24.196 -\emph{isolated} from each other. The changes that you made on the 24.197 -main branch don't ``leak'' to the stable branch, and vice versa. 24.198 - 24.199 -You'll often want all of your bugfixes on the stable branch to show up 24.200 -on the main branch, too. Rather than rewrite a bugfix on the main 24.201 -branch, you can simply pull and merge changes from the stable to the 24.202 -main branch, and Mercurial will bring those bugfixes in for you. 
24.203 -\interaction{branching.merge} 24.204 -The main branch will still contain changes that are not on the stable 24.205 -branch, but it will also contain all of the bugfixes from the stable 24.206 -branch. The stable branch remains unaffected by these changes. 24.207 - 24.208 -\subsection{Feature branches} 24.209 - 24.210 -For larger projects, an effective way to manage change is to break up 24.211 -a team into smaller groups. Each group has a shared branch of its 24.212 -own, cloned from a single ``master'' branch used by the entire 24.213 -project. People working on an individual branch are typically quite 24.214 -isolated from developments on other branches. 24.215 - 24.216 -\begin{figure}[ht] 24.217 - \centering 24.218 - \grafix{feature-branches} 24.219 - \caption{Feature branches} 24.220 - \label{fig:collab:feature-branches} 24.221 -\end{figure} 24.222 - 24.223 -When a particular feature is deemed to be in suitable shape, someone 24.224 -on that feature team pulls and merges from the master branch into the 24.225 -feature branch, then pushes back up to the master branch. 24.226 - 24.227 -\subsection{The release train} 24.228 - 24.229 -Some projects are organised on a ``train'' basis: a release is 24.230 -scheduled to happen every few months, and whatever features are ready 24.231 -when the ``train'' is ready to leave are allowed in. 24.232 - 24.233 -This model resembles working with feature branches. The difference is 24.234 -that when a feature branch misses a train, someone on the feature team 24.235 -pulls and merges the changes that went out on that train release into 24.236 -the feature branch, and the team continues its work on top of that 24.237 -release so that their feature can make the next release. 24.238 - 24.239 -\subsection{The Linux kernel model} 24.240 - 24.241 -The development of the Linux kernel has a shallow hierarchical 24.242 -structure, surrounded by a cloud of apparent chaos. 
Because most 24.243 -Linux developers use \command{git}, a distributed revision control 24.244 -tool with capabilities similar to Mercurial, it's useful to describe 24.245 -the way work flows in that environment; if you like the ideas, the 24.246 -approach translates well across tools. 24.247 - 24.248 -At the center of the community sits Linus Torvalds, the creator of 24.249 -Linux. He publishes a single source repository that is considered the 24.250 -``authoritative'' current tree by the entire developer community. 24.251 -Anyone can clone Linus's tree, but he is very choosy about whose trees 24.252 -he pulls from. 24.253 - 24.254 -Linus has a number of ``trusted lieutenants''. As a general rule, he 24.255 -pulls whatever changes they publish, in most cases without even 24.256 -reviewing those changes. Some of those lieutenants are generally 24.257 -agreed to be ``maintainers'', responsible for specific subsystems 24.258 -within the kernel. If a random kernel hacker wants to make a change 24.259 -to a subsystem that they want to end up in Linus's tree, they must 24.260 -find out who the subsystem's maintainer is, and ask that maintainer to 24.261 -take their change. If the maintainer reviews their changes and agrees 24.262 -to take them, they'll pass them along to Linus in due course. 24.263 - 24.264 -Individual lieutenants have their own approaches to reviewing, 24.265 -accepting, and publishing changes; and for deciding when to feed them 24.266 -to Linus. In addition, there are several well known branches that 24.267 -people use for different purposes. For example, a few people maintain 24.268 -``stable'' repositories of older versions of the kernel, to which they 24.269 -apply critical fixes as needed. Some maintainers publish multiple 24.270 -trees: one for experimental changes; one for changes that they are 24.271 -about to feed upstream; and so on. Others just publish a single 24.272 -tree. 24.273 - 24.274 -This model has two notable features. 
The first is that it's ``pull 24.275 -only''. You have to ask, convince, or beg another developer to take a 24.276 -change from you, because there are almost no trees to which more than 24.277 -one person can push, and there's no way to push changes into a tree 24.278 -that someone else controls. 24.279 - 24.280 -The second is that it's based on reputation and acclaim. If you're an 24.281 -unknown, Linus will probably ignore changes from you without even 24.282 -responding. But a subsystem maintainer will probably review them, and 24.283 -will likely take them if they pass their criteria for suitability. 24.284 -The more ``good'' changes you contribute to a maintainer, the more 24.285 -likely they are to trust your judgment and accept your changes. If 24.286 -you're well-known and maintain a long-lived branch for something Linus 24.287 -hasn't yet accepted, people with similar interests may pull your 24.288 -changes regularly to keep up with your work. 24.289 - 24.290 -Reputation and acclaim don't necessarily cross subsystem or ``people'' 24.291 -boundaries. If you're a respected but specialised storage hacker, and 24.292 -you try to fix a networking bug, that change will receive a level of 24.293 -scrutiny from a network maintainer comparable to a change from a 24.294 -complete stranger. 24.295 - 24.296 -To people who come from more orderly project backgrounds, the 24.297 -comparatively chaotic Linux kernel development process often seems 24.298 -completely insane. It's subject to the whims of individuals; people 24.299 -make sweeping changes whenever they deem it appropriate; and the pace 24.300 -of development is astounding. And yet Linux is a highly successful, 24.301 -well-regarded piece of software. 
24.302 - 24.303 -\subsection{Pull-only versus shared-push collaboration} 24.304 - 24.305 -A perpetual source of heat in the open source community is whether a 24.306 -development model in which people only ever pull changes from others 24.307 -is ``better than'' one in which multiple people can push changes to a 24.308 -shared repository. 24.309 - 24.310 -Typically, the backers of the shared-push model use tools that 24.311 -actively enforce this approach. If you're using a centralised 24.312 -revision control tool such as Subversion, there's no way to make a 24.313 -choice over which model you'll use: the tool gives you shared-push, 24.314 -and if you want to do anything else, you'll have to roll your own 24.315 -approach on top (such as applying a patch by hand). 24.316 - 24.317 -A good distributed revision control tool, such as Mercurial, will 24.318 -support both models. You and your collaborators can then structure 24.319 -how you work together based on your own needs and preferences, not on 24.320 -what contortions your tools force you into. 24.321 - 24.322 -\subsection{Where collaboration meets branch management} 24.323 - 24.324 -Once you and your team set up some shared repositories and start 24.325 -propagating changes back and forth between local and shared repos, you 24.326 -begin to face a related, but slightly different challenge: that of 24.327 -managing the multiple directions in which your team may be moving at 24.328 -once. Even though this subject is intimately related to how your team 24.329 -collaborates, it's dense enough to merit treatment of its own, in 24.330 -chapter~\ref{chap:branch}. 24.331 - 24.332 -\section{The technical side of sharing} 24.333 - 24.334 -The remainder of this chapter is devoted to the question of serving 24.335 -data to your collaborators. 
24.336 - 24.337 -\section{Informal sharing with \hgcmd{serve}} 24.338 -\label{sec:collab:serve} 24.339 - 24.340 -Mercurial's \hgcmd{serve} command is wonderfully suited to small, 24.341 -tight-knit, and fast-paced group environments. It also provides a 24.342 -great way to get a feel for using Mercurial commands over a network. 24.343 - 24.344 -Run \hgcmd{serve} inside a repository, and in under a second it will 24.345 -bring up a specialised HTTP server; this will accept connections from 24.346 -any client, and serve up data for that repository until you terminate 24.347 -it. Anyone who knows the URL of the server you just started, and can 24.348 -talk to your computer over the network, can then use a web browser or 24.349 -Mercurial to read data from that repository. A URL for a 24.350 -\hgcmd{serve} instance running on a laptop is likely to look something 24.351 -like \Verb|http://my-laptop.local:8000/|. 24.352 - 24.353 -The \hgcmd{serve} command is \emph{not} a general-purpose web server. 24.354 -It can do only two things: 24.355 -\begin{itemize} 24.356 -\item Allow people to browse the history of the repository it's 24.357 - serving, from their normal web browsers. 24.358 -\item Speak Mercurial's wire protocol, so that people can 24.359 - \hgcmd{clone} or \hgcmd{pull} changes from that repository. 24.360 -\end{itemize} 24.361 -In particular, \hgcmd{serve} won't allow remote users to \emph{modify} 24.362 -your repository. It's intended for read-only use. 24.363 - 24.364 -If you're getting started with Mercurial, there's nothing to prevent 24.365 -you from using \hgcmd{serve} to serve up a repository on your own 24.366 -computer, then use commands like \hgcmd{clone}, \hgcmd{incoming}, and 24.367 -so on to talk to that server as if the repository was hosted remotely. 24.368 -This can help you to quickly get acquainted with using commands on 24.369 -network-hosted repositories. 
24.370 - 24.371 -\subsection{A few things to keep in mind} 24.372 - 24.373 -Because it provides unauthenticated read access to all clients, you 24.374 -should only use \hgcmd{serve} in an environment where you either don't 24.375 -care, or have complete control over, who can access your network and 24.376 -pull data from your repository. 24.377 - 24.378 -The \hgcmd{serve} command knows nothing about any firewall software 24.379 -you might have installed on your system or network. It cannot detect 24.380 -or control your firewall software. If other people are unable to talk 24.381 -to a running \hgcmd{serve} instance, the second thing you should do 24.382 -(\emph{after} you make sure that they're using the correct URL) is 24.383 -check your firewall configuration. 24.384 - 24.385 -By default, \hgcmd{serve} listens for incoming connections on 24.386 -port~8000. If another process is already listening on the port you 24.387 -want to use, you can specify a different port to listen on using the 24.388 -\hgopt{serve}{-p} option. 24.389 - 24.390 -Normally, when \hgcmd{serve} starts, it prints no output, which can be 24.391 -a bit unnerving. If you'd like to confirm that it is indeed running 24.392 -correctly, and find out what URL you should send to your 24.393 -collaborators, start it with the \hggopt{-v} option. 24.394 - 24.395 -\section{Using the Secure Shell (ssh) protocol} 24.396 -\label{sec:collab:ssh} 24.397 - 24.398 -You can pull and push changes securely over a network connection using 24.399 -the Secure Shell (\texttt{ssh}) protocol. To use this successfully, 24.400 -you may have to do a little bit of configuration on the client or 24.401 -server sides. 24.402 - 24.403 -If you're not familiar with ssh, it's a network protocol that lets you 24.404 -securely communicate with another computer. To use it with Mercurial, 24.405 -you'll be setting up one or more user accounts on a server so that 24.406 -remote users can log in and execute commands. 
24.407 - 24.408 -(If you \emph{are} familiar with ssh, you'll probably find some of the 24.409 -material that follows to be elementary in nature.) 24.410 - 24.411 -\subsection{How to read and write ssh URLs} 24.412 - 24.413 -An ssh URL tends to look like this: 24.414 -\begin{codesample2} 24.415 - ssh://bos@hg.serpentine.com:22/hg/hgbook 24.416 -\end{codesample2} 24.417 -\begin{enumerate} 24.418 -\item The ``\texttt{ssh://}'' part tells Mercurial to use the ssh 24.419 - protocol. 24.420 -\item The ``\texttt{bos@}'' component indicates what username to log 24.421 - into the server as. You can leave this out if the remote username 24.422 - is the same as your local username. 24.423 -\item The ``\texttt{hg.serpentine.com}'' gives the hostname of the 24.424 - server to log into. 24.425 -\item The ``:22'' identifies the port number to connect to the server 24.426 - on. The default port is~22, so you only need to specify this part 24.427 - if you're \emph{not} using port~22. 24.428 -\item The remainder of the URL is the local path to the repository on 24.429 - the server. 24.430 -\end{enumerate} 24.431 - 24.432 -There's plenty of scope for confusion with the path component of ssh 24.433 -URLs, as there is no standard way for tools to interpret it. Some 24.434 -programs behave differently than others when dealing with these paths. 24.435 -This isn't an ideal situation, but it's unlikely to change. Please 24.436 -read the following paragraphs carefully. 24.437 - 24.438 -Mercurial treats the path to a repository on the server as relative to 24.439 -the remote user's home directory. For example, if user \texttt{foo} 24.440 -on the server has a home directory of \dirname{/home/foo}, then an ssh 24.441 -URL that contains a path component of \dirname{bar} 24.442 -\emph{really} refers to the directory \dirname{/home/foo/bar}. 
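The components of an ssh URL, and the home-relative reading of its path, can be sketched in a few lines of Python. This is only an illustration: the function name \texttt{describe\_ssh\_url} and the \texttt{remote\_home} parameter are hypothetical, and a real client never needs to know the remote home directory, since the server resolves the path itself.

```python
import posixpath
from urllib.parse import urlsplit


def describe_ssh_url(url, remote_home):
    """Split an ssh:// URL into user, host, port and resolved path.

    remote_home stands in for the remote user's home directory; it is
    an assumption made purely so the example can show the final path.
    """
    parts = urlsplit(url)
    if parts.scheme != "ssh":
        raise ValueError("not an ssh URL: %r" % url)
    # Drop the single slash that separates the host from the path; what
    # remains is read relative to the remote user's home directory.
    relative = parts.path[1:]
    return {
        "user": parts.username,    # None when the URL has no "user@" part
        "host": parts.hostname,
        "port": parts.port or 22,  # port 22 is the default
        "path": posixpath.join(remote_home, relative),
    }


print(describe_ssh_url("ssh://bos@hg.serpentine.com:22/hg/hgbook", "/home/bos"))
print(describe_ssh_url("ssh://server/bar", "/home/foo"))
```

The second call mirrors the example in the text: a path component of \texttt{bar} for a user whose home directory is \texttt{/home/foo} resolves to \texttt{/home/foo/bar}.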
24.443 - 24.444 -If you want to specify a path relative to another user's home 24.445 -directory, you can use a path that starts with a tilde character 24.446 -followed by the user's name (let's call them \texttt{otheruser}), like 24.447 -this. 24.448 -\begin{codesample2} 24.449 - ssh://server/~otheruser/hg/repo 24.450 -\end{codesample2} 24.451 - 24.452 -And if you really want to specify an \emph{absolute} path on the 24.453 -server, begin the path component with two slashes, as in this example. 24.454 -\begin{codesample2} 24.455 - ssh://server//absolute/path 24.456 -\end{codesample2} 24.457 - 24.458 -\subsection{Finding an ssh client for your system} 24.459 - 24.460 -Almost every Unix-like system comes with OpenSSH preinstalled. If 24.461 -you're using such a system, run \Verb|which ssh| to find out if 24.462 -the \command{ssh} command is installed (it's usually in 24.463 -\dirname{/usr/bin}). In the unlikely event that it isn't present, 24.464 -take a look at your system documentation to figure out how to install 24.465 -it. 24.466 - 24.467 -On Windows, you'll first need to download a suitable ssh 24.468 -client. There are two alternatives. 24.469 -\begin{itemize} 24.470 -\item Simon Tatham's excellent PuTTY package~\cite{web:putty} provides 24.471 - a complete suite of ssh client commands. 24.472 -\item If you have a high tolerance for pain, you can use the Cygwin 24.473 - port of OpenSSH. 24.474 -\end{itemize} 24.475 -In either case, you'll need to edit your \hgini\ file to tell 24.476 -Mercurial where to find the actual client command. For example, if 24.477 -you're using PuTTY, you'll need to use the \command{plink} command as 24.478 -a command-line ssh client. 
24.479 -\begin{codesample2} 24.480 - [ui] 24.481 - ssh = C:/path/to/plink.exe -ssh -i "C:/path/to/my/private/key" 24.482 -\end{codesample2} 24.483 - 24.484 -\begin{note} 24.485 - The path to \command{plink} shouldn't contain any whitespace 24.486 - characters, or Mercurial may not be able to run it correctly (so 24.487 - putting it in \dirname{C:\\Program Files} is probably not a good 24.488 - idea). 24.489 -\end{note} 24.490 - 24.491 -\subsection{Generating a key pair} 24.492 - 24.493 -To avoid the need to repetitively type a password every time you need 24.494 -to use your ssh client, I recommend generating a key pair. On a 24.495 -Unix-like system, the \command{ssh-keygen} command will do the trick. 24.496 -On Windows, if you're using PuTTY, the \command{puttygen} command is 24.497 -what you'll need. 24.498 - 24.499 -When you generate a key pair, it's usually \emph{highly} advisable to 24.500 -protect it with a passphrase. (The only time that you might not want 24.501 -to do this is when you're using the ssh protocol for automated tasks 24.502 -on a secure network.) 24.503 - 24.504 -Simply generating a key pair isn't enough, however. You'll need to 24.505 -add the public key to the set of authorised keys for whatever user 24.506 -you're logging in remotely as. For servers using OpenSSH (the vast 24.507 -majority), this will mean adding the public key to a list in a file 24.508 -called \sfilename{authorized\_keys} in their \sdirname{.ssh} 24.509 -directory. 24.510 - 24.511 -On a Unix-like system, your public key will have a \filename{.pub} 24.512 -extension. If you're using \command{puttygen} on Windows, you can 24.513 -save the public key to a file of your choosing, or paste it from the 24.514 -window it's displayed in straight into the 24.515 -\sfilename{authorized\_keys} file. 
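On a Unix-like client talking to an OpenSSH server, the steps above might look roughly like the following sketch. The names \texttt{myuser} and \texttt{myserver} are placeholders, and the key file name \sfilename{id\_rsa.pub} assumes that you accepted the default location offered by \command{ssh-keygen}; adjust all three to match your own setup.
\begin{codesample2}
  ssh-keygen -t rsa
  cat ~/.ssh/id_rsa.pub | ssh myuser@myserver 'mkdir -p .ssh && cat >> .ssh/authorized_keys'
\end{codesample2}
The first command prompts for a passphrase and writes the key pair. The second appends the public half to the remote user's \sfilename{authorized\_keys} file, creating the \sdirname{.ssh} directory first if it does not yet exist.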
24.516 - 24.517 -\subsection{Using an authentication agent} 24.518 - 24.519 -An authentication agent is a daemon that stores passphrases in memory 24.520 -(so it will forget passphrases if you log out and log back in again). 24.521 -An ssh client will notice if it's running, and query it for a 24.522 -passphrase. If there's no authentication agent running, or the agent 24.523 -doesn't store the necessary passphrase, you'll have to type your 24.524 -passphrase every time Mercurial tries to communicate with a server on 24.525 -your behalf (e.g.~whenever you pull or push changes). 24.526 - 24.527 -The downside of storing passphrases in an agent is that it's possible 24.528 -for a well-prepared attacker to recover the plain text of your 24.529 -passphrases, in some cases even if your system has been power-cycled. 24.530 -You should make your own judgment as to whether this is an acceptable 24.531 -risk. It certainly saves a lot of repeated typing. 24.532 - 24.533 -On Unix-like systems, the agent is called \command{ssh-agent}, and 24.534 -it's often run automatically for you when you log in. You'll need to 24.535 -use the \command{ssh-add} command to add passphrases to the agent's 24.536 -store. On Windows, if you're using PuTTY, the \command{pageant} 24.537 -command acts as the agent. It adds an icon to your system tray that 24.538 -will let you manage stored passphrases. 24.539 - 24.540 -\subsection{Configuring the server side properly} 24.541 - 24.542 -Because ssh can be fiddly to set up if you're new to it, there's a 24.543 -variety of things that can go wrong. Add Mercurial on top, and 24.544 -there's plenty more scope for head-scratching. Most of these 24.545 -potential problems occur on the server side, not the client side. The 24.546 -good news is that once you've gotten a configuration working, it will 24.547 -usually continue to work indefinitely. 
24.548 - 24.549 -Before you try using Mercurial to talk to an ssh server, it's best to 24.550 -make sure that you can use the normal \command{ssh} or \command{putty} 24.551 -command to talk to the server first. If you run into problems with 24.552 -using these commands directly, Mercurial surely won't work. Worse, it 24.553 -will obscure the underlying problem. Any time you want to debug 24.554 -ssh-related Mercurial problems, you should drop back to making sure 24.555 -that plain ssh client commands work first, \emph{before} you worry 24.556 -about whether there's a problem with Mercurial. 24.557 - 24.558 -The first thing to be sure of on the server side is that you can 24.559 -actually log in from another machine at all. If you can't use 24.560 -\command{ssh} or \command{putty} to log in, the error message you get 24.561 -may give you a few hints as to what's wrong. The most common problems 24.562 -are as follows. 24.563 -\begin{itemize} 24.564 -\item If you get a ``connection refused'' error, either there isn't an 24.565 - SSH daemon running on the server at all, or it's inaccessible due to 24.566 - firewall configuration. 24.567 -\item If you get a ``no route to host'' error, you either have an 24.568 - incorrect address for the server or a seriously locked down firewall 24.569 - that won't admit its existence at all. 24.570 -\item If you get a ``permission denied'' error, you may have mistyped 24.571 - the username on the server, or you could have mistyped your key's 24.572 - passphrase or the remote user's password. 24.573 -\end{itemize} 24.574 -In summary, if you're having trouble talking to the server's ssh 24.575 -daemon, first make sure that one is running at all. On many systems 24.576 -it will be installed, but disabled, by default. Once you're done with 24.577 -this step, you should then check that the server's firewall is 24.578 -configured to allow incoming connections on the port the ssh daemon is 24.579 -listening on (usually~22). 
Don't worry about more exotic 24.580 -possibilities for misconfiguration until you've checked these two 24.581 -first. 24.582 - 24.583 -If you're using an authentication agent on the client side to store 24.584 -passphrases for your keys, you ought to be able to log into the server 24.585 -without being prompted for a passphrase or a password. If you're 24.586 -prompted for a passphrase, there are a few possible culprits. 24.587 -\begin{itemize} 24.588 -\item You might have forgotten to use \command{ssh-add} or 24.589 - \command{pageant} to store the passphrase. 24.590 -\item You might have stored the passphrase for the wrong key. 24.591 -\end{itemize} 24.592 -If you're being prompted for the remote user's password, there are 24.593 -another few possible problems to check. 24.594 -\begin{itemize} 24.595 -\item Either the user's home directory or their \sdirname{.ssh} 24.596 - directory might have excessively liberal permissions. As a result, 24.597 - the ssh daemon will not trust or read their 24.598 - \sfilename{authorized\_keys} file. For example, a group-writable 24.599 - home or \sdirname{.ssh} directory will often cause this symptom. 24.600 -\item The user's \sfilename{authorized\_keys} file may have a problem. 24.601 - If anyone other than the user owns or can write to that file, the 24.602 - ssh daemon will not trust or read it. 24.603 -\end{itemize} 24.604 - 24.605 -In the ideal world, you should be able to run the following command 24.606 -successfully, and it should print exactly one line of output, the 24.607 -current date and time. 24.608 -\begin{codesample2} 24.609 - ssh myserver date 24.610 -\end{codesample2} 24.611 - 24.612 -If, on your server, you have login scripts that print banners or other 24.613 -junk even when running non-interactive commands like this, you should 24.614 -fix them before you continue, so that they only print output if 24.615 -they're run interactively. 
Otherwise these banners will at least 24.616 -clutter up Mercurial's output. Worse, they could potentially cause 24.617 -problems with running Mercurial commands remotely. Mercurial 24.618 -tries to detect and ignore banners in non-interactive \command{ssh} 24.619 -sessions, but it is not foolproof. (If you're editing your login 24.620 -scripts on your server, the usual way to see if a login script is 24.621 -running in an interactive shell is to check the return code from the 24.622 -command \Verb|tty -s|.) 24.623 - 24.624 -Once you've verified that plain old ssh is working with your server, 24.625 -the next step is to ensure that Mercurial runs on the server. The 24.626 -following command should run successfully: 24.627 -\begin{codesample2} 24.628 - ssh myserver hg version 24.629 -\end{codesample2} 24.630 -If you see an error message instead of normal \hgcmd{version} output, 24.631 -this is usually because you haven't installed Mercurial to 24.632 -\dirname{/usr/bin}. Don't worry if this is the case; you don't need 24.633 -to do that. But you should check for a few possible problems. 24.634 -\begin{itemize} 24.635 -\item Is Mercurial really installed on the server at all? I know this 24.636 - sounds trivial, but it's worth checking! 24.637 -\item Maybe your shell's search path (usually set via the \envar{PATH} 24.638 - environment variable) is simply misconfigured. 24.639 -\item Perhaps your \envar{PATH} environment variable is only being set 24.640 - to point to the location of the \command{hg} executable if the login 24.641 - session is interactive. This can happen if you're setting the path 24.642 - in the wrong shell login script. See your shell's documentation for 24.643 - details. 24.644 -\item The \envar{PYTHONPATH} environment variable may need to contain 24.645 - the path to the Mercurial Python modules. It might not be set at 24.646 - all; it could be incorrect; or it may be set only if the login is 24.647 - interactive. 
24.648 -\end{itemize} 24.649 - 24.650 -If you can run \hgcmd{version} over an ssh connection, well done! 24.651 -You've got the server and client sorted out. You should now be able 24.652 -to use Mercurial to access repositories hosted by that username on 24.653 -that server. If you run into problems with Mercurial and ssh at this 24.654 -point, try using the \hggopt{--debug} option to get a clearer picture 24.655 -of what's going on. 24.656 - 24.657 -\subsection{Using compression with ssh} 24.658 - 24.659 -Mercurial does not compress data when it uses the ssh protocol, 24.660 -because the ssh protocol can transparently compress data. However, 24.661 -the default behaviour of ssh clients is \emph{not} to request 24.662 -compression. 24.663 - 24.664 -Over any network other than a fast LAN (even a wireless network), 24.665 -using compression is likely to significantly speed up Mercurial's 24.666 -network operations. For example, over a WAN, someone measured 24.667 -compression as reducing the amount of time required to clone a 24.668 -particularly large repository from~51 minutes to~17 minutes. 24.669 - 24.670 -Both \command{ssh} and \command{plink} accept a \cmdopt{ssh}{-C} 24.671 -option which turns on compression. You can easily edit your \hgrc\ to 24.672 -enable compression for all of Mercurial's uses of the ssh protocol. 24.673 -\begin{codesample2} 24.674 - [ui] 24.675 - ssh = ssh -C 24.676 -\end{codesample2} 24.677 - 24.678 -If you use \command{ssh}, you can configure it to always use 24.679 -compression when talking to your server. To do this, edit your 24.680 -\sfilename{.ssh/config} file (which may not yet exist), as follows. 24.681 -\begin{codesample2} 24.682 - Host hg 24.683 - Compression yes 24.684 - HostName hg.example.com 24.685 -\end{codesample2} 24.686 -This defines an alias, \texttt{hg}. 
When you use it on the 24.687 -\command{ssh} command line or in a Mercurial \texttt{ssh}-protocol 24.688 -URL, it will cause \command{ssh} to connect to \texttt{hg.example.com} 24.689 -and use compression. This gives you both a shorter name to type and 24.690 -compression, each of which is a good thing in its own right. 24.691 - 24.692 -\section{Serving over HTTP using CGI} 24.693 -\label{sec:collab:cgi} 24.694 - 24.695 -Depending on how ambitious you are, configuring Mercurial's CGI 24.696 -interface can take anything from a few moments to several hours. 24.697 - 24.698 -We'll begin with the simplest of examples, and work our way towards a 24.699 -more complex configuration. Even for the most basic case, you're 24.700 -almost certainly going to need to read and modify your web server's 24.701 -configuration. 24.702 - 24.703 -\begin{note} 24.704 - Configuring a web server is a complex, fiddly, and highly 24.705 - system-dependent activity. I can't possibly give you instructions 24.706 - that will cover anything like all of the cases you will encounter. 24.707 - Please use your discretion and judgment in following the sections 24.708 - below. Be prepared to make plenty of mistakes, and to spend a lot 24.709 - of time reading your server's error logs. 24.710 -\end{note} 24.711 - 24.712 -\subsection{Web server configuration checklist} 24.713 - 24.714 -Before you continue, do take a few moments to check a few aspects of 24.715 -your system's setup. 24.716 - 24.717 -\begin{enumerate} 24.718 -\item Do you have a web server installed at all? Mac OS X ships with 24.719 - Apache, but many other systems may not have a web server installed. 24.720 -\item If you have a web server installed, is it actually running? On 24.721 - most systems, even if one is present, it will be disabled by 24.722 - default. 24.723 -\item Is your server configured to allow you to run CGI programs in 24.724 - the directory where you plan to do so? 
Most servers default to 24.725 - explicitly disabling the ability to run CGI programs. 24.726 -\end{enumerate} 24.727 - 24.728 -If you don't have a web server installed, and don't have substantial 24.729 -experience configuring Apache, you should consider using the 24.730 -\texttt{lighttpd} web server instead of Apache. Apache has a 24.731 -well-deserved reputation for baroque and confusing configuration. 24.732 -While \texttt{lighttpd} is less capable in some ways than Apache, most 24.733 -of these capabilities are not relevant to serving Mercurial 24.734 -repositories. And \texttt{lighttpd} is undeniably \emph{much} easier 24.735 -to get started with than Apache. 24.736 - 24.737 -\subsection{Basic CGI configuration} 24.738 - 24.739 -On Unix-like systems, it's common for users to have a subdirectory 24.740 -named something like \dirname{public\_html} in their home directory, 24.741 -from which they can serve up web pages. A file named \filename{foo} 24.742 -in this directory will be accessible at a URL of the form 24.743 -\texttt{http://www.example.com/\~{}username/foo}. 24.744 - 24.745 -To get started, find the \sfilename{hgweb.cgi} script that should be 24.746 -present in your Mercurial installation. If you can't quickly find a 24.747 -local copy on your system, simply download one from the master 24.748 -Mercurial repository at 24.749 -\url{http://www.selenic.com/repo/hg/raw-file/tip/hgweb.cgi}. 24.750 - 24.751 -You'll need to copy this script into your \dirname{public\_html} 24.752 -directory, and ensure that it's executable. 24.753 -\begin{codesample2} 24.754 - cp .../hgweb.cgi ~/public_html 24.755 - chmod 755 ~/public_html/hgweb.cgi 24.756 -\end{codesample2} 24.757 -The \texttt{755} argument to \command{chmod} is a little more general 24.758 -than just making the script executable: it ensures that the script is 24.759 -executable by anyone, and that ``group'' and ``other'' write 24.760 -permissions are \emph{not} set. 
If you were to leave those write 24.761 -permissions enabled, Apache's \texttt{suexec} subsystem would likely 24.762 -refuse to execute the script. In fact, \texttt{suexec} also insists 24.763 -that the \emph{directory} in which the script resides must not be 24.764 -writable by others. 24.765 -\begin{codesample2} 24.766 - chmod 755 ~/public_html 24.767 -\end{codesample2} 24.768 - 24.769 -\subsubsection{What could \emph{possibly} go wrong?} 24.770 -\label{sec:collab:wtf} 24.771 - 24.772 -Once you've copied the CGI script into place, go into a web browser, 24.773 -and try to open the URL \url{http://myhostname/~myuser/hgweb.cgi}, 24.774 -\emph{but} brace yourself for instant failure. There's a high 24.775 -probability that trying to visit this URL will fail, and there are 24.776 -many possible reasons for this. In fact, you're likely to stumble 24.777 -over almost every one of the possible errors below, so please read 24.778 -carefully. The following are all of the problems I ran into on a 24.779 -system running Fedora~7, with a fresh installation of Apache, and a 24.780 -user account that I created specially to perform this exercise. 24.781 - 24.782 -Your web server may have per-user directories disabled. If you're 24.783 -using Apache, search your config file for a \texttt{UserDir} 24.784 -directive. If there's none present, per-user directories will be 24.785 -disabled. If one exists, but its value is \texttt{disabled}, then 24.786 -per-user directories will be disabled. Otherwise, the string after 24.787 -\texttt{UserDir} gives the name of the subdirectory that Apache will 24.788 -look in under your home directory, for example \dirname{public\_html}. 24.789 - 24.790 -Your file access permissions may be too restrictive. The web server 24.791 -must be able to traverse your home directory and directories under 24.792 -your \dirname{public\_html} directory, and read files under the latter 24.793 -too. 
Here's a quick recipe to help you to make your permissions more 24.794 -appropriate. 24.795 -\begin{codesample2} 24.796 - chmod 755 ~ 24.797 - find ~/public_html -type d -print0 | xargs -0r chmod 755 24.798 - find ~/public_html -type f -print0 | xargs -0r chmod 644 24.799 -\end{codesample2} 24.800 - 24.801 -The other possibility with permissions is that you might get a 24.802 -completely empty window when you try to load the script. In this 24.803 -case, it's likely that your access permissions are \emph{too 24.804 - permissive}. Apache's \texttt{suexec} subsystem won't execute a 24.805 -script that's group-~or world-writable, for example. 24.806 - 24.807 -Your web server may be configured to disallow execution of CGI 24.808 -programs in your per-user web directory. Here's Apache's 24.809 -default per-user configuration from my Fedora system. 24.810 -\begin{codesample2} 24.811 - <Directory /home/*/public_html> 24.812 - AllowOverride FileInfo AuthConfig Limit 24.813 - Options MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec 24.814 - <Limit GET POST OPTIONS> 24.815 - Order allow,deny 24.816 - Allow from all 24.817 - </Limit> 24.818 - <LimitExcept GET POST OPTIONS> 24.819 - Order deny,allow 24.820 - Deny from all 24.821 - </LimitExcept> 24.822 - </Directory> 24.823 -\end{codesample2} 24.824 -If you find a similar-looking \texttt{Directory} group in your Apache 24.825 -configuration, the directive to look at inside it is \texttt{Options}. 24.826 -Add \texttt{ExecCGI} to the end of this list if it's missing, and 24.827 -restart the web server. 24.828 - 24.829 -If you find that Apache serves you the text of the CGI script instead 24.830 -of executing it, you may need to either uncomment (if already present) 24.831 -or add a directive like this. 
24.832 -\begin{codesample2} 24.833 - AddHandler cgi-script .cgi 24.834 -\end{codesample2} 24.835 - 24.836 -The next possibility is that you might be served with a colourful 24.837 -Python backtrace claiming that it can't import a 24.838 -\texttt{mercurial}-related module. This is actually progress! The 24.839 -server is now capable of executing your CGI script. This error is 24.840 -only likely to occur if you're running a private installation of 24.841 -Mercurial, instead of a system-wide version. Remember that the web 24.842 -server runs the CGI program without any of the environment variables 24.843 -that you take for granted in an interactive session. If this error 24.844 -happens to you, edit your copy of \sfilename{hgweb.cgi} and follow the 24.845 -directions inside it to correctly set your \envar{PYTHONPATH} 24.846 -environment variable. 24.847 - 24.848 -Finally, you are \emph{certain} to be served with another colourful 24.849 -Python backtrace: this one will complain that it can't find 24.850 -\dirname{/path/to/repository}. Edit your \sfilename{hgweb.cgi} script 24.851 -and replace the \dirname{/path/to/repository} string with the complete 24.852 -path to the repository you want to serve up. 24.853 - 24.854 -At this point, when you try to reload the page, you should be 24.855 -presented with a nice HTML view of your repository's history. Whew! 24.856 - 24.857 -\subsubsection{Configuring lighttpd} 24.858 - 24.859 -To be exhaustive in my experiments, I tried configuring the 24.860 -increasingly popular \texttt{lighttpd} web server to serve the same 24.861 -repository as I described with Apache above. I had already overcome 24.862 -all of the problems I outlined with Apache, many of which are not 24.863 -server-specific. As a result, I was fairly sure that my file and 24.864 -directory permissions were good, and that my \sfilename{hgweb.cgi} 24.865 -script was properly edited. 
24.866 - 24.867 -Once I had Apache running, getting \texttt{lighttpd} to serve the 24.868 -repository was a snap (in other words, even if you're trying to use 24.869 -\texttt{lighttpd}, you should read the Apache section). I first had 24.870 -to edit the \texttt{mod\_access} section of its config file to enable 24.871 -\texttt{mod\_cgi} and \texttt{mod\_userdir}, both of which were 24.872 -disabled by default on my system. I then added a few lines to the end 24.873 -of the config file, to configure these modules. 24.874 -\begin{codesample2} 24.875 - userdir.path = "public_html" 24.876 - cgi.assign = ( ".cgi" => "" ) 24.877 -\end{codesample2} 24.878 -With this done, \texttt{lighttpd} ran immediately for me. If I had 24.879 -configured \texttt{lighttpd} before Apache, I'd almost certainly have 24.880 -run into many of the same system-level configuration problems as I did 24.881 -with Apache. However, I found \texttt{lighttpd} to be noticeably 24.882 -easier to configure than Apache, even though I've used Apache for over 24.883 -a decade, and this was my first exposure to \texttt{lighttpd}. 24.884 - 24.885 -\subsection{Sharing multiple repositories with one CGI script} 24.886 - 24.887 -The \sfilename{hgweb.cgi} script only lets you publish a single 24.888 -repository, which is an annoying restriction. If you want to publish 24.889 -more than one without wracking yourself with multiple copies of the 24.890 -same script, each with different names, a better choice is to use the 24.891 -\sfilename{hgwebdir.cgi} script. 24.892 - 24.893 -The procedure to configure \sfilename{hgwebdir.cgi} is only a little 24.894 -more involved than for \sfilename{hgweb.cgi}. First, you must obtain 24.895 -a copy of the script. If you don't have one handy, you can download a 24.896 -copy from the master Mercurial repository at 24.897 -\url{http://www.selenic.com/repo/hg/raw-file/tip/hgwebdir.cgi}. 
24.898 - 24.899 -You'll need to copy this script into your \dirname{public\_html} 24.900 -directory, and ensure that it's executable. 24.901 -\begin{codesample2} 24.902 - cp .../hgwebdir.cgi ~/public_html 24.903 - chmod 755 ~/public_html ~/public_html/hgwebdir.cgi 24.904 -\end{codesample2} 24.905 -With basic configuration out of the way, try to visit 24.906 -\url{http://myhostname/~myuser/hgwebdir.cgi} in your browser. It 24.907 -should display an empty list of repositories. If you get a blank 24.908 -window or error message, try walking through the list of potential 24.909 -problems in section~\ref{sec:collab:wtf}. 24.910 - 24.911 -The \sfilename{hgwebdir.cgi} script relies on an external 24.912 -configuration file. By default, it searches for a file named 24.913 -\sfilename{hgweb.config} in the same directory as itself. You'll need 24.914 -to create this file, and make it world-readable. The format of the 24.915 -file is similar to a Windows ``ini'' file, as understood by Python's 24.916 -\texttt{ConfigParser}~\cite{web:configparser} module. 24.917 - 24.918 -The easiest way to configure \sfilename{hgwebdir.cgi} is with a 24.919 -section named \texttt{collections}. This will automatically publish 24.920 -\emph{every} repository under the directories you name. The section 24.921 -should look like this: 24.922 -\begin{codesample2} 24.923 - [collections] 24.924 - /my/root = /my/root 24.925 -\end{codesample2} 24.926 -Mercurial interprets this by looking at the directory name on the 24.927 -\emph{right} hand side of the ``\texttt{=}'' sign; finding 24.928 -repositories in that directory hierarchy; and using the text on the 24.929 -\emph{left} to strip off matching text from the names it will actually 24.930 -list in the web interface. The remaining component of a path after 24.931 -this stripping has occurred is called a ``virtual path''. 
24.932 - 24.933 -Given the example above, if we have a repository whose local path is 24.934 -\dirname{/my/root/this/repo}, the CGI script will strip the leading 24.935 -\dirname{/my/root} from the name, and publish the repository with a 24.936 -virtual path of \dirname{this/repo}. If the base URL for our CGI 24.937 -script is \url{http://myhostname/~myuser/hgwebdir.cgi}, the complete 24.938 -URL for that repository will be 24.939 -\url{http://myhostname/~myuser/hgwebdir.cgi/this/repo}. 24.940 - 24.941 -If we replace \dirname{/my/root} on the left hand side of this example 24.942 -with \dirname{/my}, then \sfilename{hgwebdir.cgi} will only strip off 24.943 -\dirname{/my} from the repository name, and will give us a virtual 24.944 -path of \dirname{root/this/repo} instead of \dirname{this/repo}. 24.945 - 24.946 -The \sfilename{hgwebdir.cgi} script will recursively search each 24.947 -directory listed in the \texttt{collections} section of its 24.948 -configuration file, but it will \emph{not} recurse into the 24.949 -repositories it finds. 24.950 - 24.951 -The \texttt{collections} mechanism makes it easy to publish many 24.952 -repositories in a ``fire and forget'' manner. You only need to set up 24.953 -the CGI script and configuration file one time. Afterwards, you can 24.954 -publish or unpublish a repository at any time by simply moving it 24.955 -into, or out of, the directory hierarchy in which you've configured 24.956 -\sfilename{hgwebdir.cgi} to look. 24.957 - 24.958 -\subsubsection{Explicitly specifying which repositories to publish} 24.959 - 24.960 -In addition to the \texttt{collections} mechanism, the 24.961 -\sfilename{hgwebdir.cgi} script allows you to publish a specific list 24.962 -of repositories. To do so, create a \texttt{paths} section, with 24.963 -contents of the following form. 
24.964 -\begin{codesample2} 24.965 - [paths] 24.966 - repo1 = /my/path/to/some/repo 24.967 - repo2 = /some/path/to/another 24.968 -\end{codesample2} 24.969 -In this case, the virtual path (the component that will appear in a 24.970 -URL) is on the left hand side of each definition, while the path to 24.971 -the repository is on the right. Notice that there does not need to be 24.972 -any relationship between the virtual path you choose and the location 24.973 -of a repository in your filesystem. 24.974 - 24.975 -If you wish, you can use both the \texttt{collections} and 24.976 -\texttt{paths} mechanisms simultaneously in a single configuration 24.977 -file. 24.978 - 24.979 -\begin{note} 24.980 - If multiple repositories have the same virtual path, 24.981 - \sfilename{hgwebdir.cgi} will not report an error. Instead, it will 24.982 - behave unpredictably. 24.983 -\end{note} 24.984 - 24.985 -\subsection{Downloading source archives} 24.986 - 24.987 -Mercurial's web interface lets users download an archive of any 24.988 -revision. This archive will contain a snapshot of the working 24.989 -directory as of that revision, but it will not contain a copy of the 24.990 -repository data. 24.991 - 24.992 -By default, this feature is not enabled. To enable it, you'll need to 24.993 -add an \rcitem{web}{allow\_archive} item to the \rcsection{web} 24.994 -section of your \hgrc. 24.995 - 24.996 -\subsection{Web configuration options} 24.997 - 24.998 -Mercurial's web interfaces (the \hgcmd{serve} command, and the 24.999 -\sfilename{hgweb.cgi} and \sfilename{hgwebdir.cgi} scripts) have a 24.1000 -number of configuration options that you can set. These belong in a 24.1001 -section named \rcsection{web}. 24.1002 -\begin{itemize} 24.1003 -\item[\rcitem{web}{allow\_archive}] Determines which (if any) archive 24.1004 - download mechanisms Mercurial supports. 
If you enable this 24.1005 - feature, users of the web interface will be able to download an 24.1006 - archive of whatever revision of a repository they are viewing. 24.1007 - To enable the archive feature, this item must take the form of a 24.1008 - sequence of words drawn from the list below. 24.1009 - \begin{itemize} 24.1010 - \item[\texttt{bz2}] A \command{tar} archive, compressed using 24.1011 - \texttt{bzip2} compression. This has the best compression ratio, 24.1012 - but uses the most CPU time on the server. 24.1013 - \item[\texttt{gz}] A \command{tar} archive, compressed using 24.1014 - \texttt{gzip} compression. 24.1015 - \item[\texttt{zip}] A \command{zip} archive, compressed using DEFLATE 24.1016 - compression. This format has the worst compression ratio, but is 24.1017 - widely used in the Windows world. 24.1018 - \end{itemize} 24.1019 - If you provide an empty list, or don't have an 24.1020 - \rcitem{web}{allow\_archive} entry at all, this feature will be 24.1021 - disabled. Here is an example of how to enable all three supported 24.1022 - formats. 24.1023 - \begin{codesample4} 24.1024 - [web] 24.1025 - allow_archive = bz2 gz zip 24.1026 - \end{codesample4} 24.1027 -\item[\rcitem{web}{allowpull}] Boolean. Determines whether the web 24.1028 - interface allows remote users to \hgcmd{pull} and \hgcmd{clone} this 24.1029 - repository over~HTTP. If set to \texttt{no} or \texttt{false}, only 24.1030 - the ``human-oriented'' portion of the web interface is available. 24.1031 -\item[\rcitem{web}{contact}] String. A free-form (but preferably 24.1032 - brief) string identifying the person or group in charge of the 24.1033 - repository. This often contains the name and email address of a 24.1034 - person or mailing list. It often makes sense to place this entry in 24.1035 - a repository's own \sfilename{.hg/hgrc} file, but it can make sense 24.1036 - to use in a global \hgrc\ if every repository has a single 24.1037 - maintainer. 
24.1038 -\item[\rcitem{web}{maxchanges}] Integer. The default maximum number 24.1039 - of changesets to display in a single page of output. 24.1040 -\item[\rcitem{web}{maxfiles}] Integer. The default maximum number 24.1041 - of modified files to display in a single page of output. 24.1042 -\item[\rcitem{web}{stripes}] Integer. If the web interface displays 24.1043 - alternating ``stripes'' to make it easier to visually align rows 24.1044 - when you are looking at a table, this number controls the number of 24.1045 - rows in each stripe. 24.1046 -\item[\rcitem{web}{style}] Controls the template Mercurial uses to 24.1047 - display the web interface. Mercurial ships with two web templates, 24.1048 - named \texttt{default} and \texttt{gitweb} (the latter is much more 24.1049 - visually attractive). You can also specify a custom template of 24.1050 - your own; see chapter~\ref{chap:template} for details. Here, you 24.1051 - can see how to enable the \texttt{gitweb} style. 24.1052 - \begin{codesample4} 24.1053 - [web] 24.1054 - style = gitweb 24.1055 - \end{codesample4} 24.1056 -\item[\rcitem{web}{templates}] Path. The directory in which to search 24.1057 - for template files. By default, Mercurial searches in the directory 24.1058 - in which it was installed. 24.1059 -\end{itemize} 24.1060 -If you are using \sfilename{hgwebdir.cgi}, you can place a few 24.1061 -configuration items in a \rcsection{web} section of the 24.1062 -\sfilename{hgweb.config} file instead of a \hgrc\ file, for 24.1063 -convenience. These items are \rcitem{web}{motd} and 24.1064 -\rcitem{web}{style}. 24.1065 - 24.1066 -\subsubsection{Options specific to an individual repository} 24.1067 - 24.1068 -A few \rcsection{web} configuration items ought to be placed in a 24.1069 -repository's local \sfilename{.hg/hgrc}, rather than a user's or 24.1070 -global \hgrc. 24.1071 -\begin{itemize} 24.1072 -\item[\rcitem{web}{description}] String. 
A free-form (but preferably 24.1073 - brief) string that describes the contents or purpose of the 24.1074 - repository. 24.1075 -\item[\rcitem{web}{name}] String. The name to use for the repository 24.1076 - in the web interface. This overrides the default name, which is the 24.1077 - last component of the repository's path. 24.1078 -\end{itemize} 24.1079 - 24.1080 -\subsubsection{Options specific to the \hgcmd{serve} command} 24.1081 - 24.1082 -Some of the items in the \rcsection{web} section of a \hgrc\ file are 24.1083 -only for use with the \hgcmd{serve} command. 24.1084 -\begin{itemize} 24.1085 -\item[\rcitem{web}{accesslog}] Path. The name of a file into which to 24.1086 - write an access log. By default, the \hgcmd{serve} command writes 24.1087 - this information to standard output, not to a file. Log entries are 24.1088 - written in the standard ``combined'' file format used by almost all 24.1089 - web servers. 24.1090 -\item[\rcitem{web}{address}] String. The local address on which the 24.1091 - server should listen for incoming connections. By default, the 24.1092 - server listens on all addresses. 24.1093 -\item[\rcitem{web}{errorlog}] Path. The name of a file into which to 24.1094 - write an error log. By default, the \hgcmd{serve} command writes this 24.1095 - information to standard error, not to a file. 24.1096 -\item[\rcitem{web}{ipv6}] Boolean. Whether to use the IPv6 protocol. 24.1097 - By default, IPv6 is not used. 24.1098 -\item[\rcitem{web}{port}] Integer. The TCP~port number on which the 24.1099 - server should listen. The default port number used is~8000. 24.1100 -\end{itemize} 24.1101 - 24.1102 -\subsubsection{Choosing the right \hgrc\ file to add \rcsection{web} 24.1103 - items to} 24.1104 - 24.1105 -It is important to remember that a web server like Apache or 24.1106 -\texttt{lighttpd} will run under a user~ID that is different to yours. 
24.1107 -CGI scripts run by your server, such as \sfilename{hgweb.cgi}, will 24.1108 -usually also run under that user~ID. 24.1109 - 24.1110 -If you add \rcsection{web} items to your own personal \hgrc\ file, CGI 24.1111 -scripts won't read that \hgrc\ file. Those settings will thus only 24.1112 -affect the behaviour of the \hgcmd{serve} command when you run it. To 24.1113 -cause CGI scripts to see your settings, either create a \hgrc\ file in 24.1114 -the home directory of the user ID that runs your web server, or add 24.1115 -those settings to a system-wide \hgrc\ file. 24.1116 - 24.1117 - 24.1118 -%%% Local Variables: 24.1119 -%%% mode: latex 24.1120 -%%% TeX-master: "00book" 24.1121 -%%% End:
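The point made at the end of the removed chapter text above, that a CGI script only sees the hgrc files readable by the web server's user, can be illustrated with a rough Python sketch. This is not how Mercurial itself loads configuration; it only assumes that hgrc files use an INI-like syntax that Python's `configparser` can read and that later files override earlier ones. The two file contents and the `web_settings` helper are hypothetical.

```python
import configparser


def web_settings(hgrc_texts):
    """Merge the [web] sections of several hgrc contents.

    Later entries override earlier ones. This is a simplified model,
    not Mercurial's own configuration loader.
    """
    parser = configparser.ConfigParser(interpolation=None)
    for text in hgrc_texts:
        parser.read_string(text)
    return dict(parser["web"]) if parser.has_section("web") else {}


# Hypothetical file contents, for illustration only.
SYSTEM_HGRC = "[web]\nstyle = gitweb\n"
YOUR_HGRC = "[web]\nallow_archive = bz2 gz zip\n"

# `hg serve`, run by you, sees the system-wide file and your personal one.
as_you = web_settings([SYSTEM_HGRC, YOUR_HGRC])

# A CGI script runs as the web server's user, so your personal file is
# never read; only the system-wide settings apply.
as_cgi = web_settings([SYSTEM_HGRC])
```

In this model `as_cgi` lacks the `allow_archive` entry, which is why such settings need to go into a system-wide hgrc or one owned by the web server's user.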
25.1 --- a/en/concepts.tex Thu Jan 29 22:47:34 2009 -0800 25.2 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 25.3 @@ -1,577 +0,0 @@ 25.4 -\chapter{Behind the scenes} 25.5 -\label{chap:concepts} 25.6 - 25.7 -Unlike many revision control systems, the concepts upon which 25.8 -Mercurial is built are simple enough that it's easy to understand how 25.9 -the software really works. Knowing this certainly isn't necessary, 25.10 -but I find it useful to have a ``mental model'' of what's going on. 25.11 - 25.12 -This understanding gives me confidence that Mercurial has been 25.13 -carefully designed to be both \emph{safe} and \emph{efficient}. And 25.14 -just as importantly, if it's easy for me to retain a good idea of what 25.15 -the software is doing when I perform a revision control task, I'm less 25.16 -likely to be surprised by its behaviour. 25.17 - 25.18 -In this chapter, we'll initially cover the core concepts behind 25.19 -Mercurial's design, then continue to discuss some of the interesting 25.20 -details of its implementation. 25.21 - 25.22 -\section{Mercurial's historical record} 25.23 - 25.24 -\subsection{Tracking the history of a single file} 25.25 - 25.26 -When Mercurial tracks modifications to a file, it stores the history 25.27 -of that file in a metadata object called a \emph{filelog}. Each entry 25.28 -in the filelog contains enough information to reconstruct one revision 25.29 -of the file that is being tracked. Filelogs are stored as files in 25.30 -the \sdirname{.hg/store/data} directory. A filelog contains two kinds 25.31 -of information: revision data, and an index to help Mercurial to find 25.32 -a revision efficiently. 25.33 - 25.34 -A file that is large, or has a lot of history, has its filelog stored 25.35 -in separate data (``\texttt{.d}'' suffix) and index (``\texttt{.i}'' 25.36 -suffix) files. For small files without much history, the revision 25.37 -data and index are combined in a single ``\texttt{.i}'' file. 
The 25.38 -correspondence between a file in the working directory and the filelog 25.39 -that tracks its history in the repository is illustrated in 25.40 -figure~\ref{fig:concepts:filelog}. 25.41 - 25.42 -\begin{figure}[ht] 25.43 - \centering 25.44 - \grafix{filelog} 25.45 - \caption{Relationships between files in working directory and 25.46 - filelogs in repository} 25.47 - \label{fig:concepts:filelog} 25.48 -\end{figure} 25.49 - 25.50 -\subsection{Managing tracked files} 25.51 - 25.52 -Mercurial uses a structure called a \emph{manifest} to collect 25.53 -together information about the files that it tracks. Each entry in 25.54 -the manifest contains information about the files present in a single 25.55 -changeset. An entry records which files are present in the changeset, 25.56 -the revision of each file, and a few other pieces of file metadata. 25.57 - 25.58 -\subsection{Recording changeset information} 25.59 - 25.60 -The \emph{changelog} contains information about each changeset. Each 25.61 -revision records who committed a change, the changeset comment, other 25.62 -pieces of changeset-related information, and the revision of the 25.63 -manifest to use. 25.64 - 25.65 -\subsection{Relationships between revisions} 25.66 - 25.67 -Within a changelog, a manifest, or a filelog, each revision stores a 25.68 -pointer to its immediate parent (or to its two parents, if it's a 25.69 -merge revision). As I mentioned above, there are also relationships 25.70 -between revisions \emph{across} these structures, and they are 25.71 -hierarchical in nature. 25.72 - 25.73 -For every changeset in a repository, there is exactly one revision 25.74 -stored in the changelog. Each revision of the changelog contains a 25.75 -pointer to a single revision of the manifest. A revision of the 25.76 -manifest stores a pointer to a single revision of each filelog tracked 25.77 -when that changeset was created. These relationships are illustrated 25.78 -in figure~\ref{fig:concepts:metadata}. 
25.79 - 25.80 -\begin{figure}[ht] 25.81 - \centering 25.82 - \grafix{metadata} 25.83 - \caption{Metadata relationships} 25.84 - \label{fig:concepts:metadata} 25.85 -\end{figure} 25.86 - 25.87 -As the illustration shows, there is \emph{not} a ``one to one'' 25.88 -relationship between revisions in the changelog, manifest, or filelog. 25.89 -If the manifest hasn't changed between two changesets, the changelog 25.90 -entries for those changesets will point to the same revision of the 25.91 -manifest. If a file that Mercurial tracks hasn't changed between two 25.92 -changesets, the entry for that file in the two revisions of the 25.93 -manifest will point to the same revision of its filelog. 25.94 - 25.95 -\section{Safe, efficient storage} 25.96 - 25.97 -The underpinnings of changelogs, manifests, and filelogs are provided 25.98 -by a single structure called the \emph{revlog}. 25.99 - 25.100 -\subsection{Efficient storage} 25.101 - 25.102 -The revlog provides efficient storage of revisions using a 25.103 -\emph{delta} mechanism. Instead of storing a complete copy of a file 25.104 -for each revision, it stores the changes needed to transform an older 25.105 -revision into the new revision. For many kinds of file data, these 25.106 -deltas are typically a fraction of a percent of the size of a full 25.107 -copy of a file. 25.108 - 25.109 -Some obsolete revision control systems can only work with deltas of 25.110 -text files. They must either store binary files as complete snapshots 25.111 -or encoded into a text representation, both of which are wasteful 25.112 -approaches. Mercurial can efficiently handle deltas of files with 25.113 -arbitrary binary contents; it doesn't need to treat text as special. 25.114 - 25.115 -\subsection{Safe operation} 25.116 -\label{sec:concepts:txn} 25.117 - 25.118 -Mercurial only ever \emph{appends} data to the end of a revlog file. 25.119 -It never modifies a section of a file after it has written it. 
This 25.120 -is both more robust and efficient than schemes that need to modify or 25.121 -rewrite data. 25.122 - 25.123 -In addition, Mercurial treats every write as part of a 25.124 -\emph{transaction} that can span a number of files. A transaction is 25.125 -\emph{atomic}: either the entire transaction succeeds and its effects 25.126 -are all visible to readers in one go, or the whole thing is undone. 25.127 -This guarantee of atomicity means that if you're running two copies of 25.128 -Mercurial, where one is reading data and one is writing it, the reader 25.129 -will never see a partially written result that might confuse it. 25.130 - 25.131 -The fact that Mercurial only appends to files makes it easier to 25.132 -provide this transactional guarantee. The easier it is to do stuff 25.133 -like this, the more confident you should be that it's done correctly. 25.134 - 25.135 -\subsection{Fast retrieval} 25.136 - 25.137 -Mercurial cleverly avoids a pitfall common to all earlier 25.138 -revision control systems: the problem of \emph{inefficient retrieval}. 25.139 -Most revision control systems store the contents of a revision as an 25.140 -incremental series of modifications against a ``snapshot''. To 25.141 -reconstruct a specific revision, you must first read the snapshot, and 25.142 -then every one of the revisions between the snapshot and your target 25.143 -revision. The more history that a file accumulates, the more 25.144 -revisions you must read, hence the longer it takes to reconstruct a 25.145 -particular revision. 25.146 - 25.147 -\begin{figure}[ht] 25.148 - \centering 25.149 - \grafix{snapshot} 25.150 - \caption{Snapshot of a revlog, with incremental deltas} 25.151 - \label{fig:concepts:snapshot} 25.152 -\end{figure} 25.153 - 25.154 -The innovation that Mercurial applies to this problem is simple but 25.155 -effective. 
Once the cumulative amount of delta information stored 25.156 -since the last snapshot exceeds a fixed threshold, it stores a new 25.157 -snapshot (compressed, of course), instead of another delta. This 25.158 -makes it possible to reconstruct \emph{any} revision of a file 25.159 -quickly. This approach works so well that it has since been copied by 25.160 -several other revision control systems. 25.161 - 25.162 -Figure~\ref{fig:concepts:snapshot} illustrates the idea. In an entry 25.163 -in a revlog's index file, Mercurial stores the range of entries from 25.164 -the data file that it must read to reconstruct a particular revision. 25.165 - 25.166 -\subsubsection{Aside: the influence of video compression} 25.167 - 25.168 -If you're familiar with video compression or have ever watched a TV 25.169 -feed through a digital cable or satellite service, you may know that 25.170 -most video compression schemes store each frame of video as a delta 25.171 -against its predecessor frame. In addition, these schemes use 25.172 -``lossy'' compression techniques to increase the compression ratio, so 25.173 -visual errors accumulate over the course of a number of inter-frame 25.174 -deltas. 25.175 - 25.176 -Because it's possible for a video stream to ``drop out'' occasionally 25.177 -due to signal glitches, and to limit the accumulation of artefacts 25.178 -introduced by the lossy compression process, video encoders 25.179 -periodically insert a complete frame (called a ``key frame'') into the 25.180 -video stream; the next delta is generated against that frame. This 25.181 -means that if the video signal gets interrupted, it will resume once 25.182 -the next key frame is received. Also, the accumulation of encoding 25.183 -errors restarts anew with each key frame. 25.184 - 25.185 -\subsection{Identification and strong integrity} 25.186 - 25.187 -Along with delta or snapshot information, a revlog entry contains a 25.188 -cryptographic hash of the data that it represents. 
This makes it 25.189 -difficult to forge the contents of a revision, and easy to detect 25.190 -accidental corruption. 25.191 - 25.192 -Hashes provide more than a mere check against corruption; they are 25.193 -used as the identifiers for revisions. The changeset identification 25.194 -hashes that you see as an end user are from revisions of the 25.195 -changelog. Although filelogs and the manifest also use hashes, 25.196 -Mercurial only uses these behind the scenes. 25.197 - 25.198 -Mercurial verifies that hashes are correct when it retrieves file 25.199 -revisions and when it pulls changes from another repository. If it 25.200 -encounters an integrity problem, it will complain and stop whatever 25.201 -it's doing. 25.202 - 25.203 -In addition to the effect it has on retrieval efficiency, Mercurial's 25.204 -use of periodic snapshots makes it more robust against partial data 25.205 -corruption. If a revlog becomes partly corrupted due to a hardware 25.206 -error or system bug, it's often possible to reconstruct some or most 25.207 -revisions from the uncorrupted sections of the revlog, both before and 25.208 -after the corrupted section. This would not be possible with a 25.209 -delta-only storage model. 25.210 - 25.211 -\section{Revision history, branching, 25.212 - and merging} 25.213 - 25.214 -Every entry in a Mercurial revlog knows the identity of its immediate 25.215 -ancestor revision, usually referred to as its \emph{parent}. In fact, 25.216 -a revision contains room for not one parent, but two. Mercurial uses 25.217 -a special hash, called the ``null ID'', to represent the idea ``there 25.218 -is no parent here''. This hash is simply a string of zeroes. 25.219 - 25.220 -In figure~\ref{fig:concepts:revlog}, you can see an example of the 25.221 -conceptual structure of a revlog. Filelogs, manifests, and changelogs 25.222 -all have this same structure; they differ only in the kind of data 25.223 -stored in each delta or snapshot. 
25.224 - 25.225 -The first revision in a revlog (at the bottom of the image) has the 25.226 -null ID in both of its parent slots. For a ``normal'' revision, its 25.227 -first parent slot contains the ID of its parent revision, and its 25.228 -second contains the null ID, indicating that the revision has only one 25.229 -real parent. Any two revisions that have the same parent ID are 25.230 -branches. A revision that represents a merge between branches has two 25.231 -normal revision IDs in its parent slots. 25.232 - 25.233 -\begin{figure}[ht] 25.234 - \centering 25.235 - \grafix{revlog} 25.236 - \caption{} 25.237 - \label{fig:concepts:revlog} 25.238 -\end{figure} 25.239 - 25.240 -\section{The working directory} 25.241 - 25.242 -In the working directory, Mercurial stores a snapshot of the files 25.243 -from the repository as of a particular changeset. 25.244 - 25.245 -The working directory ``knows'' which changeset it contains. When you 25.246 -update the working directory to contain a particular changeset, 25.247 -Mercurial looks up the appropriate revision of the manifest to find 25.248 -out which files it was tracking at the time that changeset was 25.249 -committed, and which revision of each file was then current. It then 25.250 -recreates a copy of each of those files, with the same contents it had 25.251 -when the changeset was committed. 25.252 - 25.253 -The \emph{dirstate} contains Mercurial's knowledge of the working 25.254 -directory. This details which changeset the working directory is 25.255 -updated to, and all of the files that Mercurial is tracking in the 25.256 -working directory. 25.257 - 25.258 -Just as a revision of a revlog has room for two parents, so that it 25.259 -can represent either a normal revision (with one parent) or a merge of 25.260 -two earlier revisions, the dirstate has slots for two parents. 
When 25.261 -you use the \hgcmd{update} command, the changeset that you update to 25.262 -is stored in the ``first parent'' slot, and the null ID in the second. 25.263 -When you \hgcmd{merge} with another changeset, the first parent 25.264 -remains unchanged, and the second parent is filled in with the 25.265 -changeset you're merging with. The \hgcmd{parents} command tells you 25.266 -what the parents of the dirstate are. 25.267 - 25.268 -\subsection{What happens when you commit} 25.269 - 25.270 -The dirstate stores parent information for more than just book-keeping 25.271 -purposes. Mercurial uses the parents of the dirstate as \emph{the 25.272 - parents of a new changeset} when you perform a commit. 25.273 - 25.274 -\begin{figure}[ht] 25.275 - \centering 25.276 - \grafix{wdir} 25.277 - \caption{The working directory can have two parents} 25.278 - \label{fig:concepts:wdir} 25.279 -\end{figure} 25.280 - 25.281 -Figure~\ref{fig:concepts:wdir} shows the normal state of the working 25.282 -directory, where it has a single changeset as parent. That changeset 25.283 -is the \emph{tip}, the newest changeset in the repository that has no 25.284 -children. 25.285 - 25.286 -\begin{figure}[ht] 25.287 - \centering 25.288 - \grafix{wdir-after-commit} 25.289 - \caption{The working directory gains new parents after a commit} 25.290 - \label{fig:concepts:wdir-after-commit} 25.291 -\end{figure} 25.292 - 25.293 -It's useful to think of the working directory as ``the changeset I'm 25.294 -about to commit''. Any files that you tell Mercurial that you've 25.295 -added, removed, renamed, or copied will be reflected in that 25.296 -changeset, as will modifications to any files that Mercurial is 25.297 -already tracking; the new changeset will have the parents of the 25.298 -working directory as its parents. 
25.299 - 25.300 -After a commit, Mercurial will update the parents of the working 25.301 -directory, so that the first parent is the ID of the new changeset, 25.302 -and the second is the null ID. This is shown in 25.303 -figure~\ref{fig:concepts:wdir-after-commit}. Mercurial doesn't touch 25.304 -any of the files in the working directory when you commit; it just 25.305 -modifies the dirstate to note its new parents. 25.306 - 25.307 -\subsection{Creating a new head} 25.308 - 25.309 -It's perfectly normal to update the working directory to a changeset 25.310 -other than the current tip. For example, you might want to know what 25.311 -your project looked like last Tuesday, or you could be looking through 25.312 -changesets to see which one introduced a bug. In cases like this, the 25.313 -natural thing to do is update the working directory to the changeset 25.314 -you're interested in, and then examine the files in the working 25.315 -directory directly to see their contents as they were when you 25.316 -committed that changeset. The effect of this is shown in 25.317 -figure~\ref{fig:concepts:wdir-pre-branch}. 25.318 - 25.319 -\begin{figure}[ht] 25.320 - \centering 25.321 - \grafix{wdir-pre-branch} 25.322 - \caption{The working directory, updated to an older changeset} 25.323 - \label{fig:concepts:wdir-pre-branch} 25.324 -\end{figure} 25.325 - 25.326 -Having updated the working directory to an older changeset, what 25.327 -happens if you make some changes, and then commit? Mercurial behaves 25.328 -in the same way as I outlined above. The parents of the working 25.329 -directory become the parents of the new changeset. This new changeset 25.330 -has no children, so it becomes the new tip. And the repository now 25.331 -contains two changesets that have no children; we call these 25.332 -\emph{heads}. You can see the structure that this creates in 25.333 -figure~\ref{fig:concepts:wdir-branch}. 
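The parent bookkeeping described in the text above can be sketched as a toy model in Python. This is an illustration under simplifying assumptions, not Mercurial's implementation: changeset IDs are short caller-supplied labels rather than hashes, and the `Repo` class with its method names is hypothetical.

```python
NULL_ID = "0" * 40  # the "null ID": a string of zeroes meaning "no parent here"


class Repo:
    """Toy model of changeset parents and the working directory's parents."""

    def __init__(self):
        self.parents = {}                    # changeset ID -> (p1, p2)
        self.dirstate = (NULL_ID, NULL_ID)   # parents of the working directory

    def commit(self, node):
        # The new changeset takes the working directory's parents as its own,
        # then becomes the sole parent of the working directory.
        self.parents[node] = self.dirstate
        self.dirstate = (node, NULL_ID)

    def update(self, node):
        # Updating sets the first parent and clears the second.
        self.dirstate = (node, NULL_ID)

    def merge(self, other):
        # The first parent is unchanged; the second is the changeset merged with.
        self.dirstate = (self.dirstate[0], other)

    def heads(self):
        # A head is a changeset that no other changeset names as a parent.
        have_children = {p for ps in self.parents.values() for p in ps}
        return sorted(n for n in self.parents if n not in have_children)


repo = Repo()
for node in ("a", "b", "c"):
    repo.commit(node)        # linear history: a, then b, then c
repo.update("b")             # go back to an older changeset
repo.commit("d")             # committing here creates a second head
two_heads = repo.heads()
repo.merge("c")
repo.commit("e")             # the merge changeset has parents (d, c)
```

After the commit of `d`, the model has two childless changesets, `c` and `d`; merging and committing `e` leaves a single head again.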
25.334 - 25.335 -\begin{figure}[ht] 25.336 - \centering 25.337 - \grafix{wdir-branch} 25.338 - \caption{After a commit made while synced to an older changeset} 25.339 - \label{fig:concepts:wdir-branch} 25.340 -\end{figure} 25.341 - 25.342 -\begin{note} 25.343 - If you're new to Mercurial, you should keep in mind a common 25.344 - ``error'', which is to use the \hgcmd{pull} command without any 25.345 - options. By default, the \hgcmd{pull} command \emph{does not} 25.346 - update the working directory, so you'll bring new changesets into 25.347 - your repository, but the working directory will stay synced at the 25.348 - same changeset as before the pull. If you make some changes and 25.349 - commit afterwards, you'll thus create a new head, because your 25.350 - working directory isn't synced to whatever the current tip is. 25.351 - 25.352 - I put the word ``error'' in quotes because all that you need to do 25.353 - to rectify this situation is \hgcmd{merge}, then \hgcmd{commit}. In 25.354 - other words, this almost never has negative consequences; it just 25.355 - surprises people. I'll discuss other ways to avoid this behaviour, 25.356 - and why Mercurial behaves in this initially surprising way, later 25.357 - on. 25.358 -\end{note} 25.359 - 25.360 -\subsection{Merging heads} 25.361 - 25.362 -When you run the \hgcmd{merge} command, Mercurial leaves the first 25.363 -parent of the working directory unchanged, and sets the second parent 25.364 -to the changeset you're merging with, as shown in 25.365 -figure~\ref{fig:concepts:wdir-merge}. 25.366 - 25.367 -\begin{figure}[ht] 25.368 - \centering 25.369 - \grafix{wdir-merge} 25.370 - \caption{Merging two heads} 25.371 - \label{fig:concepts:wdir-merge} 25.372 -\end{figure} 25.373 - 25.374 -Mercurial also has to modify the working directory, to merge the files 25.375 -managed in the two changesets. 
Simplified a little, the merging 25.376 -process goes like this, for every file in the manifests of both 25.377 -changesets. 25.378 -\begin{itemize} 25.379 -\item If neither changeset has modified a file, do nothing with that 25.380 - file. 25.381 -\item If one changeset has modified a file, and the other hasn't, 25.382 - create the modified copy of the file in the working directory. 25.383 -\item If one changeset has removed a file, and the other hasn't (or 25.384 - has also deleted it), delete the file from the working directory. 25.385 -\item If one changeset has removed a file, but the other has modified 25.386 - the file, ask the user what to do: keep the modified file, or remove 25.387 - it? 25.388 -\item If both changesets have modified a file, invoke an external 25.389 - merge program to choose the new contents for the merged file. This 25.390 - may require input from the user. 25.391 -\item If one changeset has modified a file, and the other has renamed 25.392 - or copied the file, make sure that the changes follow the new name 25.393 - of the file. 25.394 -\end{itemize} 25.395 -There are more details---merging has plenty of corner cases---but 25.396 -these are the most common choices that are involved in a merge. As 25.397 -you can see, most cases are completely automatic, and indeed most 25.398 -merges finish automatically, without requiring your input to resolve 25.399 -any conflicts. 25.400 - 25.401 -When you're thinking about what happens when you commit after a merge, 25.402 -once again the working directory is ``the changeset I'm about to 25.403 -commit''. After the \hgcmd{merge} command completes, the working 25.404 -directory has two parents; these will become the parents of the new 25.405 -changeset. 25.406 - 25.407 -Mercurial lets you perform multiple merges, but you must commit the 25.408 -results of each individual merge as you go. 
This is necessary because 25.409 -Mercurial only tracks two parents for both revisions and the working 25.410 -directory. While it would be technically possible to merge multiple 25.411 -changesets at once, the prospect of user confusion and making a 25.412 -terrible mess of a merge immediately becomes overwhelming. 25.413 - 25.414 -\section{Other interesting design features} 25.415 - 25.416 -In the sections above, I've tried to highlight some of the most 25.417 -important aspects of Mercurial's design, to illustrate that it pays 25.418 -careful attention to reliability and performance. However, the 25.419 -attention to detail doesn't stop there. There are a number of other 25.420 -aspects of Mercurial's construction that I personally find 25.421 -interesting. I'll detail a few of them here, separate from the ``big 25.422 -ticket'' items above, so that if you're interested, you can gain a 25.423 -better idea of the amount of thinking that goes into a well-designed 25.424 -system. 25.425 - 25.426 -\subsection{Clever compression} 25.427 - 25.428 -When appropriate, Mercurial will store both snapshots and deltas in 25.429 -compressed form. It does this by always \emph{trying to} compress a 25.430 -snapshot or delta, but only storing the compressed version if it's 25.431 -smaller than the uncompressed version. 25.432 - 25.433 -This means that Mercurial does ``the right thing'' when storing a file 25.434 -whose native form is compressed, such as a \texttt{zip} archive or a 25.435 -JPEG image. When these types of files are compressed a second time, 25.436 -the resulting file is usually bigger than the once-compressed form, 25.437 -and so Mercurial will store the plain \texttt{zip} or JPEG. 25.438 - 25.439 -Deltas between revisions of a compressed file are usually larger than 25.440 -snapshots of the file, and Mercurial again does ``the right thing'' in 25.441 -these cases. 
It finds that such a delta exceeds the threshold at 25.442 -which it should store a complete snapshot of the file, so it stores 25.443 -the snapshot, again saving space compared to a naive delta-only 25.444 -approach. 25.445 - 25.446 -\subsubsection{Network recompression} 25.447 - 25.448 -When storing revisions on disk, Mercurial uses the ``deflate'' 25.449 -compression algorithm (the same one used by the popular \texttt{zip} 25.450 -archive format), which balances good speed with a respectable 25.451 -compression ratio. However, when transmitting revision data over a 25.452 -network connection, Mercurial uncompresses the compressed revision 25.453 -data. 25.454 - 25.455 -If the connection is over HTTP, Mercurial recompresses the entire 25.456 -stream of data using a compression algorithm that gives a better 25.457 -compression ratio (the Burrows-Wheeler algorithm from the widely used 25.458 -\texttt{bzip2} compression package). This combination of algorithm 25.459 -and compression of the entire stream (instead of a revision at a time) 25.460 -substantially reduces the number of bytes to be transferred, yielding 25.461 -better network performance over almost all kinds of network. 25.462 - 25.463 -(If the connection is over \command{ssh}, Mercurial \emph{doesn't} 25.464 -recompress the stream, because \command{ssh} can already do this 25.465 -itself.) 25.466 - 25.467 -\subsection{Read/write ordering and atomicity} 25.468 - 25.469 -Appending to files isn't the whole story when it comes to guaranteeing 25.470 -that a reader won't see a partial write. If you recall 25.471 -figure~\ref{fig:concepts:metadata}, revisions in the changelog point to 25.472 -revisions in the manifest, and revisions in the manifest point to 25.473 -revisions in filelogs. This hierarchy is deliberate. 25.474 - 25.475 -A writer starts a transaction by writing filelog and manifest data, 25.476 -and doesn't write any changelog data until those are finished. 
A 25.477 -reader starts by reading changelog data, then manifest data, followed 25.478 -by filelog data. 25.479 - 25.480 -Since the writer has always finished writing filelog and manifest data 25.481 -before it writes to the changelog, a reader will never read a pointer 25.482 -to a partially written manifest revision from the changelog, and it will 25.483 -never read a pointer to a partially written filelog revision from the 25.484 -manifest. 25.485 - 25.486 -\subsection{Concurrent access} 25.487 - 25.488 -The read/write ordering and atomicity guarantees mean that Mercurial 25.489 -never needs to \emph{lock} a repository when it's reading data, even 25.490 -if the repository is being written to while the read is occurring. 25.491 -This has a big effect on scalability; you can have an arbitrary number 25.492 -of Mercurial processes safely reading data from a repository 25.493 -all at once, no matter whether it's being written to or not. 25.494 - 25.495 -The lockless nature of reading means that if you're sharing a 25.496 -repository on a multi-user system, you don't need to grant other local 25.497 -users permission to \emph{write} to your repository in order for them 25.498 -to be able to clone it or pull changes from it; they only need 25.499 -\emph{read} permission. (This is \emph{not} a common feature among 25.500 -revision control systems, so don't take it for granted! Most require 25.501 -readers to be able to lock a repository to access it safely, and this 25.502 -requires write permission on at least one directory, which of course 25.503 -makes for all kinds of nasty and annoying security and administrative 25.504 -problems.) 25.505 - 25.506 -Mercurial uses locks to ensure that only one process can write to a 25.507 -repository at a time (the locking mechanism is safe even over 25.508 -filesystems that are notoriously hostile to locking, such as NFS).
If 25.509 -a repository is locked, a writer will wait for a while to retry if the 25.510 -repository becomes unlocked, but if the repository remains locked for 25.511 -too long, the process attempting to write will time out after a while. 25.512 -This means that your daily automated scripts won't get stuck forever 25.513 -and pile up if a system crashes unnoticed, for example. (Yes, the 25.514 -timeout is configurable, from zero to infinity.) 25.515 - 25.516 -\subsubsection{Safe dirstate access} 25.517 - 25.518 -As with revision data, Mercurial doesn't take a lock to read the 25.519 -dirstate file; it does acquire a lock to write it. To avoid the 25.520 -possibility of reading a partially written copy of the dirstate file, 25.521 -Mercurial writes to a file with a unique name in the same directory as 25.522 -the dirstate file, then renames the temporary file atomically to 25.523 -\filename{dirstate}. The file named \filename{dirstate} is thus 25.524 -guaranteed to be complete, not partially written. 25.525 - 25.526 -\subsection{Avoiding seeks} 25.527 - 25.528 -Critical to Mercurial's performance is the avoidance of seeks of the 25.529 -disk head, since any seek is far more expensive than even a 25.530 -comparatively large read operation. 25.531 - 25.532 -This is why, for example, the dirstate is stored in a single file. If 25.533 -there were a dirstate file per directory that Mercurial tracked, the 25.534 -disk would seek once per directory. Instead, Mercurial reads the 25.535 -entire single dirstate file in one step. 25.536 - 25.537 -Mercurial also uses a ``copy on write'' scheme when cloning a 25.538 -repository on local storage. Instead of copying every revlog file 25.539 -from the old repository into the new repository, it makes a ``hard 25.540 -link'', which is a shorthand way to say ``these two names point to the 25.541 -same file''. 
When Mercurial is about to write to one of a revlog's 25.542 -files, it checks to see if the number of names pointing at the file is 25.543 -greater than one. If it is, more than one repository is using the 25.544 -file, so Mercurial makes a new copy of the file that is private to 25.545 -this repository. 25.546 - 25.547 -A few revision control developers have pointed out that this idea of 25.548 -making a complete private copy of a file is not very efficient in its 25.549 -use of storage. While this is true, storage is cheap, and this method 25.550 -gives the highest performance while deferring most book-keeping to the 25.551 -operating system. An alternative scheme would most likely reduce 25.552 -performance and increase the complexity of the software, each of which 25.553 -is much more important to the ``feel'' of day-to-day use. 25.554 - 25.555 -\subsection{Other contents of the dirstate} 25.556 - 25.557 -Because Mercurial doesn't force you to tell it when you're modifying a 25.558 -file, it uses the dirstate to store some extra information so it can 25.559 -determine efficiently whether you have modified a file. For each file 25.560 -in the working directory, it stores the time that it last modified the 25.561 -file itself, and the size of the file at that time. 25.562 - 25.563 -When you explicitly \hgcmd{add}, \hgcmd{remove}, \hgcmd{rename} or 25.564 -\hgcmd{copy} files, Mercurial updates the dirstate so that it knows 25.565 -what to do with those files when you commit. 25.566 - 25.567 -When Mercurial is checking the states of files in the working 25.568 -directory, it first checks a file's modification time. If that has 25.569 -not changed, the file must not have been modified. If the file's size 25.570 -has changed, the file must have been modified. If the modification 25.571 -time has changed, but the size has not, only then does Mercurial need 25.572 -to read the actual contents of the file to see if they've changed. 
25.573 -Storing these few extra pieces of information dramatically reduces the 25.574 -amount of data that Mercurial needs to read, which yields large 25.575 -performance improvements compared to other revision control systems. 25.576 - 25.577 -%%% Local Variables: 25.578 -%%% mode: latex 25.579 -%%% TeX-master: "00book" 25.580 -%%% End:
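The modification check described at the end of the concepts chapter above can be sketched in a few lines of Python. This is only an illustration of the decision order given in the text (modification time first, then size, then contents), not Mercurial's actual implementation. The names `DirstateEntry`, `file_digest` and `is_modified` are hypothetical, and the stored SHA-1 digest is an assumed stand-in for comparing the file against its committed contents.

```python
import hashlib
import os
from dataclasses import dataclass


@dataclass
class DirstateEntry:
    """Hypothetical record kept for one tracked file (not Mercurial's format)."""
    mtime: float  # modification time recorded when the file was last written
    size: int     # file size recorded at that time
    digest: str   # assumed stand-in for the committed contents


def file_digest(path):
    """Hash the whole file; this is the expensive step the check tries to avoid."""
    with open(path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()


def is_modified(path, entry):
    """Follow the order given in the text: mtime, then size, then contents."""
    st = os.stat(path)
    if st.st_mtime == entry.mtime:
        # Modification time unchanged: treat the file as unmodified.
        return False
    if st.st_size != entry.size:
        # Size differs: the file must have been modified.
        return True
    # Time changed but size did not: only now read the contents.
    return file_digest(path) != entry.digest
```

In this sketch a file that has merely been touched costs one `stat` call and one read, while an untouched file costs a single `stat` call and no read at all.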
26.1 --- a/en/daily.tex Thu Jan 29 22:47:34 2009 -0800 26.2 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 26.3 @@ -1,381 +0,0 @@ 26.4 -\chapter{Mercurial in daily use} 26.5 -\label{chap:daily} 26.6 - 26.7 -\section{Telling Mercurial which files to track} 26.8 - 26.9 -Mercurial does not work with files in your repository unless you tell 26.10 -it to manage them. The \hgcmd{status} command will tell you which 26.11 -files Mercurial doesn't know about; it uses a ``\texttt{?}'' to 26.12 -display such files. 26.13 - 26.14 -To tell Mercurial to track a file, use the \hgcmd{add} command. Once 26.15 -you have added a file, the entry in the output of \hgcmd{status} for 26.16 -that file changes from ``\texttt{?}'' to ``\texttt{A}''. 26.17 -\interaction{daily.files.add} 26.18 - 26.19 -After you run a \hgcmd{commit}, the files that you added before the 26.20 -commit will no longer be listed in the output of \hgcmd{status}. The 26.21 -reason for this is that \hgcmd{status} only tells you about 26.22 -``interesting'' files---those that you have modified or told Mercurial 26.23 -to do something with---by default. If you have a repository that 26.24 -contains thousands of files, you will rarely want to know about files 26.25 -that Mercurial is tracking, but that have not changed. (You can still 26.26 -get this information; we'll return to this later.) 26.27 - 26.28 -Once you add a file, Mercurial doesn't do anything with it 26.29 -immediately. Instead, it will take a snapshot of the file's state the 26.30 -next time you perform a commit. It will then continue to track the 26.31 -changes you make to the file every time you commit, until you remove 26.32 -the file. 26.33 - 26.34 -\subsection{Explicit versus implicit file naming} 26.35 - 26.36 -A useful behaviour that Mercurial has is that if you pass the name of 26.37 -a directory to a command, every Mercurial command will treat this as 26.38 -``I want to operate on every file in this directory and its 26.39 -subdirectories''. 
26.40 -\interaction{daily.files.add-dir} 26.41 -Notice in this example that Mercurial printed the names of the files 26.42 -it added, whereas it didn't do so when we added the file named 26.43 -\filename{a} in the earlier example. 26.44 - 26.45 -What's going on is that in the former case, we explicitly named the 26.46 -file to add on the command line, so the assumption that Mercurial 26.47 -makes in such cases is that you know what you were doing, and it 26.48 -doesn't print any output. 26.49 - 26.50 -However, when we \emph{imply} the names of files by giving the name of 26.51 -a directory, Mercurial takes the extra step of printing the name of 26.52 -each file that it does something with. This makes it more clear what 26.53 -is happening, and reduces the likelihood of a silent and nasty 26.54 -surprise. This behaviour is common to most Mercurial commands. 26.55 - 26.56 -\subsection{Aside: Mercurial tracks files, not directories} 26.57 - 26.58 -Mercurial does not track directory information. Instead, it tracks 26.59 -the path to a file. Before creating a file, it first creates any 26.60 -missing directory components of the path. After it deletes a file, it 26.61 -then deletes any empty directories that were in the deleted file's 26.62 -path. This sounds like a trivial distinction, but it has one minor 26.63 -practical consequence: it is not possible to represent a completely 26.64 -empty directory in Mercurial. 26.65 - 26.66 -Empty directories are rarely useful, and there are unintrusive 26.67 -workarounds that you can use to achieve an appropriate effect. The 26.68 -developers of Mercurial thus felt that the complexity that would be 26.69 -required to manage empty directories was not worth the limited benefit 26.70 -this feature would bring. 26.71 - 26.72 -If you need an empty directory in your repository, there are a few 26.73 -ways to achieve this. One is to create a directory, then \hgcmd{add} a 26.74 -``hidden'' file to that directory. 
On Unix-like systems, any file 26.75 -name that begins with a period (``\texttt{.}'') is treated as hidden 26.76 -by most commands and GUI tools. This approach is illustrated in 26.77 -figure~\ref{ex:daily:hidden}. 26.78 - 26.79 -\begin{figure}[ht] 26.80 - \interaction{daily.files.hidden} 26.81 - \caption{Simulating an empty directory using a hidden file} 26.82 - \label{ex:daily:hidden} 26.83 -\end{figure} 26.84 - 26.85 -Another way to tackle a need for an empty directory is to simply 26.86 -create one in your automated build scripts before they will need it. 26.87 - 26.88 -\section{How to stop tracking a file} 26.89 - 26.90 -Once you decide that a file no longer belongs in your repository, use 26.91 -the \hgcmd{remove} command; this deletes the file, and tells Mercurial 26.92 -to stop tracking it. A removed file is represented in the output of 26.93 -\hgcmd{status} with a ``\texttt{R}''. 26.94 -\interaction{daily.files.remove} 26.95 - 26.96 -After you \hgcmd{remove} a file, Mercurial will no longer track 26.97 -changes to that file, even if you recreate a file with the same name 26.98 -in your working directory. If you do recreate a file with the same 26.99 -name and want Mercurial to track the new file, simply \hgcmd{add} it. 26.100 -Mercurial will know that the newly added file is not related to the 26.101 -old file of the same name. 26.102 - 26.103 -\subsection{Removing a file does not affect its history} 26.104 - 26.105 -It is important to understand that removing a file has only two 26.106 -effects. 26.107 -\begin{itemize} 26.108 -\item It removes the current version of the file from the working 26.109 - directory. 26.110 -\item It stops Mercurial from tracking changes to the file, from the 26.111 - time of the next commit. 26.112 -\end{itemize} 26.113 -Removing a file \emph{does not} in any way alter the \emph{history} of 26.114 -the file. 
26.115 - 26.116 -If you update the working directory to a changeset in which a file 26.117 -that you have removed was still tracked, it will reappear in the 26.118 -working directory, with the contents it had when you committed that 26.119 -changeset. If you then update the working directory to a later 26.120 -changeset, in which the file had been removed, Mercurial will once 26.121 -again remove the file from the working directory. 26.122 - 26.123 -\subsection{Missing files} 26.124 - 26.125 -Mercurial considers a file that you have deleted, but not used 26.126 -\hgcmd{remove} to delete, to be \emph{missing}. A missing file is 26.127 -represented with ``\texttt{!}'' in the output of \hgcmd{status}. 26.128 -Mercurial commands will not generally do anything with missing files. 26.129 -\interaction{daily.files.missing} 26.130 - 26.131 -If your repository contains a file that \hgcmd{status} reports as 26.132 -missing, and you want the file to stay gone, you can run 26.133 -\hgcmdargs{remove}{\hgopt{remove}{--after}} at any time later on, to 26.134 -tell Mercurial that you really did mean to remove the file. 26.135 -\interaction{daily.files.remove-after} 26.136 - 26.137 -On the other hand, if you deleted the missing file by accident, use 26.138 -\hgcmdargs{revert}{\emph{filename}} to recover the file. It will 26.139 -reappear, in unmodified form. 26.140 -\interaction{daily.files.recover-missing} 26.141 - 26.142 -\subsection{Aside: why tell Mercurial explicitly to 26.143 - remove a file?} 26.144 - 26.145 -You might wonder why Mercurial requires you to explicitly tell it that 26.146 -you are deleting a file. Early during the development of Mercurial, 26.147 -it let you delete a file however you pleased; Mercurial would notice 26.148 -the absence of the file automatically when you next ran a 26.149 -\hgcmd{commit}, and stop tracking the file. In practice, this made it 26.150 -too easy to accidentally remove a file without noticing. 
26.151 - 26.152 -\subsection{Useful shorthand---adding and removing files 26.153 - in one step} 26.154 - 26.155 -Mercurial offers a combination command, \hgcmd{addremove}, that adds 26.156 -untracked files and marks missing files as removed. 26.157 -\interaction{daily.files.addremove} 26.158 -The \hgcmd{commit} command also provides a \hgopt{commit}{-A} option 26.159 -that performs this same add-and-remove, immediately followed by a 26.160 -commit. 26.161 -\interaction{daily.files.commit-addremove} 26.162 - 26.163 -\section{Copying files} 26.164 - 26.165 -Mercurial provides a \hgcmd{copy} command that lets you make a new 26.166 -copy of a file. When you copy a file using this command, Mercurial 26.167 -makes a record of the fact that the new file is a copy of the original 26.168 -file. It treats these copied files specially when you merge your work 26.169 -with someone else's. 26.170 - 26.171 -\subsection{The results of copying during a merge} 26.172 - 26.173 -What happens during a merge is that changes ``follow'' a copy. To 26.174 -best illustrate what this means, let's create an example. We'll start 26.175 -with the usual tiny repository that contains a single file. 26.176 -\interaction{daily.copy.init} 26.177 -We need to do some work in parallel, so that we'll have something to 26.178 -merge. So let's clone our repository. 26.179 -\interaction{daily.copy.clone} 26.180 -Back in our initial repository, let's use the \hgcmd{copy} command to 26.181 -make a copy of the first file we created. 26.182 -\interaction{daily.copy.copy} 26.183 - 26.184 -If we look at the output of the \hgcmd{status} command afterwards, the 26.185 -copied file looks just like a normal added file. 26.186 -\interaction{daily.copy.status} 26.187 -But if we pass the \hgopt{status}{-C} option to \hgcmd{status}, it 26.188 -prints another line of output: this is the file that our newly-added 26.189 -file was copied \emph{from}. 
26.190 -\interaction{daily.copy.status-copy} 26.191 - 26.192 -Now, back in the repository we cloned, let's make a change in 26.193 -parallel. We'll add a line of content to the original file that we 26.194 -created. 26.195 -\interaction{daily.copy.other} 26.196 -Now we have a modified \filename{file} in this repository. When we 26.197 -pull the changes from the first repository, and merge the two heads, 26.198 -Mercurial will propagate the changes that we made locally to 26.199 -\filename{file} into its copy, \filename{new-file}. 26.200 -\interaction{daily.copy.merge} 26.201 - 26.202 -\subsection{Why should changes follow copies?} 26.203 -\label{sec:daily:why-copy} 26.204 - 26.205 -This behaviour, of changes to a file propagating out to copies of the 26.206 -file, might seem esoteric, but in most cases it's highly desirable. 26.207 - 26.208 -First of all, remember that this propagation \emph{only} happens when 26.209 -you merge. So if you \hgcmd{copy} a file, and subsequently modify the 26.210 -original file during the normal course of your work, nothing will 26.211 -happen. 26.212 - 26.213 -The second thing to know is that modifications will only propagate 26.214 -across a copy as long as the repository that you're pulling changes 26.215 -from \emph{doesn't know} about the copy. 26.216 - 26.217 -The reason that Mercurial does this is as follows. Let's say I make 26.218 -an important bug fix in a source file, and commit my changes. 26.219 -Meanwhile, you've decided to \hgcmd{copy} the file in your repository, 26.220 -without knowing about the bug or having seen the fix, and you have 26.221 -started hacking on your copy of the file. 26.222 - 26.223 -If you pulled and merged my changes, and Mercurial \emph{didn't} 26.224 -propagate changes across copies, your source file would now contain 26.225 -the bug, and unless you remembered to propagate the bug fix by hand, 26.226 -the bug would \emph{remain} in your copy of the file. 
26.227 - 26.228 -By automatically propagating the change that fixed the bug from the 26.229 -original file to the copy, Mercurial prevents this class of problem. 26.230 -To my knowledge, Mercurial is the \emph{only} revision control system 26.231 -that propagates changes across copies like this. 26.232 - 26.233 -Once your change history has a record that the copy and subsequent 26.234 -merge occurred, there's usually no further need to propagate changes 26.235 -from the original file to the copied file, and that's why Mercurial 26.236 -only propagates changes across copies until this point, and no 26.237 -further. 26.238 - 26.239 -\subsection{How to make changes \emph{not} follow a copy} 26.240 - 26.241 -If, for some reason, you decide that this business of automatically 26.242 -propagating changes across copies is not for you, simply use your 26.243 -system's normal file copy command (on Unix-like systems, that's 26.244 -\command{cp}) to make a copy of a file, then \hgcmd{add} the new copy 26.245 -by hand. Before you do so, though, please do reread 26.246 -section~\ref{sec:daily:why-copy}, and make an informed decision that 26.247 -this behaviour is not appropriate to your specific case. 26.248 - 26.249 -\subsection{Behaviour of the \hgcmd{copy} command} 26.250 - 26.251 -When you use the \hgcmd{copy} command, Mercurial makes a copy of each 26.252 -source file as it currently stands in the working directory. This 26.253 -means that if you make some modifications to a file, then \hgcmd{copy} 26.254 -it without first having committed those changes, the new copy will 26.255 -also contain the modifications you have made up until that point. (I 26.256 -find this behaviour a little counterintuitive, which is why I mention 26.257 -it here.) 26.258 - 26.259 -The \hgcmd{copy} command acts similarly to the Unix \command{cp} 26.260 -command (you can use the \hgcmd{cp} alias if you prefer). 
The last 26.261 -argument is the \emph{destination}, and all prior arguments are 26.262 -\emph{sources}. If you pass it a single file as the source, and the 26.263 -destination does not exist, it creates a new file with that name. 26.264 -\interaction{daily.copy.simple} 26.265 -If the destination is a directory, Mercurial copies its sources into 26.266 -that directory. 26.267 -\interaction{daily.copy.dir-dest} 26.268 -Copying a directory is recursive, and preserves the directory 26.269 -structure of the source. 26.270 -\interaction{daily.copy.dir-src} 26.271 -If the source and destination are both directories, the source tree is 26.272 -recreated in the destination directory. 26.273 -\interaction{daily.copy.dir-src-dest} 26.274 - 26.275 -As with the \hgcmd{rename} command, if you copy a file manually and 26.276 -then want Mercurial to know that you've copied the file, simply use 26.277 -the \hgopt{copy}{--after} option to \hgcmd{copy}. 26.278 -\interaction{daily.copy.after} 26.279 - 26.280 -\section{Renaming files} 26.281 - 26.282 -It's rather more common to need to rename a file than to make a copy 26.283 -of it. The reason I discussed the \hgcmd{copy} command before talking 26.284 -about renaming files is that Mercurial treats a rename in essentially 26.285 -the same way as a copy. Therefore, knowing what Mercurial does when 26.286 -you copy a file tells you what to expect when you rename a file. 26.287 - 26.288 -When you use the \hgcmd{rename} command, Mercurial makes a copy of 26.289 -each source file, then deletes it and marks the file as removed. 26.290 -\interaction{daily.rename.rename} 26.291 -The \hgcmd{status} command shows the newly copied file as added, and 26.292 -the copied-from file as removed. 
26.293 -\interaction{daily.rename.status} 26.294 -As with the results of a \hgcmd{copy}, we must use the 26.295 -\hgopt{status}{-C} option to \hgcmd{status} to see that the added file 26.296 -is really being tracked by Mercurial as a copy of the original, now 26.297 -removed, file. 26.298 -\interaction{daily.rename.status-copy} 26.299 - 26.300 -As with \hgcmd{remove} and \hgcmd{copy}, you can tell Mercurial about 26.301 -a rename after the fact using the \hgopt{rename}{--after} option. In 26.302 -most other respects, the behaviour of the \hgcmd{rename} command, and 26.303 -the options it accepts, are similar to the \hgcmd{copy} command. 26.304 - 26.305 -\subsection{Renaming files and merging changes} 26.306 - 26.307 -Since Mercurial's rename is implemented as copy-and-remove, the same 26.308 -propagation of changes happens when you merge after a rename as after 26.309 -a copy. 26.310 - 26.311 -If I modify a file, and you rename it to a new name, and then we merge 26.312 -our respective changes, my modifications to the file under its 26.313 -original name will be propagated into the file under its new name. 26.314 -(This is something you might expect to ``simply work,'' but not all 26.315 -revision control systems actually do this.) 26.316 - 26.317 -Whereas having changes follow a copy is a feature where you can 26.318 -perhaps nod and say ``yes, that might be useful,'' it should be clear 26.319 -that having them follow a rename is definitely important. Without 26.320 -this facility, it would simply be too easy for changes to become 26.321 -orphaned when files are renamed. 26.322 - 26.323 -\subsection{Divergent renames and merging} 26.324 - 26.325 -The case of diverging names occurs when two developers start with a 26.326 -file---let's call it \filename{foo}---in their respective 26.327 -repositories. 26.328 - 26.329 -\interaction{rename.divergent.clone} 26.330 -Anne renames the file to \filename{bar}. 
26.331 -\interaction{rename.divergent.rename.anne} 26.332 -Meanwhile, Bob renames it to \filename{quux}. 26.333 -\interaction{rename.divergent.rename.bob} 26.334 - 26.335 -I like to think of this as a conflict because each developer has 26.336 -expressed different intentions about what the file ought to be named. 26.337 - 26.338 -What do you think should happen when they merge their work? 26.339 -Mercurial's actual behaviour is that it always preserves \emph{both} 26.340 -names when it merges changesets that contain divergent renames. 26.341 -\interaction{rename.divergent.merge} 26.342 - 26.343 -Notice that Mercurial does warn about the divergent renames, but it 26.344 -leaves it up to you to do something about the divergence after the merge. 26.345 - 26.346 -\subsection{Convergent renames and merging} 26.347 - 26.348 -Another kind of rename conflict occurs when two people choose to 26.349 -rename different \emph{source} files to the same \emph{destination}. 26.350 -In this case, Mercurial runs its normal merge machinery, and lets you 26.351 -guide it to a suitable resolution. 26.352 - 26.353 -\subsection{Other name-related corner cases} 26.354 - 26.355 -Mercurial has a longstanding bug in which it fails to handle a merge 26.356 -where one side has a file with a given name, while another has a 26.357 -directory with the same name. This is documented as~\bug{29}. 26.358 -\interaction{issue29.go} 26.359 - 26.360 -\section{Recovering from mistakes} 26.361 - 26.362 -Mercurial has some useful commands that will help you to recover from 26.363 -some common mistakes. 26.364 - 26.365 -The \hgcmd{revert} command lets you undo changes that you have made to 26.366 -your working directory. For example, if you \hgcmd{add} a file by 26.367 -accident, just run \hgcmd{revert} with the name of the file you added, 26.368 -and while the file won't be touched in any way, it won't be tracked 26.369 -for adding by Mercurial any longer, either. 
You can also use 26.370 -\hgcmd{revert} to get rid of erroneous changes to a file. 26.371 - 26.372 -It's useful to remember that the \hgcmd{revert} command is useful for 26.373 -changes that you have not yet committed. Once you've committed a 26.374 -change, if you decide it was a mistake, you can still do something 26.375 -about it, though your options may be more limited. 26.376 - 26.377 -For more information about the \hgcmd{revert} command, and details 26.378 -about how to deal with changes you have already committed, see 26.379 -chapter~\ref{chap:undo}. 26.380 - 26.381 -%%% Local Variables: 26.382 -%%% mode: latex 26.383 -%%% TeX-master: "00book" 26.384 -%%% End:
27.1 --- a/en/filenames.tex Thu Jan 29 22:47:34 2009 -0800 27.2 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 27.3 @@ -1,306 +0,0 @@ 27.4 -\chapter{File names and pattern matching} 27.5 -\label{chap:names} 27.6 - 27.7 -Mercurial provides mechanisms that let you work with file names in a 27.8 -consistent and expressive way. 27.9 - 27.10 -\section{Simple file naming} 27.11 - 27.12 -Mercurial uses a unified piece of machinery ``under the hood'' to 27.13 -handle file names. Every command behaves uniformly with respect to 27.14 -file names. The way in which commands work with file names is as 27.15 -follows. 27.16 - 27.17 -If you explicitly name real files on the command line, Mercurial works 27.18 -with exactly those files, as you would expect. 27.19 -\interaction{filenames.files} 27.20 - 27.21 -When you provide a directory name, Mercurial will interpret this as 27.22 -``operate on every file in this directory and its subdirectories''. 27.23 -Mercurial traverses the files and subdirectories in a directory in 27.24 -alphabetical order. When it encounters a subdirectory, it will 27.25 -traverse that subdirectory before continuing with the current 27.26 -directory. 27.27 -\interaction{filenames.dirs} 27.28 - 27.29 -\section{Running commands without any file names} 27.30 - 27.31 -Mercurial's commands that work with file names have useful default 27.32 -behaviours when you invoke them without providing any file names or 27.33 -patterns. What kind of behaviour you should expect depends on what 27.34 -the command does. Here are a few rules of thumb you can use to 27.35 -predict what a command is likely to do if you don't give it any names 27.36 -to work with. 27.37 -\begin{itemize} 27.38 -\item Most commands will operate on the entire working directory. 27.39 - This is what the \hgcmd{add} command does, for example. 
27.40 -\item If the command has effects that are difficult or impossible to 27.41 - reverse, it will force you to explicitly provide at least one name 27.42 - or pattern (see below). This protects you from accidentally 27.43 - deleting files by running \hgcmd{remove} with no arguments, for 27.44 - example. 27.45 -\end{itemize} 27.46 - 27.47 -It's easy to work around these default behaviours if they don't suit 27.48 -you. If a command normally operates on the whole working directory, 27.49 -you can invoke it on just the current directory and its subdirectories 27.50 -by giving it the name ``\dirname{.}''. 27.51 -\interaction{filenames.wdir-subdir} 27.52 - 27.53 -Along the same lines, some commands normally print file names relative 27.54 -to the root of the repository, even if you're invoking them from a 27.55 -subdirectory. Such a command will print file names relative to your 27.56 -subdirectory if you give it explicit names. Here, we're going to run 27.57 -\hgcmd{status} from a subdirectory, and get it to operate on the 27.58 -entire working directory while printing file names relative to our 27.59 -subdirectory, by passing it the output of the \hgcmd{root} command. 27.60 -\interaction{filenames.wdir-relname} 27.61 - 27.62 -\section{Telling you what's going on} 27.63 - 27.64 -The \hgcmd{add} example in the preceding section illustrates something 27.65 -else that's helpful about Mercurial commands. If a command operates 27.66 -on a file that you didn't name explicitly on the command line, it will 27.67 -usually print the name of the file, so that you will not be surprised 27.68 -what's going on. 27.69 - 27.70 -The principle here is of \emph{least surprise}. If you've exactly 27.71 -named a file on the command line, there's no point in repeating it 27.72 -back at you. If Mercurial is acting on a file \emph{implicitly}, 27.73 -because you provided no names, or a directory, or a pattern (see 27.74 -below), it's safest to tell you what it's doing. 
27.75 - 27.76 -For commands that behave this way, you can silence them using the 27.77 -\hggopt{-q} option. You can also get them to print the name of every 27.78 -file, even those you've named explicitly, using the \hggopt{-v} 27.79 -option. 27.80 - 27.81 -\section{Using patterns to identify files} 27.82 - 27.83 -In addition to working with file and directory names, Mercurial lets 27.84 -you use \emph{patterns} to identify files. Mercurial's pattern 27.85 -handling is expressive. 27.86 - 27.87 -On Unix-like systems (Linux, MacOS, etc.), the job of matching file 27.88 -names to patterns normally falls to the shell. On these systems, you 27.89 -must explicitly tell Mercurial that a name is a pattern. On Windows, 27.90 -the shell does not expand patterns, so Mercurial will automatically 27.91 -identify names that are patterns, and expand them for you. 27.92 - 27.93 -To provide a pattern in place of a regular name on the command line, 27.94 -the mechanism is simple: 27.95 -\begin{codesample2} 27.96 - syntax:patternbody 27.97 -\end{codesample2} 27.98 -That is, a pattern is identified by a short text string that says what 27.99 -kind of pattern this is, followed by a colon, followed by the actual 27.100 -pattern. 27.101 - 27.102 -Mercurial supports two kinds of pattern syntax. The most frequently 27.103 -used is called \texttt{glob}; this is the same kind of pattern 27.104 -matching used by the Unix shell, and should be familiar to Windows 27.105 -command prompt users, too. 27.106 - 27.107 -When Mercurial does automatic pattern matching on Windows, it uses 27.108 -\texttt{glob} syntax. You can thus omit the ``\texttt{glob:}'' prefix 27.109 -on Windows, but it's safe to use it, too. 27.110 - 27.111 -The \texttt{re} syntax is more powerful; it lets you specify patterns 27.112 -using regular expressions, also known as regexps. 
27.113 - 27.114 -By the way, in the examples that follow, notice that I'm careful to 27.115 -wrap all of my patterns in quote characters, so that they won't get 27.116 -expanded by the shell before Mercurial sees them. 27.117 - 27.118 -\subsection{Shell-style \texttt{glob} patterns} 27.119 - 27.120 -This is an overview of the kinds of patterns you can use when you're 27.121 -matching on glob patterns. 27.122 - 27.123 -The ``\texttt{*}'' character matches any string, within a single 27.124 -directory. 27.125 -\interaction{filenames.glob.star} 27.126 - 27.127 -The ``\texttt{**}'' pattern matches any string, and crosses directory 27.128 -boundaries. It's not a standard Unix glob token, but it's accepted by 27.129 -several popular Unix shells, and is very useful. 27.130 -\interaction{filenames.glob.starstar} 27.131 - 27.132 -The ``\texttt{?}'' pattern matches any single character. 27.133 -\interaction{filenames.glob.question} 27.134 - 27.135 -The ``\texttt{[}'' character begins a \emph{character class}. This 27.136 -matches any single character within the class. The class ends with a 27.137 -``\texttt{]}'' character. A class may contain multiple \emph{range}s 27.138 -of the form ``\texttt{a-f}'', which is shorthand for 27.139 -``\texttt{abcdef}''. 27.140 -\interaction{filenames.glob.range} 27.141 -If the first character after the ``\texttt{[}'' in a character class 27.142 -is a ``\texttt{!}'', it \emph{negates} the class, making it match any 27.143 -single character not in the class. 27.144 - 27.145 -A ``\texttt{\{}'' begins a group of subpatterns, where the whole group 27.146 -matches if any subpattern in the group matches. The ``\texttt{,}'' 27.147 -character separates subpatterns, and ``\texttt{\}}'' ends the group. 
27.148 -\interaction{filenames.glob.group} 27.149 - 27.150 -\subsubsection{Watch out!} 27.151 - 27.152 -Don't forget that if you want to match a pattern in any directory, you 27.153 -should not be using the ``\texttt{*}'' match-any token, as this will 27.154 -only match within one directory. Instead, use the ``\texttt{**}'' 27.155 -token. This small example illustrates the difference between the two. 27.156 -\interaction{filenames.glob.star-starstar} 27.157 - 27.158 -\subsection{Regular expression matching with \texttt{re} patterns} 27.159 - 27.160 -Mercurial accepts the same regular expression syntax as the Python 27.161 -programming language (it uses Python's regexp engine internally). 27.162 -This is based on the Perl language's regexp syntax, which is the most 27.163 -popular dialect in use (it's also used in Java, for example). 27.164 - 27.165 -I won't discuss Mercurial's regexp dialect in any detail here, as 27.166 -regexps are not often used. Perl-style regexps are in any case 27.167 -already exhaustively documented on a multitude of web sites, and in 27.168 -many books. Instead, I will focus here on a few things you should 27.169 -know if you find yourself needing to use regexps with Mercurial. 27.170 - 27.171 -A regexp is matched against an entire file name, relative to the root 27.172 -of the repository. In other words, even if you're already in 27.173 -subdirectory \dirname{foo}, if you want to match files under this 27.174 -directory, your pattern must start with ``\texttt{foo/}''. 27.175 - 27.176 -One thing to note, if you're familiar with Perl-style regexps, is that 27.177 -Mercurial's are \emph{rooted}. That is, a regexp starts matching 27.178 -against the beginning of a string; it doesn't look for a match 27.179 -anywhere within the string. To match anywhere in a string, start 27.180 -your pattern with ``\texttt{.*}''. 
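The difference between a rooted and an unrooted match is easy to see with Python's own `re` module, which the text says Mercurial uses internally. The sketch below assumes that `re.match` (which only matches at the start of the string) is a fair model of the rooted behaviour described above. The helper names `rooted_match` and `unrooted_match` and the sample path are invented for illustration and are not part of Mercurial.

```python
import re


def rooted_match(pattern, path):
    """Match only at the beginning of the repository-relative path."""
    return re.match(pattern, path) is not None


def unrooted_match(pattern, path):
    """Match anywhere in the path, as Perl users might expect by default."""
    return re.search(pattern, path) is not None


path = "foo/bar/baz.c"  # hypothetical path, relative to the repository root

# A rooted pattern must account for the leading "foo/" component.
print(rooted_match(r"bar/", path))    # False
print(rooted_match(r"foo/", path))    # True

# Prefixing ".*" lets a rooted pattern match further into the path.
print(rooted_match(r".*bar/", path))  # True

# An unrooted search finds "bar/" without any prefix.
print(unrooted_match(r"bar/", path))  # True
```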
27.181 - 27.182 -\section{Filtering files} 27.183 - 27.184 -Not only does Mercurial give you a variety of ways to specify files; 27.185 -it lets you further winnow those files using \emph{filters}. Commands 27.186 -that work with file names accept two filtering options. 27.187 -\begin{itemize} 27.188 -\item \hggopt{-I}, or \hggopt{--include}, lets you specify a pattern 27.189 - that file names must match in order to be processed. 27.190 -\item \hggopt{-X}, or \hggopt{--exclude}, gives you a way to 27.191 - \emph{avoid} processing files, if they match this pattern. 27.192 -\end{itemize} 27.193 -You can provide multiple \hggopt{-I} and \hggopt{-X} options on the 27.194 -command line, and intermix them as you please. Mercurial interprets 27.195 -the patterns you provide using glob syntax by default (but you can use 27.196 -regexps if you need to). 27.197 - 27.198 -You can read a \hggopt{-I} filter as ``process only the files that 27.199 -match this filter''. 27.200 -\interaction{filenames.filter.include} 27.201 -The \hggopt{-X} filter is best read as ``process only the files that 27.202 -don't match this pattern''. 27.203 -\interaction{filenames.filter.exclude} 27.204 - 27.205 -\section{Ignoring unwanted files and directories} 27.206 - 27.207 -XXX. 27.208 - 27.209 -\section{Case sensitivity} 27.210 -\label{sec:names:case} 27.211 - 27.212 -If you're working in a mixed development environment that contains 27.213 -both Linux (or other Unix) systems and Macs or Windows systems, you 27.214 -should keep in the back of your mind the knowledge that they treat the 27.215 -case (``N'' versus ``n'') of file names in incompatible ways. This is 27.216 -not very likely to affect you, and it's easy to deal with if it does, 27.217 -but it could surprise you if you don't know about it. 27.218 - 27.219 -Operating systems and filesystems differ in the way they handle the 27.220 -\emph{case} of characters in file and directory names. 
There are 27.221 -three common ways to handle case in names. 27.222 -\begin{itemize} 27.223 -\item Completely case insensitive. Uppercase and lowercase versions 27.224 - of a letter are treated as identical, both when creating a file and 27.225 - during subsequent accesses. This is common on older DOS-based 27.226 - systems. 27.227 -\item Case preserving, but insensitive. When a file or directory is 27.228 - created, the case of its name is stored, and can be retrieved and 27.229 - displayed by the operating system. When an existing file is being 27.230 - looked up, its case is ignored. This is the standard arrangement on 27.231 - Windows and MacOS. The names \filename{foo} and \filename{FoO} 27.232 - identify the same file. This treatment of uppercase and lowercase 27.233 - letters as interchangeable is also referred to as \emph{case 27.234 - folding}. 27.235 -\item Case sensitive. The case of a name is significant at all times. 27.236 - The names \filename{foo} and \filename{FoO} identify different files. This 27.237 - is the way Linux and Unix systems normally work. 27.238 -\end{itemize} 27.239 - 27.240 -On Unix-like systems, it is possible to have any or all of the above 27.241 -ways of handling case in action at once. For example, if you use a 27.242 -USB thumb drive formatted with a FAT32 filesystem on a Linux system, 27.243 -Linux will handle names on that filesystem in a case preserving, but 27.244 -insensitive, way. 27.245 - 27.246 -\subsection{Safe, portable repository storage} 27.247 - 27.248 -Mercurial's repository storage mechanism is \emph{case safe}. It 27.249 -translates file names so that they can be safely stored on both case 27.250 -sensitive and case insensitive filesystems. This means that you can 27.251 -use normal file copying tools to transfer a Mercurial repository onto, 27.252 -for example, a USB thumb drive, and safely move that drive and 27.253 -repository back and forth between a Mac, a PC running Windows, and a 27.254 -Linux box. 
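To make the case folding idea above concrete, here is a rough sketch that finds names a case insensitive filesystem would treat as the same file. The helper \texttt{case\_collisions} and the list of names are hypothetical, and lowercasing with \texttt{str.lower} is a simplifying assumption; real filesystems apply their own folding rules, and this is not Mercurial's own detection code.

```python
from collections import defaultdict


def case_collisions(names):
    """Return groups of names that differ only in the case of their letters."""
    groups = defaultdict(list)
    for name in names:
        # Assumption: lowercasing approximates a filesystem's case folding.
        groups[name.lower()].append(name)
    # Only groups with more than one member represent a collision.
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Hypothetical set of tracked file names.
tracked = ["foo", "FoO", "README", "src/main.c"]
conflicts = case_collisions(tracked)
print(conflicts)
```

On a case sensitive filesystem both \filename{foo} and \filename{FoO} can coexist; the sketch flags them because a case preserving but insensitive filesystem would store only one of the two.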
27.255 - 27.256 -\subsection{Detecting case conflicts} 27.257 - 27.258 -When operating in the working directory, Mercurial honours the naming 27.259 -policy of the filesystem where the working directory is located. If 27.260 -the filesystem is case preserving, but insensitive, Mercurial will 27.261 -treat names that differ only in case as the same. 27.262 - 27.263 -An important aspect of this approach is that it is possible to commit 27.264 -a changeset on a case sensitive (typically Linux or Unix) filesystem 27.265 -that will cause trouble for users on case insensitive (usually Windows 27.266 -and MacOS) filesystems. If a Linux user commits changes to two files, one 27.267 -named \filename{myfile.c} and the other named \filename{MyFile.C}, 27.268 -they will be stored correctly in the repository. And in the working 27.269 -directories of other Linux users, they will be correctly represented 27.270 -as separate files. 27.271 - 27.272 -If a Windows or Mac user pulls this change, they will not initially 27.273 -have a problem, because Mercurial's repository storage mechanism is 27.274 -case safe. However, once they try to \hgcmd{update} the working 27.275 -directory to that changeset, or \hgcmd{merge} with that changeset, 27.276 -Mercurial will spot the conflict between the two file names that the 27.277 -filesystem would treat as the same, and forbid the update or merge 27.278 -from occurring. 27.279 - 27.280 -\subsection{Fixing a case conflict} 27.281 - 27.282 -If you are using Windows or a Mac in a mixed environment where some of 27.283 -your collaborators are using Linux or Unix, and Mercurial reports a 27.284 -case folding conflict when you try to \hgcmd{update} or \hgcmd{merge}, 27.285 -the procedure to fix the problem is simple. 
27.286 - 27.287 -Just find a nearby Linux or Unix box, clone the problem repository 27.288 -onto it, and use Mercurial's \hgcmd{rename} command to change the 27.289 -names of any offending files or directories so that they will no 27.290 -longer cause case folding conflicts. Commit this change, \hgcmd{pull} 27.291 -or \hgcmd{push} it across to your Windows or MacOS system, and 27.292 -\hgcmd{update} to the revision with the non-conflicting names. 27.293 - 27.294 -The changeset with case-conflicting names will remain in your 27.295 -project's history, and you still won't be able to \hgcmd{update} your 27.296 -working directory to that changeset on a Windows or MacOS system, but 27.297 -you can continue development unimpeded. 27.298 - 27.299 -\begin{note} 27.300 - Prior to version~0.9.3, Mercurial did not use a case safe repository 27.301 - storage mechanism, and did not detect case folding conflicts. If 27.302 - you are using an older version of Mercurial on Windows or MacOS, I 27.303 - strongly recommend that you upgrade. 27.304 -\end{note} 27.305 - 27.306 -%%% Local Variables: 27.307 -%%% mode: latex 27.308 -%%% TeX-master: "00book" 27.309 -%%% End:
28.1 --- a/en/hgext.tex Thu Jan 29 22:47:34 2009 -0800 28.2 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 28.3 @@ -1,429 +0,0 @@ 28.4 -\chapter{Adding functionality with extensions} 28.5 -\label{chap:hgext} 28.6 - 28.7 -While the core of Mercurial is quite complete from a functionality 28.8 -standpoint, it's deliberately shorn of fancy features. This approach 28.9 -of preserving simplicity keeps the software easy to deal with for both 28.10 -maintainers and users. 28.11 - 28.12 -However, Mercurial doesn't box you in with an inflexible command set: 28.13 -you can add features to it as \emph{extensions} (sometimes known as 28.14 -\emph{plugins}). We've already discussed a few of these extensions in 28.15 -earlier chapters. 28.16 -\begin{itemize} 28.17 -\item Section~\ref{sec:tour-merge:fetch} covers the \hgext{fetch} 28.18 - extension; this combines pulling new changes and merging them with 28.19 - local changes into a single command, \hgxcmd{fetch}{fetch}. 28.20 -\item In chapter~\ref{chap:hook}, we covered several extensions that 28.21 - are useful for hook-related functionality: \hgext{acl} adds access 28.22 - control lists; \hgext{bugzilla} adds integration with the Bugzilla 28.23 - bug tracking system; and \hgext{notify} sends notification emails on 28.24 - new changes. 28.25 -\item The Mercurial Queues patch management extension is so invaluable 28.26 - that it merits two chapters and an appendix all to itself. 28.27 - Chapter~\ref{chap:mq} covers the basics; 28.28 - chapter~\ref{chap:mq-collab} discusses advanced topics; and 28.29 - appendix~\ref{chap:mqref} goes into detail on each command. 28.30 -\end{itemize} 28.31 - 28.32 -In this chapter, we'll cover some of the other extensions that are 28.33 -available for Mercurial, and briefly touch on some of the machinery 28.34 -you'll need to know about if you want to write an extension of your 28.35 -own. 
28.36 -\begin{itemize} 28.37 -\item In section~\ref{sec:hgext:inotify}, we'll discuss the 28.38 - possibility of \emph{huge} performance improvements using the 28.39 - \hgext{inotify} extension. 28.40 -\end{itemize} 28.41 - 28.42 -\section{Improve performance with the \hgext{inotify} extension} 28.43 -\label{sec:hgext:inotify} 28.44 - 28.45 -Are you interested in having some of the most common Mercurial 28.46 -operations run as much as a hundred times faster? Read on! 28.47 - 28.48 -Mercurial has great performance under normal circumstances. For 28.49 -example, when you run the \hgcmd{status} command, Mercurial has to 28.50 -scan almost every directory and file in your repository so that it can 28.51 -display file status. Many other Mercurial commands need to do the 28.52 -same work behind the scenes; for example, the \hgcmd{diff} command 28.53 -uses the status machinery to avoid doing an expensive comparison 28.54 -operation on files that obviously haven't changed. 28.55 - 28.56 -Because obtaining file status is crucial to good performance, the 28.57 -authors of Mercurial have optimised this code to within an inch of its 28.58 -life. However, there's no avoiding the fact that when you run 28.59 -\hgcmd{status}, Mercurial is going to have to perform at least one 28.60 -expensive system call for each managed file to determine whether it's 28.61 -changed since the last time Mercurial checked. For a sufficiently 28.62 -large repository, this can take a long time. 28.63 - 28.64 -To put a number on the magnitude of this effect, I created a 28.65 -repository containing 150,000 managed files. I timed \hgcmd{status} 28.66 -as taking ten seconds to run, even when \emph{none} of those files had 28.67 -been modified. 28.68 - 28.69 -Many modern operating systems contain a file notification facility. 28.70 -If a program signs up to an appropriate service, the operating system 28.71 -will notify it every time a file of interest is created, modified, or 28.72 -deleted. 
On Linux systems, the kernel component that does this is 28.73 -called \texttt{inotify}. 28.74 - 28.75 -Mercurial's \hgext{inotify} extension talks to the kernel's 28.76 -\texttt{inotify} component to optimise \hgcmd{status} commands. The 28.77 -extension has two components. A daemon sits in the background and 28.78 -receives notifications from the \texttt{inotify} subsystem. It also 28.79 -listens for connections from a regular Mercurial command. The 28.80 -extension modifies Mercurial's behaviour so that instead of scanning 28.81 -the filesystem, it queries the daemon. Since the daemon has perfect 28.82 -information about the state of the repository, it can respond with a 28.83 -result instantaneously, avoiding the need to scan every directory and 28.84 -file in the repository. 28.85 - 28.86 -Recall the ten seconds that I measured plain Mercurial as taking to 28.87 -run \hgcmd{status} on a 150,000 file repository. With the 28.88 -\hgext{inotify} extension enabled, the time dropped to 0.1~seconds, a 28.89 -factor of \emph{one hundred} faster. 28.90 - 28.91 -Before we continue, please pay attention to some caveats. 28.92 -\begin{itemize} 28.93 -\item The \hgext{inotify} extension is Linux-specific. Because it 28.94 - interfaces directly to the Linux kernel's \texttt{inotify} 28.95 - subsystem, it does not work on other operating systems. 28.96 -\item It should work on any Linux distribution that was released after 28.97 - early~2005. Older distributions are likely to have a kernel that 28.98 - lacks \texttt{inotify}, or a version of \texttt{glibc} that does not 28.99 - have the necessary interfacing support. 28.100 -\item Not all filesystems are suitable for use with the 28.101 - \hgext{inotify} extension. Network filesystems such as NFS are a 28.102 - non-starter, for example, particularly if you're running Mercurial 28.103 - on several systems, all mounting the same network filesystem. 
The 28.104 - kernel's \texttt{inotify} system has no way of knowing about changes 28.105 - made on another system. Most local filesystems (e.g.~ext3, XFS, 28.106 - ReiserFS) should work fine. 28.107 -\end{itemize} 28.108 - 28.109 -The \hgext{inotify} extension is not yet shipped with Mercurial as of 28.110 -May~2007, so it's a little more involved to set up than other 28.111 -extensions. But the performance improvement is worth it! 28.112 - 28.113 -The extension currently comes in two parts: a set of patches to the 28.114 -Mercurial source code, and a library of Python bindings to the 28.115 -\texttt{inotify} subsystem. 28.116 -\begin{note} 28.117 - There are \emph{two} Python \texttt{inotify} binding libraries. One 28.118 - of them is called \texttt{pyinotify}, and is packaged by some Linux 28.119 - distributions as \texttt{python-inotify}. This is \emph{not} the 28.120 - one you'll need, as it is too buggy and inefficient to be practical. 28.121 -\end{note} 28.122 -To get going, it's best to already have a functioning copy of 28.123 -Mercurial installed. 28.124 -\begin{note} 28.125 - If you follow the instructions below, you'll be \emph{replacing} and 28.126 - overwriting any existing installation of Mercurial that you might 28.127 - already have, using the latest ``bleeding edge'' Mercurial code. 28.128 - Don't say you weren't warned! 28.129 -\end{note} 28.130 -\begin{enumerate} 28.131 -\item Clone the Python \texttt{inotify} binding repository. Build and 28.132 - install it. 28.133 - \begin{codesample4} 28.134 - hg clone http://hg.kublai.com/python/inotify 28.135 - cd inotify 28.136 - python setup.py build --force 28.137 - sudo python setup.py install --skip-build 28.138 - \end{codesample4} 28.139 -\item Clone the \dirname{crew} Mercurial repository. Clone the 28.140 - \hgext{inotify} patch repository so that Mercurial Queues will be 28.141 - able to apply patches to your copy of the \dirname{crew} repository. 
28.142 - \begin{codesample4} 28.143 - hg clone http://hg.intevation.org/mercurial/crew 28.144 - hg clone crew inotify 28.145 - hg clone http://hg.kublai.com/mercurial/patches/inotify inotify/.hg/patches 28.146 - \end{codesample4} 28.147 -\item Make sure that you have the Mercurial Queues extension, 28.148 - \hgext{mq}, enabled. If you've never used MQ, read 28.149 - section~\ref{sec:mq:start} to get started quickly. 28.150 -\item Go into the \dirname{inotify} repo, and apply all of the 28.151 - \hgext{inotify} patches using the \hgxopt{mq}{qpush}{-a} option to 28.152 - the \hgxcmd{mq}{qpush} command. 28.153 - \begin{codesample4} 28.154 - cd inotify 28.155 - hg qpush -a 28.156 - \end{codesample4} 28.157 - If you get an error message from \hgxcmd{mq}{qpush}, you should not 28.158 - continue. Instead, ask for help. 28.159 -\item Build and install the patched version of Mercurial. 28.160 - \begin{codesample4} 28.161 - python setup.py build --force 28.162 - sudo python setup.py install --skip-build 28.163 - \end{codesample4} 28.164 -\end{enumerate} 28.165 -Once you've built a suitably patched version of Mercurial, all you 28.166 -need to do to enable the \hgext{inotify} extension is add an entry to 28.167 -your \hgrc. 28.168 -\begin{codesample2} 28.169 - [extensions] 28.170 - inotify = 28.171 -\end{codesample2} 28.172 -When the \hgext{inotify} extension is enabled, Mercurial will 28.173 -automatically and transparently start the status daemon the first time 28.174 -you run a command that needs status in a repository. It runs one 28.175 -status daemon per repository. 28.176 - 28.177 -The status daemon is started silently, and runs in the background. If 28.178 -you look at a list of running processes after you've enabled the 28.179 -\hgext{inotify} extension and run a few commands in different 28.180 -repositories, you'll thus see a few \texttt{hg} processes sitting 28.181 -around, waiting for updates from the kernel and queries from 28.182 -Mercurial. 
28.183 - 28.184 -The first time you run a Mercurial command in a repository when you 28.185 -have the \hgext{inotify} extension enabled, it will run with about the 28.186 -same performance as a normal Mercurial command. This is because the 28.187 -status daemon needs to perform a normal status scan so that it has a 28.188 -baseline against which to apply later updates from the kernel. 28.189 -However, \emph{every} subsequent command that does any kind of status 28.190 -check should be noticeably faster on repositories of even fairly 28.191 -modest size. Better yet, the bigger your repository is, the greater a 28.192 -performance advantage you'll see. The \hgext{inotify} daemon makes 28.193 -status operations almost instantaneous on repositories of all sizes! 28.194 - 28.195 -If you like, you can manually start a status daemon using the 28.196 -\hgxcmd{inotify}{inserve} command. This gives you slightly finer 28.197 -control over how the daemon ought to run. This command will of course 28.198 -only be available when the \hgext{inotify} extension is enabled. 28.199 - 28.200 -When you're using the \hgext{inotify} extension, you should notice 28.201 -\emph{no difference at all} in Mercurial's behaviour, with the sole 28.202 -exception of status-related commands running a whole lot faster than 28.203 -they used to. You should specifically expect that commands will not 28.204 -print different output; neither should they give different results. 28.205 -If either of these situations occurs, please report a bug. 28.206 - 28.207 -\section{Flexible diff support with the \hgext{extdiff} extension} 28.208 -\label{sec:hgext:extdiff} 28.209 - 28.210 -Mercurial's built-in \hgcmd{diff} command outputs plaintext unified 28.211 -diffs. 28.212 -\interaction{extdiff.diff} 28.213 -If you would like to use an external tool to display modifications, 28.214 -you'll want to use the \hgext{extdiff} extension. This will let you 28.215 -use, for example, a graphical diff tool. 
28.216 - 28.217 -The \hgext{extdiff} extension is bundled with Mercurial, so it's easy 28.218 -to set up. In the \rcsection{extensions} section of your \hgrc, 28.219 -simply add a one-line entry to enable the extension. 28.220 -\begin{codesample2} 28.221 - [extensions] 28.222 - extdiff = 28.223 -\end{codesample2} 28.224 -This introduces a command named \hgxcmd{extdiff}{extdiff}, which by 28.225 -default uses your system's \command{diff} command to generate a 28.226 -unified diff in the same form as the built-in \hgcmd{diff} command. 28.227 -\interaction{extdiff.extdiff} 28.228 -The result won't be exactly the same as with the built-in \hgcmd{diff} 28.229 -variations, because the output of \command{diff} varies from one 28.230 -system to another, even when passed the same options. 28.231 - 28.232 -As the ``\texttt{making snapshot}'' lines of output above imply, the 28.233 -\hgxcmd{extdiff}{extdiff} command works by creating two snapshots of 28.234 -your source tree. The first snapshot is of the source revision; the 28.235 -second, of the target revision or working directory. The 28.236 -\hgxcmd{extdiff}{extdiff} command generates these snapshots in a 28.237 -temporary directory, passes the name of each directory to an external 28.238 -diff viewer, then deletes the temporary directory. For efficiency, it 28.239 -only snapshots the directories and files that have changed between the 28.240 -two revisions. 28.241 - 28.242 -Snapshot directory names have the same base name as your repository. 28.243 -If your repository path is \dirname{/quux/bar/foo}, then \dirname{foo} 28.244 -will be the name of each snapshot directory. Each snapshot directory 28.245 -name has its changeset ID appended, if appropriate. If a snapshot is 28.246 -of revision \texttt{a631aca1083f}, the directory will be named 28.247 -\dirname{foo.a631aca1083f}. A snapshot of the working directory won't 28.248 -have a changeset ID appended, so it would just be \dirname{foo} in 28.249 -this example. 
To see what this looks like in practice, look again at 28.250 -the \hgxcmd{extdiff}{extdiff} example above. Notice that the diff has 28.251 -the snapshot directory names embedded in its header. 28.252 - 28.253 -The \hgxcmd{extdiff}{extdiff} command accepts two important options. 28.254 -The \hgxopt{extdiff}{extdiff}{-p} option lets you choose a program to 28.255 -view differences with, instead of \command{diff}. With the 28.256 -\hgxopt{extdiff}{extdiff}{-o} option, you can change the options that 28.257 -\hgxcmd{extdiff}{extdiff} passes to the program (by default, these 28.258 -options are ``\texttt{-Npru}'', which only make sense if you're 28.259 -running \command{diff}). In other respects, the 28.260 -\hgxcmd{extdiff}{extdiff} command acts similarly to the built-in 28.261 -\hgcmd{diff} command: you use the same option names, syntax, and 28.262 -arguments to specify the revisions you want, the files you want, and 28.263 -so on. 28.264 - 28.265 -As an example, here's how to run the normal system \command{diff} 28.266 -command, getting it to generate context diffs (using the 28.267 -\cmdopt{diff}{-c} option) instead of unified diffs, and five lines of 28.268 -context instead of the default three (passing \texttt{5} as the 28.269 -argument to the \cmdopt{diff}{-C} option). 28.270 -\interaction{extdiff.extdiff-ctx} 28.271 - 28.272 -Launching a visual diff tool is just as easy. Here's how to launch 28.273 -the \command{kdiff3} viewer. 28.274 -\begin{codesample2} 28.275 - hg extdiff -p kdiff3 -o '' 28.276 -\end{codesample2} 28.277 - 28.278 -If your diff viewing command can't deal with directories, you can 28.279 -easily work around this with a little scripting. For an example of 28.280 -such scripting in action with the \hgext{mq} extension and the 28.281 -\command{interdiff} command, see 28.282 -section~\ref{mq-collab:tips:interdiff}. 
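The snapshot naming rule described earlier in this section (the repository's base name, with the changeset ID appended when the snapshot is of a revision) can be sketched as a small function. The helper \texttt{snapshot\_dir\_name} is hypothetical and written only to illustrate the rule as stated in the text; it is not taken from the \hgext{extdiff} source.

```python
import os


def snapshot_dir_name(repo_path, changeset_id=None):
    """Build a snapshot directory name following the rule described in the text."""
    # The base name of the repository path, e.g. "foo" for "/quux/bar/foo".
    base = os.path.basename(os.path.normpath(repo_path))
    if changeset_id is None:
        # A snapshot of the working directory has no changeset ID appended.
        return base
    return "%s.%s" % (base, changeset_id)


revision_snapshot = snapshot_dir_name("/quux/bar/foo", "a631aca1083f")
working_snapshot = snapshot_dir_name("/quux/bar/foo")
print(revision_snapshot)
print(working_snapshot)
```

These two names are what you would expect to see embedded in the header of a diff between a revision and the working directory.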
28.283 - 28.284 -\subsection{Defining command aliases} 28.285 - 28.286 -It can be cumbersome to remember the options to both the 28.287 -\hgxcmd{extdiff}{extdiff} command and the diff viewer you want to use, 28.288 -so the \hgext{extdiff} extension lets you define \emph{new} commands 28.289 -that will invoke your diff viewer with exactly the right options. 28.290 - 28.291 -All you need to do is edit your \hgrc, and add a section named 28.292 -\rcsection{extdiff}. Inside this section, you can define multiple 28.293 -commands. Here's how to add a \texttt{kdiff3} command. Once you've 28.294 -defined this, you can type ``\texttt{hg kdiff3}'' and the 28.295 -\hgext{extdiff} extension will run \command{kdiff3} for you. 28.296 -\begin{codesample2} 28.297 - [extdiff] 28.298 - cmd.kdiff3 = 28.299 -\end{codesample2} 28.300 -If you leave the right hand side of the definition empty, as above, 28.301 -the \hgext{extdiff} extension uses the name of the command you defined 28.302 -as the name of the external program to run. But these names don't 28.303 -have to be the same. Here, we define a command named ``\texttt{hg 28.304 - wibble}'', which runs \command{kdiff3}. 28.305 -\begin{codesample2} 28.306 - [extdiff] 28.307 - cmd.wibble = kdiff3 28.308 -\end{codesample2} 28.309 - 28.310 -You can also specify the default options that you want to invoke your 28.311 -diff viewing program with. The prefix to use is ``\texttt{opts.}'', 28.312 -followed by the name of the command to which the options apply. This 28.313 -example defines a ``\texttt{hg vimdiff}'' command that runs the 28.314 -\command{vim} editor's \texttt{DirDiff} extension. 
28.315 -\begin{codesample2} 28.316 - [extdiff] 28.317 - cmd.vimdiff = vim 28.318 - opts.vimdiff = -f '+next' '+execute "DirDiff" argv(0) argv(1)' 28.319 -\end{codesample2} 28.320 - 28.321 -\section{Cherrypicking changes with the \hgext{transplant} extension} 28.322 -\label{sec:hgext:transplant} 28.323 - 28.324 -Need to have a long chat with Brendan about this. 28.325 - 28.326 -\section{Send changes via email with the \hgext{patchbomb} extension} 28.327 -\label{sec:hgext:patchbomb} 28.328 - 28.329 -Many projects have a culture of ``change review'', in which people 28.330 -send their modifications to a mailing list for others to read and 28.331 -comment on before they commit the final version to a shared 28.332 -repository. Some projects have people who act as gatekeepers; they 28.333 -apply changes from other people to a repository to which those others 28.334 -don't have access. 28.335 - 28.336 -Mercurial makes it easy to send changes over email for review or 28.337 -application, via its \hgext{patchbomb} extension. The extension is so 28.338 -named because changes are formatted as patches, and it's usual to send 28.339 -one changeset per email message. Sending a long series of changes by 28.340 -email is thus much like ``bombing'' the recipient's inbox, hence 28.341 -``patchbomb''. 28.342 - 28.343 -As usual, the basic configuration of the \hgext{patchbomb} extension 28.344 -takes just one or two lines in your \hgrc. 28.345 -\begin{codesample2} 28.346 - [extensions] 28.347 - patchbomb = 28.348 -\end{codesample2} 28.349 -Once you've enabled the extension, you will have a new command 28.350 -available, named \hgxcmd{patchbomb}{email}. 28.351 - 28.352 -The safest and best way to invoke the \hgxcmd{patchbomb}{email} 28.353 -command is to \emph{always} run it first with the 28.354 -\hgxopt{patchbomb}{email}{-n} option. This will show you what the 28.355 -command \emph{would} send, without actually sending anything. 
Once 28.356 -you've had a quick glance over the changes and verified that you are 28.357 -sending the right ones, you can rerun the same command, with the 28.358 -\hgxopt{patchbomb}{email}{-n} option removed. 28.359 - 28.360 -The \hgxcmd{patchbomb}{email} command accepts the same kind of 28.361 -revision syntax as every other Mercurial command. For example, this 28.362 -command will send every revision between 7 and \texttt{tip}, 28.363 -inclusive. 28.364 -\begin{codesample2} 28.365 - hg email -n 7:tip 28.366 -\end{codesample2} 28.367 -You can also specify a \emph{repository} to compare with. If you 28.368 -provide a repository but no revisions, the \hgxcmd{patchbomb}{email} 28.369 -command will send all revisions in the local repository that are not 28.370 -present in the remote repository. If you additionally specify 28.371 -revisions or a branch name (the latter using the 28.372 -\hgxopt{patchbomb}{email}{-b} option), this will constrain the 28.373 -revisions sent. 28.374 - 28.375 -It's perfectly safe to run the \hgxcmd{patchbomb}{email} command 28.376 -without the names of the people you want to send to: if you do this, 28.377 -it will just prompt you for those values interactively. (If you're 28.378 -using a Linux or Unix-like system, you should have enhanced 28.379 -\texttt{readline}-style editing capabilities when entering those 28.380 -headers, too, which is useful.) 28.381 - 28.382 -When you are sending just one revision, the \hgxcmd{patchbomb}{email} 28.383 -command will by default use the first line of the changeset 28.384 -description as the subject of the single email message it sends. 28.385 - 28.386 -If you send multiple revisions, the \hgxcmd{patchbomb}{email} command 28.387 -will usually send one message per changeset. It will preface the 28.388 -series with an introductory message, in which you should describe the 28.389 -purpose of the series of changes you're sending. 
28.390 - 28.391 -\subsection{Changing the behaviour of patchbombs} 28.392 - 28.393 -Not every project has exactly the same conventions for sending changes 28.394 -in email; the \hgext{patchbomb} extension tries to accommodate a 28.395 -number of variations through command line options. 28.396 -\begin{itemize} 28.397 -\item You can write a subject for the introductory message on the 28.398 - command line using the \hgxopt{patchbomb}{email}{-s} option. This 28.399 - takes one argument, the text of the subject to use. 28.400 -\item To change the email address from which the messages originate, 28.401 - use the \hgxopt{patchbomb}{email}{-f} option. This takes one 28.402 - argument, the email address to use. 28.403 -\item The default behaviour is to send unified diffs (see 28.404 - section~\ref{sec:mq:patch} for a description of the format), one per 28.405 - message. You can send a binary bundle instead with the 28.406 - \hgxopt{patchbomb}{email}{-b} option. 28.407 -\item Unified diffs are normally prefaced with a metadata header. You 28.408 - can omit this, and send unadorned diffs, with the 28.409 - \hgxopt{patchbomb}{email}{--plain} option. 28.410 -\item Diffs are normally sent ``inline'', in the same body part as the 28.411 - description of a patch. This makes it easiest for the largest 28.412 - number of readers to quote and respond to parts of a diff, as some 28.413 - mail clients will only quote the first MIME body part in a message. 28.414 - If you'd prefer to send the description and the diff in separate 28.415 - body parts, use the \hgxopt{patchbomb}{email}{-a} option. 28.416 -\item Instead of sending mail messages, you can write them to an 28.417 - \texttt{mbox}-format mail folder using the 28.418 - \hgxopt{patchbomb}{email}{-m} option. That option takes one 28.419 - argument, the name of the file to write to. 
28.420 -\item If you would like to add a \command{diffstat}-format summary to 28.421 - each patch, and one to the introductory message, use the 28.422 - \hgxopt{patchbomb}{email}{-d} option. The \command{diffstat} 28.423 - command displays a table containing the name of each file patched, 28.424 - the number of lines affected, and a histogram showing how much each 28.425 - file is modified. This gives readers a qualitative glance at how 28.426 - complex a patch is. 28.427 -\end{itemize} 28.428 - 28.429 -%%% Local Variables: 28.430 -%%% mode: latex 28.431 -%%% TeX-master: "00book" 28.432 -%%% End:
29.1 --- a/en/hook.tex Thu Jan 29 22:47:34 2009 -0800 29.2 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 29.3 @@ -1,1413 +0,0 @@ 29.4 -\chapter{Handling repository events with hooks} 29.5 -\label{chap:hook} 29.6 - 29.7 -Mercurial offers a powerful mechanism to let you perform automated 29.8 -actions in response to events that occur in a repository. In some 29.9 -cases, you can even control Mercurial's response to those events. 29.10 - 29.11 -The name Mercurial uses for one of these actions is a \emph{hook}. 29.12 -Hooks are called ``triggers'' in some revision control systems, but 29.13 -the two names refer to the same idea. 29.14 - 29.15 -\section{An overview of hooks in Mercurial} 29.16 - 29.17 -Here is a brief list of the hooks that Mercurial supports. We will 29.18 -revisit each of these hooks in more detail later, in 29.19 -section~\ref{sec:hook:ref}. 29.20 - 29.21 -\begin{itemize} 29.22 -\item[\small\hook{changegroup}] This is run after a group of 29.23 - changesets has been brought into the repository from elsewhere. 29.24 -\item[\small\hook{commit}] This is run after a new changeset has been 29.25 - created in the local repository. 29.26 -\item[\small\hook{incoming}] This is run once for each new changeset 29.27 - that is brought into the repository from elsewhere. Notice the 29.28 - difference from \hook{changegroup}, which is run once per 29.29 - \emph{group} of changesets brought in. 29.30 -\item[\small\hook{outgoing}] This is run after a group of changesets 29.31 - has been transmitted from this repository. 29.32 -\item[\small\hook{prechangegroup}] This is run before starting to 29.33 - bring a group of changesets into the repository. 29.34 -\item[\small\hook{precommit}] Controlling. This is run before starting 29.35 - a commit. 29.36 -\item[\small\hook{preoutgoing}] Controlling. This is run before 29.37 - starting to transmit a group of changesets from this repository. 29.38 -\item[\small\hook{pretag}] Controlling. This is run before creating a tag. 
29.39 -\item[\small\hook{pretxnchangegroup}] Controlling. This is run after a 29.40 - group of changesets has been brought into the local repository from 29.41 - another, but before the transaction completes that will make the 29.42 - changes permanent in the repository. 29.43 -\item[\small\hook{pretxncommit}] Controlling. This is run after a new 29.44 - changeset has been created in the local repository, but before the 29.45 - transaction completes that will make it permanent. 29.46 -\item[\small\hook{preupdate}] Controlling. This is run before starting 29.47 - an update or merge of the working directory. 29.48 -\item[\small\hook{tag}] This is run after a tag is created. 29.49 -\item[\small\hook{update}] This is run after an update or merge of the 29.50 - working directory has finished. 29.51 -\end{itemize} 29.52 -Each of the hooks whose description begins with the word 29.53 -``Controlling'' has the ability to determine whether an activity can 29.54 -proceed. If the hook succeeds, the activity may proceed; if it fails, 29.55 -the activity is either not permitted or undone, depending on the hook. 29.56 - 29.57 -\section{Hooks and security} 29.58 - 29.59 -\subsection{Hooks are run with your privileges} 29.60 - 29.61 -When you run a Mercurial command in a repository, and the command 29.62 -causes a hook to run, that hook runs on \emph{your} system, under 29.63 -\emph{your} user account, with \emph{your} privilege level. Since 29.64 -hooks are arbitrary pieces of executable code, you should treat them 29.65 -with an appropriate level of suspicion. Do not install a hook unless 29.66 -you are confident that you know who created it and what it does. 29.67 - 29.68 -In some cases, you may be exposed to hooks that you did not install 29.69 -yourself. If you work with Mercurial on an unfamiliar system, 29.70 -Mercurial will run hooks defined in that system's global \hgrc\ file. 
29.71 - 29.72 -If you are working with a repository owned by another user, Mercurial 29.73 -can run hooks defined in that user's repository, but it will still run 29.74 -them as ``you''. For example, if you \hgcmd{pull} from that 29.75 -repository, and its \sfilename{.hg/hgrc} defines a local 29.76 -\hook{outgoing} hook, that hook will run under your user account, even 29.77 -though you don't own that repository. 29.78 - 29.79 -\begin{note} 29.80 - This only applies if you are pulling from a repository on a local or 29.81 - network filesystem. If you're pulling over http or ssh, any 29.82 - \hook{outgoing} hook will run under whatever account is executing 29.83 - the server process, on the server. 29.84 -\end{note} 29.85 - 29.86 -XXX To see what hooks are defined in a repository, use the 29.87 -\hgcmdargs{config}{hooks} command. If you are working in one 29.88 -repository, but talking to another that you do not own (e.g.~using 29.89 -\hgcmd{pull} or \hgcmd{incoming}), remember that it is the other 29.90 -repository's hooks you should be checking, not your own. 29.91 - 29.92 -\subsection{Hooks do not propagate} 29.93 - 29.94 -In Mercurial, hooks are not revision controlled, and do not propagate 29.95 -when you clone, or pull from, a repository. The reason for this is 29.96 -simple: a hook is a completely arbitrary piece of executable code. It 29.97 -runs under your user identity, with your privilege level, on your 29.98 -machine. 29.99 - 29.100 -It would be extremely reckless for any distributed revision control 29.101 -system to implement revision-controlled hooks, as this would offer an 29.102 -easily exploitable way to subvert the accounts of users of the 29.103 -revision control system. 29.104 - 29.105 -Since Mercurial does not propagate hooks, if you are collaborating 29.106 -with other people on a common project, you should not assume that they 29.107 -are using the same Mercurial hooks as you are, or that theirs are 29.108 -correctly configured. 
You should document the hooks you expect people 29.109 -to use. 29.110 - 29.111 -In a corporate intranet, this is somewhat easier to control, as you 29.112 -can for example provide a ``standard'' installation of Mercurial on an 29.113 -NFS filesystem, and use a site-wide \hgrc\ file to define hooks that 29.114 -all users will see. However, this too has its limits; see below. 29.115 - 29.116 -\subsection{Hooks can be overridden} 29.117 - 29.118 -Mercurial allows you to override a hook definition by redefining the 29.119 -hook. You can disable it by setting its value to the empty string, or 29.120 -change its behaviour as you wish. 29.121 - 29.122 -If you deploy a system-~or site-wide \hgrc\ file that defines some 29.123 -hooks, you should thus understand that your users can disable or 29.124 -override those hooks. 29.125 - 29.126 -\subsection{Ensuring that critical hooks are run} 29.127 - 29.128 -Sometimes you may want to enforce a policy that you do not want others 29.129 -to be able to work around. For example, you may have a requirement 29.130 -that every changeset must pass a rigorous set of tests. Defining this 29.131 -requirement via a hook in a site-wide \hgrc\ won't work for remote 29.132 -users on laptops, and of course local users can subvert it at will by 29.133 -overriding the hook. 29.134 - 29.135 -Instead, you can set up your policies for use of Mercurial so that 29.136 -people are expected to propagate changes through a well-known 29.137 -``canonical'' server that you have locked down and configured 29.138 -appropriately. 29.139 - 29.140 -One way to do this is via a combination of social engineering and 29.141 -technology. Set up a restricted-access account; users can push 29.142 -changes over the network to repositories managed by this account, but 29.143 -they cannot log into the account and run normal shell commands. In 29.144 -this scenario, a user can commit a changeset that contains any old 29.145 -garbage they want. 
29.146 - 29.147 -When someone pushes a changeset to the server that everyone pulls 29.148 -from, the server will test the changeset before it accepts it as 29.149 -permanent, and reject it if it fails to pass the test suite. If 29.150 -people only pull changes from this filtering server, it will serve to 29.151 -ensure that all changes that people pull have been automatically 29.152 -vetted. 29.153 - 29.154 -\section{Care with \texttt{pretxn} hooks in a shared-access repository} 29.155 - 29.156 -If you want to use hooks to do some automated work in a repository 29.157 -that a number of people have shared access to, you need to be careful 29.158 -in how you do this. 29.159 - 29.160 -Mercurial only locks a repository when it is writing to the 29.161 -repository, and only the parts of Mercurial that write to the 29.162 -repository pay attention to locks. Write locks are necessary to 29.163 -prevent multiple simultaneous writers from scribbling on each other's 29.164 -work, corrupting the repository. 29.165 - 29.166 -Because Mercurial is careful with the order in which it reads and 29.167 -writes data, it does not need to acquire a lock when it wants to read 29.168 -data from the repository. The parts of Mercurial that read from the 29.169 -repository never pay attention to locks. This lockless reading scheme 29.170 -greatly increases performance and concurrency. 29.171 - 29.172 -With great performance comes a trade-off, though, one which has the 29.173 -potential to cause you trouble unless you're aware of it. To describe 29.174 -this requires a little detail about how Mercurial adds changesets to a 29.175 -repository and reads those changes. 29.176 - 29.177 -When Mercurial \emph{writes} metadata, it writes it straight into the 29.178 -destination file. It writes file data first, then manifest data 29.179 -(which contains pointers to the new file data), then changelog data 29.180 -(which contains pointers to the new manifest data). 
Before the first 29.181 -write to each file, it stores a record of where the end of the file 29.182 -was in its transaction log. If the transaction must be rolled back, 29.183 -Mercurial simply truncates each file back to the size it was before the 29.184 -transaction began. 29.185 - 29.186 -When Mercurial \emph{reads} metadata, it reads the changelog first, 29.187 -then everything else. Since a reader will only access parts of the 29.188 -manifest or file metadata that it can see in the changelog, it can 29.189 -never see partially written data. 29.190 - 29.191 -Some controlling hooks (\hook{pretxncommit} and 29.192 -\hook{pretxnchangegroup}) run when a transaction is almost complete. 29.193 -All of the metadata has been written, but Mercurial can still roll the 29.194 -transaction back and cause the newly-written data to disappear. 29.195 - 29.196 -If one of these hooks runs for long, it opens a window of time during 29.197 -which a reader can see the metadata for changesets that are not yet 29.198 -permanent, and should not be thought of as ``really there''. The 29.199 -longer the hook runs, the longer that window is open. 29.200 - 29.201 -\subsection{The problem illustrated} 29.202 - 29.203 -In principle, a good use for the \hook{pretxnchangegroup} hook would 29.204 -be to automatically build and test incoming changes before they are 29.205 -accepted into a central repository. This could let you guarantee that 29.206 -nobody can push changes to this repository that ``break the build''. 29.207 -But if a client can pull changes while they're being tested, the 29.208 -usefulness of the test is zero; an unsuspecting someone can pull 29.209 -untested changes, potentially breaking their build. 29.210 - 29.211 -The safest technological answer to this challenge is to set up such a 29.212 -``gatekeeper'' repository as \emph{unidirectional}. 
Let it take 29.213 -changes pushed in from the outside, but do not allow anyone to pull 29.214 -changes from it (use the \hook{preoutgoing} hook to lock it down). 29.215 -Configure a \hook{changegroup} hook so that if a build or test 29.216 -succeeds, the hook will push the new changes out to another repository 29.217 -that people \emph{can} pull from. 29.218 - 29.219 -In practice, putting a centralised bottleneck like this in place is 29.220 -not often a good idea, and transaction visibility has nothing to do 29.221 -with the problem. As the size of a project---and the time it takes to 29.222 -build and test---grows, you rapidly run into a wall with this ``try 29.223 -before you buy'' approach, where you have more changesets to test than 29.224 -time in which to deal with them. The inevitable result is frustration 29.225 -on the part of all involved. 29.226 - 29.227 -An approach that scales better is to get people to build and test 29.228 -before they push, then run automated builds and tests centrally 29.229 -\emph{after} a push, to be sure all is well. The advantage of this 29.230 -approach is that it does not impose a limit on the rate at which the 29.231 -repository can accept changes. 29.232 - 29.233 -\section{A short tutorial on using hooks} 29.234 -\label{sec:hook:simple} 29.235 - 29.236 -It is easy to write a Mercurial hook. Let's start with a hook that 29.237 -runs when you finish a \hgcmd{commit}, and simply prints the hash of 29.238 -the changeset you just created. The hook is called \hook{commit}. 29.239 - 29.240 -\begin{figure}[ht] 29.241 - \interaction{hook.simple.init} 29.242 - \caption{A simple hook that runs when a changeset is committed} 29.243 - \label{ex:hook:init} 29.244 -\end{figure} 29.245 - 29.246 -All hooks follow the pattern in example~\ref{ex:hook:init}. You add 29.247 -an entry to the \rcsection{hooks} section of your \hgrc. On the left 29.248 -is the name of the event to trigger on; on the right is the action to 29.249 -take. 
As you can see, you can run an arbitrary shell command in a 29.250 -hook. Mercurial passes extra information to the hook using 29.251 -environment variables (look for \envar{HG\_NODE} in the example). 29.252 - 29.253 -\subsection{Performing multiple actions per event} 29.254 - 29.255 -Quite often, you will want to define more than one hook for a 29.256 -particular kind of event, as shown in example~\ref{ex:hook:ext}. 29.257 -Mercurial lets you do this by adding an \emph{extension} to the end of 29.258 -a hook's name. You extend a hook's name by giving the name of the 29.259 -hook, followed by a full stop (the ``\texttt{.}'' character), followed 29.260 -by some more text of your choosing. For example, Mercurial will run 29.261 -both \texttt{commit.foo} and \texttt{commit.bar} when the 29.262 -\texttt{commit} event occurs. 29.263 - 29.264 -\begin{figure}[ht] 29.265 - \interaction{hook.simple.ext} 29.266 - \caption{Defining a second \hook{commit} hook} 29.267 - \label{ex:hook:ext} 29.268 -\end{figure} 29.269 - 29.270 -To give a well-defined order of execution when there are multiple 29.271 -hooks defined for an event, Mercurial sorts hooks by extension, and 29.272 -executes the hook commands in this sorted order. In the above 29.273 -example, it will execute \texttt{commit.bar} before 29.274 -\texttt{commit.foo}, and \texttt{commit} before both. 29.275 - 29.276 -It is a good idea to use a somewhat descriptive extension when you 29.277 -define a new hook. This will help you to remember what the hook was 29.278 -for. If the hook fails, you'll get an error message that contains the 29.279 -hook name and extension, so using a descriptive extension could give 29.280 -you an immediate hint as to why the hook failed (see 29.281 -section~\ref{sec:hook:perm} for an example). 
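The ordering rule described above can be checked with a few lines of Python. This is only a sketch of the order the text describes, assuming plain lexicographic sorting of the full hook names; it is not Mercurial's own code, and \texttt{hooks\_in\_run\_order} is a hypothetical helper introduced for illustration.

```python
def hooks_in_run_order(hook_names, event):
    """Return the hooks defined for `event`, in the order described above."""
    prefix = event + "."
    matching = [name for name in hook_names
                if name == event or name.startswith(prefix)]
    # Sorting the full names puts the bare event name first, then the
    # extended names in order of their extensions.
    return sorted(matching)


# Hypothetical set of hook names, as they might appear in an hgrc.
defined = ["commit.foo", "precommit", "commit.bar", "commit"]
print(hooks_in_run_order(defined, "commit"))
```

Note that \texttt{precommit} is left out: it names a different event, not an extension of \texttt{commit}.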
29.282 - 29.283 -\subsection{Controlling whether an activity can proceed} 29.284 -\label{sec:hook:perm} 29.285 - 29.286 -In our earlier examples, we used the \hook{commit} hook, which is 29.287 -run after a commit has completed. This is one of several Mercurial 29.288 -hooks that run after an activity finishes. Such hooks have no way of 29.289 -influencing the activity itself. 29.290 - 29.291 -Mercurial defines a number of events that occur before an activity 29.292 -starts; or after it starts, but before it finishes. Hooks that 29.293 -trigger on these events have the added ability to choose whether the 29.294 -activity can continue, or will abort. 29.295 - 29.296 -The \hook{pretxncommit} hook runs after a commit has all but 29.297 -completed. In other words, the metadata representing the changeset 29.298 -has been written out to disk, but the transaction has not yet been 29.299 -allowed to complete. The \hook{pretxncommit} hook has the ability to 29.300 -decide whether the transaction can complete, or must be rolled back. 29.301 - 29.302 -If the \hook{pretxncommit} hook exits with a status code of zero, the 29.303 -transaction is allowed to complete; the commit finishes; and the 29.304 -\hook{commit} hook is run. If the \hook{pretxncommit} hook exits with 29.305 -a non-zero status code, the transaction is rolled back; the metadata 29.306 -representing the changeset is erased; and the \hook{commit} hook is 29.307 -not run. 29.308 - 29.309 -\begin{figure}[ht] 29.310 - \interaction{hook.simple.pretxncommit} 29.311 - \caption{Using the \hook{pretxncommit} hook to control commits} 29.312 - \label{ex:hook:pretxncommit} 29.313 -\end{figure} 29.314 - 29.315 -The hook in example~\ref{ex:hook:pretxncommit} checks that a commit 29.316 -comment contains a bug ID. If it does, the commit can complete. If 29.317 -not, the commit is rolled back. 
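A check of this kind could also be written in Python. The following is a minimal sketch, not the script used in the example: the pattern for what counts as a bug ID (the word ``bug'' followed by digits) and the names \texttt{BUG\_ID} and \texttt{exit\_status\_for} are assumptions made for illustration. It returns the exit status an external \hook{pretxncommit} hook would use: zero to let the transaction complete, non-zero to roll it back.

```python
import re

# Assumed convention: a bug ID looks like "bug 1234" (case-insensitive).
BUG_ID = re.compile(r"\bbug\s*\d+", re.IGNORECASE)


def exit_status_for(commit_message):
    """Return 0 if the message mentions a bug ID, 1 otherwise."""
    return 0 if BUG_ID.search(commit_message) else 1


print(exit_status_for("Fix bug 10483 by guarding against NULL pointers"))
print(exit_status_for("Tidy up whitespace"))
```

A real hook would still need to obtain the commit message from Mercurial and pass the result to \texttt{sys.exit}; that part is deliberately left out here.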
29.318 - 29.319 -\section{Writing your own hooks} 29.320 - 29.321 -When you are writing a hook, you might find it useful to run Mercurial 29.322 -either with the \hggopt{-v} option, or the \rcitem{ui}{verbose} config 29.323 -item set to ``true''. When you do so, Mercurial will print a message 29.324 -before it calls each hook. 29.325 - 29.326 -\subsection{Choosing how your hook should run} 29.327 -\label{sec:hook:lang} 29.328 - 29.329 -You can write a hook either as a normal program---typically a shell 29.330 -script---or as a Python function that is executed within the Mercurial 29.331 -process. 29.332 - 29.333 -Writing a hook as an external program has the advantage that it 29.334 -requires no knowledge of Mercurial's internals. You can call normal 29.335 -Mercurial commands to get any added information you need. The 29.336 -trade-off is that external hooks are slower than in-process hooks. 29.337 - 29.338 -An in-process Python hook has complete access to the Mercurial API, 29.339 -and does not ``shell out'' to another process, so it is inherently 29.340 -faster than an external hook. It is also easier to obtain much of the 29.341 -information that a hook requires by using the Mercurial API than by 29.342 -running Mercurial commands. 29.343 - 29.344 -If you are comfortable with Python, or require high performance, 29.345 -writing your hooks in Python may be a good choice. However, when you 29.346 -have a straightforward hook to write and you don't need to care about 29.347 -performance (probably the majority of hooks), a shell script is 29.348 -perfectly fine. 29.349 - 29.350 -\subsection{Hook parameters} 29.351 -\label{sec:hook:param} 29.352 - 29.353 -Mercurial calls each hook with a set of well-defined parameters. In 29.354 -Python, a parameter is passed as a keyword argument to your hook 29.355 -function. For an external program, a parameter is passed as an 29.356 -environment variable. 
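For an external hook written in Python, the parameters can be collected from the environment. The following is a minimal sketch, assuming that the \texttt{HG\_} prefix seen on \envar{HG\_NODE} earlier applies to every hook parameter; \texttt{hook\_params} is a hypothetical helper, not part of Mercurial.

```python
import os


def hook_params(environ=None):
    """Collect hook parameters from HG_-prefixed environment variables."""
    if environ is None:
        environ = os.environ
    prefix = "HG_"
    # Strip the prefix and lower-case the rest to recover the parameter name.
    return {name[len(prefix):].lower(): value
            for name, value in environ.items()
            if name.startswith(prefix)}


# Example with a stand-in environment; a real hook would call hook_params().
params = hook_params({"HG_NODE": "aad8b264143a", "PATH": "/usr/bin"})
print(params["node"])
```

Values arrive as strings, so a hook that needs a number or a boolean has to convert them itself.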
29.357 - 29.358 -Whether your hook is written in Python or as a shell script, the 29.359 -hook-specific parameter names and values will be the same. A boolean 29.360 -parameter will be represented as a boolean value in Python, but as the 29.361 -number 1 (for ``true'') or 0 (for ``false'') as an environment 29.362 -variable for an external hook. If a hook parameter is named 29.363 -\texttt{foo}, the keyword argument for a Python hook will also be 29.364 -named \texttt{foo}, while the environment variable for an external 29.365 -hook will be named \texttt{HG\_FOO}. 29.366 - 29.367 -\subsection{Hook return values and activity control} 29.368 - 29.369 -A hook that executes successfully must exit with a status of zero if 29.370 -external, or return boolean ``false'' if in-process. Failure is 29.371 -indicated with a non-zero exit status from an external hook, or an 29.372 -in-process hook returning boolean ``true''. If an in-process hook 29.373 -raises an exception, the hook is considered to have failed. 29.374 - 29.375 -For a hook that controls whether an activity can proceed, zero/false 29.376 -means ``allow'', while non-zero/true/exception means ``deny''. 29.377 - 29.378 -\subsection{Writing an external hook} 29.379 - 29.380 -When you define an external hook in your \hgrc\ and the hook is run, 29.381 -its value is passed to your shell, which interprets it. This means 29.382 -that you can use normal shell constructs in the body of the hook. 29.383 - 29.384 -An executable hook is always run with its current directory set to a 29.385 -repository's root directory. 29.386 - 29.387 -Each hook parameter is passed in as an environment variable; the name 29.388 -is upper-cased, and prefixed with the string ``\texttt{HG\_}''. 29.389 - 29.390 -With the exception of hook parameters, Mercurial does not set or 29.391 -modify any environment variables when running a hook. 
This is useful 29.392 -to remember if you are writing a site-wide hook that may be run by a 29.393 -number of different users with differing environment variables set. 29.394 -In multi-user situations, you should not rely on environment variables 29.395 -being set to the values you have in your environment when testing the 29.396 -hook. 29.397 - 29.398 -\subsection{Telling Mercurial to use an in-process hook} 29.399 - 29.400 -The \hgrc\ syntax for defining an in-process hook is slightly 29.401 -different than for an executable hook. The value of the hook must 29.402 -start with the text ``\texttt{python:}'', and continue with the 29.403 -fully-qualified name of a callable object to use as the hook's value. 29.404 - 29.405 -The module in which a hook lives is automatically imported when a hook 29.406 -is run. So long as you have the module name and \envar{PYTHONPATH} 29.407 -right, it should ``just work''. 29.408 - 29.409 -The following \hgrc\ example snippet illustrates the syntax and 29.410 -meaning of the notions we just described. 29.411 -\begin{codesample2} 29.412 - [hooks] 29.413 - commit.example = python:mymodule.submodule.myhook 29.414 -\end{codesample2} 29.415 -When Mercurial runs the \texttt{commit.example} hook, it imports 29.416 -\texttt{mymodule.submodule}, looks for the callable object named 29.417 -\texttt{myhook}, and calls it. 29.418 - 29.419 -\subsection{Writing an in-process hook} 29.420 - 29.421 -The simplest in-process hook does nothing, but illustrates the basic 29.422 -shape of the hook API: 29.423 -\begin{codesample2} 29.424 - def myhook(ui, repo, **kwargs): 29.425 - pass 29.426 -\end{codesample2} 29.427 -The first argument to a Python hook is always a 29.428 -\pymodclass{mercurial.ui}{ui} object. The second is a repository object; 29.429 -at the moment, it is always an instance of 29.430 -\pymodclass{mercurial.localrepo}{localrepository}. Following these two 29.431 -arguments are other keyword arguments. 
Which ones are passed in 29.432 -depends on the hook being called, but a hook can ignore arguments it 29.433 -doesn't care about by dropping them into a keyword argument dict, as 29.434 -with \texttt{**kwargs} above. 29.435 - 29.436 -\section{Some hook examples} 29.437 - 29.438 -\subsection{Writing meaningful commit messages} 29.439 - 29.440 -It's hard to imagine a useful commit message being very short. The 29.441 -simple \hook{pretxncommit} hook of figure~\ref{ex:hook:msglen.go} 29.442 -will prevent you from committing a changeset with a message that is 29.443 -less than ten bytes long. 29.444 - 29.445 -\begin{figure}[ht] 29.446 - \interaction{hook.msglen.go} 29.447 - \caption{A hook that forbids overly short commit messages} 29.448 - \label{ex:hook:msglen.go} 29.449 -\end{figure} 29.450 - 29.451 -\subsection{Checking for trailing whitespace} 29.452 - 29.453 -An interesting use of a commit-related hook is to help you to write 29.454 -cleaner code. A simple example of ``cleaner code'' is the dictum that 29.455 -a change should not add any new lines of text that contain ``trailing 29.456 -whitespace''. Trailing whitespace is a series of space and tab 29.457 -characters at the end of a line of text. In most cases, trailing 29.458 -whitespace is unnecessary, invisible noise, but it is occasionally 29.459 -problematic, and people often prefer to get rid of it. 29.460 - 29.461 -You can use either the \hook{precommit} or \hook{pretxncommit} hook to 29.462 -tell whether you have a trailing whitespace problem. If you use the 29.463 -\hook{precommit} hook, the hook will not know which files you are 29.464 -committing, so it will have to check every modified file in the 29.465 -repository for trailing whitespace. 
If you want to commit a change 29.466 -to just the file \filename{foo}, but the file \filename{bar} contains 29.467 -trailing whitespace, doing a check in the \hook{precommit} hook will 29.468 -prevent you from committing \filename{foo} due to the problem with 29.469 -\filename{bar}. This doesn't seem right. 29.470 - 29.471 -Should you choose the \hook{pretxncommit} hook, the check won't occur 29.472 -until just before the transaction for the commit completes. This will 29.473 -allow you to check for problems in only the exact files that are being 29.474 -committed. However, if you entered the commit message interactively 29.475 -and the hook fails, the transaction will roll back; you'll have to 29.476 -re-enter the commit message after you fix the trailing whitespace and 29.477 -run \hgcmd{commit} again. 29.478 - 29.479 -\begin{figure}[ht] 29.480 - \interaction{hook.ws.simple} 29.481 - \caption{A simple hook that checks for trailing whitespace} 29.482 - \label{ex:hook:ws.simple} 29.483 -\end{figure} 29.484 - 29.485 -Figure~\ref{ex:hook:ws.simple} introduces a simple \hook{pretxncommit} 29.486 -hook that checks for trailing whitespace. This hook is short, but not 29.487 -very helpful. It exits with an error status if a change adds a line 29.488 -with trailing whitespace to any file, but does not print any 29.489 -information that might help us to identify the offending file or 29.490 -line. It does have the nice property of not paying attention to 29.491 -unmodified lines; only lines that introduce new trailing whitespace 29.492 -cause problems. 29.493 - 29.494 -\begin{figure}[ht] 29.495 - \interaction{hook.ws.better} 29.496 - \caption{A better trailing whitespace hook} 29.497 - \label{ex:hook:ws.better} 29.498 -\end{figure} 29.499 - 29.500 -The example of figure~\ref{ex:hook:ws.better} is much more complex, 29.501 -but also more useful. 
It parses a unified diff to see if any lines 29.502 -add trailing whitespace, and prints the name of the file and the line 29.503 -number of each such occurrence. Even better, if the change adds 29.504 -trailing whitespace, this hook saves the commit comment and prints the 29.505 -name of the save file before exiting and telling Mercurial to roll the 29.506 -transaction back, so you can use 29.507 -\hgcmdargs{commit}{\hgopt{commit}{-l}~\emph{filename}} to reuse the 29.508 -saved commit message once you've corrected the problem. 29.509 - 29.510 -As a final aside, note in figure~\ref{ex:hook:ws.better} the use of 29.511 -\command{perl}'s in-place editing feature to get rid of trailing 29.512 -whitespace from a file. This is concise and useful enough that I will 29.513 -reproduce it here. 29.514 -\begin{codesample2} 29.515 - perl -pi -e 's,\textbackslash{}s+\$,,' filename 29.516 -\end{codesample2} 29.517 - 29.518 -\section{Bundled hooks} 29.519 - 29.520 -Mercurial ships with several bundled hooks. You can find them in the 29.521 -\dirname{hgext} directory of a Mercurial source tree. If you are 29.522 -using a Mercurial binary package, the hooks will be located in the 29.523 -\dirname{hgext} directory of wherever your package installer put 29.524 -Mercurial. 29.525 - 29.526 -\subsection{\hgext{acl}---access control for parts of a repository} 29.527 - 29.528 -The \hgext{acl} extension lets you control which remote users are 29.529 -allowed to push changesets to a networked server. You can protect any 29.530 -portion of a repository (including the entire repo), so that a 29.531 -specific remote user can push changes that do not affect the protected 29.532 -portion. 29.533 - 29.534 -This extension implements access control based on the identity of the 29.535 -user performing a push, \emph{not} on who committed the changesets 29.536 -they're pushing. 
It makes sense to use this hook only if you have a 29.537 -locked-down server environment that authenticates remote users, and 29.538 -you want to be sure that only specific users are allowed to push 29.539 -changes to that server. 29.540 - 29.541 -\subsubsection{Configuring the \hook{acl} hook} 29.542 - 29.543 -In order to manage incoming changesets, the \hgext{acl} hook must be 29.544 -used as a \hook{pretxnchangegroup} hook. This lets it see which files 29.545 -are modified by each incoming changeset, and roll back a group of 29.546 -changesets if they modify ``forbidden'' files. Example: 29.547 -\begin{codesample2} 29.548 - [hooks] 29.549 - pretxnchangegroup.acl = python:hgext.acl.hook 29.550 -\end{codesample2} 29.551 - 29.552 -The \hgext{acl} extension is configured using three sections. 29.553 - 29.554 -The \rcsection{acl} section has only one entry, \rcitem{acl}{sources}, 29.555 -which lists the sources of incoming changesets that the hook should 29.556 -pay attention to. You don't normally need to configure this section. 29.557 -\begin{itemize} 29.558 -\item[\rcitem{acl}{serve}] Control incoming changesets that are arriving 29.559 - from a remote repository over http or ssh. This is the default 29.560 - value of \rcitem{acl}{sources}, and usually the only setting you'll 29.561 - need for this configuration item. 29.562 -\item[\rcitem{acl}{pull}] Control incoming changesets that are 29.563 - arriving via a pull from a local repository. 29.564 -\item[\rcitem{acl}{push}] Control incoming changesets that are 29.565 - arriving via a push from a local repository. 29.566 -\item[\rcitem{acl}{bundle}] Control incoming changesets that are 29.567 - arriving from another repository via a bundle. 29.568 -\end{itemize} 29.569 - 29.570 -The \rcsection{acl.allow} section controls the users that are allowed to 29.571 -add changesets to the repository. If this section is not present, all 29.572 -users that are not explicitly denied are allowed. 
If this section is 29.573 -present, all users that are not explicitly allowed are denied (so an 29.574 -empty section means that all users are denied). 29.575 - 29.576 -The \rcsection{acl.deny} section determines which users are denied 29.577 -from adding changesets to the repository. If this section is not 29.578 -present or is empty, no users are denied. 29.579 - 29.580 -The syntaxes for the \rcsection{acl.allow} and \rcsection{acl.deny} 29.581 -sections are identical. On the left of each entry is a glob pattern 29.582 -that matches files or directories, relative to the root of the 29.583 -repository; on the right, a user name. 29.584 - 29.585 -In the following example, the user \texttt{docwriter} can only push 29.586 -changes to the \dirname{docs} subtree of the repository, while 29.587 -\texttt{intern} can push changes to any file or directory except 29.588 -\dirname{source/sensitive}. 29.589 -\begin{codesample2} 29.590 - [acl.allow] 29.591 - docs/** = docwriter 29.592 - 29.593 - [acl.deny] 29.594 - source/sensitive/** = intern 29.595 -\end{codesample2} 29.596 - 29.597 -\subsubsection{Testing and troubleshooting} 29.598 - 29.599 -If you want to test the \hgext{acl} hook, run it with Mercurial's 29.600 -debugging output enabled. Since you'll probably be running it on a 29.601 -server where it's not convenient (or sometimes possible) to pass in 29.602 -the \hggopt{--debug} option, don't forget that you can enable 29.603 -debugging output in your \hgrc: 29.604 -\begin{codesample2} 29.605 - [ui] 29.606 - debug = true 29.607 -\end{codesample2} 29.608 -With this enabled, the \hgext{acl} hook will print enough information 29.609 -to let you figure out why it is allowing or forbidding pushes from 29.610 -specific users. 29.611 - 29.612 -\subsection{\hgext{bugzilla}---integration with Bugzilla} 29.613 - 29.614 -The \hgext{bugzilla} extension adds a comment to a Bugzilla bug 29.615 -whenever it finds a reference to that bug ID in a commit comment. 
You 29.616 -can install this hook on a shared server, so that any time a remote 29.617 -user pushes changes to this server, the hook gets run. 29.618 - 29.619 -It adds a comment to the bug that looks like this (you can configure 29.620 -the contents of the comment---see below): 29.621 -\begin{codesample2} 29.622 - Changeset aad8b264143a, made by Joe User <joe.user@domain.com> in 29.623 - the frobnitz repository, refers to this bug. 29.624 - 29.625 - For complete details, see 29.626 - http://hg.domain.com/frobnitz?cmd=changeset;node=aad8b264143a 29.627 - 29.628 - Changeset description: 29.629 - Fix bug 10483 by guarding against some NULL pointers 29.630 -\end{codesample2} 29.631 -The value of this hook is that it automates the process of updating a 29.632 -bug any time a changeset refers to it. If you configure the hook 29.633 -properly, it makes it easy for people to browse straight from a 29.634 -Bugzilla bug to a changeset that refers to that bug. 29.635 - 29.636 -You can use the code in this hook as a starting point for some more 29.637 -exotic Bugzilla integration recipes. Here are a few possibilities: 29.638 -\begin{itemize} 29.639 -\item Require that every changeset pushed to the server have a valid 29.640 - bug~ID in its commit comment. In this case, you'd want to configure 29.641 - the hook as a \hook{pretxncommit} hook. This would allow the hook 29.642 - to reject changes that didn't contain bug IDs. 29.643 -\item Allow incoming changesets to automatically modify the 29.644 - \emph{state} of a bug, as well as simply adding a comment. For 29.645 - example, the hook could recognise the string ``fixed bug 31337'' as 29.646 - indicating that it should update the state of bug 31337 to 29.647 - ``requires testing''. 
29.648 -\end{itemize} 29.649 - 29.650 -\subsubsection{Configuring the \hook{bugzilla} hook} 29.651 -\label{sec:hook:bugzilla:config} 29.652 - 29.653 -You should configure this hook in your server's \hgrc\ as an 29.654 -\hook{incoming} hook, for example as follows: 29.655 -\begin{codesample2} 29.656 - [hooks] 29.657 - incoming.bugzilla = python:hgext.bugzilla.hook 29.658 -\end{codesample2} 29.659 - 29.660 -Because of the specialised nature of this hook, and because Bugzilla 29.661 -was not written with this kind of integration in mind, configuring 29.662 -this hook is a somewhat involved process. 29.663 - 29.664 -Before you begin, you must install the MySQL bindings for Python on 29.665 -the host(s) where you'll be running the hook. If this is not 29.666 -available as a binary package for your system, you can download it 29.667 -from~\cite{web:mysql-python}. 29.668 - 29.669 -Configuration information for this hook lives in the 29.670 -\rcsection{bugzilla} section of your \hgrc. 29.671 -\begin{itemize} 29.672 -\item[\rcitem{bugzilla}{version}] The version of Bugzilla installed on 29.673 - the server. The database schema that Bugzilla uses changes 29.674 - occasionally, so this hook has to know exactly which schema to use. 29.675 - At the moment, the only version supported is \texttt{2.16}. 29.676 -\item[\rcitem{bugzilla}{host}] The hostname of the MySQL server that 29.677 - stores your Bugzilla data. The database must be configured to allow 29.678 - connections from whatever host you are running the \hook{bugzilla} 29.679 - hook on. 29.680 -\item[\rcitem{bugzilla}{user}] The username with which to connect to 29.681 - the MySQL server. The database must be configured to allow this 29.682 - user to connect from whatever host you are running the 29.683 - \hook{bugzilla} hook on. This user must be able to access and 29.684 - modify Bugzilla tables. 
The default value of this item is 29.685 - \texttt{bugs}, which is the standard name of the Bugzilla user in a 29.686 - MySQL database. 29.687 -\item[\rcitem{bugzilla}{password}] The MySQL password for the user you 29.688 - configured above. This is stored as plain text, so you should make 29.689 - sure that unauthorised users cannot read the \hgrc\ file where you 29.690 - store this information. 29.691 -\item[\rcitem{bugzilla}{db}] The name of the Bugzilla database on the 29.692 - MySQL server. The default value of this item is \texttt{bugs}, 29.693 - which is the standard name of the MySQL database where Bugzilla 29.694 - stores its data. 29.695 -\item[\rcitem{bugzilla}{notify}] If you want Bugzilla to send out a 29.696 - notification email to subscribers after this hook has added a 29.697 - comment to a bug, you will need this hook to run a command whenever 29.698 - it updates the database. The command to run depends on where you 29.699 - have installed Bugzilla, but it will typically look something like 29.700 - this, if you have Bugzilla installed in 29.701 - \dirname{/var/www/html/bugzilla}: 29.702 - \begin{codesample4} 29.703 - cd /var/www/html/bugzilla && ./processmail %s nobody@nowhere.com 29.704 - \end{codesample4} 29.705 - The Bugzilla \texttt{processmail} program expects to be given a 29.706 - bug~ID (the hook replaces ``\texttt{\%s}'' with the bug~ID) and an 29.707 - email address. It also expects to be able to write to some files in 29.708 - the directory that it runs in. If Bugzilla and this hook are not 29.709 - installed on the same machine, you will need to find a way to run 29.710 - \texttt{processmail} on the server where Bugzilla is installed. 29.711 -\end{itemize} 29.712 - 29.713 -\subsubsection{Mapping committer names to Bugzilla user names} 29.714 - 29.715 -By default, the \hgext{bugzilla} hook tries to use the email address 29.716 -of a changeset's committer as the Bugzilla user name with which to 29.717 -update a bug. 
If this does not suit your needs, you can map committer 29.718 -email addresses to Bugzilla user names using a \rcsection{usermap} 29.719 -section. 29.720 - 29.721 -Each item in the \rcsection{usermap} section contains an email address 29.722 -on the left, and a Bugzilla user name on the right. 29.723 -\begin{codesample2} 29.724 - [usermap] 29.725 - jane.user@example.com = jane 29.726 -\end{codesample2} 29.727 -You can either keep the \rcsection{usermap} data in a normal \hgrc, or 29.728 -tell the \hgext{bugzilla} hook to read the information from an 29.729 -external \filename{usermap} file. In the latter case, you can store 29.730 -\filename{usermap} data by itself in (for example) a user-modifiable 29.731 -repository. This makes it possible to let your users maintain their 29.732 -own \rcitem{bugzilla}{usermap} entries. The main \hgrc\ file might 29.733 -look like this: 29.734 -\begin{codesample2} 29.735 - # regular hgrc file refers to external usermap file 29.736 - [bugzilla] 29.737 - usermap = /home/hg/repos/userdata/bugzilla-usermap.conf 29.738 -\end{codesample2} 29.739 -While the \filename{usermap} file that it refers to might look like 29.740 -this: 29.741 -\begin{codesample2} 29.742 - # bugzilla-usermap.conf - inside a hg repository 29.743 - [usermap] 29.744 - stephanie@example.com = steph 29.745 -\end{codesample2} 29.746 - 29.747 -\subsubsection{Configuring the text that gets added to a bug} 29.748 - 29.749 -You can configure the text that this hook adds as a comment; you 29.750 -specify it in the form of a Mercurial template. Several \hgrc\ 29.751 -entries (still in the \rcsection{bugzilla} section) control this 29.752 -behaviour. 29.753 -\begin{itemize} 29.754 -\item[\texttt{strip}] The number of leading path elements to strip 29.755 - from a repository's path name to construct a partial path for a URL. 
29.756 - For example, if the repositories on your server live under 29.757 - \dirname{/home/hg/repos}, and you have a repository whose path is 29.758 - \dirname{/home/hg/repos/app/tests}, then setting \texttt{strip} to 29.759 - \texttt{4} will give a partial path of \dirname{app/tests}. The 29.760 - hook will make this partial path available when expanding a 29.761 - template, as \texttt{webroot}. 29.762 -\item[\texttt{template}] The text of the template to use. In addition 29.763 - to the usual changeset-related variables, this template can use 29.764 - \texttt{hgweb} (the value of the \texttt{hgweb} configuration item 29.765 - above) and \texttt{webroot} (the path constructed using 29.766 - \texttt{strip} above). 29.767 -\end{itemize} 29.768 - 29.769 -In addition, you can add a \rcitem{web}{baseurl} item to the 29.770 -\rcsection{web} section of your \hgrc. The \hgext{bugzilla} hook will 29.771 -make this available when expanding a template, as the base string to 29.772 -use when constructing a URL that will let users browse from a Bugzilla 29.773 -comment to view a changeset. Example: 29.774 -\begin{codesample2} 29.775 - [web] 29.776 - baseurl = http://hg.domain.com/ 29.777 -\end{codesample2} 29.778 - 29.779 -Here is an example set of \hgext{bugzilla} hook config information. 
29.780 -\begin{codesample2} 29.781 - [bugzilla] 29.782 - host = bugzilla.example.com 29.783 - password = mypassword 29.784 - version = 2.16 29.785 - # server-side repos live in /home/hg/repos, so strip 4 leading 29.786 - # separators 29.787 - strip = 4 29.788 - hgweb = http://hg.example.com/ 29.789 - usermap = /home/hg/repos/notify/bugzilla.conf 29.790 - template = Changeset \{node|short\}, made by \{author\} in the \{webroot\} 29.791 - repo, refers to this bug.\\nFor complete details, see 29.792 - \{hgweb\}\{webroot\}?cmd=changeset;node=\{node|short\}\\nChangeset 29.793 - description:\\n\\t\{desc|tabindent\} 29.794 -\end{codesample2} 29.795 - 29.796 -\subsubsection{Testing and troubleshooting} 29.797 - 29.798 -The most common problems with configuring the \hgext{bugzilla} hook 29.799 -relate to running Bugzilla's \filename{processmail} script and mapping 29.800 -committer names to user names. 29.801 - 29.802 -Recall from section~\ref{sec:hook:bugzilla:config} above that the user 29.803 -that runs the Mercurial process on the server is also the one that 29.804 -will run the \filename{processmail} script. The 29.805 -\filename{processmail} script sometimes causes Bugzilla to write to 29.806 -files in its configuration directory, and Bugzilla's configuration 29.807 -files are usually owned by the user that your web server runs under. 29.808 - 29.809 -You can cause \filename{processmail} to be run with the suitable 29.810 -user's identity using the \command{sudo} command. Here is an example 29.811 -entry for a \filename{sudoers} file. 29.812 -\begin{codesample2} 29.813 - hg_user = (httpd_user) NOPASSWD: /var/www/html/bugzilla/processmail-wrapper %s 29.814 -\end{codesample2} 29.815 -This allows the \texttt{hg\_user} user to run a 29.816 -\filename{processmail-wrapper} program under the identity of 29.817 -\texttt{httpd\_user}. 
29.818 - 29.819 -This indirection through a wrapper script is necessary, because 29.820 -\filename{processmail} expects to be run with its current directory 29.821 -set to wherever you installed Bugzilla; you can't specify that kind of 29.822 -constraint in a \filename{sudoers} file. The contents of the wrapper 29.823 -script are simple: 29.824 -\begin{codesample2} 29.825 - #!/bin/sh 29.826 - cd `dirname $0` && ./processmail "$1" nobody@example.com 29.827 -\end{codesample2} 29.828 -It doesn't seem to matter what email address you pass to 29.829 -\filename{processmail}. 29.830 - 29.831 -If your \rcsection{usermap} is not set up correctly, users will see an 29.832 -error message from the \hgext{bugzilla} hook when they push changes 29.833 -to the server. The error message will look like this: 29.834 -\begin{codesample2} 29.835 - cannot find bugzilla user id for john.q.public@example.com 29.836 -\end{codesample2} 29.837 -What this means is that the committer's address, 29.838 -\texttt{john.q.public@example.com}, is not a valid Bugzilla user name, 29.839 -nor does it have an entry in your \rcsection{usermap} that maps it to 29.840 -a valid Bugzilla user name. 29.841 - 29.842 -\subsection{\hgext{notify}---send email notifications} 29.843 - 29.844 -Although Mercurial's built-in web server provides RSS feeds of changes 29.845 -in every repository, many people prefer to receive change 29.846 -notifications via email. The \hgext{notify} hook lets you send out 29.847 -notifications to a set of email addresses whenever changesets arrive 29.848 -that those subscribers are interested in. 29.849 - 29.850 -As with the \hgext{bugzilla} hook, the \hgext{notify} hook is 29.851 -template-driven, so you can customise the contents of the notification 29.852 -messages that it sends. 29.853 - 29.854 -By default, the \hgext{notify} hook includes a diff of every changeset 29.855 -that it sends out; you can limit the size of the diff, or turn this 29.856 -feature off entirely. 
It is useful for letting subscribers review 29.857 -changes immediately, rather than clicking to follow a URL. 29.858 - 29.859 -\subsubsection{Configuring the \hgext{notify} hook} 29.860 - 29.861 -You can set up the \hgext{notify} hook to send one email message per 29.862 -incoming changeset, or one per incoming group of changesets (all those 29.863 -that arrived in a single pull or push). 29.864 -\begin{codesample2} 29.865 - [hooks] 29.866 - # send one email per group of changes 29.867 - changegroup.notify = python:hgext.notify.hook 29.868 - # send one email per change 29.869 - incoming.notify = python:hgext.notify.hook 29.870 -\end{codesample2} 29.871 - 29.872 -Configuration information for this hook lives in the 29.873 -\rcsection{notify} section of a \hgrc\ file. 29.874 -\begin{itemize} 29.875 -\item[\rcitem{notify}{test}] By default, this hook does not send out 29.876 - email at all; instead, it prints the message that it \emph{would} 29.877 - send. Set this item to \texttt{false} to allow email to be sent. 29.878 - The reason that sending of email is turned off by default is that it 29.879 - takes several tries to configure this extension exactly as you would 29.880 - like, and it would be bad form to spam subscribers with a number of 29.881 - ``broken'' notifications while you debug your configuration. 29.882 -\item[\rcitem{notify}{config}] The path to a configuration file that 29.883 - contains subscription information. This is kept separate from the 29.884 - main \hgrc\ so that you can maintain it in a repository of its own. 29.885 - People can then clone that repository, update their subscriptions, 29.886 - and push the changes back to your server. 29.887 -\item[\rcitem{notify}{strip}] The number of leading path separator 29.888 - characters to strip from a repository's path, when deciding whether 29.889 - a repository has subscribers. 
For example, if the repositories on 29.890 - your server live in \dirname{/home/hg/repos}, and \hgext{notify} is 29.891 - considering a repository named \dirname{/home/hg/repos/shared/test}, 29.892 - setting \rcitem{notify}{strip} to \texttt{4} will cause 29.893 - \hgext{notify} to trim the path it considers down to 29.894 - \dirname{shared/test}, and it will match subscribers against that. 29.895 -\item[\rcitem{notify}{template}] The template text to use when sending 29.896 - messages. This specifies both the contents of the message header 29.897 - and its body. 29.898 -\item[\rcitem{notify}{maxdiff}] The maximum number of lines of diff 29.899 - data to append to the end of a message. If a diff is longer than 29.900 - this, it is truncated. By default, this is set to 300. Set this to 29.901 - \texttt{0} to omit diffs from notification emails. 29.902 -\item[\rcitem{notify}{sources}] A list of sources of changesets to 29.903 - consider. This lets you limit \hgext{notify} to only sending out 29.904 - email about changes that remote users pushed into this repository 29.905 - via a server, for example. See section~\ref{sec:hook:sources} for 29.906 - the sources you can specify here. 29.907 -\end{itemize} 29.908 - 29.909 -If you set the \rcitem{web}{baseurl} item in the \rcsection{web} 29.910 -section, you can use it in a template; it will be available as 29.911 -\texttt{webroot}. 29.912 - 29.913 -Here is an example set of \hgext{notify} configuration information. 
29.914 -\begin{codesample2} 29.915 - [notify] 29.916 - # really send email 29.917 - test = false 29.918 - # subscriber data lives in the notify repo 29.919 - config = /home/hg/repos/notify/notify.conf 29.920 - # repos live in /home/hg/repos on server, so strip 4 "/" chars 29.921 - strip = 4 29.922 - template = X-Hg-Repo: \{webroot\} 29.923 - Subject: \{webroot\}: \{desc|firstline|strip\} 29.924 - From: \{author\} 29.925 - 29.926 - changeset \{node|short\} in \{root\} 29.927 - details: \{baseurl\}\{webroot\}?cmd=changeset;node=\{node|short\} 29.928 - description: 29.929 - \{desc|tabindent|strip\} 29.930 - 29.931 - [web] 29.932 - baseurl = http://hg.example.com/ 29.933 -\end{codesample2} 29.934 - 29.935 -This will produce a message that looks like the following: 29.936 -\begin{codesample2} 29.937 - X-Hg-Repo: tests/slave 29.938 - Subject: tests/slave: Handle error case when slave has no buffers 29.939 - Date: Wed, 2 Aug 2006 15:25:46 -0700 (PDT) 29.940 - 29.941 - changeset 3cba9bfe74b5 in /home/hg/repos/tests/slave 29.942 - details: http://hg.example.com/tests/slave?cmd=changeset;node=3cba9bfe74b5 29.943 - description: 29.944 - Handle error case when slave has no buffers 29.945 - diffs (54 lines): 29.946 - 29.947 - diff -r 9d95df7cf2ad -r 3cba9bfe74b5 include/tests.h 29.948 - --- a/include/tests.h Wed Aug 02 15:19:52 2006 -0700 29.949 - +++ b/include/tests.h Wed Aug 02 15:25:26 2006 -0700 29.950 - @@ -212,6 +212,15 @@ static __inline__ void test_headers(void *h) 29.951 - [...snip...] 29.952 -\end{codesample2} 29.953 - 29.954 -\subsubsection{Testing and troubleshooting} 29.955 - 29.956 -Do not forget that by default, the \hgext{notify} extension \emph{will 29.957 - not send any mail} until you explicitly configure it to do so, by 29.958 -setting \rcitem{notify}{test} to \texttt{false}. Until you do that, 29.959 -it simply prints the message it \emph{would} send. 
29.960 - 29.961 -\section{Information for writers of hooks} 29.962 -\label{sec:hook:ref} 29.963 - 29.964 -\subsection{In-process hook execution} 29.965 - 29.966 -An in-process hook is called with arguments of the following form: 29.967 -\begin{codesample2} 29.968 - def myhook(ui, repo, **kwargs): 29.969 - pass 29.970 -\end{codesample2} 29.971 -The \texttt{ui} parameter is a \pymodclass{mercurial.ui}{ui} object. 29.972 -The \texttt{repo} parameter is a 29.973 -\pymodclass{mercurial.localrepo}{localrepository} object. The 29.974 -names and values of the \texttt{**kwargs} parameters depend on the 29.975 -hook being invoked, with the following common features: 29.976 -\begin{itemize} 29.977 -\item If a parameter is named \texttt{node} or 29.978 - \texttt{parent\emph{N}}, it will contain a hexadecimal changeset ID. 29.979 - The empty string is used to represent ``null changeset ID'' instead 29.980 - of a string of zeroes. 29.981 -\item If a parameter is named \texttt{url}, it will contain the URL of 29.982 - a remote repository, if that can be determined. 29.983 -\item Boolean-valued parameters are represented as Python 29.984 - \texttt{bool} objects. 29.985 -\end{itemize} 29.986 - 29.987 -An in-process hook is called without a change to the process's working 29.988 -directory (unlike external hooks, which are run in the root of the 29.989 -repository). It must not change the process's working directory, or 29.990 -it will cause any calls it makes into the Mercurial API to fail. 29.991 - 29.992 -If a hook returns a boolean ``false'' value, it is considered to have 29.993 -succeeded. If it returns a boolean ``true'' value or raises an 29.994 -exception, it is considered to have failed. A useful way to think of 29.995 -the calling convention is ``tell me if you fail''. 29.996 - 29.997 -Note that changeset IDs are passed into Python hooks as hexadecimal 29.998 -strings, not the binary hashes that Mercurial's APIs normally use. 
To 29.999 -convert a hash from hex to binary, use the 29.1000 -\pymodfunc{mercurial.node}{bin} function. 29.1001 - 29.1002 -\subsection{External hook execution} 29.1003 - 29.1004 -An external hook is passed to the shell of the user running Mercurial. 29.1005 -Features of that shell, such as variable substitution and command 29.1006 -redirection, are available. The hook is run in the root directory of 29.1007 -the repository (unlike in-process hooks, which are run in the same 29.1008 -directory that Mercurial was run in). 29.1009 - 29.1010 -Hook parameters are passed to the hook as environment variables. Each 29.1011 -parameter's name is converted to upper case and prefixed 29.1012 -with the string ``\texttt{HG\_}''. For example, if the name of a 29.1013 -parameter is ``\texttt{node}'', the name of the environment variable 29.1014 -representing that parameter will be ``\texttt{HG\_NODE}''. 29.1015 - 29.1016 -A boolean parameter is represented as the string ``\texttt{1}'' for 29.1017 -``true'' and ``\texttt{0}'' for ``false''. If an environment variable is 29.1018 -named \envar{HG\_NODE}, \envar{HG\_PARENT1} or \envar{HG\_PARENT2}, it 29.1019 -contains a changeset ID represented as a hexadecimal string. The 29.1020 -empty string is used to represent ``null changeset ID'' instead of a 29.1021 -string of zeroes. If an environment variable is named 29.1022 -\envar{HG\_URL}, it will contain the URL of a remote repository, if 29.1023 -that can be determined. 29.1024 - 29.1025 -If a hook exits with a status of zero, it is considered to have 29.1026 -succeeded. If it exits with a non-zero status, it is considered to 29.1027 -have failed. 29.1028 - 29.1029 -\subsection{Finding out where changesets come from} 29.1030 - 29.1031 -A hook that involves the transfer of changesets between a local 29.1032 -repository and another may be able to find out information about the 29.1033 -``far side''. 
Mercurial knows \emph{how} changes are being 29.1034 -transferred, and in many cases \emph{where} they are being transferred 29.1035 -to or from. 29.1036 - 29.1037 -\subsubsection{Sources of changesets} 29.1038 -\label{sec:hook:sources} 29.1039 - 29.1040 -Mercurial will tell a hook what means are, or were, used to transfer 29.1041 -changesets between repositories. This is provided by Mercurial in a 29.1042 -Python parameter named \texttt{source}, or an environment variable named 29.1043 -\envar{HG\_SOURCE}. 29.1044 - 29.1045 -\begin{itemize} 29.1046 -\item[\texttt{serve}] Changesets are transferred to or from a remote 29.1047 - repository over http or ssh. 29.1048 -\item[\texttt{pull}] Changesets are being transferred via a pull from 29.1049 - one repository into another. 29.1050 -\item[\texttt{push}] Changesets are being transferred via a push from 29.1051 - one repository into another. 29.1052 -\item[\texttt{bundle}] Changesets are being transferred to or from a 29.1053 - bundle. 29.1054 -\end{itemize} 29.1055 - 29.1056 -\subsubsection{Where changes are going---remote repository URLs} 29.1057 -\label{sec:hook:url} 29.1058 - 29.1059 -When possible, Mercurial will tell a hook the location of the ``far 29.1060 -side'' of an activity that transfers changeset data between 29.1061 -repositories. This is provided by Mercurial in a Python parameter 29.1062 -named \texttt{url}, or an environment variable named \envar{HG\_URL}. 29.1063 - 29.1064 -This information is not always known. If a hook is invoked in a 29.1065 -repository that is being served via http or ssh, Mercurial cannot tell 29.1066 -where the remote repository is, but it may know where the client is 29.1067 -connecting from. In such cases, the URL will take one of the 29.1068 -following forms: 29.1069 -\begin{itemize} 29.1070 -\item \texttt{remote:ssh:\emph{ip-address}}---remote ssh client, at 29.1071 - the given IP address. 
29.1072 -\item \texttt{remote:http:\emph{ip-address}}---remote http client, at 29.1073 - the given IP address. If the client is using SSL, this will be of 29.1074 - the form \texttt{remote:https:\emph{ip-address}}. 29.1075 -\item Empty---no information could be discovered about the remote 29.1076 - client. 29.1077 -\end{itemize} 29.1078 - 29.1079 -\section{Hook reference} 29.1080 - 29.1081 -\subsection{\hook{changegroup}---after remote changesets added} 29.1082 -\label{sec:hook:changegroup} 29.1083 - 29.1084 -This hook is run after a group of pre-existing changesets has been 29.1085 -added to the repository, for example via a \hgcmd{pull} or 29.1086 -\hgcmd{unbundle}. This hook is run once per operation that added one 29.1087 -or more changesets. This is in contrast to the \hook{incoming} hook, 29.1088 -which is run once per changeset, regardless of whether the changesets 29.1089 -arrive in a group. 29.1090 - 29.1091 -Some possible uses for this hook include kicking off an automated 29.1092 -build or test of the added changesets, updating a bug database, or 29.1093 -notifying subscribers that a repository contains new changes. 29.1094 - 29.1095 -Parameters to this hook: 29.1096 -\begin{itemize} 29.1097 -\item[\texttt{node}] A changeset ID. The changeset ID of the first 29.1098 - changeset in the group that was added. All changesets between this 29.1099 - and \index{tags!\texttt{tip}}\texttt{tip}, inclusive, were added by 29.1100 - a single \hgcmd{pull}, \hgcmd{push} or \hgcmd{unbundle}. 29.1101 -\item[\texttt{source}] A string. The source of these changes. See 29.1102 - section~\ref{sec:hook:sources} for details. 29.1103 -\item[\texttt{url}] A URL. The location of the remote repository, if 29.1104 - known. See section~\ref{sec:hook:url} for more information. 
29.1105 -\end{itemize} 29.1106 - 29.1107 -See also: \hook{incoming} (section~\ref{sec:hook:incoming}), 29.1108 -\hook{prechangegroup} (section~\ref{sec:hook:prechangegroup}), 29.1109 -\hook{pretxnchangegroup} (section~\ref{sec:hook:pretxnchangegroup}) 29.1110 - 29.1111 -\subsection{\hook{commit}---after a new changeset is created} 29.1112 -\label{sec:hook:commit} 29.1113 - 29.1114 -This hook is run after a new changeset has been created. 29.1115 - 29.1116 -Parameters to this hook: 29.1117 -\begin{itemize} 29.1118 -\item[\texttt{node}] A changeset ID. The changeset ID of the newly 29.1119 - committed changeset. 29.1120 -\item[\texttt{parent1}] A changeset ID. The changeset ID of the first 29.1121 - parent of the newly committed changeset. 29.1122 -\item[\texttt{parent2}] A changeset ID. The changeset ID of the second 29.1123 - parent of the newly committed changeset. 29.1124 -\end{itemize} 29.1125 - 29.1126 -See also: \hook{precommit} (section~\ref{sec:hook:precommit}), 29.1127 -\hook{pretxncommit} (section~\ref{sec:hook:pretxncommit}) 29.1128 - 29.1129 -\subsection{\hook{incoming}---after one remote changeset is added} 29.1130 -\label{sec:hook:incoming} 29.1131 - 29.1132 -This hook is run after a pre-existing changeset has been added to the 29.1133 -repository, for example via a \hgcmd{push}. If a group of changesets 29.1134 -was added in a single operation, this hook is called once for each 29.1135 -added changeset. 29.1136 - 29.1137 -You can use this hook for the same purposes as the \hook{changegroup} 29.1138 -hook (section~\ref{sec:hook:changegroup}); it's simply more convenient 29.1139 -sometimes to run a hook once per group of changesets, while other 29.1140 -times it's handier once per changeset. 29.1141 - 29.1142 -Parameters to this hook: 29.1143 -\begin{itemize} 29.1144 -\item[\texttt{node}] A changeset ID. The ID of the newly added 29.1145 - changeset. 29.1146 -\item[\texttt{source}] A string. The source of these changes. 
See 29.1147 - section~\ref{sec:hook:sources} for details. 29.1148 -\item[\texttt{url}] A URL. The location of the remote repository, if 29.1149 - known. See section~\ref{sec:hook:url} for more information. 29.1150 -\end{itemize} 29.1151 - 29.1152 -See also: \hook{changegroup} (section~\ref{sec:hook:changegroup}), \hook{prechangegroup} (section~\ref{sec:hook:prechangegroup}), \hook{pretxnchangegroup} (section~\ref{sec:hook:pretxnchangegroup}) 29.1153 - 29.1154 -\subsection{\hook{outgoing}---after changesets are propagated} 29.1155 -\label{sec:hook:outgoing} 29.1156 - 29.1157 -This hook is run after a group of changesets has been propagated out 29.1158 -of this repository, for example by a \hgcmd{push} or \hgcmd{bundle} 29.1159 -command. 29.1160 - 29.1161 -One possible use for this hook is to notify administrators that 29.1162 -changes have been pulled. 29.1163 - 29.1164 -Parameters to this hook: 29.1165 -\begin{itemize} 29.1166 -\item[\texttt{node}] A changeset ID. The changeset ID of the first 29.1167 - changeset of the group that was sent. 29.1168 -\item[\texttt{source}] A string. The source of the operation 29.1169 - (see section~\ref{sec:hook:sources}). If a remote client pulled 29.1170 - changes from this repository, \texttt{source} will be 29.1171 - \texttt{serve}. If the client that obtained changes from this 29.1172 - repository was local, \texttt{source} will be \texttt{bundle}, 29.1173 - \texttt{pull}, or \texttt{push}, depending on the operation the 29.1174 - client performed. 29.1175 -\item[\texttt{url}] A URL. The location of the remote repository, if 29.1176 - known. See section~\ref{sec:hook:url} for more information. 
29.1177 -\end{itemize} 29.1178 - 29.1179 -See also: \hook{preoutgoing} (section~\ref{sec:hook:preoutgoing}) 29.1180 - 29.1181 -\subsection{\hook{prechangegroup}---before starting to add remote changesets} 29.1182 -\label{sec:hook:prechangegroup} 29.1183 - 29.1184 -This controlling hook is run before Mercurial begins to add a group of 29.1185 -changesets from another repository. 29.1186 - 29.1187 -This hook does not have any information about the changesets to be 29.1188 -added, because it is run before transmission of those changesets is 29.1189 -allowed to begin. If this hook fails, the changesets will not be 29.1190 -transmitted. 29.1191 - 29.1192 -One use for this hook is to prevent external changes from being added 29.1193 -to a repository. For example, you could use this to ``freeze'' a 29.1194 -server-hosted branch temporarily or permanently so that users cannot 29.1195 -push to it, while still allowing a local administrator to modify the 29.1196 -repository. 29.1197 - 29.1198 -Parameters to this hook: 29.1199 -\begin{itemize} 29.1200 -\item[\texttt{source}] A string. The source of these changes. See 29.1201 - section~\ref{sec:hook:sources} for details. 29.1202 -\item[\texttt{url}] A URL. The location of the remote repository, if 29.1203 - known. See section~\ref{sec:hook:url} for more information. 29.1204 -\end{itemize} 29.1205 - 29.1206 -See also: \hook{changegroup} (section~\ref{sec:hook:changegroup}), 29.1207 -\hook{incoming} (section~\ref{sec:hook:incoming}), 29.1208 -\hook{pretxnchangegroup} (section~\ref{sec:hook:pretxnchangegroup}) 29.1209 - 29.1210 -\subsection{\hook{precommit}---before starting to commit a changeset} 29.1211 -\label{sec:hook:precommit} 29.1212 - 29.1213 -This hook is run before Mercurial begins to commit a new changeset. 29.1214 -It is run before Mercurial has any of the metadata for the commit, 29.1215 -such as the files to be committed, the commit message, or the commit 29.1216 -date. 
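As a rough illustration, an in-process \hook{precommit} hook can reach a decision using nothing but its \texttt{parent1} and \texttt{parent2} parameters. The sketch below refuses to start a commit when the working directory has two parents. The function name \texttt{forbid\_merge\_commit} is hypothetical, and the code assumes that \texttt{ui.warn} accepts an ordinary string, as it did in Python~2 versions of Mercurial; newer versions may expect a byte string.

```python
def forbid_merge_commit(ui, repo, **kwargs):
    """Hypothetical precommit hook: refuse to start a merge commit."""
    # parent2 is the empty string when the working directory has only
    # one parent, so any non-empty value means a merge is in progress.
    if kwargs.get('parent2'):
        # Assumption: ui.warn takes a plain string here.
        ui.warn('merge commits are not allowed in this repository\n')
        return True   # a true value tells Mercurial that the hook failed
    return False      # a false value means the hook succeeded
```

Such a hook could be enabled with an \hgrc\ entry like \texttt{precommit.nomerge = python:myhooks.forbid\_merge\_commit}, where \texttt{myhooks} is a made-up module name.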
29.1217 - 29.1218 -One use for this hook is to disable the ability to commit new 29.1219 -changesets, while still allowing incoming changesets. Another is to 29.1220 -run a build or test, and only allow the commit to begin if the build 29.1221 -or test succeeds. 29.1222 - 29.1223 -Parameters to this hook: 29.1224 -\begin{itemize} 29.1225 -\item[\texttt{parent1}] A changeset ID. The changeset ID of the first 29.1226 - parent of the working directory. 29.1227 -\item[\texttt{parent2}] A changeset ID. The changeset ID of the second 29.1228 - parent of the working directory. 29.1229 -\end{itemize} 29.1230 -If the commit proceeds, the parents of the working directory will 29.1231 -become the parents of the new changeset. 29.1232 - 29.1233 -See also: \hook{commit} (section~\ref{sec:hook:commit}), 29.1234 -\hook{pretxncommit} (section~\ref{sec:hook:pretxncommit}) 29.1235 - 29.1236 -\subsection{\hook{preoutgoing}---before starting to propagate changesets} 29.1237 -\label{sec:hook:preoutgoing} 29.1238 - 29.1239 -This hook is invoked before Mercurial knows the identities of the 29.1240 -changesets to be transmitted. 29.1241 - 29.1242 -One use for this hook is to prevent changes from being transmitted to 29.1243 -another repository. 29.1244 - 29.1245 -Parameters to this hook: 29.1246 -\begin{itemize} 29.1247 -\item[\texttt{source}] A string. The source of the operation that is 29.1248 - attempting to obtain changes from this repository (see 29.1249 - section~\ref{sec:hook:sources}). See the documentation for the 29.1250 - \texttt{source} parameter to the \hook{outgoing} hook, in 29.1251 - section~\ref{sec:hook:outgoing}, for possible values of this 29.1252 - parameter. 29.1253 -\item[\texttt{url}] A URL. The location of the remote repository, if 29.1254 - known. See section~\ref{sec:hook:url} for more information. 
29.1255 -\end{itemize} 29.1256 - 29.1257 -See also: \hook{outgoing} (section~\ref{sec:hook:outgoing}) 29.1258 - 29.1259 -\subsection{\hook{pretag}---before tagging a changeset} 29.1260 -\label{sec:hook:pretag} 29.1261 - 29.1262 -This controlling hook is run before a tag is created. If the hook 29.1263 -succeeds, creation of the tag proceeds. If the hook fails, the tag is 29.1264 -not created. 29.1265 - 29.1266 -Parameters to this hook: 29.1267 -\begin{itemize} 29.1268 -\item[\texttt{local}] A boolean. Whether the tag is local to this 29.1269 - repository instance (i.e.~stored in \sfilename{.hg/localtags}) or 29.1270 - managed by Mercurial (stored in \sfilename{.hgtags}). 29.1271 -\item[\texttt{node}] A changeset ID. The ID of the changeset to be tagged. 29.1272 -\item[\texttt{tag}] A string. The name of the tag to be created. 29.1273 -\end{itemize} 29.1274 - 29.1275 -If the tag to be created is revision-controlled, the \hook{precommit} 29.1276 -and \hook{pretxncommit} hooks (sections~\ref{sec:hook:precommit} 29.1277 -and~\ref{sec:hook:pretxncommit}) will also be run. 29.1278 - 29.1279 -See also: \hook{tag} (section~\ref{sec:hook:tag}) 29.1280 - 29.1281 -\subsection{\hook{pretxnchangegroup}---before completing addition of 29.1282 - remote changesets} 29.1283 -\label{sec:hook:pretxnchangegroup} 29.1284 - 29.1285 -This controlling hook is run before the transaction that manages the 29.1286 -addition of a group of new changesets from outside the 29.1287 -repository completes. If the hook succeeds, the transaction 29.1288 -completes, and all of the changesets become permanent within this 29.1289 -repository. If the hook fails, the transaction is rolled back, and 29.1290 -the data for the changesets is erased. 29.1291 - 29.1292 -This hook can access the metadata associated with the almost-added 29.1293 -changesets, but it should not do anything permanent with this data. 29.1294 -It must also not modify the working directory. 
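A check of this kind only needs to read changeset metadata. The following sketch rejects an incoming group if any changeset description lacks a bug~ID. The names \texttt{require\_bug\_id} and \texttt{BUG\_RE} are hypothetical, and so is the ``bug \emph{number}'' convention. The code also assumes that \texttt{repo[...]} returns a changeset context whose \texttt{description()} is an ordinary string; newer Mercurial versions return byte strings, in which case the pattern and the message would need to be bytes as well.

```python
import re

# Hypothetical convention: a description must mention "bug <number>".
BUG_RE = re.compile(r'\bbug\s+\d+', re.IGNORECASE)

def require_bug_id(ui, repo, node=None, **kwargs):
    """Hypothetical pretxnchangegroup hook: every description needs a bug ID."""
    # node identifies the first changeset of the group; everything from
    # there up to tip arrived in the same operation.
    first = repo[node].rev()
    for rev in range(first, len(repo)):
        desc = repo[rev].description()
        if not BUG_RE.search(desc):
            ui.warn('changeset %d has no bug ID in its description\n' % rev)
            return True   # failure: the whole transaction is rolled back
    return False          # success: the changesets become permanent
```

The hook only inspects descriptions and returns a status, so it neither records anything permanently nor touches the working directory.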
29.1295 - 29.1296 -While this hook is running, if other Mercurial processes access this 29.1297 -repository, they will be able to see the almost-added changesets as if 29.1298 -they are permanent. This may lead to race conditions if you do not 29.1299 -take steps to avoid them. 29.1300 - 29.1301 -This hook can be used to automatically vet a group of changesets. If 29.1302 -the hook fails, all of the changesets are ``rejected'' when the 29.1303 -transaction rolls back. 29.1304 - 29.1305 -Parameters to this hook: 29.1306 -\begin{itemize} 29.1307 -\item[\texttt{node}] A changeset ID. The changeset ID of the first 29.1308 - changeset in the group that was added. All changesets between this 29.1309 - and \index{tags!\texttt{tip}}\texttt{tip}, inclusive, were added by 29.1310 - a single \hgcmd{pull}, \hgcmd{push} or \hgcmd{unbundle}. 29.1311 -\item[\texttt{source}] A string. The source of these changes. See 29.1312 - section~\ref{sec:hook:sources} for details. 29.1313 -\item[\texttt{url}] A URL. The location of the remote repository, if 29.1314 - known. See section~\ref{sec:hook:url} for more information. 29.1315 -\end{itemize} 29.1316 - 29.1317 -See also: \hook{changegroup} (section~\ref{sec:hook:changegroup}), 29.1318 -\hook{incoming} (section~\ref{sec:hook:incoming}), 29.1319 -\hook{prechangegroup} (section~\ref{sec:hook:prechangegroup}) 29.1320 - 29.1321 -\subsection{\hook{pretxncommit}---before completing commit of new changeset} 29.1322 -\label{sec:hook:pretxncommit} 29.1323 - 29.1324 -This controlling hook is run before a transaction---that manages a new 29.1325 -commit---completes. If the hook succeeds, the transaction completes 29.1326 -and the changeset becomes permanent within this repository. If the 29.1327 -hook fails, the transaction is rolled back, and the commit data is 29.1328 -erased. 29.1329 - 29.1330 -This hook can access the metadata associated with the almost-new 29.1331 -changeset, but it should not do anything permanent with this data. 
It 29.1332 -must also not modify the working directory. 29.1333 - 29.1334 -While this hook is running, if other Mercurial processes access this 29.1335 -repository, they will be able to see the almost-new changeset as if it 29.1336 -is permanent. This may lead to race conditions if you do not take 29.1337 -steps to avoid them. 29.1338 - 29.1339 -Parameters to this hook: 29.1340 -\begin{itemize} 29.1341 -\item[\texttt{node}] A changeset ID. The changeset ID of the newly 29.1342 - committed changeset. 29.1343 -\item[\texttt{parent1}] A changeset ID. The changeset ID of the first 29.1344 - parent of the newly committed changeset. 29.1345 -\item[\texttt{parent2}] A changeset ID. The changeset ID of the second 29.1346 - parent of the newly committed changeset. 29.1347 -\end{itemize} 29.1348 - 29.1349 -See also: \hook{precommit} (section~\ref{sec:hook:precommit}) 29.1350 - 29.1351 -\subsection{\hook{preupdate}---before updating or merging working directory} 29.1352 -\label{sec:hook:preupdate} 29.1353 - 29.1354 -This controlling hook is run before an update or merge of the working 29.1355 -directory begins. It is run only if Mercurial's normal pre-update 29.1356 -checks determine that the update or merge can proceed. If the hook 29.1357 -succeeds, the update or merge may proceed; if it fails, the update or 29.1358 -merge does not start. 29.1359 - 29.1360 -Parameters to this hook: 29.1361 -\begin{itemize} 29.1362 -\item[\texttt{parent1}] A changeset ID. The ID of the parent that the 29.1363 - working directory is to be updated to. If the working directory is 29.1364 - being merged, it will not change this parent. 29.1365 -\item[\texttt{parent2}] A changeset ID. Only set if the working 29.1366 - directory is being merged. The ID of the revision that the working 29.1367 - directory is being merged with. 
29.1368 -\end{itemize} 29.1369 - 29.1370 -See also: \hook{update} (section~\ref{sec:hook:update}) 29.1371 - 29.1372 -\subsection{\hook{tag}---after tagging a changeset} 29.1373 -\label{sec:hook:tag} 29.1374 - 29.1375 -This hook is run after a tag has been created. 29.1376 - 29.1377 -Parameters to this hook: 29.1378 -\begin{itemize} 29.1379 -\item[\texttt{local}] A boolean. Whether the new tag is local to this 29.1380 - repository instance (i.e.~stored in \sfilename{.hg/localtags}) or 29.1381 - managed by Mercurial (stored in \sfilename{.hgtags}). 29.1382 -\item[\texttt{node}] A changeset ID. The ID of the changeset that was 29.1383 - tagged. 29.1384 -\item[\texttt{tag}] A string. The name of the tag that was created. 29.1385 -\end{itemize} 29.1386 - 29.1387 -If the created tag is revision-controlled, the \hook{commit} hook 29.1388 -(section~\ref{sec:hook:commit}) is run before this hook. 29.1389 - 29.1390 -See also: \hook{pretag} (section~\ref{sec:hook:pretag}) 29.1391 - 29.1392 -\subsection{\hook{update}---after updating or merging working directory} 29.1393 -\label{sec:hook:update} 29.1394 - 29.1395 -This hook is run after an update or merge of the working directory 29.1396 -completes. Since a merge can fail (if the external \command{hgmerge} 29.1397 -command fails to resolve conflicts in a file), this hook communicates 29.1398 -whether the update or merge completed cleanly. 29.1399 - 29.1400 -\begin{itemize} 29.1401 -\item[\texttt{error}] A boolean. Indicates whether the update or 29.1402 - merge completed successfully. 29.1403 -\item[\texttt{parent1}] A changeset ID. The ID of the parent that the 29.1404 - working directory was updated to. If the working directory was 29.1405 - merged, it will not have changed this parent. 29.1406 -\item[\texttt{parent2}] A changeset ID. Only set if the working 29.1407 - directory was merged. The ID of the revision that the working 29.1408 - directory was merged with. 
29.1409 -\end{itemize} 29.1410 - 29.1411 -See also: \hook{preupdate} (section~\ref{sec:hook:preupdate}) 29.1412 - 29.1413 -%%% Local Variables: 29.1414 -%%% mode: latex 29.1415 -%%% TeX-master: "00book" 29.1416 -%%% End:
30.1 --- a/en/intro.tex Thu Jan 29 22:47:34 2009 -0800 30.2 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 30.3 @@ -1,561 +0,0 @@ 30.4 -\chapter{Introduction} 30.5 -\label{chap:intro} 30.6 - 30.7 -\section{About revision control} 30.8 - 30.9 -Revision control is the process of managing multiple versions of a 30.10 -piece of information. In its simplest form, this is something that 30.11 -many people do by hand: every time you modify a file, save it under a 30.12 -new name that contains a number, each one higher than the number of 30.13 -the preceding version. 30.14 - 30.15 -Manually managing multiple versions of even a single file is an 30.16 -error-prone task, though, so software tools to help automate this 30.17 -process have long been available. The earliest automated revision 30.18 -control tools were intended to help a single user to manage revisions 30.19 -of a single file. Over the past few decades, the scope of revision 30.20 -control tools has expanded greatly; they now manage multiple files, 30.21 -and help multiple people to work together. The best modern revision 30.22 -control tools have no problem coping with thousands of people working 30.23 -together on projects that consist of hundreds of thousands of files. 30.24 - 30.25 -\subsection{Why use revision control?} 30.26 - 30.27 -There are a number of reasons why you or your team might want to use 30.28 -an automated revision control tool for a project. 30.29 -\begin{itemize} 30.30 -\item It will track the history and evolution of your project, so you 30.31 - don't have to. For every change, you'll have a log of \emph{who} 30.32 - made it; \emph{why} they made it; \emph{when} they made it; and 30.33 - \emph{what} the change was. 30.34 -\item When you're working with other people, revision control software 30.35 - makes it easier for you to collaborate. 
For example, when people 30.36 - more or less simultaneously make potentially incompatible changes, 30.37 - the software will help you to identify and resolve those conflicts. 30.38 -\item It can help you to recover from mistakes. If you make a change 30.39 - that later turns out to be in error, you can revert to an earlier 30.40 - version of one or more files. In fact, a \emph{really} good 30.41 - revision control tool will even help you to efficiently figure out 30.42 - exactly when a problem was introduced (see 30.43 - section~\ref{sec:undo:bisect} for details). 30.44 -\item It will help you to work simultaneously on, and manage the drift 30.45 - between, multiple versions of your project. 30.46 -\end{itemize} 30.47 -Most of these reasons are equally valid---at least in theory---whether 30.48 -you're working on a project by yourself, or with a hundred other 30.49 -people. 30.50 - 30.51 -A key question about the practicality of revision control at these two 30.52 -different scales (``lone hacker'' and ``huge team'') is how its 30.53 -\emph{benefits} compare to its \emph{costs}. A revision control tool 30.54 -that's difficult to understand or use is going to impose a high cost. 30.55 - 30.56 -A five-hundred-person project is likely to collapse under its own 30.57 -weight almost immediately without a revision control tool and process. 30.58 -In this case, the cost of using revision control might hardly seem 30.59 -worth considering, since \emph{without} it, failure is almost 30.60 -guaranteed. 30.61 - 30.62 -On the other hand, a one-person ``quick hack'' might seem like a poor 30.63 -place to use a revision control tool, because surely the cost of using 30.64 -one must be close to the overall cost of the project. Right? 30.65 - 30.66 -Mercurial uniquely supports \emph{both} of these scales of 30.67 -development. 
You can learn the basics in just a few minutes, and due 30.68 -to its low overhead, you can apply revision control to the smallest of 30.69 -projects with ease. Its simplicity means you won't have a lot of 30.70 -abstruse concepts or command sequences competing for mental space with 30.71 -whatever you're \emph{really} trying to do. At the same time, 30.72 -Mercurial's high performance and peer-to-peer nature let you scale 30.73 -painlessly to handle large projects. 30.74 - 30.75 -No revision control tool can rescue a poorly run project, but a good 30.76 -choice of tools can make a huge difference to the fluidity with which 30.77 -you can work on a project. 30.78 - 30.79 -\subsection{The many names of revision control} 30.80 - 30.81 -Revision control is a diverse field, so much so that it doesn't 30.82 -actually have a single name or acronym. Here are a few of the more 30.83 -common names and acronyms you'll encounter: 30.84 -\begin{itemize} 30.85 -\item Revision control (RCS) 30.86 -\item Software configuration management (SCM), or configuration management 30.87 -\item Source code management 30.88 -\item Source code control, or source control 30.89 -\item Version control (VCS) 30.90 -\end{itemize} 30.91 -Some people claim that these terms actually have different meanings, 30.92 -but in practice they overlap so much that there's no agreed or even 30.93 -useful way to tease them apart. 30.94 - 30.95 -\section{A short history of revision control} 30.96 - 30.97 -The best known of the old-time revision control tools is SCCS (Source 30.98 -Code Control System), which Marc Rochkind wrote at Bell Labs, in the 30.99 -early 1970s. SCCS operated on individual files, and required every 30.100 -person working on a project to have access to a shared workspace on a 30.101 -single system. Only one person could modify a file at any time; 30.102 -arbitration for access to files was via locks. 
It was common for 30.103 -people to lock files, and later forget to unlock them, preventing 30.104 -anyone else from modifying those files without the help of an 30.105 -administrator. 30.106 - 30.107 -Walter Tichy developed a free alternative to SCCS in the early 1980s; 30.108 -he called his program RCS (Revision Control System). Like SCCS, RCS 30.109 -required developers to work in a single shared workspace, and to lock 30.110 -files to prevent multiple people from modifying them simultaneously. 30.111 - 30.112 -Later in the 1980s, Dick Grune used RCS as a building block for a set 30.113 -of shell scripts he initially called cmt, but then renamed to CVS 30.114 -(Concurrent Versions System). The big innovation of CVS was that it 30.115 -let developers work simultaneously and somewhat independently in their 30.116 -own personal workspaces. The personal workspaces prevented developers 30.117 -from stepping on each other's toes all the time, as was common with 30.118 -SCCS and RCS. Each developer had a copy of every project file, and 30.119 -could modify their copies independently. They had to merge their 30.120 -edits prior to committing changes to the central repository. 30.121 - 30.122 -Brian Berliner took Grune's original scripts and rewrote them in~C, 30.123 -releasing in 1989 the code that has since developed into the modern 30.124 -version of CVS. CVS subsequently acquired the ability to operate over 30.125 -a network connection, giving it a client/server architecture. CVS's 30.126 -architecture is centralised; only the server has a copy of the history 30.127 -of the project. Client workspaces just contain copies of recent 30.128 -versions of the project's files, and a little metadata to tell them 30.129 -where the server is. CVS has been enormously successful; it is 30.130 -probably the world's most widely used revision control system. 
30.131 - 30.132 -In the early 1990s, Sun Microsystems developed an early distributed 30.133 -revision control system, called TeamWare. A TeamWare workspace 30.134 -contains a complete copy of the project's history. TeamWare has no 30.135 -notion of a central repository. (CVS relied upon RCS for its history 30.136 -storage; TeamWare used SCCS.) 30.137 - 30.138 -As the 1990s progressed, awareness grew of a number of problems with 30.139 -CVS. It records simultaneous changes to multiple files individually, 30.140 -instead of grouping them together as a single logically atomic 30.141 -operation. It does not manage its file hierarchy well; it is easy to 30.142 -make a mess of a repository by renaming files and directories. Worse, 30.143 -its source code is difficult to read and maintain, which made the 30.144 -``pain level'' of fixing these architectural problems prohibitive. 30.145 - 30.146 -In 2001, Jim Blandy and Karl Fogel, two developers who had worked on 30.147 -CVS, started a project to replace it with a tool that would have a 30.148 -better architecture and cleaner code. The result, Subversion, does 30.149 -not stray from CVS's centralised client/server model, but it adds 30.150 -multi-file atomic commits, better namespace management, and a number 30.151 -of other features that make it a generally better tool than CVS. 30.152 -Since its initial release, it has rapidly grown in popularity. 30.153 - 30.154 -More or less simultaneously, Graydon Hoare began working on an 30.155 -ambitious distributed revision control system that he named Monotone. 30.156 -While Monotone addresses many of CVS's design flaws and has a 30.157 -peer-to-peer architecture, it goes beyond earlier (and subsequent) 30.158 -revision control tools in a number of innovative ways. It uses 30.159 -cryptographic hashes as identifiers, and has an integral notion of 30.160 -``trust'' for code from different sources. 30.161 - 30.162 -Mercurial began life in 2005. 
While a few aspects of its design are 30.163 -influenced by Monotone, Mercurial focuses on ease of use, high 30.164 -performance, and scalability to very large projects. 30.165 - 30.166 -\section{Trends in revision control} 30.167 - 30.168 -There has been an unmistakable trend in the development and use of 30.169 -revision control tools over the past four decades, as people have 30.170 -become familiar with the capabilities of their tools and constrained 30.171 -by their limitations. 30.172 - 30.173 -The first generation began by managing single files on individual 30.174 -computers. Although these tools represented a huge advance over 30.175 -ad-hoc manual revision control, their locking model and reliance on a 30.176 -single computer limited them to small, tightly-knit teams. 30.177 - 30.178 -The second generation loosened these constraints by moving to 30.179 -network-centered architectures, and managing entire projects at a 30.180 -time. As projects grew larger, they ran into new problems. With 30.181 -clients needing to talk to servers very frequently, server scaling 30.182 -became an issue for large projects. An unreliable network connection 30.183 -could prevent remote users from being able to talk to the server at 30.184 -all. As open source projects started making read-only access 30.185 -available anonymously to anyone, people without commit privileges 30.186 -found that they could not use the tools to interact with a project in 30.187 -a natural way, as they could not record their changes. 30.188 - 30.189 -The current generation of revision control tools is peer-to-peer in 30.190 -nature. All of these systems have dropped the dependency on a single 30.191 -central server, and allow people to distribute their revision control 30.192 -data to where it's actually needed. Collaboration over the Internet 30.193 -has moved from constrained by technology to a matter of choice and 30.194 -consensus. 
Modern tools can operate offline indefinitely and 30.195 -autonomously, with a network connection only needed when syncing 30.196 -changes with another repository. 30.197 - 30.198 -\section{A few of the advantages of distributed revision control} 30.199 - 30.200 -Even though distributed revision control tools have for several years 30.201 -been as robust and usable as their previous-generation counterparts, 30.202 -people using older tools have not yet necessarily woken up to their 30.203 -advantages. There are a number of ways in which distributed tools 30.204 -shine relative to centralised ones. 30.205 - 30.206 -For an individual developer, distributed tools are almost always much 30.207 -faster than centralised tools. This is for a simple reason: a 30.208 -centralised tool needs to talk over the network for many common 30.209 -operations, because most metadata is stored in a single copy on the 30.210 -central server. A distributed tool stores all of its metadata 30.211 -locally. All else being equal, talking over the network adds overhead 30.212 -to a centralised tool. Don't underestimate the value of a snappy, 30.213 -responsive tool: you're going to spend a lot of time interacting with 30.214 -your revision control software. 30.215 - 30.216 -Distributed tools are indifferent to the vagaries of your server 30.217 -infrastructure, again because they replicate metadata to so many 30.218 -locations. If you use a centralised system and your server catches 30.219 -fire, you'd better hope that your backup media are reliable, and that 30.220 -your last backup was recent and actually worked. With a distributed 30.221 -tool, you have many backups available on every contributor's computer. 30.222 - 30.223 -The reliability of your network will affect distributed tools far less 30.224 -than it will centralised tools. You can't even use a centralised tool 30.225 -without a network connection, except for a few highly constrained 30.226 -commands. 
With a distributed tool, if your network connection goes 30.227 -down while you're working, you may not even notice. The only thing 30.228 -you won't be able to do is talk to repositories on other computers, 30.229 -something that is relatively rare compared with local operations. If 30.230 -you have a far-flung team of collaborators, this may be significant. 30.231 - 30.232 -\subsection{Advantages for open source projects} 30.233 - 30.234 -If you take a shine to an open source project and decide that you 30.235 -would like to start hacking on it, and that project uses a distributed 30.236 -revision control tool, you are at once a peer with the people who 30.237 -consider themselves the ``core'' of that project. If they publish 30.238 -their repositories, you can immediately copy their project history, 30.239 -start making changes, and record your work, using the same tools in 30.240 -the same ways as insiders. By contrast, with a centralised tool, you 30.241 -must use the software in a ``read only'' mode unless someone grants 30.242 -you permission to commit changes to their central server. Until then, 30.243 -you won't be able to record changes, and your local modifications will 30.244 -be at risk of corruption any time you try to update your client's view 30.245 -of the repository. 30.246 - 30.247 -\subsubsection{The forking non-problem} 30.248 - 30.249 -It has been suggested that distributed revision control tools pose 30.250 -some sort of risk to open source projects because they make it easy to 30.251 -``fork'' the development of a project. A fork happens when there are 30.252 -differences in opinion or attitude between groups of developers that 30.253 -cause them to decide that they can't work together any longer. Each 30.254 -side takes a more or less complete copy of the project's source code, 30.255 -and goes off in its own direction. 30.256 - 30.257 -Sometimes the camps in a fork decide to reconcile their differences. 
30.258 -With a centralised revision control system, the \emph{technical} 30.259 -process of reconciliation is painful, and has to be performed largely 30.260 -by hand. You have to decide whose revision history is going to 30.261 -``win'', and graft the other team's changes into the tree somehow. 30.262 -This usually loses some or all of one side's revision history. 30.263 - 30.264 -What distributed tools do with respect to forking is they make forking 30.265 -the \emph{only} way to develop a project. Every single change that 30.266 -you make is potentially a fork point. The great strength of this 30.267 -approach is that a distributed revision control tool has to be really 30.268 -good at \emph{merging} forks, because forks are absolutely 30.269 -fundamental: they happen all the time. 30.270 - 30.271 -If every piece of work that everybody does, all the time, is framed in 30.272 -terms of forking and merging, then what the open source world refers 30.273 -to as a ``fork'' becomes \emph{purely} a social issue. If anything, 30.274 -distributed tools \emph{lower} the likelihood of a fork: 30.275 -\begin{itemize} 30.276 -\item They eliminate the social distinction that centralised tools 30.277 - impose: that between insiders (people with commit access) and 30.278 - outsiders (people without). 30.279 -\item They make it easier to reconcile after a social fork, because 30.280 - all that's involved from the perspective of the revision control 30.281 - software is just another merge. 30.282 -\end{itemize} 30.283 - 30.284 -Some people resist distributed tools because they want to retain tight 30.285 -control over their projects, and they believe that centralised tools 30.286 -give them this control. However, if you're of this belief, and you 30.287 -publish your CVS or Subversion repositories publicly, there are 30.288 -plenty of tools available that can pull out your entire project's 30.289 -history (albeit slowly) and recreate it somewhere that you don't 30.290 -control. 
So while your control in this case is illusory, you are 30.291 -forgoing the ability to fluidly collaborate with whatever people feel 30.292 -compelled to mirror and fork your history. 30.293 - 30.294 -\subsection{Advantages for commercial projects} 30.295 - 30.296 -Many commercial projects are undertaken by teams that are scattered 30.297 -across the globe. Contributors who are far from a central server will 30.298 -see slower command execution and perhaps less reliability. Commercial 30.299 -revision control systems attempt to ameliorate these problems with 30.300 -remote-site replication add-ons that are typically expensive to buy 30.301 -and cantankerous to administer. A distributed system doesn't suffer 30.302 -from these problems in the first place. Better yet, you can easily 30.303 -set up multiple authoritative servers, say one per site, so that 30.304 -there's no redundant communication between repositories over expensive 30.305 -long-haul network links. 30.306 - 30.307 -Centralised revision control systems tend to have relatively low 30.308 -scalability. It's not unusual for an expensive centralised system to 30.309 -fall over under the combined load of just a few dozen concurrent 30.310 -users. Once again, the typical response tends to be an expensive and 30.311 -clunky replication facility. Since the load on a central server---if 30.312 -you have one at all---is many times lower with a distributed 30.313 -tool (because all of the data is replicated everywhere), a single 30.314 -cheap server can handle the needs of a much larger team, and 30.315 -replication to balance load becomes a simple matter of scripting. 30.316 - 30.317 -If you have an employee in the field, troubleshooting a problem at a 30.318 -customer's site, they'll benefit from distributed revision control. 
30.319 -The tool will let them generate custom builds, try different fixes in 30.320 -isolation from each other, and search efficiently through history for 30.321 -the sources of bugs and regressions in the customer's environment, all 30.322 -without needing to connect to your company's network. 30.323 - 30.324 -\section{Why choose Mercurial?} 30.325 - 30.326 -Mercurial has a unique set of properties that make it a particularly 30.327 -good choice as a revision control system. 30.328 -\begin{itemize} 30.329 -\item It is easy to learn and use. 30.330 -\item It is lightweight. 30.331 -\item It scales excellently. 30.332 -\item It is easy to customise. 30.333 -\end{itemize} 30.334 - 30.335 -If you are at all familiar with revision control systems, you should 30.336 -be able to get up and running with Mercurial in less than five 30.337 -minutes. Even if not, it will take no more than a few minutes 30.338 -longer. Mercurial's command and feature sets are generally uniform 30.339 -and consistent, so you can keep track of a few general rules instead 30.340 -of a host of exceptions. 30.341 - 30.342 -On a small project, you can start working with Mercurial in moments. 30.343 -Creating new changes and branches; transferring changes around 30.344 -(whether locally or over a network); and history and status operations 30.345 -are all fast. Mercurial attempts to stay nimble and largely out of 30.346 -your way by combining low cognitive overhead with blazingly fast 30.347 -operations. 30.348 - 30.349 -The usefulness of Mercurial is not limited to small projects: it is 30.350 -used by projects with hundreds to thousands of contributors, each 30.351 -containing tens of thousands of files and hundreds of megabytes of 30.352 -source code. 30.353 - 30.354 -If the core functionality of Mercurial is not enough for you, it's 30.355 -easy to build on. 
Mercurial is well suited to scripting tasks, and 30.356 -its clean internals and implementation in Python make it easy to add 30.357 -features in the form of extensions. There are a number of popular and 30.358 -useful extensions already available, ranging from helping to identify 30.359 -bugs to improving performance. 30.360 - 30.361 -\section{Mercurial compared with other tools} 30.362 - 30.363 -Before you read on, please understand that this section necessarily 30.364 -reflects my own experiences, interests, and (dare I say it) biases. I 30.365 -have used every one of the revision control tools listed below, in 30.366 -most cases for several years at a time. 30.367 - 30.368 - 30.369 -\subsection{Subversion} 30.370 - 30.371 -Subversion is a popular revision control tool, developed to replace 30.372 -CVS. It has a centralised client/server architecture. 30.373 - 30.374 -Subversion and Mercurial have similarly named commands for performing 30.375 -the same operations, so if you're familiar with one, it is easy to 30.376 -learn to use the other. Both tools are portable to all popular 30.377 -operating systems. 30.378 - 30.379 -Prior to version 1.5, Subversion had no useful support for merges. 30.380 -At the time of writing, its merge tracking capability is new, and known to be 30.381 -\href{http://svnbook.red-bean.com/nightly/en/svn.branchmerge.advanced.html#svn.branchmerge.advanced.finalword}{complicated 30.382 - and buggy}. 30.383 - 30.384 -Mercurial has a substantial performance advantage over Subversion on 30.385 -every revision control operation I have benchmarked. I have measured 30.386 -its advantage as ranging from a factor of two to a factor of six when 30.387 -compared with Subversion~1.4.3's \emph{ra\_local} file store, which is 30.388 -the fastest access method available. In more realistic deployments 30.389 -involving a network-based store, Subversion will be at a substantially 30.390 -larger disadvantage. 
Because many Subversion commands must talk to 30.391 -the server and Subversion does not have useful replication facilities, 30.392 -server capacity and network bandwidth become bottlenecks for modestly 30.393 -large projects. 30.394 - 30.395 -Additionally, Subversion incurs substantial storage overhead to avoid 30.396 -network transactions for a few common operations, such as finding 30.397 -modified files (\texttt{status}) and displaying modifications against 30.398 -the current revision (\texttt{diff}). As a result, a Subversion 30.399 -working copy is often the same size as, or larger than, a Mercurial 30.400 -repository and working directory, even though the Mercurial repository 30.401 -contains a complete history of the project. 30.402 - 30.403 -Subversion is widely supported by third party tools. Mercurial 30.404 -currently lags considerably in this area. This gap is closing, 30.405 -however, and indeed some of Mercurial's GUI tools now outshine their 30.406 -Subversion equivalents. Like Mercurial, Subversion has an excellent 30.407 -user manual. 30.408 - 30.409 -Because Subversion doesn't store revision history on the client, it is 30.410 -well suited to managing projects that deal with lots of large, opaque 30.411 -binary files. If you check in fifty revisions to an incompressible 30.412 -10MB file, Subversion's client-side space usage stays constant. The 30.413 -space used by any distributed SCM will grow rapidly in proportion to 30.414 -the number of revisions, because the differences between each revision 30.415 -are large. 30.416 - 30.417 -In addition, it's often difficult or, more usually, impossible to 30.418 -merge different versions of a binary file. Subversion's ability to 30.419 -let a user lock a file, so that they temporarily have the exclusive 30.420 -right to commit changes to it, can be a significant advantage to a 30.421 -project where binary files are widely used. 
30.422 - 30.423 -Mercurial can import revision history from a Subversion repository. 30.424 -It can also export revision history to a Subversion repository. This 30.425 -makes it easy to ``test the waters'' and use Mercurial and Subversion 30.426 -in parallel before deciding to switch. History conversion is 30.427 -incremental, so you can perform an initial conversion, then small 30.428 -additional conversions afterwards to bring in new changes. 30.429 - 30.430 - 30.431 -\subsection{Git} 30.432 - 30.433 -Git is a distributed revision control tool that was developed for 30.434 -managing the Linux kernel source tree. Like Mercurial, its early 30.435 -design was somewhat influenced by Monotone. 30.436 - 30.437 -Git has a very large command set, with version~1.5.0 providing~139 30.438 -individual commands. It has something of a reputation for being 30.439 -difficult to learn. Compared to Git, Mercurial has a strong focus on 30.440 -simplicity. 30.441 - 30.442 -In terms of performance, Git is extremely fast. In several cases, it 30.443 -is faster than Mercurial, at least on Linux, while Mercurial performs 30.444 -better on other operations. However, on Windows, the performance and 30.445 -general level of support that Git provides is, at the time of writing, 30.446 -far behind that of Mercurial. 30.447 - 30.448 -While a Mercurial repository needs no maintenance, a Git repository 30.449 -requires frequent manual ``repacks'' of its metadata. Without these, 30.450 -performance degrades, while space usage grows rapidly. A server that 30.451 -contains many Git repositories that are not rigorously and frequently 30.452 -repacked will become heavily disk-bound during backups, and there have 30.453 -been instances of daily backups taking far longer than~24 hours as a 30.454 -result. A freshly packed Git repository is slightly smaller than a 30.455 -Mercurial repository, but an unpacked repository is several orders of 30.456 -magnitude larger. 
30.457 - 30.458 -The core of Git is written in C. Many Git commands are implemented as 30.459 -shell or Perl scripts, and the quality of these scripts varies widely. 30.460 -I have encountered several instances where scripts charged along 30.461 -blindly in the presence of errors that should have been fatal. 30.462 - 30.463 -Mercurial can import revision history from a Git repository. 30.464 - 30.465 - 30.466 -\subsection{CVS} 30.467 - 30.468 -CVS is probably the most widely used revision control tool in the 30.469 -world. Due to its age and internal untidiness, it has been only 30.470 -lightly maintained for many years. 30.471 - 30.472 -It has a centralised client/server architecture. It does not group 30.473 -related file changes into atomic commits, making it easy for people to 30.474 -``break the build'': one person can successfully commit part of a 30.475 -change and then be blocked by the need for a merge, causing other 30.476 -people to see only a portion of the work they intended to do. This 30.477 -also affects how you work with project history. If you want to see 30.478 -all of the modifications someone made as part of a task, you will need 30.479 -to manually inspect the descriptions and timestamps of the changes 30.480 -made to each file involved (if you even know what those files were). 30.481 - 30.482 -CVS has a muddled notion of tags and branches that I will not attempt 30.483 -to even describe. It does not support renaming of files or 30.484 -directories well, making it easy to corrupt a repository. It has 30.485 -almost no internal consistency checking capabilities, so it is usually 30.486 -not even possible to tell whether or how a repository is corrupt. I 30.487 -would not recommend CVS for any project, existing or new. 30.488 - 30.489 -Mercurial can import CVS revision history. However, there are a few 30.490 -caveats that apply; these are true of every other revision control 30.491 -tool's CVS importer, too. 
Due to CVS's lack of atomic changes and 30.492 -unversioned filesystem hierarchy, it is not possible to reconstruct 30.493 -CVS history completely accurately; some guesswork is involved, and 30.494 -renames will usually not show up. Because a lot of advanced CVS 30.495 -administration has to be done by hand and is hence error-prone, it's 30.496 -common for CVS importers to run into multiple problems with corrupted 30.497 -repositories (completely bogus revision timestamps and files that have 30.498 -remained locked for over a decade are just two of the less interesting 30.499 -problems I can recall from personal experience). 30.500 - 30.501 -Mercurial can import revision history from a CVS repository. 30.502 - 30.503 - 30.504 -\subsection{Commercial tools} 30.505 - 30.506 -Perforce has a centralised client/server architecture, with no 30.507 -client-side caching of any data. Unlike modern revision control 30.508 -tools, Perforce requires that a user run a command to inform the 30.509 -server about every file they intend to edit. 30.510 - 30.511 -The performance of Perforce is quite good for small teams, but it 30.512 -falls off rapidly as the number of users grows beyond a few dozen. 30.513 -Modestly large Perforce installations require the deployment of 30.514 -proxies to cope with the load their users generate. 30.515 - 30.516 - 30.517 -\subsection{Choosing a revision control tool} 30.518 - 30.519 -With the exception of CVS, all of the tools listed above have unique 30.520 -strengths that suit them to particular styles of work. There is no 30.521 -single revision control tool that is best in all situations. 30.522 - 30.523 -As an example, Subversion is a good choice for working with frequently 30.524 -edited binary files, due to its centralised nature and support for 30.525 -file locking. 
30.526 - 30.527 -I personally find Mercurial's properties of simplicity, performance, 30.528 -and good merge support to be a compelling combination that has served 30.529 -me well for several years. 30.530 - 30.531 - 30.532 -\section{Switching from another tool to Mercurial} 30.533 - 30.534 -Mercurial is bundled with an extension named \hgext{convert}, which 30.535 -can incrementally import revision history from several other revision 30.536 -control tools. By ``incremental'', I mean that you can convert all of 30.537 -a project's history to date in one go, then rerun the conversion later 30.538 -to obtain new changes that happened after the initial conversion. 30.539 - 30.540 -The revision control tools supported by \hgext{convert} are as 30.541 -follows: 30.542 -\begin{itemize} 30.543 -\item Subversion 30.544 -\item CVS 30.545 -\item Git 30.546 -\item Darcs 30.547 -\end{itemize} 30.548 - 30.549 -In addition, \hgext{convert} can export changes from Mercurial to 30.550 -Subversion. This makes it possible to try Subversion and Mercurial in 30.551 -parallel before committing to a switchover, without risking the loss 30.552 -of any work. 30.553 - 30.554 -The \hgxcmd{conver}{convert} command is easy to use. Simply point it 30.555 -at the path or URL of the source repository, optionally give it the 30.556 -name of the destination repository, and it will start working. After 30.557 -the initial conversion, just run the same command again to import new 30.558 -changes. 30.559 - 30.560 - 30.561 -%%% Local Variables: 30.562 -%%% mode: latex 30.563 -%%% TeX-master: "00book" 30.564 -%%% End:
31.1 --- a/en/license.tex Thu Jan 29 22:47:34 2009 -0800 31.2 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 31.3 @@ -1,138 +0,0 @@ 31.4 -\chapter{Open Publication License} 31.5 -\label{cha:opl} 31.6 - 31.7 -Version 1.0, 8 June 1999 31.8 - 31.9 -\section{Requirements on both unmodified and modified versions} 31.10 - 31.11 -The Open Publication works may be reproduced and distributed in whole 31.12 -or in part, in any medium physical or electronic, provided that the 31.13 -terms of this license are adhered to, and that this license or an 31.14 -incorporation of it by reference (with any options elected by the 31.15 -author(s) and/or publisher) is displayed in the reproduction. 31.16 - 31.17 -Proper form for an incorporation by reference is as follows: 31.18 - 31.19 -\begin{quote} 31.20 - Copyright (c) \emph{year} by \emph{author's name or designee}. This 31.21 - material may be distributed only subject to the terms and conditions 31.22 - set forth in the Open Publication License, v\emph{x.y} or later (the 31.23 - latest version is presently available at 31.24 - \url{http://www.opencontent.org/openpub/}). 31.25 -\end{quote} 31.26 - 31.27 -The reference must be immediately followed with any options elected by 31.28 -the author(s) and/or publisher of the document (see 31.29 -section~\ref{sec:opl:options}). 31.30 - 31.31 -Commercial redistribution of Open Publication-licensed material is 31.32 -permitted. 31.33 - 31.34 -Any publication in standard (paper) book form shall require the 31.35 -citation of the original publisher and author. The publisher and 31.36 -author's names shall appear on all outer surfaces of the book. On all 31.37 -outer surfaces of the book the original publisher's name shall be as 31.38 -large as the title of the work and cited as possessive with respect to 31.39 -the title. 31.40 - 31.41 -\section{Copyright} 31.42 - 31.43 -The copyright to each Open Publication is owned by its author(s) or 31.44 -designee. 
31.45 - 31.46 -\section{Scope of license} 31.47 - 31.48 -The following license terms apply to all Open Publication works, 31.49 -unless otherwise explicitly stated in the document. 31.50 - 31.51 -Mere aggregation of Open Publication works or a portion of an Open 31.52 -Publication work with other works or programs on the same media shall 31.53 -not cause this license to apply to those other works. The aggregate 31.54 -work shall contain a notice specifying the inclusion of the Open 31.55 -Publication material and appropriate copyright notice. 31.56 - 31.57 -\textbf{Severability}. If any part of this license is found to be 31.58 -unenforceable in any jurisdiction, the remaining portions of the 31.59 -license remain in force. 31.60 - 31.61 -\textbf{No warranty}. Open Publication works are licensed and provided 31.62 -``as is'' without warranty of any kind, express or implied, including, 31.63 -but not limited to, the implied warranties of merchantability and 31.64 -fitness for a particular purpose or a warranty of non-infringement. 31.65 - 31.66 -\section{Requirements on modified works} 31.67 - 31.68 -All modified versions of documents covered by this license, including 31.69 -translations, anthologies, compilations and partial documents, must 31.70 -meet the following requirements: 31.71 - 31.72 -\begin{enumerate} 31.73 -\item The modified version must be labeled as such. 31.74 -\item The person making the modifications must be identified and the 31.75 - modifications dated. 31.76 -\item Acknowledgement of the original author and publisher if 31.77 - applicable must be retained according to normal academic citation 31.78 - practices. 31.79 -\item The location of the original unmodified document must be 31.80 - identified. 31.81 -\item The original author's (or authors') name(s) may not be used to 31.82 - assert or imply endorsement of the resulting document without the 31.83 - original author's (or authors') permission. 
31.84 -\end{enumerate} 31.85 - 31.86 -\section{Good-practice recommendations} 31.87 - 31.88 -In addition to the requirements of this license, it is requested from 31.89 -and strongly recommended of redistributors that: 31.90 - 31.91 -\begin{enumerate} 31.92 -\item If you are distributing Open Publication works on hardcopy or 31.93 - CD-ROM, you provide email notification to the authors of your intent 31.94 - to redistribute at least thirty days before your manuscript or media 31.95 - freeze, to give the authors time to provide updated documents. This 31.96 - notification should describe modifications, if any, made to the 31.97 - document. 31.98 -\item All substantive modifications (including deletions) be either 31.99 - clearly marked up in the document or else described in an attachment 31.100 - to the document. 31.101 -\item Finally, while it is not mandatory under this license, it is 31.102 - considered good form to offer a free copy of any hardcopy and CD-ROM 31.103 - expression of an Open Publication-licensed work to its author(s). 31.104 -\end{enumerate} 31.105 - 31.106 -\section{License options} 31.107 -\label{sec:opl:options} 31.108 - 31.109 -The author(s) and/or publisher of an Open Publication-licensed 31.110 -document may elect certain options by appending language to the 31.111 -reference to or copy of the license. These options are considered part 31.112 -of the license instance and must be included with the license (or its 31.113 -incorporation by reference) in derived works. 31.114 - 31.115 -\begin{enumerate}[A] 31.116 -\item To prohibit distribution of substantively modified versions 31.117 - without the explicit permission of the author(s). ``Substantive 31.118 - modification'' is defined as a change to the semantic content of the 31.119 - document, and excludes mere changes in format or typographical 31.120 - corrections. 
31.121 - 31.122 - To accomplish this, add the phrase ``Distribution of substantively 31.123 - modified versions of this document is prohibited without the 31.124 - explicit permission of the copyright holder.'' to the license 31.125 - reference or copy. 31.126 - 31.127 -\item To prohibit any publication of this work or derivative works in 31.128 - whole or in part in standard (paper) book form for commercial 31.129 - purposes is prohibited unless prior permission is obtained from the 31.130 - copyright holder. 31.131 - 31.132 - To accomplish this, add the phrase ``Distribution of the work or 31.133 - derivative of the work in any standard (paper) book form is 31.134 - prohibited unless prior permission is obtained from the copyright 31.135 - holder.'' to the license reference or copy. 31.136 -\end{enumerate} 31.137 - 31.138 -%%% Local Variables: 31.139 -%%% mode: latex 31.140 -%%% TeX-master: "00book" 31.141 -%%% End:
32.1 --- a/en/mq-collab.tex Thu Jan 29 22:47:34 2009 -0800 32.2 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 32.3 @@ -1,393 +0,0 @@ 32.4 -\chapter{Advanced uses of Mercurial Queues} 32.5 -\label{chap:mq-collab} 32.6 - 32.7 -While it's easy to pick up straightforward uses of Mercurial Queues, 32.8 -use of a little discipline and some of MQ's less frequently used 32.9 -capabilities makes it possible to work in complicated development 32.10 -environments. 32.11 - 32.12 -In this chapter, I will use as an example a technique I have used to 32.13 -manage the development of an Infiniband device driver for the Linux 32.14 -kernel. The driver in question is large (at least as drivers go), 32.15 -with 25,000 lines of code spread across 35 source files. It is 32.16 -maintained by a small team of developers. 32.17 - 32.18 -While much of the material in this chapter is specific to Linux, the 32.19 -same principles apply to any code base for which you're not the 32.20 -primary owner, and upon which you need to do a lot of development. 32.21 - 32.22 -\section{The problem of many targets} 32.23 - 32.24 -The Linux kernel changes rapidly, and has never been internally 32.25 -stable; developers frequently make drastic changes between releases. 32.26 -This means that a version of the driver that works well with a 32.27 -particular released version of the kernel will not even \emph{compile} 32.28 -correctly against, typically, any other version. 32.29 - 32.30 -To maintain a driver, we have to keep a number of distinct versions of 32.31 -Linux in mind. 32.32 -\begin{itemize} 32.33 -\item One target is the main Linux kernel development tree. 32.34 - Maintenance of the code is in this case partly shared by other 32.35 - developers in the kernel community, who make ``drive-by'' 32.36 - modifications to the driver as they develop and refine kernel 32.37 - subsystems. 
32.38 -\item We also maintain a number of ``backports'' to older versions of 32.39 - the Linux kernel, to support the needs of customers who are running 32.40 - older Linux distributions that do not incorporate our drivers. (To 32.41 - \emph{backport} a piece of code is to modify it to work in an older 32.42 - version of its target environment than the version it was developed 32.43 - for.) 32.44 -\item Finally, we make software releases on a schedule that is 32.45 - necessarily not aligned with those used by Linux distributors and 32.46 - kernel developers, so that we can deliver new features to customers 32.47 - without forcing them to upgrade their entire kernels or 32.48 - distributions. 32.49 -\end{itemize} 32.50 - 32.51 -\subsection{Tempting approaches that don't work well} 32.52 - 32.53 -There are two ``standard'' ways to maintain a piece of software that 32.54 -has to target many different environments. 32.55 - 32.56 -The first is to maintain a number of branches, each intended for a 32.57 -single target. The trouble with this approach is that you must 32.58 -maintain iron discipline in the flow of changes between repositories. 32.59 -A new feature or bug fix must start life in a ``pristine'' repository, 32.60 -then percolate out to every backport repository. Backport changes are 32.61 -more limited in the branches they should propagate to; a backport 32.62 -change that is applied to a branch where it doesn't belong will 32.63 -probably stop the driver from compiling. 32.64 - 32.65 -The second is to maintain a single source tree filled with conditional 32.66 -statements that turn chunks of code on or off depending on the 32.67 -intended target. Because these ``ifdefs'' are not allowed in the 32.68 -Linux kernel tree, a manual or automatic process must be followed to 32.69 -strip them out and yield a clean tree. 
A code base maintained in this 32.70 -fashion rapidly becomes a rat's nest of conditional blocks that are 32.71 -difficult to understand and maintain. 32.72 - 32.73 -Neither of these approaches is well suited to a situation where you 32.74 -don't ``own'' the canonical copy of a source tree. In the case of a 32.75 -Linux driver that is distributed with the standard kernel, Linus's 32.76 -tree contains the copy of the code that will be treated by the world 32.77 -as canonical. The upstream version of ``my'' driver can be modified 32.78 -by people I don't know, without me even finding out about it until 32.79 -after the changes show up in Linus's tree. 32.80 - 32.81 -These approaches have the added weakness of making it difficult to 32.82 -generate well-formed patches to submit upstream. 32.83 - 32.84 -In principle, Mercurial Queues seems like a good candidate to manage a 32.85 -development scenario such as the above. While this is indeed the 32.86 -case, MQ contains a few added features that make the job more 32.87 -pleasant. 32.88 - 32.89 -\section{Conditionally applying patches with 32.90 - guards} 32.91 - 32.92 -Perhaps the best way to maintain sanity with so many targets is to be 32.93 -able to choose specific patches to apply for a given situation. MQ 32.94 -provides a feature called ``guards'' (which originates with quilt's 32.95 -\texttt{guards} command) that does just this. To start off, let's 32.96 -create a simple repository for experimenting in. 32.97 -\interaction{mq.guards.init} 32.98 -This gives us a tiny repository that contains two patches that don't 32.99 -have any dependencies on each other, because they touch different files. 32.100 - 32.101 -The idea behind conditional application is that you can ``tag'' a 32.102 -patch with a \emph{guard}, which is simply a text string of your 32.103 -choosing, then tell MQ to select specific guards to use when applying 32.104 -patches. 
MQ will then either apply, or skip over, a guarded patch, 32.105 -depending on the guards that you have selected. 32.106 - 32.107 -A patch can have an arbitrary number of guards; 32.108 -each one is \emph{positive} (``apply this patch if this guard is 32.109 -selected'') or \emph{negative} (``skip this patch if this guard is 32.110 -selected''). A patch with no guards is always applied. 32.111 - 32.112 -\section{Controlling the guards on a patch} 32.113 - 32.114 -The \hgxcmd{mq}{qguard} command lets you determine which guards should 32.115 -apply to a patch, or display the guards that are already in effect. 32.116 -Without any arguments, it displays the guards on the current topmost 32.117 -patch. 32.118 -\interaction{mq.guards.qguard} 32.119 -To set a positive guard on a patch, prefix the name of the guard with 32.120 -a ``\texttt{+}''. 32.121 -\interaction{mq.guards.qguard.pos} 32.122 -To set a negative guard on a patch, prefix the name of the guard with 32.123 -a ``\texttt{-}''. 32.124 -\interaction{mq.guards.qguard.neg} 32.125 - 32.126 -\begin{note} 32.127 - The \hgxcmd{mq}{qguard} command \emph{sets} the guards on a patch; it 32.128 - doesn't \emph{modify} them. What this means is that if you run 32.129 - \hgcmdargs{qguard}{+a +b} on a patch, then \hgcmdargs{qguard}{+c} on 32.130 - the same patch, the \emph{only} guard that will be set on it 32.131 - afterwards is \texttt{+c}. 32.132 -\end{note} 32.133 - 32.134 -Mercurial stores guards in the \sfilename{series} file; the form in 32.135 -which they are stored is easy both to understand and to edit by hand. 32.136 -(In other words, you don't have to use the \hgxcmd{mq}{qguard} command if 32.137 -you don't want to; it's okay to simply edit the \sfilename{series} 32.138 -file.) 32.139 -\interaction{mq.guards.series} 32.140 - 32.141 -\section{Selecting the guards to use} 32.142 - 32.143 -The \hgxcmd{mq}{qselect} command determines which guards are active at a 32.144 -given time. 
The effect of this is to determine which patches MQ will 32.145 -apply the next time you run \hgxcmd{mq}{qpush}. It has no other effect; in 32.146 -particular, it doesn't do anything to patches that are already 32.147 -applied. 32.148 - 32.149 -With no arguments, the \hgxcmd{mq}{qselect} command lists the guards 32.150 -currently in effect, one per line of output. Each argument is treated 32.151 -as the name of a guard to apply. 32.152 -\interaction{mq.guards.qselect.foo} 32.153 -In case you're interested, the currently selected guards are stored in 32.154 -the \sfilename{guards} file. 32.155 -\interaction{mq.guards.qselect.cat} 32.156 -We can see the effect the selected guards have when we run 32.157 -\hgxcmd{mq}{qpush}. 32.158 -\interaction{mq.guards.qselect.qpush} 32.159 - 32.160 -A guard cannot start with a ``\texttt{+}'' or ``\texttt{-}'' 32.161 -character. The name of a guard must not contain white space, but most 32.162 -other characters are acceptable. If you try to use a guard with an 32.163 -invalid name, MQ will complain: 32.164 -\interaction{mq.guards.qselect.error} 32.165 -Changing the selected guards changes the patches that are applied. 32.166 -\interaction{mq.guards.qselect.quux} 32.167 -You can see in the example below that negative guards take precedence 32.168 -over positive guards. 32.169 -\interaction{mq.guards.qselect.foobar} 32.170 - 32.171 -\section{MQ's rules for applying patches} 32.172 - 32.173 -The rules that MQ uses when deciding whether to apply a patch 32.174 -are as follows. 32.175 -\begin{itemize} 32.176 -\item A patch that has no guards is always applied. 32.177 -\item If the patch has any negative guard that matches any currently 32.178 - selected guard, the patch is skipped. 32.179 -\item If the patch has any positive guard that matches any currently 32.180 - selected guard, the patch is applied. 
32.181 -\item If the patch has positive or negative guards, but none matches 32.182 - any currently selected guard, the patch is skipped. 32.183 -\end{itemize} 32.184 - 32.185 -\section{Trimming the work environment} 32.186 - 32.187 -In working on the device driver I mentioned earlier, I don't apply the 32.188 -patches to a normal Linux kernel tree. Instead, I use a repository 32.189 -that contains only a snapshot of the source files and headers that are 32.190 -relevant to Infiniband development. This repository is~1\% the size 32.191 -of a kernel repository, so it's easier to work with. 32.192 - 32.193 -I then choose a ``base'' version on top of which the patches are 32.194 -applied. This is a snapshot of the Linux kernel tree as of a revision 32.195 -of my choosing. When I take the snapshot, I record the changeset ID 32.196 -from the kernel repository in the commit message. Since the snapshot 32.197 -preserves the ``shape'' and content of the relevant parts of the 32.198 -kernel tree, I can apply my patches on top of either my tiny 32.199 -repository or a normal kernel tree. 32.200 - 32.201 -Normally, the base tree atop which the patches apply should be a 32.202 -snapshot of a very recent upstream tree. This best facilitates the 32.203 -development of patches that can easily be submitted upstream with few 32.204 -or no modifications. 32.205 - 32.206 -\section{Dividing up the \sfilename{series} file} 32.207 - 32.208 -I categorise the patches in the \sfilename{series} file into a number 32.209 -of logical groups. Each section of like patches begins with a block 32.210 -of comments that describes the purpose of the patches that follow. 32.211 - 32.212 -The sequence of patch groups that I maintain follows. The ordering of 32.213 -these groups is important; I'll describe why after I introduce the 32.214 -groups. 32.215 -\begin{itemize} 32.216 -\item The ``accepted'' group. 
Patches that the development team has 32.217 - submitted to the maintainer of the Infiniband subsystem, and which 32.218 - he has accepted, but which are not present in the snapshot that the 32.219 - tiny repository is based on. These are ``read only'' patches, 32.220 - present only to transform the tree into a similar state as it is in 32.221 - the upstream maintainer's repository. 32.222 -\item The ``rework'' group. Patches that I have submitted, but that 32.223 - the upstream maintainer has requested modifications to before he 32.224 - will accept them. 32.225 -\item The ``pending'' group. Patches that I have not yet submitted to 32.226 - the upstream maintainer, but which we have finished working on. 32.227 - These will be ``read only'' for a while. If the upstream maintainer 32.228 - accepts them upon submission, I'll move them to the end of the 32.229 - ``accepted'' group. If he requests that I modify any, I'll move 32.230 - them to the beginning of the ``rework'' group. 32.231 -\item The ``in progress'' group. Patches that are actively being 32.232 - developed, and should not be submitted anywhere yet. 32.233 -\item The ``backport'' group. Patches that adapt the source tree to 32.234 - older versions of the kernel tree. 32.235 -\item The ``do not ship'' group. Patches that for some reason should 32.236 - never be submitted upstream. For example, one such patch might 32.237 - change embedded driver identification strings to make it easier to 32.238 - distinguish, in the field, between an out-of-tree version of the 32.239 - driver and a version shipped by a distribution vendor. 32.240 -\end{itemize} 32.241 - 32.242 -Now to return to the reasons for ordering groups of patches in this 32.243 -way. We would like the lowest patches in the stack to be as stable as 32.244 -possible, so that we will not need to rework higher patches due to 32.245 -changes in context. 
Putting patches that will never be changed first 32.246 -in the \sfilename{series} file serves this purpose. 32.247 - 32.248 -We would also like the patches that we know we'll need to modify to be 32.249 -applied on top of a source tree that resembles the upstream tree as 32.250 -closely as possible. This is why we keep accepted patches around for 32.251 -a while. 32.252 - 32.253 -The ``backport'' and ``do not ship'' patches float at the end of the 32.254 -\sfilename{series} file. The backport patches must be applied on top 32.255 -of all other patches, and the ``do not ship'' patches might as well 32.256 -stay out of harm's way. 32.257 - 32.258 -\section{Maintaining the patch series} 32.259 - 32.260 -In my work, I use a number of guards to control which patches are to 32.261 -be applied. 32.262 - 32.263 -\begin{itemize} 32.264 -\item ``Accepted'' patches are guarded with \texttt{accepted}. I 32.265 - enable this guard most of the time. When I'm applying the patches 32.266 - on top of a tree where the patches are already present, I can turn 32.267 - this patch off, and the patches that follow it will apply cleanly. 32.268 -\item Patches that are ``finished'', but not yet submitted, have no 32.269 - guards. If I'm applying the patch stack to a copy of the upstream 32.270 - tree, I don't need to enable any guards in order to get a reasonably 32.271 - safe source tree. 32.272 -\item Those patches that need reworking before being resubmitted are 32.273 - guarded with \texttt{rework}. 32.274 -\item For those patches that are still under development, I use 32.275 - \texttt{devel}. 32.276 -\item A backport patch may have several guards, one for each version 32.277 - of the kernel to which it applies. For example, a patch that 32.278 - backports a piece of code to~2.6.9 will have a~\texttt{2.6.9} guard. 32.279 -\end{itemize} 32.280 -This variety of guards gives me considerable flexibility in 32.281 -determining what kind of source tree I want to end up with. 
For most 32.282 -situations, the selection of appropriate guards is automated during 32.283 -the build process, but I can manually tune the guards to use for less 32.284 -common circumstances. 32.285 - 32.286 -\subsection{The art of writing backport patches} 32.287 - 32.288 -Using MQ, writing a backport patch is a simple process. All such a 32.289 -patch has to do is modify a piece of code that uses a kernel feature 32.290 -not present in the older version of the kernel, so that the driver 32.291 -continues to work correctly under that older version. 32.292 - 32.293 -A useful goal when writing a good backport patch is to make your code 32.294 -look as if it was written for the older version of the kernel you're 32.295 -targeting. The less obtrusive the patch, the easier it will be to 32.296 -understand and maintain. If you're writing a collection of backport 32.297 -patches to avoid the ``rat's nest'' effect of lots of 32.298 -\texttt{\#ifdef}s (hunks of source code that are only used 32.299 -conditionally) in your code, don't introduce version-dependent 32.300 -\texttt{\#ifdef}s into the patches. Instead, write several patches, 32.301 -each of which makes unconditional changes, and control their 32.302 -application using guards. 32.303 - 32.304 -There are two reasons to divide backport patches into a distinct 32.305 -group, away from the ``regular'' patches whose effects they modify. 32.306 -The first is that intermingling the two makes it more difficult to use 32.307 -a tool like the \hgext{patchbomb} extension to automate the process of 32.308 -submitting the patches to an upstream maintainer. The second is that 32.309 -a backport patch could perturb the context in which a subsequent 32.310 -regular patch is applied, making it impossible to apply the regular 32.311 -patch cleanly \emph{without} the earlier backport patch already being 32.312 -applied. 
32.313 - 32.314 -\section{Useful tips for developing with MQ} 32.315 - 32.316 -\subsection{Organising patches in directories} 32.317 - 32.318 -If you're working on a substantial project with MQ, it's not difficult 32.319 -to accumulate a large number of patches. For example, I have one 32.320 -patch repository that contains over 250 patches. 32.321 - 32.322 -If you can group these patches into separate logical categories, you 32.323 -can if you like store them in different directories; MQ has no 32.324 -problems with patch names that contain path separators. 32.325 - 32.326 -\subsection{Viewing the history of a patch} 32.327 -\label{mq-collab:tips:interdiff} 32.328 - 32.329 -If you're developing a set of patches over a long time, it's a good 32.330 -idea to maintain them in a repository, as discussed in 32.331 -section~\ref{sec:mq:repo}. If you do so, you'll quickly discover that 32.332 -using the \hgcmd{diff} command to look at the history of changes to a 32.333 -patch is unworkable. This is in part because you're looking at the 32.334 -second derivative of the real code (a diff of a diff), but also 32.335 -because MQ adds noise to the process by modifying time stamps and 32.336 -directory names when it updates a patch. 32.337 - 32.338 -However, you can use the \hgext{extdiff} extension, which is bundled 32.339 -with Mercurial, to turn a diff of two versions of a patch into 32.340 -something readable. To do this, you will need a third-party package 32.341 -called \package{patchutils}~\cite{web:patchutils}. This provides a 32.342 -command named \command{interdiff}, which shows the differences between 32.343 -two diffs as a diff. Used on two versions of the same diff, it 32.344 -generates a diff that represents the diff from the first to the second 32.345 -version. 32.346 - 32.347 -You can enable the \hgext{extdiff} extension in the usual way, by 32.348 -adding a line to the \rcsection{extensions} section of your \hgrc. 
32.349 -\begin{codesample2} 32.350 - [extensions] 32.351 - extdiff = 32.352 -\end{codesample2} 32.353 -The \command{interdiff} command expects to be passed the names of two 32.354 -files, but the \hgext{extdiff} extension passes the program it runs a 32.355 -pair of directories, each of which can contain an arbitrary number of 32.356 -files. We thus need a small program that will run \command{interdiff} 32.357 -on each pair of files in these two directories. This program is 32.358 -available as \sfilename{hg-interdiff} in the \dirname{examples} 32.359 -directory of the source code repository that accompanies this book. 32.360 -\excode{hg-interdiff} 32.361 - 32.362 -With the \sfilename{hg-interdiff} program in your shell's search path, 32.363 -you can run it as follows, from inside an MQ patch directory: 32.364 -\begin{codesample2} 32.365 - hg extdiff -p hg-interdiff -r A:B my-change.patch 32.366 -\end{codesample2} 32.367 -Since you'll probably want to use this long-winded command a lot, you 32.368 -can get \hgext{hgext} to make it available as a normal Mercurial 32.369 -command, again by editing your \hgrc. 32.370 -\begin{codesample2} 32.371 - [extdiff] 32.372 - cmd.interdiff = hg-interdiff 32.373 -\end{codesample2} 32.374 -This directs \hgext{hgext} to make an \texttt{interdiff} command 32.375 -available, so you can now shorten the previous invocation of 32.376 -\hgxcmd{extdiff}{extdiff} to something a little more wieldy. 32.377 -\begin{codesample2} 32.378 - hg interdiff -r A:B my-change.patch 32.379 -\end{codesample2} 32.380 - 32.381 -\begin{note} 32.382 - The \command{interdiff} command works well only if the underlying 32.383 - files against which versions of a patch are generated remain the 32.384 - same. If you create a patch, modify the underlying files, and then 32.385 - regenerate the patch, \command{interdiff} may not produce useful 32.386 - output. 
32.387 -\end{note} 32.388 - 32.389 -The \hgext{extdiff} extension is useful for more than merely improving 32.390 -the presentation of MQ~patches. To read more about it, go to 32.391 -section~\ref{sec:hgext:extdiff}. 32.392 - 32.393 -%%% Local Variables: 32.394 -%%% mode: latex 32.395 -%%% TeX-master: "00book" 32.396 -%%% End:
33.1 --- a/en/mq-ref.tex Thu Jan 29 22:47:34 2009 -0800 33.2 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 33.3 @@ -1,349 +0,0 @@ 33.4 -\chapter{Mercurial Queues reference} 33.5 -\label{chap:mqref} 33.6 - 33.7 -\section{MQ command reference} 33.8 -\label{sec:mqref:cmdref} 33.9 - 33.10 -For an overview of the commands provided by MQ, use the command 33.11 -\hgcmdargs{help}{mq}. 33.12 - 33.13 -\subsection{\hgxcmd{mq}{qapplied}---print applied patches} 33.14 - 33.15 -The \hgxcmd{mq}{qapplied} command prints the current stack of applied 33.16 -patches. Patches are printed in oldest-to-newest order, so the last 33.17 -patch in the list is the ``top'' patch. 33.18 - 33.19 -\subsection{\hgxcmd{mq}{qcommit}---commit changes in the queue repository} 33.20 - 33.21 -The \hgxcmd{mq}{qcommit} command commits any outstanding changes in the 33.22 -\sdirname{.hg/patches} repository. This command only works if the 33.23 -\sdirname{.hg/patches} directory is a repository, i.e.~you created the 33.24 -directory using \hgcmdargs{qinit}{\hgxopt{mq}{qinit}{-c}} or ran 33.25 -\hgcmd{init} in the directory after running \hgxcmd{mq}{qinit}. 33.26 - 33.27 -This command is shorthand for \hgcmdargs{commit}{--cwd .hg/patches}. 33.28 - 33.29 -\subsection{\hgxcmd{mq}{qdelete}---delete a patch from the 33.30 - \sfilename{series} file} 33.31 - 33.32 -The \hgxcmd{mq}{qdelete} command removes the entry for a patch from the 33.33 -\sfilename{series} file in the \sdirname{.hg/patches} directory. It 33.34 -does not pop the patch if the patch is already applied. By default, 33.35 -it does not delete the patch file; use the \hgxopt{mq}{qdel}{-f} option to 33.36 -do that. 33.37 - 33.38 -Options: 33.39 -\begin{itemize} 33.40 -\item[\hgxopt{mq}{qdel}{-f}] Delete the patch file. 33.41 -\end{itemize} 33.42 - 33.43 -\subsection{\hgxcmd{mq}{qdiff}---print a diff of the topmost applied patch} 33.44 - 33.45 -The \hgxcmd{mq}{qdiff} command prints a diff of the topmost applied patch. 
33.46 -It is equivalent to \hgcmdargs{diff}{-r-2:-1}. 33.47 - 33.48 -\subsection{\hgxcmd{mq}{qfold}---merge (``fold'') several patches into one} 33.49 - 33.50 -The \hgxcmd{mq}{qfold} command merges multiple patches into the topmost 33.51 -applied patch, so that the topmost applied patch makes the union of 33.52 -all of the changes in the patches in question. 33.53 - 33.54 -The patches to fold must not be applied; \hgxcmd{mq}{qfold} will exit with 33.55 -an error if any is. The order in which patches are folded is 33.56 -significant; \hgcmdargs{qfold}{a b} means ``apply the current topmost 33.57 -patch, followed by \texttt{a}, followed by \texttt{b}''. 33.58 - 33.59 -The comments from the folded patches are appended to the comments of 33.60 -the destination patch, with each block of comments separated by three 33.61 -asterisk (``\texttt{*}'') characters. Use the \hgxopt{mq}{qfold}{-e} 33.62 -option to edit the commit message for the combined patch/changeset 33.63 -after the folding has completed. 33.64 - 33.65 -Options: 33.66 -\begin{itemize} 33.67 -\item[\hgxopt{mq}{qfold}{-e}] Edit the commit message and patch description 33.68 - for the newly folded patch. 33.69 -\item[\hgxopt{mq}{qfold}{-l}] Use the contents of the given file as the new 33.70 - commit message and patch description for the folded patch. 33.71 -\item[\hgxopt{mq}{qfold}{-m}] Use the given text as the new commit message 33.72 - and patch description for the folded patch. 33.73 -\end{itemize} 33.74 - 33.75 -\subsection{\hgxcmd{mq}{qheader}---display the header/description of a patch} 33.76 - 33.77 -The \hgxcmd{mq}{qheader} command prints the header, or description, of a 33.78 -patch. By default, it prints the header of the topmost applied patch. 33.79 -Given an argument, it prints the header of the named patch. 
33.80 - 33.81 -\subsection{\hgxcmd{mq}{qimport}---import a third-party patch into the queue} 33.82 - 33.83 -The \hgxcmd{mq}{qimport} command adds an entry for an external patch to the 33.84 -\sfilename{series} file, and copies the patch into the 33.85 -\sdirname{.hg/patches} directory. It adds the entry immediately after 33.86 -the topmost applied patch, but does not push the patch. 33.87 - 33.88 -If the \sdirname{.hg/patches} directory is a repository, 33.89 -\hgxcmd{mq}{qimport} automatically does an \hgcmd{add} of the imported 33.90 -patch. 33.91 - 33.92 -\subsection{\hgxcmd{mq}{qinit}---prepare a repository to work with MQ} 33.93 - 33.94 -The \hgxcmd{mq}{qinit} command prepares a repository to work with MQ. It 33.95 -creates a directory called \sdirname{.hg/patches}. 33.96 - 33.97 -Options: 33.98 -\begin{itemize} 33.99 -\item[\hgxopt{mq}{qinit}{-c}] Create \sdirname{.hg/patches} as a repository 33.100 - in its own right. Also creates a \sfilename{.hgignore} file that 33.101 - will ignore the \sfilename{status} file. 33.102 -\end{itemize} 33.103 - 33.104 -When the \sdirname{.hg/patches} directory is a repository, the 33.105 -\hgxcmd{mq}{qimport} and \hgxcmd{mq}{qnew} commands automatically \hgcmd{add} 33.106 -new patches. 33.107 - 33.108 -\subsection{\hgxcmd{mq}{qnew}---create a new patch} 33.109 - 33.110 -The \hgxcmd{mq}{qnew} command creates a new patch. It takes one mandatory 33.111 -argument, the name to use for the patch file. The newly created patch 33.112 -is created empty by default. It is added to the \sfilename{series} 33.113 -file after the current topmost applied patch, and is immediately 33.114 -pushed on top of that patch. 33.115 - 33.116 -If \hgxcmd{mq}{qnew} finds modified files in the working directory, it will 33.117 -refuse to create a new patch unless the \hgxopt{mq}{qnew}{-f} option is 33.118 -used (see below). This behaviour allows you to \hgxcmd{mq}{qrefresh} your 33.119 -topmost applied patch before you apply a new patch on top of it. 
33.120 - 33.121 -Options: 33.122 -\begin{itemize} 33.123 -\item[\hgxopt{mq}{qnew}{-f}] Create a new patch even if the contents of the 33.124 - working directory are modified. Any outstanding modifications are 33.125 - added to the newly created patch, so after this command completes, 33.126 - the working directory will no longer be modified. 33.127 -\item[\hgxopt{mq}{qnew}{-m}] Use the given text as the commit message. 33.128 - This text will be stored at the beginning of the patch file, before 33.129 - the patch data. 33.130 -\end{itemize} 33.131 - 33.132 -\subsection{\hgxcmd{mq}{qnext}---print the name of the next patch} 33.133 - 33.134 -The \hgxcmd{mq}{qnext} command prints the name of the next patch in 33.135 -the \sfilename{series} file after the topmost applied patch. This 33.136 -patch will become the topmost applied patch if you run \hgxcmd{mq}{qpush}. 33.137 - 33.138 -\subsection{\hgxcmd{mq}{qpop}---pop patches off the stack} 33.139 - 33.140 -The \hgxcmd{mq}{qpop} command removes applied patches from the top of the 33.141 -stack of applied patches. By default, it removes only one patch. 33.142 - 33.143 -This command removes the changesets that represent the popped patches 33.144 -from the repository, and updates the working directory to undo the 33.145 -effects of the patches. 33.146 - 33.147 -This command takes an optional argument, which it uses as the name or 33.148 -index of the patch to pop to. If given a name, it will pop patches 33.149 -until the named patch is the topmost applied patch. If given a 33.150 -number, \hgxcmd{mq}{qpop} treats the number as an index into the entries in 33.151 -the series file, counting from zero (empty lines and lines containing 33.152 -only comments do not count). It pops patches until the patch 33.153 -identified by the given index is the topmost applied patch. 33.154 - 33.155 -The \hgxcmd{mq}{qpop} command does not read or write patches or the 33.156 -\sfilename{series} file.
It is thus safe to \hgxcmd{mq}{qpop} a patch that 33.157 -you have removed from the \sfilename{series} file, or a patch that you 33.158 -have renamed or deleted entirely. In the latter two cases, use the 33.159 -name of the patch as it was when you applied it. 33.160 - 33.161 -By default, the \hgxcmd{mq}{qpop} command will not pop any patches if the 33.162 -working directory has been modified. You can override this behaviour 33.163 -using the \hgxopt{mq}{qpop}{-f} option, which reverts all modifications in 33.164 -the working directory. 33.165 - 33.166 -Options: 33.167 -\begin{itemize} 33.168 -\item[\hgxopt{mq}{qpop}{-a}] Pop all applied patches. This returns the 33.169 - repository to its state before you applied any patches. 33.170 -\item[\hgxopt{mq}{qpop}{-f}] Forcibly revert any modifications to the 33.171 - working directory when popping. 33.172 -\item[\hgxopt{mq}{qpop}{-n}] Pop a patch from the named queue. 33.173 -\end{itemize} 33.174 - 33.175 -The \hgxcmd{mq}{qpop} command removes one line from the end of the 33.176 -\sfilename{status} file for each patch that it pops. 33.177 - 33.178 -\subsection{\hgxcmd{mq}{qprev}---print the name of the previous patch} 33.179 - 33.180 -The \hgxcmd{mq}{qprev} command prints the name of the patch in the 33.181 -\sfilename{series} file that comes before the topmost applied patch. 33.182 -This will become the topmost applied patch if you run \hgxcmd{mq}{qpop}. 33.183 - 33.184 -\subsection{\hgxcmd{mq}{qpush}---push patches onto the stack} 33.185 -\label{sec:mqref:cmd:qpush} 33.186 - 33.187 -The \hgxcmd{mq}{qpush} command adds patches onto the applied stack. By 33.188 -default, it adds only one patch. 33.189 - 33.190 -This command creates a new changeset to represent each applied patch, 33.191 -and updates the working directory to apply the effects of the patches. 
33.192 - 33.193 -The default data used when creating a changeset are as follows: 33.194 -\begin{itemize} 33.195 -\item The commit date and time zone are the current date and time 33.196 - zone. Because these data are used to compute the identity of a 33.197 - changeset, this means that if you \hgxcmd{mq}{qpop} a patch and 33.198 - \hgxcmd{mq}{qpush} it again, the changeset that you push will have a 33.199 - different identity than the changeset you popped. 33.200 -\item The author is the same as the default used by the \hgcmd{commit} 33.201 - command. 33.202 -\item The commit message is any text from the patch file that comes 33.203 - before the first diff header. If there is no such text, a default 33.204 - commit message is used that identifies the name of the patch. 33.205 -\end{itemize} 33.206 -If a patch contains a Mercurial patch header (XXX add link), the 33.207 -information in the patch header overrides these defaults. 33.208 - 33.209 -Options: 33.210 -\begin{itemize} 33.211 -\item[\hgxopt{mq}{qpush}{-a}] Push all unapplied patches from the 33.212 - \sfilename{series} file until there are none left to push. 33.213 -\item[\hgxopt{mq}{qpush}{-l}] Add the name of the patch to the end 33.214 - of the commit message. 33.215 -\item[\hgxopt{mq}{qpush}{-m}] If a patch fails to apply cleanly, use the 33.216 - entry for the patch in another saved queue to compute the parameters 33.217 - for a three-way merge, and perform a three-way merge using the 33.218 - normal Mercurial merge machinery. Use the resolution of the merge 33.219 - as the new patch content. 33.220 -\item[\hgxopt{mq}{qpush}{-n}] Use the named queue if merging while pushing. 33.221 -\end{itemize} 33.222 - 33.223 -The \hgxcmd{mq}{qpush} command reads, but does not modify, the 33.224 -\sfilename{series} file. It appends one line to the \sfilename{status} 33.225 -file for each patch that it pushes.
33.226 - 33.227 -\subsection{\hgxcmd{mq}{qrefresh}---update the topmost applied patch} 33.228 - 33.229 -The \hgxcmd{mq}{qrefresh} command updates the topmost applied patch. It 33.230 -modifies the patch, removes the old changeset that represented the 33.231 -patch, and creates a new changeset to represent the modified patch. 33.232 - 33.233 -The \hgxcmd{mq}{qrefresh} command looks for the following modifications: 33.234 -\begin{itemize} 33.235 -\item Changes to the commit message, i.e.~the text before the first 33.236 - diff header in the patch file, are reflected in the new changeset 33.237 - that represents the patch. 33.238 -\item Modifications to tracked files in the working directory are 33.239 - added to the patch. 33.240 -\item Changes to the files tracked using \hgcmd{add}, \hgcmd{copy}, 33.241 - \hgcmd{remove}, or \hgcmd{rename}. Added files and copy and rename 33.242 - destinations are added to the patch, while removed files and rename 33.243 - sources are removed. 33.244 -\end{itemize} 33.245 - 33.246 -Even if \hgxcmd{mq}{qrefresh} detects no changes, it still recreates the 33.247 -changeset that represents the patch. This causes the identity of the 33.248 -changeset to differ from the previous changeset that identified the 33.249 -patch. 33.250 - 33.251 -Options: 33.252 -\begin{itemize} 33.253 -\item[\hgxopt{mq}{qrefresh}{-e}] Modify the commit and patch description, 33.254 - using the preferred text editor. 33.255 -\item[\hgxopt{mq}{qrefresh}{-m}] Modify the commit message and patch 33.256 - description, using the given text. 33.257 -\item[\hgxopt{mq}{qrefresh}{-l}] Modify the commit message and patch 33.258 - description, using text from the given file. 33.259 -\end{itemize} 33.260 - 33.261 -\subsection{\hgxcmd{mq}{qrename}---rename a patch} 33.262 - 33.263 -The \hgxcmd{mq}{qrename} command renames a patch, and changes the entry for 33.264 -the patch in the \sfilename{series} file. 
33.265 - 33.266 -With a single argument, \hgxcmd{mq}{qrename} renames the topmost applied 33.267 -patch. With two arguments, it renames its first argument to its 33.268 -second. 33.269 - 33.270 -\subsection{\hgxcmd{mq}{qrestore}---restore saved queue state} 33.271 - 33.272 -XXX No idea what this does. 33.273 - 33.274 -\subsection{\hgxcmd{mq}{qsave}---save current queue state} 33.275 - 33.276 -XXX Likewise. 33.277 - 33.278 -\subsection{\hgxcmd{mq}{qseries}---print the entire patch series} 33.279 - 33.280 -The \hgxcmd{mq}{qseries} command prints the entire patch series from the 33.281 -\sfilename{series} file. It prints only patch names, not empty lines 33.282 -or comments. It prints them in order, from the first patch to be applied to the last. 33.283 - 33.284 -\subsection{\hgxcmd{mq}{qtop}---print the name of the current patch} 33.285 - 33.286 -The \hgxcmd{mq}{qtop} command prints the name of the topmost applied 33.287 -patch. 33.288 - 33.289 -\subsection{\hgxcmd{mq}{qunapplied}---print patches not yet applied} 33.290 - 33.291 -The \hgxcmd{mq}{qunapplied} command prints the names of patches from the 33.292 -\sfilename{series} file that are not yet applied. It prints them in 33.293 -order from the next patch that will be pushed to the last. 33.294 - 33.295 -\subsection{\hgcmd{strip}---remove a revision and descendants} 33.296 - 33.297 -The \hgcmd{strip} command removes a revision, and all of its 33.298 -descendants, from the repository. It undoes the effects of the 33.299 -removed revisions from the repository, and updates the working 33.300 -directory to the first parent of the removed revision. 33.301 - 33.302 -The \hgcmd{strip} command saves a backup of the removed changesets in 33.303 -a bundle, so that they can be reapplied if removed in error. 33.304 - 33.305 -Options: 33.306 -\begin{itemize} 33.307 -\item[\hgopt{strip}{-b}] Save unrelated changesets that are intermixed 33.308 - with the stripped changesets in the backup bundle.
33.309 -\item[\hgopt{strip}{-f}] If a branch has multiple heads, remove all 33.310 - heads. XXX This should be renamed, and use \texttt{-f} to strip revs 33.311 - when there are pending changes. 33.312 -\item[\hgopt{strip}{-n}] Do not save a backup bundle. 33.313 -\end{itemize} 33.314 - 33.315 -\section{MQ file reference} 33.316 - 33.317 -\subsection{The \sfilename{series} file} 33.318 - 33.319 -The \sfilename{series} file contains a list of the names of all 33.320 -patches that MQ can apply. It is represented as a list of names, with 33.321 -one name saved per line. Leading and trailing white space in each 33.322 -line are ignored. 33.323 - 33.324 -Lines may contain comments. A comment begins with the ``\texttt{\#}'' 33.325 -character, and extends to the end of the line. Empty lines, and lines 33.326 -that contain only comments, are ignored. 33.327 - 33.328 -You will often need to edit the \sfilename{series} file by hand, hence 33.329 -the support for comments and empty lines noted above. For example, 33.330 -you can comment out a patch temporarily, and \hgxcmd{mq}{qpush} will skip 33.331 -over that patch when applying patches. You can also change the order 33.332 -in which patches are applied by reordering their entries in the 33.333 -\sfilename{series} file. 33.334 - 33.335 -Placing the \sfilename{series} file under revision control is also 33.336 -supported; it is a good idea to place all of the patches that it 33.337 -refers to under revision control, as well. If you create a patch 33.338 -directory using the \hgxopt{mq}{qinit}{-c} option to \hgxcmd{mq}{qinit}, this 33.339 -will be done for you automatically. 33.340 - 33.341 -\subsection{The \sfilename{status} file} 33.342 - 33.343 -The \sfilename{status} file contains the names and changeset hashes of 33.344 -all patches that MQ currently has applied. Unlike the 33.345 -\sfilename{series} file, this file is not intended for editing. 
You 33.346 -should not place this file under revision control, or modify it in any 33.347 -way. It is used by MQ strictly for internal book-keeping. 33.348 - 33.349 -%%% Local Variables: 33.350 -%%% mode: latex 33.351 -%%% TeX-master: "00book" 33.352 -%%% End:
34.1 --- a/en/mq.tex Thu Jan 29 22:47:34 2009 -0800 34.2 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 34.3 @@ -1,1043 +0,0 @@ 34.4 -\chapter{Managing change with Mercurial Queues} 34.5 -\label{chap:mq} 34.6 - 34.7 -\section{The patch management problem} 34.8 -\label{sec:mq:patch-mgmt} 34.9 - 34.10 -Here is a common scenario: you need to install a software package from 34.11 -source, but you find a bug that you must fix in the source before you 34.12 -can start using the package. You make your changes, forget about the 34.13 -package for a while, and a few months later you need to upgrade to a 34.14 -newer version of the package. If the newer version of the package 34.15 -still has the bug, you must extract your fix from the older source 34.16 -tree and apply it against the newer version. This is a tedious task, 34.17 -and it's easy to make mistakes. 34.18 - 34.19 -This is a simple case of the ``patch management'' problem. You have 34.20 -an ``upstream'' source tree that you can't change; you need to make 34.21 -some local changes on top of the upstream tree; and you'd like to be 34.22 -able to keep those changes separate, so that you can apply them to 34.23 -newer versions of the upstream source. 34.24 - 34.25 -The patch management problem arises in many situations. Probably the 34.26 -most visible is that a user of an open source software project will 34.27 -contribute a bug fix or new feature to the project's maintainers in the 34.28 -form of a patch. 34.29 - 34.30 -Distributors of operating systems that include open source software 34.31 -often need to make changes to the packages they distribute so that 34.32 -they will build properly in their environments. 34.33 - 34.34 -When you have few changes to maintain, it is easy to manage a single 34.35 -patch using the standard \command{diff} and \command{patch} programs 34.36 -(see section~\ref{sec:mq:patch} for a discussion of these tools). 
34.37 -Once the number of changes grows, it starts to make sense to maintain 34.38 -patches as discrete ``chunks of work,'' so that for example a single 34.39 -patch will contain only one bug fix (the patch might modify several 34.40 -files, but it's doing ``only one thing''), and you may have a number 34.41 -of such patches for different bugs you need fixed and local changes 34.42 -you require. In this situation, if you submit a bug fix patch to the 34.43 -upstream maintainers of a package and they include your fix in a 34.44 -subsequent release, you can simply drop that single patch when you're 34.45 -updating to the newer release. 34.46 - 34.47 -Maintaining a single patch against an upstream tree is a little 34.48 -tedious and error-prone, but not difficult. However, the complexity 34.49 -of the problem grows rapidly as the number of patches you have to 34.50 -maintain increases. With more than a tiny number of patches in hand, 34.51 -understanding which ones you have applied and maintaining them moves 34.52 -from messy to overwhelming. 34.53 - 34.54 -Fortunately, Mercurial includes a powerful extension, Mercurial Queues 34.55 -(or simply ``MQ''), that massively simplifies the patch management 34.56 -problem. 34.57 - 34.58 -\section{The prehistory of Mercurial Queues} 34.59 -\label{sec:mq:history} 34.60 - 34.61 -During the late 1990s, several Linux kernel developers started to 34.62 -maintain ``patch series'' that modified the behaviour of the Linux 34.63 -kernel. Some of these series were focused on stability, some on 34.64 -feature coverage, and others were more speculative. 34.65 - 34.66 -The sizes of these patch series grew rapidly. In 2002, Andrew Morton 34.67 -published some shell scripts he had been using to automate the task of 34.68 -managing his patch queues. Andrew was successfully using these 34.69 -scripts to manage hundreds (sometimes thousands) of patches on top of 34.70 -the Linux kernel. 
34.71 - 34.72 -\subsection{A patchwork quilt} 34.73 -\label{sec:mq:quilt} 34.74 - 34.75 -In early 2003, Andreas Gruenbacher and Martin Quinson borrowed the 34.76 -approach of Andrew's scripts and published a tool called ``patchwork 34.77 -quilt''~\cite{web:quilt}, or simply ``quilt'' 34.78 -(see~\cite{gruenbacher:2005} for a paper describing it). Because 34.79 -quilt substantially automated patch management, it rapidly gained a 34.80 -large following among open source software developers. 34.81 - 34.82 -Quilt manages a \emph{stack of patches} on top of a directory tree. 34.83 -To begin, you tell quilt to manage a directory tree, and tell it which 34.84 -files you want to manage; it stores away the names and contents of 34.85 -those files. To fix a bug, you create a new patch (using a single 34.86 -command), edit the files you need to fix, then ``refresh'' the patch. 34.87 - 34.88 -The refresh step causes quilt to scan the directory tree; it updates 34.89 -the patch with all of the changes you have made. You can create 34.90 -another patch on top of the first, which will track the changes 34.91 -required to modify the tree from ``tree with one patch applied'' to 34.92 -``tree with two patches applied''. 34.93 - 34.94 -You can \emph{change} which patches are applied to the tree. If you 34.95 -``pop'' a patch, the changes made by that patch will vanish from the 34.96 -directory tree. Quilt remembers which patches you have popped, 34.97 -though, so you can ``push'' a popped patch again, and the directory 34.98 -tree will be restored to contain the modifications in the patch. Most 34.99 -importantly, you can run the ``refresh'' command at any time, and the 34.100 -topmost applied patch will be updated. This means that you can, at 34.101 -any time, change both which patches are applied and what 34.102 -modifications those patches make. 
34.103 - 34.104 -Quilt knows nothing about revision control tools, so it works equally 34.105 -well on top of an unpacked tarball or a Subversion working copy. 34.106 - 34.107 -\subsection{From patchwork quilt to Mercurial Queues} 34.108 -\label{sec:mq:quilt-mq} 34.109 - 34.110 -In mid-2005, Chris Mason took the features of quilt and wrote an 34.111 -extension that he called Mercurial Queues, which added quilt-like 34.112 -behaviour to Mercurial. 34.113 - 34.114 -The key difference between quilt and MQ is that quilt knows nothing 34.115 -about revision control systems, while MQ is \emph{integrated} into 34.116 -Mercurial. Each patch that you push is represented as a Mercurial 34.117 -changeset. Pop a patch, and the changeset goes away. 34.118 - 34.119 -Because quilt does not care about revision control tools, it is still 34.120 -a tremendously useful piece of software to know about for situations 34.121 -where you cannot use Mercurial and MQ. 34.122 - 34.123 -\section{The huge advantage of MQ} 34.124 - 34.125 -I cannot overstate the value that MQ offers through the unification of 34.126 -patches and revision control. 34.127 - 34.128 -A major reason that patches have persisted in the free software and 34.129 -open source world---in spite of the availability of increasingly 34.130 -capable revision control tools over the years---is the \emph{agility} 34.131 -they offer. 34.132 - 34.133 -Traditional revision control tools make a permanent, irreversible 34.134 -record of everything that you do. While this has great value, it's 34.135 -also somewhat stifling. If you want to perform a wild-eyed 34.136 -experiment, you have to be careful in how you go about it, or you risk 34.137 -leaving unneeded---or worse, misleading or destabilising---traces of 34.138 -your missteps and errors in the permanent revision record. 34.139 - 34.140 -By contrast, MQ's marriage of distributed revision control with 34.141 -patches makes it much easier to isolate your work. 
Your patches live 34.142 -on top of normal revision history, and you can make them disappear or 34.143 -reappear at will. If you don't like a patch, you can drop it. If a 34.144 -patch isn't quite as you want it to be, simply fix it---as many times 34.145 -as you need to, until you have refined it into the form you desire. 34.146 - 34.147 -As an example, the integration of patches with revision control makes 34.148 -understanding patches and debugging their effects---and their 34.149 -interplay with the code they're based on---\emph{enormously} easier. 34.150 -Since every applied patch has an associated changeset, you can use 34.151 -\hgcmdargs{log}{\emph{filename}} to see which changesets and patches 34.152 -affected a file. You can use the \hgext{bisect} command to 34.153 -binary-search through all changesets and applied patches to see where 34.154 -a bug got introduced or fixed. You can use the \hgcmd{annotate} 34.155 -command to see which changeset or patch modified a particular line of 34.156 -a source file. And so on. 34.157 - 34.158 -\section{Understanding patches} 34.159 -\label{sec:mq:patch} 34.160 - 34.161 -Because MQ doesn't hide its patch-oriented nature, it is helpful to 34.162 -understand what patches are, and a little about the tools that work 34.163 -with them. 34.164 - 34.165 -The traditional Unix \command{diff} command compares two files, and 34.166 -prints a list of differences between them. The \command{patch} command 34.167 -understands these differences as \emph{modifications} to make to a 34.168 -file. Take a look at figure~\ref{ex:mq:diff} for a simple example of 34.169 -these commands in action. 
34.170 - 34.171 -\begin{figure}[ht] 34.172 - \interaction{mq.dodiff.diff} 34.173 - \caption{Simple uses of the \command{diff} and \command{patch} commands} 34.174 - \label{ex:mq:diff} 34.175 -\end{figure} 34.176 - 34.177 -The type of file that \command{diff} generates (and \command{patch} 34.178 -takes as input) is called a ``patch'' or a ``diff''; there is no 34.179 -difference between a patch and a diff. (We'll use the term ``patch'', 34.180 -since it's more commonly used.) 34.181 - 34.182 -A patch file can start with arbitrary text; the \command{patch} 34.183 -command ignores this text, but MQ uses it as the commit message when 34.184 -creating changesets. To find the beginning of the patch content, 34.185 -\command{patch} searches for the first line that starts with the 34.186 -string ``\texttt{diff~-}''. 34.187 - 34.188 -MQ works with \emph{unified} diffs (\command{patch} can accept several 34.189 -other diff formats, but MQ doesn't). A unified diff contains two 34.190 -kinds of header. The \emph{file header} describes the file being 34.191 -modified; it contains the name of the file to modify. When 34.192 -\command{patch} sees a new file header, it looks for a file with that 34.193 -name to start modifying. 34.194 - 34.195 -After the file header comes a series of \emph{hunks}. Each hunk 34.196 -starts with a header; this identifies the range of line numbers within 34.197 -the file that the hunk should modify. Following the header, a hunk 34.198 -starts and ends with a few (usually three) lines of text from the 34.199 -unmodified file; these are called the \emph{context} for the hunk. If 34.200 -there's only a small amount of context between successive hunks, 34.201 -\command{diff} doesn't print a new hunk header; it just runs the hunks 34.202 -together, with a few lines of context between modifications. 34.203 - 34.204 -Each line of context begins with a space character. 
Within the hunk, 34.205 -a line that begins with ``\texttt{-}'' means ``remove this line,'' 34.206 -while a line that begins with ``\texttt{+}'' means ``insert this 34.207 -line.'' For example, a line that is modified is represented by one 34.208 -deletion and one insertion. 34.209 - 34.210 -We will return to some of the more subtle aspects of patches later (in 34.211 -section~\ref{sec:mq:adv-patch}), but you should have enough information 34.212 -now to use MQ. 34.213 - 34.214 -\section{Getting started with Mercurial Queues} 34.215 -\label{sec:mq:start} 34.216 - 34.217 -Because MQ is implemented as an extension, you must explicitly enable it 34.218 -before you can use it. (You don't need to download anything; MQ ships 34.219 -with the standard Mercurial distribution.) To enable MQ, edit your 34.220 -\tildefile{.hgrc} file, and add the lines in figure~\ref{ex:mq:config}. 34.221 - 34.222 -\begin{figure}[ht] 34.223 - \begin{codesample4} 34.224 - [extensions] 34.225 - hgext.mq = 34.226 - \end{codesample4} 34.227 - \caption{Contents to add to \tildefile{.hgrc} to enable the MQ extension} 34.228 - \label{ex:mq:config} 34.229 -\end{figure} 34.230 - 34.231 -Once the extension is enabled, it will make a number of new commands 34.232 -available. To verify that the extension is working, you can use 34.233 -\hgcmd{help} to see if the \hgxcmd{mq}{qinit} command is now available; see 34.234 -the example in figure~\ref{ex:mq:enabled}. 34.235 - 34.236 -\begin{figure}[ht] 34.237 - \interaction{mq.qinit-help.help} 34.238 - \caption{How to verify that MQ is enabled} 34.239 - \label{ex:mq:enabled} 34.240 -\end{figure} 34.241 - 34.242 -You can use MQ with \emph{any} Mercurial repository, and its commands 34.243 -only operate within that repository. To get started, simply prepare 34.244 -the repository using the \hgxcmd{mq}{qinit} command (see 34.245 -figure~\ref{ex:mq:qinit}). This command creates an empty directory 34.246 -called \sdirname{.hg/patches}, where MQ will keep its metadata.
As 34.247 -with many Mercurial commands, the \hgxcmd{mq}{qinit} command prints nothing 34.248 -if it succeeds. 34.249 - 34.250 -\begin{figure}[ht] 34.251 - \interaction{mq.tutorial.qinit} 34.252 - \caption{Preparing a repository for use with MQ} 34.253 - \label{ex:mq:qinit} 34.254 -\end{figure} 34.255 - 34.256 -\begin{figure}[ht] 34.257 - \interaction{mq.tutorial.qnew} 34.258 - \caption{Creating a new patch} 34.259 - \label{ex:mq:qnew} 34.260 -\end{figure} 34.261 - 34.262 -\subsection{Creating a new patch} 34.263 - 34.264 -To begin work on a new patch, use the \hgxcmd{mq}{qnew} command. This 34.265 -command takes one argument, the name of the patch to create. MQ will 34.266 -use this as the name of an actual file in the \sdirname{.hg/patches} 34.267 -directory, as you can see in figure~\ref{ex:mq:qnew}. 34.268 - 34.269 -Also newly present in the \sdirname{.hg/patches} directory are two 34.270 -other files, \sfilename{series} and \sfilename{status}. The 34.271 -\sfilename{series} file lists all of the patches that MQ knows about 34.272 -for this repository, with one patch per line. Mercurial uses the 34.273 -\sfilename{status} file for internal book-keeping; it tracks all of the 34.274 -patches that MQ has \emph{applied} in this repository. 34.275 - 34.276 -\begin{note} 34.277 - You may sometimes want to edit the \sfilename{series} file by hand; 34.278 - for example, to change the sequence in which some patches are 34.279 - applied. However, manually editing the \sfilename{status} file is 34.280 - almost always a bad idea, as it's easy to corrupt MQ's idea of what 34.281 - is happening. 34.282 -\end{note} 34.283 - 34.284 -Once you have created your new patch, you can edit files in the 34.285 -working directory as you usually would. All of the normal Mercurial 34.286 -commands, such as \hgcmd{diff} and \hgcmd{annotate}, work exactly as 34.287 -they did before. 
34.288 - 34.289 -\subsection{Refreshing a patch} 34.290 - 34.291 -When you reach a point where you want to save your work, use the 34.292 -\hgxcmd{mq}{qrefresh} command (figure~\ref{ex:mq:qrefresh}) to update the patch 34.293 -you are working on. This command folds the changes you have made in 34.294 -the working directory into your patch, and updates its corresponding 34.295 -changeset to contain those changes. 34.296 - 34.297 -\begin{figure}[ht] 34.298 - \interaction{mq.tutorial.qrefresh} 34.299 - \caption{Refreshing a patch} 34.300 - \label{ex:mq:qrefresh} 34.301 -\end{figure} 34.302 - 34.303 -You can run \hgxcmd{mq}{qrefresh} as often as you like, so it's a good way 34.304 -to ``checkpoint'' your work. Refresh your patch at an opportune 34.305 -time; try an experiment; and if the experiment doesn't work out, 34.306 -\hgcmd{revert} your modifications back to the last time you refreshed. 34.307 - 34.308 -\begin{figure}[ht] 34.309 - \interaction{mq.tutorial.qrefresh2} 34.310 - \caption{Refresh a patch many times to accumulate changes} 34.311 - \label{ex:mq:qrefresh2} 34.312 -\end{figure} 34.313 - 34.314 -\subsection{Stacking and tracking patches} 34.315 - 34.316 -Once you have finished working on a patch, or need to work on another, 34.317 -you can use the \hgxcmd{mq}{qnew} command again to create a new patch. 34.318 -Mercurial will apply this patch on top of your existing patch. See 34.319 -figure~\ref{ex:mq:qnew2} for an example. Notice that the patch 34.320 -contains the changes in our prior patch as part of its context (you 34.321 -can see this more clearly in the output of \hgcmd{annotate}). 34.322 - 34.323 -\begin{figure}[ht] 34.324 - \interaction{mq.tutorial.qnew2} 34.325 - \caption{Stacking a second patch on top of the first} 34.326 - \label{ex:mq:qnew2} 34.327 -\end{figure} 34.328 - 34.329 -So far, with the exception of \hgxcmd{mq}{qnew} and \hgxcmd{mq}{qrefresh}, we've 34.330 -been careful to only use regular Mercurial commands.
However, MQ 34.331 -provides many commands that are easier to use when you are thinking 34.332 -about patches, as illustrated in figure~\ref{ex:mq:qseries}: 34.333 - 34.334 -\begin{itemize} 34.335 -\item The \hgxcmd{mq}{qseries} command lists every patch that MQ knows 34.336 - about in this repository, from oldest to newest (most recently 34.337 - \emph{created}). 34.338 -\item The \hgxcmd{mq}{qapplied} command lists every patch that MQ has 34.339 - \emph{applied} in this repository, again from oldest to newest (most 34.340 - recently applied). 34.341 -\end{itemize} 34.342 - 34.343 -\begin{figure}[ht] 34.344 - \interaction{mq.tutorial.qseries} 34.345 - \caption{Understanding the patch stack with \hgxcmd{mq}{qseries} and 34.346 - \hgxcmd{mq}{qapplied}} 34.347 - \label{ex:mq:qseries} 34.348 -\end{figure} 34.349 - 34.350 -\subsection{Manipulating the patch stack} 34.351 - 34.352 -The previous discussion implied that there must be a difference 34.353 -between ``known'' and ``applied'' patches, and there is. MQ can 34.354 -manage a patch without it being applied in the repository. 34.355 - 34.356 -An \emph{applied} patch has a corresponding changeset in the 34.357 -repository, and the effects of the patch and changeset are visible in 34.358 -the working directory. You can undo the application of a patch using 34.359 -the \hgxcmd{mq}{qpop} command. MQ still \emph{knows about}, or manages, a 34.360 -popped patch, but the patch no longer has a corresponding changeset in 34.361 -the repository, and the working directory does not contain the changes 34.362 -made by the patch. Figure~\ref{fig:mq:stack} illustrates the 34.363 -difference between applied and tracked patches. 
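The difference between known and applied patches can be modelled in a few lines of Python. This is only a sketch of the bookkeeping idea, not MQ's implementation: the class name \texttt{PatchStack}, its methods, and the patch names are invented for illustration, and the model ignores changesets, the working directory, and guards. Popping removes a patch from the applied set but leaves it in the series; pushing (covered next) reverses that.

```python
class PatchStack:
    """Toy model: an ordered series of known patches, a prefix of which is applied."""

    def __init__(self, series):
        self.series = list(series)   # every patch the queue knows about
        self.applied_count = 0       # how many of them are currently applied

    def known(self):
        """Analogue of qseries: all patches, applied or not."""
        return list(self.series)

    def applied(self):
        """Analogue of qapplied: only the applied prefix of the series."""
        return self.series[:self.applied_count]

    def push(self):
        """Apply the next unapplied patch, if there is one."""
        if self.applied_count == len(self.series):
            raise IndexError("all patches are already applied")
        self.applied_count += 1

    def pop(self):
        """Unapply the topmost applied patch; it stays in the series."""
        if self.applied_count == 0:
            raise IndexError("no patches applied")
        self.applied_count -= 1


# Hypothetical patch names, used only for this illustration.
stack = PatchStack(["first.patch", "second.patch"])
stack.push()
stack.push()
stack.pop()
print(stack.known())    # both patches are still known
print(stack.applied())  # only first.patch is still applied
```

The point of the model is that the list of known patches never changes when you pop or push; only the boundary between applied and unapplied patches moves.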
34.364 - 34.365 -\begin{figure}[ht] 34.366 - \centering 34.367 - \grafix{mq-stack} 34.368 - \caption{Applied and unapplied patches in the MQ patch stack} 34.369 - \label{fig:mq:stack} 34.370 -\end{figure} 34.371 - 34.372 -You can reapply an unapplied, or popped, patch using the \hgxcmd{mq}{qpush} 34.373 -command. This creates a new changeset to correspond to the patch, and 34.374 -the patch's changes once again become present in the working 34.375 -directory. See figure~\ref{ex:mq:qpop} for examples of \hgxcmd{mq}{qpop} 34.376 -and \hgxcmd{mq}{qpush} in action. Notice that once we have popped a patch 34.377 -or two patches, the output of \hgxcmd{mq}{qseries} remains the same, while 34.378 -that of \hgxcmd{mq}{qapplied} has changed. 34.379 - 34.380 -\begin{figure}[ht] 34.381 - \interaction{mq.tutorial.qpop} 34.382 - \caption{Modifying the stack of applied patches} 34.383 - \label{ex:mq:qpop} 34.384 -\end{figure} 34.385 - 34.386 -\subsection{Pushing and popping many patches} 34.387 - 34.388 -While \hgxcmd{mq}{qpush} and \hgxcmd{mq}{qpop} each operate on a single patch at 34.389 -a time by default, you can push and pop many patches in one go. The 34.390 -\hgxopt{mq}{qpush}{-a} option to \hgxcmd{mq}{qpush} causes it to push all 34.391 -unapplied patches, while the \hgxopt{mq}{qpop}{-a} option to \hgxcmd{mq}{qpop} 34.392 -causes it to pop all applied patches. (For some more ways to push and 34.393 -pop many patches, see section~\ref{sec:mq:perf} below.) 34.394 - 34.395 -\begin{figure}[ht] 34.396 - \interaction{mq.tutorial.qpush-a} 34.397 - \caption{Pushing all unapplied patches} 34.398 - \label{ex:mq:qpush-a} 34.399 -\end{figure} 34.400 - 34.401 -\subsection{Safety checks, and overriding them} 34.402 - 34.403 -Several MQ commands check the working directory before they do 34.404 -anything, and fail if they find any modifications. They do this to 34.405 -ensure that you won't lose any changes that you have made, but not yet 34.406 -incorporated into a patch. 
Figure~\ref{ex:mq:add} illustrates this; 34.407 -the \hgxcmd{mq}{qnew} command will not create a new patch if there are 34.408 -outstanding changes, caused in this case by the \hgcmd{add} of 34.409 -\filename{file3}. 34.410 - 34.411 -\begin{figure}[ht] 34.412 - \interaction{mq.tutorial.add} 34.413 - \caption{Forcibly creating a patch} 34.414 - \label{ex:mq:add} 34.415 -\end{figure} 34.416 - 34.417 -Commands that check the working directory all take an ``I know what 34.418 -I'm doing'' option, which is always named \option{-f}. The exact 34.419 -meaning of \option{-f} depends on the command. For example, 34.420 -\hgcmdargs{qnew}{\hgxopt{mq}{qnew}{-f}} will incorporate any outstanding 34.421 -changes into the new patch it creates, but 34.422 -\hgcmdargs{qpop}{\hgxopt{mq}{qpop}{-f}} will revert modifications to any 34.423 -files affected by the patch that it is popping. Be sure to read the 34.424 -documentation for a command's \option{-f} option before you use it! 34.425 - 34.426 -\subsection{Working on several patches at once} 34.427 - 34.428 -The \hgxcmd{mq}{qrefresh} command always refreshes the \emph{topmost} 34.429 -applied patch. This means that you can suspend work on one patch (by 34.430 -refreshing it), pop or push to make a different patch the top, and 34.431 -work on \emph{that} patch for a while. 34.432 - 34.433 -Here's an example that illustrates how you can use this ability. 34.434 -Let's say you're developing a new feature as two patches. The first 34.435 -is a change to the core of your software, and the second---layered on 34.436 -top of the first---changes the user interface to use the code you just 34.437 -added to the core. If you notice a bug in the core while you're 34.438 -working on the UI patch, it's easy to fix the core. Simply 34.439 -\hgxcmd{mq}{qrefresh} the UI patch to save your in-progress changes, and 34.440 -\hgxcmd{mq}{qpop} down to the core patch. 
Fix the core bug, 34.441 -\hgxcmd{mq}{qrefresh} the core patch, and \hgxcmd{mq}{qpush} back to the UI 34.442 -patch to continue where you left off. 34.443 - 34.444 -\section{More about patches} 34.445 -\label{sec:mq:adv-patch} 34.446 - 34.447 -MQ uses the GNU \command{patch} command to apply patches, so it's 34.448 -helpful to know a few more detailed aspects of how \command{patch} 34.449 -works, and about patches themselves. 34.450 - 34.451 -\subsection{The strip count} 34.452 - 34.453 -If you look at the file headers in a patch, you will notice that the 34.454 -pathnames usually have an extra component on the front that isn't 34.455 -present in the actual path name. This is a holdover from the way that 34.456 -people used to generate patches (people still do this, but it's 34.457 -somewhat rare with modern revision control tools). 34.458 - 34.459 -Alice would unpack a tarball, edit her files, then decide that she 34.460 -wanted to create a patch. So she'd rename her working directory, 34.461 -unpack the tarball again (hence the need for the rename), and use the 34.462 -\cmdopt{diff}{-r} and \cmdopt{diff}{-N} options to \command{diff} to 34.463 -recursively generate a patch between the unmodified directory and the 34.464 -modified one. The result would be that the name of the unmodified 34.465 -directory would be at the front of the left-hand path in every file 34.466 -header, and the name of the modified directory would be at the front 34.467 -of the right-hand path. 34.468 - 34.469 -Since someone receiving a patch from the Alices of the net would be 34.470 -unlikely to have unmodified and modified directories with exactly the 34.471 -same names, the \command{patch} command has a \cmdopt{patch}{-p} 34.472 -option that indicates the number of leading path name components to 34.473 -strip when trying to apply a patch. This number is called the 34.474 -\emph{strip count}. 34.475 - 34.476 -An option of ``\texttt{-p1}'' means ``use a strip count of one''. 
If 34.477 -\command{patch} sees a file name \filename{foo/bar/baz} in a file 34.478 -header, it will strip \filename{foo} and try to patch a file named 34.479 -\filename{bar/baz}. (Strictly speaking, the strip count refers to the 34.480 -number of \emph{path separators} (and the components that go with them) 34.481 -to strip. A strip count of one will turn \filename{foo/bar} into 34.482 -\filename{bar}, but \filename{/foo/bar} (notice the extra leading 34.483 -slash) into \filename{foo/bar}.) 34.484 - 34.485 -The ``standard'' strip count for patches is one; almost all patches 34.486 -contain one leading path name component that needs to be stripped. 34.487 -Mercurial's \hgcmd{diff} command generates path names in this form, 34.488 -and the \hgcmd{import} command and MQ expect patches to have a strip 34.489 -count of one. 34.490 - 34.491 -If you receive a patch from someone that you want to add to your patch 34.492 -queue, and the patch needs a strip count other than one, you cannot 34.493 -just \hgxcmd{mq}{qimport} the patch, because \hgxcmd{mq}{qimport} does not yet 34.494 -have a \texttt{-p} option (see~\bug{311}). Your best bet is to 34.495 -\hgxcmd{mq}{qnew} a patch of your own, then use \cmdargs{patch}{-p\emph{N}} 34.496 -to apply their patch, followed by \hgcmd{addremove} to pick up any 34.497 -files added or removed by the patch, followed by \hgxcmd{mq}{qrefresh}. 34.498 -This complexity may become unnecessary; see~\bug{311} for details. 34.499 -\subsection{Strategies for applying a patch} 34.500 - 34.501 -When \command{patch} applies a hunk, it tries a handful of 34.502 -successively less accurate strategies to make the hunk apply. 34.503 -This falling-back technique often makes it possible to take a patch 34.504 -that was generated against an old version of a file, and apply it 34.505 -against a newer version of that file. 
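To make the falling-back idea concrete, here is a small Python sketch. It is a simplified model written for this explanation, not the algorithm that GNU \command{patch} actually uses: the function name \texttt{locate\_hunk}, its parameters, and the one-letter ``lines'' are all assumptions. It tries the expected position first, then searches nearby positions, and only then trims context lines from both ends of the hunk.

```python
def locate_hunk(file_lines, old_lines, expected_start, context=3, max_fuzz=2):
    """Return (offset, fuzz) for where a hunk's old text matches, or None.

    old_lines is the text the hunk expects to find (context plus removed
    lines), expected_start is its 0-based position according to the hunk
    header, and context is the number of context lines at each end.
    Simplified model only; real patch implementations differ in detail.
    """
    def matches_at(start, lines):
        return start >= 0 and file_lines[start:start + len(lines)] == lines

    # Never trim more than the available context.
    for fuzz in range(min(max_fuzz, context) + 1):
        trimmed = old_lines[fuzz:len(old_lines) - fuzz]
        if not trimmed:
            break
        base = expected_start + fuzz
        # Distance 0 is the expected position; larger distances are offsets.
        for distance in range(len(file_lines) + 1):
            for start in (base - distance, base + distance):
                if matches_at(start, trimmed):
                    return start - base, fuzz
    return None  # the hunk would be rejected


original = list("abcdefghij")      # ten one-letter "lines"
hunk_old = list("cdefghi")         # 'f' with three lines of context each side

shifted = ["x", "y"] + original                                 # lines added above
edited = ["C" if line == "c" else line for line in original]    # context changed
rewritten = ["F" if line == "f" else line for line in original] # target changed

print(locate_hunk(original, hunk_old, 2))   # exact match
print(locate_hunk(shifted, hunk_old, 2))    # matches at an offset
print(locate_hunk(edited, hunk_old, 2))     # matches only with reduced context
print(locate_hunk(rewritten, hunk_old, 2))  # no match at all
```

In this model a nonzero first element corresponds to a hunk applied at an offset, a nonzero second element to a hunk applied with fuzz, and \texttt{None} to a rejected hunk.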
34.506 - 34.507 -First, \command{patch} tries an exact match, where the line numbers, 34.508 -the context, and the text to be modified must apply exactly. If it 34.509 -cannot make an exact match, it tries to find an exact match for the 34.510 -context, without honouring the line numbering information. If this 34.511 -succeeds, it prints a line of output saying that the hunk was applied, 34.512 -but at some \emph{offset} from the original line number. 34.513 - 34.514 -If a context-only match fails, \command{patch} removes the first and 34.515 -last lines of the context, and tries a \emph{reduced} context-only 34.516 -match. If the hunk with reduced context succeeds, it prints a message 34.517 -saying that it applied the hunk with a \emph{fuzz factor} (the number 34.518 -after the fuzz factor indicates how many lines of context 34.519 -\command{patch} had to trim before the patch applied). 34.520 - 34.521 -When neither of these techniques works, \command{patch} prints a 34.522 -message saying that the hunk in question was rejected. It saves 34.523 -rejected hunks (also simply called ``rejects'') to a file with the 34.524 -same name, and an added \sfilename{.rej} extension. It also saves an 34.525 -unmodified copy of the file with a \sfilename{.orig} extension; the 34.526 -copy of the file without any extensions will contain any changes made 34.527 -by hunks that \emph{did} apply cleanly. If you have a patch that 34.528 -modifies \filename{foo} with six hunks, and one of them fails to 34.529 -apply, you will have: an unmodified \filename{foo.orig}, a 34.530 -\filename{foo.rej} containing one hunk, and \filename{foo}, containing 34.531 -the changes made by the five successful hunks. 34.532 - 34.533 -\subsection{Some quirks of patch representation} 34.534 - 34.535 -There are a few useful things to know about how \command{patch} works 34.536 -with files. 
34.537 -\begin{itemize} 34.538 -\item This should already be obvious, but \command{patch} cannot 34.539 - handle binary files. 34.540 -\item Neither does it care about the executable bit; it creates new 34.541 - files as readable, but not executable. 34.542 -\item \command{patch} treats the removal of a file as a diff between 34.543 - the file to be removed and the empty file. So your idea of ``I 34.544 - deleted this file'' looks like ``every line of this file was 34.545 - deleted'' in a patch. 34.546 -\item It treats the addition of a file as a diff between the empty 34.547 - file and the file to be added. So in a patch, your idea of ``I 34.548 - added this file'' looks like ``every line of this file was added''. 34.549 -\item It treats a renamed file as the removal of the old name, and the 34.550 - addition of the new name. This means that renamed files have a big 34.551 - footprint in patches. (Note also that Mercurial does not currently 34.552 - try to infer when files have been renamed or copied in a patch.) 34.553 -\item \command{patch} cannot represent empty files, so you cannot use 34.554 - a patch to represent the notion ``I added this empty file to the 34.555 - tree''. 34.556 -\end{itemize} 34.557 -\subsection{Beware the fuzz} 34.558 - 34.559 -While applying a hunk at an offset, or with a fuzz factor, will often 34.560 -be completely successful, these inexact techniques naturally leave 34.561 -open the possibility of corrupting the patched file. The most common 34.562 -cases typically involve applying a patch twice, or at an incorrect 34.563 -location in the file. If \command{patch} or \hgxcmd{mq}{qpush} ever 34.564 -mentions an offset or fuzz factor, you should make sure that the 34.565 -modified files are correct afterwards. 34.566 - 34.567 -It's often a good idea to refresh a patch that has applied with an 34.568 -offset or fuzz factor; refreshing the patch generates new context 34.569 -information that will make it apply cleanly. 
I say ``often,'' not 34.570 -``always,'' because sometimes refreshing a patch will make it fail to 34.571 -apply against a different revision of the underlying files. In some 34.572 -cases, such as when you're maintaining a patch that must sit on top of 34.573 -multiple versions of a source tree, it's acceptable to have a patch 34.574 -apply with some fuzz, provided you've verified the results of the 34.575 -patching process in such cases. 34.576 - 34.577 -\subsection{Handling rejection} 34.578 - 34.579 -If \hgxcmd{mq}{qpush} fails to apply a patch, it will print an error 34.580 -message and exit. If it has left \sfilename{.rej} files behind, it is 34.581 -usually best to fix up the rejected hunks before you push more patches 34.582 -or do any further work. 34.583 - 34.584 -If your patch \emph{used to} apply cleanly, and no longer does because 34.585 -you've changed the underlying code that your patches are based on, 34.586 -Mercurial Queues can help; see section~\ref{sec:mq:merge} for details. 34.587 - 34.588 -Unfortunately, there aren't any great techniques for dealing with 34.589 -rejected hunks. Most often, you'll need to view the \sfilename{.rej} 34.590 -file and edit the target file, applying the rejected hunks by hand. 34.591 - 34.592 -If you're feeling adventurous, Neil Brown, a Linux kernel hacker, 34.593 -wrote a tool called \command{wiggle}~\cite{web:wiggle}, which is more 34.594 -vigorous than \command{patch} in its attempts to make a patch apply. 34.595 - 34.596 -Another Linux kernel hacker, Chris Mason (the author of Mercurial 34.597 -Queues), wrote a similar tool called 34.598 -\command{mpatch}~\cite{web:mpatch}, which takes a simple approach to 34.599 -automating the application of hunks rejected by \command{patch}. The 34.600 -\command{mpatch} command can help with four common reasons that a hunk 34.601 -may be rejected: 34.602 - 34.603 -\begin{itemize} 34.604 -\item The context in the middle of a hunk has changed. 
34.605 -\item A hunk is missing some context at the beginning or end. 34.606 -\item A large hunk might apply better---either entirely or in 34.607 - part---if it were broken up into smaller hunks. 34.608 -\item A hunk removes lines with slightly different content than those 34.609 - currently present in the file. 34.610 -\end{itemize} 34.611 - 34.612 -If you use \command{wiggle} or \command{mpatch}, you should be doubly 34.613 -careful to check your results when you're done. In fact, 34.614 -\command{mpatch} enforces this method of double-checking the tool's 34.615 -output, by automatically dropping you into a merge program when it has 34.616 -done its job, so that you can verify its work and finish off any 34.617 -remaining merges. 34.618 - 34.619 -\section{Getting the best performance out of MQ} 34.620 -\label{sec:mq:perf} 34.621 - 34.622 -MQ is very efficient at handling a large number of patches. I ran 34.623 -some performance experiments in mid-2006 for a talk that I gave at the 34.624 -2006 EuroPython conference~\cite{web:europython}. I used as my data 34.625 -set the Linux 2.6.17-mm1 patch series, which consists of 1,738 34.626 -patches. I applied these on top of a Linux kernel repository 34.627 -containing all 27,472 revisions between Linux 2.6.12-rc2 and Linux 34.628 -2.6.17. 34.629 - 34.630 -On my old, slow laptop, I was able to 34.631 -\hgcmdargs{qpush}{\hgxopt{mq}{qpush}{-a}} all 1,738 patches in 3.5 minutes, 34.632 -and \hgcmdargs{qpop}{\hgxopt{mq}{qpop}{-a}} them all in 30 seconds. (On a 34.633 -newer laptop, the time to push all patches dropped to two minutes.) I 34.634 -could \hgxcmd{mq}{qrefresh} one of the biggest patches (which made 22,779 34.635 -lines of changes to 287 files) in 6.6 seconds. 34.636 - 34.637 -Clearly, MQ is well suited to working in large trees, but there are a 34.638 -few tricks you can use to get the best performance out of it. 34.639 - 34.640 -First of all, try to ``batch'' operations together. 
Every time you 34.641 -run \hgxcmd{mq}{qpush} or \hgxcmd{mq}{qpop}, these commands scan the working 34.642 -directory once to make sure you haven't made some changes and then 34.643 -forgotten to run \hgxcmd{mq}{qrefresh}. On a small tree, the time that 34.644 -this scan takes is unnoticeable. However, on a medium-sized tree 34.645 -(containing tens of thousands of files), it can take a second or more. 34.646 - 34.647 -The \hgxcmd{mq}{qpush} and \hgxcmd{mq}{qpop} commands allow you to push and pop 34.648 -multiple patches at a time. You can identify the ``destination 34.649 -patch'' that you want to end up at. When you \hgxcmd{mq}{qpush} with a 34.650 -destination specified, it will push patches until that patch is at the 34.651 -top of the applied stack. When you \hgxcmd{mq}{qpop} to a destination, MQ 34.652 -will pop patches until the destination patch is at the top. 34.653 - 34.654 -You can identify a destination patch either by the name of the 34.655 -patch or by number. If you use numeric addressing, patches are 34.656 -counted from zero; this means that the first patch is zero, the second 34.657 -is one, and so on. 34.658 - 34.659 -\section{Updating your patches when the underlying code changes} 34.660 -\label{sec:mq:merge} 34.661 - 34.662 -It's common to have a stack of patches on top of an underlying 34.663 -repository that you don't modify directly. If you're working on 34.664 -changes to third-party code, or on a feature that is taking longer to 34.665 -develop than the rate of change of the code beneath, you will often 34.666 -need to sync up with the underlying code, and fix up any hunks in your 34.667 -patches that no longer apply. This is called \emph{rebasing} your 34.668 -patch series. 34.669 - 34.670 -The simplest way to do this is to \hgcmdargs{qpop}{\hgxopt{mq}{qpop}{-a}} 34.671 -your patches, then \hgcmd{pull} changes into the underlying 34.672 -repository, and finally \hgcmdargs{qpush}{\hgxopt{mq}{qpush}{-a}} your 34.673 -patches again. 
MQ will stop pushing any time it runs across a patch 34.674 -that fails to apply because of conflicts, allowing you to fix the 34.675 -conflicts, \hgxcmd{mq}{qrefresh} the affected patch, and continue pushing 34.676 -until you have fixed your entire stack. 34.677 - 34.678 -This approach is easy to use and works well if you don't expect 34.679 -changes to the underlying code to affect how well your patches apply. 34.680 -If your patch stack touches code that is modified frequently or 34.681 -invasively in the underlying repository, however, fixing up rejected 34.682 -hunks by hand quickly becomes tiresome. 34.683 - 34.684 -It's possible to partially automate the rebasing process. If your 34.685 -patches apply cleanly against some revision of the underlying repo, MQ 34.686 -can use this information to help you to resolve conflicts between your 34.687 -patches and a different revision. 34.688 - 34.689 -The process is a little involved. 34.690 -\begin{enumerate} 34.691 -\item To begin, \hgcmdargs{qpush}{-a} all of your patches on top of 34.692 - the revision where you know that they apply cleanly. 34.693 -\item Save a backup copy of your patch directory using 34.694 - \hgcmdargs{qsave}{\hgxopt{mq}{qsave}{-e} \hgxopt{mq}{qsave}{-c}}. This prints 34.695 - the name of the directory that it has saved the patches in. It will 34.696 - save the patches to a directory called 34.697 - \sdirname{.hg/patches.\emph{N}}, where \texttt{\emph{N}} is a small 34.698 - integer. It also commits a ``save changeset'' on top of your 34.699 - applied patches; this is for internal book-keeping, and records the 34.700 - states of the \sfilename{series} and \sfilename{status} files. 34.701 -\item Use \hgcmd{pull} to bring new changes into the underlying 34.702 - repository. (Don't run \hgcmdargs{pull}{-u}; see below for why.) 34.703 -\item Update to the new tip revision, using 34.704 - \hgcmdargs{update}{\hgopt{update}{-C}} to override the patches you 34.705 - have pushed. 
34.706 -\item Merge all patches using \hgcmdargs{qpush}{\hgxopt{mq}{qpush}{-m} 34.707 - \hgxopt{mq}{qpush}{-a}}. The \hgxopt{mq}{qpush}{-m} option to \hgxcmd{mq}{qpush} 34.708 - tells MQ to perform a three-way merge if the patch fails to apply. 34.709 -\end{enumerate} 34.710 - 34.711 -During the \hgcmdargs{qpush}{\hgxopt{mq}{qpush}{-m}}, each patch in the 34.712 -\sfilename{series} file is applied normally. If a patch applies with 34.713 -fuzz or rejects, MQ looks at the queue you \hgxcmd{mq}{qsave}d, and 34.714 -performs a three-way merge with the corresponding changeset. This 34.715 -merge uses Mercurial's normal merge machinery, so it may pop up a GUI 34.716 -merge tool to help you to resolve problems. 34.717 - 34.718 -When you finish resolving the effects of a patch, MQ refreshes your 34.719 -patch based on the result of the merge. 34.720 - 34.721 -At the end of this process, your repository will have one extra head 34.722 -from the old patch queue, and a copy of the old patch queue will be in 34.723 -\sdirname{.hg/patches.\emph{N}}. You can remove the extra head using 34.724 -\hgcmdargs{qpop}{\hgxopt{mq}{qpop}{-a} \hgxopt{mq}{qpop}{-n} patches.\emph{N}} 34.725 -or \hgcmd{strip}. You can delete \sdirname{.hg/patches.\emph{N}} once 34.726 -you are sure that you no longer need it as a backup. 34.727 - 34.728 -\section{Identifying patches} 34.729 - 34.730 -MQ commands that work with patches let you refer to a patch either by 34.731 -using its name or by a number. By name is obvious enough; pass the 34.732 -name \filename{foo.patch} to \hgxcmd{mq}{qpush}, for example, and it will 34.733 -push patches until \filename{foo.patch} is applied. 34.734 - 34.735 -As a shortcut, you can refer to a patch using both a name and a 34.736 -numeric offset; \texttt{foo.patch-2} means ``two patches before 34.737 -\texttt{foo.patch}'', while \texttt{bar.patch+4} means ``four patches 34.738 -after \texttt{bar.patch}''. 
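The name-plus-offset shortcut can be sketched as a small lookup function. This is an illustration of the rule described above, not MQ's own lookup code: the function name \texttt{resolve\_patch}, the choice to let an exact name win over the offset syntax, and the patch names in the example series are all assumptions.

```python
import re


def resolve_patch(series, ref):
    """Return the index in series that ref names, or raise LookupError.

    ref is either a patch name or a name followed by -N or +N, meaning
    N patches before or after the named patch.
    """
    if ref in series:
        # Assumption of this sketch: an exact name always wins, so a patch
        # that happens to be called "fix-2" is still reachable by name.
        return series.index(ref)
    match = re.fullmatch(r"(.+)([+-])(\d+)", ref)
    if match and match.group(1) in series:
        base = series.index(match.group(1))
        delta = int(match.group(3))
        index = base - delta if match.group(2) == "-" else base + delta
        if 0 <= index < len(series):
            return index
    raise LookupError("patch %s not in series" % ref)


# Hypothetical series, oldest patch first.
series = ["core.patch", "foo.patch", "ui.patch", "bar.patch"]

print(series[resolve_patch(series, "foo.patch")])     # the patch itself
print(series[resolve_patch(series, "ui.patch-2")])    # two patches before
print(series[resolve_patch(series, "core.patch+3")])  # three patches after
```

An offset that would fall off either end of the series is treated as an error here rather than being clamped; that, too, is a choice made for the sketch.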
34.739 - 34.740 -Referring to a patch by index isn't much different. The first patch 34.741 -printed in the output of \hgxcmd{mq}{qseries} is patch zero (yes, it's one 34.742 -of those start-at-zero counting systems); the second is patch one; and 34.743 -so on. 34.744 - 34.745 -MQ also makes it easy to work with patches when you are using normal 34.746 -Mercurial commands. Every command that accepts a changeset ID will 34.747 -also accept the name of an applied patch. MQ augments the tags 34.748 -normally in the repository with an eponymous one for each applied 34.749 -patch. In addition, the special tags \index{tags!special tag 34.750 - names!\texttt{qbase}}\texttt{qbase} and \index{tags!special tag 34.751 - names!\texttt{qtip}}\texttt{qtip} identify the ``bottom-most'' and 34.752 -topmost applied patches, respectively. 34.753 - 34.754 -These additions to Mercurial's normal tagging capabilities make 34.755 -dealing with patches even more of a breeze. 34.756 -\begin{itemize} 34.757 -\item Want to patchbomb a mailing list with your latest series of 34.758 - changes? 34.759 - \begin{codesample4} 34.760 - hg email qbase:qtip 34.761 - \end{codesample4} 34.762 - (Don't know what ``patchbombing'' is? See 34.763 - section~\ref{sec:hgext:patchbomb}.) 34.764 -\item Need to see all of the patches since \texttt{foo.patch} that 34.765 - have touched files in a subdirectory of your tree? 34.766 - \begin{codesample4} 34.767 - hg log -r foo.patch:qtip \emph{subdir} 34.768 - \end{codesample4} 34.769 -\end{itemize} 34.770 - 34.771 -Because MQ makes the names of patches available to the rest of 34.772 -Mercurial through its normal internal tag machinery, you don't need to 34.773 -type in the entire name of a patch when you want to identify it by 34.774 -name. 
34.775 - 34.776 -\begin{figure}[ht] 34.777 - \interaction{mq.id.output} 34.778 - \caption{Using MQ's tag features to work with patches} 34.779 - \label{ex:mq:id} 34.780 -\end{figure} 34.781 - 34.782 -Another nice consequence of representing patch names as tags is that 34.783 -when you run the \hgcmd{log} command, it will display a patch's name 34.784 -as a tag, simply as part of its normal output. This makes it easy to 34.785 -visually distinguish applied patches from underlying ``normal'' 34.786 -revisions. Figure~\ref{ex:mq:id} shows a few normal Mercurial 34.787 -commands in use with applied patches. 34.788 - 34.789 -\section{Useful things to know about} 34.790 - 34.791 -There are a number of aspects of MQ usage that don't fit tidily into 34.792 -sections of their own, but that are good to know. Here they are, in 34.793 -one place. 34.794 - 34.795 -\begin{itemize} 34.796 -\item Normally, when you \hgxcmd{mq}{qpop} a patch and \hgxcmd{mq}{qpush} it 34.797 - again, the changeset that represents the patch after the pop/push 34.798 - will have a \emph{different identity} than the changeset that 34.799 - represented the patch beforehand. See 34.800 - section~\ref{sec:mqref:cmd:qpush} for information as to why this is. 34.801 -\item It's not a good idea to \hgcmd{merge} changes from another 34.802 - branch with a patch changeset, at least if you want to maintain the 34.803 - ``patchiness'' of that changeset and changesets below it on the 34.804 - patch stack. If you try to do this, it will appear to succeed, but 34.805 - MQ will become confused. 34.806 -\end{itemize} 34.807 - 34.808 -\section{Managing patches in a repository} 34.809 -\label{sec:mq:repo} 34.810 - 34.811 -Because MQ's \sdirname{.hg/patches} directory resides outside a 34.812 -Mercurial repository's working directory, the ``underlying'' Mercurial 34.813 -repository knows nothing about the management or presence of patches. 
34.814 - 34.815 -This presents the interesting possibility of managing the contents of 34.816 -the patch directory as a Mercurial repository in its own right. This 34.817 -can be a useful way to work. For example, you can work on a patch for 34.818 -a while, \hgxcmd{mq}{qrefresh} it, then \hgcmd{commit} the current state of 34.819 -the patch. This lets you ``roll back'' to that version of the patch 34.820 -later on. 34.821 - 34.822 -You can then share different versions of the same patch stack among 34.823 -multiple underlying repositories. I use this when I am developing a 34.824 -Linux kernel feature. I have a pristine copy of my kernel sources for 34.825 -each of several CPU architectures, and a cloned repository under each 34.826 -that contains the patches I am working on. When I want to test a 34.827 -change on a different architecture, I push my current patches to the 34.828 -patch repository associated with that kernel tree, pop and push all of 34.829 -my patches, and build and test that kernel. 34.830 - 34.831 -Managing patches in a repository makes it possible for multiple 34.832 -developers to work on the same patch series without colliding with 34.833 -each other, all on top of an underlying source base that they may or 34.834 -may not control. 34.835 - 34.836 -\subsection{MQ support for patch repositories} 34.837 - 34.838 -MQ helps you to work with the \sdirname{.hg/patches} directory as a 34.839 -repository; when you prepare a repository for working with patches 34.840 -using \hgxcmd{mq}{qinit}, you can pass the \hgxopt{mq}{qinit}{-c} option to 34.841 -create the \sdirname{.hg/patches} directory as a Mercurial repository. 34.842 - 34.843 -\begin{note} 34.844 - If you forget to use the \hgxopt{mq}{qinit}{-c} option, you can simply go 34.845 - into the \sdirname{.hg/patches} directory at any time and run 34.846 - \hgcmd{init}. 
Don't forget to add an entry for the 34.847 - \sfilename{status} file to the \sfilename{.hgignore} file, though 34.849 - (\hgcmdargs{qinit}{\hgxopt{mq}{qinit}{-c}} does this for you 34.850 - automatically); you \emph{really} don't want to manage the 34.851 - \sfilename{status} file. 34.852 -\end{note} 34.853 - 34.854 -As a convenience, if MQ notices that the \dirname{.hg/patches} 34.855 -directory is a repository, it will automatically \hgcmd{add} every 34.856 -patch that you create and import. 34.857 - 34.858 -MQ provides a shortcut command, \hgxcmd{mq}{qcommit}, that runs 34.859 -\hgcmd{commit} in the \sdirname{.hg/patches} directory. This saves 34.860 -some bothersome typing. 34.861 - 34.862 -Finally, as a convenience to manage the patch directory, you can 34.863 -define the alias \command{mq} on Unix systems. For example, on Linux 34.864 -systems using the \command{bash} shell, you can include the following 34.865 -snippet in your \tildefile{.bashrc}. 34.866 - 34.867 -\begin{codesample2} 34.868 - alias mq='hg -R \$(hg root)/.hg/patches' 34.869 -\end{codesample2} 34.870 - 34.871 -You can then issue commands of the form \cmdargs{mq}{pull} from 34.872 -the main repository. 34.873 - 34.874 -\subsection{A few things to watch out for} 34.875 - 34.876 -MQ's support for working with a repository full of patches is limited 34.877 -in a few small respects. 34.878 - 34.879 -MQ cannot automatically detect changes that you make to the patch 34.880 -directory. If you \hgcmd{pull}, manually edit, or \hgcmd{update} 34.881 -changes to patches or the \sfilename{series} file, you will have to 34.882 -\hgcmdargs{qpop}{\hgxopt{mq}{qpop}{-a}} and then 34.883 -\hgcmdargs{qpush}{\hgxopt{mq}{qpush}{-a}} in the underlying repository to 34.884 -see those changes show up there. If you forget to do this, you can 34.885 -confuse MQ's idea of which patches are applied. 
34.886 - 34.887 -\section{Third party tools for working with patches} 34.888 -\label{sec:mq:tools} 34.889 - 34.890 -Once you've been working with patches for a while, you'll find 34.891 -yourself hungry for tools that will help you to understand and 34.892 -manipulate the patches you're dealing with. 34.893 - 34.894 -The \command{diffstat} command~\cite{web:diffstat} generates a 34.895 -histogram of the modifications made to each file in a patch. It 34.896 -provides a good way to ``get a sense of'' a patch---which files it 34.897 -affects, and how much change it introduces to each file and as a 34.898 -whole. (I find that it's a good idea to use \command{diffstat}'s 34.899 -\cmdopt{diffstat}{-p} option as a matter of course, as otherwise it 34.900 -will try to do clever things with prefixes of file names that 34.901 -inevitably confuse at least me.) 34.902 - 34.903 -\begin{figure}[ht] 34.904 - \interaction{mq.tools.tools} 34.905 - \caption{The \command{diffstat}, \command{filterdiff}, and \command{lsdiff} commands} 34.906 - \label{ex:mq:tools} 34.907 -\end{figure} 34.908 - 34.909 -The \package{patchutils} package~\cite{web:patchutils} is invaluable. 34.910 -It provides a set of small utilities that follow the ``Unix 34.911 -philosophy;'' each does one useful thing with a patch. The 34.912 -\package{patchutils} command I use most is \command{filterdiff}, which 34.913 -extracts subsets from a patch file. For example, given a patch that 34.914 -modifies hundreds of files across dozens of directories, a single 34.915 -invocation of \command{filterdiff} can generate a smaller patch that 34.916 -only touches files whose names match a particular glob pattern. See 34.917 -section~\ref{mq-collab:tips:interdiff} for another example. 
34.918 - 34.919 -\section{Good ways to work with patches} 34.920 - 34.921 -Whether you are working on a patch series to submit to a free software 34.922 -or open source project, or a series that you intend to treat as a 34.923 -sequence of regular changesets when you're done, you can use some 34.924 -simple techniques to keep your work well organised. 34.925 - 34.926 -Give your patches descriptive names. A good name for a patch might be 34.927 -\filename{rework-device-alloc.patch}, because it will immediately give 34.928 -you a hint of what the purpose of the patch is. Long names shouldn't be 34.929 -a problem; you won't be typing the names often, but you \emph{will} be 34.930 -running commands like \hgxcmd{mq}{qapplied} and \hgxcmd{mq}{qtop} over and over. 34.931 -Good naming becomes especially important when you have a number of 34.932 -patches to work with, or if you are juggling a number of different 34.933 -tasks and your patches only get a fraction of your attention. 34.934 - 34.935 -Be aware of what patch you're working on. Use the \hgxcmd{mq}{qtop} 34.936 -command and skim over the text of your patches frequently---for 34.937 -example, using \hgcmdargs{tip}{\hgopt{tip}{-p}}---to be sure of where 34.938 -you stand. I have several times worked on and \hgxcmd{mq}{qrefresh}ed a 34.939 -patch other than the one I intended, and it's often tricky to migrate 34.940 -changes into the right patch after making them in the wrong one. 34.941 - 34.942 -For this reason, it is very much worth investing a little time to 34.943 -learn how to use some of the third-party tools I described in 34.944 -section~\ref{sec:mq:tools}, particularly \command{diffstat} and 34.945 -\command{filterdiff}. The former will give you a quick idea of what 34.946 -changes your patch is making, while the latter makes it easy to splice 34.947 -hunks selectively out of one patch and into another.
34.948 - 34.949 -\section{MQ cookbook} 34.950 - 34.951 -\subsection{Manage ``trivial'' patches} 34.952 - 34.953 -Because the overhead of dropping files into a new Mercurial repository 34.954 -is so low, it makes a lot of sense to manage patches this way even if 34.955 -you simply want to make a few changes to a source tarball that you 34.956 -downloaded. 34.957 - 34.958 -Begin by downloading and unpacking the source tarball, 34.959 -and turning it into a Mercurial repository. 34.960 -\interaction{mq.tarball.download} 34.961 - 34.962 -Continue by creating a patch stack and making your changes. 34.963 -\interaction{mq.tarball.qinit} 34.964 - 34.965 -Let's say a few weeks or months pass, and your package author releases 34.966 -a new version. First, bring their changes into the repository. 34.967 -\interaction{mq.tarball.newsource} 34.968 -The pipeline starting with \hgcmd{locate} above deletes all files in 34.969 -the working directory, so that \hgcmd{commit}'s 34.970 -\hgopt{commit}{--addremove} option can actually tell which files have 34.971 -really been removed in the newer version of the source. 34.972 - 34.973 -Finally, you can apply your patches on top of the new tree. 34.974 -\interaction{mq.tarball.repush} 34.975 - 34.976 -\subsection{Combining entire patches} 34.977 -\label{sec:mq:combine} 34.978 - 34.979 -MQ provides a command, \hgxcmd{mq}{qfold}, that lets you combine entire 34.980 -patches. This ``folds'' the patches you name, in the order you name 34.981 -them, into the topmost applied patch, and concatenates their 34.982 -descriptions onto the end of its description. The patches that you 34.983 -fold must be unapplied before you fold them. 34.984 - 34.985 -The order in which you fold patches matters.
If your topmost applied 34.986 -patch is \texttt{foo}, and you \hgxcmd{mq}{qfold} \texttt{bar} and 34.987 -\texttt{quux} into it, you will end up with a patch that has the same 34.988 -effect as if you applied first \texttt{foo}, then \texttt{bar}, 34.989 -followed by \texttt{quux}. 34.990 - 34.991 -\subsection{Merging part of one patch into another} 34.992 - 34.993 -Merging \emph{part} of one patch into another is more difficult than 34.994 -combining entire patches. 34.995 - 34.996 -If you want to move changes to entire files, you can use 34.997 -\command{filterdiff}'s \cmdopt{filterdiff}{-i} and 34.998 -\cmdopt{filterdiff}{-x} options to choose the modifications to snip 34.999 -out of one patch, concatenating its output onto the end of the patch 34.1000 -you want to merge into. You usually won't need to modify the patch 34.1001 -you've merged the changes from. Instead, MQ will report some rejected 34.1002 -hunks when you \hgxcmd{mq}{qpush} it (from the hunks you moved into the 34.1003 -other patch), and you can simply \hgxcmd{mq}{qrefresh} the patch to drop 34.1004 -the duplicate hunks. 34.1005 - 34.1006 -If you have a patch that has multiple hunks modifying a file, and you 34.1007 -only want to move a few of those hunks, the job becomes more messy, 34.1008 -but you can still partly automate it. Use \cmdargs{lsdiff}{-nvv} to 34.1009 -print some metadata about the patch. 34.1010 -\interaction{mq.tools.lsdiff} 34.1011 - 34.1012 -This command prints three different kinds of number: 34.1013 -\begin{itemize} 34.1014 -\item (in the first column) a \emph{file number} to identify each file 34.1015 - modified in the patch; 34.1016 -\item (on the next line, indented) the line number within a modified 34.1017 - file where a hunk starts; and 34.1018 -\item (on the same line) a \emph{hunk number} to identify that hunk. 
34.1019 -\end{itemize} 34.1020 - 34.1021 -You'll have to use some visual inspection, and reading of the patch, 34.1022 -to identify the file and hunk numbers you'll want, but you can then 34.1023 -pass them to \command{filterdiff}'s \cmdopt{filterdiff}{--files} 34.1024 -and \cmdopt{filterdiff}{--hunks} options, to select exactly the file 34.1025 -and hunk you want to extract. 34.1026 - 34.1027 -Once you have this hunk, you can concatenate it onto the end of your 34.1028 -destination patch and continue with the remainder of 34.1029 -section~\ref{sec:mq:combine}. 34.1030 - 34.1031 -\section{Differences between quilt and MQ} 34.1032 - 34.1033 -If you are already familiar with quilt, MQ provides a similar command 34.1034 -set. There are a few differences in the way that it works. 34.1035 - 34.1036 -You will already have noticed that most quilt commands have MQ 34.1037 -counterparts that simply begin with a ``\texttt{q}''. The exceptions 34.1038 -are quilt's \texttt{add} and \texttt{remove} commands, the 34.1039 -counterparts for which are the normal Mercurial \hgcmd{add} and 34.1040 -\hgcmd{remove} commands. There is no MQ equivalent of the quilt 34.1041 -\texttt{edit} command. 34.1042 - 34.1043 -%%% Local Variables: 34.1044 -%%% mode: latex 34.1045 -%%% TeX-master: "00book" 34.1046 -%%% End:
35.1 --- a/en/preface.tex Thu Jan 29 22:47:34 2009 -0800 35.2 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 35.3 @@ -1,67 +0,0 @@ 35.4 -\chapter*{Preface} 35.5 -\addcontentsline{toc}{chapter}{Preface} 35.6 -\label{chap:preface} 35.7 - 35.8 -Distributed revision control is a relatively new territory, and has 35.9 -thus far grown due to people's willingness to strike out into 35.10 -ill-charted territory. 35.11 - 35.12 -I am writing a book about distributed revision control because I 35.13 -believe that it is an important subject that deserves a field guide. 35.14 -I chose to write about Mercurial because it is the easiest tool to 35.15 -learn the terrain with, and yet it scales to the demands of real, 35.16 -challenging environments where many other revision control tools fail. 35.17 - 35.18 -\section{This book is a work in progress} 35.19 - 35.20 -I am releasing this book while I am still writing it, in the hope that 35.21 -it will prove useful to others. I also hope that readers will 35.22 -contribute as they see fit. 35.23 - 35.24 -\section{About the examples in this book} 35.25 - 35.26 -This book takes an unusual approach to code samples. Every example is 35.27 -``live''---each one is actually the result of a shell script that 35.28 -executes the Mercurial commands you see. Every time an image of the 35.29 -book is built from its sources, all the example scripts are 35.30 -automatically run, and their current results compared against their 35.31 -expected results. 35.32 - 35.33 -The advantage of this approach is that the examples are always 35.34 -accurate; they describe \emph{exactly} the behaviour of the version of 35.35 -Mercurial that's mentioned at the front of the book. If I update the 35.36 -version of Mercurial that I'm documenting, and the output of some 35.37 -command changes, the build fails. 
35.38 - 35.39 -There is a small disadvantage to this approach, which is that the 35.40 -dates and times you'll see in examples tend to be ``squashed'' 35.41 -together in a way that they wouldn't be if the same commands were 35.42 -being typed by a human. Where a human can issue no more than one 35.43 -command every few seconds, with any resulting timestamps 35.44 -correspondingly spread out, my automated example scripts run many 35.45 -commands in one second. 35.46 - 35.47 -As an instance of this, several consecutive commits in an example can 35.48 -show up as having occurred during the same second. You can see this 35.49 -occur in the \hgext{bisect} example in section~\ref{sec:undo:bisect}, 35.50 -for instance. 35.51 - 35.52 -So when you're reading examples, don't place too much weight on the 35.53 -dates or times you see in the output of commands. But \emph{do} be 35.54 -confident that the behaviour you're seeing is consistent and 35.55 -reproducible. 35.56 - 35.57 -\section{Colophon---this book is Free} 35.58 - 35.59 -This book is licensed under the Open Publication License, and is 35.60 -produced entirely using Free Software tools. It is typeset with 35.61 -\LaTeX{}; illustrations are drawn and rendered with 35.62 -\href{http://www.inkscape.org/}{Inkscape}. 35.63 - 35.64 -The complete source code for this book is published as a Mercurial 35.65 -repository, at \url{http://hg.serpentine.com/mercurial/book}. 35.66 - 35.67 -%%% Local Variables: 35.68 -%%% mode: latex 35.69 -%%% TeX-master: "00book" 35.70 -%%% End:
36.1 --- a/en/srcinstall.tex Thu Jan 29 22:47:34 2009 -0800 36.2 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 36.3 @@ -1,53 +0,0 @@ 36.4 -\chapter{Installing Mercurial from source} 36.5 -\label{chap:srcinstall} 36.6 - 36.7 -\section{On a Unix-like system} 36.8 -\label{sec:srcinstall:unixlike} 36.9 - 36.10 -If you are using a Unix-like system that has a sufficiently recent 36.11 -version of Python (2.3~or newer) available, it is easy to install 36.12 -Mercurial from source. 36.13 -\begin{enumerate} 36.14 -\item Download a recent source tarball from 36.15 - \url{http://www.selenic.com/mercurial/download}. 36.16 -\item Unpack the tarball: 36.17 - \begin{codesample4} 36.18 - gzip -dc mercurial-\emph{version}.tar.gz | tar xf - 36.19 - \end{codesample4} 36.20 -\item Go into the source directory and run the installer script. This 36.21 - will build Mercurial and install it in your home directory. 36.22 - \begin{codesample4} 36.23 - cd mercurial-\emph{version} 36.24 - python setup.py install --force --home=\$HOME 36.25 - \end{codesample4} 36.26 -\end{enumerate} 36.27 -Once the install finishes, Mercurial will be in the \texttt{bin} 36.28 -subdirectory of your home directory. Don't forget to make sure that 36.29 -this directory is present in your shell's search path. 36.30 - 36.31 -You will probably need to set the \envar{PYTHONPATH} environment 36.32 -variable so that the Mercurial executable can find the rest of the 36.33 -Mercurial packages. For example, on my laptop, I have set it to 36.34 -\texttt{/home/bos/lib/python}. The exact path that you will need to 36.35 -use depends on how Python was built for your system, but should be 36.36 -easy to figure out. If you're uncertain, look through the output of 36.37 -the installer script above, and see where the contents of the 36.38 -\texttt{mercurial} directory were installed to. 
36.39 - 36.40 -\section{On Windows} 36.41 - 36.42 -Building and installing Mercurial on Windows requires a variety of 36.43 -tools, a fair amount of technical knowledge, and considerable 36.44 -patience. I very much \emph{do not recommend} this route if you are a 36.45 -``casual user''. Unless you intend to hack on Mercurial, I strongly 36.46 -suggest that you use a binary package instead. 36.47 - 36.48 -If you are intent on building Mercurial from source on Windows, follow 36.49 -the ``hard way'' directions on the Mercurial wiki at 36.50 -\url{http://www.selenic.com/mercurial/wiki/index.cgi/WindowsInstall}, 36.51 -and expect the process to involve a lot of fiddly work. 36.52 - 36.53 -%%% Local Variables: 36.54 -%%% mode: latex 36.55 -%%% TeX-master: "00book" 36.56 -%%% End:
37.1 --- a/en/template.tex Thu Jan 29 22:47:34 2009 -0800 37.2 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 37.3 @@ -1,475 +0,0 @@ 37.4 -\chapter{Customising the output of Mercurial} 37.5 -\label{chap:template} 37.6 - 37.7 -Mercurial provides a powerful mechanism to let you control how it 37.8 -displays information. The mechanism is based on templates. You can 37.9 -use templates to generate specific output for a single command, or to 37.10 -customise the entire appearance of the built-in web interface. 37.11 - 37.12 -\section{Using precanned output styles} 37.13 -\label{sec:style} 37.14 - 37.15 -Packaged with Mercurial are some output styles that you can use 37.16 -immediately. A style is simply a precanned template that someone 37.17 -wrote and installed somewhere that Mercurial can find. 37.18 - 37.19 -Before we take a look at Mercurial's bundled styles, let's review its 37.20 -normal output. 37.21 - 37.22 -\interaction{template.simple.normal} 37.23 - 37.24 -This is somewhat informative, but it takes up a lot of space---five 37.25 -lines of output per changeset. The \texttt{compact} style reduces 37.26 -this to three lines, presented in a sparse manner. 37.27 - 37.28 -\interaction{template.simple.compact} 37.29 - 37.30 -The \texttt{changelog} style hints at the expressive power of 37.31 -Mercurial's templating engine. This style attempts to follow the GNU 37.32 -Project's changelog guidelines\cite{web:changelog}. 37.33 - 37.34 -\interaction{template.simple.changelog} 37.35 - 37.36 -You will not be shocked to learn that Mercurial's default output style 37.37 -is named \texttt{default}. 37.38 - 37.39 -\subsection{Setting a default style} 37.40 - 37.41 -You can modify the output style that Mercurial will use for every 37.42 -command by editing your \hgrc\ file, naming the style you would 37.43 -prefer to use. 
37.44 - 37.45 -\begin{codesample2} 37.46 - [ui] 37.47 - style = compact 37.48 -\end{codesample2} 37.49 - 37.50 -If you write a style of your own, you can use it by either providing 37.51 -the path to your style file, or copying your style file into a 37.52 -location where Mercurial can find it (typically the \texttt{templates} 37.53 -subdirectory of your Mercurial install directory). 37.54 - 37.55 -\section{Commands that support styles and templates} 37.56 - 37.57 -All of Mercurial's ``\texttt{log}-like'' commands let you use styles 37.58 -and templates: \hgcmd{incoming}, \hgcmd{log}, \hgcmd{outgoing}, and 37.59 -\hgcmd{tip}. 37.60 - 37.61 -As I write this manual, these are so far the only commands that 37.62 -support styles and templates. Since these are the most important 37.63 -commands that need customisable output, there has been little pressure 37.64 -from the Mercurial user community to add style and template support to 37.65 -other commands. 37.66 - 37.67 -\section{The basics of templating} 37.68 - 37.69 -At its simplest, a Mercurial template is a piece of text. Some of the 37.70 -text never changes, while other parts are \emph{expanded}, or replaced 37.71 -with new text, when necessary. 37.72 - 37.73 -Before we continue, let's look again at a simple example of 37.74 -Mercurial's normal output. 37.75 - 37.76 -\interaction{template.simple.normal} 37.77 - 37.78 -Now, let's run the same command, but using a template to change its 37.79 -output. 37.80 - 37.81 -\interaction{template.simple.simplest} 37.82 - 37.83 -The example above illustrates the simplest possible template; it's 37.84 -just a piece of static text, printed once for each changeset. The 37.85 -\hgopt{log}{--template} option to the \hgcmd{log} command tells 37.86 -Mercurial to use the given text as the template when printing each 37.87 -changeset. 37.88 - 37.89 -Notice that the template string above ends with the text 37.90 -``\Verb+\n+''. 
This is an \emph{escape sequence}, telling Mercurial 37.91 -to print a newline at the end of each template item. If you omit this 37.92 -newline, Mercurial will run each piece of output together. See 37.93 -section~\ref{sec:template:escape} for more details of escape sequences. 37.94 - 37.95 -A template that prints a fixed string of text all the time isn't very 37.96 -useful; let's try something a bit more complex. 37.97 - 37.98 -\interaction{template.simple.simplesub} 37.99 - 37.100 -As you can see, the string ``\Verb+{desc}+'' in the template has been 37.101 -replaced in the output with the description of each changeset. Every 37.102 -time Mercurial finds text enclosed in curly braces (``\texttt{\{}'' 37.103 -and ``\texttt{\}}''), it will try to replace the braces and text with 37.104 -the expansion of whatever is inside. To print a literal curly brace, 37.105 -you must escape it, as described in section~\ref{sec:template:escape}. 37.106 - 37.107 -\section{Common template keywords} 37.108 -\label{sec:template:keyword} 37.109 - 37.110 -You can start writing simple templates immediately using the keywords 37.111 -below. 37.112 - 37.113 -\begin{itemize} 37.114 -\item[\tplkword{author}] String. The unmodified author of the changeset. 37.115 -\item[\tplkword{branches}] String. The name of the branch on which 37.116 - the changeset was committed. Will be empty if the branch name was 37.117 - \texttt{default}. 37.118 -\item[\tplkword{date}] Date information. The date when the changeset 37.119 - was committed. This is \emph{not} human-readable; you must pass it 37.120 - through a filter that will render it appropriately. See 37.121 - section~\ref{sec:template:filter} for more information on filters. 37.122 - The date is expressed as a pair of numbers. The first number is a 37.123 - Unix UTC timestamp (seconds since January 1, 1970); the second is 37.124 - the offset of the committer's timezone from UTC, in seconds. 37.125 -\item[\tplkword{desc}] String. 
The text of the changeset description. 37.126 -\item[\tplkword{files}] List of strings. All files modified, added, or 37.127 - removed by this changeset. 37.128 -\item[\tplkword{file\_adds}] List of strings. Files added by this 37.129 - changeset. 37.130 -\item[\tplkword{file\_dels}] List of strings. Files removed by this 37.131 - changeset. 37.132 -\item[\tplkword{node}] String. The changeset identification hash, as a 37.133 - 40-character hexadecimal string. 37.134 -\item[\tplkword{parents}] List of strings. The parents of the 37.135 - changeset. 37.136 -\item[\tplkword{rev}] Integer. The repository-local changeset revision 37.137 - number. 37.138 -\item[\tplkword{tags}] List of strings. Any tags associated with the 37.139 - changeset. 37.140 -\end{itemize} 37.141 - 37.142 -A few simple experiments will show us what to expect when we use these 37.143 -keywords; you can see the results in 37.144 -figure~\ref{fig:template:keywords}. 37.145 - 37.146 -\begin{figure} 37.147 - \interaction{template.simple.keywords} 37.148 - \caption{Template keywords in use} 37.149 - \label{fig:template:keywords} 37.150 -\end{figure} 37.151 - 37.152 -As we noted above, the date keyword does not produce human-readable 37.153 -output, so we must treat it specially. This involves using a 37.154 -\emph{filter}, about which more in section~\ref{sec:template:filter}. 37.155 - 37.156 -\interaction{template.simple.datekeyword} 37.157 - 37.158 -\section{Escape sequences} 37.159 -\label{sec:template:escape} 37.160 - 37.161 -Mercurial's templating engine recognises the most commonly used escape 37.162 -sequences in strings. When it sees a backslash (``\Verb+\+'') 37.163 -character, it looks at the following character and substitutes the two 37.164 -characters with a single replacement, as described below. The ASCII codes in this list are given in octal. 37.165 - 37.166 -\begin{itemize} 37.167 -\item[\Verb+\textbackslash\textbackslash+] Backslash, ``\Verb+\+'', 37.168 - ASCII~134. 37.169 -\item[\Verb+\textbackslash n+] Newline, ASCII~12.
37.170 -\item[\Verb+\textbackslash r+] Carriage return, ASCII~15. 37.171 -\item[\Verb+\textbackslash t+] Tab, ASCII~11. 37.172 -\item[\Verb+\textbackslash v+] Vertical tab, ASCII~13. 37.173 -\item[\Verb+\textbackslash \{+] Open curly brace, ``\Verb+{+'', ASCII~173. 37.174 -\item[\Verb+\textbackslash \}+] Close curly brace, ``\Verb+}+'', ASCII~175. 37.175 -\end{itemize} 37.176 - 37.177 -As indicated above, if you want the expansion of a template to contain 37.178 -a literal ``\Verb+\+'', ``\Verb+{+'', or ``\Verb+}+'' character, you 37.179 -must escape it. 37.180 - 37.181 -\section{Filtering keywords to change their results} 37.182 -\label{sec:template:filter} 37.183 - 37.184 -Some of the results of template expansion are not immediately easy to 37.185 -use. Mercurial lets you specify an optional chain of \emph{filters} 37.186 -to modify the result of expanding a keyword. You have already seen a 37.187 -common filter, \tplkwfilt{date}{isodate}, in action above, to make a 37.188 -date readable. 37.189 - 37.190 -Below is a list of the most commonly used filters that Mercurial 37.191 -supports. While some filters can be applied to any text, others can 37.192 -only be used in specific circumstances. The name of each filter is 37.193 -followed first by an indication of where it can be used, then a 37.194 -description of its effect. 37.195 - 37.196 -\begin{itemize} 37.197 -\item[\tplfilter{addbreaks}] Any text. Add an XHTML ``\Verb+<br/>+'' 37.198 - tag before the end of every line except the last. For example, 37.199 - ``\Verb+foo\nbar+'' becomes ``\Verb+foo<br/>\nbar+''. 37.200 -\item[\tplkwfilt{date}{age}] \tplkword{date} keyword. Render the 37.201 - age of the date, relative to the current time. Yields a string like 37.202 - ``\Verb+10 minutes+''. 37.203 -\item[\tplfilter{basename}] Any text, but most useful for the 37.204 - \tplkword{files} keyword and its relatives. Treat the text as a 37.205 - path, and return the basename.
For example, ``\Verb+foo/bar/baz+'' 37.206 - becomes ``\Verb+baz+''. 37.207 -\item[\tplkwfilt{date}{date}] \tplkword{date} keyword. Render a date 37.208 - in a similar format to the Unix \command{date} command, but with 37.209 - timezone included. Yields a string like 37.210 - ``\Verb+Mon Sep 04 15:13:13 2006 -0700+''. 37.211 -\item[\tplkwfilt{author}{domain}] Any text, but most useful for the 37.212 - \tplkword{author} keyword. Find the first string that looks like 37.213 - an email address, and extract just the domain component. For 37.214 - example, ``\Verb+Bryan O'Sullivan <bos@serpentine.com>+'' becomes 37.215 - ``\Verb+serpentine.com+''. 37.216 -\item[\tplkwfilt{author}{email}] Any text, but most useful for the 37.217 - \tplkword{author} keyword. Extract the first string that looks like 37.218 - an email address. For example, 37.219 - ``\Verb+Bryan O'Sullivan <bos@serpentine.com>+'' becomes 37.220 - ``\Verb+bos@serpentine.com+''. 37.221 -\item[\tplfilter{escape}] Any text. Replace the special XML/XHTML 37.222 - characters ``\Verb+&+'', ``\Verb+<+'' and ``\Verb+>+'' with 37.223 - XML entities. 37.224 -\item[\tplfilter{fill68}] Any text. Wrap the text to fit in 68 37.225 - columns. This is useful before you pass text through the 37.226 - \tplfilter{tabindent} filter, and still want it to fit in an 37.227 - 80-column fixed-font window. 37.228 -\item[\tplfilter{fill76}] Any text. Wrap the text to fit in 76 37.229 - columns. 37.230 -\item[\tplfilter{firstline}] Any text. Yield the first line of text, 37.231 - without any trailing newlines. 37.232 -\item[\tplkwfilt{date}{hgdate}] \tplkword{date} keyword. Render the 37.233 - date as a pair of readable numbers. Yields a string like 37.234 - ``\Verb+1157407993 25200+''. 37.235 -\item[\tplkwfilt{date}{isodate}] \tplkword{date} keyword. Render the 37.236 - date as a text string in ISO~8601 format. Yields a string like 37.237 - ``\Verb+2006-09-04 15:13:13 -0700+''.
37.238 -\item[\tplfilter{obfuscate}] Any text, but most useful for the 37.239 - \tplkword{author} keyword. Yield the input text rendered as a 37.240 - sequence of XML entities. This helps to defeat some particularly 37.241 - stupid screen-scraping email harvesting spambots. 37.242 -\item[\tplkwfilt{author}{person}] Any text, but most useful for the 37.243 - \tplkword{author} keyword. Yield the text before an email address. 37.244 - For example, ``\Verb+Bryan O'Sullivan <bos@serpentine.com>+'' 37.245 - becomes ``\Verb+Bryan O'Sullivan+''. 37.246 -\item[\tplkwfilt{date}{rfc822date}] \tplkword{date} keyword. Render a 37.247 - date using the same format used in email headers. Yields a string 37.248 - like ``\Verb+Mon, 04 Sep 2006 15:13:13 -0700+''. 37.249 -\item[\tplkwfilt{node}{short}] Changeset hash. Yield the short form 37.250 - of a changeset hash, i.e.~a 12-character hexadecimal string. 37.251 -\item[\tplkwfilt{date}{shortdate}] \tplkword{date} keyword. Render 37.252 - the year, month, and day of the date. Yields a string like 37.253 - ``\Verb+2006-09-04+''. 37.254 -\item[\tplfilter{strip}] Any text. Strip all leading and trailing 37.255 - whitespace from the string. 37.256 -\item[\tplfilter{tabindent}] Any text. Yield the text, with every line 37.257 - except the first starting with a tab character. 37.258 -\item[\tplfilter{urlescape}] Any text. Escape all characters that are 37.259 - considered ``special'' by URL parsers. For example, \Verb+foo bar+ 37.260 - becomes \Verb+foo%20bar+. 37.261 -\item[\tplkwfilt{author}{user}] Any text, but most useful for the 37.262 - \tplkword{author} keyword. Return the ``user'' portion of an email 37.263 - address. For example, 37.264 - ``\Verb+Bryan O'Sullivan <bos@serpentine.com>+'' becomes 37.265 - ``\Verb+bos+''. 
37.266 -\end{itemize} 37.267 - 37.268 -\begin{figure} 37.269 - \interaction{template.simple.manyfilters} 37.270 - \caption{Template filters in action} 37.271 - \label{fig:template:filters} 37.272 -\end{figure} 37.273 - 37.274 -\begin{note} 37.275 - If you try to apply a filter to a piece of data that it cannot 37.276 - process, Mercurial will fail and print a Python exception. For 37.277 - example, trying to pass the output of the \tplkword{desc} keyword 37.278 - through the \tplkwfilt{date}{isodate} filter is not a good idea. 37.279 -\end{note} 37.280 - 37.281 -\subsection{Combining filters} 37.282 - 37.283 -It is easy to combine filters to yield output in the form you would 37.284 -like. The following chain of filters tidies up a description, then 37.285 -makes sure that it fits cleanly into 68 columns, then indents it by a 37.286 -further 8~characters (at least on Unix-like systems, where a tab is 37.287 -conventionally 8~characters wide). 37.288 - 37.289 -\interaction{template.simple.combine} 37.290 - 37.291 -Note the use of ``\Verb+\t+'' (a tab character) in the template to 37.292 -force the first line to be indented; this is necessary since 37.293 -\tplfilter{tabindent} indents all lines \emph{except} the first. 37.294 - 37.295 -Keep in mind that the order of filters in a chain is significant. The 37.296 -first filter is applied to the result of the keyword; the second to 37.297 -the result of the first filter; and so on. For example, using 37.298 -\Verb+fill68|tabindent+ gives very different results from 37.299 -\Verb+tabindent|fill68+. 37.300 - 37.301 - 37.302 -\section{From templates to styles} 37.303 - 37.304 -A command line template provides a quick and simple way to format some 37.305 -output. Templates can become verbose, though, and it's useful to be 37.306 -able to give a template a name. A style file is a template with a 37.307 -name, stored in a file.
37.308 - 37.309 -More than that, using a style file unlocks the power of Mercurial's 37.310 -templating engine in ways that are not possible using the command line 37.311 -\hgopt{log}{--template} option. 37.312 - 37.313 -\subsection{The simplest of style files} 37.314 - 37.315 -Our simple style file contains just one line: 37.316 - 37.317 -\interaction{template.simple.rev} 37.318 - 37.319 -This tells Mercurial, ``if you're printing a changeset, use the text 37.320 -on the right as the template''. 37.321 - 37.322 -\subsection{Style file syntax} 37.323 - 37.324 -The syntax rules for a style file are simple. 37.325 - 37.326 -\begin{itemize} 37.327 -\item The file is processed one line at a time. 37.328 - 37.329 -\item Leading and trailing white space are ignored. 37.330 - 37.331 -\item Empty lines are skipped. 37.332 - 37.333 -\item If a line starts with either of the characters ``\texttt{\#}'' or 37.334 - ``\texttt{;}'', the entire line is treated as a comment, and skipped 37.335 - as if empty. 37.336 - 37.337 -\item A line starts with a keyword. This must start with an 37.338 - alphabetic character or underscore, and can subsequently contain any 37.339 - alphanumeric character or underscore. (In regexp notation, a 37.340 - keyword must match \Verb+[A-Za-z_][A-Za-z0-9_]*+.) 37.341 - 37.342 -\item The next element must be an ``\texttt{=}'' character, which can 37.343 - be preceded or followed by an arbitrary amount of white space. 37.344 - 37.345 -\item If the rest of the line starts and ends with matching quote 37.346 - characters (either single or double quote), it is treated as a 37.347 - template body. 37.348 - 37.349 -\item If the rest of the line \emph{does not} start with a quote 37.350 - character, it is treated as the name of a file; the contents of this 37.351 - file will be read and used as a template body. 
37.352 -\end{itemize} 37.353 - 37.354 -\section{Style files by example} 37.355 - 37.356 -To illustrate how to write a style file, we will construct a few by 37.357 -example. Rather than provide a complete style file and walk through 37.358 -it, we'll mirror the usual process of developing a style file by 37.359 -starting with something very simple, and walking through a series of 37.360 -successively more complete examples. 37.361 - 37.362 -\subsection{Identifying mistakes in style files} 37.363 - 37.364 -If Mercurial encounters a problem in a style file you are working on, 37.365 -it prints a terse error message that, once you figure out what it 37.366 -means, is actually quite useful. 37.367 - 37.368 -\interaction{template.svnstyle.syntax.input} 37.369 - 37.370 -Notice that \filename{broken.style} attempts to define a 37.371 -\texttt{changeset} keyword, but forgets to give any content for it. 37.372 -When instructed to use this style file, Mercurial promptly complains. 37.373 - 37.374 -\interaction{template.svnstyle.syntax.error} 37.375 - 37.376 -This error message looks intimidating, but it is not too hard to 37.377 -follow. 37.378 - 37.379 -\begin{itemize} 37.380 -\item The first component is simply Mercurial's way of saying ``I am 37.381 - giving up''. 37.382 - \begin{codesample4} 37.383 - \textbf{abort:} broken.style:1: parse error 37.384 - \end{codesample4} 37.385 - 37.386 -\item Next comes the name of the style file that contains the error. 37.387 - \begin{codesample4} 37.388 - abort: \textbf{broken.style}:1: parse error 37.389 - \end{codesample4} 37.390 - 37.391 -\item Following the file name is the line number where the error was 37.392 - encountered. 37.393 - \begin{codesample4} 37.394 - abort: broken.style:\textbf{1}: parse error 37.395 - \end{codesample4} 37.396 - 37.397 -\item Finally, a description of what went wrong. 
37.398 - \begin{codesample4} 37.399 - abort: broken.style:1: \textbf{parse error} 37.400 - \end{codesample4} 37.401 - The description of the problem is not always clear (as in this 37.402 - case), but even when it is cryptic, it is almost always trivial to 37.403 - visually inspect the offending line in the style file and see what 37.404 - is wrong. 37.405 -\end{itemize} 37.406 - 37.407 -\subsection{Uniquely identifying a repository} 37.408 - 37.409 -If you would like to be able to identify a Mercurial repository 37.410 -``fairly uniquely'' using a short string as an identifier, you can 37.411 -use the first revision in the repository. 37.412 -\interaction{template.svnstyle.id} 37.413 -This is not guaranteed to be unique, but it is nevertheless useful in 37.414 -many cases. 37.415 -\begin{itemize} 37.416 -\item It will not work in a completely empty repository, because such 37.417 - a repository does not have a revision~zero. 37.418 -\item Neither will it work in the (extremely rare) case where a 37.419 - repository is a merge of two or more formerly independent 37.420 - repositories, and you still have those repositories around. 37.421 -\end{itemize} 37.422 -Here are some uses to which you could put this identifier: 37.423 -\begin{itemize} 37.424 -\item As a key into a table for a database that manages repositories 37.425 - on a server. 37.426 -\item As half of a \{\emph{repository~ID}, \emph{revision~ID}\} tuple. 37.427 - Save this information away when you run an automated build or other 37.428 - activity, so that you can ``replay'' the build later if necessary. 37.429 -\end{itemize} 37.430 - 37.431 -\subsection{Mimicking Subversion's output} 37.432 - 37.433 -Let's try to emulate the default output format used by another 37.434 -revision control tool, Subversion. 
37.435 -\interaction{template.svnstyle.short} 37.436 - 37.437 -Since Subversion's output style is fairly simple, it is easy to 37.438 -copy-and-paste a hunk of its output into a file, and replace the text 37.439 -produced above by Subversion with the template values we'd like to see 37.440 -expanded. 37.441 -\interaction{template.svnstyle.template} 37.442 - 37.443 -There are a few small ways in which this template deviates from the 37.444 -output produced by Subversion. 37.445 -\begin{itemize} 37.446 -\item Subversion prints a ``readable'' date (the ``\texttt{Wed, 27 Sep 37.447 - 2006}'' in the example output above) in parentheses. Mercurial's 37.448 - templating engine does not provide a way to display a date in this 37.449 - format without also printing the time and time zone. 37.450 -\item We emulate Subversion's printing of ``separator'' lines full of 37.451 - ``\texttt{-}'' characters by ending the template with such a line. 37.452 - We use the templating engine's \tplkword{header} keyword to print a 37.453 - separator line as the first line of output (see below), thus 37.454 - achieving similar output to Subversion. 37.455 -\item Subversion's output includes a count in the header of the number 37.456 - of lines in the commit message. We cannot replicate this in 37.457 - Mercurial; the templating engine does not currently provide a filter 37.458 - that counts the number of lines the template generates. 37.459 -\end{itemize} 37.460 -It took me no more than a minute or two of work to replace literal 37.461 -text from an example of Subversion's output with some keywords and 37.462 -filters to give the template above. The style file simply refers to 37.463 -the template. 
37.464 -\interaction{template.svnstyle.style} 37.465 - 37.466 -We could have included the text of the template file directly in the 37.467 -style file by enclosing it in quotes and replacing the newlines with 37.468 -``\verb!\n!'' sequences, but it would have made the style file too 37.469 -difficult to read. Readability is a good guide when you're trying to 37.470 -decide whether some text belongs in a style file, or in a template 37.471 -file that the style file points to. If the style file will look too 37.472 -big or cluttered if you insert a literal piece of text, drop it into a 37.473 -template instead. 37.474 - 37.475 -%%% Local Variables: 37.476 -%%% mode: latex 37.477 -%%% TeX-master: "00book" 37.478 -%%% End:
38.1 --- a/en/tour-basic.tex Thu Jan 29 22:47:34 2009 -0800 38.2 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 38.3 @@ -1,624 +0,0 @@ 38.4 -\chapter{A tour of Mercurial: the basics} 38.5 -\label{chap:tour-basic} 38.6 - 38.7 -\section{Installing Mercurial on your system} 38.8 -\label{sec:tour:install} 38.9 - 38.10 -Prebuilt binary packages of Mercurial are available for every popular 38.11 -operating system. These make it easy to start using Mercurial on your 38.12 -computer immediately. 38.13 - 38.14 -\subsection{Linux} 38.15 - 38.16 -Because each Linux distribution has its own packaging tools, policies, 38.17 -and rate of development, it's difficult to give a comprehensive set of 38.18 -instructions on how to install Mercurial binaries. The version of 38.19 -Mercurial that you will end up with can vary depending on how active 38.20 -the person is who maintains the package for your distribution. 38.21 - 38.22 -To keep things simple, I will focus on installing Mercurial from the 38.23 -command line under the most popular Linux distributions. Most of 38.24 -these distributions provide graphical package managers that will let 38.25 -you install Mercurial with a single click; the package name to look 38.26 -for is \texttt{mercurial}. 38.27 - 38.28 -\begin{itemize} 38.29 -\item[Debian] 38.30 - \begin{codesample4} 38.31 - apt-get install mercurial 38.32 - \end{codesample4} 38.33 - 38.34 -\item[Fedora Core] 38.35 - \begin{codesample4} 38.36 - yum install mercurial 38.37 - \end{codesample4} 38.38 - 38.39 -\item[Gentoo] 38.40 - \begin{codesample4} 38.41 - emerge mercurial 38.42 - \end{codesample4} 38.43 - 38.44 -\item[OpenSUSE] 38.45 - \begin{codesample4} 38.46 - zypper install mercurial 38.47 - \end{codesample4} 38.48 - 38.49 -\item[Ubuntu] Ubuntu's Mercurial package is based on Debian's. To 38.50 - install it, run the following command.
38.51 - \begin{codesample4} 38.52 - apt-get install mercurial 38.53 - \end{codesample4} 38.54 - The Ubuntu package for Mercurial tends to lag behind the Debian 38.55 - version by a considerable time margin (at the time of writing, seven 38.56 - months), which in some cases will mean that on Ubuntu, you may run 38.57 - into problems that have since been fixed in the Debian package. 38.58 -\end{itemize} 38.59 - 38.60 -\subsection{Solaris} 38.61 - 38.62 -SunFreeWare, at \url{http://www.sunfreeware.com}, is a good source for a 38.63 -large number of pre-built Solaris packages for 32 and 64 bit Intel and 38.64 -Sparc architectures, including current versions of Mercurial. 38.65 - 38.66 -\subsection{Mac OS X} 38.67 - 38.68 -Lee Cantey publishes an installer of Mercurial for Mac OS~X at 38.69 -\url{http://mercurial.berkwood.com}. This package works on both 38.70 -Intel-~and Power-based Macs. Before you can use it, you must install 38.71 -a compatible version of Universal MacPython~\cite{web:macpython}. This 38.72 -is easy to do; simply follow the instructions on Lee's site. 38.73 - 38.74 -It's also possible to install Mercurial using Fink or MacPorts, 38.75 -two popular free package managers for Mac OS X. If you have Fink, 38.76 -use \command{sudo apt-get install mercurial-py25}. If MacPorts, 38.77 -\command{sudo port install mercurial}. 38.78 - 38.79 -\subsection{Windows} 38.80 - 38.81 -Lee Cantey publishes an installer of Mercurial for Windows at 38.82 -\url{http://mercurial.berkwood.com}. This package has no external 38.83 -dependencies; it ``just works''. 38.84 - 38.85 -\begin{note} 38.86 - The Windows version of Mercurial does not automatically convert line 38.87 - endings between Windows and Unix styles. If you want to share work 38.88 - with Unix users, you must do a little additional configuration 38.89 - work. XXX Flesh this out. 
38.90 -\end{note} 38.91 - 38.92 -\section{Getting started} 38.93 - 38.94 -To begin, we'll use the \hgcmd{version} command to find out whether 38.95 -Mercurial is actually installed properly. The actual version 38.96 -information that it prints isn't so important; it's whether it prints 38.97 -anything at all that we care about. 38.98 -\interaction{tour.version} 38.99 - 38.100 -\subsection{Built-in help} 38.101 - 38.102 -Mercurial provides a built-in help system. This is invaluable for those 38.103 -times when you find yourself stuck trying to remember how to run a 38.104 -command. If you are completely stuck, simply run \hgcmd{help}; it 38.105 -will print a brief list of commands, along with a description of what 38.106 -each does. If you ask for help on a specific command (as below), it 38.107 -prints more detailed information. 38.108 -\interaction{tour.help} 38.109 -For a more impressive level of detail (which you won't usually need) 38.110 -run \hgcmdargs{help}{\hggopt{-v}}. The \hggopt{-v} option is short 38.111 -for \hggopt{--verbose}, and tells Mercurial to print more information 38.112 -than it usually would. 38.113 - 38.114 -\section{Working with a repository} 38.115 - 38.116 -In Mercurial, everything happens inside a \emph{repository}. The 38.117 -repository for a project contains all of the files that ``belong to'' 38.118 -that project, along with a historical record of the project's files. 38.119 - 38.120 -There's nothing particularly magical about a repository; it is simply 38.121 -a directory tree in your filesystem that Mercurial treats as special. 38.122 -You can rename or delete a repository any time you like, using either the 38.123 -command line or your file browser. 38.124 - 38.125 -\subsection{Making a local copy of a repository} 38.126 - 38.127 -\emph{Copying} a repository is just a little bit special. 
While you 38.128 -could use a normal file copying command to make a copy of a 38.129 -repository, it's best to use a built-in command that Mercurial 38.130 -provides. This command is called \hgcmd{clone}, because it creates an 38.131 -identical copy of an existing repository. 38.132 -\interaction{tour.clone} 38.133 -If our clone succeeded, we should now have a local directory called 38.134 -\dirname{hello}. This directory will contain some files. 38.135 -\interaction{tour.ls} 38.136 -These files have the same contents and history in our repository as 38.137 -they do in the repository we cloned. 38.138 - 38.139 -Every Mercurial repository is complete, self-contained, and 38.140 -independent. It contains its own private copy of a project's files 38.141 -and history. A cloned repository remembers the location of the 38.142 -repository it was cloned from, but it does not communicate with that 38.143 -repository, or any other, unless you tell it to. 38.144 - 38.145 -What this means for now is that we're free to experiment with our 38.146 -repository, safe in the knowledge that it's a private ``sandbox'' that 38.147 -won't affect anyone else. 38.148 - 38.149 -\subsection{What's in a repository?} 38.150 - 38.151 -When we take a more detailed look inside a repository, we can see that 38.152 -it contains a directory named \dirname{.hg}. This is where Mercurial 38.153 -keeps all of its metadata for the repository. 38.154 -\interaction{tour.ls-a} 38.155 - 38.156 -The contents of the \dirname{.hg} directory and its subdirectories are 38.157 -private to Mercurial. Every other file and directory in the 38.158 -repository is yours to do with as you please. 38.159 - 38.160 -To introduce a little terminology, the \dirname{.hg} directory is the 38.161 -``real'' repository, and all of the files and directories that coexist 38.162 -with it are said to live in the \emph{working directory}. 
An easy way 38.163 -to remember the distinction is that the \emph{repository} contains the 38.164 -\emph{history} of your project, while the \emph{working directory} 38.165 -contains a \emph{snapshot} of your project at a particular point in 38.166 -history. 38.167 - 38.168 -\section{A tour through history} 38.169 - 38.170 -One of the first things we might want to do with a new, unfamiliar 38.171 -repository is understand its history. The \hgcmd{log} command gives 38.172 -us a view of history. 38.173 -\interaction{tour.log} 38.174 -By default, this command prints a brief paragraph of output for each 38.175 -change to the project that was recorded. In Mercurial terminology, we 38.176 -call each of these recorded events a \emph{changeset}, because it can 38.177 -contain a record of changes to several files. 38.178 - 38.179 -The fields in a record of output from \hgcmd{log} are as follows. 38.180 -\begin{itemize} 38.181 -\item[\texttt{changeset}] This field has the format of a number, 38.182 - followed by a colon, followed by a hexadecimal string. These are 38.183 - \emph{identifiers} for the changeset. There are two identifiers 38.184 - because the number is shorter and easier to type than the hex 38.185 - string. 38.186 -\item[\texttt{user}] The identity of the person who created the 38.187 - changeset. This is a free-form field, but it most often contains a 38.188 - person's name and email address. 38.189 -\item[\texttt{date}] The date and time on which the changeset was 38.190 - created, and the timezone in which it was created. (The date and 38.191 - time are local to that timezone; they display what time and date it 38.192 - was for the person who created the changeset.) 38.193 -\item[\texttt{summary}] The first line of the text message that the 38.194 - creator of the changeset entered to describe the changeset. 38.195 -\end{itemize} 38.196 -The default output printed by \hgcmd{log} is purely a summary; it is 38.197 -missing a lot of detail. 
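The \texttt{changeset} field described above packs two identifiers into one string, so a script that reads \hgcmd{log} output usually has to split them apart. Here is a minimal sketch of doing that in Python. The function name \texttt{split\_changeset\_field} is my own invention for illustration, not part of Mercurial, and the sample value is simply a string in the ``number, colon, hex string'' format.

```python
def split_changeset_field(field):
    """Split a 'changeset' field such as '73:584af0e231be'.

    Hypothetical helper for illustration only; it is not part of
    Mercurial. Returns (revision_number, hex_identifier).
    """
    rev, sep, node = field.strip().partition(":")
    if not sep or not rev.isdigit() or not node:
        raise ValueError("expected NUMBER:HEX, got %r" % field)
    int(node, 16)  # raises ValueError if the second part is not hexadecimal
    return int(rev), node


rev, node = split_changeset_field("73:584af0e231be")
print(rev, node)
```

Keeping the two parts separate makes it harder to confuse the short revision number with the hexadecimal identifier when you record a changeset somewhere else.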
38.198 - 38.199 -Figure~\ref{fig:tour-basic:history} provides a graphical representation of 38.200 -the history of the \dirname{hello} repository, to make it a little 38.201 -easier to see which direction history is ``flowing'' in. We'll be 38.202 -returning to this figure several times in this chapter and the chapter 38.203 -that follows. 38.204 - 38.205 -\begin{figure}[ht] 38.206 - \centering 38.207 - \grafix{tour-history} 38.208 - \caption{Graphical history of the \dirname{hello} repository} 38.209 - \label{fig:tour-basic:history} 38.210 -\end{figure} 38.211 - 38.212 -\subsection{Changesets, revisions, and talking to other 38.213 - people} 38.214 - 38.215 -As English is a notoriously sloppy language, and computer science has 38.216 -a hallowed history of terminological confusion (why use one term when 38.217 -four will do?), revision control has a variety of words and phrases 38.218 -that mean the same thing. If you are talking about Mercurial history 38.219 -with other people, you will find that the word ``changeset'' is often 38.220 -compressed to ``change'' or (when written) ``cset'', and sometimes a 38.221 -changeset is referred to as a ``revision'' or a ``rev''. 38.222 - 38.223 -While it doesn't matter what \emph{word} you use to refer to the 38.224 -concept of ``a~changeset'', the \emph{identifier} that you use to 38.225 -refer to ``a~\emph{specific} changeset'' is of great importance. 38.226 -Recall that the \texttt{changeset} field in the output from 38.227 -\hgcmd{log} identifies a changeset using both a number and a 38.228 -hexadecimal string. 38.229 -\begin{itemize} 38.230 -\item The revision number is \emph{only valid in that repository}, 38.231 -\item while the hex string is the \emph{permanent, unchanging 38.232 - identifier} that will always identify that exact changeset in 38.233 - \emph{every} copy of the repository. 38.234 -\end{itemize} 38.235 -This distinction is important. 
If you send someone an email talking 38.236 -about ``revision~33'', there's a high likelihood that their 38.237 -revision~33 will \emph{not be the same} as yours. The reason for this 38.238 -is that a revision number depends on the order in which changes 38.239 -arrived in a repository, and there is no guarantee that the same 38.240 -changes will happen in the same order in different repositories. 38.241 -Three changes $a,b,c$ can easily appear in one repository as $0,1,2$, 38.242 -while in another as $1,0,2$. 38.243 - 38.244 -Mercurial uses revision numbers purely as a convenient shorthand. If 38.245 -you need to discuss a changeset with someone, or make a record of a 38.246 -changeset for some other reason (for example, in a bug report), use 38.247 -the hexadecimal identifier. 38.248 - 38.249 -\subsection{Viewing specific revisions} 38.250 - 38.251 -To narrow the output of \hgcmd{log} down to a single revision, use the 38.252 -\hgopt{log}{-r} (or \hgopt{log}{--rev}) option. You can use either a 38.253 -revision number or a long-form changeset identifier, and you can 38.254 -provide as many revisions as you want. \interaction{tour.log-r} 38.255 - 38.256 -If you want to see the history of several revisions without having to 38.257 -list each one, you can use \emph{range notation}; this lets you 38.258 -express the idea ``I want all revisions between $a$ and $b$, 38.259 -inclusive''. 38.260 -\interaction{tour.log.range} 38.261 -Mercurial also honours the order in which you specify revisions, so 38.262 -\hgcmdargs{log}{-r 2:4} prints $2,3,4$ while \hgcmdargs{log}{-r 4:2} 38.263 -prints $4,3,2$. 38.264 - 38.265 -\subsection{More detailed information} 38.266 - 38.267 -While the summary information printed by \hgcmd{log} is useful if you 38.268 -already know what you're looking for, you may need to see a complete 38.269 -description of the change, or a list of the files changed, if you're 38.270 -trying to decide whether a changeset is the one you're looking for. 
38.271 -The \hgcmd{log} command's \hggopt{-v} (or \hggopt{--verbose}) 38.272 -option gives you this extra detail. 38.273 -\interaction{tour.log-v} 38.274 - 38.275 -If you want to see both the description and content of a change, add 38.276 -the \hgopt{log}{-p} (or \hgopt{log}{--patch}) option. This displays 38.277 -the content of a change as a \emph{unified diff} (if you've never seen 38.278 -a unified diff before, see section~\ref{sec:mq:patch} for an overview). 38.279 -\interaction{tour.log-vp} 38.280 - 38.281 -\section{All about command options} 38.282 - 38.283 -Let's take a brief break from exploring Mercurial commands to discuss 38.284 -a pattern in the way that they work; you may find this useful to keep 38.285 -in mind as we continue our tour. 38.286 - 38.287 -Mercurial has a consistent and straightforward approach to dealing 38.288 -with the options that you can pass to commands. It follows the 38.289 -conventions for options that are common to modern Linux and Unix 38.290 -systems. 38.291 -\begin{itemize} 38.292 -\item Every option has a long name. For example, as we've already 38.293 - seen, the \hgcmd{log} command accepts a \hgopt{log}{--rev} option. 38.294 -\item Most options have short names, too. Instead of 38.295 - \hgopt{log}{--rev}, we can use \hgopt{log}{-r}. (The reason that 38.296 - some options don't have short names is that the options in question 38.297 - are rarely used.) 38.298 -\item Long options start with two dashes (e.g.~\hgopt{log}{--rev}), 38.299 - while short options start with one (e.g.~\hgopt{log}{-r}). 38.300 -\item Option naming and usage is consistent across commands. For 38.301 - example, every command that lets you specify a changeset~ID or 38.302 - revision number accepts both \hgopt{log}{-r} and \hgopt{log}{--rev} 38.303 - arguments. 38.304 -\end{itemize} 38.305 -In the examples throughout this book, I use short options instead of 38.306 -long. 
This just reflects my own preference, so don't read anything 38.307 -significant into it. 38.308 - 38.309 -Most commands that print output of some kind will print more output 38.310 -when passed a \hggopt{-v} (or \hggopt{--verbose}) option, and less 38.311 -when passed \hggopt{-q} (or \hggopt{--quiet}). 38.312 - 38.313 -\section{Making and reviewing changes} 38.314 - 38.315 -Now that we have a grasp of viewing history in Mercurial, let's take a 38.316 -look at making some changes and examining them. 38.317 - 38.318 -The first thing we'll do is isolate our experiment in a repository of 38.319 -its own. We use the \hgcmd{clone} command, but we don't need to 38.320 -clone a copy of the remote repository. Since we already have a copy 38.321 -of it locally, we can just clone that instead. This is much faster 38.322 -than cloning over the network, and cloning a local repository uses 38.323 -less disk space in most cases, too. 38.324 -\interaction{tour.reclone} 38.325 -As an aside, it's often good practice to keep a ``pristine'' copy of a 38.326 -remote repository around, which you can then make temporary clones of 38.327 -to create sandboxes for each task you want to work on. This lets you 38.328 -work on multiple tasks in parallel, each isolated from the others 38.329 -until it's complete and you're ready to integrate it back. Because 38.330 -local clones are so cheap, there's almost no overhead to cloning and 38.331 -destroying repositories whenever you want. 38.332 - 38.333 -In our \dirname{my-hello} repository, we have a file 38.334 -\filename{hello.c} that contains the classic ``hello, world'' program. 38.335 -Let's use the ancient and venerable \command{sed} command to edit this 38.336 -file so that it prints a second line of output. (I'm only using 38.337 -\command{sed} to do this because it's easy to write a scripted example 38.338 -this way. 
Since you're not under the same constraint, you probably 38.339 -won't want to use \command{sed}; simply use your preferred text editor to 38.340 -do the same thing.) 38.341 -\interaction{tour.sed} 38.342 - 38.343 -Mercurial's \hgcmd{status} command will tell us what Mercurial knows 38.344 -about the files in the repository. 38.345 -\interaction{tour.status} 38.346 -The \hgcmd{status} command prints no output for some files, but a line 38.347 -starting with ``\texttt{M}'' for \filename{hello.c}. Unless you tell 38.348 -it to, \hgcmd{status} will not print any output for files that have 38.349 -not been modified. 38.350 - 38.351 -The ``\texttt{M}'' indicates that Mercurial has noticed that we 38.352 -modified \filename{hello.c}. We didn't need to \emph{inform} 38.353 -Mercurial that we were going to modify the file before we started, or 38.354 -that we had modified the file after we were done; it was able to 38.355 -figure this out itself. 38.356 - 38.357 -It's a little bit helpful to know that we've modified 38.358 -\filename{hello.c}, but we might prefer to know exactly \emph{what} 38.359 -changes we've made to it. To do this, we use the \hgcmd{diff} 38.360 -command. 38.361 -\interaction{tour.diff} 38.362 - 38.363 -\section{Recording changes in a new changeset} 38.364 - 38.365 -We can modify files, build and test our changes, and use 38.366 -\hgcmd{status} and \hgcmd{diff} to review our changes, until we're 38.367 -satisfied with what we've done and arrive at a natural stopping point 38.368 -where we want to record our work in a new changeset. 38.369 - 38.370 -The \hgcmd{commit} command lets us create a new changeset; we'll 38.371 -usually refer to this as ``making a commit'' or ``committing''. 38.372 - 38.373 -\subsection{Setting up a username} 38.374 - 38.375 -When you try to run \hgcmd{commit} for the first time, it is not 38.376 -guaranteed to succeed. 
Mercurial records your name and address with 38.377 -each change that you commit, so that you and others will later be able 38.378 -to tell who made each change. Mercurial tries to automatically figure 38.379 -out a sensible username to commit the change with. It will attempt 38.380 -each of the following methods, in order: 38.381 -\begin{enumerate} 38.382 -\item If you specify a \hgopt{commit}{-u} option to the \hgcmd{commit} 38.383 - command on the command line, followed by a username, this is always 38.384 - given the highest precedence. 38.385 -\item If you have set the \envar{HGUSER} environment variable, this is 38.386 - checked next. 38.387 -\item If you create a file in your home directory called 38.388 - \sfilename{.hgrc}, with a \rcitem{ui}{username} entry, that will be 38.389 - used next. To see what the contents of this file should look like, 38.390 - refer to section~\ref{sec:tour-basic:username} below. 38.391 -\item If you have set the \envar{EMAIL} environment variable, this 38.392 - will be used next. 38.393 -\item Mercurial will query your system to find out your local user 38.394 - name and host name, and construct a username from these components. 38.395 - Since this often results in a username that is not very useful, it 38.396 - will print a warning if it has to do this. 38.397 -\end{enumerate} 38.398 -If all of these mechanisms fail, Mercurial will fail, printing an 38.399 -error message. In this case, it will not let you commit until you set 38.400 -up a username. 38.401 - 38.402 -You should think of the \envar{HGUSER} environment variable and the 38.403 -\hgopt{commit}{-u} option to the \hgcmd{commit} command as ways to 38.404 -\emph{override} Mercurial's default selection of username. For normal 38.405 -use, the simplest and most robust way to set a username for yourself 38.406 -is by creating a \sfilename{.hgrc} file; see below for details. 
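The lookup order listed above can be sketched in a few lines of Python. This is only an illustration of the precedence rules as the text describes them, not Mercurial's actual implementation; the function \texttt{resolve\_username} and its parameters are hypothetical names, and the contents of \sfilename{.hgrc} are assumed to have been read already and passed in as a plain string.

```python
import getpass
import socket


def resolve_username(cli_user=None, environ=None, hgrc_username=None):
    """Return (username, source) using the lookup order described above.

    Hypothetical sketch for illustration; this is not Mercurial's code.
    cli_user      -- value given with the -u option, if any
    environ       -- mapping of environment variables
    hgrc_username -- the ui.username value from .hgrc, if any
    """
    environ = {} if environ is None else environ
    candidates = [
        ("-u option", cli_user),
        ("HGUSER", environ.get("HGUSER")),
        (".hgrc ui.username", hgrc_username),
        ("EMAIL", environ.get("EMAIL")),
    ]
    for source, value in candidates:
        if value:
            return value, source
    # Last resort: build a name from the local user and host names.
    try:
        fallback = "%s@%s" % (getpass.getuser(), socket.gethostname())
    except Exception as exc:
        # Mirrors the text: if every mechanism fails, give up with an error.
        raise RuntimeError("no username available") from exc
    return fallback, "local user and host name"


print(resolve_username(environ={"EMAIL": "me@example.org"},
                       hgrc_username="Firstname Lastname <me@example.org>"))
```

Returning the source along with the name makes it easy to see which of the mechanisms won, which is handy when an unexpected username shows up in a commit.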
38.407 - 38.408 -\subsubsection{Creating a Mercurial configuration file} 38.409 -\label{sec:tour-basic:username} 38.410 - 38.411 -To set a user name, use your favourite editor to create a file called 38.412 -\sfilename{.hgrc} in your home directory. Mercurial will use this 38.413 -file to look up your personalised configuration settings. The initial 38.414 -contents of your \sfilename{.hgrc} should look like this. 38.415 -\begin{codesample2} 38.416 - # This is a Mercurial configuration file. 38.417 - [ui] 38.418 - username = Firstname Lastname <email.address@domain.net> 38.419 -\end{codesample2} 38.420 -The ``\texttt{[ui]}'' line begins a \emph{section} of the config file, 38.421 -so you can read the ``\texttt{username = ...}'' line as meaning ``set 38.422 -the value of the \texttt{username} item in the \texttt{ui} section''. 38.423 -A section continues until a new section begins, or the end of the 38.424 -file. Mercurial ignores empty lines and treats any text from 38.425 -``\texttt{\#}'' to the end of a line as a comment. 38.426 - 38.427 -\subsubsection{Choosing a user name} 38.428 - 38.429 -You can use any text you like as the value of the \texttt{username} 38.430 -config item, since this information is for reading by other people, 38.431 -but not for interpreting by Mercurial. The convention that most people 38.432 -follow is to use their name and email address, as in the example 38.433 -above. 38.434 - 38.435 -\begin{note} 38.436 - Mercurial's built-in web server obfuscates email addresses, to make 38.437 - it more difficult for the email harvesting tools that spammers use. 38.438 - This reduces the likelihood that you'll start receiving more junk 38.439 - email if you publish a Mercurial repository on the web. 38.440 -\end{note} 38.441 - 38.442 -\subsection{Writing a commit message} 38.443 - 38.444 -When we commit a change, Mercurial drops us into a text editor, to 38.445 -enter a message that will describe the modifications we've made in 38.446 -this changeset.
This is called the \emph{commit message}. It will be 38.447 -a record for readers of what we did and why, and it will be printed by 38.448 -\hgcmd{log} after we've finished committing. 38.449 -\interaction{tour.commit} 38.450 - 38.451 -The editor that the \hgcmd{commit} command drops us into will contain 38.452 -an empty line, followed by a number of lines starting with 38.453 -``\texttt{HG:}''. 38.454 -\begin{codesample2} 38.455 - \emph{empty line} 38.456 - HG: changed hello.c 38.457 -\end{codesample2} 38.458 -Mercurial ignores the lines that start with ``\texttt{HG:}''; it uses 38.459 -them only to tell us which files it's recording changes to. Modifying 38.460 -or deleting these lines has no effect. 38.461 - 38.462 -\subsection{Writing a good commit message} 38.463 - 38.464 -Since \hgcmd{log} only prints the first line of a commit message by 38.465 -default, it's best to write a commit message whose first line stands 38.466 -alone. Here's a real example of a commit message that \emph{doesn't} 38.467 -follow this guideline, and hence has a summary that is not readable. 38.468 -\begin{codesample2} 38.469 - changeset: 73:584af0e231be 38.470 - user: Censored Person <censored.person@example.org> 38.471 - date: Tue Sep 26 21:37:07 2006 -0700 38.472 - summary: include buildmeister/commondefs. Add an exports and install 38.473 -\end{codesample2} 38.474 - 38.475 -As far as the remainder of the contents of the commit message are 38.476 -concerned, there are no hard-and-fast rules. Mercurial itself doesn't 38.477 -interpret or care about the contents of the commit message, though 38.478 -your project may have policies that dictate a certain kind of 38.479 -formatting. 38.480 - 38.481 -My personal preference is for short, but informative, commit messages 38.482 -that tell me something that I can't figure out with a quick glance at 38.483 -the output of \hgcmdargs{log}{--patch}. 
38.484 - 38.485 -\subsection{Aborting a commit} 38.486 - 38.487 -If you decide that you don't want to commit while in the middle of 38.488 -editing a commit message, simply exit from your editor without saving 38.489 -the file that it's editing. This will cause nothing to happen to 38.490 -either the repository or the working directory. 38.491 - 38.492 -If we run the \hgcmd{commit} command without any arguments, it records 38.493 -all of the changes we've made, as reported by \hgcmd{status} and 38.494 -\hgcmd{diff}. 38.495 - 38.496 -\subsection{Admiring our new handiwork} 38.497 - 38.498 -Once we've finished the commit, we can use the \hgcmd{tip} command to 38.499 -display the changeset we just created. This command produces output 38.500 -that is identical to \hgcmd{log}, but it only displays the newest 38.501 -revision in the repository. 38.502 -\interaction{tour.tip} 38.503 -We refer to the newest revision in the repository as the tip revision, 38.504 -or simply the tip. 38.505 - 38.506 -\section{Sharing changes} 38.507 - 38.508 -We mentioned earlier that repositories in Mercurial are 38.509 -self-contained. This means that the changeset we just created exists 38.510 -only in our \dirname{my-hello} repository. Let's look at a few ways 38.511 -that we can propagate this change into other repositories. 38.512 - 38.513 -\subsection{Pulling changes from another repository} 38.514 -\label{sec:tour:pull} 38.515 - 38.516 -To get started, let's clone our original \dirname{hello} repository, 38.517 -which does not contain the change we just committed. We'll call our 38.518 -temporary repository \dirname{hello-pull}. 38.519 -\interaction{tour.clone-pull} 38.520 - 38.521 -We'll use the \hgcmd{pull} command to bring changes from 38.522 -\dirname{my-hello} into \dirname{hello-pull}. However, blindly 38.523 -pulling unknown changes into a repository is a somewhat scary 38.524 -prospect. 
Mercurial provides the \hgcmd{incoming} command to tell us 38.525 -what changes the \hgcmd{pull} command \emph{would} pull into the 38.526 -repository, without actually pulling the changes in. 38.527 -\interaction{tour.incoming} 38.528 -(Of course, someone could cause more changesets to appear in the 38.529 -repository that we ran \hgcmd{incoming} in, before we get a chance to 38.530 -\hgcmd{pull} the changes, so that we could end up pulling changes that we 38.531 -didn't expect.) 38.532 - 38.533 -Bringing changes into a repository is a simple matter of running the 38.534 -\hgcmd{pull} command, and telling it which repository to pull from. 38.535 -\interaction{tour.pull} 38.536 -As you can see from the before-and-after output of \hgcmd{tip}, we 38.537 -have successfully pulled changes into our repository. There remains 38.538 -one step before we can see these changes in the working directory. 38.539 - 38.540 -\subsection{Updating the working directory} 38.541 - 38.542 -We have so far glossed over the relationship between a repository and 38.543 -its working directory. The \hgcmd{pull} command that we ran in 38.544 -section~\ref{sec:tour:pull} brought changes into the repository, but 38.545 -if we check, there's no sign of those changes in the working 38.546 -directory. This is because \hgcmd{pull} does not (by default) touch 38.547 -the working directory. Instead, we use the \hgcmd{update} command to 38.548 -do this. 38.549 -\interaction{tour.update} 38.550 - 38.551 -It might seem a bit strange that \hgcmd{pull} doesn't update the 38.552 -working directory automatically. There's actually a good reason for 38.553 -this: you can use \hgcmd{update} to update the working directory to 38.554 -the state it was in at \emph{any revision} in the history of the 38.555 -repository. 
If you had the working directory updated to an old 38.556 -revision---to hunt down the origin of a bug, say---and ran a 38.557 -\hgcmd{pull} which automatically updated the working directory to a 38.558 -new revision, you might not be terribly happy. 38.559 - 38.560 -However, since pull-then-update is such a common thing to do, 38.561 -Mercurial lets you combine the two by passing the \hgopt{pull}{-u} 38.562 -option to \hgcmd{pull}. 38.563 -\begin{codesample2} 38.564 - hg pull -u 38.565 -\end{codesample2} 38.566 -If you look back at the output of \hgcmd{pull} in 38.567 -section~\ref{sec:tour:pull} when we ran it without \hgopt{pull}{-u}, 38.568 -you can see that it printed a helpful reminder that we'd have to take 38.569 -an explicit step to update the working directory: 38.570 -\begin{codesample2} 38.571 - (run 'hg update' to get a working copy) 38.572 -\end{codesample2} 38.573 - 38.574 -To find out what revision the working directory is at, use the 38.575 -\hgcmd{parents} command. 38.576 -\interaction{tour.parents} 38.577 -If you look back at figure~\ref{fig:tour-basic:history}, you'll see 38.578 -arrows connecting each changeset. The node that the arrow leads 38.579 -\emph{from} in each case is a parent, and the node that the arrow 38.580 -leads \emph{to} is its child. The working directory has a parent in 38.581 -just the same way; this is the changeset that the working directory 38.582 -currently contains. 38.583 - 38.584 -To update the working directory to a particular revision, give a 38.585 -revision number or changeset~ID to the \hgcmd{update} command. 38.586 -\interaction{tour.older} 38.587 -If you omit an explicit revision, \hgcmd{update} will update to the 38.588 -tip revision, as shown by the second call to \hgcmd{update} in the 38.589 -example above. 38.590 - 38.591 -\subsection{Pushing changes to another repository} 38.592 - 38.593 -Mercurial lets us push changes to another repository, from the 38.594 -repository we're currently visiting. 
As with the example of 38.595 -\hgcmd{pull} above, we'll create a temporary repository to push our 38.596 -changes into. 38.597 -\interaction{tour.clone-push} 38.598 -The \hgcmd{outgoing} command tells us what changes would be pushed 38.599 -into another repository. 38.600 -\interaction{tour.outgoing} 38.601 -And the \hgcmd{push} command does the actual push. 38.602 -\interaction{tour.push} 38.603 -As with \hgcmd{pull}, the \hgcmd{push} command does not update the 38.604 -working directory in the repository that it's pushing changes into. 38.605 -(Unlike \hgcmd{pull}, \hgcmd{push} does not provide a \texttt{-u} 38.606 -option that updates the other repository's working directory.) 38.607 - 38.608 -What happens if we try to pull or push changes and the receiving 38.609 -repository already has those changes? Nothing too exciting. 38.610 -\interaction{tour.push.nothing} 38.611 - 38.612 -\subsection{Sharing changes over a network} 38.613 - 38.614 -The commands we have covered in the previous few sections are not 38.615 -limited to working with local repositories. Each works in exactly the 38.616 -same fashion over a network connection; simply pass in a URL instead 38.617 -of a local path. 38.618 -\interaction{tour.outgoing.net} 38.619 -In this example, we can see what changes we could push to the remote 38.620 -repository, but the repository is understandably not set up to let 38.621 -anonymous users push to it. 38.622 -\interaction{tour.push.net} 38.623 - 38.624 -%%% Local Variables: 38.625 -%%% mode: latex 38.626 -%%% TeX-master: "00book" 38.627 -%%% End:
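The split between pulling and updating described in the removed tour chapter above can be sketched with a toy model. This is only an illustration under stated assumptions: the ToyRepo class and its method names are hypothetical, invented here, and a repository is reduced to a flat list of changeset IDs plus a working-directory parent, which is not how Mercurial stores history.

```python
class ToyRepo:
    """Toy model (an assumption, not Mercurial's API): a repository is an
    ordered list of changeset IDs plus a working-directory parent."""

    def __init__(self, changesets):
        self.changesets = list(changesets)
        # The working directory starts out at the newest changeset (the tip).
        self.parent = self.changesets[-1] if self.changesets else None

    def incoming(self, other):
        """Changesets that a pull from `other` would bring in."""
        return [c for c in other.changesets if c not in self.changesets]

    def pull(self, other, update=False):
        """Add missing changesets. The working directory is left alone
        unless update=True, the analogue of passing -u to pull."""
        new = self.incoming(other)
        self.changesets.extend(new)
        if update:
            self.update()
        return new

    def update(self, rev=None):
        """Move the working directory to `rev`, or to the tip if omitted."""
        if not self.changesets:
            return
        target = self.changesets[-1] if rev is None else rev
        if target not in self.changesets:
            raise ValueError(f"unknown changeset: {target}")
        self.parent = target


remote = ToyRepo(["a1", "b2", "c3"])
local = ToyRepo(["a1", "b2"])

print(local.incoming(remote))  # what a pull would fetch
local.pull(remote)
print(local.parent)            # pull alone did not move the working directory
local.update()
print(local.parent)            # update with no revision goes to the tip
```

Keeping pull and update as separate steps in the model mirrors the reason the text gives: a working directory that was deliberately moved to an old revision is not disturbed by new changesets arriving.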
39.1 --- a/en/tour-merge.tex Thu Jan 29 22:47:34 2009 -0800 39.2 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 39.3 @@ -1,283 +0,0 @@ 39.4 -\chapter{A tour of Mercurial: merging work} 39.5 -\label{chap:tour-merge} 39.6 - 39.7 -We've now covered cloning a repository, making changes in a 39.8 -repository, and pulling or pushing changes from one repository into 39.9 -another. Our next step is \emph{merging} changes from separate 39.10 -repositories. 39.11 - 39.12 -\section{Merging streams of work} 39.13 - 39.14 -Merging is a fundamental part of working with a distributed revision 39.15 -control tool. 39.16 -\begin{itemize} 39.17 -\item Alice and Bob each have a personal copy of a repository for a 39.18 - project they're collaborating on. Alice fixes a bug in her 39.19 - repository; Bob adds a new feature in his. They want the shared 39.20 - repository to contain both the bug fix and the new feature. 39.21 -\item I frequently work on several different tasks for a single 39.22 - project at once, each safely isolated in its own repository. 39.23 - Working this way means that I often need to merge one piece of my 39.24 - own work with another. 39.25 -\end{itemize} 39.26 - 39.27 -Because merging is such a common thing to need to do, Mercurial makes 39.28 -it easy. Let's walk through the process. We'll begin by cloning yet 39.29 -another repository (see how often they spring up?) and making a change 39.30 -in it. 39.31 -\interaction{tour.merge.clone} 39.32 -We should now have two copies of \filename{hello.c} with different 39.33 -contents. The histories of the two repositories have also diverged, 39.34 -as illustrated in figure~\ref{fig:tour-merge:sep-repos}. 
39.35 -\interaction{tour.merge.cat} 39.36 - 39.37 -\begin{figure}[ht] 39.38 - \centering 39.39 - \grafix{tour-merge-sep-repos} 39.40 - \caption{Divergent recent histories of the \dirname{my-hello} and 39.41 - \dirname{my-new-hello} repositories} 39.42 - \label{fig:tour-merge:sep-repos} 39.43 -\end{figure} 39.44 - 39.45 -We already know that pulling changes from our \dirname{my-hello} 39.46 -repository will have no effect on the working directory. 39.47 -\interaction{tour.merge.pull} 39.48 -However, the \hgcmd{pull} command says something about ``heads''. 39.49 - 39.50 -\subsection{Head changesets} 39.51 - 39.52 -A head is a change that has no descendants, or children, as they're 39.53 -also known. The tip revision is thus a head, because the newest 39.54 -revision in a repository doesn't have any children, but a repository 39.55 -can contain more than one head. 39.56 - 39.57 -\begin{figure}[ht] 39.58 - \centering 39.59 - \grafix{tour-merge-pull} 39.60 - \caption{Repository contents after pulling from \dirname{my-hello} into 39.61 - \dirname{my-new-hello}} 39.62 - \label{fig:tour-merge:pull} 39.63 -\end{figure} 39.64 - 39.65 -In figure~\ref{fig:tour-merge:pull}, you can see the effect of the 39.66 -pull from \dirname{my-hello} into \dirname{my-new-hello}. The history 39.67 -that was already present in \dirname{my-new-hello} is untouched, but a 39.68 -new revision has been added. By referring to 39.69 -figure~\ref{fig:tour-merge:sep-repos}, we can see that the 39.70 -\emph{changeset ID} remains the same in the new repository, but the 39.71 -\emph{revision number} has changed. (This, incidentally, is a fine 39.72 -example of why it's not safe to use revision numbers when discussing 39.73 -changesets.) We can view the heads in a repository using the 39.74 -\hgcmd{heads} command. 
39.75 -\interaction{tour.merge.heads} 39.76 - 39.77 -\subsection{Performing the merge} 39.78 - 39.79 -What happens if we try to use the normal \hgcmd{update} command to 39.80 -update to the new tip? 39.81 -\interaction{tour.merge.update} 39.82 -Mercurial is telling us that the \hgcmd{update} command won't do a 39.83 -merge; it won't update the working directory when it thinks we might 39.84 -be wanting to do a merge, unless we force it to do so. Instead, we 39.85 -use the \hgcmd{merge} command to merge the two heads. 39.86 -\interaction{tour.merge.merge} 39.87 - 39.88 -\begin{figure}[ht] 39.89 - \centering 39.90 - \grafix{tour-merge-merge} 39.91 - \caption{Working directory and repository during merge, and 39.92 - following commit} 39.93 - \label{fig:tour-merge:merge} 39.94 -\end{figure} 39.95 - 39.96 -This updates the working directory so that it contains changes from 39.97 -\emph{both} heads, which is reflected in both the output of 39.98 -\hgcmd{parents} and the contents of \filename{hello.c}. 39.99 -\interaction{tour.merge.parents} 39.100 - 39.101 -\subsection{Committing the results of the merge} 39.102 - 39.103 -Whenever we've done a merge, \hgcmd{parents} will display two parents 39.104 -until we \hgcmd{commit} the results of the merge. 39.105 -\interaction{tour.merge.commit} 39.106 -We now have a new tip revision; notice that it has \emph{both} of 39.107 -our former heads as its parents. These are the same revisions that 39.108 -were previously displayed by \hgcmd{parents}. 39.109 -\interaction{tour.merge.tip} 39.110 -In figure~\ref{fig:tour-merge:merge}, you can see a representation of 39.111 -what happens to the working directory during the merge, and how this 39.112 -affects the repository when the commit happens. During the merge, the 39.113 -working directory has two parent changesets, and these become the 39.114 -parents of the new changeset. 
39.115 - 39.116 -\section{Merging conflicting changes} 39.117 - 39.118 -Most merges are simple affairs, but sometimes you'll find yourself 39.119 -merging changes where each modifies the same portions of the same 39.120 -files. Unless both modifications are identical, this results in a 39.121 -\emph{conflict}, where you have to decide how to reconcile the 39.122 -different changes into something coherent. 39.123 - 39.124 -\begin{figure}[ht] 39.125 - \centering 39.126 - \grafix{tour-merge-conflict} 39.127 - \caption{Conflicting changes to a document} 39.128 - \label{fig:tour-merge:conflict} 39.129 -\end{figure} 39.130 - 39.131 -Figure~\ref{fig:tour-merge:conflict} illustrates an instance of two 39.132 -conflicting changes to a document. We started with a single version 39.133 -of the file; then we made some changes; while someone else made 39.134 -different changes to the same text. Our task in resolving the 39.135 -conflicting changes is to decide what the file should look like. 39.136 - 39.137 -Mercurial doesn't have a built-in facility for handling conflicts. 39.138 -Instead, it runs an external program called \command{hgmerge}. This 39.139 -is a shell script that is bundled with Mercurial; you can change it to 39.140 -behave however you please. What it does by default is try to find one 39.141 -of several different merging tools that are likely to be installed on 39.142 -your system. It first tries a few fully automatic merging tools; if 39.143 -these don't succeed (because the resolution process requires human 39.144 -guidance) or aren't present, the script tries a few different 39.145 -graphical merging tools. 39.146 - 39.147 -It's also possible to get Mercurial to run another program or script 39.148 -instead of \command{hgmerge}, by setting the \envar{HGMERGE} 39.149 -environment variable to the name of your preferred program. 
39.150 - 39.151 -\subsection{Using a graphical merge tool} 39.152 - 39.153 -My preferred graphical merge tool is \command{kdiff3}, which I'll use 39.154 -to describe the features that are common to graphical file merging 39.155 -tools. You can see a screenshot of \command{kdiff3} in action in 39.156 -figure~\ref{fig:tour-merge:kdiff3}. The kind of merge it is 39.157 -performing is called a \emph{three-way merge}, because there are three 39.158 -different versions of the file of interest to us. The tool thus 39.159 -splits the upper portion of the window into three panes: 39.160 -\begin{itemize} 39.161 -\item At the left is the \emph{base} version of the file, i.e.~the 39.162 - most recent version from which the two versions we're trying to 39.163 - merge are descended. 39.164 -\item In the middle is ``our'' version of the file, with the contents 39.165 - that we modified. 39.166 -\item On the right is ``their'' version of the file, the one 39.167 - from the changeset that we're trying to merge with. 39.168 -\end{itemize} 39.169 -In the pane below these is the current \emph{result} of the merge. 39.170 -Our task is to replace all of the red text, which indicates unresolved 39.171 -conflicts, with some sensible merger of the ``ours'' and ``theirs'' 39.172 -versions of the file. 39.173 - 39.174 -All four of these panes are \emph{locked together}; if we scroll 39.175 -vertically or horizontally in any of them, the others are updated to 39.176 -display the corresponding sections of their respective files. 39.177 - 39.178 -\begin{figure}[ht] 39.179 - \centering 39.180 - \grafix{kdiff3} 39.181 - \caption{Using \command{kdiff3} to merge versions of a file} 39.182 - \label{fig:tour-merge:kdiff3} 39.183 -\end{figure} 39.184 - 39.185 -For each conflicting portion of the file, we can choose to resolve 39.186 -the conflict using some combination of text from the base version, 39.187 -ours, or theirs.
We can also manually edit the merged file at any 39.188 -time, in case we need to make further modifications. 39.189 - 39.190 -There are \emph{many} file merging tools available, too many to cover 39.191 -here. They vary in which platforms they are available for, and in 39.192 -their particular strengths and weaknesses. Most are tuned for merging 39.193 -files containing plain text, while a few are aimed at specialised file 39.194 -formats (generally XML). 39.195 - 39.196 -\subsection{A worked example} 39.197 - 39.198 -In this example, we will reproduce the file modification history of 39.199 -figure~\ref{fig:tour-merge:conflict} above. Let's begin by creating a 39.200 -repository with a base version of our document. 39.201 -\interaction{tour-merge-conflict.wife} 39.202 -We'll clone the repository and make a change to the file. 39.203 -\interaction{tour-merge-conflict.cousin} 39.204 -And another clone, to simulate someone else making a change to the 39.205 -file. (This hints at the idea that it's not all that unusual to merge 39.206 -with yourself when you isolate tasks in separate repositories, and 39.207 -indeed to find and resolve conflicts while doing so.) 39.208 -\interaction{tour-merge-conflict.son} 39.209 -Having created two different versions of the file, we'll set up an 39.210 -environment suitable for running our merge. 39.211 -\interaction{tour-merge-conflict.pull} 39.212 - 39.213 -In this example, I won't use Mercurial's normal \command{hgmerge} 39.214 -program to do the merge, because it would drop my nice automated 39.215 -example-running tool into a graphical user interface. Instead, I'll 39.216 -set \envar{HGMERGE} to tell Mercurial to use the non-interactive 39.217 -\command{merge} command. This is bundled with many Unix-like systems. 39.218 -If you're following this example on your computer, don't bother 39.219 -setting \envar{HGMERGE}. 
39.220 -\interaction{tour-merge-conflict.merge} 39.221 -Because \command{merge} can't resolve the conflicting changes, it 39.222 -leaves \emph{merge markers} inside the file that has conflicts, 39.223 -indicating which lines have conflicts, and whether they came from our 39.224 -version of the file or theirs. 39.225 - 39.226 -Mercurial can tell from the way \command{merge} exits that it wasn't 39.227 -able to merge successfully, so it tells us what commands we'll need to 39.228 -run if we want to redo the merging operation. This could be useful 39.229 -if, for example, we were running a graphical merge tool and quit 39.230 -because we were confused or realised we had made a mistake. 39.231 - 39.232 -If automatic or manual merges fail, there's nothing to prevent us from 39.233 -``fixing up'' the affected files ourselves, and committing the results 39.234 -of our merge: 39.235 -\interaction{tour-merge-conflict.commit} 39.236 - 39.237 -\section{Simplifying the pull-merge-commit sequence} 39.238 -\label{sec:tour-merge:fetch} 39.239 - 39.240 -The process of merging changes as outlined above is straightforward, 39.241 -but requires running three commands in sequence. 39.242 -\begin{codesample2} 39.243 - hg pull 39.244 - hg merge 39.245 - hg commit -m 'Merged remote changes' 39.246 -\end{codesample2} 39.247 -In the case of the final commit, you also need to enter a commit 39.248 -message, which is almost always going to be a piece of uninteresting 39.249 -``boilerplate'' text. 39.250 - 39.251 -It would be nice to reduce the number of steps needed, if this were 39.252 -possible. Indeed, Mercurial is distributed with an extension called 39.253 -\hgext{fetch} that does just this. 39.254 - 39.255 -Mercurial provides a flexible extension mechanism that lets people 39.256 -extend its functionality, while keeping the core of Mercurial small 39.257 -and easy to deal with. 
Some extensions add new commands that you can 39.258 -use from the command line, while others work ``behind the scenes,'' 39.259 -for example adding capabilities to the server. 39.260 - 39.261 -The \hgext{fetch} extension adds a new command called, not 39.262 -surprisingly, \hgcmd{fetch}. This extension acts as a combination of 39.263 -\hgcmd{pull}, \hgcmd{update} and \hgcmd{merge}. It begins by pulling 39.264 -changes from another repository into the current repository. If it 39.265 -finds that the changes added a new head to the repository, it begins a 39.266 -merge, then commits the result of the merge with an 39.267 -automatically-generated commit message. If no new heads were added, 39.268 -it updates the working directory to the new tip changeset. 39.269 - 39.270 -Enabling the \hgext{fetch} extension is easy. Edit your 39.271 -\sfilename{.hgrc}, and either go to the \rcsection{extensions} section 39.272 -or create an \rcsection{extensions} section. Then add a line that 39.273 -simply reads ``\Verb+fetch +''. 39.274 -\begin{codesample2} 39.275 - [extensions] 39.276 - fetch = 39.277 -\end{codesample2} 39.278 -(Normally, on the right-hand side of the ``\texttt{=}'' would appear 39.279 -the location of the extension, but since the \hgext{fetch} extension 39.280 -is in the standard distribution, Mercurial knows where to search for 39.281 -it.) 39.282 - 39.283 -%%% Local Variables: 39.284 -%%% mode: latex 39.285 -%%% TeX-master: "00book" 39.286 -%%% End:
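The decision that the fetch section above describes (merge if the pull added a head, otherwise update to the new tip) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the extension's actual code: the functions heads and fetch_action are hypothetical names, and history is modelled as a plain dictionary mapping each changeset ID to a tuple of its parent IDs.

```python
def heads(parents):
    """Return the changesets that are not a parent of any other changeset.

    `parents` maps each changeset ID to a tuple of its parent IDs
    (an assumed toy representation of the history graph).
    """
    has_child = {p for ps in parents.values() for p in ps}
    return sorted(c for c in parents if c not in has_child)


def fetch_action(before, after):
    """Pick the step a fetch-like command takes after pulling.

    `before` and `after` are parent maps for the repository before and
    after the pull. A pull that added a head calls for a merge followed
    by a commit; otherwise the working directory is updated to the tip.
    """
    if len(heads(after)) > len(heads(before)):
        return "merge"
    return "update"


# History before the pull: a straight line ending in our own change.
before = {"r0": (), "r1": ("r0",), "mine": ("r1",)}

# Someone else's change also descends from r1, so the pull adds a head.
diverged = dict(before, theirs=("r1",))

# Here the pulled change simply extends our line of history.
linear = dict(before, newer=("mine",))

print(heads(diverged))                 # two heads after the pull
print(fetch_action(before, diverged))  # a merge is needed
print(fetch_action(before, linear))    # a plain update is enough
```

Counting heads before and after the pull keeps the sketch close to the wording of the text; a real implementation would also have to deal with uncommitted local changes and failed merges, which are left out here.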
40.1 --- a/en/undo.tex Thu Jan 29 22:47:34 2009 -0800 40.2 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 40.3 @@ -1,767 +0,0 @@ 40.4 -\chapter{Finding and fixing your mistakes} 40.5 -\label{chap:undo} 40.6 - 40.7 -To err might be human, but to really handle the consequences well 40.8 -takes a top-notch revision control system. In this chapter, we'll 40.9 -discuss some of the techniques you can use when you find that a 40.10 -problem has crept into your project. Mercurial has some highly 40.11 -capable features that will help you to isolate the sources of 40.12 -problems, and to handle them appropriately. 40.13 - 40.14 -\section{Erasing local history} 40.15 - 40.16 -\subsection{The accidental commit} 40.17 - 40.18 -I have the occasional but persistent problem of typing rather more 40.19 -quickly than I can think, which sometimes results in me committing a 40.20 -changeset that is either incomplete or plain wrong. In my case, the 40.21 -usual kind of incomplete changeset is one in which I've created a new 40.22 -source file, but forgotten to \hgcmd{add} it. A ``plain wrong'' 40.23 -changeset is not as common, but no less annoying. 40.24 - 40.25 -\subsection{Rolling back a transaction} 40.26 -\label{sec:undo:rollback} 40.27 - 40.28 -In section~\ref{sec:concepts:txn}, I mentioned that Mercurial treats 40.29 -each modification of a repository as a \emph{transaction}. Every time 40.30 -you commit a changeset or pull changes from another repository, 40.31 -Mercurial remembers what you did. You can undo, or \emph{roll back}, 40.32 -exactly one of these actions using the \hgcmd{rollback} command. (See 40.33 -section~\ref{sec:undo:rollback-after-push} for an important caveat 40.34 -about the use of this command.) 40.35 - 40.36 -Here's a mistake that I often find myself making: committing a change 40.37 -in which I've created a new file, but forgotten to \hgcmd{add} it. 
40.38 -\interaction{rollback.commit} 40.39 -Looking at the output of \hgcmd{status} after the commit immediately 40.40 -confirms the error. 40.41 -\interaction{rollback.status} 40.42 -The commit captured the changes to the file \filename{a}, but not the 40.43 -new file \filename{b}. If I were to push this changeset to a 40.44 -repository that I shared with a colleague, the chances are high that 40.45 -something in \filename{a} would refer to \filename{b}, which would not 40.46 -be present in their repository when they pulled my changes. I would 40.47 -thus become the object of some indignation. 40.48 - 40.49 -However, luck is with me---I've caught my error before I pushed the 40.50 -changeset. I use the \hgcmd{rollback} command, and Mercurial makes 40.51 -that last changeset vanish. 40.52 -\interaction{rollback.rollback} 40.53 -Notice that the changeset is no longer present in the repository's 40.54 -history, and the working directory once again thinks that the file 40.55 -\filename{a} is modified. The commit and rollback have left the 40.56 -working directory exactly as it was prior to the commit; the changeset 40.57 -has been completely erased. I can now safely \hgcmd{add} the file 40.58 -\filename{b}, and rerun my commit. 40.59 -\interaction{rollback.add} 40.60 - 40.61 -\subsection{The erroneous pull} 40.62 - 40.63 -It's common practice with Mercurial to maintain separate development 40.64 -branches of a project in different repositories. Your development 40.65 -team might have one shared repository for your project's ``0.9'' 40.66 -release, and another, containing different changes, for the ``1.0'' 40.67 -release. 40.68 - 40.69 -Given this, you can imagine that the consequences could be messy if 40.70 -you had a local ``0.9'' repository, and accidentally pulled changes 40.71 -from the shared ``1.0'' repository into it. 
At worst, you could be 40.72 -paying insufficient attention, and push those changes into the shared 40.73 -``0.9'' tree, confusing your entire team (but don't worry, we'll 40.74 -return to this horror scenario later). However, it's more likely that 40.75 -you'll notice immediately, because Mercurial will display the URL it's 40.76 -pulling from, or you will see it pull a suspiciously large number of 40.77 -changes into the repository. 40.78 - 40.79 -The \hgcmd{rollback} command will work nicely to expunge all of the 40.80 -changesets that you just pulled. Mercurial groups all changes from 40.81 -one \hgcmd{pull} into a single transaction, so one \hgcmd{rollback} is 40.82 -all you need to undo this mistake. 40.83 - 40.84 -\subsection{Rolling back is useless once you've pushed} 40.85 -\label{sec:undo:rollback-after-push} 40.86 - 40.87 -The value of the \hgcmd{rollback} command drops to zero once you've 40.88 -pushed your changes to another repository. Rolling back a change 40.89 -makes it disappear entirely, but \emph{only} in the repository in 40.90 -which you perform the \hgcmd{rollback}. Because a rollback eliminates 40.91 -history, there's no way for the disappearance of a change to propagate 40.92 -between repositories. 40.93 - 40.94 -If you've pushed a change to another repository---particularly if it's 40.95 -a shared repository---it has essentially ``escaped into the wild,'' 40.96 -and you'll have to recover from your mistake in a different way. What 40.97 -will happen if you push a changeset somewhere, then roll it back, then 40.98 -pull from the repository you pushed to, is that the changeset will 40.99 -reappear in your repository. 
40.100 - 40.101 -(If you absolutely know for sure that the change you want to roll back 40.102 -is the most recent change in the repository that you pushed to, 40.103 -\emph{and} you know that nobody else could have pulled it from that 40.104 -repository, you can roll back the changeset there, too, but you really 40.105 -should not rely on this working reliably. If you do this, 40.106 -sooner or later a change really will make it into a repository that 40.107 -you don't directly control (or have forgotten about), and come back to 40.108 -bite you.) 40.109 - 40.110 -\subsection{You can only roll back once} 40.111 - 40.112 -Mercurial stores exactly one transaction in its transaction log; that 40.113 -transaction is the most recent one that occurred in the repository. 40.114 -This means that you can only roll back one transaction. If you expect 40.115 -to be able to roll back one transaction, then its predecessor, this is 40.116 -not the behaviour you will get. 40.117 -\interaction{rollback.twice} 40.118 -Once you've rolled back one transaction in a repository, you can't 40.119 -roll back again in that repository until you perform another commit or 40.120 -pull. 40.121 - 40.122 -\section{Reverting the mistaken change} 40.123 - 40.124 -If you make a modification to a file, and decide that you really 40.125 -didn't want to change the file at all, and you haven't yet committed 40.126 -your changes, the \hgcmd{revert} command is the one you'll need. It 40.127 -looks at the changeset that's the parent of the working directory, and 40.128 -restores the contents of the file to their state as of that changeset. 40.129 -(That's a long-winded way of saying that, in the normal case, it 40.130 -undoes your modifications.) 40.131 - 40.132 -Let's illustrate how the \hgcmd{revert} command works with yet another 40.133 -small example. We'll begin by modifying a file that Mercurial is 40.134 -already tracking.
40.135 -\interaction{daily.revert.modify} 40.136 -If we don't want that change, we can simply \hgcmd{revert} the file. 40.137 -\interaction{daily.revert.unmodify} 40.138 -The \hgcmd{revert} command provides us with an extra degree of safety 40.139 -by saving our modified file with a \filename{.orig} extension. 40.140 -\interaction{daily.revert.status} 40.141 - 40.142 -Here is a summary of the cases that the \hgcmd{revert} command can 40.143 -deal with. We will describe each of these in more detail in the 40.144 -section that follows. 40.145 -\begin{itemize} 40.146 -\item If you modify a file, it will restore the file to its unmodified 40.147 - state. 40.148 -\item If you \hgcmd{add} a file, it will undo the ``added'' state of 40.149 - the file, but leave the file itself untouched. 40.150 -\item If you delete a file without telling Mercurial, it will restore 40.151 - the file to its unmodified contents. 40.152 -\item If you use the \hgcmd{remove} command to remove a file, it will 40.153 - undo the ``removed'' state of the file, and restore the file to its 40.154 - unmodified contents. 40.155 -\end{itemize} 40.156 - 40.157 -\subsection{File management errors} 40.158 -\label{sec:undo:mgmt} 40.159 - 40.160 -The \hgcmd{revert} command is useful for more than just modified 40.161 -files. It lets you reverse the results of all of Mercurial's file 40.162 -management commands---\hgcmd{add}, \hgcmd{remove}, and so on. 40.163 - 40.164 -If you \hgcmd{add} a file, then decide that in fact you don't want 40.165 -Mercurial to track it, use \hgcmd{revert} to undo the add. Don't 40.166 -worry; Mercurial will not modify the file in any way. It will just 40.167 -``unmark'' the file. 40.168 -\interaction{daily.revert.add} 40.169 - 40.170 -Similarly, if you ask Mercurial to \hgcmd{remove} a file, you can use 40.171 -\hgcmd{revert} to restore it to the contents it had as of the parent 40.172 -of the working directory. 
40.173 -\interaction{daily.revert.remove} 40.174 -This works just as well for a file that you deleted by hand, without 40.175 -telling Mercurial (recall that in Mercurial terminology, this kind of 40.176 -file is called ``missing''). 40.177 -\interaction{daily.revert.missing} 40.178 - 40.179 -If you revert a \hgcmd{copy}, the copied-to file remains in your 40.180 -working directory afterwards, untracked. Since a copy doesn't affect 40.181 -the copied-from file in any way, Mercurial doesn't do anything with 40.182 -the copied-from file. 40.183 -\interaction{daily.revert.copy} 40.184 - 40.185 -\subsubsection{A slightly special case: reverting a rename} 40.186 - 40.187 -If you \hgcmd{rename} a file, there is one small detail that 40.188 -you should remember. When you \hgcmd{revert} a rename, it's not 40.189 -enough to provide the name of the renamed-to file, as you can see 40.190 -here. 40.191 -\interaction{daily.revert.rename} 40.192 -As you can see from the output of \hgcmd{status}, the renamed-to file 40.193 -is no longer identified as added, but the renamed-\emph{from} file is 40.194 -still removed! This is counter-intuitive (at least to me), but at 40.195 -least it's easy to deal with. 40.196 -\interaction{daily.revert.rename-orig} 40.197 -So remember, to revert a \hgcmd{rename}, you must provide \emph{both} 40.198 -the source and destination names. 40.199 - 40.200 -% TODO: the output doesn't look like it will be removed! 40.201 - 40.202 -(By the way, if you rename a file, then modify the renamed-to file, 40.203 -then revert both components of the rename, when Mercurial restores the 40.204 -file that was removed as part of the rename, it will be unmodified. 40.205 -If you need the modifications in the renamed-to file to show up in the 40.206 -renamed-from file, don't forget to copy them over.) 40.207 - 40.208 -These fiddly aspects of reverting a rename arguably constitute a small 40.209 -bug in Mercurial. 
40.210 - 40.211 -\section{Dealing with committed changes} 40.212 - 40.213 -Consider a case where you have committed a change $a$, and another 40.214 -change $b$ on top of it; you then realise that change $a$ was 40.215 -incorrect. Mercurial lets you ``back out'' an entire changeset 40.216 -automatically, and provides building blocks that let you reverse part of a 40.217 -changeset by hand. 40.218 - 40.219 -Before you read this section, here's something to keep in mind: the 40.220 -\hgcmd{backout} command undoes changes by \emph{adding} history, not 40.221 -by modifying or erasing it. It's the right tool to use if you're 40.222 -fixing bugs, but not if you're trying to undo some change that has 40.223 -catastrophic consequences. To deal with those, see 40.224 -section~\ref{sec:undo:aaaiiieee}. 40.225 - 40.226 -\subsection{Backing out a changeset} 40.227 - 40.228 -The \hgcmd{backout} command lets you ``undo'' the effects of an entire 40.229 -changeset in an automated fashion. Because Mercurial's history is 40.230 -immutable, this command \emph{does not} get rid of the changeset you 40.231 -want to undo. Instead, it creates a new changeset that 40.232 -\emph{reverses} the effect of the to-be-undone changeset. 40.233 - 40.234 -The operation of the \hgcmd{backout} command is a little intricate, so 40.235 -let's illustrate it with some examples. First, we'll create a 40.236 -repository with some simple changes. 40.237 -\interaction{backout.init} 40.238 - 40.239 -The \hgcmd{backout} command takes a single changeset ID as its 40.240 -argument; this is the changeset to back out. Normally, 40.241 -\hgcmd{backout} will drop you into a text editor to write a commit 40.242 -message, so you can record why you're backing the change out. In this 40.243 -example, we provide a commit message on the command line using the 40.244 -\hgopt{backout}{-m} option.
40.245 - 40.246 -\subsection{Backing out the tip changeset} 40.247 - 40.248 -We're going to start by backing out the last changeset we committed. 40.249 -\interaction{backout.simple} 40.250 -You can see that the second line from \filename{myfile} is no longer 40.251 -present. Taking a look at the output of \hgcmd{log} gives us an idea 40.252 -of what the \hgcmd{backout} command has done. 40.253 -\interaction{backout.simple.log} 40.254 -Notice that the new changeset that \hgcmd{backout} has created is a 40.255 -child of the changeset we backed out. It's easier to see this in 40.256 -figure~\ref{fig:undo:backout}, which presents a graphical view of the 40.257 -change history. As you can see, the history is nice and linear. 40.258 - 40.259 -\begin{figure}[htb] 40.260 - \centering 40.261 - \grafix{undo-simple} 40.262 - \caption{Backing out a change using the \hgcmd{backout} command} 40.263 - \label{fig:undo:backout} 40.264 -\end{figure} 40.265 - 40.266 -\subsection{Backing out a non-tip change} 40.267 - 40.268 -If you want to back out a change other than the last one you 40.269 -committed, pass the \hgopt{backout}{--merge} option to the 40.270 -\hgcmd{backout} command. 40.271 -\interaction{backout.non-tip.clone} 40.272 -This makes backing out any changeset a ``one-shot'' operation that's 40.273 -usually simple and fast. 40.274 -\interaction{backout.non-tip.backout} 40.275 - 40.276 -If you take a look at the contents of \filename{myfile} after the 40.277 -backout finishes, you'll see that the first and third changes are 40.278 -present, but not the second. 40.279 -\interaction{backout.non-tip.cat} 40.280 - 40.281 -As the graphical history in figure~\ref{fig:undo:backout-non-tip} 40.282 -illustrates, Mercurial actually commits \emph{two} changes in this 40.283 -kind of situation (the box-shaped nodes are the ones that Mercurial 40.284 -commits automatically). 
Before Mercurial begins the backout process, 40.285 -it first remembers what the current parent of the working directory 40.286 -is. It then backs out the target changeset, and commits that as a 40.287 -changeset. Finally, it merges back to the previous parent of the 40.288 -working directory, and commits the result of the merge. 40.289 - 40.290 -% TODO: to me it looks like mercurial doesn't commit the second merge automatically! 40.291 - 40.292 -\begin{figure}[htb] 40.293 - \centering 40.294 - \grafix{undo-non-tip} 40.295 - \caption{Automated backout of a non-tip change using the \hgcmd{backout} command} 40.296 - \label{fig:undo:backout-non-tip} 40.297 -\end{figure} 40.298 - 40.299 -The result is that you end up ``back where you were'', only with some 40.300 -extra history that undoes the effect of the changeset you wanted to 40.301 -back out. 40.302 - 40.303 -\subsubsection{Always use the \hgopt{backout}{--merge} option} 40.304 - 40.305 -In fact, since the \hgopt{backout}{--merge} option will do the ``right 40.306 -thing'' whether or not the changeset you're backing out is the tip 40.307 -(i.e.~it won't try to merge if it's backing out the tip, since there's 40.308 -no need), you should \emph{always} use this option when you run the 40.309 -\hgcmd{backout} command. 40.310 - 40.311 -\subsection{Gaining more control of the backout process} 40.312 - 40.313 -While I've recommended that you always use the 40.314 -\hgopt{backout}{--merge} option when backing out a change, the 40.315 -\hgcmd{backout} command lets you decide how to merge a backout 40.316 -changeset. Taking control of the backout process by hand is something 40.317 -you will rarely need to do, but it can be useful to understand what 40.318 -the \hgcmd{backout} command is doing for you automatically. To 40.319 -illustrate this, let's clone our first repository, but omit the 40.320 -backout change that it contains. 
40.321 - 40.322 -\interaction{backout.manual.clone} 40.323 -As with our earlier example, we'll commit a third changeset, then back 40.324 -out its parent, and see what happens. 40.325 -\interaction{backout.manual.backout} 40.326 -Our new changeset is again a descendant of the changeset we backed 40.327 -out; it's thus a new head, \emph{not} a descendant of the changeset 40.328 -that was the tip. The \hgcmd{backout} command was quite explicit in 40.329 -telling us this. 40.330 -\interaction{backout.manual.log} 40.331 - 40.332 -Again, it's easier to see what has happened by looking at a graph of 40.333 -the revision history, in figure~\ref{fig:undo:backout-manual}. This 40.334 -makes it clear that when we use \hgcmd{backout} to back out a change 40.335 -other than the tip, Mercurial adds a new head to the repository (the 40.336 -change it committed is box-shaped). 40.337 - 40.338 -\begin{figure}[htb] 40.339 - \centering 40.340 - \grafix{undo-manual} 40.341 - \caption{Backing out a change using the \hgcmd{backout} command} 40.342 - \label{fig:undo:backout-manual} 40.343 -\end{figure} 40.344 - 40.345 -After the \hgcmd{backout} command has completed, it leaves the new 40.346 -``backout'' changeset as the parent of the working directory. 40.347 -\interaction{backout.manual.parents} 40.348 -Now we have two isolated sets of changes. 40.349 -\interaction{backout.manual.heads} 40.350 - 40.351 -Let's think about what we expect to see as the contents of 40.352 -\filename{myfile} now. The first change should be present, because 40.353 -we've never backed it out. The second change should be missing, as 40.354 -that's the change we backed out. Since the history graph shows the 40.355 -third change as a separate head, we \emph{don't} expect to see the 40.356 -third change present in \filename{myfile}. 40.357 -\interaction{backout.manual.cat} 40.358 -To get the third change back into the file, we just do a normal merge 40.359 -of our two heads. 
40.360 -\interaction{backout.manual.merge} 40.361 -Afterwards, the graphical history of our repository looks like 40.362 -figure~\ref{fig:undo:backout-manual-merge}. 40.363 - 40.364 -\begin{figure}[htb] 40.365 - \centering 40.366 - \grafix{undo-manual-merge} 40.367 - \caption{Manually merging a backout change} 40.368 - \label{fig:undo:backout-manual-merge} 40.369 -\end{figure} 40.370 - 40.371 -\subsection{Why \hgcmd{backout} works as it does} 40.372 - 40.373 -Here's a brief description of how the \hgcmd{backout} command works. 40.374 -\begin{enumerate} 40.375 -\item It ensures that the working directory is ``clean'', i.e.~that 40.376 - the output of \hgcmd{status} would be empty. 40.377 -\item It remembers the current parent of the working directory. Let's 40.378 - call this changeset \texttt{orig} 40.379 -\item It does the equivalent of a \hgcmd{update} to sync the working 40.380 - directory to the changeset you want to back out. Let's call this 40.381 - changeset \texttt{backout} 40.382 -\item It finds the parent of that changeset. Let's call that 40.383 - changeset \texttt{parent}. 40.384 -\item For each file that the \texttt{backout} changeset affected, it 40.385 - does the equivalent of a \hgcmdargs{revert}{-r parent} on that file, 40.386 - to restore it to the contents it had before that changeset was 40.387 - committed. 40.388 -\item It commits the result as a new changeset. This changeset has 40.389 - \texttt{backout} as its parent. 40.390 -\item If you specify \hgopt{backout}{--merge} on the command line, it 40.391 - merges with \texttt{orig}, and commits the result of the merge. 40.392 -\end{enumerate} 40.393 - 40.394 -An alternative way to implement the \hgcmd{backout} command would be 40.395 -to \hgcmd{export} the to-be-backed-out changeset as a diff, then use 40.396 -the \cmdopt{patch}{--reverse} option to the \command{patch} command to 40.397 -reverse the effect of the change without fiddling with the working 40.398 -directory. 
This sounds much simpler, but it would not work nearly as 40.399 -well. 40.400 - 40.401 -The reason that \hgcmd{backout} does an update, a commit, a merge, and 40.402 -another commit is to give the merge machinery the best chance to do a 40.403 -good job when dealing with all the changes \emph{between} the change 40.404 -you're backing out and the current tip. 40.405 - 40.406 -If you're backing out a changeset that's~100 revisions back in your 40.407 -project's history, the chances that the \command{patch} command will 40.408 -be able to apply a reverse diff cleanly are not good, because 40.409 -intervening changes are likely to have ``broken the context'' that 40.410 -\command{patch} uses to determine whether it can apply a patch (if 40.411 -this sounds like gibberish, see section~\ref{sec:mq:patch} for a 40.412 -discussion of the \command{patch} command). Also, Mercurial's merge 40.413 -machinery will handle files and directories being renamed, permission 40.414 -changes, and modifications to binary files, none of which 40.415 -\command{patch} can deal with. 40.416 - 40.417 -\section{Changes that should never have been} 40.418 -\label{sec:undo:aaaiiieee} 40.419 - 40.420 -Most of the time, the \hgcmd{backout} command is exactly what you need 40.421 -if you want to undo the effects of a change. It leaves a permanent 40.422 -record of exactly what you did, both when committing the original 40.423 -changeset and when you cleaned up after it. 40.424 - 40.425 -On rare occasions, though, you may find that you've committed a change 40.426 -that really should not be present in the repository at all. For 40.427 -example, it would be very unusual, and usually considered a mistake, 40.428 -to commit a software project's object files as well as its source 40.429 -files. Object files have almost no intrinsic value, and they're 40.430 -\emph{big}, so they increase the size of the repository and the amount 40.431 -of time it takes to clone or pull changes. 
40.432 - 40.433 -Before I discuss the options that you have if you commit a ``brown 40.434 -paper bag'' change (the kind that's so bad that you want to pull a 40.435 -brown paper bag over your head), let me first discuss some approaches 40.436 -that probably won't work. 40.437 - 40.438 -Since Mercurial treats history as accumulative---every change builds 40.439 -on top of all changes that preceded it---you generally can't just make 40.440 -disastrous changes disappear. The one exception is when you've just 40.441 -committed a change, and it hasn't been pushed or pulled into another 40.442 -repository. That's when you can safely use the \hgcmd{rollback} 40.443 -command, as I detailed in section~\ref{sec:undo:rollback}. 40.444 - 40.445 -After you've pushed a bad change to another repository, you 40.446 -\emph{could} still use \hgcmd{rollback} to make your local copy of the 40.447 -change disappear, but it won't have the consequences you want. The 40.448 -change will still be present in the remote repository, so it will 40.449 -reappear in your local repository the next time you pull. 40.450 - 40.451 -If a situation like this arises, and you know which repositories your 40.452 -bad change has propagated into, you can \emph{try} to get rid of the 40.453 -change from \emph{every} one of those repositories. This is, of 40.454 -course, not a satisfactory solution: if you miss even a single 40.455 -repository while you're expunging, the change is still ``in the 40.456 -wild'', and could propagate further. 40.457 - 40.458 -If you've committed one or more changes \emph{after} the change that 40.459 -you'd like to see disappear, your options are further reduced. 40.460 -Mercurial doesn't provide a way to ``punch a hole'' in history, 40.461 -leaving changesets intact. 40.462 - 40.463 -XXX This needs filling out. The \texttt{hg-replay} script in the 40.464 -\texttt{examples} directory works, but doesn't handle merge 40.465 -changesets. Kind of an important omission. 
40.466 - 40.467 -\subsection{Protect yourself from ``escaped'' changes} 40.468 - 40.469 -If you've committed some changes to your local repository and they've 40.470 -been pushed or pulled somewhere else, this isn't necessarily a 40.471 -disaster. You can protect yourself ahead of time against some classes 40.472 -of bad changeset. This is particularly easy if your team usually 40.473 -pulls changes from a central repository. 40.474 - 40.475 -By configuring some hooks on that repository to validate incoming 40.476 -changesets (see chapter~\ref{chap:hook}), you can automatically 40.477 -prevent some kinds of bad changeset from being pushed to the central 40.478 -repository at all. With such a configuration in place, some kinds of 40.479 -bad changeset will naturally tend to ``die out'' because they can't 40.480 -propagate into the central repository. Better yet, this happens 40.481 -without any need for explicit intervention. 40.482 - 40.483 -For instance, an incoming change hook that verifies that a changeset 40.484 -will actually compile can prevent people from inadvertently ``breaking 40.485 -the build''. 40.486 - 40.487 -\section{Finding the source of a bug} 40.488 -\label{sec:undo:bisect} 40.489 - 40.490 -While it's all very well to be able to back out a changeset that 40.491 -introduced a bug, this requires that you know which changeset to back 40.492 -out. Mercurial provides an invaluable command, called 40.493 -\hgcmd{bisect}, that helps you to automate this process and accomplish 40.494 -it very efficiently. 40.495 - 40.496 -The idea behind the \hgcmd{bisect} command is that a changeset has 40.497 -introduced some change of behaviour that you can identify with a 40.498 -simple binary test. You don't know which piece of code introduced the 40.499 -change, but you know how to test for the presence of the bug. The 40.500 -\hgcmd{bisect} command uses your test to direct its search for the 40.501 -changeset that introduced the code that caused the bug. 
40.502 - 40.503 -Here are a few scenarios to help you understand how you might apply 40.504 -this command. 40.505 -\begin{itemize} 40.506 -\item The most recent version of your software has a bug that you 40.507 - remember wasn't present a few weeks ago, but you don't know when it 40.508 - was introduced. Here, your binary test checks for the presence of 40.509 - that bug. 40.510 -\item You fixed a bug in a rush, and now it's time to close the entry 40.511 - in your team's bug database. The bug database requires a changeset 40.512 - ID when you close an entry, but you don't remember which changeset 40.513 - you fixed the bug in. Once again, your binary test checks for the 40.514 - presence of the bug. 40.515 -\item Your software works correctly, but runs~15\% slower than the 40.516 - last time you measured it. You want to know which changeset 40.517 - introduced the performance regression. In this case, your binary 40.518 - test measures the performance of your software, to see whether it's 40.519 - ``fast'' or ``slow''. 40.520 -\item The sizes of the components of your project that you ship 40.521 - exploded recently, and you suspect that something changed in the way 40.522 - you build your project. 40.523 -\end{itemize} 40.524 - 40.525 -From these examples, it should be clear that the \hgcmd{bisect} 40.526 -command is useful not only for finding the sources of bugs. You can 40.527 -use it to find any ``emergent property'' of a repository (anything 40.528 -that you can't find from a simple text search of the files in the 40.529 -tree) for which you can write a binary test. 40.530 - 40.531 -We'll introduce a little bit of terminology here, just to make it 40.532 -clear which parts of the search process are your responsibility, and 40.533 -which are Mercurial's. A \emph{test} is something that \emph{you} run 40.534 -when \hgcmd{bisect} chooses a changeset. A \emph{probe} is what 40.535 -\hgcmd{bisect} runs to tell whether a revision is good. 
Finally, 40.536 -we'll use the word ``bisect'', as both a noun and a verb, to stand in 40.537 -for the phrase ``search using the \hgcmd{bisect} command''. 40.538 - 40.539 -One simple way to automate the searching process would be simply to 40.540 -probe every changeset. However, this scales poorly. If it took ten 40.541 -minutes to test a single changeset, and you had 10,000 changesets in 40.542 -your repository, the exhaustive approach would take on average~35 40.543 -\emph{days} to find the changeset that introduced a bug. Even if you 40.544 -knew that the bug was introduced by one of the last 500 changesets, 40.545 -and limited your search to those, you'd still be looking at over 40 40.546 -hours to find the changeset that introduced your bug. 40.547 - 40.548 -What the \hgcmd{bisect} command does is use its knowledge of the 40.549 -``shape'' of your project's revision history to perform a search in 40.550 -time proportional to the \emph{logarithm} of the number of changesets 40.551 -to check (the kind of search it performs is called a dichotomic 40.552 -search). With this approach, searching through 10,000 changesets will 40.553 -take less than three hours, even at ten minutes per test (the search 40.554 -will require about 14 tests). Limit your search to the last hundred 40.555 -changesets, and it will take only about an hour (roughly seven tests). 40.556 - 40.557 -The \hgcmd{bisect} command is aware of the ``branchy'' nature of a 40.558 -Mercurial project's revision history, so it has no problems dealing 40.559 -with branches, merges, or multiple heads in a repository. It can 40.560 -prune entire branches of history with a single probe, which is how it 40.561 -operates so efficiently. 40.562 - 40.563 -\subsection{Using the \hgcmd{bisect} command} 40.564 - 40.565 -Here's an example of \hgcmd{bisect} in action. 
40.566 - 40.567 -\begin{note} 40.568 - In versions 0.9.5 and earlier of Mercurial, \hgcmd{bisect} was not a 40.569 - core command: it was distributed with Mercurial as an extension. 40.570 - This section describes the built-in command, not the old extension. 40.571 -\end{note} 40.572 - 40.573 -Now let's create a repository, so that we can try out the 40.574 -\hgcmd{bisect} command in isolation. 40.575 -\interaction{bisect.init} 40.576 -We'll simulate a project that has a bug in it in a simple-minded way: 40.577 -create trivial changes in a loop, and nominate one specific change 40.578 -that will have the ``bug''. This loop creates 35 changesets, each 40.579 -adding a single file to the repository. We'll represent our ``bug'' 40.580 -with a file that contains the text ``i have a gub''. 40.581 -\interaction{bisect.commits} 40.582 - 40.583 -The next thing that we'd like to do is figure out how to use the 40.584 -\hgcmd{bisect} command. We can use Mercurial's normal built-in help 40.585 -mechanism for this. 40.586 -\interaction{bisect.help} 40.587 - 40.588 -The \hgcmd{bisect} command works in steps. Each step proceeds as follows. 40.589 -\begin{enumerate} 40.590 -\item You run your binary test. 40.591 - \begin{itemize} 40.592 - \item If the test succeeded, you tell \hgcmd{bisect} by running the 40.593 - \hgcmdargs{bisect}{--good} command. 40.594 - \item If it failed, run the \hgcmdargs{bisect}{--bad} command. 40.595 - \end{itemize} 40.596 -\item The command uses your information to decide which changeset to 40.597 - test next. 40.598 -\item It updates the working directory to that changeset, and the 40.599 - process begins again. 40.600 -\end{enumerate} 40.601 -The process ends when \hgcmd{bisect} identifies a unique changeset 40.602 -that marks the point where your test transitioned from ``succeeding'' 40.603 -to ``failing''. 40.604 - 40.605 -To start the search, we must run the \hgcmdargs{bisect}{--reset} command. 
40.606 -\interaction{bisect.search.init} 40.607 - 40.608 -In our case, the binary test we use is simple: we check to see if any 40.609 -file in the repository contains the string ``i have a gub''. If it 40.610 -does, this changeset contains the change that ``caused the bug''. By 40.611 -convention, a changeset that has the property we're searching for is 40.612 -``bad'', while one that doesn't is ``good''. 40.613 - 40.614 -Most of the time, the revision to which the working directory is 40.615 -synced (usually the tip) already exhibits the problem introduced by 40.616 -the buggy change, so we'll mark it as ``bad''. 40.617 -\interaction{bisect.search.bad-init} 40.618 - 40.619 -Our next task is to nominate a changeset that we know \emph{doesn't} 40.620 -have the bug; the \hgcmd{bisect} command will ``bracket'' its search 40.621 -between the first pair of good and bad changesets. In our case, we 40.622 -know that revision~10 didn't have the bug. (I'll have more words 40.623 -about choosing the first ``good'' changeset later.) 40.624 -\interaction{bisect.search.good-init} 40.625 - 40.626 -Notice that this command printed some output. 40.627 -\begin{itemize} 40.628 -\item It told us how many changesets it must consider before it can 40.629 - identify the one that introduced the bug, and how many tests that 40.630 - will require. 40.631 -\item It updated the working directory to the next changeset to test, 40.632 - and told us which changeset it's testing. 40.633 -\end{itemize} 40.634 - 40.635 -We now run our test in the working directory. We use the 40.636 -\command{grep} command to see if our ``bad'' file is present in the 40.637 -working directory. If it is, this revision is bad; if not, this 40.638 -revision is good. 40.639 -\interaction{bisect.search.step1} 40.640 - 40.641 -This test looks like a perfect candidate for automation, so let's turn 40.642 -it into a shell function. 
40.643 -\interaction{bisect.search.mytest} 40.644 -We can now run an entire test step with a single command, 40.645 -\texttt{mytest}. 40.646 -\interaction{bisect.search.step2} 40.647 -A few more invocations of our canned test step command, and we're 40.648 -done. 40.649 -\interaction{bisect.search.rest} 40.650 - 40.651 -Even though we had~35 changesets to search through, the \hgcmd{bisect} 40.652 -command let us find the changeset that introduced our ``bug'' with 40.653 -only five tests. Because the number of tests that the \hgcmd{bisect} 40.654 -command performs grows logarithmically with the number of changesets to 40.655 -search, the advantage that it has over the ``brute force'' search 40.656 -approach increases with every changeset you add. 40.657 - 40.658 -\subsection{Cleaning up after your search} 40.659 - 40.660 -When you're finished using the \hgcmd{bisect} command in a 40.661 -repository, you can use the \hgcmdargs{bisect}{--reset} command to drop 40.662 -the information it was using to drive your search. The command 40.663 -doesn't use much space, so it doesn't matter if you forget to run this 40.664 -command. However, \hgcmd{bisect} won't let you start a new search in 40.665 -that repository until you do a \hgcmdargs{bisect}{--reset}. 40.666 -\interaction{bisect.search.reset} 40.667 - 40.668 -\section{Tips for finding bugs effectively} 40.669 - 40.670 -\subsection{Give consistent input} 40.671 - 40.672 -The \hgcmd{bisect} command requires that you correctly report the 40.673 -result of every test you perform. If you tell it that a test failed 40.674 -when it really succeeded, it \emph{might} be able to detect the 40.675 -inconsistency. If it can identify an inconsistency in your reports, 40.676 -it will tell you that a particular changeset is both good and bad. 40.677 -However, it can't do this perfectly; it's about as likely to report 40.678 -the wrong changeset as the source of the bug. 
40.679 - 40.680 -\subsection{Automate as much as possible} 40.681 - 40.682 -When I started using the \hgcmd{bisect} command, I tried a few times 40.683 -to run my tests by hand, on the command line. This is an approach 40.684 -that I, at least, am not suited to. After a few tries, I found that I 40.685 -was making enough mistakes that I was having to restart my searches 40.686 -several times before finally getting correct results. 40.687 - 40.688 -My initial problems with driving the \hgcmd{bisect} command by hand 40.689 -occurred even with simple searches on small repositories; if the 40.690 -problem you're looking for is more subtle, or the number of tests that 40.691 -\hgcmd{bisect} must perform increases, the likelihood of operator 40.692 -error ruining the search is much higher. Once I started automating my 40.693 -tests, I had much better results. 40.694 - 40.695 -The key to automated testing is twofold: 40.696 -\begin{itemize} 40.697 -\item always test for the same symptom, and 40.698 -\item always feed consistent input to the \hgcmd{bisect} command. 40.699 -\end{itemize} 40.700 -In my tutorial example above, the \command{grep} command tests for the 40.701 -symptom, and the \texttt{if} statement takes the result of this check 40.702 -and ensures that we always feed the same input to the \hgcmd{bisect} 40.703 -command. The \texttt{mytest} function marries these together in a 40.704 -reproducible way, so that every test is uniform and consistent. 40.705 - 40.706 -\subsection{Check your results} 40.707 - 40.708 -Because the output of a \hgcmd{bisect} search is only as good as the 40.709 -input you give it, don't take the changeset it reports as the 40.710 -absolute truth. A simple way to cross-check its report is to manually 40.711 -run your test at each of the following changesets: 40.712 -\begin{itemize} 40.713 -\item The changeset that it reports as the first bad revision. Your 40.714 - test should still report this as bad. 
40.715 -\item The parent of that changeset (either parent, if it's a merge). 40.716 - Your test should report this changeset as good. 40.717 -\item A child of that changeset. Your test should report this 40.718 - changeset as bad. 40.719 -\end{itemize} 40.720 - 40.721 -\subsection{Beware interference between bugs} 40.722 - 40.723 -It's possible that your search for one bug could be disrupted by the 40.724 -presence of another. For example, let's say your software crashes at 40.725 -revision 100, and worked correctly at revision 50. Unknown to you, 40.726 -someone else introduced a different crashing bug at revision 60, and 40.727 -fixed it at revision 80. This could distort your results in one of 40.728 -several ways. 40.729 - 40.730 -It is possible that this other bug completely ``masks'' yours, which 40.731 -is to say that it occurs before your bug has a chance to manifest 40.732 -itself. If you can't avoid that other bug (for example, it prevents 40.733 -your project from building), and so can't tell whether your bug is 40.734 -present in a particular changeset, the \hgcmd{bisect} command cannot 40.735 -help you directly. Instead, you can mark a changeset as untested by 40.736 -running \hgcmdargs{bisect}{--skip}. 40.737 - 40.738 -A different problem could arise if your test for a bug's presence is 40.739 -not specific enough. If you check for ``my program crashes'', then 40.740 -both your crashing bug and an unrelated crashing bug that masks it 40.741 -will look like the same thing, and mislead \hgcmd{bisect}. 40.742 - 40.743 -Another useful situation in which to use \hgcmdargs{bisect}{--skip} is 40.744 -if you can't test a revision because your project was in a broken and 40.745 -hence untestable state at that revision, perhaps because someone 40.746 -checked in a change that prevented the project from building. 
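The three-way decision described in this subsection (untestable, bug present, bug absent) can be sketched as a small helper. This is only an illustration, not part of Mercurial: the function name \texttt{bisect\_flag} and its two boolean inputs are hypothetical, and the only things taken from Mercurial are the flag names \texttt{--good}, \texttt{--bad} and \texttt{--skip}.

```python
def bisect_flag(build_ok: bool, symptom_present: bool) -> str:
    """Map one test outcome to the flag to pass to `hg bisect`.

    Hypothetical helper for illustration; only the three flag names
    come from Mercurial itself.
    """
    if not build_ok:
        # The revision cannot be tested at all (for example, it does not
        # build), so we make no claim about it either way.
        return "--skip"
    if symptom_present:
        # The specific symptom we are hunting is present.
        return "--bad"
    return "--good"


if __name__ == "__main__":
    # Print the command that each possible outcome would lead to.
    for build_ok, symptom in [(False, False), (True, True), (True, False)]:
        print(build_ok, symptom, "->", "hg bisect", bisect_flag(build_ok, symptom))
```

Checking for ``untestable'' before checking for the symptom matters: a revision that fails to build would otherwise be reported as good or bad on the strength of a test that never really ran.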
40.747 - 40.748 -\subsection{Bracket your search lazily} 40.749 - 40.750 -Choosing the first ``good'' and ``bad'' changesets that will mark the 40.751 -end points of your search is often easy, but it bears a little 40.752 -discussion nevertheless. From the perspective of \hgcmd{bisect}, the 40.753 -``newest'' changeset is conventionally ``bad'', and the older 40.754 -changeset is ``good''. 40.755 - 40.756 -If you're having trouble remembering when a suitable ``good'' change 40.757 -was, so that you can tell \hgcmd{bisect}, you could do worse than 40.758 -testing changesets at random. Just remember to eliminate contenders 40.759 -that can't possibly exhibit the bug (perhaps because the feature with 40.760 -the bug isn't present yet) and those where another problem masks the 40.761 -bug (as I discussed above). 40.762 - 40.763 -Even if you end up ``early'' by thousands of changesets or months of 40.764 -history, you will only add a handful of tests to the total number that 40.765 -\hgcmd{bisect} must perform, thanks to its logarithmic behaviour. 40.766 - 40.767 -%%% Local Variables: 40.768 -%%% mode: latex 40.769 -%%% TeX-master: "00book" 40.770 -%%% End: