hgbook
diff en/concepts.tex @ 115:b74102b56df5
Wow! Lots more work detailing the working directory, merging, etc.
author | Bryan O'Sullivan <bos@serpentine.com> |
---|---|
date | Mon Nov 13 16:19:48 2006 -0800 (2006-11-13) |
parents | a0f57b3e677e |
children | ca99f247899e |
--- a/en/concepts.tex Mon Nov 13 14:32:16 2006 -0800
+++ b/en/concepts.tex Mon Nov 13 16:19:48 2006 -0800
@@ -204,6 +204,35 @@
 after the corrupted section. This would not be possible with a
 delta-only storage model.
 
+\section{Revision history, branching,
+  and merging}
+
+Every entry in a Mercurial revlog knows the identity of its immediate
+ancestor revision, usually referred to as its \emph{parent}. In fact,
+a revision contains room for not one parent, but two. Mercurial uses
+a special hash, called the ``null ID'', to represent the idea ``there
+is no parent here''. This hash is simply a string of zeroes.
+
+In figure~\ref{fig:concepts:revlog}, you can see an example of the
+conceptual structure of a revlog. Filelogs, manifests, and changelogs
+all have this same structure; they differ only in the kind of data
+stored in each delta or snapshot.
+
+The first revision in a revlog (at the bottom of the image) has the
+null ID in both of its parent slots. For a ``normal'' revision, its
+first parent slot contains the ID of its parent revision, and its
+second contains the null ID, indicating that the revision has only one
+real parent. Any two revisions that have the same parent ID are
+branches. A revision that represents a merge between branches has two
+normal revision IDs in its parent slots.
+
+\begin{figure}[ht]
+  \centering
+  \grafix{revlog}
+  \caption{}
+  \label{fig:concepts:revlog}
+\end{figure}
+
 \section{The working directory}
 
 In the working directory, Mercurial stores a snapshot of the files
@@ -266,58 +295,117 @@
 
 After a commit, Mercurial will update the parents of the working
 directory, so that the first parent is the ID of the new changeset,
-and the second is the null ID. This is illustrated in
-figure~\ref{fig:concepts:wdir-after-commit}.
-
-\subsection{Other contents of the dirstate}
-
-Because Mercurial doesn't force you to tell it when you're modifying a
-file, it uses the dirstate to store some extra information so it can
-determine efficiently whether you have modified a file. For each file
-in the working directory, it stores the time that it last modified the
-file itself, and the size of the file at that time.
-
-When you explicitly \hgcmd{add}, \hgcmd{remove}, \hgcmd{rename} or
-\hgcmd{copy} files, the dirstate is updated each time.
-
-When Mercurial is checking the states of files in the working
-directory, it first checks a file's modification time. If that has
-not changed, the file must not have been modified. If the file's size
-has changed, the file must have been modified. If the modification
-time has changed, but the size has not, only then does Mercurial need
-to read the actual contents of the file to see if they've changed.
-Storing these few extra pieces of information dramatically reduces the
-amount of data that Mercurial needs to read, which yields large
-performance improvements compared to other revision control systems.
-
-\section{Revision history, branching,
-  and merging}
-
-Every entry in a Mercurial revlog knows the identity of its immediate
-ancestor revision, usually referred to as its \emph{parent}. In fact,
-a revision contains room for not one parent, but two. Mercurial uses
-a special hash, called the ``null ID'', to represent the idea ``there
-is no parent here''. This hash is simply a string of zeroes.
-
-In figure~\ref{fig:concepts:revlog}, you can see an example of the
-conceptual structure of a revlog. Filelogs, manifests, and changelogs
-all have this same structure; they differ only in the kind of data
-stored in each delta or snapshot.
-
-The first revision in a revlog (at the bottom of the image) has the
-null ID in both of its parent slots. For a ``normal'' revision, its
-first parent slot contains the ID of its parent revision, and its
-second contains the null ID, indicating that the revision has only one
-real parent. Any two revisions that have the same parent ID are
-branches. A revision that represents a merge between branches has two
-normal revision IDs in its parent slots.
-
-\begin{figure}[ht]
-  \centering
-  \grafix{revlog}
-  \caption{}
-  \label{fig:concepts:revlog}
-\end{figure}
+and the second is the null ID. This is shown in
+figure~\ref{fig:concepts:wdir-after-commit}. Mercurial doesn't touch
+any of the files in the working directory when you commit; it just
+modifies the dirstate to note its new parents.
+
+\subsection{Creating a new head}
+
+It's perfectly normal to update the working directory to a changeset
+other than the current tip. For example, you might want to know what
+your project looked like last Tuesday, or you could be looking through
+changesets to see which one introduced a bug. In cases like this, the
+natural thing to do is update the working directory to the changeset
+you're interested in, and then examine the files in the working
+directory directly to see their contents as they were when you
+committed that changeset. The effect of this is shown in
+figure~\ref{fig:concepts:wdir-pre-branch}.
+
+\begin{figure}[ht]
+  \centering
+  \grafix{wdir-pre-branch}
+  \caption{The working directory, updated to an older changeset}
+  \label{fig:concepts:wdir-pre-branch}
+\end{figure}
+
+Having updated the working directory to an older changeset, what
+happens if you make some changes, and then commit? Mercurial behaves
+in the same way as I outlined above. The parents of the working
+directory become the parents of the new changeset. This new changeset
+has no children, so it becomes the new tip. And the repository now
+contains two changesets that have no children; we call these
+\emph{heads}. You can see the structure that this creates in
+figure~\ref{fig:concepts:wdir-branch}.
+
+\begin{figure}[ht]
+  \centering
+  \grafix{wdir-branch}
+  \caption{After a commit made while synced to an older changeset}
+  \label{fig:concepts:wdir-branch}
+\end{figure}
+
+\begin{note}
+  If you're new to Mercurial, you should keep in mind a common
+  ``error'', which is to use the \hgcmd{pull} command without any
+  options. By default, the \hgcmd{pull} command \emph{does not}
+  update the working directory, so you'll bring new changesets into
+  your repository, but the working directory will stay synced at the
+  same changeset as before the pull. If you make some changes and
+  commit afterwards, you'll thus create a new head, because your
+  working directory isn't synced to whatever the current tip is.
+
+  I put the word ``error'' in quotes because all that you need to do
+  to rectify this situation is \hgcmd{merge}, then \hgcmd{commit}. In
+  other words, this almost never has negative consequences; it just
+  surprises people. I'll discuss other ways to avoid this behaviour,
+  and why Mercurial behaves in this initially surprising way, later
+  on.
+\end{note}
+
+\subsection{Merging heads}
+
+When you run the \hgcmd{merge} command, Mercurial leaves the first
+parent of the working directory unchanged, and sets the second parent
+to the changeset you're merging with, as shown in
+figure~\ref{fig:concepts:wdir-merge}.
+
+\begin{figure}[ht]
+  \centering
+  \grafix{wdir-merge}
+  \caption{Merging two heads}
+  \label{fig:concepts:wdir-merge}
+\end{figure}
+
+Mercurial also has to modify the working directory, to merge the files
+managed in the two changesets. Simplified a little, the merging
+process goes like this, for every file in the manifests of both
+changesets.
+\begin{itemize}
+\item If neither changeset has modified a file, do nothing with that
+  file.
+\item If one changeset has modified a file, and the other hasn't,
+  create the modified copy of the file in the working directory.
+\item If one changeset has removed a file, and the other hasn't (or
+  has also deleted it), delete the file from the working directory.
+\item If one changeset has removed a file, but the other has modified
+  the file, ask the user what to do: keep the modified file, or remove
+  it?
+\item If both changesets have modified a file, invoke an external
+  merge program to choose the new contents for the merged file. This
+  may require input from the user.
+\item If one changeset has modified a file, and the other has renamed
+  or copied the file, make sure that the changes follow the new name
+  of the file.
+\end{itemize}
+There are more details---merging has plenty of corner cases---but
+these are the most common choices that are involved in a merge. As
+you can see, most cases are completely automatic, and indeed most
+merges finish automatically, without requiring your input to resolve
+any conflicts.
+
+When you're thinking about what happens when you commit after a merge,
+once again the working directory is ``the changeset I'm about to
+commit''. After the \hgcmd{merge} command completes, the working
+directory has two parents; these will become the parents of the new
+changeset.
+
+Mercurial lets you perform multiple merges, but you must commit the
+results of each individual merge as you go. This is necessary because
+Mercurial only tracks two parents for both revisions and the working
+directory. While it would be technically possible to merge multiple
+changesets at once, the prospect of user confusion and making a
+terrible mess of a merge immediately becomes overwhelming.
 
 \section{Other interesting design features}
 
@@ -460,6 +548,27 @@
 performance and increase the complexity of the software, each of which
 is much more important to the ``feel'' of day-to-day use.
 
+\subsection{Other contents of the dirstate}
+
+Because Mercurial doesn't force you to tell it when you're modifying a
+file, it uses the dirstate to store some extra information so it can
+determine efficiently whether you have modified a file. For each file
+in the working directory, it stores the time that it last modified the
+file itself, and the size of the file at that time.
+
+When you explicitly \hgcmd{add}, \hgcmd{remove}, \hgcmd{rename} or
+\hgcmd{copy} files, the dirstate is updated each time.
+
+When Mercurial is checking the states of files in the working
+directory, it first checks a file's modification time. If that has
+not changed, the file must not have been modified. If the file's size
+has changed, the file must have been modified. If the modification
+time has changed, but the size has not, only then does Mercurial need
+to read the actual contents of the file to see if they've changed.
+Storing these few extra pieces of information dramatically reduces the
+amount of data that Mercurial needs to read, which yields large
+performance improvements compared to other revision control systems.
+
 %%% Local Variables:
 %%% mode: latex
 %%% TeX-master: "00book"
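
Reviewer note: the three-step check described in the relocated "Other contents of the dirstate" subsection can be sketched roughly as below. This is only an illustration in Python under stated assumptions, not Mercurial's actual code: `DirstateEntry`, its fields, and `is_modified` are hypothetical names, and keeping the committed bytes inside the entry is a stand-in for however the stored contents would really be fetched.

```python
import os
from dataclasses import dataclass


@dataclass
class DirstateEntry:
    """Hypothetical per-file record; not Mercurial's real dirstate format."""
    mtime_ns: int     # modification time recorded when the file was last written
    size: int         # file size recorded at that time
    committed: bytes  # stand-in for the stored contents, used only on the slow path


def is_modified(path: str, entry: DirstateEntry) -> bool:
    """Apply the checks in the order the text gives them."""
    st = os.stat(path)
    # Step 1: unchanged modification time means the file is treated as unmodified.
    if st.st_mtime_ns == entry.mtime_ns:
        return False
    # Step 2: a different size means the file must have been modified.
    if st.st_size != entry.size:
        return True
    # Step 3: time changed but size did not, so compare the actual contents.
    with open(path, "rb") as f:
        return f.read() != entry.committed
```

Only the third branch reads file data, which is where the savings described in the text would come from: most files are settled by a single `stat` call.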