hgbook

diff en/concepts.tex @ 115:b74102b56df5

Wow! Lots more work detailing the working directory, merging, etc.
author Bryan O'Sullivan <bos@serpentine.com>
date Mon Nov 13 16:19:48 2006 -0800 (2006-11-13)
parents a0f57b3e677e
children ca99f247899e
     1.1 --- a/en/concepts.tex	Mon Nov 13 14:32:16 2006 -0800
     1.2 +++ b/en/concepts.tex	Mon Nov 13 16:19:48 2006 -0800
     1.3 @@ -204,6 +204,35 @@
     1.4  after the corrupted section.  This would not be possible with a
     1.5  delta-only storage model.
     1.6  
     1.7 +\section{Revision history, branching,
     1.8 +  and merging}
     1.9 +
    1.10 +Every entry in a Mercurial revlog knows the identity of its immediate
    1.11 +ancestor revision, usually referred to as its \emph{parent}.  In fact,
    1.12 +a revision contains room for not one parent, but two.  Mercurial uses
    1.13 +a special hash, called the ``null ID'', to represent the idea ``there
    1.14 +is no parent here''.  This hash is simply a string of zeroes.
    1.15 +
    1.16 +In figure~\ref{fig:concepts:revlog}, you can see an example of the
    1.17 +conceptual structure of a revlog.  Filelogs, manifests, and changelogs
    1.18 +all have this same structure; they differ only in the kind of data
    1.19 +stored in each delta or snapshot.
    1.20 +
    1.21 +The first revision in a revlog (at the bottom of the image) has the
    1.22 +null ID in both of its parent slots.  For a ``normal'' revision, its
    1.23 +first parent slot contains the ID of its parent revision, and its
    1.24 +second contains the null ID, indicating that the revision has only one
    1.25 +real parent.  Any two revisions that have the same parent ID are
    1.26 +branches.  A revision that represents a merge between branches has two
    1.27 +normal revision IDs in its parent slots.
    1.28 +
    1.29 +\begin{figure}[ht]
    1.30 +  \centering
    1.31 +  \grafix{revlog}
    1.32 +  \caption{}
    1.33 +  \label{fig:concepts:revlog}
    1.34 +\end{figure}
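
The parent-slot rules described above can be sketched in a few lines of Python. This is an illustration only, not Mercurial's own code: the names `NULL_ID`, `real_parents`, `classify` and `branch_points` are hypothetical, the one-letter revision IDs are made up, and the 40-digit length of the null ID is an assumption (the text says only that it is a string of zeroes).

```python
# Assumption: the text says only "a string of zeroes"; 40 hex digits is illustrative.
NULL_ID = "0" * 40


def real_parents(p1, p2):
    """Return the parent slots that hold something other than the null ID."""
    return [p for p in (p1, p2) if p != NULL_ID]


def classify(p1, p2):
    """Name a revision by how many real parents it has (0, 1 or 2)."""
    kinds = ("first revision", "normal revision", "merge")
    return kinds[len(real_parents(p1, p2))]


def branch_points(revlog):
    """revlog maps revision ID -> (p1, p2).

    Return each parent that has more than one child, with its children;
    those children are the branches described in the text.
    """
    children = {}
    for rev, (p1, p2) in revlog.items():
        for parent in real_parents(p1, p2):
            children.setdefault(parent, []).append(rev)
    return {p: sorted(kids) for p, kids in children.items() if len(kids) > 1}


# Hypothetical history: "b" and "c" both descend from "a"; "d" merges them.
revlog = {
    "a": (NULL_ID, NULL_ID),
    "b": ("a", NULL_ID),
    "c": ("a", NULL_ID),
    "d": ("b", "c"),
}
```

With this toy history, `classify(*revlog["d"])` reports a merge, and `branch_points(revlog)` names `"a"` as the revision whose children `"b"` and `"c"` are branches.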
    1.35 +
    1.36  \section{The working directory}
    1.37  
    1.38  In the working directory, Mercurial stores a snapshot of the files
    1.39 @@ -266,58 +295,117 @@
    1.40  
    1.41  After a commit, Mercurial will update the parents of the working
    1.42  directory, so that the first parent is the ID of the new changeset,
    1.43 -and the second is the null ID.  This is illustrated in
    1.44 -figure~\ref{fig:concepts:wdir-after-commit}.
    1.45 -
    1.46 -\subsection{Other contents of the dirstate}
    1.47 -
    1.48 -Because Mercurial doesn't force you to tell it when you're modifying a
    1.49 -file, it uses the dirstate to store some extra information so it can
    1.50 -determine efficiently whether you have modified a file.  For each file
    1.51 -in the working directory, it stores the time that it last modified the
    1.52 -file itself, and the size of the file at that time.  
    1.53 -
    1.54 -When you explicitly \hgcmd{add}, \hgcmd{remove}, \hgcmd{rename} or
    1.55 -\hgcmd{copy} files, the dirstate is updated each time.
    1.56 -
    1.57 -When Mercurial is checking the states of files in the working
    1.58 -directory, it first checks a file's modification time.  If that has
    1.59 -not changed, the file must not have been modified.  If the file's size
    1.60 -has changed, the file must have been modified.  If the modification
    1.61 -time has changed, but the size has not, only then does Mercurial need
    1.62 -to read the actual contents of the file to see if they've changed.
    1.63 -Storing these few extra pieces of information dramatically reduces the
    1.64 -amount of data that Mercurial needs to read, which yields large
    1.65 -performance improvements compared to other revision control systems.
    1.66 -
    1.67 -\section{Revision history, branching,
    1.68 -  and merging}
    1.69 -
    1.70 -Every entry in a Mercurial revlog knows the identity of its immediate
    1.71 -ancestor revision, usually referred to as its \emph{parent}.  In fact,
    1.72 -a revision contains room for not one parent, but two.  Mercurial uses
    1.73 -a special hash, called the ``null ID'', to represent the idea ``there
    1.74 -is no parent here''.  This hash is simply a string of zeroes.
    1.75 -
    1.76 -In figure~\ref{fig:concepts:revlog}, you can see an example of the
    1.77 -conceptual structure of a revlog.  Filelogs, manifests, and changelogs
    1.78 -all have this same structure; they differ only in the kind of data
    1.79 -stored in each delta or snapshot.
    1.80 -
    1.81 -The first revision in a revlog (at the bottom of the image) has the
    1.82 -null ID in both of its parent slots.  For a ``normal'' revision, its
    1.83 -first parent slot contains the ID of its parent revision, and its
    1.84 -second contains the null ID, indicating that the revision has only one
    1.85 -real parent.  Any two revisions that have the same parent ID are
    1.86 -branches.  A revision that represents a merge between branches has two
    1.87 -normal revision IDs in its parent slots.
    1.88 -
    1.89 -\begin{figure}[ht]
    1.90 -  \centering
    1.91 -  \grafix{revlog}
    1.92 -  \caption{}
    1.93 -  \label{fig:concepts:revlog}
    1.94 -\end{figure}
    1.95 +and the second is the null ID.  This is shown in
    1.96 +figure~\ref{fig:concepts:wdir-after-commit}.  Mercurial doesn't touch
    1.97 +any of the files in the working directory when you commit; it just
    1.98 +modifies the dirstate to note its new parents.
    1.99 +
   1.100 +\subsection{Creating a new head}
   1.101 +
   1.102 +It's perfectly normal to update the working directory to a changeset
   1.103 +other than the current tip.  For example, you might want to know what
   1.104 +your project looked like last Tuesday, or you could be looking through
   1.105 +changesets to see which one introduced a bug.  In cases like this, the
   1.106 +natural thing to do is update the working directory to the changeset
   1.107 +you're interested in, and then examine the files in the working
   1.108 +directory directly to see their contents as they were when you
   1.109 +committed that changeset.  The effect of this is shown in
   1.110 +figure~\ref{fig:concepts:wdir-pre-branch}.
   1.111 +
   1.112 +\begin{figure}[ht]
   1.113 +  \centering
   1.114 +  \grafix{wdir-pre-branch}
   1.115 +  \caption{The working directory, updated to an older changeset}
   1.116 +  \label{fig:concepts:wdir-pre-branch}
   1.117 +\end{figure}
   1.118 +
   1.119 +Having updated the working directory to an older changeset, what
   1.120 +happens if you make some changes, and then commit?  Mercurial behaves
   1.121 +in the same way as I outlined above.  The parents of the working
   1.122 +directory become the parents of the new changeset.  This new changeset
   1.123 +has no children, so it becomes the new tip.  And the repository now
   1.124 +contains two changesets that have no children; we call these
   1.125 +\emph{heads}.  You can see the structure that this creates in
   1.126 +figure~\ref{fig:concepts:wdir-branch}.
   1.127 +
   1.128 +\begin{figure}[ht]
   1.129 +  \centering
   1.130 +  \grafix{wdir-branch}
   1.131 +  \caption{After a commit made while synced to an older changeset}
   1.132 +  \label{fig:concepts:wdir-branch}
   1.133 +\end{figure}
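
As a rough sketch of the definition of a head given above (a changeset with no children), the following Python finds the heads of a small, made-up history. The function name `heads`, the dictionary layout and the one-letter changeset IDs are all hypothetical; Mercurial's real implementation is not shown in the text.

```python
# Assumption: a 40-digit string of zeroes stands in for the null ID.
NULL_ID = "0" * 40


def heads(parents):
    """parents maps changeset ID -> (p1, p2); return the IDs with no children."""
    has_child = set()
    for p1, p2 in parents.values():
        has_child.update(p for p in (p1, p2) if p != NULL_ID)
    return sorted(cs for cs in parents if cs not in has_child)


# Linear history a <- b <- c, where "c" is the only head.
history = {
    "a": (NULL_ID, NULL_ID),
    "b": ("a", NULL_ID),
    "c": ("b", NULL_ID),
}
before = heads(history)

# Update the working directory to "b", change something, and commit "d".
# The new changeset takes the working directory's parents as its own.
history["d"] = ("b", NULL_ID)
after = heads(history)
```

Here `before` holds only `"c"`, while `after` holds both `"c"` and `"d"`: committing on top of an older changeset has added a second head.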
   1.134 +
   1.135 +\begin{note}
   1.136 +  If you're new to Mercurial, you should keep in mind a common
   1.137 +  ``error'', which is to use the \hgcmd{pull} command without any
   1.138 +  options.  By default, the \hgcmd{pull} command \emph{does not}
   1.139 +  update the working directory, so you'll bring new changesets into
   1.140 +  your repository, but the working directory will stay synced at the
   1.141 +  same changeset as before the pull.  If you make some changes and
   1.142 +  commit afterwards, you'll thus create a new head, because your
   1.143 +  working directory isn't synced to whatever the current tip is.
   1.144 +
   1.145 +  I put the word ``error'' in quotes because all that you need to do
   1.146 +  to rectify this situation is \hgcmd{merge}, then \hgcmd{commit}.  In
   1.147 +  other words, this almost never has negative consequences; it just
   1.148 +  surprises people.  I'll discuss other ways to avoid this behaviour,
   1.149 +  and why Mercurial behaves in this initially surprising way, later
   1.150 +  on.
   1.151 +\end{note}
   1.152 +
   1.153 +\subsection{Merging heads}
   1.154 +
   1.155 +When you run the \hgcmd{merge} command, Mercurial leaves the first
   1.156 +parent of the working directory unchanged, and sets the second parent
   1.157 +to the changeset you're merging with, as shown in
   1.158 +figure~\ref{fig:concepts:wdir-merge}.
   1.159 +
   1.160 +\begin{figure}[ht]
   1.161 +  \centering
   1.162 +  \grafix{wdir-merge}
   1.163 +  \caption{Merging two heads}
   1.164 +  \label{fig:concepts:wdir-merge}
   1.165 +\end{figure}
   1.166 +
   1.167 +Mercurial also has to modify the working directory, to merge the files
   1.168 +managed in the two changesets.  Simplified a little, the merging
   1.169 +process goes like this, for every file in the manifests of both
   1.170 +changesets:
   1.171 +\begin{itemize}
   1.172 +\item If neither changeset has modified a file, do nothing with that
   1.173 +  file.
   1.174 +\item If one changeset has modified a file, and the other hasn't,
   1.175 +  create the modified copy of the file in the working directory.
   1.176 +\item If one changeset has removed a file, and the other hasn't touched it
   1.177 +  (or has also removed it), delete the file from the working directory.
   1.178 +\item If one changeset has removed a file, but the other has modified
   1.179 +  the file, ask the user what to do: keep the modified file, or remove
   1.180 +  it?
   1.181 +\item If both changesets have modified a file, invoke an external
   1.182 +  merge program to choose the new contents for the merged file.  This
   1.183 +  may require input from the user.
   1.184 +\item If one changeset has modified a file, and the other has renamed
   1.185 +  or copied the file, make sure that the changes follow the new name
   1.186 +  of the file.
   1.187 +\end{itemize}
   1.188 +There are more details---merging has plenty of corner cases---but
   1.189 +these are the most common choices that are involved in a merge.  As
   1.190 +you can see, most cases are completely automatic, and indeed most
   1.191 +merges finish automatically, without requiring your input to resolve
   1.192 +any conflicts.
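
The per-file rules listed above can be written down as a small lookup table. This is a simplified sketch of the list in the text, not Mercurial's merge code: the state names, the `RULES` table and the `merge_action` function are hypothetical, and `"renamed"` stands for "renamed or copied".

```python
# Each key is a sorted pair saying what the two changesets did to one file.
# The values paraphrase the actions listed in the text.
RULES = {
    ("unchanged", "unchanged"): "do nothing",
    ("modified", "unchanged"): "take the modified copy",
    ("removed", "unchanged"): "delete the file",
    ("removed", "removed"): "delete the file",
    ("modified", "removed"): "ask the user: keep or remove",
    ("modified", "modified"): "run the external merge program",
    ("modified", "renamed"): "apply the changes under the new name",
}


def merge_action(first, second):
    """Look up the action for one file, whichever side made which change."""
    key = tuple(sorted((first, second)))
    # Combinations the simplified list does not mention fall through here.
    return RULES.get(key, "not covered by the simplified rules")
```

Sorting the pair makes the lookup symmetric, so `merge_action("unchanged", "modified")` and `merge_action("modified", "unchanged")` give the same answer. Only two entries in the table need a person or an external tool, which matches the observation that most merges finish automatically.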
   1.193 +
   1.194 +When you're thinking about what happens when you commit after a merge,
   1.195 +once again the working directory is ``the changeset I'm about to
   1.196 +commit''.  After the \hgcmd{merge} command completes, the working
   1.197 +directory has two parents; these will become the parents of the new
   1.198 +changeset.
   1.199 +
   1.200 +Mercurial lets you perform multiple merges, but you must commit the
   1.201 +results of each individual merge as you go.  This is necessary because
   1.202 +Mercurial only tracks two parents for both revisions and the working
   1.203 +directory.  While it would be technically possible to merge multiple
   1.204 +changesets at once, the prospect of user confusion and making a
   1.205 +terrible mess of a merge immediately becomes overwhelming.
   1.206  
   1.207  \section{Other interesting design features}
   1.208  
   1.209 @@ -460,6 +548,27 @@
   1.210  performance and increase the complexity of the software, each of which
   1.211  is much more important to the ``feel'' of day-to-day use.
   1.212  
   1.213 +\subsection{Other contents of the dirstate}
   1.214 +
   1.215 +Because Mercurial doesn't force you to tell it when you're modifying a
   1.216 +file, it uses the dirstate to store some extra information so it can
   1.217 +determine efficiently whether you have modified a file.  For each file
   1.218 +in the working directory, it stores the time that it last modified the
   1.219 +file itself, and the size of the file at that time.  
   1.220 +
   1.221 +When you explicitly \hgcmd{add}, \hgcmd{remove}, \hgcmd{rename} or
   1.222 +\hgcmd{copy} files, the dirstate is updated each time.
   1.223 +
   1.224 +When Mercurial is checking the states of files in the working
   1.225 +directory, it first checks a file's modification time.  If that has
   1.226 +not changed, the file must not have been modified.  If the file's size
   1.227 +has changed, the file must have been modified.  If the modification
   1.228 +time has changed, but the size has not, only then does Mercurial need
   1.229 +to read the actual contents of the file to see if they've changed.
   1.230 +Storing these few extra pieces of information dramatically reduces the
   1.231 +amount of data that Mercurial needs to read, which yields large
   1.232 +performance improvements compared to other revision control systems.
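
The order of checks described above (modification time first, then size, then contents only as a last resort) might be sketched as follows. This follows the prose, not Mercurial's actual dirstate code: `is_modified`, its parameters and the `demo` helper are hypothetical, and the committed contents are passed in as bytes purely to keep the example self-contained.

```python
import os
import tempfile


def is_modified(path, recorded_mtime_ns, recorded_size, committed_data):
    """Decide whether a file changed, reading its contents only if needed."""
    st = os.stat(path)
    if st.st_mtime_ns == recorded_mtime_ns:
        return False  # same modification time: treated as unmodified
    if st.st_size != recorded_size:
        return True  # size differs: must have been modified
    # Time changed but size did not: only now compare the actual contents.
    with open(path, "rb") as f:
        return f.read() != committed_data


def demo():
    """Exercise each branch of is_modified on a temporary file."""
    results = []
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "hello.txt")
        data = b"hello\n"
        with open(path, "wb") as f:
            f.write(data)
        st = os.stat(path)
        recorded = (st.st_mtime_ns, st.st_size, data)
        later = st.st_mtime_ns + 10 * 10**9  # ten seconds after the recorded time

        results.append(is_modified(path, *recorded))  # untouched file

        os.utime(path, ns=(later, later))  # new time, same contents
        results.append(is_modified(path, *recorded))

        with open(path, "wb") as f:  # same size, different contents
            f.write(b"HELLO\n")
        os.utime(path, ns=(later, later))
        results.append(is_modified(path, *recorded))

        with open(path, "wb") as f:  # different size
            f.write(b"hello, world\n")
        os.utime(path, ns=(later, later))
        results.append(is_modified(path, *recorded))
    return results
```

Calling `demo()` returns `[False, False, True, True]`. Only the second and third cases open the file at all, which is the saving the text describes.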
   1.233 +
   1.234  %%% Local Variables: 
   1.235  %%% mode: latex
   1.236  %%% TeX-master: "00book"