hgbook

diff fr/ch04-concepts.xml @ 973:1df99de46e39

Finishing works to adapt already existing translations to new xdoc fmt - also add a couple new translations to follow recent modification from Bryan.
author Romain PELISSE <belaran@gmail.com>
date Tue Sep 01 17:00:12 2009 +0200 (2009-09-01)
parents
children e6894aa7baf2
line diff
     1.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
     1.2 +++ b/fr/ch04-concepts.xml	Tue Sep 01 17:00:12 2009 +0200
     1.3 @@ -0,0 +1,710 @@
     1.4 +<!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : -->
     1.5 +
     1.6 +<chapter id="chap:concepts">
     1.7 +<title>Behind the scenes</title>
     1.9 +
    1.10 +<para>Unlike many revision control systems, the concepts upon which
    1.11 +Mercurial is built are simple enough that it's easy to understand how
    1.12 +the software really works.  Knowing this certainly isn't necessary,
    1.13 +but I find it useful to have a <quote>mental model</quote> of what's going on.</para>
    1.14 +
    1.15 +<para>This understanding gives me confidence that Mercurial has been
    1.16 +carefully designed to be both <emphasis>safe</emphasis> and <emphasis>efficient</emphasis>.  And
    1.17 +just as importantly, if it's easy for me to retain a good idea of what
    1.18 +the software is doing when I perform a revision control task, I'm less
    1.19 +likely to be surprised by its behaviour.</para>
    1.20 +
    1.21 +<para>In this chapter, we'll initially cover the core concepts behind
    1.22 +Mercurial's design, then continue to discuss some of the interesting
    1.23 +details of its implementation.</para>
    1.24 +
    1.25 +<sect1>
    1.26 +<title>Mercurial's historical record</title>
    1.27 +
    1.28 +<sect2>
    1.29 +<title>Tracking the history of a single file</title>
    1.30 +
    1.31 +<para>When Mercurial tracks modifications to a file, it stores the history
    1.32 +of that file in a metadata object called a <emphasis>filelog</emphasis>.  Each entry
    1.33 +in the filelog contains enough information to reconstruct one revision
    1.34 +of the file that is being tracked.  Filelogs are stored as files in
    1.35 +the <filename role="special" class="directory">.hg/store/data</filename> directory.  A filelog contains two kinds
    1.36 +of information: revision data, and an index to help Mercurial to find
    1.37 +a revision efficiently.</para>
    1.38 +
    1.39 +<para>A file that is large, or has a lot of history, has its filelog stored
    1.40 +in separate data (<quote><literal>.d</literal></quote> suffix) and index (<quote><literal>.i</literal></quote>
    1.41 +suffix) files.  For small files without much history, the revision
    1.42 +data and index are combined in a single <quote><literal>.i</literal></quote> file.  The
    1.43 +correspondence between a file in the working directory and the filelog
    1.44 +that tracks its history in the repository is illustrated in
    1.45 +figure <xref linkend="fig:concepts:filelog"/>.</para>
    1.46 +
    1.47 +<informalfigure id="fig:concepts:filelog">
    1.48 +
    1.49 +<para>  <mediaobject><imageobject><imagedata fileref="filelog"/></imageobject><textobject><phrase>XXX add text</phrase></textobject></mediaobject>
    1.50 +  <caption><para>Relationships between files in working directory and
    1.51 +    filelogs in repository</para></caption>
    1.52 +</para>
    1.53 +</informalfigure>
    1.54 +
    1.55 +</sect2>
    1.56 +<sect2>
    1.57 +<title>Managing tracked files</title>
    1.58 +
    1.59 +<para>Mercurial uses a structure called a <emphasis>manifest</emphasis> to collect
    1.60 +together information about the files that it tracks.  Each entry in
    1.61 +the manifest contains information about the files present in a single
    1.62 +changeset.  An entry records which files are present in the changeset,
    1.63 +the revision of each file, and a few other pieces of file metadata.</para>
    1.64 +
    1.65 +</sect2>
    1.66 +<sect2>
    1.67 +<title>Recording changeset information</title>
    1.68 +
    1.69 +<para>The <emphasis>changelog</emphasis> contains information about each changeset.  Each
    1.70 +revision records who committed a change, the changeset comment, other
    1.71 +pieces of changeset-related information, and the revision of the
    1.72 +manifest to use.
    1.73 +</para>
    1.74 +
    1.75 +</sect2>
    1.76 +<sect2>
    1.77 +<title>Relationships between revisions</title>
    1.78 +
    1.79 +<para>Within a changelog, a manifest, or a filelog, each revision stores a
    1.80 +pointer to its immediate parent (or to its two parents, if it's a
    1.81 +merge revision).  As I mentioned above, there are also relationships
    1.82 +between revisions <emphasis>across</emphasis> these structures, and they are
    1.83 +hierarchical in nature.
    1.84 +</para>
    1.85 +
    1.86 +<para>For every changeset in a repository, there is exactly one revision
    1.87 +stored in the changelog.  Each revision of the changelog contains a
    1.88 +pointer to a single revision of the manifest.  A revision of the
    1.89 +manifest stores a pointer to a single revision of each filelog tracked
    1.90 +when that changeset was created.  These relationships are illustrated
    1.91 +in figure <xref linkend="fig:concepts:metadata"/>.
    1.92 +</para>
    1.93 +
    1.94 +<informalfigure id="fig:concepts:metadata">
    1.95 +
    1.96 +<para>  <mediaobject><imageobject><imagedata fileref="metadata"/></imageobject><textobject><phrase>XXX add text</phrase></textobject></mediaobject>
    1.97 +  <caption><para>Metadata relationships</para></caption>
    1.99 +</para>
   1.100 +</informalfigure>
   1.101 +
   1.102 +<para>As the illustration shows, there is <emphasis>not</emphasis> a <quote>one to one</quote>
   1.103 +relationship between revisions in the changelog, manifest, or filelog.
   1.104 +If the manifest hasn't changed between two changesets, the changelog
   1.105 +entries for those changesets will point to the same revision of the
   1.106 +manifest.  If a file that Mercurial tracks hasn't changed between two
   1.107 +changesets, the entry for that file in the two revisions of the
   1.108 +manifest will point to the same revision of its filelog.
   1.109 +</para>
   1.110 +
   1.111 +</sect2>
   1.112 +</sect1>
   1.113 +<sect1>
   1.114 +<title>Safe, efficient storage</title>
   1.115 +
   1.116 +<para>The underpinnings of changelogs, manifests, and filelogs are provided
   1.117 +by a single structure called the <emphasis>revlog</emphasis>.
   1.118 +</para>
   1.119 +
   1.120 +<sect2>
   1.121 +<title>Efficient storage</title>
   1.122 +
   1.123 +<para>The revlog provides efficient storage of revisions using a
   1.124 +<emphasis>delta</emphasis> mechanism.  Instead of storing a complete copy of a file
   1.125 +for each revision, it stores the changes needed to transform an older
   1.126 +revision into the new revision.  For many kinds of file data, these
   1.127 +deltas are typically a fraction of a percent of the size of a full
   1.128 +copy of a file.
   1.129 +</para>
   1.130 +
   1.131 +<para>Some obsolete revision control systems can only work with deltas of
   1.132 +text files.  They must either store binary files as complete snapshots
   1.133 +or encoded into a text representation, both of which are wasteful
   1.134 +approaches.  Mercurial can efficiently handle deltas of files with
   1.135 +arbitrary binary contents; it doesn't need to treat text as special.
   1.136 +</para>
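<para>To make the idea of a content-agnostic delta concrete, here is a minimal Python sketch.  It is not Mercurial's actual delta algorithm: the names <literal>make_delta</literal> and <literal>apply_delta</literal> and the edit format are assumptions made for illustration, and Python's standard <literal>difflib</literal> module stands in for the real differ.  The code only ever sees bytes, so text receives no special treatment.</para>

```python
from difflib import SequenceMatcher


def make_delta(old: bytes, new: bytes) -> list:
    """Return (start, end, replacement) edits that turn `old` into `new`."""
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Replace old[i1:i2] with new[j1:j2]; this one form covers
            # insertions, deletions, and replacements alike.
            delta.append((i1, i2, new[j1:j2]))
    return delta


def apply_delta(old: bytes, delta: list) -> bytes:
    """Rebuild the newer revision from the older one plus a delta."""
    out = []
    pos = 0
    for start, end, replacement in delta:
        out.append(old[pos:start])  # unchanged bytes before this edit
        out.append(replacement)     # bytes that replace old[start:end]
        pos = end
    out.append(old[pos:])           # unchanged tail
    return b"".join(out)
```

<para>Only the replacement bytes and their offsets need to be stored, which is why a small change to a large file yields a small delta.</para>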
   1.137 +
   1.138 +</sect2>
   1.139 +<sect2 id="sec:concepts:txn">
   1.140 +<title>Safe operation</title>
   1.143 +
   1.144 +<para>Mercurial only ever <emphasis>appends</emphasis> data to the end of a revlog file.
   1.145 +It never modifies a section of a file after it has written it.  This
   1.146 +is both more robust and efficient than schemes that need to modify or
   1.147 +rewrite data.
   1.148 +</para>
   1.149 +
   1.150 +<para>In addition, Mercurial treats every write as part of a
   1.151 +<emphasis>transaction</emphasis> that can span a number of files.  A transaction is
   1.152 +<emphasis>atomic</emphasis>: either the entire transaction succeeds and its effects
   1.153 +are all visible to readers in one go, or the whole thing is undone.
   1.154 +This guarantee of atomicity means that if you're running two copies of
   1.155 +Mercurial, where one is reading data and one is writing it, the reader
   1.156 +will never see a partially written result that might confuse it.
   1.157 +</para>
   1.158 +
   1.159 +<para>The fact that Mercurial only appends to files makes it easier to
   1.160 +provide this transactional guarantee.  The easier it is to do stuff
   1.161 +like this, the more confident you should be that it's done correctly.
   1.162 +</para>
   1.163 +
   1.164 +</sect2>
   1.165 +<sect2>
   1.166 +<title>Fast retrieval</title>
   1.167 +
   1.168 +<para>Mercurial cleverly avoids a pitfall common to many earlier
   1.169 +revision control systems: the problem of <emphasis>inefficient retrieval</emphasis>.
   1.170 +Most revision control systems store the contents of a revision as an
   1.171 +incremental series of modifications against a <quote>snapshot</quote>.  To
   1.172 +reconstruct a specific revision, you must first read the snapshot, and
   1.173 +then every one of the revisions between the snapshot and your target
   1.174 +revision.  The more history that a file accumulates, the more
   1.175 +revisions you must read, hence the longer it takes to reconstruct a
   1.176 +particular revision.
   1.177 +</para>
   1.178 +
   1.179 +<informalfigure id="fig:concepts:snapshot">
   1.180 +
   1.181 +<para>  <mediaobject><imageobject><imagedata fileref="snapshot"/></imageobject><textobject><phrase>XXX add text</phrase></textobject></mediaobject>
   1.182 +  <caption><para>Snapshot of a revlog, with incremental deltas</para></caption>
   1.184 +</para>
   1.185 +</informalfigure>
   1.186 +
   1.187 +<para>The innovation that Mercurial applies to this problem is simple but
   1.188 +effective.  Once the cumulative amount of delta information stored
   1.189 +since the last snapshot exceeds a fixed threshold, it stores a new
   1.190 +snapshot (compressed, of course), instead of another delta.  This
   1.191 +makes it possible to reconstruct <emphasis>any</emphasis> revision of a file
   1.192 +quickly.  This approach works so well that it has since been copied by
   1.193 +several other revision control systems.
   1.194 +</para>
   1.195 +
   1.196 +<para>Figure <xref linkend="fig:concepts:snapshot"/> illustrates the idea.  In an entry
   1.197 +in a revlog's index file, Mercurial stores the range of entries from
   1.198 +the data file that it must read to reconstruct a particular revision.
   1.199 +</para>
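<para>The snapshot-plus-deltas scheme described above can be modelled in a few lines of Python.  This is a toy, not the real revlog format: the class <literal>ToyRevlog</literal>, its <literal>threshold</literal> parameter, and the simple shared-prefix/shared-suffix delta are all assumptions chosen to keep the sketch short.  What it does share with the description is the rule of storing a fresh snapshot once the accumulated delta data passes a threshold, and an index entry that records where reading must start.</para>

```python
def make_delta(old: bytes, new: bytes) -> tuple:
    """Delta as (prefix_len, suffix_len, middle): keep shared ends, store the rest."""
    limit = min(len(old), len(new))
    p = 0
    while p != limit and old[p] == new[p]:
        p += 1
    s = 0
    while s != limit - p and old[len(old) - 1 - s] == new[len(new) - 1 - s]:
        s += 1
    return (p, s, new[p:len(new) - s])


def apply_delta(old: bytes, delta: tuple) -> bytes:
    p, s, middle = delta
    return old[:p] + middle + old[len(old) - s:]


class ToyRevlog:
    def __init__(self, threshold: int):
        self.threshold = threshold  # max delta bytes allowed since a snapshot
        self.entries = []           # ("snapshot", bytes) or ("delta", tuple)
        self.base = []              # base[rev]: snapshot at which reading starts
        self.chain_bytes = 0        # delta bytes stored since the last snapshot

    def append(self, text: bytes) -> int:
        rev = len(self.entries)
        delta = None
        if rev > 0:
            candidate = make_delta(self.get(rev - 1), text)
            if self.threshold >= self.chain_bytes + len(candidate[2]):
                delta = candidate
        if delta is None:
            # First revision, or the chain grew too long: store a full snapshot.
            self.entries.append(("snapshot", text))
            self.base.append(rev)
            self.chain_bytes = 0
        else:
            self.entries.append(("delta", delta))
            self.base.append(self.base[rev - 1])
            self.chain_bytes += len(delta[2])
        return rev

    def get(self, rev: int) -> bytes:
        # Read the snapshot, then replay only the deltas that follow it.
        start = self.base[rev]
        text = self.entries[start][1]
        for _kind, delta in self.entries[start + 1:rev + 1]:
            text = apply_delta(text, delta)
        return text
```

<para>With a threshold of 20 bytes and revisions that each add 10 bytes, this sketch stores a snapshot at every third revision, so reconstructing any revision replays at most two deltas, however long the history grows.</para>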
   1.200 +
   1.201 +<sect3>
   1.202 +<title>Aside: the influence of video compression</title>
   1.203 +
   1.204 +<para>If you're familiar with video compression or have ever watched a TV
   1.205 +feed through a digital cable or satellite service, you may know that
   1.206 +most video compression schemes store each frame of video as a delta
   1.207 +against its predecessor frame.  In addition, these schemes use
   1.208 +<quote>lossy</quote> compression techniques to increase the compression ratio, so
   1.209 +visual errors accumulate over the course of a number of inter-frame
   1.210 +deltas.
   1.211 +</para>
   1.212 +
   1.213 +<para>Because it's possible for a video stream to <quote>drop out</quote> occasionally
   1.214 +due to signal glitches, and to limit the accumulation of artefacts
   1.215 +introduced by the lossy compression process, video encoders
   1.216 +periodically insert a complete frame (called a <quote>key frame</quote>) into the
   1.217 +video stream; the next delta is generated against that frame.  This
   1.218 +means that if the video signal gets interrupted, it will resume once
   1.219 +the next key frame is received.  Also, the accumulation of encoding
   1.220 +errors restarts anew with each key frame.
   1.221 +</para>
   1.222 +
   1.223 +</sect3>
   1.224 +</sect2>
   1.225 +<sect2>
   1.226 +<title>Identification and strong integrity</title>
   1.227 +
   1.228 +<para>Along with delta or snapshot information, a revlog entry contains a
   1.229 +cryptographic hash of the data that it represents.  This makes it
   1.230 +difficult to forge the contents of a revision, and easy to detect
   1.231 +accidental corruption.
   1.232 +</para>
   1.233 +
   1.234 +<para>Hashes provide more than a mere check against corruption; they are
   1.235 +used as the identifiers for revisions.  The changeset identification
   1.236 +hashes that you see as an end user are from revisions of the
   1.237 +changelog.  Although filelogs and the manifest also use hashes,
   1.238 +Mercurial only uses these behind the scenes.
   1.239 +</para>
   1.240 +
   1.241 +<para>Mercurial verifies that hashes are correct when it retrieves file
   1.242 +revisions and when it pulls changes from another repository.  If it
   1.243 +encounters an integrity problem, it will complain and stop whatever
   1.244 +it's doing.
   1.245 +</para>
   1.246 +
   1.247 +<para>In addition to the effect it has on retrieval efficiency, Mercurial's
   1.248 +use of periodic snapshots makes it more robust against partial data
   1.249 +corruption.  If a revlog becomes partly corrupted due to a hardware
   1.250 +error or system bug, it's often possible to reconstruct some or most
   1.251 +revisions from the uncorrupted sections of the revlog, both before and
   1.252 +after the corrupted section.  This would not be possible with a
   1.253 +delta-only storage model.
   1.254 +</para>
   1.255 +
   1.256 +<bridgehead renderas="sect1">Revision history, branching,
   1.257 +  and merging</bridgehead>
   1.259 +
   1.260 +<para>Every entry in a Mercurial revlog knows the identity of its immediate
   1.261 +ancestor revision, usually referred to as its <emphasis>parent</emphasis>.  In fact,
   1.262 +a revision contains room for not one parent, but two.  Mercurial uses
   1.263 +a special hash, called the <quote>null ID</quote>, to represent the idea <quote>there
   1.264 +is no parent here</quote>.  This hash is simply a string of zeroes.
   1.265 +</para>
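<para>As a hedged illustration of how an identifier can depend on both content and ancestry, the sketch below computes a SHA-1 hash over the two parent IDs (sorted) followed by the revision data.  To the best of my knowledge this matches what classic revlogs do, but the text above only promises <quote>a cryptographic hash of the data</quote>, so treat the exact recipe and the name <literal>node_id</literal> as assumptions.  The null ID is twenty zero bytes, which prints as forty hexadecimal zeroes.</para>

```python
import hashlib

NULL_ID = b"\0" * 20  # "there is no parent here"


def node_id(text: bytes, p1: bytes = NULL_ID, p2: bytes = NULL_ID) -> bytes:
    """SHA-1 over the sorted parent IDs followed by the revision data."""
    first, second = sorted([p1, p2])
    digest = hashlib.sha1(first)
    digest.update(second)
    digest.update(text)
    return digest.digest()


# A root revision has the null ID in both parent slots.
root = node_id(b"first revision\n")
# A normal revision has one real parent and the null ID in the second slot.
child = node_id(b"second revision\n", root)
print(root.hex(), child.hex())
```

<para>Because the parents are mixed into the hash, two revisions with identical contents but different ancestry receive different identifiers.</para>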
   1.266 +
   1.267 +<para>In figure <xref linkend="fig:concepts:revlog"/>, you can see an example of the
   1.268 +conceptual structure of a revlog.  Filelogs, manifests, and changelogs
   1.269 +all have this same structure; they differ only in the kind of data
   1.270 +stored in each delta or snapshot.
   1.271 +</para>
   1.272 +
   1.273 +<para>The first revision in a revlog (at the bottom of the image) has the
   1.274 +null ID in both of its parent slots.  For a <quote>normal</quote> revision, its
   1.275 +first parent slot contains the ID of its parent revision, and its
   1.276 +second contains the null ID, indicating that the revision has only one
   1.277 +real parent.  Any two revisions that have the same parent ID are
   1.278 +branches.  A revision that represents a merge between branches has two
   1.279 +normal revision IDs in its parent slots.
   1.280 +</para>
   1.281 +
   1.282 +<informalfigure id="fig:concepts:revlog">
   1.283 +
   1.284 +<para>  <mediaobject><imageobject><imagedata fileref="revlog"/></imageobject><textobject><phrase>XXX add text</phrase></textobject></mediaobject>
   1.287 +</para>
   1.288 +</informalfigure>
   1.289 +
   1.290 +</sect2>
   1.291 +</sect1>
   1.292 +<sect1>
   1.293 +<title>The working directory</title>
   1.294 +
   1.295 +<para>In the working directory, Mercurial stores a snapshot of the files
   1.296 +from the repository as of a particular changeset.
   1.297 +</para>
   1.298 +
   1.299 +<para>The working directory <quote>knows</quote> which changeset it contains.  When you
   1.300 +update the working directory to contain a particular changeset,
   1.301 +Mercurial looks up the appropriate revision of the manifest to find
   1.302 +out which files it was tracking at the time that changeset was
   1.303 +committed, and which revision of each file was then current.  It then
   1.304 +recreates a copy of each of those files, with the same contents it had
   1.305 +when the changeset was committed.
   1.306 +</para>
   1.307 +
   1.308 +<para>The <emphasis>dirstate</emphasis> contains Mercurial's knowledge of the working
   1.309 +directory.  This details which changeset the working directory is
   1.310 +updated to, and all of the files that Mercurial is tracking in the
   1.311 +working directory.
   1.312 +</para>
   1.313 +
   1.314 +<para>Just as a revision of a revlog has room for two parents, so that it
   1.315 +can represent either a normal revision (with one parent) or a merge of
   1.316 +two earlier revisions, the dirstate has slots for two parents.  When
   1.317 +you use the <command role="hg-cmd">hg update</command> command, the changeset that you update to
   1.318 +is stored in the <quote>first parent</quote> slot, and the null ID in the second.
   1.319 +When you <command role="hg-cmd">hg merge</command> with another changeset, the first parent
   1.320 +remains unchanged, and the second parent is filled in with the
   1.321 +changeset you're merging with.  The <command role="hg-cmd">hg parents</command> command tells you
   1.322 +what the parents of the dirstate are.
   1.323 +</para>
   1.324 +
   1.325 +<sect2>
   1.326 +<title>What happens when you commit</title>
   1.327 +
   1.328 +<para>The dirstate stores parent information for more than just book-keeping
   1.329 +purposes.  Mercurial uses the parents of the dirstate as <emphasis>the
   1.330 +  parents of a new changeset</emphasis> when you perform a commit.
   1.331 +</para>
   1.332 +
   1.333 +<informalfigure id="fig:concepts:wdir">
   1.334 +
   1.335 +<para>  <mediaobject><imageobject><imagedata fileref="wdir"/></imageobject><textobject><phrase>XXX add text</phrase></textobject></mediaobject>
   1.336 +  <caption><para>The working directory can have two parents</para></caption>
   1.338 +</para>
   1.339 +</informalfigure>
   1.340 +
   1.341 +<para>Figure <xref linkend="fig:concepts:wdir"/> shows the normal state of the working
   1.342 +directory, where it has a single changeset as parent.  That changeset
   1.343 +is the <emphasis>tip</emphasis>, the newest changeset in the repository that has no
   1.344 +children.
   1.345 +</para>
   1.346 +
   1.347 +<informalfigure id="fig:concepts:wdir-after-commit">
   1.348 +
   1.349 +<para>  <mediaobject><imageobject><imagedata fileref="wdir-after-commit"/></imageobject><textobject><phrase>XXX add text</phrase></textobject></mediaobject>
   1.350 +  <caption><para>The working directory gains new parents after a commit</para></caption>
   1.352 +</para>
   1.353 +</informalfigure>
   1.354 +
   1.355 +<para>It's useful to think of the working directory as <quote>the changeset I'm
   1.356 +about to commit</quote>.  Any files that you tell Mercurial that you've
   1.357 +added, removed, renamed, or copied will be reflected in that
   1.358 +changeset, as will modifications to any files that Mercurial is
   1.359 +already tracking; the new changeset will have the parents of the
   1.360 +working directory as its parents.
   1.361 +</para>
   1.362 +
   1.363 +<para>After a commit, Mercurial will update the parents of the working
   1.364 +directory, so that the first parent is the ID of the new changeset,
   1.365 +and the second is the null ID.  This is shown in
   1.366 +figure <xref linkend="fig:concepts:wdir-after-commit"/>.  Mercurial doesn't touch
   1.367 +any of the files in the working directory when you commit; it just
   1.368 +modifies the dirstate to note its new parents.
   1.369 +</para>
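<para>The way the two parent slots change under update, merge, and commit can be summarised in a small Python model.  The class <literal>ToyDirstate</literal> and its method names are hypothetical; the real dirstate also tracks per-file state, which this sketch leaves out.</para>

```python
NULL_ID = "0" * 40  # hexadecimal form of the null ID


class ToyDirstate:
    """Tracks only the two parent slots of the working directory."""

    def __init__(self):
        self.parents = (NULL_ID, NULL_ID)

    def update(self, changeset: str) -> None:
        # "hg update": first parent is the target, second is the null ID.
        self.parents = (changeset, NULL_ID)

    def merge(self, other: str) -> None:
        # "hg merge": first parent unchanged, second is the merged changeset.
        self.parents = (self.parents[0], other)

    def commit(self, new_id: str) -> tuple:
        # The new changeset records the current dirstate parents as its own;
        # afterwards the working directory sits on the new changeset.
        recorded = self.parents
        self.parents = (new_id, NULL_ID)
        return recorded
```

<para>Note that <literal>commit</literal> in this model only rewrites the parent slots, mirroring the point that a commit leaves the files in the working directory alone.</para>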
   1.370 +
   1.371 +</sect2>
   1.372 +<sect2>
   1.373 +<title>Creating a new head</title>
   1.374 +
   1.375 +<para>It's perfectly normal to update the working directory to a changeset
   1.376 +other than the current tip.  For example, you might want to know what
   1.377 +your project looked like last Tuesday, or you could be looking through
   1.378 +changesets to see which one introduced a bug.  In cases like this, the
   1.379 +natural thing to do is update the working directory to the changeset
   1.380 +you're interested in, and then examine the files in the working
   1.381 +directory directly to see their contents as they were when you
   1.382 +committed that changeset.  The effect of this is shown in
   1.383 +figure <xref linkend="fig:concepts:wdir-pre-branch"/>.
   1.384 +</para>
   1.385 +
   1.386 +<informalfigure id="fig:concepts:wdir-pre-branch">
   1.387 +
   1.388 +<para>  <mediaobject><imageobject><imagedata fileref="wdir-pre-branch"/></imageobject><textobject><phrase>XXX add text</phrase></textobject></mediaobject>
   1.389 +  <caption><para>The working directory, updated to an older changeset</para></caption>
   1.391 +</para>
   1.392 +</informalfigure>
   1.393 +
   1.394 +<para>Having updated the working directory to an older changeset, what
   1.395 +happens if you make some changes, and then commit?  Mercurial behaves
   1.396 +in the same way as I outlined above.  The parents of the working
   1.397 +directory become the parents of the new changeset.  This new changeset
   1.398 +has no children, so it becomes the new tip.  And the repository now
   1.399 +contains two changesets that have no children; we call these
   1.400 +<emphasis>heads</emphasis>.  You can see the structure that this creates in
   1.401 +figure <xref linkend="fig:concepts:wdir-branch"/>.
   1.402 +</para>
   1.403 +
   1.404 +<informalfigure id="fig:concepts:wdir-branch">
   1.405 +
   1.406 +<para>  <mediaobject><imageobject><imagedata fileref="wdir-branch"/></imageobject><textobject><phrase>XXX add text</phrase></textobject></mediaobject>
   1.407 +  <caption><para>After a commit made while synced to an older changeset</para></caption>
   1.409 +</para>
   1.410 +</informalfigure>
   1.411 +
   1.412 +<note>
   1.413 +<para>  If you're new to Mercurial, you should keep in mind a common
   1.414 +  <quote>error</quote>, which is to use the <command role="hg-cmd">hg pull</command> command without any
   1.415 +  options.  By default, the <command role="hg-cmd">hg pull</command> command <emphasis>does not</emphasis>
   1.416 +  update the working directory, so you'll bring new changesets into
   1.417 +  your repository, but the working directory will stay synced at the
   1.418 +  same changeset as before the pull.  If you make some changes and
   1.419 +  commit afterwards, you'll thus create a new head, because your
   1.420 +  working directory isn't synced to whatever the current tip is.
   1.421 +</para>
   1.422 +
   1.423 +<para>  I put the word <quote>error</quote> in quotes because all that you need to do
   1.424 +  to rectify this situation is <command role="hg-cmd">hg merge</command>, then <command role="hg-cmd">hg commit</command>.  In
   1.425 +  other words, this almost never has negative consequences; it just
   1.426 +  surprises people.  I'll discuss other ways to avoid this behaviour,
   1.427 +  and why Mercurial behaves in this initially surprising way, later
   1.428 +  on.
   1.429 +</para>
   1.430 +</note>
   1.431 +
   1.432 +</sect2>
   1.433 +<sect2>
   1.434 +<title>Merging heads</title>
   1.435 +
   1.436 +<para>When you run the <command role="hg-cmd">hg merge</command> command, Mercurial leaves the first
   1.437 +parent of the working directory unchanged, and sets the second parent
   1.438 +to the changeset you're merging with, as shown in
   1.439 +figure <xref linkend="fig:concepts:wdir-merge"/>.
   1.440 +</para>
   1.441 +
   1.442 +<informalfigure id="fig:concepts:wdir-merge">
   1.443 +
   1.444 +<para>  <mediaobject><imageobject><imagedata fileref="wdir-merge"/></imageobject><textobject><phrase>XXX add text</phrase></textobject></mediaobject>
   1.445 +  <caption><para>Merging two heads</para></caption>
   1.447 +</para>
   1.448 +</informalfigure>
   1.449 +
   1.450 +<para>Mercurial also has to modify the working directory, to merge the files
   1.451 +managed in the two changesets.  Simplified a little, the merging
   1.452 +process goes like this, for every file in the manifests of both
   1.453 +changesets.
   1.454 +</para>
   1.455 +<itemizedlist>
   1.456 +<listitem><para>If neither changeset has modified a file, do nothing with that
   1.457 +  file.
   1.458 +</para>
   1.459 +</listitem>
   1.460 +<listitem><para>If one changeset has modified a file, and the other hasn't,
   1.461 +  create the modified copy of the file in the working directory.
   1.462 +</para>
   1.463 +</listitem>
   1.464 +<listitem><para>If one changeset has removed a file, and the other hasn't (or
   1.465 +  has also deleted it), delete the file from the working directory.
   1.466 +</para>
   1.467 +</listitem>
   1.468 +<listitem><para>If one changeset has removed a file, but the other has modified
   1.469 +  the file, ask the user what to do: keep the modified file, or remove
   1.470 +  it?
   1.471 +</para>
   1.472 +</listitem>
   1.473 +<listitem><para>If both changesets have modified a file, invoke an external
   1.474 +  merge program to choose the new contents for the merged file.  This
   1.475 +  may require input from the user.
   1.476 +</para>
   1.477 +</listitem>
   1.478 +<listitem><para>If one changeset has modified a file, and the other has renamed
   1.479 +  or copied the file, make sure that the changes follow the new name
   1.480 +  of the file.
   1.481 +</para>
   1.482 +</listitem></itemizedlist>
   1.483 +<para>There are more details (merging has plenty of corner cases), but
   1.484 +these are the most common choices that are involved in a merge.  As
   1.485 +you can see, most cases are completely automatic, and indeed most
   1.486 +merges finish automatically, without requiring your input to resolve
   1.487 +any conflicts.
   1.488 +</para>
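<para>The per-file decisions listed above can be written down as a small decision function.  This is a sketch under stated assumptions, not Mercurial's merge code: the function name <literal>merge_action</literal>, the four state labels, and the returned action strings are all invented for illustration, and any combination outside the list is reported as an uncovered corner case.</para>

```python
def merge_action(local: str, other: str) -> str:
    """Choose an action for one file.

    Each argument says how one changeset changed the file relative to the
    common ancestor: "unchanged", "modified", "removed", or "renamed".
    """
    sides = {local, other}
    if sides == {"unchanged"}:
        return "do nothing"
    if sides == {"modified", "unchanged"}:
        return "take the modified copy"
    if sides in ({"removed"}, {"removed", "unchanged"}):
        return "delete from working directory"
    if sides == {"removed", "modified"}:
        return "ask the user: keep or remove"
    if sides == {"modified"}:
        return "run the merge program"
    if sides == {"modified", "renamed"}:
        return "apply the changes under the new name"
    raise ValueError("corner case not covered by this sketch: %r" % sorted(sides))
```

<para>Only two of the six branches can involve the user, which is one way to see why most merges complete without any prompting.</para>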
   1.489 +
   1.490 +<para>When you're thinking about what happens when you commit after a merge,
   1.491 +once again the working directory is <quote>the changeset I'm about to
   1.492 +commit</quote>.  After the <command role="hg-cmd">hg merge</command> command completes, the working
   1.493 +directory has two parents; these will become the parents of the new
   1.494 +changeset.
   1.495 +</para>
   1.496 +
   1.497 +<para>Mercurial lets you perform multiple merges, but you must commit the
   1.498 +results of each individual merge as you go.  This is necessary because
   1.499 +Mercurial only tracks two parents for both revisions and the working
   1.500 +directory.  While it would be technically possible to merge multiple
   1.501 +changesets at once, the prospect of user confusion and making a
   1.502 +terrible mess of a merge immediately becomes overwhelming.
   1.503 +</para>
   1.504 +
   1.505 +</sect2>
   1.506 +</sect1>
   1.507 +<sect1>
   1.508 +<title>Other interesting design features</title>
   1.509 +
   1.510 +<para>In the sections above, I've tried to highlight some of the most
   1.511 +important aspects of Mercurial's design, to illustrate that it pays
   1.512 +careful attention to reliability and performance.  However, the
   1.513 +attention to detail doesn't stop there.  There are a number of other
   1.514 +aspects of Mercurial's construction that I personally find
   1.515 +interesting.  I'll detail a few of them here, separate from the <quote>big
   1.516 +ticket</quote> items above, so that if you're interested, you can gain a
   1.517 +better idea of the amount of thinking that goes into a well-designed
   1.518 +system.
   1.519 +</para>
   1.520 +
   1.521 +<sect2>
   1.522 +<title>Clever compression</title>
   1.523 +
   1.524 +<para>When appropriate, Mercurial will store both snapshots and deltas in
   1.525 +compressed form.  It does this by always <emphasis>trying to</emphasis> compress a
   1.526 +snapshot or delta, but only storing the compressed version if it's
   1.527 +smaller than the uncompressed version.
   1.528 +</para>
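<para>The <quote>try to compress, keep whichever is smaller</quote> rule is easy to sketch with Python's standard <literal>zlib</literal> module.  The function names and the one-byte markers <literal>z</literal> and <literal>u</literal> are hypothetical choices for this example and should not be read as Mercurial's on-disk format.</para>

```python
import zlib


def pack_chunk(data: bytes) -> bytes:
    """Compress `data`, but keep the result only if it is actually smaller."""
    packed = zlib.compress(data)
    if len(data) > len(packed):
        return b"z" + packed  # hypothetical marker: compressed payload
    return b"u" + data        # hypothetical marker: stored uncompressed


def unpack_chunk(chunk: bytes) -> bytes:
    """Reverse pack_chunk by looking at the leading marker byte."""
    marker, payload = chunk[:1], chunk[1:]
    if marker == b"z":
        return zlib.decompress(payload)
    return payload
```

<para>Repetitive text takes the compressed branch, while data that is already compressed or otherwise high in entropy falls through to the uncompressed branch, so no space is wasted on a second, counterproductive compression pass.</para>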
   1.529 +
   1.530 +<para>This means that Mercurial does <quote>the right thing</quote> when storing a file
   1.531 +whose native form is compressed, such as a <literal>zip</literal> archive or a
   1.532 +JPEG image.  When these types of files are compressed a second time,
   1.533 +the resulting file is usually bigger than the once-compressed form,
   1.534 +and so Mercurial will store the plain <literal>zip</literal> or JPEG.
   1.535 +</para>
   1.536 +
   1.537 +<para>Deltas between revisions of a compressed file are usually larger than
   1.538 +snapshots of the file, and Mercurial again does <quote>the right thing</quote> in
   1.539 +these cases.  It finds that such a delta exceeds the threshold at
   1.540 +which it should store a complete snapshot of the file, so it stores
   1.541 +the snapshot, again saving space compared to a naive delta-only
   1.542 +approach.
   1.543 +</para>
   1.544 +
   1.545 +<sect3>
   1.546 +<title>Network recompression</title>
   1.547 +
   1.548 +<para>When storing revisions on disk, Mercurial uses the <quote>deflate</quote>
   1.549 +compression algorithm (the same one used by the popular <literal>zip</literal>
   1.550 +archive format), which balances good speed with a respectable
   1.551 +compression ratio.  However, when transmitting revision data over a
   1.552 +network connection, Mercurial uncompresses the compressed revision
   1.553 +data.
   1.554 +</para>
   1.555 +
   1.556 +<para>If the connection is over HTTP, Mercurial recompresses the entire
   1.557 +stream of data using a compression algorithm that gives a better
   1.558 +compression ratio (the Burrows-Wheeler algorithm from the widely used
   1.559 +<literal>bzip2</literal> compression package).  This combination of algorithm
   1.560 +and compression of the entire stream (instead of a revision at a time)
   1.561 +substantially reduces the number of bytes to be transferred, yielding
   1.562 +better network performance over almost all kinds of network.
   1.563 +</para>
   1.564 +
   1.565 +<para>(If the connection is over <command>ssh</command>, Mercurial <emphasis>doesn't</emphasis>
   1.566 +recompress the stream, because <command>ssh</command> can already do this
   1.567 +itself.)
   1.568 +</para>
   1.569 +
   1.570 +</sect3>
   1.571 +</sect2>
   1.572 +<sect2>
   1.573 +<title>Read/write ordering and atomicity</title>
   1.574 +
   1.575 +<para>Appending to files isn't the whole story when it comes to guaranteeing
   1.576 +that a reader won't see a partial write.  If you recall
   1.577 +figure <xref linkend="fig:concepts:metadata"/>, revisions in the changelog point to
   1.578 +revisions in the manifest, and revisions in the manifest point to
   1.579 +revisions in filelogs.  This hierarchy is deliberate.
   1.580 +</para>
   1.581 +
   1.582 +<para>A writer starts a transaction by writing filelog and manifest data,
   1.583 +and doesn't write any changelog data until those are finished.  A
   1.584 +reader starts by reading changelog data, then manifest data, followed
   1.585 +by filelog data.
   1.586 +</para>
   1.587 +
   1.588 +<para>Since the writer has always finished writing filelog and manifest data
   1.589 +before it writes to the changelog, a reader will never read a pointer
   1.590 +to a partially written manifest revision from the changelog, and it will
   1.591 +never read a pointer to a partially written filelog revision from the
   1.592 +manifest.
   1.593 +</para>
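<para>A toy model shows why this ordering protects readers.  In the sketch below, three Python lists stand in for the append-only filelog, manifest, and changelog; the class <literal>ToyStore</literal>, the single-file manifest, and the <literal>crash_before_changelog</literal> flag are assumptions made purely for illustration.  A real transaction would also roll back the orphaned entries, which this sketch does not attempt.</para>

```python
class ToyStore:
    """Three append-only lists standing in for filelog, manifest, changelog."""

    def __init__(self):
        self.filelog = []    # file contents, one entry per file revision
        self.manifest = []   # each entry: index of a filelog revision
        self.changelog = []  # each entry: (message, index of a manifest revision)

    def commit(self, message, content, crash_before_changelog=False):
        # Writer order: filelog first, then manifest, changelog last.
        self.filelog.append(content)
        self.manifest.append(len(self.filelog) - 1)
        if crash_before_changelog:
            return  # simulate a writer that was interrupted part-way
        self.changelog.append((message, len(self.manifest) - 1))

    def read_all(self):
        # Reader order: changelog first, then follow the pointers downwards.
        result = []
        for message, manifest_rev in list(self.changelog):
            file_rev = self.manifest[manifest_rev]
            result.append((message, self.filelog[file_rev]))
        return result
```

<para>Because a reader only reaches manifest and filelog data through a changelog entry, the half-finished commit is simply invisible to it: every pointer it follows leads to data that was completely written before the pointer existed.</para>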
   1.594 +
   1.595 +</sect2>
   1.596 +<sect2>
   1.597 +<title>Concurrent access</title>
   1.598 +
   1.599 +<para>The read/write ordering and atomicity guarantees mean that Mercurial
   1.600 +never needs to <emphasis>lock</emphasis> a repository when it's reading data, even
   1.601 +if the repository is being written to while the read is occurring.
   1.602 +This has a big effect on scalability; you can have an arbitrary number
   1.603 +of Mercurial processes safely reading data from a repository
   1.604 +all at once, whether or not it's being written to.
   1.605 +</para>
   1.606 +
   1.607 +<para>The lockless nature of reading means that if you're sharing a
   1.608 +repository on a multi-user system, you don't need to grant other local
   1.609 +users permission to <emphasis>write</emphasis> to your repository in order for them
   1.610 +to be able to clone it or pull changes from it; they only need
   1.611 +<emphasis>read</emphasis> permission.  (This is <emphasis>not</emphasis> a common feature among
   1.612 +revision control systems, so don't take it for granted!  Most require
   1.613 +readers to be able to lock a repository to access it safely, and this
   1.614 +requires write permission on at least one directory, which of course
   1.615 +makes for all kinds of nasty and annoying security and administrative
   1.616 +problems.)
   1.617 +</para>
   1.618 +
   1.619 +<para>Mercurial uses locks to ensure that only one process can write to a
   1.620 +repository at a time (the locking mechanism is safe even over
   1.621 +filesystems that are notoriously hostile to locking, such as NFS).  If
   1.622 +a repository is locked, a writer will wait and retry in case the
   1.623 +repository becomes unlocked, but if the repository remains locked for
   1.624 +too long, the process attempting to write will time out.
   1.625 +This means that your daily automated scripts won't get stuck forever
   1.626 +and pile up if a system crashes unnoticed, for example.  (Yes, the
   1.627 +timeout is configurable, from zero to infinity.)
   1.628 +</para>
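
<para>A minimal sketch of a writer lock with a timeout might look like the
following.  It is not Mercurial's locking code and makes no claim to be
safe over NFS; the function names, the lock file path and the default
values are assumptions chosen for illustration.</para>

```python
import os
import time

def acquire_lock(path, timeout=600.0, poll=0.1):
    """Create the lock file exclusively; give up after `timeout` seconds.

    Hypothetical helper: the defaults are illustrative, not Mercurial's.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if the lock file already exists.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            if time.monotonic() >= deadline:
                return False  # still locked: time out instead of hanging
            time.sleep(poll)
        else:
            # Record the owner so a stuck lock can be diagnosed.
            os.write(fd, str(os.getpid()).encode())
            os.close(fd)
            return True

def release_lock(path):
    os.unlink(path)
```

<para>A caller that gets <literal>False</literal> back can report the failure and exit,
which is what keeps unattended scripts from piling up behind a stale lock.</para>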
   1.629 +
   1.630 +<sect3>
   1.631 +<title>Safe dirstate access</title>
   1.632 +
   1.633 +<para>As with revision data, Mercurial doesn't take a lock to read the
   1.634 +dirstate file; it does acquire a lock to write it.  To avoid the
   1.635 +possibility of reading a partially written copy of the dirstate file,
   1.636 +Mercurial writes to a file with a unique name in the same directory as
   1.637 +the dirstate file, then renames the temporary file atomically to
   1.638 +<filename>dirstate</filename>.  The file named <filename>dirstate</filename> is thus
   1.639 +guaranteed to be complete, not partially written.
   1.640 +</para>
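
<para>The write-then-rename pattern can be sketched as follows.  This is an
illustration of the general technique rather than Mercurial's own code;
<literal>write_atomically</literal> is a hypothetical helper name.</para>

```python
import os
import tempfile

def write_atomically(path, data):
    """Write `data` to a uniquely named temporary file, then rename it."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory as the target, so
    # the rename never has to cross a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        # Readers see either the old file or the new one, never a mix.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

<para>A reader that opens the target at any moment gets a complete file, because
the name only ever refers to fully written contents.</para>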
   1.641 +
   1.642 +</sect3>
   1.643 +</sect2>
   1.644 +<sect2>
   1.645 +<title>Avoiding seeks</title>
   1.646 +
   1.647 +<para>Critical to Mercurial's performance is the avoidance of seeks of the
   1.648 +disk head, since any seek is far more expensive than even a
   1.649 +comparatively large read operation.
   1.650 +</para>
   1.651 +
   1.652 +<para>This is why, for example, the dirstate is stored in a single file.  If
   1.653 +there were a dirstate file per directory that Mercurial tracked, the
   1.654 +disk would seek once per directory.  Instead, Mercurial reads the
   1.655 +entire single dirstate file in one step.
   1.656 +</para>
   1.657 +
   1.658 +<para>Mercurial also uses a <quote>copy on write</quote> scheme when cloning a
   1.659 +repository on local storage.  Instead of copying every revlog file
   1.660 +from the old repository into the new repository, it makes a <quote>hard
   1.661 +link</quote>, which is a shorthand way to say <quote>these two names point to the
   1.662 +same file</quote>.  When Mercurial is about to write to one of a revlog's
   1.663 +files, it checks to see if the number of names pointing at the file is
   1.664 +greater than one.  If it is, more than one repository is using the
   1.665 +file, so Mercurial makes a new copy of the file that is private to
   1.666 +this repository.
   1.667 +</para>
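
<para>The link-count check can be sketched like this.  The helper names and the
temporary file suffix are assumptions made for the example; Mercurial's
actual implementation differs in detail.</para>

```python
import os
import shutil

def clone_file(src, dst):
    # A hard link gives the same file a second name; no data is copied.
    os.link(src, dst)

def open_for_append(path):
    """Break the hard link before writing if the file is shared."""
    if os.stat(path).st_nlink > 1:
        # More than one name points at this file, so make a private copy
        # and swap it in under the same name.
        tmp = path + ".cow-tmp"  # hypothetical temporary name
        shutil.copyfile(path, tmp)
        os.replace(tmp, path)
    return open(path, "ab")
```

<para>After the private copy is swapped in, writes through this name no longer
affect the file that the other repository still sees.</para>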
   1.668 +
   1.669 +<para>A few revision control developers have pointed out that this idea of
   1.670 +making a complete private copy of a file is not very efficient in its
   1.671 +use of storage.  While this is true, storage is cheap, and this method
   1.672 +gives the highest performance while deferring most book-keeping to the
   1.673 +operating system.  An alternative scheme would most likely reduce
   1.674 +performance and increase the complexity of the software, each of which
   1.675 +is much more important to the <quote>feel</quote> of day-to-day use.
   1.676 +</para>
   1.677 +
   1.678 +</sect2>
   1.679 +<sect2>
   1.680 +<title>Other contents of the dirstate</title>
   1.681 +
   1.682 +<para>Because Mercurial doesn't force you to tell it when you're modifying a
   1.683 +file, it uses the dirstate to store some extra information so it can
   1.684 +determine efficiently whether you have modified a file.  For each file
   1.685 +in the working directory, it stores the time that it last modified the
   1.686 +file itself, and the size of the file at that time.
   1.687 +</para>
   1.688 +
   1.689 +<para>When you explicitly <command role="hg-cmd">hg add</command>, <command role="hg-cmd">hg remove</command>, <command role="hg-cmd">hg rename</command> or
   1.690 +<command role="hg-cmd">hg copy</command> files, Mercurial updates the dirstate so that it knows
   1.691 +what to do with those files when you commit.
   1.692 +</para>
   1.693 +
   1.694 +<para>When Mercurial is checking the states of files in the working
   1.695 +directory, it first checks a file's modification time.  If that has
   1.696 +not changed, Mercurial concludes that the file is unmodified.  If the file's size
   1.697 +has changed, the file must have been modified.  If the modification
   1.698 +time has changed, but the size has not, only then does Mercurial need
   1.699 +to read the actual contents of the file to see if they've changed.
   1.700 +Storing these few extra pieces of information dramatically reduces the
   1.701 +amount of data that Mercurial needs to read, which yields large
   1.702 +performance improvements compared to other revision control systems.
   1.703 +</para>
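
<para>The order of these checks can be sketched as follows.  This is a
simplified illustration: the function names and the dictionary layout are
hypothetical, and the recorded contents stand in for the committed
revision that Mercurial would actually compare against.</para>

```python
import os

def record_state(path):
    """Remember the time, size and contents of a file (toy dirstate entry)."""
    st = os.stat(path)
    with open(path, "rb") as f:
        content = f.read()
    return {"mtime": st.st_mtime_ns, "size": st.st_size, "content": content}

def is_modified(path, recorded):
    st = os.stat(path)
    # Cheapest check first: an unchanged modification time means unmodified.
    if st.st_mtime_ns == recorded["mtime"]:
        return False
    # A different size means the file was certainly modified.
    if st.st_size != recorded["size"]:
        return True
    # Time changed but size did not: only now read the contents.
    with open(path, "rb") as f:
        return f.read() != recorded["content"]
```

<para>In the common case only the <literal>os.stat</literal> call runs, so most
files are never opened at all.</para>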
   1.704 +
   1.705 +</sect2>
   1.706 +</sect1>
   1.707 +</chapter>
   1.708 +
   1.709 +<!--
   1.710 +local variables: 
   1.711 +sgml-parent-document: ("00book.xml" "book" "chapter")
   1.712 +end:
   1.713 +-->
   1.714 \ No newline at end of file