hgbook

annotate fr/ch04-concepts.xml @ 993:71dbda516572

French translation : merge with Jean Marie Clement's work on ch04-concepts
author Frédéric Bouquet <youshe.jaalon@gmail.com>
date Fri Sep 11 14:35:36 2009 +0200 (2009-09-11)
parents 8b0f1e2984d0 e6894aa7baf2
children 669ae1a09e46
rev   line source
belaran@964 1 <!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : -->
belaran@964 2
bos@559 3 <chapter id="chap:concepts">
bos@572 4 <?dbhtml filename="behind-the-scenes.html"?>
youshe@993 5 <title>Behind the scenes</title>
youshe@993 6
youshe@993 7 <para id="x_2e8">Unlike many revision control systems,
youshe@993 8 the concepts upon which Mercurial is built are simple enough
youshe@993 9 that it's easy to understand how the software really works.
youshe@993 10 Knowing these details certainly isn't necessary, but I find it
youshe@993 11 useful to have a <quote>mental model</quote> of what's going on.</para>
youshe@993 12
youshe@993 13 <para id="x_2e9">This understanding gives me confidence that
youshe@993 14 Mercurial has been carefully designed to be both
youshe@993 15 <emphasis>safe</emphasis> and <emphasis>efficient</emphasis>. In addition,
youshe@993 16 if it's easy for me to keep in mind what the software is doing when
youshe@993 17 I perform a revision control task, I'm less likely to be
youshe@993 18 surprised by its behavior.</para>
youshe@993 19
youshe@993 20 <para id="x_2ea">In this chapter, we'll first cover the core
youshe@993 21 concepts behind Mercurial's design, then go on to discuss some
youshe@993 22 of the interesting details of its implementation.</para>
bos@559 23
bos@559 24 <sect1>
youshe@993 25 <title>Mercurial's historical record</title>
youshe@993 26 <sect2>
youshe@993 27 <title>Tracking the history of a single file</title>
youshe@993 28
youshe@993 29 <para id="x_2eb">When Mercurial tracks modifications
youshe@993 30 to a file, it stores the history of that file in a
youshe@993 31 <emphasis>filelog</emphasis>, in the form of metadata. Each entry
youshe@993 32 in the filelog contains enough information to reconstruct one
youshe@993 33 revision of the corresponding file. Filelogs are files
youshe@993 34 stored in the <filename role="special"
youshe@993 35 class="directory">.hg/store/data</filename> directory. A filelog contains
youshe@993 36 two kinds of information: revision data, and an index
youshe@993 37 that lets Mercurial find a given revision
youshe@993 38 efficiently.</para>
youshe@993 39
youshe@993 40 <para id="x_2ec">When a file grows too large or has a long
youshe@993 41 history, its filelog is stored in a separate data file
youshe@993 42 (with a <quote><literal>.d</literal></quote> suffix) and a separate
youshe@993 43 index file (with a <quote><literal>.i</literal></quote> suffix).
youshe@993 44 The relationship between a file in the working directory
youshe@993 45 and the filelog that tracks its history in the repository is
youshe@993 46 illustrated in <xref linkend="fig:concepts:filelog"/>.</para>
bos@559 47
bos@591 48 <figure id="fig:concepts:filelog">
youshe@993 49 <title>Relationships between files in the working directory and
youshe@993 50 their filelogs in the repository</title>
youshe@993 51 <mediaobject> <imageobject><imagedata
youshe@993 52 fileref="figs/filelog.png"/></imageobject>
youshe@993 53 <textobject><phrase>XXX add text</phrase></textobject>
youshe@993 54 </mediaobject> </figure>
youshe@993 55
youshe@993 56 </sect2>
youshe@993 57 <sect2>
youshe@993 58 <title>Managing tracked files</title>
youshe@993 59
youshe@993 60 <para id="x_2ee">Mercurial uses a structure called a
youshe@993 61 <emphasis>manifest</emphasis> to collect together information about
youshe@993 62 the files that it tracks. Each entry in the manifest
youshe@993 63 contains information about the files present in a given
youshe@993 64 changeset. An entry records the list of files that are part of the
youshe@993 65 changeset, the revision of each file, and a few other pieces
youshe@993 66 of metadata about those files.</para>
bos@559 67
bos@559 68 </sect2>
bos@559 69 <sect2>
bos@559 70 <title>Recording changeset information</title>
bos@559 71
youshe@993 72 <para id="x_2ef">The <emphasis>changelog</emphasis> contains
youshe@993 73 information about each changeset. Each revision records who
youshe@993 74 committed a change, the changeset comment, other pieces of
youshe@993 75 changeset-related information, and the revision of the manifest to
youshe@993 76 use.</para>
bos@559 77
bos@559 78 </sect2>
bos@559 79 <sect2>
bos@559 80 <title>Relationships between revisions</title>
bos@559 81
bos@584 82 <para id="x_2f0">Within a changelog, a manifest, or a filelog, each
bos@559 83 revision stores a pointer to its immediate parent (or to its
bos@559 84 two parents, if it's a merge revision). As I mentioned above,
bos@559 85 there are also relationships between revisions
bos@559 86 <emphasis>across</emphasis> these structures, and they are
bos@559 87 hierarchical in nature.</para>
bos@559 88
bos@584 89 <para id="x_2f1">For every changeset in a repository, there is exactly one
bos@559 90 revision stored in the changelog. Each revision of the
bos@559 91 changelog contains a pointer to a single revision of the
bos@559 92 manifest. A revision of the manifest stores a pointer to a
bos@559 93 single revision of each filelog tracked when that changeset
bos@592 94 was created. These relationships are illustrated in
bos@559 95 <xref linkend="fig:concepts:metadata"/>.</para>
bos@559 96
bos@591 97 <figure id="fig:concepts:metadata">
bos@591 98 <title>Metadata relationships</title>
bos@591 99 <mediaobject>
bos@594 100 <imageobject><imagedata fileref="figs/metadata.png"/></imageobject>
bos@591 101 <textobject><phrase>XXX add text</phrase></textobject>
bos@559 102 </mediaobject>
bos@591 103 </figure>
bos@559 104
bos@584 105 <para id="x_2f3">As the illustration shows, there is
bos@559 106 <emphasis>not</emphasis> a <quote>one to one</quote>
bos@559 107 relationship between revisions in the changelog, manifest, or
bos@701 108 filelog. If a file that
bos@559 109 Mercurial tracks hasn't changed between two changesets, the
bos@559 110 entry for that file in the two revisions of the manifest will
bos@701 111 point to the same revision of its filelog<footnote>
bos@702 112 <para id="x_725">It is possible (though unusual) for the manifest to
bos@701 113 remain the same between two changesets, in which case the
bos@701 114 changelog entries for those changesets will point to the
bos@701 115 same revision of the manifest.</para>
bos@701 116 </footnote>.</para>
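<para>To make these pointers concrete, here is a minimal sketch of the
three structures as plain Python data. The names
<literal>ChangelogEntry</literal>, <literal>manifests</literal>, and
<literal>file_revisions</literal> are invented for illustration and are
not Mercurial's API; the real structures are revlogs identified by hash,
not lists indexed by small integers.</para>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangelogEntry:
    """One changelog revision: who, why, and which manifest revision to use."""
    author: str
    comment: str
    manifest_rev: int  # pointer to exactly one manifest revision


# Each manifest revision maps a tracked file name to a filelog revision.
manifests = [
    {"README": 0, "main.c": 0},
    {"README": 0, "main.c": 1},  # only main.c changed in the second changeset
]

changelog = [
    ChangelogEntry("alice", "initial import", manifest_rev=0),
    ChangelogEntry("bob", "fix main.c", manifest_rev=1),
]


def file_revisions(changeset: int) -> dict[str, int]:
    """Follow changelog -> manifest to find the filelog revision of each file."""
    return manifests[changelog[changeset].manifest_rev]
```

<para>In this toy history, both manifest revisions point to the same
filelog revision of <literal>README</literal>, because that file did not
change between the two changesets.</para>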
bos@559 117
bos@559 118 </sect2>
bos@559 119 </sect1>
bos@559 120 <sect1>
bos@559 121 <title>Safe, efficient storage</title>
bos@559 122
bos@584 123 <para id="x_2f4">The underpinnings of changelogs, manifests, and filelogs are
bos@559 124 provided by a single structure called the
bos@559 125 <emphasis>revlog</emphasis>.</para>
bos@559 126
bos@559 127 <sect2>
bos@559 128 <title>Efficient storage</title>
bos@559 129
bos@584 130 <para id="x_2f5">The revlog provides efficient storage of revisions using a
bos@559 131 <emphasis>delta</emphasis> mechanism. Instead of storing a
bos@559 132 complete copy of a file for each revision, it stores the
bos@559 133 changes needed to transform an older revision into the new
bos@559 134 revision. For many kinds of file data, these deltas are
bos@559 135 typically a fraction of a percent of the size of a full copy
bos@559 136 of a file.</para>
bos@559 137
bos@584 138 <para id="x_2f6">Some obsolete revision control systems can only work with
bos@559 139 deltas of text files. They must either store binary files as
bos@559 140 complete snapshots or encoded into a text representation, both
bos@559 141 of which are wasteful approaches. Mercurial can efficiently
bos@559 142 handle deltas of files with arbitrary binary contents; it
bos@559 143 doesn't need to treat text as special.</para>
bos@559 144
bos@559 145 </sect2>
bos@559 146 <sect2 id="sec:concepts:txn">
bos@559 147 <title>Safe operation</title>
bos@559 148
bos@584 149 <para id="x_2f7">Mercurial only ever <emphasis>appends</emphasis> data to
bos@559 150 the end of a revlog file. It never modifies a section of a
bos@559 151 file after it has written it. This is both more robust and
bos@559 152 more efficient than schemes that need to modify or rewrite
bos@559 153 data.</para>
bos@559 154
bos@584 155 <para id="x_2f8">In addition, Mercurial treats every write as part of a
bos@559 156 <emphasis>transaction</emphasis> that can span a number of
bos@559 157 files. A transaction is <emphasis>atomic</emphasis>: either
bos@559 158 the entire transaction succeeds and its effects are all
bos@559 159 visible to readers in one go, or the whole thing is undone.
bos@559 160 This guarantee of atomicity means that if you're running two
bos@559 161 copies of Mercurial, where one is reading data and one is
bos@559 162 writing it, the reader will never see a partially written
bos@559 163 result that might confuse it.</para>
bos@559 164
bos@584 165 <para id="x_2f9">The fact that Mercurial only appends to files makes it
bos@559 166 easier to provide this transactional guarantee. The easier it
bos@559 167 is to do stuff like this, the more confident you should be
bos@559 168 that it's done correctly.</para>
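<para>The link between appending and transactions can be sketched in a
few lines of Python. This is only an illustration under my own
assumptions: the <literal>Transaction</literal> class and its
<literal>journal</literal> dictionary are hypothetical names, and the
sketch covers undoing a failed write, not how readers are kept from
seeing data that is still being written.</para>

```python
import os
import tempfile


class Transaction:
    """Append-only writes that can be undone by truncating each file."""

    def __init__(self):
        self.journal = {}  # path -> file size before our first append

    def append(self, path, data):
        if path not in self.journal:
            size = os.path.getsize(path) if os.path.exists(path) else 0
            self.journal[path] = size
        with open(path, "ab") as f:
            f.write(data)

    def commit(self):
        # Nothing to rewrite: the appended data simply stays in place.
        self.journal.clear()

    def abort(self):
        # Undo is cheap because older data was never modified.
        for path, size in self.journal.items():
            with open(path, "r+b") as f:
                f.truncate(size)
        self.journal.clear()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        data = os.path.join(tmp, "file.d")
        index = os.path.join(tmp, "file.i")
        tr = Transaction()
        tr.append(data, b"rev0")
        tr.append(index, b"idx0")
        tr.commit()
        tr = Transaction()
        tr.append(data, b"rev1-partial")
        tr.append(index, b"idx1")
        tr.abort()  # both files return to their committed sizes
        return os.path.getsize(data), os.path.getsize(index)
```

<para>Because a file is only ever extended, remembering one number per
file (its old length) is enough to roll back a transaction that spans
several files.</para>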
bos@559 169
bos@559 170 </sect2>
bos@559 171 <sect2>
bos@559 172 <title>Fast retrieval</title>
bos@559 173
bos@701 174 <para id="x_2fa">Mercurial cleverly avoids a pitfall common to
bos@701 175 many earlier revision control systems: the problem of
bos@701 176 <emphasis>inefficient retrieval</emphasis>. Most revision
bos@701 177 control systems store the contents of a revision as an
bos@701 178 incremental series of modifications against a
bos@701 179 <quote>snapshot</quote>. (Some base the snapshot on the
bos@701 180 oldest revision, others on the newest.) To reconstruct a
bos@701 181 specific revision, you must first read the snapshot, and then
bos@701 182 every one of the revisions between the snapshot and your
bos@701 183 target revision. The more history that a file accumulates,
bos@701 184 the more revisions you must read, hence the longer it takes to
bos@701 185 reconstruct a particular revision.</para>
bos@559 186
bos@591 187 <figure id="fig:concepts:snapshot">
bos@591 188 <title>Snapshot of a revlog, with incremental deltas</title>
bos@591 189 <mediaobject>
bos@594 190 <imageobject><imagedata fileref="figs/snapshot.png"/></imageobject>
bos@591 191 <textobject><phrase>XXX add text</phrase></textobject>
bos@591 192 </mediaobject>
bos@591 193 </figure>
bos@559 194
bos@584 195 <para id="x_2fc">The innovation that Mercurial applies to this problem is
bos@559 196 simple but effective. Once the cumulative amount of delta
bos@559 197 information stored since the last snapshot exceeds a fixed
bos@559 198 threshold, it stores a new snapshot (compressed, of course),
bos@559 199 instead of another delta. This makes it possible to
bos@559 200 reconstruct <emphasis>any</emphasis> revision of a file
bos@559 201 quickly. This approach works so well that it has since been
bos@559 202 copied by several other revision control systems.</para>
bos@559 203
bos@592 204 <para id="x_2fd"><xref linkend="fig:concepts:snapshot"/> illustrates
bos@559 205 the idea. In an entry in a revlog's index file, Mercurial
bos@559 206 stores the range of entries from the data file that it must
bos@559 207 read to reconstruct a particular revision.</para>
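<para>Here is a toy model of the snapshot rule, assuming a linear
history. The names <literal>ToyRevlog</literal>,
<literal>make_delta</literal>, and <literal>apply_delta</literal> are
hypothetical, and the delta format (a shared prefix and suffix around the
changed bytes) is a deliberately naive stand-in for Mercurial's real
delta algorithm. The <literal>base</literal> list plays the role of the
index, recording where the reading of each revision must start.</para>

```python
def make_delta(old: bytes, new: bytes):
    """Naive delta: keep the common prefix and suffix, store the middle."""
    limit = min(len(old), len(new))
    p = 0
    while p != limit and old[p] == new[p]:
        p += 1
    s = 0
    while s != limit - p and old[-1 - s] == new[-1 - s]:
        s += 1
    return (p, s, new[p:len(new) - s])


def apply_delta(old: bytes, delta) -> bytes:
    p, s, middle = delta
    return old[:p] + middle + old[len(old) - s:]


class ToyRevlog:
    def __init__(self, threshold: int):
        self.threshold = threshold
        self.entries = []  # ("snapshot", bytes) or ("delta", tuple)
        self.base = []     # per revision: position of the snapshot to start from
        self.pending = 0   # delta bytes stored since the last snapshot
        self.tip = b""

    def add(self, text: bytes) -> int:
        delta = make_delta(self.tip, text)
        cost = len(delta[2])
        if not self.entries or self.pending + cost > self.threshold:
            # Too much delta data has piled up: store a full snapshot.
            self.entries.append(("snapshot", text))
            self.base.append(len(self.entries) - 1)
            self.pending = 0
        else:
            self.entries.append(("delta", delta))
            self.base.append(self.base[-1])
            self.pending += cost
        self.tip = text
        return len(self.entries) - 1

    def read(self, rev: int) -> bytes:
        start = self.base[rev]
        text = self.entries[start][1]
        for _kind, delta in self.entries[start + 1:rev + 1]:
            text = apply_delta(text, delta)
        return text
```

<para>Reading a revision never touches anything older than its base
snapshot, so the amount of data to replay stays bounded by the threshold
no matter how long the history grows.</para>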
bos@559 208
bos@559 209 <sect3>
bos@559 210 <title>Aside: the influence of video compression</title>
bos@559 211
bos@701 212 <para id="x_2fe">If you're familiar with video compression or
bos@701 213 have ever watched a TV feed through a digital cable or
bos@701 214 satellite service, you may know that most video compression
bos@701 215 schemes store each frame of video as a delta against its
bos@701 216 predecessor frame.</para>
bos@701 217
bos@701 218 <para id="x_2ff">Mercurial borrows this idea to make it
bos@701 219 possible to reconstruct a revision from a snapshot and a
bos@701 220 small number of deltas.</para>
bos@559 221
bos@559 222 </sect3>
bos@559 223 </sect2>
bos@559 224 <sect2>
bos@559 225 <title>Identification and strong integrity</title>
bos@559 226
bos@584 227 <para id="x_300">Along with delta or snapshot information, a revlog entry
bos@559 228 contains a cryptographic hash of the data that it represents.
bos@559 229 This makes it difficult to forge the contents of a revision,
bos@559 230 and easy to detect accidental corruption.</para>
bos@559 231
bos@584 232 <para id="x_301">Hashes provide more than a mere check against corruption;
bos@559 233 they are used as the identifiers for revisions. The changeset
bos@559 234 identification hashes that you see as an end user are from
bos@559 235 revisions of the changelog. Although filelogs and the
bos@559 236 manifest also use hashes, Mercurial only uses these behind the
bos@559 237 scenes.</para>
bos@559 238
bos@584 239 <para id="x_302">Mercurial verifies that hashes are correct when it
bos@559 240 retrieves file revisions and when it pulls changes from
bos@559 241 another repository. If it encounters an integrity problem, it
bos@559 242 will complain and stop whatever it's doing.</para>
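<para>As a sketch of how such an identifier can double as an integrity
check, the following Python computes a SHA-1 over the two parent IDs (in
sorted order) followed by the revision text, which is my understanding of
the classic revlog scheme. The function names <literal>node_id</literal>
and <literal>verify</literal> are made up for this example, and an
all-zero ID stands for a missing parent.</para>

```python
import hashlib

NULL_ID = b"\0" * 20  # placeholder meaning "no parent"


def node_id(text: bytes, p1: bytes = NULL_ID, p2: bytes = NULL_ID) -> bytes:
    """Hash the parents (sorted, so their order does not matter) plus the text."""
    first, second = sorted((p1, p2))
    return hashlib.sha1(first + second + text).digest()


def verify(text: bytes, p1: bytes, p2: bytes, expected: bytes) -> bool:
    """Recompute the hash on retrieval and compare it with the stored one."""
    return node_id(text, p1, p2) == expected


root = node_id(b"first version\n")
child = node_id(b"second version\n", p1=root)
```

<para>Since the parents feed into the hash, the same file contents
committed on top of different histories get different identifiers, and
any change to stored data makes <literal>verify</literal> fail.</para>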
bos@559 243
bos@584 244 <para id="x_303">In addition to the effect it has on retrieval efficiency,
bos@559 245 Mercurial's use of periodic snapshots makes it more robust
bos@559 246 against partial data corruption. If a revlog becomes partly
bos@559 247 corrupted due to a hardware error or system bug, it's often
bos@559 248 possible to reconstruct some or most revisions from the
bos@559 249 uncorrupted sections of the revlog, both before and after the
bos@559 250 corrupted section. This would not be possible with a
bos@559 251 delta-only storage model.</para>
bos@559 252 </sect2>
bos@559 253 </sect1>
bos@701 254
bos@559 255 <sect1>
bos@559 256 <title>Revision history, branching, and merging</title>
bos@559 257
bos@584 258 <para id="x_304">Every entry in a Mercurial revlog knows the identity of its
bos@559 259 immediate ancestor revision, usually referred to as its
bos@559 260 <emphasis>parent</emphasis>. In fact, a revision contains room
bos@559 261 for not one parent, but two. Mercurial uses a special hash,
bos@559 262 called the <quote>null ID</quote>, to represent the idea
bos@559 263 <quote>there is no parent here</quote>. This hash is simply a
bos@559 264 string of zeroes.</para>
bos@559 265
bos@592 266 <para id="x_305">In <xref linkend="fig:concepts:revlog"/>, you can see
bos@559 267 an example of the conceptual structure of a revlog. Filelogs,
bos@559 268 manifests, and changelogs all have this same structure; they
bos@559 269 differ only in the kind of data stored in each delta or
bos@559 270 snapshot.</para>
bos@559 271
bos@584 272 <para id="x_306">The first revision in a revlog (at the bottom of the image)
bos@559 273 has the null ID in both of its parent slots. For a
bos@559 274 <quote>normal</quote> revision, its first parent slot contains
bos@559 275 the ID of its parent revision, and its second contains the null
bos@559 276 ID, indicating that the revision has only one real parent. Any
bos@559 277 two revisions that have the same parent ID are branches. A
bos@559 278 revision that represents a merge between branches has two normal
bos@559 279 revision IDs in its parent slots.</para>
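<para>The parent slots are easy to model. In the sketch below, the short
IDs such as <literal>a1</literal> and the helper names
<literal>kind</literal> and <literal>branch_points</literal> are
invented for illustration; real IDs are 40-digit hexadecimal hashes.</para>

```python
from collections import defaultdict

NULL_ID = "0" * 40  # "there is no parent here"

# revision ID -> (first parent, second parent)
revlog = {
    "a1": (NULL_ID, NULL_ID),  # first revision
    "b2": ("a1", NULL_ID),     # normal revision
    "c3": ("a1", NULL_ID),     # shares a parent with b2: a branch
    "d4": ("b2", "c3"),        # merge of the two branches
}


def kind(node: str) -> str:
    p1, p2 = revlog[node]
    if p1 == NULL_ID and p2 == NULL_ID:
        return "root"
    if p2 == NULL_ID:
        return "normal"
    return "merge"


def branch_points() -> dict:
    """Parents that have more than one child, with those children."""
    children = defaultdict(list)
    for node, pair in revlog.items():
        for parent in pair:
            if parent != NULL_ID:
                children[parent].append(node)
    return {p: kids for p, kids in children.items() if len(kids) > 1}
```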
bos@559 280
bos@591 281 <figure id="fig:concepts:revlog">
bos@591 282 <title>The conceptual structure of a revlog</title>
bos@591 283 <mediaobject>
bos@594 284 <imageobject><imagedata fileref="figs/revlog.png"/></imageobject>
bos@591 285 <textobject><phrase>XXX add text</phrase></textobject>
bos@591 286 </mediaobject>
bos@591 287 </figure>
bos@559 288
bos@559 289 </sect1>
bos@559 290 <sect1>
bos@559 291 <title>The working directory</title>
bos@559 292
bos@584 293 <para id="x_307">In the working directory, Mercurial stores a snapshot of the
bos@559 294 files from the repository as of a particular changeset.</para>
bos@559 295
bos@584 296 <para id="x_308">The working directory <quote>knows</quote> which changeset
bos@559 297 it contains. When you update the working directory to contain a
bos@559 298 particular changeset, Mercurial looks up the appropriate
bos@559 299 revision of the manifest to find out which files it was tracking
bos@559 300 at the time that changeset was committed, and which revision of
bos@559 301 each file was then current. It then recreates a copy of each of
bos@559 302 those files, with the same contents it had when the changeset
bos@559 303 was committed.</para>
bos@559 304
bos@701 305 <para id="x_309">The <emphasis>dirstate</emphasis> is a special
bos@701 306 structure that contains Mercurial's knowledge of the working
bos@701 307 directory. It is maintained as a file named
bos@701 308 <filename>.hg/dirstate</filename> inside a repository. The
bos@701 309 dirstate details which changeset the working directory is
bos@701 310 updated to, and all of the files that Mercurial is tracking in
bos@701 311 the working directory. It also lets Mercurial quickly notice
bos@701 312 changed files, by recording their checkout times and
bos@701 313 sizes.</para>
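<para>The idea of spotting changed files from recorded times and sizes
can be sketched as follows. The functions <literal>record</literal> and
<literal>possibly_changed</literal> are hypothetical names, and this
simplification only produces candidates: a file whose size or
modification time differs would still need its contents compared to be
sure.</para>

```python
import os
import tempfile


def record(paths):
    """Remember the size and modification time of each file at checkout."""
    state = {}
    for path in paths:
        st = os.stat(path)
        state[path] = (st.st_size, st.st_mtime_ns)
    return state


def possibly_changed(state):
    """Files whose current size or mtime no longer matches the record."""
    changed = []
    for path, recorded in state.items():
        st = os.stat(path)
        if (st.st_size, st.st_mtime_ns) != recorded:
            changed.append(path)
    return changed


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        a = os.path.join(tmp, "a.txt")
        b = os.path.join(tmp, "b.txt")
        for path in (a, b):
            with open(path, "w") as f:
                f.write("original\n")
        state = record([a, b])
        with open(a, "a") as f:
            f.write("edited\n")  # the size of a.txt changes
        return [os.path.basename(p) for p in possibly_changed(state)]
```

<para>The point of the design is that one cheap <literal>stat</literal>
call per file is usually enough; file contents only need to be read for
the few files that look different.</para>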
bos@559 314
bos@584 315 <para id="x_30a">Just as a revision of a revlog has room for two parents, so
bos@559 316 that it can represent either a normal revision (with one parent)
bos@559 317 or a merge of two earlier revisions, the dirstate has slots for
bos@559 318 two parents. When you use the <command role="hg-cmd">hg
bos@559 319 update</command> command, the changeset that you update to is
bos@559 320 stored in the <quote>first parent</quote> slot, and the null ID
bos@559 321 in the second. When you <command role="hg-cmd">hg
bos@559 322 merge</command> with another changeset, the first parent
bos@559 323 remains unchanged, and the second parent is filled in with the
bos@559 324 changeset you're merging with. The <command role="hg-cmd">hg
bos@559 325 parents</command> command tells you what the parents of the
bos@559 326 dirstate are.</para>
bos@559 327
bos@559 328 <sect2>
bos@559 329 <title>What happens when you commit</title>
bos@559 330
bos@584 331 <para id="x_30b">The dirstate stores parent information for more than just
bos@559 332 book-keeping purposes. Mercurial uses the parents of the
bos@559 333 dirstate as <emphasis>the parents of a new
bos@559 334 changeset</emphasis> when you perform a commit.</para>
bos@559 335
bos@591 336 <figure id="fig:concepts:wdir">
bos@591 337 <title>The working directory can have two parents</title>
bos@591 338 <mediaobject>
bos@594 339 <imageobject><imagedata fileref="figs/wdir.png"/></imageobject>
bos@591 340 <textobject><phrase>XXX add text</phrase></textobject>
bos@591 341 </mediaobject>
bos@591 342 </figure>
bos@559 343
bos@592 344 <para id="x_30d"><xref linkend="fig:concepts:wdir"/> shows the
bos@559 345 normal state of the working directory, where it has a single
bos@559 346 changeset as parent. That changeset is the
bos@559 347 <emphasis>tip</emphasis>, the newest changeset in the
bos@559 348 repository that has no children.</para>
bos@559 349
bos@591 350 <figure id="fig:concepts:wdir-after-commit">
bos@591 351 <title>The working directory gains new parents after a
bos@591 352 commit</title>
bos@591 353 <mediaobject>
bos@594 354 <imageobject><imagedata fileref="figs/wdir-after-commit.png"/></imageobject>
bos@591 355 <textobject><phrase>XXX add text</phrase></textobject>
bos@591 356 </mediaobject>
bos@591 357 </figure>
bos@559 358
bos@584 359 <para id="x_30f">It's useful to think of the working directory as
bos@559 360 <quote>the changeset I'm about to commit</quote>. Any files
bos@559 361 that you tell Mercurial that you've added, removed, renamed,
bos@559 362 or copied will be reflected in that changeset, as will
bos@559 363 modifications to any files that Mercurial is already tracking;
bos@559 364 the new changeset will have the parents of the working
bos@559 365 directory as its parents.</para>
bos@559 366
bos@592 367 <para id="x_310">After a commit, Mercurial will update the
bos@592 368 parents of the working directory, so that the first parent is
bos@592 369 the ID of the new changeset, and the second is the null ID.
bos@592 370 This is shown in <xref
bos@592 371 linkend="fig:concepts:wdir-after-commit"/>. Mercurial
bos@559 372 doesn't touch any of the files in the working directory when
bos@559 373 you commit; it just modifies the dirstate to note its new
bos@559 374 parents.</para>
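<para>The bookkeeping described here fits in a small model. The class
<literal>WorkingDirectory</literal> and the changeset names
<literal>c1</literal> to <literal>c4</literal> are hypothetical; the
sketch tracks only the two parent slots and ignores file contents
entirely.</para>

```python
NULL_ID = "null"


class WorkingDirectory:
    def __init__(self):
        self.parents = (NULL_ID, NULL_ID)

    def update(self, changeset):
        # "hg update": first parent is the target, second is the null ID.
        self.parents = (changeset, NULL_ID)

    def merge(self, other):
        # "hg merge": first parent unchanged, second parent filled in.
        self.parents = (self.parents[0], other)

    def commit(self, new_id, history):
        # The new changeset takes the parents of the working directory...
        history[new_id] = self.parents
        # ...and then becomes the sole parent of the working directory.
        self.parents = (new_id, NULL_ID)


history = {"c1": (NULL_ID, NULL_ID)}
wd = WorkingDirectory()
wd.update("c1")
wd.commit("c2", history)  # ordinary commit on top of c1
wd.update("c1")
wd.commit("c3", history)  # second child of c1
wd.merge("c2")
wd.commit("c4", history)  # merge changeset with two parents
```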
bos@559 375
bos@559 376 </sect2>
bos@559 377 <sect2>
bos@559 378 <title>Creating a new head</title>
bos@559 379
bos@584 380 <para id="x_311">It's perfectly normal to update the working directory to a
bos@559 381 changeset other than the current tip. For example, you might
bos@559 382 want to know what your project looked like last Tuesday, or
bos@559 383 you could be looking through changesets to see which one
bos@559 384 introduced a bug. In cases like this, the natural thing to do
bos@559 385 is update the working directory to the changeset you're
bos@559 386 interested in, and then examine the files in the working
bos@559 387 directory directly to see their contents as they were when you
bos@559 388 committed that changeset. The effect of this is shown in
bos@592 389 <xref linkend="fig:concepts:wdir-pre-branch"/>.</para>
bos@559 390
bos@591 391 <figure id="fig:concepts:wdir-pre-branch">
bos@591 392 <title>The working directory, updated to an older
bos@591 393 changeset</title>
bos@591 394 <mediaobject>
bos@594 395 <imageobject><imagedata fileref="figs/wdir-pre-branch.png"/></imageobject>
bos@591 396 <textobject><phrase>XXX add text</phrase></textobject>
bos@591 397 </mediaobject>
bos@591 398 </figure>
bos@559 399
bos@592 400 <para id="x_313">Having updated the working directory to an
bos@592 401 older changeset, what happens if you make some changes, and
bos@592 402 then commit? Mercurial behaves in the same way as I outlined
bos@559 403 above. The parents of the working directory become the
bos@559 404 parents of the new changeset. This new changeset has no
bos@559 405 children, so it becomes the new tip. And the repository now
bos@559 406 contains two changesets that have no children; we call these
bos@559 407 <emphasis>heads</emphasis>. You can see the structure that
bos@592 408 this creates in <xref
bos@559 409 linkend="fig:concepts:wdir-branch"/>.</para>
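<para>Since a head is just a changeset without children, heads can be
computed from the parent pointers alone. This is a sketch with small
integers standing in for changeset IDs; the function name
<literal>heads</literal> and the sample history are my own.</para>

```python
NULL = -1  # stands in for the null ID


def heads(parents):
    """Return the revisions that no other revision names as a parent."""
    has_child = set()
    for p1, p2 in parents.values():
        has_child.update(p for p in (p1, p2) if p != NULL)
    return sorted(rev for rev in parents if rev not in has_child)


history = {0: (NULL, NULL), 1: (0, NULL), 2: (1, NULL)}
before = heads(history)        # a single head: the tip

history[3] = (1, NULL)         # commit made after updating back to 1
after_commit = heads(history)  # now two heads

history[4] = (3, 2)            # merging the heads
after_merge = heads(history)   # back to a single head
```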
bos@559 410
bos@591 411 <figure id="fig:concepts:wdir-branch">
bos@591 412 <title>After a commit made while synced to an older
bos@591 413 changeset</title>
bos@591 414 <mediaobject>
bos@594 415 <imageobject><imagedata fileref="figs/wdir-branch.png"/></imageobject>
bos@591 416 <textobject><phrase>XXX add text</phrase></textobject>
bos@591 417 </mediaobject>
bos@591 418 </figure>
bos@559 419
bos@559 420 <note>
bos@701 421 <para id="x_315">If you're new to Mercurial, you should keep
bos@701 422 in mind a common <quote>error</quote>, which is to use the
bos@701 423 <command role="hg-cmd">hg pull</command> command without any
bos@559 424 options. By default, the <command role="hg-cmd">hg
bos@559 425 pull</command> command <emphasis>does not</emphasis>
bos@559 426 update the working directory, so you'll bring new changesets
bos@559 427 into your repository, but the working directory will stay
bos@559 428 synced at the same changeset as before the pull. If you
bos@559 429 make some changes and commit afterwards, you'll thus create
bos@559 430 a new head, because your working directory isn't synced to
bos@701 431 whatever the current tip is. To combine the operation of a
bos@701 432 pull, followed by an update, run <command>hg pull
bos@701 433 -u</command>.</para>
bos@701 434
bos@701 435 <para id="x_316">I put the word <quote>error</quote> in quotes
bos@701 436 because all that you need to do to rectify the situation
bos@701 437 where you created a new head by accident is
bos@701 438 <command role="hg-cmd">hg merge</command>, then <command
bos@701 439 role="hg-cmd">hg commit</command>. In other words, this
bos@701 440 almost never has negative consequences; it's just something
bos@701 441 of a surprise for newcomers. I'll discuss other ways to
bos@701 442 avoid this behavior, and why Mercurial behaves in this
bos@701 443 initially surprising way, later on.</para>
bos@559 444 </note>
bos@559 445
bos@559 446 </sect2>
bos@559 447 <sect2>
bos@620 448 <title>Merging changes</title>
bos@559 449
bos@592 450 <para id="x_317">When you run the <command role="hg-cmd">hg
bos@592 451 merge</command> command, Mercurial leaves the first parent
bos@592 452 of the working directory unchanged, and sets the second parent
bos@592 453 to the changeset you're merging with, as shown in <xref
bos@559 454 linkend="fig:concepts:wdir-merge"/>.</para>
bos@559 455
bos@591 456 <figure id="fig:concepts:wdir-merge">
bos@591 457 <title>Merging two heads</title>
bos@591 458 <mediaobject>
bos@591 459 <imageobject>
bos@594 460 <imagedata fileref="figs/wdir-merge.png"/>
bos@591 461 </imageobject>
bos@591 462 <textobject><phrase>XXX add text</phrase></textobject>
bos@591 463 </mediaobject>
bos@591 464 </figure>
bos@559 465
bos@584 466 <para id="x_319">Mercurial also has to modify the working directory, to
bos@559 467 merge the files managed in the two changesets. Simplified a
bos@559 468 little, the merging process goes like this, for every file in
bos@559 469 the manifests of both changesets:</para>
bos@559 470 <itemizedlist>
bos@584 471 <listitem><para id="x_31a">If neither changeset has modified a file, do
bos@559 472 nothing with that file.</para>
bos@559 473 </listitem>
bos@584 474 <listitem><para id="x_31b">If one changeset has modified a file, and the
bos@559 475 other hasn't, create the modified copy of the file in the
bos@559 476 working directory.</para>
bos@559 477 </listitem>
bos@584 478 <listitem><para id="x_31c">If one changeset has removed a file, and the
bos@559 479 other hasn't (or has also deleted it), delete the file
bos@559 480 from the working directory.</para>
bos@559 481 </listitem>
bos@584 482 <listitem><para id="x_31d">If one changeset has removed a file, but the
bos@559 483 other has modified the file, ask the user what to do: keep
bos@559 484 the modified file, or remove it?</para>
bos@559 485 </listitem>
bos@584 486 <listitem><para id="x_31e">If both changesets have modified a file,
bos@559 487 invoke an external merge program to choose the new
bos@559 488 contents for the merged file. This may require input from
bos@559 489 the user.</para>
bos@559 490 </listitem>
bos@584 491 <listitem><para id="x_31f">If one changeset has modified a file, and the
bos@559 492 other has renamed or copied the file, make sure that the
bos@559 493 changes follow the new name of the file.</para>
bos@559 494 </listitem></itemizedlist>
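<para>The first five rules in this list can be written as a small
decision function. This is a simplified sketch, not Mercurial's merge
code: <literal>merge_action</literal> is a hypothetical name, I assume
that <quote>modified</quote> means <quote>differs from the common
ancestor</quote>, a value of <literal>None</literal> means the file is
absent, and renames and copies are left out.</para>

```python
def merge_action(base, local, other):
    """Decide what to do with one file.

    base, local, other: file contents in the common ancestor and in the two
    changesets being merged, or None where the file does not exist.
    """
    local_changed = local != base
    other_changed = other != base
    removed = base is not None and (local is None or other is None)

    if not local_changed and not other_changed:
        return "do nothing"
    if removed:
        survivor = other if local is None else local
        if survivor is None or survivor == base:
            return "delete"    # removed on one side, untouched or removed on the other
        return "ask user"      # removed on one side, modified on the other
    if local_changed and other_changed:
        return "run merge tool"
    return "take modified copy"
```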
bos@584 495 <para id="x_320">There are more details&emdash;merging has plenty of corner
bos@559 496 cases&emdash;but these are the most common choices that are
bos@559 497 involved in a merge. As you can see, most cases are
bos@559 498 completely automatic, and indeed most merges finish
bos@559 499 automatically, without requiring your input to resolve any
bos@559 500 conflicts.</para>
bos@559 501
bos@584 502 <para id="x_321">When you're thinking about what happens when you commit
bos@559 503 after a merge, once again the working directory is <quote>the
bos@559 504 changeset I'm about to commit</quote>. After the <command
bos@559 505 role="hg-cmd">hg merge</command> command completes, the
bos@559 506 working directory has two parents; these will become the
bos@559 507 parents of the new changeset.</para>
bos@559 508
bos@701 509 <para id="x_322">Mercurial lets you perform multiple merges, but
bos@701 510 you must commit the results of each individual merge as you
bos@701 511 go. This is necessary because Mercurial only tracks two
bos@701 512 parents for both revisions and the working directory. While
bos@701 513 it would be technically feasible to merge multiple changesets
bos@701 514 at once, Mercurial avoids this for simplicity. With multi-way
bos@701 515 merges, the risks of user confusion, nasty conflict
bos@701 516 resolution, and making a terrible mess of a merge would grow
bos@701 517 intolerable.</para>
bos@559 518
bos@559 519 </sect2>
bos@620 520
bos@620 521 <sect2>
bos@620 522 <title>Merging and renames</title>
bos@620 523
bos@676 524 <para id="x_69a">A surprising number of revision control systems pay little
bos@620 525 or no attention to a file's <emphasis>name</emphasis> over
bos@620 526 time. For instance, it used to be common that if a file got
bos@620 527 renamed on one side of a merge, the changes from the other
bos@620 528 side would be silently dropped.</para>
bos@620 529
bos@676 530 <para id="x_69b">Mercurial records metadata when you tell it to perform a
bos@620 531 rename or copy. It uses this metadata during a merge to do the
bos@620 532 right thing in the case of a merge. For instance, if I rename
bos@620 533 a file, and you edit it without renaming it, when we merge our
bos@620 534 work the file will be renamed and have your edits
bos@620 535 applied.</para>
bos@620 536 </sect2>
bos@559 537 </sect1>
bos@620 538
bos@559 539 <sect1>
bos@559 540 <title>Other interesting design features</title>
bos@559 541
bos@584 542 <para id="x_323">In the sections above, I've tried to highlight some of the
bos@559 543 most important aspects of Mercurial's design, to illustrate that
bos@559 544 it pays careful attention to reliability and performance.
bos@559 545 However, the attention to detail doesn't stop there. There are
bos@559 546 a number of other aspects of Mercurial's construction that I
bos@559 547 personally find interesting. I'll detail a few of them here,
bos@559 548 separate from the <quote>big ticket</quote> items above, so that
bos@559 549 if you're interested, you can gain a better idea of the amount
bos@559 550 of thinking that goes into a well-designed system.</para>
bos@559 551
bos@559 552 <sect2>
bos@559 553 <title>Clever compression</title>
bos@559 554
bos@584 555 <para id="x_324">When appropriate, Mercurial will store both snapshots and
bos@559 556 deltas in compressed form. It does this by always
bos@559 557 <emphasis>trying to</emphasis> compress a snapshot or delta,
bos@559 558 but only storing the compressed version if it's smaller than
bos@559 559 the uncompressed version.</para>
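<para>The <quote>try, then keep the smaller one</quote> rule looks
roughly like this in Python. The one-byte tags <literal>z</literal> and
<literal>u</literal> and the function names are assumptions made for the
sketch; they are not meant to describe Mercurial's actual on-disk
format.</para>

```python
import os
import zlib


def encode(data: bytes) -> bytes:
    """Always try to compress, but keep the result only if it is smaller."""
    packed = zlib.compress(data)
    if len(data) > len(packed):
        return b"z" + packed  # compressed form won
    return b"u" + data        # store the data as-is


def decode(chunk: bytes) -> bytes:
    tag, body = chunk[:1], chunk[1:]
    return zlib.decompress(body) if tag == b"z" else body


text = b"the same line over and over\n" * 100
noise = os.urandom(1000)  # stands in for already-compressed data
```

<para>Repetitive text ends up stored compressed, while data that is
already dense (simulated here with random bytes) is stored unchanged
because compressing it again would only make it larger.</para>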
bos@559 560
bos@584 561 <para id="x_325">This means that Mercurial does <quote>the right
bos@559 562 thing</quote> when storing a file whose native form is
bos@559 563 compressed, such as a <literal>zip</literal> archive or a JPEG
bos@559 564 image. When these types of files are compressed a second
bos@559 565 time, the resulting file is usually bigger than the
bos@559 566 once-compressed form, and so Mercurial will store the plain
bos@559 567 <literal>zip</literal> or JPEG.</para>
bos@559 568
bos@584 569 <para id="x_326">Deltas between revisions of a compressed file are usually
bos@559 570 larger than snapshots of the file, and Mercurial again does
bos@559 571 <quote>the right thing</quote> in these cases. It finds that
bos@559 572 such a delta exceeds the threshold at which it should store a
bos@559 573 complete snapshot of the file, so it stores the snapshot,
bos@559 574 again saving space compared to a naive delta-only
bos@559 575 approach.</para>
bos@559 576
bos@559 577 <sect3>
bos@559 578 <title>Network recompression</title>
bos@559 579
bos@584 580 <para id="x_327">When storing revisions on disk, Mercurial uses the
bos@559 581 <quote>deflate</quote> compression algorithm (the same one
bos@559 582 used by the popular <literal>zip</literal> archive format),
bos@559 583 which balances good speed with a respectable compression
bos@559 584 ratio. However, when transmitting revision data over a
bos@559 585 network connection, Mercurial first uncompresses the stored
bos@559 586 revision data.</para>
bos@559 587
bos@584 588 <para id="x_328">If the connection is over HTTP, Mercurial recompresses
bos@559 589 the entire stream of data using a compression algorithm that
bos@559 590 gives a better compression ratio (the Burrows-Wheeler
bos@559 591 algorithm from the widely used <literal>bzip2</literal>
bos@559 592 compression package). This combination of algorithm and
bos@559 593 compression of the entire stream (instead of a revision at a
bos@559 594 time) substantially reduces the number of bytes to be
bos@620 595 transferred, yielding better network performance over most
bos@620 596 kinds of network.</para>
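
<para>The difference between per-revision and whole-stream compression
can be sketched in a few lines of Python. This is only a rough model: the
function names are hypothetical, the revisions are simply concatenated,
and the framing that a real wire protocol needs is left out.</para>

<programlisting><![CDATA[
```python
import bz2
import zlib


def stored_size(revisions):
    """Total bytes when every revision is deflated on its own, as on disk."""
    return sum(len(zlib.compress(rev)) for rev in revisions)


def wire_bytes(revisions):
    """Compress the uncompressed revisions as one single bzip2 stream."""
    return bz2.compress(b"".join(revisions))
```
]]></programlisting>

<para>On many small, similar revisions, the single stream tends to come
out much smaller than the sum of the individually deflated pieces. Each
small piece pays its own overhead and cannot share redundancy with its
neighbours.</para>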
bos@559 597
bos@701 598 <para id="x_329">If the connection is over
bos@701 599 <command>ssh</command>, Mercurial
bos@701 600 <emphasis>doesn't</emphasis> recompress the stream, because
bos@701 601 <command>ssh</command> can already do this itself. You can
bos@701 602 tell Mercurial to always use <command>ssh</command>'s
bos@701 603 compression feature by editing the
bos@701 604 <filename>.hgrc</filename> file in your home directory as
bos@701 605 follows.</para>
bos@701 606
bos@701 607 <programlisting>[ui]
bos@701 608 ssh = ssh -C</programlisting>
bos@559 609
bos@559 610 </sect3>
bos@559 611 </sect2>
bos@559 612 <sect2>
bos@559 613 <title>Read/write ordering and atomicity</title>
bos@559 614
bos@592 615 <para id="x_32a">Appending to files isn't the whole story when
bos@592 616 it comes to guaranteeing that a reader won't see a partial
bos@592 617 write. If you recall <xref linkend="fig:concepts:metadata"/>,
bos@701 618 revisions in the changelog point to revisions in the manifest,
bos@701 619 and revisions in the manifest point to revisions in filelogs.
bos@592 620 This hierarchy is deliberate.</para>
bos@559 621
bos@584 622 <para id="x_32b">A writer starts a transaction by writing filelog and
bos@559 623 manifest data, and doesn't write any changelog data until
bos@559 624 those are finished. A reader starts by reading changelog
bos@559 625 data, then manifest data, followed by filelog data.</para>
bos@559 626
bos@584 627 <para id="x_32c">Since the writer has always finished writing filelog and
bos@559 628 manifest data before it writes to the changelog, a reader will
bos@559 629 never read a pointer to a partially written manifest revision
bos@559 630 from the changelog, and it will never read a pointer to a
bos@559 631 partially written filelog revision from the manifest.</para>
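
<para>The ordering argument can be modelled with three in-memory lists.
This is a toy sketch, not Mercurial's storage code: the
<literal>Store</literal> class and its methods are hypothetical, and each
<quote>pointer</quote> is just a list index. The writer pauses after each
write so that a reader can be run in between.</para>

<programlisting><![CDATA[
```python
class Store:
    """Three append-only logs whose entries only ever point downward."""

    def __init__(self):
        self.filelog = []    # file contents
        self.manifest = []   # each entry is an index into filelog
        self.changelog = []  # each entry is an index into manifest

    def commit_steps(self, content):
        """Write filelog, then manifest, then changelog, pausing after each."""
        self.filelog.append(content)
        yield "filelog"
        self.manifest.append(len(self.filelog) - 1)
        yield "manifest"
        self.changelog.append(len(self.manifest) - 1)
        yield "changelog"

    def read_tip(self):
        """Read in the opposite order: changelog, manifest, filelog."""
        if not self.changelog:
            return None
        manifest_rev = self.changelog[-1]
        file_rev = self.manifest[manifest_rev]
        return self.filelog[file_rev]
```
]]></programlisting>

<para>A reader that runs between any two steps of a commit either sees
the previous revision or the complete new one. Until the changelog entry
is appended, nothing that the reader starts from refers to the new
data.</para>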
bos@559 632
bos@559 633 </sect2>
bos@559 634 <sect2>
bos@559 635 <title>Concurrent access</title>
bos@559 636
bos@584 637 <para id="x_32d">The read/write ordering and atomicity guarantees mean that
bos@559 638 Mercurial never needs to <emphasis>lock</emphasis> a
bos@559 639 repository when it's reading data, even if the repository is
bos@559 640 being written to while the read is occurring. This has a big
bos@559 641 effect on scalability; you can have an arbitrary number of
bos@559 642 Mercurial processes safely reading data from a repository
bos@701 643 all at once, no matter whether it's being written to or
bos@559 644 not.</para>
bos@559 645
bos@584 646 <para id="x_32e">The lockless nature of reading means that if you're
bos@559 647 sharing a repository on a multi-user system, you don't need to
bos@559 648 grant other local users permission to
bos@559 649 <emphasis>write</emphasis> to your repository in order for
bos@559 650 them to be able to clone it or pull changes from it; they only
bos@559 651 need <emphasis>read</emphasis> permission. (This is
bos@559 652 <emphasis>not</emphasis> a common feature among revision
bos@559 653 control systems, so don't take it for granted! Most require
bos@559 654 readers to be able to lock a repository to access it safely,
bos@559 655 and this requires write permission on at least one directory,
bos@559 656 which of course makes for all kinds of nasty and annoying
bos@559 657 security and administrative problems.)</para>
bos@559 658
bos@584 659 <para id="x_32f">Mercurial uses locks to ensure that only one process can
bos@559 660 write to a repository at a time (the locking mechanism is safe
bos@559 661 even over filesystems that are notoriously hostile to locking,
bos@559 662 such as NFS). If a repository is locked, a writer will wait
bos@559 663 and retry in case the repository becomes unlocked, but
bos@559 664 if the repository remains locked for too long, the process
bos@559 665 attempting to write will time out and give up. This means
bos@559 666 that your daily automated scripts won't get stuck forever and
bos@559 667 pile up if a system crashes unnoticed, for example. (Yes, the
bos@559 668 timeout is configurable, from zero to infinity.)</para>
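
<para>The retry-then-give-up behaviour looks roughly like the following
Python sketch. It is an assumption-laden illustration: the names
<literal>acquire_lock</literal>, <literal>release_lock</literal> and
<literal>LockTimeout</literal> are hypothetical, and exclusive file
creation is used only because it is simple. It does not reproduce
whatever Mercurial does to stay safe on filesystems such as NFS.</para>

<programlisting><![CDATA[
```python
import os
import time


class LockTimeout(Exception):
    """Raised when the lock could not be taken before the deadline."""


def acquire_lock(path, timeout=5.0, poll=0.1):
    """Create the lock file exclusively, retrying until timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            return
        except FileExistsError:
            # Someone else holds the lock: give up once the deadline passes.
            if time.monotonic() >= deadline:
                raise LockTimeout(path)
            time.sleep(poll)


def release_lock(path):
    """Drop the lock so that a waiting writer can proceed."""
    os.unlink(path)
```
]]></programlisting>

<para>A second writer calling <literal>acquire_lock</literal> on a held
lock keeps polling, then raises <literal>LockTimeout</literal> instead of
waiting forever, which is the property that keeps unattended scripts from
piling up.</para>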
bos@559 669
bos@559 670 <sect3>
bos@559 671 <title>Safe dirstate access</title>
bos@559 672
bos@584 673 <para id="x_330">As with revision data, Mercurial doesn't take a lock to
bos@559 674 read the dirstate file; it does acquire a lock to write it.
bos@559 675 To avoid the possibility of reading a partially written copy
bos@559 676 of the dirstate file, Mercurial writes to a file with a
bos@559 677 unique name in the same directory as the dirstate file, then
bos@559 678 renames the temporary file atomically to
bos@559 679 <filename>dirstate</filename>. The file named
bos@559 680 <filename>dirstate</filename> is thus guaranteed to be
bos@559 681 complete, not partially written.</para>
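
<para>The write-then-rename technique described above can be sketched in
Python as follows. The function name is hypothetical and the details
(temporary file prefix, error handling) are my own choices rather than
Mercurial's; the sketch relies on a rename within one directory being
atomic, as it is on POSIX filesystems.</para>

<programlisting><![CDATA[
```python
import os
import tempfile


def write_atomically(path, data: bytes):
    """Write data to a uniquely named sibling file, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        # Readers see either the old file or the new one, never a mixture.
        os.replace(tmp, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong before the rename.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```
]]></programlisting>

<para>Creating the temporary file in the same directory matters: a rename
across filesystems is not atomic, so the temporary name and the final
name must live side by side.</para>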
bos@559 682
bos@559 683 </sect3>
bos@559 684 </sect2>
bos@559 685 <sect2>
bos@559 686 <title>Avoiding seeks</title>
bos@559 687
bos@584 688 <para id="x_331">Critical to Mercurial's performance is the avoidance of
bos@559 689 seeks of the disk head, since any seek is far more expensive
bos@559 690 than even a comparatively large read operation.</para>
bos@559 691
bos@584 692 <para id="x_332">This is why, for example, the dirstate is stored in a
bos@559 693 single file. If there were a dirstate file per directory that
bos@559 694 Mercurial tracked, the disk would seek once per directory.
bos@559 695 Instead, Mercurial reads the entire single dirstate file in
bos@559 696 one step.</para>
bos@559 697
bos@584 698 <para id="x_333">Mercurial also uses a <quote>copy on write</quote> scheme
bos@559 699 when cloning a repository on local storage. Instead of
bos@559 700 copying every revlog file from the old repository into the new
bos@559 701 repository, it makes a <quote>hard link</quote>, which is a
bos@559 702 shorthand way to say <quote>these two names point to the same
bos@559 703 file</quote>. When Mercurial is about to write to one of a
bos@559 704 revlog's files, it checks to see if the number of names
bos@559 705 pointing at the file is greater than one. If it is, more than
bos@559 706 one repository is using the file, so Mercurial makes a new
bos@559 707 copy of the file that is private to this repository.</para>
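
<para>Here is a small Python sketch of the hard link check, assuming a
filesystem that supports hard links. The function names and the
<literal>.private</literal> suffix are hypothetical; the point is only to
show the link count test followed by a private copy.</para>

<programlisting><![CDATA[
```python
import os
import shutil


def clone_file(src, dst):
    """'Clone' a file by giving the same data a second name."""
    os.link(src, dst)


def open_for_append(path):
    """Make a private copy first if another name still shares this file."""
    if os.stat(path).st_nlink > 1:
        private = path + ".private"  # hypothetical temporary name
        shutil.copyfile(path, private)
        # Swap the copy in under the old name; the other name keeps the original.
        os.replace(private, path)
    return open(path, "ab")
```
]]></programlisting>

<para>After a write through <literal>open_for_append</literal>, the two
names no longer share data: the clone holds the appended bytes, and the
original file is untouched with a link count of one again.</para>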
bos@559 708
bos@584 709 <para id="x_334">A few revision control developers have pointed out that
bos@559 710 this idea of making a complete private copy of a file is not
bos@559 711 very efficient in its use of storage. While this is true,
bos@559 712 storage is cheap, and this method gives the highest
bos@559 713 performance while deferring most book-keeping to the operating
bos@559 714 system. An alternative scheme would most likely reduce
bos@701 715 performance and increase the complexity of the software, and
bos@701 716 speed and simplicity are key to the <quote>feel</quote> of
bos@559 717 day-to-day use.</para>
bos@559 718
bos@559 719 </sect2>
bos@559 720 <sect2>
bos@559 721 <title>Other contents of the dirstate</title>
bos@559 722
bos@584 723 <para id="x_335">Because Mercurial doesn't force you to tell it when you're
bos@559 724 modifying a file, it uses the dirstate to store some extra
bos@559 725 information so it can determine efficiently whether you have
bos@559 726 modified a file. For each file in the working directory, it
bos@559 727 stores the time that it last modified the file itself, and the
bos@559 728 size of the file at that time.</para>
bos@559 729
bos@584 730 <para id="x_336">When you explicitly <command role="hg-cmd">hg
bos@559 731 add</command>, <command role="hg-cmd">hg remove</command>,
bos@559 732 <command role="hg-cmd">hg rename</command> or <command
bos@559 733 role="hg-cmd">hg copy</command> files, Mercurial updates the
bos@559 734 dirstate so that it knows what to do with those files when you
bos@559 735 commit.</para>
bos@559 736
bos@701 737 <para id="x_337">The dirstate helps Mercurial to efficiently
bos@701 738 check the status of files in a repository.</para>
bos@701 739
bos@701 740 <itemizedlist>
bos@701 741 <listitem>
bos@702 742 <para id="x_726">When Mercurial checks the state of a file in the
bos@701 743 working directory, it first checks a file's modification
bos@701 744 time against the time in the dirstate that records when
bos@701 745 Mercurial last wrote the file. If the last modified time
bos@701 746 is the same as the time when Mercurial wrote the file, the
bos@701 747 file is taken to be unmodified, so Mercurial does not
bos@701 748 need to check any further.</para>
bos@701 749 </listitem>
bos@701 750 <listitem>
bos@702 751 <para id="x_727">If the file's size has changed, the file must have
bos@701 752 been modified. If the modification time has changed, but
bos@701 753 the size has not, only then does Mercurial need to
bos@701 754 actually read the contents of the file to see if it has
bos@701 755 changed.</para>
bos@701 756 </listitem>
bos@701 757 </itemizedlist>
bos@701 758
bos@702 759 <para id="x_728">Storing the modification time and size dramatically
bos@701 760 reduces the number of read operations that Mercurial needs to
bos@701 761 perform when you run commands like <command>hg status</command>.
bos@701 762 This results in large performance improvements.</para>
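
<para>The checks in the list above can be written down as a short Python
sketch. It is a simplification under stated assumptions: the function
names and the dictionary layout are hypothetical, and the known-clean
contents are passed in as an argument, whereas Mercurial would get them
from its stored history.</para>

<programlisting><![CDATA[
```python
import os


def record(path):
    """Remember size and modification time, as a dirstate entry might."""
    st = os.stat(path)
    return {"size": st.st_size, "mtime": st.st_mtime_ns}


def is_modified(path, entry, clean_content: bytes):
    """Apply the cheap checks first and read the file only as a last resort."""
    st = os.stat(path)
    if st.st_mtime_ns == entry["mtime"]:
        return False  # same timestamp: no need to look further
    if st.st_size != entry["size"]:
        return True   # size changed: certainly modified
    # Timestamp changed but size did not: only now compare the contents.
    with open(path, "rb") as f:
        return f.read() != clean_content
```
]]></programlisting>

<para>Only the last branch touches the file's data. In the common case,
where nothing has changed, a single <literal>stat</literal> call per file
is enough.</para>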
bos@559 763 </sect2>
bos@559 764 </sect1>
belaran@964 765 </chapter>
belaran@964 766
belaran@964 767 <!--
belaran@964 768 local variables:
belaran@964 769 sgml-parent-document: ("00book.xml" "book" "chapter")
belaran@964 770 end:
bos@559 771 -->