hgbook
changeset 992:8b0f1e2984d0
French translation : sync with original ch04-concepts
author | Frédéric Bouquet <youshe.jaalon@gmail.com> |
---|---|
date | Fri Sep 11 14:30:20 2009 +0200 (2009-09-11) |
parents | b4ff7b04efdc |
children | 71dbda516572 |
files | fr/ch04-concepts.xml |
line diff
1.1 --- a/fr/ch04-concepts.xml Thu Sep 10 14:45:17 2009 +0200 1.2 +++ b/fr/ch04-concepts.xml Fri Sep 11 14:30:20 2009 +0200 1.3 @@ -1,710 +1,778 @@ 1.4 <!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : --> 1.5 1.6 -<chapter> 1.7 -<title>Behind the scenes</title> 1.8 -<para>\label{chap:concepts}</para> 1.9 - 1.10 -<para>Unlike many revision control systems, the concepts upon which 1.11 -Mercurial is built are simple enough that it's easy to understand how 1.12 -the software really works. Knowing this certainly isn't necessary, 1.13 -but I find it useful to have a <quote>mental model</quote> of what's going on.</para> 1.14 - 1.15 -<para>This understanding gives me confidence that Mercurial has been 1.16 -carefully designed to be both <emphasis>safe</emphasis> and <emphasis>efficient</emphasis>. And 1.17 -just as importantly, if it's easy for me to retain a good idea of what 1.18 -the software is doing when I perform a revision control task, I'm less 1.19 -likely to be surprised by its behaviour.</para> 1.20 - 1.21 -<para>In this chapter, we'll initially cover the core concepts behind 1.22 -Mercurial's design, then continue to discuss some of the interesting 1.23 -details of its implementation.</para> 1.24 - 1.25 -<sect1> 1.26 -<title>Mercurial's historical record</title> 1.27 - 1.28 -<sect2> 1.29 -<title>Tracking the history of a single file</title> 1.30 - 1.31 -<para>When Mercurial tracks modifications to a file, it stores the history 1.32 -of that file in a metadata object called a <emphasis>filelog</emphasis>. Each entry 1.33 -in the filelog contains enough information to reconstruct one revision 1.34 -of the file that is being tracked. Filelogs are stored as files in 1.35 -the <filename role="special" class="directory">.hg/store/data</filename> directory. 
A filelog contains two kinds 1.36 -of information: revision data, and an index to help Mercurial to find 1.37 -a revision efficiently.</para> 1.38 - 1.39 -<para>A file that is large, or has a lot of history, has its filelog stored 1.40 -in separate data (<quote><literal>.d</literal></quote> suffix) and index (<quote><literal>.i</literal></quote> 1.41 -suffix) files. For small files without much history, the revision 1.42 -data and index are combined in a single <quote><literal>.i</literal></quote> file. The 1.43 -correspondence between a file in the working directory and the filelog 1.44 -that tracks its history in the repository is illustrated in 1.45 -figure <xref linkend="fig:concepts:filelog"/>.</para> 1.46 - 1.47 -<informalfigure> 1.48 - 1.49 -<para> <mediaobject><imageobject><imagedata fileref="filelog"/></imageobject><textobject><phrase>XXX add text</phrase></textobject></mediaobject> 1.50 - \caption{Relationships between files in working directory and 1.51 - filelogs in repository} 1.52 - \label{fig:concepts:filelog}</para> 1.53 -</informalfigure> 1.54 - 1.55 -</sect2> 1.56 -<sect2> 1.57 -<title>Managing tracked files</title> 1.58 - 1.59 -<para>Mercurial uses a structure called a <emphasis>manifest</emphasis> to collect 1.60 -together information about the files that it tracks. Each entry in 1.61 -the manifest contains information about the files present in a single 1.62 -changeset. An entry records which files are present in the changeset, 1.63 -the revision of each file, and a few other pieces of file metadata.</para> 1.64 - 1.65 -</sect2> 1.66 -<sect2> 1.67 -<title>Recording changeset information</title> 1.68 - 1.69 -<para>The <emphasis>changelog</emphasis> contains information about each changeset. Each 1.70 -revision records who committed a change, the changeset comment, other 1.71 -pieces of changeset-related information, and the revision of the 1.72 -manifest to use. 
1.73 -</para> 1.74 - 1.75 -</sect2> 1.76 -<sect2> 1.77 -<title>Relationships between revisions</title> 1.78 - 1.79 -<para>Within a changelog, a manifest, or a filelog, each revision stores a 1.80 -pointer to its immediate parent (or to its two parents, if it's a 1.81 -merge revision). As I mentioned above, there are also relationships 1.82 -between revisions <emphasis>across</emphasis> these structures, and they are 1.83 -hierarchical in nature. 1.84 -</para> 1.85 - 1.86 -<para>For every changeset in a repository, there is exactly one revision 1.87 -stored in the changelog. Each revision of the changelog contains a 1.88 -pointer to a single revision of the manifest. A revision of the 1.89 -manifest stores a pointer to a single revision of each filelog tracked 1.90 -when that changeset was created. These relationships are illustrated 1.91 -in figure <xref linkend="fig:concepts:metadata"/>. 1.92 -</para> 1.93 - 1.94 -<informalfigure> 1.95 - 1.96 -<para> <mediaobject><imageobject><imagedata fileref="metadata"/></imageobject><textobject><phrase>XXX add text</phrase></textobject></mediaobject> 1.97 - <caption><para>Metadata relationships</para></caption> 1.98 - \label{fig:concepts:metadata} 1.99 -</para> 1.100 -</informalfigure> 1.101 - 1.102 -<para>As the illustration shows, there is <emphasis>not</emphasis> a <quote>one to one</quote> 1.103 -relationship between revisions in the changelog, manifest, or filelog. 1.104 -If the manifest hasn't changed between two changesets, the changelog 1.105 -entries for those changesets will point to the same revision of the 1.106 -manifest. If a file that Mercurial tracks hasn't changed between two 1.107 -changesets, the entry for that file in the two revisions of the 1.108 -manifest will point to the same revision of its filelog. 
1.109 -</para> 1.110 - 1.111 -</sect2> 1.112 -</sect1> 1.113 -<sect1> 1.114 -<title>Safe, efficient storage</title> 1.115 - 1.116 -<para>The underpinnings of changelogs, manifests, and filelogs are provided 1.117 -by a single structure called the <emphasis>revlog</emphasis>. 1.118 -</para> 1.119 - 1.120 -<sect2> 1.121 -<title>Efficient storage</title> 1.122 - 1.123 -<para>The revlog provides efficient storage of revisions using a 1.124 -<emphasis>delta</emphasis> mechanism. Instead of storing a complete copy of a file 1.125 -for each revision, it stores the changes needed to transform an older 1.126 -revision into the new revision. For many kinds of file data, these 1.127 -deltas are typically a fraction of a percent of the size of a full 1.128 -copy of a file. 1.129 -</para> 1.130 - 1.131 -<para>Some obsolete revision control systems can only work with deltas of 1.132 -text files. They must either store binary files as complete snapshots 1.133 -or encoded into a text representation, both of which are wasteful 1.134 -approaches. Mercurial can efficiently handle deltas of files with 1.135 -arbitrary binary contents; it doesn't need to treat text as special. 1.136 -</para> 1.137 - 1.138 -</sect2> 1.139 -<sect2> 1.140 -<title>Safe operation</title> 1.141 -<para>\label{sec:concepts:txn} 1.142 -</para> 1.143 - 1.144 -<para>Mercurial only ever <emphasis>appends</emphasis> data to the end of a revlog file. 1.145 -It never modifies a section of a file after it has written it. This 1.146 -is both more robust and efficient than schemes that need to modify or 1.147 -rewrite data. 1.148 -</para> 1.149 - 1.150 -<para>In addition, Mercurial treats every write as part of a 1.151 -<emphasis>transaction</emphasis> that can span a number of files. A transaction is 1.152 -<emphasis>atomic</emphasis>: either the entire transaction succeeds and its effects 1.153 -are all visible to readers in one go, or the whole thing is undone. 
1.154 -This guarantee of atomicity means that if you're running two copies of 1.155 -Mercurial, where one is reading data and one is writing it, the reader 1.156 -will never see a partially written result that might confuse it. 1.157 -</para> 1.158 - 1.159 -<para>The fact that Mercurial only appends to files makes it easier to 1.160 -provide this transactional guarantee. The easier it is to do stuff 1.161 -like this, the more confident you should be that it's done correctly. 1.162 -</para> 1.163 - 1.164 -</sect2> 1.165 -<sect2> 1.166 -<title>Fast retrieval</title> 1.167 - 1.168 -<para>Mercurial cleverly avoids a pitfall common to all earlier 1.169 -revision control systems: the problem of <emphasis>inefficient retrieval</emphasis>. 1.170 -Most revision control systems store the contents of a revision as an 1.171 -incremental series of modifications against a <quote>snapshot</quote>. To 1.172 -reconstruct a specific revision, you must first read the snapshot, and 1.173 -then every one of the revisions between the snapshot and your target 1.174 -revision. The more history that a file accumulates, the more 1.175 -revisions you must read, hence the longer it takes to reconstruct a 1.176 -particular revision. 1.177 -</para> 1.178 - 1.179 -<informalfigure> 1.180 - 1.181 -<para> <mediaobject><imageobject><imagedata fileref="snapshot"/></imageobject><textobject><phrase>XXX add text</phrase></textobject></mediaobject> 1.182 - <caption><para>Snapshot of a revlog, with incremental deltas</para></caption> 1.183 - \label{fig:concepts:snapshot} 1.184 -</para> 1.185 -</informalfigure> 1.186 - 1.187 -<para>The innovation that Mercurial applies to this problem is simple but 1.188 -effective. Once the cumulative amount of delta information stored 1.189 -since the last snapshot exceeds a fixed threshold, it stores a new 1.190 -snapshot (compressed, of course), instead of another delta. 
This 1.191 -makes it possible to reconstruct <emphasis>any</emphasis> revision of a file 1.192 -quickly. This approach works so well that it has since been copied by 1.193 -several other revision control systems. 1.194 -</para> 1.195 - 1.196 -<para>Figure <xref linkend="fig:concepts:snapshot"/> illustrates the idea. In an entry 1.197 -in a revlog's index file, Mercurial stores the range of entries from 1.198 -the data file that it must read to reconstruct a particular revision. 1.199 -</para> 1.200 - 1.201 -<sect3> 1.202 -<title>Aside: the influence of video compression</title> 1.203 - 1.204 -<para>If you're familiar with video compression or have ever watched a TV 1.205 -feed through a digital cable or satellite service, you may know that 1.206 -most video compression schemes store each frame of video as a delta 1.207 -against its predecessor frame. In addition, these schemes use 1.208 -<quote>lossy</quote> compression techniques to increase the compression ratio, so 1.209 -visual errors accumulate over the course of a number of inter-frame 1.210 -deltas. 1.211 -</para> 1.212 - 1.213 -<para>Because it's possible for a video stream to <quote>drop out</quote> occasionally 1.214 -due to signal glitches, and to limit the accumulation of artefacts 1.215 -introduced by the lossy compression process, video encoders 1.216 -periodically insert a complete frame (called a <quote>key frame</quote>) into the 1.217 -video stream; the next delta is generated against that frame. This 1.218 -means that if the video signal gets interrupted, it will resume once 1.219 -the next key frame is received. Also, the accumulation of encoding 1.220 -errors restarts anew with each key frame. 1.221 -</para> 1.222 - 1.223 -</sect3> 1.224 -</sect2> 1.225 -<sect2> 1.226 -<title>Identification and strong integrity</title> 1.227 - 1.228 -<para>Along with delta or snapshot information, a revlog entry contains a 1.229 -cryptographic hash of the data that it represents. 
This makes it 1.230 -difficult to forge the contents of a revision, and easy to detect 1.231 -accidental corruption. 1.232 -</para> 1.233 - 1.234 -<para>Hashes provide more than a mere check against corruption; they are 1.235 -used as the identifiers for revisions. The changeset identification 1.236 -hashes that you see as an end user are from revisions of the 1.237 -changelog. Although filelogs and the manifest also use hashes, 1.238 -Mercurial only uses these behind the scenes. 1.239 -</para> 1.240 - 1.241 -<para>Mercurial verifies that hashes are correct when it retrieves file 1.242 -revisions and when it pulls changes from another repository. If it 1.243 -encounters an integrity problem, it will complain and stop whatever 1.244 -it's doing. 1.245 -</para> 1.246 - 1.247 -<para>In addition to the effect it has on retrieval efficiency, Mercurial's 1.248 -use of periodic snapshots makes it more robust against partial data 1.249 -corruption. If a revlog becomes partly corrupted due to a hardware 1.250 -error or system bug, it's often possible to reconstruct some or most 1.251 -revisions from the uncorrupted sections of the revlog, both before and 1.252 -after the corrupted section. This would not be possible with a 1.253 -delta-only storage model. 1.254 -</para> 1.255 - 1.256 -<para>\section{Revision history, branching, 1.257 - and merging} 1.258 -</para> 1.259 - 1.260 -<para>Every entry in a Mercurial revlog knows the identity of its immediate 1.261 -ancestor revision, usually referred to as its <emphasis>parent</emphasis>. In fact, 1.262 -a revision contains room for not one parent, but two. Mercurial uses 1.263 -a special hash, called the <quote>null ID</quote>, to represent the idea <quote>there 1.264 -is no parent here</quote>. This hash is simply a string of zeroes. 1.265 -</para> 1.266 - 1.267 -<para>In figure <xref linkend="fig:concepts:revlog"/>, you can see an example of the 1.268 -conceptual structure of a revlog. 
Filelogs, manifests, and changelogs 1.269 -all have this same structure; they differ only in the kind of data 1.270 -stored in each delta or snapshot. 1.271 -</para> 1.272 - 1.273 -<para>The first revision in a revlog (at the bottom of the image) has the 1.274 -null ID in both of its parent slots. For a <quote>normal</quote> revision, its 1.275 -first parent slot contains the ID of its parent revision, and its 1.276 -second contains the null ID, indicating that the revision has only one 1.277 -real parent. Any two revisions that have the same parent ID are 1.278 -branches. A revision that represents a merge between branches has two 1.279 -normal revision IDs in its parent slots. 1.280 -</para> 1.281 - 1.282 -<informalfigure> 1.283 - 1.284 -<para> <mediaobject><imageobject><imagedata fileref="revlog"/></imageobject><textobject><phrase>XXX add text</phrase></textobject></mediaobject> 1.285 - \caption{} 1.286 - \label{fig:concepts:revlog} 1.287 -</para> 1.288 -</informalfigure> 1.289 - 1.290 -</sect2> 1.291 -</sect1> 1.292 -<sect1> 1.293 -<title>The working directory</title> 1.294 - 1.295 -<para>In the working directory, Mercurial stores a snapshot of the files 1.296 -from the repository as of a particular changeset. 1.297 -</para> 1.298 - 1.299 -<para>The working directory <quote>knows</quote> which changeset it contains. When you 1.300 -update the working directory to contain a particular changeset, 1.301 -Mercurial looks up the appropriate revision of the manifest to find 1.302 -out which files it was tracking at the time that changeset was 1.303 -committed, and which revision of each file was then current. It then 1.304 -recreates a copy of each of those files, with the same contents it had 1.305 -when the changeset was committed. 1.306 -</para> 1.307 - 1.308 -<para>The <emphasis>dirstate</emphasis> contains Mercurial's knowledge of the working 1.309 -directory. 
This details which changeset the working directory is 1.310 -updated to, and all of the files that Mercurial is tracking in the 1.311 -working directory. 1.312 -</para> 1.313 - 1.314 -<para>Just as a revision of a revlog has room for two parents, so that it 1.315 -can represent either a normal revision (with one parent) or a merge of 1.316 -two earlier revisions, the dirstate has slots for two parents. When 1.317 -you use the <command role="hg-cmd">hg update</command> command, the changeset that you update to 1.318 -is stored in the <quote>first parent</quote> slot, and the null ID in the second. 1.319 -When you <command role="hg-cmd">hg merge</command> with another changeset, the first parent 1.320 -remains unchanged, and the second parent is filled in with the 1.321 -changeset you're merging with. The <command role="hg-cmd">hg parents</command> command tells you 1.322 -what the parents of the dirstate are. 1.323 -</para> 1.324 - 1.325 -<sect2> 1.326 -<title>What happens when you commit</title> 1.327 - 1.328 -<para>The dirstate stores parent information for more than just book-keeping 1.329 -purposes. Mercurial uses the parents of the dirstate as \emph{the 1.330 - parents of a new changeset} when you perform a commit. 1.331 -</para> 1.332 - 1.333 -<informalfigure> 1.334 - 1.335 -<para> <mediaobject><imageobject><imagedata fileref="wdir"/></imageobject><textobject><phrase>XXX add text</phrase></textobject></mediaobject> 1.336 - <caption><para>The working directory can have two parents</para></caption> 1.337 - \label{fig:concepts:wdir} 1.338 -</para> 1.339 -</informalfigure> 1.340 - 1.341 -<para>Figure <xref linkend="fig:concepts:wdir"/> shows the normal state of the working 1.342 -directory, where it has a single changeset as parent. That changeset 1.343 -is the <emphasis>tip</emphasis>, the newest changeset in the repository that has no 1.344 -children. 
1.345 -</para> 1.346 - 1.347 -<informalfigure> 1.348 - 1.349 -<para> <mediaobject><imageobject><imagedata fileref="wdir-after-commit"/></imageobject><textobject><phrase>XXX add text</phrase></textobject></mediaobject> 1.350 - <caption><para>The working directory gains new parents after a commit</para></caption> 1.351 - \label{fig:concepts:wdir-after-commit} 1.352 -</para> 1.353 -</informalfigure> 1.354 - 1.355 -<para>It's useful to think of the working directory as <quote>the changeset I'm 1.356 -about to commit</quote>. Any files that you tell Mercurial that you've 1.357 -added, removed, renamed, or copied will be reflected in that 1.358 -changeset, as will modifications to any files that Mercurial is 1.359 -already tracking; the new changeset will have the parents of the 1.360 -working directory as its parents. 1.361 -</para> 1.362 - 1.363 -<para>After a commit, Mercurial will update the parents of the working 1.364 -directory, so that the first parent is the ID of the new changeset, 1.365 -and the second is the null ID. This is shown in 1.366 -figure <xref linkend="fig:concepts:wdir-after-commit"/>. Mercurial doesn't touch 1.367 -any of the files in the working directory when you commit; it just 1.368 -modifies the dirstate to note its new parents. 1.369 -</para> 1.370 - 1.371 -</sect2> 1.372 -<sect2> 1.373 -<title>Creating a new head</title> 1.374 - 1.375 -<para>It's perfectly normal to update the working directory to a changeset 1.376 -other than the current tip. For example, you might want to know what 1.377 -your project looked like last Tuesday, or you could be looking through 1.378 -changesets to see which one introduced a bug. In cases like this, the 1.379 -natural thing to do is update the working directory to the changeset 1.380 -you're interested in, and then examine the files in the working 1.381 -directory directly to see their contents as they were when you 1.382 -committed that changeset. 
The effect of this is shown in 1.383 -figure <xref linkend="fig:concepts:wdir-pre-branch"/>. 1.384 -</para> 1.385 - 1.386 -<informalfigure> 1.387 - 1.388 -<para> <mediaobject><imageobject><imagedata fileref="wdir-pre-branch"/></imageobject><textobject><phrase>XXX add text</phrase></textobject></mediaobject> 1.389 - <caption><para>The working directory, updated to an older changeset</para></caption> 1.390 - \label{fig:concepts:wdir-pre-branch} 1.391 -</para> 1.392 -</informalfigure> 1.393 - 1.394 -<para>Having updated the working directory to an older changeset, what 1.395 -happens if you make some changes, and then commit? Mercurial behaves 1.396 -in the same way as I outlined above. The parents of the working 1.397 -directory become the parents of the new changeset. This new changeset 1.398 -has no children, so it becomes the new tip. And the repository now 1.399 -contains two changesets that have no children; we call these 1.400 -<emphasis>heads</emphasis>. You can see the structure that this creates in 1.401 -figure <xref linkend="fig:concepts:wdir-branch"/>. 1.402 -</para> 1.403 - 1.404 -<informalfigure> 1.405 - 1.406 -<para> <mediaobject><imageobject><imagedata fileref="wdir-branch"/></imageobject><textobject><phrase>XXX add text</phrase></textobject></mediaobject> 1.407 - <caption><para>After a commit made while synced to an older changeset</para></caption> 1.408 - \label{fig:concepts:wdir-branch} 1.409 -</para> 1.410 -</informalfigure> 1.411 - 1.412 -<note> 1.413 -<para> If you're new to Mercurial, you should keep in mind a common 1.414 - <quote>error</quote>, which is to use the <command role="hg-cmd">hg pull</command> command without any 1.415 - options. By default, the <command role="hg-cmd">hg pull</command> command <emphasis>does not</emphasis> 1.416 - update the working directory, so you'll bring new changesets into 1.417 - your repository, but the working directory will stay synced at the 1.418 - same changeset as before the pull. 
If you make some changes and 1.419 - commit afterwards, you'll thus create a new head, because your 1.420 - working directory isn't synced to whatever the current tip is. 1.421 -</para> 1.422 - 1.423 -<para> I put the word <quote>error</quote> in quotes because all that you need to do 1.424 - to rectify this situation is <command role="hg-cmd">hg merge</command>, then <command role="hg-cmd">hg commit</command>. In 1.425 - other words, this almost never has negative consequences; it just 1.426 - surprises people. I'll discuss other ways to avoid this behaviour, 1.427 - and why Mercurial behaves in this initially surprising way, later 1.428 - on. 1.429 -</para> 1.430 -</note> 1.431 - 1.432 -</sect2> 1.433 -<sect2> 1.434 -<title>Merging heads</title> 1.435 - 1.436 -<para>When you run the <command role="hg-cmd">hg merge</command> command, Mercurial leaves the first 1.437 -parent of the working directory unchanged, and sets the second parent 1.438 -to the changeset you're merging with, as shown in 1.439 -figure <xref linkend="fig:concepts:wdir-merge"/>. 1.440 -</para> 1.441 - 1.442 -<informalfigure> 1.443 - 1.444 -<para> <mediaobject><imageobject><imagedata fileref="wdir-merge"/></imageobject><textobject><phrase>XXX add text</phrase></textobject></mediaobject> 1.445 - <caption><para>Merging two heads</para></caption> 1.446 - \label{fig:concepts:wdir-merge} 1.447 -</para> 1.448 -</informalfigure> 1.449 - 1.450 -<para>Mercurial also has to modify the working directory, to merge the files 1.451 -managed in the two changesets. Simplified a little, the merging 1.452 -process goes like this, for every file in the manifests of both 1.453 -changesets. 1.454 -</para> 1.455 -<itemizedlist> 1.456 -<listitem><para>If neither changeset has modified a file, do nothing with that 1.457 - file. 1.458 -</para> 1.459 -</listitem> 1.460 -<listitem><para>If one changeset has modified a file, and the other hasn't, 1.461 - create the modified copy of the file in the working directory. 
1.462 -</para> 1.463 -</listitem> 1.464 -<listitem><para>If one changeset has removed a file, and the other hasn't (or 1.465 - has also deleted it), delete the file from the working directory. 1.466 -</para> 1.467 -</listitem> 1.468 -<listitem><para>If one changeset has removed a file, but the other has modified 1.469 - the file, ask the user what to do: keep the modified file, or remove 1.470 - it? 1.471 -</para> 1.472 -</listitem> 1.473 -<listitem><para>If both changesets have modified a file, invoke an external 1.474 - merge program to choose the new contents for the merged file. This 1.475 - may require input from the user. 1.476 -</para> 1.477 -</listitem> 1.478 -<listitem><para>If one changeset has modified a file, and the other has renamed 1.479 - or copied the file, make sure that the changes follow the new name 1.480 - of the file. 1.481 -</para> 1.482 -</listitem></itemizedlist> 1.483 -<para>There are more details&emdash;merging has plenty of corner cases&emdash;but 1.484 -these are the most common choices that are involved in a merge. As 1.485 -you can see, most cases are completely automatic, and indeed most 1.486 -merges finish automatically, without requiring your input to resolve 1.487 -any conflicts. 1.488 -</para> 1.489 - 1.490 -<para>When you're thinking about what happens when you commit after a merge, 1.491 -once again the working directory is <quote>the changeset I'm about to 1.492 -commit</quote>. After the <command role="hg-cmd">hg merge</command> command completes, the working 1.493 -directory has two parents; these will become the parents of the new 1.494 -changeset. 1.495 -</para> 1.496 - 1.497 -<para>Mercurial lets you perform multiple merges, but you must commit the 1.498 -results of each individual merge as you go. This is necessary because 1.499 -Mercurial only tracks two parents for both revisions and the working 1.500 -directory. 
While it would be technically possible to merge multiple 1.501 -changesets at once, the prospect of user confusion and making a 1.502 -terrible mess of a merge immediately becomes overwhelming. 1.503 -</para> 1.504 - 1.505 -</sect2> 1.506 -</sect1> 1.507 -<sect1> 1.508 -<title>Other interesting design features</title> 1.509 - 1.510 -<para>In the sections above, I've tried to highlight some of the most 1.511 -important aspects of Mercurial's design, to illustrate that it pays 1.512 -careful attention to reliability and performance. However, the 1.513 -attention to detail doesn't stop there. There are a number of other 1.514 -aspects of Mercurial's construction that I personally find 1.515 -interesting. I'll detail a few of them here, separate from the <quote>big 1.516 -ticket</quote> items above, so that if you're interested, you can gain a 1.517 -better idea of the amount of thinking that goes into a well-designed 1.518 -system. 1.519 -</para> 1.520 - 1.521 -<sect2> 1.522 -<title>Clever compression</title> 1.523 - 1.524 -<para>When appropriate, Mercurial will store both snapshots and deltas in 1.525 -compressed form. It does this by always <emphasis>trying to</emphasis> compress a 1.526 -snapshot or delta, but only storing the compressed version if it's 1.527 -smaller than the uncompressed version. 1.528 -</para> 1.529 - 1.530 -<para>This means that Mercurial does <quote>the right thing</quote> when storing a file 1.531 -whose native form is compressed, such as a <literal>zip</literal> archive or a 1.532 -JPEG image. When these types of files are compressed a second time, 1.533 -the resulting file is usually bigger than the once-compressed form, 1.534 -and so Mercurial will store the plain <literal>zip</literal> or JPEG. 1.535 -</para> 1.536 - 1.537 -<para>Deltas between revisions of a compressed file are usually larger than 1.538 -snapshots of the file, and Mercurial again does <quote>the right thing</quote> in 1.539 -these cases. 
It finds that such a delta exceeds the threshold at 1.540 -which it should store a complete snapshot of the file, so it stores 1.541 -the snapshot, again saving space compared to a naive delta-only 1.542 -approach. 1.543 -</para> 1.544 - 1.545 -<sect3> 1.546 -<title>Network recompression</title> 1.547 - 1.548 -<para>When storing revisions on disk, Mercurial uses the <quote>deflate</quote> 1.549 -compression algorithm (the same one used by the popular <literal>zip</literal> 1.550 -archive format), which balances good speed with a respectable 1.551 -compression ratio. However, when transmitting revision data over a 1.552 -network connection, Mercurial uncompresses the compressed revision 1.553 -data. 1.554 -</para> 1.555 - 1.556 -<para>If the connection is over HTTP, Mercurial recompresses the entire 1.557 -stream of data using a compression algorithm that gives a better 1.558 -compression ratio (the Burrows-Wheeler algorithm from the widely used 1.559 -<literal>bzip2</literal> compression package). This combination of algorithm 1.560 -and compression of the entire stream (instead of a revision at a time) 1.561 -substantially reduces the number of bytes to be transferred, yielding 1.562 -better network performance over almost all kinds of network. 1.563 -</para> 1.564 - 1.565 -<para>(If the connection is over <command>ssh</command>, Mercurial <emphasis>doesn't</emphasis> 1.566 -recompress the stream, because <command>ssh</command> can already do this 1.567 -itself.) 1.568 -</para> 1.569 - 1.570 -</sect3> 1.571 -</sect2> 1.572 -<sect2> 1.573 -<title>Read/write ordering and atomicity</title> 1.574 - 1.575 -<para>Appending to files isn't the whole story when it comes to guaranteeing 1.576 -that a reader won't see a partial write. If you recall 1.577 -figure <xref linkend="fig:concepts:metadata"/>, revisions in the changelog point to 1.578 -revisions in the manifest, and revisions in the manifest point to 1.579 -revisions in filelogs. This hierarchy is deliberate. 
1.580 -</para> 1.581 - 1.582 -<para>A writer starts a transaction by writing filelog and manifest data, 1.583 -and doesn't write any changelog data until those are finished. A 1.584 -reader starts by reading changelog data, then manifest data, followed 1.585 -by filelog data. 1.586 -</para> 1.587 - 1.588 -<para>Since the writer has always finished writing filelog and manifest data 1.589 -before it writes to the changelog, a reader will never read a pointer 1.590 -to a partially written manifest revision from the changelog, and it will 1.591 -never read a pointer to a partially written filelog revision from the 1.592 -manifest. 1.593 -</para> 1.594 - 1.595 -</sect2> 1.596 -<sect2> 1.597 -<title>Concurrent access</title> 1.598 - 1.599 -<para>The read/write ordering and atomicity guarantees mean that Mercurial 1.600 -never needs to <emphasis>lock</emphasis> a repository when it's reading data, even 1.601 -if the repository is being written to while the read is occurring. 1.602 -This has a big effect on scalability; you can have an arbitrary number 1.603 -of Mercurial processes safely reading data from a repository safely 1.604 -all at once, no matter whether it's being written to or not. 1.605 -</para> 1.606 - 1.607 -<para>The lockless nature of reading means that if you're sharing a 1.608 -repository on a multi-user system, you don't need to grant other local 1.609 -users permission to <emphasis>write</emphasis> to your repository in order for them 1.610 -to be able to clone it or pull changes from it; they only need 1.611 -<emphasis>read</emphasis> permission. (This is <emphasis>not</emphasis> a common feature among 1.612 -revision control systems, so don't take it for granted! Most require 1.613 -readers to be able to lock a repository to access it safely, and this 1.614 -requires write permission on at least one directory, which of course 1.615 -makes for all kinds of nasty and annoying security and administrative 1.616 -problems.) 
1.617 -</para> 1.618 - 1.619 -<para>Mercurial uses locks to ensure that only one process can write to a 1.620 -repository at a time (the locking mechanism is safe even over 1.621 -filesystems that are notoriously hostile to locking, such as NFS). If 1.622 -a repository is locked, a writer will wait for a while to retry if the 1.623 -repository becomes unlocked, but if the repository remains locked for 1.624 -too long, the process attempting to write will time out after a while. 1.625 -This means that your daily automated scripts won't get stuck forever 1.626 -and pile up if a system crashes unnoticed, for example. (Yes, the 1.627 -timeout is configurable, from zero to infinity.) 1.628 -</para> 1.629 - 1.630 -<sect3> 1.631 -<title>Safe dirstate access</title> 1.632 - 1.633 -<para>As with revision data, Mercurial doesn't take a lock to read the 1.634 -dirstate file; it does acquire a lock to write it. To avoid the 1.635 -possibility of reading a partially written copy of the dirstate file, 1.636 -Mercurial writes to a file with a unique name in the same directory as 1.637 -the dirstate file, then renames the temporary file atomically to 1.638 -<filename>dirstate</filename>. The file named <filename>dirstate</filename> is thus 1.639 -guaranteed to be complete, not partially written. 1.640 -</para> 1.641 - 1.642 -</sect3> 1.643 -</sect2> 1.644 -<sect2> 1.645 -<title>Avoiding seeks</title> 1.646 - 1.647 -<para>Critical to Mercurial's performance is the avoidance of seeks of the 1.648 -disk head, since any seek is far more expensive than even a 1.649 -comparatively large read operation. 1.650 -</para> 1.651 - 1.652 -<para>This is why, for example, the dirstate is stored in a single file. If 1.653 -there were a dirstate file per directory that Mercurial tracked, the 1.654 -disk would seek once per directory. Instead, Mercurial reads the 1.655 -entire single dirstate file in one step. 
1.656 -</para> 1.657 - 1.658 -<para>Mercurial also uses a <quote>copy on write</quote> scheme when cloning a 1.659 -repository on local storage. Instead of copying every revlog file 1.660 -from the old repository into the new repository, it makes a <quote>hard 1.661 -link</quote>, which is a shorthand way to say <quote>these two names point to the 1.662 -same file</quote>. When Mercurial is about to write to one of a revlog's 1.663 -files, it checks to see if the number of names pointing at the file is 1.664 -greater than one. If it is, more than one repository is using the 1.665 -file, so Mercurial makes a new copy of the file that is private to 1.666 -this repository. 1.667 -</para> 1.668 - 1.669 -<para>A few revision control developers have pointed out that this idea of 1.670 -making a complete private copy of a file is not very efficient in its 1.671 -use of storage. While this is true, storage is cheap, and this method 1.672 -gives the highest performance while deferring most book-keeping to the 1.673 -operating system. An alternative scheme would most likely reduce 1.674 -performance and increase the complexity of the software, each of which 1.675 -is much more important to the <quote>feel</quote> of day-to-day use. 1.676 -</para> 1.677 - 1.678 -</sect2> 1.679 -<sect2> 1.680 -<title>Other contents of the dirstate</title> 1.681 - 1.682 -<para>Because Mercurial doesn't force you to tell it when you're modifying a 1.683 -file, it uses the dirstate to store some extra information so it can 1.684 -determine efficiently whether you have modified a file. For each file 1.685 -in the working directory, it stores the time that it last modified the 1.686 -file itself, and the size of the file at that time. 
1.687 -</para> 1.688 - 1.689 -<para>When you explicitly <command role="hg-cmd">hg add</command>, <command role="hg-cmd">hg remove</command>, <command role="hg-cmd">hg rename</command> or 1.690 -<command role="hg-cmd">hg copy</command> files, Mercurial updates the dirstate so that it knows 1.691 -what to do with those files when you commit. 1.692 -</para> 1.693 - 1.694 -<para>When Mercurial is checking the states of files in the working 1.695 -directory, it first checks a file's modification time. If that has 1.696 -not changed, the file must not have been modified. If the file's size 1.697 -has changed, the file must have been modified. If the modification 1.698 -time has changed, but the size has not, only then does Mercurial need 1.699 -to read the actual contents of the file to see if they've changed. 1.700 -Storing these few extra pieces of information dramatically reduces the 1.701 -amount of data that Mercurial needs to read, which yields large 1.702 -performance improvements compared to other revision control systems. 1.703 -</para> 1.704 - 1.705 -</sect2> 1.706 -</sect1> 1.707 +<chapter id="chap:concepts"> 1.708 + <?dbhtml filename="behind-the-scenes.html"?> 1.709 + <title>Behind the scenes</title> 1.710 + 1.711 + <para id="x_2e8">Unlike many revision control systems, the concepts 1.712 + upon which Mercurial is built are simple enough that it's easy to 1.713 + understand how the software really works. Knowing these details 1.714 + certainly isn't necessary, so it is certainly safe to skip this 1.715 + chapter. However, I think you will get more out of the software 1.716 + with a <quote>mental model</quote> of what's going on.</para> 1.717 + 1.718 + <para id="x_2e9">Being able to understand what's going on behind the 1.719 + scenes gives me confidence that Mercurial has been carefully 1.720 + designed to be both <emphasis>safe</emphasis> and 1.721 + <emphasis>efficient</emphasis>. 
And just as importantly, if it's 1.722 + easy for me to retain a good idea of what the software is doing 1.723 + when I perform a revision control task, I'm less likely to be 1.724 + surprised by its behavior.</para> 1.725 + 1.726 + <para id="x_2ea">In this chapter, we'll initially cover the core concepts 1.727 + behind Mercurial's design, then continue to discuss some of the 1.728 + interesting details of its implementation.</para> 1.729 + 1.730 + <sect1> 1.731 + <title>Mercurial's historical record</title> 1.732 + 1.733 + <sect2> 1.734 + <title>Tracking the history of a single file</title> 1.735 + 1.736 + <para id="x_2eb">When Mercurial tracks modifications to a file, it stores 1.737 + the history of that file in a metadata object called a 1.738 + <emphasis>filelog</emphasis>. Each entry in the filelog 1.739 + contains enough information to reconstruct one revision of the 1.740 + file that is being tracked. Filelogs are stored as files in 1.741 + the <filename role="special" 1.742 + class="directory">.hg/store/data</filename> directory. A 1.743 + filelog contains two kinds of information: revision data, and 1.744 + an index to help Mercurial to find a revision 1.745 + efficiently.</para> 1.746 + 1.747 + <para id="x_2ec">A file that is large, or has a lot of history, has its 1.748 + filelog stored in separate data 1.749 + (<quote><literal>.d</literal></quote> suffix) and index 1.750 + (<quote><literal>.i</literal></quote> suffix) files. For 1.751 + small files without much history, the revision data and index 1.752 + are combined in a single <quote><literal>.i</literal></quote> 1.753 + file. 
The correspondence between a file in the working 1.754 + directory and the filelog that tracks its history in the 1.755 + repository is illustrated in <xref 1.756 + linkend="fig:concepts:filelog"/>.</para> 1.757 + 1.758 + <figure id="fig:concepts:filelog"> 1.759 + <title>Relationships between files in working directory and 1.760 + filelogs in repository</title> 1.761 + <mediaobject> 1.762 + <imageobject><imagedata fileref="figs/filelog.png"/></imageobject> 1.763 + <textobject><phrase>XXX add text</phrase></textobject> 1.764 + </mediaobject> 1.765 + </figure> 1.766 + 1.767 + </sect2> 1.768 + <sect2> 1.769 + <title>Managing tracked files</title> 1.770 + 1.771 + <para id="x_2ee">Mercurial uses a structure called a 1.772 + <emphasis>manifest</emphasis> to collect together information 1.773 + about the files that it tracks. Each entry in the manifest 1.774 + contains information about the files present in a single 1.775 + changeset. An entry records which files are present in the 1.776 + changeset, the revision of each file, and a few other pieces 1.777 + of file metadata.</para> 1.778 + 1.779 + </sect2> 1.780 + <sect2> 1.781 + <title>Recording changeset information</title> 1.782 + 1.783 + <para id="x_2ef">The <emphasis>changelog</emphasis> contains information 1.784 + about each changeset. Each revision records who committed a 1.785 + change, the changeset comment, other pieces of 1.786 + changeset-related information, and the revision of the 1.787 + manifest to use.</para> 1.788 + 1.789 + </sect2> 1.790 + <sect2> 1.791 + <title>Relationships between revisions</title> 1.792 + 1.793 + <para id="x_2f0">Within a changelog, a manifest, or a filelog, each 1.794 + revision stores a pointer to its immediate parent (or to its 1.795 + two parents, if it's a merge revision). 
As I mentioned above, 1.796 + there are also relationships between revisions 1.797 + <emphasis>across</emphasis> these structures, and they are 1.798 + hierarchical in nature.</para> 1.799 + 1.800 + <para id="x_2f1">For every changeset in a repository, there is exactly one 1.801 + revision stored in the changelog. Each revision of the 1.802 + changelog contains a pointer to a single revision of the 1.803 + manifest. A revision of the manifest stores a pointer to a 1.804 + single revision of each filelog tracked when that changeset 1.805 + was created. These relationships are illustrated in 1.806 + <xref linkend="fig:concepts:metadata"/>.</para> 1.807 + 1.808 + <figure id="fig:concepts:metadata"> 1.809 + <title>Metadata relationships</title> 1.810 + <mediaobject> 1.811 + <imageobject><imagedata fileref="figs/metadata.png"/></imageobject> 1.812 + <textobject><phrase>XXX add text</phrase></textobject> 1.813 + </mediaobject> 1.814 + </figure> 1.815 + 1.816 + <para id="x_2f3">As the illustration shows, there is 1.817 + <emphasis>not</emphasis> a <quote>one to one</quote> 1.818 + relationship between revisions in the changelog, manifest, or 1.819 + filelog. 
If a file that 1.820 + Mercurial tracks hasn't changed between two changesets, the 1.821 + entry for that file in the two revisions of the manifest will 1.822 + point to the same revision of its filelog<footnote> 1.823 + <para id="x_725">It is possible (though unusual) for the manifest to 1.824 + remain the same between two changesets, in which case the 1.825 + changelog entries for those changesets will point to the 1.826 + same revision of the manifest.</para> 1.827 + </footnote>.</para> 1.828 + 1.829 + </sect2> 1.830 + </sect1> 1.831 + <sect1> 1.832 + <title>Safe, efficient storage</title> 1.833 + 1.834 + <para id="x_2f4">The underpinnings of changelogs, manifests, and filelogs are 1.835 + provided by a single structure called the 1.836 + <emphasis>revlog</emphasis>.</para> 1.837 + 1.838 + <sect2> 1.839 + <title>Efficient storage</title> 1.840 + 1.841 + <para id="x_2f5">The revlog provides efficient storage of revisions using a 1.842 + <emphasis>delta</emphasis> mechanism. Instead of storing a 1.843 + complete copy of a file for each revision, it stores the 1.844 + changes needed to transform an older revision into the new 1.845 + revision. For many kinds of file data, these deltas are 1.846 + typically a fraction of a percent of the size of a full copy 1.847 + of a file.</para> 1.848 + 1.849 + <para id="x_2f6">Some obsolete revision control systems can only work with 1.850 + deltas of text files. They must either store binary files as 1.851 + complete snapshots or encoded into a text representation, both 1.852 + of which are wasteful approaches. Mercurial can efficiently 1.853 + handle deltas of files with arbitrary binary contents; it 1.854 + doesn't need to treat text as special.</para> 1.855 + 1.856 + </sect2> 1.857 + <sect2 id="sec:concepts:txn"> 1.858 + <title>Safe operation</title> 1.859 + 1.860 + <para id="x_2f7">Mercurial only ever <emphasis>appends</emphasis> data to 1.861 + the end of a revlog file. 
It never modifies a section of a 1.862 + file after it has written it. This is both more robust and 1.863 + efficient than schemes that need to modify or rewrite 1.864 + data.</para> 1.865 + 1.866 + <para id="x_2f8">In addition, Mercurial treats every write as part of a 1.867 + <emphasis>transaction</emphasis> that can span a number of 1.868 + files. A transaction is <emphasis>atomic</emphasis>: either 1.869 + the entire transaction succeeds and its effects are all 1.870 + visible to readers in one go, or the whole thing is undone. 1.871 + This guarantee of atomicity means that if you're running two 1.872 + copies of Mercurial, where one is reading data and one is 1.873 + writing it, the reader will never see a partially written 1.874 + result that might confuse it.</para> 1.875 + 1.876 + <para id="x_2f9">The fact that Mercurial only appends to files makes it 1.877 + easier to provide this transactional guarantee. The easier it 1.878 + is to do stuff like this, the more confident you should be 1.879 + that it's done correctly.</para> 1.880 + 1.881 + </sect2> 1.882 + <sect2> 1.883 + <title>Fast retrieval</title> 1.884 + 1.885 + <para id="x_2fa">Mercurial cleverly avoids a pitfall common to 1.886 + all earlier revision control systems: the problem of 1.887 + <emphasis>inefficient retrieval</emphasis>. Most revision 1.888 + control systems store the contents of a revision as an 1.889 + incremental series of modifications against a 1.890 + <quote>snapshot</quote>. (Some base the snapshot on the 1.891 + oldest revision, others on the newest.) To reconstruct a 1.892 + specific revision, you must first read the snapshot, and then 1.893 + every one of the revisions between the snapshot and your 1.894 + target revision. 
The more history that a file accumulates, 1.895 + the more revisions you must read, hence the longer it takes to 1.896 + reconstruct a particular revision.</para> 1.897 + 1.898 + <figure id="fig:concepts:snapshot"> 1.899 + <title>Snapshot of a revlog, with incremental deltas</title> 1.900 + <mediaobject> 1.901 + <imageobject><imagedata fileref="figs/snapshot.png"/></imageobject> 1.902 + <textobject><phrase>XXX add text</phrase></textobject> 1.903 + </mediaobject> 1.904 + </figure> 1.905 + 1.906 + <para id="x_2fc">The innovation that Mercurial applies to this problem is 1.907 + simple but effective. Once the cumulative amount of delta 1.908 + information stored since the last snapshot exceeds a fixed 1.909 + threshold, it stores a new snapshot (compressed, of course), 1.910 + instead of another delta. This makes it possible to 1.911 + reconstruct <emphasis>any</emphasis> revision of a file 1.912 + quickly. This approach works so well that it has since been 1.913 + copied by several other revision control systems.</para> 1.914 + 1.915 + <para id="x_2fd"><xref linkend="fig:concepts:snapshot"/> illustrates 1.916 + the idea. 
In an entry in a revlog's index file, Mercurial 1.917 + stores the range of entries from the data file that it must 1.918 + read to reconstruct a particular revision.</para> 1.919 + 1.920 + <sect3> 1.921 + <title>Aside: the influence of video compression</title> 1.922 + 1.923 + <para id="x_2fe">If you're familiar with video compression or 1.924 + have ever watched a TV feed through a digital cable or 1.925 + satellite service, you may know that most video compression 1.926 + schemes store each frame of video as a delta against its 1.927 + predecessor frame.</para> 1.928 + 1.929 + <para id="x_2ff">Mercurial borrows this idea to make it 1.930 + possible to reconstruct a revision from a snapshot and a 1.931 + small number of deltas.</para> 1.932 + 1.933 + </sect3> 1.934 + </sect2> 1.935 + <sect2> 1.936 + <title>Identification and strong integrity</title> 1.937 + 1.938 + <para id="x_300">Along with delta or snapshot information, a revlog entry 1.939 + contains a cryptographic hash of the data that it represents. 1.940 + This makes it difficult to forge the contents of a revision, 1.941 + and easy to detect accidental corruption.</para> 1.942 + 1.943 + <para id="x_301">Hashes provide more than a mere check against corruption; 1.944 + they are used as the identifiers for revisions. The changeset 1.945 + identification hashes that you see as an end user are from 1.946 + revisions of the changelog. Although filelogs and the 1.947 + manifest also use hashes, Mercurial only uses these behind the 1.948 + scenes.</para> 1.949 + 1.950 + <para id="x_302">Mercurial verifies that hashes are correct when it 1.951 + retrieves file revisions and when it pulls changes from 1.952 + another repository. 
If it encounters an integrity problem, it 1.953 + will complain and stop whatever it's doing.</para> 1.954 + 1.955 + <para id="x_303">In addition to the effect it has on retrieval efficiency, 1.956 + Mercurial's use of periodic snapshots makes it more robust 1.957 + against partial data corruption. If a revlog becomes partly 1.958 + corrupted due to a hardware error or system bug, it's often 1.959 + possible to reconstruct some or most revisions from the 1.960 + uncorrupted sections of the revlog, both before and after the 1.961 + corrupted section. This would not be possible with a 1.962 + delta-only storage model.</para> 1.963 + </sect2> 1.964 + </sect1> 1.965 + 1.966 + <sect1> 1.967 + <title>Revision history, branching, and merging</title> 1.968 + 1.969 + <para id="x_304">Every entry in a Mercurial revlog knows the identity of its 1.970 + immediate ancestor revision, usually referred to as its 1.971 + <emphasis>parent</emphasis>. In fact, a revision contains room 1.972 + for not one parent, but two. Mercurial uses a special hash, 1.973 + called the <quote>null ID</quote>, to represent the idea 1.974 + <quote>there is no parent here</quote>. This hash is simply a 1.975 + string of zeroes.</para> 1.976 + 1.977 + <para id="x_305">In <xref linkend="fig:concepts:revlog"/>, you can see 1.978 + an example of the conceptual structure of a revlog. Filelogs, 1.979 + manifests, and changelogs all have this same structure; they 1.980 + differ only in the kind of data stored in each delta or 1.981 + snapshot.</para> 1.982 + 1.983 + <para id="x_306">The first revision in a revlog (at the bottom of the image) 1.984 + has the null ID in both of its parent slots. For a 1.985 + <quote>normal</quote> revision, its first parent slot contains 1.986 + the ID of its parent revision, and its second contains the null 1.987 + ID, indicating that the revision has only one real parent. Any 1.988 + two revisions that have the same parent ID are branches. 
A 1.989 + revision that represents a merge between branches has two normal 1.990 + revision IDs in its parent slots.</para> 1.991 + 1.992 + <figure id="fig:concepts:revlog"> 1.993 + <title>The conceptual structure of a revlog</title> 1.994 + <mediaobject> 1.995 + <imageobject><imagedata fileref="figs/revlog.png"/></imageobject> 1.996 + <textobject><phrase>XXX add text</phrase></textobject> 1.997 + </mediaobject> 1.998 + </figure> 1.999 + 1.1000 + </sect1> 1.1001 + <sect1> 1.1002 + <title>The working directory</title> 1.1003 + 1.1004 + <para id="x_307">In the working directory, Mercurial stores a snapshot of the 1.1005 + files from the repository as of a particular changeset.</para> 1.1006 + 1.1007 + <para id="x_308">The working directory <quote>knows</quote> which changeset 1.1008 + it contains. When you update the working directory to contain a 1.1009 + particular changeset, Mercurial looks up the appropriate 1.1010 + revision of the manifest to find out which files it was tracking 1.1011 + at the time that changeset was committed, and which revision of 1.1012 + each file was then current. It then recreates a copy of each of 1.1013 + those files, with the same contents it had when the changeset 1.1014 + was committed.</para> 1.1015 + 1.1016 + <para id="x_309">The <emphasis>dirstate</emphasis> is a special 1.1017 + structure that contains Mercurial's knowledge of the working 1.1018 + directory. It is maintained as a file named 1.1019 + <filename>.hg/dirstate</filename> inside a repository. The 1.1020 + dirstate details which changeset the working directory is 1.1021 + updated to, and all of the files that Mercurial is tracking in 1.1022 + the working directory. 
It also lets Mercurial quickly notice 1.1023 + changed files, by recording their checkout times and 1.1024 + sizes.</para> 1.1025 + 1.1026 + <para id="x_30a">Just as a revision of a revlog has room for two parents, so 1.1027 + that it can represent either a normal revision (with one parent) 1.1028 + or a merge of two earlier revisions, the dirstate has slots for 1.1029 + two parents. When you use the <command role="hg-cmd">hg 1.1030 + update</command> command, the changeset that you update to is 1.1031 + stored in the <quote>first parent</quote> slot, and the null ID 1.1032 + in the second. When you <command role="hg-cmd">hg 1.1033 + merge</command> with another changeset, the first parent 1.1034 + remains unchanged, and the second parent is filled in with the 1.1035 + changeset you're merging with. The <command role="hg-cmd">hg 1.1036 + parents</command> command tells you what the parents of the 1.1037 + dirstate are.</para> 1.1038 + 1.1039 + <sect2> 1.1040 + <title>What happens when you commit</title> 1.1041 + 1.1042 + <para id="x_30b">The dirstate stores parent information for more than just 1.1043 + book-keeping purposes. Mercurial uses the parents of the 1.1044 + dirstate as <emphasis>the parents of a new 1.1045 + changeset</emphasis> when you perform a commit.</para> 1.1046 + 1.1047 + <figure id="fig:concepts:wdir"> 1.1048 + <title>The working directory can have two parents</title> 1.1049 + <mediaobject> 1.1050 + <imageobject><imagedata fileref="figs/wdir.png"/></imageobject> 1.1051 + <textobject><phrase>XXX add text</phrase></textobject> 1.1052 + </mediaobject> 1.1053 + </figure> 1.1054 + 1.1055 + <para id="x_30d"><xref linkend="fig:concepts:wdir"/> shows the 1.1056 + normal state of the working directory, where it has a single 1.1057 + changeset as parent. 
That changeset is the 1.1058 + <emphasis>tip</emphasis>, the newest changeset in the 1.1059 + repository that has no children.</para> 1.1060 + 1.1061 + <figure id="fig:concepts:wdir-after-commit"> 1.1062 + <title>The working directory gains new parents after a 1.1063 + commit</title> 1.1064 + <mediaobject> 1.1065 + <imageobject><imagedata fileref="figs/wdir-after-commit.png"/></imageobject> 1.1066 + <textobject><phrase>XXX add text</phrase></textobject> 1.1067 + </mediaobject> 1.1068 + </figure> 1.1069 + 1.1070 + <para id="x_30f">It's useful to think of the working directory as 1.1071 + <quote>the changeset I'm about to commit</quote>. Any files 1.1072 + that you tell Mercurial that you've added, removed, renamed, 1.1073 + or copied will be reflected in that changeset, as will 1.1074 + modifications to any files that Mercurial is already tracking; 1.1075 + the new changeset will have the parents of the working 1.1076 + directory as its parents.</para> 1.1077 + 1.1078 + <para id="x_310">After a commit, Mercurial will update the 1.1079 + parents of the working directory, so that the first parent is 1.1080 + the ID of the new changeset, and the second is the null ID. 1.1081 + This is shown in <xref 1.1082 + linkend="fig:concepts:wdir-after-commit"/>. Mercurial 1.1083 + doesn't touch any of the files in the working directory when 1.1084 + you commit; it just modifies the dirstate to note its new 1.1085 + parents.</para> 1.1086 + 1.1087 + </sect2> 1.1088 + <sect2> 1.1089 + <title>Creating a new head</title> 1.1090 + 1.1091 + <para id="x_311">It's perfectly normal to update the working directory to a 1.1092 + changeset other than the current tip. For example, you might 1.1093 + want to know what your project looked like last Tuesday, or 1.1094 + you could be looking through changesets to see which one 1.1095 + introduced a bug. 
In cases like this, the natural thing to do 1.1096 + is update the working directory to the changeset you're 1.1097 + interested in, and then examine the files in the working 1.1098 + directory directly to see their contents as they were when you 1.1099 + committed that changeset. The effect of this is shown in 1.1100 + <xref linkend="fig:concepts:wdir-pre-branch"/>.</para> 1.1101 + 1.1102 + <figure id="fig:concepts:wdir-pre-branch"> 1.1103 + <title>The working directory, updated to an older 1.1104 + changeset</title> 1.1105 + <mediaobject> 1.1106 + <imageobject><imagedata fileref="figs/wdir-pre-branch.png"/></imageobject> 1.1107 + <textobject><phrase>XXX add text</phrase></textobject> 1.1108 + </mediaobject> 1.1109 + </figure> 1.1110 + 1.1111 + <para id="x_313">Having updated the working directory to an 1.1112 + older changeset, what happens if you make some changes, and 1.1113 + then commit? Mercurial behaves in the same way as I outlined 1.1114 + above. The parents of the working directory become the 1.1115 + parents of the new changeset. This new changeset has no 1.1116 + children, so it becomes the new tip. And the repository now 1.1117 + contains two changesets that have no children; we call these 1.1118 + <emphasis>heads</emphasis>. You can see the structure that 1.1119 + this creates in <xref 1.1120 + linkend="fig:concepts:wdir-branch"/>.</para> 1.1121 + 1.1122 + <figure id="fig:concepts:wdir-branch"> 1.1123 + <title>After a commit made while synced to an older 1.1124 + changeset</title> 1.1125 + <mediaobject> 1.1126 + <imageobject><imagedata fileref="figs/wdir-branch.png"/></imageobject> 1.1127 + <textobject><phrase>XXX add text</phrase></textobject> 1.1128 + </mediaobject> 1.1129 + </figure> 1.1130 + 1.1131 + <note> 1.1132 + <para id="x_315">If you're new to Mercurial, you should keep 1.1133 + in mind a common <quote>error</quote>, which is to use the 1.1134 + <command role="hg-cmd">hg pull</command> command without any 1.1135 + options. 
By default, the <command role="hg-cmd">hg 1.1136 + pull</command> command <emphasis>does not</emphasis> 1.1137 + update the working directory, so you'll bring new changesets 1.1138 + into your repository, but the working directory will stay 1.1139 + synced at the same changeset as before the pull. If you 1.1140 + make some changes and commit afterwards, you'll thus create 1.1141 + a new head, because your working directory isn't synced to 1.1142 + whatever the current tip is. To combine the operation of a 1.1143 + pull, followed by an update, run <command>hg pull 1.1144 + -u</command>.</para> 1.1145 + 1.1146 + <para id="x_316">I put the word <quote>error</quote> in quotes 1.1147 + because all that you need to do to rectify the situation 1.1148 + where you created a new head by accident is 1.1149 + <command role="hg-cmd">hg merge</command>, then <command 1.1150 + role="hg-cmd">hg commit</command>. In other words, this 1.1151 + almost never has negative consequences; it's just something 1.1152 + of a surprise for newcomers. 
I'll discuss other ways to 1.1153 + avoid this behavior, and why Mercurial behaves in this 1.1154 + initially surprising way, later on.</para> 1.1155 + </note> 1.1156 + 1.1157 + </sect2> 1.1158 + <sect2> 1.1159 + <title>Merging changes</title> 1.1160 + 1.1161 + <para id="x_317">When you run the <command role="hg-cmd">hg 1.1162 + merge</command> command, Mercurial leaves the first parent 1.1163 + of the working directory unchanged, and sets the second parent 1.1164 + to the changeset you're merging with, as shown in <xref 1.1165 + linkend="fig:concepts:wdir-merge"/>.</para> 1.1166 + 1.1167 + <figure id="fig:concepts:wdir-merge"> 1.1168 + <title>Merging two heads</title> 1.1169 + <mediaobject> 1.1170 + <imageobject> 1.1171 + <imagedata fileref="figs/wdir-merge.png"/> 1.1172 + </imageobject> 1.1173 + <textobject><phrase>XXX add text</phrase></textobject> 1.1174 + </mediaobject> 1.1175 + </figure> 1.1176 + 1.1177 + <para id="x_319">Mercurial also has to modify the working directory, to 1.1178 + merge the files managed in the two changesets. 
Simplified a 1.1179 + little, the merging process goes like this, for every file in 1.1180 + the manifests of both changesets.</para> 1.1181 + <itemizedlist> 1.1182 + <listitem><para id="x_31a">If neither changeset has modified a file, do 1.1183 + nothing with that file.</para> 1.1184 + </listitem> 1.1185 + <listitem><para id="x_31b">If one changeset has modified a file, and the 1.1186 + other hasn't, create the modified copy of the file in the 1.1187 + working directory.</para> 1.1188 + </listitem> 1.1189 + <listitem><para id="x_31c">If one changeset has removed a file, and the 1.1190 + other hasn't (or has also deleted it), delete the file 1.1191 + from the working directory.</para> 1.1192 + </listitem> 1.1193 + <listitem><para id="x_31d">If one changeset has removed a file, but the 1.1194 + other has modified the file, ask the user what to do: keep 1.1195 + the modified file, or remove it?</para> 1.1196 + </listitem> 1.1197 + <listitem><para id="x_31e">If both changesets have modified a file, 1.1198 + invoke an external merge program to choose the new 1.1199 + contents for the merged file. This may require input from 1.1200 + the user.</para> 1.1201 + </listitem> 1.1202 + <listitem><para id="x_31f">If one changeset has modified a file, and the 1.1203 + other has renamed or copied the file, make sure that the 1.1204 + changes follow the new name of the file.</para> 1.1205 + </listitem></itemizedlist> 1.1206 + <para id="x_320">There are more details&emdash;merging has plenty of corner 1.1207 + cases&emdash;but these are the most common choices that are 1.1208 + involved in a merge. 
As you can see, most cases are 1.1209 + completely automatic, and indeed most merges finish 1.1210 + automatically, without requiring your input to resolve any 1.1211 + conflicts.</para> 1.1212 + 1.1213 + <para id="x_321">When you're thinking about what happens when you commit 1.1214 + after a merge, once again the working directory is <quote>the 1.1215 + changeset I'm about to commit</quote>. After the <command 1.1216 + role="hg-cmd">hg merge</command> command completes, the 1.1217 + working directory has two parents; these will become the 1.1218 + parents of the new changeset.</para> 1.1219 + 1.1220 + <para id="x_322">Mercurial lets you perform multiple merges, but 1.1221 + you must commit the results of each individual merge as you 1.1222 + go. This is necessary because Mercurial only tracks two 1.1223 + parents for both revisions and the working directory. While 1.1224 + it would be technically feasible to merge multiple changesets 1.1225 + at once, Mercurial avoids this for simplicity. With multi-way 1.1226 + merges, the risks of user confusion, nasty conflict 1.1227 + resolution, and making a terrible mess of a merge would grow 1.1228 + intolerable.</para> 1.1229 + 1.1230 + </sect2> 1.1231 + 1.1232 + <sect2> 1.1233 + <title>Merging and renames</title> 1.1234 + 1.1235 + <para id="x_69a">A surprising number of revision control systems pay little 1.1236 + or no attention to a file's <emphasis>name</emphasis> over 1.1237 + time. For instance, it used to be common that if a file got 1.1238 + renamed on one side of a merge, the changes from the other 1.1239 + side would be silently dropped.</para> 1.1240 + 1.1241 + <para id="x_69b">Mercurial records metadata when you tell it to perform a 1.1242 + rename or copy. It uses this metadata during a merge to do the 1.1243 + right thing in the case of a merge. 
For instance, if I rename 1.1244 + a file, and you edit it without renaming it, when we merge our 1.1245 + work the file will be renamed and have your edits 1.1246 + applied.</para> 1.1247 + </sect2> 1.1248 + </sect1> 1.1249 + 1.1250 + <sect1> 1.1251 + <title>Other interesting design features</title> 1.1252 + 1.1253 + <para id="x_323">In the sections above, I've tried to highlight some of the 1.1254 + most important aspects of Mercurial's design, to illustrate that 1.1255 + it pays careful attention to reliability and performance. 1.1256 + However, the attention to detail doesn't stop there. There are 1.1257 + a number of other aspects of Mercurial's construction that I 1.1258 + personally find interesting. I'll detail a few of them here, 1.1259 + separate from the <quote>big ticket</quote> items above, so that 1.1260 + if you're interested, you can gain a better idea of the amount 1.1261 + of thinking that goes into a well-designed system.</para> 1.1262 + 1.1263 + <sect2> 1.1264 + <title>Clever compression</title> 1.1265 + 1.1266 + <para id="x_324">When appropriate, Mercurial will store both snapshots and 1.1267 + deltas in compressed form. It does this by always 1.1268 + <emphasis>trying to</emphasis> compress a snapshot or delta, 1.1269 + but only storing the compressed version if it's smaller than 1.1270 + the uncompressed version.</para> 1.1271 + 1.1272 + <para id="x_325">This means that Mercurial does <quote>the right 1.1273 + thing</quote> when storing a file whose native form is 1.1274 + compressed, such as a <literal>zip</literal> archive or a JPEG 1.1275 + image. 
When these types of files are compressed a second 1.1276 + time, the resulting file is usually bigger than the 1.1277 + once-compressed form, and so Mercurial will store the plain 1.1278 + <literal>zip</literal> or JPEG.</para> 1.1279 + 1.1280 + <para id="x_326">Deltas between revisions of a compressed file are usually 1.1281 + larger than snapshots of the file, and Mercurial again does 1.1282 + <quote>the right thing</quote> in these cases. It finds that 1.1283 + such a delta exceeds the threshold at which it should store a 1.1284 + complete snapshot of the file, so it stores the snapshot, 1.1285 + again saving space compared to a naive delta-only 1.1286 + approach.</para> 1.1287 + 1.1288 + <sect3> 1.1289 + <title>Network recompression</title> 1.1290 + 1.1291 + <para id="x_327">When storing revisions on disk, Mercurial uses the 1.1292 + <quote>deflate</quote> compression algorithm (the same one 1.1293 + used by the popular <literal>zip</literal> archive format), 1.1294 + which balances good speed with a respectable compression 1.1295 + ratio. However, when transmitting revision data over a 1.1296 + network connection, Mercurial uncompresses the compressed 1.1297 + revision data.</para> 1.1298 + 1.1299 + <para id="x_328">If the connection is over HTTP, Mercurial recompresses 1.1300 + the entire stream of data using a compression algorithm that 1.1301 + gives a better compression ratio (the Burrows-Wheeler 1.1302 + algorithm from the widely used <literal>bzip2</literal> 1.1303 + compression package). 
This combination of algorithm and 1.1304 + compression of the entire stream (instead of a revision at a 1.1305 + time) substantially reduces the number of bytes to be 1.1306 + transferred, yielding better network performance over most 1.1307 + kinds of network.</para> 1.1308 + 1.1309 + <para id="x_329">If the connection is over 1.1310 + <command>ssh</command>, Mercurial 1.1311 + <emphasis>doesn't</emphasis> recompress the stream, because 1.1312 + <command>ssh</command> can already do this itself. You can 1.1313 + tell Mercurial to always use <command>ssh</command>'s 1.1314 + compression feature by editing the 1.1315 + <filename>.hgrc</filename> file in your home directory as 1.1316 + follows.</para> 1.1317 + 1.1318 + <programlisting>[ui] 1.1319 +ssh = ssh -C</programlisting> 1.1320 + 1.1321 + </sect3> 1.1322 + </sect2> 1.1323 + <sect2> 1.1324 + <title>Read/write ordering and atomicity</title> 1.1325 + 1.1326 + <para id="x_32a">Appending to files isn't the whole story when 1.1327 + it comes to guaranteeing that a reader won't see a partial 1.1328 + write. If you recall <xref linkend="fig:concepts:metadata"/>, 1.1329 + revisions in the changelog point to revisions in the manifest, 1.1330 + and revisions in the manifest point to revisions in filelogs. 1.1331 + This hierarchy is deliberate.</para> 1.1332 + 1.1333 + <para id="x_32b">A writer starts a transaction by writing filelog and 1.1334 + manifest data, and doesn't write any changelog data until 1.1335 + those are finished. 
A reader starts by reading changelog 1.1336 + data, then manifest data, followed by filelog data.</para> 1.1337 + 1.1338 + <para id="x_32c">Since the writer has always finished writing filelog and 1.1339 + manifest data before it writes to the changelog, a reader will 1.1340 + never read a pointer to a partially written manifest revision 1.1341 + from the changelog, and it will never read a pointer to a 1.1342 + partially written filelog revision from the manifest.</para> 1.1343 + 1.1344 + </sect2> 1.1345 + <sect2> 1.1346 + <title>Concurrent access</title> 1.1347 + 1.1348 + <para id="x_32d">The read/write ordering and atomicity guarantees mean that 1.1349 + Mercurial never needs to <emphasis>lock</emphasis> a 1.1350 + repository when it's reading data, even if the repository is 1.1351 + being written to while the read is occurring. This has a big 1.1352 + effect on scalability; you can have an arbitrary number of 1.1353 + Mercurial processes safely reading data from a repository 1.1354 + all at once, no matter whether it's being written to or 1.1355 + not.</para> 1.1356 + 1.1357 + <para id="x_32e">The lockless nature of reading means that if you're 1.1358 + sharing a repository on a multi-user system, you don't need to 1.1359 + grant other local users permission to 1.1360 + <emphasis>write</emphasis> to your repository in order for 1.1361 + them to be able to clone it or pull changes from it; they only 1.1362 + need <emphasis>read</emphasis> permission. (This is 1.1363 + <emphasis>not</emphasis> a common feature among revision 1.1364 + control systems, so don't take it for granted! 
Most require 1.1365 + readers to be able to lock a repository to access it safely, 1.1366 + and this requires write permission on at least one directory, 1.1367 + which of course makes for all kinds of nasty and annoying 1.1368 + security and administrative problems.)</para> 1.1369 + 1.1370 + <para id="x_32f">Mercurial uses locks to ensure that only one process can 1.1371 + write to a repository at a time (the locking mechanism is safe 1.1372 + even over filesystems that are notoriously hostile to locking, 1.1373 + such as NFS). If a repository is locked, a writer will wait 1.1374 + for a while to retry if the repository becomes unlocked, but 1.1375 + if the repository remains locked for too long, the process 1.1376 + attempting to write will time out after a while. This means 1.1377 + that your daily automated scripts won't get stuck forever and 1.1378 + pile up if a system crashes unnoticed, for example. (Yes, the 1.1379 + timeout is configurable, from zero to infinity.)</para> 1.1380 + 1.1381 + <sect3> 1.1382 + <title>Safe dirstate access</title> 1.1383 + 1.1384 + <para id="x_330">As with revision data, Mercurial doesn't take a lock to 1.1385 + read the dirstate file; it does acquire a lock to write it. 1.1386 + To avoid the possibility of reading a partially written copy 1.1387 + of the dirstate file, Mercurial writes to a file with a 1.1388 + unique name in the same directory as the dirstate file, then 1.1389 + renames the temporary file atomically to 1.1390 + <filename>dirstate</filename>. 
The file named 1.1391 + <filename>dirstate</filename> is thus guaranteed to be 1.1392 + complete, not partially written.</para> 1.1393 + 1.1394 + </sect3> 1.1395 + </sect2> 1.1396 + <sect2> 1.1397 + <title>Avoiding seeks</title> 1.1398 + 1.1399 + <para id="x_331">Critical to Mercurial's performance is the avoidance of 1.1400 + seeks of the disk head, since any seek is far more expensive 1.1401 + than even a comparatively large read operation.</para> 1.1402 + 1.1403 + <para id="x_332">This is why, for example, the dirstate is stored in a 1.1404 + single file. If there were a dirstate file per directory that 1.1405 + Mercurial tracked, the disk would seek once per directory. 1.1406 + Instead, Mercurial reads the entire single dirstate file in 1.1407 + one step.</para> 1.1408 + 1.1409 + <para id="x_333">Mercurial also uses a <quote>copy on write</quote> scheme 1.1410 + when cloning a repository on local storage. Instead of 1.1411 + copying every revlog file from the old repository into the new 1.1412 + repository, it makes a <quote>hard link</quote>, which is a 1.1413 + shorthand way to say <quote>these two names point to the same 1.1414 + file</quote>. When Mercurial is about to write to one of a 1.1415 + revlog's files, it checks to see if the number of names 1.1416 + pointing at the file is greater than one. If it is, more than 1.1417 + one repository is using the file, so Mercurial makes a new 1.1418 + copy of the file that is private to this repository.</para> 1.1419 + 1.1420 + <para id="x_334">A few revision control developers have pointed out that 1.1421 + this idea of making a complete private copy of a file is not 1.1422 + very efficient in its use of storage. While this is true, 1.1423 + storage is cheap, and this method gives the highest 1.1424 + performance while deferring most book-keeping to the operating 1.1425 + system. 
An alternative scheme would most likely reduce 1.1426 + performance and increase the complexity of the software, but 1.1427 + speed and simplicity are key to the <quote>feel</quote> of 1.1428 + day-to-day use.</para> 1.1429 + 1.1430 + </sect2> 1.1431 + <sect2> 1.1432 + <title>Other contents of the dirstate</title> 1.1433 + 1.1434 + <para id="x_335">Because Mercurial doesn't force you to tell it when you're 1.1435 + modifying a file, it uses the dirstate to store some extra 1.1436 + information so it can determine efficiently whether you have 1.1437 + modified a file. For each file in the working directory, it 1.1438 + stores the time that it last modified the file itself, and the 1.1439 + size of the file at that time.</para> 1.1440 + 1.1441 + <para id="x_336">When you explicitly <command role="hg-cmd">hg 1.1442 + add</command>, <command role="hg-cmd">hg remove</command>, 1.1443 + <command role="hg-cmd">hg rename</command> or <command 1.1444 + role="hg-cmd">hg copy</command> files, Mercurial updates the 1.1445 + dirstate so that it knows what to do with those files when you 1.1446 + commit.</para> 1.1447 + 1.1448 + <para id="x_337">The dirstate helps Mercurial to efficiently 1.1449 + check the status of files in a repository.</para> 1.1450 + 1.1451 + <itemizedlist> 1.1452 + <listitem> 1.1453 + <para id="x_726">When Mercurial checks the state of a file in the 1.1454 + working directory, it first checks a file's modification 1.1455 + time against the time in the dirstate that records when 1.1456 + Mercurial last wrote the file. If the last modified time 1.1457 + is the same as the time when Mercurial wrote the file, the 1.1458 + file must not have been modified, so Mercurial does not 1.1459 + need to check any further.</para> 1.1460 + </listitem> 1.1461 + <listitem> 1.1462 + <para id="x_727">If the file's size has changed, the file must have 1.1463 + been modified. 
If the modification time has changed, but 1.1464 + the size has not, only then does Mercurial need to 1.1465 + actually read the contents of the file to see if it has 1.1466 + changed.</para> 1.1467 + </listitem> 1.1468 + </itemizedlist> 1.1469 + 1.1470 + <para id="x_728">Storing the modification time and size dramatically 1.1471 + reduces the number of read operations that Mercurial needs to 1.1472 + perform when we run commands like <command>hg status</command>. 1.1473 + This results in large performance improvements.</para> 1.1474 + </sect2> 1.1475 + </sect1> 1.1476 </chapter> 1.1477 1.1478 <!-- 1.1479 local variables: 1.1480 sgml-parent-document: ("00book.xml" "book" "chapter") 1.1481 end: 1.1482 ---> 1.1483 \ No newline at end of file 1.1484 +-->
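The status check described in the final section of the diff above (compare the modification time first, then the size, and read the file only as a last resort) can be sketched in Python. This is an illustrative sketch under stated assumptions, not Mercurial's actual dirstate code: `DirstateEntry`, `file_is_modified` and the `clean_contents` parameter are hypothetical names chosen for this example, and the real dirstate records more than is shown here.

```python
import os
from dataclasses import dataclass


@dataclass
class DirstateEntry:
    """Hypothetical stand-in for what the dirstate records per file."""
    mtime: int  # modification time (whole seconds) when the file was last written
    size: int   # size of the file in bytes at that time


def file_is_modified(path, entry, clean_contents):
    """Return True if the file at `path` differs from its recorded state.

    `clean_contents` is the committed content as bytes (an assumption of
    this sketch); it is consulted only when the cheap checks are inconclusive.
    """
    st = os.stat(path)
    # Same modification time as recorded: treat as unmodified, no read needed.
    if int(st.st_mtime) == entry.mtime:
        return False
    # Size differs: the file must have been modified, still no read needed.
    if st.st_size != entry.size:
        return True
    # Modification time changed but size did not: only now read the contents.
    with open(path, "rb") as f:
        return f.read() != clean_contents
```

The order of the checks is the point: both early returns rely only on `os.stat`, so the file is opened and read only when the modification time has changed and the size has not.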