<!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : -->

<chapter id="chap:concepts">
  <?dbhtml filename="behind-the-scenes.html"?>
  <title>Behind the scenes</title>

  <para id="x_2e8">Unlike many revision control tools, the concepts on
    which Mercurial is built are simple enough that it is easy to
    understand how the software works.  Although knowing them is not
    necessary, I find it useful to have a <quote>mental model</quote>
    of what is going on.</para>

  <para id="x_2e9">This understanding gives me confidence that Mercurial
    has been carefully designed to be both
    <emphasis>safe</emphasis> and <emphasis>efficient</emphasis>.
    Moreover, if it is easy for me to keep in mind what the software is
    doing when I perform a revision control task, I am less likely to
    be surprised by its behavior.</para>

  <para id="x_2ea">In this chapter, we will first cover the core concepts
    behind Mercurial's design, then go on to discuss some of the
    interesting details of its implementation.</para>

  <sect1>
    <title>Mercurial's historical record</title>
    <sect2>
      <title>Tracking the history of a single file</title>

      <para id="x_2eb">When Mercurial tracks modifications to a file, it
        stores the history of that file in a metadata object called a
        <emphasis>filelog</emphasis>.  Each entry in the filelog
        contains enough information to reconstruct one revision of the
        corresponding file.  Filelogs are stored as files in the
        <filename role="special"
          class="directory">.hg/store/data</filename> directory.  A
        filelog contains two kinds of information: revision data, and
        an index that lets Mercurial find a given revision
        efficiently.</para>

      <para id="x_2ec">When a file grows large or accumulates a long
        history, its filelog is stored in separate data
        (<quote><literal>.d</literal></quote> suffix) and index
        (<quote><literal>.i</literal></quote> suffix) files.  The
        correspondence between a file in the working directory and the
        filelog that tracks its history in the repository is
        illustrated in <xref linkend="fig:concepts:filelog"/>.</para>

      <figure id="fig:concepts:filelog">
        <title>Relationships between files in the working directory and
          their filelogs in the repository</title>
        <mediaobject>
          <imageobject><imagedata fileref="figs/filelog.png"/></imageobject>
          <textobject><phrase>XXX add text</phrase></textobject>
        </mediaobject>
      </figure>

    </sect2>
    <sect2>
      <title>Managing tracked files</title>

      <para id="x_2ee">Mercurial uses a structure called a
        <emphasis>manifest</emphasis> to collect together information
        about the files that it tracks.  Each entry in the manifest
        contains information about the files present in a single
        revision.  An entry records the list of files that are part of
        the revision, the version of each file, and a few other pieces
        of file metadata.</para>

    </sect2>
    <sect2>
      <title>Recording changeset information</title>

      <para id="x_2ef">The <emphasis>changelog</emphasis> contains
        information about each changeset.  Each revision records who
        committed a change, the changeset comment, other pieces of
        changeset-related information, and the revision of the manifest to
        use.</para>

    </sect2>
    <sect2>
      <title>Relationships between revisions</title>

      <para id="x_2f0">Within a changelog, a manifest, or a filelog, each
        revision stores a pointer to its immediate parent (or to its
        two parents, if it's a merge revision).  As I mentioned above,
        there are also relationships between revisions
        <emphasis>across</emphasis> these structures, and they are
        hierarchical in nature.</para>

      <para id="x_2f1">For every changeset in a repository, there is exactly one
        revision stored in the changelog.  Each revision of the
        changelog contains a pointer to a single revision of the
        manifest.  A revision of the manifest stores a pointer to a
        single revision of each filelog tracked when that changeset
        was created.  These relationships are illustrated in
        <xref linkend="fig:concepts:metadata"/>.</para>

      <figure id="fig:concepts:metadata">
        <title>Metadata relationships</title>
        <mediaobject>
          <imageobject><imagedata fileref="figs/metadata.png"/></imageobject>
          <textobject><phrase>XXX add text</phrase></textobject>
        </mediaobject>
      </figure>

      <para id="x_2f3">As the illustration shows, there is
        <emphasis>not</emphasis> a <quote>one to one</quote>
        relationship between revisions in the changelog, manifest, or
        filelog.  If a file that
        Mercurial tracks hasn't changed between two changesets, the
        entry for that file in the two revisions of the manifest will
        point to the same revision of its filelog<footnote>
          <para id="x_725">It is possible (though unusual) for the manifest to
            remain the same between two changesets, in which case the
            changelog entries for those changesets will point to the
            same revision of the manifest.</para>
        </footnote>.</para>
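      <para>The pointer chain described above can be sketched in a few
        lines of Python.  This is only a toy model, not Mercurial's
        on-disk format: the names <literal>ChangelogEntry</literal>,
        <literal>manifests</literal>, and
        <literal>filelog_revs</literal> are made up for illustration,
        and revisions are identified by small integers rather than
        hashes.</para>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangelogEntry:
    """One changelog revision (hypothetical, heavily simplified)."""
    user: str
    comment: str
    manifest_rev: int  # pointer to exactly one manifest revision


# Each manifest revision maps a tracked file name to one revision of
# that file's filelog.
manifests = [
    {"README": 0, "main.c": 0},  # manifest revision 0
    {"README": 0, "main.c": 1},  # manifest revision 1: only main.c changed
]

changelog = [
    ChangelogEntry("alice", "initial import", manifest_rev=0),
    ChangelogEntry("bob", "fix main.c", manifest_rev=1),
]


def filelog_revs(changeset_rev):
    """Follow changelog -> manifest -> filelog pointers for a changeset."""
    entry = changelog[changeset_rev]
    return manifests[entry.manifest_rev]
```

      <para>In this sketch, <literal>README</literal> did not change
        between the two changesets, so both manifest revisions point to
        the same filelog revision for it, while
        <literal>main.c</literal> gains a new one.</para>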

    </sect2>
  </sect1>
  <sect1>
    <title>Safe, efficient storage</title>

    <para id="x_2f4">The underpinnings of changelogs, manifests, and filelogs are
      provided by a single structure called the
      <emphasis>revlog</emphasis>.</para>

    <sect2>
      <title>Efficient storage</title>

      <para id="x_2f5">The revlog provides efficient storage of revisions using a
        <emphasis>delta</emphasis> mechanism.  Instead of storing a
        complete copy of a file for each revision, it stores the
        changes needed to transform an older revision into the new
        revision.  For many kinds of file data, these deltas are
        typically a fraction of a percent of the size of a full copy
        of a file.</para>

      <para id="x_2f6">Some obsolete revision control systems can only work with
        deltas of text files.  They must either store binary files as
        complete snapshots or encoded into a text representation, both
        of which are wasteful approaches.  Mercurial can efficiently
        handle deltas of files with arbitrary binary contents; it
        doesn't need to treat text as special.</para>
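      <para>To make the idea of a binary delta concrete, here is a
        minimal sketch in Python.  It is not Mercurial's own delta
        algorithm: it borrows <literal>difflib.SequenceMatcher</literal>
        from the standard library, and the hunk layout
        <literal>(start, end, replacement)</literal> and the function
        names are assumptions chosen for readability.  The point is
        only that the code works on raw bytes and never looks for line
        endings.</para>

```python
from difflib import SequenceMatcher


def make_delta(old: bytes, new: bytes):
    """Return hunks (start, end, replacement) that turn old into new."""
    hunks = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Replace old[i1:i2] with the bytes new[j1:j2].
            hunks.append((i1, i2, new[j1:j2]))
    return hunks


def apply_delta(old: bytes, hunks) -> bytes:
    """Rebuild the new revision from the old one plus its delta."""
    out = bytearray()
    pos = 0
    for start, end, replacement in hunks:
        out += old[pos:start]  # copy the unchanged run
        out += replacement     # splice in the changed bytes
        pos = end
    out += old[pos:]
    return bytes(out)
```

      <para>Applying the delta to the old revision gives back the new
        one, and for a small edit the replacement bytes are far smaller
        than a full copy of the file.</para>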

    </sect2>
    <sect2 id="sec:concepts:txn">
      <title>Safe operation</title>

      <para id="x_2f7">Mercurial only ever <emphasis>appends</emphasis> data to
        the end of a revlog file.  It never modifies a section of a
        file after it has written it.  This is both more robust and
        more efficient than schemes that need to modify or rewrite
        data.</para>

      <para id="x_2f8">In addition, Mercurial treats every write as part of a
        <emphasis>transaction</emphasis> that can span a number of
        files.  A transaction is <emphasis>atomic</emphasis>: either
        the entire transaction succeeds and its effects are all
        visible to readers in one go, or the whole thing is undone.
        This guarantee of atomicity means that if you're running two
        copies of Mercurial, where one is reading data and one is
        writing it, the reader will never see a partially written
        result that might confuse it.</para>

      <para id="x_2f9">The fact that Mercurial only appends to files makes it
        easier to provide this transactional guarantee.  The easier it
        is to do stuff like this, the more confident you should be
        that it's done correctly.</para>
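      <para>Here is a rough sketch of why append-only files make undo
        easy.  The class <literal>AppendOnlyTransaction</literal> and
        its methods are hypothetical, and the sketch covers only the
        <quote>whole thing is undone</quote> half of the guarantee; it
        does not model how readers are kept from seeing unfinished
        writes.  Because data is only ever added at the end of a file,
        remembering each file's original length is enough to roll
        back.</para>

```python
import os
import tempfile


class AppendOnlyTransaction:
    """Toy transaction over append-only files (illustrative only)."""

    def __init__(self):
        self._journal = {}  # path -> length before the first append

    def append(self, path, data):
        if path not in self._journal:
            size = os.path.getsize(path) if os.path.exists(path) else 0
            self._journal[path] = size
        with open(path, "ab") as f:
            f.write(data)

    def commit(self):
        # Nothing to undo any more: forget the recorded lengths.
        self._journal.clear()

    def rollback(self):
        # Undo by cutting every touched file back to its old length.
        for path, length in self._journal.items():
            with open(path, "r+b") as f:
                f.truncate(length)
        self._journal.clear()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        first = os.path.join(tmp, "a.i")
        second = os.path.join(tmp, "b.i")
        with open(first, "wb") as f:
            f.write(b"old")
        txn = AppendOnlyTransaction()
        txn.append(first, b"+new")
        txn.append(second, b"data")
        txn.rollback()
        with open(first, "rb") as f:
            restored = f.read()
        return restored, os.path.getsize(second)
```

      <para>A scheme that rewrote data in place would have to save the
        overwritten bytes somewhere before it could offer the same
        undo.</para>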

    </sect2>
    <sect2>
      <title>Fast retrieval</title>

      <para id="x_2fa">Mercurial cleverly avoids a pitfall common to
        all earlier revision control systems: the problem of
        <emphasis>inefficient retrieval</emphasis>. Most revision
        control systems store the contents of a revision as an
        incremental series of modifications against a
        <quote>snapshot</quote>.  (Some base the snapshot on the
        oldest revision, others on the newest.)  To reconstruct a
        specific revision, you must first read the snapshot, and then
        every one of the revisions between the snapshot and your
        target revision.  The more history that a file accumulates,
        the more revisions you must read, hence the longer it takes to
        reconstruct a particular revision.</para>

      <figure id="fig:concepts:snapshot">
        <title>Snapshot of a revlog, with incremental deltas</title>
        <mediaobject>
          <imageobject><imagedata fileref="figs/snapshot.png"/></imageobject>
          <textobject><phrase>XXX add text</phrase></textobject>
        </mediaobject>
      </figure>

      <para id="x_2fc">The innovation that Mercurial applies to this problem is
        simple but effective.  Once the cumulative amount of delta
        information stored since the last snapshot exceeds a fixed
        threshold, it stores a new snapshot (compressed, of course),
        instead of another delta.  This makes it possible to
        reconstruct <emphasis>any</emphasis> revision of a file
        quickly.  This approach works so well that it has since been
        copied by several other revision control systems.</para>

      <para id="x_2fd"><xref linkend="fig:concepts:snapshot"/> illustrates
        the idea.  In an entry in a revlog's index file, Mercurial
        stores the range of entries from the data file that it must
        read to reconstruct a particular revision.</para>
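      <para>The snapshot rule can be sketched as a toy revlog in Python.
        Everything here is an assumption made for illustration: the
        class <literal>MiniRevlog</literal>, the crude prefix and suffix
        delta, and the way the threshold is measured (bytes of
        replacement data since the last snapshot).  Mercurial's real
        heuristics and storage format are more involved.</para>

```python
def diff(old, new):
    """Crude delta: trim the common prefix and suffix."""
    limit = min(len(old), len(new))
    p = 0
    while limit > p and old[p] == new[p]:
        p += 1
    s = 0
    while limit - p > s and old[-1 - s] == new[-1 - s]:
        s += 1
    return (p, len(old) - s, new[p:len(new) - s])


def patch(old, delta):
    start, end, replacement = delta
    return old[:start] + replacement + old[end:]


class MiniRevlog:
    def __init__(self, threshold):
        self.threshold = threshold
        self.entries = []   # (base revision, snapshot text or delta)
        self._pending = 0   # delta bytes stored since the last snapshot
        self._tip = b""

    def add(self, text):
        rev = len(self.entries)
        delta = diff(self._tip, text)
        cost = len(delta[2])
        if rev == 0 or self._pending + cost > self.threshold:
            self.entries.append((rev, text))     # new snapshot
            self._pending = 0
        else:
            base = self.entries[rev - 1][0]
            self.entries.append((base, delta))   # delta on the same base
            self._pending += cost
        self._tip = text
        return rev

    def read(self, rev):
        """Return (text, number of entries that had to be read)."""
        base = self.entries[rev][0]
        text = self.entries[base][1]
        for r in range(base + 1, rev + 1):
            text = patch(text, self.entries[r][1])
        return text, rev - base + 1
```

      <para>Each entry remembers the snapshot it is based on, so
        <literal>read</literal> only touches the entries from that
        snapshot up to the requested revision, however long the full
        history is.</para>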

      <sect3>
        <title>Aside: the influence of video compression</title>

        <para id="x_2fe">If you're familiar with video compression or
          have ever watched a TV feed through a digital cable or
          satellite service, you may know that most video compression
          schemes store each frame of video as a delta against its
          predecessor frame.  They also insert a complete
          <quote>key frame</quote> into the stream periodically, so
          that a decoder never has to replay a long run of deltas to
          rebuild a picture.</para>

        <para id="x_2ff">Mercurial borrows this idea to make it
          possible to reconstruct a revision from a snapshot and a
          small number of deltas.</para>

      </sect3>
    </sect2>
    <sect2>
      <title>Identification and strong integrity</title>

      <para id="x_300">Along with delta or snapshot information, a revlog entry
        contains a cryptographic hash of the data that it represents.
        This makes it difficult to forge the contents of a revision,
        and easy to detect accidental corruption.</para>

      <para id="x_301">Hashes provide more than a mere check against corruption;
        they are used as the identifiers for revisions.  The changeset
        identification hashes that you see as an end user are from
        revisions of the changelog.  Although filelogs and the
        manifest also use hashes, Mercurial only uses these behind the
        scenes.</para>

      <para id="x_302">Mercurial verifies that hashes are correct when it
        retrieves file revisions and when it pulls changes from
        another repository.  If it encounters an integrity problem, it
        will complain and stop whatever it's doing.</para>
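      <para>As an illustration of a hash doing double duty as identifier
        and integrity check, here is a small Python sketch.  The text
        above does not name the hash function or say exactly what is
        hashed; the sketch assumes SHA-1 over the two parent IDs (in
        sorted order) followed by the revision text, which is my
        understanding of the classic revlog scheme, so treat those
        details as an assumption.  The function names are
        hypothetical.</para>

```python
import hashlib

NULL_ID = b"\0" * 20  # stands for "no parent here"


def node_id(text, p1=NULL_ID, p2=NULL_ID):
    """Identifier for a revision: a hash of its parents and contents."""
    first, second = sorted((p1, p2))
    return hashlib.sha1(first + second + text).digest()


def verify(text, p1, p2, expected):
    """Recompute the hash on retrieval and compare it with the stored one."""
    return node_id(text, p1, p2) == expected
```

      <para>A single flipped byte in the stored text changes the
        recomputed hash, so <literal>verify</literal> fails and the
        corruption is noticed instead of being silently passed
        on.</para>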

      <para id="x_303">In addition to the effect it has on retrieval efficiency,
        Mercurial's use of periodic snapshots makes it more robust
        against partial data corruption.  If a revlog becomes partly
        corrupted due to a hardware error or system bug, it's often
        possible to reconstruct some or most revisions from the
        uncorrupted sections of the revlog, both before and after the
        corrupted section.  This would not be possible with a
        delta-only storage model.</para>
    </sect2>
  </sect1>

  <sect1>
    <title>Revision history, branching, and merging</title>

    <para id="x_304">Every entry in a Mercurial revlog knows the identity of its
      immediate ancestor revision, usually referred to as its
      <emphasis>parent</emphasis>.  In fact, a revision contains room
      for not one parent, but two.  Mercurial uses a special hash,
      called the <quote>null ID</quote>, to represent the idea
      <quote>there is no parent here</quote>.  This hash is simply a
      string of zeroes.</para>

    <para id="x_305">In <xref linkend="fig:concepts:revlog"/>, you can see
      an example of the conceptual structure of a revlog.  Filelogs,
      manifests, and changelogs all have this same structure; they
      differ only in the kind of data stored in each delta or
      snapshot.</para>

    <para id="x_306">The first revision in a revlog (at the bottom of the image)
      has the null ID in both of its parent slots.  For a
      <quote>normal</quote> revision, its first parent slot contains
      the ID of its parent revision, and its second contains the null
      ID, indicating that the revision has only one real parent.  Any
      two revisions that have the same parent ID are branches.  A
      revision that represents a merge between branches has two normal
      revision IDs in its parent slots.</para>
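    <para>The rules for the two parent slots translate directly into
      code.  The following Python sketch uses made-up one-letter
      revision IDs in place of real hashes, and the helper names
      <literal>classify</literal> and
      <literal>branch_points</literal> are mine, not
      Mercurial's.</para>

```python
from collections import Counter

NULL_ID = "0" * 40  # a string of zeroes: "there is no parent here"


def classify(parents):
    """Name the kind of revision from its two parent slots."""
    p1, p2 = parents
    if p1 == NULL_ID and p2 == NULL_ID:
        return "first revision"
    if p2 == NULL_ID:
        return "normal"
    return "merge"


def branch_points(revlog):
    """Return the IDs that appear as a parent of more than one revision."""
    counts = Counter(
        p for parents in revlog.values() for p in parents if p != NULL_ID
    )
    return [rev for rev, n in counts.items() if n > 1]


# A tiny history: "b" and "c" both descend from "a", and "d" merges them.
example = {
    "a": (NULL_ID, NULL_ID),
    "b": ("a", NULL_ID),
    "c": ("a", NULL_ID),
    "d": ("b", "c"),
}
```

    <para>In the example, <literal>b</literal> and
      <literal>c</literal> share the parent <literal>a</literal>, so
      they are branches, and <literal>d</literal> is the merge that
      joins them again.</para>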

    <figure id="fig:concepts:revlog">
      <title>The conceptual structure of a revlog</title>
      <mediaobject>
        <imageobject><imagedata fileref="figs/revlog.png"/></imageobject>
        <textobject><phrase>XXX add text</phrase></textobject>
      </mediaobject>
    </figure>

  </sect1>
  <sect1>
    <title>The working directory</title>

    <para id="x_307">In the working directory, Mercurial stores a snapshot of the
      files from the repository as of a particular changeset.</para>

    <para id="x_308">The working directory <quote>knows</quote> which changeset
      it contains.  When you update the working directory to contain a
      particular changeset, Mercurial looks up the appropriate
      revision of the manifest to find out which files it was tracking
      at the time that changeset was committed, and which revision of
      each file was then current.  It then recreates a copy of each of
      those files, with the same contents it had when the changeset
      was committed.</para>

    <para id="x_309">The <emphasis>dirstate</emphasis> is a special
      structure that contains Mercurial's knowledge of the working
      directory.  It is maintained as a file named
      <filename>.hg/dirstate</filename> inside a repository.  The
      dirstate details which changeset the working directory is
      updated to, and all of the files that Mercurial is tracking in
      the working directory. It also lets Mercurial quickly notice
      changed files, by recording their checkout times and
      sizes.</para>
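    <para>The quick check that recorded times and sizes make possible
      might look something like the Python sketch below.  It is a
      guess at the general shape, not the dirstate's actual logic or
      file format; <literal>snapshot</literal> and
      <literal>looks_modified</literal> are hypothetical names, and a
      real tool would still need to compare contents when the cheap
      check is inconclusive.</para>

```python
import os
import tempfile


def snapshot(path):
    """Record what a cheap stat() call can tell us about a file."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns)


def looks_modified(path, recorded):
    """True if the size or modification time differs from the record."""
    return snapshot(path) != recorded


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "hello.c")
        with open(path, "w") as f:
            f.write("int main(void) { return 0; }\n")
        recorded = snapshot(path)
        before = looks_modified(path, recorded)
        with open(path, "a") as f:
            f.write("/* edited */\n")
        after = looks_modified(path, recorded)
        return before, after
```

    <para>The appeal of this approach is that an untouched file is
      dismissed without its contents ever being read.</para>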

    <para id="x_30a">Just as a revision of a revlog has room for two parents, so
      that it can represent either a normal revision (with one parent)
      or a merge of two earlier revisions, the dirstate has slots for
      two parents.  When you use the <command role="hg-cmd">hg
        update</command> command, the changeset that you update to is
      stored in the <quote>first parent</quote> slot, and the null ID
      in the second. When you <command role="hg-cmd">hg
        merge</command> with another changeset, the first parent
      remains unchanged, and the second parent is filled in with the
      changeset you're merging with.  The <command role="hg-cmd">hg
        parents</command> command tells you what the parents of the
      dirstate are.</para>

    <sect2>
      <title>What happens when you commit</title>

      <para id="x_30b">The dirstate stores parent information for more than just
        book-keeping purposes.  Mercurial uses the parents of the
        dirstate as <emphasis>the parents of a new
          changeset</emphasis> when you perform a commit.</para>

      <figure id="fig:concepts:wdir">
        <title>The working directory can have two parents</title>
        <mediaobject>
          <imageobject><imagedata fileref="figs/wdir.png"/></imageobject>
          <textobject><phrase>XXX add text</phrase></textobject>
        </mediaobject>
      </figure>

      <para id="x_30d"><xref linkend="fig:concepts:wdir"/> shows the
        normal state of the working directory, where it has a single
        changeset as parent.  That changeset is the
        <emphasis>tip</emphasis>, the newest changeset in the
        repository that has no children.</para>

      <figure id="fig:concepts:wdir-after-commit">
        <title>The working directory gains new parents after a
          commit</title>
        <mediaobject>
          <imageobject><imagedata fileref="figs/wdir-after-commit.png"/></imageobject>
          <textobject><phrase>XXX add text</phrase></textobject>
        </mediaobject>
      </figure>

      <para id="x_30f">It's useful to think of the working directory as
        <quote>the changeset I'm about to commit</quote>.  Any files
        that you tell Mercurial that you've added, removed, renamed,
        or copied will be reflected in that changeset, as will
        modifications to any files that Mercurial is already tracking;
        the new changeset will have the parents of the working
        directory as its parents.</para>

      <para id="x_310">After a commit, Mercurial will update the
        parents of the working directory, so that the first parent is
        the ID of the new changeset, and the second is the null ID.
        This is shown in <xref
          linkend="fig:concepts:wdir-after-commit"/>. Mercurial
        doesn't touch any of the files in the working directory when
        you commit; it just modifies the dirstate to note its new
        parents.</para>
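      <para>The bookkeeping for the two parent slots is small enough to
        model in a few lines of Python.  The class
        <literal>ToyRepo</literal> below is purely illustrative: it
        tracks parents only, uses integers for changeset IDs and
        <literal>None</literal> for the null ID, and its method names
        merely echo the commands they imitate.</para>

```python
NULL_ID = None  # stand-in for the null ID


class ToyRepo:
    def __init__(self):
        self.changesets = {}  # changeset ID -> (first parent, second parent)
        self.dirstate_parents = (NULL_ID, NULL_ID)
        self._next_id = 0

    def update(self, changeset):
        # Like "hg update": first slot is the target, second is null.
        self.dirstate_parents = (changeset, NULL_ID)

    def merge(self, other):
        # Like "hg merge": first parent unchanged, second filled in.
        self.dirstate_parents = (self.dirstate_parents[0], other)

    def commit(self):
        # The dirstate's parents become the new changeset's parents...
        new_id = self._next_id
        self._next_id += 1
        self.changesets[new_id] = self.dirstate_parents
        # ...and the dirstate then points at the new changeset alone.
        self.dirstate_parents = (new_id, NULL_ID)
        return new_id
```

      <para>Notice that <literal>commit</literal> never looks at any
        files in this model; all it changes is the record of
        parents.</para>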

    </sect2>
    <sect2>
      <title>Creating a new head</title>

      <para id="x_311">It's perfectly normal to update the working directory to a
        changeset other than the current tip.  For example, you might
        want to know what your project looked like last Tuesday, or
        you could be looking through changesets to see which one
        introduced a bug.  In cases like this, the natural thing to do
        is update the working directory to the changeset you're
        interested in, and then examine the files in the working
        directory directly to see their contents as they were when you
        committed that changeset.  The effect of this is shown in
        <xref linkend="fig:concepts:wdir-pre-branch"/>.</para>

      <figure id="fig:concepts:wdir-pre-branch">
        <title>The working directory, updated to an older
          changeset</title>
        <mediaobject>
          <imageobject><imagedata fileref="figs/wdir-pre-branch.png"/></imageobject>
          <textobject><phrase>XXX add text</phrase></textobject>
        </mediaobject>
      </figure>

      <para id="x_313">Having updated the working directory to an
        older changeset, what happens if you make some changes, and
        then commit?  Mercurial behaves in the same way as I outlined
        above.  The parents of the working directory become the
        parents of the new changeset.  This new changeset has no
        children, so it becomes the new tip.  And the repository now
        contains two changesets that have no children; we call these
        <emphasis>heads</emphasis>.  You can see the structure that
        this creates in <xref
          linkend="fig:concepts:wdir-branch"/>.</para>

      <figure id="fig:concepts:wdir-branch">
        <title>After a commit made while synced to an older
          changeset</title>
        <mediaobject>
          <imageobject><imagedata fileref="figs/wdir-branch.png"/></imageobject>
          <textobject><phrase>XXX add text</phrase></textobject>
        </mediaobject>
      </figure>
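      <para>Since a head is simply a changeset with no children, finding
        the heads of a history is a short computation.  The Python
        sketch below assumes a history given as a mapping from
        changeset number to its pair of parents, with
        <literal>None</literal> standing in for the null ID; the
        function name and the sample history are invented for the
        example.</para>

```python
def heads(parents_of):
    """Return the changesets that are nobody's parent."""
    has_child = set()
    for p1, p2 in parents_of.values():
        has_child.update(p for p in (p1, p2) if p is not None)
    return [rev for rev in parents_of if rev not in has_child]


# Changesets 0 to 3 form a straight line.  Changeset 4 was committed
# after updating the working directory back to changeset 1.
history = {
    0: (None, None),
    1: (0, None),
    2: (1, None),
    3: (2, None),
    4: (1, None),
}
```

      <para>Here <literal>heads(history)</literal> reports changesets 3
        and 4: the old tip and the changeset committed on top of the
        older one.</para>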

      <note>
        <para id="x_315">If you're new to Mercurial, you should keep
          in mind a common <quote>error</quote>, which is to use the
          <command role="hg-cmd">hg pull</command> command without any
          options.  By default, the <command role="hg-cmd">hg
            pull</command> command <emphasis>does not</emphasis>
          update the working directory, so you'll bring new changesets
          into your repository, but the working directory will stay
          synced at the same changeset as before the pull.  If you
          make some changes and commit afterwards, you'll thus create
          a new head, because your working directory isn't synced to
          whatever the current tip is.  To combine the operation of a
          pull, followed by an update, run <command>hg pull
            -u</command>.</para>

        <para id="x_316">I put the word <quote>error</quote> in quotes
          because all that you need to do to rectify the situation
          where you created a new head by accident is
          <command role="hg-cmd">hg merge</command>, then <command
            role="hg-cmd">hg commit</command>.  In other words, this
          almost never has negative consequences; it's just something
          of a surprise for newcomers.  I'll discuss other ways to
          avoid this behavior, and why Mercurial behaves in this
          initially surprising way, later on.</para>
      </note>

    </sect2>
    <sect2>
      <title>Merging changes</title>

      <para id="x_317">When you run the <command role="hg-cmd">hg
          merge</command> command, Mercurial leaves the first parent
        of the working directory unchanged, and sets the second parent
        to the changeset you're merging with, as shown in <xref
          linkend="fig:concepts:wdir-merge"/>.</para>

      <figure id="fig:concepts:wdir-merge">
        <title>Merging two heads</title>
        <mediaobject>
          <imageobject>
            <imagedata fileref="figs/wdir-merge.png"/>
          </imageobject>
          <textobject><phrase>XXX add text</phrase></textobject>
        </mediaobject>
      </figure>

      <para id="x_319">Mercurial also has to modify the working directory, to
        merge the files managed in the two changesets.  Simplified a
        little, the merging process goes like this, for every file in
        the manifests of both changesets:</para>
      <itemizedlist>
        <listitem><para id="x_31a">If neither changeset has modified a file, do
            nothing with that file.</para>
        </listitem>
        <listitem><para id="x_31b">If one changeset has modified a file, and the
            other hasn't, create the modified copy of the file in the
            working directory.</para>
        </listitem>
        <listitem><para id="x_31c">If one changeset has removed a file, and the
            other hasn't (or has also deleted it), delete the file
            from the working directory.</para>
        </listitem>
        <listitem><para id="x_31d">If one changeset has removed a file, but the
            other has modified the file, ask the user what to do: keep
            the modified file, or remove it?</para>
        </listitem>
        <listitem><para id="x_31e">If both changesets have modified a file,
            invoke an external merge program to choose the new
            contents for the merged file.  This may require input from
            the user.</para>
        </listitem>
        <listitem><para id="x_31f">If one changeset has modified a file, and the
            other has renamed or copied the file, make sure that the
            changes follow the new name of the file.</para>
        </listitem></itemizedlist>
      <para id="x_320">There are more details&emdash;merging has plenty of corner
        cases&emdash;but these are the most common choices that are
        involved in a merge.  As you can see, most cases are
        completely automatic, and indeed most merges finish
        automatically, without requiring your input to resolve any
        conflicts.</para>
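      <para>The per-file decisions listed above can be written down as
        one small function.  This Python sketch is a simplification of
        a simplification: it assumes the file existed in the common
        ancestor, represents each side by a revision identifier (or
        <literal>None</literal> if that side removed the file), and
        leaves out renames and copies entirely.  The function name and
        the returned strings are invented, and treating identical
        changes on both sides as <quote>nothing to do</quote> is my own
        addition.</para>

```python
def merge_action(base, local, other):
    """Pick an action for one file.

    base  -- the file's revision in the common ancestor
    local -- its revision in the first parent, or None if removed
    other -- its revision in the second parent, or None if removed
    """
    if local == other:
        # Untouched on both sides, changed identically, or removed by both.
        return "remove" if local is None else "nothing"
    if local is None or other is None:
        survivor = other if local is None else local
        # Removed on one side: safe only if the other side left it alone.
        return "remove" if survivor == base else "ask user"
    if local != base and other != base:
        return "run merge tool"
    return "take modified"
```

      <para>Only two of the outcomes, <literal>"ask user"</literal> and
        <literal>"run merge tool"</literal>, can involve a person,
        which matches the observation that most merges finish on their
        own.</para>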

      <para id="x_321">When you're thinking about what happens when you commit
        after a merge, once again the working directory is <quote>the
          changeset I'm about to commit</quote>.  After the <command
          role="hg-cmd">hg merge</command> command completes, the
        working directory has two parents; these will become the
        parents of the new changeset.</para>

      <para id="x_322">Mercurial lets you perform multiple merges, but
        you must commit the results of each individual merge as you
        go.  This is necessary because Mercurial only tracks two
        parents for both revisions and the working directory.  While
        it would be technically feasible to merge multiple changesets
        at once, Mercurial avoids this for simplicity.  With multi-way
        merges, the risks of user confusion, nasty conflict
        resolution, and making a terrible mess of a merge would grow
        intolerable.</para>

    </sect2>

    <sect2>
      <title>Merging and renames</title>

      <para id="x_69a">A surprising number of revision control systems pay little
        or no attention to a file's <emphasis>name</emphasis> over
        time.  For instance, it used to be common that if a file got
        renamed on one side of a merge, the changes from the other
        side would be silently dropped.</para>

      <para id="x_69b">Mercurial records metadata when you tell it to perform a
        rename or copy.  It uses this metadata to do the right thing
        during a merge.  For instance, if I rename
        a file, and you edit it without renaming it, when we merge our
        work the file will be renamed and have your edits
        applied.</para>
    </sect2>
  </sect1>
bos@620
|
538
|
bos@559
|
539 <sect1>
|
bos@559
|
540 <title>Other interesting design features</title>
|
bos@559
|
541
|
bos@584
|
542 <para id="x_323">In the sections above, I've tried to highlight some of the
|
bos@559
|
543 most important aspects of Mercurial's design, to illustrate that
|
bos@559
|
544 it pays careful attention to reliability and performance.
|
bos@559
|
545 However, the attention to detail doesn't stop there. There are
|
bos@559
|
546 a number of other aspects of Mercurial's construction that I
|
bos@559
|
547 personally find interesting. I'll detail a few of them here,
|
bos@559
|
548 separate from the <quote>big ticket</quote> items above, so that
|
bos@559
|
549 if you're interested, you can gain a better idea of the amount
|
bos@559
|
550 of thinking that goes into a well-designed system.</para>
|
bos@559
|
551
|
bos@559
|
552 <sect2>
|
bos@559
|
553 <title>Clever compression</title>
|
bos@559
|
554
|
bos@584
|
555 <para id="x_324">When appropriate, Mercurial will store both snapshots and
|
bos@559
|
556 deltas in compressed form. It does this by always
|
bos@559
|
557 <emphasis>trying to</emphasis> compress a snapshot or delta,
|
bos@559
|
558 but only storing the compressed version if it's smaller than
|
bos@559
|
559 the uncompressed version.</para>
|
bos@559
|
560
|
bos@584
|
561 <para id="x_325">This means that Mercurial does <quote>the right
|
bos@559
|
562 thing</quote> when storing a file whose native form is
|
bos@559
|
563 compressed, such as a <literal>zip</literal> archive or a JPEG
|
bos@559
|
564 image. When these types of files are compressed a second
|
bos@559
|
565 time, the resulting file is usually bigger than the
|
bos@559
|
566 once-compressed form, and so Mercurial will store the plain
|
bos@559
|
567 <literal>zip</literal> or JPEG.</para>
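<para>To make the <quote>try to compress, keep whichever is smaller</quote>
rule concrete, here is a minimal Python sketch. It is not Mercurial's
actual storage code: the one-byte markers and the names
<literal>store_chunk</literal> and <literal>load_chunk</literal> are
assumptions made up for this illustration, and random bytes stand in for
data that is already compressed.</para>

```python
import os
import zlib

# Hypothetical one-byte markers recording how a chunk was stored.
RAW = b"u"
DEFLATED = b"x"


def store_chunk(data: bytes) -> bytes:
    """Always try to compress, but keep the result only if it is smaller."""
    packed = zlib.compress(data)
    if len(packed) < len(data):
        return DEFLATED + packed
    return RAW + data


def load_chunk(stored: bytes) -> bytes:
    """Reverse store_chunk by inspecting the marker byte."""
    marker, body = stored[:1], stored[1:]
    return zlib.decompress(body) if marker == DEFLATED else body


text = b"a line of source code\n" * 200
already_compressed = os.urandom(4096)  # stand-in for zip or JPEG data

stored_text = store_chunk(text)                 # kept in deflated form
stored_blob = store_chunk(already_compressed)   # kept as-is
```

<para>Ordinary text shrinks and is stored deflated, while the
incompressible data is stored unchanged, so it never pays the cost of a
second, useless round of compression.</para>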
|
bos@559
|
568
|
bos@584
|
569 <para id="x_326">Deltas between revisions of a compressed file are usually
|
bos@559
|
570 larger than snapshots of the file, and Mercurial again does
|
bos@559
|
571 <quote>the right thing</quote> in these cases. It finds that
|
bos@559
|
572 such a delta exceeds the threshold at which it should store a
|
bos@559
|
573 complete snapshot of the file, so it stores the snapshot,
|
bos@559
|
574 again saving space compared to a naive delta-only
|
bos@559
|
575 approach.</para>
|
bos@559
|
576
|
bos@559
|
577 <sect3>
|
bos@559
|
578 <title>Network recompression</title>
|
bos@559
|
579
|
bos@584
|
580 <para id="x_327">When storing revisions on disk, Mercurial uses the
|
bos@559
|
581 <quote>deflate</quote> compression algorithm (the same one
|
bos@559
|
582 used by the popular <literal>zip</literal> archive format),
|
bos@559
|
583 which balances good speed with a respectable compression
|
bos@559
|
584 ratio. However, when transmitting revision data over a
|
bos@559
|
585 network connection, Mercurial first decompresses the stored
|
bos@559
|
586 revision data.</para>
|
bos@559
|
587
|
bos@584
|
588 <para id="x_328">If the connection is over HTTP, Mercurial recompresses
|
bos@559
|
589 the entire stream of data using a compression algorithm that
|
bos@559
|
590 gives a better compression ratio (the Burrows-Wheeler
|
bos@559
|
591 algorithm from the widely used <literal>bzip2</literal>
|
bos@559
|
592 compression package). This combination of algorithm and
|
bos@559
|
593 compression of the entire stream (instead of a revision at a
|
bos@559
|
594 time) substantially reduces the number of bytes to be
|
bos@620
|
595 transferred, yielding better performance over most
|
bos@620
|
596 kinds of network.</para>
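<para>The benefit of compressing the whole stream rather than one
revision at a time is easy to see in a small experiment. The sketch below
uses Python's standard <literal>zlib</literal> and
<literal>bz2</literal> modules on made-up revision data; it illustrates
the idea only and does not reproduce Mercurial's wire protocol.</para>

```python
import bz2
import zlib

# Made-up revision data: many small chunks that resemble one another.
revisions = [
    f"revision {i}: changed line {i % 7} of module_{i % 5}.py\n".encode()
    for i in range(300)
]

# On-disk style: every revision is deflated independently.
per_revision = sum(len(zlib.compress(r)) for r in revisions)

# Network style: concatenate the uncompressed data, then compress the
# whole stream once with the bzip2 algorithm.
whole_stream = len(bz2.compress(b"".join(revisions)))

print("per-revision deflate:", per_revision, "bytes")
print("whole-stream bzip2:  ", whole_stream, "bytes")
```

<para>Small chunks compress poorly on their own because each one carries
its own overhead and cannot share redundancy with its neighbours; a
single pass over the whole stream can exploit the similarity between
revisions.</para>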
|
bos@559
|
597
|
bos@701
|
598 <para id="x_329">If the connection is over
|
bos@701
|
599 <command>ssh</command>, Mercurial
|
bos@701
|
600 <emphasis>doesn't</emphasis> recompress the stream, because
|
bos@701
|
601 <command>ssh</command> can already do this itself. You can
|
bos@701
|
602 tell Mercurial to always use <command>ssh</command>'s
|
bos@701
|
603 compression feature by editing the
|
bos@701
|
604 <filename>.hgrc</filename> file in your home directory as
|
bos@701
|
605 follows.</para>
|
bos@701
|
606
|
bos@701
|
607 <programlisting>[ui]
|
bos@701
|
608 ssh = ssh -C</programlisting>
|
bos@559
|
609
|
bos@559
|
610 </sect3>
|
bos@559
|
611 </sect2>
|
bos@559
|
612 <sect2>
|
bos@559
|
613 <title>Read/write ordering and atomicity</title>
|
bos@559
|
614
|
bos@592
|
615 <para id="x_32a">Appending to files isn't the whole story when
|
bos@592
|
616 it comes to guaranteeing that a reader won't see a partial
|
bos@592
|
617 write. If you recall <xref linkend="fig:concepts:metadata"/>,
|
bos@701
|
618 revisions in the changelog point to revisions in the manifest,
|
bos@701
|
619 and revisions in the manifest point to revisions in filelogs.
|
bos@592
|
620 This hierarchy is deliberate.</para>
|
bos@559
|
621
|
bos@584
|
622 <para id="x_32b">A writer starts a transaction by writing filelog and
|
bos@559
|
623 manifest data, and doesn't write any changelog data until
|
bos@559
|
624 those are finished. A reader starts by reading changelog
|
bos@559
|
625 data, then manifest data, followed by filelog data.</para>
|
bos@559
|
626
|
bos@584
|
627 <para id="x_32c">Since the writer has always finished writing filelog and
|
bos@559
|
628 manifest data before it writes to the changelog, a reader will
|
bos@559
|
629 never read a pointer to a partially written manifest revision
|
bos@559
|
630 from the changelog, and it will never read a pointer to a
|
bos@559
|
631 partially written filelog revision from the manifest.</para>
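<para>The ordering rule can be modelled in a few lines of Python. This is
a deliberately simplified sketch: the three logs are plain in-memory
lists, a manifest entry tracks a single file, and the names
<literal>commit</literal> and <literal>read_tip</literal> are
hypothetical. What it shows is that a reader who starts from the
changelog cannot reach data that the writer has not yet published.</para>

```python
from typing import Optional

# Three append-only logs, modelled as plain Python lists.
filelog = []    # file contents, one entry per file revision
manifest = []   # each entry maps a path to a filelog revision
changelog = []  # each entry points at a manifest revision


def commit(path: str, content: bytes) -> None:
    """Write in dependency order: filelog, then manifest, then changelog."""
    filelog.append(content)
    manifest.append({path: len(filelog) - 1})
    changelog.append({"manifest": len(manifest) - 1})  # published last


def read_tip(path: str) -> Optional[bytes]:
    """Read in the opposite order, starting from the changelog."""
    if not changelog:
        return None  # nothing has been published yet
    manifest_rev = changelog[-1]["manifest"]
    file_rev = manifest[manifest_rev][path]
    return filelog[file_rev]


commit("a.txt", b"version 1")

# Replay a second commit one step at a time, reading between the steps.
filelog.append(b"version 2")
seen_after_filelog = read_tip("a.txt")
manifest.append({"a.txt": 1})
seen_after_manifest = read_tip("a.txt")
changelog.append({"manifest": 1})
seen_after_changelog = read_tip("a.txt")
```

<para>Until the changelog entry is appended, the reader keeps seeing the
previous, complete revision; the new data only becomes reachable once
everything it depends on is already in place.</para>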
|
bos@559
|
632
|
bos@559
|
633 </sect2>
|
bos@559
|
634 <sect2>
|
bos@559
|
635 <title>Concurrent access</title>
|
bos@559
|
636
|
bos@584
|
637 <para id="x_32d">The read/write ordering and atomicity guarantees mean that
|
bos@559
|
638 Mercurial never needs to <emphasis>lock</emphasis> a
|
bos@559
|
639 repository when it's reading data, even if the repository is
|
bos@559
|
640 being written to while the read is occurring. This has a big
|
bos@559
|
641 effect on scalability; you can have an arbitrary number of
|
bos@559
|
642 Mercurial processes safely reading data from a repository
|
bos@701
|
643 all at once, no matter whether it's being written to or
|
bos@559
|
644 not.</para>
|
bos@559
|
645
|
bos@584
|
646 <para id="x_32e">The lockless nature of reading means that if you're
|
bos@559
|
647 sharing a repository on a multi-user system, you don't need to
|
bos@559
|
648 grant other local users permission to
|
bos@559
|
649 <emphasis>write</emphasis> to your repository in order for
|
bos@559
|
650 them to be able to clone it or pull changes from it; they only
|
bos@559
|
651 need <emphasis>read</emphasis> permission. (This is
|
bos@559
|
652 <emphasis>not</emphasis> a common feature among revision
|
bos@559
|
653 control systems, so don't take it for granted! Most require
|
bos@559
|
654 readers to be able to lock a repository to access it safely,
|
bos@559
|
655 and this requires write permission on at least one directory,
|
bos@559
|
656 which of course makes for all kinds of nasty and annoying
|
bos@559
|
657 security and administrative problems.)</para>
|
bos@559
|
658
|
bos@584
|
659 <para id="x_32f">Mercurial uses locks to ensure that only one process can
|
bos@559
|
660 write to a repository at a time (the locking mechanism is safe
|
bos@559
|
661 even over filesystems that are notoriously hostile to locking,
|
bos@559
|
662 such as NFS). If a repository is locked, a writer will wait
|
bos@559
|
663 and retry for a while in case the repository becomes unlocked, but
|
bos@559
|
664 if the repository remains locked for too long, the process
|
bos@559
|
665 attempting to write will time out. This means
|
bos@559
|
666 that your daily automated scripts won't get stuck forever and
|
bos@559
|
667 pile up if a system crashes unnoticed, for example. (Yes, the
|
bos@559
|
668 timeout is configurable, from zero to infinity.)</para>
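<para>The <quote>wait, retry, then give up</quote> behaviour of a writer
can be sketched as follows. This is an illustration only, not
Mercurial's locking code: it assumes a simple lock file created with
<literal>O_EXCL</literal>, which does not by itself give the NFS safety
described above, and the function names and polling interval are
hypothetical.</para>

```python
import os
import tempfile
import time


def acquire_lock(lock_path: str, timeout: float, poll: float = 0.05) -> bool:
    """Try to create lock_path exclusively; give up after timeout seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if the lock file already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            if time.monotonic() >= deadline:
                return False  # still locked: time out instead of hanging
            time.sleep(poll)  # wait a little, then retry
        else:
            os.close(fd)
            return True


def release_lock(lock_path: str) -> None:
    """Drop the lock by removing the lock file."""
    os.remove(lock_path)


with tempfile.TemporaryDirectory() as d:
    lock = os.path.join(d, "lock")
    first = acquire_lock(lock, timeout=1.0)    # nobody holds the lock yet
    second = acquire_lock(lock, timeout=0.2)   # held, so this times out
    release_lock(lock)
    third = acquire_lock(lock, timeout=1.0)    # free again
    release_lock(lock)
```

<para>Because the second attempt returns instead of blocking forever, an
unattended script that hits a stale lock fails promptly rather than
piling up behind it.</para>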
|
bos@559
|
669
|
bos@559
|
670 <sect3>
|
bos@559
|
671 <title>Safe dirstate access</title>
|
bos@559
|
672
|
bos@584
|
673 <para id="x_330">As with revision data, Mercurial doesn't take a lock to
|
bos@559
|
674 read the dirstate file; it does acquire a lock to write it.
|
bos@559
|
675 To avoid the possibility of reading a partially written copy
|
bos@559
|
676 of the dirstate file, Mercurial writes to a file with a
|
bos@559
|
677 unique name in the same directory as the dirstate file, then
|
bos@559
|
678 renames the temporary file atomically to
|
bos@559
|
679 <filename>dirstate</filename>. The file named
|
bos@559
|
680 <filename>dirstate</filename> is thus guaranteed to be
|
bos@559
|
681 complete, not partially written.</para>
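<para>The write-then-rename technique looks roughly like this in Python.
It is a sketch under stated assumptions, not Mercurial's own code: the
helper name <literal>write_atomically</literal> and the
<literal>.tmp-</literal> prefix are made up, and the atomicity of the
final rename is a property of POSIX filesystems.</para>

```python
import os
import tempfile


def write_atomically(path: str, data: bytes) -> None:
    """Write data to a uniquely named temporary file, then rename it."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory as the target, so
    # both names are on the same filesystem and the rename can be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        os.replace(tmp_path, path)  # readers see the old or the new file
    except BaseException:
        os.unlink(tmp_path)  # do not leave a half-written file behind
        raise


with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "dirstate")
    write_atomically(target, b"old state")
    write_atomically(target, b"new state")
    with open(target, "rb") as f:
        final = f.read()
    leftover = os.listdir(d)
```

<para>A reader that opens <filename>dirstate</filename> at any moment
gets either the complete old contents or the complete new contents, never
a mixture of the two.</para>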
|
bos@559
|
682
|
bos@559
|
683 </sect3>
|
bos@559
|
684 </sect2>
|
bos@559
|
685 <sect2>
|
bos@559
|
686 <title>Avoiding seeks</title>
|
bos@559
|
687
|
bos@584
|
688 <para id="x_331">Critical to Mercurial's performance is the avoidance of
|
bos@559
|
689 seeks of the disk head, since any seek is far more expensive
|
bos@559
|
690 than even a comparatively large read operation.</para>
|
bos@559
|
691
|
bos@584
|
692 <para id="x_332">This is why, for example, the dirstate is stored in a
|
bos@559
|
693 single file. If there were a dirstate file per directory that
|
bos@559
|
694 Mercurial tracked, the disk would seek once per directory.
|
bos@559
|
695 Instead, Mercurial reads the entire single dirstate file in
|
bos@559
|
696 one step.</para>
|
bos@559
|
697
|
bos@584
|
698 <para id="x_333">Mercurial also uses a <quote>copy on write</quote> scheme
|
bos@559
|
699 when cloning a repository on local storage. Instead of
|
bos@559
|
700 copying every revlog file from the old repository into the new
|
bos@559
|
701 repository, it makes a <quote>hard link</quote>, which is a
|
bos@559
|
702 shorthand way to say <quote>these two names point to the same
|
bos@559
|
703 file</quote>. When Mercurial is about to write to one of a
|
bos@559
|
704 revlog's files, it checks to see if the number of names
|
bos@559
|
705 pointing at the file is greater than one. If it is, more than
|
bos@559
|
706 one repository is using the file, so Mercurial makes a new
|
bos@559
|
707 copy of the file that is private to this repository.</para>
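<para>Here is a minimal sketch of the hard-link scheme, assuming a
filesystem that supports hard links. The helper names and the
<literal>.cow</literal> suffix are hypothetical; the point is only to
show the link-count check that happens before a write.</para>

```python
import os
import shutil
import tempfile


def clone_file(src: str, dst: str) -> None:
    """'Clone' a file by giving it a second name instead of copying it."""
    os.link(src, dst)


def append_with_cow(path: str, data: bytes) -> None:
    """Break the hard link first if another name still shares the file."""
    if os.stat(path).st_nlink > 1:
        private = path + ".cow"          # hypothetical temporary name
        shutil.copyfile(path, private)   # make a new, unshared copy
        os.replace(private, path)        # path now names the private copy
    with open(path, "ab") as f:
        f.write(data)


with tempfile.TemporaryDirectory() as d:
    original = os.path.join(d, "orig.i")
    clone = os.path.join(d, "clone.i")
    with open(original, "wb") as f:
        f.write(b"rev0\n")

    clone_file(original, clone)
    links_after_clone = os.stat(original).st_nlink

    append_with_cow(clone, b"rev1\n")
    links_after_write = os.stat(original).st_nlink
    with open(original, "rb") as f:
        original_data = f.read()
    with open(clone, "rb") as f:
        clone_data = f.read()
```

<para>The clone costs almost nothing until one side writes; only then is
the affected file copied, and the other repository's data is left
untouched.</para>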
|
bos@559
|
708
|
bos@584
|
709 <para id="x_334">A few revision control developers have pointed out that
|
bos@559
|
710 this idea of making a complete private copy of a file is not
|
bos@559
|
711 very efficient in its use of storage. While this is true,
|
bos@559
|
712 storage is cheap, and this method gives the highest
|
bos@559
|
713 performance while deferring most book-keeping to the operating
|
bos@559
|
714 system. An alternative scheme would most likely reduce
|
bos@701
|
715 performance and increase the complexity of the software, but
|
bos@701
|
716 speed and simplicity are key to the <quote>feel</quote> of
|
bos@559
|
717 day-to-day use.</para>
|
bos@559
|
718
|
bos@559
|
719 </sect2>
|
bos@559
|
720 <sect2>
|
bos@559
|
721 <title>Other contents of the dirstate</title>
|
bos@559
|
722
|
bos@584
|
723 <para id="x_335">Because Mercurial doesn't force you to tell it when you're
|
bos@559
|
724 modifying a file, it uses the dirstate to store some extra
|
bos@559
|
725 information so it can determine efficiently whether you have
|
bos@559
|
726 modified a file. For each file in the working directory, it
|
bos@559
|
727 stores the time that it last modified the file itself, and the
|
bos@559
|
728 size of the file at that time.</para>
|
bos@559
|
729
|
bos@584
|
730 <para id="x_336">When you explicitly <command role="hg-cmd">hg
|
bos@559
|
731 add</command>, <command role="hg-cmd">hg remove</command>,
|
bos@559
|
732 <command role="hg-cmd">hg rename</command> or <command
|
bos@559
|
733 role="hg-cmd">hg copy</command> files, Mercurial updates the
|
bos@559
|
734 dirstate so that it knows what to do with those files when you
|
bos@559
|
735 commit.</para>
|
bos@559
|
736
|
bos@701
|
737 <para id="x_337">The dirstate helps Mercurial to efficiently
|
bos@701
|
738 check the status of files in a repository.</para>
|
bos@701
|
739
|
bos@701
|
740 <itemizedlist>
|
bos@701
|
741 <listitem>
|
bos@702
|
742 <para id="x_726">When Mercurial checks the state of a file in the
|
bos@701
|
743 working directory, it first checks a file's modification
|
bos@701
|
744 time against the time in the dirstate that records when
|
bos@701
|
745 Mercurial last wrote the file. If the last modified time
|
bos@701
|
746 is the same as the time when Mercurial wrote the file, the
|
bos@701
|
747 file must not have been modified, so Mercurial does not
|
bos@701
|
748 need to check any further.</para>
|
bos@701
|
749 </listitem>
|
bos@701
|
750 <listitem>
|
bos@702
|
751 <para id="x_727">If the file's size has changed, the file must have
|
bos@701
|
752 been modified. If the modification time has changed, but
|
bos@701
|
753 the size has not, only then does Mercurial need to
|
bos@701
|
754 actually read the contents of the file to see if it has
|
bos@701
|
755 changed.</para>
|
bos@701
|
756 </listitem>
|
bos@701
|
757 </itemizedlist>
|
bos@701
|
758
|
bos@702
|
759 <para id="x_728">Storing the modification time and size dramatically
|
bos@701
|
760 reduces the number of read operations that Mercurial needs to
|
bos@701
|
761 perform when we run commands like <command role="hg-cmd">hg status</command>.
|
bos@701
|
762 This results in large performance improvements.</para>
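<para>The checks described above can be sketched in Python as follows.
This is a simplified model, not the real dirstate format: the
<literal>DirstateEntry</literal> type and both function names are
assumptions, and the recorded contents stand in for the committed
revision that Mercurial would actually compare against.</para>

```python
import os
import tempfile
from typing import NamedTuple


class DirstateEntry(NamedTuple):
    mtime_ns: int   # modification time when the file was last written
    size: int       # size of the file at that time
    content: bytes  # stand-in for the committed revision


def record(path: str) -> DirstateEntry:
    """Remember a file's timestamp, size and contents."""
    st = os.stat(path)
    with open(path, "rb") as f:
        return DirstateEntry(st.st_mtime_ns, st.st_size, f.read())


def is_modified(path: str, entry: DirstateEntry) -> bool:
    st = os.stat(path)
    if st.st_mtime_ns == entry.mtime_ns:
        return False  # same timestamp: treated as unmodified, no read
    if st.st_size != entry.size:
        return True   # size differs: certainly modified, no read
    with open(path, "rb") as f:  # only now is a read needed
        return f.read() != entry.content


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "hello.txt")
    with open(path, "wb") as f:
        f.write(b"hello")
    entry = record(path)
    untouched = is_modified(path, entry)

    def rewrite(data: bytes) -> None:
        """Overwrite the file and force a different modification time."""
        with open(path, "wb") as f:
            f.write(data)
        later = entry.mtime_ns + 10_000_000_000  # ten seconds later
        os.utime(path, ns=(later, later))

    rewrite(b"hello")
    touched_same_content = is_modified(path, entry)
    rewrite(b"HELLO")
    same_size_new_content = is_modified(path, entry)
    rewrite(b"hello, world")
    different_size = is_modified(path, entry)
```

<para>Only the two middle cases, where the timestamp changed but the size
did not, ever open the file; everything else is decided from a single
<literal>stat</literal> call.</para>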
|
bos@559
|
763 </sect2>
|
bos@559
|
764 </sect1>
|
belaran@964
|
765 </chapter>
|
belaran@964
|
766
|
belaran@964
|
767 <!--
|
belaran@964
|
768 local variables:
|
belaran@964
|
769 sgml-parent-document: ("00book.xml" "book" "chapter")
|
belaran@964
|
770 end:
|
bos@559
|
771 -->
|