<!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : -->

<chapter id="chap.concepts">
  <?dbhtml filename="behind-the-scenes.html"?>
  <title>Behind the scenes</title>

  <para>Unlike those of many revision control systems, the concepts
    upon which Mercurial is built are simple enough that it's easy to
    understand how the software really works. Knowing this certainly
    isn't necessary, but I find it useful to have a <quote>mental
      model</quote> of what's going on.</para>

  <para>This understanding gives me confidence that Mercurial has been
    carefully designed to be both <emphasis>safe</emphasis> and
    <emphasis>efficient</emphasis>. And just as importantly, if it's
    easy for me to retain a good idea of what the software is doing
    when I perform a revision control task, I'm less likely to be
    surprised by its behaviour.</para>

  <para>In this chapter, we'll first cover the core concepts behind
    Mercurial's design, then discuss some of the interesting details
    of its implementation.</para>

  <sect1>
    <title>Mercurial's historical record</title>

    <sect2>
      <title>Tracking the history of a single file</title>

      <para>When Mercurial tracks modifications to a file, it stores
        the history of that file in a metadata object called a
        <emphasis>filelog</emphasis>. Each entry in the filelog
        contains enough information to reconstruct one revision of the
        file that is being tracked. Filelogs are stored as files in
        the <filename role="special"
          class="directory">.hg/store/data</filename> directory. A
        filelog contains two kinds of information: revision data, and
        an index to help Mercurial find a revision efficiently.</para>

      <para>A file that is large, or has a lot of history, has its
        filelog stored in separate data
        (<quote><literal>.d</literal></quote> suffix) and index
        (<quote><literal>.i</literal></quote> suffix) files. For
        small files without much history, the revision data and index
        are combined in a single <quote><literal>.i</literal></quote>
        file. The correspondence between a file in the working
        directory and the filelog that tracks its history in the
        repository is illustrated in figure <xref
          endterm="fig.concepts.filelog.caption"
          linkend="fig.concepts.filelog"/>.</para>

      <informalfigure id="fig.concepts.filelog">
        <mediaobject>
          <imageobject><imagedata fileref="images/filelog.png"/></imageobject>
          <textobject><phrase>Diagram of the relationships between files
              in the working directory and filelogs in the
              repository</phrase></textobject>
          <caption><para id="fig.concepts.filelog.caption">Relationships
              between files in the working directory and filelogs in
              the repository</para>
          </caption>
        </mediaobject>
      </informalfigure>

    </sect2>
    <sect2>
      <title>Managing tracked files</title>

      <para>Mercurial uses a structure called a
        <emphasis>manifest</emphasis> to collect together information
        about the files that it tracks. Each entry in the manifest
        describes the files present in a single changeset: which files
        are present, the revision of each file, and a few other pieces
        of file metadata.</para>

    </sect2>
    <sect2>
      <title>Recording changeset information</title>

      <para>The <emphasis>changelog</emphasis> contains information
        about each changeset. Each revision records who committed a
        change, the changeset comment, other pieces of
        changeset-related information, and the revision of the
        manifest to use.</para>

    </sect2>
    <sect2>
      <title>Relationships between revisions</title>

      <para>Within a changelog, a manifest, or a filelog, each
        revision stores a pointer to its immediate parent (or to its
        two parents, if it's a merge revision). There are also
        relationships between revisions
        <emphasis>across</emphasis> these structures, and they are
        hierarchical in nature.</para>

      <para>For every changeset in a repository, there is exactly one
        revision stored in the changelog. Each revision of the
        changelog contains a pointer to a single revision of the
        manifest. A revision of the manifest stores a pointer to a
        single revision of each filelog tracked when that changeset
        was created. These relationships are illustrated in figure
        <xref endterm="fig.concepts.metadata.caption"
          linkend="fig.concepts.metadata"/>.</para>

      <informalfigure id="fig.concepts.metadata">
        <mediaobject>
          <imageobject><imagedata fileref="images/metadata.png"/></imageobject>
          <textobject><phrase>Diagram of the metadata relationships
              between the changelog, the manifest, and
              filelogs</phrase></textobject>
          <caption><para id="fig.concepts.metadata.caption">Metadata
              relationships</para></caption>
        </mediaobject>
      </informalfigure>

      <para>As the illustration shows, there is
        <emphasis>not</emphasis> a <quote>one-to-one</quote>
        relationship between revisions in the changelog, manifest, or
        filelog. If the manifest hasn't changed between two
        changesets, the changelog entries for those changesets will
        point to the same revision of the manifest. If a file that
        Mercurial tracks hasn't changed between two changesets, the
        entry for that file in the two revisions of the manifest will
        point to the same revision of its filelog.</para>
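      <para>The pointer structure described above can be sketched with a
        toy in-memory model. This is a hypothetical illustration, not
        Mercurial's code. The names <literal>filelogs</literal>,
        <literal>manifests</literal>, <literal>changelog</literal>,
        <literal>add_file_revision</literal>, and
        <literal>commit</literal> are invented here. Revisions are also
        identified by list position rather than by hash, and the model
        assumes a linear history.</para>

```python
# A toy model of the changelog -> manifest -> filelog pointers.
# All names are hypothetical; real revlogs store compressed deltas on
# disk and identify revisions by hash, not by list position.

filelogs = {}   # file name -> list of file contents, one per filelog revision
manifests = []  # manifest revision -> {file name: filelog revision}
changelog = []  # changeset -> {"user", "desc", "manifest"}


def add_file_revision(name, data):
    """Return the filelog revision for data, appending only if it changed."""
    log = filelogs.setdefault(name, [])
    if log and log[-1] == data:
        return len(log) - 1  # unchanged file: reuse the latest revision
    log.append(data)
    return len(log) - 1


def commit(files, user, desc):
    """Record a changeset for files, a dict of file name -> bytes."""
    manifest = {name: add_file_revision(name, data)
                for name, data in sorted(files.items())}
    if not manifests or manifests[-1] != manifest:
        manifests.append(manifest)  # the manifest changed: add a revision
    mrev = len(manifests) - 1       # otherwise point at the existing one
    changelog.append({"user": user, "desc": desc, "manifest": mrev})
    return len(changelog) - 1


c0 = commit({"a.txt": b"one", "b.txt": b"two"}, "alice", "initial")
c1 = commit({"a.txt": b"one!", "b.txt": b"two"}, "alice", "edit a.txt")
c2 = commit({"a.txt": b"one!", "b.txt": b"two"}, "bob", "no file changes")
```

      <para><literal>b.txt</literal> does not change between the first two
        changesets, so both manifest revisions point at its filelog
        revision 0. The third commit changes no files, so its changelog
        entry reuses the second manifest revision.</para>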

    </sect2>
  </sect1>
  <sect1>
    <title>Safe, efficient storage</title>

    <para>The underpinnings of changelogs, manifests, and filelogs are
      provided by a single structure called the
      <emphasis>revlog</emphasis>.</para>

    <sect2>
      <title>Efficient storage</title>

      <para>The revlog provides efficient storage of revisions using a
        <emphasis>delta</emphasis> mechanism. Instead of storing a
        complete copy of a file for each revision, it stores the
        changes needed to transform an older revision into the new
        revision. For many kinds of file data, these deltas are
        typically a fraction of a percent of the size of a full copy
        of a file.</para>

      <para>Some obsolete revision control systems can only work with
        deltas of text files. They must either store binary files as
        complete snapshots or encode them into a text representation,
        both of which are wasteful approaches. Mercurial can
        efficiently handle deltas of files with arbitrary binary
        contents; it doesn't need to treat text as special.</para>
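      <para>The following minimal sketch makes the delta idea concrete. It
        computes a delta between two byte strings and then applies it.
        It uses Python's standard
        <literal>difflib.SequenceMatcher</literal> to find matching runs.
        This is an assumption made for the example and is not what
        Mercurial itself uses. The <literal>(start, end,
        replacement)</literal> hunk format is likewise only in the
        spirit of a binary delta. Because the code operates on
        <literal>bytes</literal>, it treats text and binary data the
        same way.</para>

```python
import difflib


def make_delta(old: bytes, new: bytes) -> list:
    """Return (start, end, replacement) hunks that turn old into new."""
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    return [(i1, i2, new[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


def apply_delta(old: bytes, delta: list) -> bytes:
    """Rebuild the new revision by splicing each hunk into old."""
    pieces, pos = [], 0
    for start, end, replacement in delta:
        pieces.append(old[pos:start])  # bytes copied unchanged from old
        pieces.append(replacement)     # bytes that differ in the new revision
        pos = end
    pieces.append(old[pos:])
    return b"".join(pieces)


# Arbitrary binary data: no line structure and no text encoding.
old = bytes(range(256)) * 4
new = old[:100] + b"\x00\xff" + old[150:]
delta = make_delta(old, new)
payload = sum(len(replacement) for _, _, replacement in delta)
```

      <para>Here the delta carries only a few bytes of new data, while a
        full copy of <literal>new</literal> is 976 bytes.</para>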

    </sect2>
    <sect2 id="sec.concepts.txn">
      <title>Safe operation</title>

      <para>Mercurial only ever <emphasis>appends</emphasis> data to
        the end of a revlog file. It never modifies a section of a
        file after it has written it. This is both more robust and
        more efficient than schemes that need to modify or rewrite
        data.</para>

      <para>In addition, Mercurial treats every write as part of a
        <emphasis>transaction</emphasis> that can span a number of
        files. A transaction is <emphasis>atomic</emphasis>: either
        the entire transaction succeeds and its effects are all
        visible to readers in one go, or the whole thing is undone.
        This guarantee of atomicity means that if you're running two
        copies of Mercurial, where one is reading data and one is
        writing it, the reader will never see a partially written
        result that might confuse it.</para>

      <para>The fact that Mercurial only appends to files makes it
        easier to provide this transactional guarantee. The easier it
        is to do stuff like this, the more confident you should be
        that it's done correctly.</para>
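      <para>The following sketch shows why append-only files make a
        transaction cheap to undo. It is a toy under stated assumptions,
        not Mercurial's implementation. The <literal>Transaction</literal>
        class and its methods are hypothetical. Its journal also lives in
        memory, whereas a real system would need the journal on disk so
        that an interrupted transaction could still be rolled back after
        a crash. Before the first append to a file, the class records the
        file's length. Aborting simply truncates each file back to that
        length.</para>

```python
import os
import tempfile


class Transaction:
    """Toy transaction over append-only files (hypothetical API)."""

    def __init__(self):
        self.journal = {}  # path -> file length when first touched

    def append(self, path: str, data: bytes) -> int:
        """Append data to path and return the offset where it starts."""
        if path not in self.journal:
            self.journal[path] = (os.path.getsize(path)
                                  if os.path.exists(path) else 0)
        with open(path, "ab") as f:
            f.seek(0, os.SEEK_END)
            offset = f.tell()
            f.write(data)  # bytes before offset are never rewritten
        return offset

    def close(self):
        """Commit: nothing needs undoing, so forget the journal."""
        self.journal.clear()

    def abort(self):
        """Undo every append made in this transaction by truncation.

        A file created by the transaction is left empty rather than
        deleted; a fuller version might remove it.
        """
        for path, length in self.journal.items():
            with open(path, "r+b") as f:
                f.truncate(length)
        self.journal.clear()


with tempfile.TemporaryDirectory() as tmp:
    index = os.path.join(tmp, "example.i")
    tr = Transaction()
    tr.append(index, b"rev0")
    tr.close()

    tr = Transaction()
    second_offset = tr.append(index, b"rev1")
    tr.abort()
    with open(index, "rb") as f:
        contents_after_abort = f.read()
```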

    </sect2>
    <sect2>
      <title>Fast retrieval</title>

      <para>Mercurial cleverly avoids a pitfall common to many earlier
        revision control systems: the problem of <emphasis>inefficient
          retrieval</emphasis>. Most revision control systems store
        the contents of a revision as an incremental series of
        modifications against a <quote>snapshot</quote>. To
        reconstruct a specific revision, you must first read the
        snapshot, and then every one of the revisions between the
        snapshot and your target revision. The more history that a
        file accumulates, the more revisions you must read, hence the
        longer it takes to reconstruct a particular revision.</para>

      <informalfigure id="fig.concepts.snapshot">
        <mediaobject>
          <imageobject><imagedata fileref="images/snapshot.png"/></imageobject>
          <textobject><phrase>Diagram of a snapshot of a revlog, with
              incremental deltas</phrase></textobject>
          <caption><para id="fig.concepts.snapshot.caption">Snapshot of
              a revlog, with incremental deltas</para></caption>
        </mediaobject>
      </informalfigure>

      <para>The innovation that Mercurial applies to this problem is
        simple but effective. Once the cumulative amount of delta
        information stored since the last snapshot exceeds a fixed
        threshold, it stores a new snapshot (compressed, of course),
        instead of another delta. This makes it possible to
        reconstruct <emphasis>any</emphasis> revision of a file
        quickly, because only a bounded amount of delta data ever has
        to be read after the most recent snapshot. This approach works
        so well that it has since been copied by several other
        revision control systems.</para>

      <para>Figure <xref endterm="fig.concepts.snapshot.caption"
          linkend="fig.concepts.snapshot"/> illustrates the idea. In
        each entry of a revlog's index file, Mercurial stores the
        range of entries from the data file that it must read to
        reconstruct a particular revision.</para>
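      <para>A minimal sketch of the snapshot-plus-deltas scheme follows.
        It is an illustration, not Mercurial's revlog. The following are
        all assumptions made for this example:
        <literal>ToyRevlog</literal>, the
        <literal>max_chain_bytes</literal> threshold, the use of
        <literal>difflib</literal> for deltas, and the estimated delta
        size (a hypothetical 12-byte header per hunk plus its data). Each
        index entry records which snapshot a revision is based on.
        Reconstruction therefore reads that snapshot and only the deltas
        after it.</para>

```python
import difflib


def make_delta(old: bytes, new: bytes) -> list:
    """(start, end, replacement) hunks turning old into new."""
    sm = difflib.SequenceMatcher(None, old, new, autojunk=False)
    return [(i1, i2, new[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]


def apply_delta(old: bytes, delta: list) -> bytes:
    pieces, pos = [], 0
    for start, end, replacement in delta:
        pieces += [old[pos:start], replacement]
        pos = end
    pieces.append(old[pos:])
    return b"".join(pieces)


class ToyRevlog:
    def __init__(self, max_chain_bytes: int = 200):
        self.max_chain_bytes = max_chain_bytes
        self.chunks = []  # per revision: a full text, or a delta against rev - 1
        self.index = []   # per revision: (snapshot revision, delta bytes since it)

    def add(self, text: bytes) -> int:
        rev = len(self.chunks)
        if rev:
            delta = make_delta(self.revision(rev - 1), text)
            base, chain = self.index[-1]
            chain += sum(12 + len(data) for _, _, data in delta)
            if self.max_chain_bytes >= chain:
                self.chunks.append(delta)
                self.index.append((base, chain))
                return rev
        # First revision, or too much delta data since the last snapshot.
        self.chunks.append(text)
        self.index.append((rev, 0))
        return rev

    def revision(self, rev: int) -> bytes:
        base, _ = self.index[rev]
        text = self.chunks[base]             # start from the snapshot...
        for r in range(base + 1, rev + 1):   # ...then apply only later deltas
            text = apply_delta(text, self.chunks[r])
        return text


log = ToyRevlog()
texts = [b"".join(b"line %d of the file\n" % i for i in range(n + 1))
         for n in range(50)]
for t in texts:
    log.add(t)
longest_chain = max(rev - base for rev, (base, _) in enumerate(log.index))
```

      <para>However long the history grows, reading any revision here
        touches one snapshot plus at most a handful of deltas. The
        threshold trades a little extra storage for that bound.</para>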

      <sect3>
        <title>Aside: the influence of video compression</title>

        <para>If you're familiar with video compression or have ever
          watched a TV feed through a digital cable or satellite
          service, you may know that most video compression schemes
          store each frame of video as a delta against its predecessor
          frame. In addition, these schemes use <quote>lossy</quote>
          compression techniques to increase the compression ratio, so
          visual errors accumulate over the course of a number of
          inter-frame deltas.</para>

        <para>To recover from the occasional <quote>drop out</quote>
          caused by signal glitches, and to limit the accumulation of
          artefacts introduced by the lossy compression process, video
          encoders periodically insert a complete frame (called a
          <quote>key frame</quote>) into the video stream; the next
          delta is generated against that frame. This means that if
          the video signal gets interrupted, it will resume once the
          next key frame is received. Also, the accumulation of
          encoding errors starts over with each key frame.</para>

      </sect3>
    </sect2>
    <sect2>
      <title>Identification and strong integrity</title>

      <para>Along with delta or snapshot information, a revlog entry
        contains a cryptographic hash of the data that it represents.
        This makes it difficult to forge the contents of a revision,
        and easy to detect accidental corruption.</para>

      <para>Hashes provide more than a mere check against corruption;
        they are used as the identifiers for revisions. The changeset
        identification hashes that you see as an end user are from
        revisions of the changelog. Although filelogs and the
        manifest also use hashes, Mercurial only uses these behind the
        scenes.</para>

      <para>Mercurial verifies that hashes are correct when it
        retrieves file revisions and when it pulls changes from
        another repository. If it encounters an integrity problem, it
        will complain and stop whatever it's doing.</para>
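      <para>The sketch below shows how a revision's identifier can be
        derived from its contents and its parents, and how checking it
        catches corruption. As far as I know, the classic revlog format
        computes identifiers in this way: SHA-1 over the two parent IDs,
        smaller first, followed by the revision text. Treat that detail
        as an assumption. The function names <literal>node_id</literal>
        and <literal>checked_read</literal> are hypothetical.</para>

```python
import hashlib

NULL_ID = b"\0" * 20  # the null ID: twenty zero bytes


def node_id(text: bytes, p1: bytes = NULL_ID, p2: bytes = NULL_ID) -> bytes:
    """SHA-1 over both parent IDs (smaller first), then the text."""
    first, second = sorted((p1, p2))
    return hashlib.sha1(first + second + text).digest()


def checked_read(text: bytes, p1: bytes, p2: bytes, expected: bytes) -> bytes:
    """Return text only if it still hashes to the ID it was stored under."""
    if node_id(text, p1, p2) != expected:
        raise ValueError("integrity check failed for %s" % expected.hex())
    return text


root = node_id(b"hello\n")
child = node_id(b"hello, world\n", root)
checked_read(b"hello, world\n", root, NULL_ID, child)  # passes silently

try:
    checked_read(b"hello, w0rld\n", root, NULL_ID, child)  # one byte damaged
    corruption_detected = False
except ValueError:
    corruption_detected = True
```

      <para>The parents are part of the hashed data. Two revisions with
        identical contents but different histories therefore get
        different IDs.</para>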

      <para>In addition to the effect it has on retrieval efficiency,
        Mercurial's use of periodic snapshots makes it more robust
        against partial data corruption. If a revlog becomes partly
        corrupted due to a hardware error or system bug, it's often
        possible to reconstruct some or most revisions from the
        uncorrupted sections of the revlog, both before and after the
        corrupted section. This would not be possible with a
        delta-only storage model.</para>

    </sect2>
  </sect1>
  <sect1>
    <title>Revision history, branching, and merging</title>

    <para>Every entry in a Mercurial revlog knows the identity of its
      immediate ancestor revision, usually referred to as its
      <emphasis>parent</emphasis>. In fact, a revision contains room
      for not one parent, but two. Mercurial uses a special hash,
      called the <quote>null ID</quote>, to represent the idea
      <quote>there is no parent here</quote>. This hash is simply a
      string of zeroes.</para>

    <para>In figure <xref endterm="fig.concepts.revlog.caption"
        linkend="fig.concepts.revlog"/>, you can see an example of the
      conceptual structure of a revlog. Filelogs, manifests, and
      changelogs all have this same structure; they differ only in
      the kind of data stored in each delta or snapshot.</para>

    <para>The first revision in a revlog (at the bottom of the image)
      has the null ID in both of its parent slots. For a
      <quote>normal</quote> revision, its first parent slot contains
      the ID of its parent revision, and its second contains the null
      ID, indicating that the revision has only one real parent. Any
      two revisions that have the same parent ID are branches. A
      revision that represents a merge between branches has two normal
      revision IDs in its parent slots.</para>

    <informalfigure id="fig.concepts.revlog">
      <mediaobject>
        <imageobject><imagedata fileref="images/revlog.png"/></imageobject>
        <textobject><phrase>Diagram of the conceptual structure of a
            revlog</phrase></textobject>
        <caption><para id="fig.concepts.revlog.caption">Revisions in a
            revlog</para>
        </caption>
      </mediaobject>
    </informalfigure>
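    <para>The parent-slot rules can be expressed as a short sketch. The
      four-revision history and the placeholder IDs below are
      hypothetical stand-ins for real 20-byte hashes.
      <literal>kind</literal> and <literal>branch_points</literal> are
      invented helper names.</para>

```python
NULL_ID = b"\0" * 20

# Hypothetical 20-byte IDs standing in for real hashes.
A, B, C, D = (letter * 20 for letter in (b"A", b"B", b"C", b"D"))

# revision -> (first parent, second parent)
parents = {
    A: (NULL_ID, NULL_ID),  # first revision: no real parents
    B: (A, NULL_ID),        # normal revisions: one real parent
    C: (A, NULL_ID),        # same parent as B, so B and C are branches
    D: (B, C),              # merge of the two branches
}


def kind(rev: bytes) -> str:
    p1, p2 = parents[rev]
    if p1 == NULL_ID and p2 == NULL_ID:
        return "root"
    return "normal" if p2 == NULL_ID else "merge"


def branch_points() -> dict:
    """Map each revision that has more than one child to its children."""
    children = {}
    for rev, pair in parents.items():
        for p in pair:
            if p != NULL_ID:
                children.setdefault(p, []).append(rev)
    return {p: kids for p, kids in children.items() if len(kids) > 1}
```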

  </sect1>
  <sect1>
    <title>The working directory</title>

    <para>In the working directory, Mercurial stores a snapshot of the
      files from the repository as of a particular changeset.</para>

    <para>The working directory <quote>knows</quote> which changeset
      it contains. When you update the working directory to contain a
      particular changeset, Mercurial looks up the appropriate
      revision of the manifest to find out which files it was tracking
      at the time that changeset was committed, and which revision of
      each file was then current. It then recreates a copy of each of
      those files, with the same contents they had when the changeset
      was committed.</para>

    <para>The <emphasis>dirstate</emphasis> contains Mercurial's
      knowledge of the working directory. It records which changeset
      the working directory is updated to, and all of the files that
      Mercurial is tracking in the working directory.</para>

    <para>Just as a revision of a revlog has room for two parents, so
      that it can represent either a normal revision (with one parent)
      or a merge of two earlier revisions, the dirstate has slots for
      two parents. When you use the <command role="hg-cmd">hg
        update</command> command, the changeset that you update to is
      stored in the <quote>first parent</quote> slot, and the null ID
      in the second. When you <command role="hg-cmd">hg
        merge</command> with another changeset, the first parent
      remains unchanged, and the second parent is filled in with the
      changeset you're merging with. The <command role="hg-cmd">hg
        parents</command> command tells you what the parents of the
      dirstate are.</para>
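    <para>Here is a small sketch of the dirstate's two parent slots and
      how <command role="hg-cmd">hg update</command> and <command
        role="hg-cmd">hg merge</command> change them. The
      <literal>DirstateParents</literal> class, its error message, and
      the short changeset IDs are hypothetical. Refusing a second merge
      before a commit reflects the fact that only two parent slots
      exist.</para>

```python
NULL_ID = "0" * 40  # the null ID, written as hex


class DirstateParents:
    """Hypothetical model of the dirstate's two parent slots."""

    def __init__(self):
        self.p1, self.p2 = NULL_ID, NULL_ID

    def update(self, changeset: str):
        # The target goes in the first slot; the second gets the null ID.
        self.p1, self.p2 = changeset, NULL_ID

    def merge(self, other: str):
        # Only two slots exist, so an uncommitted merge blocks another one.
        if self.p2 != NULL_ID:
            raise RuntimeError("commit the current merge first")  # made-up text
        self.p2 = other  # the first parent is left unchanged

    def parents(self) -> list:
        # Report the non-null parents, in slot order.
        return [p for p in (self.p1, self.p2) if p != NULL_ID]


ds = DirstateParents()
ds.update("e1f0")
after_update = ds.parents()
ds.merge("9c2b")
after_merge = ds.parents()
try:
    ds.merge("77aa")
    second_merge_refused = False
except RuntimeError:
    second_merge_refused = True
```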

    <sect2>
      <title>What happens when you commit</title>

      <para>The dirstate stores parent information for more than just
        book-keeping purposes. Mercurial uses the parents of the
        dirstate as <emphasis>the parents of a new
          changeset</emphasis> when you perform a commit.</para>

      <informalfigure id="fig.concepts.wdir">
        <mediaobject>
          <imageobject><imagedata fileref="images/wdir.png"/></imageobject>
          <textobject><phrase>Diagram showing that the working directory
              can have two parents</phrase></textobject>
          <caption><para id="fig.concepts.wdir.caption">The working
              directory can have two parents</para></caption>
        </mediaobject>
      </informalfigure>

      <para>Figure <xref endterm="fig.concepts.wdir.caption"
          linkend="fig.concepts.wdir"/> shows the normal state of the
        working directory, where it has a single changeset as parent.
        That changeset is the <emphasis>tip</emphasis>, the newest
        changeset in the repository that has no children.</para>

      <informalfigure id="fig.concepts.wdir-after-commit">
        <mediaobject>
          <imageobject><imagedata fileref="images/wdir-after-commit.png"/>
          </imageobject>
          <textobject><phrase>Diagram showing the working directory
              gaining new parents after a commit</phrase></textobject>
          <caption><para id="fig.concepts.wdir-after-commit.caption">The
              working directory gains new parents after a
              commit</para></caption>
        </mediaobject>
      </informalfigure>

      <para>It's useful to think of the working directory as
        <quote>the changeset I'm about to commit</quote>. Any files
        that you tell Mercurial that you've added, removed, renamed,
        or copied will be reflected in that changeset, as will
        modifications to any files that Mercurial is already tracking;
        the new changeset will have the parents of the working
        directory as its parents.</para>

      <para>After a commit, Mercurial will update the parents of the
        working directory, so that the first parent is the ID of the
        new changeset, and the second is the null ID. This is shown
        in figure <xref endterm="fig.concepts.wdir-after-commit.caption"
          linkend="fig.concepts.wdir-after-commit"/>. Mercurial
        doesn't touch any of the files in the working directory when
        you commit; it just modifies the dirstate to note its new
        parents.</para>

    </sect2>
    <sect2>
      <title>Creating a new head</title>

      <para>It's perfectly normal to update the working directory to a
        changeset other than the current tip. For example, you might
        want to know what your project looked like last Tuesday, or
        you could be looking through changesets to see which one
        introduced a bug. In cases like this, the natural thing to do
        is update the working directory to the changeset you're
        interested in, and then examine the files in the working
        directory directly to see their contents as they were when you
        committed that changeset. The effect of this is shown in
        figure <xref endterm="fig.concepts.wdir-pre-branch.caption"
          linkend="fig.concepts.wdir-pre-branch"/>.</para>

      <informalfigure id="fig.concepts.wdir-pre-branch">
        <mediaobject>
          <imageobject><imagedata fileref="images/wdir-pre-branch.png"/>
          </imageobject>
          <textobject><phrase>Diagram of the working directory, updated
              to an older changeset</phrase></textobject>
          <caption><para id="fig.concepts.wdir-pre-branch.caption">The
              working directory, updated to an older
              changeset</para></caption>
        </mediaobject>
      </informalfigure>

      <para>Having updated the working directory to an older
        changeset, what happens if you make some changes, and then
        commit? Mercurial behaves in the same way as I outlined
        above. The parents of the working directory become the
        parents of the new changeset. This new changeset is the newest
        in the repository, so it becomes the new tip. And the
        repository now contains two changesets that have no children;
        we call these <emphasis>heads</emphasis>. You can see the
        structure that this creates in figure <xref
          endterm="fig.concepts.wdir-branch.caption"
          linkend="fig.concepts.wdir-branch"/>.</para>

      <informalfigure id="fig.concepts.wdir-branch">
        <mediaobject>
          <imageobject><imagedata fileref="images/wdir-branch.png"/>
          </imageobject>
          <textobject><phrase>Diagram of the repository after a commit
              made while synced to an older changeset</phrase></textobject>
          <caption><para id="fig.concepts.wdir-branch.caption">After a
              commit made while synced to an older changeset</para></caption>
        </mediaobject>
      </informalfigure>
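      <para>The commit and head rules above can be simulated in a few
        lines. This is a toy model with hypothetical names
        (<literal>ToyRepo</literal>, and sequential IDs such as
        <literal>c0</literal> instead of hashes). A commit takes the
        working directory's parents, then resets them to the new
        changeset and the null ID. A head is any changeset that no other
        changeset names as a parent.</para>

```python
NULL_ID = "0" * 40


class ToyRepo:
    def __init__(self):
        self.parents = {}               # changeset -> (p1, p2), in commit order
        self.wdir = (NULL_ID, NULL_ID)  # the dirstate's parent slots

    def update(self, changeset: str):
        self.wdir = (changeset, NULL_ID)

    def commit(self) -> str:
        node = "c%d" % len(self.parents)  # hypothetical ID, not a real hash
        self.parents[node] = self.wdir    # new changeset takes dirstate parents
        self.wdir = (node, NULL_ID)       # only the dirstate changes
        return node

    def tip(self) -> str:
        return list(self.parents)[-1]     # the most recently added changeset

    def heads(self) -> list:
        used = {p for pair in self.parents.values() for p in pair}
        return [n for n in self.parents if n not in used]


repo = ToyRepo()
c0, c1, c2 = repo.commit(), repo.commit(), repo.commit()
repo.update(c1)     # look at an older changeset...
c3 = repo.commit()  # ...and commit a change there
```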

      <note>
        <para>If you're new to Mercurial, you should keep in mind a
          common <quote>error</quote>, which is to use the <command
            role="hg-cmd">hg pull</command> command without any
          options. By default, the <command role="hg-cmd">hg
            pull</command> command <emphasis>does not</emphasis>
          update the working directory, so you'll bring new changesets
          into your repository, but the working directory will stay
          synced at the same changeset as before the pull. If you
          make some changes and commit afterwards, you'll thus create
          a new head, because your working directory isn't synced to
          whatever the current tip is.</para>

        <para>I put the word <quote>error</quote> in quotes because
          all that you need to do to rectify this situation is run
          <command role="hg-cmd">hg merge</command>, then <command
            role="hg-cmd">hg commit</command>. In other words, this
          almost never has negative consequences; it just surprises
          people. I'll discuss other ways to avoid this behaviour,
          and why Mercurial behaves in this initially surprising way,
          later on.</para>
      </note>

    </sect2>
    <sect2>
      <title>Merging heads</title>

      <para>When you run the <command role="hg-cmd">hg merge</command>
        command, Mercurial leaves the first parent of the working
        directory unchanged, and sets the second parent to the
        changeset you're merging with, as shown in figure <xref
          endterm="fig.concepts.wdir-merge.caption"
          linkend="fig.concepts.wdir-merge"/>.</para>

      <informalfigure id="fig.concepts.wdir-merge">
        <mediaobject>
          <imageobject><imagedata fileref="images/wdir-merge.png"/>
          </imageobject>
          <textobject><phrase>Diagram of merging two heads</phrase></textobject>
          <caption><para id="fig.concepts.wdir-merge.caption">Merging two
              heads</para></caption>
        </mediaobject>
      </informalfigure>

      <para>Mercurial also has to modify the working directory, to
        merge the files managed in the two changesets. Simplified a
        little, the merging process goes like this for every file in
        the manifests of the two changesets:</para>
      <itemizedlist>
        <listitem><para>If neither changeset has modified a file, do
            nothing with that file.</para>
        </listitem>
        <listitem><para>If one changeset has modified a file, and the
            other hasn't, create the modified copy of the file in the
            working directory.</para>
        </listitem>
        <listitem><para>If one changeset has removed a file, and the
            other hasn't modified it (or has also removed it), delete
            the file from the working directory.</para>
        </listitem>
        <listitem><para>If one changeset has removed a file, but the
            other has modified the file, ask the user what to do: keep
            the modified file, or remove it?</para>
        </listitem>
        <listitem><para>If both changesets have modified a file,
            invoke an external merge program to choose the new
            contents for the merged file. This may require input from
            the user.</para>
        </listitem>
        <listitem><para>If one changeset has modified a file, and the
            other has renamed or copied the file, make sure that the
            changes follow the new name of the file.</para>
        </listitem></itemizedlist>
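      <para>The per-file rules in the list above can be sketched as a
        three-way decision that compares each side against the common
        ancestor's version of the file. The function
        <literal>merge_action</literal> and its action names are
        hypothetical. Each argument is a file's contents, or
        <literal>None</literal> if the file is absent. The rename and
        copy case is deliberately left out.</para>

```python
def merge_action(base, local, other):
    """Choose what to do with one file during a simplified merge.

    base is the common ancestor's version; local and other are the two
    changesets being merged. None means the file does not exist there.
    """
    if local == other:
        # Unchanged on both sides, changed identically, or removed on both.
        return "remove" if local is None else "keep"
    if local == base:
        # Only the other side touched the file.
        return "remove" if other is None else "get other"
    if other == base:
        # Only our side touched the file.
        return "remove" if local is None else "keep"
    if local is None or other is None:
        return "ask user"        # removed on one side, modified on the other
    return "run merge tool"      # modified differently on both sides


old, ours, theirs = b"v1", b"v2-ours", b"v2-theirs"
examples = {
    "untouched": merge_action(old, old, old),
    "changed by them": merge_action(old, old, theirs),
    "removed by them": merge_action(old, old, None),
    "removed vs modified": merge_action(old, None, theirs),
    "both modified": merge_action(old, ours, theirs),
}
```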
bos@559
|
515 <para>There are more details&emdash;merging has plenty of corner
|
bos@559
|
516 cases&emdash;but these are the most common choices that are
|
bos@559
|
517 involved in a merge. As you can see, most cases are
|
bos@559
|
518 completely automatic, and indeed most merges finish
|
bos@559
|
519 automatically, without requiring your input to resolve any
|
bos@559
|
520 conflicts.</para>
|
bos@559
|
521
|
bos@559
|
522 <para>When you're thinking about what happens when you commit
|
bos@559
|
523 after a merge, once again the working directory is <quote>the
|
bos@559
|
524 changeset I'm about to commit</quote>. After the <command
|
bos@559
|
525 role="hg-cmd">hg merge</command> command completes, the
|
bos@559
|
526 working directory has two parents; these will become the
|
bos@559
|
527 parents of the new changeset.</para>
|
bos@559
|
528
|
bos@559
|
529 <para>Mercurial lets you perform multiple merges, but you must
|
bos@559
|
530 commit the results of each individual merge as you go. This
|
bos@559
|
531 is necessary because Mercurial only tracks two parents for
|
bos@559
|
532 both revisions and the working directory. While it would be
|
bos@559
|
533 technically possible to merge multiple changesets at once, the
|
bos@559
|
534 prospect of user confusion and making a terrible mess of a
|
bos@559
|
535 merge immediately becomes overwhelming.</para>
|
bos@559
|
536
|
bos@559
|
537 </sect2>
|
bos@559
|
538 </sect1>
|
bos@559
|
539 <sect1>
|
bos@559
|
540 <title>Other interesting design features</title>
|
bos@559
|
541
|
bos@559
|
542 <para>In the sections above, I've tried to highlight some of the
|
bos@559
|
543 most important aspects of Mercurial's design, to illustrate that
|
bos@559
|
544 it pays careful attention to reliability and performance.
|
bos@559
|
545 However, the attention to detail doesn't stop there. There are
|
bos@559
|
546 a number of other aspects of Mercurial's construction that I
|
bos@559
|
547 personally find interesting. I'll detail a few of them here,
|
bos@559
|
548 separate from the <quote>big ticket</quote> items above, so that
|
bos@559
|
549 if you're interested, you can gain a better idea of the amount
|
bos@559
|
550 of thinking that goes into a well-designed system.</para>

    <sect2>
      <title>Clever compression</title>

      <para>When appropriate, Mercurial will store both snapshots and
        deltas in compressed form. It does this by always
        <emphasis>trying to</emphasis> compress a snapshot or delta,
        but only storing the compressed version if it's smaller than
        the uncompressed version.</para>
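
      <para>The <quote>compress only if it helps</quote> rule fits in a
        few lines of Python. The sketch uses the standard
        <literal>zlib</literal> module as the deflate implementation.
        It tags each stored chunk with a one-byte marker so that the
        reader knows how to decode it. The marker values and the
        function names are assumptions made for illustration. They do
        not describe Mercurial's on-disk format.</para>

      <programlisting><![CDATA[import zlib

# Hypothetical chunk format: one marker byte, then the payload.
#   b"z" -> payload is zlib (deflate) compressed
#   b"u" -> payload is stored uncompressed


def store_chunk(data: bytes) -> bytes:
    """Try to compress; keep the compressed form only if it is smaller."""
    compressed = zlib.compress(data)
    if len(compressed) < len(data):
        return b"z" + compressed
    return b"u" + data  # compression did not help, so store as is


def load_chunk(chunk: bytes) -> bytes:
    """Reverse store_chunk, using the marker byte to pick the decoder."""
    marker, payload = chunk[:1], chunk[1:]
    if marker == b"z":
        return zlib.decompress(payload)
    if marker == b"u":
        return payload
    raise ValueError(f"unknown chunk marker {marker!r}")
]]></programlisting>

      <para>Repetitive text such as source code is stored compressed.
        Data that is already compressed, such as a JPEG, gains a few
        bytes of overhead from a second pass, so it is stored as
        is.</para>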

      <para>This means that Mercurial does <quote>the right
          thing</quote> when storing a file whose native form is
        compressed, such as a <literal>zip</literal> archive or a JPEG
        image. When these types of files are compressed a second
        time, the resulting file is usually bigger than the
        once-compressed form, and so Mercurial will store the plain
        <literal>zip</literal> or JPEG.</para>

      <para>Deltas between revisions of a compressed file are usually
        larger than snapshots of the file, and Mercurial again does
        <quote>the right thing</quote> in these cases. It finds that
        such a delta exceeds the threshold at which it should store a
        complete snapshot of the file, so it stores the snapshot,
        again saving space compared to a naive delta-only
        approach.</para>
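
      <para>The snapshot decision can be sketched as a size check on the
        delta chain. In the Python below, <literal>make_delta</literal>
        produces a deliberately simplified one-hunk delta. Its format
        is hypothetical: start and end offsets, a length, then the
        replacement bytes. <literal>choose_storage</literal> falls back
        to a full snapshot when the bytes needed to rebuild the
        revision would exceed a fixed multiple of its size. The factor
        of two and all the function names are illustrative
        assumptions. Mercurial's real delta format and heuristics are
        more involved.</para>

      <programlisting><![CDATA[import struct

# Hypothetical one-hunk delta: replace old[start:end] with `data`,
# encoded as three big-endian 32-bit ints (start, end, len(data)) + data.


def make_delta(old: bytes, new: bytes) -> bytes:
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    # Stop before the common suffix would overlap the common prefix.
    while (suffix < limit - prefix
           and old[len(old) - 1 - suffix] == new[len(new) - 1 - suffix]):
        suffix += 1
    data = new[prefix:len(new) - suffix]
    return struct.pack(">lll", prefix, len(old) - suffix, len(data)) + data


def apply_delta(old: bytes, delta: bytes) -> bytes:
    start, end, length = struct.unpack(">lll", delta[:12])
    return old[:start] + delta[12:12 + length] + old[end:]


def choose_storage(prev_text: bytes, new_text: bytes,
                   chain_bytes: int, factor: float = 2):
    """Return ("delta", delta) or ("snapshot", new_text).

    chain_bytes is how many stored bytes are already needed to rebuild
    prev_text (its snapshot plus any later deltas). If appending this
    delta would push the chain past factor * len(new_text), store a
    full snapshot instead.
    """
    delta = make_delta(prev_text, new_text)
    if chain_bytes + len(delta) > factor * len(new_text):
        return "snapshot", new_text
    return "delta", delta
]]></programlisting>

      <para>For a small edit to a large text file, the delta is a few
        dozen bytes and wins easily. Two revisions of a compressed file
        look like unrelated random bytes, so their delta is about as
        large as the new revision itself, and the check selects a
        snapshot.</para>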

      <sect3>
        <title>Network recompression</title>

        <para>When storing revisions on disk, Mercurial uses the
          <quote>deflate</quote> compression algorithm (the same one
          used by the popular <literal>zip</literal> archive format),
          which balances good speed with a respectable compression
          ratio. However, when transmitting revision data over a
          network connection, Mercurial uncompresses the compressed
          revision data.</para>

        <para>If the connection is over HTTP, Mercurial recompresses
          the entire stream of data using a compression algorithm that
          gives a better compression ratio (the Burrows-Wheeler
          algorithm from the widely used <literal>bzip2</literal>
          compression package). This combination of algorithm and
          compression of the entire stream (instead of a revision at a
          time) substantially reduces the number of bytes to be
          transferred, yielding better network performance over almost
          all kinds of network.</para>
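
        <para>The following Python sketch mirrors the two steps
          described above. Each revision is stored deflate-compressed
          on its own. For transfer, the revisions are decompressed and
          the whole stream is recompressed with
          <literal>bz2</literal>. The four-byte length prefix that
          frames each revision is a hypothetical choice for this
          example, not Mercurial's wire format.</para>

        <programlisting><![CDATA[import bz2
import zlib

# Stored revisions are individually zlib (deflate) compressed; the wire
# stream is one bzip2-compressed blob. The framing is hypothetical.


def build_stream(stored_chunks):
    """Decompress each stored revision, then bzip2 the whole stream."""
    raw = bytearray()
    for chunk in stored_chunks:
        text = zlib.decompress(chunk)
        raw += len(text).to_bytes(4, "big") + text
    return bz2.compress(bytes(raw))


def read_stream(payload):
    """Decompress the stream and split it back into revision texts."""
    raw = bz2.decompress(payload)
    texts, pos = [], 0
    while pos < len(raw):
        n = int.from_bytes(raw[pos:pos + 4], "big")
        texts.append(raw[pos + 4:pos + 4 + n])
        pos += 4 + n
    return texts
]]></programlisting>

        <para>Successive revisions of a file share most of their
          content. Compressing them as one stream lets the compressor
          exploit redundancy <emphasis>across</emphasis> revisions,
          which per-revision compression cannot see.</para>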

        <para>(If the connection is over <command>ssh</command>,
          Mercurial <emphasis>doesn't</emphasis> recompress the
          stream, because <command>ssh</command> can already do this
          itself.)</para>

      </sect3>
    </sect2>
    <sect2>
      <title>Read/write ordering and atomicity</title>

      <para>Appending to files isn't the whole story when it comes to
        guaranteeing that a reader won't see a partial write. If you
        recall figure <xref endterm="fig.concepts.metadata.caption"
          linkend="fig.concepts.metadata"/>, revisions in the
        changelog point to revisions in the manifest, and revisions in
        the manifest point to revisions in filelogs. This hierarchy
        is deliberate.</para>

      <para>A writer starts a transaction by writing filelog and
        manifest data, and doesn't write any changelog data until
        those are finished. A reader starts by reading changelog
        data, then manifest data, followed by filelog data.</para>

      <para>Since the writer has always finished writing filelog and
        manifest data before it writes to the changelog, a reader will
        never read a pointer to a partially written manifest revision
        from the changelog, and it will never read a pointer to a
        partially written filelog revision from the manifest.</para>
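
      <para>A toy model makes the ordering argument easy to check. In
        this model, three Python lists stand in for the filelog,
        manifest and changelog, which is a simplification because real
        revlogs are files on disk. The writer is a generator that
        pauses after every append, and the reader runs at each pause.
        The writer appends to the changelog last and the reader starts
        from the changelog. As a result, the reader never follows a
        pointer to an entry that does not exist yet. The class and
        function names are hypothetical.</para>

      <programlisting><![CDATA[# Toy model: a changelog entry holds a manifest index; a manifest entry
# maps file names to filelog indexes. Lists stand in for revlog files.


class Repo:
    def __init__(self):
        self.filelog = []    # file contents
        self.manifest = []   # dicts: file name -> filelog index
        self.changelog = []  # manifest indexes


def commit_steps(repo, files):
    """Writer: yields after every append so a reader can run in between."""
    entries = {}
    for name, data in files.items():
        repo.filelog.append(data)                  # 1. filelog data first
        entries[name] = len(repo.filelog) - 1
        yield
    repo.manifest.append(entries)                  # 2. then the manifest
    yield
    repo.changelog.append(len(repo.manifest) - 1)  # 3. changelog last
    yield


def read_all(repo):
    """Reader: start at the changelog and follow pointers downwards."""
    result = []
    for mf_index in list(repo.changelog):
        manifest = repo.manifest[mf_index]  # guaranteed to exist already
        result.append({name: repo.filelog[i] for name, i in manifest.items()})
    return result
]]></programlisting>

      <para>Suppose instead that the writer appended the changelog entry
        first. A reader running at the next pause would then look up a
        manifest index that had not been written yet, and would fail
        with an <literal>IndexError</literal>.</para>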

    </sect2>
    <sect2>
      <title>Concurrent access</title>

      <para>The read/write ordering and atomicity guarantees mean that
        Mercurial never needs to <emphasis>lock</emphasis> a
        repository when it's reading data, even if the repository is
        being written to while the read is occurring. This has a big
        effect on scalability; you can have an arbitrary number of
        Mercurial processes safely reading data from a repository all
        at once, no matter whether it's being written to or
        not.</para>

      <para>The lockless nature of reading means that if you're
        sharing a repository on a multi-user system, you don't need to
        grant other local users permission to
        <emphasis>write</emphasis> to your repository in order for
        them to be able to clone it or pull changes from it; they only
        need <emphasis>read</emphasis> permission. (This is
        <emphasis>not</emphasis> a common feature among revision
        control systems, so don't take it for granted! Most require
        readers to be able to lock a repository to access it safely,
        and this requires write permission on at least one directory,
        which of course makes for all kinds of nasty and annoying
        security and administrative problems.)</para>

      <para>Mercurial uses locks to ensure that only one process can
        write to a repository at a time (the locking mechanism is safe
        even over filesystems that are notoriously hostile to locking,
        such as NFS). If a repository is locked, a writer will wait,
        retrying periodically in case the repository becomes unlocked;
        if the repository remains locked for too long, the process
        attempting to write will eventually time out. This means
        that your daily automated scripts won't get stuck forever and
        pile up if a system crashes unnoticed, for example. (Yes, the
        timeout is configurable, from zero to infinity.)</para>
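
      <para>A write lock with a timeout can be sketched in Python as
        follows. The lock is a symbolic link whose target records the
        host and process that hold it. Creating a symlink either
        succeeds or fails with <literal>FileExistsError</literal> as a
        single operation, so two writers cannot both believe they own
        the lock. The function names, the polling interval and the
        default timeout are assumptions for illustration. The sketch
        needs a platform where <literal>os.symlink</literal> is
        available.</para>

      <programlisting><![CDATA[import os
import socket
import time

# Minimal sketch of a write lock with a timeout. Names, the polling
# interval and the default timeout are illustrative assumptions.


def acquire_lock(path, timeout=600.0, poll=0.1):
    """Create the lock at `path`, retrying until `timeout` seconds pass.

    timeout=0 gives up after the first failed attempt; timeout=None
    waits forever.
    """
    holder = f"{socket.gethostname()}:{os.getpid()}"
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        try:
            # Fails with FileExistsError if someone already holds the lock.
            os.symlink(holder, path)
            return holder
        except FileExistsError:
            if deadline is not None and time.monotonic() >= deadline:
                try:
                    owner = os.readlink(path)
                except OSError:  # lock vanished while we were reporting
                    owner = "unknown"
                raise TimeoutError(
                    f"timed out waiting for lock {path!r} held by {owner!r}")
            time.sleep(poll)


def release_lock(path):
    os.unlink(path)
]]></programlisting>

      <para>Passing <literal>timeout=0</literal> gives up immediately,
        and <literal>timeout=None</literal> waits indefinitely. This
        matches the <quote>zero to infinity</quote> range mentioned
        above.</para>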

      <sect3>
        <title>Safe dirstate access</title>

        <para>As with revision data, Mercurial doesn't take a lock to
          read the dirstate file; it does acquire a lock to write it.
          To avoid the possibility of reading a partially written copy
          of the dirstate file, Mercurial writes to a file with a
          unique name in the same directory as the dirstate file, then
          renames the temporary file atomically to
          <filename>dirstate</filename>. The file named
          <filename>dirstate</filename> is thus guaranteed to be
          complete, not partially written.</para>
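
        <para>In Python, the write-then-rename technique looks like
          this. The helper name <literal>atomic_write</literal> is
          hypothetical. The sketch relies on
          <literal>os.replace</literal>, which renames atomically on
          POSIX systems. It also creates the temporary file in the
          same directory as the target, so that both names are on the
          same filesystem.</para>

        <programlisting><![CDATA[import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` so readers see old or new contents, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # A uniquely named temporary file in the same directory, so that the
    # final rename stays on one filesystem and can be atomic.
    fd, tmp = tempfile.mkstemp(prefix=".tmp-", dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes reach the disk
        os.replace(tmp, path)     # atomically swap the new file into place
    except BaseException:
        os.unlink(tmp)
        raise
]]></programlisting>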

      </sect3>
    </sect2>
    <sect2>
      <title>Avoiding seeks</title>

      <para>Critical to Mercurial's performance is the avoidance of
        seeks of the disk head, since any seek is far more expensive
        than even a comparatively large read operation.</para>

      <para>This is why, for example, the dirstate is stored in a
        single file. If there were a dirstate file per directory that
        Mercurial tracked, the disk would seek once per directory.
        Instead, Mercurial reads the entire single dirstate file in
        one step.</para>

      <para>Mercurial also uses a <quote>copy on write</quote> scheme
        when cloning a repository on local storage. Instead of
        copying every revlog file from the old repository into the new
        repository, it makes a <quote>hard link</quote>, which is a
        shorthand way to say <quote>these two names point to the same
        file</quote>. When Mercurial is about to write to one of a
        revlog's files, it checks to see if the number of names
        pointing at the file is greater than one. If it is, more than
        one repository is using the file, so Mercurial makes a new
        copy of the file that is private to this repository.</para>
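
      <para>Here is a Python sketch of hard-link cloning with
        copy-on-write. The function names are hypothetical, and the
        sketch ignores many details of a real clone. It uses
        <literal>os.link</literal> to create the second name. It then
        uses the <literal>st_nlink</literal> field returned by
        <literal>os.stat</literal> to count how many names a file
        has.</para>

      <programlisting><![CDATA[import os
import shutil
import tempfile


def clone_file(src: str, dst: str) -> None:
    """'Copy' a file by giving it a second name (a hard link)."""
    os.link(src, dst)


def open_for_append(path: str):
    """Open a revlog file for appending, first breaking any shared link."""
    if os.stat(path).st_nlink > 1:
        # Another name (another repository) shares this file: make a
        # private copy and swap it into place under our name.
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
        os.close(fd)
        shutil.copyfile(path, tmp)
        shutil.copymode(path, tmp)
        os.replace(tmp, path)
    return open(path, "ab")
]]></programlisting>

      <para>After the write, the original file still holds only the
        first revision, and each file is back to having a single
        name.</para>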

      <para>A few revision control developers have pointed out that
        this idea of making a complete private copy of a file is not
        very efficient in its use of storage. While this is true,
        storage is cheap, and this method gives the highest
        performance while deferring most book-keeping to the operating
        system. An alternative scheme would most likely reduce
        performance and increase the complexity of the software, and
        speed and simplicity both matter far more than disk space to
        the <quote>feel</quote> of day-to-day use.</para>

    </sect2>
    <sect2>
      <title>Other contents of the dirstate</title>

      <para>Because Mercurial doesn't force you to tell it when you're
        modifying a file, it uses the dirstate to store some extra
        information so it can determine efficiently whether you have
        modified a file. For each file in the working directory, it
        stores the time that it last modified the file itself, and the
        size of the file at that time.</para>

      <para>When you explicitly <command role="hg-cmd">hg
          add</command>, <command role="hg-cmd">hg remove</command>,
        <command role="hg-cmd">hg rename</command> or <command
          role="hg-cmd">hg copy</command> files, Mercurial updates the
        dirstate so that it knows what to do with those files when you
        commit.</para>
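
      <para>As a rough illustration, the Python class below records
        what each of these commands leaves behind for the next commit.
        It is a hypothetical, simplified model rather than Mercurial's
        dirstate format. States are plain strings, and copies are kept
        in a separate map from destination to source. Mercurial treats
        a rename as a copy followed by a remove, and the model records
        renames the same way.</para>

      <programlisting><![CDATA[# Hypothetical, simplified dirstate: what each command records so that a
# later commit knows what to do. Not Mercurial's real data structure.


class DirState:
    def __init__(self, tracked=()):
        self.state = {name: "normal" for name in tracked}
        self.copies = {}  # destination -> source

    def add(self, name):
        self.state[name] = "added"

    def remove(self, name):
        if self.state.get(name) == "added":
            del self.state[name]  # never committed, so just forget it
        else:
            self.state[name] = "removed"
        self.copies.pop(name, None)

    def copy(self, source, dest):
        self.add(dest)
        self.copies[dest] = source  # commit records where dest came from

    def rename(self, source, dest):
        # Recorded as a copy followed by a remove.
        self.copy(source, dest)
        self.remove(source)

    def pending(self):
        """Files whose state the next commit must act on."""
        return {n: s for n, s in self.state.items() if s != "normal"}
]]></programlisting>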

      <para>When Mercurial is checking the states of files in the
        working directory, it first checks a file's modification time.
        If that has not changed, the file must not have been modified.
        If the file's size has changed, the file must have been
        modified. Only if the modification time has changed but the
        size has not does Mercurial need to read the actual
        contents of the file to see if they've changed. Storing these
        few extra pieces of information dramatically reduces the
        amount of data that Mercurial needs to read, which yields
        large performance improvements compared to other revision
        control systems.</para>
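
      <para>The decision procedure can be written out as a short Python
        function. The <literal>Entry</literal> type and the function
        names are hypothetical. The <literal>committed</literal>
        argument stands in for the file's contents in the working
        directory's parent revision. The sketch follows the order of
        checks described above. It ignores one real-world subtlety: a
        file modified twice within the filesystem's timestamp
        resolution can keep the same modification time, and a
        production implementation has to guard against that.</para>

      <programlisting><![CDATA[import os
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical dirstate entry: size and mtime when last recorded."""
    size: int
    mtime_ns: int


def record(path: str) -> Entry:
    st = os.stat(path)
    return Entry(st.st_size, st.st_mtime_ns)


def is_modified(path: str, entry: Entry, committed: bytes) -> bool:
    st = os.stat(path)
    if st.st_mtime_ns == entry.mtime_ns:
        return False  # modification time unchanged: treat as unmodified
    if st.st_size != entry.size:
        return True   # size changed: definitely modified
    # Same size, different mtime: only now read the contents.
    with open(path, "rb") as f:
        return f.read() != committed
]]></programlisting>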

    </sect2>
  </sect1>
</chapter>

<!--
local variables:
sgml-parent-document: ("00book.xml" "book" "chapter")
end:
-->