hgbook
changeset 110:75c076c7a374
More concepts stuff.
author | Bryan O'Sullivan <bos@serpentine.com> |
---|---|
date | Fri Nov 10 15:09:49 2006 -0800 (2006-11-10) |
parents | 1b67dc96f27a |
children | 34b8b7a15ea1 |
files | en/Makefile en/concepts.tex en/metadata.svg |
line diff
1.1 --- a/en/Makefile Fri Nov 10 12:42:00 2006 -0800 1.2 +++ b/en/Makefile Fri Nov 10 15:09:49 2006 -0800 1.3 @@ -25,6 +25,7 @@ 1.4 kdiff3.png \ 1.5 metadata.svg \ 1.6 mq-stack.svg \ 1.7 + snapshot.svg \ 1.8 tour-history.svg \ 1.9 tour-merge-conflict.svg \ 1.10 tour-merge-merge.svg \
2.1 --- a/en/concepts.tex Fri Nov 10 12:42:00 2006 -0800 2.2 +++ b/en/concepts.tex Fri Nov 10 15:09:49 2006 -0800 2.3 @@ -77,15 +77,15 @@ 2.4 \label{fig:concepts:metadata} 2.5 \end{figure} 2.6 2.7 -Note that there is not a ``one to one'' relationship between revisions 2.8 -in these different metadata files. If the manifest hasn't changed 2.9 -between two changesets, their changelog entries will point to the same 2.10 -revision of the manifest. If a file that Mercurial tracks hasn't 2.11 -changed between two changesets, the entry for that file in the two 2.12 -revisions of the manifest will point to the same revision of its 2.13 -filelog. 2.14 - 2.15 -\section{An efficient, unified, safe storage mechanism} 2.16 +As the illustration shows, there is \emph{not} a ``one to one'' 2.17 +relationship between revisions in the changelog, manifest, or filelog. 2.18 +If the manifest hasn't changed between two changesets, the changelog 2.19 +entries for those changesets will point to the same revision of the 2.20 +manifest. If a file that Mercurial tracks hasn't changed between two 2.21 +changesets, the entry for that file in the two revisions of the 2.22 +manifest will point to the same revision of its filelog. 2.23 + 2.24 +\section{Safe, efficient storage} 2.25 2.26 The underpinnings of changelogs, manifests, and filelogs are provided 2.27 by a single structure called the \emph{revlog}. 2.28 @@ -136,13 +136,24 @@ 2.29 revisions you must read, hence the longer it takes to reconstruct a 2.30 particular revision. 2.31 2.32 +\begin{figure}[ht] 2.33 + \centering 2.34 + \grafix{snapshot} 2.35 + \caption{Snapshot of a revlog, with incremental deltas} 2.36 + \label{fig:concepts:snapshot} 2.37 +\end{figure} 2.38 + 2.39 The innovation that Mercurial applies to this problem is simple but 2.40 effective. Once the cumulative amount of delta information stored 2.41 since the last snapshot exceeds a fixed threshold, it stores a new 2.42 snapshot (compressed, of course), instead of another delta. This 2.43 makes it possible to reconstruct \emph{any} revision of a file 2.44 -quickly. This approach works so well that it has subsequently been 2.45 -copied by several other revision control systems. 2.46 +quickly. This approach works so well that it has since been copied by 2.47 +several other revision control systems. 2.48 + 2.49 +Figure~\ref{fig:concepts:snapshot} illustrates the idea. In an entry 2.50 +in a revlog's index file, Mercurial stores the range of entries from 2.51 +the data file that it must read to reconstruct a particular revision. 2.52 2.53 \subsubsection{Aside: the influence of video compression} 2.54 2.55 @@ -163,26 +174,6 @@ 2.56 the next key frame is received. Also, the accumulation of encoding 2.57 errors restarts anew with each key frame. 2.58 2.59 -\subsection{Clever compression} 2.60 - 2.61 -When appropriate, Mercurial will store both snapshots and deltas in 2.62 -compressed form. It does this by always \emph{trying to} compress a 2.63 -snapshot or delta, but only storing the compressed version if it's 2.64 -smaller than the uncompressed version. 2.65 - 2.66 -This means that Mercurial does ``the right thing'' when storing a file 2.67 -whose native form is compressed, such as a \texttt{zip} archive or a 2.68 -JPEG image. When these types of files are compressed a second time, 2.69 -the resulting file is usually bigger than the once-compressed form, 2.70 -and so Mercurial will store the plain \texttt{zip} or JPEG. 2.71 - 2.72 -Deltas between revisions of a compressed file are usually larger than 2.73 -snapshots of the file, and Mercurial again does ``the right thing'' in 2.74 -these cases. It finds that such a delta exceeds the threshold at 2.75 -which it should store a complete snapshot of the file, so it stores 2.76 -the snapshot, again saving space compared to a naive delta-only 2.77 -approach. 2.78 - 2.79 \subsection{Strong integrity} 2.80 2.81 Along with delta or snapshot information, a revlog entry contains a 2.82 @@ -202,6 +193,83 @@ 2.83 after the corrupted section. This would not be possible with a 2.84 delta-only storage model. 2.85 2.86 +\section{The working directory} 2.87 + 2.88 +Mercurial's good ideas are not confined to the repository; it also 2.89 +needs to manage the working directory. The \emph{dirstate} contains 2.90 +Mercurial's knowledge of the working directory. This details which 2.91 +revision(s) the working directory is updated to, and all files that 2.92 +Mercurial is tracking in the working directory. 2.93 + 2.94 +Because Mercurial doesn't force you to tell it when you're modifying a 2.95 +file, it uses the dirstate to store some extra information so it can 2.96 +determine efficiently whether you have modified a file. For each file 2.97 +in the working directory, it stores the time that it last modified the 2.98 +file itself, and the size of the file at that time. 2.99 + 2.100 +When Mercurial is checking the states of files in the working 2.101 +directory, it first checks a file's modification time. If that has 2.102 +not changed, the file must not have been modified. If the file's size 2.103 +has changed, the file must have been modified. If the modification 2.104 +time has changed, but the size has not, only then does Mercurial need 2.105 +to read the actual contents of the file to see if they've changed. 2.106 +Storing these few extra pieces of information dramatically reduces the 2.107 +amount of data that Mercurial needs to read, which yields large 2.108 +performance improvements compared to other revision control systems. 2.109 + 2.110 +\section{Other interesting design features} 2.111 + 2.112 +In the sections above, I've tried to highlight some of the most 2.113 +important aspects of Mercurial's design, to illustrate that it pays 2.114 +careful attention to reliability and performance. However, the 2.115 +attention to detail doesn't stop there. There are a number of other 2.116 +aspects of Mercurial's construction that I personally find 2.117 +interesting. I'll detail a few of them here, separate from the ``big 2.118 +ticket'' items above, so that if you're interested, you can gain a 2.119 +better idea of the amount of thinking that goes into a well-designed 2.120 +system. 2.121 + 2.122 +\subsection{Clever compression} 2.123 + 2.124 +When appropriate, Mercurial will store both snapshots and deltas in 2.125 +compressed form. It does this by always \emph{trying to} compress a 2.126 +snapshot or delta, but only storing the compressed version if it's 2.127 +smaller than the uncompressed version. 2.128 + 2.129 +This means that Mercurial does ``the right thing'' when storing a file 2.130 +whose native form is compressed, such as a \texttt{zip} archive or a 2.131 +JPEG image. When these types of files are compressed a second time, 2.132 +the resulting file is usually bigger than the once-compressed form, 2.133 +and so Mercurial will store the plain \texttt{zip} or JPEG. 2.134 + 2.135 +Deltas between revisions of a compressed file are usually larger than 2.136 +snapshots of the file, and Mercurial again does ``the right thing'' in 2.137 +these cases. It finds that such a delta exceeds the threshold at 2.138 +which it should store a complete snapshot of the file, so it stores 2.139 +the snapshot, again saving space compared to a naive delta-only 2.140 +approach. 2.141 + 2.142 +\subsubsection{Network recompression} 2.143 + 2.144 +When storing revisions on disk, Mercurial uses the ``deflate'' 2.145 +compression algorithm (the same one used by the popular \texttt{zip} 2.146 +archive format), which balances good speed with a respectable 2.147 +compression ratio. However, when transmitting revision data over a 2.148 +network connection, Mercurial uncompresses the compressed revision 2.149 +data. 2.150 + 2.151 +If the connection is over HTTP, Mercurial recompresses the entire 2.152 +stream of data using a compression algorithm that gives a etter 2.153 +compression ratio (the Burrows-Wheeler algorithm from the widely used 2.154 +\texttt{bzip2} compression package). This combination of algorithm 2.155 +and compression of the entire stream (instead of a revision at a time) 2.156 +substantially reduces the number of bytes to be transferred, yielding 2.157 +better network performance over almost all kinds of network. 2.158 + 2.159 +(If the connection is over \command{ssh}, Mercurial \emph{doesn't} 2.160 +recompress the stream, because \command{ssh} can already do this 2.161 +itself.) 2.162 + 2.163 \subsection{Read/write ordering and atomicity} 2.164 2.165 Appending to files isn't the whole story when it comes to guaranteeing 2.166 @@ -241,15 +309,25 @@ 2.167 makes for all kinds of nasty and annoying security and administrative 2.168 problems.) 2.169 2.170 -Mercurial uses a locking mechanism to ensure that only one process can 2.171 -write to a repository at a time. This locking mechanism is safe even 2.172 -over filesystems that are notoriously unsafe for locking, such as NFS. 2.173 -If a repository is locked, a writer will wait for a while to retry if 2.174 -the repository becomes unlocked, but if the repository remains locked 2.175 -for too long, the process attempting to write will time out after a 2.176 -while. This means that your daily automated scripts won't get stuck 2.177 -forever and pile up if a system crashes unnoticed, for example. (Yes, 2.178 -the timeout is configurable, from zero to infinity.) 2.179 +Mercurial uses locks to ensure that only one process can write to a 2.180 +repository at a time (the locking mechanism is safe even over 2.181 +filesystems that are notoriously hostile to locking, such as NFS). If 2.182 +a repository is locked, a writer will wait for a while to retry if the 2.183 +repository becomes unlocked, but if the repository remains locked for 2.184 +too long, the process attempting to write will time out after a while. 2.185 +This means that your daily automated scripts won't get stuck forever 2.186 +and pile up if a system crashes unnoticed, for example. (Yes, the 2.187 +timeout is configurable, from zero to infinity.) 2.188 + 2.189 +\subsubsection{Safe dirstate access} 2.190 + 2.191 +As with revision data, Mercurial doesn't take a lock to read the 2.192 +dirstate file; it does acquire a lock to write it. To avoid the 2.193 +possibility of reading a partially written copy of the dirstate file, 2.194 +Mercurial writes to a file with a unique name in the same directory as 2.195 +the dirstate file, then renames the temporary file atomically to 2.196 +\filename{dirstate}. The file named \filename{dirstate} is thus 2.197 +guaranteed to be complete, not partially written. 2.198 2.199 2.200
3.1 --- a/en/metadata.svg Fri Nov 10 12:42:00 2006 -0800 3.2 +++ b/en/metadata.svg Fri Nov 10 15:09:49 2006 -0800 3.3 @@ -67,7 +67,7 @@ 3.4 id="layer1"> 3.5 <path 3.6 style="fill:none;fill-opacity:0.75;fill-rule:evenodd;stroke:#a7a7a7;stroke-width:1.5;stroke-linecap:butt;stroke-linejoin:miter;marker-end:url(#Arrow1Mend);stroke-miterlimit:4;stroke-dasharray:4.5, 1.5;stroke-dashoffset:0;stroke-opacity:1;display:inline" 3.7 - d="M 326.94646,471.18359 L 326.94646,510.98123" 3.8 + d="M 326.94646,467.18359 L 326.94646,510.98123" 3.9 id="path1910" 3.10 inkscape:connector-type="polyline" 3.11 inkscape:connection-end="#rect2962" 3.12 @@ -87,8 +87,8 @@ 3.13 inkscape:connection-end="#rect3038" 3.14 inkscape:connection-start="#rect2962" /> 3.15 <path 3.16 - style="fill:none;fill-opacity:0.75;fill-rule:evenodd;stroke:#484848;stroke-width:1.5;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#Arrow1Mend);stroke-miterlimit:4;stroke-dasharray:4.5,1.5;stroke-dashoffset:0" 3.17 - d="M 254.23217,471.18359 L 254.23216,510.98123" 3.18 + style="fill:none;fill-opacity:0.75;fill-rule:evenodd;stroke:#484848;stroke-width:1.5;stroke-linecap:butt;stroke-linejoin:miter;marker-end:url(#Arrow1Mend);stroke-miterlimit:4;stroke-dasharray:4.5, 1.5;stroke-dashoffset:0;stroke-opacity:1" 3.19 + d="M 254.23217,467.18359 L 254.23216,510.98123" 3.20 id="path3088" 3.21 inkscape:connector-type="polyline" 3.22 inkscape:connection-start="#rect1872" 3.23 @@ -113,52 +113,52 @@ 3.24 width="51.42857" 3.25 height="20" 3.26 x="228.51788" 3.27 - y="450.68359" /> 3.28 + y="446.68359" /> 3.29 <rect 3.30 style="fill:#cacbfb;fill-opacity:1;stroke:#595959;stroke-width:1;stroke-linecap:round;stroke-linejoin:round;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" 3.31 id="rect2764" 3.32 width="51.42857" 3.33 height="20" 3.34 x="301.23218" 3.35 - y="450.68359" /> 3.36 + y="446.68359" /> 3.37 <rect 3.38 style="fill:#cacbfb;fill-opacity:1;stroke:#595959;stroke-width:1;stroke-linecap:round;stroke-linejoin:round;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" 3.39 id="rect2766" 3.40 width="51.42857" 3.41 height="20" 3.42 x="155.80359" 3.43 - y="450.68359" /> 3.44 + y="446.68359" /> 3.45 <rect 3.46 style="fill:#cacbfb;fill-opacity:1;stroke:#595959;stroke-width:1;stroke-linecap:round;stroke-linejoin:round;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" 3.47 id="rect2768" 3.48 width="51.42857" 3.49 height="20" 3.50 x="83.089294" 3.51 - y="450.68359" /> 3.52 - <path 3.53 - style="fill:none;fill-opacity:0.75;fill-rule:evenodd;stroke:#747474;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;marker-end:url(#Arrow1Mend);stroke-opacity:1" 3.54 - d="M 135.01786,460.68359 L 155.30359,460.68359" 3.55 + y="446.68359" /> 3.56 + <path 3.57 + style="fill:none;fill-opacity:0.75;fill-rule:evenodd;stroke:#747474;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;marker-end:url(#Arrow1Mend);stroke-opacity:1" 3.58 + d="M 135.01786,456.68359 L 155.30359,456.68359" 3.59 id="path2770" 3.60 inkscape:connector-type="polyline" 3.61 inkscape:connection-start="#rect2768" 3.62 inkscape:connection-end="#rect2766" /> 3.63 <path 3.64 style="fill:none;fill-opacity:0.75;fill-rule:evenodd;stroke:#747474;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;marker-end:url(#Arrow1Mend);stroke-opacity:1" 3.65 - d="M 207.73216,460.68359 L 228.01788,460.68359" 3.66 + d="M 207.73216,456.68359 L 228.01788,456.68359" 3.67 id="path2772" 3.68 inkscape:connector-type="polyline" 3.69 inkscape:connection-start="#rect2766" 3.70 inkscape:connection-end="#rect1872" /> 3.71 <path 3.72 style="fill:none;fill-opacity:0.75;fill-rule:evenodd;stroke:#747474;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;marker-end:url(#Arrow1Mend);stroke-opacity:1" 3.73 - d="M 280.44645,460.68359 L 300.73218,460.68359" 3.74 + d="M 280.44645,456.68359 L 300.73218,456.68359" 3.75 id="path2774" 3.76 inkscape:connector-type="polyline" 3.77 inkscape:connection-start="#rect1872" 3.78 inkscape:connection-end="#rect2764" /> 3.79 <path 3.80 style="fill:none;fill-opacity:0.75;fill-rule:evenodd;stroke:#747474;stroke-width:1;stroke-linecap:butt;stroke-linejoin:miter;marker-end:url(#Arrow1Mend);stroke-miterlimit:4;stroke-dasharray:3, 3;stroke-dashoffset:0;stroke-opacity:1" 3.81 - d="M 62.303571,460.68359 L 82.589293,460.68359" 3.82 + d="M 62.303571,456.68359 L 82.589294,456.68359" 3.83 id="path2778" 3.84 inkscape:connector-type="polyline" 3.85 inkscape:connection-end="#rect2768" /> 3.86 @@ -298,12 +298,12 @@ 3.87 xml:space="preserve" 3.88 style="font-size:12px;font-style:normal;font-weight:normal;fill:black;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;font-family:Times New Roman" 3.89 x="82.072548" 3.90 - y="436.64789" 3.91 + y="432.64789" 3.92 id="text3094"><tspan 3.93 sodipodi:role="line" 3.94 id="tspan3096" 3.95 x="82.072548" 3.96 - y="436.64789">Changelog</tspan></text> 3.97 + y="432.64789">Changelog</tspan></text> 3.98 <text 3.99 xml:space="preserve" 3.100 style="font-size:12px;font-style:normal;font-weight:normal;fill:black;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;font-family:Times New Roman"