% hgbook: es/concepts.tex @ 418:bc2136732cd6
% Changeset description: translated 3 paragraphs
% Author: Javier Rojas <jerojasro@devnull.li>
% Date: Fri Nov 14 00:09:26 2008 -0500 (2008-11-14)
% Parents: 15bf7d50b586
% Children: 0774efad9003

\chapter{Behind the scenes}
\label{chap:concepts}

Unlike many revision control systems, the concepts upon which
Mercurial is built are simple enough that it's easy to understand how
the software really works. Knowing this certainly isn't necessary,
but I find it useful to have a ``mental model'' of what's going on.

This understanding gives me confidence that Mercurial has been
carefully designed to be both \emph{safe} and \emph{efficient}. And
just as importantly, if it's easy for me to retain a good idea of what
the software is doing when I perform a revision control task, I'm less
likely to be surprised by its behaviour.

In this chapter, we'll initially cover the core concepts behind
Mercurial's design, then continue to discuss some of the interesting
details of its implementation.

\section{Mercurial's historical record}

\subsection{Tracking the history of a single file}

When Mercurial tracks modifications to a file, it stores the history
of that file in a metadata object called a
\emph{filelog}\ndt{That is, a log file.}. Each entry in the filelog
contains enough information to reconstruct one revision of the file
that is being tracked. Filelogs are stored as files in the
\sdirname{.hg/store/data} directory. A filelog contains two kinds of
information: revision data, and an index to help Mercurial to find a
revision efficiently.

A file that is large, or has a lot of history, has its filelog stored
in separate data (``\texttt{.d}'' suffix) and index (``\texttt{.i}''
suffix) files. For small files without much history, the revision
data and index are combined in a single ``\texttt{.i}'' file. The
correspondence between a file in the working directory and the filelog
that tracks its history in the repository is illustrated in
figure~\ref{fig:concepts:filelog}.
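
As a rough sketch of this correspondence, the following Python function
maps the path of a tracked file to the names its filelog might have
under \sdirname{.hg/store/data}. The function name
\texttt{filelog\_paths} and its \texttt{split} parameter are
hypothetical, and the sketch deliberately ignores any encoding that a
real repository applies to awkward file names.

```python
import posixpath


def filelog_paths(tracked_path, split=False):
    """Return the filelog file names for a tracked file (simplified)."""
    base = posixpath.join(".hg/store/data", tracked_path)
    # Small filelogs keep the index and revision data together in ".i".
    if not split:
        return [base + ".i"]
    # Large or long-lived filelogs add a separate ".d" data file.
    return [base + ".i", base + ".d"]
```

Calling \texttt{filelog\_paths("src/hello.c")} gives a single
``\texttt{.i}'' name, while passing \texttt{split=True} gives both the
index and the data file names.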

\begin{figure}[ht]
  \centering
  \grafix{filelog}
  \caption{Relationships between files in the working directory and
    filelogs in the repository}
  \label{fig:concepts:filelog}
\end{figure}

\subsection{Managing tracked files}

Mercurial uses a structure called a \emph{manifest} to collect
together the information it keeps about the files that it tracks. Each
entry in the manifest contains information about the files present in
a single changeset. An entry records which files are present in the
changeset, the revision of each file, and a few other pieces of file
metadata.

\subsection{Recording changeset information}

The \emph{changelog} contains information about each changeset. Each
revision records who committed a change, the changeset comment, other
pieces of changeset-related information, and the revision of the
manifest to use.

\subsection{Relationships between revisions}

Within a changelog, a manifest, or a filelog, each revision stores a
pointer to its immediate parent (or to its two parents, if it's a
merge revision). As I mentioned above, there are also relationships
between revisions \emph{across} these structures, and they are
hierarchical in nature.

For every changeset in a repository, there is exactly one revision
stored in the changelog. Each revision of the changelog contains a
pointer to a single revision of the manifest. A revision of the
manifest stores a pointer to a single revision of each filelog that
was tracked when that changeset was created. These relationships are
illustrated in figure~\ref{fig:concepts:metadata}.

\begin{figure}[ht]
  \centering
  \grafix{metadata}
  \caption{Metadata relationships}
  \label{fig:concepts:metadata}
\end{figure}

As the illustration shows, there is \emph{not} a ``one to one''
relationship between revisions in the changelog, manifest, or filelog.
If the manifest hasn't changed between two changesets, the changelog
entries for those changesets will point to the same revision of the
manifest. If a file that Mercurial tracks hasn't changed between two
changesets, the entry for that file in the two revisions of the
manifest will point to the same revision of its filelog.
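
To make the chain of pointers concrete, here's a toy model in Python.
Everything in it (the dictionary keys, the file names, the
\texttt{files\_at} helper) is invented for illustration; it mirrors
the relationships described above, not Mercurial's actual data
layout.

```python
# Each filelog is a list of revisions of one file.
filelogs = {
    "README": ["v0 text", "v1 text"],
    "main.c": ["v0 code"],
}

# Each manifest revision maps a file name to a revision of its filelog.
manifest = [
    {"README": 0, "main.c": 0},
    {"README": 1, "main.c": 0},  # main.c unchanged: same filelog revision
]

# Each changelog revision records who committed, a comment, and which
# manifest revision to use.
changelog = [
    {"user": "alice", "desc": "initial import", "manifest": 0},
    {"user": "bob", "desc": "update README", "manifest": 1},
]


def files_at(changeset):
    """Follow changelog -> manifest -> filelog to get file contents."""
    entries = manifest[changelog[changeset]["manifest"]]
    return {name: filelogs[name][rev] for name, rev in entries.items()}
```

Note how both manifest revisions point to revision 0 of
\texttt{main.c}, since that file didn't change between the two
changesets.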


\section{Safe, efficient storage}

The underpinnings of changelogs, manifests, and filelogs are provided
by a single structure called the \emph{revlog}\ndt{Short for
  \emph{revision log}.}.

\subsection{Efficient storage}

The revlog provides efficient storage of revisions using a
\emph{delta}\ndt{A difference.} mechanism. Instead of storing a
complete copy of a file for each revision, it stores the changes
needed to transform an older revision into the new revision. For many
kinds of file data, these deltas are typically a fraction of a percent
of the size of a full copy of a file.

Some obsolete revision control systems can only work with deltas of
plain text files. They must either store binary files as complete
snapshots, or encode them into some suitable plain text
representation, and both approaches waste a good deal of resources.
Mercurial can handle deltas of files with arbitrary binary contents;
it doesn't need to treat plain text as a special case.


\subsection{Safe operation}
\label{sec:concepts:txn}

Mercurial only ever \emph{appends} data to the end of a revlog file.
It never modifies a section of a file after it has written it. This
is both more robust and more efficient than schemes that need to
modify or rewrite data.

In addition, Mercurial treats every write as part of a
\emph{transaction} that can span a number of files. A transaction is
\emph{atomic}: either the entire transaction succeeds and its effects
are all visible to readers, or the whole thing is undone. This
guarantee of atomicity means that if you're running two copies of
Mercurial, where one is reading data and one is writing it, the reader
will never see a partially written result that might confuse it.

The fact that Mercurial only appends to files makes it easier to
provide this transactional guarantee. The easier it is to do things
like this, the more confident you should be that they're done
correctly.
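
Here's a small Python sketch of why append-only files make it easy to
undo a failed transaction. The \texttt{Transaction} class is
hypothetical and much simpler than anything a real system would use:
it only remembers how long each file was before the first append, so
that an abort can truncate the file back to that length. It doesn't
try to model what concurrent readers see.

```python
import os


class Transaction:
    """Remember each file's length so an abort can drop appended data."""

    def __init__(self):
        self.original_sizes = {}

    def append(self, path, data):
        # Record the length only once, before the first append.
        if path not in self.original_sizes:
            size = os.path.getsize(path) if os.path.exists(path) else 0
            self.original_sizes[path] = size
        with open(path, "ab") as f:
            f.write(data)

    def abort(self):
        # Undo is simple because earlier bytes were never modified.
        for path, size in self.original_sizes.items():
            with open(path, "r+b") as f:
                f.truncate(size)
        self.original_sizes.clear()
```

Because nothing before the recorded length was ever touched,
truncating is enough to restore every file to its earlier state.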


\subsection{Fast retrieval}

Mercurial cleverly avoids a pitfall common to all earlier revision
control systems: the problem of \emph{inefficient
  retrieval\ndt{Retrieval in the sense of fetching data, or
    reconstructing it from other data, during normal operation of
    the system, not recovery after a failure or disaster.}}. Many
revision control systems store the contents of a revision as an
incremental series of modifications against a ``snapshot''. To
reconstruct a specific revision, you must first read the snapshot, and
then every one of the revisions between the snapshot and your target
revision. The more history that a file accumulates, the more
revisions you must read, hence the longer it takes to reconstruct a
particular revision.

\begin{figure}[ht]
  \centering
  \grafix{snapshot}
  \caption{Snapshot of a revlog, with incremental deltas}
  \label{fig:concepts:snapshot}
\end{figure}

The innovation that Mercurial applies to this problem is simple but
effective. Once the cumulative amount of delta information stored
since the last snapshot exceeds a fixed threshold, it stores a new
snapshot (compressed, of course), instead of another delta. This
makes it possible to reconstruct \emph{any} revision of a file
quickly. This approach works so well that it has since been copied by
other revision control systems.

Figure~\ref{fig:concepts:snapshot} illustrates the idea. In an entry
in a revlog's index file, Mercurial stores the range of entries
(deltas) from the data file that it must read to reconstruct a
particular revision.
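
The snapshot rule is easy to sketch in Python. The
\texttt{TinyRevlog} class below is a toy under stated assumptions: the
delta format (shared prefix length, shared suffix length, replacement
middle), the threshold value, and the way delta size is measured are
all invented for illustration, and nothing is compressed.

```python
def make_delta(old, new):
    """Describe new as old with its differing middle replaced."""
    p = 0
    while p < min(len(old), len(new)) and old[p] == new[p]:
        p += 1
    s = 0
    while s < min(len(old), len(new)) - p and old[-1 - s] == new[-1 - s]:
        s += 1
    return (p, s, new[p:len(new) - s])


def apply_delta(old, delta):
    p, s, middle = delta
    return old[:p] + middle + old[len(old) - s:]


class TinyRevlog:
    def __init__(self, threshold=64):
        self.threshold = threshold  # assumed limit on accumulated delta size
        self.entries = []           # ("snapshot", text) or ("delta", delta)
        self.bases = []             # index: where each revision's chain starts
        self.pending = 0            # delta size stored since the last snapshot

    def add(self, text):
        delta = None
        if self.entries:
            delta = make_delta(self.read(len(self.entries) - 1), text)
        if delta is None or self.pending + len(delta[2]) > self.threshold:
            # Too much delta data has piled up: store a full snapshot.
            self.entries.append(("snapshot", text))
            self.bases.append(len(self.entries) - 1)
            self.pending = 0
        else:
            self.entries.append(("delta", delta))
            self.bases.append(self.bases[-1])
            self.pending += len(delta[2])

    def read(self, rev):
        # Start at the nearest snapshot, then apply only the deltas after it.
        base = self.bases[rev]
        text = self.entries[base][1]
        for _kind, delta in self.entries[base + 1:rev + 1]:
            text = apply_delta(text, delta)
        return text
```

The \texttt{bases} list plays the role of the index: for any revision
it says where to start reading, so the work needed by \texttt{read} is
bounded by the threshold rather than by the length of the history.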


\subsubsection{Aside: the influence of video compression}

If you're familiar with video compression, or have ever watched a TV
feed through a digital cable or satellite service, you may know that
most video compression schemes store each frame of video as a delta
against its predecessor frame. In addition, these schemes use
``lossy'' compression techniques to increase the compression ratio, so
visual errors accumulate over the course of a number of inter-frame
deltas.

Because it's possible for a video stream to ``drop out'' occasionally
due to signal glitches, and to limit the accumulation of errors
introduced by lossy compression, video encoders periodically insert a
complete frame (also called a ``key frame'') into the video stream;
the next delta is generated against that frame. This means that if
the video signal gets interrupted, it will resume once the next key
frame is received. Also, the accumulation of encoding errors restarts
with each key frame.


\subsection{Identification and strong integrity}

Along with delta or snapshot information, a revlog entry contains a
cryptographic hash of the data that it represents. This makes it
difficult to forge the contents of a revision, and easy to detect
accidental corruption.

Hashes provide more than a mere check against corruption; they are
used as the identifiers for revisions. The changeset identification
hashes that you see as an end user are those of revisions of the
changelog. Although filelogs and the manifest also use hashes,
Mercurial only uses these behind the scenes.

Mercurial verifies that hashes are correct when it retrieves file
revisions and when it pulls changes from another repository. If it
encounters an integrity problem, it will complain and stop whatever
it's doing.
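
The check-on-retrieval idea can be sketched in a few lines of Python.
This is only an illustration: the \texttt{store} and
\texttt{retrieve} helpers and the dictionary layout are hypothetical,
and the choice of SHA-1 over the data alone is an assumption made for
the example, not a description of exactly what Mercurial hashes.

```python
import hashlib


def store(data):
    """Return a revlog-like entry: the data plus a hash of it."""
    return {"data": data, "hash": hashlib.sha1(data).hexdigest()}


def retrieve(entry):
    """Return the data, refusing to continue if it fails its hash check."""
    if hashlib.sha1(entry["data"]).hexdigest() != entry["hash"]:
        raise ValueError("integrity check failed")
    return entry["data"]
```

The same hash does double duty here, as it does in the text above: it
detects damage, and it can serve as the name by which the entry is
looked up.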


In addition to the effect it has on retrieval efficiency, Mercurial's
use of periodic snapshots makes it more robust against partial data
corruption. If a filelog becomes partly corrupted due to a hardware
error or system bug, it's often possible to reconstruct some or most
revisions from the uncorrupted sections of the filelog, both before
and after the corrupted section. This would not be possible with a
delta-only storage model.


\section{Revision history, branching, and merging}

Every entry in a Mercurial revlog knows the identity of its immediate
ancestor revision, usually referred to as its \emph{parent}. In fact,
a revision contains room for not one parent, but two. Mercurial uses
a special hash, called the ``null ID'', to represent the idea ``there
is no parent here''. This hash is simply a string of zeroes.

In figure~\ref{fig:concepts:revlog}, you can see an example of the
conceptual structure of a revlog. Filelogs, manifests, and changelogs
all have this same structure; they differ only in the kind of data
stored in each delta or snapshot.

The first revision in a revlog (at the bottom of the image) has the
null ID in both of its parent slots. For a normal revision, the first
parent slot contains the ID of its parent revision, and the second
contains the null ID, indicating that the revision has only one real
parent. Any two revisions that have the same parent ID are branches.
A revision that represents a merge between branches has two normal
revision IDs in its parent slots.
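
A short Python sketch shows how much can be read off the two parent
slots alone. The names \texttt{kind} and \texttt{branch\_points} are
made up for this example, revision IDs are plain strings, and writing
the null ID as forty zero digits is an assumption.

```python
from collections import Counter

NULL_ID = "0" * 40  # the null ID as hex digits; the length is assumed


def kind(p1, p2):
    """Classify a revision by how many of its parent slots are filled."""
    real = [p for p in (p1, p2) if p != NULL_ID]
    return ("root", "normal", "merge")[len(real)]


def branch_points(parents):
    """Return the IDs that are the parent of more than one revision."""
    counts = Counter(
        p
        for slots in parents.values()
        for p in set(slots)
        if p != NULL_ID
    )
    return {p for p, n in counts.items() if n > 1}
```

Given a mapping from each revision ID to its pair of parents, a
revision with two null slots is the first one, a revision with one
null slot is an ordinary one, and any ID returned by
\texttt{branch\_points} is where branches begin.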


\begin{figure}[ht]
  \centering
  \grafix{revlog}
  \caption{The conceptual structure of a revlog}
  \label{fig:concepts:revlog}
\end{figure}


\section{The working directory}

In the working directory, Mercurial stores a snapshot of the files
from the repository as of a particular changeset.

The working directory ``knows'' which changeset it contains. When you
update the working directory to contain a particular changeset,
Mercurial looks up the appropriate revision of the manifest to find
out which files it was tracking at the time that changeset was
committed, and which revision of each file was then current. It then
recreates a copy of each of those files, with the same contents it had
when the changeset was committed.

The \emph{dirstate} contains Mercurial's knowledge of the working
directory. This details which changeset the working directory is
updated to, and all of the files that Mercurial is tracking in the
working directory.

Just as a revision of a revlog has room for two parents, so that it
can represent either a normal revision (with one parent) or a merge of
two earlier revisions, the dirstate has slots for two parents. When
you use the \hgcmd{update} command, the changeset that you update to
is stored in the ``first parent'' slot, and the null ID in the second.
When you \hgcmd{merge} with another changeset, the first parent
remains unchanged, and the second parent is filled in with the
changeset you're merging with. The \hgcmd{parents} command tells you
what the parents of the dirstate are.


\subsection{What happens when you commit}

The dirstate stores parent information for more than just book-keeping
purposes. Mercurial uses the parents of the dirstate as \emph{the
  parents of a new changeset} when you perform a commit.

\begin{figure}[ht]
  \centering
  \grafix{wdir}
  \caption{The working directory can have two parents}
  \label{fig:concepts:wdir}
\end{figure}

Figure~\ref{fig:concepts:wdir} shows the normal state of the working
directory, where it has a single changeset as parent. That changeset
is the \emph{tip}, the newest changeset in the repository that has no
children.

\begin{figure}[ht]
  \centering
  \grafix{wdir-after-commit}
  \caption{The working directory gains new parents after a commit}
  \label{fig:concepts:wdir-after-commit}
\end{figure}

It's useful to think of the working directory as ``the changeset I'm
about to commit''. Any files that you tell Mercurial that you've
added, removed, renamed, or copied will be reflected in that
changeset, as will modifications to any files that Mercurial is
already tracking; the new changeset will have the parents of the
working directory as its parents.

After a commit, Mercurial will update the parents of the working
directory, so that the first parent is the ID of the new changeset,
and the second is the null ID. This is shown in
figure~\ref{fig:concepts:wdir-after-commit}. Mercurial doesn't touch
any of the files in the working directory when you commit; it just
modifies the dirstate to note its new parents.
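
The way the two parent slots move during update, merge, and commit can
be sketched as a tiny Python class. The \texttt{WorkingDirectory}
class is hypothetical, it tracks only the parent slots, and the new
changeset's ID is passed in by the caller rather than computed.

```python
NULL_ID = "000000000000"  # stand-in for the null ID


class WorkingDirectory:
    """Track only the two parent slots of the dirstate."""

    def __init__(self):
        self.parents = (NULL_ID, NULL_ID)

    def update(self, changeset):
        self.parents = (changeset, NULL_ID)

    def merge(self, other):
        # The first parent stays put; the second records the merge target.
        self.parents = (self.parents[0], other)

    def commit(self, new_id):
        # The new changeset inherits the dirstate's parents ...
        changeset = {"id": new_id, "parents": self.parents}
        # ... and then becomes the sole parent of the working directory.
        self.parents = (new_id, NULL_ID)
        return changeset
```

Notice that \texttt{commit} never looks at any files; in this sketch,
as in the description above, it only rewrites the parent slots.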


\subsection{Creating a new head}

It's perfectly normal to update the working directory to a changeset
other than the current tip. For example, you might want to know what
your project looked like last Tuesday, or you could be looking through
changesets to see which one introduced a bug. In cases like this, the
natural thing to do is update the working directory to the changeset
you're interested in, and then examine the files in the working
directory directly to see their contents as they were when you
committed that changeset. The effect of this is shown in
figure~\ref{fig:concepts:wdir-pre-branch}.

\begin{figure}[ht]
  \centering
  \grafix{wdir-pre-branch}
  \caption{The working directory, updated to an older changeset}
  \label{fig:concepts:wdir-pre-branch}
\end{figure}

Having updated the working directory to an older changeset, what
happens if you make some changes, and then commit? Mercurial behaves
in the same way as I outlined above. The parents of the working
directory become the parents of the new changeset. This new changeset
has no children, so it becomes the new tip. And the repository now
contains two changesets that have no children; we call these
\emph{heads}. You can see the structure that this creates in
figure~\ref{fig:concepts:wdir-branch}.

\begin{figure}[ht]
  \centering
  \grafix{wdir-branch}
  \caption{After a commit made while synced to an older changeset}
  \label{fig:concepts:wdir-branch}
\end{figure}
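
Since a head is just a changeset that nobody names as a parent,
finding heads takes only a few lines of Python. In this sketch the
\texttt{heads} function and the single-letter changeset names are
invented, and \texttt{None} stands in for the null ID.

```python
NULL_ID = None  # stand-in for the null ID in this sketch


def heads(parents):
    """Return the changesets that no other changeset names as a parent."""
    has_child = {
        p
        for slots in parents.values()
        for p in slots
        if p is not NULL_ID
    }
    return sorted(cs for cs in parents if cs not in has_child)
```

With a linear history of \texttt{a}, \texttt{b}, \texttt{c}, the only
head is \texttt{c}. Committing a new changeset \texttt{d} whose
parent is \texttt{a} leaves two heads, \texttt{c} and \texttt{d}.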


\begin{note}
  If you're new to Mercurial, you should keep in mind a common
  ``error'', which is to use the \hgcmd{pull} command without any
  options. By default, the \hgcmd{pull} command \emph{does not}
  update the working directory, so you'll bring new changesets into
  your repository, but the working directory will stay synced at the
  same changeset as before the pull. If you make some changes and
  commit afterwards, you'll thus create a new head, because your
  working directory isn't synced to whatever the current tip is.

  I put the word ``error'' in quotes because all that you need to do
  to rectify this situation is \hgcmd{merge}, then \hgcmd{commit}. In
  other words, this almost never has negative consequences; it just
  surprises people. I'll discuss other ways to avoid this behaviour,
  and why Mercurial behaves in this initially surprising way, later
  on.
\end{note}


\subsection{Merging heads}

When you run the \hgcmd{merge} command, Mercurial leaves the first
parent of the working directory unchanged, and sets the second parent
to the changeset you're merging with, as shown in
figure~\ref{fig:concepts:wdir-merge}.

\begin{figure}[ht]
  \centering
  \grafix{wdir-merge}
  \caption{Merging two heads}
  \label{fig:concepts:wdir-merge}
\end{figure}

Mercurial also has to modify the working directory, to merge the files
managed in the two changesets. Simplified a little, the merging
process goes like this, for every file in the manifests of both
changesets.
\begin{itemize}
\item If neither changeset has modified a file, do nothing with that
  file.
\item If one changeset has modified a file, and the other hasn't,
  create the modified copy of the file in the working directory.
\item If one changeset has removed a file, and the other hasn't (or
  has also deleted it), delete the file from the working directory.
\item If one changeset has removed a file, but the other has modified
  the file, ask the user what to do: keep the modified file, or remove
  it?
\item If both changesets have modified a file, invoke an external
  merge program to choose the new contents for the merged file. This
  may require input from the user.
\item If one changeset has modified a file, and the other has renamed
  or copied the file, make sure that the changes follow the new name
  of the file.
\end{itemize}
There are more details (merging has plenty of corner cases), but
these are the most common choices that are involved in a merge. As
you can see, most cases are completely automatic, and indeed most
merges finish automatically, without requiring your input to resolve
any conflicts.
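
The per-file choices listed above can be written down as a small
decision function. This Python sketch is an illustration only: the
function name, the three state strings, and the returned action
strings are all invented, and the rename and copy case is left out.

```python
def merge_action(local, other):
    """Pick an action from what each changeset did to one file.

    Each argument is "unchanged", "modified", or "removed", relative to
    the common ancestor. Renames and copies are not modelled here.
    """
    states = {local, other}
    if states == {"unchanged"}:
        return "do nothing"
    if states == {"modified"}:
        return "run merge program"
    if states == {"modified", "removed"}:
        return "ask user"
    if "removed" in states:
        return "delete file"
    # Only one side modified the file.
    return "take modified copy"
```

Only two of the outcomes can need a person: asking the user, and
running the merge program.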


When you're thinking about what happens when you commit after a merge,
once again the working directory is ``the changeset I'm about to
commit''. After the \hgcmd{merge} command completes, the working
directory has two parents; these will become the parents of the new
changeset.

Mercurial lets you perform multiple merges, but you must commit the
results of each individual merge as you go. This is necessary because
Mercurial only tracks two parents for both revisions and the working
directory. While it would be technically possible to merge multiple
changesets at once, the prospect of user confusion and making a
terrible mess of a merge immediately becomes overwhelming.


\section{Other interesting design features}

In the sections above, I've tried to highlight some of the most
important aspects of Mercurial's design, to illustrate that it pays
careful attention to reliability and performance. However, the
attention to detail doesn't stop there. There are a number of other
aspects of Mercurial's construction that I personally find
interesting. I'll detail a few of them here, separate from the ``big
ticket'' items above, so that if you're interested, you can gain a
better idea of the amount of thinking that goes into a well-designed
system.

\subsection{Clever compression}

When appropriate, Mercurial will store both snapshots and deltas in
compressed form. It does this by always \emph{trying to} compress a
snapshot or delta, but only storing the compressed version if it's
smaller than the uncompressed version.

This means that Mercurial does ``the right thing'' when storing a file
whose native form is compressed, such as a \texttt{zip} archive or a
JPEG image. When these types of files are compressed a second time,
the resulting file is usually bigger than the once-compressed form,
and so Mercurial will store the plain \texttt{zip} or JPEG.

Deltas between revisions of a compressed file are usually larger than
snapshots of the file, and Mercurial again does ``the right thing'' in
these cases. It finds that such a delta exceeds the threshold at
which it should store a complete snapshot of the file, so it stores
the snapshot, again saving space compared to a naive delta-only
approach.
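
The ``try it, then keep whichever is smaller'' policy is simple to
sketch with Python's standard \texttt{zlib} module. The
\texttt{pack} and \texttt{unpack} helpers and the one-letter flags
are invented for this example; they aren't Mercurial's on-disk
format.

```python
import zlib


def pack(chunk):
    """Return (flag, payload), keeping the compressed form only if smaller."""
    compressed = zlib.compress(chunk)
    if len(compressed) < len(chunk):
        return ("z", compressed)
    # Compression didn't help, so store the bytes as they are.
    return ("u", chunk)


def unpack(flag, payload):
    """Undo pack(), using the flag to tell how the payload was stored."""
    return zlib.decompress(payload) if flag == "z" else payload
```

Highly repetitive input comes back flagged as compressed, while a
very short chunk, where the compression overhead outweighs any
savings, is stored untouched.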


\subsubsection{Network recompression}

When storing revisions on disk, Mercurial uses the ``deflate''
compression algorithm (the same one used by the popular \texttt{zip}
archive format), which balances good speed with a respectable
compression ratio. However, when transmitting revision data over a
network connection, Mercurial uncompresses the compressed revision
data.

If the connection is over HTTP, Mercurial recompresses the entire
stream of data using a compression algorithm that gives a better
compression ratio (the Burrows-Wheeler algorithm from the widely used
\texttt{bzip2} compression package). This combination of algorithm
and compression of the entire stream (instead of a revision at a time)
substantially reduces the number of bytes to be transferred, yielding
better network performance over almost all kinds of network.

(If the connection is over \command{ssh}, Mercurial \emph{doesn't}
recompress the stream, because \command{ssh} can already do this
itself.)


\subsection{Read/write ordering and atomicity}

Appending to files isn't the whole story when it comes to guaranteeing
that a reader won't see a partial write. If you recall
figure~\ref{fig:concepts:metadata}, revisions in the changelog point to
revisions in the manifest, and revisions in the manifest point to
revisions in filelogs. This hierarchy is deliberate.

A writer starts a transaction by writing filelog and manifest data,
and doesn't write any changelog data until those are finished. A
reader starts by reading changelog data, then manifest data, followed
by filelog data.

Since the writer has always finished writing filelog and manifest data
before it writes to the changelog, a reader will never read a pointer
to a partially written manifest revision from the changelog, and it will
never read a pointer to a partially written filelog revision from the
manifest.


\subsection{Concurrent access}

The read/write ordering and atomicity guarantees mean that Mercurial
never needs to \emph{lock} a repository when it's reading data, even
if the repository is being written to while the read is occurring.
This has a big effect on scalability; you can have an arbitrary number
of Mercurial processes safely reading data from a repository all at
once, no matter whether it's being written to or not.

The lockless nature of reading means that if you're sharing a
repository on a multi-user system, you don't need to grant other local
users permission to \emph{write} to your repository in order for them
to be able to clone it or pull changes from it; they only need
\emph{read} permission. (This is \emph{not} a common feature among
revision control systems, so don't take it for granted! Most require
readers to be able to lock a repository to access it safely, and this
requires write permission on at least one directory, which of course
makes for all kinds of nasty and annoying security and administrative
problems.)

Mercurial uses locks to ensure that only one process can write to a
repository at a time (the locking mechanism is safe even over
filesystems that are notoriously hostile to locking, such as NFS). If
a repository is locked, a writer will wait for a while and retry in
case the repository becomes unlocked, but if the repository remains
locked for too long, the process attempting to write will time out.
This means that your daily automated scripts won't get stuck forever
and pile up if a system crashes unnoticed, for example. (Yes, the
timeout is configurable, from zero to infinity.)


\subsubsection{Safe dirstate access}

As with revision data, Mercurial doesn't take a lock to read the
dirstate file; it does acquire a lock to write it. To avoid the
possibility of reading a partially written copy of the dirstate file,
Mercurial writes to a file with a unique name in the same directory as
the dirstate file, then renames the temporary file atomically to
\filename{dirstate}. The file named \filename{dirstate} is thus
guaranteed to be complete, not partially written.
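
The write-then-rename technique looks like this in Python. This is a
generic sketch of the idea, not Mercurial's own code: the helper name
\texttt{write\_atomically} and the temporary file prefix are
assumptions, and the rename is only atomic when both names live on the
same filesystem.

```python
import os
import tempfile


def write_atomically(path, data):
    """Write data to a uniquely named temporary file, then rename it."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file goes in the same directory as the target.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        # Readers see either the old file or the new one, never a mix.
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

A reader that opens the target name at any moment gets a complete
file, because the name only ever points at fully written contents.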


\subsection{Avoiding seeks}

Critical to Mercurial's performance is the avoidance of seeks of the
disk head, since any seek is far more expensive than even a
comparatively large read operation.

This is why, for example, the dirstate is stored in a single file. If
there were a dirstate file per directory that Mercurial tracked, the
disk would seek once per directory. Instead, Mercurial reads the
entire single dirstate file in one step.

Mercurial also uses a ``copy on write'' scheme when cloning a
repository on local storage. Instead of copying every revlog file
from the old repository into the new repository, it makes a ``hard
link'', which is a shorthand way to say ``these two names point to the
same file''. When Mercurial is about to write to one of a revlog's
files, it checks to see if the number of names pointing at the file is
greater than one. If it is, more than one repository is using the
file, so Mercurial makes a new copy of the file that is private to
this repository.
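
Here's a minimal Python sketch of the hard link and copy-on-write
idea, assuming a filesystem that supports hard links. The helper
names and the temporary ``\texttt{.copy}'' suffix are hypothetical;
only \texttt{os.link}, \texttt{os.stat}, \texttt{shutil.copyfile},
and \texttt{os.replace} are real library calls.

```python
import os
import shutil


def clone_file(src, dst):
    """Share a file between repositories by giving it a second name."""
    os.link(src, dst)


def append_private(path, data):
    """Break the sharing before writing if another name uses the file."""
    if os.stat(path).st_nlink > 1:
        tmp = path + ".copy"      # hypothetical temporary name
        shutil.copyfile(path, tmp)
        os.replace(tmp, path)     # path now names a private copy
    with open(path, "ab") as f:
        f.write(data)
```

The link count is maintained by the operating system, so the program
itself doesn't have to remember which repositories share a file.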


A few revision control developers have pointed out that this idea of
making a complete private copy of a file is not very efficient in its
use of storage. While this is true, storage is cheap, and this method
gives the highest performance while deferring most book-keeping to the
operating system. An alternative scheme would most likely reduce
performance and increase the complexity of the software, each of which
is much more important to the ``feel'' of day-to-day use.


\subsection{Other contents of the dirstate}

Because Mercurial doesn't force you to tell it when you're modifying a
file, it uses the dirstate to store some extra information so it can
determine efficiently whether you have modified a file. For each file
in the working directory, it stores the time that it last modified the
file itself, and the size of the file at that time.

When you explicitly \hgcmd{add}, \hgcmd{remove}, \hgcmd{rename} or
\hgcmd{copy} files, Mercurial updates the dirstate so that it knows
what to do with those files when you commit.

When Mercurial is checking the states of files in the working
directory, it first checks a file's modification time. If that has
not changed, the file must not have been modified. If the file's size
has changed, the file must have been modified. If the modification
time has changed, but the size has not, only then does Mercurial need
to read the actual contents of the file to see if they've changed.
Storing these few extra pieces of information dramatically reduces the
amount of data that Mercurial needs to read, which yields large
performance improvements compared to other revision control systems.
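
The three-step check described above translates directly into Python.
In this sketch the \texttt{record} and \texttt{is\_modified} helpers
and the dictionary keys are invented, and the committed contents are
passed in as bytes instead of being read from a filelog.

```python
import os


def record(path):
    """Remember the size and modification time, as a dirstate entry might."""
    st = os.stat(path)
    return {"size": st.st_size, "mtime": st.st_mtime_ns}


def is_modified(path, entry, committed_data):
    """Decide whether a file changed, reading it only as a last resort."""
    st = os.stat(path)
    if st.st_mtime_ns == entry["mtime"]:
        return False  # same timestamp: treat the file as untouched
    if st.st_size != entry["size"]:
        return True   # different size: the contents must differ
    # The timestamp changed but the size didn't: compare the contents.
    with open(path, "rb") as f:
        return f.read() != committed_data
```

Only the last branch opens the file, which is why the common case of
an untouched working directory costs one \texttt{stat} call per file.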


%%% Local Variables:
%%% mode: latex
%%% TeX-master: "00book"
%%% End: