Behind the scenes

Unlike many revision control systems, the concepts upon which Mercurial is built are simple enough that it's easy to understand how the software really works. Knowing these details isn't necessary, but I find it useful to have a mental model of what's going on.

This understanding gives me confidence that Mercurial has been carefully designed to be both safe and efficient. Moreover, if it's easy for me to keep in mind what the software is doing when I perform a revision control task, I'm less likely to be surprised by its behavior.

In this chapter, we'll first cover the core concepts behind Mercurial's design, then go on to discuss some of the interesting details of its implementation.

Mercurial's historical record

Tracking the history of a single file

When Mercurial tracks modifications to a file, it stores the history of that file in a metadata object called a filelog. Each entry in the filelog contains enough information to reconstruct one revision of the file that is being tracked. Filelogs are stored as files in the .hg/store/data directory. A filelog contains two kinds of information: revision data, and an index to help Mercurial find a given revision efficiently.

A file that is large, or has a lot of history, has its filelog stored in separate data (.d suffix) and index (.i suffix) files. The correspondence between a file in the working directory and the filelog that tracks its history in the repository is illustrated in the figure below.

Figure: Relationships between files in the working directory and their filelogs in the repository

Managing tracked files

Mercurial uses a structure called a manifest to collect together information about the files that it tracks. Each entry in the manifest contains information about the files present in a single changeset. An entry records which files are present in the changeset, the revision of each file, and a few other pieces of file metadata.

Recording changeset information

The changelog contains information about each changeset. Each revision records who committed a change, the changeset comment, other pieces of changeset-related information, and the revision of the manifest to use.

Relationships between revisions

Within a changelog, a manifest, or a filelog, each revision stores a pointer to its immediate parent (or to its two parents, if it's a merge revision). As I mentioned above, there are also relationships between revisions across these structures, and they are hierarchical in nature.

For every changeset in a repository, there is exactly one revision stored in the changelog. Each revision of the changelog contains a pointer to a single revision of the manifest. A revision of the manifest stores a pointer to a single revision of each filelog tracked when that changeset was created. These relationships are illustrated in the figure "Metadata relationships".

Figure: Metadata relationships

As the illustration shows, there is not a "one to one" relationship between revisions in the changelog, manifest, or filelog. If a file that Mercurial tracks hasn't changed between two changesets, the entry for that file in the two revisions of the manifest will point to the same revision of its filelog. (It is possible, though unusual, for the manifest to remain the same between two changesets, in which case the changelog entries for those changesets will point to the same revision of the manifest.)
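
To make the pointer hierarchy concrete, here is a minimal sketch in Python. It is only a toy model: plain lists and dicts stand in for the real revlogs, small integers stand in for revision hashes, and the file names, users, and the helper `files_at` are invented for illustration.

```python
# Toy model of the changelog -> manifest -> filelog hierarchy.
# Lists and dicts stand in for revlogs; integers stand in for hashes.

filelogs = {
    "README": ["hello\n", "hello, world\n"],          # two revisions
    "main.c": ["int main(void) { return 0; }\n"],     # one revision
}

# Each manifest revision maps a file name to a revision of its filelog.
manifests = [
    {"README": 0, "main.c": 0},
    {"README": 1, "main.c": 0},  # main.c unchanged: same filelog revision
]

# Each changelog revision records metadata plus one manifest revision.
changelog = [
    {"user": "alice", "comment": "initial import", "manifest": 0},
    {"user": "bob", "comment": "expand greeting", "manifest": 1},
]


def files_at(changeset_rev):
    """Follow the pointers to rebuild every file for one changeset."""
    manifest = manifests[changelog[changeset_rev]["manifest"]]
    return {name: filelogs[name][rev] for name, rev in manifest.items()}
```

Here `files_at(1)` returns the newer README alongside the same `main.c` contents that `files_at(0)` returns, because both manifest revisions point at revision 0 of the `main.c` filelog.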

Safe, efficient storage

The underpinnings of changelogs, manifests, and filelogs are provided by a single structure called the revlog.

Efficient storage

The revlog provides efficient storage of revisions using a delta mechanism. Instead of storing a complete copy of a file for each revision, it stores the changes needed to transform an older revision into the new revision. For many kinds of file data, these deltas are typically a fraction of a percent of the size of a full copy of a file.

Some older revision control systems can only work with deltas of text files. They must either store binary files as complete snapshots or encode them into a text representation, both of which are wasteful approaches. Mercurial can efficiently handle deltas of files with arbitrary binary contents; it doesn't need to treat text as special.

Safe operation

Mercurial only ever appends data to the end of a revlog file. It never modifies a section of a file after it has written it. This is both more robust and more efficient than schemes that need to modify or rewrite data.

In addition, Mercurial treats every write as part of a transaction that can span a number of files. A transaction is atomic: either the entire transaction succeeds and its effects are all visible to readers in one go, or the whole thing is undone. This guarantee of atomicity means that if you're running two copies of Mercurial, where one is reading data and one is writing it, the reader will never see a partially written result that might confuse it.
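
The following sketch shows why append-only files make the "undo" half of a transaction cheap. It is an illustration under my own assumptions, not Mercurial's code: the `Transaction` class and the file name `example.i` are hypothetical, and the sketch does not model how readers are kept from seeing uncommitted data.

```python
import os
import tempfile


class Transaction:
    """Append-only writes to several files, undone by truncation."""

    def __init__(self):
        self.original_sizes = {}  # path -> size before this transaction

    def append(self, path, data):
        # Remember each file's length the first time we touch it.
        if path not in self.original_sizes:
            size = os.path.getsize(path) if os.path.exists(path) else 0
            self.original_sizes[path] = size
        with open(path, "ab") as f:
            f.write(data)

    def rollback(self):
        # Since data is only ever appended, undoing a write just means
        # cutting each file back to its earlier length.
        for path, size in self.original_sizes.items():
            with open(path, "r+b") as f:
                f.truncate(size)
        self.original_sizes.clear()


# Demonstration: a failed write leaves the file exactly as it was.
workdir = tempfile.mkdtemp()
log_path = os.path.join(workdir, "example.i")
with open(log_path, "wb") as f:
    f.write(b"rev0")

tr = Transaction()
tr.append(log_path, b"rev1-partial")
tr.rollback()

with open(log_path, "rb") as f:
    contents_after_rollback = f.read()
```

A scheme that rewrote data in place would have to save the old bytes somewhere before each write; here the only bookkeeping is one length per file.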

The fact that Mercurial only appends to files makes it easier to provide this transactional guarantee. The simpler a guarantee like this is to implement, the more confident you can be that it's implemented correctly.

Fast retrieval

Mercurial cleverly avoids a pitfall common to many earlier revision control systems: the problem of inefficient retrieval. Most revision control systems store the contents of a revision as an incremental series of modifications against a "snapshot". (Some base the snapshot on the oldest revision, others on the newest.) To reconstruct a specific revision, you must first read the snapshot, and then every one of the revisions between the snapshot and your target revision. The more history that a file accumulates, the more revisions you must read, hence the longer it takes to reconstruct a particular revision.

Figure: Snapshot of a revlog, with incremental deltas

The innovation that Mercurial applies to this problem is simple but effective. Once the cumulative amount of delta information stored since the last snapshot exceeds a fixed threshold, it stores a new snapshot (compressed, of course), instead of another delta. This makes it possible to reconstruct any revision of a file quickly. This approach works so well that it has since been copied by several other revision control systems.

The figure "Snapshot of a revlog, with incremental deltas" illustrates the idea. In an entry in a revlog's index file, Mercurial stores the range of entries from the data file that it must read to reconstruct a particular revision.

Aside: the influence of video compression

If you're familiar with video compression or have ever watched a TV feed through a digital cable or satellite service, you may know that most video compression schemes store each frame of video as a delta against its predecessor frame. To limit how far a decoder must read back, these schemes also insert a complete frame (a "key frame") into the stream at intervals.

Mercurial borrows this idea to make it possible to reconstruct a revision from a snapshot and a small number of deltas.
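
Here is a small sketch of the snapshot-plus-deltas idea. Everything in it is an assumption made for illustration: the class `ToyRevlog`, the 64-byte `SNAPSHOT_THRESHOLD`, and the delta format (computed with Python's `difflib`) are mine and do not reflect Mercurial's on-disk format or its real threshold. It also skips compression.

```python
from difflib import SequenceMatcher

SNAPSHOT_THRESHOLD = 64  # hypothetical cap on delta bytes per chain


def make_delta(old, new):
    """Return (start, end, replacement) edits that turn old into new."""
    ops = SequenceMatcher(None, old, new, autojunk=False).get_opcodes()
    return [(i1, i2, new[j1:j2])
            for tag, i1, i2, j1, j2 in ops if tag != "equal"]


def apply_delta(old, delta):
    """Apply edits produced by make_delta to old."""
    out, pos = [], 0
    for start, end, replacement in delta:
        out.append(old[pos:start])
        out.append(replacement)
        pos = end
    out.append(old[pos:])
    return b"".join(out)


class ToyRevlog:
    def __init__(self):
        self.entries = []     # ("snapshot", bytes) or ("delta", edits)
        self.chain_size = 0   # delta bytes stored since last snapshot
        self.tip_text = None  # full text of the newest revision

    def add(self, text):
        if self.entries:
            delta = make_delta(self.tip_text, text)
            size = sum(len(repl) for _, _, repl in delta)
            if self.chain_size + size <= SNAPSHOT_THRESHOLD:
                self.entries.append(("delta", delta))
                self.chain_size += size
                self.tip_text = text
                return
        # First revision, or the chain grew too long: store a snapshot.
        self.entries.append(("snapshot", text))
        self.chain_size = 0
        self.tip_text = text

    def read(self, rev):
        # Walk back to the nearest snapshot, then replay deltas forward.
        base = rev
        while self.entries[base][0] != "snapshot":
            base -= 1
        text = self.entries[base][1]
        for _, delta in self.entries[base + 1:rev + 1]:
            text = apply_delta(text, delta)
        return text
```

Because a new snapshot is written whenever a chain would exceed the threshold, `read` never replays more than a bounded amount of delta data, no matter how many revisions the log holds.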

Identification and strong integrity

Along with delta or snapshot information, a revlog entry contains a cryptographic hash of the data that it represents. This makes it difficult to forge the contents of a revision, and easy to detect accidental corruption.

Hashes provide more than a mere check against corruption; they are used as the identifiers for revisions. The changeset identification hashes that you see as an end user are from revisions of the changelog. Although filelogs and the manifest also use hashes, Mercurial only uses these behind the scenes.

Mercurial verifies that hashes are correct when it retrieves file revisions and when it pulls changes from another repository. If it encounters an integrity problem, it will complain and stop whatever it's doing.

In addition to the effect it has on retrieval efficiency, Mercurial's use of periodic snapshots makes it more robust against partial data corruption. If a revlog becomes partly corrupted due to a hardware error or system bug, it's often possible to reconstruct some or most revisions from the uncorrupted sections of the revlog, both before and after the corrupted section. This would not be possible with a delta-only storage model.
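
As a rough illustration of hash-based identification, the sketch below computes an identifier for a revision and checks it on retrieval. The specifics are assumptions on my part: I use SHA-1, mix the two parent identifiers (in sorted order) into the hash along with the contents, and use twenty zero bytes to mean "no parent". The function names `node_id` and `verify` are invented for this example.

```python
import hashlib

NO_PARENT = b"\0" * 20  # assumed all-zero marker meaning "no parent"


def node_id(text, p1=NO_PARENT, p2=NO_PARENT):
    """Hash a revision's contents together with its parent IDs.

    Sorting the parents makes the result independent of their order.
    """
    first, second = sorted([p1, p2])
    h = hashlib.sha1(first)
    h.update(second)
    h.update(text)
    return h.digest()


def verify(text, p1, p2, expected):
    """Return True if the stored ID matches the recomputed hash."""
    return node_id(text, p1, p2) == expected
```

If even one byte of `text` changes, `verify` fails, which is the behavior the text describes: corruption is detected when a revision is read back.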

Revision history, branching, and merging

Every entry in a Mercurial revlog knows the identity of its immediate ancestor revision, usually referred to as its parent. In fact, a revision contains room for not one parent, but two. Mercurial uses a special hash, called the "null ID", to represent the idea "there is no parent here". This hash is simply a string of zeroes.

In the figure "The conceptual structure of a revlog", you can see an example of the conceptual structure of a revlog. Filelogs, manifests, and changelogs all have this same structure; they differ only in the kind of data stored in each delta or snapshot.

The first revision in a revlog (at the bottom of the image) has the null ID in both of its parent slots. For a "normal" revision, its first parent slot contains the ID of its parent revision, and its second contains the null ID, indicating that the revision has only one real parent. Any two revisions that have the same parent ID are branches. A revision that represents a merge between branches has two normal revision IDs in its parent slots.
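
The parent-slot rules above can be expressed in a few lines of Python. This is a sketch only: the single-letter revision IDs are made-up labels rather than real hashes, and writing the null ID as 40 hexadecimal zeroes is my assumption about its textual form.

```python
NULL_ID = "0" * 40  # the null ID, written as a string of zeroes


def classify(p1, p2):
    """Classify a revision by looking at its two parent slots."""
    if p1 == NULL_ID and p2 == NULL_ID:
        return "first revision"
    if p2 == NULL_ID:
        return "normal"
    return "merge"


# revision ID -> (first parent, second parent)
revlog = {
    "a": (NULL_ID, NULL_ID),
    "b": ("a", NULL_ID),
    "c": ("a", NULL_ID),  # same parent as "b", so "b" and "c" are branches
    "d": ("b", "c"),      # merges the two branches
}

kinds = {rev: classify(p1, p2) for rev, (p1, p2) in revlog.items()}
branches_from_a = sorted(rev for rev, (p1, _) in revlog.items() if p1 == "a")
```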

Figure: The conceptual structure of a revlog

The working directory

In the working directory, Mercurial stores a snapshot of the files from the repository as of a particular changeset.

The working directory "knows" which changeset it contains. When you update the working directory to contain a particular changeset, Mercurial looks up the appropriate revision of the manifest to find out which files it was tracking at the time that changeset was committed, and which revision of each file was then current. It then recreates a copy of each of those files, with the same contents it had when the changeset was committed.

The dirstate is a special structure that contains Mercurial's knowledge of the working directory. It is maintained as a file named .hg/dirstate inside a repository. The dirstate details which changeset the working directory is updated to, and all of the files that Mercurial is tracking in the working directory. It also lets Mercurial quickly notice changed files, by recording their checkout times and sizes.

Just as a revision of a revlog has room for two parents, so that it can represent either a normal revision (with one parent) or a merge of two earlier revisions, the dirstate has slots for two parents. When you use the hg update command, the changeset that you update to is stored in the "first parent" slot, and the null ID in the second. When you hg merge with another changeset, the first parent remains unchanged, and the second parent is filled in with the changeset you're merging with. The hg parents command tells you what the parents of the dirstate are.

What happens when you commit

The dirstate stores parent information for more than just book-keeping purposes. Mercurial uses the parents of the dirstate as the parents of a new changeset when you perform a commit.

Figure: The working directory can have two parents

The figure "The working directory can have two parents" shows the normal state of the working directory, where it has a single changeset as parent. That changeset is the tip, the newest changeset in the repository that has no children.

Figure: The working directory gains new parents after a commit

It's useful to think of the working directory as "the changeset I'm about to commit". Any files that you tell Mercurial that you've added, removed, renamed, or copied will be reflected in that changeset, as will modifications to any files that Mercurial is already tracking; the new changeset will have the parents of the working directory as its parents.

After a commit, Mercurial will update the parents of the working directory, so that the first parent is the ID of the new changeset, and the second is the null ID. This is shown in the figure "The working directory gains new parents after a commit". Mercurial doesn't touch any of the files in the working directory when you commit; it just modifies the dirstate to note its new parents.
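
The way the dirstate's two parent slots change under update, merge, and commit can be modeled with a tiny class. This is a sketch under stated assumptions: `ToyRepo`, its method names, the string `"null"` standing in for the null ID, and the generated changeset names are all invented, and file contents are ignored entirely.

```python
NULL_ID = "null"  # stand-in for the null ID


class ToyRepo:
    def __init__(self):
        self.changesets = {}                    # id -> (parent1, parent2)
        self.dirstate_parents = (NULL_ID, NULL_ID)
        self._counter = 0

    def update(self, changeset):
        # First slot gets the target; second slot is cleared.
        self.dirstate_parents = (changeset, NULL_ID)

    def merge(self, other):
        # First parent stays; second is the changeset being merged.
        self.dirstate_parents = (self.dirstate_parents[0], other)

    def commit(self):
        # The dirstate's parents become the new changeset's parents,
        # then the dirstate points at the new changeset.
        new_id = "c%d" % self._counter
        self._counter += 1
        self.changesets[new_id] = self.dirstate_parents
        self.dirstate_parents = (new_id, NULL_ID)
        return new_id
```

For example, committing twice, updating back to the first changeset, committing again, and then merging with the second changeset leaves the dirstate with two real parents; the next commit records both of them.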

Creating a new head

It's perfectly normal to update the working directory to a changeset other than the current tip. For example, you might want to know what your project looked like last Tuesday, or you could be looking through changesets to see which one introduced a bug. In cases like this, the natural thing to do is update the working directory to the changeset you're interested in, and then examine the files in the working directory directly to see their contents as they were when you committed that changeset. The effect of this is shown in the figure "The working directory, updated to an older changeset".

Figure: The working directory, updated to an older changeset

Having updated the working directory to an older changeset, what happens if you make some changes, and then commit? Mercurial behaves in the same way as I outlined above. The parents of the working directory become the parents of the new changeset. This new changeset has no children, so it becomes the new tip. And the repository now contains two changesets that have no children; we call these heads. You can see the structure that this creates in the figure "After a commit made while synced to an older changeset".

Figure: After a commit made while synced to an older changeset
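
Since a head is simply a changeset that no other changeset names as a parent, finding heads is a short computation. The sketch below uses made-up changeset names and `None` for an empty parent slot; the `heads` function here is my own illustration, not Mercurial's implementation.

```python
def heads(parents):
    """Return the changesets that are not a parent of any changeset."""
    has_child = {p for pair in parents.values() for p in pair
                 if p is not None}
    return sorted(c for c in parents if c not in has_child)


# changeset -> (first parent, second parent); None means "no parent"
history = {
    "c0": (None, None),
    "c1": ("c0", None),
    "c2": ("c1", None),
}
heads_before = heads(history)

# Update to the older changeset c0, make a change, and commit:
history["c3"] = ("c0", None)
heads_after = heads(history)
```

Before the extra commit the only head is `c2`; afterwards both `c2` and `c3` are heads.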

Note: If you're new to Mercurial, you should keep in mind a common "error", which is to use the hg pull command without any options. By default, the hg pull command does not update the working directory, so you'll bring new changesets into your repository, but the working directory will stay synced at the same changeset as before the pull. If you make some changes and commit afterwards, you'll thus create a new head, because your working directory isn't synced to whatever the current tip is. To combine the operation of a pull, followed by an update, run hg pull -u.

I put the word "error" in quotes because all that you need to do to rectify the situation where you created a new head by accident is hg merge, then hg commit. In other words, this almost never has negative consequences; it's just something of a surprise for newcomers. I'll discuss other ways to avoid this behavior, and why Mercurial behaves in this initially surprising way, later on.

Merging changes

When you run the hg merge command, Mercurial leaves the first parent of the working directory unchanged, and sets the second parent to the changeset you're merging with, as shown in the figure "Merging two heads".

Figure: Merging two heads

Mercurial also has to modify the working directory, to merge the files managed in the two changesets. Simplified a little, the merging process goes like this, for every file in the manifests of both changesets:

- If neither changeset has modified a file, do nothing with that file.
- If one changeset has modified a file, and the other hasn't, create the modified copy of the file in the working directory.
- If one changeset has removed a file, and the other hasn't (or has also deleted it), delete the file from the working directory.
- If one changeset has removed a file, but the other has modified the file, ask the user what to do: keep the modified file, or remove it?
- If both changesets have modified a file, invoke an external merge program to choose the new contents for the merged file. This may require input from the user.
- If one changeset has modified a file, and the other has renamed or copied the file, make sure that the changes follow the new name of the file.

There are more details (merging has plenty of corner cases), but these are the most common choices that are involved in a merge. As you can see, most cases are completely automatic, and indeed most merges finish automatically, without requiring your input to resolve any conflicts.

When you're thinking about what happens when you commit after a merge, once again the working directory is "the changeset I'm about to commit". After the hg merge command completes, the working directory has two parents; these will become the parents of the new changeset.
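
The per-file decision rules listed above can be sketched as a single function. This is my own simplification rather than Mercurial's merge code: the name `merge_action` and its return strings are invented, a file is represented by its contents (or `None` when absent) in the common ancestor and in each of the two changesets, and the rename and copy case is left out.

```python
def merge_action(base, local, other):
    """Pick a merge action for one file.

    base, local, other: the file's contents in the common ancestor,
    the first parent, and the changeset being merged (None = absent).
    """
    local_changed = local != base
    other_changed = other != base
    local_removed = local_changed and local is None
    other_removed = other_changed and other is None

    if not local_changed and not other_changed:
        return "do nothing"
    if local_removed or other_removed:
        if local_removed and other_removed:
            return "delete"
        # One side removed the file; did the surviving side modify it?
        survivor_changed = other_changed if local_removed else local_changed
        return "ask user" if survivor_changed else "delete"
    if local_changed and other_changed:
        return "run merge program"
    return "use other" if other_changed else "use local"
```

Only two of the outcomes ("ask user" and "run merge program") can need a person; the rest are decided mechanically, which matches the observation that most merges finish without any input.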

Mercurial lets you perform multiple merges, but you must commit the results of each individual merge as you go. This is necessary because Mercurial only tracks two parents for both revisions and the working directory. While it would be technically feasible to merge multiple changesets at once, Mercurial avoids this for simplicity. With multi-way merges, the risks of user confusion, nasty conflict resolution, and making a terrible mess of a merge would grow intolerable.

Merging and renames

A surprising number of revision control systems pay little or no attention to a file's name over time. For instance, it used to be common that if a file got renamed on one side of a merge, the changes from the other side would be silently dropped.

Mercurial records metadata when you tell it to perform a rename or copy. It uses this metadata during a merge to do the right thing. For instance, if I rename a file, and you edit it without renaming it, when we merge our work the file will be renamed and have your edits applied.

Other interesting design features

In the sections above, I've tried to highlight some of the most important aspects of Mercurial's design, to illustrate that it pays careful attention to reliability and performance. However, the attention to detail doesn't stop there. There are a number of other aspects of Mercurial's construction that I personally find interesting. I'll detail a few of them here, separate from the "big ticket" items above, so that if you're interested, you can gain a better idea of the amount of thinking that goes into a well-designed system.

Clever compression

When appropriate, Mercurial will store both snapshots and deltas in compressed form. It does this by always trying to compress a snapshot or delta, but only storing the compressed version if it's smaller than the uncompressed version.

This means that Mercurial does "the right thing" when storing a file whose native form is compressed, such as a zip archive or a JPEG image. When these types of files are compressed a second time, the resulting file is usually bigger than the once-compressed form, and so Mercurial will store the plain zip or JPEG.

Deltas between revisions of a compressed file are usually larger than snapshots of the file, and Mercurial again does "the right thing" in these cases. It finds that such a delta exceeds the threshold at which it should store a complete snapshot of the file, so it stores the snapshot, again saving space compared to a naive delta-only approach.
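
The "try to compress, keep whichever is smaller" rule is easy to sketch with Python's `zlib` module. The one-byte markers `z` and `u` and the function names `pack` and `unpack` are hypothetical choices for this example; I'm not claiming they match Mercurial's storage format.

```python
import os
import zlib


def pack(chunk):
    """Store chunk compressed only if compression actually helps."""
    compressed = zlib.compress(chunk)
    if len(compressed) < len(chunk):
        return b"z" + compressed  # marker: body is zlib-compressed
    return b"u" + chunk           # marker: body is stored as-is


def unpack(stored):
    """Reverse pack(), using the marker byte to choose a decoder."""
    marker, body = stored[:1], stored[1:]
    return zlib.decompress(body) if marker == b"z" else body


text = b"the quick brown fox jumps over the lazy dog\n" * 100
noise = os.urandom(4096)  # stands in for already-compressed data

packed_text = pack(text)
packed_noise = pack(noise)
```

Repetitive text ends up stored compressed, while the random bytes (which behave like a zip or JPEG payload) are stored unchanged apart from the marker.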

Network recompression

When storing revisions on disk, Mercurial uses the "deflate" compression algorithm (the same one used by the popular zip archive format), which balances good speed with a respectable compression ratio. However, when transmitting revision data over a network connection, Mercurial uncompresses the compressed revision data.

If the connection is over HTTP, Mercurial recompresses the entire stream of data using a compression algorithm that gives a better compression ratio (the Burrows-Wheeler algorithm from the widely used bzip2 compression package). This combination of algorithm and compression of the entire stream (instead of a revision at a time) substantially reduces the number of bytes to be transferred, yielding better network performance over most kinds of network.

If the connection is over ssh, Mercurial doesn't recompress the stream, because ssh can already do this itself. You can tell Mercurial to always use ssh's compression feature by editing the .hgrc file in your home directory as follows.

    [ui]
    ssh = ssh -C

Read/write ordering and atomicity

Appending to files isn't the whole story when it comes to guaranteeing that a reader won't see a partial write. If you recall the figure "Metadata relationships", revisions in the changelog point to revisions in the manifest, and revisions in the manifest point to revisions in filelogs. This hierarchy is deliberate.

A writer starts a transaction by writing filelog and manifest data, and doesn't write any changelog data until those are finished.
A reader starts by reading changelog data, then manifest data, followed by filelog data.

Since the writer has always finished writing filelog and manifest data before it writes to the changelog, a reader will never read a pointer to a partially written manifest revision from the changelog, and it will never read a pointer to a partially written filelog revision from the manifest.

Concurrent access

The read/write ordering and atomicity guarantees mean that Mercurial never needs to lock a repository when it's reading data, even if the repository is being written to while the read is occurring. This has a big effect on scalability; you can have an arbitrary number of Mercurial processes safely reading data from a repository all at once, no matter whether it's being written to or not.

The lockless nature of reading means that if you're sharing a repository on a multi-user system, you don't need to grant other local users permission to write to your repository in order for them to be able to clone it or pull changes from it; they only need read permission. (This is not a common feature among revision control systems, so don't take it for granted! Most require readers to be able to lock a repository to access it safely, and this requires write permission on at least one directory, which of course makes for all kinds of nasty and annoying security and administrative problems.)

Mercurial uses locks to ensure that only one process can write to a repository at a time (the locking mechanism is safe even over filesystems that are notoriously hostile to locking, such as NFS).
If a repository is locked, a writer will wait for a while and retry in case the repository becomes unlocked, but if the repository remains locked for too long, the process attempting to write will time out. This means that your daily automated scripts won't get stuck forever and pile up if a system crashes unnoticed, for example. (Yes, the timeout is configurable, from zero to infinity.)

Safe dirstate access

As with revision data, Mercurial doesn't take a lock to read the dirstate file; it does acquire a lock to write it. To avoid the possibility of reading a partially written copy of the dirstate file, Mercurial writes to a file with a unique name in the same directory as the dirstate file, then renames the temporary file atomically to dirstate. The file named dirstate is thus guaranteed to be complete, not partially written.

Avoiding seeks

Critical to Mercurial's performance is the avoidance of seeks of the disk head, since any seek is far more expensive than even a comparatively large read operation.

This is why, for example, the dirstate is stored in a single file. If there were a dirstate file per directory that Mercurial tracked, the disk would seek once per directory. Instead, Mercurial reads the entire single dirstate file in one step.

Mercurial also uses a "copy on write" scheme when cloning a repository on local storage. Instead of copying every revlog file from the old repository into the new repository, it makes a "hard link", which is a shorthand way to say "these two names point to the same file".
When Mercurial is about to write to one of a revlog's files, it checks to see if the number of names pointing at the file is greater than one. If it is, more than one repository is using the file, so Mercurial makes a new copy of the file that is private to this repository.

A few revision control developers have pointed out that this idea of making a complete private copy of a file is not very efficient in its use of storage. While this is true, storage is cheap, and this method gives the highest performance while deferring most book-keeping to the operating system. An alternative scheme would most likely reduce performance and increase the complexity of the software, but speed and simplicity are key to the "feel" of day-to-day use.

Other contents of the dirstate

Because Mercurial doesn't force you to tell it when you're modifying a file, it uses the dirstate to store some extra information so it can determine efficiently whether you have modified a file. For each file in the working directory, it stores the time that it last modified the file itself, and the size of the file at that time.

When you explicitly hg add, hg remove, hg rename or hg copy files, Mercurial updates the dirstate so that it knows what to do with those files when you commit.

The dirstate helps Mercurial to efficiently check the status of files in a repository.

When Mercurial checks the state of a file in the working directory, it first checks a file's modification time against the time in the dirstate that records when Mercurial last wrote the file.
If the last modified time is the same as the time when Mercurial wrote the file, the file must not have been modified, so Mercurial does not need to check any further.

If the file's size has changed, the file must have been modified. If the modification time has changed, but the size has not, only then does Mercurial need to actually read the contents of the file to see if it has changed.

Storing the modification time and size dramatically reduces the number of read operations that Mercurial needs to perform when we run commands like hg status. This results in large performance improvements.
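
The check order described above (modification time first, then size, and only then the contents) can be sketched as follows. The helper names `snapshot_entry` and `is_modified` are invented, the dict is a stand-in for one dirstate entry, and `committed` stands for the file's contents as stored in the repository, which the caller is assumed to supply.

```python
import os


def snapshot_entry(path):
    """Record the time and size of a file, as when it is written out."""
    st = os.stat(path)
    return {"mtime": st.st_mtime_ns, "size": st.st_size}


def is_modified(path, entry, committed):
    """Decide whether a file changed, reading it only as a last resort."""
    st = os.stat(path)
    if st.st_mtime_ns == entry["mtime"]:
        return False               # same timestamp: no further check
    if st.st_size != entry["size"]:
        return True                # size differs: certainly modified
    with open(path, "rb") as f:    # same size, new timestamp: compare
        return f.read() != committed
```

For most files in a large working directory, the first comparison settles the question with a single `stat` call and no read at all, which is where the savings for commands like hg status come from.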