Behind the scenes

Unlike many revision control systems, the concepts upon which Mercurial is built are simple enough that it's easy to understand how the software really works. Knowing this certainly isn't necessary, but I find it useful to have a mental model of what's going on.

This understanding gives me confidence that Mercurial has been carefully designed to be both safe and efficient. And just as importantly, if it's easy for me to retain a good idea of what the software is doing when I perform a revision control task, I'm less likely to be surprised by its behaviour.

In this chapter, we'll initially cover the core concepts behind Mercurial's design, then continue to discuss some of the interesting details of its implementation.

Mercurial's historical record

Tracking the history of a single file

When Mercurial tracks modifications to a file, it stores the history of that file in a metadata object called a filelog. Each entry in the filelog contains enough information to reconstruct one revision of the file that is being tracked. Filelogs are stored as files in the .hg/store/data directory. A filelog contains two kinds of information: revision data, and an index to help Mercurial to find a revision efficiently.

A file that is large, or has a lot of history, has its filelog stored in separate data (.d suffix) and index (.i suffix) files. For small files without much history, the revision data and index are combined in a single .i file.
The correspondence between a file in the working directory and the filelog that tracks its history in the repository is illustrated in the figure below.

[Figure: Relationships between files in working directory and filelogs in repository]

Managing tracked files

Mercurial uses a structure called a manifest to collect together information about the files that it tracks. Each entry in the manifest contains information about the files present in a single changeset. An entry records which files are present in the changeset, the revision of each file, and a few other pieces of file metadata.

Recording changeset information

The changelog contains information about each changeset. Each revision records who committed a change, the changeset comment, other pieces of changeset-related information, and the revision of the manifest to use.

Relationships between revisions

Within a changelog, a manifest, or a filelog, each revision stores a pointer to its immediate parent (or to its two parents, if it's a merge revision). As I mentioned above, there are also relationships between revisions across these structures, and they are hierarchical in nature.

For every changeset in a repository, there is exactly one revision stored in the changelog. Each revision of the changelog contains a pointer to a single revision of the manifest. A revision of the manifest stores a pointer to a single revision of each filelog tracked when that changeset was created. These relationships are illustrated in the "Metadata relationships" figure.
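
To make the pointer hierarchy concrete, here is a minimal sketch of how a changeset can be resolved to file contents by following changelog, then manifest, then filelog. It is a toy model, not Mercurial's storage format: the names `ChangelogEntry`, `manifest`, `filelogs`, and `files_at` are made up for illustration, and plain integer indices stand in for the hash identifiers that Mercurial really uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangelogEntry:
    user: str
    comment: str
    manifest_rev: int  # pointer to exactly one manifest revision


# Toy filelogs: one list of revisions per tracked file (hypothetical data).
filelogs = {
    "a.txt": ["hello\n"],
    "b.txt": ["world\n"],
}

# Toy manifest: each revision maps a file name to a filelog revision.
manifest = [
    {"a.txt": 0},              # manifest revision 0
    {"a.txt": 0, "b.txt": 0},  # manifest revision 1: b.txt added, a.txt unchanged
]

# Toy changelog: one entry per changeset.
changelog = [
    ChangelogEntry("alice", "add a.txt", manifest_rev=0),
    ChangelogEntry("bob", "add b.txt", manifest_rev=1),
]


def files_at(changeset_rev):
    """Follow changelog -> manifest -> filelog to rebuild a changeset's files."""
    entry = changelog[changeset_rev]
    file_revs = manifest[entry.manifest_rev]
    return {name: filelogs[name][rev] for name, rev in file_revs.items()}
```

Notice that both manifest revisions point at the same revision of a.txt, since that file did not change between the two changesets.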

[Figure: Metadata relationships]

As the illustration shows, there is not a one-to-one relationship between revisions in the changelog, manifest, or filelog. If the manifest hasn't changed between two changesets, the changelog entries for those changesets will point to the same revision of the manifest. If a file that Mercurial tracks hasn't changed between two changesets, the entry for that file in the two revisions of the manifest will point to the same revision of its filelog.

Safe, efficient storage

The underpinnings of changelogs, manifests, and filelogs are provided by a single structure called the revlog.

Efficient storage

The revlog provides efficient storage of revisions using a delta mechanism. Instead of storing a complete copy of a file for each revision, it stores the changes needed to transform an older revision into the new revision. For many kinds of file data, these deltas are typically a fraction of a percent of the size of a full copy of a file.

Some obsolete revision control systems can only work with deltas of text files. They must either store binary files as complete snapshots or encode them into a text representation, both of which are wasteful approaches. Mercurial can efficiently handle deltas of files with arbitrary binary contents; it doesn't need to treat text as special.

Safe operation

Mercurial only ever appends data to the end of a revlog file. It never modifies a section of a file after it has written it.
This is both more robust and more efficient than schemes that need to modify or rewrite data.

In addition, Mercurial treats every write as part of a transaction that can span a number of files. A transaction is atomic: either the entire transaction succeeds and its effects are all visible to readers in one go, or the whole thing is undone. This guarantee of atomicity means that if you're running two copies of Mercurial, where one is reading data and one is writing it, the reader will never see a partially written result that might confuse it.

The fact that Mercurial only appends to files makes it easier to provide this transactional guarantee. The easier it is to do stuff like this, the more confident you should be that it's done correctly.

Fast retrieval

Mercurial cleverly avoids a pitfall common to many earlier revision control systems: the problem of inefficient retrieval. Most revision control systems store the contents of a revision as an incremental series of modifications against a snapshot. To reconstruct a specific revision, you must first read the snapshot, and then every one of the revisions between the snapshot and your target revision. The more history that a file accumulates, the more revisions you must read, hence the longer it takes to reconstruct a particular revision.

[Figure: Snapshot of a revlog, with incremental deltas]

The innovation that Mercurial applies to this problem is simple but effective.
Once the cumulative amount of delta information stored since the last snapshot exceeds a fixed threshold, it stores a new snapshot (compressed, of course), instead of another delta. This makes it possible to reconstruct any revision of a file quickly. This approach works so well that it has since been copied by several other revision control systems.

The figure "Snapshot of a revlog, with incremental deltas" illustrates the idea. In an entry in a revlog's index file, Mercurial stores the range of entries from the data file that it must read to reconstruct a particular revision.

Aside: the influence of video compression

If you're familiar with video compression or have ever watched a TV feed through a digital cable or satellite service, you may know that most video compression schemes store each frame of video as a delta against its predecessor frame. In addition, these schemes use lossy compression techniques to increase the compression ratio, so visual errors accumulate over the course of a number of inter-frame deltas.

Because it's possible for a video stream to "drop out" occasionally due to signal glitches, and to limit the accumulation of artefacts introduced by the lossy compression process, video encoders periodically insert a complete frame (called a "key frame") into the video stream; the next delta is generated against that frame. This means that if the video signal gets interrupted, it will resume once the next key frame is received. Also, the accumulation of encoding errors restarts anew with each key frame.
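
Returning to the revlog itself, the snapshot-plus-delta scheme can be sketched in a few lines. This is only an illustration under loud assumptions: `ToyRevlog`, `make_delta`, and `apply_delta` are hypothetical names, the delta format (shared prefix, shared suffix, replaced middle) is far cruder than Mercurial's, history is strictly linear, compression is left out, and the threshold is an arbitrary byte count rather than Mercurial's real rule.

```python
def make_delta(old, new):
    """Describe `new` as: keep p leading bytes and s trailing bytes of `old`."""
    limit = min(len(old), len(new))
    p = 0
    while p < limit and old[p] == new[p]:
        p += 1
    s = 0
    while s < limit - p and old[len(old) - 1 - s] == new[len(new) - 1 - s]:
        s += 1
    return (p, s, new[p:len(new) - s])


def apply_delta(old, delta):
    p, s, middle = delta
    return old[:p] + middle + old[len(old) - s:]


class ToyRevlog:
    def __init__(self, threshold):
        self.threshold = threshold  # max delta bytes allowed since last snapshot
        self.entries = []           # per revision: a full text or a delta tuple
        self.base = []              # per revision: the snapshot its chain starts at
        self.chain_bytes = 0        # delta bytes stored since the last snapshot
        self.last_text = b""

    def add(self, text):
        rev = len(self.entries)
        delta = make_delta(self.last_text, text)
        size = len(delta[2])
        if rev == 0 or self.chain_bytes + size > self.threshold:
            # Too much delta data has piled up: store a fresh snapshot.
            self.entries.append(text)
            self.base.append(rev)
            self.chain_bytes = 0
        else:
            self.entries.append(delta)
            self.base.append(self.base[-1])
            self.chain_bytes += size
        self.last_text = text
        return rev

    def read(self, rev):
        # The index tells us where to start; we never read before the snapshot.
        start = self.base[rev]
        text = self.entries[start]
        for r in range(start + 1, rev + 1):
            text = apply_delta(text, self.entries[r])
        return text
```

The `base` list plays the role of the index entry described above: reading a revision touches only the entries from its snapshot up to the revision itself, so the cost is bounded by the threshold instead of growing with the length of the history.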
bos@559: bos@559: bos@559: bos@559: bos@559: Identification and strong integrity bos@559: bos@584: Along with delta or snapshot information, a revlog entry bos@559: contains a cryptographic hash of the data that it represents. bos@559: This makes it difficult to forge the contents of a revision, bos@559: and easy to detect accidental corruption. bos@559: bos@584: Hashes provide more than a mere check against corruption; bos@559: they are used as the identifiers for revisions. The changeset bos@559: identification hashes that you see as an end user are from bos@559: revisions of the changelog. Although filelogs and the bos@559: manifest also use hashes, Mercurial only uses these behind the bos@559: scenes. bos@559: bos@584: Mercurial verifies that hashes are correct when it bos@559: retrieves file revisions and when it pulls changes from bos@559: another repository. If it encounters an integrity problem, it bos@559: will complain and stop whatever it's doing. bos@559: bos@584: In addition to the effect it has on retrieval efficiency, bos@559: Mercurial's use of periodic snapshots makes it more robust bos@559: against partial data corruption. If a revlog becomes partly bos@559: corrupted due to a hardware error or system bug, it's often bos@559: possible to reconstruct some or most revisions from the bos@559: uncorrupted sections of the revlog, both before and after the bos@559: corrupted section. This would not be possible with a bos@559: delta-only storage model. bos@559: bos@559: bos@559: bos@559: bos@559: Revision history, branching, and merging bos@559: bos@584: Every entry in a Mercurial revlog knows the identity of its bos@559: immediate ancestor revision, usually referred to as its bos@559: parent. In fact, a revision contains room bos@559: for not one parent, but two. Mercurial uses a special hash, bos@559: called the null ID, to represent the idea bos@559: there is no parent here. This hash is simply a bos@559: string of zeroes. 

In the figure below, you can see an example of the conceptual structure of a revlog. Filelogs, manifests, and changelogs all have this same structure; they differ only in the kind of data stored in each delta or snapshot.

The first revision in a revlog (at the bottom of the image) has the null ID in both of its parent slots. For a "normal" revision, its first parent slot contains the ID of its parent revision, and its second contains the null ID, indicating that the revision has only one real parent. Any two revisions that have the same parent ID are branches. A revision that represents a merge between branches has two normal revision IDs in its parent slots.

[Figure: the conceptual structure of a revlog]

The working directory

In the working directory, Mercurial stores a snapshot of the files from the repository as of a particular changeset.

The working directory "knows" which changeset it contains. When you update the working directory to contain a particular changeset, Mercurial looks up the appropriate revision of the manifest to find out which files it was tracking at the time that changeset was committed, and which revision of each file was then current. It then recreates a copy of each of those files, with the same contents it had when the changeset was committed.

The dirstate contains Mercurial's knowledge of the working directory. It records which changeset the working directory is updated to, and all of the files that Mercurial is tracking in the working directory.

Just as a revision of a revlog has room for two parents, so that it can represent either a normal revision (with one parent) or a merge of two earlier revisions, the dirstate has slots for two parents. When you use the hg update command, the changeset that you update to is stored in the "first parent" slot, and the null ID in the second. When you hg merge with another changeset, the first parent remains unchanged, and the second parent is filled in with the changeset you're merging with. The hg parents command tells you what the parents of the dirstate are.

What happens when you commit

The dirstate stores parent information for more than just book-keeping purposes. Mercurial uses the parents of the dirstate as the parents of a new changeset when you perform a commit.

[Figure: The working directory can have two parents]

The figure above shows the normal state of the working directory, where it has a single changeset as parent. That changeset is the tip, the newest changeset in the repository that has no children.

[Figure: The working directory gains new parents after a commit]

It's useful to think of the working directory as "the changeset I'm about to commit". Any files that you tell Mercurial that you've added, removed, renamed, or copied will be reflected in that changeset, as will modifications to any files that Mercurial is already tracking; the new changeset will have the parents of the working directory as its parents.

After a commit, Mercurial will update the parents of the working directory, so that the first parent is the ID of the new changeset, and the second is the null ID. This is shown in the figure "The working directory gains new parents after a commit". Mercurial doesn't touch any of the files in the working directory when you commit; it just modifies the dirstate to note its new parents.

Creating a new head

It's perfectly normal to update the working directory to a changeset other than the current tip. For example, you might want to know what your project looked like last Tuesday, or you could be looking through changesets to see which one introduced a bug. In cases like this, the natural thing to do is update the working directory to the changeset you're interested in, and then examine the files in the working directory directly to see their contents as they were when you committed that changeset. The effect of this is shown in the figure below.

[Figure: The working directory, updated to an older changeset]

Having updated the working directory to an older changeset, what happens if you make some changes, and then commit? Mercurial behaves in the same way as I outlined above. The parents of the working directory become the parents of the new changeset. This new changeset has no children, so it becomes the new tip. And the repository now contains two changesets that have no children; we call these heads. You can see the structure that this creates in the figure "After a commit made while synced to an older changeset".
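
The parent book-keeping described so far is small enough to model directly. The following is a toy sketch, not Mercurial's code: `ToyRepo` and its methods are hypothetical, changesets are numbered with small integers instead of hashes, `None` stands in for the null ID, and file contents are ignored entirely.

```python
NULL = None  # stands in for the null ID


class ToyRepo:
    def __init__(self):
        self.parents_of = {}          # changeset -> (first parent, second parent)
        self.dirstate = (NULL, NULL)  # parents of the working directory
        self.next_id = 0

    def update(self, changeset):
        # "hg update": first parent is the target, second is the null ID.
        self.dirstate = (changeset, NULL)

    def merge(self, other):
        # "hg merge": first parent unchanged, second parent filled in.
        self.dirstate = (self.dirstate[0], other)

    def commit(self):
        # The new changeset takes the dirstate's parents as its own ...
        changeset = self.next_id
        self.next_id += 1
        self.parents_of[changeset] = self.dirstate
        # ... and then becomes the sole parent of the working directory.
        self.dirstate = (changeset, NULL)
        return changeset

    def heads(self):
        # A head is a changeset that no other changeset names as a parent.
        has_child = set()
        for parents in self.parents_of.values():
            has_child.update(p for p in parents if p is not NULL)
        return sorted(c for c in self.parents_of if c not in has_child)
```

In this model, committing twice, updating back to the first changeset, and committing again leaves two heads, and a merge followed by a commit brings the count back down to one.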

[Figure: After a commit made while synced to an older changeset]

If you're new to Mercurial, you should keep in mind a common "error", which is to use the hg pull command without any options. By default, the hg pull command does not update the working directory, so you'll bring new changesets into your repository, but the working directory will stay synced at the same changeset as before the pull. If you make some changes and commit afterwards, you'll thus create a new head, because your working directory isn't synced to whatever the current tip is.

I put the word "error" in quotes because all that you need to do to rectify this situation is hg merge, then hg commit. In other words, this almost never has negative consequences; it just surprises people. I'll discuss other ways to avoid this behaviour, and why Mercurial behaves in this initially surprising way, later on.

Merging heads

When you run the hg merge command, Mercurial leaves the first parent of the working directory unchanged, and sets the second parent to the changeset you're merging with, as shown in the figure below.

[Figure: Merging two heads]

Mercurial also has to modify the working directory, to merge the files managed in the two changesets. Simplified a little, the merging process goes like this, for every file in the manifests of both changesets:

- If neither changeset has modified a file, do nothing with that file.
- If one changeset has modified a file, and the other hasn't, create the modified copy of the file in the working directory.
- If one changeset has removed a file, and the other hasn't (or has also deleted it), delete the file from the working directory.
- If one changeset has removed a file, but the other has modified the file, ask the user what to do: keep the modified file, or remove it?
- If both changesets have modified a file, invoke an external merge program to choose the new contents for the merged file. This may require input from the user.
- If one changeset has modified a file, and the other has renamed or copied the file, make sure that the changes follow the new name of the file.

There are more details (merging has plenty of corner cases), but these are the most common choices that are involved in a merge. As you can see, most cases are completely automatic, and indeed most merges finish automatically, without requiring your input to resolve any conflicts.

When you're thinking about what happens when you commit after a merge, once again the working directory is "the changeset I'm about to commit". After the hg merge command completes, the working directory has two parents; these will become the parents of the new changeset.

Mercurial lets you perform multiple merges, but you must commit the results of each individual merge as you go. This is necessary because Mercurial only tracks two parents for both revisions and the working directory.
While it would be technically possible to merge multiple changesets at once, the prospect of user confusion and making a terrible mess of a merge immediately becomes overwhelming.

Other interesting design features

In the sections above, I've tried to highlight some of the most important aspects of Mercurial's design, to illustrate that it pays careful attention to reliability and performance. However, the attention to detail doesn't stop there. There are a number of other aspects of Mercurial's construction that I personally find interesting. I'll detail a few of them here, separate from the "big ticket" items above, so that if you're interested, you can gain a better idea of the amount of thinking that goes into a well-designed system.

Clever compression

When appropriate, Mercurial will store both snapshots and deltas in compressed form. It does this by always trying to compress a snapshot or delta, but only storing the compressed version if it's smaller than the uncompressed version.

This means that Mercurial does "the right thing" when storing a file whose native form is compressed, such as a zip archive or a JPEG image. When these types of files are compressed a second time, the resulting file is usually bigger than the once-compressed form, and so Mercurial will store the plain zip or JPEG.

Deltas between revisions of a compressed file are usually larger than snapshots of the file, and Mercurial again does "the right thing" in these cases.
It finds that such a delta exceeds the threshold at which it should store a complete snapshot of the file, so it stores the snapshot, again saving space compared to a naive delta-only approach.

Network recompression

When storing revisions on disk, Mercurial uses the "deflate" compression algorithm (the same one used by the popular zip archive format), which balances good speed with a respectable compression ratio. However, when transmitting revision data over a network connection, Mercurial uncompresses the compressed revision data.

If the connection is over HTTP, Mercurial recompresses the entire stream of data using a compression algorithm that gives a better compression ratio (the Burrows-Wheeler algorithm from the widely used bzip2 compression package). This combination of algorithm and compression of the entire stream (instead of a revision at a time) substantially reduces the number of bytes to be transferred, yielding better network performance over almost all kinds of network.

(If the connection is over ssh, Mercurial doesn't recompress the stream, because ssh can already do this itself.)

Read/write ordering and atomicity

Appending to files isn't the whole story when it comes to guaranteeing that a reader won't see a partial write. If you recall the "Metadata relationships" figure, revisions in the changelog point to revisions in the manifest, and revisions in the manifest point to revisions in filelogs. This hierarchy is deliberate.

A writer starts a transaction by writing filelog and manifest data, and doesn't write any changelog data until those are finished.
A reader starts by reading changelog data, then manifest data, followed by filelog data.

Since the writer has always finished writing filelog and manifest data before it writes to the changelog, a reader will never read a pointer to a partially written manifest revision from the changelog, and it will never read a pointer to a partially written filelog revision from the manifest.

Concurrent access

The read/write ordering and atomicity guarantees mean that Mercurial never needs to lock a repository when it's reading data, even if the repository is being written to while the read is occurring. This has a big effect on scalability; you can have an arbitrary number of Mercurial processes safely reading data from a repository all at once, no matter whether it's being written to or not.

The lockless nature of reading means that if you're sharing a repository on a multi-user system, you don't need to grant other local users permission to write to your repository in order for them to be able to clone it or pull changes from it; they only need read permission. (This is not a common feature among revision control systems, so don't take it for granted! Most require readers to be able to lock a repository to access it safely, and this requires write permission on at least one directory, which of course makes for all kinds of nasty and annoying security and administrative problems.)

Mercurial uses locks to ensure that only one process can write to a repository at a time (the locking mechanism is safe even over filesystems that are notoriously hostile to locking, such as NFS).
If a repository is locked, a writer will wait for a while and retry in case the repository becomes unlocked, but if the repository remains locked for too long, the process attempting to write will time out. This means that your daily automated scripts won't get stuck forever and pile up if a system crashes unnoticed, for example. (Yes, the timeout is configurable, from zero to infinity.)

Safe dirstate access

As with revision data, Mercurial doesn't take a lock to read the dirstate file; it does acquire a lock to write it. To avoid the possibility of reading a partially written copy of the dirstate file, Mercurial writes to a file with a unique name in the same directory as the dirstate file, then renames the temporary file atomically to dirstate. The file named dirstate is thus guaranteed to be complete, not partially written.

Avoiding seeks

Critical to Mercurial's performance is the avoidance of seeks of the disk head, since any seek is far more expensive than even a comparatively large read operation.

This is why, for example, the dirstate is stored in a single file. If there were a dirstate file per directory that Mercurial tracked, the disk would seek once per directory. Instead, Mercurial reads the entire single dirstate file in one step.

Mercurial also uses a "copy on write" scheme when cloning a repository on local storage. Instead of copying every revlog file from the old repository into the new repository, it makes a "hard link", which is a shorthand way to say "these two names point to the same file".
When Mercurial is about to write to one of a revlog's files, it checks to see if the number of names pointing at the file is greater than one. If it is, more than one repository is using the file, so Mercurial makes a new copy of the file that is private to this repository.

A few revision control developers have pointed out that this idea of making a complete private copy of a file is not very efficient in its use of storage. While this is true, storage is cheap, and this method gives the highest performance while deferring most book-keeping to the operating system. An alternative scheme would most likely reduce performance and increase the complexity of the software, each of which is much more important to the "feel" of day-to-day use.

Other contents of the dirstate

Because Mercurial doesn't force you to tell it when you're modifying a file, it uses the dirstate to store some extra information so it can determine efficiently whether you have modified a file. For each file in the working directory, it stores the time that it last modified the file itself, and the size of the file at that time.

When you explicitly hg add, hg remove, hg rename or hg copy files, Mercurial updates the dirstate so that it knows what to do with those files when you commit.

When Mercurial is checking the states of files in the working directory, it first checks a file's modification time. If that has not changed, the file must not have been modified. If the file's size has changed, the file must have been modified.
If the modification time has changed, but the size has not, only then does Mercurial need to read the actual contents of the file to see if they've changed. Storing these few extra pieces of information dramatically reduces the amount of data that Mercurial needs to read, which yields large performance improvements compared to other revision control systems.
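
The three-step check is easy to express in code. The sketch below follows the order given in the text and nothing more; it is not Mercurial's implementation, which has to cope with further subtleties. The function name `is_modified` and its parameters are hypothetical: the recorded time and size play the part of a dirstate entry, and `read_tracked` is a callback returning the contents that the repository holds for the file.

```python
import os


def is_modified(path, recorded_mtime_ns, recorded_size, read_tracked):
    """Decide whether a working-directory file differs from its tracked state."""
    st = os.stat(path)
    # Step 1: same modification time means the file is treated as unchanged.
    if st.st_mtime_ns == recorded_mtime_ns:
        return False
    # Step 2: a different size means the file has certainly changed.
    if st.st_size != recorded_size:
        return True
    # Step 3: the time changed but the size did not, so only now do we pay
    # for reading the file and comparing its contents.
    with open(path, "rb") as f:
        return f.read() != read_tracked()
```

The point of passing `read_tracked` as a callback is that the expensive comparison is only triggered in the third case; in the common case of an untouched file, a single stat call settles the question.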