File names and pattern matching

Mercurial provides mechanisms that let you work with file names in a consistent and expressive way.

Simple file naming

Mercurial uses a unified piece of machinery under the hood to handle file names. Every command behaves uniformly with respect to file names. The way in which commands work with file names is as follows.

If you explicitly name real files on the command line, Mercurial works with exactly those files, as you would expect.

When you provide a directory name, Mercurial will interpret this as "operate on every file in this directory and its subdirectories". Mercurial traverses the files and subdirectories in a directory in alphabetical order. When it encounters a subdirectory, it will traverse that subdirectory before continuing with the current directory.

Running commands without any file names

Mercurial's commands that work with file names have useful default behaviours when you invoke them without providing any file names or patterns. What kind of behaviour you should expect depends on what the command does. Here are a few rules of thumb you can use to predict what a command is likely to do if you don't give it any names to work with.

Most commands will operate on the entire working directory. This is what the hg add command does, for example.

If the command has effects that are difficult or impossible to reverse, it will force you to explicitly provide at least one name or pattern (see below). This protects you from accidentally deleting files by running hg remove with no arguments, for example.

It's easy to work around these default behaviours if they don't suit you. If a command normally operates on the whole working directory, you can invoke it on just the current directory and its subdirectories by giving it the name "." (a single dot).

Along the same lines, some commands normally print file names relative to the root of the repository, even if you're invoking them from a subdirectory. Such a command will print file names relative to your subdirectory if you give it explicit names. For example, you can run hg status from a subdirectory and get it to operate on the entire working directory while printing file names relative to your subdirectory, by passing it the output of the hg root command.

Telling you what's going on

The hg add command mentioned in the preceding section illustrates something else that's helpful about Mercurial commands. If a command operates on a file that you didn't name explicitly on the command line, it will usually print the name of the file, so that you will not be surprised by what's going on.

The principle here is that of least surprise. If you've exactly named a file on the command line, there's no point in repeating it back at you.
If Mercurial is acting on a file implicitly, because you provided no names, or a directory, or a pattern (see below), it's safest to tell you what it's doing.

For commands that behave this way, you can silence them using the -q option. You can also get them to print the name of every file, even those you've named explicitly, using the -v option.

Using patterns to identify files

In addition to working with file and directory names, Mercurial lets you use patterns to identify files. Mercurial's pattern handling is expressive.

On Unix-like systems (Linux, MacOS, etc.), the job of matching file names to patterns normally falls to the shell. On these systems, you must explicitly tell Mercurial that a name is a pattern. On Windows, the shell does not expand patterns, so Mercurial will automatically identify names that are patterns, and expand them for you.

To provide a pattern in place of a regular name on the command line, the mechanism is simple:

    syntax:patternbody

That is, a pattern is identified by a short text string that says what kind of pattern this is, followed by a colon, followed by the actual pattern.

Mercurial supports two kinds of pattern syntax. The most frequently used is called glob; this is the same kind of pattern matching used by the Unix shell, and should be familiar to Windows command prompt users, too.
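The kind-then-colon-then-body convention described above can be sketched in a few lines of Python. This is only an illustration of the naming scheme, not Mercurial's own code: the helper name split_pattern and the KNOWN_KINDS tuple are assumptions made for the example, and the tuple lists just the two syntaxes this chapter covers (glob and re).

```python
# Hypothetical helper, for illustration only; not part of Mercurial.
KNOWN_KINDS = ("glob", "re")  # the two pattern syntaxes this chapter covers


def split_pattern(spec):
    """Split 'kind:body' into (kind, body).

    Returns (None, spec) when spec has no recognised kind prefix,
    meaning it should be treated as a plain file or directory name.
    """
    kind, sep, body = spec.partition(":")
    if sep and kind in KNOWN_KINDS:
        return kind, body
    return None, spec


print(split_pattern("glob:*.py"))   # a glob pattern
print(split_pattern("src/main.c"))  # a plain name
```

Checking the prefix against a list of known kinds means that a name which merely contains a colon is still treated as an ordinary name.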

When Mercurial does automatic pattern matching on Windows, it uses glob syntax. You can thus omit the glob: prefix on Windows, but it's safe to use it, too.

The re syntax is more powerful; it lets you specify patterns using regular expressions, also known as regexps.

By the way, when you pass a pattern on the command line, wrap it in quote characters so that it won't get expanded by the shell before Mercurial sees it.

Shell-style glob patterns

This is an overview of the kinds of patterns you can use when you're matching on glob patterns.

The * character matches any string, within a single directory.

The ** pattern matches any string, and crosses directory boundaries. It's not a standard Unix glob token, but it's accepted by several popular Unix shells, and is very useful.

The ? pattern matches any single character.

The [ character begins a character class. This matches any single character within the class. The class ends with a ] character. A class may contain multiple ranges of the form a-f, which is shorthand for abcdef. If the first character after the [ in a character class is a !, it negates the class, making it match any single character not in the class.

A { begins a group of subpatterns, where the whole group matches if any subpattern in the group matches.
The , character separates subpatterns, and } ends the group.

Watch out!

Don't forget that if you want to match a pattern in any directory, you should not be using the * match-any token, as this will only match within one directory. Instead, use the ** token. For example, *.c matches foo.c in the current directory but not src/foo.c, whereas **.c matches both.

Regular expression matching with re patterns

Mercurial accepts the same regular expression syntax as the Python programming language (it uses Python's regexp engine internally). This is based on the Perl language's regexp syntax, which is the most popular dialect in use (it's also used in Java, for example).

I won't discuss Mercurial's regexp dialect in any detail here, as regexps are not often used. Perl-style regexps are in any case already exhaustively documented on a multitude of web sites, and in many books. Instead, I will focus here on a few things you should know if you find yourself needing to use regexps with Mercurial.

A regexp is matched against an entire file name, relative to the root of the repository. In other words, even if you're already in subdirectory foo, if you want to match files under this directory, your pattern must start with foo/.

One thing to note, if you're familiar with Perl-style regexps, is that Mercurial's are rooted. That is, a regexp starts matching against the beginning of a string; it doesn't look for a match anywhere within the string.
To match anywhere in a string, start your pattern with .*.

Filtering files

Not only does Mercurial give you a variety of ways to specify files; it lets you further winnow those files using filters. Commands that work with file names accept two filtering options.

-I, or --include, lets you specify a pattern that file names must match in order to be processed.

-X, or --exclude, gives you a way to avoid processing files, if they match this pattern.

You can provide multiple -I and -X options on the command line, and intermix them as you please. Mercurial interprets the patterns you provide using glob syntax by default (but you can use regexps if you need to).

You can read a -I filter as "process only the files that match this filter". The -X filter is best read as "process only the files that don't match this pattern".

Ignoring unwanted files and directories

XXX.

Case sensitivity

If you're working in a mixed development environment that contains both Linux (or other Unix) systems and Macs or Windows systems, you should keep in the back of your mind the knowledge that they treat the case ("N" versus "n") of file names in incompatible ways. This is not very likely to affect you, and it's easy to deal with if it does, but it could surprise you if you don't know about it.

Operating systems and filesystems differ in the way they handle the case of characters in file and directory names. There are three common ways to handle case in names.

Completely case insensitive. Uppercase and lowercase versions of a letter are treated as identical, both when creating a file and during subsequent accesses. This is common on older DOS-based systems.

Case preserving, but insensitive. When a file or directory is created, the case of its name is stored, and can be retrieved and displayed by the operating system. When an existing file is being looked up, its case is ignored. This is the standard arrangement on Windows and MacOS. The names foo and FoO identify the same file. This treatment of uppercase and lowercase letters as interchangeable is also referred to as case folding.

Case sensitive. The case of a name is significant at all times. The names foo and FoO identify different files. This is the way Linux and Unix systems normally work.

On Unix-like systems, it is possible to have any or all of the above ways of handling case in action at once. For example, if you use a USB thumb drive formatted with a FAT32 filesystem on a Linux system, Linux will handle names on that filesystem in a case preserving, but insensitive, way.

Safe, portable repository storage

Mercurial's repository storage mechanism is case safe.
It translates file names so that they can be safely stored on both case sensitive and case insensitive filesystems. This means that you can use normal file copying tools to transfer a Mercurial repository onto, for example, a USB thumb drive, and safely move that drive and repository back and forth between a Mac, a PC running Windows, and a Linux box.

Detecting case conflicts

When operating in the working directory, Mercurial honours the naming policy of the filesystem where the working directory is located. If the filesystem is case preserving, but insensitive, Mercurial will treat names that differ only in case as the same.

An important aspect of this approach is that it is possible to commit a changeset on a case sensitive (typically Linux or Unix) filesystem that will cause trouble for users on case insensitive (usually Windows and MacOS) filesystems. If a Linux user commits changes to two files, one named myfile.c and the other named MyFile.C, they will be stored correctly in the repository. And in the working directories of other Linux users, they will be correctly represented as separate files.

If a Windows or Mac user pulls this change, they will not initially have a problem, because Mercurial's repository storage mechanism is case safe. However, once they try to hg update the working directory to that changeset, or hg merge with that changeset, Mercurial will spot the conflict between the two file names that the filesystem would treat as the same, and forbid the update or merge from occurring.
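The kind of check described above can be sketched as follows. This is a rough illustration of the idea, not Mercurial's actual code: find_case_conflicts is a hypothetical helper, and Python's str.casefold() stands in for whatever folding rules a particular filesystem really applies.

```python
from collections import defaultdict


def find_case_conflicts(names):
    """Return groups of distinct names that differ only in case.

    Hypothetical helper for illustration; str.casefold() approximates
    the folding a case insensitive filesystem would perform.
    """
    groups = defaultdict(set)
    for name in names:
        # Names that fold to the same key would collide on a
        # case insensitive filesystem.
        groups[name.casefold()].add(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


print(find_case_conflicts(["myfile.c", "MyFile.C", "README"]))
# prints [['MyFile.C', 'myfile.c']]
```

Each reported group names files that a case sensitive filesystem keeps apart but a case insensitive one would treat as a single file.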
belaran@964: belaran@964: belaran@964: belaran@964: belaran@964: Fixing a case conflict belaran@964: belaran@964: If you are using Windows or a Mac in a mixed environment where some of belaran@964: your collaborators are using Linux or Unix, and Mercurial reports a belaran@964: case folding conflict when you try to hg update or hg merge, belaran@964: the procedure to fix the problem is simple. belaran@964: belaran@964: belaran@964: Just find a nearby Linux or Unix box, clone the problem repository belaran@964: onto it, and use Mercurial's hg rename command to change the belaran@964: names of any offending files or directories so that they will no belaran@964: longer cause case folding conflicts. Commit this change, hg pull belaran@964: or hg push it across to your Windows or MacOS system, and belaran@964: hg update to the revision with the non-conflicting names. belaran@964: belaran@964: belaran@964: The changeset with case-conflicting names will remain in your belaran@964: project's history, and you still won't be able to hg update your belaran@964: working directory to that changeset on a Windows or MacOS system, but belaran@964: you can continue development unimpeded. belaran@964: belaran@964: belaran@964: belaran@964: Prior to version 0.9.3, Mercurial did not use a case safe repository belaran@964: storage mechanism, and did not detect case folding conflicts. If belaran@964: you are using an older version of Mercurial on Windows or MacOS, I belaran@964: strongly recommend that you upgrade. belaran@964: belaran@964: belaran@964: belaran@964: belaran@964: belaran@964: belaran@964: belaran@964: