\chapter{File names and pattern matching}
\label{chap:names}

Mercurial provides mechanisms that let you work with file names in a
consistent and expressive way.

\section{Simple file naming}

Mercurial uses a unified piece of machinery ``under the hood'' to
handle file names.  Every command behaves uniformly with respect to
file names.  The way in which commands work with file names is as
follows.

If you explicitly name real files on the command line, Mercurial works
with exactly those files, as you would expect.
\interaction{filenames.files}

When you provide a directory name, Mercurial will interpret this as
``operate on every file in this directory and its subdirectories''.
Mercurial traverses the files and subdirectories in a directory in
alphabetical order.  When it encounters a subdirectory, it will
traverse that subdirectory before continuing with the current
directory.
\interaction{filenames.dirs}

\section{Running commands without any file names}

Mercurial's commands that work with file names have useful default
behaviours when you invoke them without providing any file names or
patterns.  What kind of behaviour you should expect depends on what
the command does.  Here are a few rules of thumb you can use to
predict what a command is likely to do if you don't give it any names
to work with.
\begin{itemize}
\item Most commands will operate on the entire working directory.
  This is what the \hgcmd{add} command does, for example.
\item If the command has effects that are difficult or impossible to
  reverse, it will force you to explicitly provide at least one name
  or pattern (see below).
  This protects you from accidentally
  deleting files by running \hgcmd{remove} with no arguments, for
  example.
\end{itemize}

It's easy to work around these default behaviours if they don't suit
you.  If a command normally operates on the whole working directory,
you can invoke it on just the current directory and its subdirectories
by giving it the name ``\dirname{.}''.
\interaction{filenames.wdir-subdir}

Along the same lines, some commands normally print file names relative
to the root of the repository, even if you're invoking them from a
subdirectory.  Such a command will print file names relative to your
subdirectory if you give it explicit names.  Here, we're going to run
\hgcmd{status} from a subdirectory, and get it to operate on the
entire working directory while printing file names relative to our
subdirectory, by passing it the output of the \hgcmd{root} command.
\interaction{filenames.wdir-relname}

\section{Telling you what's going on}

The \hgcmd{add} example in the preceding section illustrates something
else that's helpful about Mercurial commands.  If a command operates
on a file that you didn't name explicitly on the command line, it will
usually print the name of the file, so that you will not be surprised
by what's going on.

The principle here is of \emph{least surprise}.  If you've exactly
named a file on the command line, there's no point in repeating it
back at you.  If Mercurial is acting on a file \emph{implicitly},
because you provided no names, or a directory, or a pattern (see
below), it's safest to tell you what it's doing.

For commands that behave this way, you can silence them using the
\hggopt{-q} option.
You can also get them to print the name of every
file, even those you've named explicitly, using the \hggopt{-v}
option.

\section{Using patterns to identify files}

In addition to working with file and directory names, Mercurial lets
you use \emph{patterns} to identify files.  Mercurial's pattern
handling is expressive.

On Unix-like systems (Linux, MacOS, etc.), the job of matching file
names to patterns normally falls to the shell.  On these systems, you
must explicitly tell Mercurial that a name is a pattern.  On Windows,
the shell does not expand patterns, so Mercurial will automatically
identify names that are patterns, and expand them for you.

To provide a pattern in place of a regular name on the command line,
the mechanism is simple:
\begin{codesample2}
  syntax:patternbody
\end{codesample2}
That is, a pattern is identified by a short text string that says what
kind of pattern this is, followed by a colon, followed by the actual
pattern.

Mercurial supports two kinds of pattern syntax.  The most frequently
used is called \texttt{glob}; this is the same kind of pattern
matching used by the Unix shell, and should be familiar to Windows
command prompt users, too.

When Mercurial does automatic pattern matching on Windows, it uses
\texttt{glob} syntax.  You can thus omit the ``\texttt{glob:}'' prefix
on Windows, but it's safe to use it, too.

The \texttt{re} syntax is more powerful; it lets you specify patterns
using regular expressions, also known as regexps.

By the way, in the examples that follow, notice that I'm careful to
wrap all of my patterns in quote characters, so that they won't get
expanded by the shell before Mercurial sees them.

\subsection{Shell-style \texttt{glob} patterns}

This is an overview of the kinds of patterns you can use when you're
matching on glob patterns.

The ``\texttt{*}'' character matches any string, within a single
directory.
\interaction{filenames.glob.star}

The ``\texttt{**}'' pattern matches any string, and crosses directory
boundaries.  It's not a standard Unix glob token, but it's accepted by
several popular Unix shells, and is very useful.
\interaction{filenames.glob.starstar}

The ``\texttt{?}'' pattern matches any single character.
\interaction{filenames.glob.question}

The ``\texttt{[}'' character begins a \emph{character class}.  This
matches any single character within the class.  The class ends with a
``\texttt{]}'' character.  A class may contain multiple \emph{range}s
of the form ``\texttt{a-f}'', which is shorthand for
``\texttt{abcdef}''.
\interaction{filenames.glob.range}
If the first character after the ``\texttt{[}'' in a character class
is a ``\texttt{!}'', it \emph{negates} the class, making it match any
single character not in the class.

A ``\texttt{\{}'' begins a group of subpatterns, where the whole group
matches if any subpattern in the group matches.  The ``\texttt{,}''
character separates subpatterns, and ``\texttt{\}}'' ends the group.
\interaction{filenames.glob.group}

\subsubsection{Watch out!}

Don't forget that if you want to match a pattern in any directory, you
should not be using the ``\texttt{*}'' match-any token, as this will
only match within one directory.  Instead, use the ``\texttt{**}''
token.  This small example illustrates the difference between the two.
\interaction{filenames.glob.star-starstar}

\subsection{Regular expression matching with \texttt{re} patterns}

Mercurial accepts the same regular expression syntax as the Python
programming language (it uses Python's regexp engine internally).
This is based on the Perl language's regexp syntax, which is the most
popular dialect in use (it's also used in Java, for example).

I won't discuss Mercurial's regexp dialect in any detail here, as
regexps are not often used.  Perl-style regexps are in any case
already exhaustively documented on a multitude of web sites, and in
many books.  Instead, I will focus here on a few things you should
know if you find yourself needing to use regexps with Mercurial.

A regexp is matched against an entire file name, relative to the root
of the repository.  In other words, even if you're already in
subdirectory \dirname{foo}, if you want to match files under this
directory, your pattern must start with ``\texttt{foo/}''.

One thing to note, if you're familiar with Perl-style regexps, is that
Mercurial's are \emph{rooted}.  That is, a regexp starts matching
against the beginning of a string; it doesn't look for a match
anywhere within the string.  To match anywhere in a string, start
your pattern with ``\texttt{.*}''.
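
To make the rooting rule concrete, here is a minimal sketch in Python.
It is only an illustration, not Mercurial's own code: the file names
are made up, the helper \texttt{rooted\_matches} is hypothetical, and
I'm assuming that Python's \texttt{re.match} function, which only
succeeds at the start of a string, is a fair stand-in for the
behaviour described above.
\begin{verbatim}
import re

# Hypothetical file names, relative to the repository root.
names = ["foo/main.c", "foo/util/helper.c", "bar/main.c", "README"]

def rooted_matches(pattern, candidates):
    """Return the names that the pattern matches from their start."""
    regexp = re.compile(pattern)
    # re.match only succeeds at the beginning of the string, which
    # is the assumed model of a "rooted" regexp in this sketch.
    return [name for name in candidates if regexp.match(name)]

# Files under foo: the pattern must start with "foo/".
in_foo = rooted_matches(r"foo/", names)

# A bare pattern is not searched for anywhere within the name.
main_rooted = rooted_matches(r"main\.c", names)

# A leading ".*" lets the pattern match in any directory.
main_anywhere = rooted_matches(r".*main\.c", names)

print(in_foo)
print(main_rooted)
print(main_anywhere)
\end{verbatim}
In this sketch, the second pattern selects nothing, because no name
begins with \texttt{main}; prefixing it with ``\texttt{.*}'' selects
the \filename{main.c} files in both directories.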

\section{Filtering files}

Not only does Mercurial give you a variety of ways to specify files;
it lets you further winnow those files using \emph{filters}.  Commands
that work with file names accept two filtering options.
\begin{itemize}
\item \hggopt{-I}, or \hggopt{--include}, lets you specify a pattern
  that file names must match in order to be processed.
\item \hggopt{-X}, or \hggopt{--exclude}, gives you a way to
  \emph{avoid} processing files, if they match this pattern.
\end{itemize}
You can provide multiple \hggopt{-I} and \hggopt{-X} options on the
command line, and intermix them as you please.  Mercurial interprets
the patterns you provide using glob syntax by default (but you can use
regexps if you need to).

You can read a \hggopt{-I} filter as ``process only the files that
match this filter''.
\interaction{filenames.filter.include}
The \hggopt{-X} filter is best read as ``process only the files that
don't match this pattern''.
\interaction{filenames.filter.exclude}

\section{Ignoring unwanted files and directories}

XXX.

\section{Case sensitivity}
\label{sec:names:case}

If you're working in a mixed development environment that contains
both Linux (or other Unix) systems and Macs or Windows systems, you
should keep in the back of your mind the knowledge that they treat the
case (``N'' versus ``n'') of file names in incompatible ways.  This is
not very likely to affect you, and it's easy to deal with if it does,
but it could surprise you if you don't know about it.

Operating systems and filesystems differ in the way they handle the
\emph{case} of characters in file and directory names.
There are
three common ways to handle case in names.
\begin{itemize}
\item Completely case insensitive.  Uppercase and lowercase versions
  of a letter are treated as identical, both when creating a file and
  during subsequent accesses.  This is common on older DOS-based
  systems.
\item Case preserving, but insensitive.  When a file or directory is
  created, the case of its name is stored, and can be retrieved and
  displayed by the operating system.  When an existing file is being
  looked up, its case is ignored.  This is the standard arrangement on
  Windows and MacOS.  The names \filename{foo} and \filename{FoO}
  identify the same file.  This treatment of uppercase and lowercase
  letters as interchangeable is also referred to as \emph{case
    folding}.
\item Case sensitive.  The case of a name is significant at all times.
  The names \filename{foo} and \filename{FoO} identify different
  files.  This is the way Linux and Unix systems normally work.
\end{itemize}

On Unix-like systems, it is possible to have any or all of the above
ways of handling case in action at once.  For example, if you use a
USB thumb drive formatted with a FAT32 filesystem on a Linux system,
Linux will handle names on that filesystem in a case preserving, but
insensitive, way.

\subsection{Safe, portable repository storage}

Mercurial's repository storage mechanism is \emph{case safe}.  It
translates file names so that they can be safely stored on both case
sensitive and case insensitive filesystems.
This means that you can
use normal file copying tools to transfer a Mercurial repository onto,
for example, a USB thumb drive, and safely move that drive and
repository back and forth between a Mac, a PC running Windows, and a
Linux box.

\subsection{Detecting case conflicts}

When operating in the working directory, Mercurial honours the naming
policy of the filesystem where the working directory is located.  If
the filesystem is case preserving, but insensitive, Mercurial will
treat names that differ only in case as the same.

An important aspect of this approach is that it is possible to commit
a changeset on a case sensitive (typically Linux or Unix) filesystem
that will cause trouble for users on case insensitive (usually Windows
and MacOS) filesystems.  If a Linux user commits changes to two files,
one named \filename{myfile.c} and the other named \filename{MyFile.C},
they will be stored correctly in the repository.  And in the working
directories of other Linux users, they will be correctly represented
as separate files.

If a Windows or Mac user pulls this change, they will not initially
have a problem, because Mercurial's repository storage mechanism is
case safe.  However, once they try to \hgcmd{update} the working
directory to that changeset, or \hgcmd{merge} with that changeset,
Mercurial will spot the conflict between the two file names that the
filesystem would treat as the same, and forbid the update or merge
from occurring.
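
The idea behind spotting such a conflict can be sketched in a few
lines of Python.  This is an illustration under stated assumptions,
not Mercurial's actual implementation: the function
\texttt{find\_case\_conflicts} is hypothetical, and lowercasing each
name is a simplification of the case folding rules that real
filesystems apply.
\begin{verbatim}
from collections import defaultdict

def find_case_conflicts(names):
    """Return groups of names that differ only in case."""
    groups = defaultdict(list)
    for name in names:
        # Assumption: lowercasing approximates how a case
        # insensitive filesystem compares two names.
        groups[name.lower()].append(name)
    # Only a group with more than one spelling is a conflict.
    return [sorted(group) for group in groups.values()
            if len(group) > 1]

conflicts = find_case_conflicts(
    ["myfile.c", "MyFile.C", "README", "src/util.c"])
print(conflicts)
\end{verbatim}
Here \filename{myfile.c} and \filename{MyFile.C} land in the same
group, so the sketch reports them as a conflicting pair, while the
other two names are left alone.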

\subsection{Fixing a case conflict}

If you are using Windows or a Mac in a mixed environment where some of
your collaborators are using Linux or Unix, and Mercurial reports a
case folding conflict when you try to \hgcmd{update} or \hgcmd{merge},
the procedure to fix the problem is simple.

Just find a nearby Linux or Unix box, clone the problem repository
onto it, and use Mercurial's \hgcmd{rename} command to change the
names of any offending files or directories so that they will no
longer cause case folding conflicts.  Commit this change, \hgcmd{pull}
or \hgcmd{push} it across to your Windows or MacOS system, and
\hgcmd{update} to the revision with the non-conflicting names.

The changeset with case-conflicting names will remain in your
project's history, and you still won't be able to \hgcmd{update} your
working directory to that changeset on a Windows or MacOS system, but
you can continue development unimpeded.

\begin{note}
  Prior to version~0.9.3, Mercurial did not use a case safe repository
  storage mechanism, and did not detect case folding conflicts.  If
  you are using an older version of Mercurial on Windows or MacOS, I
  strongly recommend that you upgrade.
\end{note}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "00book"
%%% End: