hgbook

view en/filenames.tex @ 177:c54f4c106fd5

Record the version of Mercurial used.
author Bryan O'Sullivan <bos@serpentine.com>
date Wed Mar 28 23:01:57 2007 -0700 (2007-03-28)
parents 1e013fbe35f7
children d3dd1bedba3c
\chapter{File names and pattern matching}
\label{chap:names}

Mercurial provides mechanisms that let you work with file names in a
consistent and expressive way.

\section{Simple file naming}

Mercurial uses a unified piece of machinery ``under the hood'' to
handle file names. Every command behaves uniformly with respect to
file names. The way in which commands work with file names is as
follows.

If you explicitly name real files on the command line, Mercurial works
with exactly those files, as you would expect.
\interaction{filenames.files}

When you provide a directory name, Mercurial will interpret this as
``operate on every file in this directory and its subdirectories''.
Mercurial traverses the files and subdirectories in a directory in
alphabetical order. When it encounters a subdirectory, it will
traverse that subdirectory before continuing with the current
directory.
\interaction{filenames.dirs}
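
To make the traversal order concrete, here is a minimal Python sketch
of a walk that follows the rules just described. It is not
Mercurial's own implementation; the function name \texttt{walk} is
hypothetical, and the sketch assumes that ``alphabetical order'' can be
approximated by Python's \texttt{sorted}, which orders names by code
point (so uppercase letters sort before lowercase ones).

\begin{verbatim}
import os

def walk(root):
    """Return the files under root, relative to it, in traversal order.

    Hypothetical sketch: entries in each directory are visited in
    sorted order, and a subdirectory is walked completely before the
    walk continues with the rest of the current directory.
    """
    result = []

    def visit(rel):
        full = os.path.join(root, *rel.split("/")) if rel else root
        for name in sorted(os.listdir(full)):
            if name == ".hg":
                continue  # never descend into the repository metadata
            child = rel + "/" + name if rel else name
            if os.path.isdir(os.path.join(full, name)):
                visit(child)  # finish the subdirectory first
            else:
                result.append(child)

    visit("")
    return result
\end{verbatim}

For a tree containing the files \filename{a}, \filename{b/c},
\filename{b/d} and \filename{e}, this sketch returns them in exactly
that order.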

\section{Running commands without any file names}

Mercurial's commands that work with file names have useful default
behaviours when you invoke them without providing any file names or
patterns. What kind of behaviour you should expect depends on what
the command does. Here are a few rules of thumb you can use to
predict what a command is likely to do if you don't give it any names
to work with.
\begin{itemize}
\item Most commands will operate on the entire working directory.
  This is what the \hgcmd{add} command does, for example.
\item If the command has effects that are difficult or impossible to
  reverse, it will force you to explicitly provide at least one name
  or pattern (see below). This protects you from accidentally
  deleting files by running \hgcmd{remove} with no arguments, for
  example.
\end{itemize}

It's easy to work around these default behaviours if they don't suit
you. If a command normally operates on the whole working directory,
you can invoke it on just the current directory and its subdirectories
by giving it the name ``\dirname{.}''.
\interaction{filenames.wdir-subdir}

Along the same lines, some commands normally print file names relative
to the root of the repository, even if you're invoking them from a
subdirectory. Such a command will print file names relative to your
subdirectory if you give it explicit names. Here, we're going to run
\hgcmd{status} from a subdirectory, and get it to operate on the
entire working directory while printing file names relative to our
subdirectory, by passing it the output of the \hgcmd{root} command.
\interaction{filenames.wdir-relname}
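
The translation between the two forms of a name is plain
relative-path arithmetic. The Python sketch below takes a name
relative to the repository root, plus the subdirectory you are working
in (also relative to the root), and produces the name relative to that
subdirectory. The function name \texttt{displayname} is hypothetical,
and this is not how Mercurial is structured internally; the sketch
uses \texttt{posixpath} so that results use forward slashes on every
platform.

\begin{verbatim}
import posixpath

def displayname(repo_path, cwd_in_repo):
    """Express a repository-root-relative path relative to cwd_in_repo.

    Both arguments are relative to the repository root; an empty
    cwd_in_repo means "at the root itself".
    """
    return posixpath.relpath(repo_path, cwd_in_repo or ".")

print(displayname("src/main.c", "src"))  # main.c
print(displayname("README", "src"))      # ../README
\end{verbatim}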

\section{Telling you what's going on}

The \hgcmd{add} example in the preceding section illustrates something
else that's helpful about Mercurial commands. If a command operates
on a file that you didn't name explicitly on the command line, it will
usually print the name of the file, so that you won't be surprised by
what's going on.

The principle at work here is \emph{least surprise}. If you've named
a file exactly on the command line, there's no point in repeating it
back at you. If Mercurial is acting on a file \emph{implicitly},
because you provided no names, or a directory, or a pattern (see
below), it's safest to tell you what it's doing.

For commands that behave this way, you can silence them using the
\hggopt{-q} option. You can also get them to print the name of every
file, even those you've named explicitly, using the \hggopt{-v}
option.
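
The policy can be modelled in a few lines of Python. This is a
hypothetical sketch of the rule described above, not code taken from
Mercurial: the function name \texttt{echoednames} and its parameters
are my own, and the \texttt{quiet} and \texttt{verbose} flags stand in
for the \hggopt{-q} and \hggopt{-v} options.

\begin{verbatim}
def echoednames(matched, explicit, quiet=False, verbose=False):
    """Return the file names a command would print (hypothetical model).

    matched  -- every file the command is about to act on
    explicit -- the names the user typed exactly on the command line
    """
    if quiet:
        return []               # -q: print nothing
    if verbose:
        return list(matched)    # -v: print every name, even explicit ones
    # Default: only echo files that were selected implicitly.
    return [f for f in matched if f not in explicit]
\end{verbatim}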

\section{Using patterns to identify files}

In addition to working with file and directory names, Mercurial lets
you use \emph{patterns} to identify files. Mercurial's pattern
handling is expressive.

On Unix-like systems (Linux, MacOS, etc.), the job of matching file
names to patterns normally falls to the shell. On these systems, you
must explicitly tell Mercurial that a name is a pattern. On Windows,
the shell does not expand patterns, so Mercurial will automatically
identify names that are patterns, and expand them for you.

To provide a pattern in place of a regular name on the command line,
the mechanism is simple:
\begin{codesample2}
  syntax:patternbody
\end{codesample2}
That is, a pattern is identified by a short text string that says what
kind of pattern this is, followed by a colon, followed by the actual
pattern.

Mercurial supports two kinds of pattern syntax. The most frequently
used is called \texttt{glob}; this is the same kind of pattern
matching used by the Unix shell, and should be familiar to Windows
command prompt users, too.

When Mercurial does automatic pattern matching on Windows, it uses
\texttt{glob} syntax. You can thus omit the ``\texttt{glob:}'' prefix
on Windows, but it's safe to use it, too.

The \texttt{re} syntax is more powerful; it lets you specify patterns
using regular expressions, also known as regexps.

By the way, in the examples that follow, notice that I'm careful to
wrap all of my patterns in quote characters, so that they won't get
expanded by the shell before Mercurial sees them.
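
To illustrate how a prefixed pattern might be told apart from a plain
file name, here is a small Python sketch. It only knows about the two
syntaxes described above; the function name \texttt{splitpattern}, its
\texttt{windows} flag, and the \texttt{literal} label for plain names
are all hypothetical, and Mercurial's real parser is more elaborate.

\begin{verbatim}
KINDS = ("glob", "re")  # the two pattern syntaxes described in the text

def splitpattern(arg, windows=False):
    """Split a command-line argument into a (kind, body) pair."""
    kind, sep, body = arg.partition(":")
    if sep and kind in KINDS:
        return kind, body
    # No recognised prefix. On Windows the shell has not expanded the
    # name, so treat it as a glob; elsewhere it is a plain file name.
    return ("glob" if windows else "literal"), arg

print(splitpattern("glob:*.c"))    # ('glob', '*.c')
print(splitpattern("src/main.c"))  # ('literal', 'src/main.c')
\end{verbatim}

Only a recognised prefix before the first colon counts, so a name
that merely contains a colon is still treated as an ordinary name.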

\subsection{Shell-style \texttt{glob} patterns}

Here is an overview of the special tokens you can use when you're
matching with glob patterns.

The ``\texttt{*}'' character matches any string, within a single
directory.
\interaction{filenames.glob.star}

The ``\texttt{**}'' pattern matches any string, and crosses directory
boundaries. It's not a standard Unix glob token, but it's accepted by
several popular Unix shells, and is very useful.
\interaction{filenames.glob.starstar}

The ``\texttt{?}'' pattern matches any single character.
\interaction{filenames.glob.question}

The ``\texttt{[}'' character begins a \emph{character class}. This
matches any single character within the class. The class ends with a
``\texttt{]}'' character. A class may contain multiple \emph{range}s
of the form ``\texttt{a-f}'', which is shorthand for
``\texttt{abcdef}''.
\interaction{filenames.glob.range}
If the first character after the ``\texttt{[}'' in a character class
is a ``\texttt{!}'', it \emph{negates} the class, making it match any
single character not in the class.

A ``\texttt{\{}'' begins a group of subpatterns, where the whole group
matches if any subpattern in the group matches. The ``\texttt{,}''
character separates subpatterns, and ``\texttt{\}}'' ends the group.
\interaction{filenames.glob.group}
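
One way to see how these tokens fit together is to translate a glob
into a regular expression. The Python sketch below does this for each
of the tokens just described. It is a simplified illustration, not
Mercurial's code: the function names \texttt{globtoregex} and
\texttt{globmatch} are hypothetical, the sketch assumes the pattern is
well formed (balanced brackets and braces), and it requires the
\emph{whole} path to match.

\begin{verbatim}
import re

def globtoregex(pat):
    """Translate a glob pattern into a regular expression string."""
    out = []
    depth = 0          # nesting depth of {...} groups
    i, n = 0, len(pat)
    while i < n:
        c = pat[i]
        i += 1
        if c == "*":
            if i < n and pat[i] == "*":
                i += 1
                out.append(".*")      # '**' may cross directory boundaries
            else:
                out.append("[^/]*")   # '*' stays within one directory
        elif c == "?":
            out.append(".")           # any single character
        elif c == "[":
            j = pat.find("]", i)
            if j == -1:
                out.append(re.escape(c))  # no closing ']': literal '['
            else:
                body = pat[i:j]
                if body.startswith("!"):
                    body = "^" + body[1:]  # '[!...]' negates the class
                elif body.startswith("^"):
                    body = "\\" + body     # a literal '^' in the class
                out.append("[" + body + "]")
                i = j + 1
        elif c == "{":
            depth += 1
            out.append("(?:")         # start of a group of subpatterns
        elif c == "," and depth:
            out.append("|")           # separator between subpatterns
        elif c == "}" and depth:
            depth -= 1
            out.append(")")
        else:
            out.append(re.escape(c))  # everything else is literal
    return "".join(out)

def globmatch(pat, path):
    """True if the whole of path matches the glob pattern."""
    return re.fullmatch(globtoregex(pat), path) is not None

print(globmatch("*.txt", "sub/a.txt"))   # False
print(globmatch("**.txt", "sub/a.txt"))  # True
\end{verbatim}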

\subsubsection{Watch out!}

Don't forget that if you want to match a pattern in any directory, you
should not be using the ``\texttt{*}'' match-any token, as this will
only match within one directory. Instead, use the ``\texttt{**}''
token. This small example illustrates the difference between the two.
\interaction{filenames.glob.star-starstar}

When you're writing a glob pattern, bear in mind that Mercurial will
treat a pattern that matches a directory name as ``match every file
under that directory''. For example, a glob pattern of
``\texttt{**c}'' means \emph{both} ``match files ending in
`\texttt{c}''' \emph{and} ``match any file under a directory whose
name ends in `\texttt{c}'''. I personally find this behaviour
counterintuitive. If you need to write a pattern that means ``match
\emph{only} files'', you'll need to express it as a regular expression
instead; see below.
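
The Python sketch below models this behaviour under a stated
assumption: a translated glob may match either the whole name, or a
leading directory part of it that is followed by ``\texttt{/}''. To my
understanding Mercurial does something similar internally, but treat
this as an illustration rather than its actual code. To keep the
sketch short, the hypothetical \texttt{globtoregex} here understands
only ``\texttt{*}'' and ``\texttt{**}''.

\begin{verbatim}
import re

def globtoregex(pat):
    """Translate '*' and '**' (only); all other characters are literal."""
    out, i = [], 0
    while i < len(pat):
        if pat.startswith("**", i):
            out.append(".*")      # crosses directory boundaries
            i += 2
        elif pat[i] == "*":
            out.append("[^/]*")   # stays within one directory
            i += 1
        else:
            out.append(re.escape(pat[i]))
            i += 1
    return "".join(out)

def matches(pat, path):
    """Match from the start of path, stopping at its end or at a '/'.

    Stopping at a '/' is what makes a pattern that names a directory
    match every file underneath that directory.
    """
    return re.match(globtoregex(pat) + "(?:/|$)", path) is not None

for path in ["lib/util.c", "src/main.py", "lib/util.h"]:
    print(path, matches("**c", path))
# lib/util.c True    (the file name ends in 'c')
# src/main.py True   (it lives under 'src', which ends in 'c')
# lib/util.h False
\end{verbatim}

The same sketch also shows the ``\texttt{*}'' versus ``\texttt{**}''
trap: \texttt{matches("*.c", "lib/util.c")} is false, while
\texttt{matches("**.c", "lib/util.c")} is true.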

\subsection{Regular expression matching with \texttt{re} patterns}

Mercurial accepts the same regular expression syntax as the Python
programming language (it uses Python's regexp engine internally).
This is based on the Perl language's regexp syntax, which is the most
popular dialect in use (it's also used in Java, for example).

I won't discuss Mercurial's regexp dialect in any detail here, as
regexps are not often used. Perl-style regexps are in any case
already exhaustively documented on a multitude of web sites, and in
many books. Instead, I will focus here on a few things you should
know if you find yourself needing to use regexps with Mercurial.

A regexp is matched against an entire file name, relative to the root
of the repository. In other words, even if you're already in
subdirectory \dirname{foo}, if you want to match files under this
directory, your pattern must start with ``\texttt{foo/}''.

One thing to note, if you're familiar with Perl-style regexps, is that
Mercurial's are \emph{rooted}. That is, a regexp starts matching
against the beginning of a string; it doesn't look for a match
anywhere within the string. To match anywhere in a string, start
your pattern with ``\texttt{.*}''.
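
Python's own \texttt{re} module makes the distinction easy to see:
\texttt{re.match} is anchored at the start of the string, much like
Mercurial's rooted regexps, while \texttt{re.search} looks anywhere.
In the sketch below, \texttt{rematches} is a hypothetical helper that
assumes this \texttt{re.match}-style behaviour, and every path is
written relative to the repository root, as described above.

\begin{verbatim}
import re

def rematches(pattern, path):
    """Rooted match: the regexp must match starting at the first character."""
    return re.match(pattern, path) is not None

path = "foo/bar.c"                          # relative to the repository root
print(rematches("bar", path))               # False: not at the start
print(re.search("bar", path) is not None)   # True: search looks anywhere
print(rematches(".*bar", path))             # True: '.*' skips over 'foo/'
print(rematches("foo/", path))              # True: the directory is spelled out
# Anchoring the end with '$' gives "only files ending in c":
print(rematches(".*c$", "src/main.py"))     # False
\end{verbatim}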

\section{Filtering files}

Not only does Mercurial give you a variety of ways to specify files;
it lets you further winnow those files using \emph{filters}. Commands
that work with file names accept two filtering options.
\begin{itemize}
\item \hggopt{-I}, or \hggopt{--include}, lets you specify a pattern
  that file names must match in order to be processed.
\item \hggopt{-X}, or \hggopt{--exclude}, gives you a way to
  \emph{avoid} processing files, if they match this pattern.
\end{itemize}
You can provide multiple \hggopt{-I} and \hggopt{-X} options on the
command line, and intermix them as you please. Mercurial interprets
the patterns you provide using glob syntax by default (but you can use
regexps if you need to).

You can read a \hggopt{-I} filter as ``process only the files that
match this filter''.
\interaction{filenames.filter.include}
The \hggopt{-X} filter is best read as ``process only the files that
don't match this pattern''.
\interaction{filenames.filter.exclude}
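
How do several \hggopt{-I} and \hggopt{-X} options combine? The Python
sketch below shows the rule as I understand it: a file is kept if it
matches at least one include pattern (or if there are none), and is
dropped if it matches any exclude pattern. The helper
\texttt{filterfiles} is hypothetical, and to stay self-contained it
takes rooted regular expressions rather than Mercurial's default glob
syntax.

\begin{verbatim}
import re

def filterfiles(files, includes=(), excludes=()):
    """Apply -I/-X style filtering to a list of repository-relative names."""
    inc = [re.compile(p) for p in includes]
    exc = [re.compile(p) for p in excludes]
    kept = []
    for f in files:
        # With no include patterns, every file passes this first test.
        if inc and not any(r.match(f) for r in inc):
            continue
        # An exclude pattern always wins over an include pattern.
        if any(r.match(f) for r in exc):
            continue
        kept.append(f)
    return kept

files = ["a.c", "a.h", "lib/b.c", "lib/b.h"]
print(filterfiles(files, includes=[r".*[.]c$"], excludes=[r"lib/"]))
# ['a.c']
\end{verbatim}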

\section{Ignoring unwanted files and directories}

XXX.

\section{Case sensitivity}
\label{sec:names:case}

If you're working in a mixed development environment that contains
both Linux (or other Unix) systems and Macs or Windows systems, you
should bear in mind that they treat the case (``N'' versus ``n'') of
file names in incompatible ways. This is not very likely to affect
you, and it's easy to deal with if it does, but it could surprise you
if you don't know about it.

Operating systems and filesystems differ in the way they handle the
\emph{case} of characters in file and directory names. There are
three common ways to handle case in names.
\begin{itemize}
\item Completely case insensitive. Uppercase and lowercase versions
  of a letter are treated as identical, both when creating a file and
  during subsequent accesses. This is common on older DOS-based
  systems.
\item Case preserving, but insensitive. When a file or directory is
  created, the case of its name is stored, and can be retrieved and
  displayed by the operating system. When an existing file is being
  looked up, its case is ignored. This is the standard arrangement on
  Windows and MacOS. The names \filename{foo} and \filename{FoO}
  identify the same file. This treatment of uppercase and lowercase
  letters as interchangeable is also referred to as \emph{case
    folding}.
\item Case sensitive. The case of a name is significant at all times.
  The names \filename{foo} and \filename{FoO} identify different
  files. This is the way Linux and Unix systems normally work.
\end{itemize}

On Unix-like systems, it is possible to have any or all of the above
ways of handling case in action at once. For example, if you use a
USB thumb drive formatted with a FAT32 filesystem on a Linux system,
Linux will handle names on that filesystem in a case preserving, but
insensitive, way.

\subsection{Safe, portable repository storage}

Mercurial's repository storage mechanism is \emph{case safe}. It
translates file names so that they can be safely stored on both case
sensitive and case insensitive filesystems. This means that you can
use normal file copying tools to transfer a Mercurial repository onto,
for example, a USB thumb drive, and safely move that drive and
repository back and forth between a Mac, a PC running Windows, and a
Linux box.
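
Here is a simplified Python sketch of one way such a translation can
work: every uppercase letter is stored as an underscore followed by
its lowercase form, and a literal underscore is doubled so that the
encoding stays reversible. To my understanding, Mercurial's repository
store uses a scheme along these lines (it also escapes some other
characters that are awkward on Windows), but the function
\texttt{encodename} below is a hypothetical illustration, not its
implementation.

\begin{verbatim}
def encodename(path):
    """Case-safe encoding of a file name (simplified sketch).

    The result contains no uppercase letters, so two names that differ
    only in case can never collide on a case-folding filesystem.
    """
    out = []
    for ch in path:
        if ch == "_":
            out.append("__")               # escape the escape character
        elif "A" <= ch <= "Z":
            out.append("_" + ch.lower())   # 'F' becomes '_f'
        else:
            out.append(ch)
    return "".join(out)

print(encodename("myfile.c"))   # myfile.c
print(encodename("MyFile.C"))   # _my_file._c
\end{verbatim}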

\subsection{Detecting case conflicts}

When operating in the working directory, Mercurial honours the naming
policy of the filesystem where the working directory is located. If
the filesystem is case preserving, but insensitive, Mercurial will
treat names that differ only in case as the same.

An important aspect of this approach is that it is possible to commit
a changeset on a case sensitive (typically Linux or Unix) filesystem
that will cause trouble for users on case insensitive (usually Windows
and MacOS) filesystems. If a Linux user commits changes to two files,
one named \filename{myfile.c} and the other named \filename{MyFile.C},
they will be stored correctly in the repository. And in the working
directories of other Linux users, they will be correctly represented
as separate files.

If a Windows or Mac user pulls this change, they will not initially
have a problem, because Mercurial's repository storage mechanism is
case safe. However, once they try to \hgcmd{update} the working
directory to that changeset, or \hgcmd{merge} with that changeset,
Mercurial will spot the conflict between the two file names that the
filesystem would treat as the same, and forbid the update or merge
from occurring.
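
The check itself amounts to grouping names by a case-folded key. The
Python sketch below is a hypothetical illustration, not Mercurial's
code: the function name \texttt{casecollisions} is my own, it
approximates the filesystem's case folding with Python's
\texttt{str.lower}, and it only reports whole paths that collide (a
complete check would also need to catch directories such as
\dirname{Src} and \dirname{src} that contain different files).

\begin{verbatim}
from collections import defaultdict

def casecollisions(paths):
    """Return groups of paths that differ only in case."""
    groups = defaultdict(list)
    for p in paths:
        groups[p.lower()].append(p)   # approximate case folding
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)

print(casecollisions(["myfile.c", "MyFile.C", "README"]))
# [['MyFile.C', 'myfile.c']]
\end{verbatim}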

\subsection{Fixing a case conflict}

If you are using Windows or a Mac in a mixed environment where some of
your collaborators are using Linux or Unix, and Mercurial reports a
case folding conflict when you try to \hgcmd{update} or \hgcmd{merge},
the procedure to fix the problem is simple.

Just find a nearby Linux or Unix box, clone the problem repository
onto it, and use Mercurial's \hgcmd{rename} command to change the
names of any offending files or directories so that they will no
longer cause case folding conflicts. Commit this change, \hgcmd{pull}
or \hgcmd{push} it across to your Windows or MacOS system, and
\hgcmd{update} to the revision with the non-conflicting names.

The changeset with case-conflicting names will remain in your
project's history, and you still won't be able to \hgcmd{update} your
working directory to that changeset on a Windows or MacOS system, but
you can continue development unimpeded.

\begin{note}
  Prior to version~0.9.3, Mercurial did not use a case safe repository
  storage mechanism, and did not detect case folding conflicts. If
  you are using an older version of Mercurial on Windows or MacOS, I
  strongly recommend that you upgrade.
\end{note}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "00book"
%%% End: