\chapter{File names and pattern matching}
\label{chap:names}

Mercurial provides mechanisms that let you work with file names in a
consistent and expressive way.

\section{Simple file naming}

Mercurial uses a unified piece of machinery ``under the hood'' to
handle file names, so every command treats file names in the same
way. The rules are as follows.

If you explicitly name real files on the command line, Mercurial works
with exactly those files, as you would expect.
\interaction{filenames.files}

When you provide a directory name, Mercurial will interpret this as
``operate on every file in this directory and its subdirectories''.
Mercurial traverses the files and subdirectories in a directory in
alphabetical order. When it encounters a subdirectory, it will
traverse that subdirectory before continuing with the current
directory.
\interaction{filenames.dirs}

\section{Running commands without any file names}

Mercurial's commands that work with file names have useful default
behaviours when you invoke them without providing any file names or
patterns. What kind of behaviour you should expect depends on what
the command does. Here are a few rules of thumb you can use to
predict what a command is likely to do if you don't give it any names
to work with.
\begin{itemize}
\item Most commands will operate on the entire working directory.
  This is what the \hgcmd{add} command does, for example.
\item If the command has effects that are difficult or impossible to
  reverse, it will force you to explicitly provide at least one name
  or pattern (see below). This protects you from accidentally
  deleting files by running \hgcmd{remove} with no arguments, for
  example.
\end{itemize}

It's easy to work around these default behaviours if they don't suit
you. If a command normally operates on the whole working directory,
you can invoke it on just the current directory and its subdirectories
by giving it the name ``\dirname{.}''.
\interaction{filenames.wdir-subdir}

Along the same lines, some commands normally print file names relative
to the root of the repository, even if you're invoking them from a
subdirectory. Such a command will print file names relative to your
subdirectory if you give it explicit names. Here, we're going to run
\hgcmd{status} from a subdirectory, and get it to operate on the
entire working directory while printing file names relative to our
subdirectory, by passing it the output of the \hgcmd{root} command.
\interaction{filenames.wdir-relname}

\section{Telling you what's going on}

The \hgcmd{add} example in the preceding section illustrates something
else that's helpful about Mercurial commands. If a command operates
on a file that you didn't name explicitly on the command line, it will
usually print the name of the file, so that you will not be surprised
by what's going on.

The principle here is of \emph{least surprise}. If you've exactly
named a file on the command line, there's no point in repeating it
back at you. If Mercurial is acting on a file \emph{implicitly},
because you provided no names, or a directory, or a pattern (see
below), it's safest to tell you what it's doing.

For commands that behave this way, you can silence them using the
\hggopt{-q} option. You can also get them to print the name of every
file, even those you've named explicitly, using the \hggopt{-v}
option.

\section{Using patterns to identify files}

In addition to working with file and directory names, Mercurial lets
you use \emph{patterns} to identify files. Mercurial's pattern
handling is expressive.

On Unix-like systems (Linux, MacOS, etc.), the job of matching file
names to patterns normally falls to the shell. On these systems, you
must explicitly tell Mercurial that a name is a pattern. On Windows,
the shell does not expand patterns, so Mercurial will automatically
identify names that are patterns, and expand them for you.

To provide a pattern in place of a regular name on the command line,
use the following form:
\begin{codesample2}
  syntax:patternbody
\end{codesample2}
That is, a pattern is identified by a short text string that says what
kind of pattern this is, followed by a colon, followed by the actual
pattern.

Mercurial supports two kinds of pattern syntax. The most frequently
used is called \texttt{glob}; this is the same kind of pattern
matching used by the Unix shell, and should be familiar to Windows
command prompt users, too.

When Mercurial does automatic pattern matching on Windows, it uses
\texttt{glob} syntax. You can thus omit the ``\texttt{glob:}'' prefix
on Windows, but it's safe to use it, too.

The \texttt{re} syntax is more powerful; it lets you specify patterns
using regular expressions, also known as regexps.

By the way, in the examples that follow, notice that I'm careful to
wrap all of my patterns in quote characters, so that they won't get
expanded by the shell before Mercurial sees them.

\subsection{Shell-style \texttt{glob} patterns}

This is an overview of the kinds of patterns you can use when you're
matching on glob patterns.

The ``\texttt{*}'' character matches any string, within a single
directory.
\interaction{filenames.glob.star}

The ``\texttt{**}'' pattern matches any string, and crosses directory
boundaries. It's not a standard Unix glob token, but it's accepted by
several popular Unix shells, and is very useful.
\interaction{filenames.glob.starstar}

The ``\texttt{?}'' pattern matches any single character.
\interaction{filenames.glob.question}

The ``\texttt{[}'' character begins a \emph{character class}. This
matches any single character within the class. The class ends with a
``\texttt{]}'' character. A class may contain multiple \emph{range}s
of the form ``\texttt{a-f}'', which is shorthand for
``\texttt{abcdef}''.
\interaction{filenames.glob.range}
If the first character after the ``\texttt{[}'' in a character class
is a ``\texttt{!}'', it \emph{negates} the class, making it match any
single character not in the class.

A ``\texttt{\{}'' begins a group of subpatterns, where the whole group
matches if any subpattern in the group matches. The ``\texttt{,}''
character separates subpatterns, and ``\texttt{\}}'' ends the group.
\interaction{filenames.glob.group}

\subsubsection{Watch out!}

Don't forget that if you want to match a pattern in any directory, you
should not use the ``\texttt{*}'' match-any token, as this will
only match within one directory. Instead, use the ``\texttt{**}''
token. This small example illustrates the difference between the two.
\interaction{filenames.glob.star-starstar}

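
One way to keep the glob rules above straight is to read each token
as a small piece of a regular expression. The following is a minimal
Python sketch of that reading, not Mercurial's own code: the names
\texttt{glob\_to\_regex} and \texttt{glob\_matches} are made up for
illustration, and corner cases such as escaping special characters
are deliberately ignored.

```python
import re


def glob_to_regex(pattern):
    """Translate a glob, as described in this section, into a regexp.

    Illustrative sketch only; this is not Mercurial's implementation.
    """
    out = []
    depth = 0  # how many {...} groups are currently open
    i = 0
    while i < len(pattern):
        c = pattern[i]
        i += 1
        if c == "*":
            if pattern[i:i + 1] == "*":
                i += 1
                out.append(".*")       # "**" crosses directory boundaries
            else:
                out.append("[^/]*")    # "*" stays within one directory
        elif c == "?":
            out.append(".")            # any single character
        elif c == "[":
            end = pattern.find("]", i)
            body = pattern[i:end] if end != -1 else ""
            if body in ("", "!"):
                out.append(re.escape(c))   # no usable class: literal "["
            else:
                i = end + 1
                body = body.replace("\\", "\\\\")
                if body[0] == "!":
                    body = "^" + body[1:]  # "[!...]" negates the class
                elif body[0] == "^":
                    body = "\\" + body     # keep a leading "^" literal
                out.append("[" + body + "]")
        elif c == "{":
            depth += 1
            out.append("(?:")          # start a group of alternatives
        elif c == "}" and depth:
            depth -= 1
            out.append(")")
        elif c == "," and depth:
            out.append("|")            # separates subpatterns in a group
        else:
            out.append(re.escape(c))
    return "".join(out) + ")" * depth  # close any group left open


def glob_matches(pattern, name):
    """Return True if the whole of name matches the glob pattern."""
    return re.fullmatch(glob_to_regex(pattern), name) is not None


for pat in ["*.c", "**.c", "*.{c,h}"]:
    print(pat, glob_matches(pat, "src/main.c"))
```

Running the loop at the end shows the point of the warning above: of
the three patterns, only the one using ``\texttt{**}'' can reach a
file inside \dirname{src}.
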
\subsection{Regular expression matching with \texttt{re} patterns}

Mercurial accepts the same regular expression syntax as the Python
programming language (it uses Python's regexp engine internally).
This is based on the Perl language's regexp syntax, which is the most
popular dialect in use (it's also used in Java, for example).

I won't discuss Mercurial's regexp dialect in any detail here, as
regexps are not often used. Perl-style regexps are in any case
already exhaustively documented on a multitude of web sites, and in
many books. Instead, I will focus here on a few things you should
know if you find yourself needing to use regexps with Mercurial.

A regexp is matched against an entire file name, relative to the root
of the repository. In other words, even if you're already in
subdirectory \dirname{foo}, if you want to match files under this
directory, your pattern must start with ``\texttt{foo/}''.

One thing to note, if you're familiar with Perl-style regexps, is that
Mercurial's are \emph{rooted}. That is, a regexp starts matching
against the beginning of a string; it doesn't look for a match
anywhere within the string. To match anywhere in a string, start
your pattern with ``\texttt{.*}''.

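
Python's \texttt{re.match} function is rooted in the same sense: it
only reports a match that begins at the start of the string. The
short sketch below uses it to mimic the behaviour just described. The
file names and the helper \texttt{rooted\_matches} are hypothetical,
and the code only imitates the effect; it is not taken from Mercurial.

```python
import re

# Hypothetical file names, relative to the repository root.
NAMES = ["foo/bar.c", "src/foo/bar.c", "foo/notes.txt"]


def rooted_matches(pattern, names):
    """Return the names that the pattern matches starting at position 0."""
    regex = re.compile(pattern)
    return [name for name in names if regex.match(name)]


print(rooted_matches(r"foo/", NAMES))      # only names that begin with foo/
print(rooted_matches(r".*foo/", NAMES))    # foo/ anywhere in the path
print(rooted_matches(r".*\.c$", NAMES))    # names ending in .c, at any depth
```

The first call leaves out \filename{src/foo/bar.c}, because the
pattern has to match from the first character of the name; adding the
leading ``\texttt{.*}'' lets it match further along.
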
\section{Filtering files}

Not only does Mercurial give you a variety of ways to specify files;
it lets you further winnow those files using \emph{filters}. Commands
that work with file names accept two filtering options.
\begin{itemize}
\item \hggopt{-I}, or \hggopt{--include}, lets you specify a pattern
  that file names must match in order to be processed.
\item \hggopt{-X}, or \hggopt{--exclude}, gives you a way to
  \emph{avoid} processing files, if they match this pattern.
\end{itemize}
You can provide multiple \hggopt{-I} and \hggopt{-X} options on the
command line, and intermix them as you please. Mercurial interprets
the patterns you provide using glob syntax by default (but you can use
regexps if you need to).

You can read a \hggopt{-I} filter as ``process only the files that
match this filter''.
\interaction{filenames.filter.include}
The \hggopt{-X} filter is best read as ``process only the files that
don't match this pattern''.
\interaction{filenames.filter.exclude}

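
The way the two filters combine can be sketched in a few lines of
Python. This is an illustration under stated assumptions, not
Mercurial's code: \texttt{filter\_names} is a made-up helper, it
assumes that matching any one include pattern is enough when several
are given, and it borrows Python's \texttt{fnmatch} module as a
stand-in matcher (unlike the glob syntax described earlier, the
``\texttt{*}'' of \texttt{fnmatch} also matches ``\texttt{/}'').

```python
from fnmatch import fnmatchcase


def filter_names(names, include=(), exclude=()):
    """Keep names that pass the include patterns and avoid the exclude ones.

    Assumption: with several include patterns, matching any one is enough.
    """
    kept = []
    for name in names:
        if include and not any(fnmatchcase(name, p) for p in include):
            continue    # includes were given, but this name matches none
        if any(fnmatchcase(name, p) for p in exclude):
            continue    # an exclude pattern matched, so skip the name
        kept.append(name)
    return kept


# Hypothetical file names, relative to the repository root.
NAMES = ["src/main.c", "src/main.h", "doc/guide.tex", "README"]

print(filter_names(NAMES, include=["src/*"]))
print(filter_names(NAMES, exclude=["*.h"]))
print(filter_names(NAMES, include=["src/*"], exclude=["*.h"]))
```

With no include patterns every name is a candidate, so an exclude
pattern on its own simply removes the names it matches.
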
\section{Ignoring unwanted files and directories}

XXX.

\section{Case sensitivity}
\label{sec:names:case}

If you're working in a mixed development environment that contains
both Linux (or other Unix) systems and Macs or Windows systems, you
should keep in the back of your mind the knowledge that they treat the
case (``N'' versus ``n'') of file names in incompatible ways. This is
not very likely to affect you, and it's easy to deal with if it does,
but it could surprise you if you don't know about it.

Operating systems and filesystems differ in the way they handle the
\emph{case} of characters in file and directory names. There are
three common ways to handle case in names.
\begin{itemize}
\item Completely case insensitive. Uppercase and lowercase versions
  of a letter are treated as identical, both when creating a file and
  during subsequent accesses. This is common on older DOS-based
  systems.
\item Case preserving, but insensitive. When a file or directory is
  created, the case of its name is stored, and can be retrieved and
  displayed by the operating system. When an existing file is being
  looked up, its case is ignored. This is the standard arrangement on
  Windows and MacOS. The names \filename{foo} and \filename{FoO}
  identify the same file. This treatment of uppercase and lowercase
  letters as interchangeable is also referred to as \emph{case
    folding}.
\item Case sensitive. The case of a name is significant at all times.
  The names \filename{foo} and \filename{FoO} identify different
  files. This is the way Linux and Unix systems normally work.
\end{itemize}

On Unix-like systems, it is possible to have any or all of the above
ways of handling case in action at once. For example, if you use a
USB thumb drive formatted with a FAT32 filesystem on a Linux system,
Linux will handle names on that filesystem in a case preserving, but
insensitive, way.

\subsection{Safe, portable repository storage}

Mercurial's repository storage mechanism is \emph{case safe}. It
translates file names so that they can be safely stored on both case
sensitive and case insensitive filesystems. This means that you can
use normal file copying tools to transfer a Mercurial repository onto,
for example, a USB thumb drive, and safely move that drive and
repository back and forth between a Mac, a PC running Windows, and a
Linux box.

\subsection{Detecting case conflicts}

When operating in the working directory, Mercurial honours the naming
policy of the filesystem where the working directory is located. If
the filesystem is case preserving, but insensitive, Mercurial will
treat names that differ only in case as the same.

An important aspect of this approach is that it is possible to commit
a changeset on a case sensitive (typically Linux or Unix) filesystem
that will cause trouble for users on case insensitive (usually Windows
and MacOS) filesystems. If a Linux user commits changes to two files,
one named \filename{myfile.c} and the other named \filename{MyFile.C},
they will be stored correctly in the repository. And in the working
directories of other Linux users, they will be correctly represented
as separate files.

If a Windows or Mac user pulls this change, they will not initially
have a problem, because Mercurial's repository storage mechanism is
case safe. However, once they try to \hgcmd{update} the working
directory to that changeset, or \hgcmd{merge} with that changeset,
Mercurial will spot the conflict between the two file names that the
filesystem would treat as the same, and forbid the update or merge
from occurring.

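
If you want to check a list of file names for this kind of collision
yourself, the idea can be sketched in Python as follows. This is an
illustration only, not the check Mercurial performs: the helper
\texttt{case\_conflicts} is hypothetical, and it uses Python's
\texttt{str.casefold} as a stand-in for a filesystem's case folding,
whereas real filesystems each have their own folding rules.

```python
from collections import defaultdict


def case_conflicts(names):
    """Group names that differ only in case.

    Uses str.casefold() as a stand-in for a filesystem's case folding;
    real filesystems each have their own folding rules.
    """
    groups = defaultdict(set)
    for name in names:
        groups[name.casefold()].add(name)
    # Only groups holding two or more distinct spellings are conflicts.
    return [sorted(group) for group in groups.values() if len(group) > 1]


print(case_conflicts(["myfile.c", "MyFile.C", "README"]))
```

An empty result means that no two names in the list would collide on
a filesystem that folds case the way \texttt{str.casefold} does.
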
\subsection{Fixing a case conflict}

If you are using Windows or a Mac in a mixed environment where some of
your collaborators are using Linux or Unix, and Mercurial reports a
case folding conflict when you try to \hgcmd{update} or \hgcmd{merge},
the procedure to fix the problem is simple.

Just find a nearby Linux or Unix box, clone the problem repository
onto it, and use Mercurial's \hgcmd{rename} command to change the
names of any offending files or directories so that they will no
longer cause case folding conflicts. Commit this change, \hgcmd{pull}
or \hgcmd{push} it across to your Windows or MacOS system, and
\hgcmd{update} to the revision with the non-conflicting names.

The changeset with case-conflicting names will remain in your
project's history, and you still won't be able to \hgcmd{update} your
working directory to that changeset on a Windows or MacOS system, but
you can continue development unimpeded.

\begin{note}
  Prior to version~0.9.3, Mercurial did not use a case safe repository
  storage mechanism, and did not detect case folding conflicts. If
  you are using an older version of Mercurial on Windows or MacOS, I
  strongly recommend that you upgrade.
\end{note}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "00book"
%%% End: