hgbook

view en/ch07-filenames.xml @ 567:8fcd44708f41

Uncomment all the mangled interaction examples.
author Bryan O'Sullivan <bos@serpentine.com>
date Mon Mar 09 23:22:09 2009 -0700 (2009-03-09)
parents 21c62e09b99f
children 13513d2a128d
1 <!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : -->
3 <chapter id="chap:names">
4 <title>File names and pattern matching</title>
6 <para>Mercurial provides mechanisms that let you work with file
7 names in a consistent and expressive way.</para>
9 <sect1>
10 <title>Simple file naming</title>
12 <para>Mercurial uses a unified piece of machinery <quote>under the
13 hood</quote> to handle file names. Every command behaves
14 uniformly with respect to file names. The way in which commands
15 work with file names is as follows.</para>
17 <para>If you explicitly name real files on the command line,
18 Mercurial works with exactly those files, as you would expect.
19 &interaction.filenames.files;</para>
21 <para>When you provide a directory name, Mercurial will interpret
22 this as <quote>operate on every file in this directory and its
23 subdirectories</quote>. Mercurial traverses the files and
24 subdirectories in a directory in alphabetical order. When it
25 encounters a subdirectory, it will traverse that subdirectory
26 before continuing with the current directory.</para>
28 &interaction.filenames.dirs;
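<para>The traversal order described above can be sketched in a few lines of Python. This is only an illustration, not Mercurial's own code: the nested dictionary standing in for a directory tree and the function name <literal>walk</literal> are assumptions made for this example, and Python's <literal>sorted</literal> orders names by character code rather than by any locale-aware alphabet.</para>

```python
def walk(tree, prefix=""):
    """Yield file paths depth-first, visiting entries in sorted order.

    `tree` is a hypothetical stand-in for a directory: a dict whose
    values are either None (a file) or another dict (a subdirectory).
    """
    for name in sorted(tree):
        entry = tree[name]
        path = prefix + name
        if isinstance(entry, dict):
            # Finish the whole subdirectory before moving on to the
            # next entry of the current directory.
            yield from walk(entry, path + "/")
        else:
            yield path


example = {"b.txt": None, "a": {"z.txt": None, "y.txt": None}, "c.txt": None}
print(list(walk(example)))
```

<para>Everything under <filename class="directory">a</filename> is listed before <filename>b.txt</filename>, because the subdirectory is traversed completely before the walk returns to its parent.</para>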
30 </sect1>
31 <sect1>
32 <title>Running commands without any file names</title>
34 <para>Mercurial's commands that work with file names have useful
35 default behaviours when you invoke them without providing any
36 file names or patterns. What kind of behaviour you should
37 expect depends on what the command does. Here are a few rules
38 of thumb you can use to predict what a command is likely to do
39 if you don't give it any names to work with.</para>
40 <itemizedlist>
41 <listitem><para>Most commands will operate on the entire working
42 directory. This is what the <command role="hg-cmd">hg
43 add</command> command does, for example.</para>
44 </listitem>
45 <listitem><para>If the command has effects that are difficult or
46 impossible to reverse, it will force you to explicitly
47 provide at least one name or pattern (see below). This
48 protects you from accidentally deleting files by running
49 <command role="hg-cmd">hg remove</command> with no
50 arguments, for example.</para>
51 </listitem></itemizedlist>
53 <para>It's easy to work around these default behaviours if they
54 don't suit you. If a command normally operates on the whole
55 working directory, you can invoke it on just the current
56 directory and its subdirectories by giving it the name
57 <quote><filename class="directory">.</filename></quote>.</para>
59 &interaction.filenames.wdir-subdir;
61 <para>Along the same lines, some commands normally print file
62 names relative to the root of the repository, even if you're
63 invoking them from a subdirectory. Such a command will print
64 file names relative to your subdirectory if you give it explicit
65 names. Here, we're going to run <command role="hg-cmd">hg
66 status</command> from a subdirectory, and get it to operate on
67 the entire working directory while printing file names relative
68 to our subdirectory, by passing it the output of the <command
69 role="hg-cmd">hg root</command> command.</para>
71 &interaction.filenames.wdir-relname;
73 </sect1>
74 <sect1>
75 <title>Telling you what's going on</title>
77 <para>The <command role="hg-cmd">hg add</command> example in the
78 preceding section illustrates something else that's helpful
79 about Mercurial commands. If a command operates on a file that
80 you didn't name explicitly on the command line, it will usually
81 print the name of the file, so that you will not be surprised
82 by what's going on.</para>
84 <para>The principle here is that of <emphasis>least
85 surprise</emphasis>. If you've exactly named a file on the
86 command line, there's no point in repeating it back at you. If
87 Mercurial is acting on a file <emphasis>implicitly</emphasis>,
88 because you provided no names, or a directory, or a pattern (see
89 below), it's safest to tell you what it's doing.</para>
91 <para>For commands that behave this way, you can silence them
92 using the <option role="hg-opt-global">-q</option> option. You
93 can also get them to print the name of every file, even those
94 you've named explicitly, using the <option
95 role="hg-opt-global">-v</option> option.</para>
97 </sect1>
98 <sect1>
99 <title>Using patterns to identify files</title>
101 <para>In addition to working with file and directory names,
102 Mercurial lets you use <emphasis>patterns</emphasis> to identify
103 files. Mercurial's pattern handling is expressive.</para>
105 <para>On Unix-like systems (Linux, MacOS, etc.), the job of
106 matching file names to patterns normally falls to the shell. On
107 these systems, you must explicitly tell Mercurial that a name is
108 a pattern. On Windows, the shell does not expand patterns, so
109 Mercurial will automatically identify names that are patterns,
110 and expand them for you.</para>
112 <para>To provide a pattern in place of a regular name on the
113 command line, the mechanism is simple:</para>
114 <programlisting>syntax:patternbody</programlisting>
115 <para>That is, a pattern is identified by a short text string that
116 says what kind of pattern this is, followed by a colon, followed
117 by the actual pattern.</para>
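<para>To make the <literal>syntax:patternbody</literal> form concrete, here is a minimal Python sketch of how such an argument could be taken apart. The function name <literal>split_pattern</literal> and the tuple it returns are made up for illustration, and only the two syntaxes covered in this chapter are recognised; this is not how Mercurial itself is written.</para>

```python
KNOWN_SYNTAXES = ("glob", "re")  # the two kinds this chapter describes


def split_pattern(arg):
    """Split a command-line argument into (syntax, body).

    Returns (None, arg) when the argument carries no recognised prefix,
    in which case it is treated as a plain file or directory name.
    """
    syntax, sep, body = arg.partition(":")
    if sep and syntax in KNOWN_SYNTAXES:
        return syntax, body
    return None, arg


print(split_pattern("glob:*.py"))
print(split_pattern("re:.*\\.c$"))
print(split_pattern("src/main.c"))
```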
119 <para>Mercurial supports two kinds of pattern syntax. The most
120 frequently used is called <literal>glob</literal>; this is the
121 same kind of pattern matching used by the Unix shell, and should
122 be familiar to Windows command prompt users, too.</para>
124 <para>When Mercurial does automatic pattern matching on Windows,
125 it uses <literal>glob</literal> syntax. You can thus omit the
126 <quote><literal>glob:</literal></quote> prefix on Windows, but
127 it's safe to use it, too.</para>
129 <para>The <literal>re</literal> syntax is more powerful; it lets
130 you specify patterns using regular expressions, also known as
131 regexps.</para>
133 <para>By the way, in the examples that follow, notice that I'm
134 careful to wrap all of my patterns in quote characters, so that
135 they won't get expanded by the shell before Mercurial sees
136 them.</para>
138 <sect2>
139 <title>Shell-style <literal>glob</literal> patterns</title>
141 <para>This is an overview of the kinds of patterns you can use
142 when you're matching on glob patterns.</para>
144 <para>The <quote><literal>*</literal></quote> character matches
145 any string, within a single directory.</para>
147 &interaction.filenames.glob.star;
149 <para>The <quote><literal>**</literal></quote> pattern matches
150 any string, and crosses directory boundaries. It's not a
151 standard Unix glob token, but it's accepted by several popular
152 Unix shells, and is very useful.</para>
154 &interaction.filenames.glob.starstar;
156 <para>The <quote><literal>?</literal></quote> pattern matches
157 any single character.</para>
159 &interaction.filenames.glob.question;
161 <para>The <quote><literal>[</literal></quote> character begins a
162 <emphasis>character class</emphasis>. This matches any single
163 character within the class. The class ends with a
164 <quote><literal>]</literal></quote> character. A class may
165 contain multiple <emphasis>range</emphasis>s of the form
166 <quote><literal>a-f</literal></quote>, which is shorthand for
167 <quote><literal>abcdef</literal></quote>.</para>
169 &interaction.filenames.glob.range;
171 <para>If the first character after the
172 <quote><literal>[</literal></quote> in a character class is a
173 <quote><literal>!</literal></quote>, it
174 <emphasis>negates</emphasis> the class, making it match any
175 single character not in the class.</para>
177 <para>A <quote><literal>{</literal></quote> begins a group of
178 subpatterns, where the whole group matches if any subpattern
179 in the group matches. The <quote><literal>,</literal></quote>
180 character separates subpatterns, and
181 <quote><literal>}</literal></quote> ends the group.</para>
183 &interaction.filenames.glob.group;
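<para>One way to see how these tokens fit together is to translate a glob into a regular expression. The Python sketch below does this for the tokens described in this section only. The names <literal>glob_to_regex</literal> and <literal>glob_match</literal> are hypothetical, the match is anchored at both ends of the name as a simplifying assumption, and Mercurial's real matcher may differ in its details.</para>

```python
import re


def glob_to_regex(pattern):
    """Translate the glob tokens described above into a Python regexp."""
    out = []
    depth = 0  # how many {...} groups are currently open
    i, n = 0, len(pattern)
    while i != n:
        c = pattern[i]
        i += 1
        if c == "*":
            if pattern[i:i + 1] == "*":
                i += 1
                out.append(".*")       # "**" crosses directory boundaries
            else:
                out.append("[^/]*")    # "*" stays within one directory
        elif c == "?":
            out.append(".")            # any single character
        elif c == "[":
            end = pattern.find("]", i + 1)
            if end == -1:
                out.append(re.escape(c))   # no closing "]": literal "["
                continue
            body = pattern[i:end].replace("\\", "\\\\")
            i = end + 1
            if body.startswith("!"):
                body = "^" + body[1:]      # "[!...]" negates the class
            elif body.startswith("^"):
                body = "\\" + body         # a leading "^" is literal here
            out.append("[" + body + "]")
        elif c == "{":
            depth += 1
            out.append("(?:")
        elif c == "}" and depth:
            depth -= 1
            out.append(")")
        elif c == "," and depth:
            out.append("|")                # "," separates subpatterns
        else:
            out.append(re.escape(c))
    out.append(")" * depth)  # close any group left open
    return "".join(out)


def glob_match(pattern, name):
    """Return True if the whole of `name` matches the glob `pattern`."""
    return re.fullmatch(glob_to_regex(pattern), name) is not None


print(glob_match("*.py", "src/a.py"))
print(glob_match("**.py", "src/a.py"))
print(glob_match("*.{c,h}", "x.h"))
```

<para>The first two calls differ only in <quote><literal>*</literal></quote> versus <quote><literal>**</literal></quote>: the single star refuses to cross the <quote><literal>/</literal></quote>, while the double star does not.</para>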
185 <sect3>
186 <title>Watch out!</title>
188 <para>Don't forget that if you want to match a pattern in any
189 directory, you should not be using the
190 <quote><literal>*</literal></quote> match-any token, as this
191 will only match within one directory. Instead, use the
192 <quote><literal>**</literal></quote> token. This small
193 example illustrates the difference between the two.</para>
195 &interaction.filenames.glob.star-starstar;
197 </sect3>
198 </sect2>
199 <sect2>
200 <title>Regular expression matching with <literal>re</literal>
201 patterns</title>
203 <para>Mercurial accepts the same regular expression syntax as
204 the Python programming language (it uses Python's regexp
205 engine internally). This is based on the Perl language's
206 regexp syntax, which is the most popular dialect in use (it's
207 also used in Java, for example).</para>
209 <para>I won't discuss Mercurial's regexp dialect in any detail
210 here, as regexps are not often used. Perl-style regexps are
211 in any case already exhaustively documented on a multitude of
212 web sites, and in many books. Instead, I will focus here on a
213 few things you should know if you find yourself needing to use
214 regexps with Mercurial.</para>
216 <para>A regexp is matched against an entire file name, relative
217 to the root of the repository. In other words, even if you're
218 already in the subdirectory <filename
219 class="directory">foo</filename>, if you want to match files
220 under this directory, your pattern must start with
221 <quote><literal>foo/</literal></quote>.</para>
223 <para>One thing to note, if you're familiar with Perl-style
224 regexps, is that Mercurial's are <emphasis>rooted</emphasis>.
225 That is, a regexp starts matching against the beginning of a
226 string; it doesn't look for a match anywhere within the
227 string. To match anywhere in a string, start your pattern
228 with <quote><literal>.*</literal></quote>.</para>
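<para>Since the text above says Mercurial uses Python's regexp engine, the rooted behaviour can be tried out directly in Python. This sketch assumes that <literal>re.match</literal>, which only matches at the start of a string, is a fair model of that behaviour; the file name is invented for the example.</para>

```python
import re

name = "src/lib/util.c"  # a made-up name, relative to the repository root

# re.match only tries to match at the beginning of the string, which is
# the "rooted" behaviour described above.
rooted_plain = re.match(r"util\.c", name)
rooted_prefixed = re.match(r".*util\.c", name)
rooted_dir = re.match(r"src/", name)

# re.search, by contrast, looks for a match anywhere in the string.
unrooted = re.search(r"util\.c", name)

print(rooted_plain is not None)
print(rooted_prefixed is not None)
print(rooted_dir is not None)
print(unrooted is not None)
```

<para>Only the first pattern fails to match: it does not begin with <quote><literal>.*</literal></quote>, so it is tried at the very start of the name, where <literal>src/</literal> is found instead.</para>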
230 </sect2>
231 </sect1>
232 <sect1>
233 <title>Filtering files</title>
235 <para>Not only does Mercurial give you a variety of ways to
236 specify files; it lets you further winnow those files using
237 <emphasis>filters</emphasis>. Commands that work with file
238 names accept two filtering options.</para>
239 <itemizedlist>
240 <listitem><para><option role="hg-opt-global">-I</option>, or
241 <option role="hg-opt-global">--include</option>, lets you
242 specify a pattern that file names must match in order to be
243 processed.</para>
244 </listitem>
245 <listitem><para><option role="hg-opt-global">-X</option>, or
246 <option role="hg-opt-global">--exclude</option>, gives you a
247 way to <emphasis>avoid</emphasis> processing files, if they
248 match this pattern.</para>
249 </listitem></itemizedlist>
250 <para>You can provide multiple <option
251 role="hg-opt-global">-I</option> and <option
252 role="hg-opt-global">-X</option> options on the command line,
253 and intermix them as you please. Mercurial interprets the
254 patterns you provide using glob syntax by default (but you can
255 use regexps if you need to).</para>
257 <para>You can read a <option role="hg-opt-global">-I</option>
258 filter as <quote>process only the files that match this
259 filter</quote>.</para>
261 &interaction.filenames.filter.include;
263 <para>The <option role="hg-opt-global">-X</option> filter is best
264 read as <quote>process only the files that don't match this
265 pattern</quote>.</para>
267 &interaction.filenames.filter.exclude;
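<para>The combined effect of the two filters can be sketched as follows. This is an illustration in Python under stated assumptions, not Mercurial's implementation: the function name <literal>select</literal> is hypothetical, and the patterns are written as regexps rather than globs purely to keep the example short.</para>

```python
import re


def select(names, includes=(), excludes=()):
    """Keep names that match some include (if any were given) and no exclude."""
    inc = [re.compile(p) for p in includes]
    exc = [re.compile(p) for p in excludes]
    kept = []
    for name in names:
        # With no include filters at all, every name passes this step.
        if inc and not any(r.match(name) for r in inc):
            continue
        # A single matching exclude filter is enough to drop the name.
        if any(r.match(name) for r in exc):
            continue
        kept.append(name)
    return kept


files = ["src/a.c", "src/a.h", "doc/readme.txt", "src/test/t.c"]
print(select(files, includes=[r"src/"]))
print(select(files, excludes=[r".*\.h$"]))
print(select(files, includes=[r"src/"], excludes=[r"src/test/"]))
```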
269 </sect1>
270 <sect1>
271 <title>Ignoring unwanted files and directories</title>
273 <para>XXX.</para>
275 </sect1>
276 <sect1 id="sec:names:case">
277 <title>Case sensitivity</title>
279 <para>If you're working in a mixed development environment that
280 contains both Linux (or other Unix) systems and Macs or Windows
281 systems, you should keep in the back of your mind the knowledge
282 that they treat the case (<quote>N</quote> versus
283 <quote>n</quote>) of file names in incompatible ways. This is
284 not very likely to affect you, and it's easy to deal with if it
285 does, but it could surprise you if you don't know about
286 it.</para>
288 <para>Operating systems and filesystems differ in the way they
289 handle the <emphasis>case</emphasis> of characters in file and
290 directory names. There are three common ways to handle case in
291 names.</para>
292 <itemizedlist>
293 <listitem><para>Completely case insensitive. Uppercase and
294 lowercase versions of a letter are treated as identical,
295 both when creating a file and during subsequent accesses.
296 This is common on older DOS-based systems.</para>
297 </listitem>
298 <listitem><para>Case preserving, but insensitive. When a file
299 or directory is created, the case of its name is stored, and
300 can be retrieved and displayed by the operating system.
301 When an existing file is being looked up, its case is
302 ignored. This is the standard arrangement on Windows and
303 MacOS. The names <filename>foo</filename> and
304 <filename>FoO</filename> identify the same file. This
305 treatment of uppercase and lowercase letters as
306 interchangeable is also referred to as <emphasis>case
307 folding</emphasis>.</para>
308 </listitem>
309 <listitem><para>Case sensitive. The case of a name is
310 significant at all times. The names <filename>foo</filename>
311 and <filename>FoO</filename> identify different files. This is the way Linux
312 and Unix systems normally work.</para>
313 </listitem></itemizedlist>
315 <para>On Unix-like systems, it is possible to have any or all of
316 the above ways of handling case in action at once. For example,
317 if you use a USB thumb drive formatted with a FAT32 filesystem
318 on a Linux system, Linux will handle names on that filesystem in
319 a case preserving, but insensitive, way.</para>
321 <sect2>
322 <title>Safe, portable repository storage</title>
324 <para>Mercurial's repository storage mechanism is <emphasis>case
325 safe</emphasis>. It translates file names so that they can
326 be safely stored on both case sensitive and case insensitive
327 filesystems. This means that you can use normal file copying
328 tools to transfer a Mercurial repository onto, for example, a
329 USB thumb drive, and safely move that drive and repository
330 back and forth between a Mac, a PC running Windows, and a
331 Linux box.</para>
333 </sect2>
334 <sect2>
335 <title>Detecting case conflicts</title>
337 <para>When operating in the working directory, Mercurial honours
338 the naming policy of the filesystem where the working
339 directory is located. If the filesystem is case preserving,
340 but insensitive, Mercurial will treat names that differ only
341 in case as the same.</para>
343 <para>An important aspect of this approach is that it is
344 possible to commit a changeset on a case sensitive (typically
345 Linux or Unix) filesystem that will cause trouble for users on
346 case insensitive (usually Windows and MacOS) filesystems. If a
347 Linux user commits changes to two files, one named
348 <filename>myfile.c</filename> and the other named
349 <filename>MyFile.C</filename>, they will be stored correctly
350 in the repository. And in the working directories of other
351 Linux users, they will be correctly represented as separate
352 files.</para>
354 <para>If a Windows or Mac user pulls this change, they will not
355 initially have a problem, because Mercurial's repository
356 storage mechanism is case safe. However, once they try to
357 <command role="hg-cmd">hg update</command> the working
358 directory to that changeset, or <command role="hg-cmd">hg
359 merge</command> with that changeset, Mercurial will spot the
360 conflict between the two file names that the filesystem would
361 treat as the same, and forbid the update or merge from
362 occurring.</para>
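<para>The kind of check involved can be sketched in Python by grouping names that become identical once case is ignored. The function name <literal>case_conflicts</literal> is made up for this example, and <literal>str.casefold</literal> is only an approximation of how a case insensitive filesystem compares names; Mercurial's own check is not shown here.</para>

```python
from collections import defaultdict


def case_conflicts(names):
    """Return groups of distinct names that differ only in case."""
    groups = defaultdict(list)
    for name in sorted(set(names)):
        # Names that fold to the same key would collide on a
        # case insensitive filesystem.
        groups[name.casefold()].append(name)
    return [group for group in groups.values() if len(group) > 1]


print(case_conflicts(["myfile.c", "MyFile.C", "README"]))
```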
364 </sect2>
365 <sect2>
366 <title>Fixing a case conflict</title>
368 <para>If you are using Windows or a Mac in a mixed environment
369 where some of your collaborators are using Linux or Unix, and
370 Mercurial reports a case folding conflict when you try to
371 <command role="hg-cmd">hg update</command> or <command
372 role="hg-cmd">hg merge</command>, the procedure to fix the
373 problem is simple.</para>
375 <para>Just find a nearby Linux or Unix box, clone the problem
376 repository onto it, and use Mercurial's <command
377 role="hg-cmd">hg rename</command> command to change the
378 names of any offending files or directories so that they will
379 no longer cause case folding conflicts. Commit this change,
380 <command role="hg-cmd">hg pull</command> or <command
381 role="hg-cmd">hg push</command> it across to your Windows or
382 MacOS system, and <command role="hg-cmd">hg update</command>
383 to the revision with the non-conflicting names.</para>
385 <para>The changeset with case-conflicting names will remain in
386 your project's history, and you still won't be able to
387 <command role="hg-cmd">hg update</command> your working
388 directory to that changeset on a Windows or MacOS system, but
389 you can continue development unimpeded.</para>
391 <note>
392 <para>Prior to version 0.9.3, Mercurial did not use a case
393 safe repository storage mechanism, and did not detect case
394 folding conflicts. If you are using an older version of
395 Mercurial on Windows or MacOS, I strongly recommend that you
396 upgrade.</para>
397 </note>
399 </sect2>
400 </sect1>
401 </chapter>
403 <!--
404 local variables:
405 sgml-parent-document: ("00book.xml" "book" "chapter")
406 end:
407 -->