hgbook

view en/ch07-filenames.xml @ 753:6ff5cf15b3c9

Complete revision of Ch.7.
author Giulio@puck
date Sun Jul 12 21:35:06 2009 +0200 (2009-07-12)
parents 477d6a3e5023
1 <!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : -->
3 <chapter id="chap:names">
4 <?dbhtml filename="file-names-and-pattern-matching.html"?>
5 <title>File names and pattern matching</title>
7 <para id="x_543">Mercurial provides mechanisms that let you work with file
8 names in a consistent and expressive way.</para>
10 <sect1>
11 <title>Simple file naming</title>
13 <para id="x_544">Mercurial uses a unified piece of machinery <quote>under the
14 hood</quote> to handle file names. Every command behaves
15 uniformly with respect to file names. The way in which commands
16 work with file names is as follows.</para>
18 <para id="x_545">If you explicitly name real files on the command line,
19 Mercurial works with exactly those files, as you would expect.
20 &interaction.filenames.files;</para>
22 <para id="x_546">When you provide a directory name, Mercurial will interpret
23 this as <quote>operate on every file in this directory and its
24 subdirectories</quote>. Mercurial traverses the files and
25 subdirectories in a directory in alphabetical order. When it
26 encounters a subdirectory, it will traverse that subdirectory
27 before continuing with the current directory.</para>
29 &interaction.filenames.dirs;
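<para>The traversal order described above can be sketched in a few lines of
Python. This is only an illustration of the ordering, not Mercurial's own
code; the names <literal>walk_sorted</literal> and <literal>demo</literal>,
and the sample directory tree, are hypothetical.</para>
<programlisting language="python"><![CDATA[
```python
import os
import tempfile


def walk_sorted(top):
    """Yield the files under top. Entries are visited alphabetically, and a
    subdirectory is fully traversed before its parent directory continues."""
    for name in sorted(os.listdir(top)):
        path = os.path.join(top, name)
        if os.path.isdir(path):
            yield from walk_sorted(path)
        else:
            yield path


def demo():
    """Build a small throwaway tree and return its files in visiting order."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "b"))
        for rel in ("a.txt", "b/x.txt", "b/y.txt", "c.txt"):
            with open(os.path.join(root, *rel.split("/")), "w"):
                pass
        return [os.path.relpath(p, root).replace(os.sep, "/")
                for p in walk_sorted(root)]
```
]]></programlisting>
<para>In this sketch, the files inside <filename class="directory">b</filename>
are listed between <filename>a.txt</filename> and
<filename>c.txt</filename>, because the subdirectory is traversed at the
point where it is encountered.</para>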
30 </sect1>
32 <sect1>
33 <title>Running commands without any file names</title>
35 <para id="x_547">Mercurial's commands that work with file names have useful
36 default behaviors when you invoke them without providing any
37 file names or patterns. What kind of behavior you should
38 expect depends on what the command does. Here are a few rules
39 of thumb you can use to predict what a command is likely to do
40 if you don't give it any names to work with.</para>
41 <itemizedlist>
42 <listitem><para id="x_548">Most commands will operate on the entire working
43 directory. This is what the <command role="hg-cmd">hg
44 add</command> command does, for example.</para>
45 </listitem>
46 <listitem><para id="x_549">If the command has effects that are difficult or
47 impossible to reverse, it will force you to explicitly
48 provide at least one name or pattern (see below). This
49 protects you from accidentally deleting files by running
50 <command role="hg-cmd">hg remove</command> with no
51 arguments, for example.</para>
52 </listitem></itemizedlist>
54 <para id="x_54a">It's easy to work around these default behaviors if they
55 don't suit you. If a command normally operates on the whole
56 working directory, you can invoke it on just the current
57 directory and its subdirectories by giving it the name
58 <quote><filename class="directory">.</filename></quote>.</para>
60 &interaction.filenames.wdir-subdir;
62 <para id="x_54b">Along the same lines, some commands normally print file
63 names relative to the root of the repository, even if you're
64 invoking them from a subdirectory. Such a command will print
65 file names relative to your subdirectory if you give it explicit
66 names. Here, we're going to run <command role="hg-cmd">hg
67 status</command> from a subdirectory, and get it to operate on
68 the entire working directory while printing file names relative
69 to our subdirectory, by passing it the output of the <command
70 role="hg-cmd">hg root</command> command.</para>
72 &interaction.filenames.wdir-relname;
73 </sect1>
75 <sect1>
76 <title>Telling you what's going on</title>
78 <para id="x_54c">The <command role="hg-cmd">hg add</command> example in the
79 preceding section illustrates something else that's helpful
80 about Mercurial commands. If a command operates on a file that
81 you didn't name explicitly on the command line, it will usually
82 print the name of the file, so that you will not be surprised
83 by what's going on.</para>
85 <para id="x_54d">The principle here is that of <emphasis>least
86 surprise</emphasis>. If you've exactly named a file on the
87 command line, there's no point in repeating it back to you. If
88 Mercurial is acting on a file <emphasis>implicitly</emphasis>, e.g.
89 because you provided no names, or a directory, or a pattern (see
90 below), it is safest to tell you what files it's operating on.</para>
92 <para id="x_54e">For commands that behave this way, you can silence them
93 using the <option role="hg-opt-global">-q</option> option. You
94 can also get them to print the name of every file, even those
95 you've named explicitly, using the <option
96 role="hg-opt-global">-v</option> option.</para>
97 </sect1>
99 <sect1>
100 <title>Using patterns to identify files</title>
102 <para id="x_54f">In addition to working with file and directory names,
103 Mercurial lets you use <emphasis>patterns</emphasis> to identify
104 files. Mercurial's pattern handling is expressive.</para>
106 <para id="x_550">On Unix-like systems (Linux, MacOS, etc.), the job of
107 matching file names to patterns normally falls to the shell. On
108 these systems, you must explicitly tell Mercurial that a name is
109 a pattern. On Windows, the shell does not expand patterns, so
110 Mercurial will automatically identify names that are patterns,
111 and expand them for you.</para>
113 <para id="x_551">To provide a pattern in place of a regular name on the
114 command line, the mechanism is simple:</para>
115 <programlisting>syntax:patternbody</programlisting>
116 <para id="x_552">That is, a pattern is identified by a short text string that
117 says what kind of pattern this is, followed by a colon, followed
118 by the actual pattern.</para>
120 <para id="x_553">Mercurial supports two kinds of pattern syntax. The most
121 frequently used is called <literal>glob</literal>; this is the
122 same kind of pattern matching used by the Unix shell, and should
123 be familiar to Windows command prompt users, too.</para>
125 <para id="x_554">When Mercurial does automatic pattern matching on Windows,
126 it uses <literal>glob</literal> syntax. You can thus omit the
127 <quote><literal>glob:</literal></quote> prefix on Windows, but
128 it's safe to use it, too.</para>
130 <para id="x_555">The <literal>re</literal> syntax is more powerful; it lets
131 you specify patterns using regular expressions, also known as
132 regexps.</para>
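<para>The <literal>syntax:patternbody</literal> convention is easy to model.
The following Python sketch separates the prefix from the pattern body; it is
an illustration, not Mercurial's parser. The function name
<literal>split_pattern</literal> is hypothetical, and the sketch assumes that
only the two kinds named in this section,
<literal>glob</literal> and <literal>re</literal>, count as prefixes.</para>
<programlisting language="python"><![CDATA[
```python
# Assumption: only the two pattern kinds named in this section are recognised.
KNOWN_SYNTAXES = ("glob", "re")


def split_pattern(arg):
    """Split 'syntax:patternbody' into (syntax, body).

    Arguments without a recognised prefix are returned as (None, arg),
    meaning "treat this as a plain file or directory name".
    """
    kind, sep, body = arg.partition(":")
    if sep and kind in KNOWN_SYNTAXES:
        return kind, body
    return None, arg
```
]]></programlisting>
<para>For example, <literal>split_pattern("glob:*.py")</literal> yields the
pair <literal>("glob", "*.py")</literal>, while a plain name such as
<literal>src/main.c</literal> comes back unchanged with no syntax
attached.</para>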
134 <para id="x_556">By the way, in the examples that follow, notice that I'm
135 careful to wrap all of my patterns in quote characters, so that
136 they won't get expanded by the shell before Mercurial sees
137 them.</para>
139 <sect2>
140 <title>Shell-style <literal>glob</literal> patterns</title>
142 <para id="x_557">This is an overview of the kinds of patterns you can use
143 when you're matching on glob patterns.</para>
145 <para id="x_558">The <quote><literal>*</literal></quote> character matches
146 any string, within a single directory.</para>
148 &interaction.filenames.glob.star;
150 <para id="x_559">The <quote><literal>**</literal></quote> pattern matches
151 any string, and crosses directory boundaries. It's not a
152 standard Unix glob token, but it's accepted by several popular
153 Unix shells, and is very useful.</para>
155 &interaction.filenames.glob.starstar;
157 <para id="x_55a">The <quote><literal>?</literal></quote> pattern matches
158 any single character.</para>
160 &interaction.filenames.glob.question;
162 <para id="x_55b">The <quote><literal>[</literal></quote> character begins a
163 <emphasis>character class</emphasis>. This matches any single
164 character within the class. The class ends with a
165 <quote><literal>]</literal></quote> character. A class may
166 contain multiple <emphasis>range</emphasis>s of the form
167 <quote><literal>a-f</literal></quote>, which is shorthand for
168 <quote><literal>abcdef</literal></quote>.</para>
170 &interaction.filenames.glob.range;
172 <para id="x_55c">If the first character after the
173 <quote><literal>[</literal></quote> in a character class is a
174 <quote><literal>!</literal></quote>, it
175 <emphasis>negates</emphasis> the class, making it match any
176 single character not in the class.</para>
178 <para id="x_55d">A <quote><literal>{</literal></quote> begins a group of
179 subpatterns, where the whole group matches if any subpattern
180 in the group matches. The <quote><literal>,</literal></quote>
181 character separates subpatterns, and
182 <quote><literal>}</literal></quote> ends the group.</para>
184 &interaction.filenames.glob.group;
186 <sect3>
187 <title>Watch out!</title>
189 <para id="x_55e">Don't forget that if you want to match a pattern in any
190 directory, you should not be using the
191 <quote><literal>*</literal></quote> match-any token, as this
192 will only match within one directory. Instead, use the
193 <quote><literal>**</literal></quote> token. This small
194 example illustrates the difference between the two.</para>
196 &interaction.filenames.glob.star-starstar;
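<para>One way to keep the glob rules straight is to see how each token could
be translated into a regular expression. The Python sketch below does this
for the tokens described in this section. It is a simplified illustration
and not Mercurial's implementation; the names
<literal>glob_to_regex</literal> and <literal>glob_match</literal> are
hypothetical, and corner cases (such as escaping with a backslash outside a
character class) are not handled.</para>
<programlisting language="python"><![CDATA[
```python
import re


def glob_to_regex(pattern):
    """Translate a glob pattern into a regular expression string."""
    out = []
    i, n = 0, len(pattern)
    depth = 0  # nesting depth of {...} groups
    while i < n:
        c = pattern[i]
        i += 1
        if c == "*":
            if i < n and pattern[i] == "*":
                i += 1
                out.append(".*")      # "**" crosses directory boundaries
            else:
                out.append("[^/]*")   # "*" stays within a single directory
        elif c == "?":
            out.append(".")           # any single character
        elif c == "[":
            # Find the closing "]"; a "]" right after "[" or "[!" is literal.
            j = i
            if j < n and pattern[j] == "!":
                j += 1
            if j < n and pattern[j] == "]":
                j += 1
            end = pattern.find("]", j)
            if end == -1:
                out.append(re.escape(c))  # unterminated class: literal "["
                continue
            body = pattern[i:end].replace("\\", "\\\\")
            i = end + 1
            if body.startswith("!"):
                body = "^" + body[1:]     # "[!...]" negates the class
            elif body.startswith("^"):
                body = "\\" + body        # a leading "^" is an ordinary member
            out.append("[" + body + "]")
        elif c == "{":
            depth += 1
            out.append("(?:")
        elif c == "}" and depth:
            depth -= 1
            out.append(")")
        elif c == "," and depth:
            out.append("|")               # separates subpatterns in a group
        else:
            out.append(re.escape(c))
    return "".join(out)


def glob_match(pattern, path):
    """Return True if the whole path matches the glob pattern."""
    return re.fullmatch(glob_to_regex(pattern), path) is not None
```
]]></programlisting>
<para>With this sketch, <literal>glob_match("*.c", "src/main.c")</literal>
is false because <quote><literal>*</literal></quote> cannot cross the
<quote><literal>/</literal></quote>, while
<literal>glob_match("**.c", "src/main.c")</literal> is true.</para>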
197 </sect3>
198 </sect2>
200 <sect2>
201 <title>Regular expression matching with <literal>re</literal>
202 patterns</title>
204 <para id="x_55f">Mercurial accepts the same regular expression syntax as
205 the Python programming language (it uses Python's regexp
206 engine internally). This is based on the Perl language's
207 regexp syntax, which is the most popular dialect in use (it's
208 also used in Java, for example).</para>
210 <para id="x_560">I won't discuss Mercurial's regexp dialect in any detail
211 here, as regexps are not often used. Perl-style regexps are
212 in any case already exhaustively documented on a multitude of
213 web sites, and in many books. Instead, I will focus here on a
214 few things you should know if you find yourself needing to use
215 regexps with Mercurial.</para>
217 <para id="x_561">A regexp is matched against an entire file name, relative
218 to the root of the repository. In other words, even if you're
219 already in the subdirectory <filename
220 class="directory">foo</filename>, if you want to match files
221 under this directory, your pattern must start with
222 <quote><literal>foo/</literal></quote>.</para>
224 <para id="x_562">One thing to note, if you're familiar with Perl-style
225 regexps, is that Mercurial's are <emphasis>rooted</emphasis>.
226 That is, a regexp starts matching against the beginning of a
227 string; it doesn't look for a match anywhere within the
228 string. To match anywhere in a string, start your pattern
229 with <quote><literal>.*</literal></quote>.</para>
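<para>Python's own <literal>re</literal> module shows the difference between
rooted and unrooted matching: <literal>re.match</literal> only matches at the
beginning of a string, whereas <literal>re.search</literal> looks anywhere
within it. The sketch below illustrates the rooted behaviour described
above; the helper names and the sample path are hypothetical, and the code
makes no claim about which call Mercurial uses internally.</para>
<programlisting language="python"><![CDATA[
```python
import re


def rooted_match(pattern, path):
    """Match only at the beginning of path, like a rooted regexp."""
    return re.match(pattern, path) is not None


def unrooted_match(pattern, path):
    """Match anywhere within path, as Perl-style regexps do by default."""
    return re.search(pattern, path) is not None
```
]]></programlisting>
<para>Given the path <literal>foo/bar.c</literal>, the rooted helper accepts
<literal>foo/</literal> and <literal>.*bar</literal> but rejects
<literal>bar</literal>; the unrooted helper accepts all three.</para>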
230 </sect2>
231 </sect1>
233 <sect1>
234 <title>Filtering files</title>
236 <para id="x_563">Not only does Mercurial give you a variety of ways to
237 specify files; it lets you further winnow those files using
238 <emphasis>filters</emphasis>. Commands that work with file
239 names accept two filtering options.</para>
240 <itemizedlist>
241 <listitem><para id="x_564"><option role="hg-opt-global">-I</option>, or
242 <option role="hg-opt-global">--include</option>, lets you
243 specify a pattern that file names must match in order to be
244 processed.</para>
245 </listitem>
246 <listitem><para id="x_565"><option role="hg-opt-global">-X</option>, or
247 <option role="hg-opt-global">--exclude</option>, gives you a
248 way to <emphasis>avoid</emphasis> processing files, if they
249 match this pattern.</para>
250 </listitem></itemizedlist>
251 <para id="x_566">You can provide multiple <option
252 role="hg-opt-global">-I</option> and <option
253 role="hg-opt-global">-X</option> options on the command line,
254 and intermix them as you please. Mercurial interprets the
255 patterns you provide using glob syntax by default (but you can
256 use regexps if you need to).</para>
258 <para id="x_567">You can read a <option role="hg-opt-global">-I</option>
259 filter as <quote>process only the files that match this
260 filter</quote>.</para>
262 &interaction.filenames.filter.include;
264 <para id="x_568">The <option role="hg-opt-global">-X</option> filter is best
265 read as <quote>process only the files that don't match this
266 pattern</quote>.</para>
268 &interaction.filenames.filter.exclude;
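<para>The two readings above can be combined into a small Python sketch.
It is an approximation for illustration only: the function name
<literal>filter_files</literal> is hypothetical, the rule that an exclude
pattern overrides an include pattern is an assumption of the sketch, and
Python's <literal>fnmatch</literal> module lets
<quote><literal>*</literal></quote> match across
<quote><literal>/</literal></quote>, which differs from the glob rules
described earlier in this chapter.</para>
<programlisting language="python"><![CDATA[
```python
from fnmatch import fnmatchcase


def filter_files(names, include=(), exclude=()):
    """Keep names that match some include pattern (if any were given)
    and that match no exclude pattern."""
    kept = []
    for name in names:
        # -I: process only the files that match this filter.
        if include and not any(fnmatchcase(name, p) for p in include):
            continue
        # -X: process only the files that don't match this pattern.
        if any(fnmatchcase(name, p) for p in exclude):
            continue
        kept.append(name)
    return kept
```
]]></programlisting>
<para>For the names <filename>a.c</filename>, <filename>a.h</filename>,
<filename>b.c</filename> and <filename>README</filename>, including
<literal>*.c</literal> keeps the two C files, and excluding
<literal>*.c</literal> keeps the other two.</para>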
269 </sect1>
271 <sect1>
272 <title>Permanently ignoring unwanted files and directories</title>
274 <para id="x_569">When you create a new repository, the chances are
275 that over time it will grow to contain files that ought to
276 <emphasis>not</emphasis> be managed by Mercurial, but which you
277 don't want to see listed every time you run <command>hg
278 status</command>. For instance, <quote>build products</quote>
279 are files that are created as part of a build but which should
280 not be managed by a revision control system. The most common
281 build products are output files produced by software tools such
282 as compilers. As another example, many text editors litter a
283 directory with lock files, temporary working files, and backup
284 files, which it also makes no sense to manage.</para>
286 <para id="x_6b4">To have Mercurial permanently ignore such files, create a
287 file named <filename>.hgignore</filename> in the root of your
288 repository. You <emphasis>should</emphasis> <command>hg
289 add</command> this file so that it gets tracked with the rest of
290 your repository contents, since your collaborators will probably
291 find it useful too.</para>
293 <para id="x_6b5">By default, the <filename>.hgignore</filename> file should
294 contain a list of regular expressions, one per line. Empty
295 lines are skipped. Most people prefer to describe the files they
296 want to ignore using the <quote>glob</quote> syntax that we
297 described above, so a typical <filename>.hgignore</filename>
298 file will start with this directive:</para>
300 <programlisting>syntax: glob</programlisting>
302 <para id="x_6b6">This tells Mercurial to interpret the lines that follow as
303 glob patterns, not regular expressions.</para>
305 <para id="x_6b7">Here is a typical-looking <filename>.hgignore</filename>
306 file.</para>
308 <programlisting>syntax: glob
309 # This line is a comment, and will be skipped.
310 # Empty lines are skipped too.
312 # Backup files left behind by the Emacs editor.
313 *~
315 # Lock files used by the Emacs editor.
316 # Notice that the "#" character is quoted with a backslash.
317 # This prevents it from being interpreted as starting a comment.
318 .\#*
320 # Temporary files used by the vim editor.
321 .*.swp
323 # A hidden file created by the Mac OS X Finder.
324 .DS_Store
325 </programlisting>
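<para>To make the rules for comments, empty lines, and the
<literal>syntax:</literal> directive concrete, here is a Python sketch of how
such a file could be read. It is a simplified model, not Mercurial's parser.
The function name <literal>parse_hgignore</literal> is hypothetical, and the
label <literal>regexp</literal> used for the default syntax is an assumption
of the sketch.</para>
<programlisting language="python"><![CDATA[
```python
import re

# A "#" starts a comment unless it is quoted with a backslash.
_COMMENT = re.compile(r"((?:^|[^\\])(?:\\\\)*)#.*")


def parse_hgignore(text, default_syntax="regexp"):
    """Return a list of (syntax, pattern) pairs from .hgignore-style text."""
    syntax = default_syntax
    entries = []
    for line in text.splitlines():
        if "#" in line:
            line = _COMMENT.sub(r"\1", line)  # drop an unquoted comment
            line = line.replace("\\#", "#")   # unquote "\#" to a plain "#"
        line = line.rstrip()
        if not line:
            continue                          # empty lines are skipped
        if line.startswith("syntax:"):
            # Switch the syntax used for the lines that follow.
            syntax = line[len("syntax:"):].strip()
            continue
        entries.append((syntax, line))
    return entries
```
]]></programlisting>
<para>Applied to the sample file above, the sketch yields four
<literal>glob</literal> entries, with the Emacs lock-file pattern unquoted to
<literal>.#*</literal>.</para>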
326 </sect1>
328 <sect1 id="sec:names:case">
329 <title>Case sensitivity</title>
331 <para id="x_56a">If you're working in a mixed development environment that
332 contains both Linux (or other Unix) systems and Macs or Windows
333 systems, you should keep in the back of your mind the knowledge
334 that they treat the case (<quote>N</quote> versus
335 <quote>n</quote>) of file names in incompatible ways. This is
336 not very likely to affect you, and it's easy to deal with if it
337 does, but it could surprise you if you don't know about
338 it.</para>
340 <para id="x_56b">Operating systems and filesystems differ in the way they
341 handle the <emphasis>case</emphasis> of characters in file and
342 directory names. There are three common ways to handle case in
343 names.</para>
344 <itemizedlist>
345 <listitem><para id="x_56c">Completely case insensitive. Uppercase and
346 lowercase versions of a letter are treated as identical,
347 both when creating a file and during subsequent accesses.
348 This is common on older DOS-based systems.</para>
349 </listitem>
350 <listitem><para id="x_56d">Case preserving, but insensitive. When a file
351 or directory is created, the case of its name is stored, and
352 can be retrieved and displayed by the operating system.
353 When an existing file is being looked up, its case is
354 ignored. This is the standard arrangement on Windows and
355 MacOS. The names <filename>foo</filename> and
356 <filename>FoO</filename> identify the same file. This
357 treatment of uppercase and lowercase letters as
358 interchangeable is also referred to as <emphasis>case
359 folding</emphasis>.</para>
360 </listitem>
361 <listitem><para id="x_56e">Case sensitive. The case of a name
362 is significant at all times. The names
363 <filename>foo</filename> and <filename>FoO</filename>
364 identify different files. This is the way Linux and Unix
365 systems normally work.</para>
366 </listitem></itemizedlist>
368 <para id="x_56f">On Unix-like systems, it is possible to have any or all of
369 the above ways of handling case in action at once. For example,
370 if you use a USB thumb drive formatted with a FAT32 filesystem
371 on a Linux system, Linux will handle names on that filesystem in
372 a case preserving, but insensitive, way.</para>
374 <sect2>
375 <title>Safe, portable repository storage</title>
377 <para id="x_570">Mercurial's repository storage mechanism is <emphasis>case
378 safe</emphasis>. It translates file names so that they can
379 be safely stored on both case sensitive and case insensitive
380 filesystems. This means that you can use normal file copying
381 tools to transfer a Mercurial repository onto, for example, a
382 USB thumb drive, and safely move that drive and repository
383 back and forth between a Mac, a PC running Windows, and a
384 Linux box.</para>
386 </sect2>
387 <sect2>
388 <title>Detecting case conflicts</title>
390 <para id="x_571">When operating in the working directory, Mercurial honours
391 the naming policy of the filesystem where the working
392 directory is located. If the filesystem is case preserving,
393 but insensitive, Mercurial will treat names that differ only
394 in case as the same.</para>
396 <para id="x_572">An important aspect of this approach is that it is
397 possible to commit a changeset on a case sensitive (typically
398 Linux or Unix) filesystem that will cause trouble for users on
399 case insensitive (usually Windows and MacOS) filesystems. If a
400 Linux user commits changes to two files, one named
401 <filename>myfile.c</filename> and the other named
402 <filename>MyFile.C</filename>, they will be stored correctly
403 in the repository. And in the working directories of other
404 Linux users, they will be correctly represented as separate
405 files.</para>
407 <para id="x_573">If a Windows or Mac user pulls this change, they will not
408 initially have a problem, because Mercurial's repository
409 storage mechanism is case safe. However, once they try to
410 <command role="hg-cmd">hg update</command> the working
411 directory to that changeset, or <command role="hg-cmd">hg
412 merge</command> with that changeset, Mercurial will spot the
413 conflict between the two file names that the filesystem would
414 treat as the same, and forbid the update or merge from
415 occurring.</para>
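<para>The kind of check described here can be approximated in Python by
grouping names that become identical once case is ignored. This is a sketch
under stated assumptions, not Mercurial's algorithm: the function name
<literal>case_conflicts</literal> is hypothetical, and
<literal>str.casefold</literal> only approximates the folding rules of any
real filesystem.</para>
<programlisting language="python"><![CDATA[
```python
from collections import defaultdict


def case_conflicts(names):
    """Return groups of names that differ only in case."""
    groups = defaultdict(list)
    for name in names:
        # Names with the same folded form would collide on a
        # case insensitive filesystem.
        groups[name.casefold()].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]
```
]]></programlisting>
<para>Given <filename>myfile.c</filename>, <filename>MyFile.C</filename> and
<filename>README</filename>, the sketch reports the first two names as a
single conflicting group.</para>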
416 </sect2>
418 <sect2>
419 <title>Fixing a case conflict</title>
421 <para id="x_574">If you are using Windows or a Mac in a mixed environment
422 where some of your collaborators are using Linux or Unix, and
423 Mercurial reports a case folding conflict when you try to
424 <command role="hg-cmd">hg update</command> or <command
425 role="hg-cmd">hg merge</command>, the procedure to fix the
426 problem is simple.</para>
428 <para id="x_575">Just find a nearby Linux or Unix box, clone the problem
429 repository onto it, and use Mercurial's <command
430 role="hg-cmd">hg rename</command> command to change the
431 names of any offending files or directories so that they will
432 no longer cause case folding conflicts. Commit this change,
433 <command role="hg-cmd">hg pull</command> or <command
434 role="hg-cmd">hg push</command> it across to your Windows or
435 MacOS system, and <command role="hg-cmd">hg update</command>
436 to the revision with the non-conflicting names.</para>
438 <para id="x_576">The changeset with case-conflicting names will remain in
439 your project's history, and you still won't be able to
440 <command role="hg-cmd">hg update</command> your working
441 directory to that changeset on a Windows or MacOS system, but
442 you can continue development unimpeded.</para>
443 </sect2>
444 </sect1>
445 </chapter>
447 <!--
448 local variables:
449 sgml-parent-document: ("00book.xml" "book" "chapter")
450 end:
451 -->