hgbook

annotate en/ch07-filenames.xml @ 875:35ddb2ce38fb

Update README
author Dongsheng Song <dongsheng.song@gmail.com>
date Wed Oct 21 11:24:26 2009 +0800 (2009-10-21)
parents 477d6a3e5023
children
rev   line source
bos@559 1 <!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : -->
bos@559 2
bos@559 3 <chapter id="chap:names">
bos@572 4 <?dbhtml filename="file-names-and-pattern-matching.html"?>
bos@559 5 <title>File names and pattern matching</title>
bos@559 6
bos@584 7 <para id="x_543">Mercurial provides mechanisms that let you work with file
bos@559 8 names in a consistent and expressive way.</para>
bos@559 9
bos@559 10 <sect1>
bos@559 11 <title>Simple file naming</title>
bos@559 12
bos@584 13 <para id="x_544">Mercurial uses a unified piece of machinery <quote>under the
bos@559 14 hood</quote> to handle file names. Every command behaves
bos@559 15 uniformly with respect to file names. The way in which commands
bos@559 16 work with file names is as follows.</para>
bos@559 17
bos@584 18 <para id="x_545">If you explicitly name real files on the command line,
bos@559 19 Mercurial works with exactly those files, as you would expect.
bos@567 20 &interaction.filenames.files;</para>
bos@559 21
bos@584 22 <para id="x_546">When you provide a directory name, Mercurial will interpret
bos@559 23 this as <quote>operate on every file in this directory and its
bos@559 24 subdirectories</quote>. Mercurial traverses the files and
bos@559 25 subdirectories in a directory in alphabetical order. When it
bos@559 26 encounters a subdirectory, it will traverse that subdirectory
bos@567 27 before continuing with the current directory.</para>
bos@567 28
bos@567 29 &interaction.filenames.dirs;
bos@675 30 </sect1>
bos@675 31
bos@559 32 <sect1>
bos@559 33 <title>Running commands without any file names</title>
bos@559 34
bos@584 35 <para id="x_547">Mercurial's commands that work with file names have useful
bos@672 36 default behaviors when you invoke them without providing any
bos@672 37 file names or patterns. What kind of behavior you should
bos@559 38 expect depends on what the command does. Here are a few rules
bos@559 39 of thumb you can use to predict what a command is likely to do
bos@559 40 if you don't give it any names to work with.</para>
bos@559 41 <itemizedlist>
bos@584 42 <listitem><para id="x_548">Most commands will operate on the entire working
bos@559 43 directory. This is what the <command role="hg-cmd">hg
bos@559 44 add</command> command does, for example.</para>
bos@559 45 </listitem>
bos@584 46 <listitem><para id="x_549">If the command has effects that are difficult or
bos@559 47 impossible to reverse, it will force you to explicitly
bos@559 48 provide at least one name or pattern (see below). This
bos@559 49 protects you from accidentally deleting files by running
bos@559 50 <command role="hg-cmd">hg remove</command> with no
bos@559 51 arguments, for example.</para>
bos@559 52 </listitem></itemizedlist>
bos@559 53
bos@672 54 <para id="x_54a">It's easy to work around these default behaviors if they
bos@559 55 don't suit you. If a command normally operates on the whole
bos@559 56 working directory, you can invoke it on just the current
bos@559 57 directory and its subdirectories by giving it the name
bos@567 58 <quote><filename class="directory">.</filename></quote>.</para>
bos@567 59
bos@567 60 &interaction.filenames.wdir-subdir;
bos@559 61
bos@584 62 <para id="x_54b">Along the same lines, some commands normally print file
bos@559 63 names relative to the root of the repository, even if you're
bos@559 64 invoking them from a subdirectory. Such a command will print
bos@559 65 file names relative to your subdirectory if you give it explicit
bos@559 66 names. Here, we're going to run <command role="hg-cmd">hg
bos@559 67 status</command> from a subdirectory, and get it to operate on
bos@559 68 the entire working directory while printing file names relative
bos@559 69 to our subdirectory, by passing it the output of the <command
bos@567 70 role="hg-cmd">hg root</command> command.</para>
bos@567 71
bos@567 72 &interaction.filenames.wdir-relname;
bos@675 73 </sect1>
bos@675 74
bos@559 75 <sect1>
bos@559 76 <title>Telling you what's going on</title>
bos@559 77
bos@584 78 <para id="x_54c">The <command role="hg-cmd">hg add</command> example in the
bos@559 79 preceding section illustrates something else that's helpful
bos@559 80 about Mercurial commands. If a command operates on a file that
bos@559 81 you didn't name explicitly on the command line, it will usually
bos@559 82 print the name of the file, so that you will not be surprised
bos@559 83 by what's going on.</para>
bos@559 84
bos@584 85 <para id="x_54d">The principle here is of <emphasis>least
bos@559 86 surprise</emphasis>. If you've exactly named a file on the
bos@559 87 command line, there's no point in repeating it back at you. If
bos@675 88 Mercurial is acting on a file <emphasis>implicitly</emphasis>, e.g.
bos@559 89 because you provided no names, or a directory, or a pattern (see
bos@675 90 below), it is safest to tell you what files it's operating on.</para>
bos@559 91
bos@584 92 <para id="x_54e">For commands that behave this way, you can silence them
bos@559 93 using the <option role="hg-opt-global">-q</option> option. You
bos@559 94 can also get them to print the name of every file, even those
bos@559 95 you've named explicitly, using the <option
bos@559 96 role="hg-opt-global">-v</option> option.</para>
bos@675 97 </sect1>
bos@675 98
bos@559 99 <sect1>
bos@559 100 <title>Using patterns to identify files</title>
bos@559 101
bos@584 102 <para id="x_54f">In addition to working with file and directory names,
bos@559 103 Mercurial lets you use <emphasis>patterns</emphasis> to identify
bos@559 104 files, using either shell-style globs or regular expressions.</para>
bos@559 105
bos@584 106 <para id="x_550">On Unix-like systems (Linux, MacOS, etc.), the job of
bos@559 107 matching file names to patterns normally falls to the shell. On
bos@559 108 these systems, you must explicitly tell Mercurial that a name is
bos@559 109 a pattern. On Windows, the shell does not expand patterns, so
bos@559 110 Mercurial will automatically identify names that are patterns,
bos@559 111 and expand them for you.</para>
bos@559 112
bos@584 113 <para id="x_551">To provide a pattern in place of a regular name on the
bos@559 114 command line, the mechanism is simple:</para>
bos@559 115 <programlisting>syntax:patternbody</programlisting>
bos@584 116 <para id="x_552">That is, a pattern is identified by a short text string that
bos@559 117 says what kind of pattern this is, followed by a colon, followed
bos@559 118 by the actual pattern.</para>
bos@559 119
bos@584 120 <para id="x_553">Mercurial supports two kinds of pattern syntax. The most
bos@559 121 frequently used is called <literal>glob</literal>; this is the
bos@559 122 same kind of pattern matching used by the Unix shell, and should
bos@559 123 be familiar to Windows command prompt users, too.</para>
bos@559 124
bos@584 125 <para id="x_554">When Mercurial does automatic pattern matching on Windows,
bos@559 126 it uses <literal>glob</literal> syntax. You can thus omit the
bos@559 127 <quote><literal>glob:</literal></quote> prefix on Windows, but
bos@559 128 it's safe to use it, too.</para>
bos@559 129
bos@584 130 <para id="x_555">The <literal>re</literal> syntax is more powerful; it lets
bos@559 131 you specify patterns using regular expressions, also known as
bos@559 132 regexps.</para>
bos@559 133
bos@584 134 <para id="x_556">By the way, in the examples that follow, notice that I'm
bos@559 135 careful to wrap all of my patterns in quote characters, so that
bos@559 136 they won't get expanded by the shell before Mercurial sees
bos@559 137 them.</para>
bos@559 138
bos@559 139 <sect2>
bos@559 140 <title>Shell-style <literal>glob</literal> patterns</title>
bos@559 141
bos@584 142 <para id="x_557">This is an overview of the kinds of patterns you can use
bos@559 143 when you're matching on glob patterns.</para>
bos@559 144
bos@584 145 <para id="x_558">The <quote><literal>*</literal></quote> character matches
bos@567 146 any string, within a single directory.</para>
bos@567 147
bos@567 148 &interaction.filenames.glob.star;
bos@559 149
bos@584 150 <para id="x_559">The <quote><literal>**</literal></quote> pattern matches
bos@559 151 any string, and crosses directory boundaries. It's not a
bos@559 152 standard Unix glob token, but it's accepted by several popular
bos@567 153 Unix shells, and is very useful.</para>
bos@567 154
bos@567 155 &interaction.filenames.glob.starstar;
bos@559 156
bos@584 157 <para id="x_55a">The <quote><literal>?</literal></quote> pattern matches
bos@567 158 any single character.</para>
bos@567 159
bos@567 160 &interaction.filenames.glob.question;
bos@559 161
bos@584 162 <para id="x_55b">The <quote><literal>[</literal></quote> character begins a
bos@559 163 <emphasis>character class</emphasis>. This matches any single
bos@559 164 character within the class. The class ends with a
bos@559 165 <quote><literal>]</literal></quote> character. A class may
bos@559 166 contain multiple <emphasis>range</emphasis>s of the form
bos@559 167 <quote><literal>a-f</literal></quote>, which is shorthand for
bos@567 168 <quote><literal>abcdef</literal></quote>.</para>
bos@567 169
bos@567 170 &interaction.filenames.glob.range;
bos@567 171
bos@584 172 <para id="x_55c">If the first character after the
bos@567 173 <quote><literal>[</literal></quote> in a character class is a
bos@567 174 <quote><literal>!</literal></quote>, it
bos@559 175 <emphasis>negates</emphasis> the class, making it match any
bos@559 176 single character not in the class.</para>
bos@559 177
bos@584 178 <para id="x_55d">A <quote><literal>{</literal></quote> begins a group of
bos@559 179 subpatterns, where the whole group matches if any subpattern
bos@559 180 in the group matches. The <quote><literal>,</literal></quote>
bos@567 181 character separates subpatterns, and
bos@567 182 <quote><literal>}</literal></quote> ends the group.</para>
bos@567 183
bos@567 184 &interaction.filenames.glob.group;
bos@559 185
bos@559 186 <sect3>
bos@559 187 <title>Watch out!</title>
bos@559 188
bos@584 189 <para id="x_55e">Don't forget that if you want to match a pattern in any
bos@559 190 directory, you should not be using the
bos@559 191 <quote><literal>*</literal></quote> match-any token, as this
bos@559 192 will only match within one directory. Instead, use the
bos@559 193 <quote><literal>**</literal></quote> token. This small
bos@567 194 example illustrates the difference between the two.</para>
bos@567 195
bos@567 196 &interaction.filenames.glob.star-starstar;
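<para>The difference between the two tokens can be modelled in a few lines of Python. This is only a simplified sketch, not Mercurial's own pattern code: the helper <literal>glob_to_regex</literal> and the file names are hypothetical, and the sketch handles nothing beyond <quote><literal>*</literal></quote>, <quote><literal>**</literal></quote>, and literal characters.</para>

```python
import re

def glob_to_regex(pattern):
    """Translate a glob that uses only '*' and '**' into a compiled regexp."""
    out = []
    i = 0
    while i != len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")       # '**' may cross directory boundaries
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")    # '*' stays within a single directory
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + "$")

# Hypothetical file names, relative to the directory being matched.
names = ["a.py", "src/b.py", "src/lib/c.py"]

star = [n for n in names if glob_to_regex("*.py").match(n)]
starstar = [n for n in names if glob_to_regex("**.py").match(n)]

print(star)      # ['a.py']
print(starstar)  # ['a.py', 'src/b.py', 'src/lib/c.py']
```

<para>In this model, <quote><literal>*</literal></quote> becomes a regexp that refuses to match a <quote><literal>/</literal></quote>, which is why only the top-level file is selected.</para>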
bos@559 197 </sect3>
bos@559 198 </sect2>
bos@675 199
bos@559 200 <sect2>
bos@559 201 <title>Regular expression matching with <literal>re</literal>
bos@559 202 patterns</title>
bos@559 203
bos@584 204 <para id="x_55f">Mercurial accepts the same regular expression syntax as
bos@559 205 the Python programming language (it uses Python's regexp
bos@559 206 engine internally). This is based on the Perl language's
bos@559 207 regexp syntax, which is the most popular dialect in use (it's
bos@559 208 also used in Java, for example).</para>
bos@559 209
bos@584 210 <para id="x_560">I won't discuss Mercurial's regexp dialect in any detail
bos@559 211 here, as regexps are not often used. Perl-style regexps are
bos@559 212 in any case already exhaustively documented on a multitude of
bos@559 213 web sites, and in many books. Instead, I will focus here on a
bos@559 214 few things you should know if you find yourself needing to use
bos@559 215 regexps with Mercurial.</para>
bos@559 216
bos@584 217 <para id="x_561">A regexp is matched against an entire file name, relative
bos@559 218 to the root of the repository. In other words, even if you're
bos@559 219 already in subdirectory <filename
bos@559 220 class="directory">foo</filename>, if you want to match files
bos@559 221 under this directory, your pattern must start with
bos@559 222 <quote><literal>foo/</literal></quote>.</para>
bos@559 223
bos@584 224 <para id="x_562">One thing to note, if you're familiar with Perl-style
bos@559 225 regexps, is that Mercurial's are <emphasis>rooted</emphasis>.
bos@559 226 That is, a regexp starts matching against the beginning of a
bos@559 227 string; it doesn't look for a match anywhere within the
bos@559 228 string. To match anywhere in a string, start your pattern
bos@559 229 with <quote><literal>.*</literal></quote>.</para>
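<para>Since Mercurial uses Python's regexp engine, rooted matching can be approximated with Python's <literal>re.match</literal>, which only matches at the beginning of a string. The following is a minimal sketch under that assumption, not Mercurial's actual matching code; the helper <literal>rooted_matches</literal> and the file names are hypothetical.</para>

```python
import re

# Hypothetical file names, relative to the root of the repository.
names = ["foo/bar.c", "src/foo/baz.c", "README"]

def rooted_matches(pattern, candidates):
    """Return the candidates that the pattern matches from the start of the name."""
    rx = re.compile(pattern)
    return [n for n in candidates if rx.match(n)]

# A rooted pattern only sees names that begin with "foo/".
print(rooted_matches(r"foo/", names))    # ['foo/bar.c']

# A leading ".*" lets the pattern match anywhere in the name.
print(rooted_matches(r".*foo/", names))  # ['foo/bar.c', 'src/foo/baz.c']
```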
bos@559 230 </sect2>
bos@559 231 </sect1>
bos@675 232
bos@559 233 <sect1>
bos@559 234 <title>Filtering files</title>
bos@559 235
bos@584 236 <para id="x_563">Not only does Mercurial give you a variety of ways to
bos@559 237 specify files; it lets you further winnow those files using
bos@559 238 <emphasis>filters</emphasis>. Commands that work with file
bos@559 239 names accept two filtering options.</para>
bos@559 240 <itemizedlist>
bos@584 241 <listitem><para id="x_564"><option role="hg-opt-global">-I</option>, or
bos@559 242 <option role="hg-opt-global">--include</option>, lets you
bos@559 243 specify a pattern that file names must match in order to be
bos@559 244 processed.</para>
bos@559 245 </listitem>
bos@584 246 <listitem><para id="x_565"><option role="hg-opt-global">-X</option>, or
bos@559 247 <option role="hg-opt-global">--exclude</option>, gives you a
bos@559 248 way to <emphasis>avoid</emphasis> processing files, if they
bos@559 249 match this pattern.</para>
bos@559 250 </listitem></itemizedlist>
bos@584 251 <para id="x_566">You can provide multiple <option
bos@559 252 role="hg-opt-global">-I</option> and <option
bos@559 253 role="hg-opt-global">-X</option> options on the command line,
bos@559 254 and intermix them as you please. Mercurial interprets the
bos@559 255 patterns you provide using glob syntax by default (but you can
bos@559 256 use regexps if you need to).</para>
bos@559 257
bos@584 258 <para id="x_567">You can read a <option role="hg-opt-global">-I</option>
bos@559 259 filter as <quote>process only the files that match this
bos@567 260 filter</quote>.</para>
bos@567 261
bos@567 262 &interaction.filenames.filter.include;
bos@567 263
bos@584 264 <para id="x_568">The <option role="hg-opt-global">-X</option> filter is best
bos@559 265 read as <quote>process only the files that don't match this
bos@567 266 pattern</quote>.</para>
bos@567 267
bos@567 268 &interaction.filenames.filter.exclude;
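<para>The way the two filters combine can be sketched in Python. This is an illustrative model only: the function <literal>filter_names</literal> and the file names are hypothetical, and the patterns are written as rooted regexps rather than in Mercurial's default glob syntax. It assumes that a name is processed if it matches at least one include pattern (when any are given) and matches no exclude pattern.</para>

```python
import re

def filter_names(names, include=(), exclude=()):
    """Keep names that match some include pattern (if any) and no exclude pattern."""
    kept = []
    for name in names:
        if include and not any(re.match(p, name) for p in include):
            continue  # include patterns were given, and none matched
        if any(re.match(p, name) for p in exclude):
            continue  # an exclude pattern matched
        kept.append(name)
    return kept

# Hypothetical file names, relative to the root of the repository.
names = ["src/main.c", "src/main.o", "doc/guide.txt"]

print(filter_names(names, include=[r"src/"]))
# ['src/main.c', 'src/main.o']
print(filter_names(names, exclude=[r".*\.o$"]))
# ['src/main.c', 'doc/guide.txt']
print(filter_names(names, include=[r"src/"], exclude=[r".*\.o$"]))
# ['src/main.c']
```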
bos@675 269 </sect1>
bos@675 270
bos@675 271 <sect1>
bos@675 272 <title>Permanently ignoring unwanted files and directories</title>
bos@675 273
bos@675 274 <para id="x_569">When you create a new repository, the chances are
bos@675 275 that over time it will grow to contain files that ought to
bos@675 276 <emphasis>not</emphasis> be managed by Mercurial, but which you
bos@675 277 don't want to see listed every time you run <command>hg
bos@675 278 status</command>. For instance, <quote>build products</quote>
bos@675 279 are files that are created as part of a build but which should
bos@675 280 not be managed by a revision control system. The most common
bos@675 281 build products are output files produced by software tools such
bos@675 282 as compilers. As another example, many text editors litter a
bos@675 283 directory with lock files, temporary working files, and backup
bos@675 284 files, which it makes no sense to manage either.</para>
bos@675 285
bos@676 286 <para id="x_6b4">To have Mercurial permanently ignore such files, create a
bos@675 287 file named <filename>.hgignore</filename> in the root of your
bos@675 288 repository. You <emphasis>should</emphasis> <command>hg
bos@675 289 add</command> this file so that it gets tracked with the rest of
bos@675 290 your repository contents, since your collaborators will probably
bos@675 291 find it useful too.</para>
bos@675 292
bos@676 293 <para id="x_6b5">By default, the <filename>.hgignore</filename> file should
bos@675 294 contain a list of regular expressions, one per line. Empty
bos@675 295 lines are skipped. Most people prefer to describe the files they
bos@675 296 want to ignore using the <quote>glob</quote> syntax that we
bos@675 297 described above, so a typical <filename>.hgignore</filename>
bos@675 298 file will start with this directive:</para>
bos@675 299
bos@675 300 <programlisting>syntax: glob</programlisting>
bos@675 301
bos@676 302 <para id="x_6b6">This tells Mercurial to interpret the lines that follow as
bos@675 303 glob patterns, not regular expressions.</para>
bos@675 304
bos@676 305 <para id="x_6b7">Here is a typical-looking <filename>.hgignore</filename>
bos@675 306 file.</para>
bos@675 307
bos@675 308 <programlisting>syntax: glob
bos@675 309 # This line is a comment, and will be skipped.
bos@675 310 # Empty lines are skipped too.
bos@675 311
bos@675 312 # Backup files left behind by the Emacs editor.
bos@675 313 *~
bos@675 314
bos@675 315 # Lock files used by the Emacs editor.
bos@675 316 # Notice that the "#" character is quoted with a backslash.
bos@675 317 # This prevents it from being interpreted as starting a comment.
bos@675 318 .\#*
bos@675 319
bos@675 320 # Temporary files used by the vim editor.
bos@675 321 .*.swp
bos@675 322
bos@675 323 # A hidden file created by the Mac OS X Finder.
bos@675 324 .DS_Store
bos@675 325 </programlisting>
bos@675 326 </sect1>
bos@675 327
bos@559 328 <sect1 id="sec:names:case">
bos@559 329 <title>Case sensitivity</title>
bos@559 330
bos@584 331 <para id="x_56a">If you're working in a mixed development environment that
bos@559 332 contains both Linux (or other Unix) systems and Macs or Windows
bos@559 333 systems, you should keep in the back of your mind the knowledge
bos@559 334 that they treat the case (<quote>N</quote> versus
bos@559 335 <quote>n</quote>) of file names in incompatible ways. This is
bos@559 336 not very likely to affect you, and it's easy to deal with if it
bos@559 337 does, but it could surprise you if you don't know about
bos@559 338 it.</para>
bos@559 339
bos@584 340 <para id="x_56b">Operating systems and filesystems differ in the way they
bos@559 341 handle the <emphasis>case</emphasis> of characters in file and
bos@559 342 directory names. There are three common ways to handle case in
bos@559 343 names.</para>
bos@559 344 <itemizedlist>
bos@584 345 <listitem><para id="x_56c">Completely case insensitive. Uppercase and
bos@559 346 lowercase versions of a letter are treated as identical,
bos@559 347 both when creating a file and during subsequent accesses.
bos@559 348 This is common on older DOS-based systems.</para>
bos@559 349 </listitem>
bos@584 350 <listitem><para id="x_56d">Case preserving, but insensitive. When a file
bos@559 351 or directory is created, the case of its name is stored, and
bos@559 352 can be retrieved and displayed by the operating system.
bos@559 353 When an existing file is being looked up, its case is
bos@559 354 ignored. This is the standard arrangement on Windows and
bos@559 355 MacOS. The names <filename>foo</filename> and
bos@559 356 <filename>FoO</filename> identify the same file. This
bos@559 357 treatment of uppercase and lowercase letters as
bos@559 358 interchangeable is also referred to as <emphasis>case
bos@559 359 folding</emphasis>.</para>
bos@559 360 </listitem>
bos@701 361 <listitem><para id="x_56e">Case sensitive. The case of a name
bos@701 362 is significant at all times. The names
bos@701 363 <filename>foo</filename> and <filename>FoO</filename>
bos@701 364 identify different files. This is the way Linux and Unix
bos@701 365 systems normally work.</para>
bos@559 366 </listitem></itemizedlist>
bos@559 367
bos@584 368 <para id="x_56f">On Unix-like systems, it is possible to have any or all of
bos@559 369 the above ways of handling case in action at once. For example,
bos@559 370 if you use a USB thumb drive formatted with a FAT32 filesystem
bos@559 371 on a Linux system, Linux will handle names on that filesystem in
bos@559 372 a case preserving, but insensitive, way.</para>
bos@559 373
bos@559 374 <sect2>
bos@559 375 <title>Safe, portable repository storage</title>
bos@559 376
bos@584 377 <para id="x_570">Mercurial's repository storage mechanism is <emphasis>case
bos@559 378 safe</emphasis>. It translates file names so that they can
bos@559 379 be safely stored on both case sensitive and case insensitive
bos@559 380 filesystems. This means that you can use normal file copying
bos@559 381 tools to transfer a Mercurial repository onto, for example, a
bos@559 382 USB thumb drive, and safely move that drive and repository
bos@559 383 back and forth between a Mac, a PC running Windows, and a
bos@559 384 Linux box.</para>
bos@559 385
bos@559 386 </sect2>
bos@559 387 <sect2>
bos@559 388 <title>Detecting case conflicts</title>
bos@559 389
bos@584 390 <para id="x_571">When operating in the working directory, Mercurial honours
bos@559 391 the naming policy of the filesystem where the working
bos@559 392 directory is located. If the filesystem is case preserving,
bos@559 393 but insensitive, Mercurial will treat names that differ only
bos@559 394 in case as the same.</para>
bos@559 395
bos@584 396 <para id="x_572">An important aspect of this approach is that it is
bos@559 397 possible to commit a changeset on a case sensitive (typically
bos@559 398 Linux or Unix) filesystem that will cause trouble for users on
bos@559 399 case insensitive (usually Windows and MacOS) filesystems. If a
bos@559 400 Linux user commits changes to two files, one named
bos@559 401 <filename>myfile.c</filename> and the other named
bos@559 402 <filename>MyFile.C</filename>, they will be stored correctly
bos@559 403 in the repository. And in the working directories of other
bos@559 404 Linux users, they will be correctly represented as separate
bos@559 405 files.</para>
bos@559 406
bos@584 407 <para id="x_573">If a Windows or Mac user pulls this change, they will not
bos@559 408 initially have a problem, because Mercurial's repository
bos@559 409 storage mechanism is case safe. However, once they try to
bos@559 410 <command role="hg-cmd">hg update</command> the working
bos@559 411 directory to that changeset, or <command role="hg-cmd">hg
bos@559 412 merge</command> with that changeset, Mercurial will spot the
bos@559 413 conflict between the two file names that the filesystem would
bos@559 414 treat as the same, and forbid the update or merge from
bos@559 415 occurring.</para>
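<para>If you want to look for such names yourself before committing, the idea can be sketched in a few lines of Python. This is not how Mercurial performs its check; the function <literal>case_conflicts</literal> is hypothetical, and lowercasing is used here as a rough stand-in for the case folding a real filesystem applies.</para>

```python
from collections import defaultdict

def case_conflicts(names):
    """Group file names that differ only in the case of their letters."""
    groups = defaultdict(list)
    for name in names:
        # Names that lowercase to the same string would collide on a
        # case insensitive filesystem.
        groups[name.lower()].append(name)
    return [group for group in groups.values() if len(group) > 1]

print(case_conflicts(["myfile.c", "MyFile.C", "README"]))
# [['myfile.c', 'MyFile.C']]
```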
bos@559 416 </sect2>
bos@675 417
bos@559 418 <sect2>
bos@559 419 <title>Fixing a case conflict</title>
bos@559 420
bos@584 421 <para id="x_574">If you are using Windows or a Mac in a mixed environment
bos@559 422 where some of your collaborators are using Linux or Unix, and
bos@559 423 Mercurial reports a case folding conflict when you try to
bos@559 424 <command role="hg-cmd">hg update</command> or <command
bos@559 425 role="hg-cmd">hg merge</command>, the procedure to fix the
bos@559 426 problem is simple.</para>
bos@559 427
bos@584 428 <para id="x_575">Just find a nearby Linux or Unix box, clone the problem
bos@559 429 repository onto it, and use Mercurial's <command
bos@559 430 role="hg-cmd">hg rename</command> command to change the
bos@559 431 names of any offending files or directories so that they will
bos@559 432 no longer cause case folding conflicts. Commit this change,
bos@559 433 <command role="hg-cmd">hg pull</command> or <command
bos@559 434 role="hg-cmd">hg push</command> it across to your Windows or
bos@559 435 MacOS system, and <command role="hg-cmd">hg update</command>
bos@559 436 to the revision with the non-conflicting names.</para>
bos@559 437
bos@584 438 <para id="x_576">The changeset with case-conflicting names will remain in
bos@559 439 your project's history, and you still won't be able to
bos@559 440 <command role="hg-cmd">hg update</command> your working
bos@559 441 directory to that changeset on a Windows or MacOS system, but
bos@559 442 you can continue development unimpeded.</para>
bos@559 443 </sect2>
bos@559 444 </sect1>
bos@559 445 </chapter>
bos@559 446
bos@559 447 <!--
bos@559 448 local variables:
bos@559 449 sgml-parent-document: ("00book.xml" "book" "chapter")
bos@559 450 end:
bos@559 451 -->