hgbook

diff en/ch07-filenames.xml @ 977:719b03ea27c8

merge with Italian, and very (few) work on ch03
author Romain PELISSE <belaran@gmail.com>
date Fri Sep 04 16:33:46 2009 +0200 (2009-09-04)
parents 477d6a3e5023
children
line diff
     1.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
     1.2 +++ b/en/ch07-filenames.xml	Fri Sep 04 16:33:46 2009 +0200
     1.3 @@ -0,0 +1,451 @@
     1.4 +<!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : -->
     1.5 +
     1.6 +<chapter id="chap:names">
     1.7 +  <?dbhtml filename="file-names-and-pattern-matching.html"?>
     1.8 +  <title>File names and pattern matching</title>
     1.9 +
    1.10 +  <para id="x_543">Mercurial provides mechanisms that let you work with file
    1.11 +    names in a consistent and expressive way.</para>
    1.12 +
    1.13 +  <sect1>
    1.14 +    <title>Simple file naming</title>
    1.15 +
    1.16 +    <para id="x_544">Mercurial uses a unified piece of machinery <quote>under the
    1.17 +	hood</quote> to handle file names.  Every command behaves
    1.18 +      uniformly with respect to file names.  The way in which commands
    1.19 +      work with file names is as follows.</para>
    1.20 +
    1.21 +    <para id="x_545">If you explicitly name real files on the command line,
    1.22 +      Mercurial works with exactly those files, as you would expect.
    1.23 +      &interaction.filenames.files;</para>
    1.24 +
    1.25 +    <para id="x_546">When you provide a directory name, Mercurial will interpret
    1.26 +      this as <quote>operate on every file in this directory and its
    1.27 +	subdirectories</quote>. Mercurial traverses the files and
    1.28 +      subdirectories in a directory in alphabetical order.  When it
    1.29 +      encounters a subdirectory, it will traverse that subdirectory
    1.30 +      before continuing with the current directory.</para>
    1.31 +
    1.32 +      &interaction.filenames.dirs;
    1.33 +  </sect1>
    1.34 +
    1.35 +  <sect1>
    1.36 +    <title>Running commands without any file names</title>
    1.37 +
    1.38 +    <para id="x_547">Mercurial's commands that work with file names have useful
    1.39 +      default behaviors when you invoke them without providing any
    1.40 +      file names or patterns.  What kind of behavior you should
    1.41 +      expect depends on what the command does.  Here are a few rules
    1.42 +      of thumb you can use to predict what a command is likely to do
    1.43 +      if you don't give it any names to work with.</para>
    1.44 +    <itemizedlist>
    1.45 +      <listitem><para id="x_548">Most commands will operate on the entire working
    1.46 +	  directory. This is what the <command role="hg-cmd">hg
    1.47 +	    add</command> command does, for example.</para>
    1.48 +      </listitem>
    1.49 +      <listitem><para id="x_549">If the command has effects that are difficult or
    1.50 +	  impossible to reverse, it will force you to explicitly
    1.51 +	  provide at least one name or pattern (see below).  This
    1.52 +	  protects you from accidentally deleting files by running
    1.53 +	  <command role="hg-cmd">hg remove</command> with no
    1.54 +	  arguments, for example.</para>
    1.55 +      </listitem></itemizedlist>
    1.56 +
    1.57 +    <para id="x_54a">It's easy to work around these default behaviors if they
    1.58 +      don't suit you.  If a command normally operates on the whole
    1.59 +      working directory, you can invoke it on just the current
    1.60 +      directory and its subdirectories by giving it the name
    1.61 +      <quote><filename class="directory">.</filename></quote>.</para>
    1.62 +
    1.63 +    &interaction.filenames.wdir-subdir;
    1.64 +
    1.65 +    <para id="x_54b">Along the same lines, some commands normally print file
    1.66 +      names relative to the root of the repository, even if you're
    1.67 +      invoking them from a subdirectory.  Such a command will print
    1.68 +      file names relative to your subdirectory if you give it explicit
    1.69 +      names.  Here, we're going to run <command role="hg-cmd">hg
    1.70 +	status</command> from a subdirectory, and get it to operate on
    1.71 +      the entire working directory while printing file names relative
    1.72 +      to our subdirectory, by passing it the output of the <command
    1.73 +	role="hg-cmd">hg root</command> command.</para>
    1.74 +
    1.75 +      &interaction.filenames.wdir-relname;
    1.76 +  </sect1>
    1.77 +
    1.78 +  <sect1>
    1.79 +    <title>Telling you what's going on</title>
    1.80 +
    1.81 +    <para id="x_54c">The <command role="hg-cmd">hg add</command> example in the
    1.82 +      preceding section illustrates something else that's helpful
    1.83 +      about Mercurial commands.  If a command operates on a file that
    1.84 +      you didn't name explicitly on the command line, it will usually
    1.85 +      print the name of the file, so that you will not be surprised
     1.86 +      by what's going on.</para>
    1.87 +
    1.88 +    <para id="x_54d">The principle here is of <emphasis>least
    1.89 +	surprise</emphasis>.  If you've exactly named a file on the
     1.90 +      command line, there's no point in repeating it back to you.  If
    1.91 +      Mercurial is acting on a file <emphasis>implicitly</emphasis>, e.g.
    1.92 +      because you provided no names, or a directory, or a pattern (see
    1.93 +      below), it is safest to tell you what files it's operating on.</para>
    1.94 +
    1.95 +    <para id="x_54e">For commands that behave this way, you can silence them
    1.96 +      using the <option role="hg-opt-global">-q</option> option.  You
    1.97 +      can also get them to print the name of every file, even those
    1.98 +      you've named explicitly, using the <option
    1.99 +	role="hg-opt-global">-v</option> option.</para>
   1.100 +  </sect1>
   1.101 +
   1.102 +  <sect1>
   1.103 +    <title>Using patterns to identify files</title>
   1.104 +
   1.105 +    <para id="x_54f">In addition to working with file and directory names,
   1.106 +      Mercurial lets you use <emphasis>patterns</emphasis> to identify
   1.107 +      files.  Mercurial's pattern handling is expressive.</para>
   1.108 +
   1.109 +    <para id="x_550">On Unix-like systems (Linux, MacOS, etc.), the job of
   1.110 +      matching file names to patterns normally falls to the shell.  On
   1.111 +      these systems, you must explicitly tell Mercurial that a name is
   1.112 +      a pattern.  On Windows, the shell does not expand patterns, so
   1.113 +      Mercurial will automatically identify names that are patterns,
   1.114 +      and expand them for you.</para>
   1.115 +
   1.116 +    <para id="x_551">To provide a pattern in place of a regular name on the
   1.117 +      command line, the mechanism is simple:</para>
   1.118 +    <programlisting>syntax:patternbody</programlisting>
   1.119 +    <para id="x_552">That is, a pattern is identified by a short text string that
   1.120 +      says what kind of pattern this is, followed by a colon, followed
   1.121 +      by the actual pattern.</para>
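The prefix convention just described can be sketched in a few lines of Python. This is only an illustration, not Mercurial's own parser: the helper name split_pattern, the KNOWN_KINDS tuple, and the "path" label for a plain name are all hypothetical, chosen for this example.

```python
# Hypothetical helper for illustration; not part of Mercurial's API.
KNOWN_KINDS = ("glob", "re")  # the two syntaxes described in this chapter


def split_pattern(text, default_kind="path"):
    """Split 'kind:body' into (kind, body).

    A name without a recognised prefix is returned unchanged and
    labelled with default_kind (an assumed label for a plain name).
    """
    kind, sep, body = text.partition(":")
    if sep and kind in KNOWN_KINDS:
        return kind, body
    return default_kind, text


print(split_pattern("glob:*.py"))
print(split_pattern("re:.*\\.c$"))
print(split_pattern("README"))
```

Checking the prefix against a list of known kinds keeps a name such as C:file from being mistaken for a pattern.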
   1.122 +
   1.123 +    <para id="x_553">Mercurial supports two kinds of pattern syntax.  The most
   1.124 +      frequently used is called <literal>glob</literal>; this is the
   1.125 +      same kind of pattern matching used by the Unix shell, and should
   1.126 +      be familiar to Windows command prompt users, too.</para>
   1.127 +
   1.128 +    <para id="x_554">When Mercurial does automatic pattern matching on Windows,
   1.129 +      it uses <literal>glob</literal> syntax.  You can thus omit the
   1.130 +      <quote><literal>glob:</literal></quote> prefix on Windows, but
   1.131 +      it's safe to use it, too.</para>
   1.132 +
   1.133 +    <para id="x_555">The <literal>re</literal> syntax is more powerful; it lets
   1.134 +      you specify patterns using regular expressions, also known as
   1.135 +      regexps.</para>
   1.136 +
   1.137 +    <para id="x_556">By the way, in the examples that follow, notice that I'm
   1.138 +      careful to wrap all of my patterns in quote characters, so that
   1.139 +      they won't get expanded by the shell before Mercurial sees
   1.140 +      them.</para>
   1.141 +
   1.142 +    <sect2>
   1.143 +      <title>Shell-style <literal>glob</literal> patterns</title>
   1.144 +
   1.145 +      <para id="x_557">This is an overview of the kinds of patterns you can use
   1.146 +	when you're matching on glob patterns.</para>
   1.147 +
   1.148 +      <para id="x_558">The <quote><literal>*</literal></quote> character matches
   1.149 +	any string, within a single directory.</para>
   1.150 +
   1.151 +      &interaction.filenames.glob.star;
   1.152 +
   1.153 +      <para id="x_559">The <quote><literal>**</literal></quote> pattern matches
   1.154 +	any string, and crosses directory boundaries.  It's not a
   1.155 +	standard Unix glob token, but it's accepted by several popular
   1.156 +	Unix shells, and is very useful.</para>
   1.157 +
   1.158 +      &interaction.filenames.glob.starstar;
   1.159 +
   1.160 +      <para id="x_55a">The <quote><literal>?</literal></quote> pattern matches
   1.161 +	any single character.</para>
   1.162 +
   1.163 +      &interaction.filenames.glob.question;
   1.164 +
   1.165 +      <para id="x_55b">The <quote><literal>[</literal></quote> character begins a
   1.166 +	<emphasis>character class</emphasis>.  This matches any single
   1.167 +	character within the class.  The class ends with a
   1.168 +	<quote><literal>]</literal></quote> character.  A class may
   1.169 +	contain multiple <emphasis>range</emphasis>s of the form
   1.170 +	<quote><literal>a-f</literal></quote>, which is shorthand for
   1.171 +	<quote><literal>abcdef</literal></quote>.</para>
   1.172 +
   1.173 +	&interaction.filenames.glob.range;
   1.174 +
   1.175 +      <para id="x_55c">If the first character after the
   1.176 +	<quote><literal>[</literal></quote> in a character class is a
   1.177 +	<quote><literal>!</literal></quote>, it
   1.178 +	<emphasis>negates</emphasis> the class, making it match any
   1.179 +	single character not in the class.</para>
   1.180 +
   1.181 +      <para id="x_55d">A <quote><literal>{</literal></quote> begins a group of
   1.182 +	subpatterns, where the whole group matches if any subpattern
   1.183 +	in the group matches.  The <quote><literal>,</literal></quote>
   1.184 +	character separates subpatterns, and
   1.185 +	<quote><literal>}</literal></quote> ends the group.</para>
   1.186 +
   1.187 +      &interaction.filenames.glob.group;
   1.188 +
   1.189 +      <sect3>
   1.190 +	<title>Watch out!</title>
   1.191 +
   1.192 +	<para id="x_55e">Don't forget that if you want to match a pattern in any
   1.193 +	  directory, you should not be using the
   1.194 +	  <quote><literal>*</literal></quote> match-any token, as this
   1.195 +	  will only match within one directory.  Instead, use the
   1.196 +	  <quote><literal>**</literal></quote> token.  This small
   1.197 +	  example illustrates the difference between the two.</para>
   1.198 +
   1.199 +	  &interaction.filenames.glob.star-starstar;
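To make the difference between the two tokens concrete, here is a minimal sketch that translates a tiny subset of glob syntax into a regexp. It assumes only the behaviour described above (a single star stays within one directory, a double star crosses directory boundaries, a question mark matches one character). The function glob_to_regex and the file list are hypothetical, and this is not the translation Mercurial itself performs.

```python
import re


def glob_to_regex(glob):
    """Translate a tiny glob subset (*, **, ?) into a compiled regexp."""
    parts = []
    i = 0
    while i != len(glob):
        if glob.startswith("**", i):
            parts.append(".*")       # may cross directory boundaries
            i += 2
        elif glob[i] == "*":
            parts.append("[^/]*")    # stays within a single directory
            i += 1
        elif glob[i] == "?":
            parts.append(".")        # any single character
            i += 1
        else:
            parts.append(re.escape(glob[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


files = ["a.py", "src/b.py", "src/lib/c.py", "notes.txt"]
single = [f for f in files if glob_to_regex("*.py").match(f)]
double = [f for f in files if glob_to_regex("**.py").match(f)]
print(single)
print(double)
```

With this sample list, the single-star pattern selects only the top-level a.py, while the double-star pattern also selects the files under src.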
   1.200 +      </sect3>
   1.201 +    </sect2>
   1.202 +
   1.203 +    <sect2>
   1.204 +      <title>Regular expression matching with <literal>re</literal>
   1.205 +	patterns</title>
   1.206 +
   1.207 +      <para id="x_55f">Mercurial accepts the same regular expression syntax as
   1.208 +	the Python programming language (it uses Python's regexp
   1.209 +	engine internally). This is based on the Perl language's
   1.210 +	regexp syntax, which is the most popular dialect in use (it's
   1.211 +	also used in Java, for example).</para>
   1.212 +
   1.213 +      <para id="x_560">I won't discuss Mercurial's regexp dialect in any detail
   1.214 +	here, as regexps are not often used.  Perl-style regexps are
   1.215 +	in any case already exhaustively documented on a multitude of
   1.216 +	web sites, and in many books.  Instead, I will focus here on a
   1.217 +	few things you should know if you find yourself needing to use
   1.218 +	regexps with Mercurial.</para>
   1.219 +
   1.220 +      <para id="x_561">A regexp is matched against an entire file name, relative
   1.221 +	to the root of the repository.  In other words, even if you're
    1.222 +	already in subdirectory <filename
   1.223 +	  class="directory">foo</filename>, if you want to match files
   1.224 +	under this directory, your pattern must start with
   1.225 +	<quote><literal>foo/</literal></quote>.</para>
   1.226 +
   1.227 +      <para id="x_562">One thing to note, if you're familiar with Perl-style
   1.228 +	regexps, is that Mercurial's are <emphasis>rooted</emphasis>.
   1.229 +	That is, a regexp starts matching against the beginning of a
   1.230 +	string; it doesn't look for a match anywhere within the
   1.231 +	string.  To match anywhere in a string, start your pattern
   1.232 +	with <quote><literal>.*</literal></quote>.</para>
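Rooted matching is easy to see with Python's own re module, whose match function anchors at the start of the string while search looks anywhere. The snippet below only illustrates the concept on a made-up file name; it does not show how Mercurial invokes the regexp engine.

```python
import re

name = "src/lib/util.c"  # hypothetical path, relative to the repository root

# A rooted match must succeed starting at the first character of the name.
rooted_miss = re.match(r"util\.c", name)
# Prefixing the pattern with ".*" lets it match anywhere in the name.
rooted_hit = re.match(r".*util\.c", name)
# For comparison, an unrooted search finds the text without the prefix.
unrooted_hit = re.search(r"util\.c", name)

print(rooted_miss is None)
print(rooted_hit is not None)
print(unrooted_hit is not None)
```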
   1.233 +    </sect2>
   1.234 +  </sect1>
   1.235 +
   1.236 +  <sect1>
   1.237 +    <title>Filtering files</title>
   1.238 +
   1.239 +    <para id="x_563">Not only does Mercurial give you a variety of ways to
   1.240 +      specify files; it lets you further winnow those files using
   1.241 +      <emphasis>filters</emphasis>.  Commands that work with file
   1.242 +      names accept two filtering options.</para>
   1.243 +    <itemizedlist>
   1.244 +      <listitem><para id="x_564"><option role="hg-opt-global">-I</option>, or
   1.245 +	  <option role="hg-opt-global">--include</option>, lets you
   1.246 +	  specify a pattern that file names must match in order to be
   1.247 +	  processed.</para>
   1.248 +      </listitem>
   1.249 +      <listitem><para id="x_565"><option role="hg-opt-global">-X</option>, or
   1.250 +	  <option role="hg-opt-global">--exclude</option>, gives you a
   1.251 +	  way to <emphasis>avoid</emphasis> processing files, if they
   1.252 +	  match this pattern.</para>
   1.253 +      </listitem></itemizedlist>
   1.254 +    <para id="x_566">You can provide multiple <option
   1.255 +	role="hg-opt-global">-I</option> and <option
   1.256 +	role="hg-opt-global">-X</option> options on the command line,
   1.257 +      and intermix them as you please.  Mercurial interprets the
   1.258 +      patterns you provide using glob syntax by default (but you can
   1.259 +      use regexps if you need to).</para>
   1.260 +
   1.261 +    <para id="x_567">You can read a <option role="hg-opt-global">-I</option>
   1.262 +      filter as <quote>process only the files that match this
   1.263 +	filter</quote>.</para>
   1.264 +
   1.265 +    &interaction.filenames.filter.include;
   1.266 +
   1.267 +    <para id="x_568">The <option role="hg-opt-global">-X</option> filter is best
   1.268 +      read as <quote>process only the files that don't match this
   1.269 +	pattern</quote>.</para>
   1.270 +
   1.271 +    &interaction.filenames.filter.exclude;
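The way the two filters combine can be sketched as follows. This is an illustration under assumptions, not Mercurial's implementation: the select function and the file list are hypothetical, and Python's fnmatch module lets a star match across directory separators, which differs from the glob rules described earlier in this chapter.

```python
from fnmatch import fnmatchcase


def select(files, include=(), exclude=()):
    """Keep names matching any include pattern (if given) and no exclude pattern."""
    kept = []
    for name in files:
        # With at least one include pattern, a name must match one of them.
        if include and not any(fnmatchcase(name, p) for p in include):
            continue
        # A name matching any exclude pattern is always dropped.
        if any(fnmatchcase(name, p) for p in exclude):
            continue
        kept.append(name)
    return kept


files = ["Makefile", "src/main.c", "src/main.h", "src/util.c"]
print(select(files, include=["src/*"]))
print(select(files, exclude=["*.h"]))
print(select(files, include=["src/*"], exclude=["*.c"]))
```

In this sketch an exclude pattern takes priority: a name that matches both an include and an exclude pattern is dropped.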
   1.272 +  </sect1>
   1.273 +
   1.274 +  <sect1>
   1.275 +    <title>Permanently ignoring unwanted files and directories</title>
   1.276 +
   1.277 +    <para id="x_569">When you create a new repository, the chances are
   1.278 +      that over time it will grow to contain files that ought to
   1.279 +      <emphasis>not</emphasis> be managed by Mercurial, but which you
   1.280 +      don't want to see listed every time you run <command>hg
   1.281 +	status</command>.  For instance, <quote>build products</quote>
   1.282 +      are files that are created as part of a build but which should
   1.283 +      not be managed by a revision control system.  The most common
   1.284 +      build products are output files produced by software tools such
   1.285 +      as compilers.  As another example, many text editors litter a
   1.286 +      directory with lock files, temporary working files, and backup
   1.287 +      files, which it also makes no sense to manage.</para>
   1.288 +
   1.289 +    <para id="x_6b4">To have Mercurial permanently ignore such files, create a
   1.290 +      file named <filename>.hgignore</filename> in the root of your
   1.291 +      repository.  You <emphasis>should</emphasis> <command>hg
   1.292 +      add</command> this file so that it gets tracked with the rest of
   1.293 +      your repository contents, since your collaborators will probably
   1.294 +      find it useful too.</para>
   1.295 +
   1.296 +    <para id="x_6b5">By default, the <filename>.hgignore</filename> file should
   1.297 +      contain a list of regular expressions, one per line.  Empty
   1.298 +      lines are skipped. Most people prefer to describe the files they
   1.299 +      want to ignore using the <quote>glob</quote> syntax that we
   1.300 +      described above, so a typical <filename>.hgignore</filename>
   1.301 +      file will start with this directive:</para>
   1.302 +
   1.303 +    <programlisting>syntax: glob</programlisting>
   1.304 +
   1.305 +    <para id="x_6b6">This tells Mercurial to interpret the lines that follow as
   1.306 +      glob patterns, not regular expressions.</para>
   1.307 +
   1.308 +    <para id="x_6b7">Here is a typical-looking <filename>.hgignore</filename>
   1.309 +      file.</para>
   1.310 +
   1.311 +    <programlisting>syntax: glob
   1.312 +# This line is a comment, and will be skipped.
   1.313 +# Empty lines are skipped too.
   1.314 +
   1.315 +# Backup files left behind by the Emacs editor.
   1.316 +*~
   1.317 +
   1.318 +# Lock files used by the Emacs editor.
   1.319 +# Notice that the "#" character is quoted with a backslash.
   1.320 +# This prevents it from being interpreted as starting a comment.
   1.321 +.\#*
   1.322 +
   1.323 +# Temporary files used by the vim editor.
   1.324 +.*.swp
   1.325 +
   1.326 +# A hidden file created by the Mac OS X Finder.
   1.327 +.DS_Store
   1.328 +</programlisting>
   1.329 +  </sect1>
   1.330 +
   1.331 +  <sect1 id="sec:names:case">
   1.332 +    <title>Case sensitivity</title>
   1.333 +
   1.334 +    <para id="x_56a">If you're working in a mixed development environment that
   1.335 +      contains both Linux (or other Unix) systems and Macs or Windows
   1.336 +      systems, you should keep in the back of your mind the knowledge
   1.337 +      that they treat the case (<quote>N</quote> versus
   1.338 +      <quote>n</quote>) of file names in incompatible ways.  This is
   1.339 +      not very likely to affect you, and it's easy to deal with if it
   1.340 +      does, but it could surprise you if you don't know about
   1.341 +      it.</para>
   1.342 +
   1.343 +    <para id="x_56b">Operating systems and filesystems differ in the way they
   1.344 +      handle the <emphasis>case</emphasis> of characters in file and
   1.345 +      directory names.  There are three common ways to handle case in
   1.346 +      names.</para>
   1.347 +    <itemizedlist>
   1.348 +      <listitem><para id="x_56c">Completely case insensitive.  Uppercase and
   1.349 +	  lowercase versions of a letter are treated as identical,
   1.350 +	  both when creating a file and during subsequent accesses.
   1.351 +	  This is common on older DOS-based systems.</para>
   1.352 +      </listitem>
   1.353 +      <listitem><para id="x_56d">Case preserving, but insensitive.  When a file
   1.354 +	  or directory is created, the case of its name is stored, and
   1.355 +	  can be retrieved and displayed by the operating system.
   1.356 +	  When an existing file is being looked up, its case is
   1.357 +	  ignored.  This is the standard arrangement on Windows and
   1.358 +	  MacOS.  The names <filename>foo</filename> and
   1.359 +	  <filename>FoO</filename> identify the same file.  This
   1.360 +	  treatment of uppercase and lowercase letters as
   1.361 +	  interchangeable is also referred to as <emphasis>case
   1.362 +	    folding</emphasis>.</para>
   1.363 +      </listitem>
   1.364 +      <listitem><para id="x_56e">Case sensitive.  The case of a name
   1.365 +	  is significant at all times. The names
   1.366 +	  <filename>foo</filename> and <filename>FoO</filename>
   1.367 +	  identify different files.  This is the way Linux and Unix
   1.368 +	  systems normally work.</para>
   1.369 +      </listitem></itemizedlist>
   1.370 +
   1.371 +    <para id="x_56f">On Unix-like systems, it is possible to have any or all of
   1.372 +      the above ways of handling case in action at once.  For example,
   1.373 +      if you use a USB thumb drive formatted with a FAT32 filesystem
   1.374 +      on a Linux system, Linux will handle names on that filesystem in
   1.375 +      a case preserving, but insensitive, way.</para>
   1.376 +
   1.377 +    <sect2>
   1.378 +      <title>Safe, portable repository storage</title>
   1.379 +
   1.380 +      <para id="x_570">Mercurial's repository storage mechanism is <emphasis>case
   1.381 +	  safe</emphasis>.  It translates file names so that they can
   1.382 +	be safely stored on both case sensitive and case insensitive
   1.383 +	filesystems.  This means that you can use normal file copying
   1.384 +	tools to transfer a Mercurial repository onto, for example, a
   1.385 +	USB thumb drive, and safely move that drive and repository
   1.386 +	back and forth between a Mac, a PC running Windows, and a
   1.387 +	Linux box.</para>
   1.388 +
   1.389 +    </sect2>
   1.390 +    <sect2>
   1.391 +      <title>Detecting case conflicts</title>
   1.392 +
   1.393 +      <para id="x_571">When operating in the working directory, Mercurial honours
   1.394 +	the naming policy of the filesystem where the working
   1.395 +	directory is located.  If the filesystem is case preserving,
   1.396 +	but insensitive, Mercurial will treat names that differ only
   1.397 +	in case as the same.</para>
   1.398 +
   1.399 +      <para id="x_572">An important aspect of this approach is that it is
   1.400 +	possible to commit a changeset on a case sensitive (typically
   1.401 +	Linux or Unix) filesystem that will cause trouble for users on
    1.402 +	case insensitive (usually Windows and MacOS) filesystems.  If a
   1.403 +	Linux user commits changes to two files, one named
   1.404 +	<filename>myfile.c</filename> and the other named
   1.405 +	<filename>MyFile.C</filename>, they will be stored correctly
   1.406 +	in the repository.  And in the working directories of other
   1.407 +	Linux users, they will be correctly represented as separate
   1.408 +	files.</para>
   1.409 +
   1.410 +      <para id="x_573">If a Windows or Mac user pulls this change, they will not
   1.411 +	initially have a problem, because Mercurial's repository
   1.412 +	storage mechanism is case safe.  However, once they try to
   1.413 +	<command role="hg-cmd">hg update</command> the working
   1.414 +	directory to that changeset, or <command role="hg-cmd">hg
   1.415 +	  merge</command> with that changeset, Mercurial will spot the
   1.416 +	conflict between the two file names that the filesystem would
   1.417 +	treat as the same, and forbid the update or merge from
   1.418 +	occurring.</para>
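The kind of check involved can be approximated by grouping names after folding their case. The sketch below is not Mercurial's actual test: the function case_conflicts is hypothetical, and Python's str.casefold is only an approximation of how a real case insensitive filesystem compares names.

```python
from collections import defaultdict


def case_conflicts(names):
    """Return groups of names that differ only in case."""
    groups = defaultdict(list)
    for name in names:
        # Names with the same folded form would collide on a
        # case insensitive filesystem.
        groups[name.casefold()].append(name)
    return [group for group in groups.values() if len(group) != 1]


conflicts = case_conflicts(["myfile.c", "MyFile.C", "README"])
print(conflicts)
```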
   1.419 +    </sect2>
   1.420 +
   1.421 +    <sect2>
   1.422 +      <title>Fixing a case conflict</title>
   1.423 +
   1.424 +      <para id="x_574">If you are using Windows or a Mac in a mixed environment
   1.425 +	where some of your collaborators are using Linux or Unix, and
   1.426 +	Mercurial reports a case folding conflict when you try to
   1.427 +	<command role="hg-cmd">hg update</command> or <command
   1.428 +	  role="hg-cmd">hg merge</command>, the procedure to fix the
   1.429 +	problem is simple.</para>
   1.430 +
   1.431 +      <para id="x_575">Just find a nearby Linux or Unix box, clone the problem
   1.432 +	repository onto it, and use Mercurial's <command
   1.433 +	  role="hg-cmd">hg rename</command> command to change the
   1.434 +	names of any offending files or directories so that they will
   1.435 +	no longer cause case folding conflicts.  Commit this change,
   1.436 +	<command role="hg-cmd">hg pull</command> or <command
   1.437 +	  role="hg-cmd">hg push</command> it across to your Windows or
   1.438 +	MacOS system, and <command role="hg-cmd">hg update</command>
   1.439 +	to the revision with the non-conflicting names.</para>
   1.440 +
   1.441 +      <para id="x_576">The changeset with case-conflicting names will remain in
   1.442 +	your project's history, and you still won't be able to
   1.443 +	<command role="hg-cmd">hg update</command> your working
   1.444 +	directory to that changeset on a Windows or MacOS system, but
   1.445 +	you can continue development unimpeded.</para>
   1.446 +    </sect2>
   1.447 +  </sect1>
   1.448 +</chapter>
   1.449 +
   1.450 +<!--
   1.451 +local variables: 
   1.452 +sgml-parent-document: ("00book.xml" "book" "chapter")
   1.453 +end:
   1.454 +-->