hgbook

annotate en/filenames.tex @ 161:7f07aca44938

Write up the unpleasant effects of change ef1f1a4b2efb in the hg tree.
author Bryan O'Sullivan <bos@serpentine.com>
date Mon Mar 26 21:58:24 2007 -0700 (2007-03-26)
rev   line source
bos@133 1 \chapter{File names and pattern matching}
bos@133 2 \label{chap:names}
bos@133 3
bos@133 4 Mercurial provides mechanisms that let you work with file names in a
bos@133 5 consistent and expressive way.
bos@133 6
bos@133 7 \section{Simple file naming}
bos@133 8
bos@133 9 Mercurial uses a unified piece of machinery ``under the hood'' to
bos@133 10 handle file names. Every command behaves uniformly with respect to
bos@133 11 file names. The way in which commands work with file names is as
bos@133 12 follows.
bos@133 13
bos@133 14 If you explicitly name real files on the command line, Mercurial works
bos@133 15 with exactly those files, as you would expect.
bos@133 16 \interaction{filenames.files}
bos@133 17
bos@133 18 When you provide a directory name, Mercurial will interpret this as
bos@133 19 ``operate on every file in this directory and its subdirectories''.
bos@133 20 Mercurial traverses the files and subdirectories in a directory in
bos@133 21 alphabetical order. When it encounters a subdirectory, it will
bos@133 22 traverse that subdirectory before continuing with the current
bos@133 23 directory.
bos@133 24 \interaction{filenames.dirs}
bos@133 25
bos@133 26 \section{Running commands without any file names}
bos@133 27
bos@133 28 Mercurial's commands that work with file names have useful default
bos@133 29 behaviours when you invoke them without providing any file names or
bos@133 30 patterns. What kind of behaviour you should expect depends on what
bos@133 31 the command does. Here are a few rules of thumb you can use to
bos@133 32 predict what a command is likely to do if you don't give it any names
bos@133 33 to work with.
bos@133 34 \begin{itemize}
bos@133 35 \item Most commands will operate on the entire working directory.
bos@133 36 This is what the \hgcmd{add} command does, for example.
bos@133 37 \item If the command has effects that are difficult or impossible to
bos@133 38 reverse, it will force you to explicitly provide at least one name
bos@133 39 or pattern (see below). This protects you from accidentally
bos@133 40 deleting files by running \hgcmd{remove} with no arguments, for
bos@133 41 example.
bos@133 42 \end{itemize}
bos@133 43
bos@133 44 It's easy to work around these default behaviours if they don't suit
bos@133 45 you. If a command normally operates on the whole working directory,
bos@133 46 you can invoke it on just the current directory and its subdirectories
bos@133 47 by giving it the name ``\dirname{.}''.
bos@133 48 \interaction{filenames.wdir-subdir}
bos@133 49
bos@133 50 Along the same lines, some commands normally print file names relative
bos@133 51 to the root of the repository, even if you're invoking them from a
bos@133 52 subdirectory. Such a command will print file names relative to your
bos@133 53 subdirectory if you give it explicit names. Here, we're going to run
bos@133 54 \hgcmd{status} from a subdirectory, and get it to operate on the
bos@133 55 entire working directory while printing file names relative to our
bos@133 56 subdirectory, by passing it the output of the \hgcmd{root} command.
bos@133 57 \interaction{filenames.wdir-relname}
bos@133 58
bos@133 59 \section{Telling you what's going on}
bos@133 60
bos@133 61 The \hgcmd{add} example in the preceding section illustrates something
bos@133 62 else that's helpful about Mercurial commands. If a command operates
bos@133 63 on a file that you didn't name explicitly on the command line, it will
bos@133 64 usually print the name of the file, so that you will not be surprised
bos@133 65 by what's going on.
bos@133 66
bos@133 67 The principle here is of \emph{least surprise}. If you've exactly
bos@133 68 named a file on the command line, there's no point in repeating it
bos@133 69 back at you. If Mercurial is acting on a file \emph{implicitly},
bos@133 70 because you provided no names, or a directory, or a pattern (see
bos@133 71 below), it's safest to tell you what it's doing.
bos@133 72
bos@133 73 For commands that behave this way, you can silence them using the
bos@133 74 \hggopt{-q} option. You can also get them to print the name of every
bos@133 75 file, even those you've named explicitly, using the \hggopt{-v}
bos@133 76 option.
bos@133 77
bos@133 78 \section{Using patterns to identify files}
bos@133 79
bos@133 80 In addition to working with file and directory names, Mercurial lets
bos@133 81 you use \emph{patterns} to identify files. Mercurial's pattern
bos@133 82 handling is expressive.
bos@133 83
bos@133 84 On Unix-like systems (Linux, MacOS, etc.), the job of matching file
bos@133 85 names to patterns normally falls to the shell. On these systems, you
bos@133 86 must explicitly tell Mercurial that a name is a pattern. On Windows,
bos@133 87 the shell does not expand patterns, so Mercurial will automatically
bos@133 88 identify names that are patterns, and expand them for you.
bos@133 89
bos@133 90 To provide a pattern in place of a regular name on the command line,
bos@133 91 the mechanism is simple:
bos@133 92 \begin{codesample2}
bos@133 93 syntax:patternbody
bos@133 94 \end{codesample2}
bos@133 95 That is, a pattern is identified by a short text string that says what
bos@133 96 kind of pattern this is, followed by a colon, followed by the actual
bos@133 97 pattern.
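
To make the ``\texttt{syntax:patternbody}'' convention concrete, here is
a minimal Python sketch of how such a name could be split into its two
parts. The helper \texttt{split\_pattern} and its \texttt{known\_kinds}
parameter are names I made up for illustration; this is not Mercurial's
own code, and it only knows about the two syntaxes described in this
chapter.

```python
def split_pattern(name, known_kinds=("glob", "re")):
    """Split 'kind:body' into (kind, body).

    Hypothetical helper for illustration only. A name without a
    recognised prefix is returned unchanged with a kind of None,
    meaning "treat this as an ordinary file or directory name".
    """
    kind, sep, body = name.partition(":")
    if sep and kind in known_kinds:
        return kind, body
    return None, name


print(split_pattern("glob:*.py"))
print(split_pattern("re:.*\\.c"))
print(split_pattern("src/main.c"))
```

Only the text before the first colon is examined, so a pattern body is
free to contain further colons.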
bos@133 98
bos@133 99 Mercurial supports two kinds of pattern syntax. The most frequently
bos@133 100 used is called \texttt{glob}; this is the same kind of pattern
bos@133 101 matching used by the Unix shell, and should be familiar to Windows
bos@133 102 command prompt users, too.
bos@133 103
bos@133 104 When Mercurial does automatic pattern matching on Windows, it uses
bos@133 105 \texttt{glob} syntax. You can thus omit the ``\texttt{glob:}'' prefix
bos@133 106 on Windows, but it's safe to use it, too.
bos@133 107
bos@133 108 The \texttt{re} syntax is more powerful; it lets you specify patterns
bos@133 109 using regular expressions, also known as regexps.
bos@133 110
bos@133 111 By the way, in the examples that follow, notice that I'm careful to
bos@133 112 wrap all of my patterns in quote characters, so that they won't get
bos@133 113 expanded by the shell before Mercurial sees them.
bos@133 114
bos@133 115 \subsection{Shell-style \texttt{glob} patterns}
bos@133 116
bos@133 117 This is an overview of the tokens you can use when you're writing
bos@133 118 glob patterns.
bos@133 119
bos@133 120 The ``\texttt{*}'' character matches any string, within a single
bos@133 121 directory.
bos@133 122 \interaction{filenames.glob.star}
bos@133 123
bos@133 124 The ``\texttt{**}'' pattern matches any string, and crosses directory
bos@133 125 boundaries. It's not a standard Unix glob token, but it's accepted by
bos@133 126 several popular Unix shells, and is very useful.
bos@133 127 \interaction{filenames.glob.starstar}
bos@133 128
bos@133 129 The ``\texttt{?}'' pattern matches any single character.
bos@133 130 \interaction{filenames.glob.question}
bos@133 131
bos@133 132 The ``\texttt{[}'' character begins a \emph{character class}. This
bos@133 133 matches any single character within the class. The class ends with a
bos@133 134 ``\texttt{]}'' character. A class may contain multiple \emph{range}s
bos@133 135 of the form ``\texttt{a-f}'', which is shorthand for
bos@133 136 ``\texttt{abcdef}''.
bos@133 137 \interaction{filenames.glob.range}
bos@133 138 If the first character after the ``\texttt{[}'' in a character class
bos@133 139 is a ``\texttt{!}'', it \emph{negates} the class, making it match any
bos@133 140 single character not in the class.
bos@133 141
bos@133 142 A ``\texttt{\{}'' begins a group of subpatterns, where the whole group
bos@133 143 matches if any subpattern in the group matches. The ``\texttt{,}''
bos@133 144 character separates subpatterns, and ``\texttt{\}}'' ends the group.
bos@133 145 \interaction{filenames.glob.group}
bos@133 146
bos@133 147 \subsubsection{Watch out!}
bos@133 148
bos@133 149 Don't forget that if you want to match a pattern in any directory, you
bos@133 150 should not be using the ``\texttt{*}'' match-any token, as this will
bos@133 151 only match within one directory. Instead, use the ``\texttt{**}''
bos@133 152 token. This small example illustrates the difference between the two.
bos@133 153 \interaction{filenames.glob.star-starstar}
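
If it helps to see the difference spelled out, the following Python
sketch translates a deliberately tiny subset of glob syntax
(``\texttt{*}'', ``\texttt{**}'' and ``\texttt{?}'') into a regular
expression. The function \texttt{simple\_glob\_to\_regex} is a
hypothetical helper written for this example. It is not Mercurial's
implementation: it ignores character classes and groups, and it does not
model the way Mercurial treats a pattern that matches a directory name.

```python
import re


def simple_glob_to_regex(pattern):
    """Translate a tiny glob subset to a regex string (illustration only)."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")       # '**' may cross directory boundaries
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")    # '*' stays within a single directory
            i += 1
        elif pattern[i] == "?":
            out.append(".")        # '?' matches any single character
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return "".join(out)


def glob_filter(pattern, names):
    """Return the names that the whole pattern matches."""
    regex = simple_glob_to_regex(pattern)
    return [n for n in names if re.fullmatch(regex, n)]


names = ["a.py", "src/a.py", "src/lib/b.py", "README"]
print(glob_filter("*.py", names))
print(glob_filter("**.py", names))
```

With these sample names, the ``\texttt{*.py}'' pattern selects only the
file in the top-level directory, while ``\texttt{**.py}'' also selects
the files in subdirectories.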
bos@133 154
bos@161 155 When you're writing a glob pattern, bear in mind that Mercurial will
bos@161 156 treat a pattern that matches a directory name as ``match every file
bos@161 157 under that directory''. For example, a glob pattern of
bos@161 158 ``\texttt{**c}'' means \emph{both} ``match files ending in
bos@161 159 `\texttt{c}''' and ``match any file under all directories that end in
bos@161 160 `\texttt{c}'''. I personally find this behaviour counterintuitive.
bos@161 161 If you need to write a pattern that means ``match \emph{only} files'',
bos@161 162 you'll need to express it as a regular expression instead; see below.
bos@161 163
bos@133 164 \subsection{Regular expression matching with \texttt{re} patterns}
bos@133 165
bos@133 166 Mercurial accepts the same regular expression syntax as the Python
bos@133 167 programming language (it uses Python's regexp engine internally).
bos@133 168 This is based on the Perl language's regexp syntax, which is the most
bos@133 169 popular dialect in use (it's also used in Java, for example).
bos@133 170
bos@133 171 I won't discuss Mercurial's regexp dialect in any detail here, as
bos@133 172 regexps are not often used. Perl-style regexps are in any case
bos@133 173 already exhaustively documented on a multitude of web sites, and in
bos@133 174 many books. Instead, I will focus here on a few things you should
bos@133 175 know if you find yourself needing to use regexps with Mercurial.
bos@133 176
bos@133 177 A regexp is matched against an entire file name, relative to the root
bos@133 178 of the repository. In other words, even if you're already in
bos@133 179 subdirectory \dirname{foo}, if you want to match files under this
bos@133 180 directory, your pattern must start with ``\texttt{foo/}''.
bos@133 181
bos@133 182 One thing to note, if you're familiar with Perl-style regexps, is that
bos@133 183 Mercurial's are \emph{rooted}. That is, a regexp starts matching
bos@133 184 against the beginning of a string; it doesn't look for a match
bos@133 185 anywhere within the string. To match anywhere in a string, start
bos@133 186 your pattern with ``\texttt{.*}''.
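
Since Mercurial uses Python's regexp engine, the effect of rooting can
be sketched directly in Python. I am assuming here that rooted matching
corresponds to Python's \texttt{re.match}, which only matches at the
start of a string, as opposed to \texttt{re.search}, which looks
anywhere. The file names below are invented for the example.

```python
import re

# Hypothetical file names, relative to the repository root.
names = ["foo/bar.c", "src/foo/bar.c"]
pattern = r"foo/.*\.c"

# Rooted: the pattern must match starting at the first character.
rooted = [n for n in names if re.match(pattern, n)]

# Unrooted, Perl-style: the pattern may match anywhere in the name.
unrooted = [n for n in names if re.search(pattern, n)]

# A leading '.*' makes a rooted match behave like an unrooted one.
with_prefix = [n for n in names if re.match(".*" + pattern, n)]

print(rooted)
print(unrooted)
print(with_prefix)
```

Only \texttt{foo/bar.c} survives the rooted match; both names survive
the other two.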
bos@133 187
bos@133 188 \section{Filtering files}
bos@133 189
bos@133 190 Not only does Mercurial give you a variety of ways to specify files;
bos@133 191 it lets you further winnow those files using \emph{filters}. Commands
bos@133 192 that work with file names accept two filtering options.
bos@133 193 \begin{itemize}
bos@133 194 \item \hggopt{-I}, or \hggopt{--include}, lets you specify a pattern
bos@133 195 that file names must match in order to be processed.
bos@133 196 \item \hggopt{-X}, or \hggopt{--exclude}, gives you a way to
bos@133 197 \emph{avoid} processing files, if they match this pattern.
bos@133 198 \end{itemize}
bos@133 199 You can provide multiple \hggopt{-I} and \hggopt{-X} options on the
bos@133 200 command line, and intermix them as you please. Mercurial interprets
bos@133 201 the patterns you provide using glob syntax by default (but you can use
bos@133 202 regexps if you need to).
bos@133 203
bos@133 204 You can read a \hggopt{-I} filter as ``process only the files that
bos@133 205 match this filter''.
bos@133 206 \interaction{filenames.filter.include}
bos@133 207 The \hggopt{-X} filter is best read as ``process only the files that
bos@133 208 don't match this pattern''.
bos@133 209 \interaction{filenames.filter.exclude}
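
The way the two filters combine can be sketched in a few lines of
Python. This is an illustration under assumptions, not Mercurial's code:
\texttt{apply\_filters} is a made-up helper, I assume that a name is
processed if it matches at least one include pattern (when any are
given) and no exclude pattern, and I use Python's
\texttt{fnmatch.fnmatchcase} as a stand-in matcher. Unlike Mercurial's
glob syntax, its ``\texttt{*}'' also matches across ``\texttt{/}''.

```python
from fnmatch import fnmatchcase


def apply_filters(files, include=(), exclude=()):
    """Keep names accepted by the include and exclude filters.

    Hypothetical helper: fnmatchcase stands in for Mercurial's
    matcher, and its '*' crosses directory separators.
    """
    selected = []
    for name in files:
        # With include patterns present, a name must match one of them.
        if include and not any(fnmatchcase(name, p) for p in include):
            continue
        # A name matching any exclude pattern is always dropped.
        if any(fnmatchcase(name, p) for p in exclude):
            continue
        selected.append(name)
    return selected


files = ["src/main.c", "src/main.h", "doc/guide.txt", "README"]
print(apply_filters(files, include=["src/*"]))
print(apply_filters(files, exclude=["*.txt"]))
print(apply_filters(files, include=["src/*"], exclude=["*.h"]))
```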
bos@133 210
bos@133 211 \section{Ignoring unwanted files and directories}
bos@133 212
bos@133 213 XXX.
bos@133 214
bos@133 215 \section{Case sensitivity}
bos@133 216 \label{sec:names:case}
bos@133 217
bos@133 218 If you're working in a mixed development environment that contains
bos@133 219 both Linux (or other Unix) systems and Macs or Windows systems, you
bos@133 220 should keep in the back of your mind the knowledge that they treat the
bos@133 221 case (``N'' versus ``n'') of file names in incompatible ways. This is
bos@133 222 not very likely to affect you, and it's easy to deal with if it does,
bos@133 223 but it could surprise you if you don't know about it.
bos@133 224
bos@133 225 Operating systems and filesystems differ in the way they handle the
bos@133 226 \emph{case} of characters in file and directory names. There are
bos@133 227 three common ways to handle case in names.
bos@133 228 \begin{itemize}
bos@133 229 \item Completely case insensitive. Uppercase and lowercase versions
bos@133 230 of a letter are treated as identical, both when creating a file and
bos@133 231 during subsequent accesses. This is common on older DOS-based
bos@133 232 systems.
bos@133 233 \item Case preserving, but insensitive. When a file or directory is
bos@133 234 created, the case of its name is stored, and can be retrieved and
bos@133 235 displayed by the operating system. When an existing file is being
bos@133 236 looked up, its case is ignored. This is the standard arrangement on
bos@133 237 Windows and MacOS. The names \filename{foo} and \filename{FoO}
bos@133 238 identify the same file. This treatment of uppercase and lowercase
bos@133 239 letters as interchangeable is also referred to as \emph{case
bos@133 240 folding}.
bos@133 241 \item Case sensitive. The case of a name is significant at all times.
bos@133 242 The names \filename{foo} and \filename{FoO} identify different files. This
bos@133 243 is the way Linux and Unix systems normally work.
bos@133 244 \end{itemize}
bos@133 245
bos@133 246 On Unix-like systems, it is possible to have any or all of the above
bos@133 247 ways of handling case in action at once. For example, if you use a
bos@133 248 USB thumb drive formatted with a FAT32 filesystem on a Linux system,
bos@133 249 Linux will handle names on that filesystem in a case preserving, but
bos@133 250 insensitive, way.
bos@133 251
bos@133 252 \subsection{Safe, portable repository storage}
bos@133 253
bos@133 254 Mercurial's repository storage mechanism is \emph{case safe}. It
bos@133 255 translates file names so that they can be safely stored on both case
bos@133 256 sensitive and case insensitive filesystems. This means that you can
bos@133 257 use normal file copying tools to transfer a Mercurial repository onto,
bos@133 258 for example, a USB thumb drive, and safely move that drive and
bos@133 259 repository back and forth between a Mac, a PC running Windows, and a
bos@133 260 Linux box.
bos@133 261
bos@133 262 \subsection{Detecting case conflicts}
bos@133 263
bos@133 264 When operating in the working directory, Mercurial honours the naming
bos@133 265 policy of the filesystem where the working directory is located. If
bos@133 266 the filesystem is case preserving, but insensitive, Mercurial will
bos@133 267 treat names that differ only in case as the same.
bos@133 268
bos@133 269 An important aspect of this approach is that it is possible to commit
bos@133 270 a changeset on a case sensitive (typically Linux or Unix) filesystem
bos@133 271 that will cause trouble for users on case insensitive (usually Windows
bos@133 272 and MacOS) systems. If a Linux user commits changes to two files, one
bos@133 273 named \filename{myfile.c} and the other named \filename{MyFile.C},
bos@133 274 they will be stored correctly in the repository. And in the working
bos@133 275 directories of other Linux users, they will be correctly represented
bos@133 276 as separate files.
bos@133 277
bos@133 278 If a Windows or Mac user pulls this change, they will not initially
bos@133 279 have a problem, because Mercurial's repository storage mechanism is
bos@133 280 case safe. However, once they try to \hgcmd{update} the working
bos@133 281 directory to that changeset, or \hgcmd{merge} with that changeset,
bos@133 282 Mercurial will spot the conflict between the two file names that the
bos@133 283 filesystem would treat as the same, and forbid the update or merge
bos@133 284 from occurring.
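
The idea behind spotting such a conflict can be sketched in Python: fold
every name to a single case and look for names that collide. The helper
\texttt{find\_case\_conflicts} is hypothetical and is not how Mercurial
itself performs the check. I use \texttt{str.casefold} as an
approximation; real filesystems apply their own folding rules.

```python
from collections import defaultdict


def find_case_conflicts(names):
    """Group names that differ only in case (illustration only)."""
    groups = defaultdict(list)
    for name in names:
        # Names that fold to the same key would be one file on a
        # case insensitive filesystem.
        groups[name.casefold()].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


print(find_case_conflicts(["myfile.c", "MyFile.C", "README"]))
```

Running a check like this on a list of tracked file names before
committing on Linux would flag the \filename{myfile.c} and
\filename{MyFile.C} pair from the example above.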
bos@133 285
bos@133 286 \subsection{Fixing a case conflict}
bos@133 287
bos@133 288 If you are using Windows or a Mac in a mixed environment where some of
bos@133 289 your collaborators are using Linux or Unix, and Mercurial reports a
bos@133 290 case folding conflict when you try to \hgcmd{update} or \hgcmd{merge},
bos@133 291 the procedure to fix the problem is simple.
bos@133 292
bos@133 293 Just find a nearby Linux or Unix box, clone the problem repository
bos@133 294 onto it, and use Mercurial's \hgcmd{rename} command to change the
bos@133 295 names of any offending files or directories so that they will no
bos@133 296 longer cause case folding conflicts. Commit this change, \hgcmd{pull}
bos@133 297 or \hgcmd{push} it across to your Windows or MacOS system, and
bos@133 298 \hgcmd{update} to the revision with the non-conflicting names.
bos@133 299
bos@133 300 The changeset with case-conflicting names will remain in your
bos@133 301 project's history, and you still won't be able to \hgcmd{update} your
bos@133 302 working directory to that changeset on a Windows or MacOS system, but
bos@133 303 you can continue development unimpeded.
bos@133 304
bos@133 305 \begin{note}
bos@133 306 Prior to version~0.9.3, Mercurial did not use a case safe repository
bos@133 307 storage mechanism, and did not detect case folding conflicts. If
bos@133 308 you are using an older version of Mercurial on Windows or MacOS, I
bos@133 309 strongly recommend that you upgrade.
bos@133 310 \end{note}
bos@133 311
bos@133 312 %%% Local Variables:
bos@133 313 %%% mode: latex
bos@133 314 %%% TeX-master: "00book"
bos@133 315 %%% End: