\chapter{Introduction}
\label{chap:intro}

\section{About revision control}

Revision control is the process of managing multiple versions of a
piece of information.  In its simplest form, this is something that
many people do by hand: every time you modify a file, you save it under
a new name that contains a number, each one higher than the number of
the preceding version.

Manually managing multiple versions of even a single file is an
error-prone task, though, so software tools to help automate this
process have long been available.  The earliest automated revision
control tools were intended to help a single user to manage revisions
of a single file.  Over the past few decades, the scope of revision
control tools has expanded greatly; they now manage multiple files,
and help multiple people to work together.  The best modern revision
control tools have no problem coping with thousands of people working
together on projects that consist of hundreds of thousands of files.

\subsection{Why use revision control?}

There are a number of reasons why you or your team might want to use
an automated revision control tool for a project.
\begin{itemize}
\item It will track the history and evolution of your project, so you
  don't have to.  For every change, you'll have a log of \emph{who}
  made it; \emph{why} they made it; \emph{when} they made it; and
  \emph{what} the change was.
\item When you're working with other people, revision control software
  makes it easier for you to collaborate.  For example, when people
  more or less simultaneously make potentially incompatible changes,
  the software will help you to identify and resolve those conflicts.
\item It can help you to recover from mistakes.  If you make a change
  that later turns out to be in error, you can revert to an earlier
  version of one or more files.  In fact, a \emph{really} good
  revision control tool will even help you to efficiently figure out
  exactly when a problem was introduced (see
  section~\ref{sec:undo:bisect} for details).
\item It will help you to work simultaneously on, and manage the drift
  between, multiple versions of your project.
\end{itemize}
Most of these reasons are equally valid, at least in theory, whether
you're working on a project by yourself, or with a hundred other
people.

A key question about the practicality of revision control at these two
different scales (``lone hacker'' and ``huge team'') is how its
\emph{benefits} compare to its \emph{costs}.  A revision control tool
that's difficult to understand or use is going to impose a high cost.

A five-hundred-person project is likely to collapse under its own
weight almost immediately without a revision control tool and process.
In this case, the cost of using revision control might hardly seem
worth considering, since \emph{without} it, failure is almost
guaranteed.

On the other hand, a one-person ``quick hack'' might seem like a poor
place to use a revision control tool, because surely the cost of using
one must be close to the overall cost of the project.  Right?

Mercurial uniquely supports \emph{both} of these scales of
development.  You can learn the basics in just a few minutes, and due
to its low overhead, you can apply revision control to the smallest of
projects with ease.  Its simplicity means you won't have a lot of
abstruse concepts or command sequences competing for mental space with
whatever you're \emph{really} trying to do.  At the same time,
Mercurial's high performance and peer-to-peer nature let you scale
painlessly to handle large projects.

No revision control tool can rescue a poorly run project, but a good
choice of tools can make a huge difference to the fluidity with which
you can work on a project.

\subsection{The many names of revision control}

Revision control is a diverse field, so much so that it doesn't
actually have a single name or acronym.  Here are a few of the more
common names and acronyms you'll encounter:
\begin{itemize}
\item Revision control (RCS)
\item Software configuration management (SCM), or configuration management
\item Source code management
\item Source code control, or source control
\item Version control (VCS)
\end{itemize}
Some people claim that these terms actually have different meanings,
but in practice they overlap so much that there's no agreed-upon or
even useful way to tease them apart.

\section{A short history of revision control}

The best known of the old-time revision control tools is SCCS (Source
Code Control System), which Marc Rochkind wrote at Bell Labs, in the
early 1970s.  SCCS operated on individual files, and required every
person working on a project to have access to a shared workspace on a
single system.  Only one person could modify a file at any time;
arbitration for access to files was via locks.  It was common for
people to lock files, and later forget to unlock them, preventing
anyone else from modifying those files without the help of an
administrator.

Walter Tichy developed a free alternative to SCCS in the early 1980s;
he called his program RCS (Revision Control System).  Like SCCS, RCS
required developers to work in a single shared workspace, and to lock
files to prevent multiple people from modifying them simultaneously.

Later in the 1980s, Dick Grune used RCS as a building block for a set
of shell scripts he initially called cmt, but then renamed to CVS
(Concurrent Versions System).  The big innovation of CVS was that it
let developers work simultaneously and somewhat independently in their
own personal workspaces.  The personal workspaces prevented developers
from stepping on each other's toes all the time, as was common with
SCCS and RCS.  Each developer had a copy of every project file, and
could modify their copies independently.  They had to merge their
edits prior to committing changes to the central repository.

Brian Berliner took Grune's original scripts and rewrote them in~C,
releasing in 1989 the code that has since developed into the modern
version of CVS.  CVS subsequently acquired the ability to operate over
a network connection, giving it a client/server architecture.  CVS's
architecture is centralised; only the server has a copy of the history
of the project.  Client workspaces just contain copies of recent
versions of the project's files, and a little metadata to tell them
where the server is.  CVS has been enormously successful; it is
probably the world's most widely used revision control system.

In the early 1990s, Sun Microsystems developed an early distributed
revision control system, called TeamWare.  A TeamWare workspace
contains a complete copy of the project's history.  TeamWare has no
notion of a central repository.  (CVS relied upon RCS for its history
storage; TeamWare used SCCS.)

As the 1990s progressed, awareness grew of a number of problems with
CVS.  It records simultaneous changes to multiple files individually,
instead of grouping them together as a single logically atomic
operation.  It does not manage its file hierarchy well; it is easy to
make a mess of a repository by renaming files and directories.  Worse,
its source code is difficult to read and maintain, which made the
``pain level'' of fixing these architectural problems prohibitive.

In 2000, Jim Blandy and Karl Fogel, two developers who had worked on
CVS, started a project to replace it with a tool that would have a
better architecture and cleaner code.  The result, Subversion, does
not stray from CVS's centralised client/server model, but it adds
multi-file atomic commits, better namespace management, and a number
of other features that make it a generally better tool than CVS.
Since its initial release, it has rapidly grown in popularity.

More or less simultaneously, Graydon Hoare began working on an
ambitious distributed revision control system that he named Monotone.
Monotone addresses many of CVS's design flaws and has a peer-to-peer
architecture, and it also goes beyond earlier (and subsequent)
revision control tools in a number of innovative ways.  It uses
cryptographic hashes as identifiers, and has an integral notion of
``trust'' for code from different sources.

Mercurial began life in 2005.  While a few aspects of its design are
influenced by Monotone, Mercurial focuses on ease of use, high
performance, and scalability to very large projects.

\section{Trends in revision control}

There has been an unmistakable trend in the development and use of
revision control tools over the past four decades, as people have
become familiar with the capabilities of their tools and constrained
by their limitations.

The first generation began by managing single files on individual
computers.  Although these tools represented a huge advance over
ad-hoc manual revision control, their locking model and reliance on a
single computer limited them to small, tightly knit teams.

The second generation loosened these constraints by moving to
network-centered architectures, and managing entire projects at a
time.  As projects grew larger, they ran into new problems.  With
clients needing to talk to servers very frequently, server scaling
became an issue for large projects.  An unreliable network connection
could prevent remote users from being able to talk to the server at
all.  As open source projects started making read-only access
available anonymously to anyone, people without commit privileges
found that they could not use the tools to interact with a project in
a natural way, as they could not record their changes.

The current generation of revision control tools is peer-to-peer in
nature.  All of these systems have dropped the dependency on a single
central server, and allow people to distribute their revision control
data to where it's actually needed.  Collaboration over the Internet
has moved from being constrained by technology to being a matter of
choice and consensus.  Modern tools can operate offline indefinitely
and autonomously, with a network connection only needed when syncing
changes with another repository.

\section{A few of the advantages of distributed revision control}

Even though distributed revision control tools have for several years
been as robust and usable as their previous-generation counterparts,
people using older tools have not necessarily woken up yet to the
advantages of the newer ones.  There are a number of ways in which
distributed tools shine relative to centralised ones.

For an individual developer, distributed tools are almost always much
faster than centralised tools.  This is for a simple reason: a
centralised tool needs to talk over the network for many common
operations, because most metadata is stored in a single copy on the
central server.  A distributed tool stores all of its metadata
locally.  All else being equal, talking over the network adds overhead
to a centralised tool.  Don't underestimate the value of a snappy,
responsive tool: you're going to spend a lot of time interacting with
your revision control software.

Distributed tools are indifferent to the vagaries of your server
infrastructure, again because they replicate metadata to so many
locations.  If you use a centralised system and your server catches
fire, you'd better hope that your backup media are reliable, and that
your last backup was recent and actually worked.  With a distributed
tool, you have many backups available on every contributor's computer.

The reliability of your network will affect distributed tools far less
than it will centralised tools.  You can't even use a centralised tool
without a network connection, except for a few highly constrained
commands.  With a distributed tool, if your network connection goes
down while you're working, you may not even notice.  The only thing
you won't be able to do is talk to repositories on other computers,
something that is relatively rare compared with local operations.  If
you have a far-flung team of collaborators, this may be significant.

If you take a shine to an open source project and decide that you
would like to start hacking on it, and that project uses a distributed
revision control tool, you are at once a peer with the people who
consider themselves the ``core'' of that project.  If they publish
their repositories, you can immediately copy their project history,
start making changes, and record your work, using the same tools in
the same ways as insiders.  By contrast, with a centralised tool, you
must use the software in a ``read only'' mode unless someone grants
you permission to commit changes to their central server.  Until then,
you won't be able to record changes, and your local modifications will
be at risk of corruption any time you try to update your client's view
of the repository.

\subsection{For open source projects}

\subsection{For commercial projects}

\subsection{Myths about distributed revision control}

\subsubsection{Distributed tools encourage projects to fork}

\section{Why choose Mercurial?}


%%% Local Variables:
%%% mode: latex
%%% TeX-master: "00book"
%%% End: