Joey Hess
Joey shows you how to keep track of everything with CVS.
I keep my life in a CVS repository. For
the past two years, every file I've created and worked on, every
e-mail I've sent or received and every config file I've
tweaked have all been checked into my CVS archive. When I tell people
about this, they invariably respond, "You're crazy!"
After all, CVS is meant for managing discrete bodies of code, such
as free software programs that are worked on and available to a lot
of people or in-house projects that are collaboratively developed by
several employees. CVS has a reputation for being a pain to deal with,
and it has a lot of crufty bits that regularly drive users up the wall,
like its mistreatment of directories. Why inflict the pain of CVS on
yourself if you don't have to? Why do it on such a scale that it
affects nearly everything you do with your computer?
I get three major benefits from keeping my whole home directory in CVS:
home directory replication, history and distributed backups. The first of
these is what originally drove me to CVS for my whole home directory. At
the time, I had a home desktop machine, two laptops and a desktop machine
at work. Rounding this out were perhaps 20 remote accounts on various
systems around the world and many systems around the workplace that I
might randomly find myself logging in to. I used all of these accounts for
working on the same projects and already was using CVS for those projects.
I'm a conservative guy when it comes to my computing environment
(I've used the same wallpaper image for the past five years),
and at the same time I'm always making a lot of little tweaks to
improve things. Whenever I went to work and something wasn't just as
I had tweaked it the night before, I'd feel a jarring disconnect
and grudgingly copy over whatever the change was. When I sat down at some
other system at work, to burn a CD perhaps, and found a bare Bash shell
instead of the heavily customized environment I've built up over
the past ten years, it was even worse. The plethora of environments,
each imperfectly customized to my needs by varying degrees, was really
getting on my nerves. So one day I cracked and sat down and began to
feed my whole home directory into CVS.
It worked astonishingly well. After a few weeks of tweaking and importing
I had everything working and began developing some new habits. Every
morning (er, afternoon) when I came into work, I'd cvs up
while I read the morning mail. In the evening, I'd cvs
commit and then update my laptop for the trip home. When I got
home, I'd sync up again, dive right back into whatever I'd
been doing at work and keep on rolling until late at night--when I
committed, went to bed and began the cycle all over again. As for the
systems I used less frequently, like the CD burner machine, I'd
just update when I got annoyed at them for being a trifle out of date.
It only took a few more weeks before the advantage of having a history
of everything I'd done began to show up. It wasn't a real
surprise because having a history of past versions of a project is one
of the reasons to use CVS in the first place, but it's very cool
to have it suddenly apply to every file you own. When I broke my .zshrc
or .procmailrc, I could roll back to the previous day's version, or look
back and see when I made the change and why. It's very handy to
be able to run cvs diff on your kernel config file and see
how make xconfig changed it. It's great to be able to
recover files you deleted or delete files because they're not
relevant and still know you've not really lost them. For those
amateur historians among us, it's very cool to be able to check
out one's system as it looked one full year ago and poke around
and discover how everything has evolved over time.
The final major benefit took some time to become clear. Linus Torvalds
once said, "Only wimps use tape backup: real men just upload
their important stuff on FTP and let the rest of the world mirror
it." I'm not a real enough man to upload my confidential
documents to ftp.kernel.org though, so I've been wimping
along with backups to tape and CD and so on. But then it hit me. Take
one crucial file, like my .zshrc or my sent-mail archive. I had a copy
of that file on my work machine, on my home machine and on my laptop,
plus several more copies on other accounts. There was yet another copy
encoded in my CVS repository too.
I'm told that the best backups are done without effort--so
you actually do them--and are widely scattered among many machines
over a wide area, so that a local disaster doesn't knock them
out. They are tested on a regular basis to make sure the backup works. I
was doing all of these things as a mere side effect of keeping it all
in CVS. Then I sobered up and remembered that a dead CVS repository
would be a really, really bad thing and kept those wimpy backups to CD
going. But the automatic distributed backups are what keep me sleeping
quietly at night. Later, when I left that job, the last thing I did
on my work desktop machine was: cvs commit ; sudo rm -rf /.
And I didn't worry a bit; my life was still there, secure in CVS.
A full checkout of my home directory with all the trimmings often runs
about 4GB in size. A lot of that will be temporary trees in tmp/ and
rsynced Ogg Vorbis files (so far, I have not found the disk space to
check all of them into CVS). My CVS repository currently uses less than
1GB of space, though it is steadily growing in size. I keep some 13,000
files in CVS, and so a full CVS update of my home directory is a sight
to see and takes a while.
These days I'm often stuck behind a dial-up connection,
and I mostly just use one laptop, so I might go days between CVS
updates. Other better-connected systems have automatic CVS updates done
via cron each day. I cvs commit whenever I want to make a
backup of where I am in a file or when I am at the point of releasing
something. I still also do a full commit of my home directory every
day or so. I confess that some of my CVS commit messages are less than
informative--"foo" has been used far too many times on
some classes of files. I even do some automatic CVS commits; for example,
my mailbox archives are committed by a daily cron job.
There are other benefits of course. I attend many tradeshows and other
events that require me to sit down at some computer out of the
box, use it for an hour or a day and never see it again. I can check
out the core of my CVS home directory in about five minutes, and after
that it is just as comfortable as if I'd SSH'd home and
was doing everything there. I even get my whole desktop set up in that
five minutes. In a chaotic tradeshow environment, there is nothing more
reassuring than having your familiar computer setup at your fingertips
as you demo things to the hordes of visitors.
Keeping your home directory in CVS is not all fun though. Anyone
who's used CVS in a large project probably has had to resolve
conflicts engendered by two people modifying the same file. At least
you can curse the other guy who committed the changes first while you
deal with this annoying task. Most of you have probably never had
to resolve a conflict between changes you made to the same file at home
and at work, with nobody to curse but yourself.
Then there are CVS's famous problems: poor handling of directories
and binary files, and nearly nonexistent handling of permissions, which
is not a big deal in most projects but becomes important when you have
a home directory with some public and some private files and directories
in it. The protocol is slow and bloated, hindered even more by the
necessity of piping it all over SSH. Moving a file that is already
in CVS is painful, and moving a whole directory tree is much worse; again,
this hits you especially hard when you're using CVS for the whole home
directory. And those damn CVS directories are always cluttering up
everything. I've
developed means of coping with all of these to varying degrees, but
like many of us, I'm hoping for a better replacement one day
(and dreading the transition).
Perhaps it's time that I get down to the details of how I organize
my home directory in CVS. I've always managed my home directory
with an iron hand, and CVS has just exacerbated this tendency. Let's
look at the top level:
joey@silk:~>ls
CVS/ GNUstep/ bin/ debian/ doc/ html/ lib/
mail/ src/ tmp/
Yes, that's it. Well, except for the 100-plus dot files. Most
people use their home directory as a scratch space for files they're
working on, but instead I have a dedicated scratch directory, the tmp
directory, which I clean out irregularly. In general, when I start a
new file or project, I will be checking it into CVS soon, so I begin
working on it in the appropriate directory. This document, for example,
is starting its life in the html directory and will be checked into
CVS soon to live there forever. Of course, sometimes I goof up and then
I have to resort to the usual tricks to move files in CVS. And so the
first rule of CVS home directories is it pays to think before starting
and get the right filename and location the first time. Don't be
too impatient to check in the file.
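Of those usual tricks, the one the CVS manual itself describes as the normal way to rename a file is: move the working file, cvs remove the old name, cvs add the new one, and commit both together. The Python sketch below only assembles that sequence; the helper names are hypothetical, nothing runs unless dry_run is turned off, and the renamed file starts a fresh history while the old history stays attached to the removed name.

```python
import shlex
import subprocess


def cvs_rename_commands(old, new, message=None):
    """Return the command lines for CVS's "normal" rename of old -> new."""
    msg = message or f"Renamed {old} to {new}"
    return [
        ["mv", old, new],                        # rename the working file
        ["cvs", "remove", old],                  # schedule the old name for removal
        ["cvs", "add", new],                     # schedule the new name for addition
        ["cvs", "commit", "-m", msg, old, new],  # commit both changes at once
    ]


def cvs_rename(old, new, message=None, dry_run=True):
    """Print each command; actually run them only when dry_run is False."""
    for argv in cvs_rename_commands(old, new, message):
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    cvs_rename("html/draft.html", "html/cvs-homedir.html")  # dry run: prints only
```

Keeping the dry run as the default makes it easy to check the plan before touching the working copy.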
CVS is a great way to ensure that you have a nice, clean, well-managed
home directory. Every time I run cvs update, it helpfully
complains about any files it doesn't know about. Of course,
I make heavy use of .cvsignore files in some directories (like tmp/).
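The check behind those complaints can be approximated in a few lines: cvs update flags a file with a leading ? when it is neither recorded in CVS/Entries nor matched by an ignore pattern. This Python sketch assumes the rules as the CVS manual describes them (whitespace-separated shell-style patterns, a lone ! clearing the list, each .cvsignore applying only to its own directory) and uses only a hypothetical, abbreviated subset of CVS's built-in ignore list.

```python
import fnmatch

# Abbreviated subset of CVS's built-in ignore list (the real one is longer).
DEFAULT_IGNORES = ["CVS", "core", "*~", "*.o", "*.bak", "*.orig", "*.rej"]


def parse_cvsignore(text, patterns=None):
    """Add the whitespace-separated patterns from a .cvsignore file.

    A lone "!" clears every pattern collected so far, as in CVS.
    """
    patterns = list(DEFAULT_IGNORES if patterns is None else patterns)
    for token in text.split():
        if token == "!":
            patterns = []
        else:
            patterns.append(token)
    return patterns


def unknown_files(names, known, cvsignore_text=""):
    """Names cvs update would flag with "?": neither known nor ignored."""
    patterns = parse_cvsignore(cvsignore_text)
    return sorted(
        name for name in names
        if name not in known
        and not any(fnmatch.fnmatchcase(name, p) for p in patterns)
    )


if __name__ == "__main__":
    listing = ["notes.txt", "draft.html~", "scratch", "core", "index.html"]
    print(unknown_files(listing, known={"index.html"}, cvsignore_text="scratch"))
```

A directory like tmp/ can be silenced entirely with a .cvsignore containing just *.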
If I go to another machine, the home directory looks pretty much the same,
though various things might be missing:
joeyh@auric:~>ls
CVS/ GNUstep/ bin/ tmp/
I use this machine only for occasional, specific shell tasks. I
don't administer the system, so I don't want to put
private files there. The result is a much truncated version of my home
directory. It's perfectly usable for everything I normally do on
that machine, and if I want to, say, work on this document there at some
point, I can just type cvs co html and a password and be on
my way.
The way I make this partial-checkouts system work is by using CVS
modules and aliases. I have modules defined for each of the top-level
directories and for the home directory (dot files) itself. For example,
the entry in my CVSROOT/modules file for the stripped-down version of
my home directory looks like this:
joeyh -u cvsfix -o cvsfix joey-cvs/home &bin
For more complete home directories, I use this instead:
joey -u cvsfix -o cvsfix joey-cvs/home &src &doc &debian &html &lib &.hide &bin &mail
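To see what such an entry pulls in, it helps to resolve the &module references by hand. The following Python sketch parses a modules file and expands a module into the repository directories it checks out. The bin, mail and .hide definitions and the shortened joey line are made up for illustration; only the joeyh entry is copied from above, and the parser ignores rarer forms such as -a alias modules.

```python
# Options in a modules entry that take an argument, e.g. "-u cvsfix".
ARG_OPTIONS = {"-d", "-e", "-i", "-o", "-s", "-t", "-u"}

# Hypothetical modules file: only the joeyh line is taken from the article.
MODULES = """\
# one module per top-level directory (made-up paths)
bin   joey-cvs/bin
mail  joey-cvs/mail
.hide joey-cvs/hide
joeyh -u cvsfix -o cvsfix joey-cvs/home &bin
joey  -u cvsfix -o cvsfix joey-cvs/home &bin &mail \\
      &.hide
"""


def parse_modules(text):
    """Parse modules text into {name: (options, dirs, refs)}.

    Handles comments, backslash continuations, options, directories and
    &module references; rarer forms such as "-a" alias modules are ignored.
    """
    entries = {}
    for line in text.replace("\\\n", " ").splitlines():
        words = line.split()
        if not words or words[0].startswith("#"):
            continue
        name, rest = words[0], words[1:]
        options, dirs, refs = {}, [], []
        i = 0
        while i < len(rest) and rest[i].startswith("-"):
            if rest[i] in ARG_OPTIONS:
                options[rest[i]] = rest[i + 1]
                i += 2
            else:
                options[rest[i]] = True  # argument-less flag such as -l
                i += 1
        for word in rest[i:]:
            if word.startswith("&"):
                refs.append(word[1:])
            else:
                dirs.append(word)
        entries[name] = (options, dirs, refs)
    return entries


def expand(entries, name, seen=None):
    """List the repository directories that checking out `name` pulls in."""
    seen = set() if seen is None else seen
    if name in seen:  # guard against reference cycles
        return []
    seen.add(name)
    _options, dirs, refs = entries[name]
    result = list(dirs)
    for ref in refs:
        result.extend(expand(entries, ref, seen))
    return result


if __name__ == "__main__":
    entries = parse_modules(MODULES)
    print(expand(entries, "joeyh"))  # ['joey-cvs/home', 'joey-cvs/bin']
```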
Notice the .hide module. It results in a ~/.hide directory when I
check it out. This directory is where I put the occasional private file
that I don't want to appear in home directories--like the
one on auric--that are on systems not administered by me. The files
in .hide get hard-linked to their proper locations if .hide is checked
out, so I can put confidential dot files in there and only check those
dot files out on trusted systems. I also have, for example, my Mozilla
cookies file in .hide.
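cvsfix itself isn't shown here, so the following is only a guess at its .hide step. This Python sketch assumes that .hide mirrors the layout of the home directory (so .hide/.netrc belongs at ~/.netrc), skips CVS administrative directories and replaces any existing copy at the target with a hard link. Rerunning it after every update matters, because CVS may rewrite a file in .hide as a new inode and silently break the old link.

```python
import os


def link_hidden(home):
    """Hard-link every file under HOME/.hide to the same relative path in HOME.

    Hypothetical cvsfix step; assumes .hide mirrors the home directory layout.
    Returns the relative paths that were (re)linked.
    """
    hide = os.path.join(home, ".hide")
    if not os.path.isdir(hide):
        return []  # .hide not checked out on this (untrusted) host
    linked = []
    for dirpath, dirnames, filenames in os.walk(hide):
        dirnames[:] = [d for d in dirnames if d != "CVS"]  # skip CVS admin dirs
        for fname in filenames:
            src = os.path.join(dirpath, fname)
            rel = os.path.relpath(src, hide)
            dst = os.path.join(home, rel)
            if os.path.exists(dst) and os.path.samefile(src, dst):
                continue  # link already in place
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            if os.path.lexists(dst):
                os.remove(dst)  # replace a stale copy (or broken link)
            os.link(src, dst)
            linked.append(rel)
    return sorted(linked)
```

Hard links (rather than symlinks) keep the file at its usual path looking like an ordinary file, which some programs are picky about.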
It's important to distinguish between such files that I need
to put in .hide and the entire set of private directories, like my mail
directory. Yes, I keep my mail in CVS (except for just-arrived spooled
mail, which I keep synced up with a neat little program called isync that
is smarter about mail than CVS is). But it's all in its own mail/
directory, so I can omit checking that directory out to systems that
I don't trust with my mail or that I don't want to burden
with hundreds of megabytes of mail archives.
While I'm discussing privacy issues, I should mention that I make
some bits of my home directory completely open to the public. This
includes a lot of free software in debian/ and src/, and some handy
little programs in bin/. I handle this with permissions in the repository. I have
to make sure that most directories in the repository (or at least
the top-level directories like mail/) are mode 700, so only I can
access them. Other top-level directories, like bin/, are opened
up to mode 755. This allows anonymous CVS access and browsing at
cvs.kitenet.net/joey-cvs/bin/.
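Locking down only the top level is enough because other (non-root) users cannot traverse into anything below a directory with mode 700. Here is a minimal Python sketch of enforcing that policy; the repository path and the set of public directories are assumptions for illustration, with bin the only public one.

```python
import os
import stat

# Hypothetical policy: top-level repository directories open to everyone.
PUBLIC = {"bin"}


def fix_repository_modes(repo_root, public=PUBLIC):
    """Set each top-level directory under repo_root to 755 (public) or 700.

    A 700 directory can't be traversed by other users, so everything
    beneath it is private too. Returns {name: new_mode} for changed dirs.
    """
    changed = {}
    for name in sorted(os.listdir(repo_root)):
        path = os.path.join(repo_root, name)
        if not os.path.isdir(path):
            continue
        mode = 0o755 if name in public else 0o700
        if stat.S_IMODE(os.stat(path).st_mode) != mode:
            os.chmod(path, mode)
            changed[name] = mode
    return changed
```

Run against something like the joey-cvs directory inside the CVS root (a hypothetical location), this defaults every new top-level module to private, so forgetting to classify one fails safe.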
This leads to the second rule of CVS home directories: don't import
$HOME in one big chunk; break it up into multiple modules. The structure
of your repository need not mirror the structure of your actual home
directory. Modules can be checked out in different locations to move
things around and control access on a per-module level. There's
a layer of indirection there, and such layers always make things more
flexible and more complex.
Some of the projects I work on have their own CVS repositories that are
unconnected to my big home directory repository. That's fine too;
I simply check them out into logical places in my home directory tree
as needed. CVS can even be tweaked to recurse into those directories
when updating or committing.
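How that tweak is done isn't spelled out here. One commonly used approach, offered as an assumption rather than as the author's method, is to list the foreign checkout as a subdirectory in the parent's CVS/Entries file with a line of the form D/name////, so that recursive commands descend into it. A small, idempotent Python helper for that might look like this (register_subdir is a hypothetical name):

```python
import os


def register_subdir(parent, subdir):
    """Append "D/subdir////" to parent/CVS/Entries unless it is already there.

    Hypothetical helper: a D line marks subdir as a subdirectory of the
    parent checkout so that recursive CVS commands visit it.
    Returns True if the file was changed.
    """
    entries = os.path.join(parent, "CVS", "Entries")
    line = f"D/{subdir}////"
    with open(entries, encoding="utf-8") as f:
        content = f.read()
    if line in content.splitlines():
        return False
    with open(entries, "a", encoding="utf-8") as f:
        if content and not content.endswith("\n"):
            f.write("\n")  # keep the new entry on its own line
        f.write(line + "\n")
    return True
```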
Another thing to notice in those lines from my modules file is the use of
-u cvsfix and -o cvsfix to make the cvsfix program run after CVS updates and checkouts.
That program does a lot of little things, including ensuring that
permissions are correct, setting up the hard links to files in .hide and
so on.
One last thing to mention is the issue of heterogeneous environments
and CVS. Most of my accounts are on systems running varying versions
of Debian Linux on a host of different architectures, but there are
accounts on other distributions, on Solaris and so forth. Trying to
make the same dot files work on everything can be interesting. My
.zshrc file, for example, goes to great pains to detect things like
GNU ls, deals with varying zsh versions, sets up aliases to the best
available editor and other commands and so on. Other files, like
.xinitrc, check the host they're running on and behave slightly (or
completely) differently. At one point I even had a .procmailrc
that filtered mail differently depending on hostname, though the trick
to doing that is lost somewhere in one of the innumerable versions
stored in my repository. I've even resorted in a few places to
files with names of the form filename.hostname--cvsfix finds one
matching the current host and links it to the filename. Branches are
also a possibility, of course, but despite my heavy use of CVS, I still
find some corners of it a black art.
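The filename.hostname trick is easy to sketch. This Python version assumes cvsfix makes the plain name a symbolic link to the matching variant (the text says only that it links them) and compares against the short hostname; it refuses to replace a regular file, on the guess that such a file may itself be tracked in CVS.

```python
import os
import socket


def link_host_variants(directory, hostname=None):
    """Symlink NAME -> NAME.HOST for files whose suffix matches this host.

    Hypothetical cvsfix step: uses the short hostname, never replaces a
    regular file, and returns the names it (re)linked.
    """
    host = (hostname or socket.gethostname()).split(".")[0]
    suffix = "." + host
    linked = []
    for name in sorted(os.listdir(directory)):
        if not name.endswith(suffix) or name == suffix:
            continue
        base = name[: -len(suffix)]
        target = os.path.join(directory, base)
        if os.path.islink(target):
            if os.readlink(target) == name:
                continue  # already points at this host's variant
            os.remove(target)  # points at another host's variant
        elif os.path.lexists(target):
            continue  # a real file, possibly tracked in CVS: leave it
        os.symlink(name, target)  # relative link within the same directory
        linked.append(base)
    return linked
```

On silk, for example, .xinitrc would end up pointing at .xinitrc.silk, while on auric the same call would repoint it at .xinitrc.auric.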
Well, I guess that's it. I'd be happy to hear from anyone
else who keeps their home directory in CVS, especially if you have some
tricks to share. In the future I'd like to try checking /etc into
CVS too, and if you've successfully done this, I'd love to
talk with you. Now I'm off to commit this file.
Joey Hess (joey@kitenet.net) is a longtime Debian developer
who lives on a farm in Virginia. He enjoys finding new and unlikely
places from which to commit code wirelessly to CVS.