Erik Ramsgaard Wognsen

Thoughts & technology

RIP Mercurial (on Bitbucket)

Bitbucket is going to drop support for the Mercurial version control system, due to most people using Git instead. I use both Mercurial (hg) and Git, and while Git has become the de facto standard today, Mercurial was my first love in the world of version control systems, and I’m sad to see it go.

Of course, Mercurial is not dead, but being removed from Bitbucket is a big deal (at least to me). But this news also gave me an occasion to reconsider Git vs. Mercurial, and writing this post has actually given me a newfound, if not appreciation, then at least understanding of Git.

Init

When I started university in 2006, SVN had recently replaced CVS as the cool thing. So for group work we used SVN and it worked fine. Well, except for all the file conflicts we got. And the awful tree conflicts. Come to think of it, it didn’t work that great. Luckily distributed version control systems had been maturing in the meantime.

In 2009 we started using Mercurial. It was like SVN, except it was actually good for collaboration! Now you could store a safe snapshot of your own work before trying to merge other people’s changes in. Also merging generally worked well. Also it was fast.

All was good, but there was also this Git thing people on the internet were talking about. Coming from SVN, Mercurial made sense and Git didn’t. (Linus Torvalds, being Linus Torvalds, made Git purposely different from the CVS/SVN family.) Git did seem to be more flexible at the expense of being less elegant. As was written back then: Git is MacGyver, Mercurial is James Bond.

My group mates and I kept using Mercurial, but as I started my PhD, collaboration with new people, not least my supervisor René, made me spend more time with Git. It got better when I began to understand that the differences between Mercurial and Git start at fundamental concepts such as branches and merges.

Different Concepts

“Branch” doesn’t have the same interpretation in Mercurial and Git. In Mercurial, a branch is a sequence of commits all sharing a (permanent) name. In Git, a branch is temporary and is represented by a reference to the current commit at the tip of the branch. Once the branch is deleted, you can’t tell which branch(es) a commit used to be part of.
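The Git half of this can be checked in a throwaway repository. The following is only a sketch: the branch names main and feature and the demo identity are made up for illustration, and nothing here touches an existing repository.

```shell
#!/bin/sh
set -eu
# Hypothetical identity so the commits work without any global Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # pin the initial branch name regardless of local defaults

git commit -q --allow-empty -m "base"
git checkout -q -b feature
git commit -q --allow-empty -m "work on feature"
feature_tip=$(git rev-parse HEAD)

# The branch is just a ref that resolves to the hash of its tip commit.
branch_ref=$(git rev-parse refs/heads/feature)

# Fast-forward main to the feature tip, then delete the feature branch.
git checkout -q main
git merge -q --ff-only feature
git branch -q -d feature

# The commit is still there, but only main knows about it now.
branches_after=$(git branch --contains "$feature_tip" --format='%(refname:short)')
echo "tip=$feature_tip ref=$branch_ref branches=$branches_after"
```

After the branch is deleted, nothing in the commit itself records that it was ever made on feature.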

In Mercurial branches are also shared between clones of the same repository. Everybody works on the same branch, and you can hg pull to get the newest version of the branch, and then hg update your working copy. In Git nobody works on the same branch — there’s just people working on different branches that might or might not have the same name. You can’t “update” to the newest version because you already have the newest version of your own branch by definition. But you can merge another branch, possibly with the same name, into your own.
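A small experiment makes the Git side of this concrete. It is a sketch under assumptions: the clone names alice and bob, the bare repository shared.git, and the branch name main are all invented for the demo.

```shell
#!/bin/sh
set -eu
# Hypothetical identity so the commits work without any global Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)
cd "$work"

# Alice starts the project and publishes it to a shared bare repository.
git init -q alice
git -C alice symbolic-ref HEAD refs/heads/main
git -C alice commit -q --allow-empty -m "first"
git clone -q --bare alice shared.git
git -C alice remote add origin "$work/shared.git"

# Bob clones, then Alice pushes more work.
git clone -q shared.git bob
git -C alice commit -q --allow-empty -m "second"
git -C alice push -q origin main

before=$(git -C bob rev-parse main)
git -C bob fetch -q
after_fetch=$(git -C bob rev-parse main)        # Bob's own main has not moved
remote_tip=$(git -C bob rev-parse origin/main)  # his remote tracking branch has

# Bob "updates" by merging the other branch of the same name into his own.
git -C bob merge -q --ff-only origin/main
after_merge=$(git -C bob rev-parse main)
echo "before=$before after_fetch=$after_fetch remote=$remote_tip after_merge=$after_merge"
```

Fetching only moves origin/main. Bob's main changes when he merges, which is the step that hg pull followed by hg update hides behind a shared branch.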

In Mercurial, merges are used for merging divergent branches or heads. In Git, everything is a merge, even if only one branch has changed. But at the same time, a Git merge might or might not create a merge commit. By default, a new commit will only be created if the two branches had diverged, i.e., work had been done on both. Otherwise, the current branch will just be updated to point to the same commit as the branch being merged in. It makes a lot of sense once you get used to Git, but before that it’s just confusing.
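Both outcomes of a Git merge can be seen by counting the parents of the resulting HEAD commit. This is a sketch in a throwaway repository; the branch names main and topic and the file names are made up for the demo.

```shell
#!/bin/sh
set -eu
# Hypothetical identity so the commits work without any global Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
printf 'base\n' > base.txt
git add base.txt
git commit -q -m "base"

# Case 1: only topic has new work, so the merge just moves main forward.
git checkout -q -b topic
printf 'one\n' > topic1.txt
git add topic1.txt
git commit -q -m "topic work"
git checkout -q main
git merge -q topic
ff_same_commit=$( [ "$(git rev-parse main)" = "$(git rev-parse topic)" ] && echo yes || echo no )
ff_parents=$(( $(git rev-list --parents -n 1 HEAD | wc -w) - 1 ))

# Case 2: both branches have new work, so the merge creates a commit with two parents.
git checkout -q topic
printf 'two\n' > topic2.txt
git add topic2.txt
git commit -q -m "more topic work"
git checkout -q main
printf 'main\n' > main.txt
git add main.txt
git commit -q -m "main work"
git merge -q --no-edit topic
merge_parents=$(( $(git rev-list --parents -n 1 HEAD | wc -w) - 1 ))

echo "fast-forward: same commit=$ff_same_commit parents=$ff_parents; diverged: parents=$merge_parents"
```

In the first case no new commit exists at all: main simply points at the commit topic already had.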

Different Commands

Learning Git from an SVN/Mercurial background meant relearning a few words. “Revert” means something new in Git and “shelve” is called “stash”, but that’s not bad. Instead, what is annoying is that many simple Mercurial commands don’t have simple, memorable equivalents in Git:

Show the hash of the current commit:

hg id
git rev-parse HEAD

Show which commits would be pulled or pushed:

hg in
git fetch && git log ..@{u}

hg out
git fetch && git log @{u}..

Mark a file to stop tracking, but don’t delete the file. This one is especially weird in Git because you don’t remove something --cached, you cache (stage) the removal of something:

hg forget
git rm --cached

Print the root directory of the repository and the URL of the remote repository:

hg root
hg path

git rev-parse --show-toplevel
git config --get remote.origin.url

List the version controlled files:

hg manifest
git ls-tree -r --name-only --full-tree HEAD

Git is Git

I have been using the above commands as aliases in Git, but I have wondered if doing that somehow prevented me from learning the “true spirit” of Git. I have now come to the conclusion that the essence of Git is its architecture and data model, and not its incoherent command line interface. Learning the user interface is just a chore, and using aliases to smooth it out is fine.
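As a sketch of what such aliases could look like: the alias names below simply mirror the Mercurial commands and are my assumption here, not a standard. The aliases are set in a throwaway repository's local config; add --global to git config to make them available everywhere.

```shell
#!/bin/sh
set -eu
# Hypothetical identity so the commit works without any global Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)
cd "$repo"
git init -q
printf 'hello\n' > hello.txt
git add hello.txt
git commit -q -m "first"

# Plain aliases expand to Git subcommands; a leading "!" runs a shell command.
git config alias.id 'rev-parse HEAD'
git config alias.in '!git fetch && git log ..@{u}'
git config alias.out '!git fetch && git log @{u}..'
git config alias.forget 'rm --cached'
git config alias.root 'rev-parse --show-toplevel'
git config alias.path 'config --get remote.origin.url'
git config alias.manifest 'ls-tree -r --name-only --full-tree HEAD'

# Try the ones that do not need a remote.
alias_hash=$(git id)
alias_files=$(git manifest)
echo "id=$alias_hash files=$alias_files root=$(git root)"
```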

I’m not bashing Git because it’s different from Mercurial, but because it is internally inconsistent and, as another blogger puts it: Git doesn’t so much have a leaky abstraction as no abstraction. Here are some of the inconsistencies and lacking abstractions:

  • Concepts can have multiple names, like the index a.k.a. the cache a.k.a. the staging area (which would be okay if Git didn’t already have so many concepts)
  • The branch origin/master is your (local) remote tracking branch, but master alone can refer to your local master branch or to the remote’s master branch depending on context
  • The git checkout command famously does too much: It can check out a branch (merging in uncommitted changes in the process if you want, or creating a new branch manually or automatically if you want), and it can restore or overwrite individual files in the working copy
    • However, the new Git 2.23 released this month has started experimenting with the new, more specific, and subject to change git switch and git restore commands (similar to hg update and hg revert!)
  • The notation A:B sometimes means <src>:<dst> (a refspec) and sometimes <rev>:<path> (path in a revision — to be distinguished from a pathspec which is a third concept that can use the colon in a third way)
  • The notations A B and A..B mean the same thing to git diff but not to git log
  • The notations A..B and A...B mean somewhat opposite things to git diff and git log
    • To specify branches A and B symmetrically you need git diff A..B but git log A...B
    • git diff A...B and git log A..B mean “from the common ancestor of A and B, to B”
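The dot notations are easiest to believe after seeing them side by side. Here is a sketch with two diverged branches, literally named A and B for the demo, each adding one file on top of a shared base commit. The file names and commit messages are made up.

```shell
#!/bin/sh
set -eu
# Hypothetical identity so the commits work without any global Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
printf 'base\n' > base.txt
git add base.txt
git commit -q -m "base"
git branch A
git branch B

git checkout -q A
printf 'a\n' > a.txt
git add a.txt
git commit -q -m "only on A"

git checkout -q B
printf 'b\n' > b.txt
git add b.txt
git commit -q -m "only on B"

# git log: two dots is one-sided, three dots is symmetric.
log_two_dots=$(git log --format=%s A..B)
log_three_dots=$(git log --format=%s A...B | sort)

# git diff: two dots (same as a space) is symmetric, three dots starts at the common ancestor.
diff_space=$(git diff --name-only A B)
diff_two_dots=$(git diff --name-only A..B)
diff_three_dots=$(git diff --name-only A...B)
diff_from_base=$(git diff --name-only "$(git merge-base A B)" B)

printf 'log A..B:\n%s\nlog A...B:\n%s\n' "$log_two_dots" "$log_three_dots"
printf 'diff A..B:\n%s\ndiff A...B:\n%s\n' "$diff_two_dots" "$diff_three_dots"
```

Here git log A..B lists only the commit made on B, while git diff A..B reports both a.txt and b.txt, because it compares the two tips directly.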

In the end, I think Git is a good tool despite its command line interface. But if you’re the type of person who needs to understand something in depth to feel comfortable with it, you have a lot of reading to do. If you already know the basics, git help gitglossary will get you started on 80+ underlying concepts that will help you on your way to master the world’s favored VCS!
