The Essence of vi

Vim is a popular and widely distributed text editor improving on the legendary vi from the late 1970s. If you are reading this on Mac or Linux, chances are that Vim is already installed on your computer, included with your OS. If you tried Vim and found it too strange to use, or if you use it but still harbor the idea that it was made by (or for!) aliens, you’ve come to the right place!

Vim is a very powerful tool, but often misunderstood. There are plenty of tutorials that go through the basics, but they usually fail to paint the big picture. Why the weird keyboard shortcuts, and the modes? Why is it worthwhile to learn? In this post I will try to answer these questions. And I am pleased to say that the answer doesn’t start with h, j, k, l, or i.

Instead of the traditional bottom-up tutorial, this will be a top-down introduction that uncovers the essence of vi and the almost poetic language you use to speak to Vim and vi. It is my hope that reading it will make your time with Vim both more useful and more enjoyable. I have myself enjoyed Vim since 2007, and I also contribute a tiny bit to it.

Don’t Worry About Modes

In Adobe Photoshop, pressing L selects the lasso tool, and M the marquee tool. Photoshop has many tools and functions, and there are many letters on the keyboard — good. But for most writing applications and text editors, the letter keys are used for typing. In these programs (and Photoshop), keyboard shortcuts use modifiers, for example ctrl-s to save changes. This means that prime keyboard real estate is not used to its fullest. 50+ easy-to-press shortcuts — upper and lower case letters — are only used for a single purpose.

A different approach to text editing is modal editing. Here the “typing mode” is just one of several modes of using the editor (and thus the keyboard). It is distinct from the mode(s) wherein you move the cursor, copy, paste, etc. Notably, it often makes sense that the typing mode, or insert mode as it is commonly called, is not the default mode. The reason is that it is far more common to edit, write, and program incrementally, than it is to write a piece from start to finish. You navigate around the text to read, think, make small adjustments, and enter a few words, sentences, or statements at a time.

Vim has many modes, with normal (“command”) mode and insert mode being the two main modes. Now, I could enumerate all the modes, their roles, and how to switch between them, but I’m not going to. Because when you use Vim naturally, you don’t think about modes, you think about actions. And the actions lead you naturally to the mode changes. Think about finding content on a web page in your browser: You press ctrl-f which opens a text field. Here you write your search string, press <enter> a few times, then <esc> when you’re done. You just used FIND MODE! (It was super effective!)

Modal editing simply gives you more buttons for actions. As Jon Beltran de Heredia wrote:

with vi, your keyboard becomes a huge specialized text-editing gamepad with almost a hundred buttons.

Play with the Gamepad

It can be hard to separate what you usually do with a text editor from what you want to do with the text. Learning Vim, you might ask “How do I select all text?” After all, in most editors, this is a simple <c-a> (this is another way of writing ctrl-a). I could tell you how to do it in Vim (one way is ggVG), but it is not in the spirit of vi. In fact, it’s not even possible in vi. And to master Vim, you must understand vi. (In a later post I will cover some “philosophical” differences between Vim and vi. Update 2014-11-25: Here it is!) Instead, realize that selecting text is a means, but to what end? Why do you want to select all?

Do you want to copy everything? Then copy, don’t select. It’s “ggyG”. Type it one letter at a time, and pay attention to case. What does it mean? The gg part moves the cursor to the first line, y then yanks (copies) every line until (and including) G, the last line. vi is older than the standardized cut, copy, and paste, hence the name “yank”.

Now, if you would think ggyG is no faster than <c-a><c-c> (select all, copy) in a regular editor, you’d be right. But say you wanted to copy just a paragraph? Type yap to “yank a paragraph”, with the cursor anywhere in that paragraph. Or copy an HTML/XML element? Go ahead and “yank a tag-block”, yat. Then put (paste) three copies of it: 3P. And change the content (of whichever one you put the cursor on): cit (“change inner tag-block”).

This last command is slightly different from the others. It deletes the content of the element, then drops you to insert mode with the cursor inside the element, i.e. <foo bar="baz">between the tags</foo>. Then you type your new content, followed by <esc>, which returns you to normal mode, simultaneously completing the action. It might go like this, from start to finish: citMy new content<esc>.

Then move to another element and type . to repeat the action!

In the words of Jeet Sukumaran:

It seemed that, without my hands leaving the keyboard, just a few strokes here and a few taps there, I was capable of dancing all over the document, and perform everything from extremely precise [targeted] micro-surgery to massive document-wide renovations.

The Language of vi

Mode changes take time to get accustomed to. They are very visible and “demanding” the first many times you use a modal editor. Hence, they get a lot of attention. But too often they steal the attention from the more important concept of actions.

People have described Vim/vi as having a language of editing. Perhaps you already got the feeling from the mnemonics above such as “yank a paragraph”. The language of vi is how you instruct the editor to perform actions. One action is one sentence, and a sentence consists of verbs, nouns, etc. Here I will pursue the idea a bit more systematically than the aforementioned sources.

There are four possible sentence types/word orders:

Verb
Verb Object
Verb [Prepositional Phrase]
[Prepositional Phrase]

We shall explore the components of these sentences below, along with what it means for a prepositional phrase (e.g., “to …”, “until …”) to stand alone. The subject is always the implied “you” in the imperative. We are commanding the editor, after all!

Verbs

We have already seen verbs like yank y, change c, put P, and repeat .. Verbs fall into two categories: The first type has its “area of effect” implied and the change will occur immediately:

put previously yanked or deleted text P
repeat last action in a new context .
delete a single character backwards X
swap case of a single character ~
insert text i
replace text R

The second type is the operator which includes commands such as

yank y
change c
delete d
increase/decrease indentation > <
make UPPER/lower/opposite case gU gu g~

The difference between “immediate changes” and operators is that when you type an operator, the editor awaits the object or region that the operator should affect. This role is played by nouns and prepositional phrases.

There are many more verbs, and you indeed need more to use vi (and therefore Vim) efficiently, but this selection will do fine in explaining the language metaphor. It’s expected and normal that you find the commands hard to remember at first. But if you are curious and/or adamant enough to learn them, they grow into muscle memory, and you reap the full benefits of the compactness and efficiency. Paraphrasing Pascal Precht:

Vim’s learning curve is not a curve at all. It’s a wall. But once you climb that wall, you can lean back and slide down on the other side. Once I was able to do some basic operations in Vi, I made almost twice as much progress at work in about half the time.

(If you want some tips on how to edit text better (not just in Vim) and do more in the same amount of time, Bram Moolenaar, the creator of Vim, has 7 Habits For Effective Text Editing.)

Verbs from both categories (immediate and operators) can involve a mode change. The insert i and change c commands both switch to insert mode where you can enter the new text. The replace R command uses replace mode, which is similar to insert mode, except you replace (overwrite) text as you type. (This is maybe another mode you already knew – in Microsoft Word and other word processors, it’s toggled by the <insert> key.) Operators (as opposed to immediate changes) in fact always incur a mode change: After typing the operator, the editor will be in operator-pending mode, awaiting the choice of the text to operate on. We will see how to choose that text in the next two sections.

Nouns/Objects

We have seen nouns such as a paragraph ap and a tag block at. There are others such as a word aw, a sentence as, and various blocks (delimited by {}, (), <>, [], "", '', and ``). Nouns, or text objects as they are called in Vim, must be prefixed with an a or i. Above I showed them with a to help as a mnemonic (“a paragraph” ~ ap). The “article” serves two purposes: First, it distinguishes the nouns from prepositional phrases that are bound to the same letter (“a word” aw, for example, is not the same as w which means “to/until the next word”, as we shall see in the next section). Second, it marks whether the whole object (a) or only its “insides” (i) are meant. For example, we have seen yat yank a whole tag block, while cit changes its contents (i.e., not the enclosing tags). For objects that do not have an obvious crunchy shell surrounding a soft, chewy center, the a version includes the trailing whitespace while i doesn’t. Thus, a is useful for transplanting words and sentences without having to clean up the whitespace afterwards.
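To make the a/i distinction concrete, here is a minimal Python sketch of “a word” and “inner word” on a plain string. It is a simplification and not how Vim implements text objects: a word is assumed to be any run of non-whitespace characters, the cursor is assumed to sit on a word, and the names inner_word, a_word, and delete are made up for the illustration.

```python
def inner_word(text: str, cursor: int) -> tuple[int, int]:
    """Return (start, end) of the word under the cursor, end exclusive.

    Assumption: a "word" is any run of non-whitespace characters, and
    the cursor is on a word, not on whitespace.
    """
    if text[cursor].isspace():
        raise ValueError("cursor must be on a word in this sketch")
    start = cursor
    while start > 0 and not text[start - 1].isspace():
        start -= 1
    end = cursor + 1
    while end < len(text) and not text[end].isspace():
        end += 1
    return start, end


def a_word(text: str, cursor: int) -> tuple[int, int]:
    """Like inner_word, but also take the whitespace that follows the word."""
    start, end = inner_word(text, cursor)
    while end < len(text) and text[end].isspace():
        end += 1
    return start, end


def delete(text: str, span: tuple[int, int]) -> str:
    """Remove the half-open range `span` from `text`."""
    start, end = span
    return text[:start] + text[end:]
```

With the cursor anywhere on “two” in "one two three", deleting the inner word leaves a double space behind, while deleting “a word” gives "one three" with nothing to clean up.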

Text objects are “intelligent”. First, they are insensitive to the cursor position within the text object: The whole object (or its inside) is used as long as the cursor is somewhere inside it. Second, they can depend on context: The meaning of “a word” depends on the file type. In C-style programming languages, dashes are not part of a word, but in Lisps, they are. For prose writing, I have set Vim up to see apostrophes as parts of words such that “can’t” is seen as a single word (which would not be so useful in programming). Paragraph boundaries can similarly be specified. Strings are aware of escaped quote characters. For example, "This is one \"string\"".

I didn’t mention “line” as a noun. This is not because operating on lines is uncommon, in fact it’s the opposite. Repeating an operator symbol makes the operator work on the current line: yy, cc, dd, >>, <<, gUU, guu, g~~, …
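The doubling rule is regular enough to state in one line of Python. This is only a sketch of the naming pattern in the list above (the function name linewise is hypothetical); it builds the command string and does not edit anything.

```python
def linewise(operator: str) -> str:
    """Return the command that applies `operator` to the current line.

    The last key of the operator is repeated: y -> yy, gU -> gUU.
    """
    return operator + operator[-1]


# The operators listed in this post, turned into their line forms.
for op in ["y", "c", "d", ">", "<", "gU", "gu", "g~"]:
    print(op, "->", linewise(op))
```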

Prepositional Phrases/Motions

There are also prepositional phrases, which is the grammatical term I use for what Vim calls cursor motions. We have already seen “to the first line” and “to the last line”, gg and G, respectively. The sequence ggyG we saw earlier reads as two sentences: “(First,) (go) to the first line of the buffer. (Then) yank until the last line.” This demonstrates an important point: There is no need for punctuation in the “vi language” (at least figuratively speaking!) because the word order is enough: A prepositional phrase without a verb is taken to be a cursor movement command. A prepositional phrase following an operator verb means that the operator should act on the text from the current cursor position and to wherever the motion leads. (Here you can see that this is really a top-down introduction to Vim. A tutorial would probably have told you first thing how to move the cursor!)
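The word-order rule can be sketched as a tiny classifier in Python. Everything here is an assumption made for illustration: the vocabulary is a small hand-picked subset of the commands seen so far, the function name classify is hypothetical, and counts and doubled operators such as dd are deliberately left out.

```python
# Toy vocabulary, limited to commands that have appeared in this post.
IMMEDIATE_VERBS = {"P", ".", "X", "~", "i", "R"}
OPERATORS = {"y", "c", "d", ">", "<", "gU", "gu", "g~"}
TEXT_OBJECTS = {"aw", "iw", "as", "is", "ap", "ip", "at", "it"}
MOTIONS = {"gg", "G", "w"}


def classify(command: str) -> str:
    """Name the sentence type of `command`, or raise ValueError."""
    if command in IMMEDIATE_VERBS:
        return "Verb"
    if command in MOTIONS:
        # A motion with no verb in front of it just moves the cursor.
        return "[Prepositional Phrase]"
    # Try the longest operators first so "gU" is matched as one verb.
    for op in sorted(OPERATORS, key=len, reverse=True):
        if command.startswith(op):
            rest = command[len(op):]
            if rest in TEXT_OBJECTS:
                return "Verb Object"
            if rest in MOTIONS:
                return "Verb [Prepositional Phrase]"
    raise ValueError(f"not in this toy vocabulary: {command!r}")
```

For example, classify("yG") reports a verb followed by a prepositional phrase, while classify("gg") reports a prepositional phrase standing alone.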

First, let’s see some word motions:

“to the next beginning-of-a-word” w
“(backwards) to the previous beginning-of-a-word” b
“to the next/previous end-of-a-word” e and ge

Unlike text objects such as “a word” aw, motions use your exact cursor position as the point of origin. Therefore, b and e will find the beginning/end of the word the cursor is inside, unless you are already on the beginning/end. If you are on the beginning or end, they will move to the previous or next word, respectively. Notice that I say on. In Vim/vi, the cursor moves on characters, not between them as in many other editors.

If you move back and forth in a sentence using b and w you can exploit the fact that the cursor is always on the first character of a word. For example, dw deletes a word without the need for daw. The motions ( and ) work like b and w but for sentences. The motions { and } go to the previous/next paragraph boundary.

Another powerful type of motion is the “till”/“find” motion. Till t moves to just before the next occurrence of the character you specify. For example, t" moves to right before the next " in the line. Find f is similar, except it moves onto the specified character. The motion ; repeats the last t or f motion. All three motions have equivalents going to the left instead of right: T, F and ,.
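The difference between find and till, and what happens when a delete operator is put in front of them, can be modelled on a single line of text. This is a hedged Python sketch, not Vim’s implementation: the names find, till, and delete_with are invented here, only the rightward motions are covered, and the cursor is an index into the string.

```python
from typing import Callable, Optional

# A motion takes (line, cursor) and returns the new cursor, or None on failure.
Motion = Callable[[str, int], Optional[int]]


def find(char: str) -> Motion:
    """Motion like f{char}: land on the next occurrence right of the cursor."""
    def motion(line: str, cursor: int) -> Optional[int]:
        target = line.find(char, cursor + 1)
        return target if target != -1 else None
    return motion


def till(char: str) -> Motion:
    """Motion like t{char}: stop one character before the next occurrence."""
    def motion(line: str, cursor: int) -> Optional[int]:
        target = line.find(char, cursor + 1)
        return target - 1 if target != -1 else None
    return motion


def delete_with(motion: Motion, line: str, cursor: int) -> str:
    """Operator like d{motion}: delete from the cursor through the target.

    The character the motion lands on is deleted as well. If the motion
    fails, the line is returned unchanged.
    """
    target = motion(line, cursor)
    if target is None:
        return line
    return line[:cursor] + line[target + 1:]
```

With the cursor on the h of 'say "hello" now', delete_with(till('"'), ...) empties the string but keeps both quotes, whereas delete_with(find('"'), ...) also removes the closing quote.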

The till/find motions are for quick and simple jumps within the same line. Regex search finds anything, anywhere in the file. The commands / and ? open up a search field for forward and backward regex search, respectively. It even highlights the first match while you type to give you live feedback on your search pattern. After confirming the entered pattern with <enter>, keys n and N jump in the same/opposite direction of the current regex search.

Coming from other text editors, it’s easy to think that / is “just” how you search for text. And it is. But the four commands / ? n N are all motions, and like any motion they can be used alone or with an operator. The design of vi is orthogonal: You can learn it piece by piece, and combine the new motion you just learned with the verbs you already knew and vice versa. This is extremely powerful and part of what makes vi/Vim so special. The idea may be almost 40 years old, but it’s still relevant today and in the future!
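A quick way to see what orthogonality buys you is to count. The following Python sketch pairs every operator from this post with a handful of the nouns and motions covered so far; the function name sentences and the choice of lists are assumptions made for the illustration, and counts are ignored.

```python
from itertools import product

OPERATORS = ["y", "c", "d", ">", "<", "gU", "gu", "g~"]
TARGETS = ["aw", "as", "ap", "at", "w", "b", "e", "ge", "gg", "G"]


def sentences(operators: list[str], targets: list[str]) -> list[str]:
    """Combine every operator with every text object or motion."""
    return [op + target for op, target in product(operators, targets)]


commands = sentences(OPERATORS, TARGETS)
# 8 operators and 10 targets are 18 things to learn, yet they yield 80 commands.
print(len(OPERATORS) + len(TARGETS), "pieces ->", len(commands), "commands")
```

Learning one more motion adds a command for every operator you already know, and learning one more operator does the same for every motion and text object.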

You may have noticed I didn’t mention the most basic motions you already know from your arrow keys. Many Vim introductions (including the official “vimtutor”) mention very early on that h j k l work as ← ↓ ↑ →, respectively. And people wonder why they can’t just use the normal arrow keys. But either kind of arrow key is beside the point! There’s almost always a smarter way, a more precise motion, to go somewhere than to hold down arrow keys. Jeet Sukumaran tells it again:

being forced to use the rich suite of powerful normal mode movement commands to get to exactly where I needed to be […] was like suddenly beginning to use the fifth and other gears while driving on an open highway, whereas before I had been grinding along for mile after laborious mile on first.

Here are some more motions just to illustrate:

“to the matching parenthesis/brace/comment marker/HTML tag/if/else-stmt/etc.” %
“to the next/previous misspelled word” ]s [s
“to the next/previous start/end of a method (Java style)” ]m ]M [m [M

Numerals

So far we have covered verbs, nouns, and prepositional phrases. Let’s look at numerals, of which vi uses three types:

Cardinal numbers, e.g., 1, 2, 7
Adverbial numbers, e.g., once, twice, sevenfold
Ordinal numbers, e.g. 1st, 2nd, 7th

The grammatical distinction is mine; in vi they’re all known as counts. But it helps to show the different roles counts have. Cardinal numbers (1, 2, 7) can choose the number of lines to apply an operator to, or the number of text objects (nouns) to apply an operator to:

“Indent ten lines” 10>>
“Uppercase two sentences” gU2as

Adverbial numbers (once, twice, sevenfold) can decide the number of times to do something:

“Twentyfold insert a blank line” 20i<enter><esc> (-> “Insert 20 blank lines”)
“Delete until you have moved twice to the next word” d2w (-> “Delete two words”)

Ordinal numbers (1st, 2nd, 7th) can choose where to go:

“To the 27th line” 27gg
“To the 20000th byte” 20000go
“To the 80th column” 80|
“To the line 70% of the way through the file” 70%

The ordinal numbers can also be thought of just as cardinals (line 27 = 27th line), but since they access ordered items, it makes sense to see them as ordinals.

All motions and operators in fact take a count, but it usually defaults to 1. This is another example of orthogonality in vi.
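Here is a small Python sketch of how a leading count could be separated from the rest of a command, with the default of 1 made explicit. The function name split_count is hypothetical, and the sketch only looks at counts at the very start of a command, so a count placed after an operator, as in d2w, is left untouched.

```python
import re

# A count is a run of digits that does not start with 0, because a
# bare 0 is itself the "to the beginning of the line" motion.
_COUNT = re.compile(r"[1-9][0-9]*")


def split_count(command: str) -> tuple[int, str]:
    """Split a leading count off `command`; the count defaults to 1."""
    match = _COUNT.match(command)
    if match is None:
        return 1, command
    return int(match.group()), command[match.end():]
```

So split_count("10>>") separates the cardinal 10 from the verb >>, and split_count("27gg") separates the ordinal 27 from the motion gg; what the number means is decided by the command that follows it.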

Expand your Vocabulary

For even more orthogonality, you can even add verbs and nouns, as mentioned by Yan Pritzker:

Install Drew Neil’s textobj-rubyblock plugin to get the new noun “Ruby block” r.
Install Tim Pope’s surround.vim plugin to get three new verbs:
- “Surround with”, which is even HTML aware (closes tags). Example:
  - “(you) surround a word with em-tags” ysaw<em>
- “Change surroundings”, example:
  - “change the nearest surrounding single quotes to double quotes” cs'"
- “Delete surroundings”, e.g.
  - “delete the nearest surrounding asterisks” ds*

Real Keyboard Shortcuts

The key y is not a shortcut for yank, it is the yank operator. But vi still has shortcuts (in the traditional sense of the word) for those actions that are so common that having shorter ways to get to them is useful:

“Delete left/right” dh and dl are X and x (think backspace and delete)
“Change to the end of the line” c$ is C
“Delete to the end of the line” d$ is D

Recall that $ means “to the end of the line”. Similarly, 0 means “to the beginning of the line” and ^ means “to the first non-blank character on the line”. With these we get:

“Insert at the “beginning” of the line” ^i is I
“Really insert at the beginning of the line!” 0i is gI
“Append to the line” $a is A (append a is like insert i except it inserts after the cursor position)
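Seen as data, the shortcuts above are nothing more than a lookup table from a short command to the longer sentence it stands for. A minimal Python sketch, with the table taken from the two lists above and the function name expand invented for the illustration:

```python
# Each shortcut and the longer command this post says it stands for.
SHORTCUTS = {
    "X": "dh",
    "x": "dl",
    "C": "c$",
    "D": "d$",
    "I": "^i",
    "gI": "0i",
    "A": "$a",
}


def expand(command: str) -> str:
    """Return the long form of a shortcut, or the command itself."""
    return SHORTCUTS.get(command, command)
```

Anything that is not a shortcut passes through unchanged, so expand("D") gives "d$" while expand("yap") stays "yap".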

The idea that “vi is about pressing i to insert text” is wrong. We have already seen that it takes focus away from what vi is really about, namely actions. But it’s also wrong because it takes focus away from the above insertion commands which are just as common as i for the effective user. For programmers, one type of action is even more common:

“Open a new line below” o
“Open a new line above” O

It inserts a blank line and leaves you in insert mode, ready to type a new statement in your program. With Vim’s indentation features this even starts inserting at the right indentation level, aware of nesting and control structures. For example, if the previous line is an “if” statement or the start of a new block, the next line’s indentation level should be one higher.

What about Usability?

I figure I should mention usability because there could seem to be a conflict between these two facts: (1) I care about usability, and (2) vi is definitely not known to be user friendly. How can I reconcile this? The thing to remember is that usability is about more than user friendliness. Using the first two components of Jakob Nielsen’s definition of usability:

Learnability: How easy is it for users to accomplish basic tasks the first time they encounter the design?
Efficiency: Once users have learned the design, how quickly can they perform tasks?

This is the fundamental trade-off with Vim/vi: Very hard to master and very efficient when mastered. The question then quickly arises: Can learnability be improved without harming efficiency? And what other options exist on the learnability/efficiency plane? (And what about the other aspects of usability?) These are all interesting topics for another time! In the meantime I hope the mental model of vi explained in this post will make, if not your first, then your next encounter with Vim more meaningful.

Conclusion

The essence of vi is not modality. The essence of vi is performing actions on text. Actions are specified using an extensible and orthogonally composable “language”, where the same motions that are part of actions are also used for plain cursor movement. Some of the actions involve mode switching, some don’t. The actions drive the editing process while the modes help put a large vocabulary right under your fingertips.

I have only covered basic text manipulation, though Vim does much more. Any user would need to know how to open/close/save files, and maybe open/close/switch buffers, windows, and tabs (Vim has them all). Fancier features include regex substitution, autocompletion, diff mode, scripting, key remapping, (mutually recursive) macros, persistent undo, and encryption.

These features add tremendously to the usefulness of Vim. And regex substitution predates even vi. But in my opinion they are not part of its essence, action based editing. In fact, I barely touched upon the command-line/Ex mode. Most people who have tried Vim know it because it is the one you enter when you type :w<enter> to save and :q<enter> to quit.

Is command-line mode essential? Maybe the Vim Koan “The superior editor” will enlighten you.

Update 2016-04-27: Motions ]m ]M [m [M do unfortunately not work for C# out of the box. Also, grammar fixes.

Update 2021-06-26: Fix dead links.

Erik Ramsgaard Wognsen
