August 25, 2015

Automated Text Formatting in Vim

By Drew Barontini

I constantly evaluate my workflow to find repetition. If I keep repeating a particular process, I work on automating it.

Recently, I grew tired of the manual editing process of WordPress posts. Before each post goes out on the Code School Blog, I have to (generally) perform the following set of actions to format the text:

  • Replace <b>Heading</b> with <h2>Heading</h2>
  • Replace <i> tags with <em>
  • Replace <b> tags with <strong>
  • Replace <strong>Heading</strong> with <h2>Heading</h2>
  • Replace <i>Heading</i> with <h3>Heading</h3>
  • Remove wrapping <a> tag around <img>
  • Remove WordPress-generated classes in <img> tag
  • Clear the alt attribute in <img> tag
  • Remove &nbsp;
  • Remove double spaces
  • Remove double newlines
  • Remove <span> tags
  • Remove any inline styles (e.g. style='font-weight: normal;')

Because of this tedious, manual process, I spent time working on an automated solution. I tried writing a Ruby script to parse the pasted-in text, run the replacements, and then provide a “diff” of what was changed. However, this proved to be a bigger pain than I expected. I settled on regular expressions and Vimscript, writing a function that runs a set of substitutions on a file.

Capture groups in Vim search

After some experimentation and Googling, I discovered you can use capture groups in Vim as part of the search string. For example:

/<span>\(.*\)<\/span>

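This search matches each span tag pair along with its enclosed content, and puts the content between <span> and </span> in a capture group. The “gotcha” is that, by default, Vim treats an unescaped ( or ) as a literal parenthesis rather than as the start or end of a capture group. We have to escape each parenthesis with a backslash for Vim to treat the pair as a capture group. The forward slash in </span> is escaped for a different reason: / is the search delimiter.

The same applies to several other regular-expression characters, but not all of them. In Vim’s default “magic” mode, characters such as +, ?, |, and { need a backslash to take on their special meaning, while ., *, [, ^, and $ are special without one. If a pattern doesn’t match what you expect, check which form Vim wants for that character.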
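One more thing to watch for: .* is greedy, so on a line with two span tags it runs through to the last </span>. Here is a small sketch of the difference using Vim’s built-in substitute() function, which takes the same pattern syntax. The sample string and the [ ] markers are made up for illustration. There is no delimiter in this form, so the / needs no backslash.

```vim
let sample = '<span>a</span> and <span>b</span>'

" Greedy .* captures everything up to the LAST </span> on the line
echo substitute(sample, '<span>\(.*\)</span>', '[\1]', 'g')
" [a</span> and <span>b]

" Non-greedy .\{-} stops at the nearest </span>
echo substitute(sample, '<span>\(.\{-}\)</span>', '[\1]', 'g')
" [a] and [b]
```

For tag-by-tag replacements, the non-greedy form is usually the one you want.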

Running a substitution

In Vim, we can run a substitution within a file like so:

:%s/hello/goodbye/g

This will replace the string “hello” with “goodbye.” The % range applies the substitution to every line in the file, and the trailing g replaces every match on each line rather than only the first. There are additional flags, such as:

  • c to ask for confirmation before each substitution
  • i for case insensitive
  • I for case sensitive

Flags can be combined:

:%s/hello/goodbye/gci

This will replace all instances of “hello” (case insensitive), and additionally ask for confirmation before each substitution.
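It helps to see the range and the g flag as two separate dials. The variations below are a quick sketch using the same “hello” and “goodbye” strings; the line numbers in the last one are arbitrary.

```vim
" Current line only, first match on that line
:s/hello/goodbye/

" Current line only, every match on that line
:s/hello/goodbye/g

" Every line in the file, first match on each line
:%s/hello/goodbye/

" Every line in the file, every match
:%s/hello/goodbye/g

" Lines 10 through 20, every match
:10,20s/hello/goodbye/g
```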

Using a capture group

To use a capture group in the replacement, it looks like this:

:%s/\(hello\)/\1 world!/g

The \1 is referencing the first capture group in the search string which, in this case, is “hello”. All instances of “hello” will be replaced with “hello world!”.
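A few more variations, sketched with the built-in substitute() function so the results are easy to check with echo. The “Heading: Intro” string is just an illustrative input, not something from my posts.

```vim
" \1 refers to the first group, as in the command above
echo substitute('hello', '\(hello\)', '\1 world!', '')
" hello world!

" & stands for the whole match, so no group is needed here
echo substitute('hello', 'hello', '& world!', '')
" hello world!

" Groups are numbered left to right by their opening parenthesis
echo substitute('Heading: Intro', '\(\w\+\): \(\w\+\)', '\2 (\1)', '')
" Intro (Heading)
```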

Running a set of substitutions

This solution worked well for running a single substitution at a time, but we want to run all substitutions on a file with a single command. For that, we need a function.

function! DrewstroyBlogPost()
  " ...
endfunction

Vimscript is a bit odd, but if you’ve written a function in any language, this should look familiar. Global user-defined function names must start with a capital letter to avoid confusion with Vim’s built-in functions.

Within the function, we use exec (short for execute) to run a command given as a string:

exec 'command_goes_here'
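Because exec takes a string, the command can be assembled from variables. As a rough sketch of that idea, here is a hypothetical helper (DrewstroyReplaceTag is a name I made up for this example, not part of the final function) that swaps one tag name for another:

```vim
" Hypothetical helper: replace <from>...</from> with <to>...</to>
function! DrewstroyReplaceTag(from, to)
  " The pieces are joined with . into one :s command.
  " .\{-} is the non-greedy form of .* so each match ends at the nearest closing tag.
  exec ':%s/<' . a:from . '>\(.\{-}\)<\/' . a:from . '>/<' . a:to . '>\1<\/' . a:to . '>/ge'
endfunction

" Example calls
call DrewstroyReplaceTag('i', 'em')
call DrewstroyReplaceTag('b', 'strong')
```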

In our case, we use it to run a set of substitutions:

" Convert <i></i> to <em></em>
" \{-} is the non-greedy form of *, so each match stops at the nearest closing tag
exec ':%s/<i>\(.\{-}\)<\/i>/<em>\1<\/em>/ge'

" Convert <b></b> to <strong></strong>
exec ':%s/<b>\(.\{-}\)<\/b>/<strong>\1<\/strong>/ge'

" ...

The e flag tells Vim that not finding a match is not an error.
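To give a feel for how a few of the other items on the list might look, here is a sketch. This is not the full function from the Gist; the function name DrewstroyBlogPostSketch and the exact patterns are my assumptions for illustration, and they only handle tags that sit on a single line.

```vim
function! DrewstroyBlogPostSketch()
  " Remove &nbsp; entities (& is a literal character in a pattern)
  exec ':%s/&nbsp;//ge'

  " Collapse runs of two or more spaces into one
  exec ':%s/ \{2,}/ /ge'

  " Remove opening and closing <span> tags, keeping their content
  exec ':%s/<\/\?span[^>]*>//ge'

  " Remove inline style attributes in single or double quotes.
  " Inside a single-quoted Vimscript string, '' is one literal quote,
  " and \1 in the pattern matches whichever quote opened the value.
  exec ':%s/ style=\([''"]\).\{-}\1//ge'

  " Collapse three or more consecutive newlines into two
  exec ':%s/\n\{3,}/\r\r/e'
endfunction
```

The order matters a little: removing &nbsp; first means any double spaces it leaves behind get cleaned up by the next substitution.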

Running the function

So now that we have the function in place, how do we call it? While in Vim, type the following:

:call DrewstroyBlogPost()

Once we hit enter, the set of substitutions will run on the file. If we don’t want to run that command every time, we can use a mapping to trigger the function, like so:

nnoremap <leader>db :call DrewstroyBlogPost()<cr>

Now we can press <leader> (the leader key, a backslash by default) followed by db to run the function. Simple as that.
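If you prefer a different leader key, or would rather type a command than a mapping, something like the following in your .vimrc should work. The comma leader and the FormatBlogPost command name are just example choices.

```vim
" Set the leader BEFORE defining mappings that use <leader>
let mapleader = ','

nnoremap <leader>db :call DrewstroyBlogPost()<cr>

" Optional: a user command, so :FormatBlogPost works too
" (user command names must start with a capital letter)
command! FormatBlogPost call DrewstroyBlogPost()
```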

Improvements

Although this drastically improves the process, there is still repetitiveness. We have to:

  • Copy the text in the WordPress Admin
  • Open up a terminal
  • Fire up Vim
  • Paste in the file contents
  • Run :call DrewstroyBlogPost() (or <leader>db)
  • Manually skim the file to clean up anything that’s missed or incorrect

So how could this be automated? We might add functionality to:

  • Run a bash command to set up a new file in a Git repo with the clipboard contents
  • Run the function on the file
  • Save and close the file
  • Run git diff to view what’s changed
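Some of this can be done without leaving Vim. Below is a rough sketch under several assumptions: Vim is built with clipboard support (the + register), Vim’s working directory is inside the Git repo, and DrewstroyBlogPost() is already defined. The function name DrewstroyFormatClipboard and the path you pass in are hypothetical. It stages the raw paste first so that git diff has something to compare against, and it leaves the buffer open rather than closing it.

```vim
function! DrewstroyFormatClipboard(path) abort
  " Open (or create) the file and replace its contents with the clipboard
  execute 'edit' fnameescape(a:path)
  silent %delete _
  call setline(1, getreg('+', 1, 1))
  write

  " Stage the unformatted paste as the baseline for the diff
  call system('git add -- ' . shellescape(a:path))

  " Run the substitutions and save the result
  call DrewstroyBlogPost()
  write

  " Show what the substitutions changed relative to the staged paste
  execute '!git diff -- ' . shellescape(a:path, 1)
endfunction
```

With that in place, the whole routine becomes one call, for example :call DrewstroyFormatClipboard('post.html').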

This was just the first step, but these continual improvements will further refine the automation.

That’s All, Folks

I’m a big believer in automation. There are always ways to improve efficiencies. I highly recommend keeping a document for anything you repeatedly do. Evaluate the document frequently, and spend time to improve your efficiency. It’s worth it.

If you want to see the full function, here’s a GitHub Gist.
