Monday, September 19, 2011

Generate side-by-side diffs in html using vim

Problem

Given two text files, generate a visually appealing diff between them. Do so in html, so it can be visualized in a web browser.

Some approaches

Searching the web

The internet offers surprisingly few flexible standalone tools for generating html diffs that fulfill all my requirements. My requirements are:
  • generate a side-by-side diff
  • save to html without user intervention
  • work fast on large files with lots of differences
This overview is by no means meant to be exhaustive:
  • lots of diff tools exist, but to the best of my knowledge none of them can generate html.
  • match-patch doesn't seem to generate the layout I'd like to see.
  • diff2html.py and derived tools won't generate intraline differences.
  • csdiff only works on Windows systems (but in all honesty, it generates diffs faster than the vim approach described in this article, and the generated html is more user friendly because it includes navigation between differences)

Rolling your own

In the past I have rolled my own solution in Python, using two different approaches:
  • Using Python's difflib. This is by far the easiest solution, but it can be very slow for long files with a lot of differences between the two files under comparison.
  • Using a combination of GNU diff and Python's difflib. I used GNU diff to generate a rough diff, then used difflib to generate the intraline differences. This was already a lot faster than using only difflib, but still too slow on very large files with many differences. This approach does have the advantage of being very flexible: the tool can generate both side-by-side and line-by-line differences, with or without intraline differences indicated.
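For reference, the difflib-only approach can be sketched in a few lines. This is a minimal illustration and not the tool I actually wrote: the function name side_by_side_html and the sample strings are made up for the example, while difflib.HtmlDiff and its make_file method are part of Python's standard library.

```python
import difflib


def side_by_side_html(old_text: str, new_text: str,
                      old_name: str = "original",
                      new_name: str = "modified") -> str:
    """Return a complete HTML page containing a side-by-side diff table."""
    # HtmlDiff also marks intraline changes; wrapcolumn keeps long lines readable.
    differ = difflib.HtmlDiff(tabsize=4, wrapcolumn=80)
    return differ.make_file(
        old_text.splitlines(),
        new_text.splitlines(),
        fromdesc=old_name,
        todesc=new_name,
    )


# Hypothetical sample input, just to show the call.
page = side_by_side_html("alpha\nbeta\ngamma\n", "alpha\nbetta\ngamma\n")
print(len(page), "characters of HTML generated")
```

The result is a full html page, so it can be written to a file and opened in a browser directly. On long files with many differences this is exactly the variant that becomes slow.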

Using vim

The text editor vim has an option to visualize the diff of two files by calling it with the command line option "-d":
gvim -d file1.txt file2.txt
Vim highlights as the intraline difference everything between the common prefix and common suffix of a line. Depending on your needs, this may be too much of an approximation. In recent enough versions of vim, the complete diff can be rendered to html using the command
:TOhtml
This command will create a third buffer containing the html representation. Colors are determined by the active colorscheme, which can be changed using the command
:colorscheme <name>
The colorscheme only seems to have the expected effect if you use gvim instead of vim, although you can still experiment with plain "vim". The names of the available colorschemes can be found in the colors folder of the vim installation folder. By default the following colorschemes were installed on my system:
  • blue
  • darkblue
  • default
  • delek
  • desert
  • elflord
  • evening
  • koehler
  • morning
  • murphy
  • pablo
  • peachpuff
  • ron
  • shine
  • slate
  • torte
  • zellner
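To make the intraline approximation mentioned earlier in this section more concrete, here is a small sketch of the "everything between the common prefix and the common suffix" idea. It only illustrates the behaviour as I described it and is not vim's actual implementation; the function name changed_span is invented for the example.

```python
def changed_span(old_line: str, new_line: str) -> tuple[str, str]:
    """Return the parts of both lines between the common prefix and suffix."""
    limit = min(len(old_line), len(new_line))

    # Length of the common prefix.
    prefix = 0
    while prefix < limit and old_line[prefix] == new_line[prefix]:
        prefix += 1

    # Length of the common suffix, never overlapping the prefix.
    suffix = 0
    while (suffix < limit - prefix
           and old_line[len(old_line) - 1 - suffix]
           == new_line[len(new_line) - 1 - suffix]):
        suffix += 1

    return (old_line[prefix:len(old_line) - suffix],
            new_line[prefix:len(new_line) - suffix])


print(changed_span("a = foo(1, 2)", "a = bar(1, 3)"))
# prints ('foo(1, 2', 'bar(1, 3')
```

Note how the unchanged "(1, " in the middle ends up inside the highlighted region: two small edits on one line are reported as one big change.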

Automating the html generation using vim

All of the above consists of steps to perform manually, which is OK if you have to compare two files, but far from OK if you have to diff hundreds of files. So how can it be automated? Luckily vim has a few interesting command line options, one of which is the option "-c". Option "-c" allows you to pass a command that will be executed on startup, after the file(s) are loaded. You can pass more than one "-c" command. The following one-liner will load the files, select a colorscheme, generate the diff, save it as html to a file called test.html, and close vim (the three "q!" commands close the three windows: the html buffer and the two files):
gvim -d orig.txt modified.txt -c "colorscheme zellner" \
  -c TOhtml -c "w! test.html" -c q! -c q! -c q!
Note that if you intend to send a lot of commands to vim, you will want to use the command line option "-s", which lets you specify a script file whose contents are executed as if you had typed them (hence the leading ":" on each line). The previous command line then becomes
gvim -d orig.txt modified.txt -s commands.vim
and the contents of commands.vim are:
:colorscheme zellner
:TOhtml
:w! test.html
:q!
:q!
:q!
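Since the whole point is to diff hundreds of files, the one-liner can be generated by a small script. The sketch below only builds the argument lists. It assumes two directories containing files with the same names and paths without spaces or other characters that are special to vim; the function names are invented for the example.

```python
from pathlib import Path


def build_gvim_command(original: Path, modified: Path, output: Path,
                       colorscheme: str = "zellner") -> list[str]:
    """Build the argument list for one gvim diff-to-html run."""
    return [
        "gvim", "-d", str(original), str(modified),
        "-c", f"colorscheme {colorscheme}",
        "-c", "TOhtml",
        "-c", f"w! {output}",
        # Three windows are open now: the html buffer and both files.
        "-c", "q!", "-c", "q!", "-c", "q!",
    ]


def commands_for_directories(old_dir: Path, new_dir: Path,
                             out_dir: Path) -> list[list[str]]:
    """Pair up files that exist under the same name in both directories."""
    commands = []
    for original in sorted(old_dir.iterdir()):
        modified = new_dir / original.name
        if original.is_file() and modified.is_file():
            output = out_dir / (original.name + ".html")
            commands.append(build_gvim_command(original, modified, output))
    return commands
```

Each returned list can be handed to subprocess.run. Keep in mind that gvim normally detaches from the terminal, so you may need to add the "-f" (foreground) option if the script has to wait for each diff to finish.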

Generating html fragments to embed in a bigger page

Here's one approach to strip the <html> and <head> tags, and replace the <body> tag with a table. In short, replace the previous commands.vim file with this one:
:colorscheme zellner
:let g:html_use_css=0
:TOhtml
:%g/<body/normal k$dgg
:%s/<body\s*\(bgcolor="[^"]*"\)\s*text=\("[^"]*"\)\s*>/<table \1 cellPadding=0><tr><td><font color=\2>/
:%s#</body>\(.\|\n\)*</html>#\='</font></td></tr></table>'#i
:w! PATH/TO/diff.html
:q!
:q!
:q!
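If you'd rather keep the vim script simple, the same kind of fragment extraction can be done afterwards on the saved html file. The following is a rough sketch and not a drop-in replacement for the substitutions above: unlike those, it drops the bgcolor and text attributes of the <body> tag, and the function name body_as_table is made up for the example.

```python
import re


def body_as_table(page: str) -> str:
    """Return the contents of <body> wrapped in a single-cell table."""
    match = re.search(r"<body\b[^>]*>(.*)</body>", page,
                      flags=re.IGNORECASE | re.DOTALL)
    if match is None:
        raise ValueError("no <body>...</body> element found")
    inner = match.group(1).strip()
    return f'<table cellpadding="0"><tr><td>{inner}</td></tr></table>'
```

The returned string no longer contains <html>, <head> or <body> tags, so it can be pasted into a bigger page.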
Vim's TOhtml command takes a few options that allow you to influence the html generation. Some examples are listed below. Be sure to check out the vim documentation to find out about other useful options.
  • add
    -c "let g:html_use_css=0"
    to avoid generating css (for use with very old browsers, or to embed in emails);
  • add
    -c "let g:html_dynamic_folds=1"
    to generate html with css that allows dynamic folding of sections in the files (useful for source code)
  • add
    -c "let g:html_number_lines=1"
    to show line numbers
  • add
    -c "let g:html_no_pre=1"
    to wrap long lines instead of having scrollable columns
  • add
    -c "let g:html_use_xhtml=1"
    to generate XHTML instead of HTML
Needless to say, if you (ab)use vim in this way in a multi-user environment, you will have to choose a suitable .html filename (a unique name for each user/process) so that concurrent runs don't overwrite each other's output.
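One way to get such a unique name is to let the operating system pick it. A minimal sketch, assuming the calling script is written in Python; the function name unique_html_path and the "diff_" prefix are arbitrary choices for the example:

```python
import os
import tempfile
from typing import Optional


def unique_html_path(directory: Optional[str] = None) -> str:
    """Create an empty, uniquely named .html file and return its path."""
    fd, path = tempfile.mkstemp(prefix="diff_", suffix=".html", dir=directory)
    os.close(fd)  # vim will overwrite the file later with ":w!"
    return path
```

The returned path can then be used in the ":w!" command instead of a fixed name like test.html.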

Using vim in server mode

Here's another neat trick in case you want to avoid starting up vim over and over again... You can start vim in server mode, meaning that it listens for remote commands, as follows:
gvim --servername DIFFSERVER
DIFFSERVER is just a name I chose. It's an id that is used to identify the correct vim instance that should receive your commands. After you've started vim in server mode you can start a different console (or keep using the same one...) and send commands to the vim server. The keys are sent as if you had typed them, so each command has to end with <CR> to be executed (in an interactive bash session, use single quotes instead of double quotes so that "!" is not treated as history expansion):
vim --servername DIFFSERVER --remote-send ":e PATH/TO/file1.txt<CR>"
vim --servername DIFFSERVER --remote-send ":vert diffsplit PATH/TO/file2.txt<CR>"
vim --servername DIFFSERVER --remote-send ":colorscheme zellner<CR>"
vim --servername DIFFSERVER --remote-send ":let g:html_use_css=1<CR>"
vim --servername DIFFSERVER --remote-send ":TOhtml<CR>"
vim --servername DIFFSERVER --remote-send ":w! PATH/TO/diff.html<CR>"
vim --servername DIFFSERVER --remote-send ":q!<CR>"
vim --servername DIFFSERVER --remote-send ":q<CR>"
For now it seems to work reasonably fast even on large files with many differences (except when you use dynamic folding). Diff to html? Piece of cake! ;)
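The sequence of remote commands is easy to generate from a script as well. The sketch below only builds the argument lists, one per command, and assumes that every string sent with --remote-send has to end in <CR> for vim to execute it. The function name is invented, and the paths are assumed to contain no spaces.

```python
def remote_send_commands(server: str, file1: str, file2: str,
                         output: str) -> list[list[str]]:
    """Build one 'vim --remote-send' argument list per Ex command."""
    ex_commands = [
        f":e {file1}",
        f":vert diffsplit {file2}",
        ":colorscheme zellner",
        ":let g:html_use_css=1",
        ":TOhtml",
        f":w! {output}",
        ":q!",
        ":q",
    ]
    # <CR> is vim's key notation for Enter; without it the command is only typed.
    return [["vim", "--servername", server, "--remote-send", command + "<CR>"]
            for command in ex_commands]


for args in remote_send_commands("DIFFSERVER", "file1.txt", "file2.txt",
                                 "diff.html"):
    print(" ".join(args))
```

Passing each list to subprocess.run avoids any shell quoting issues, because the arguments never go through a shell.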

3 comments:

  1. Thanks! This was just what I was looking for. Helped me a lot :)

  2. Thanks alot, Stefaan!

  3. Nice - did not know you could export like this - good post!
