Monday, September 19, 2011

Generate side-by-side diffs in html using vim

Problem

Given two text files, generate a visually appealing diff between them. Do so in html, so it can be visualized in a web browser.

Some approaches

Searching the web

The internet has surprisingly few flexible standalone tools that generate html diffs and fulfill all my requirements. My requirements include:
  • generate a side-by-side diff
  • allow saving to html without user intervention
  • work fast on large files with lots of differences
This overview is by no means meant to be exhaustive:
  • lots of diff tools exist, but to the best of my knowledge none of them can generate html.
  • diff-match-patch doesn't seem to generate the layout I'd like to see.
  • diff2html.py and derived tools won't generate intraline differences.
  • csdiff only works on Windows (though in all honesty it generates diffs faster than the vim approach described in this article, and its html is more user-friendly because it includes navigation between differences).

Rolling your own

In the past I have rolled my own solution in python, using two different approaches:
  • Using python's difflib. This is by far the easiest solution, but it can be very slow for long files with a lot of differences between the two files under comparison.
  • Using a combination of GNU diff and python's difflib. I used GNU diff to generate a rough diff, then used python's difflib to generate the intraline differences. This was already a lot faster than using python's difflib alone, but still too slow on very large files with many differences. This approach does have the advantage of being very flexible: the tool can generate both side-by-side and line-by-line diffs, with or without intraline differences indicated.
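For reference, the pure-difflib approach can be sketched with the standard library's difflib.HtmlDiff class, which renders a side-by-side table and marks intraline changes. This is a minimal sketch, not the tool described above; the function name side_by_side_html is illustrative, and, as noted, this approach can be slow on long files with many differences.

```python
import difflib


def side_by_side_html(orig_path, modified_path, context=False, numlines=5):
    """Return a complete html page with a side-by-side diff of two text files.

    context=True shows only changed regions plus numlines lines around them.
    """
    with open(orig_path, encoding="utf-8") as f:
        orig_lines = f.readlines()
    with open(modified_path, encoding="utf-8") as f:
        modified_lines = f.readlines()
    # HtmlDiff marks added, deleted and changed lines, and within changed
    # lines it highlights the characters that differ.
    differ = difflib.HtmlDiff(wrapcolumn=80)
    return differ.make_file(orig_lines, modified_lines,
                            fromdesc=orig_path, todesc=modified_path,
                            context=context, numlines=numlines)
```

Writing the returned string to a file gives a page that can be opened directly in a browser.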

Using vim

The text editor vim has an option to visualize the diff of two files by calling it with the command line option "-d":
gvim -d file1.txt file2.txt
As the intraline difference, vim highlights everything between the common prefix and the common suffix of a line. Depending on your needs, this may be too much of an approximation. In recent enough versions of vim, the complete diff can be rendered to html using the command
:TOhtml
This command will create a third buffer containing the html representation. Colors are determined by the active colorscheme, which can be changed using the command
:colorscheme <name>
The colorscheme only seems to have the expected effect if you use gvim instead of vim, but you can experiment with using "vim" instead of "gvim". The names of the available colorschemes can be found in the colors folder of the vim installation folder. By default, the following colorschemes were installed on my system:
  • blue
  • darkblue
  • default
  • delek
  • desert
  • elflord
  • evening
  • koehler
  • morning
  • murphy
  • pablo
  • peachpuff
  • ron
  • shine
  • slate
  • torte
  • zellner
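The intraline approximation described above can be made concrete with a short sketch: strip the longest common prefix and the longest common suffix, and whatever remains on each side is what gets highlighted. This Python function illustrates that rule; it is not vim's source code, and the names intraline_span and mark are hypothetical.

```python
def intraline_span(old, new):
    """Return (start, old_end, new_end) such that old[start:old_end] and
    new[start:new_end] are what remains after removing the common prefix
    and the common suffix of the two lines."""
    # Longest common prefix.
    start = 0
    limit = min(len(old), len(new))
    while start < limit and old[start] == new[start]:
        start += 1
    # Longest common suffix that does not overlap the prefix.
    old_end, new_end = len(old), len(new)
    while old_end > start and new_end > start and old[old_end - 1] == new[new_end - 1]:
        old_end -= 1
        new_end -= 1
    return start, old_end, new_end


def mark(line, start, end):
    """Wrap line[start:end] in brackets to visualise the highlighted part."""
    return line[:start] + "[" + line[start:end] + "]" + line[end:]
```

For example, comparing "int x = 1; // one" with "int y = 1; // two" highlights everything from "x" to the end of the line, because the lines also differ in their last character; the unchanged "= 1; // " in between gets highlighted as well.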

Automating the html generation using vim

All of the above consists of steps to perform manually, which is fine if you have to compare two files, but far from fine if you have to diff hundreds of files. So how can it be automated? Luckily vim has a few interesting command line options, one of which is "-c". Option "-c" passes a command that is executed on startup, after the file(s) are loaded. You can pass more than one "-c" command (vim accepts up to ten). The following one-liner loads the files, selects a colorscheme, generates the diff, saves it as html to a file called test.html, and closes vim (one :q! per window: the two diffed files and the html buffer):
gvim -d orig.txt modified.txt -c "colorscheme zellner" \
  -c TOhtml -c "w! test.html" -c q! -c q! -c q!
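To run the one-liner over many file pairs, a small driver script can build the argument lists. The sketch below assumes gvim is on the PATH; it adds gvim's -f (foreground) flag, which is not in the one-liner, so that each gvim run finishes before the next one starts instead of detaching from the script. The function names gvim_diff_command and diff_all are hypothetical.

```python
import shutil
import subprocess


def gvim_diff_command(orig, modified, out_html, colorscheme="zellner"):
    """Build the argument list for the gvim one-liner, plus -f."""
    return [
        "gvim", "-f", "-d", orig, modified,  # -f: stay in the foreground
        "-c", f"colorscheme {colorscheme}",
        "-c", "TOhtml",
        "-c", f"w! {out_html}",
        # One :q! per window: the two diffed files plus the html buffer.
        "-c", "q!", "-c", "q!", "-c", "q!",
    ]


def diff_all(pairs, colorscheme="zellner"):
    """Run gvim once per (orig, modified, out_html) triple, in sequence."""
    if shutil.which("gvim") is None:
        raise RuntimeError("gvim not found on PATH")
    for orig, modified, out_html in pairs:
        subprocess.run(gvim_diff_command(orig, modified, out_html, colorscheme),
                       check=True)
```

Passing the arguments as a list avoids shell quoting issues with file names.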
If you intend to send a lot of commands to vim, you will want to use the command line option "-s" instead, which reads the given script file and processes its contents as if you had typed them. The previous command line then becomes
gvim -d orig.txt modified.txt -s commands.vim
and the contents of commands.vim are:
:colorscheme zellner
:TOhtml
:w! test.html
:q!
:q!
:q!

Generating html fragments to embed in a bigger page

Here's one approach to strip the <html> and <head> tags, and replace the <body> tag with a table. In short, replace the previous commands.vim file with this one:
:colorscheme zellner
:let g:html_use_css=0
:TOhtml
:%g/<body/normal k$dgg
:%s/<body\s*\(bgcolor="[^"]*"\)\s*text=\("[^"]*"\)\s*>/<table \1 cellPadding=0><tr><td><font color=\2>/
:%s#</body>\(.\|\n\)*</html>#\='</font></td></tr></table>'#i
:w! PATH/TO/diff.html
:q!
:q!
:q!
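If you'd rather post-process the page outside vim, the same transformation can be sketched in Python with regular expressions. This sketch assumes the page was generated with g:html_use_css=0, so that the <body> tag carries bgcolor and text attributes, as the vim substitution above expects; html_fragment is a hypothetical helper name.

```python
import re

# Same idea as the :s command above: capture the bgcolor attribute and the
# quoted value of the text attribute from the <body> tag.
BODY_TAG = re.compile(r'<body\s*(bgcolor="[^"]*")\s*text=("[^"]*")\s*>', re.IGNORECASE)
BODY_END = re.compile(r"</body>", re.IGNORECASE)


def html_fragment(page):
    """Drop everything up to <body>, replace <body> with a table cell, and
    drop </body></html>, so the diff can be embedded in a bigger page."""
    start = BODY_TAG.search(page)
    if start is None:
        raise ValueError("no <body bgcolor=... text=...> tag; was g:html_use_css=0 set?")
    end = BODY_END.search(page, start.end())
    inner = page[start.end():end.start() if end else len(page)]
    bgcolor, text = start.group(1), start.group(2)
    return (f"<table {bgcolor} cellPadding=0><tr><td><font color={text}>"
            + inner + "</font></td></tr></table>")
```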
Vim's TOhtml command takes a few options that allow you to influence the html generation. Some examples are listed below. Be sure to check out the vim documentation to find out about other useful options.
  • add
    -c "let g:html_use_css=0"
    to avoid generating css (for very old browsers, or to embed in emails);
  • add
    -c "let g:html_dynamic_folds=1"
    to generate html with css that allows dynamic folding of sections in the files (useful for source code)
  • add
    -c "let g:html_number_lines=1"
    to show line numbers
  • add
    -c "let g:html_no_pre=1"
    to wrap long lines instead of having scrollable columns
  • add
    -c "let g:html_use_xhtml=1"
    to generate XHTML instead of HTML
Needless to say, if you (ab)use vim in this way in a multi-user environment, you will have to choose a suitable .html filename (a unique name for each user/process) to keep concurrent runs from overwriting each other's output.
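One way to get a unique name per process is to let the operating system pick it. The sketch below (hypothetical helper make_job_files) uses Python's tempfile.mkstemp to create a unique .html path and a matching commands file in the format shown earlier, optionally with TOhtml settings such as html_number_lines. It assumes the temporary directory's path contains no spaces, since the path is written unescaped into the :w! command.

```python
import os
import tempfile


def make_job_files(colorscheme="zellner", options=None, workdir=None):
    """Create a unique output path and a matching commands file for one run.

    Returns (script_path, html_path). mkstemp creates each file atomically
    with a unique name, so concurrent processes cannot collide.
    Assumption: the temp directory path contains no spaces.
    """
    fd, html_path = tempfile.mkstemp(suffix=".html", prefix="diff-", dir=workdir)
    os.close(fd)  # vim overwrites this file via :w!
    lines = [f":colorscheme {colorscheme}"]
    for name, value in (options or {}).items():
        # e.g. {"html_number_lines": 1} -> :let g:html_number_lines=1
        lines.append(f":let g:{name}={value}")
    lines += [":TOhtml", f":w! {html_path}", ":q!", ":q!", ":q!"]
    fd, script_path = tempfile.mkstemp(suffix=".vim", prefix="commands-", dir=workdir)
    with os.fdopen(fd, "w", encoding="utf-8") as script:
        script.write("\n".join(lines) + "\n")
    return script_path, html_path
```

Each process then runs gvim -d orig.txt modified.txt -s with its own returned script path, and reads the result from its own returned html path.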

Using vim in server mode

Here's another neat trick in case you want to avoid starting up vim over and over again: you can start vim in server mode, meaning that it listens for remote commands (this requires a vim built with the +clientserver feature), as follows:
gvim --servername DIFFSERVER
DIFFSERVER is just a name I chose: an id that identifies the vim instance that should receive your commands. After you've started vim in server mode, you can open a different console (or keep using the same one) and send commands to the vim server. Each command ends in <CR>, vim's key notation for Enter; without it, the command is only typed on vim's command line and never executed:
vim --servername DIFFSERVER --remote-send ":e PATH/TO/file1.txt<CR>"
vim --servername DIFFSERVER --remote-send ":vert diffsplit PATH/TO/file2.txt<CR>"
vim --servername DIFFSERVER --remote-send ":colorscheme zellner<CR>"
vim --servername DIFFSERVER --remote-send ":let g:html_use_css=1<CR>"
vim --servername DIFFSERVER --remote-send ":TOhtml<CR>"
vim --servername DIFFSERVER --remote-send ":w! PATH/TO/diff.html<CR>"
vim --servername DIFFSERVER --remote-send ":q!<CR>"
vim --servername DIFFSERVER --remote-send ":q<CR>"
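The same sequence can be driven from a script. This Python sketch builds one --remote-send argument list per command; it assumes vim was built with client-server support and that a server with the given name is already running. Prefixing each command with <C-\><C-N> (vim's key notation for returning to Normal mode) is an extra safety measure not used in the commands above, and the function names are hypothetical.

```python
import subprocess


def remote_send_commands(server, file1, file2, out_html, colorscheme="zellner"):
    """Return one 'vim --remote-send' argument list per ex command."""
    ex_commands = [
        f"e {file1}",
        f"vert diffsplit {file2}",
        f"colorscheme {colorscheme}",
        "let g:html_use_css=1",
        "TOhtml",
        f"w! {out_html}",
        "q!",  # closes the html window
        "q",   # closes one diff window; the server keeps running in the other
    ]
    # <C-\><C-N> forces Normal mode first; <CR> executes the typed command.
    return [["vim", "--servername", server, "--remote-send", f"<C-\\><C-N>:{cmd}<CR>"]
            for cmd in ex_commands]


def run_remote(server, file1, file2, out_html):
    """Send the commands in order; requires a running vim server."""
    for argv in remote_send_commands(server, file1, file2, out_html):
        subprocess.run(argv, check=True)
```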
For now it seems to work reasonably fast even on large files with many differences (except when you use dynamic folding). Diff to html? Piece of cake! ;)

Sunday, September 4, 2011

Dependency graph visualization

The problem

For real software projects, graphviz comes up with an incredible mess of interconnected nodes that is nearly impossible to navigate. What can be done?

Specialized graphviz browser

I serendipitously found the ZGRViewer tool, implemented in Java, which has features aimed directly at visualizing and browsing large graphviz graphs. The most useful features, for the time being, are only available in the unreleased SVN version, though. The version currently available for download is somewhat useful, but I'm really waiting for version 0.9.0 to become available. It looks promising, but it remains to be seen whether it will actually let us draw any conclusions from inspecting the graph. (For now, formulating prolog queries is a much more powerful way to find out interesting facts about the source code under test.)

Different layout engine: circos

Not long after finding ZGRViewer, I found a potentially interesting alternative to graphviz in the form of circos. Circos has an interesting approach to visualizing large data tables (refer to their website for details). It was rather easy to add some functionality to pycdep's prolog template that lets us export a set of dependencies as an adjacency matrix, which can then be visualized using the circos tableviewer utility script. Circos offers a huge number of customization possibilities, and I haven't really delved into them yet. The drawing on top of this blog post is the result of running circos with all default options on the STAF/STAX source code that I've been using in previous blog posts about pycdep. I'm certain I haven't even scratched the surface of what is possible with this fascinating tool.
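For illustration, turning a list of dependencies into an adjacency matrix is straightforward. This Python sketch is not pycdep's prolog template: it counts (source, target) pairs and writes a whitespace-separated table with one header row of module names and one row per module. The "data" label in the top-left cell is an assumption, so check the circos tableviewer documentation for the exact input format it expects. The function name adjacency_table is hypothetical.

```python
from collections import defaultdict


def adjacency_table(edges):
    """Turn (source, target) dependency pairs into an adjacency matrix.

    Cell [src][dst] holds the number of times src depends on dst.
    The 'data' header label is an assumption about tableviewer's format.
    """
    counts = defaultdict(int)
    modules = set()
    for src, dst in edges:
        counts[(src, dst)] += 1
        modules.update((src, dst))
    labels = sorted(modules)
    rows = ["data " + " ".join(labels)]
    for src in labels:
        rows.append(src + " " + " ".join(str(counts[(src, dst)]) for dst in labels))
    return "\n".join(rows) + "\n"
```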

Friday, September 2, 2011

Making a json-rpc request from swi-prolog to some other program.

Picture courtesy of http://vitaminsea.typepad.com/vitaminsea/2007/10/index.html

Posting JSON-RPC requests from a prolog program

This tutorial basically does the opposite of the previous blog entry: we will make a prolog client that talks to a python JSON-RPC service.

Creating the python server

There are several ways to create a python JSON-RPC server. One is to get the jsonrpclib library; another is to use a web application framework with JSON-RPC support, by far the easiest to learn and use being web2py: just unpack the source code, run python web2py.py, and you can start immediately. No configuration or additional dependencies are needed. In web2py, you click a single button to create a simple application named simple_json_server (or any name you like ;) ), then click one button to edit the controller and add the code that exposes your functions as a JSON-RPC service. And that's it! Now we have a fully functioning JSON-RPC server living at URL 'http://127.0.0.1:5000/simple_json_server/default/call/jsonrpc'.
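If you'd like a test server without installing web2py, a minimal JSON-RPC endpoint can be sketched with Python's standard library alone. This is not web2py's service code: it answers POST requests on any path with replies in the JSON-RPC 1.0 shape (result, error, id), exposes a hypothetical add method, and listens on port 5000 to match the URL above. The error codes are borrowed from JSON-RPC 2.0.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def add(a, b):
    """Hypothetical example method; replace with whatever you want to expose."""
    return a + b


METHODS = {"add": add}


def handle_jsonrpc(body):
    """Decode one JSON-RPC request (bytes) and return the reply as a dict."""
    request = json.loads(body)
    reply = {"id": request.get("id"), "result": None, "error": None}
    method = METHODS.get(request.get("method"))
    params = request.get("params", [])
    if method is None:
        reply["error"] = {"code": -32601, "message": "Method not found"}
        return reply
    try:
        reply["result"] = method(**params) if isinstance(params, dict) else method(*params)
    except Exception as exc:  # report failures to the client instead of crashing
        reply["error"] = {"code": -32000, "message": str(exc)}
    return reply


class JsonRpcHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        reply = json.dumps(handle_jsonrpc(self.rfile.read(length))).encode("utf-8")
        self.send_response(200)
        # Advertise JSON explicitly so clients parse the reply as JSON.
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)


def serve(port=5000):
    """Block forever, serving JSON-RPC requests on 127.0.0.1:port."""
    HTTPServer(("127.0.0.1", port), JsonRpcHandler).serve_forever()
```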

Creating the prolog client

Creating the prolog client is just as simple.
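As a reference point for what goes over the wire, and for testing the server independently of prolog, here is a Python sketch of an equivalent client. It posts a request in the JSON-RPC 1.0 shape (method, params, id) to the URL above; build_request and call are hypothetical names, and the method you call is assumed to exist on the server.

```python
import json
import urllib.request

URL = "http://127.0.0.1:5000/simple_json_server/default/call/jsonrpc"


def build_request(method, params, request_id=1):
    """Serialize a JSON-RPC 1.0 style request body."""
    payload = {"method": method, "params": params, "id": request_id}
    return json.dumps(payload).encode("utf-8")


def call(method, params, url=URL, request_id=1):
    """POST one request and return the decoded reply as a dict."""
    request = urllib.request.Request(
        url,
        data=build_request(method, params, request_id),  # data= makes it a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```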

Practical issues when using JSON-RPC from prolog

Talking to different JSON-RPC servers may result in receiving different replies for the same question:

  • The ordering of the terms in the json([...]) reply can vary from server to server. Swi-prolog chose to optimize for speed and doesn't provide any tools to cope with such variations, assuming that a given server always returns the terms in the same order.
  • Servers that serve JSON-RPC 1.1 are allowed to leave out the "jsonrpc"="1.1" from their reply. Apparently they can also add "error"="null" to indicate that no error occurred.
So in real-life usage of JSON-RPC we have to deal with varying ordering of fields, and with optional fields. For JSON-RPC requests without nested parameters, I found it beneficial to convert the JSON-RPC reply to an association list first, from which then all fields we're interested in can be retrieved and processed.


One way to convert a simple json reply to an association list is to first turn its Key=Value terms into Key-Value pairs, the form that list_to_assoc/2 expects. While converting, we can also do some processing, such as removing null fields, or converting @true to true and @false to false. From the association list it is then easy to get the values we're interested in, in the order we want them.
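The same normalization idea can be sketched in Python for comparison. Passing object_pairs_hook=list to json.loads keeps each JSON object as a list of (key, value) pairs in the server's order, much like prolog's json([Key=Value, ...]) term; the sketch then drops null fields and looks fields up by name, so the field order no longer matters. The function names are hypothetical.

```python
import json


def parse_pairs(text):
    """Decode a JSON reply, keeping each object as a list of (key, value)
    pairs in the order the server sent them, like prolog's json([...])."""
    return json.loads(text, object_pairs_hook=list)


def reply_to_assoc(pairs):
    """Build a lookup table from (key, value) pairs, dropping null fields
    such as "error": null that some servers add when no error occurred."""
    return {key: value for key, value in pairs if value is not None}


def unpack_reply(text):
    """Return (result, error) no matter how the server ordered the fields
    or which optional fields it included."""
    assoc = reply_to_assoc(parse_pairs(text))
    return assoc.get("result"), assoc.get("error")
```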

Note that failing to configure the correct content type when interacting with the server will cause the reply to show up as a prolog atom instead of a parsed prolog representation. A bit to my surprise, web2py responded with the content type "'text/html'; charset=utf-8" instead of the expected "application/json". (Edited to add: this was a bug in web2py versions <= 1.98.2; the development version now returns "application/json; charset=utf-8".) To make sure that such a reply is parsed as JSON, we can make the parser believe that its content type denotes valid JSON: http_json:json_type("'text/html'; charset=utf-8"), or in the case of web2py > 1.98.2: http_json:json_type("application/json; charset=utf-8"). Perhaps the swi-prolog json parser could be made a little smarter and ignore the charset definition. (Edited to add: this was solved after a bug report, and the fix should be available in swi-prolog versions > 5.11.26.)
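Ignoring the charset amounts to comparing only the media type of the Content-Type header. A Python sketch of that check (is_json_content_type is a hypothetical name, not part of swi-prolog) strips parameters such as charset and any stray quotes before comparing, and accepts a list of extra types to treat as JSON, which is the same idea as the json_type workaround above.

```python
def is_json_content_type(header, accepted=("application/json",)):
    """True if the Content-Type header's media type is in `accepted`.

    Parameters such as "; charset=utf-8" are ignored, and stray quotes
    around the media type (as in "'text/html'; charset=utf-8") are removed.
    """
    media_type = header.split(";", 1)[0].strip().strip("'\"").lower()
    return media_type in accepted
```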