Saturday, March 24, 2012

Syntax highlighting stax scripts in vim

What is STAF/STAX?

STAF is the Software Testing Automation Framework. It provides all the basic infrastructure for creating robust distributed test environments. It is structured as a collection of services that can be called from the command line, from a UI, or from your favourite programming language. These services implement things like "start a process on pc xxx and get the return code", "copy files from pc xxx to pc yyy", synchronization primitives, sharing of variables between pcs, sending mails, and so on.

STAF is a very impressive piece of software. Don't let the outdated Java look and feel of some of the UI tools that come with it fool you. Everything can be driven from the command line; using the Java UIs is optional, but they offer lots of out-of-the-box functionality that I wouldn't want to program myself if I can avoid it. Probably the quickest way to get started writing a test harness is to use the STAX service. Lots of excellent documentation and tutorials on STAF/STAX are available from the official STAF/STAX website, but I found it beneficial to first read this introduction by Grig Gheorghiu. STAX offers an XML-based language with embedded Jython fragments (Jython is an implementation of Python that runs on the Java virtual machine).

To be honest, being based on XML doesn't make STAX the most programmer-friendly language in existence (the value of what it provides compensates, to a certain extent, for the discomfort of using it). To make it somewhat friendlier, good syntax highlighting is indispensable. The highlighting is a bit complicated because STAX really combines two rather different languages in one: on the one hand there is the XML that defines the high-level control flow of the test harness, on the other hand there is the embedded Jython code.

If you understand why creating a syntax highlighting definition for a language like STAX could be a challenge, you might appreciate the elegance with which the problem can be solved in the world's most powerful programmer's editor: vim. Vim offers ways to define syntax highlighting in different modes (if you know vim, the concept of modes should not come as a surprise ;) ). The easiest way to syntax highlight STAX programs in vim is to reuse the existing syntax highlighting definitions for XML and Python, and let vim switch between them when it encounters certain regular expressions.

Syntax highlighting code

Put the following code in a file called stax.vim in vim's syntax directory (e.g. ~/.vim/syntax/stax.vim):

" Start from the standard XML syntax highlighting.
runtime! syntax/xml.vim
unlet! b:current_syntax
" Load the Python syntax definitions into the @Python cluster.
syntax include @Python syntax/python.vim
" Everything between <script> and </script> is highlighted as Python.
syntax region pythoncode matchgroup=pythongroup start=+<script>+ keepend end=+</script>+ contains=@Python
" Show the <script> delimiters themselves like ordinary XML tags.
hi link pythongroup xmlTag

On my Linux system, in the .vimrc file, I added

au BufNewFile,BufRead *.stax set filetype=stax

to activate the syntax highlighting for all files that have a .stax file extension. On a Windows system, I had to add the above line to "Program Files"\Vim\vim73\filetype.vim instead.

With the syntax highlighting definition provided here, STAX becomes a bit more programmer friendly. (Note: I posted the same code some time ago on the STAF/STAX mailing list.)

Friday, December 16, 2011

Application specific software metrics

Image available under creative commons license from http://www.flickr.com/photos/enrevanche/2991963069/

How to make developers like software metrics

The problem

Managers like metrics. Almost every software project beyond a certain size is characterized by metrics. According to Wikipedia,

A software metric is a measure of some property of a piece of software or its specifications. <...> The goal is obtaining objective, reproducible and quantifiable measurements, which may have numerous valuable applications in schedule and budget planning, cost estimation, quality assurance testing, software debugging, software performance optimization, and optimal personnel task assignments.

Typical OO software metrics include things like "lines of code", "number of modules per 1000 lines of code", "number of comments", "McCabe complexity", "coupling", and "fan-in" or "fan-out" of classes. The problem with many such general metrics is that they describe (aspects of) the state of your software, but they don't tell you how to go about improving it. At most they give vague hints, if any at all. I would hope it's clear to experienced developers that adding lines of code is not something to actively strive for, unless you get a bonus for each line you write. Does a high fan-in mean high coupling, or rather good code reuse? Does a high number of comments automatically imply they are relevant and up to date? It could just as well be a sign that your codebase is so hard to understand that you need to include half a manual with each line of code.

An alternative

What can be done to make sure that developers like metrics too? In my opinion, we have to carefully craft our metrics so that they fulfill three basic properties:
  • Single number: it should be possible to summarize each metric in a single, meaningful number, and for a given metric a higher number must consistently mean a better (or consistently a worse) result.
  • Concrete action: for each deterioration it must be unambiguously clear how to go about improving it.
  • Automatable: the metrics must be easy (and preferably very cheap) to calculate automatically, so that each developer can be warned automatically when a commit to the version control system makes a metric significantly worse (or rewarded with a positive report when it improves).

Does this sound like Utopia, something that surely requires extremely expensive tools? Think again: with minimal effort one can already obtain some very useful application-specific metrics.

Examples

Here I list some possible metrics (I have actually implemented all of these and more as a "hobby" project at my day job):

Counting regular expressions

A lot of useful metrics can be built by counting occurrences of regular expressions. Examples (a small Python sketch follows the list):
  • Counting deprecated constructs: while introducing a new framework in a significant piece of software, there will always be a period where your software contains a mixture of old code and new framework code. Count the calls to the old API; anyone adding new calls to the old API is warned automatically to use the new API instead.
  • Counting conditions: if you are removing "if (predicate) doSomething;" statements and replacing them with polymorphism, count how often the old predicates are called. Anyone who adds new calls to a predicate can be warned automatically to use the polymorphic approach instead.
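
As an illustration of this kind of metric, here is a minimal Python sketch. The regular expression (a hypothetical old API whose calls all look like oldApi::something(...)), the file extensions and the baseline handling are placeholders to adapt to your own codebase:

#!/usr/bin/env python
"""Count occurrences of a deprecated construct in a source tree.

A minimal sketch: the pattern, extensions and baseline are placeholders.
"""
import os
import re
import sys

# Hypothetical old API: every deprecated call matches this pattern.
DEPRECATED = re.compile(r'\boldApi::\w+\s*\(')

def count_matches(root, extensions=('.cpp', '.h')):
    total = 0
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(extensions):
                with open(os.path.join(dirpath, name), errors='ignore') as f:
                    total += len(DEPRECATED.findall(f.read()))
    return total

if __name__ == '__main__':
    root = sys.argv[1] if len(sys.argv) > 1 else '.'
    baseline = int(sys.argv[2]) if len(sys.argv) > 2 else 0
    current = count_matches(root)
    print('deprecated calls: %d (baseline: %d)' % (current, baseline))
    # A non-zero exit status lets a commit hook warn the committer.
    sys.exit(1 if current > baseline else 0)

Run from a commit hook, a count that rises above the stored baseline is what triggers the warning to the committer.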

Monitoring dependencies between projects and coupling between classes

If your software has a layered structure, you will typically have constraints about which projects are allowed to include headers from which other projects. Count and list all violations by analyzing your include dependencies. I also use include dependencies to get a rough estimate of the coupling between classes and of the impact of adding or removing #includes (by calculating how many extra statements have to be compiled as a result of the new #include). (Shameless self-plug: you can use my FOSS pycdep tool for this.)
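
For illustration, here is a minimal sketch of such a layering check. The layer names (ui, logic, core) and the rule table are hypothetical, and the sketch assumes the top-level directory of a file names its layer; a real include-dependency analysis (as pycdep does) is more involved:

#!/usr/bin/env python
"""Flag #include statements that violate the allowed layering.

A minimal sketch: layer names and the rule table below are hypothetical.
"""
import os
import re
import sys

INCLUDE = re.compile(r'^\s*#include\s+["<]([^">]+)[">]', re.MULTILINE)

# Hypothetical rule table: each layer may only include from the layers listed.
ALLOWED = {
    'ui':    {'ui', 'logic', 'core'},
    'logic': {'logic', 'core'},
    'core':  {'core'},
}

def layer_of(path):
    # Assume the first directory component names the layer, e.g. ui/main.cpp.
    return path.replace('\\', '/').split('/')[0]

def violations(root):
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(('.cpp', '.h')):
                continue
            path = os.path.relpath(os.path.join(dirpath, name), root)
            source_layer = layer_of(path)
            if source_layer not in ALLOWED:
                continue
            with open(os.path.join(root, path), errors='ignore') as f:
                for included in INCLUDE.findall(f.read()):
                    target_layer = layer_of(included)
                    if target_layer in ALLOWED and target_layer not in ALLOWED[source_layer]:
                        yield path, included

if __name__ == '__main__':
    root = sys.argv[1] if len(sys.argv) > 1 else '.'
    bad = list(violations(root))
    for source, included in bad:
        print('%s includes %s: layering violation' % (source, included))
    sys.exit(1 if bad else 0)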

Monitoring McCabe complexity

If you add new "if" statements, you can be warned automatically about the increased complexity. This can be automated using a tool like SourceMonitor, which has command line options that allow you to bypass its GUI and integrate it into your own flow.
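
SourceMonitor does the real measurement; purely to illustrate how cheap a first approximation can be, here is a rough sketch that counts branching keywords and boolean operators per file as a crude proxy for McCabe complexity (the keyword list is an assumption, not what SourceMonitor computes):

#!/usr/bin/env python
"""Very rough complexity estimate: count decision points per file."""
import re
import sys

DECISION = re.compile(r'\b(?:if|for|while|case|catch)\b|&&|\|\|')

def complexity(path):
    with open(path, errors='ignore') as f:
        # 1 + number of decision points, in the spirit of cyclomatic complexity.
        return 1 + len(DECISION.findall(f.read()))

if __name__ == '__main__':
    for path in sys.argv[1:]:
        print('%s: %d' % (path, complexity(path)))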

Unit tests

Check the results of your unit tests after each commit. Anyone breaking one or more tests is warned automatically and asked to fix them.
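
As a sketch of how such a check might hang together, assuming a hypothetical test command and leaving out the mailing part:

#!/usr/bin/env python
"""Run the test suite after a commit and report any failures.

A minimal sketch: the test command is a placeholder, and the report is
just printed here; in a real setup it would be mailed to the committer.
"""
import subprocess
import sys

# Placeholder test runner; replace with whatever your project uses.
TEST_COMMAND = ['python', '-m', 'pytest', '-q']

def main():
    result = subprocess.run(TEST_COMMAND, capture_output=True, text=True)
    if result.returncode != 0:
        print('Unit tests failed after this commit:')
        print(result.stdout[-2000:])  # tail of the test output
    else:
        print('All unit tests pass.')
    return result.returncode

if __name__ == '__main__':
    sys.exit(main())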

Using it in real life

Of course, no one is stopping you from adding some machinery on top of these basic tools: run the metrics automatically on every commit, preferably generating incremental results (i.e. the metrics should measure what changed compared to the previous commit, so you get a clear idea of the impact of your changes), generate diffs, distribute the work over different computers and collect the results using a suitable framework, and make the reports available through a web application created in an easy-to-use web application framework.

In my day job I have set up two such systems running in parallel. The first system sends one email a day, summarizing all changes in all metrics compared to yesterday's version of the software (or, for some metrics, the changes since the start of the current sprint). The comparison is done by comparing the metric tools' reports with reference reports. Reference reports have to be updated explicitly (via the web front-end), annotated with a reason and a rating (improvement/deterioration/status quo). All team members get this report, so if someone did a really good job of cleaning up code, everyone in the team becomes aware of it (and if they did a really lousy job, there might be some social pressure to get it right ;) ). The second system calculates incremental metrics per commit and sends email reports to the committer only, but makes the reports available to interested viewers on the intranet (together with author, commit message and revision number).
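
To make the comparison against reference reports concrete, here is a minimal sketch, assuming (purely for illustration) that each report is a simple mapping from metric name to a number stored as JSON, and that lower numbers are better (violations, deprecated calls, complexity, ...); the real system compares full tool reports:

#!/usr/bin/env python
"""Compare a fresh metrics report with a reference report."""
import json
import sys

def compare(reference, current):
    for name in sorted(set(reference) | set(current)):
        old, new = reference.get(name, 0), current.get(name, 0)
        if new < old:
            verdict = 'improvement'
        elif new > old:
            verdict = 'deterioration'
        else:
            verdict = 'status quo'
        yield '%-40s %8s -> %-8s %s' % (name, old, new, verdict)

if __name__ == '__main__':
    with open(sys.argv[1]) as f:
        reference = json.load(f)
    with open(sys.argv[2]) as f:
        current = json.load(f)
    print('\n'.join(compare(reference, current)))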

Although such a system can sound scary (I named it "Big Brother"), in practice we only use it to improve both code and team quality, and an anonymous poll showed that, without exception, every developer liked it (no one wants to dive into someone else's lousy code :) ). Reports with significant changes (good or bad) are discussed by the team at the daily standup meeting; they can reveal misunderstandings about the architecture of the code, or result in ideas for organizing training sessions.

Caution

Relying solely on such metrics can give a false sense of software quality: one only improves what is measured. Code review by experienced team members is a good addition to all of the above.

If you, dear reader, have ideas for other metrics, or remarks about the contents of this or other blog entries, feel free to comment.