
Friday, December 16, 2011

Application specific software metrics

Image available under creative commons license from http://www.flickr.com/photos/enrevanche/2991963069/

How to make developers like software metrics

The problem

Managers like metrics. Almost every software project beyond a certain size is characterized by metrics. According to Wikipedia:

A software metric is a measure of some property of a piece of software or its specifications. <...> The goal is obtaining objective, reproducible and quantifiable measurements, which may have numerous valuable applications in schedule and budget planning, cost estimation, quality assurance testing, software debugging, software performance optimization, and optimal personnel task assignments.

Typical OO software metrics include things like "lines of code", "number of modules per 1000 lines of code", "number of comments", "McCabe complexity", "coupling", and "fan-in" or "fan-out" of classes. The problem with many such general metrics is that they describe (aspects of) the state of your software, but they don't tell you how to go about improving it. At most they give vague hints, if any at all: I would hope it's clear to experienced developers that adding lines of code is not something to actively strive for, unless you get a bonus for each line you write. Does a high fan-in mean high coupling, or rather good code reuse? Does a high number of comments automatically imply they are relevant and up to date? It could just as well be a sign that your codebase is so hard to understand that half a manual has to be included with each line of code.

An alternative

What can be done to make sure that developers like metrics too? In my opinion, we have to carefully craft our metrics so that they fulfill three basic properties:
  • Single number: it should be possible to summarize each metric in a single, meaningful number, and for any given metric a higher number must consistently mean either a better or a worse result.
  • Concrete action: for each deterioration it must be unambiguously clear how to go about improving it.
  • Automatable: the metrics must be easy (and preferably very cheap) to calculate automatically. Each developer can be warned automatically if he made a metric significantly worse (or rewarded with a positive report if she improved it) after committing changes to the version control system.

You think this sounds like Utopia and probably requires extremely expensive tools? Think again! With minimal effort one can already attain some very useful application-specific metrics.

Examples

Here I list some possible metrics (actually I have implemented all of these and more as a "hobby" project at my day job):

Counting regular expressions

A lot of useful metrics can be built by counting occurrences of regular expressions. Examples:
  • Counting deprecated constructs: While introducing a new framework in a significant piece of software, there will always be a period where your software features a mixture of old code and new framework code. Count the API calls of the old code. Anyone adding new calls to the old API is warned to use the new API instead (a minimal sketch of such a counter follows this list).
  • Counting conditions: if you are removing "if (predicate) doSomething;" statements and replacing them with polymorphism, count how often the old predicates are called. Anyone who adds new calls to the predicate can be warned automatically to use the new mechanism instead.
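
Such a counter needs little more than the standard library. Here is a minimal sketch, assuming the deprecated calls can be matched with an illustrative pattern like oldApi:: and that the baseline count from the previous commit is stored elsewhere:

    import re
    from pathlib import Path

    # Hypothetical pattern for calls to the deprecated API; adapt it to your codebase.
    DEPRECATED_CALL = re.compile(r"\boldApi::\w+\s*\(")

    def count_deprecated_calls(source_root):
        """Count occurrences of the deprecated API pattern in all C++ sources."""
        total = 0
        for path in Path(source_root).rglob("*.cpp"):
            total += len(DEPRECATED_CALL.findall(path.read_text(errors="ignore")))
        return total

    if __name__ == "__main__":
        baseline = 120  # count stored for the previous commit (illustrative value)
        current = count_deprecated_calls("src")
        if current > baseline:
            print(f"Warning: {current - baseline} new call(s) to the old API; please use the new API.")
        elif current < baseline:
            print(f"Well done: {baseline - current} deprecated call(s) removed.")

In the real setup the baseline comes from the previous commit's report and the warning is mailed to the committer automatically.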

Monitoring dependencies between projects and coupling between classes

If your software has a layered structure, you will typically have constraints about which projects are allowed to include headers from which other projects. Count and list all violations by analyzing your include dependencies. I also use include dependencies to get a rough estimate of the coupling between classes and of the impact of adding or removing #includes (by calculating how many extra statements will have to be compiled as a result of a new #include). (Shameless self-plug: you can use my FOSS pycdep tool for this.)
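
A rough sketch of such a layering check, assuming one directory per project and a hand-written table of allowed dependencies (the project names and include pattern below are purely illustrative; pycdep does a much more thorough job):

    import re
    from pathlib import Path

    # Which projects each project may include from (illustrative layering rules).
    ALLOWED = {
        "gui":   {"gui", "logic", "util"},
        "logic": {"logic", "util"},
        "util":  {"util"},
    }

    # Matches e.g.: #include "util/string.h"  -> captures "util"
    INCLUDE = re.compile(r'#include\s+"(\w+)/')

    def layering_violations(source_root):
        """Yield (file, included_project) pairs that break the layering rules."""
        for project, allowed in ALLOWED.items():
            for path in (Path(source_root) / project).rglob("*"):
                if path.suffix not in {".h", ".cpp"}:
                    continue
                for included in INCLUDE.findall(path.read_text(errors="ignore")):
                    if included not in allowed:
                        yield path, included

    for path, included in layering_violations("src"):
        print(f"{path}: illegal include from project '{included}'")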

Monitoring McCabe complexity

If you add new "if" statements, you can be warned automatically about the increased complexity. This can be automated using a tool like SourceMonitor, which has command line options that allow you to bypass its GUI and integrate it into your own flow.
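
As a tool-independent illustration of the idea, a crude decision-point count per file (a rough proxy for McCabe complexity, not a replacement for what SourceMonitor reports) could look like this:

    import re
    from pathlib import Path

    # Tokens that add a decision point: a crude proxy for cyclomatic complexity.
    DECISION_POINT = re.compile(r"\b(?:if|for|while|case|catch)\b|&&|\|\|")

    def complexity_proxy(path):
        """Return 1 + the number of decision points found in the file."""
        return 1 + len(DECISION_POINT.findall(path.read_text(errors="ignore")))

    # Report the ten most complex files so increases stand out in the daily diff.
    files = sorted(Path("src").rglob("*.cpp"), key=complexity_proxy, reverse=True)
    for path in files[:10]:
        print(f"{complexity_proxy(path):5d}  {path}")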

Unit tests

Check the results of your unit tests after each commit. Anyone breaking one or more tests is automatically asked to fix them.
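
A post-commit check can be as simple as running the test suite and reporting a non-zero exit code back to the committer. A minimal sketch, assuming a standard unittest layout with a tests/ directory:

    import subprocess
    import sys

    def run_tests():
        """Run the project's unit tests; assumes a standard 'tests' directory."""
        return subprocess.run(
            [sys.executable, "-m", "unittest", "discover", "-s", "tests"],
            capture_output=True, text=True)

    result = run_tests()
    if result.returncode != 0:
        # In the real setup this message would be e-mailed to the committer.
        print("Unit tests are failing after your commit, please fix them:")
        print(result.stderr)
    else:
        print("All unit tests still pass.")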

Using it in real life

Of course, no one is stopping you from adding some machinery on top of these basic tools: run the metrics automatically on every commit, preferably producing incremental results (i.e. the metrics measure what changed compared to the previous commit, so you get a clear idea of the impact of your changes); generate diffs; distribute the work over different computers and collect the results using some suitable framework; and make the reports available through a web application created in an easy-to-use web application framework.

In my day job I have set up two such systems running in parallel. The first system sends one email a day, summarizing all changes in all metrics compared to yesterday's version of the software (or, for some metrics, the changes that took place since the start of the current sprint). The comparison is made against reference reports, which have to be updated explicitly (via the web front-end) and annotated with a reason and a rating (improvement/deterioration/status quo). All team members get this report, so if someone did a really good job of cleaning up code, everyone in the team becomes aware of it (and if they did a really lousy job, there might be some social pressure to get it right ;) ). The second system calculates incremental metrics per commit and sends email reports to the committer only, but makes the reports available to interested viewers on the intranet (together with author, commit message and revision number).
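
The comparison against a reference report essentially boils down to diffing two sets of numbers. A simplified sketch, assuming each report is a mapping from metric name to a value and that lower is better for every metric shown:

    def compare_reports(reference, current):
        """Classify each metric as improved, deteriorated or unchanged.

        Both arguments map metric names to numbers; lower is assumed to be better.
        """
        changes = {}
        for name in sorted(set(reference) | set(current)):
            old, new = reference.get(name), current.get(name)
            if old is None or new is None or old == new:
                changes[name] = "unchanged or new"
            elif new < old:
                changes[name] = f"improved ({old} -> {new})"
            else:
                changes[name] = f"deteriorated ({old} -> {new})"
        return changes

    # Illustrative daily summary against the reference report.
    reference = {"deprecated_api_calls": 120, "layering_violations": 3}
    current = {"deprecated_api_calls": 112, "layering_violations": 5}
    for metric, verdict in compare_reports(reference, current).items():
        print(f"{metric}: {verdict}")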

Although such a system may sound scary (I named it "Big Brother"), in practice we only use it to improve both code and team quality, and an anonymous poll showed that, without exception, every developer liked it (no one wants to dive into someone else's lousy code :) ). Reports with significant changes (good or bad) are discussed by the team at the daily standup meeting, and can uncover misunderstandings about the architecture of the code, or result in ideas for organizing training sessions.

Caution

Relying solely on such metrics can give a false, fuzzy feeling of software quality: one only improves what is measured. Code review by experienced team members is a good addition to all of the above.

If you, dear reader, have ideas for other metrics, or remarks about the contents of this or other blog entries, feel free to comment.

Friday, September 2, 2011

Making a JSON-RPC request from SWI-Prolog to some other program.

Picture courtesy of http://vitaminsea.typepad.com/vitaminsea/2007/10/index.html

Posting JSON-RPC requests from a Prolog program

This tutorial does the opposite of the previous blog entry: we will write a Prolog client that talks to a Python JSON-RPC service.

Creating the python server

There are several options for creating a Python JSON-RPC server. One is the jsonrpclib library; another is a web application framework with JSON-RPC support, and the easiest to learn and use by far is web2py: just unpack the source code, run python web2py.py, and you can start immediately; no configuration or additional dependencies needed. In web2py, you click a single button to create a simple application named simple_json_server (or any name you like ;) ), then click another button to edit the controller code and add the JSON-RPC service definition (a minimal example is shown below). And that's it: we now have a fully functioning JSON-RPC server living at the URL 'http://127.0.0.1:5000/simple_json_server/default/call/jsonrpc'.
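
The controller code itself is only a few lines. A minimal sketch of such a controller, using web2py's built-in Service tools (the add function is just an illustrative example method):

    # controllers/default.py of the simple_json_server application
    from gluon.tools import Service

    service = Service()

    @service.jsonrpc
    def add(a, b):
        """Illustrative example method exposed over JSON-RPC."""
        return a + b

    def call():
        """Dispatches requests sent to /simple_json_server/default/call/jsonrpc."""
        return service()

The call action exposes every method decorated with @service.jsonrpc at .../default/call/jsonrpc, which is exactly the URL mentioned above.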

Creating the Prolog client

Creating the Prolog client is just as simple.

Practical issues when using JSON-RPC from Prolog

Talking to different JSON-RPC servers may result in receiving different replies for the same question:

  • The ordering of the terms in the json([...]) reply can vary from server to server. SWI-Prolog chose to optimize for speed and doesn't provide any tools to cope with such variations, assuming that a given server will always return its terms in the same order.
  • Servers that serve JSON-RPC 1.1 are allowed to leave out the "jsonrpc"="1.1" from their reply. They can apparently also add "error"="null" to indicate that no error occurred.
So in real-life usage of JSON-RPC we have to deal with varying field ordering and with optional fields. For JSON-RPC requests without nested parameters, I found it beneficial to convert the JSON-RPC reply to an association list first, from which all the fields we're interested in can then be retrieved and processed.


One way to go about converting a simple JSON reply to an association list is to walk the json([...]) term and turn it into a list of Key-Value pairs. While converting the list into a form suitable for building the association list, we can also do some processing, like removing null fields, or converting @true to true and @false to false. From the association list it is then easy to get the values we're interested in, in the order we want to receive them.

Note that failing to configure a correct content type when interacting with the server will cause your reply to show up as a Prolog atom instead of a parsed Prolog representation. A bit to my surprise, web2py responded with the content type "'text/html'; charset=utf-8" instead of the expected "application/json". (Edited to add: this was a bug in web2py versions <= 1.98.2; the development version now returns "application/json; charset=utf-8".) To make sure a reply with content type 'text/html; charset=utf-8' is still parsed as JSON, we can tell the parser that this is a valid JSON content type: http_json:json_type("'text/html'; charset=utf-8"), or in the case of web2py > 1.98.2: http_json:json_type("application/json; charset=utf-8"). Perhaps the SWI-Prolog JSON parser could be made a little bit smarter and simply ignore the charset definition. (Edited to add: this was solved after a bug report and should be available in SWI-Prolog versions > 5.11.26.)