Sunday, April 1, 2012

Mutable state considered harmful?

Image by Török Gábor (nyuhuhuu) on Flickr

Musings on mutable state

In this video, Misko Hevery argues that global state is bad from a testability point of view. Global state resides in global variables and singletons. It makes an API deceptive: when you repeat the same calls you may get different results, because the results depend not only on the object's state but also on global state, which is not obvious to someone reading the code.

After seeing that video it occurred to me that the exact same argument applies to mutable data members in a class. They are a kind of "globally available state inside a given object instance" that can make the API deceptive. Consider a TestClass whose "Display" method changes the internal state of the instance, so that every new invocation of the same API gives a different result. So the arguments Misko Hevery uses seem valid, but applying them rigorously has deep implications: it seems to me that they really argue against side-effects and thus in favor of purely functional programming.
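
A minimal sketch of what such a class might look like. This is a hypothetical reconstruction, not the original listing: the call counter and the returned string are assumptions chosen only to make the effect visible.

```cpp
#include <string>

// Hypothetical stand-in for the TestClass discussed above.
class TestClass {
public:
    // Non-const method: each call silently changes the object,
    // so identical calls return different results.
    std::string Display() {
        ++callCount_;
        return "Display called " + std::to_string(callCount_) + " time(s)";
    }

private:
    int callCount_ = 0;  // mutable data member: hidden state inside the instance
};
```

Calling Display() twice on the same instance yields two different strings, even though nothing in the call site suggests that anything has changed.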

By now I have seen a fair share of hard-to-trace bugs caused by side-effects of method calls, so it struck me that perhaps it is possible to find some compromise between mutable state and purely functional programming. It's really just that: an idea resulting from a brainstorm. I haven't actually used it in production, as I fear it may have significant performance implications. If you have additional insights into the matter, feel free to comment.

Proposal: Forbid non-const methods.

How is this different from existing practices?

Difference with const correctness

Const correctness argues that one should prefer const methods and pass objects by reference or pointer to const, but it doesn't forbid non-const methods. In my proposal, non-const methods are forbidden (e.g. enforced by a suitable compiler). Note that in my proposal it is not forbidden to pass references to non-const objects as arguments. So objects cannot change themselves, but they can change the arguments they receive (in practice, this means local variables of plain old data types, as other objects only have const methods, which prevents them from being changed). The fact that you have to pass in the arguments as function parameters makes it obvious that the code might change them. This is a case of changing state that doesn't happen "behind the scenes", and therefore can be defended.
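
To make the rule concrete, here is a small sketch in today's C++. The Counter class and its addTo method are hypothetical names, and standard C++ does not enforce the "no non-const methods" rule; here it is followed only by convention.

```cpp
// Hypothetical class that follows the proposal: every method is const.
class Counter {
public:
    explicit Counter(int start) : value_(start) {}

    int value() const { return value_; }

    // A const method cannot modify value_, but it may write to a
    // non-const reference that the caller explicitly hands over.
    void addTo(int& total) const { total += value_; }

private:
    int value_;
};
```

A caller that writes c.addTo(total) has to name total at the call site, so the variable that may change is visible there, while c itself is guaranteed to stay the same.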

Difference with purely functional programming

In my proposal it is still allowed to have mutable local variables of plain old data types, which allows writing "for"-loops without having to resort to recursion. I have nothing against recursion per se, and in the proper context it can make code more concise and readable, but in many cases for-loops are simpler to understand, and more efficient in terms of run-time and memory usage (especially in languages like C++, which do not guarantee tail call elimination). Interestingly, the side-effects in constructors and destructors are what make tail call elimination very hard, if not impossible, in general in C++-like languages.
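
As a rough illustration of the trade-off, the two functions below compute the same sum. Both sumLoop and sumRecursive are made-up names for this sketch.

```cpp
#include <cstddef>
#include <vector>

// Loop version: the only mutable state is a local int.
int sumLoop(const std::vector<int>& values) {
    int total = 0;
    for (int v : values) {
        total += v;
    }
    return total;
}

// Recursive version: no mutable variables, but one call per element.
int sumRecursive(const std::vector<int>& values, std::size_t index = 0) {
    if (index == values.size()) {
        return 0;
    }
    return values[index] + sumRecursive(values, index + 1);
}
```

The loop version mutates nothing that is visible outside the function, which is why the proposal allows it.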

What can you do with immutable objects? Isn't the very point of defining an object to group together the data and the operations that operate on that data?

An object indeed groups data and operations, but nothing prevents the object from creating a new object as a result of performing operations on its own data (as opposed to modifying its own internal state). Instead of modifying objects, you can always create a new object with the desired values. The old object can either be thrown away, or kept around in case someone still needs it.
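
A minimal sketch of this style, using a hypothetical Point class whose movedBy method returns a new object instead of changing the existing one:

```cpp
// Hypothetical immutable value type: all methods are const.
class Point {
public:
    Point(int x, int y) : x_(x), y_(y) {}

    int x() const { return x_; }
    int y() const { return y_; }

    // Instead of modifying this point, build and return a new one.
    Point movedBy(int dx, int dy) const {
        return Point(x_ + dx, y_ + dy);
    }

private:
    int x_;
    int y_;
};
```

After Point b = a.movedBy(3, 4), the original a still holds its old coordinates, so any code that was relying on a is unaffected.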

But isn't this incredibly inefficient from a performance point of view?

On the one hand, there probably is some price to pay. On the other hand, one could probably fine-tune the run-time environment to make the creation of many similar objects cheaper (perhaps by keeping around and reusing discarded objects?). Add to that the fact that modern computers are fast enough to deliver reasonable performance with much more exotic ideas like dynamic typing. Let's not forget why we would use this system: to limit the impact of side-effects, and to have less trouble understanding how our code works. The advantage, should it work the way it is envisioned, would be more readable code, a reduced bug count, bugs that are easier to trace, and therefore increased programmer productivity.