Immutable classes, my new favourite methodology
September 8, 2014
As part of Grasshopper 2.0 development I’m trying to read up on good and modern coding practices. Since multi-threading is a design goal for GH2, I need to make sure that the foundations of the new code are thread-safe. There’s more than one way to make code thread-safe of course, but I’ve become exceedingly fond of the immutable types approach. It helps that Eric Lippert is advocating immutability; he typically knows what he’s on about.
There is more than one way to think about immutability, but here I’m concerned with write-once immutability, meaning that fields get assigned from within constructors and then are never allowed to be replaced with another value. The problem is that C# has no fundamental game plan when it comes to immutability and thus trying to use the concept in your own code means treading very, very carefully.
C++ does at a more fundamental level support a certain flavour of immutability by providing the const keyword. Const-ness can be assigned to methods, instances of classes, and structs. When you are given a const instance of a class, you are not allowed to change it in any way. Take for example the Rhino.Geometry.Curve class from RhinoCommon. Curve provides a lot of non-static methods, some of which change the state (ChangeClosedCurveSeam and MakeDeformable for example) whereas other methods do not (Fair and ExtendByLine for example, which both return new curves that represent the result of the operation). And of course there’s a whole slew of analysis methods that do not change the curve state, such as IsPlanar, IsLinear, and GetLength. If constness were a supported property in C#, methods like Fair and IsLinear would be marked as const methods, meaning they can be safely invoked on a const instance of Curve. The big benefit of the C++ approach is that you can share data and decide whether or not you trust someone to change your data using keywords, rather than runtime variables or multiple types. There is no need for a separate ConstCurve class which only exposes safe methods, and a derived Curve class which exposes the non-safe methods as well.
In C# the traditional way to mimic C++ constness is to use interfaces. The curve class implements the IConstCurve interface, and through this interface only safe methods are exposed. Now you can choose to either give someone a full-blown Curve instance and let them modify it to their heart’s content, or you can choose to give them the Curve instance but disguise it as an IConstCurve interface. The drawback of this approach is that it requires a lot of additional interfaces. The drawback of both the C# and C++ approaches is that they are predicated on trust. C++ constness is a compile-time property, not enforced during runtime. It’s possible to ‘cast away constness’ and start modifying instances you weren’t supposed to. Similarly, in C# you can cast the interface to the unprotected class and violate it as you please. I do not think this is a particularly big drawback; if some other programmer is willing to hack your code then all bets are off and you can safely and squarely blame her for any subsequent crashes.
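As a rough illustration of the interface approach, here is a minimal sketch. The names SimpleCurve, Scale, and ConstCurveDemo are invented for this example and are not part of RhinoCommon; IConstCurve is reduced to two hypothetical read-only members.

```csharp
using System;

// Hypothetical read-only view of a curve; only non-mutating members are exposed.
public interface IConstCurve
{
    double Length { get; }
    bool IsClosed { get; }
}

// Hypothetical, heavily simplified stand-in for a mutable curve class.
public sealed class SimpleCurve : IConstCurve
{
    public double Length { get; private set; }
    public bool IsClosed { get; private set; }

    public SimpleCurve(double length, bool isClosed)
    {
        Length = length;
        IsClosed = isClosed;
    }

    // Mutating method, deliberately absent from IConstCurve.
    public void Scale(double factor)
    {
        Length *= factor;
    }
}

public static class ConstCurveDemo
{
    // A well-behaved consumer can only read through the interface.
    public static double Measure(IConstCurve curve)
    {
        return curve.Length;
    }

    // A badly-behaved consumer casts the interface back to the class and mutates it.
    public static void Vandalise(IConstCurve curve)
    {
        var mutable = curve as SimpleCurve;
        if (mutable != null)
            mutable.Scale(2.0);
    }
}
```

Measure cannot change the curve without going out of its way, whereas Vandalise shows how little effort the cast takes. Nothing in the compiler or the runtime objects to it.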
So instead of choosing whom to trust with our precious data and risking our trust being taken advantage of, how about we go ultra-paranoid and don’t allow anyone to change any data, ever, anywhere? This approach is what I’ll call immutability and it’s not a novel concept. It’s long since been gospel that structs should always be immutable; unfortunately very few people seem to take this to heart. The Rectangle structure in System.Drawing or the Point3d, Plane, and Circle structs in RhinoCommon are fully mutable, which means I can write code like this:
    Point3d point = new Point3d(4,6,8);
    point.Transform(_rotation);
    point.X += 10.0;
It’s nice and compact code and there doesn’t seem to be anything wrong with doing it like this. What might it look like if Point3d was immutable instead?
    Point3d point = new Point3d(4,6,8);
    point = point.Transform(_rotation);
    point = new Point3d(point.X + 10.0, point.Y, point.Z);
Surely that’s worse! Line 2 is more complicated than before and line 3 is simply a monstrosity. Not only does it require more code, it also entails many more operations. It’ll be slower and less memory efficient. So why is this any better? Well, it depends on what you mean by “better”. If the code you’re writing is extremely performance critical and you can totally trust anyone who has access to your data, mutable classes and structs are probably better. But I’ve certainly never written code that falls into either of those categories, let alone both. And neither have you, unless you think you can somehow be trusted with your own code.
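For what it’s worth, the ugliness of line 3 can be softened somewhat with helper methods. The sketch below assumes a hypothetical ImmutablePoint struct (not the RhinoCommon Point3d) with readonly fields and an invented WithX method that returns a modified copy.

```csharp
using System;

// Hypothetical immutable point; this is NOT the RhinoCommon Point3d.
public struct ImmutablePoint
{
    // readonly fields can only be assigned inside a constructor.
    private readonly double _x;
    private readonly double _y;
    private readonly double _z;

    public ImmutablePoint(double x, double y, double z)
    {
        _x = x;
        _y = y;
        _z = z;
    }

    public double X { get { return _x; } }
    public double Y { get { return _y; } }
    public double Z { get { return _z; } }

    // Returns a copy with a different X; the original is left untouched.
    public ImmutablePoint WithX(double x)
    {
        return new ImmutablePoint(x, _y, _z);
    }

    // Returns a moved copy; the original is left untouched.
    public ImmutablePoint Translate(double dx, double dy, double dz)
    {
        return new ImmutablePoint(_x + dx, _y + dy, _z + dz);
    }
}
```

With such helpers the third line might read point = point.WithX(point.X + 10.0); which is still longer than the mutable version, but no longer a monstrosity.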
The problem with mutable types is that it’s really hard to know when they are mutated. Unless you have properly encapsulated everything and are either keeping track of changes* or raising events whenever states change**, there’s no way of knowing when your data changes and who’s doing it. Immutable types may well lead to uglier code, but the security and reassurance immutability provides vastly outweigh the drawbacks.
Let’s have a look at a famous immutable .NET framework class. If you write C# or VB code you’ve almost certainly used this type many times, perhaps even without realising it is in fact immutable. System.String, the core type used to represent text in .NET can never be changed once it’s constructed. Methods that seemingly modify the string class all return new strings instead:
    string text1 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    string text2 = text1.Replace("M", string.Empty);
    string text3 = text1.Substring(5, 3);
At no point does one call a method on String which modifies the text stored inside the string instance. This does sometimes confuse people, as they forget to store the return value of a string method, assuming it acts on the original string. I.e.:
    string text = "A B C D E F G ";
    text.Replace(" ", string.Empty);
does absolutely nothing as the return value of the Replace() method is not stored in any variable. The fact that C# allows this code is just another example of the fact that immutability isn’t really a core concept in C#***. I mentioned before that immutability may result in lower performance and increased memory usage, as it is no longer possible to modify individual fields of types. Instead one has to create a new instance of that type which reflects the change. This is a simplification which doesn’t always hold true. Let’s look at a few conceptual and practical advantages of immutable objects:
- Since no data ever changes, it is completely safe for multiple instances to share other instances as internal fields. For example imagine two instances of the Brep class, both of which share 5 out of 6 of their faces. If the BrepFace can be modified, then sharing such data becomes a risky undertaking.
- Since no data ever changes, computing cached results is much easier. One never has to worry that cached results become invalidated. It is quite expensive to compute the BoundingBox for a Brep object for example, so it makes sense to cache the results once computed. But if you don’t know who has access to the control-point positions of the surfaces that make up the Brep, your cached boundingbox may well become ‘stale’ at any moment without you being aware of it. This is an example of a performance increase associated with immutability.
- Memory efficiency can be -in some cases drastically- improved by having a single repository of common values. The framework String type does this by maintaining something called an ‘intern pool’. When the same string literal appears in 10,000 places in your code, the actual character sequence is only stored in memory once, and all 10,000 references point to that same memory. Strings built at runtime are not pooled automatically, but can be added to the pool explicitly via String.Intern.
- Immutable objects are always thread safe. You don’t have to worry about synchronizing them (which leads to decreased performance) or duplicating them (which leads to increased memory usage).
- … there are many more benefits …
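The caching and thread-safety points above can be sketched as follows. ImmutablePolyline is a hypothetical type made up for this example, reduced to one dimension for brevity; its Width property stands in for an expensive computation such as a bounding box. Lazy<T> from the framework is used here as one possible way to compute the value at most once, even when several threads ask for it at the same time.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical immutable shape (1D for brevity) that caches an expensive result.
public sealed class ImmutablePolyline
{
    private readonly double[] _xs;        // private copy, never exposed or modified
    private readonly Lazy<double> _width; // computed at most once, on first use

    public ImmutablePolyline(IEnumerable<double> xs)
    {
        if (xs == null) throw new ArgumentNullException("xs");

        // Defensive copy, so the caller cannot change our data afterwards.
        _xs = xs.ToArray();
        if (_xs.Length == 0) throw new ArgumentException("At least one coordinate is required.", "xs");

        // Lazy<T> is thread-safe by default; since _xs never changes,
        // the cached value can never become stale.
        _width = new Lazy<double>(() => _xs.Max() - _xs.Min());
    }

    public int Count { get { return _xs.Length; } }

    // Stand-in for an expensive query such as a bounding box.
    public double Width { get { return _width.Value; } }

    // 'Modification' yields a new instance; this one stays as it was.
    public ImmutablePolyline Append(double x)
    {
        return new ImmutablePolyline(_xs.Concat(new[] { x }));
    }
}
```

Because no method ever writes to _xs after construction, there is no need for invalidation logic or locks around the cached width. The price is the extra copy made by Append.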
Unfortunately as mentioned earlier, C# does not have special plumbing associated with (im)mutability. Many framework types (even structs!) are mutable and thus dangerous to use as building blocks of immutable types. Immutability (much like constness in C++) is something that has to be done from the bottom up and it has to be done right.
* Such as enumerable collections, for the purpose of detecting changes to the collection during foreach loops.
** Now there’s a lot of extra code and a huge performance impact if ever I saw one.
*** ReSharper does provide a warning for such an obvious mistake.