[.NET fundamentals] Classes vs. Structures
June 24, 2012
A lot of people who use Grasshopper will at some point be confronted with VB.NET or C#. Sometimes by choice, sometimes because someone else posts a script that does what they need. If you have no prior programming experience, this can be quite a shock, as neither VB.NET nor C# is a particularly low-threshold language. It is also possible that you have some experience with Rhino coding via RhinoScript or the old Rhino SDK. In both cases there are a lot of Object Oriented Programming concepts that will be new to you. In this series of posts called “.NET fundamentals” I’ll try and explain some of the basic concepts that one must understand in order to write code for RhinoCommon. Three of the most frequent stumbling blocks are:
- value-types vs. reference-types
- shared-methods vs. instance-methods
- interfaces and generic types
These words probably mean little to you if you’re just starting out, so I’ll try and refrain from using unexplained jargon in these posts. Also, since C# and VB almost uniformly use different words for all these concepts, I’m just going to stick to VB-speak. However, before we get going I’d like to make it clear that almost everything I’m about to tell you is not true. At least it’s not necessarily true. It is, however, a useful fiction that will make certain things easier to understand. But before we can start talking about any of the abstract stuff, we must first talk about memory.
If you’re running a modern version of Windows you’ll probably have somewhere between 2GB and 32GB of RAM stashed somewhere in your computer. However, the total amount of memory that Windows can address depends solely on the bit-depth of the operating system, not on how much RAM is installed, so if you start using more memory than you actually have, bytes will be stored on the hard drive instead of in RAM. This process is called paging and since access to the disk is much slower than access to RAM, it should be avoided if at all possible. If on the other hand you have more memory than you could possibly use, a lot of it will just sit there doing nothing.
So let’s say that we’re running a 32-bit Windows which has a total of 4GB of physical RAM. This equals a total of 34 billion, 359 million, 738 thousand and 368 bits. But even though a bit is the smallest possible unit of information, the smallest unit when we talk about computer memory is the byte, where a single byte contains 8 bits. So it makes more sense to say that you have a little under 4 billion, 300 million bytes at your disposal. Every single byte in the memory has a unique address, which means every single byte can be accessed individually. Instead of actual memory address notation, I’d like to use street addresses because they are more humanly readable, plus I have a very clever analogy lined up that will benefit greatly from this scheme.
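The bit and byte counts quoted above are easy to check with a few lines of VB. This is only a small sketch; the module name is made up for the example and 4GB is taken to mean 4 × 1024 × 1024 × 1024 bytes.

```vb
Imports System

Module MemoryArithmetic
  Sub Main()
    ' 4GB expressed in bytes. The L suffix forces 64-bit arithmetic,
    ' because the result is too large to fit into an Int32.
    Dim totalBytes As Long = 4L * 1024L * 1024L * 1024L
    Dim totalBits As Long = totalBytes * 8L

    Console.WriteLine(totalBytes) ' 4294967296
    Console.WriteLine(totalBits)  ' 34359738368

    ' A 32-bit address can take 2^32 different values, one for every byte in that 4GB.
    Console.WriteLine(CLng(UInt32.MaxValue) + 1L) ' 4294967296
  End Sub
End Module
```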
32-bit Windows by default allows every application that runs on it a maximum of 2GB of memory. However, this is just a promise. It doesn’t actually create a 2GB empty block of memory for every program to run in, because it could only create two such blocks before all memory runs out. So instead Windows just allocates a small amount of memory which is enough for the program to get started. Whenever the program exceeds that amount, a little bit more is allocated. This can continue until a program wants to use more than the preset limit of 2GB, at which point Windows will strangle it to death. Note that there is absolutely no reason why all the memory that is reserved for a specific program has to be contiguous in the hardware. It may be scattered all over the RAM and the disk, but as long as Windows is keeping track of it, the program will never be confronted with the fragmented nature of the memory allocation process. This is why program memory is often called virtual, because programs may think they have a nice 2GB contiguous space to do stuff in, but it’s all an illusion.
So let’s take a look at some actual code and see how it behaves from a memory point of view.
Dim valueOne As Int32 = 5
Dim valueTwo As Int32 = valueOne
valueTwo += 16
As far as programmatic logic is concerned, this is very simple indeed. All we do is define an integer and give it a value 5. Then define another integer, assign it whatever value the first integer happens to have and then increment by 16. So at the end of the code valueOne should equal 5, whereas valueTwo should equal 21. Now let’s break it down into atomic operations. The first thing that happens is that we declare a new variable called valueOne. This variable is of type Int32, where the “32” refers to the total number of bits (not bytes!) taken up by the integer. Int32 is just one of many types of integer in the .NET framework. 32 bits equals 4 bytes, so this one integer value will take up 4 consecutive memory addresses. When we declare this variable the system allots 4 adjacent bytes somewhere in memory, say 62 West Wallaby Street up to and including 65 West Wallaby Street.
At this point valueOne exists and it will have a value, namely whatever state the bits are in in this portion of memory. Unless something is totally borked, the only object with access to this particular section of memory is the valueOne variable. The algorithm that is responsible for allocating and assigning memory addresses to variables should never assign the same memory to more than one owner, otherwise mayhem will ensue.
Once a section of memory has been allotted, the next step is the assignment of the value 5 to this location in memory. This is just a matter of setting some of the bits in that region to 0 and others to 1.
The third step is very similar to the first step, except this time a different range of addresses will be awarded to a different variable, say High Street 4 to High Street 7. Note that it need not be in the same street, any valid address will do. Windows may choose to assign a different program French addresses, just to make sure the two programs never overlap in memory.
When we say that valueTwo = valueOne, the number 5 will be copied from West Wallaby Street into High Street. And since it was copied, any changes to either variable will not affect the other. Another important property of integers is that they always have some numeric value. Integers are not nullable as programmers like to say. If you declare a variable of type Int32 it will always have a value, be it zero, five, 16 or 1946298. After all, whatever the contents of those 4 addresses, they can always be interpreted as a number. Maybe it’s just gunk and the number is meaningless, but it will be a number nonetheless.
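To see the copy behaviour for yourself, the three lines from above can be wrapped in a small console program. This is a minimal sketch; the module name is made up, and Marshal.SizeOf is only used here to confirm that an Int32 takes up 4 bytes.

```vb
Imports System
Imports System.Runtime.InteropServices

Module ValueTypeCopy
  Sub Main()
    Dim valueOne As Int32 = 5
    Dim valueTwo As Int32 = valueOne ' the number 5 is copied into valueTwo's own 4 bytes
    valueTwo += 16                   ' only valueTwo changes

    Console.WriteLine(valueOne) ' 5
    Console.WriteLine(valueTwo) ' 21

    ' 32 bits equals 4 bytes.
    Console.WriteLine(Marshal.SizeOf(GetType(Int32))) ' 4
  End Sub
End Module
```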
The kind of behaviour described above only holds for value-types. VB.NET calls these types ‘structures’. When you look at the MSDN page for the System.Int32 type, you’ll find that the page is entitled “Int32 Structure”. Structures are fairly simple since they contain their own data. Most primitive types in .NET are implemented as structures: bytes, booleans, decimals, singles, doubles, guids, points and rectangles for example all behave exactly the same as integers.
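The same copy behaviour can be shown with a structure that holds more than one number. GridCell below is a hypothetical structure invented purely for this sketch, it is not part of .NET or RhinoCommon.

```vb
Imports System

Module StructureCopy
  ' Hypothetical structure, made up for this example.
  Structure GridCell
    Public Row As Integer
    Public Column As Integer
  End Structure

  Sub Main()
    Dim cellOne As GridCell
    cellOne.Row = 2
    cellOne.Column = 3

    Dim cellTwo As GridCell = cellOne ' both fields are copied into cellTwo
    cellTwo.Row = 10                  ' cellOne is not affected

    Console.WriteLine(cellOne.Row) ' 2
    Console.WriteLine(cellTwo.Row) ' 10
  End Sub
End Module
```

Just like with the integers, cellTwo carries its own data, so changing it leaves cellOne alone.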
The opposite of a value-type is a reference-type, or class. The vast majority of types in the .NET framework are classes rather than structures and they are stored very differently in memory. First, let’s rewrite the code using a reference-type.
Dim valueOne As New Font("Arial", 10)
Dim valueTwo As Font = valueOne
valueTwo.SizeInPoints = 16
Font is a class, not a structure, and therefore a variable of type Font doesn’t contain any data itself, it merely contains a memory address which links to the actual data. In C++, this kind of type is called a ‘pointer’, as it points to where the data is actually stored. So where is the Font data stored and what’s the benefit of not having it inside valueOne, like before?
VB.NET is a managed language. It means that there is some process which is responsible for allotting memory addresses when needed and releasing memory addresses when they stop being needed; in effect a memory manager. Languages such as C++ require that the programmer manage her own memory. Memory management is sometimes very tricky and it’s possible that you either start overwriting addresses that do not belong to you or you prevent others from using memory which no longer serves any purpose. The first kind of problem is usually called memory-corruption, the second memory-leak. VB.NET solves this problem (at a performance cost and memory overhead) by having a single manager which owns all class instances. This manager is called the Garbage Collector since that is its main task. Its job is to make sure that any instances that could not possibly be reached ever again by the program (i.e. ‘dead objects’) are removed from memory and their addresses re-used for data which is still alive. In order to accomplish this, the GC must own all class instances.
So when you declare a variable of type Font, it is initially null. Reference-types are nullable, meaning they can contain no data whatsoever. Not zero, zero is still something. We’re talking less than zero; nothing. When we declare a variable as New Font(), we instruct the GC to create a new Font instance and store it somewhere in the memory. Then, the address of where it was put will be stored inside the variable.
The upshot is that when we now assign one variable to another, i.e. valueTwo = valueOne, we’re not copying the font data, but only the font address. Now we have two variables pointing to the same font and therefore if we change one, it will also affect the other. The benefit of this is that we can use the same data in multiple places when we want to share it, and we don’t have to copy it all the time either, which can be quite an expensive operation if the data represents lots of bytes.
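Here is a sketch of the reference-type behaviour that you can actually run. One caveat: the properties of the real System.Drawing.Font class (SizeInPoints included) are read-only, so this example swaps in TextStyle, a hypothetical and deliberately mutable class that only exists for this illustration.

```vb
Imports System

Module ReferenceTypeCopy
  ' Hypothetical stand-in for Font, made up for this example.
  Class TextStyle
    Public Name As String
    Public SizeInPoints As Single

    Public Sub New(ByVal name As String, ByVal sizeInPoints As Single)
      Me.Name = name
      Me.SizeInPoints = sizeInPoints
    End Sub
  End Class

  Sub Main()
    Dim valueOne As TextStyle = Nothing    ' declared, but not pointing at anything yet
    Console.WriteLine(valueOne Is Nothing) ' True

    valueOne = New TextStyle("Arial", 10)  ' an instance is created, its address ends up in valueOne
    Dim valueTwo As TextStyle = valueOne   ' copies the address, not the data
    valueTwo.SizeInPoints = 16

    Console.WriteLine(valueOne.SizeInPoints) ' 16
    Console.WriteLine(valueTwo Is valueOne)  ' True
  End Sub
End Module
```

The Is operator compares addresses rather than contents, so the last line confirms that both variables lead to one and the same instance.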
RhinoCommon too consists of both value-types and reference-types. Small chunks of data are usually implemented as structures (points, vectors, intervals, planes, circles, transformations), whereas large and variable data is typically implemented as a class (curves, meshes, breps).
A good analogy may be to imagine variables as pieces of paper with information written on them. You can always copy the contents of one piece of paper onto another, and you’d end up with two disjoint instances of the same data. Changing one will not affect the other. Value-types are simply pieces of paper with the data itself written on them, whereas reference-types are pieces of paper with a code written on them. This code tells you in what drawer of the Garbage Collector cabinet to find your data. Copy such a piece of paper and you end up with two codes that both lead to the same drawer.
A good understanding of the difference between structures and classes is vital if you want to write VB/C# code without developing a severe mental disorder. Here are some useful links that discuss value and reference types in further detail:
- Using Classes and Structures in Visual Basic .NET (MSDN article)
- Value Types and Reference Types (MSDN article)
- A .NET primer on reference types and value types (TechRepublic article)
- C# From A to Z – Lesson 2: Value vs Reference Types (YouTube video series)