## What’s the point? [6]

### March 11, 2013

Fine, I lied. Post [5] wasn’t the last one after all. I’d like to propose one more optimization to the Quadtree/Octree algorithm as explained in the previous post.

As you may recall, spatial trees are clever because they [A] allow us to quickly —almost trivially— reject large portions of a dataset. The recursive nature of the spatial tree means that if we can reject a specific node, we can also reject all of its child-nodes. Furthermore, because the tree has variable density it is very well suited to handle unbalanced point collections.

We can reject a node in the spatial tree when the shortest distance from the search locus to the boundary of that node is already larger than the best answer we’ve found so far. From this it follows that the smaller the node region, the more quickly we can reject a node. Typically the regions of all child-nodes equal the region of the parent-node. However it’s only important to have a tree without gaps between nodes if we’re still in the process of inserting points into the tree. When we’re done adding points we can perform two fairly easy optimizations that will improve the search speed (though not enough to change the runtime-complexity).

Optimization number one involves shrinking every node region to contain exactly the points inside of it. Or —if the node contains child-nodes rather than points— shrinking the node region to contain exactly the regions of the child-nodes. By shrinking the node regions to the bare minimum we will increase the shortest distance from the search locus to the node region, which will —sometimes— result in earlier rejection.

Optimization number two involves simplifying the tree structure by removing nodes that contain exactly one child-node. We can ‘detach’ the single-child and plug it into the same slot as its parent used to occupy. By removing these pointless nodes we reduce the time spend traversing the tree.

The image above shows the same Quadtree we build in the previous post, but now the nodes are colour-coded to reflect what sort of optimization we can apply to them. Orange nodes are empty and can be removed entirely. Green nodes contain points whose collective bounding-box is smaller than the node region. By removing empty nodes and shrinking the remainder, we end up with a vastly reduced total structure.

Once you apply these optimizations to a tree structure it is no longer possible to insert more points, as they might fall in the gaps that now exist between adjacent nodes. Unless you’re willing to make your point insertion smart enough to grow node regions on demand of course.

Seriously, I’ll stop harping on about spatial trees now.

## What’s the point? [5]

### March 9, 2013

Welcome to the —for now— last post in this series on algorithmic optimization. We talked a bit about generic optimization, then we introduced a specific problem and explored different ways of speeding up finding the solution to that specific problem. We talked about brute-force searching and why it’s not an option when we’re dealing with large point collections. We discussed one dimensional sorting and spatial bucketing and the merits and drawbacks of each algorithm. It is now time to discuss the ultimate of closest-point-searching algorithms, the weapon of choice for the informed and discriminating programmer; *spatial trees*.

But let us start by briefly rehashing the logic of the square-grid-search algorithm and why it’s not a particularly good one. By creating a grid of adjacent cells and assigning each point ** p** in a collection

**to whatever cell contains it, we gain the ability to explore**

*C**in a localized fashion. Although we may not know exactly which two points are direct neighbours, we do know which two cells are direct neighbours and therefore all the points in these cells can be treated as being in the same neighbourhood. This works splendidly while single cells do not contain too many points and while there aren’t too many completely empty cells. For balanced point collections we just need to pick a clever grid size, but for unbalanced collections we are rightly and truly screwed:*

**C**If we pick a large grid cell size then too many points end up in the same bucket and we’ve converted a smart search algorithm into a brute-force algorithm. If we pick a small grid cell size then there will be loads of empty cells and iterating over them will retard the search process.

If at this point you’re thinking “*If brute-force search is a prohibiting factor in the case of grid cells with many points, why not put a smaller grid inside those cells?*” you can feel real good about yourself. In fact, once we establish that it’s allowed to put a smaller grid inside the cell of a bigger grid, we can simplify the whole grid notion and only support grids that are composed of 2×2 cells. To create a nested grid structure like this, we only need a very short algorithm:

- Draw a rectangle around all points in
.*C* - If the number of points in this rectangle exceeds
, then subdivide the rectangle into four smaller rectangles.*N* - Assign each of the points in the parent rectangle to the appropriate child rectangle, and then repeat steps 2 & 3 for each child rectangle.

The above algorithm is recursive and the grid structure it generates is no longer one where all cells have the same size. If points are more densely packed in one region of space, the grid-structure will subdivide more often and it will result in smaller cells. If points are completely absent from another region of space the grid-structure will not subdivide at all and we won’t be plagued by a plethora of empty grid cells. If we apply this logic to an unbalanced point collection, we should get something like this:

Level 0 represents the top-most level of the grid. The number of points inside the outer boundary is too high and the region is split into four quadrants. The lower right quadrant still has too many points so it too is split into four quadrants. Then the process repeats one more time giving us a third level of subdivision. Now we have a grid containing ten cells of varying size containing the entire point collection. In fact two of the cells contain no points and can be removed from the grid structure in order to reduce memory overhead. The entire structure also contains three cells which host nested grids.

A spatial data structure like this is typically referred to as a *tree* in programming because the recursive and fractal nature of the algorithm resembles the branching patterns of biological trees. There are many different flavours of spatial trees, some more suited for specific problems, some easier to implement, some optimized for point insertion or deletion. What I’ve drawn above is usually called a Quadtree (or Q-tree) because it always subdivides into quadrants. If we extend this notion to 3D space and we have to subdivide a single box volume into eight sub-volumes and we call the resulting structure an Octree instead.

So how does searching in a Quadtree work? Remember that in order to perform a fast closest point search, we need to reject as many points as possible, limiting the search to only those points which are serious candidates. In some ways searching a Quadtree is easier than searching a fixed grid, one of the reasons being that we don’t actually have any empty cells. The search algorithm might work roughly like this:

- Build a tree
from the point collection*T*.*C* - Find the cell
in**L**which is closest to the search point*T*.*h* - Find the point
in*p*which is closest to*L*using —once again— brute-force search.*h* - Start recursing to sibling and parent cells to see if they contain points closer still.
- Every cell that has a minimum distance to
which is already larger than the smallest distance we found so far can instantly be skipped, as can all child-cells within it.*h*

Step 5 is where the real magic happens. Spatial trees make it very easy to weed out enormous portions of ** C** that are not near

**. Adding points to an existing tree structure is also very easy, even if they are outside the outer bounds of the level 0 cell. All we need to do in those cases is insert another level underneath level 0 (which then by definition becomes level 1) in order to grow the reach of the tree. Every time we add such a bottom level we double the reach, so even inserting points which are really far away will not take intolerably long (unlike fixed grids which grow in a linear rather than exponential fashion).**

*h*As I mentioned in post #1, optimization is often a difficult process and code which is optimized often gains so much complexity as to become unreadable —at least without significant effort— by third parties. Only start optimizing code once you’ve determined beyond a shadow of a doubt where the bottleneck in your program is.

## What’s the point? [4]

### March 9, 2013

One of the main drawbacks of the adapted-sorted-brute-force search algorithm we developed in the previous post is that it only sorts the data along a single coordinate axis. For two dimensional data that means we’re ignoring half the information. For three dimensional data we’re ignoring twice as much information as we’re using to speed up the search. The more dimensions, the less efficient that searcher becomes.

The key to optimizing a Closest Point Search is to reject as many points as possible because measuring the distance between points takes time. It might not take a lot of time to compute the distance between two points, but if you want to find the nearest point in a collection of 10 million points for 1 million other points, you end up computing 10^{7} * 10^{6} = 10^{13} distances if you take the brute-force approach. Even if a single distance test only takes 5 nanoseconds it will still take 50,000 seconds, which is almost 14 hours. The adapted-sorted-brute-force (which wasn’t all that brute-force by the time we were done with it) managed to reject points because their distance measured along the sorting direction exceeded the best result we’ve found so far.

What we’d like to accomplish today is roughly the same, except we want to be able to reject points in all directions, not just one. Instead of sorting all the points, what we’ll do instead is put them in spatial buckets, and these buckets in turn will be defined in such a fashion as to allow very easy navigation to neighbouring buckets. In other words: a grid.

In the image above you can see the entire point collection represented by the black dots. Point ** h** is represented by the red cross and we’re looking for the black dot closest to

**. This search algorithm also requires a pre-process step, but instead of sorting all the points we’ll instead assign them to whatever grid square contains them. So instead of a single large collection of points (**

*h***), we end up with a grid structure where each grid cell contains a subset of**

*C***. This is a very cheap process as it only takes a single subtraction, multiplication and rounding operation per dimension per point*. It’s also very easy to multithread this pre-process as it belongs to a class of problems known as embarrassingly parallel. The algorithm —which we shall henceforth refer to as the**

*C**square-grid-search*algorithm— would work as follows:

- Create a grid of square cells that encompasses all of
.**C** - Assign all points in
to their respective grid cells.*C* - Find which cell
contains*r*(or which is closest to*h*if*h*is not within the grid bounds).**h** - Find the nearest point
in*p*to*r*using brute-force search.*h* - If
is empty, extend to search to all eight neighbouring cells and so on until at least one point is encountered.*r* - Find the collection of grid cells
which intersect with the search boundary (represented by the green circle).*R* - Iterate over all points in
using brute-force to see if any of them are closer than the early result.*R*

The algorithm is fairly straightforward and easy to implement, but —like adapted-sorted-brute-force— is has serious drawbacks. It will perform very well on balanced point collections, but the performance heavily depends on the size of the grid cells. Too small cells and many of them will be empty, causing us to spend a lot of time iterating over pointless** areas. Too big and they’ll contain a lot of points which will slow down the algorithm as the intra-cell search is brute-force.

If the collection is unbalanced then we’ll end up with many empty cells and some cells which contain many points. As long as the intra-cell search remains brute-force, *square-grid-search* will not be suitable for such cases.

There is a new drawback as well. It’s quite simple and very fast to insert points into a sorted collection without breaking the sort order. It is also very easy to insert points into a spatial bucketing grid,* provided the points are within the grid boundaries*. If you wish to add points outside of the existing grid boundary, the grid will have to be grown, which can be a computationally expensive operation***.

Up next we’ll examine ways to make spatial bucketing more effective for unbalanced —indeed, wildly unbalanced— point collections.

* Assuming the Grid only has rectangular cells. Square cells make the algorithm much easier to implement though, not to mention hexagonal or free-form cells.

** Bow down before the Master of Puns!

*** It is possible to use a data structure known as a sparse array to store the grid cells rather than a standard multi-dimensional array. This effectively solves the insertion problem, but it also makes navigation between neighbouring cells slightly slower.

## What’s the point? [3]

### March 9, 2013

Remember the *sorted brute-force* search algorithm we came up with last time? It attempted to outperform regular brute-force searching by sorting the points left to right and then skipping as many points as possible by deciding that the best possible answer has already been found. We also mentioned that sometimes it makes more sense to iterate from right-to-left, or top-to-bottom or bottom-to-top depending on the relative location of the search locus (** h**) with respect to the point collection (

**) as a whole.**

*C*But in many cases we have to work our way from one extreme of ** C** to the point nearest to

**before we can seriously start to omit points. If only we could start somewhere in the vicinity of**

*h***then the chances of finding the best possible answer quick will be much higher, which in turn means we can skip more points, which in turn means we’ll be done quicker. So an improved sorted brute-force search algorithm might look like:**

*h*- Sort the points in
by ascending x-coordinate (i.e. from left to right).*C* - Starting from a point
in*p*somewhere in the vicinity of*C*, iterate towards the right and compute the distance to*h*.**h** - If this distance is less than the previous smallest distance, remember this new distance as the smallest distance and remember
as the best result so far.*p* - If the x-coordinate of
is higher than the x-coordinate of**p**+ the current smallest distance then stop iterating.*h* - Starting from the point
in*p*that was in the vicinity of**C**, iterate towards the left and compute the distance to**h**.*h* - If this distance is less than the previous smallest distance, remember this new distance as the smallest distance and remember
as the best result so far.*p* - If the x-coordinate of
is lower than the x-coordinate of**p**– the current smallest distance then stop iterating.*h* - When we’re done iterating, we’ll have remembered the one nearest to
.**h**

The only difference between this adapted search algorithm and the original is that instead of iterating from one end of ** C** towards the other, we start somewhere in the middle, at a point that is already near

*. But how do we quickly find a point near*

**h****? Wasn’t that what we were trying to accomplish in the first place?**

*h*Well, not quite. We’re looking for the point closest to ** h**. But for this adapted algorithm to work, all we need is a point

*somewhere in the vicinity of*

*. As long as it’s not way off, we should be able to get some benefit out of our smarter iteration logic. So the trick now is to find this magical point in the vicinity of*

**h****(let’s call it**

*h***) and of course we need to do so quickly because we’re trying to safe time, not spend it.**

*p*_{base}There is a more rigid definition of * p_{base}* than “in the vicinity of” and it may give a clue as to how we’re supposed to find it quickly.

*can be defined as that point*

**p**_{base}**in**

*p**whose x-coordinate is most similar to the x-coordinate of*

**C****. In other words, for this initial guess, we’re happy looking only at the distribution of**

*h***in the x-direction, ignoring the fact that**

*C***is a two-dimensional point collection and thus also has a distribution in the y-direction. There is a fantastically clever and fast and easy way to find a value in a sorted collection that is close to some other value. It is called Binary Searching.**

*C*A binary search progresses by taking a series of decreasing steps towards the right answer. It only requires that the data it is stepping over is sorted. A major benefit of binary searching is that the runtime complexity is O(log N), which is a very expensive way of saying that as the collection gets larger, it doesn’t take up correspondingly more time. Imagine for example that we use binary search to find an item in a collection containing 100 items. It will take —on average— N milliseconds to complete. If we now apply the same algorithm to a collection ten times the size, i.e. 1000 values, it might only take 2×N milliseconds rather than 10×N. If we make the collection bigger still, say 10000 values, it will take only 4×N milliseconds rather than 100×N.

Put even simpler, binary search —and other algorithms with a logarithmic runtime complexity— take proportionally less time to complete big problems as opposed to small problems.

Let’s have a look at how a binary searcher might find ** p_{base}** in our point collection. The image below shows the points in C as grey circles, sorted from left to right. Note that the y-coordinate of each point —while drawn— is irrelevant. Below the points are drawn the four steps that a binary searcher performs in order to arrive at the conclusion that point #3 is closest to

*(*

**h****×**) if you only take the x-distribution into account. By a happy coincidence, point #3 is also the closest point in general, but that’s not necessarily the case.

A binary searcher always operates on the same basic principle, which is then recursively applied over and over until a solution is found. We know that * h* is somewhere in between the extremes of the entire collection. We can ascertain this by making sure that

*is to the right of the first point (0) and to the left of the last point (19). So we take the point halfway in between 0 and 19 and test whether it is to the left or to the right of*

**h***. That point happens to be #10 and it happens to be to the right of*

**h***. So now we know that the real answer must be somewhere between 0~10.*

**h**Once again we test the point halfway between 0 and 10, which is point #5. Same deal as before, 5 is to the right of * h* and we’ve now shrunk our search range to 0~5. If we look halfway between 0 and 5, we find point #2 which happens to be to the left of

*, so instead of adjusting the upper limit of the search range we now adjust the lower and end up with a range of 2~5. Repeat until you can move neither left nor right without making things worse.*

**h**As you can see, the search range is effectively cut in half with every step. So when you double the number of items, it only takes one extra step to complete the search. Quadruple the number of items and it only takes two additional steps. This is why binary search is a logarithmic algorithm.

We’ve now developed a closest point search algorithm which actually runs pretty fast, especially on balanced point collections. We’ve gone from brute-force, to sorted-brute-force, to adapted-sorted-brute-force which is hardly brute-force at all any more.

However there are still deep problems with the core algorithm and if we want to address them we’ll have to drastically rethink our approach. The main issue is that the sorting only happens along a single axis. With well balanced, two-dimensional points that isn’t too big a deal, but if you start dealing with three or more dimensions it becomes increasingly less efficient.

Up next, spatial buckets and how they can be used to generate a notion of locality.

## What’s the point? [2]

### March 8, 2013

In the previous post we discussed some generic optimization strategies that might be applied to software to speed things up. Now we’ll look instead at a very well-known problem to see how being a smart programmer is still better than having a faster computer. The problem we’ll tackle is known as Closest Point Search, and it is interesting both because it is very simple to understand and very simple to write a naive algorithm that solves the problem.

Imagine a large collection of two-dimensional points (* C*) strewn across a flat surface. For the time being we’ll assume that the furthest distance between any two points in the collection is finite and that the average distance from any point to its neighbours is roughly the same. Such a point collection is called

*balanced*. By contrast, an unbalanced collection would have very pronounced clusters where points group together. The further apart these clusters are the more unbalanced the collection is.

The question we want to answer is; “*Which point ( p) in this collection is closest to a location (h)?*“. A naive algorithm which would answer this question is the

*brute-force*implementation:

- For each point
in*p*, compute the distance to*C*.**h** - If this distance is less than the previous smallest distance, remember this new distance as the smallest distance and remember
as the best result so far.*p* - When we’re done iterating over all points, we’ll have remembered the point nearest to
.**h**

The problem with this algorithm is that it has to iterate over *all* points in ** C**. Which means that when there’s twice as many points, it’ll take twice as long. Ten times as many points, it’ll take ten times as long. It would be nice if we could somehow find a way to omit points from the search. If we can reject out of hand —say— half the points our algorithm will be twice as fast.

First however some bad news. If we’ll only ask the question “which is the point closest to ** h**?” once then there’s absolutely nothing we can do*. We can only start to optimize the search if we’re certain that there will be

*multiple*closest point searches performed on the same —or at least a very similar— collection.

So let’s consider an optimization to our brute-force search algorithm. We’ll add a pre-process step and also an escape clause:

- Sort the points in
by ascending x-coordinate (i.e. from left to right).*C* - For each point
in*p*, compute the distance to*C*.**h** - If this distance is less than the previous smallest distance, remember this new distance as the smallest distance and remember
as the best result so far.*p* - If the x-coordinate of
is higher than the x-coordinate of**p**+ the current smallest distance then stop iterating.*h* - When we’re done iterating, we’ll have remembered the one nearest to
.**h**

The trick here is the sorting. Since we know we’re now iterating from left to right, we also know that every new point we test will have an x-coordinate that is equal to or higher than the point we just tested. So once we find that the distance between this x-coordinate and the x-coordinate of ** h** is larger than the smallest distance we’ve found so far**, we know for a fact that all remaining points cannot possibly yield a better answer then the one we already have. We can thus reject those from the search.

The image above provides a snap-shot of the search algorithm just prior to step 4. All the black points have already been tested and almost every new point turned out to be closer than the previous one. We’ve just tested the point that was nearest to **×** and the green circle represents the remaining area in which even closer points may be found. The right-most edge of the circle defines the location of the rejection boundary. The next point to be tested is just beyond this boundary and thus it and any subsequent points still left in the collection cannot possibly yield a better result then we have already.

This approach allows us to omit certain points and will thus be faster than the brute-force search. There is however an initial penalty since we have to sort the points first. But although this algorithm performs better, it still has many weaknesses. If the point collection has little variation along the x-direction for example then the benefit of sorting along the x-direction are minimal. Also when * h* is far outside the bounding box of

*, then the shortest distance will always be quite large and the Rejection Boundary might never actually exclude any points, nullifying the optimization. Finally, if*

**C****is somewhere near the right edge of**

*h**, then the Rejection Boundary will be well outside of*

**C****for most of the time. One could of course choose to sort**

*C***along the y-direction or to iterate backwards instead of forwards depending on the relative location of**

*C***with respect to**

*h***, this will definitely alleviate some of the drawbacks at the expense of more complicated code. There is however another trick that can be used to significantly improve the performance of the sorted brute-force search. But for that we need to understand how binary searching works…**

*C** Short of multithreading the Brute-force search. We could split the collection into N sub-collections (one for each available core), find the nearest point in each sub-collection and then find the best result amongst the N partial results.

** We are ignoring the contribution of the y-coordinate, which can only make the distance larger, never smaller. It is thus safe to ignore.

## What’s the point? [1]

### March 8, 2013

Let’s talk optimization today. It’s a contentious topic amongst programmers primarily because some are too eager to optimize while others are too lackadaisical. The former often results in code which is difficult to read and hard to maintain while containing more bugs than it otherwise would have, the latter results in programs which are unresponsive and often make the user wait. There are well known generic strategies for optimizing code which can be applied to improve perceived performance.

*Caching* is a common strategy which improves performance by remembering past answers to questions. You technically only have to compute a certain calculation once, for the answer to 4×31 will be the same tomorrow as it is today. There are only four considerations that matter with respect to caching, to wit:

- Is the algorithm used to recompute a value a bottleneck in the overall scheme of things? In other words, if the computation is very fast to begin with, caching the result is probably not useful.
- Is the time needed to retrieve the cached value less than the time needed to compute the value anew? Although it sounds like a rephrasing of bullet item #1, this is about the absolute performance gain rather than general merit.
- Is the memory needed to store the cached value available? Optimization can apply both to processor performance and memory footprint and although the focus of this post is purely on performance, increased memory usage may make a certain optimization prohibitively expensive.
- Is the cached value likely to be used ever again? There’s no point in caching the answer to a question that will only be asked once.

*Lazy Evaluation* is another common strategy which improves performance by delaying expensive computation till the utmost last moment. The basic idea is that if you never end up asking a specific question there’s no point wasting processor cycles on calculating —and caching— the answer. Lazy Evaluation improves the perception of performance in two ways:

- It breaks potentially long computation times into smaller chunks that may not be noticeable to the user.
- It eliminates processor cycles spend on calculating answers to questions that will never be asked.

*Greedy Evaluation* is the opposite of Lazy Evaluation and is —perhaps somewhat paradoxically— another common optimization strategy. Note that rather than talking about performance, I specifically prefixed it with the qualifier “perceived” in the opening paragraph of this post. When writing software for humans, what really matters is how people perceive the software rather than scientific metrics. It is a well known fact that the duration people are made to wait bears little correlation to how long people feel they had to wait. There are times when a delay will not bother people, for example when starting an application or entering a new game level. We expect to have to wait and will happily sit back for 3 seconds while the software performs its magic incantations. But make someone wait for half a second after selecting an object before you display the selected state and your software becomes all but unusable. Greedy Evaluation is the idea of calculating —and caching— lots of values during a period where the user is happy to wait, so that these values can then later be put to immediate use should the need arise.

Greedy Evaluation is sometimes also called *Eager Evaluation*, I assume to make it sound less negative and more like the opposite of Lazy Evaluation.

*Multithreading* improves performance by computing certain things in parallel. Almost all personal computers these days have more than one processor cores and certain computations can be performed simultaneously on different cores. Not all problems lend themselves to multi-threading, and often it is quite difficult to properly implement it. Breaking a process into smaller pieces is typically easy enough, but aggregating the results of each individual thread into a single final answer can be difficult. Especially with the arrival of Cell-processors and GPU-Multicores it is tempting to invest in multithreading as the performance increase can span two orders of magnitude.

But the above are all generic optimization strategies. Certain famous problems have a long history of people searching for specific optimizations and it is one of these I want to talk about in more detail. Up next: finding nearby points in large collections.