March 19, 2011
Colour is a tricky subject from pretty much every angle you look at it. From a chemical point of view we get to deal mostly with pigments, which is to say how molecules (or rather, the electron clouds of molecules) interact with photons. It would be difficult enough to predict the result of mixing two pigments if you didn’t have to take chemical reactions into account: reactions due to pigments interacting with each other, or due to pigments breaking down with age, light, temperature or humidity.
From a physics point of view colour is usually approached from the photon-wavelength direction. Every photon has a wavelength and some of these wavelengths translate into visible light. Lasers tend to emit light of only a single wavelength (monochromatic), but most natural light sources emit a continuous range of wavelengths. It’s easy enough to predict the result when mixing two light sources of known wavelength and intensity, but there’s a difference between the harsh numeric wavelength approach and the soft human biological approach. Just because you know the exact mixture of wavelengths in a certain light beam doesn’t mean you know how that colour will be experienced by living creatures. The rods and cones in our eyes do not measure all wavelengths equally well. The following graph shows how, on average, human cones respond to photons of specific wavelengths:
The three curves each represent one of the types of cones humans typically have. They’re usually referred to as Blue, Green and Red cones, though it’s clear from the graph that it’s actually closer to Blue, Yellow and Red. When photons with a wavelength of 400~500 nm fall on our retinas, it’s mostly the Blue cones that fire; for larger wavelengths it’s a ratio of Green and Red cones. Anything over 700 nm is ‘too red’ for us to see (infrared), and anything under 400 nm is ‘too violet’ (ultraviolet). Note that a larger wavelength equals less energy, hence ‘infra’ for red and ‘ultra’ for violet instead of the other way around.
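To put numbers on that last remark, here is a small Python sketch of the standard relation E = h·c/λ. The function name `photon_energy_ev` and the choice of electronvolts as the unit are my own, picked purely for illustration, and the wavelengths are assumed to be in nanometres.

```python
# Photon energy from wavelength, using E = h * c / wavelength.
PLANCK_J_S = 6.62607015e-34      # Planck constant in J*s (exact SI value)
LIGHT_SPEED_M_S = 299_792_458    # speed of light in vacuum in m/s (exact)
JOULES_PER_EV = 1.602176634e-19  # one electronvolt in joules (exact)


def photon_energy_ev(wavelength_nm: float) -> float:
    """Return the energy in eV of a photon with the given wavelength in nm."""
    wavelength_m = wavelength_nm * 1e-9
    return PLANCK_J_S * LIGHT_SPEED_M_S / wavelength_m / JOULES_PER_EV


# The two edges of the visible range mentioned above, plus a point in between.
for nm in (400, 550, 700):
    print(f"{nm} nm -> {photon_energy_ev(nm):.2f} eV")
```

A 400 nm photon comes out at roughly 3.1 eV and a 700 nm photon at roughly 1.8 eV, so the violet end of the spectrum really is the high-energy end.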
Then there’s the natural variation in the population at large. The opsin pigments in our cones are, after all, subject to evolutionary forces, so we expect this graph to differ from person to person. Also, a lot of people are colour blind, and there are reports of women who have a fourth kind of cone, making them tetrachromats.
But physics (specifically optics) tells us that colour can also be the result of processes other than photon-pigment interaction. Structural colours, for example, arise from interference effects when light is reflected from thin layers. We see this effect in a lot of bird feathers, which makes them appear somewhat metallic. Then there’s good old-fashioned dispersion due to the index of refraction, as first properly investigated by Newton. All in all biology has found seven ways to generate colour, all of which are treated in Andrew Parker’s excellent book ‘Seven Deadly Colours’, a worthwhile read.
(source: Hipgnosis Design)
But by far the most complex facet of colours is the cultural/linguistic aspect. How people experience and recognize colour is in large part a function of their own personal history. Not only does colour perception differ between cultures, it also differs from time to time (which is of course not that surprising; British culture today is not the same as British culture a century ago). Pink for example wasn’t always a colour. Sometime in the 14th century the verb ‘to pink’ changed meaning and is today exclusively used to identify photons of a specific wavelength and extrovert singer-songwriters. There is no equivalent word for pink in Dutch or German or French or the Slavic languages; this particular shade is called ‘roze’ or ‘rose’ or some other variation. (Finnish does have ‘pinkki’, but it’s a loan word. Anyone who’s ever been to Finland will tell you that particular hue is not present anywhere in the country, which explains why they had to borrow it from elsewhere.)
Whether or not you’ll be able to distinguish Indigo as a colour distinct from Purple depends in large part on whether you’re familiar with Indigo. It is not a function of our cones or nerves, but rather our memory. Randall Munroe, author of the popular web-comic xkcd, performed a large-scale colour survey in which people were asked to name a random colour, and he graciously published the results for anyone to play with.
Which brings us finally to the relationship between wavelength and word, which is what this blog post is about. Most modern computer screens are able to display 16,777,216 unique colours (256 levels each for red, green & blue), but we are only able to distinguish a few as truly distinct colours. This means only a very limited number of colour names is needed to identify any combination of red, green & blue. Or, put another way, it is very easy to write an algorithm that matches a name to any given colour. All you need is a database of a few dozen to maybe a few hundred colour|name combinations and a way to measure distances between two colours. xkcd provided such a map of the fully saturated faces of the RGB colour cube with the regions that fall within a specific colour name:
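To make the name-matching algorithm described above concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the tiny `NAMED_COLOURS` table stands in for a real database, the function name `nearest_colour_name` is made up, and plain Euclidean distance in RGB is just the simplest possible distance measure (a perceptual colour space would probably behave better).

```python
import math

# Hypothetical mini database of name -> (R, G, B). A real table would hold
# a few dozen to a few hundred entries; these values are only illustrative.
NAMED_COLOURS = {
    "black": (0, 0, 0),
    "white": (255, 255, 255),
    "red": (255, 0, 0),
    "green": (0, 128, 0),
    "blue": (0, 0, 255),
    "yellow": (255, 255, 0),
    "pink": (255, 192, 203),
    "brown": (165, 42, 42),
}


def nearest_colour_name(rgb):
    """Return the name whose reference colour is closest to rgb.

    Closeness is straight-line (Euclidean) distance in the RGB cube,
    which is an assumption of this sketch, not a perceptual measure.
    """
    return min(NAMED_COLOURS, key=lambda name: math.dist(NAMED_COLOURS[name], rgb))


print(nearest_colour_name((250, 180, 200)))  # a soft pinkish tone
print(nearest_colour_name((20, 30, 220)))    # a strong blue
```

With this table the first call lands on ‘pink’ and the second on ‘blue’. The quality of the answers depends almost entirely on how well the table covers the cube, which is where the survey data comes in.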
But what about going the other way? What if we want to write an algorithm that takes a textual description of a colour and returns an RGB value? Now we’re in trouble. Sure, we can just invert the aforementioned algorithm, and whenever people happen to type a colour name like ‘Blue’ or ‘Pink’ or ‘Brown’ we can return a representative hue of that particular colour. But whenever we’re confronted with something like ‘Darkish Teal’ or ‘Pinkish Yellow’ or ‘Boring Maroon’ there’s nothing we can do.
If we’re serious about writing an algorithm that can ‘reverse-engineer’ colours from colour names, we first need a large database of colour names. There are several official palettes that define a collection of named colours, such as the X11 table or the Pantone Space. Then there’s a large database of human-centric colour names as collected by the xkcd Colour Survey. When we put it all together we get the large image below. A lot of the entries are duplicates though, since this table was created using several sources. There’s also an abundance of colours that we would ideally like to be able to generate directly with an algorithm, such as ‘Dark Green’ or ‘Yellow Green’. When these unnecessary and redundant entries are filtered out, we end up with the thin images on the right:
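The merge-and-filter step described above might look roughly like the Python sketch below. I should stress that the two palettes, the `MODIFIERS` list and the rule for what counts as a redundant compound name are all hypothetical stand-ins chosen for illustration. They are not the actual tables or the actual filter used to produce the images.

```python
# Hypothetical stand-ins for real sources such as the X11 table or the
# xkcd survey; the names and RGB values are illustrative only.
PALETTE_A = {
    "Teal": (0, 128, 128),
    "Maroon": (128, 0, 0),
    "Green": (0, 128, 0),
    "Dark Green": (0, 100, 0),
}
PALETTE_B = {
    "teal": (2, 147, 134),
    "pink": (255, 129, 192),
    "yellow": (255, 255, 20),
    "yellow green": (192, 251, 45),
}

# Modifier words we would rather handle algorithmically (an assumed list).
MODIFIERS = {"dark", "light", "pale", "bright", "deep"}


def normalise(name):
    """Lower-case a name and collapse whitespace so duplicates line up."""
    return " ".join(name.lower().split())


def merge_palettes(*palettes):
    """Merge palettes, keeping the first entry seen for each normalised name."""
    merged = {}
    for palette in palettes:
        for name, rgb in palette.items():
            merged.setdefault(normalise(name), rgb)
    return merged


def is_compound(name, base_names):
    """True if a multi-word name is built only from modifiers and base names."""
    words = name.split()
    return len(words) > 1 and all(w in MODIFIERS or w in base_names for w in words)


def reduce_palette(palette):
    """Drop entries such as 'dark green' that could be generated instead."""
    base_names = {name for name in palette if " " not in name}
    return {n: rgb for n, rgb in palette.items() if not is_compound(n, base_names)}


def lookup(name, palette):
    """Direct reverse lookup; returns None for names missing from the table."""
    return palette.get(normalise(name))


merged = merge_palettes(PALETTE_A, PALETTE_B)
reduced = reduce_palette(merged)
print(sorted(reduced))
print(lookup("Teal", reduced))
print(lookup("Darkish Teal", reduced))
```

The last line prints `None`, which is exactly the problem: a plain table lookup has nothing to say about ‘Darkish Teal’, no matter how large the table gets.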
If we map the reduced palette to an RGB cube, we get the following distribution:
As is already visible, the cube does not have uniform density. This is not terribly surprising, as I can think of a lot of names for whitish colours, but only a few for purplish colours. What I did find surprising is the exact shape of the density distribution. We can remove the most isolated colours from the set (those whose distance to the nearest other colour in the set is largest) and thus highlight the boundary between high and low density areas. The following two images are two different views of the entire set with the 50% most isolated colours removed:
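For anyone who wants to try the isolation filter described above, here is one way it could be sketched in Python. The function names are made up, the sample colours are invented, and the sketch ranks every colour once by its nearest-neighbour distance rather than re-ranking after each removal, which may or may not match how the images were produced. It also uses a brute-force comparison of every pair, which is fine for a few thousand colours but not much more.

```python
import math


def nearest_neighbour_distance(index, colours):
    """Distance from colours[index] to the closest other colour in the set."""
    return min(
        math.dist(colours[index], other)
        for j, other in enumerate(colours)
        if j != index
    )


def drop_most_isolated(colours, fraction=0.5):
    """Return colours with the given fraction of most isolated entries removed.

    Needs at least two colours, since isolation is measured against a
    neighbour. Isolation is computed once, up front (an assumption).
    """
    # Indices sorted from least isolated to most isolated.
    ranked = sorted(
        range(len(colours)),
        key=lambda i: nearest_neighbour_distance(i, colours),
    )
    keep = len(colours) - int(len(colours) * fraction)
    kept_indices = sorted(ranked[:keep])  # restore the original order
    return [colours[i] for i in kept_indices]


# Invented sample: three crowded off-whites and three loners.
sample = [
    (250, 250, 250), (245, 245, 240), (240, 240, 235),
    (128, 0, 128), (10, 10, 10), (255, 0, 0),
]
print(drop_most_isolated(sample))
```

On this sample only the three off-whites survive, a miniature version of the effect in the images: whatever remains after the cut marks out the densely named regions.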
As these images show, the bulk of colour names cluster around the black-white diagonal. We basically have lots of words for colours that are very greyish, but few words for colours that are saturated. A possible explanation for this would be that we experience colours in an imaginary context of nearby colours. After all, if you start to modify Barbie Pink, it takes a lot of modification to really get away from that specific colour, whereas even a little change to a very faint pink will already push it to a different hue. But this is pure conjecture; I have no evidence for this hypothesis.
Next up, how to parse compound colour names and end up with a decent result.