Tuesday, 9 February 2010

How many words for Red? Part 2

Just an update on the last post (here). I looked at the distribution of colour terms from Wikipedia according to their hue value and suggested that people have more words for some colour ranges than others.

First, I ran a linear mixed effects model on the data, and came up with slightly different results. The spectrum was split into 10 equal-sized bins, and the number of words that fell into each bin was counted. This count was predicted by the bin number, and the slope and intercept were allowed to vary by language. Including the bin number significantly improved the fit of the model (log-likelihood difference = 5.12, chi-squared = 10.2, df = 1, p = 0.001). This suggests that the distribution is not flat (i.e. there are more words for some colour ranges than others). However, allowing different languages to have their own fit did not significantly improve the model (log-likelihood difference = 0.65, chi-squared = 1.3, df = 2, p = 0.52). This suggests that languages do not differ significantly in the distribution of colour words.
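For anyone wanting to check the numbers above, here is a rough sketch of how a likelihood ratio test turns a log-likelihood difference into a p value. It is not the code used for the analysis: the function name `lrt_p_value` is made up for illustration, and it only covers the two cases reported here (df = 1 and df = 2), where the chi-squared tail probability has a simple closed form.

```python
import math


def lrt_p_value(loglik_diff, df):
    """P value of a likelihood ratio test between two nested models.

    loglik_diff: log-likelihood of the fuller model minus the simpler one.
    df: number of extra parameters in the fuller model (1 or 2 only).
    """
    chi_sq = 2.0 * loglik_diff  # the test statistic is twice the difference
    if chi_sq < 0:
        raise ValueError("the fuller model should not fit worse")
    if df == 1:
        # Upper tail of a chi-squared distribution with 1 degree of freedom.
        return math.erfc(math.sqrt(chi_sq / 2.0))
    if df == 2:
        # Upper tail of a chi-squared distribution with 2 degrees of freedom.
        return math.exp(-chi_sq / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")


# Effect of bin number (one extra parameter).
print(lrt_p_value(5.12, 1))
# Effect of letting languages vary (two extra parameters).
print(round(lrt_p_value(0.65, 2), 2))  # 0.52
```

The chi-squared values quoted in the text are simply twice the log-likelihood differences, which is why 5.12 goes with roughly 10.2 and 0.65 with 1.3.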

However, I also noticed that the distribution looks very similar to the Just Noticeable Difference (JND) curve for colour. Human eyes are not uniformly sensitive to colour. We can distinguish colours better at some ranges than others. Below is the distribution for colour terms from the English Wikipedia site, with an overlay of the human JND curve (from Long et al., 2006).

You'll notice that the curve is a very good fit. Indeed, the two distributions are correlated (r=0.6, df=18, p=0.005). That is, the distribution of colour words may not be uniform over the physical spectrum, but it is pretty even across the perceptual spectrum. Put another way, humans have lots of words for ranges of the spectrum that they are good at discerning.
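As a rough illustration of the correlation above, the sketch below computes Pearson's r between two sets of per-bin values, and the t statistic that the p value is based on. The numbers in `word_counts` and `jnd_values` are invented placeholders, not the Wikipedia counts or the curve from Long et al., and the function names are my own.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two sequences of equal length, at least 3 long")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, df):
    """t statistic for testing r against zero, with df = n - 2."""
    return r * math.sqrt(df / (1.0 - r * r))


# Placeholder data only: one word count and one JND value per hue bin.
word_counts = [12, 9, 4, 3, 6, 10, 7, 2]
jnd_values = [0.9, 0.8, 0.4, 0.3, 0.5, 0.7, 0.6, 0.2]

r = pearson_r(word_counts, jnd_values)
print(r, t_statistic(r, len(word_counts) - 2))
```

With the reported r = 0.6 and df = 18 (that is, 20 paired values), `t_statistic(0.6, 18)` comes out at about 3.18.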

For 8 languages, the number of colour categories and the JND are correlated (r = 0.28, df = 94, p = 0.005), and more so for all non-monochromatic colours (r = 0.3, df = 93, p = 0.003).

A mixed effects model shows that the perceptually normalised number of colours (number of colours / JND) is still significantly skewed (log-likelihood difference = 10.22, chi-squared = 20.4234, p < 0.001). But this skew does not differ much between languages (log-likelihood difference = 1.95, chi-squared = 3.8, p = 0.14). (The p values become 0.0002 and 0.55 when considering only non-monochromatic colours.)
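To make the normalisation concrete, here is a minimal sketch of the two steps: counting colour terms per hue bin, then dividing each count by the JND value for that bin. It assumes hue is measured in degrees from 0 to 360, which may not match the scale used in the analysis. The function names, the hue values and the JND values are all placeholders.

```python
def bin_counts(hues, n_bins=10, hue_max=360.0):
    """Count how many hue values fall into each of n_bins equal-width bins."""
    counts = [0] * n_bins
    width = hue_max / n_bins
    for hue in hues:
        index = int((hue % hue_max) // width)
        # Guard against floating point rounding at the top edge of the range.
        counts[min(index, n_bins - 1)] += 1
    return counts


def normalise_by_jnd(counts, jnd):
    """Divide each bin's word count by that bin's JND value."""
    if len(counts) != len(jnd):
        raise ValueError("need one JND value per bin")
    return [count / value for count, value in zip(counts, jnd)]


# Placeholder hues (degrees) for a handful of colour terms.
hues = [0, 5, 12, 30, 45, 60, 120, 210, 240, 300, 350]
# Placeholder JND values, one per bin.
jnd = [0.9, 0.8, 0.5, 0.4, 0.3, 0.5, 0.7, 0.6, 0.4, 0.8]

counts = bin_counts(hues)
print(counts)
print(normalise_by_jnd(counts, jnd))
```

If the normalised counts came out roughly equal across bins, perception alone would account for the distribution. The model above says they do not.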

This suggests that, even after accounting for perceptual sensitivity, the distribution of colour terms is still partly shaped by something other than perception.

Long F, Yang Z, & Purves D (2006). Spectral statistics in natural scenes predict hue, saturation, and brightness. Proceedings of the National Academy of Sciences of the United States of America, 103(15), 6013-6018. PMID: 16595630

1 comment:

  1. Have you considered anthropological influences? It might not be language per se, but culture that affects the color naming. Why does red show up so often? While yellow-green has high perceptibility, red has many interpretations that are very important, such as health. Think embarrassment, inflammation, bleeding, bruising, grave illness (lack of red). But also, for agrarian communities, red indicates ready to eat for some foods (apple, strawberry, tomato) but not yet ready for others (blueberry). Lastly, is there an advertising influence, i.e., more names for last year's colors so clothing and car manufacturers can convince you their product is really new? If so, pre-20th century words might be evaluated differently from words in current use.