Monday, 8 February 2010

How many words for Red?

Just how different are languages in the way they label colours? Since Berlin & Kay's 1969 study of colour terms in many languages, the debate over cross-linguistic similarities has raged. Interestingly, most of the subjects in Berlin & Kay's experiments were bilingual, but the authors didn't consider that a speaker's other languages could influence that individual's results.

In the last few years, Dr Panos Athanasopoulos at the ESRC Centre for Research on Bilingualism, Bangor, Wales, has been investigating colour perception in bilinguals. In 2009, Dr Athanasopoulos studied Greek-English bilinguals. Greek makes a distinction between dark blue ('ble') and light blue ('ghalazio'). Results suggested that bilinguals' perceptions shift towards those of native speakers of their second language. The study is set to be extended into Japanese this year by two forthcoming publications (in Bilingualism: Language and Cognition and Language and Bilingual Cognition).

Last week I was thinking about models of colour terms and bilinguals. What do we mean when we say someone is 'bilingual'? On a syntactic level, this may be a bit easier to answer, but on the lexical level (where I am at the moment) it is a bit more difficult. For example, what's the difference between having two 'languages', and having one 'language' but many words for the same category?

My current suggestion is that bilinguals differ from monolinguals because they have more sets of categories that span their entire perceptual field. What does that mean? Imagine the colour spectrum (red -> yellow -> green -> blue). A monolingual will be able to label any point in the spectrum with one word, and some points with more than one. For instance, a certain red may be 'red' and 'crimson'. However, a bilingual will be able to label any point with more than one word. That is, the spread of their synonyms will be more even.
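To make the idea of 'spread of synonyms' concrete, here is a minimal sketch that counts how many labels are available at sample points along the spectrum. Everything in it is an assumption made for illustration: the hue ranges, the label names (including the placeholder second-language labels 'L2-a' to 'L2-c') and the function names are invented, not taken from any real speaker or dataset.

```python
# Hypothetical categories: (label, start_hue, end_hue), covering start <= hue < end.
# The spectrum here runs from red (0) to blue (240), as in the example above.
monolingual = [
    ("red", 0, 30), ("crimson", 0, 15), ("scarlet", 5, 20),
    ("yellow", 30, 80), ("green", 80, 170), ("blue", 170, 240),
]
bilingual = [
    ("red", 0, 30), ("yellow", 30, 80), ("green", 80, 170), ("blue", 170, 240),
    ("L2-a", 0, 60), ("L2-b", 60, 140), ("L2-c", 140, 240),
]


def labels_at(categories, hue):
    """Return every label whose range contains the given hue."""
    return [name for name, start, end in categories if start <= hue < end]


def label_counts(categories, width=240, step=10):
    """Count the available labels at evenly spaced points on the spectrum."""
    return [len(labels_at(categories, hue)) for hue in range(0, width, step)]


print(label_counts(monolingual))  # uneven: several labels near red, one elsewhere
print(label_counts(bilingual))    # even: the same number of labels at every point
```

In this toy setup the monolingual has up to three labels for some reds but only one for everything else, while the bilingual has two labels at every sampled point.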

Here's an illustration: Below is how a monolingual might break up the colour spectrum. It has several categories in its memory (the colour 'E' spans from red to yellow, 'C' covers greens and 'A' covers a specific kind of green, etc.):


The bottom of the image is labelled 'bin packing'. This represents the results of the bin packing algorithm, which tries to fit the categories into the smallest space possible. The monolingual above has lots of wasted space (striped areas) because it has lots of synonyms for a few categories.

The next illustration is of a bilingual, but note I haven't specified which 'language' a category belongs to. They have synonyms too, but they pack much more efficiently. In fact the speaker below has two options with which to describe any colour. In contrast, the monolingual has only one complete system.

Ok, these examples are contrived. But I still predict that bin packing of bilingual categories will be more efficient than for monolingual categories. I'll use this metric to examine the results of a model in an upcoming post.
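One way the packing metric could be computed is sketched below. This is my own simplification, not necessarily the algorithm behind the illustrations: each category is a fixed interval on the spectrum, intervals are stacked into rows so that no two in a row overlap, and efficiency is the fraction of the stacked area that is actually covered. The intervals and the function names `pack_rows` and `packing_efficiency` are hypothetical.

```python
def pack_rows(categories):
    """Greedy interval partitioning.

    Categories are (start, end) intervals. Taking them in order of start
    position, each goes into the first row whose last interval has already
    ended, or into a new row if there is none.
    """
    rows = []
    for start, end in sorted(categories):
        for row in rows:
            if row[-1][1] <= start:  # no overlap with the end of this row
                row.append((start, end))
                break
        else:
            rows.append([(start, end)])
    return rows


def packing_efficiency(categories, width):
    """Covered length divided by the total area of the rows used."""
    rows = pack_rows(categories)
    used = sum(end - start for start, end in categories)
    return used / (len(rows) * width)


# Hypothetical speakers on a spectrum of width 240 (red to blue).
monolingual = [(0, 30), (0, 15), (5, 20), (30, 80), (80, 170), (170, 240)]
bilingual = [(0, 30), (30, 80), (80, 170), (170, 240),
             (0, 60), (60, 140), (140, 240)]

print(packing_efficiency(monolingual, 240))  # three rows, mostly empty
print(packing_efficiency(bilingual, 240))    # two rows, fully covered
```

With these made-up intervals the monolingual needs three rows, two of which are mostly wasted space, while the bilingual's two complete systems fill two rows exactly.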

However, this metric assumes that colour categories within a language are unevenly dispersed. That is, speakers know many words for some colours, but not others. Let's put this to the test. The problem with doing so is that most colour category experiments involve getting people to assign colours to labels, meaning that they can't declare two labels for the same colour. So, I went to the Wikipedia List of Colours page which lists details of all colours mentioned in technical articles on colour. These are mainly standardised names for use in HTML, which is a problem for our current analysis, but let's see what happens.

The names for colours in English and other languages were gathered along with the Hue angle (i.e. position in the spectrum). Below is the histogram of the number of colour terms in different portions of the spectrum (coloured by average colour of the colour terms, taking saturation and brightness into account) for English, followed by the histograms for some other languages:
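For anyone wanting to try something similar, here is a rough sketch of how hue angles and a histogram could be derived from hex colour codes using Python's standard `colorsys` module. The four hex codes are just a tiny illustrative sample, not the Wikipedia data, and the helper names `hue_angle` and `hue_histogram` are my own.

```python
import colorsys


def hue_angle(hex_code):
    """Convert '#RRGGBB' to a hue angle in degrees (0 <= hue < 360)."""
    r, g, b = (int(hex_code[i:i + 2], 16) / 255 for i in (1, 3, 5))
    h, _s, _v = colorsys.rgb_to_hsv(r, g, b)
    return h * 360


def hue_histogram(hex_codes, n_bins=10):
    """Count colour terms in n_bins equal sections of the hue circle."""
    counts = [0] * n_bins
    for code in hex_codes:
        index = int(hue_angle(code) / 360 * n_bins) % n_bins
        counts[index] += 1
    return counts


# Illustrative sample only: pure red, crimson, pure green, pure blue.
sample = ["#FF0000", "#DC143C", "#00FF00", "#0000FF"]
print(hue_histogram(sample))
```

Note that pure red (hue 0) and crimson (hue around 348) fall into the first and last sections respectively, because hue is circular. Greys have no meaningful hue (`colorsys` reports 0 for them), so they would need filtering out before counting.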

First, the histogram for English is certainly not even. There are more names for reds and blues than for greens. This works intuitively - how many types of red can you think of in comparison to types of green?

Let's look cross-linguistically: First of all, the histograms are not identical - the Wikipedia pages are not just translations. Second, they all seem to have lots of names for reds. This may be an artefact of the circular meaning space (from 0 degrees to 360 degrees), but would not entirely explain the imbalances. Let's do an ANOVA of the number of colour terms per section of the spectrum (splitting into 10 even sections) by section position and language.

Colour names were not evenly distributed within languages (F(1,64)=11.35, p<0.01), and their distribution differed significantly across languages (F(7,64)=6.31, p<0.001). Having said this, I'm not completely sure of my stats here.
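For readers who want to see where F statistics of this kind come from, below is a sketch of a plain two-way ANOVA without replication on a table of counts (languages by spectrum sections). It is not the exact model behind the figures above: it treats section as a categorical factor, so its degrees of freedom will not match those reported, and it stops at the F values without computing p-values. The 3x3 table is made-up data and `two_way_anova` is a hypothetical helper.

```python
def two_way_anova(table):
    """Two-way ANOVA without replication.

    table[i][j] is the number of colour terms for language i in section j.
    Returns the F statistics and degrees of freedom for both factors.
    Assumes the residual sum of squares is not zero.
    """
    n_rows, n_cols = len(table), len(table[0])
    grand = sum(map(sum, table)) / (n_rows * n_cols)
    row_means = [sum(row) / n_cols for row in table]
    col_means = [sum(row[j] for row in table) / n_rows for j in range(n_cols)]

    # Sums of squares for languages (rows), sections (columns) and residual.
    ss_rows = n_cols * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n_rows * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    df_rows, df_cols = n_rows - 1, n_cols - 1
    df_error = df_rows * df_cols
    ms_error = ss_error / df_error
    return {
        "F_language": (ss_rows / df_rows) / ms_error,
        "F_section": (ss_cols / df_cols) / ms_error,
        "df_language": df_rows,
        "df_section": df_cols,
        "df_error": df_error,
    }


# Made-up counts: 3 languages (rows) by 3 spectrum sections (columns).
toy_counts = [[10, 4, 7], [12, 5, 8], [9, 6, 11]]
print(two_way_anova(toy_counts))
```

One caveat with raw counts: a language effect in this layout partly reflects how many colour terms each language's page lists in total, so converting counts to proportions per language first may be a fairer comparison.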

However, if the analysis is correct, then bilinguals should have a better packing efficiency than monolinguals because acquiring a whole extra 'language' is more likely to normalise the distribution rather than increase the skewness. This may be a useful metric in the analysis of feature-level models of bilingualism.
