Tuesday, 25 May 2010

Evolutionary approaches to Bilingualism

I recently gave a talk at the University of Edinburgh LEL Postgraduate Conference. It was my first ever talk and it really forced me to figure out what I'm supposed to be studying! Here's a video of my talk:

Frank MC, Goodman ND, & Tenenbaum JB (2009). Using speakers' referential intentions to model early cross-situational word learning. Psychological science : a journal of the American Psychological Society / APS, 20 (5), 578-85 PMID: 19389131

Hunag, Y. (2009). Supporting Meaningful Social Networks Technical Report, ECS, University of Southampton

Healey, E. and Scarabela, B. (2009). Are children willing to accept two labels for one object? Proceedings of the Child Language Seminar. University of Reading.

Byers-Heinlein K, & Werker JF (2009). Monolingual, bilingual, trilingual: infants' language experience influences the development of a word-learning heuristic. Developmental science, 12 (5), 815-23 PMID: 19702772

Thursday, 13 May 2010

E-Coli, Linux and Language

A recent post on The Loom looks at a paper by Koon-Kiu Yan et al. which compares the hierarchical structures of the operating system Linux and the bacterium E. coli. Really interesting analysis - and a good discussion on the blog.

I found it interesting that E. coli's structure is primarily lower-level 'workhorses' with relatively few master controllers. Linux on the other hand has a much larger percentage of high-level 'master' and 'middle manager' modules and relatively few 'workhorses'. Linux is designed while E. coli is evolved.

I’m wondering how linguistic systems would fit into this schema. What are the ‘workhorses’ and ‘master regulators’ of language? There are many more ‘low-level’ words that refer to things than ‘higher-level’ syntactic structures. This would make it like E. coli.

On the other hand, there are relatively few ‘low level’ phonemes and very many ‘high level’ concepts. This would make it more like Linux.

Maybe language has more ‘middle managers’ than anything else?

Answering this may give an insight into how ‘designed’ language is, as opposed to ‘evolved’.

Yan, K., Fang, G., Bhardwaj, N., Alexander, R., & Gerstein, M. (2010). Comparing genomes to computer operating systems in terms of the topology and evolution of their regulatory control networks Proceedings of the National Academy of Sciences DOI: 10.1073/pnas.0914771107

Tuesday, 11 May 2010

Mutual Exclusivity biases in cross-situational learning: A comparison between monolingual and bilingual corpora

This report focuses on models of cross-situational learning and how current models compare when exposed to real monolingual and bilingual input. Several model types were evaluated against two transcribed videos of parent-child interaction, one monolingual and the other bilingual. Children have been shown to demonstrate a Mutual Exclusivity (ME) bias (Markman and Wachtel, 1988) during word learning. Frank et al. (2009) showed that their model also exhibited ME behaviour after learning from a monolingual corpus of contexts. The current study takes the same model but gives it bilingual input and asks whether the same behaviour is exhibited.


Frank et al. (2009) provide a transcribed video of monolingual parent-child interaction coded for use in cross-situational learning. An equivalent bilingual corpus was sought. The main criterion was a roughly equal number of utterances in both languages. The CHILDES database has suitable resources, and a recording from a study by Yip and Matthews was selected (see CHILDES, 2010). The child in question was a native bilingual from birth. Her mother was a native Hong Kong Cantonese speaker and her father was a native speaker of British English. She was 2;11 in the chosen recording. There are 967 utterances, 48% of which are Cantonese and 52% English. The objects visible in the video were added to the transcription, along with the mapping between words and objects and the referential intentions of the speakers. The coding scheme was adopted from Frank et al. (2009). The code for the models was supplied by Frank et al. (from Frank, 2010).


The performance of the different models was analysed in two ways. Firstly, the best estimated lexicon (word-object mappings) of each model was evaluated against a gold-standard lexicon in terms of precision, recall and the resulting F-score. Secondly, the models were asked to guess the intended referent of each utterance-object context.

Tables 1 and 2 show the lexicon results for the monolingual and bilingual corpora respectively. Frank et al.’s model returns the highest F-scores in both cases. This is largely due to an advantage in precision, likely stemming from the modeling of non-referential words. Frank et al.’s model returned a word-object mapping for the bilingual corpus with a precision of 0.31 and a recall of 0.27, giving an F-score of 0.29. This is lower than the score for the same model on the monolingual corpus. This could be because the referential uncertainty (independent of the amount of synonymy) is higher in the bilingual corpus.

The results for the referential intentions, shown in Tables 3 and 4, have different trends. For the monolingual case, the precision of Frank et al.’s model allows it to outperform the other models. However, it performs relatively poorly in the bilingual case, with the Conditional Probability model performing best. However, all models perform with low precision and recall, suggesting that the task is harder. With more data, results might be different.
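As a rough illustration of the lexicon evaluation, here is a minimal Python sketch. It assumes the F-score is the usual balanced harmonic mean of precision and recall; the function names are my own for illustration and are not taken from Frank et al.'s code.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate_lexicon(learned: set, gold: set) -> tuple:
    """Score a learned set of (word, object) pairs against a gold standard."""
    correct = len(learned & gold)
    precision = correct / len(learned) if learned else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall, f_score(precision, recall)


# The bilingual result reported above: precision 0.31, recall 0.27.
print(round(f_score(0.31, 0.27), 2))  # 0.29
```

Under this definition a model can only get a good F-score by doing reasonably well on both measures, which is why a very high recall cannot rescue a very low precision.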

Mutual Exclusivity

After the model had processed the corpus, it was presented with a mutual exclusivity task and the relative likelihood of several interpretations was measured. In the task, the model was presented with a context with a new object (e.g. a dax) and a familiar object (a bird for the monolingual model and an orange for the bilingual model) and a new word (e.g. 'dax'). The probabilities were calculated for the model linking the new word with neither object (i.e. it considers the word non-referential), linking it with the new object, linking it with the familiar object or linking it with both. Figure 1 shows the results of the task, with the results for a 'monolingual' model for comparison.

The monolingual results are re-calculated for this study, so differ slightly from those reported in Frank et al. (2009).

The results for the monolingual and bilingual models have the same trend - both rank the possible situations in the same order of likelihood. The most likely situation is the new word being linked to the new object, honouring mutual exclusivity. The second most likely situation is that the word refers to neither object.

Intuitively, one would expect a bilingual to be more likely to consider that the new word was another word for the familiar object. Indeed, the bilingual model does consider this possibility relatively more likely than the monolingual model. However, the model still considers neither mapping to be more likely than an extra synonym. This may mean that, given an additional cue (e.g. pragmatic), the bilingual would be more ready to accept a synonymous interpretation. This is an empirical question.

The Prior Probability

The prior probability depends only on the number of mappings in the hypothesised lexicon, scaled by a fixed parameter (alpha). This represents a preference for smaller lexicons. This means that a hypothesis which results in the lexicon with fewest mappings will receive the highest prior probability. With the default parameter (alpha = 7), the Mutually Exclusive preference (for DAX-dax) beats the preferences for the original mapping (mapping the new word to neither object), both mappings, and the mapping of the new word to the familiar object.

However, this ranking depends on the lexicon size bias (alpha) parameter. With a low alpha, the most likely mapping is the ME mapping. With a higher alpha, the most likely mapping is the original mapping (see Figure 2).
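To make the dependence on alpha concrete, here is a small Python sketch. It assumes the exponential form of the prior described by Frank et al. (lexicons become exponentially less probable as they gain mappings), so the unnormalised log prior is -alpha times the number of mappings. The log-likelihood values, the base lexicon size and all the names are invented purely for illustration; they are not outputs of the actual model.

```python
# Hypothetical numbers, for illustration only: how many mappings each
# interpretation adds to the lexicon, and an invented corpus log-likelihood.
HYPOTHESES = {
    "neither":  {"new_mappings": 0, "log_likelihood": -20.0},
    "ME":       {"new_mappings": 1, "log_likelihood": -8.0},
    "familiar": {"new_mappings": 1, "log_likelihood": -19.0},
    "both":     {"new_mappings": 2, "log_likelihood": -18.0},
}


def log_prior(n_mappings: int, alpha: float) -> float:
    """Unnormalised log of an exponential size prior, exp(-alpha * n_mappings)."""
    return -alpha * n_mappings


def rank(alpha: float, base_size: int = 30) -> list:
    """Order the interpretations from most to least probable for a given alpha."""
    scores = {
        name: log_prior(base_size + h["new_mappings"], alpha) + h["log_likelihood"]
        for name, h in HYPOTHESES.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


for alpha in (0.5, 7, 15):
    print(alpha, rank(alpha))
```

With these made-up likelihoods, the ME mapping wins at alpha = 7, the original (neither) mapping wins once alpha is large enough, and 'both' only overtakes 'neither' when alpha drops below 1. The point is only that the ranking is a trade-off between the size penalty and the likelihood, not that these are the real values.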

The same trend also exists between the preference for the ME mapping and both mappings, although the preference for both mappings does not overtake the preference for the ME mapping (see Figure 3).
The explanation is as follows: The original mapping receives a high prior probability because it doesn’t increase the size of the lexicon. However, the likelihood of experiencing a non-referential word is low, leading to a total probability that favours the ME mapping over the original. Assuming a larger lexicon (decreasing alpha), the relative increase in lexicon size is smaller, tipping the balance between the original and ME mapping preferences.
Interestingly, the likelihood of choosing both mappings overtakes the original mapping when alpha is less than 1 (see Figure 4).
That is, the likelihood of assuming both mappings increases when the prior is set to less than the number of word-object mappings in the lexicon. Such a setting makes sense for bilinguals (who have up to twice as many mappings as monolinguals) because it represents the number of concepts. Put another way, by compensating for the additional synonymy in bilingual input, the likelihood of assuming both mappings increases.
The dependence of the ME experiment results on alpha is acknowledged by Frank et al.:

“Note that there is some parameter dependence in our models fit to the mutual exclusivity situation. Depending on the size of the corpus, it might be the case that the prior disadvantage of adding a word to the lexicon would not be outweighed by the increase in corpus likelihood caused by learning a new word. This fact makes a developmental prediction: in early development, when very few words are known, inferences about mutual exclusivity should be weaker.”
Supporting Information for Frank et al. (2009), p. 13.

This prediction is borne out in some studies (Merriman and Bowman, 1989; Frank and Poulin-Dubois, 2002; Merriman et al., 1993). However, Markman and Wachtel (1988) found that the ME constraint weakens over time, with older children showing less of a bias, while Deák et al. (2001) find no change.

The issue here is the size of the lexicon. Bilingual children may know more words than monolinguals, but it may be more accurate to judge the lexicon size by the size of one language’s lexicon.

The model does not provide a mechanism for modulating the lexicon size prior parameter during learning. Currently the prior is modulated by the alpha parameter and the number of mappings, meaning that adding new mappings is dis-preferred. Bilinguals will have a higher number of mappings, altering their prior probabilities. However, this does not lead to qualitative differences in the mutual exclusivity experiment.

The motivation for modulating the prior by the number of mappings is mainly to simplify the model.

“We chose a prior probability distribution that favored parsimony, making lexicons exponentially less probable as they included more word-object pairings ... The choice of a simple prior puts most of the work of the model in the likelihood term ... hence, the likelihood term captures the learner’s assumptions about the structure of the learning task.”
Frank et al., 2009, p. 579

That is, the decision is driven by the statistical, computational approach to the formal problem rather than being psychologically motivated. Therefore, the interpretation is that mutual exclusivity behaviour stems from the child’s unwillingness to learn new signal-meaning mappings. This seems a little circular - children prefer not to extend mappings from familiar words to unfamiliar objects because they prefer not to extend mappings. It also seems to go against children’s obvious ability and motivation for learning new words and meanings. Several solutions which would make the prior more sensitive to the input involve incorporating the number of concepts, the number of words or the amount of synonymy (proportional to the number of words in the lexicon divided by the number of concepts). However, the nature of the model now changes - we are using it to test specific hypotheses about mutual exclusivity, judged against empirical data, rather than seeing if mutual exclusivity ‘falls out’ of more basic assumptions.

Concept-based Prior

The mapping-based prior was biased towards a monolingual mode. The model was altered so that the prior was negatively related to the number of objects in the lexicon. This represents the number of concepts for which the child knows words. The model was run on the bilingual corpus and returned a lexicon with a precision of 0.05, a recall of 0.41 and a resulting F-score of 0.09. The model was also run on the monolingual corpus again, returning a precision of 0.05, a recall of 0.79 and an F-score of 0.09. For both monolingual and bilingual corpora, the recall of this model is better than for a mapping-based prior, but the precision is much worse. That is, the model overestimates the number of word-concept mappings. In fact, the models accumulated many hundreds of word-concept mappings for tens of objects (Monolingual: 551 mappings for 22 objects and 419 words; Bilingual: 641 mappings for 55 objects and 598 words). The models have failed to acquire a useful vocabulary.

However, running the Mutual Exclusivity experiment again, the relative ranking of the preferences has changed. Although the ME mapping is still favoured, the next preferred interpretation is to make both mappings (rather than neither, see Figure 5). However, this difference is exhibited with both monolingual and bilingual input data. By neutralising the difference in the prior, the corpus likelihood now plays a bigger role, leading to a difference in the preferences.
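The difference between the two priors can be sketched in a few lines of Python. This is only my illustration of the idea, assuming the same exponential penalty in both cases and changing only what gets counted; the function names and the toy lexicon are hypothetical, not taken from the model code.

```python
def mapping_size(lexicon: set) -> int:
    """Size counted by the mapping-based prior: number of (word, object) pairs."""
    return len(lexicon)


def concept_size(lexicon: set) -> int:
    """Size counted by the concept-based prior: number of distinct objects named."""
    return len({obj for _word, obj in lexicon})


def log_prior(size: int, alpha: float = 7.0) -> float:
    """Unnormalised log prior with an exponential penalty on size."""
    return -alpha * size


# Toy lexicon: two objects, one word each.
lexicon = {("orange", "ORANGE"), ("bird", "BIRD")}
# The same lexicon with a second (hypothetical) label for a familiar object.
with_synonym = lexicon | {("dax", "ORANGE")}

print(log_prior(mapping_size(lexicon)), log_prior(mapping_size(with_synonym)))
print(log_prior(concept_size(lexicon)), log_prior(concept_size(with_synonym)))
```

Under the concept-based count, a second word for an already-named object costs nothing, so nothing in the prior stops the model from piling up mappings. That is one way to read the very low precision reported above.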

How ’Monolingual’ is the Monolingual corpus?

Although the monolingual corpus is taken from a carer speaking one language, the lexicon the model learns contains synonymy. In fact, for the 15 objects it learned words for, 8 had more than one associated word. For half of these 8 objects, all synonyms were appropriate (e.g. ’bird’ and ’birdie’ to describe the object ’duck’), but half were not appropriate. In other words, the model accommodates synonymy.

The original Mutual Exclusivity experiment in Frank et al. was done with the object ’bird’, which had one associated word. The ME experiment was applied for all words that the model learned from the monolingual corpus. There were no significant differences between the posterior probabilities for any of the situations (DAX-dax, Both etc.) for synonymous mappings versus non-synonymous mappings. This holds for both the original and the concept-based prior.


Frank et al.’s model can be used to model word learning in bilinguals. There are some quantitative differences in the ME behaviour of models run on monolingual and bilingual corpora. However, no qualitative differences were found. Even when the prior bias for minimising the number of mappings was neutralised, both models still preferred to map the new object with the new word.

Next Steps
The results are inconclusive, but may reflect the limited data. I suggest that synthetic corpora would make the dynamics clearer. Very simple cross-situational learning corpora could be created with varying amounts of ’bilingualism’.
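As a first idea of what such synthetic corpora might look like, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the situation format, the parameter names, and the choice to model 'bilingualism' as the probability that a referent is named in a second language.

```python
import random


def make_corpus(n_situations, n_objects=10, objects_per_scene=3,
                bilingual_rate=0.5, seed=0):
    """Generate toy situations pairing one spoken word with visible objects.

    bilingual_rate is the probability that the word comes from language B
    rather than language A: 0.0 gives a monolingual corpus, 0.5 a balanced
    bilingual one.
    """
    rng = random.Random(seed)
    objects = [f"OBJ{i}" for i in range(n_objects)]
    corpus = []
    for _ in range(n_situations):
        scene = rng.sample(objects, objects_per_scene)
        referent = rng.choice(scene)
        language = "b" if rng.random() < bilingual_rate else "a"
        word = f"{language}_{referent.lower()}"
        corpus.append({"objects": scene, "words": [word], "referent": referent})
    return corpus


def synonymy(corpus):
    """Distinct words divided by distinct referents that were talked about."""
    words = {w for situation in corpus for w in situation["words"]}
    referents = {situation["referent"] for situation in corpus}
    return len(words) / len(referents)


for rate in (0.0, 0.25, 0.5):
    print(rate, synonymy(make_corpus(500, bilingual_rate=rate)))
```

A corpus like this has no non-referential words and only one word per utterance, so it is much easier than real input. The advantage is that the amount of synonymy can be dialled up and down independently of referential uncertainty (here, the number of objects per scene).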


Frank MC, Goodman ND, & Tenenbaum JB (2009). Using speakers' referential intentions to model early cross-situational word learning. Psychological science : a journal of the American Psychological Society / APS, 20 (5), 578-85 PMID: 19389131

Byers-Heinlein K, & Werker JF (2009). Monolingual, bilingual, trilingual: infants' language experience influences the development of a word-learning heuristic. Developmental science, 12 (5), 815-23 PMID: 19702772

Deák GO, Yen L, & Pettit J (2001). By any other name: when will preschoolers produce several labels for a referent? Journal of child language, 28 (3), 787-804 PMID: 11797548

Frank, I., & Poulin-Dubois, D. (2002). Young monolingual and bilingual children's responses to violation of the Mutual Exclusivity Principle International Journal of Bilingualism, 6 (2), 125-146 DOI: 10.1177/13670069020060020201

Markman EM, & Wachtel GF (1988). Children's use of mutual exclusivity to constrain the meanings of words. Cognitive psychology, 20 (2), 121-57 PMID: 3365937

Merriman WE, & Bowman LL (1989). The mutual exclusivity bias in children's word learning. Monographs of the Society for Research in Child Development, 54 (3-4), 1-132 PMID: 2608077

Merriman WE, Marazita J, & Jarvis LH (1993). Four-year-olds' disambiguation of action and object word reference. Journal of experimental child psychology, 56 (3), 412-30 PMID: 8301246

Healey, E. and Scarabela, B. (2009). Are children willing to accept two labels for one object? Proceedings of the Child Language Seminar. University of Reading.

Thursday, 6 May 2010

Systematicity of RNA

I've been looking at evolutionary precursors to bilingualism. What does this mean? At the moment, I'm thinking about it in the sense of having two or more signals which correspond to the same action or meaning. Not much before language, you say? How about going all the way back to RNA codes?

RNA converts genetic information stored in DNA into proteins which regulate processes within cells. The ‘code’ for translating DNA into proteins is redundant but not ambiguous: several codons can specify the same amino acid, but each codon specifies only one. There are varieties of code, and different organisms use different proportions of codons. ‘Error’ is defined as the sum of protein changes when changing from each codon to each other codon, weighted by the frequency of the codon’s use (Marquez, Smit & Knight, 2005). In this sense, the error rate is comparable with the RegMap index of redundancy.

RegMap was developed to calculate the degree of regularity in the mappings between signals and meanings (Tamariz & Smith, 2008). Essentially, it's the relative entropy modified by the frequency of use.

RegMap was applied to RNA coding frequencies of various organisms. Data were taken from the codon usage database for about 16,500 organisms. As a baseline, the same coding transcriptions were used, but with randomised frequencies. The RegMap index for the genetic code with actual usage frequencies is significantly higher than with randomised frequencies (Mean RegMap for actual = 0.711, random = 0.708, t = 4.8, df = 7196, p < 0.0001).
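The baseline idea can be sketched in Python. This is not the RegMap calculation: the score below is a simple stand-in (the conditional entropy of codon given amino acid), and reading 'randomised frequencies' as a shuffle of the observed counts across codons is my assumption. The codon table is a small real fragment of the standard genetic code; the usage counts are invented.

```python
import math
import random

# A small, real fragment of the standard genetic code (codon -> amino acid).
CODE = {"GCU": "Ala", "GCC": "Ala", "GCA": "Ala", "GCG": "Ala",
        "UUU": "Phe", "UUC": "Phe", "AUG": "Met", "UGG": "Trp"}


def codon_entropy_given_amino_acid(usage):
    """H(codon | amino acid) in bits: a stand-in score, NOT the RegMap index."""
    total = sum(usage.values())
    per_amino_acid = {}
    for codon, count in usage.items():
        per_amino_acid[CODE[codon]] = per_amino_acid.get(CODE[codon], 0) + count
    entropy = 0.0
    for codon, count in usage.items():
        if count > 0:
            share = count / per_amino_acid[CODE[codon]]
            entropy -= (count / total) * math.log2(share)
    return entropy


def shuffled_usage(usage, rng):
    """Baseline: keep the code fixed but shuffle the counts across codons."""
    counts = list(usage.values())
    rng.shuffle(counts)
    return dict(zip(usage, counts))


# Invented usage counts for one hypothetical organism.
usage = {"GCU": 40, "GCC": 5, "GCA": 3, "GCG": 2,
         "UUU": 30, "UUC": 4, "AUG": 10, "UGG": 6}
rng = random.Random(0)
baseline = [codon_entropy_given_amino_acid(shuffled_usage(usage, rng))
            for _ in range(1000)]
print(codon_entropy_given_amino_acid(usage), sum(baseline) / len(baseline))
```

The comparison has the same shape as the one reported above: one score for the actual frequencies, and a distribution of scores for the same code with the frequencies scrambled.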

The graph is not much use, but here it is:

Marquez R, Smit S, & Knight R (2005). Do universal codon-usage patterns minimize the effects of mutation and translation error? Genome biology, 6 (11) PMID: 16277746

Monica Tamariz, Andrew D. M. Smith (2008). Quantifying the regularity of the mappings between signals and meanings Proceedings of the 7th Conference on the Evolution of Language. pdf

Wednesday, 5 May 2010

LEL Postgraduate Conference

I'm giving a talk at the LEL Postgraduate Conference at the University of Edinburgh, 19th - 21st May. It's not that big of a deal, since I'm required to give a talk, but it is my first talk. A link to the website (which I'm maintaining) with more details and my abstract follow!

LEL Postgraduate Conference 2010

Bilingualism and Social Networks

Children learn language from exposure to speakers in their social network. This learning influences the input that will be given to the next generation. The way languages change over time is dependent on the learning biases of individuals (e.g. Kirby, Dowman & Griffiths, 2007), but also on the dynamics of the social network of those individuals (Gong & Wang, in press; Lupyan & Dale, 2010; Gal, 1979; Govindasamy, 2003).

Bilingualism is often marginalised in theories of language evolution and existing bilingualism is generally seen as the product of contact between two or more monolingual communities. However, I hypothesise that a bilingual ability is a fundamental aspect of language learning: children can learn two languages as easily as learning one. This suggests that human cognition is geared towards handling complex, not homogeneous, cultural input. This in turn may suggest the kind of social networks in which human cultural transmission evolved. The prevalence of monolingualism in some modern societies may be explained by changes to social structures afforded by communications technology.

This talk will outline my approach to this hypothesis. This involves the idea of cultural transmission as a trade-off between communicative flexibility and expressivity, the use of a comparative approach to bilingualism and methodologies to generate and test hypotheses.

Tuesday, 4 May 2010

Bilingualism as a preadaptation for Language

This report is the beginnings of an attempt at a comparative approach to bilingualism, in the style of Fitch (2005). Bilingualism is difficult to define, but by asking whether there is evidence for this capacity in non-human species, it's hoped that the definition can be made clearer.

This research project takes an evolutionary approach to Bilingualism. One of the most difficult problems faced so far is identifying the role of bilingualism in the cultural evolution of language. Is it a product or a catalyst? Firstly, I'm not sure whether this has been considered to any great extent. However, I suggest that the implicit assumption in the vast majority of work in both the areas of Bilingualism and Language Evolution has been that bilingualism is a product of the merging of homogeneous language communities. This report explicitly asks the question: Which came first - Language or Bilingualism? That is, did the capacity for bilingualism develop from a pressure to learn multiple existing languages or was it a capacity which existed before human languages were established and influenced their arrival?

The latter hypothesis seems to be nonsensical. How can individuals have the ability to learn more than one language when there are no languages to be learned? Here, I'd like to make a distinction between two kinds of bilingualism, following the approach of Hauser, Chomsky & Fitch (2002). Bilingualism in the narrow sense means the ability to learn several human languages. This is obviously a human-only trait. Bilingualism in the broad sense refers to the general capacity to acquire more than one signalling system. Depending on how one defines signalling systems, this capacity may be shared with many other animals, both closely and distantly related. Of course, defining what constitutes a single signalling system is difficult, let alone defining language or bilingualism. However, it's hoped that the approach taken in this paper will help towards this goal by considering the features of the phenomenon we wish to define.

Before considering this possibility, the comparative approach to language evolution is presented. Fitch (2005 and others) approaches the study of the evolution of language by considering what elements contribute towards the 'Faculty of Language'. In the broad sense of the term, this covers all the prerequisite elements that are required for linguistic communication. This involves cognitive capacities such as acoustic string segmentation and semantic processing, but also much more basic features such as memory. That is, features of the Faculty of Language in the broad sense (FLB) are found in humans and animals. The narrow sense of the term (FLN) refers to those capacities which are involved in language alone. There is much more debate about what these elements are. Recursive processing has been suggested as one example.

The comparative approach has been used to answer the question of what belongs to FLN and to FLB. Animals have been shown to be capable of a number of processes required for language, including categorical perception of speech sounds (Kuhl & Miller, 1978) and Mutual Exclusivity (Kaminski et al., 2004). From studies of divergent and convergent evolution of these traits, some important features have been identified. For example, many species which exhibit vocal learning have direct neural connections between the brain and vocal motors, while non-vocal learners do not (see Doupe, 1999).

This report suggests that this approach should be adopted for the study of Bilingualism. Such an approach would seek to answer whether bilingualism is a uniquely human capacity. If it turns out that other animals also have this capacity, then the role of bilingualism in the evolution of language can be re-assessed.

However, there is a large initial problem. Even FLB only consists of capacities that are required for language. Bilingualism in the broad sense, however, is not required in order to speak a language. This problem may be due to the individual-level bias in the idea of the 'Faculty of Language'. Its primary aim is to describe capacities that an individual organism requires, rather than a community. Therefore, bilingualism may not be part of the FLB, but simply a product of cultural interaction. However, the comparative approach can help verify this hypothesis if social animals exhibit the capacity for bilingualism. That is, if bilingualism comes from cultural interaction alone, there should be no non-social animals which have the capacity for it.

Bilingualism in Bengalese Finches
If other species exhibit bilingualism, then this is evidence that bilingualism developed before human language. Takahasi & Okanoya (2010) study the vocal learning patterns of the Bengalese Finch. These are a domesticated breed descended from wild White Backed Munia. The Bengalese Finch exhibits very complex song patterns in comparison to the White Backed Munia.

Takahasi & Okanoya (2010) carry out a cross-fostering experiment where Munias are brought up by Finches, and Finches are brought up by Munias. The Munias tended to have a stronger preference for copying Munia songs, while the Finches are not so disposed towards their own strain's song. That is, Finches have more flexibility in learning. It is hypothesised that this is because there is a pressure on Munias to identify their own strain in the wild where there are mixed flocks, while this pressure has been masked for Finches by domestication and isolation.

However, is this really 'Bilingualism'? The problem is that, although there is flexibility in the sources of acquisition, the birds do not have the same flexibility in production. That is, as I understand it, they still develop only one song (i.e. they can't sing elements of A and B's songs in the morning, then elements of C and D's songs in the evening). Furthermore, the idea of 'comprehension' is more difficult to apply, since there is no semantics.

It has been suggested that Bengalese Finches have developed song complexity as a sexual display (Okanoya, 2004). Following from this, Soma et al. (2009) find that chicks select tutors based on their song complexity. Also, Okanoya (2010) presents some evidence to suggest that Bengalese Finches learn from many tutors. That is, they splice whole segments of songs from many other individuals to create their own song. In this sense, learning from multiple tutors increases the complexity of the song and so increases the attractiveness and fitness of the individual.

The ability to learn syntactic sequences from many tutors has apparently occurred in a system with no semantics. This may suggest that bilingualism at the syntactic level emerged before bilingualism at the lexical level, opposite to the order implicitly assumed by many. One big advantage of cultural evolution is that individuals can inherit information from multiple sources, whereas there are a limited number of biological parents. This is the core of what I mean by Bilingualism being a preadaptation for language: part of the acquisition of human languages requires the flexibility afforded by bilingualism.

Counter Arguments
This phenomenon in Bengalese Finches is interesting, but may not help with our question. Although the complexity of the system has increased due to a change in the environment (domestication), whether this was initially enabled by learning from multiple tutors is not clear.

Also, Okanoya has shown that vocal learners who co-inhabit areas with other species of vocal learners have less complex song. That is, song complexity does not help species identification. Therefore, if the capacity for bilingualism developed in humans before language, it's likely that there was little pressure on vocal cues for species identification.

Inter-species semantic communication
Many species also communicate vocally with other species. Vervet monkeys respond to the territorial and alarm calls of superb starlings (Seyfarth & Cheney, 1990). Ring-tailed lemurs respond to the alarm calls of Verreaux's sifakas (Oda & Masataka, 1996). However, captive ring-tailed lemurs who had never heard the sifakas' alarm calls also responded appropriately to playbacks. Oda and Masataka argue that they are therefore responding to shared acoustic features rather than to an associated meaning. Although most examples of inter-species communication do not involve the transference of 'concepts', some examples do show evidence for this.

Zuberbuhler (2000) studied communication between Diana monkeys and Campbell's monkeys. Diana monkeys respond appropriately to Campbell's monkeys' alarm calls for leopards and eagles. Furthermore, their responses suggest they are attending to the meaning rather than the acoustic signal. If a Diana monkey hears a leopard or a leopard alarm call, it calls out loudly, but if it hears a second leopard or leopard alarm, it is quieter, presumably because of the risk of predation (the same is true of eagle alarms). Diana monkeys were primed with Campbell alarms for either leopards or eagles then probed with either eagle or leopard sounds (growls and shrieks). They responded loudly to each combination, apart from where the Campbell alarm corresponded to the predator type (e.g. Campbell leopard alarm followed by a leopard sound). In these cases, the Diana monkeys were quieter, suggesting that they thought the predator was already present.

Zuberbuhler concludes that "Diana monkeys can flexibly use and assess information derived from the communication of other species" and that "semantic understanding can be based on arbitrary signals, as it is the case for word meaning" (Zuberbuhler, 2000, p. 717). Diana monkeys seem to understand the same concept from two different calls. I argue that this is bilingualism in the broad sense at the lexical-like level.

Again, Diana monkeys are limited by their physiology in terms of production of the Campbell's alarms. However, the information transfer from Campbell's monkeys to Diana monkeys is not 'communication' as defined by Maynard Smith & Harper (2003) (also see Scott-Phillips, 2008). That is, although the Campbell's calls affect the behaviour of the Diana monkeys, they did not evolve to do this (they are cues, not signals). Therefore, I'd like to suggest that the capacity for bilingualism originates in the evolution from cues to signals.

However, these responses may not be learned. Furthermore, there is no current evidence to suggest that Campbell's monkeys reciprocate in their comprehension of Diana monkeys' calls. The latter issue is discussed by Magrath et al. (2009), who study the alarm call responses of 3 ecologically distinct avian species and find that responses may be reciprocal, but not necessarily symmetrical. Different species reacted to each other's alarm calls in proportion to the 'reliability' of the call as a cue to one of the listener's predators. That is, not all predators of species A are predators of species B, so A's alarms are not always reliable for species B, and species B responds accordingly. In this study, some species responded in the same way to three different calls. Again, this is evidence for bilingualism in the broad sense.

This raises an interesting question of 'reliability' or 'relevance' (as in Relevance theory, Sperber & Wilson, 1995) in animal communications. Much of animal communication is limited to and grounded in information relevant to shared survival interests, that is, food, predators and mating. Humans are capable of communicating about topics beyond their immediate survival needs. This difference possibly requires the 'ungrounding' of signals from the domains in which they evolved (see the next section).

Bilingualism's impact on FLB
Although bilingualism may not be necessary for the acquisition of language, and so could arguably not be part of FLN, learning two languages does seem to have a qualitative impact on capacities in FLB. For example, compared with monolinguals, bilinguals develop better inhibitory control, theory of mind (Goetz, 2003) and task-switching (Bialystok & Martin, 2004).

Raphael Nunez's approach to the evolution of language hypothesises that it involved several pre-adapted 'modules', and that these modules coevolved. That is, an advancement in one module (e.g. a more stable voice source, see Demolin, 2010) could cause an advancement in another (e.g. vocal learning), which could feed back into the first module.

Nunez sees the evolution of meaning as involving the development of a grounded system, ungrounding this system from its original domain, then re-grounding it in another. His work focuses on how gestural instantiations of space were re-grounded to convey information about time. For example, one might point behind to indicate an event that occurred in the past. Linguistic expressions of time have also adopted this system.

I suggest that bilingualism can be seen in this way. For instance, being able to learn from several tutors has advantages for increasing signal complexity in some situations. If this ability to learn from individuals could be ungrounded to allow learning from contexts, then this would allow a semantic system to develop. In other words, a kind of bilingualism allows the complex vocal learning mechanisms to be deployed over more general domains.

Okanoya has a similar hypothesis which sees string segmentation and context segmentation as necessary preadaptations for a semantic system. Indeed, the Bengalese Finches studied above may not only be doing string segmentation of tutors' songs, but also a kind of crude 'tutor segmentation'. That is, they select whole sections from different tutors.

The ungrounding theory suggests that the pressure on the original system needs to be lifted by some other mechanism. This may be a change in the environment, or an internal mechanism. It's likely in the case of the Bengalese Finch that its domestication had a large part to play, alleviating the burden of foraging and predation.

Asking whether non-human species have capacities for bilingualism in the broad sense may affect the way we approach bilingualism. This report has reviewed studies which show that animals have capacities compatible with ideas of bilingualism, but without other features of human language. These capacities stem from very basic evolution of cues and being able to learn from multiple tutors.

Further analysis of evidence for bilingual behaviour in animals is required. This includes, for example, switching tasks in primates and other animals and the boundaries between different dialects in whale song. Crucially, this analysis, just like for the rest of FLB, relies on evidence from a great number of studies. If the relevant studies have not been done, the potential for completing them in this project is extremely restricted.

More fundamentally, this report takes an approach to bilingualism that may not be appropriate. The comparative approach was designed to identify and study necessary components of the language faculty. On the other hand, such an approach may show that, from an evolutionary perspective, there is no easy way to define bilingualism, questioning whether there is a difference between monolingualism and bilingualism or even an easy way to distinguish between languages.


Bialystok E, & Martin MM (2004). Attention and inhibition in bilingual children: evidence from the dimensional change card sort task. Developmental science, 7 (3), 325-39 PMID: 15595373

Demolin, D. (2010). Prosody and recursion in primate vocalisation Proceedings of the JAIST International Seminar on the Emergence and Evolution of Linguistic Communication, Kyoto, Japan.

Doupe AJ, & Kuhl PK (1999). Birdsong and human speech: common themes and mechanisms. Annual review of neuroscience, 22, 567-631 PMID: 10202549

Fitch, W. T. (2005). The evolution of language: A comparative review Biology and Philosophy, 20 (2-3), 193-203 DOI: 10.1007/s10539-005-5597-1

Kaminski J, Call J, & Fischer J (2004). Word learning in a domestic dog: evidence for "fast mapping". Science (New York, N.Y.), 304 (5677), 1682-3 PMID: 15192233

Kuhl, P. & Miller, J. D. (1978). Speech perception by the chinchilla: Identification functions for synthetic VOT stimuli The Journal of the Acoustical Society of America, 63 (3) DOI: 10.1121/1.381770

Magrath, R., Pitcher, B., & Gardner, J. (2009). An avian eavesdropping network: alarm signal reliability and heterospecific response Behavioral Ecology, 20 (4), 745-752 DOI: 10.1093/beheco/arp055

Maynard Smith, J., & Harper, D.G.C. (2003). Animal Signals Oxford University Press, Oxford

Oda, R., & Masataka, N. (1996). Interspecific responses of ring-tailed lemurs to playback of antipredator alarm calls given by Verreaux's sifakas. Ethology, 102, 441-453 DOI: 10.1159/000021651

Okanoya, K. (2004). Song syntax in Bengalese finches: proximate and ultimate analyses Advances in the Study of Behavior, 34, 297-346

Okanoya, K. (2010). Biological preadaptations for language Proceedings of the JAIST International Seminar on the Emergence and Evolution of Linguistic Communication, Kyoto, Japan.

Scott-Phillips, T. (2008). Defining biological communication Journal of Evolutionary Biology, 21 (2), 387-395 DOI: 10.1111/j.1420-9101.2007.01497.x

Seyfarth, R., & Cheney, D. (1990). The assessment by vervet monkeys of their own and another species' alarm calls Animal Behaviour, 40 (4), 754-764 DOI: 10.1016/S0003-3472(05)80704-3

Soma, M., Hiraiwa-Hasegawa, M., & Okanoya, K. (2009). Song-learning strategies in the Bengalese finch: do chicks choose tutors based on song complexity? Animal Behaviour, 78 (5), 1107-1113 DOI: 10.1016/j.anbehav.2009.08.002

Miki Takahasi, & Kazuo Okanoya (2010). Song Learning in Wild and Domesticated Strains of White-Rumped Munia, Lonchura striata, Compared by Cross-Fostering Procedures: Domestication Increases Song Variability by Decreasing Strain-Specific Bias Ethology

Zuberbühler, K. (2000). Interspecies semantic communication in two forest primates Proceedings: Biological Sciences, 267 (1444), 713-718 DOI: 10.1098/rspb.2000.1061

Levels of Bilingualism - update

I recently went back to my analysis of bilingualism in the Ethnologue. I suggested estimating the level of bilingualism by calculating the minimum number of bilinguals. However, there is already a much better estimate, and it has already been calculated. Greenberg's diversity index gives the probability that two people picked at random from the same country have different mother tongues. This index has already been calculated for each country in the Ethnologue. Below is a map of these indices. Dark colours indicate a higher diversity index (few people have the same mother tongue), lighter colours indicate a lower diversity index (total white = all people have the same mother tongue).
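
To make the index concrete: as I understand it, it is usually computed as one minus the sum of the squared mother-tongue proportions. Here's a minimal Python sketch under that assumption. The function name and the speaker counts are made up for illustration and are not Ethnologue figures.

```python
from collections.abc import Mapping


def greenberg_diversity(speakers: Mapping[str, float]) -> float:
    """Probability that two randomly chosen people have different mother tongues.

    `speakers` maps each mother tongue to its number of speakers.
    Assumes everyone has exactly one mother tongue and that the two
    people are drawn independently (with replacement).
    """
    total = sum(speakers.values())
    if total <= 0:
        raise ValueError("need at least one speaker")
    # Chance that both people share mother tongue i is p_i squared.
    same = sum((n / total) ** 2 for n in speakers.values())
    return 1.0 - same


# Hypothetical countries, not real data.
one_language = {"A": 100}
two_languages = {"A": 50, "B": 50}

print(greenberg_diversity(one_language))
print(greenberg_diversity(two_languages))
```

A country where everyone shares a mother tongue scores 0, and the score approaches 1 as the population is spread over more and more languages.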

However, this still isn't a good predictor of bilingualism. Instead, it's a measure of diversity of mother tongue. The index still assumes all people only speak one language. I'm trying to figure out how to modify the index to take account of bilinguals - but I was always rubbish at probability.
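
One possible modification, offered as a rough sketch rather than an established measure: describe each person by the set of languages they speak, and ask for the probability that two randomly chosen people share no language at all. The function name, the repertoire representation and the counts below are my own assumptions for illustration.

```python
from itertools import product


def no_shared_language(repertoires: dict[frozenset[str], float]) -> float:
    """Probability that two randomly chosen people have no language in common.

    `repertoires` maps a set of languages (one person's repertoire) to the
    number of people with exactly that repertoire. People are drawn
    independently (with replacement), as in the mother-tongue index.
    """
    total = sum(repertoires.values())
    if total <= 0:
        raise ValueError("need at least one speaker")
    # Sum the probability of every ordered pair of repertoires that overlap.
    shared = sum(
        (n1 / total) * (n2 / total)
        for (r1, n1), (r2, n2) in product(repertoires.items(), repeat=2)
        if r1 & r2
    )
    return 1.0 - shared


# Hypothetical populations, not real data.
monolingual = {frozenset({"A"}): 50, frozenset({"B"}): 50}
with_bilinguals = {
    frozenset({"A"}): 40,
    frozenset({"B"}): 40,
    frozenset({"A", "B"}): 20,
}

print(no_shared_language(monolingual))
print(no_shared_language(with_bilinguals))
```

When everyone is monolingual this reduces to the mother-tongue diversity index, and adding bilinguals can only lower it. The catch is that it needs counts for each combination of languages, which mother-tongue figures alone don't provide.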