Tuesday, 11 May 2010

Mutual Exclusivity biases in cross-situational learning: A comparison between monolingual and bilingual corpora

This report focuses on models of cross-situational learning and how current models compare when exposed to real monolingual and bilingual input. Several types of model were evaluated on two transcribed videos of parent-child interaction, one monolingual and the other bilingual. Children have been shown to demonstrate a Mutual Exclusivity (ME) bias during word learning (Markman and Wachtel, 1988). Frank et al. (2009) showed that their model also exhibited ME behaviour after learning from a monolingual corpus of contexts. The current study gives the same model bilingual input and asks whether the same behaviour is exhibited.


Frank et al. (2009) provide a transcribed video of monolingual parent-child interaction coded for use in cross-situational learning. An equivalent bilingual corpus was needed, the main criterion being a roughly equal number of utterances in each language. The CHILDES database has suitable resources, and a recording from a study by Yip and Matthews was selected (see CHILDES, 2010). The child in question was bilingual from birth: her mother was a native speaker of Hong Kong Cantonese and her father a native speaker of British English. She was 2;11 in the chosen recording. There are 967 utterances, 48% of which are in Cantonese and 52% in English. The objects visible in the video were added to the transcription, along with the mappings between words and objects and the referential intentions of the speakers, following the coding scheme of Frank et al. (2009). The code for the models was supplied by Frank et al. (Frank, 2010).


The performance of the different models was analysed in two ways. First, the best estimated lexicon (set of word-object mappings) of each model was evaluated against a gold-standard lexicon in terms of precision, recall and the resulting F-score. Second, the models were asked to guess the intended referent of each utterance, given the objects present. Tables 1 and 2 show the lexicon results for the monolingual and bilingual corpora respectively. Frank et al.’s model returns the highest F-scores in both cases. This is largely due to an advantage in precision, likely stemming from its modelling of non-referential words. For the bilingual corpus, Frank et al.’s model returned a lexicon with a precision of 0.31 and a recall of 0.27, giving an F-score of 0.29. This is lower than the score for the same model on the monolingual corpus, possibly because referential uncertainty (independently of the amount of synonymy) is higher in the bilingual corpus.

The results for the referential intentions, shown in Tables 3 and 4, follow different trends. For the monolingual corpus, the precision of Frank et al.’s model allows it to outperform the other models. In the bilingual case, however, it performs relatively poorly, and the Conditional Probability model performs best. All models achieve low precision and recall on this task, suggesting that it is harder than learning the lexicon. With more data, the results might be different.
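As a rough illustration of the lexicon evaluation, the sketch below scores a learned lexicon against a gold standard, assuming both are represented as sets of (word, object) pairs. The function name and the toy lexicons are hypothetical; this is not the evaluation code supplied by Frank et al.

```python
def evaluate_lexicon(learned, gold):
    """Return (precision, recall, F-score) of a learned lexicon against a gold standard.

    Both arguments are sets of (word, object) pairs.
    """
    hits = len(learned & gold)  # mappings that appear in both lexicons
    precision = hits / len(learned) if learned else 0.0
    recall = hits / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-score: harmonic mean of precision and recall
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical toy lexicons, for illustration only
gold = {("bird", "bird"), ("birdie", "bird"), ("ball", "ball"), ("kitty", "cat")}
learned = {("bird", "bird"), ("ball", "ball"), ("look", "ball")}

p, r, f = evaluate_lexicon(learned, gold)
print(f"precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

The same harmonic mean reproduces the bilingual figures reported above: a precision of 0.31 and a recall of 0.27 give 2 × 0.31 × 0.27 / (0.31 + 0.27) ≈ 0.29.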

Mutual Exclusivity

After the model had processed the corpus, it was presented with a mutual exclusivity task and the relative likelihoods of several interpretations were measured. In the task, the model was presented with a context containing a new object (e.g. a dax), a familiar object (a bird for the monolingual model and an orange for the bilingual model) and a new word (e.g. “dax”). Probabilities were calculated for four interpretations: the model links the new word with neither object (i.e. it treats the word as non-referential), with the new object, with the familiar object, or with both. Figure 1 shows the results for the bilingual model alongside those for the monolingual model for comparison.

The monolingual results were recalculated for this study, so they differ slightly from those reported in Frank et al. (2009). The monolingual and bilingual models show the same trend: both rank the possible interpretations in the same order of likelihood. The most likely interpretation is that the new word is linked to the new object, honouring mutual exclusivity. The second most likely is that the word refers to neither object. Intuitively, one would expect a bilingual to be more likely to treat the new word as another word for the familiar object. Indeed, the bilingual model does consider this possibility relatively more likely than the monolingual model does. However, it still considers mapping the word to neither object more likely than adding a synonym. This may mean that, given an additional cue (e.g. a pragmatic one), a bilingual would be more ready to accept a synonymous interpretation. This is an empirical question.

The Prior Probability

The prior probability is a function of the number of mappings in the hypothesised lexicon, scaled by a fixed parameter (alpha). This represents a preference for smaller lexicons: the hypothesis that results in the lexicon with the fewest mappings receives the highest prior probability. With the default parameter (alpha = 7), the mutually exclusive interpretation (DAX-dax) beats the original mapping (the new word maps to neither object), both mappings, and the mapping of the familiar object to the new name. However, this ranking depends on the lexicon-size parameter alpha. With a low alpha, the most likely mapping is the ME mapping; with a higher alpha, it is the original mapping (see Figure 2).
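The interaction between prior and likelihood can be illustrated with a toy calculation. The sketch assumes a prior proportional to exp(-alpha × |L|), where |L| is the number of mappings, in line with Frank et al.’s description of lexicons becoming exponentially less probable as they include more pairings. The log-likelihood values are invented, chosen only so that the toy reproduces the qualitative pattern discussed in this section; they are not outputs of the real model, and the names are hypothetical.

```python
import math

# Number of extra lexicon entries each interpretation of the new word "dax" needs
ADDED_MAPPINGS = {"neither": 0, "ME": 1, "familiar": 1, "both": 2}

# Invented log-likelihood advantages over "neither" (illustration only; NOT model output)
LOG_LIKELIHOOD = {"neither": 0.0, "ME": 9.0, "familiar": 1.0, "both": 2.0}


def me_posterior(alpha):
    """Normalised posterior over the four interpretations for a given alpha.

    Assumes log prior = -alpha * (mappings added), i.e. a prior proportional to
    exp(-alpha * |L|) up to a constant shared by all four hypotheses.
    """
    scores = {h: LOG_LIKELIHOOD[h] - alpha * ADDED_MAPPINGS[h] for h in ADDED_MAPPINGS}
    top = max(scores.values())
    # Subtract the maximum before exponentiating, for numerical stability
    weights = {h: math.exp(s - top) for h, s in scores.items()}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


for alpha in (0.5, 7, 10):
    posterior = me_posterior(alpha)
    ranking = sorted(posterior, key=posterior.get, reverse=True)
    print(alpha, ranking)
```

With these invented numbers, the ME interpretation wins at alpha = 7, the ‘neither’ interpretation wins at alpha = 10, and both mappings overtake ‘neither’ once alpha drops below 1. The exact crossover points say nothing about the real model; only the direction of the effect is meant to carry over.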

A similar dependence on alpha holds between the preference for the ME mapping and the preference for both mappings, although the latter never overtakes the former (see Figure 3).
The explanation is as follows. The original mapping receives a high prior probability because it does not increase the size of the lexicon. However, the likelihood of encountering a non-referential word is low, so the overall probability favours the ME mapping over the original. Assuming a larger lexicon (decreasing alpha), the relative increase in lexicon size is smaller, tipping the balance between the original and ME mapping preferences. Interestingly, the probability of choosing both mappings overtakes that of the original mapping when alpha is less than 1 (see Figure 4).
That is, assuming both mappings becomes more likely when the prior penalises each mapping by less than one unit, as if it counted fewer units than there are word-object mappings in the lexicon. Such a setting makes sense for a bilingual (who may have up to twice as many mappings as a monolingual), because the count then reflects the number of concepts. Put another way, compensating for the additional synonymy in bilingual input makes both mappings more likely. The dependence of the ME results on alpha is acknowledged by Frank et al.:

“Note that there is some parameter dependence in our models fit to the mutual exclusivity situation. Depending on the size of the corpus, it might be the case that the prior disadvantage of adding a word to the lexicon would not be outweighed by the increase in corpus likelihood caused by learning a new word. This fact makes a developmental prediction: in early development, when very few words are known, inferences about mutual exclusivity should be weaker.”
Supporting Information for Frank et al. (2009), p. 13.

This prediction is borne out in some studies (Merriman and Bowman, 1989; Frank and Poulin-Dubois, 2002; Merriman et al., 1993). However, Markman and Wachtel (1988) found that the ME constraint weakens over time, with older children showing less of a bias, while Deák et al. (2001) found no change. The issue here is the size of the lexicon. Bilingual children may know more words than monolinguals, but it may be more accurate to judge lexicon size by the size of one language’s lexicon. The model provides no mechanism for adjusting the lexicon-size prior during learning. Currently the prior depends on the alpha parameter and the number of mappings, so adding new mappings is dispreferred. Bilinguals will have more mappings, which alters their prior probabilities. However, this does not lead to qualitative differences in the mutual exclusivity experiment. The motivation for basing the prior on the number of mappings is mainly to keep the model simple.

“We chose a prior probability distribution that favored parsimony, making lexicons exponentially less probable as they included more word-object pairings ... The choice of a simple prior puts most of the work of the model in the likelihood term ... hence, the likelihood term captures the learner’s assumptions about the structure of the learning task.”
Frank et al., 2009, p. 579

That is, the choice is driven by the statistical, computational treatment of the formal problem rather than by psychological motivation. Under this model, then, mutual exclusivity behaviour stems from the child’s unwillingness to learn new signal-meaning mappings. This seems a little circular: children prefer not to extend mappings from familiar words to unfamiliar objects because they prefer not to extend mappings. It also seems to go against children’s obvious ability and motivation to learn new words and meanings. Possible ways to make the prior more sensitive to the input include incorporating the number of concepts, the number of words or the amount of synonymy (proportional to the number of words in the lexicon divided by the number of concepts). However, this changes the nature of the model: we would be using it to test specific hypotheses about mutual exclusivity, judged against empirical data, rather than seeing whether mutual exclusivity ‘falls out’ of more basic assumptions.
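The synonymy measure mentioned above can be computed directly from a lexicon. This sketch assumes a lexicon is a set of (word, object) pairs and treats each distinct object as one concept; the function name and the example lexicons (with placeholder ‘_en’/‘_yue’ labels standing in for English and Cantonese words) are hypothetical.

```python
def synonymy(lexicon):
    """Distinct words per distinct object (concept) in a set of (word, object) pairs.

    1.0 means one word per concept; a lexicon holding one word per concept in
    each of two languages scores 2.0.
    """
    words = {word for word, _ in lexicon}
    concepts = {obj for _, obj in lexicon}
    return len(words) / len(concepts) if concepts else 0.0


# Hypothetical lexicons; "_en" and "_yue" are placeholder English/Cantonese labels
monolingual = {("ball_en", "ball"), ("orange_en", "orange")}
bilingual = monolingual | {("ball_yue", "ball"), ("orange_yue", "orange")}

print(synonymy(monolingual), synonymy(bilingual))
```

A prior scaled by this ratio would treat a balanced bilingual lexicon much like a monolingual one with the same number of concepts, which is one of the specific hypotheses such a modification would commit the model to.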

Concept-based Prior

The mapping-based prior is biased towards a monolingual mode. The model was therefore altered so that the prior was negatively related to the number of objects in the lexicon, which represents the number of concepts for which the child knows words. Run on the bilingual corpus, the model returned a lexicon with a precision of 0.05, a recall of 0.41 and a resulting F-score of 0.09. Run again on the monolingual corpus, it returned a precision of 0.05, a recall of 0.79 and an F-score of 0.09. For both corpora, recall is better than with the mapping-based prior, but precision is much worse: the model overestimates the number of word-concept mappings. In fact, the models accumulated hundreds of word-concept mappings for tens of objects (monolingual: 551 mappings for 22 objects and 419 words; bilingual: 641 mappings for 55 objects and 598 words). The models failed to acquire a useful vocabulary. Nevertheless, when the Mutual Exclusivity experiment was run again, the relative ranking of the preferences had changed. Although the ME mapping is still favoured, the next preferred interpretation is now to make both mappings rather than neither (see Figure 5). However, this difference appears with both monolingual and bilingual input. Neutralising the difference in the prior gives the corpus likelihood a bigger role, which changes the preferences.
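The difference between the two priors can be made concrete with a short sketch. It assumes both priors take the exponential form exp(-alpha × count) and differ only in what is counted (word-object pairs versus distinct objects); the function names and the toy lexicon are hypothetical.

```python
def log_prior_mappings(lexicon, alpha):
    """Mapping-based prior (log, up to a constant): every word-object pair costs alpha."""
    return -alpha * len(lexicon)


def log_prior_concepts(lexicon, alpha):
    """Concept-based prior (log, up to a constant): only distinct objects cost alpha."""
    return -alpha * len({obj for _, obj in lexicon})


alpha = 7
lexicon = {("bird", "duck")}
with_synonym = lexicon | {("birdie", "duck")}  # second word for an already-named object

# Change in log prior caused by adding the synonym
print(log_prior_mappings(with_synonym, alpha) - log_prior_mappings(lexicon, alpha))  # -7
print(log_prior_concepts(with_synonym, alpha) - log_prior_concepts(lexicon, alpha))  # 0
```

Under the concept-based variant, extra words for an object that already has a name cost nothing in the prior, which offers one plausible reading of why that model accumulated hundreds of mappings for only tens of objects.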

How ‘Monolingual’ is the Monolingual corpus?

Although the monolingual corpus comes from a carer speaking one language, the lexicon the model learns contains synonymy. Of the 15 objects it learned words for, 8 had more than one associated word. For half of these 8 objects all the synonyms were appropriate (e.g. ‘bird’ and ‘birdie’ for the object ‘duck’), but for the other half they were not all appropriate. In other words, the model accommodates synonymy. The original Mutual Exclusivity experiment in Frank et al. was done with the object ‘bird’, which had one associated word. Here, the ME experiment was repeated for all the words the model learned from the monolingual corpus. There were no significant differences in the posterior probabilities of any of the interpretations (DAX-dax, Both, etc.) between synonymous and non-synonymous mappings. This holds for both the original and the concept-based prior.
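Separating synonymous from non-synonymous objects, as in the comparison above, only requires grouping the learned lexicon by object. A minimal sketch, assuming the lexicon is a set of (word, object) pairs; the function name and the example entries are hypothetical, apart from the ‘bird’/‘birdie’ for ‘duck’ case mentioned in the text.

```python
from collections import defaultdict


def split_by_synonymy(lexicon):
    """Split objects into those with several associated words and those with one."""
    words_for = defaultdict(set)
    for word, obj in lexicon:
        words_for[obj].add(word)
    synonymous = sorted(obj for obj, words in words_for.items() if len(words) > 1)
    single = sorted(obj for obj, words in words_for.items() if len(words) == 1)
    return synonymous, single


lexicon = {("bird", "duck"), ("birdie", "duck"), ("ball", "ball"), ("kitty", "cat")}
synonymous, single = split_by_synonymy(lexicon)
print(synonymous, single)
```

Each group’s objects can then serve in turn as the familiar object in the ME task, and the resulting posteriors compared between the two groups.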


Frank et al.’s model can be used to model word learning in bilinguals. There are some quantitative differences in the ME behaviour of models run on monolingual and bilingual corpora, but no qualitative differences were found. Even when the prior bias towards minimising the number of mappings was neutralised, both models still preferred to map the new word to the new object.

Next Steps
The results are inconclusive, but this may reflect the limited data. I suggest that synthetic corpora would make the dynamics clearer. Very simple cross-situational learning corpora could be created with varying amounts of ‘bilingualism’.
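One way such synthetic corpora might be generated is sketched below. Everything here is an assumption for illustration: each object has exactly one word in each of two languages (hypothetical labels such as ‘obj3_A’ and ‘obj3_B’), each situation shows a fixed number of objects, and the speaker names one of them, switching to the second language with probability `p_second`.

```python
import random


def make_corpus(n_objects, n_situations, p_second, objects_per_scene=2, seed=0):
    """Toy cross-situational corpus with a tunable amount of 'bilingualism'.

    p_second = 0 gives a monolingual corpus; p_second = 0.5 a balanced bilingual one.
    """
    rng = random.Random(seed)  # seeded so corpora are reproducible
    objects = [f"obj{i}" for i in range(n_objects)]
    corpus = []
    for _ in range(n_situations):
        scene = rng.sample(objects, objects_per_scene)  # objects visible in this situation
        target = rng.choice(scene)                      # the speaker's intended referent
        language = "B" if rng.random() < p_second else "A"
        corpus.append({"objects": scene, "words": [f"{target}_{language}"], "intended": target})
    return corpus


for p in (0.0, 0.25, 0.5):
    corpus = make_corpus(n_objects=10, n_situations=200, p_second=p)
    share_b = sum(s["words"][0].endswith("_B") for s in corpus) / len(corpus)
    print(p, round(share_b, 2))
```

Sweeping `p_second` (and perhaps adding non-referential words to each utterance) would let the ME experiment be rerun along a continuum from monolingual to balanced bilingual input.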


Frank, M. C., Goodman, N. D., & Tenenbaum, J. B. (2009). Using speakers' referential intentions to model early cross-situational word learning. Psychological Science, 20(5), 578-585. PMID: 19389131

Byers-Heinlein, K., & Werker, J. F. (2009). Monolingual, bilingual, trilingual: Infants' language experience influences the development of a word-learning heuristic. Developmental Science, 12(5), 815-823. PMID: 19702772

Deák, G. O., Yen, L., & Pettit, J. (2001). By any other name: When will preschoolers produce several labels for a referent? Journal of Child Language, 28(3), 787-804. PMID: 11797548

Frank, I., & Poulin-Dubois, D. (2002). Young monolingual and bilingual children's responses to violation of the Mutual Exclusivity Principle. International Journal of Bilingualism, 6(2), 125-146. DOI: 10.1177/13670069020060020201

Markman, E. M., & Wachtel, G. F. (1988). Children's use of mutual exclusivity to constrain the meanings of words. Cognitive Psychology, 20(2), 121-157. PMID: 3365937

Merriman, W. E., & Bowman, L. L. (1989). The mutual exclusivity bias in children's word learning. Monographs of the Society for Research in Child Development, 54(3-4), 1-132. PMID: 2608077

Merriman, W. E., Marazita, J., & Jarvis, L. H. (1993). Four-year-olds' disambiguation of action and object word reference. Journal of Experimental Child Psychology, 56(3), 412-430. PMID: 8301246

Healey, E., & Scarabela, B. (2009). Are children willing to accept two labels for one object? Proceedings of the Child Language Seminar, University of Reading.
