Wednesday, 8 December 2010

A fun Language Experiment!

I'm running a pilot experiment - you can help me by taking part - it only takes 5 minutes!

It's a memory experiment where you'll hear an alien language, then answer questions on it.

Friday, 19 November 2010

Learning the Form of Causal Relationships Using Hierarchical Bayesian Models

The mutual exclusivity bias has been posited as a fundamental learning bias (Markman & Wachtel, 1988; Merriman & Bowman, 1989). However, there is mounting evidence that bilinguals do not exhibit mutual exclusivity (Merriman & Kutlesic, 1993; Byers-Heinlein & Werker, 2009; Healey & Skarabela, 2009; Houston-Price, Caloghiris & Raviglione, 2010). It is hypothesised that the amount of variation in the input of bilinguals either lacks enough evidence for a mutual exclusivity bias to emerge, or renders it ineffective. Previously, I showed that a Bayesian model of cross-situational learning (Frank et al., 2009) could not account for differences found between the mutual exclusivity behaviour of monolingual and bilingual children. A reasonable step towards capturing this behaviour would be to add a higher level of abstraction to the model. This would allow the model to alter the strength of its own mutual exclusivity bias in accordance with the amount of variance it encountered.

The research described here implements a hierarchical Bayesian model of causal structure (Lucas & Griffiths, 2010; Lucas, 2010). Although it does not address mutual exclusivity directly, it is very relevant. Adults' and children's inferences about causal relationships were affected by their previous experience. A hierarchical Bayesian model that could adjust its assumptions about causal structure was shown to match the performance of adults.
When we learn about causal relationships, we learn to associate a cause with an effect. When there are multiple possible causes, we also need to consider the functional form of the relationships. For example, imagine that two switches are connected to a light. There are a number of possibilities that could cause the light to turn on. Perhaps only one switch needs to be activated (OR), perhaps both are needed (AND). The switches may always work the same (deterministic) or vary some proportion of the time (stochastic or `noisy').

Lucas & Griffiths (2010) show that a hierarchical Bayesian model can match the behaviour of adults in a learning task where they had to make inferences about causal relationships. Previous models of causal structure learning either assume particular functional forms of causal relationships, making them inflexible, or make no assumptions, rendering them incapable of capturing effects of context (see Griffiths & Tenenbaum, 2005, Lucas & Griffiths, 2010). The hierarchical model infers the functional form of the causal relationship as well as the exact relationship between variables.

Although not couched in terms of cross-situational learning, the task is compatible with one. Participants were shown wooden blocks, identical except for a one-letter label (A, B and C in the figure above). The task was to learn which were `blickets'. To help them, there was a `blicket meter' - a device that activated when in the presence of a blicket. Participants were shown several training rounds where one or two blocks were placed on the blicket meter and observed the meter activating or not. After training, they saw another set of blocks (D, E and F) go through a series of blicket tests. They were asked to indicate how confident they were that each block was a blicket.

The experiments had different training conditions. In one, the training data was consistent with the blicket meter's response having a disjunctive relationship (OR) with its causes. That is, it activated when any of the blocks placed on the meter were blickets. In another condition, the blicket meter responded consistently with a conjunctive relationship (AND). That is, two blickets were required to activate the blicket meter. The test block was set up to be consistent with either training condition.

Participants' responses in the test block were affected by their experience during training. They saw block D fail to activate the meter alone 3 times, block E fail once and blocks D and F together activate the meter twice. If you assume a disjunctive relationship, then D failing 3 times should be evidence against D being a blicket, while E failing once is less evidence. Indeed, participants in this condition rated D as being less likely to be a blicket than E. Assuming a conjunctive relationship, however, D failing to activate the meter is not informative, whereas seeing D and F activate the meter together is evidence for D being a blicket. Participants given conjunctive training rated D as being more likely to be a blicket. The model matched the participants' responses closely.
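The reasoning about D and E can be sketched in a few lines of Python. This is not Lucas & Griffiths' hierarchical model: the functional form is fixed by hand rather than inferred, and the noise level (0.1) and the prior probability that any block is a blicket (0.5) are assumptions made up for illustration. It simply enumerates every possible set of blickets and weighs each against the test-block events described above.

```python
from itertools import product

BLOCKS = ("D", "E", "F")
# Test-block events from the post: (blocks on the meter, did it activate?)
EVENTS = [(("D",), False)] * 3 + [(("E",), False)] + [(("D", "F"), True)] * 2

def p_activate(on_meter, blickets, form, noise):
    """Probability that the meter activates under a noisy OR or noisy AND form."""
    n = sum(1 for b in on_meter if b in blickets)
    satisfied = n >= 1 if form == "OR" else n >= 2
    return 1 - noise if satisfied else noise

def blicket_posteriors(form, noise=0.1, prior=0.5):
    """P(block is a blicket | events), enumerating all blicket assignments.

    noise and prior are illustrative assumptions, not values from the paper.
    """
    totals = {b: 0.0 for b in BLOCKS}
    norm = 0.0
    for flags in product([False, True], repeat=len(BLOCKS)):
        blickets = {b for b, f in zip(BLOCKS, flags) if f}
        weight = 1.0
        for f in flags:  # prior over which blocks are blickets
            weight *= prior if f else 1 - prior
        for on_meter, activated in EVENTS:  # likelihood of each observed event
            p = p_activate(on_meter, blickets, form, noise)
            weight *= p if activated else 1 - p
        norm += weight
        for b in blickets:
            totals[b] += weight
    return {b: totals[b] / norm for b in BLOCKS}

for form in ("OR", "AND"):
    print(form, {b: round(p, 3) for b, p in blicket_posteriors(form).items()})
```

With these assumed settings, D comes out as a less likely blicket than E under OR and a more likely one under AND, which is the qualitative pattern reported for the participants.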

Lucas & Griffiths argue that this shows that people can make inferences appropriate to causal relationships with more than one kind of functional form (e.g. conjunctive, disjunctive) and that their inferences can be affected by evidence `transferred' from a previous experience. In other words, participants' assumptions about causal relationships can be modified by experience, and this can lead to qualitatively different behaviour.

Lucas (2010) shows that the model accounts for children's behaviour, too. However, children's responses were more affected by the likelihood than adults' responses were, while adults tended to assume an OR function. This suggests that children are more flexible learners.

However, I'm unsure whether hierarchical Bayesian modelling can be applied to language. Causal forms and causal relationships have a definite hierarchy. But what about F1 and F0? English uses formants to make distinctions at the lexical level and pitch to make distinctions at the pragmatic or phrasal level. Tonal languages, however, use pitch at the lexical level, and some have morphological markers of phrasal boundaries (see Black, 2000).

I suggest that Bayesian models will always have built-in assumptions about the structure of the phenomenon. In studying language evolution, we should be focussing on how this structure emerges in the first place. I propose a different kind of hierarchical model that does not specify a structure in advance. Rather, the role of each level of the hierarchy should be determined by the data. This should be based on the most salient cues that divide the variance in the data in the most functional way.

One possible solution could be general hierarchical dynamic expectation maximisation models. But more on this in the future.

Lucas, C. G., & Griffiths, T. (2010). Learning the Form of Causal Relationships Using Hierarchical Bayesian Models. Cognitive Science, 34 (1).

Lucas, C. G. (2010). Developmental differences in learning the form of causal relationships. Proceedings of the Cognitive Science Society.

Wednesday, 25 August 2010

How to fix Java for Firefox 3.6 on Mac

A few days ago I updated Firefox and found that Java wasn't working. This is a problem, since I'm developing Java programs.

The Software Update facility could not find an update for Java or Firefox. Although there were some solutions on the net, none of them worked.

In the end, I reinstalled the Java Embedding Plugin in the following way:

Download and unzip the Java Embedding Plugin.

Follow the Read-Me, reproduced here:

1) In the JavaEmbeddingPlugin folder you just downloaded and unzipped, open the Binaries folder and drag JavaEmbeddingPlugin.bundle and MRJPlugin.plugin to "/Library/Internet Plug-Ins" folder.

2) If you're running Mac OS X 10.4.X (Tiger), you also need to make sure MRJPlugin.plugin's timestamp is later than the timestamps of two other files in the "/Library/Internet Plug-Ins" folder -- "Java Applet.plugin" and "Java Applet Plugin Enabler". This isn't necessary on Mac OS X 10.5.X (Leopard).

Open a Terminal session and enter the following command:

touch "/Library/Internet Plug-Ins/MRJPlugin.plugin"

3) Find your browser in the Applications folder (e.g. Firefox), Control-click (or right-click) on your browser's binary and choose "Show Package Contents".

4) Browse to the Contents/MacOS/plugins folder and delete JavaEmbeddingPlugin.bundle and MRJPlugin.plugin.

Friday, 6 August 2010

Language Evolution and Tetris!

Hello, people of the Blogosphere!

Why not take some time out from your dedicated reading to do a little language evolution experiment! And all you have to do is play Tetris!

The Evolution of Tetris

... and learn an alien language. It takes no less than 10 minutes.

The instructions and game are here:

Due to me being a terrible programmer, it'll probably crash or do some weird things. But it's all in the name of pseudo-science!

Monday, 26 July 2010

I'm blogging at Replicated Typo

I am now blogging at A Replicated Typo - don't expect to see too much more on this blog!

Thursday, 24 June 2010

Cultural Induction is Hard

Chater and Christiansen (2010) argue that culturally transmitted systems such as language are easier to learn than natural systems because they have adapted to learners' biases, so learners' intuitions will likely be correct. Being a speaker of one of the most morphologically complex languages in the world, I'm not so sure...

Maggie Tallerman gave a keynote speech at this year's EvoLang conference. A section of it used grammatically acceptable and unacceptable sentences in Welsh to illustrate the point. As a somewhat lapsed Welsh speaker, whose knowledge of Mutation was never great, I was a bit embarrassed to find I wasn't sure if the examples were correct. Last month, too, Mike Dowman gave a talk at Edinburgh University emphasising how impressive it is that we all make the same grammaticality judgements about sentences we have never seen before.

I've long wondered whether this is the case with Welsh mutation. Consonant mutation, or 'Treiglo' in Welsh, occurs in many Celtic languages and is a terrible affliction for the second language learner. In a number of (grammatical) contexts, the initial consonant of a word (nouns and verbs) changes to another. For instance, 'kitchen' in Welsh is 'cegin' [k3gIn], but 'his kitchen' is 'ei gegin' [g3gIn] and 'her kitchen' is 'ei chegin' [X3gIn].

There are three forms of mutation - soft, nasal and aspirate. The Wikipedia page on Welsh Mutation gives a broad overview. The contexts they apply in are extensive, for example:

  • Nouns after the preposition 'in'
  • After imperatives.
  • After the personal pronoun (my)
  • Singular feminine nouns after the definite article (but not words beginning in ll or rh)
  • In the negative form of verbs in the Short Future Tense
  • Masculine nouns after 'three' and all nouns after 'six'

The rules of mutation in Old Welsh were much simpler: it only occurred for feminine nouns after the definite article. However, presumably by a process of analogy, the 'rules' spread and became more complex. And here's the point I'm trying to make: Welsh morphology may be so complex, and subject to so much change by analogy, that there is very little agreement between people.

I'm not saying that mutation is unsystematic. There's even an automatic mutation checker online. However, it always annoys me when syntacticians cite some examples that they haven't actually gone out and tested.

Now, this may be an intuition I have from school. I grew up speaking Welsh, both my parents speak Welsh and I went to a Welsh-medium nursery, primary and secondary school where we were not allowed to speak English. Despite this, I'm not a confident speaker, especially after 7 years outside of Wales. I was particularly bad at mutation (although my English spelling was, and still is, equally bad). We had it drilled into us with tables and exercises, but I still can't really do it properly. This is partly because of the minority language status of Welsh, and the fact that everybody spoke English as a form of rebellion. The influence of English has been felt in other areas of Welsh such as Subject-Verb order, too.

However, I've always felt guilty about not being able to speak the mother tongue properly. Then I became a linguist and found a way out: All along, my teachers had been prescribing language, and that prescription was a few generations old. In terms of language evolution, the language the children speak is the correct language (the descriptivist approach).

Long story short, I conducted my own grammaticality judgement experiment. I couldn't find a grammaticality judgement for Welsh mutation online, nor any information about how well learners pick it up (please send me links if you know of any!). Neither am I a trained syntactician, so I have no real idea how to do an experiment, nor do I have access to money to employ participants.

So I decided to do it in the form of a Facebook quiz, using Quibblo. I found example sentences from an instructional pamphlet and took one from each major context. I then created alternative mutations for each sentence. Participants were presented with an English equivalent of each sentence, and asked to indicate which sentences they thought were correct. Participants could make more than one choice.

You can take the quiz here and view the results here.

12 people participated, mainly school friends since this was distributed via Facebook. This is a good point rather than a bad point, since we're more likely to have been exposed to the same linguistic environment (and indeed been in frequent contact). However, many of these may be, like me, somewhat out of practice. On the other hand, 11 indicated they were 'fluent' and 1 'intermediate' speakers of Welsh. All but one came from South Wales. All participants chose only one answer in 17 out of 34 questions.

As it turned out, Quibblo wasn't a very good choice - it only records totals, not individual records of participants' choices. Anyway, here's some analysis.

For each sentence, I worked out the average agreement. This is the likelihood of any two people agreeing that at least one form was correct. For all sentences, the average agreement was 67.1%. For sentences where participants only selected one answer, the average agreement was 60.9%.

Some sentences received 100% agreement - for the sentence 'the boy', all 12 participants chose the prescribed 'dy fachgen'. However, the sentence 'the girl' was split with two thirds going for the prescribed 'y ferch' and one third going for 'y merch'. On the other end of things, 9 participants chose 'dydd mawrth' for the meaning 'on Tuesday', when the prescribed form is 'ddydd mawrth', which only 3 participants chose.

For the sentences with more than two possible choices, the choices are spread. For the sentence 'I read a good book', 6 different options were selected with an average agreement of 54.5%. The worst agreement was for the sentence 'the sixth girl' with participants agreeing on average 25.5% of the time between 4 options. For this sentence, 12 participants chose 16 options, meaning that some participants thought at least two options were correct (I don't have the exact data on who chose what).

I put some tricky questions in to see what would happen. The first was designed to test whether adjacent adjectives should be mutated. That is, adjectives which follow a singular feminine noun mutate, but it's not clear whether a following adjective should too. Participants were given the sentence 'a big, tall, good girl' and given the option to mutate none, the first adjective, first and second adjectives and all adjectives. 3 participants chose to mutate only the first while 5 chose to mutate all adjectives (3 chose to mutate none and 1 chose to mutate two). The agreement was 24.2%.

The second tricky question involved loanwords. Nouns after a conjunction mutate, so participants were given the sentence 'gin and tonic' and the options 'jin a tonic' and 'jin a thonic'. Agreement was slightly better here at 84.8% in favour of the prescribed (and attested) mutated form, although one person thought both were correct.

The final one involved analogy and loanwords. I heard someone mutate 'chips', which has a [ tʃ ] (a sound which doesn't exist in Welsh), to the voiced equivalent [ dʒ ]. There is no prescription here, but this makes perfect sense if mutation really does spread by analogy. Presented with the sentence 'a bag of chips', 5 participants voted for the unmutated variant and 10 for 'bag o jips' (83.3% agreement).
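A minimal sketch of one way to compute an agreement score from option totals, assuming it is the share of participant pairs who ticked the same option. That reading reproduces the 24.2% and 83.3% figures above, but it is an assumption rather than a record of the exact calculation, and the function name is made up for illustration.

```python
from math import comb

def pairwise_agreement(option_counts, n_participants=12):
    """Share of participant pairs who both ticked the same option.

    option_counts holds the number of ticks each option received. When people
    tick several options the counts can sum to more than n_participants, so
    this is only an approximation based on totals, not on individual records.
    """
    agreeing_pairs = sum(comb(count, 2) for count in option_counts)
    return agreeing_pairs / comb(n_participants, 2)

# 'a big, tall, good girl': 3 mutate none, 3 first only, 1 two, 5 all
print(round(100 * pairwise_agreement([3, 3, 1, 5]), 1))  # 24.2
# 'a bag of chips': 5 unmutated, 10 'bag o jips'
print(round(100 * pairwise_agreement([5, 10]), 1))       # 83.3
```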

All in all, enough to make my old Welsh teachers weep. Of course, the sample is probably skewed and can't be verified and some might have looked the answer up etc. But part of the point of this is that, for very simple sentences, people should be choosing the same sentences.

In a forthcoming special issue of Cognitive Science (preview here), Nick Chater and Morten Christiansen argue that learning culturally-transmitted systems is easier than learning about the natural world because cultural systems will be adapted towards a learner's biases. Therefore, a learner's intuitions and guesses are likely to be correct. That is, it's easier to co-ordinate your behaviour with other people than it is to be right about the world (an alternative name for the paper could have been 'Language Evolution: Specifically not the hardest problem in Science').

It's a great paper, and argues for my PhD thesis - that language acquisition should be looked at in the light of language evolution. However, cultural induction may not be easier than learning about the natural world if everybody is doing something different. Consider the participants in my experiment: A child learning from them faces sources of cultural variants that not only contradict each other half of the time, but contradict themselves part of the time. At least mass and other physical attributes are Universal - gravity doesn't work differently in North Wales. However, since the data cultural learners are presented with comes from multiple people who themselves may have had different and non-overlapping sources of input, cultural learning may be pretty tricky after all.

So, there may be space in this topic for my PhD thesis: The social structure of cultural learners will have a huge impact on the ease of Cultural induction, and thus on the pressures and eventual forms of language.

Nick Chater & Morten H. Christiansen (2010). Language Acquisition Meets Language Evolution. Cognitive Science

Tuesday, 25 May 2010

Evolutionary approaches to Bilingualism

I recently gave a talk at the University of Edinburgh LEL Postgraduate Conference. It was my first ever talk and it really forced me to figure out what I'm supposed to be studying! Here's a video of my talk:

Frank MC, Goodman ND, & Tenenbaum JB (2009). Using speakers' referential intentions to model early cross-situational word learning. Psychological science : a journal of the American Psychological Society / APS, 20 (5), 578-85 PMID: 19389131

Hunag, Y. (2009). Supporting Meaningful Social Networks Technical Report, ECS, University of Southampton

Healey, E. and Skarabela, B. (2009). Are children willing to accept two labels for one object? Proceedings of the Child Language Seminar. University of Reading.

Byers-Heinlein K, & Werker JF (2009). Monolingual, bilingual, trilingual: infants' language experience influences the development of a word-learning heuristic. Developmental science, 12 (5), 815-23 PMID: 19702772

Thursday, 13 May 2010

E-Coli, Linux and Language

A recent post on The Loom looks at a paper by Koon-Kiu Yan et al. which compares the hierarchical structures of the operating system Linux and the bacterium E. coli. Really interesting analysis - and a good discussion on the blog.

I found it interesting that E. coli's structure is primarily lower-level 'workhorses' with relatively few master controllers. Linux on the other hand has a much larger percentage of high-level 'master' and 'middle manager' modules and relatively few 'workhorses'. Linux is designed while E. coli is evolved.

I’m wondering how linguistic systems would fit into this schema. What are the ‘workhorses’ and ‘master regulators’ of language? There are many more ‘low-level’ words that refer to things than ‘higher level’ syntactic structures. This would make it like E. coli.

On the other hand, there are relatively few ‘low level’ phonemes and very many ‘high level’ concepts. This would make it more like Linux.

Maybe language has more ‘middle managers’ than anything else?

Answering this may give an insight into how ‘designed’ language is, as opposed to ‘evolved’.

Yan, K., Fang, G., Bhardwaj, N., Alexander, R., & Gerstein, M. (2010). Comparing genomes to computer operating systems in terms of the topology and evolution of their regulatory control networks Proceedings of the National Academy of Sciences DOI: 10.1073/pnas.0914771107

Tuesday, 11 May 2010

Mutual Exclusivity biases in cross-situational learning: A comparison between monolingual and bilingual corpora

This report focuses on models of cross-situational learning and how current models compare when exposed to real monolingual and bilingual input. Several model types were evaluated against two transcribed videos of parent-child interaction, one being monolingual and the other being bilingual. Children have been shown to demonstrate a Mutual Exclusivity (ME) bias (Markman and Wachtel, 1988) during word learning. Frank et al. (2009) showed that their model also exhibited Mutual Exclusivity (ME) behaviour after learning from a monolingual corpus of contexts. The current study takes the same model but with bilingual input and asks whether the same behaviour is exhibited.


Frank et al. (2009) provide a transcribed video of monolingual parent-child interaction coded for use in cross-situational learning. An equivalent bilingual corpus was sought. The main criterion was a roughly equal number of utterances in both languages. The CHILDES database has suitable resources. A recording from a study by Yip and Matthews was selected (see CHILDES, 2010). The child in question was a native bilingual from birth. Her mother was a native Hong Kong Cantonese speaker and her father was a native speaker of British English. She was 2;11 in the chosen recording. There are 967 utterances, 48% of which are Cantonese and 52% are English. The objects visible in the video were added to the transcription, along with the mapping between words and objects and the referential intentions of the speakers. The coding scheme was adopted from Frank et al. (2009). The code for the models was supplied by Frank et al. (from Frank, 2010).


The performance of the different models was analysed in two ways. Firstly, the best estimated lexicon (word-object mappings) of each model was evaluated against a gold-standard lexicon in terms of precision, recall and the resulting F-score. Secondly, the models were asked to guess the intended referent of each utterance-object context.

Tables 1 and 2 show the lexicon results for the monolingual and bilingual corpora respectively. Frank et al.’s model returns the highest F-scores in both cases. This is largely due to an advantage in precision, likely stemming from the modelling of non-referential words. Frank et al.’s model returned a word-object mapping for the bilingual corpus with a precision of 0.31 and a recall of 0.27, giving an F-score of 0.29. This is lower than the score for the same model on the monolingual corpus. This could be due to the referential uncertainty (independent from the amount of synonymy) in the bilingual corpus being higher.

The results for the referential intentions, shown in Tables 3 and 4, have different trends. For the monolingual case, the precision of Frank et al.’s model allows it to outperform the other models. However, it performs relatively poorly in the bilingual case, with the Conditional Probability model performing best. That said, all models perform with little precision and recall, suggesting that the task is harder. With more data, results might be different.
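For reference, a lexicon can be scored along these lines. This is a sketch that assumes the F-score is the balanced harmonic mean of precision and recall, which is consistent with the 0.31 / 0.27 / 0.29 figures above. The function names and the toy word-object pairs are invented for illustration and are not taken from Frank et al.'s code.

```python
def f_score(precision, recall):
    """Balanced F-score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def score_lexicon(learned, gold):
    """Precision, recall and F-score of a learned set of (word, object) pairs."""
    correct = len(learned & gold)
    precision = correct / len(learned) if learned else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall, f_score(precision, recall)

# The bilingual lexicon figures reported above
print(round(f_score(0.31, 0.27), 2))  # 0.29

# Toy example with made-up mappings: one of two learned pairs is in the gold lexicon
gold = {("bird", "DUCK"), ("birdie", "DUCK"), ("book", "BOOK")}
learned = {("bird", "DUCK"), ("the", "BOOK")}
print(score_lexicon(learned, gold))
```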

Mutual Exclusivity

After the model had processed the corpus, it was presented with a mutual exclusivity task and the relative likelihoods of several interpretations were measured. In the task, the model was presented with a context with a new object (e.g. a dax) and a familiar object (a bird for the monolingual model and an orange for the bilingual model) and a new word (e.g. “dax”). The probabilities were calculated for the model linking the new word with neither object (i.e. it considers the word non-referential), linking it with the new object, linking it with the familiar object or linking it with both. Figure 1 shows the results of the task with the results for a ‘monolingual’ model for comparison.

The monolingual results are re-calculated for this study, so differ slightly from those reported in Frank et al. (2009). The results for the monolingual and bilingual models have the same trend - both rank the possible situations in the same order of likelihood. The most likely situation is the new word being linked to the new object, honouring mutual exclusivity. The second most likely situation is that the word refers to neither object. Intuitively, one would expect a bilingual to be more likely to consider that the new word was another word for the familiar object. Indeed, the bilingual model does consider this possibility relatively more likely than the monolingual model. However, the model still considers neither mapping to be more likely than an extra synonym. This may mean that, given an additional cue (e.g. pragmatic), the bilingual would be more ready to accept a synonymous interpretation. This is an empirical question.

The Prior Probability

The prior probability is simply the number of mappings in the hypothesised lexicon, modulated by a fixed parameter (alpha). This represents a preference for smaller lexicons. This means that a hypothesis which results in the lexicon with fewest mappings will receive the highest prior probability. With the default parameter (alpha = 7), the Mutually Exclusive preference (for DAX-dax) beats the preferences for the original mapping (map neither word to the unfamiliar object), both mappings and the mapping of the unfamiliar object with the familiar name. However, this ranking depends on the lexicon size bias (alpha) parameter. With a low alpha, the most likely mapping is the ME mapping. With a higher alpha, the most likely mapping is the original mapping (see Figure 2).
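To get a feel for the size of this bias, here is a sketch that assumes the prior takes the exponential form P(lexicon) ∝ exp(-alpha × number of mappings), in line with Frank et al.'s description of lexicons becoming exponentially less probable as they include more pairings. The function names are made up for illustration and the likelihood term is left out entirely.

```python
import math

def log_prior(n_mappings, alpha=7.0):
    """Unnormalised log prior of a lexicon: each mapping costs alpha (assumed form)."""
    return -alpha * n_mappings

def prior_ratio(extra_mappings, alpha=7.0):
    """How much less probable a lexicon becomes, a priori, after adding mappings."""
    return math.exp(log_prior(extra_mappings, alpha))

# Prior cost of adding a single new mapping (e.g. DAX-dax) for a few alpha values
for alpha in (0.5, 1.0, 7.0):
    print(alpha, prior_ratio(1, alpha))
```

Under this assumed form, a single extra mapping at alpha = 7 has to be offset by a likelihood gain of roughly a thousandfold, whereas at alpha below 1 the penalty is mild.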

The same trend also exists between the preference for the ME mapping and both mappings, although the preference for both mappings does not overtake the preference for the ME mapping (see Figure 3).
The explanation is as follows: The original mapping receives a high prior probability because it doesn’t increase the size of the lexicon. However, the likelihood of experiencing a non-referential word is low, leading to a total probability that favours the ME mapping over the original. Assuming a larger lexicon (decreasing alpha), the relative increase in lexicon size is smaller, tipping the balance between the original and ME mapping preferences. Interestingly, the likelihood of choosing both mappings overtakes the original mapping when alpha is less than 1 (see Figure 4).
That is, the likelihood of assuming both mappings increases when the prior is set to less than the number of word-object mappings in the lexicon. Such a setting makes sense for a bilingual (who has up to twice as many mappings as a monolingual) because it represents the number of concepts. Put another way, by compensating for the additional synonymy in bilingual input, the likelihood of assuming both mappings increases. The dependence of the ME experiment results on alpha is acknowledged by Frank et al.:

“Note that there is some parameter dependence in our models fit to the mutual exclusivity situation. Depending on the size of the corpus, it might be the case that the prior disadvantage of adding a word to the lexicon would not be outweighed by the increase in corpus likelihood caused by learning a new word. This fact makes a developmental prediction: in early development, when very few words are known, inferences about mutual exclusivity should be weaker.”
Supporting Information for Frank et al. (2009), p. 13.

This prediction is borne out in some studies (Merriman and Bowman, 1989; Frank and Poulin-Dubois, 2002; Merriman et al., 1993). However, Markman and Wachtel (1988) found that the ME constraint weakens over time, with older children showing less of a bias, while Deak et al. (2001) find no change.

The issue here is the size of the lexicon. Bilingual children may know more words than monolinguals, but it may be more accurate to judge the lexicon size by the size of one language’s lexicon.

The model does not provide a mechanism for modulating the lexicon size prior parameter during learning. Currently the prior is modulated by the alpha parameter and the number of mappings, meaning that adding new mappings is dis-preferred. Bilinguals will have a higher number of mappings, altering their prior probabilities. However, this does not lead to qualitative differences in the mutual exclusivity experiment.

The motivation for modulating the prior by the number of mappings is mainly to simplify the model.

“We chose a prior probability distribution that favored parsimony, making lexicons exponentially less probable as they included more word-object pairings ... The choice of a simple prior puts most of the work of the model in the likelihood term ... hence, the likelihood term captures the learners assumptions about the structure of the learning task.”
Frank et al., 2009, p. 579

That is, the decision is driven by the statistical, computational approach to the formal problem rather than being psychologically motivated. Therefore, the interpretation is that mutual exclusivity behaviour stems from the child’s unwillingness to learn new signal-meaning mappings. This seems a little circular - children prefer not to extend mappings from familiar words to unfamiliar objects because they prefer not to extend mappings. It also seems to go against children’s obvious ability and motivation for learning new words and meanings. Several solutions which would make the prior more sensitive to the input involve incorporating the number of concepts, the number of words or the amount of synonymy (proportional to the number of words in the lexicon divided by the number of concepts). However, the nature of the model now changes - we are using it to test specific hypotheses about mutual exclusivity, judged against empirical data, rather than seeing if mutual exclusivity ‘falls out’ of more basic assumptions.

Concept-based Prior

The mapping-based prior was biased towards a monolingual mode. The model was altered so that the prior was negatively related to the number of objects in the lexicon. This represents the number of concepts for which the child knows words. The model was run on the bilingual corpus and returned a lexicon with a precision of 0.05, a recall of 0.41 and a resulting F-score of 0.09. The model was also run on the monolingual corpus again, returning a precision of 0.05, a recall of 0.79 and an F-score of 0.09. For both monolingual and bilingual corpora, the recall of this model is better than for a mapping-based prior, but the precision is much worse. That is, the model overestimates the number of word-concept mappings. In fact, the models accumulated many hundreds of word-concept mappings for tens of objects (Monolingual: 551 mappings for 22 objects and 419 words; Bilingual: 641 mappings for 55 objects and 598 words). The models have failed to acquire a useful vocabulary.

However, running the Mutual Exclusivity experiment again, the relative ranking of the preferences has changed. Although the ME mapping is still favoured, the next preferred interpretation is to make both mappings (rather than neither, see Figure 5). However, this difference is exhibited with both monolingual and bilingual input data. By neutralising the difference in the prior, the corpus likelihood now plays a bigger role, leading to a difference in the preferences.

How ’Monolingual’ is the Monolingual corpus?

Although the monolingual corpus is taken from a carer speaking one language, the lexicon the model learns contains synonymy. In fact, for the 15 objects it learned words for, 8 had more than one associated word. For half of these 8 objects, all synonyms were appropriate (e.g. ‘bird’ and ‘birdie’ to describe the object ‘duck’), but half were not appropriate. In other words, the model accommodates synonymy. The original Mutual Exclusivity experiment in Frank et al. was done with the object ‘bird’, which had one associated word. The ME experiment was applied for all words that the model learned from the monolingual corpus. There were no significant differences between the posterior probabilities for any of the situations (DAX-dax, Both etc.) for synonymous mappings versus non-synonymous mappings. This holds for both the original and the concept-based prior.


Frank et al.’s model can be used to model word learning in bilinguals. There are some quantitative differences in the ME behaviour of models run on monolingual and bilingual corpora. However, no qualitative differences were found. Even when the prior bias for minimising the number of mappings was neutralised, both models still preferred to map the new object with the new word.

Next Steps
The results are inconclusive, but may reflect the limited data. I suggest that synthetic corpora would make the dynamics clearer. Very simple cross-situational learning corpora could be created with varying amounts of ‘bilingualism’.


Frank MC, Goodman ND, & Tenenbaum JB (2009). Using speakers' referential intentions to model early cross-situational word learning. Psychological science : a journal of the American Psychological Society / APS, 20 (5), 578-85 PMID: 19389131

Byers-Heinlein K, & Werker JF (2009). Monolingual, bilingual, trilingual: infants' language experience influences the development of a word-learning heuristic. Developmental science, 12 (5), 815-23 PMID: 19702772

Deák GO, Yen L, & Pettit J (2001). By any other name: when will preschoolers produce several labels for a referent? Journal of child language, 28 (3), 787-804 PMID: 11797548

Frank, I., & Poulin-Dubois, D. (2002). Young monolingual and bilingual children's responses to violation of the Mutual Exclusivity Principle International Journal of Bilingualism, 6 (2), 125-146 DOI: 10.1177/13670069020060020201

Markman EM, & Wachtel GF (1988). Children's use of mutual exclusivity to constrain the meanings of words. Cognitive psychology, 20 (2), 121-57 PMID: 3365937

Merriman WE, & Bowman LL (1989). The mutual exclusivity bias in children's word learning. Monographs of the Society for Research in Child Development, 54 (3-4), 1-132 PMID: 2608077

Merriman WE, Marazita J, & Jarvis LH (1993). Four-year-olds' disambiguation of action and object word reference. Journal of experimental child psychology, 56 (3), 412-30 PMID: 8301246

Healey, E. and Skarabela, B. (2009). Are children willing to accept two labels for one object? Proceedings of the Child Language Seminar. University of Reading.

Thursday, 6 May 2010

Systematicity of RNA

I've been looking at evolutionary precursors to bilingualism. What does this mean? At the moment, I'm thinking about it in the sense of having two or more signals which correspond to the same action or meaning. Not much before language, you say? How about going all the way back to RNA codes?

RNA converts genetic information stored in DNA into proteins which regulate processes within cells. The ‘code’ for translating DNA into proteins is redundant but not ambiguous: several codons can specify the same amino acid, but each codon specifies only one. There are varieties of code. Different organisms use different proportions of codons. ‘Error’ is defined as the sum of protein changes when changing from each codon to each other codon, weighted by the frequency of the codon’s use (Marquez, Smit & Knight, 2005). In this sense, the error rate is comparable with the RegMap index of redundancy.
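To make "redundant but not ambiguous" concrete, here is a small sketch that treats codons as signals and amino acids as meanings. The codon assignments are a short excerpt of the standard genetic code; the function names are hypothetical and this is only an illustration, not the analysis used in the post.

```python
from collections import defaultdict

# Excerpt of the standard genetic code (RNA codons -> amino acids).
CODON_PAIRS = [
    ("GCU", "Ala"), ("GCC", "Ala"), ("GCA", "Ala"), ("GCG", "Ala"),
    ("UUU", "Phe"), ("UUC", "Phe"),
    ("AUG", "Met"),
    ("UGG", "Trp"),
]


def group(pairs, key_index):
    """Group a list of (signal, meaning) pairs by signal (0) or meaning (1)."""
    grouped = defaultdict(set)
    for pair in pairs:
        grouped[pair[key_index]].add(pair[1 - key_index])
    return grouped


def is_redundant(pairs):
    """True if at least one meaning is expressed by more than one signal."""
    return any(len(signals) > 1 for signals in group(pairs, 1).values())


def is_ambiguous(pairs):
    """True if at least one signal maps to more than one meaning."""
    return any(len(meanings) > 1 for meanings in group(pairs, 0).values())


print("redundant:", is_redundant(CODON_PAIRS))
print("ambiguous:", is_ambiguous(CODON_PAIRS))
```

In this excerpt alanine has four codons and phenylalanine two, so the mapping is redundant, while no codon maps to two amino acids, so it is not ambiguous.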

RegMap was developed to calculate the degree of regularity in the mappings between signals and meanings (Tamariz & Smith, 2008). Essentially, it's the relative entropy modified by the frequency of use.

RegMap was applied to RNA coding frequencies of various organisms. Data were taken from the codon usage database for about 16,500 organisms. As a baseline, the same coding transcriptions were used, but with randomised frequencies. The RegMap index of the genetic code with actual usage frequencies is significantly higher than with randomised frequencies (Mean RegMap for actual = 0.711, random = 0.708, t = 4.8, df = 7196, p < 0.0001).

The graph is not much use, but here it is:

Marquez R, Smit S, & Knight R (2005). Do universal codon-usage patterns minimize the effects of mutation and translation error? Genome biology, 6 (11) PMID: 16277746

Tamariz, M., & Smith, A. D. M. (2008). Quantifying the regularity of the mappings between signals and meanings. Proceedings of the 7th Conference on the Evolution of Language.

Wednesday, 5 May 2010

LEL Postgraduate Conference

I'm giving a talk at the LEL Postgraduate Conference at the University of Edinburgh, 19th - 21st May. It's not that big of a deal, since I'm required to give a talk, but it is my first talk. A link to the website (which I'm maintaining) with more details and my abstract follow!

LEL Postgraduate Conference 2010

Bilingualism and Social Networks

Children learn language from exposure to speakers in their social network. This learning influences the input that will be given to the next generation. The way languages change over time is dependent on the learning biases of individuals (e.g. Kirby, Dowman & Griffiths, 2007), but also on the dynamics of the social network of those individuals (Gong & Wang, in press; Lupyan & Dale, 2010; Gal, 1979; Govindasamy, 2003).

Bilingualism is often marginalised in theories of language evolution and existing bilingualism is generally seen as the product of contact between two or more monolingual communities. However, I hypothesise that a bilingual ability is a fundamental aspect of language learning: children can learn two languages as easily as learning one. This suggests that human cognition is geared towards handling complex, not homogeneous, cultural input. This in turn may suggest the kind of social networks in which human cultural transmission evolved. The prevalence of monolingualism in some modern societies may be explained by changes to social structures afforded by communications technology.

This talk will outline my approach to this hypothesis. This involves the idea of cultural transmission as a trade-off between communicative flexibility and expressivity, the use of a comparative approach to bilingualism and methodologies to generate and test hypotheses.

Tuesday, 4 May 2010

Bilingualism as a preadaptation for Language

This report is the beginning of an attempt at a comparative approach to bilingualism, in the style of Fitch (2005). Bilingualism is difficult to define, but by asking whether there is evidence for this capacity in non-human species, it's hoped that this question is made clearer.

This research project takes an evolutionary approach to Bilingualism. One of the most difficult problems faced so far is identifying the role of bilingualism in the cultural evolution of language. Is it a product or a catalyst? Firstly, I'm not sure whether this has been considered to any great extent. However, I suggest that the implicit assumption in the vast majority of work in both the areas of Bilingualism and Language Evolution has been that bilingualism is a product of the merging of homogeneous language communities. This report explicitly asks the question: Which came first - Language or Bilingualism? That is, did the capacity for bilingualism develop from a pressure to learn multiple existing languages or was it a capacity which existed before human languages were established and influenced their arrival?

The latter hypothesis seems to be nonsensical. How can individuals have the ability to learn more than one language when there are no languages to be learned? Here, I'd like to make a distinction between two kinds of bilingualism, following the approach of Hauser, Chomsky & Fitch (2002). Bilingualism in the narrow sense means the ability to learn several human languages. This is obviously a human-only trait. Bilingualism in the broad sense refers to the general capacity to acquire more than one signalling system. Depending on how one defines signalling systems, this capacity may be shared with many other animals, both closely and distantly related. Of course, defining what constitutes a single signalling system is difficult, let alone defining language or bilingualism. However, it's hoped that the approach taken in this paper will help towards this goal by considering the features of the phenomenon we wish to define.

Before considering this possibility, the comparative approach to language evolution is presented. Fitch (2005 and others) approaches the study of the evolution of language by considering what elements contribute towards the 'Faculty of Language'. In the broad sense of the term, this covers all the prerequisite elements that are required for linguistic communication. This involves cognitive capacities such as acoustic string segmentation and semantic processing, but also much more basic features such as memory. That is, features of the Faculty of Language in the broad sense (FLB) are found in humans and animals. The narrow sense of the term (FLN) refers to those capacities which are involved in language alone. There is much more debate about what these elements are. Recursive processing has been suggested as one example.

The comparative approach has been used to answer the question of what belongs to FLN and to FLB. Animals have been shown to be capable of a number of processes required for language, including categorical perception of speech sounds (Kuhl & Miller, 1978) and Mutual Exclusivity (Kaminski, Call & Fischer, 2004). From studies of divergent and convergent evolution of these traits, some important features have been identified. For example, many species which exhibit vocal learning have direct neural connections between the brain and vocal motors, while non-vocal learners do not (see Doupe & Kuhl, 1999).

This report suggests that this approach should be adopted for the study of Bilingualism. Such an approach would seek to answer whether bilingualism is a uniquely human capacity. If it turns out that other animals also have this capacity, then the role of bilingualism in the evolution of language can be re-assessed.

However, there is a large initial problem. Even FLB only consists of capacities that are required for language. Bilingualism in the broad sense, however, is not required in order to speak a language. This problem may be due to the individual-level bias in the idea of the 'Faculty of Language'. Its primary aim is to describe capacities that an individual organism requires, rather than a community. Therefore, bilingualism may not be part of the FLB, and simply a product of cultural interaction. However, the comparative approach can help test this hypothesis by checking whether only social animals exhibit the capacity for bilingualism. That is, if bilingualism comes from cultural interaction alone, there should be no non-social animals which have the capacity for it.

Bilingualism in Bengalese Finches
If other species exhibit bilingualism, then this is evidence that bilingualism developed before human language. Takahasi & Okanoya (2010) study the vocal learning patterns of the Bengalese Finch. These are a domesticated breed descended from wild White Backed Munia. The Bengalese Finch exhibits very complex song patterns in comparison to the White Backed Munia.

Takahasi & Okanoya (2010) carry out a cross-fostering experiment where Munias are brought up by Finches, and Finches are brought up by Munias. The Munias tended to have a stronger preference for copying Munia songs, while the Finches are not so disposed towards their own strain's song. That is, Finches have more flexibility in learning. It is hypothesised that this is because there is a pressure on Munias to identify their own strain in the wild where there are mixed flocks, while this pressure has been masked for Finches by domestication and isolation.

However, is this really 'Bilingualism'? The problem is that, although there is flexibility in the sources of acquisition, the birds do not have the same flexibility in production. That is, as I understand it, they still develop only one song (i.e. they can't sing elements of A and B's songs in the morning, then elements of C and D's songs in the evening). Furthermore, the idea of 'comprehension' is more difficult to apply, since there is no semantics.

It has been suggested that Bengalese Finches have developed song complexity as a sexual display (Okanoya, 2004). Following from this, Soma et al. (2009) find that chicks select tutors based on their song complexity. Also, Okanoya (2010) presents some evidence to suggest that Bengalese Finches learn from many tutors. That is, they splice whole segments of songs from many other individuals to create their own song. In this sense, learning from multiple tutors increases the complexity of the song and so increases the attractiveness and fitness of the individual.

The ability to learn syntactic sequences from many tutors has apparently occurred in a system with no semantics. This may suggest that bilingualism at the syntactic level emerged before bilingualism at the lexical level, opposite to the order implicitly assumed by many. One big advantage of cultural evolution is that individuals can inherit information from multiple sources, whereas there are a limited number of biological parents. This is the core of what I mean by Bilingualism being a preadaptation for language: part of the acquisition of human languages requires the flexibility afforded by bilingualism.

Counter Arguments
This phenomenon in Bengalese Finches is interesting, but may not help with our question. Although the complexity of the system has increased due to a change in the environment (domestication), whether this was initially enabled by learning from multiple tutors is not clear.

Also, Okanoya has shown that vocal learners who co-inhabit areas with other species of vocal learners have less complex song. That is, song complexity does not help species identification. Therefore, if the capacity for bilingualism developed in humans before language, it's likely that there was little pressure on vocal cues for species identification.

Inter-species semantic communication
Many species also communicate vocally with other species. Vervet monkeys respond to the territorial and alarm calls of superb starlings (Seyfarth & Cheney, 1990). Ring-tailed lemurs respond to the alarm calls of Verreaux's sifakas (Oda & Masataka, 1996). However, captive ring-tailed lemurs who had never heard the sifakas' alarm calls also responded appropriately to playbacks. Oda and Masataka argue that they are therefore responding to shared acoustic features rather than to an associated meaning. Although most examples of inter-species communication do not involve the transference of 'concepts', some examples do show evidence for this.

Zuberbuhler (2000) studied communication between Diana monkeys and Campbell's monkeys. Diana monkeys respond appropriately to Campbell's monkeys' alarm calls for leopards and eagles. Furthermore, their responses suggest they are attending to the meaning rather than the acoustic signal. If a Diana monkey hears a leopard or a leopard alarm call, it calls out loudly, but if it hears a second leopard or leopard alarm, it is quieter, presumably because of the risk of predation (the same is true of eagle alarms). Diana monkeys were primed with Campbell alarms for either leopards or eagles then probed with either eagle or leopard sounds (growls and shrieks). They responded loudly to each combination, apart from where the Campbell alarm corresponded to the predator type (e.g. Campbell leopard alarm followed by a leopard sound). In these cases, the Diana monkeys were quieter, suggesting that they thought the predator was already present.
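The prime-probe logic described above can be laid out as a small table. This is only a sketch of the design as summarised in this post, with hypothetical names; the "loud" and "quiet" labels stand in for the call rates actually measured.

```python
PREDATORS = ("leopard", "eagle")


def predicted_response(prime, probe):
    """Predicted Diana monkey response to a predator sound (probe)
    after hearing a Campbell's monkey alarm call (prime).

    If the monkeys attend to meaning, a probe that matches the prime
    carries no new information, so the response should be quieter.
    """
    return "quiet" if prime == probe else "loud"


for prime in PREDATORS:
    for probe in PREDATORS:
        print(f"Campbell's {prime} alarm -> {probe} sound: "
              f"{predicted_response(prime, probe)}")
```

The point of the design is that the Campbell's alarm and the predator sound are acoustically different, so a quieter response to the matching pair is hard to explain by acoustic similarity alone.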

Zuberbuhler concludes that "Diana monkeys can flexibly use and assess information derived from the communication of other species" and that "semantic understanding can be based on arbitrary signals, as it is the case for word meaning" (Zuberbuhler, 2000, p. 717). Diana monkeys seem to understand the same concept from two different calls. I argue that this is bilingualism in the broad sense at the lexical-like level.

Again, Diana monkeys are limited by their physiology in terms of production of the Campbell's alarms. However, the information transfer from Campbell's monkeys to Diana monkeys is not 'communication' as defined by Maynard Smith & Harper (2003) (also see Scott-Phillips, 2008). That is, although the Campbell's calls affect the behaviour of the Diana monkeys, they did not evolve to do this (they are cues, not signals). Therefore, I'd like to suggest that the capacity for bilingualism originates in the evolution from cues to signals.

However, these responses may not be learned. Furthermore, there is no current evidence to suggest that Campbell's monkeys reciprocate in their comprehension of Diana monkeys' calls. The latter issue is discussed by Magrath et al. (2009), who study the alarm call responses of three ecologically distinct avian species and find that responses may be reciprocal, but not necessarily symmetrical. Different species reacted to each other's alarm calls in proportion to the 'reliability' of the call as a cue to one of the listener's predators. That is, not all predators of species A are predators of species B, so A's alarms are not always reliable for species B, and species B responds appropriately. In Magrath et al. (2009)'s study, some species responded in the same way to three different calls. Again, this is evidence for bilingualism in the broad sense.
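A toy sketch of why such responses can be reciprocal without being symmetrical. Everything here is an assumption for illustration: the predator sets are invented, 'reliability' is simplified to the share of the caller's predators that also threaten the listener, and every predator is assumed to trigger alarms equally often. It is not the measure used in the study.

```python
def cue_reliability(caller_predators, listener_predators):
    """Share of the caller's predators that are also the listener's.

    A toy stand-in for how reliable the caller's alarm is as a cue
    to danger for the listener.
    """
    caller = set(caller_predators)
    if not caller:
        raise ValueError("caller must have at least one predator")
    return len(caller & set(listener_predators)) / len(caller)


# Invented predator sets for two hypothetical species.
species_a = {"hawk", "snake", "cat"}
species_b = {"hawk"}

print("A's alarm as a cue for B:", cue_reliability(species_a, species_b))
print("B's alarm as a cue for A:", cue_reliability(species_b, species_a))
```

Under these assumptions, species B should respond to only some of A's alarms, while species A can treat every alarm from B as relevant.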

This raises an interesting question of 'reliability' or 'relevance' (as in Relevance theory, Sperber & Wilson, 1995) in animal communications. Much of animal communication is limited to and grounded in information relevant to shared survival interests, that is, food, predators and mating. Humans are capable of communicating about topics beyond their immediate survival needs. This difference possibly requires the 'ungrounding' of signals from the domains in which they evolved (see the next section).

Bilingualism's impact on FLB
Although bilingualism may not be necessary for the acquisition of language, and so could arguably not be part of FLN, learning two languages does seem to have a qualitative impact on capacities in FLB. For example, compared with monolinguals, bilinguals develop better inhibitory control, theory of mind (Goetz, 2003) and task-switching (Bialystok & Martin, 2004).

Rafael Núñez's approach to the evolution of language hypothesises that it involved several pre-adapted 'Modules', but these modules coevolved. That is, an advancement in one module (e.g. more stable voice source, see Demolin, 2010) could cause an advancement in another (e.g. vocal learning), which could feed back into the first module.

Núñez sees the evolution of meaning as involving the development of a grounded system, ungrounding this system from its original domain, then re-grounding it in another. His work focuses on how gestural instantiations of space were re-grounded to convey information about time. For example, one might point behind to indicate an event that occurred in the past. Linguistic expressions of time have also adopted this system.

I suggest that bilingualism can be seen in this way. For instance, being able to learn from several tutors has advantages for increasing signal complexity in some situations. If this ability to learn from individuals could be ungrounded to allow learning from contexts, then this would allow a semantic system to develop. In other words, a kind of bilingualism allows the complex vocal learning mechanisms to be deployed over more general domains.

Okanoya has a similar hypothesis which sees string segmentation and context segmentation as necessary preadaptations for a semantic system. Indeed, the Bengalese Finches studied above may not only be doing string segmentation of tutors' songs, but also a kind of crude 'tutor segmentation'. That is, they select whole sections from different tutors.

The ungrounding theory suggests that the pressure on the original system needs to be lifted by some other mechanism. This may be a change in the environment, or an internal mechanism. It's likely in the case of the Bengalese Finch that its domestication had a large part to play, alleviating the burden of foraging and predation.

Asking whether non-human species have capacities for bilingualism in the broad sense may affect the way we approach bilingualism. This report has reviewed studies which show that animals have capacities compatible with ideas of bilingualism, but without other features of human language. These capacities stem from very basic evolution of cues and being able to learn from multiple tutors.

Further analysis of evidence for bilingual behaviour in animals is required. This includes, for example, switching tasks in primates and other animals and the boundaries between different dialects in whale song. Crucially, this analysis, just like for the rest of FLB, relies on evidence from a great number of studies. If the relevant studies have not been done, the potential for completing them in this project is extremely restricted.

More fundamentally, this report takes an approach to bilingualism that may not be appropriate. The comparative approach was designed to identify and study necessary components of the language faculty. On the other hand, such an approach may show that, from an evolutionary perspective, there is no easy way to define bilingualism, questioning whether there is a difference between monolingualism and bilingualism or even an easy way to distinguish between languages.


Bialystok E, & Martin MM (2004). Attention and inhibition in bilingual children: evidence from the dimensional change card sort task. Developmental science, 7 (3), 325-39 PMID: 15595373

Demolin, D. (2010). Prosody and recursion in primate vocalisation. Proceedings of the JAIST International Seminar on the Emergence and Evolution of Linguistic Communication, Kyoto, Japan.

Doupe AJ, & Kuhl PK (1999). Birdsong and human speech: common themes and mechanisms. Annual review of neuroscience, 22, 567-631 PMID: 10202549

Fitch, W. T. (2005). The evolution of language: A comparative review Biology and Philosophy, 20 (2-3), 193-203 DOI: 10.1007/s10539-005-5597-1

Kaminski J, Call J, & Fischer J (2004). Word learning in a domestic dog: evidence for "fast mapping". Science (New York, N.Y.), 304 (5677), 1682-3 PMID: 15192233

Kuhl, P. & Miller, J. D. (1978). Speech perception by the chinchilla: Identification functions for synthetic VOT stimuli The Journal of the Acoustical Society of America, 63 (3) DOI: 10.1121/1.381770

Magrath, R., Pitcher, B., & Gardner, J. (2009). An avian eavesdropping network: alarm signal reliability and heterospecific response Behavioral Ecology, 20 (4), 745-752 DOI: 10.1093/beheco/arp055

Maynard Smith, J., & Harper, D.G.C. (2003). Animal Signals Oxford University Press, Oxford

Oda, R. and Masataka, N. (1996). Interspecific responses of ring-tailed lemurs to playback of antipredator alarm calls given by Verreaux's sifakas. Ethology, 102, 441-453 DOI: 10.1159/000021651

Okanoya, K. (2004). Song syntax in Bengalese finches: proximate and ultimate analyses. Advances in the Study of Behavior, 34, 297-346

Okanoya, K. (2010). Biological preadaptations for language Proceedings of the JAIST International Seminar on the Emergence and Evolution of Linguistic Communication, Kyoto, Japan.

SCOTT-PHILLIPS, T. (2008). Defining biological communication Journal of Evolutionary Biology, 21 (2), 387-395 DOI: 10.1111/j.1420-9101.2007.01497.x

SEYFARTH, R., & CHENEY, D. (1990). The assessment by vervet monkeys of their own and another species' alarm calls Animal Behaviour, 40 (4), 754-764 DOI: 10.1016/S0003-3472(05)80704-3

Soma, M., Hiraiwa-Hasegawa, M., & Okanoya, K. (2009). Song-learning strategies in the Bengalese finch: do chicks choose tutors based on song complexity? Animal Behaviour, 78 (5), 1107-1113 DOI: 10.1016/j.anbehav.2009.08.002

Miki Takahasi, & Kazuo Okanoya (2010). Song Learning in Wild and Domesticated Strains of White-Rumped Munia, Lonchura striata, Compared by Cross-Fostering Procedures: Domestication Increases Song Variability by Decreasing Strain-Specific Bias Ethology

Zuberbühler, K. (2000). Interspecies semantic communication in two forest primates Proceedings: Biological Sciences, 267 (1444), 713-718 DOI: 10.1098/rspb.2000.1061

Levels of Bilingualism - update

I recently went back to my analysis of bilingualism in the Ethnologue. I suggested estimating the level of bilingualism by calculating the minimum number of bilinguals. However, there is already a much better estimation, and it has already been calculated. Greenberg's diversity index gives the probability that two people picked at random from the same country have different mother tongues. This index has already been calculated for each country in the Ethnologue. Below is a map of these indices. Dark colours indicate a higher diversity index (few people have the same mother tongue), lighter colours indicate a lower diversity index (total white = all people have the same mother tongue).

However, this still isn't a good predictor of bilingualism. Instead, it's a measure of diversity of mother tongue. The index still assumes all people only speak one language. I'm trying to figure out how to modify the index to take account of bilinguals - but I was always rubbish at probability.
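For reference, here is a minimal sketch of the index as it is usually stated: one minus the sum of the squared population shares of each mother tongue. The function name and the speaker counts are made up for illustration, and the sketch keeps the assumption criticised above, that each person has exactly one mother tongue. It does not attempt the bilingual modification.

```python
def greenberg_diversity(speaker_counts):
    """Probability that two randomly chosen people from a population
    have different mother tongues: 1 - sum of squared shares.

    Assumes every person is counted under exactly one mother tongue.
    """
    total = sum(speaker_counts)
    if total <= 0:
        raise ValueError("population must be positive")
    return 1.0 - sum((count / total) ** 2 for count in speaker_counts)


# Hypothetical countries, given as speakers per mother tongue.
print(greenberg_diversity([100]))             # everyone shares one language
print(greenberg_diversity([50, 50]))          # two equally sized groups
print(greenberg_diversity([25, 25, 25, 25]))  # four equally sized groups
```

The index rises as the population is split more evenly over more languages, which is why it measures diversity of mother tongue rather than how many languages each person speaks.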

Wednesday, 21 April 2010

Keeping Time

I'm stuck in Amsterdam because of an erupting volcano. We're staying on a barge - very quaint, but the rooms are tiny. I was lying awake, wondering what time it was because my phone had run out of battery. Was it 4am or 12 noon? When should I wake my friends? How to tell how much time was passing? It's not that easy - I remember seeing a competition where people had to judge an hour without watches, and one guy made a bet at half an hour.

It occurred to me that singing songs in my head was a good way of keeping time. Instead of counting seconds, I'll count songs. I'll give it two 'Beat It's and a 'Torn' before waking them, I thought. Then, I thought about the evolution of language conference we visited this week. People were always coming up with theories about the adaptive advantage of language. What if it was useful for measuring waiting time? Why would you want to?

In the Miocene, the environment started to dry out, leading to a thinning of resources. This meant that primates either had to reduce their group size or travel further and more efficiently in order to find enough food. Our ancestors chose the second option (this theory was put forward by Isbell & Young, 1996).

Now, imagine yourself as part of a large group who travel big distances in forests. Inevitably, you're going to split up. You won't be able to contact them, so you have to decide to wait or move on. In a foreign city with no phone battery, I've been in this situation many times this week. The best thing to do is wait for a while, then move on. But how long? And how to measure?

Singing! And the more complex the song, the fewer repetitions you have to keep track of. I remember now a piece of child psychology where you tell a child they can have one biscuit now or wait 10 minutes and have two. Intelligent kids will sing to themselves to pass the time.

So there we go, language evolved under an adaptive pressure to accurately measure small periods of time. There are a billion holes in this theory. For example, the sun is a pretty good indication of the time. Also, it's not clear that this ability is any use.

Anyway, we might get a few papers, a book and a conference out of it.

Isbell, L. A., & Young T. P. (1996). The evolution of bipedalism in hominids and reduced group size in chimpanzees: alternative responses to decreasing resource availability Journal of Human Evolution, 30 (5), 389-397 DOI: 10.1006/jhev.1996.0034

Monday, 12 April 2010

Catch Up

This weekend I took part in a 48-hour film-making competition! Here's the result:

Monday, 5 April 2010


I've just finished a stencil for a friend. It's Yuna from Final Fantasy X. I used too much paint on the first print, but the second came out much sharper.

Step 1: Steal picture

Step 2: Print and Cut

Step 3: Spray

Step 4: Repeat