Sunday, 1 November 2009


Where do our words come from? English is Germanic, right? Ok, but what about words like Cappuccino, Revolution and Smorgasbord? Well, those were just 'borrowed' - they don't really count, since we're intending to give them back. But how valid is this view? Over the centuries, speakers have adopted words from all over the place, yet the diversity of the sources of words is under-appreciated.

Sounds like a job for ... etymology!

The general view of languages is that they are related like a family tree. English is seen as a Germanic language, along with Dutch and Flemish, while Welsh is seen as a Celtic language along with Irish and Cornish. The tree diagram below shows this idea, and gives the impression that the last 'common ancestor' of English and Welsh was way-back Proto-Indo-European:

However, this masks the complexity of languages and language change. A strict family tree marginalises the borrowing of words from other languages. For example, there are a huge number of 'English' words with roots in French, Italian and Spanish.

Hurford & Dediu (2009) encourage us to see languages as made up of sets of linguistic units (e.g. a word), each of which can have a separate ancestry. I wondered what this would look like, so I used the Online Etymology Dictionary to create one.

The Etymology Dictionary lists the heritage of English words, for example:
Cabinet: 1549, from M.Fr. cabinet "small room," dim. of O.Fr. cabane "cabin" (see cabin); perhaps infl. by It. gabbinetto, dim. of gabbia, from L. cavea "stall, stoop, cage." Sense of "private room where advisors meet" (1607) led to modern political meaning (1644).
That is, the ancestry of 'Cabinet' can be traced back through Middle French, Old French, Italian and Latin. Similarly, the word 'Tower' also comes from Latin, but via Old English. Crawling the website, the relationships for about 5000 words were processed. I used hypergraph to display them in an interactive hyperbolic graph. You can play about with it below, or visit here. Click and drag portions of the graph on the edges closer to the middle to explore. For some reason, it starts off zoomed in on Latin, but there's a lot of detail to the right (see here for abbreviations).
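To give a rough idea of how an entry can be turned into a lineage, here is a minimal Python sketch. It is not the script used for the graph: the abbreviation table is a small assumed subset of the dictionary's list, and the function name language_chain is made up for illustration. It simply collects the language abbreviations in the order they are mentioned, so a "perhaps infl. by" language is treated as part of the chain, just as in the reading above.

```python
import re

# Assumed subset of the dictionary's abbreviations; the real list is much longer.
ABBREVIATIONS = {
    "M.Fr.": "Middle French",
    "O.Fr.": "Old French",
    "Fr.": "French",
    "It.": "Italian",
    "L.": "Latin",
    "O.E.": "Old English",
    "Gk.": "Greek",
}

# Longest abbreviations first, so "M.Fr." is tried before "Fr.".
# The lookbehind stops a match from starting in the middle of a word.
_PATTERN = re.compile(
    r"(?<![A-Za-z.])(?:"
    + "|".join(re.escape(a) for a in sorted(ABBREVIATIONS, key=len, reverse=True))
    + r")"
)


def language_chain(entry):
    """Return the languages mentioned in an entry, in order, without repeats."""
    chain = []
    for abbreviation in _PATTERN.findall(entry):
        name = ABBREVIATIONS[abbreviation]
        if name not in chain:
            chain.append(name)
    return chain


entry = (
    '1549, from M.Fr. cabinet "small room," dim. of O.Fr. cabane "cabin" '
    '(see cabin); perhaps infl. by It. gabbinetto, dim. of gabbia, '
    'from L. cavea "stall, stoop, cage."'
)
print(language_chain(entry))
# ['Middle French', 'Old French', 'Italian', 'Latin']
```

A real crawl would need the full abbreviation list and some care with phrases like "perhaps infl. by" or "cognate with", which mention a language without implying direct descent.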

For ease of presentation, the graph is simplified, with lineages of words between 'languages' first going through a language node. Also, Modern English words are not represented, but all contained within the 'Mod.Eng.' node.

Some bits of the graph are tree-like: Words with roots in Middle High German are only borrowed through (New High) German. However, in general, the graph is not tree-like at all. The lineages of English words have all sorts of routes through earlier languages. For example, words can come from Greek via German or French. And this is only for English words. Imagine if etymological data from German and French were added.
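One way to make 'tree-like' concrete: in a tree, every language hands its words on through exactly one route. The sketch below, in Python, collapses word lineages into language-to-language links and lists the languages with more than one onward route. The three chains are hand-written to mirror the examples just given, not taken from the crawled data, and the function name onward_routes is hypothetical.

```python
from collections import defaultdict


def onward_routes(chains):
    """Map each language to the set of languages that take words directly from it.

    Each chain lists languages from most recent to oldest.
    """
    routes = defaultdict(set)
    for chain in chains:
        for newer, older in zip(chain, chain[1:]):
            routes[older].add(newer)
    return routes


# Illustrative chains only (assumed, not real crawl output).
chains = [
    ["Mod.Eng.", "German", "Middle High German"],
    ["Mod.Eng.", "German", "Greek"],
    ["Mod.Eng.", "French", "Greek"],
]

routes = onward_routes(chains)

# A language with more than one onward route breaks the tree shape.
branching = sorted(lang for lang, nxt in routes.items() if len(nxt) > 1)
print(branching)
# ['Greek']
```

Here Middle High German has a single route (through German), so that corner is tree-like, while Greek reaches Modern English by two routes and is not.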

Ok, so the graph is pretty useless for research - it's just way too complicated (part of the problem is that hypergraph is designed for trees). What I'm aiming at is questioning the idea of a 'language' as a stable set cut off from other 'languages'. We don't inherit a 'dictionary' from just two individuals, like our genes; we pick up individual words from a wide range of sources, and keep adding, borrowing and changing them throughout our lifetime.

1 comment:

  1. The first thought that came to my mind was an analogy with horizontal gene transfer complicating the branching 'tree of life' way of thinking about evolution. Geneticists might have ways of graphically mapping gene lineages and relations that could be co-opted for representing words...