Session I / Part II (-> Handout)



Corpus Linguistics

I will give a presentation on a corpus-based approach to the study of (specialist) language. To get us started, I will ask you to note down everything you know about the German word ‘Aufmerksamkeit’ and the English word ‘attention’; we will then compare your intuitions about these words with evidence from a large corpus (a collection of texts stored in machine-readable form). The point I will be making is that intuition is prone to error and imprecision when it comes to describing how language is actually used. Many pre-electronic descriptions of language bear this out, either by grossly misjudging how a language item behaves, or by presenting examples with the unmistakable ring of artificiality.

I will then discuss the methodology and software used in corpus linguistics, addressing the following questions:


How do you build a corpus?

How do you analyse a corpus?
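To make the second question concrete: the basic display produced by concordancers such as Microconcord is a KWIC (key word in context) concordance, in which every occurrence of the search word appears in the middle of a line, with its left and right contexts lined up on either side. The Python sketch that follows illustrates the idea. It is my own illustration, not a description of how Microconcord works internally, and the function name `kwic` and the 30-character context width are arbitrary choices.

```python
import re

def kwic(text, node, width=30):
    """Hypothetical helper: return key-word-in-context lines for `node`.

    Matching is case-insensitive and restricted to whole words, so a search
    for "cut" finds "Cut" and "cut" but not "cutting" or "shortcut".
    """
    pattern = re.compile(r"\b" + re.escape(node) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        # Take up to `width` characters on either side, flattening line breaks.
        left = text[max(0, match.start() - width):match.start()].replace("\n", " ")
        right = text[match.end():match.end() + width].replace("\n", " ")
        # Right-align the left context so that the node words form one column.
        lines.append(f"{left:>{width}} [{match.group()}] {right}")
    return lines

if __name__ == "__main__":
    sample = ("The director cut the opening shot. A long shot follows, "
              "then a jump cut to a close-up shot of the hero.")
    for line in kwic(sample, "shot", width=20):
        print(line)
```

Whole-word matching is a deliberate simplification: inflected forms such as ‘shots’ are not found and would have to be searched for separately.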


We will then discuss some of the implications and applications of corpus-based analysis:


How can corpus analysis help us to describe special(ist) language?

How can corpus analysis help us to teach special(ist) language? (For academic English, see Tim Johns's ‘Kibbitzer’ materials.)
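One classroom use of concordances, associated with Tim Johns's data-driven learning, is to blank out the search word in each concordance line and ask learners to reconstruct it from the context. The sketch that follows generates an exercise of this type. It is a hypothetical illustration of the exercise format, not a reconstruction of Johns's own materials; the name `gapped_lines` and the gap marker are my own choices.

```python
import re

def gapped_lines(text, node, width=30, gap="_____"):
    """Hypothetical helper: turn KWIC lines into a gap-fill exercise.

    Every word form beginning with `node` (e.g. cut, cuts, cutting) is replaced
    by `gap`; the removed forms are returned separately as an answer key.
    """
    # Match the node at the start of a word plus any further word characters,
    # so inflected forms are caught too (this also catches words like 'cutlery').
    pattern = re.compile(r"\b" + re.escape(node) + r"\w*", re.IGNORECASE)
    exercise, key = [], []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()].replace("\n", " ")
        right = text[match.end():match.end() + width].replace("\n", " ")
        exercise.append(f"{left:>{width}} {gap} {right}")
        key.append(match.group())
    return exercise, key

if __name__ == "__main__":
    text = "Editors cut scenes. Cutting is an art. She cuts fast."
    exercise, key = gapped_lines(text, "cut", width=20)
    for number, line in enumerate(exercise, start=1):
        print(f"{number:2}. {line}")
    print("Key:", ", ".join(key))
```

Only the word at the centre of each line is blanked; other occurrences that happen to fall inside the left or right context remain visible.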


More detailed introductions to corpus linguistics are Catherine Ball, Concordances and Corpora; Tony McEnery/Andrew Wilson, Corpus Linguistics; and Graeme Kennedy, An Introduction to Corpus Linguistics (London: Longman, 1998). I have also published a French-language article on a corpus-driven approach to grammar. Those of you who are interested in using concordancing software can download a free DOS-based concordancer called Microconcord at (click on Other software). Corpora of specialist texts can be built from texts freely available on the World Wide Web, using either off-line browsers or Corpus Web; the advantage of the latter is that it automatically converts HTML files into plain-text (.txt) files.
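Converting HTML into plain text essentially means discarding the mark-up and keeping the running text. The Python sketch that follows does this with the standard library's `html.parser` module. It is a stand-in resting on my own assumptions, not Corpus Web's actual procedure: it treats a handful of tags (paragraphs, headings, list items and so on) as line breaks and drops the contents of `<script>` and `<style>` elements, which are not part of the text proper. The names `TextExtractor` and `html_to_text` are hypothetical.

```python
import re
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect the running text of an HTML page (illustrative sketch)."""

    # Elements whose content is not running text and is discarded.
    SKIP = {"script", "style"}
    # Elements treated as line breaks (an assumed, incomplete list).
    BREAKS = {"p", "br", "div", "li", "tr", "title",
              "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turn &amp; etc. into characters
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.BREAKS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in self.BREAKS:
            self.parts.append("\n")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)

def html_to_text(html):
    """Strip the mark-up from `html`, starting a new line at block-level tags."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # Collapse runs of spaces and tabs, trim each line, and drop empty lines.
    lines = (re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines())
    return "\n".join(line for line in lines if line)

if __name__ == "__main__":
    page = "<html><body><h1>Editing</h1><p>A jump cut &amp; a long shot.</p></body></html>"
    print(html_to_text(page))
```

In practice you would read each downloaded page from disk, pass it through `html_to_text`, and save the result with a .txt extension in your corpus directory.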


Assignment for Session 2: To recap today’s session, read Catherine Ball, Concordances and Corpora; Lynne Bowker/Jennifer Pearson, Working with Specialized Language; and/or Tony McEnery/Andrew Wilson, Corpus Linguistics. (You may also wish to read the article on ‘corpora’ in the Linguistics Encyclopaedia.) Then download Microconcord at (click on Other software in the left-hand margin) and a corpus of specialist (film studies) texts to your hard disk, placing them in the same directory. Open Windows Explorer and start Microconcord by double-clicking the file named ‘MCONCORD’ (file type: application, shown as ‘Anwendung’ on German-language systems).

Search the corpus for the following specialist terms and describe their behaviour (i.e. compounds, collocations, phraseology, complementation patterns) in meticulous detail: shot (= Aufnahme) and cut (= Schnitt / schneiden).
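Once you have read through the concordance lines, a rough count of the words occurring near a search term can help you spot compounds and collocations systematically. The sketch that follows counts left and right collocates of a node word within a window of a few tokens; counting the two sides separately helps to separate premodifiers (‘long shot’, ‘close-up shot’) from what follows the node. The tokenizer, the window size and the names `tokenize` and `collocates` are my own simplifying assumptions, and raw co-occurrence counts are only a starting point: association measures such as mutual information or the t-score are commonly used to rank collocates.

```python
import re
from collections import Counter

# Lower-case word tokens; hyphenated and apostrophe forms ('close-up') stay whole.
TOKEN = re.compile(r"[a-z]+(?:[-'][a-z]+)*")

def tokenize(text):
    """Hypothetical helper: split text into lower-case word tokens."""
    return TOKEN.findall(text.lower())

def collocates(tokens, node, span=3, stoplist=frozenset()):
    """Count words within `span` tokens to the left and right of `node`.

    Returns two Counters (left, right); words in `stoplist` are ignored.
    """
    left, right = Counter(), Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        for word in tokens[max(0, i - span):i]:
            if word not in stoplist:
                left[word] += 1
        for word in tokens[i + 1:i + 1 + span]:
            if word not in stoplist:
                right[word] += 1
    return left, right

if __name__ == "__main__":
    sample = "A long shot. Another long shot, then a close-up shot of the actor."
    left, right = collocates(tokenize(sample), "shot", span=2,
                             stoplist={"a", "the", "of"})
    print("L:", left.most_common(3))
    print("R:", right.most_common(3))
```

Note that this count does not distinguish the noun ‘cut’ from the verb ‘cut’; separating the two still requires reading the concordance lines.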

Any questions? Write to me at