Proceedings of EURALEX 2000
all the many words, uses, and structures that are possible in a language, it will show us how
to pick out just those that are normal, and it will relate other uses to the norms by a theory of
exploitations: a set of exploitation rules that will say how a normal use may be exploited to
form metaphors and other unusual uses, and what the constraints are. (Norms, of course, may
be genre-specific, as well as general.)
Until the advent of large corpora in the 1980s, there was simply no way of analysing the characteristic behaviour of each word in the language. Now that we have large corpora, it is time to revisit
theory from a lexical point of view, taking account of what can be learned from corpora.
In pursuit of definitions that accurately summarize the unique contributions of words to the
meaning of sentences in which they occur, modern lexicographers can now study concordance
lines from a corpus. What they find is interesting, and not always expected, even though, in
all too many cases, what they find is determined by what they expect to find. Some lexicographers and linguists have treated the corpus merely as a quarry, a source of examples for
what they already 'know'. And very often the corpus obliges. If you look long enough and hard
enough, and if you have a large enough corpus, or enough texts of the right kind, you will find
what you are looking for. For example, a large historical corpus may yet be found that contains
an example or two supporting the notion that the verb fan means 'to winnow (grain)'. But that
does not mean that this is part of the meaning of the modern word fan. In fact, to use a corpus in
this way, i.e. to make self-fulfilling prophecies, is precisely what corpus linguistics is not about.
(This does not prevent lexicographers from doing it, however.) Corpus linguistics, if it is about
anything, is about observing the conventions of language in use, and then observing the great
variety of ways in which these conventions are exploited. (It is perhaps worth mentioning in
passing that a corpus does not, of course, provide direct evidence for meaning; it consists of a
record of traces of linguistic behaviour, from which meanings can be inferred.)
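The working method described above — scanning concordance lines for a keyword — is easy to make concrete. The following is a minimal keyword-in-context (KWIC) sketch; the toy 'corpus' and the window size are invented for illustration and are not from any project discussed here.

```python
import re

def kwic(text, keyword, width=30):
    """Return keyword-in-context lines for `keyword` in `text`.
    Each line shows up to `width` characters of left and right context."""
    lines = []
    for m in re.finditer(r"\b%s\b" % re.escape(keyword), text, re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append("%*s[%s]%s" % (width, left, m.group(0), right))
    return lines

# Tiny invented 'corpus' for the example:
corpus = ("She fanned the flames of controversy. "
          "The fan whirred quietly. He is a great fan of jazz.")
for line in kwic(corpus, "fan"):
    print(line)
```

Note that the word-boundary match deliberately excludes inflected forms such as "fanned"; a real concordancer would work from a lemmatized corpus rather than raw string matching.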
Some grammarians have used corpus evidence in a similarly supplementary way. Beth Levin,
for example, in compiling her (partial) inventory of English verb classes and alternations, first
consulted her intuitions, then (with the help of colleagues) checked the corpus to see if she
had missed anything. The result was undoubtedly an improvement on intuition alone, but nevertheless some of the verbs in Levin’s classification rarely if ever behave in the way that the
classification predicts. The corpus, evidently, was used to supplement intuitions rather than to
motivate the analysis, and examples which satisfied intuition but for which no corpus evidence
was available were not rejected. But Levin might ask, why should they be? For we must beware
of the failure-to-find fallacy: the fact that we have failed to find something does not mean that it
does not exist. Against this must be set the line of argument that says that if something does not
occur in a corpus of 100 million words (equivalent to half a dozen years of hard, uninterrupted
reading for a normal person), then it cannot be very important.
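The reading-time equivalence is a back-of-envelope calculation. Assuming a sustained rate of 200 words per minute for four hours a day (both figures are my own assumptions, chosen to be plausible for 'hard uninterrupted reading'), the arithmetic comes out close to the figure quoted:

```python
# Back-of-envelope check of the "half a dozen years" claim.
# The reading rate and daily hours are illustrative assumptions, not from the text.
corpus_words = 100_000_000
words_per_minute = 200        # assumed sustained reading speed
minutes_per_day = 4 * 60      # assumed four hours of reading per day

days = corpus_words / (words_per_minute * minutes_per_day)
years = days / 365
print(round(years, 1))  # roughly 5.7 years
```

Faster reading or longer days shrink the figure, but under any reasonable assumptions a 100-million-word corpus represents years of continuous exposure to the language.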
Another example is the COMLEX project (Grishman, McLeod et al.), which describes in detail
the possible complementation patterns of English verbs. Because the focus of COMLEX is on
the possible, not the probable, it is perhaps a less useful tool than it might have been. And I think
the COMLEX people recognize this. It is surely no accident that one of the driving forces behind
the American National Corpus initiative is Catherine McLeod, who was also one of the prime
movers in COMLEX. Her experience on COMLEX was not dissimilar to that of many British
lexicographers in the 1970s and 80s. Using their intuitions, she and her colleagues compiled