Responses to Dunning and Finch; thesis e-availability
re: Dunning's 'time' data
Rosamund Moon (lexicographer at COBUILD) has written a paper
specifically on 'time' and its associated idioms.
As Steve Finch points out, dictionaries aren't sacred. The very idea
of polysemy comes from them. When we use them, whether we like it or
not, we are taking on board 200 years (minimum) of lexicographic
history. We either make fools of ourselves (at least in their eyes)
or take this seriously, eg by asking them what they think about what
they do, and by making sure we are not using obsolete evidence. LDOCE-1,
which Ted Dunning quotes and which is the dictionary most widely used in
NLP, is 18 years old now, with LDOCE-3 out shortly. Lexicography has made
great strides in between, not least because of advances with corpora.
Lexicographers know a lot about these things. After all, they spend 8 hours
a day, 5 days a week, 35 weeks a year, for X years, staring at the
specifics. The data we work on is their creative output.
re: Steve Finch's msg - 'Computational Linguist'.
> Others might
> say that `computational linguistics' has become an `idiom', but then
> there are surely a lot of `idioms' out there...
There are clearly degrees of idiomaticity (and lots of them).
Idiomaticity/collocation is just one dimension of the conceptual space
surrounding polysemy; there are at least four. The others are
homonymy (eg arbitrary meaning-difference), alternation (rule-bound
behaviour - cf Pustejovsky, Beth Levin) and analogy (which covers
nonce and creative uses, arguably outside the *linguistic* system
proper). Each of these relates to frequency of occurrence in
interesting ways. There are also different kinds of word-sense
distinction. Just because 'time' or 'linguist' works one way,
it doesn't mean other words do.
re: thesis availability
A couple of people enquired about my thesis, 'Polysemy', referenced in
an earlier msg. It's now available electronically (barring a couple
of diagrams I have yet to fix):
See especially chapter 7, also 5, for elaboration of comments above,
and chapter 6 for a study where I'm a bit systematic about seeing
whether corpus instances of a word can be allocated to a unique dictionary