disambiguation tasks, collocational & phrasal patterning, & 'time'
Back from a long trip, now I can finally respond to the discussion
on word sense.
In what follows, I talk about Longman's entry for 'time' that Ted
Dunning sent out. I first mention a couple issues relevant to the
sense disambiguation task and then I query folks about a tiered
approach for specifying word senses.
I. On dictionary sense disambiguation tasks.
While Robert Amsler commented that he expects "true ambiguity
resolution to find the distinctions between senses as detailed in a
specific published dictionary," Ted Dunning noted that "people
can't perform this disambiguation task with any reliability or
even repeatability." Dunning then queried whether this doesn't
"seriously bring into [question] whether the task is pertinent to
Actually, it strikes me as unsurprising that people don't replicate
sense distinctions found in published dictionaries.
First, dictionaries vary rather wildly in the number of senses
they attribute to a particular word. For example, for the noun
'time', Longman cites 50; Webster's III unabridged, 15 + 7
expressions involving 'time'; Random House II unabridged, 26 +
32 expressions; and Collins Cobuild English Language
Dictionary cites 28 senses + 11 expressions.
If dictionaries don't agree on the number (and therefore identity)
of senses, why should any respondent replicate the distinctions of
any particular published dictionary as opposed to any other?
Never mind the dictionaries which have not yet been published.
Further, taking Longman's as an example, this dictionary relies
quite heavily on collocational patterning in its statement of
meaning. Dictionary entries relying almost exclusively on
collocational patterning can result in definitions which are
overly restricted and which miss linguistic generalizations.
While collocational patterning is obviously important, it's not the
only patterning relevant to word meaning. And actually,
attention to the phrasal structure of collocations can point the
way to syntactic generalizations that may prove useful in
organizing a dictionary entry.
Here are two examples from Longman's entry for 'time'.
Sense 0100 reads as follows:
0100 A continuous measurable quantity from the past through
the present and into the future.
ex. (The universe exists in space and time)
Comment: The gloss of quantity/extent (from past through
present into the future) corresponds to what we know about
universes, not to the meaning of 'time' itself.
Thus, in "A lightning bolt exists in time and space", we find no
notion of continuous, measurable quantity, or of
Longman is building into the definition of 'time' information
more properly associated with the definition of 'universe'.
Syntax provides a helpful toehold here. [in .... time] is a
prepositional phrase, exhibiting one of the typical prepositional
phrase meanings -- location. You can tell that because the wh-
interrogative query on the PP in example 0100 is "WHERE does
the universe exist?" not "when does the universe exist?". The
phrase structure of this reading is instrumental in pointing to
basic aspects of its meaning.
Here's another example of how general syntactic information and
collocational information interact in the statement of word
Longman Definition: 1400 'the time': the right occasion:
ex. (He's in a good temper, so now's the time to tell him you've
made a serious mistake)
Comment: It is certainly true that "the time" in 1400 receives the
gloss of "the right occasion". However, there is more to the story.
This reading is systematically related to others, specifically
NP=[determiner (X) time] where X is an adjective specifying a
characteristic of time: ex. "Now's the right/wrong/perfect/worst
time." When the adjective is omitted, we get a default
assumption of the "right" time.
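The default reading described here can be sketched in a few lines of Python. This is only a toy illustration under loud assumptions: the regular expression, the function name `time_characteristic`, and the restriction to the determiners "the" and "a" are all mine, not Longman's, and any single word sitting between the determiner and 'time' is naively treated as the adjective X.

```python
import re

# Hypothetical pattern for NP=[determiner (X) time]; the X slot is optional.
# Only "the" and "a" are recognized as determiners (an assumption).
NP_TIME = re.compile(
    r"\b(?:the|a)\s+(?:(?P<adj>[a-z]+)\s+)?time\b",
    re.IGNORECASE,
)

def time_characteristic(text, default="right"):
    """Return the word filling X in [det (X) time].

    If X is omitted, fall back to the default reading ("right").
    If the pattern does not occur at all, return None.
    """
    match = NP_TIME.search(text)
    if match is None:
        return None
    adj = match.group("adj")
    return adj.lower() if adj else default

if __name__ == "__main__":
    print(time_characteristic("Now's the worst time."))
    print(time_characteristic("so now's the time to tell him"))
```

The point of the sketch is that sense 1400 needs no separate entry: it falls out of the general pattern plus one default value.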
If the definition specified the general syntactic pattern
NP=[det adj N], much of the meaning would be readily derivable
and it would also key into the similar paradigmatic pattern: [the
II. A question regarding a multi-tiered approach to sense.
As I worked through the 50 senses of 'time' in Longman's, I
realized that 'time' occurs in 3 basic syntactic patterns:
A) Prepositional phrases, as object of a preposition,
B) Noun Phrases, as head, and
C) Verb Phrases as object to a content verb
In each case, it looks to me like a good deal of the meaning stated
in Longman's for 'time' stems from the preposition, or nominal
modifiers, or content verb phrasally associated with 'time'.
For example, 4000 [make good time], Longman glosses as "to
go at a speed that is satisfactory or better than expected". But
this example instantiates the more general pattern, VP=[make X
time] (make good/poor/terrible/alright, etc. time.). So the
meaning ought to read "go at a speed that is X relative to
Here are the 3 patterns, and Longman entries corresponding to
A) PP [prep time]
encoding the usual types of PP meaning:
(i) location (stative, point, goal),
(ii) time (point, duration, iteration)
Longman senses: 0100, 0500, 0700, 1700, 2000, 2100, 2200,
2400, 2500, 2600, 2800, 2900, 3000, 3300, 3400, 3500, 3600,
B) NP [(det) (adj) time]
modifiers (a, the, all, many, only, full, good, long, right,
etc.), compounding (bedtime, summertime), near
compounding (closing time).
Longman senses: 0300, 0400, 0600, 0800, 1200, 1300, 1400,
1500, 1600, 1700, 1800, 1900, 2300, 4100, 4800, 4900
C) VP [V time]
V= have, keep, (kill, pass, bide), do/serve, take,
make, play for
Longman senses: 1000, 1100, 2700, 3100, 3200, 3800, 3900,
4000, 4500, 4600, 4700,
Then there were some entries that seemed rather idiomatic or
metaphoric to me... (3700 It's only a question/matter of time;
4300 once upon a time; 5000 the time of one's life;
0900 the rate of pay received for an hour's work)
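For anyone who wants to play with this grouping, here is a minimal sketch of it as a lookup table in Python. The sense numbers are copied from the lists above and may be incomplete; the key names ("PP", "NP", "VP", "idiomatic") and the helper functions `patterns_for` and `shared_senses` are hypothetical labels of mine, not anything in Longman's.

```python
# Grouping of Longman sense numbers for 'time' by phrasal pattern.
# Numbers are copied from the lists in this post; the lists may be incomplete.
PATTERN_SENSES = {
    "PP": ["0100", "0500", "0700", "1700", "2000", "2100", "2200",
           "2400", "2500", "2600", "2800", "2900", "3000", "3300",
           "3400", "3500", "3600"],
    "NP": ["0300", "0400", "0600", "0800", "1200", "1300", "1400",
           "1500", "1600", "1700", "1800", "1900", "2300", "4100",
           "4800", "4900"],
    "VP": ["1000", "1100", "2700", "3100", "3200", "3800", "3900",
           "4000", "4500", "4600", "4700"],
    "idiomatic": ["3700", "4300", "5000", "0900"],
}

def patterns_for(sense):
    """Return every pattern label under which a sense number is listed."""
    return [label for label, senses in PATTERN_SENSES.items()
            if sense in senses]

def shared_senses():
    """Return sense numbers that are listed under more than one pattern."""
    all_senses = {s for senses in PATTERN_SENSES.values() for s in senses}
    return sorted(s for s in all_senses if len(patterns_for(s)) > 1)

if __name__ == "__main__":
    print(patterns_for("4000"))
    print(shared_senses())
```

Returning a list rather than a single label is deliberate, since a sense can be listed under more than one pattern (1700 appears under both A and B above).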
Here's my question:
Would it be useful in NLP to
(i) make the first cut at the level of phrasal syntax (NP, PP, VP),
drawing from the word's syntactic position the basic, typical NP, PP,
VP types of meaning, and
(ii) then move on for more detail to the collocational aspects of
meaning in the word's immediate context?
As is, in Longman's, it seems that the lexicographers went
straight to (ii), skipping all the basic semantic info they could
have drawn out of the basic phrasal patterning of (i).
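To make the two-tier question concrete, here is a rough Python sketch of what I have in mind. It is a toy under stated assumptions, not a parser: the preposition set is illustrative, the verb set is taken from list C above (minus the multi-word "play for"), the three-token look-back window is arbitrary, and the names `first_cut` and `analyze` are made up for the illustration.

```python
# Illustrative closed-class lists (assumptions, not taken from Longman's).
PREPOSITIONS = {"in", "at", "on", "for", "from", "by", "with", "of"}
# Content verbs from list C above; "play for" is omitted as multi-word.
VERBS = {"have", "keep", "kill", "pass", "bide", "do", "serve",
         "take", "make"}

def first_cut(tokens, window=3):
    """Tier (i): guess which phrase type governs 'time'.

    Looks left of 'time' by at most `window` tokens for a preposition
    or a listed verb. Returns (phrase_type, governor, between), where
    `between` holds the tokens standing between governor and 'time'.
    Raises ValueError if 'time' is not among the tokens.
    """
    i = tokens.index("time")
    for j in range(i - 1, max(i - 1 - window, -1), -1):
        if tokens[j] in PREPOSITIONS:
            return "PP", tokens[j], tokens[j + 1:i]
        if tokens[j] in VERBS:
            return "VP", tokens[j], tokens[j + 1:i]
    # No governor found: treat as a bare NP and keep up to two modifiers.
    return "NP", None, tokens[max(0, i - 2):i]

def analyze(phrase):
    """Run tier (i), then expose the collocates for tier (ii)."""
    tokens = phrase.lower().split()
    phrase_type, governor, between = first_cut(tokens)
    return {
        "tier1": phrase_type,
        "tier2": {"governor": governor, "collocates": between},
    }

if __name__ == "__main__":
    for example in ("make good time", "now's the right time",
                    "the universe exists in space and time"):
        print(example, "->", analyze(example))
```

The tier (i) label would select the general NP, PP, or VP type of meaning, and only then would the governor and collocates be consulted for the finer distinctions.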