Subject: Sum: compound noun corpora
Resent-Date: Tue, 4 Apr 1995 16:18:25 +0200
Resent-From: Cecile.Fabre@irisa.fr

One month ago I sent a query asking for English noun compound corpora. These are the two largest lists I received:

1. A 1-MB word list of compounds from a spellchecker for the NeXT computer, sent by George Fowler.

2. A list of 9000 binary nominals with judgments on accent placement, sent by Richard Sproat. It is described in: Richard Sproat, "English Noun-Phrase Accent Prediction for Text-to-Speech." Computer Speech and Language, 1994, 8, 79-94.

The two files are available by anonymous FTP from ftp.irisa.fr, under the directory /local/corpus

Most other responses advised me to build my own list, either from tagged corpora (Brown Corpus, Penn Treebank, etc.) or by statistical methods (see Johansson, C., 1994, Catching the Cheshire Cat, Proc. COLING, Kyoto, http://www.ling.lu.se).

I also received some bibliographical references on the treatment of complex nominal sequences, which I reproduce below.

Thanks to: Eric Steven Atwell, Paul Bennett, Pier Marco Bertinetto, Beatrice Daille, George Fowler, Christer Johansson, Bernie Jones, Mark Lauer, Judith N. Levi, Philip Resnik, Richard Sproat, Achim Stein, Wilco Ter Stal, Evelyne Tzoukermann, Nick Youd.

Bibliographical references:

Paul Bennett, A Multilingual Translation-oriented Typology of Compound Nouns, TAL (Traitement Automatique du Langage), 1993, vol. 34.
Church and Hanks, article in Computational Linguistics 16.
Bernie Jones, "Predicting Nominal Compounds", MPhil Dissertation, University of Cambridge Engineering Department.
Lauer, Mark (1994) "Conceptual Association for Compound Noun Analysis", Proceedings of the Student Session of the 32nd Annual Meeting of the Association for Computational Linguistics, June, Las Cruces, New Mexico.
Lauer, Mark and Dras, Mark (1994) "A Probabilistic Model of Compound Nouns", Proceedings of the 7th Australian Joint Conference on Artificial Intelligence, November, Armidale, Australia.
Levi, Judith N. 1978. THE SYNTAX AND SEMANTICS OF COMPLEX NOMINALS. NY: Academic Press.
    Includes an appendix of compound forms.
Leonard, Rosemary. 1984. THE INTERPRETATION OF ENGLISH NOUN SEQUENCES ON THE COMPUTER. Amsterdam: North-Holland.
    This study used 2000 noun sequences taken from a corpus of 300,000 words of English fiction from 1700 to the present.
Ryder, Mary Ellen. 1994. ORDERED CHAOS: THE INTERPRETATION OF ENGLISH NOUN-NOUN COMPOUNDS. Berkeley/Los Angeles/London: University of California Press.
    Focuses especially on the interpretation of novel pairings.
Rivista di Linguistica, 4, 1, 1992.
Wilco G. ter Stal & Paul E. van der Vet, Two-level semantic analysis of compounds.