March 24, 2000
Hybrid, Semi-Automatic Approaches to NLP
We present hybrid approaches to NLP that result from a synergistic combination of numerical computations and linguistic grammars (or linguistic filters). Numerical data-mining tools are generally quite robust and can handle very large inputs, but they provide only coarse-grained results. Computational linguistic tools provide fine-grained results but are less robust and usually handle relatively short inputs.

We apply our hybrid approach to knowledge extraction, multi-word term identification, and web customisation (work in progress). Our software tools differ from most other term-identification or knowledge-extraction tools in that they are semi-automatic by design: they are interactive and remain under the user's control at all times. The software supports the work of the knowledge engineer, the (corpus) domain expert, or the linguist by helping them do their job more efficiently.

We justify this semi-automatic approach by the need for a more flexible and customisable tool for certain term-identification and knowledge-extraction tasks. More specifically, in some applications we want the user's perspective, knowledge, and subjectivity to influence the results, within certain limits, of course. One such application, on which we are currently working, is Web personalisation: to allow individuals to develop their own vision of the information universes that interest them, we need flexible and customisable tools that can support them in such a challenging task, not tools that will impose on them a pseudo-standardised vision of the world.
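The abstract does not specify an algorithm, so the following is only an illustrative sketch of the general pattern it describes: a coarse but robust numerical step (here, bigram frequency counts over a corpus) whose candidates are then refined by a fine-grained linguistic filter (here, a toy filter that discards candidates containing function words, standing in for a real linguistic grammar). All names (`numerical_step`, `linguistic_filter`, `FUNCTION_WORDS`) are ours, not the authors', and a real system would use proper tokenisation and part-of-speech patterns.

```python
from collections import Counter

# Toy stand-in for a linguistic grammar/filter: a tiny function-word list.
FUNCTION_WORDS = {"the", "a", "an", "of", "in", "on", "and", "or", "to", "is"}

def numerical_step(tokens, min_freq=2):
    """Coarse, robust step: count all adjacent word pairs (bigrams)."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    return {bg: n for bg, n in bigrams.items() if n >= min_freq}

def linguistic_filter(candidates):
    """Fine-grained step: keep only bigrams made of content words."""
    return {bg: n for bg, n in candidates.items()
            if bg[0] not in FUNCTION_WORDS and bg[1] not in FUNCTION_WORDS}

def extract_terms(text, min_freq=2):
    """Hybrid pipeline: numerical candidate generation, then linguistic filtering."""
    tokens = text.lower().split()
    return linguistic_filter(numerical_step(tokens, min_freq))

corpus = ("term identification tools support term identification "
          "in the large corpus and the large corpus grows")
print(extract_terms(corpus))
# → {('term', 'identification'): 2, ('large', 'corpus'): 2}
```

Note how the frequent but uninteresting candidate ("the", "large") survives the numerical step and is removed only by the linguistic filter; a semi-automatic tool in the spirit of the abstract would additionally let the user inspect and override these decisions interactively.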