February 4, 2000
3:30pm
Martin Fontaine
Structural Identification of Unintelligible Documents
This presentation describes techniques and approaches for solving the document identification problem in a particular context. We try to classify documents by their structure, not by their content, which means that we do not assume the documents are written in a natural language. The only assumption is that the target concept we are trying to learn (with machine learning techniques) can be expressed as a regular expression. The following topics will be discussed: