MULTEXT

Multilingual Text Tools and Corpora


Project Summary

The project has developed a set of generally usable software tools to manipulate and analyse text corpora, together with lexicons and multilingual corpora in seven European languages. It has established conventions for the encoding of corpora and harmonised specifications for computational lexicons, building on and contributing to the preliminary recommendations of the relevant international and European standardisation initiatives. All project results are freely and publicly available.


Progress and results

MULTEXT has developed the first set of publicly available large-scale resources and tools for use in corpus-based language engineering applications. The project's specific achievements fall into three areas:

Specifications:

Resources:

Tools:

Start date January 1994
Duration 26 months
Total Effort ca. 350 person-months


Exploitation

Text-oriented methods and software tools have come to be of primary interest to the NLP community. It is therefore expected that the availability of basic multilingual tools and data will improve and extend R&D across a wide range of disciplines, including not only the various areas of language engineering, but also fields such as speech technology, language learning, lexicography and lexicology, and information retrieval. The project's methodologies and results are being used in a related project under the Copernicus programme, MULTEXT-EAST, thus extending the application to thirteen western and eastern European languages. Extensions to regional and non-European languages are also underway.


Consortium

Organisation Role Country
Laboratoire Parole et Langage-CNRS C FR
Universitat Autonoma de Barcelona, Fundación Bosch Gimpera A ES
Universitat Central de Barcelona A ES
University of Umea A SE
Institut Dalle Molle pour les Etudes Sémantiques et Cognitives, Geneva P CH
ILC - CNR, Pisa P IT
University of Edinburgh, HCRC-LTG P UK
Universiteit Utrecht, Stichting Taaltechnologie A NL
Universität Münster A DE
INCYTA P ES
Digital Equipment BV P NL
SITE EUROLANG-Sonovision Itep Technologies P FR
Rank Xerox Research Centre A FR

Contact person

Dr. Jean Véronis
Head of Natural Language and Speech Processing Group
Laboratoire Parole et Langage, CNRS
29 Avenue Robert Schuman
13621 Aix-en-Provence Cédex 1
France
Tel: +33 42 95 20 73
Fax: +33 42 59 50 96
Email: veronis@lpl.univ-aix.fr
URL: http://www.lpl.univ-aix.fr/projects/multext/
