University Centre for Computer Corpus Research on Language (UCREL)
Description
The University Centre for Computer Corpus Research on Language is a research group based in the School of Computing and Communications and the Department of Linguistics and English Language.
The Centre has led the way in an approach to statistical natural language processing based upon information from large bodies of naturally occurring text and in the application of corpus data to industrial problems in areas as diverse as dictionary creation and speech processing. The Centre's recent projects have been funded nationally by the Engineering and Physical Sciences Research Council, the Economic and Social Research Council, the Arts and Humanities Research Council and the Leverhulme Trust, as well as by the European Union and the Andrew W. Mellon Foundation in the US. Previous projects include the British National Corpus project, run by a national consortium of academic and industrial partners (Oxford University Press, Oxford University Computing Service, The British Library, Longman Group Ltd., W. and R. Chambers Ltd.). Further international collaborators have included HarperCollins, Nokia, IBM Paris, Universidad Autonoma de Madrid and Shogakukan Inc, Japan.
Since 1994, the Centre has launched three continuing series of highly successful conferences (Teaching and Language Corpora; Discourse Anaphora and Anaphor Resolution Colloquium; Corpus Linguistics), with the first two events in each series taking place at Lancaster University.
For more than four decades, the Centre has led the way in an approach to natural language processing that is based upon information derived from large bodies of naturally occurring text. These bodies of text are stored on computer and are known as corpora (sg. corpus).
The vast majority of the Centre's work is carried out within this corpus-based paradigm. The corpora are used to derive empirical knowledge about language, which can supplement, and frequently supplant, information from reference sources and introspection.
Because they are well suited to quantitative analysis, corpora can provide information about the relative frequencies of many aspects of language. These frequencies can then be employed in probabilistic analysis techniques, which are another major feature of the Centre's work.
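As a rough illustration of the kind of quantitative information a corpus can yield, the following minimal sketch computes relative word frequencies from a toy text. The function name `relative_frequencies`, the simple regular-expression tokeniser and the sample sentence are all assumptions made for this example; they do not represent the Centre's own tools or methods.

```python
import re
from collections import Counter


def relative_frequencies(text):
    """Return each word's share of all tokens in the text (hypothetical helper)."""
    # Assumption: a crude tokeniser that keeps lowercase letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = sum(counts.values())
    # An empty text yields an empty dictionary, so no division by zero occurs.
    return {word: n / total for word, n in counts.items()}


# Toy "corpus" invented for illustration only.
toy_corpus = "the cat sat on the mat and the dog sat too"
freqs = relative_frequencies(toy_corpus)

# List the three most frequent words with their relative frequencies.
for word, freq in sorted(freqs.items(), key=lambda item: -item[1])[:3]:
    print(f"{word}: {freq:.3f}")
```

Relative frequencies of this kind can serve as simple probability estimates, which is the sense in which frequency data can feed probabilistic analysis.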
UCREL's work is focussed on practical outcomes in the following areas:
- speech synthesis
- speech recognition
- machine-aided translation and support for human translators
- dictionary publishing
- social survey interview analysis
- computer-aided language teaching
- software engineering
The Centre’s work covers the following languages:
- English – the Centre was a leading partner in the British National Corpus consortium and is now exploiting it to arrive at new, data-grounded analyses of present-day British speech and writing. The Centre is also involved in corpus-based work on the historical development of the English language, as well as on learner English.
- Modern foreign languages – members have built, annotated, and exploited corpora of modern languages such as French and Spanish, and are presently involved (in collaboration with the University of Lodz) in producing a major corpus of contemporary Polish.
- Minority, endangered, and ancient languages – members have pioneered corpus work on non-indigenous minority languages in the UK (e.g. Chinese, Hindi, Punjabi), and they are now extending this work to European indigenous minority languages. They have also carried out computer-aided linguistic research on ancient languages such as Latin.
Offers funding
No, this infrastructure does not provide funding.
Contact details
University House
Bailrigg
Lancaster
LA1 4YW
United Kingdom
University affiliation(s)
Lancaster University
Bailrigg
Lancaster
LA1 4YW
Last modified:
2023-09-20 14:59:54