Resources and Tools
Science and Technology Corpus
The principal aim is to provide a tool for research into the use of Basque in the field of science and technology.
- The Corpus is built from works in the field of science and technology published between 1990 and 2002.
- A total of 8.5 million words.
- 1.9 million words have been processed automatically and then reviewed and corrected manually.
- The Corpus is classified according to field (scientific discipline) and genre (text type).
- The Corpus is tagged both linguistically and for text structure and format.
- A query interface supports both simple and complex searches of the Corpus.
- It was developed by the Elhuyar Foundation’s R&D Group and the IXA Group of the Computing Faculty of the UPV-EHU (University of the Basque Country).
- It was presented at LREC 2006 and at Corpus Linguistics 2007.
- Three modes have been set up to make it available to the general public.
- Freely available for research purposes, subject to an agreement.
- Distributed under licence for commercial use.