What is tagging?
Part-of-Speech tagging (POS tagging) consists of automatically assigning tags to words. Each word is tagged (i.e. labelled) according to its linguistic category. A simplified form of POS tagging is similar to what we did at school when identifying words as Nouns, Verbs, Adjectives or Prepositions. For example, the word ‘amigo’ is tagged as NCMS, which means that it is a Noun, Common, Masculine, Singular.
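To show how a tag such as NCMS packs several pieces of information into one code, here is a minimal Python sketch that reads it one position at a time. It assumes a positional scheme in which the 1st letter is the category, the 2nd the type, the 3rd the gender and the 4th the number. The mapping table and the function name `decode_noun_tag` are hypothetical and cover only the noun values needed for this example, not the full CEDEL2 tagset.

```python
# Hypothetical, partial mapping for positional noun tags such as "NCMS".
# It covers only the noun attributes needed for this example; it is NOT
# the full tagset used in CEDEL2.
NOUN_ATTRIBUTES = [
    {"N": "Noun"},                                              # position 1: category
    {"C": "Common", "P": "Proper"},                             # position 2: type
    {"M": "Masculine", "F": "Feminine", "C": "Common gender"},  # position 3: gender
    {"S": "Singular", "P": "Plural", "N": "Invariable"},        # position 4: number
]


def decode_noun_tag(tag: str) -> list[str]:
    """Expand a positional noun tag into readable labels, one per letter."""
    labels = []
    for position, code in enumerate(tag):
        if position >= len(NOUN_ATTRIBUTES):
            break  # ignore any extra positions this sketch does not model
        labels.append(NOUN_ATTRIBUTES[position].get(code, f"unknown({code})"))
    return labels


print(decode_noun_tag("NCMS"))
# → ['Noun', 'Common', 'Masculine', 'Singular']
```

Reading the tag letter by letter is what lets a search tool ask for "any noun" (first letter N) or, more narrowly, "any feminine plural noun" without listing individual words.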
Which CEDEL2 subcorpora are tagged?
Only the Spanish and English components of CEDEL2 have been POS tagged: all the L2 Spanish learner subcorpora, the L1 Spanish native subcorpus and the L1 English native subcorpus.
What is POS tagging used for in CEDEL2?
When searching the CEDEL2 corpus, you can do two types of searches:
- Searching for a word: you can do a simple search for individual words like ‘estar’, ‘ser’, ‘amigo’ or ‘amor’, or for a combination of words like ‘estar enamorado’ or ‘vivo en Estados Unidos’. This is called a ‘string’ search.
- Searching for a word category: you can do an advanced search by looking for a Verb, for a Noun, or for a combination like Noun+Adjective (a noun followed by an adjective) or Adjective+Noun. This gives you a more sophisticated way of searching for constituents in the corpus. Please check the ‘User guide’ > ‘Instructions’ tab for further details on advanced searches.
Advanced searches are only possible if the corpus has previously been POS tagged. This is why CEDEL2 has been POS tagged.
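The difference between the two types of search can be sketched in a few lines of Python. This is only an illustration, not how the CEDEL2 search interface is implemented: the example sentence, its tags and the functions `string_search` and `category_search` are hypothetical. It assumes EAGLES-style tags in which nouns start with "N" and adjectives with "A". A string search compares the words themselves, while a category search compares only the tags, which is why it needs a tagged corpus.

```python
# A tagged sentence as (word, tag) pairs. The tags are hypothetical
# EAGLES-style codes: tags starting with "N" are nouns, "A" adjectives.
tagged = [
    ("tengo", "VMIP1S0"),
    ("un", "DI0MS0"),
    ("amigo", "NCMS000"),
    ("simpático", "AQ0MS0"),
    ("y", "CC"),
    ("una", "DI0FS0"),
    ("casa", "NCFS000"),
    ("grande", "AQ0CS0"),
]


def string_search(tokens, phrase):
    """Return the start positions where the exact word sequence occurs."""
    words = [word for word, _ in tokens]
    target = phrase.split()
    n = len(target)
    return [i for i in range(len(words) - n + 1) if words[i:i + n] == target]


def category_search(tokens, pattern):
    """Return word sequences whose tags start with the given prefixes, in order."""
    n = len(pattern)
    hits = []
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        if all(tag.startswith(prefix) for (_, tag), prefix in zip(window, pattern)):
            hits.append(" ".join(word for word, _ in window))
    return hits


print(string_search(tagged, "un amigo"))    # → [1]
print(category_search(tagged, ["N", "A"]))  # → ['amigo simpático', 'casa grande']
```

The Noun+Adjective search finds both ‘amigo simpático’ and ‘casa grande’ even though they share no words, something a string search cannot do without listing every possible noun and adjective.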
Which tags have been used?
A note on automatic POS tagging
Please note that in this version of CEDEL2 we have used automatic POS tagging, which means that some words produced by learners may have been incorrectly categorised because of the very nature of learner language. This happens because the POS tagger applies Spanish native categories to the learner language (L2 Spanish), e.g.:
- “Me casa es blanco”: the word ‘me’ is tagged as the Spanish native first person singular object personal pronoun (meaning ‘(to) me’), although learners often use ‘me’ as a first person singular possessive (cf. the correct native form ‘mi’, meaning ‘my’).
- “Yo cumplear dieciseis anos”: novel words typical of learner language (‘cumplear’) will not be properly categorised, since they do not exist in native Spanish (cf. the correct ‘cumplir’).
In later versions of the CEDEL2 corpus, incorrect tags will be checked manually, but for the time being automatic tagging is still useful for advanced searches.