Tags

What is tagging?

Part-of-Speech tagging (POS tagging) consists of automatically assigning tags to words. Each word is tagged (i.e., labelled) according to its linguistic category. A simplified form of POS tagging resembles what we used to do at school when identifying words as Nouns, Verbs, Adjectives or Prepositions. For example, the word ‘amigo’ will be tagged as NCMS, which means that it is a Noun, Common, Masculine, Singular.
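
As a rough illustration of how a positional tag such as NCMS can be read, the following Python sketch decodes one character at a time. The lookup tables are assumptions that cover only a few noun values and are not the full Freeling tagset, and `decode_noun_tag` is a hypothetical helper, not part of CEDEL2 or Freeling.

```python
# Hypothetical lookup tables: one per character position of a noun tag.
# Only a handful of values are listed; this is not the full tagset.
NOUN_POSITIONS = [
    ("category", {"N": "Noun"}),
    ("type", {"C": "Common", "P": "Proper"}),
    ("gender", {"M": "Masculine", "F": "Feminine"}),
    ("number", {"S": "Singular", "P": "Plural"}),
]


def decode_noun_tag(tag):
    """Return a dict mapping each feature name to its decoded value."""
    decoded = {}
    for char, (feature, values) in zip(tag, NOUN_POSITIONS):
        # Unknown characters are reported as such rather than guessed.
        decoded[feature] = values.get(char, f"unknown ({char})")
    return decoded


print(decode_noun_tag("NCMS"))
```

Reading the tag position by position is what makes it possible to search for a broad category (any noun) or a narrower one (masculine singular common nouns) with the same label.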

Which CEDEL2 subcorpora are tagged?

In this version of CEDEL2, only the Spanish and English components have been POS tagged: all the L2 Spanish learner subcorpora, the L1 Spanish native subcorpus and the L1 English native subcorpus.

What is POS tagging used for in CEDEL2?

When searching the CEDEL2 corpus, you can do two types of searches:

Searching for a word: you can do a simple search for individual words like ‘estar’, ‘ser’, ‘amigo’, ‘amor’, or for a combination of words like ‘estar enamorado’, ‘vivo en Estados Unidos’. This is called a ‘string’ search.
Searching for a word category: you can do an advanced search by looking for a Verb, a Noun, or a combination such as Noun+Adjective (a noun followed by an adjective) or Adjective+Noun. This gives you a more sophisticated way of searching for constituents in the corpus. Please check the ‘Web Interface: User manual’ section for further details on advanced searches.

An advanced search only works if the corpus has been POS tagged beforehand. This is why CEDEL2 has been POS tagged.
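
To make the idea of a category search more concrete, the sketch below scans a short, hand-tagged sentence for one category followed by another. The sentence, its tags and the helper `find_tag_sequence` are illustrative assumptions, not the CEDEL2 search engine or its real output; categories are matched by the first letter of the tag (assumed here to be N for nouns and A for adjectives).

```python
# A hand-tagged toy sentence: (word, tag) pairs. The tags are invented
# for illustration and are not taken from the corpus.
tagged = [
    ("el", "DA"),
    ("perro", "NCMS"),
    ("blanco", "AQ"),
    ("come", "VM"),
]


def find_tag_sequence(tokens, prefixes):
    """Return word sequences whose tags start with the given prefixes, in order."""
    n = len(prefixes)
    matches = []
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        # Every tag in the window must begin with the matching prefix.
        if all(tag.startswith(p) for (_, tag), p in zip(window, prefixes)):
            matches.append(" ".join(word for word, _ in window))
    return matches


print(find_tag_sequence(tagged, ["N", "A"]))  # Noun+Adjective
print(find_tag_sequence(tagged, ["A", "N"]))  # Adjective+Noun
```

A string search would need the exact words ‘perro blanco’, whereas the tag-based search finds any noun followed by any adjective, which is only possible once every word carries a tag.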

Which tags have been used?

CEDEL2 subcorpora have been automatically POS tagged with the Freeling tagger. For an interpretation of the tags, see the Freeling tagset description and, more specifically, the Spanish tagset and the English tagset. You can also try an online demo of Freeling, where you can enter your own text and have it tagged automatically.

A note on automatic POS tagging

Please note that in this version of CEDEL2 the POS tagging has been done automatically, which means that some words produced by learners may have been incorrectly categorised due to the very nature of learner language. This is because the POS tagger automatically applies native Spanish categories to the learner language (L2 Spanish), e.g.:

“Me casa es blanco”: the word ‘me’ is tagged as the Spanish native first person singular object personal pronoun (meaning ‘(to) me’), though we know that learners often use me as a first person singular possessive pronoun (cf. the correct Spanish native mi, meaning ‘my’).
“Yo cumplear dieciseis anos”: novel words which are typical of learners’ language (‘cumplear’) will not be properly categorised since they do not exist in native Spanish (cf. the correct ‘cumplir’). In these cases, ‘cumplear’ has been categorised as a novel lemma (the infinitive of the non-existent verb ‘cumplear’).
“Hoy es Veintedos de Enero”: misspelled words may not be properly tagged (cf. the correct veintidós).

Despite the shortcomings of automatic tagging, we believe that this type of tagging is still very useful for users who want to do complex searches, e.g., two advanced searches comparing the word order Noun+Adjective (perro blanco) vs. Adjective+Noun (blanco perro). The automatic tagger will not recognise learners’ misspelled adjectives as adjectives (e.g., espanol for ‘español’ or intilligete for ‘inteligente’). However, properly spelled adjectives, which are the majority, will be correctly tagged as adjectives. Therefore, automatic tagging in a learner corpus is more useful than no tagging at all.

CEDEL2: Corpus Escrito del Español L2 (version 2)
