CEDEL2: Corpus Escrito del Español L2 (version 2)

v2.0
Sept. 2020

Metadata

You can download the texts with metadata (as in the example below) or without metadata (i.e., only the text produced by the participant). In each downloaded file, the name of each metadata field appears on the left and its value on the right, as in the following example for a learner of Spanish:

Subcorpus: Learners
Filename: EN_WR_41_22_7_1_DLM
Year data collection: 2006-2016
Placement test raw score: 41 / 43
Placement test score (%): 95.3
Proficiency: Upper advanced
Sex: Female
Age: 22
School/University/Institution: University of Minnesota
Major: Psychology & Spanish
Years at university/school: 4th
L1: English
Father's native language: English
Mother's native language: English
Languages spoken at home: English
Age of exposure to Spanish: 14
Years studying Spanish: 7
Stay in Spanish speaking country (>= 1 month): Yes
Stay where: Spain
Stay when: 1999 & 2005-2006
Stay abroad (months): 18
Language certificates (type and level):
Proficiency (self-assessment) speaking: Lower advanced (C1)
Proficiency (self-assessment) listening: Lower advanced (C1)
Proficiency (self-assessment) reading: Lower advanced (C1)
Proficiency (self-assessment) writing: Upper intermediate (B2)
Proficiency (self-assessment): 4.75
Additional foreign language(s):
Proficiency (self-assessment) in additional language speaking:
Proficiency (self-assessment) in additional language listening:
Proficiency (self-assessment) in additional language reading:
Proficiency (self-assessment) in additional language writing:
Medium: Written
Task number: 1
Task title: 1. Region where you live
Writing/audio details: written_online
Minutes taken to complete the task:
Where the task was done: Inside classroom
Resources used:
Text: Minneapolis es una ciudad en el sureste de Minnesota en los Estados Unidos. Por estar tan cerca-- realmente al otro lado del río Misisipi-- Minneapolis es conocido, con Saint Paul, como una de las ciudades gemelas. Se dice que Minnesota es la <<tierra de 10 mil lagos>>, y casi un tercio de ellos están en Minneapolis. Una ciudad con muchas riquezas, tanto debido al ambiente ecológico impresionante como una gran cultura artistica, creo que Minneapolis es uno de los mejores sitios en el mundo. La universidad a que asisto está en Minneapolis y tiene un campo enorme, de casi 25 bolques cuadrados. ...
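Because every line of the example above follows a "field: value" pattern, a downloaded file can be read into a dictionary. The following is a minimal Python sketch, assuming one field per line and that Text is the last field in the file; the function name parse_record is hypothetical and not part of CEDEL2.

```python
def parse_record(raw: str) -> dict[str, str]:
    """Map each metadata field name to its value (assumed layout: 'field: value')."""
    record: dict[str, str] = {}
    lines = raw.splitlines()
    for index, line in enumerate(lines):
        # Split on the first colon only, since values may contain colons.
        key, separator, value = line.partition(":")
        if not separator:
            continue  # skip blank lines or lines without a field name
        key = key.strip()
        if key == "Text":
            # Assumption: everything from 'Text:' to the end of the file is the text.
            record[key] = "\n".join([value.strip(), *lines[index + 1:]]).strip()
            break
        record[key] = value.strip()
    return record


sample = """Subcorpus: Learners
Filename: EN_WR_41_22_7_1_DLM
Placement test score (%): 95.3
Language certificates (type and level):
Text: Minneapolis es una ciudad en el sureste de Minnesota en los Estados Unidos."""

record = parse_record(sample)
print(record["Filename"])  # EN_WR_41_22_7_1_DLM
```

Fields left blank by the participant (such as the language certificates above) come back as empty strings, so they can be told apart from fields that are missing altogether.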

The tables below list every metadata field with a brief description: first the learners’ metadata, then the natives’ metadata.

Table: Learners’ metadata

Metadata Description
FILENAME:

Each file in the corpus has a unique code. For learners, the filename format is:

L1_medium_proficiencyscore(raw)_age_LoI_tasknumber_initials

  • L1: ES Spanish, EN English, GR Greek, PT Portuguese, IT Italian, DE German, NL Dutch, AR Arabic, JP Japanese, CN Chinese, RU Russian.
  • MEDIUM: WR (written), SP (spoken)
  • PROFICIENCY RAW SCORE: Score obtained in the placement test (0 minimum - 43 maximum)
  • AGE: in years.
  • LoI: Length of Instruction in Spanish (i.e., years studying Spanish) in years, e.g.: 7 (seven years) or 8.5 (eight and a half years).
  • TASK NUMBER: number of the task (see below in this table the task number and titles; for additional task details, see the tab ‘User guide’ > ‘Corpus design’).
  • INITIALS: the participant’s initials (e.g., JFK).

For example, the file code EN_WR_37_21_7_14_MG represents a native speaker of English who produced a written task, obtained 37 points (out of 43) in the placement test, is 21 years old, has been learning Spanish for 7 years, did the Chaplin task (task 14) and has the initials MG.
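The learner filename format can be unpacked programmatically. Below is a minimal Python sketch, assuming that all seven fields are present, that they are separated by underscores, and that the initials contain no underscore; the names LearnerFilename and parse_learner_filename are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LearnerFilename:
    l1: str                      # e.g. "EN"
    medium: str                  # "WR" or "SP"
    raw_score: int               # placement test raw score (0-43)
    age: int                     # in years
    years_of_instruction: float  # LoI, e.g. 7 or 8.5
    task_number: int
    initials: str


def parse_learner_filename(name: str) -> LearnerFilename:
    """Split L1_medium_score_age_LoI_tasknumber_initials into typed fields."""
    parts = name.split("_")
    if len(parts) != 7:
        raise ValueError(f"expected 7 underscore-separated fields, got {len(parts)}: {name!r}")
    l1, medium, score, age, loi, task, initials = parts
    return LearnerFilename(l1, medium, int(score), int(age), float(loi), int(task), initials)


info = parse_learner_filename("EN_WR_37_21_7_14_MG")
print(info.raw_score, info.years_of_instruction, info.task_number)  # 37 7.0 14
```

The length of instruction is parsed as a float because the format allows half years such as 8.5.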

YEAR DATA COLLECTION: The year (or range of years) when the data were collected, e.g., 2019 or 2006-2016. Year ranges are given if the actual year of data collection is unknown, which can be the case in the first version of CEDEL2.
PLACEMENT TEST RAW SCORE: The score obtained in the Spanish placement test (0 minimum-43 maximum): University of Wisconsin (1998). The University of Wisconsin College-Level Placement Test: Spanish (Grammar) Form 96M. University of Wisconsin Press.
PLACEMENT TEST % SCORE: The placement test raw score transformed into percentage (0% minimum - 100% maximum).
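As a worked example of this conversion, the sketch below turns a raw score into a percentage. Rounding to one decimal place is an assumption based on the sample record (41 / 43 shown as 95.3); the function name placement_percentage is hypothetical.

```python
MAX_RAW_SCORE = 43  # maximum score on the placement test


def placement_percentage(raw_score: int, max_score: int = MAX_RAW_SCORE) -> float:
    """Convert a placement test raw score into a percentage (one decimal place)."""
    if not 0 <= raw_score <= max_score:
        raise ValueError(f"raw score must be between 0 and {max_score}, got {raw_score}")
    return round(100 * raw_score / max_score, 1)


print(placement_percentage(41))  # 95.3
```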
PROFICIENCY LEVEL:

The classification of the placement test raw score into proficiency categories:

  • Lower beginner: 0-12 points (0%-28%)
  • Upper beginner: 13-20 points (30%-47%)
  • Lower intermediate: 21-28 points (49%-65%)
  • Upper intermediate: 29-35 points (67%-81%)
  • Lower advanced: 36-40 points (84%-93%)
  • Upper advanced: 41-43 points (95%-100%)
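The banding above amounts to a lookup on the raw score. A minimal Python sketch, using the cut-off points listed above (the names BANDS and proficiency_level are hypothetical):

```python
# Upper bound (inclusive) of each band, in ascending order.
BANDS = [
    (12, "Lower beginner"),
    (20, "Upper beginner"),
    (28, "Lower intermediate"),
    (35, "Upper intermediate"),
    (40, "Lower advanced"),
    (43, "Upper advanced"),
]


def proficiency_level(raw_score: int) -> str:
    """Return the proficiency category for a placement test raw score (0-43)."""
    if not 0 <= raw_score <= 43:
        raise ValueError(f"raw score must be between 0 and 43, got {raw_score}")
    for upper_bound, label in BANDS:
        if raw_score <= upper_bound:
            return label
    raise AssertionError("unreachable: every valid score falls in a band")


print(proficiency_level(41))  # Upper advanced
```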
INITIALS: The participant’s initials (e.g., JFK).
SEX: The participant’s sex (Male, Female, Unknown).
AGE: The participant’s chronological age (in years).
SCHOOL/UNIVERSITY/INSTITUTION: The participant’s school or university name, if any.
MAJOR: The participant’s major subject at university, if any.
YEAR AT UNIVERSITY/SCHOOL: The participant’s year or course at school or university, if any.
NATIVE LANGUAGE: The participant’s L1 (native language).
FATHER'S NATIVE LANGUAGE: The native language of the participant’s father.
MOTHER'S NATIVE LANGUAGE: The native language of the participant’s mother.
LANGUAGE(S) SPOKEN AT HOME: The language(s) spoken at home.
AGE SPANISH WAS STARTED (IN YEARS): Age at which the participant started learning Spanish.
YEARS STUDYING SPANISH: Length of Instruction (LoI) in Spanish.
STAY IN SPANISH-SPEAKING COUNTRY (≥1 MONTH): Stay(s) in any Spanish-speaking country that have been longer than one month.
STAY WHERE: Spanish-speaking country of the stay.
STAY WHEN: Year(s) of the stay; or period(s) of the stay; or age of the participant when s/he did the stay.
STAY ABROAD (MONTHS): Length of the stay (in months), e.g., 3.5 months, 24 months, or unknown (for cases where the stay is longer than one month but the participant did not specify the exact length in months).
LANGUAGE CERTIFICATES (TYPE AND LEVEL): Official language certificates held by the participant, if any.
SELF-ASSESSMENT IN SPANISH (SPEAKING):

The participant self-assesses his/her speaking level in Spanish according to a 6-point scale:

  • Lower beginner (A1)
  • Upper beginner (A2)
  • Lower intermediate (B1)
  • Upper intermediate (B2)
  • Lower advanced (C1)
  • Upper advanced (C2)
SELF-ASSESSMENT IN SPANISH (LISTENING):

The participant self-assesses his/her listening level in Spanish according to a 6-point scale:

  • Lower beginner (A1)
  • Upper beginner (A2)
  • Lower intermediate (B1)
  • Upper intermediate (B2)
  • Lower advanced (C1)
  • Upper advanced (C2)
SELF-ASSESSMENT IN SPANISH (READING):

The participant self-assesses his/her reading level in Spanish according to a 6-point scale:

  • Lower beginner (A1)
  • Upper beginner (A2)
  • Lower intermediate (B1)
  • Upper intermediate (B2)
  • Lower advanced (C1)
  • Upper advanced (C2)
SELF-ASSESSMENT IN SPANISH (WRITING):

The participant self-assesses his/her writing level in Spanish according to a 6-point scale:

  • Lower beginner (A1)
  • Upper beginner (A2)
  • Lower intermediate (B1)
  • Upper intermediate (B2)
  • Lower advanced (C1)
  • Upper advanced (C2)
PROFICIENCY (SELF-ASSESSMENT):

The participant’s average self-assessment score across the four skills (speaking, writing, listening, reading) in Spanish, according to a 6-point scale:

  • Lower beginner (A1)
  • Upper beginner (A2)
  • Lower intermediate (B1)
  • Upper intermediate (B2)
  • Lower advanced (C1)
  • Upper advanced (C2)

It is calculated as follows: Each skill is self-scored, as described above, and then an average is obtained from the four scores.
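The calculation can be sketched as follows. The numeric mapping A1 = 1 through C2 = 6 is an assumption, not stated in the description, but it is consistent with the sample learner record, where C1, C1, C1 and B2 yield 4.75. The names CEFR_POINTS, skill_points and average_self_assessment are hypothetical.

```python
import re
from statistics import mean

# Assumed numeric value of each point on the 6-point scale.
CEFR_POINTS = {"A1": 1, "A2": 2, "B1": 3, "B2": 4, "C1": 5, "C2": 6}


def skill_points(label: str) -> int:
    """Extract the CEFR code from a label such as 'Lower advanced (C1)'."""
    match = re.search(r"\(([ABC][12])\)", label)
    if match is None:
        raise ValueError(f"no CEFR level found in {label!r}")
    return CEFR_POINTS[match.group(1)]


def average_self_assessment(labels: list[str]) -> float:
    """Average the self-assessed scores of the four skills."""
    return mean([skill_points(label) for label in labels])


skills = [
    "Lower advanced (C1)",      # speaking
    "Lower advanced (C1)",      # listening
    "Lower advanced (C1)",      # reading
    "Upper intermediate (B2)",  # writing
]
print(average_self_assessment(skills))  # 4.75
```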

ADDITIONAL FOREIGN LANGUAGE(S): Additional foreign languages (other than Spanish) known by the participant, if any.
SELF-ASSESSMENT IN MAIN ADDITIONAL LANGUAGE (SPEAKING):

The participant self-assesses his/her speaking level in the additional foreign language according to a 6-point scale:

  • Lower beginner (A1)
  • Upper beginner (A2)
  • Lower intermediate (B1)
  • Upper intermediate (B2)
  • Lower advanced (C1)
  • Upper advanced (C2)
SELF-ASSESSMENT IN MAIN ADDITIONAL LANGUAGE (LISTENING):

The participant self-assesses his/her listening level in the additional foreign language according to a 6-point scale:

  • Lower beginner (A1)
  • Upper beginner (A2)
  • Lower intermediate (B1)
  • Upper intermediate (B2)
  • Lower advanced (C1)
  • Upper advanced (C2)
SELF-ASSESSMENT IN MAIN ADDITIONAL LANGUAGE (READING):

The participant self-assesses his/her reading level in the additional foreign language according to a 6-point scale:

  • Lower beginner (A1)
  • Upper beginner (A2)
  • Lower intermediate (B1)
  • Upper intermediate (B2)
  • Lower advanced (C1)
  • Upper advanced (C2)
SELF-ASSESSMENT IN MAIN ADDITIONAL LANGUAGE (WRITING):

The participant self-assesses his/her writing level in the additional foreign language according to a 6-point scale:

  • Lower beginner (A1)
  • Upper beginner (A2)
  • Lower intermediate (B1)
  • Upper intermediate (B2)
  • Lower advanced (C1)
  • Upper advanced (C2)
MEDIUM (WRITTEN/SPOKEN):

The medium in which the task was produced:

  • Written
  • Spoken
TASK NUMBER: This is the number of the task (1 to 14).
TASK TITLE:

This is the title of the task:

  1. Region where you live (Describe the region where you live)
  2. Famous person (Talk about a famous person)
  3. Film (Describe a film you have recently seen)
  4. Last year holidays (What did you do last year during your holidays?)
  5. Future plans (Which are your plans for the future?)
  6. Recent trip (Describe a trip you have recently made)
  7. Experience (Narrate an experience you have lived)
  8. Terrorism (Talk about the problem of terrorism in the world)
  9. Anti-smoking law (What do you think about the anti-tobacco law?)
  10. Gay couples (Do you think gay couples have the right to get married and adopt children?)
  11. Marijuana legalization (Do you think marihuana should be legalised?)
  12. Immigration (Analyse the main aspects of immigration)
  13. Frog (Describe the picture-based frog story)
  14. Chaplin (Describe the Charles Chaplin silent video clip)
TEXT: This is the text produced in the task (either the written text or the transcription of the spoken text).
WRITING/AUDIO DETAILS:

These are additional details about where the task was collected:

  • written_online (a written task that was collected via online forms on the internet).
  • written_offline_classroom (a handwritten task that was collected on pen and paper in the classroom and later transcribed).
  • spoken_online (a spoken task that was self-recorded by the learner on his/her computer while at home).
  • spoken_offline_classroom (a spoken task that was recorded by an assistant in the classroom).
  • spoken_offline_lab (a spoken task that was recorded by an assistant in a quiet lab with specialised recording equipment: Audio Technica AT2020: Cardioid condenser microphone, 74 dB, 1 kHz at 1 Pa). These audio files are of the highest quality in the corpus and are ideal for phoneticians and phonologists.
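Each of the codes above combines the medium, the collection mode and, for offline tasks, the setting. A minimal Python sketch that splits a code into these parts, assuming only the five values listed above occur (the names KNOWN_CODES and split_collection_code are hypothetical):

```python
from typing import Optional, Tuple

KNOWN_CODES = {
    "written_online",
    "written_offline_classroom",
    "spoken_online",
    "spoken_offline_classroom",
    "spoken_offline_lab",
}


def split_collection_code(code: str) -> Tuple[str, str, Optional[str]]:
    """Return (medium, mode, setting); setting is None for online tasks."""
    if code not in KNOWN_CODES:
        raise ValueError(f"unknown writing/audio details code: {code!r}")
    medium, mode, *setting = code.split("_")
    return medium, mode, (setting[0] if setting else None)


print(split_collection_code("spoken_offline_lab"))  # ('spoken', 'offline', 'lab')
```

This makes it straightforward, for instance, to select only the lab recordings when audio quality matters.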
MINUTES TAKEN TO COMPLETE THE TASK: The time taken to complete the task as self-reported by the participant (sometimes there is no self-reported information in this metadata).
WHERE WAS THE TASK DONE:

The location where the task was done:

  • Inside the classroom.
  • Outside the classroom.
  • Both inside and outside the classroom (i.e., the task was done in the classroom, for example, and then finished off at home).
RESOURCES USED:

The resources the participant used to complete the task, if any:

  • Help from a Spanish native speaker
  • Bilingual dictionary (Spanish/Learners’ L1)
  • Monolingual dictionary (Spanish)
  • Spellchecker
  • Grammar book
  • Background readings about the task topic (newspapers, internet, TV, etc.)

Table: Natives’ metadata

Metadata Description
FILENAME:

Each file in the corpus has a unique code. For natives, the filename format is:

L1_medium_age_tasknumber_initials

  • L1: ES Spanish, EN English, GR Greek, PT Portuguese, AR Arabic, JP Japanese
  • MEDIUM:
    • WR (written)
    • SP (spoken)
  • AGE: in years.
  • TASK NUMBER: number of the task (see below in this table the task number and titles; for additional task details, see the tab ‘User guide’ > ‘Corpus design’).
  • INITIALS: the participant’s initials.

For example, the file code ES_WR_25_13_FBL represents a native speaker of Spanish who produced a written task, is 25 years old, did task number 13 (Frog) and has the initials FBL.
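Native filenames have five fields rather than the learners' seven, since they carry no placement score or length of instruction. A minimal Python sketch, assuming underscore-separated fields and initials without underscores (the names NativeFilename and parse_native_filename are hypothetical):

```python
from typing import NamedTuple


class NativeFilename(NamedTuple):
    l1: str           # e.g. "ES"
    medium: str       # "WR" or "SP"
    age: int          # in years
    task_number: int
    initials: str


def parse_native_filename(name: str) -> NativeFilename:
    """Split L1_medium_age_tasknumber_initials into typed fields."""
    parts = name.split("_")
    if len(parts) != 5:
        raise ValueError(f"expected 5 underscore-separated fields, got {len(parts)}: {name!r}")
    l1, medium, age, task_number, initials = parts
    return NativeFilename(l1, medium, int(age), int(task_number), initials)


native = parse_native_filename("ES_WR_25_13_FBL")
print(native.age, native.task_number)  # 25 13
```

Under these assumptions, the number of fields alone is enough to tell a native file from a learner file.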

YEAR DATA COLLECTION: The year (or range in years) when the data were collected. The format is either year (e.g., 2019 for data collected in 2019) or year range (e.g., 2006-2016 for data collected during a year range since the actual year of data collection is not available, as is the case in the first version of CEDEL2).
INITIALS: The participant’s initials (e.g., JFK).
SEX: The participant’s sex (Male, Female, Unknown).
AGE: The participant’s chronological age (in years).
SCHOOL/UNIVERSITY/INSTITUTION: The participant’s school or university name, if any.
MAJOR: The participant’s major subject at university, if any.
YEAR AT UNIVERSITY/SCHOOL: The participant’s year or course at school or university, if any.
NATIVE LANGUAGE: The participant’s L1 (native language).
VARIETY OF NATIVE LANGUAGE (COUNTRY): The variety of the participant’s L1 (e.g., Peninsular Spanish, Mexican Spanish, etc).
FATHER'S NATIVE LANGUAGE: The native language of the participant’s father.
MOTHER'S NATIVE LANGUAGE: The native language of the participant’s mother.
LANGUAGE(S) SPOKEN AT HOME: The language(s) spoken at home.
ANY FOREIGN LANGUAGE?: Whether the participant knows a foreign language (yes, no).
FOREIGN LANGUAGE: Foreign language known by the participant, if any.
SELF-ASSESSMENT FOREIGN LANGUAGE (SPEAKING):

The participant self-assesses his/her speaking level in the foreign language according to a 6-point scale:

  • Lower beginner (A1)
  • Upper beginner (A2)
  • Lower intermediate (B1)
  • Upper intermediate (B2)
  • Lower advanced (C1)
  • Upper advanced (C2)
SELF-ASSESSMENT FOREIGN LANGUAGE (LISTENING):

The participant self-assesses his/her listening level in the foreign language according to a 6-point scale:

  • Lower beginner (A1)
  • Upper beginner (A2)
  • Lower intermediate (B1)
  • Upper intermediate (B2)
  • Lower advanced (C1)
  • Upper advanced (C2)
SELF-ASSESSMENT FOREIGN LANGUAGE (READING):

The participant self-assesses his/her reading level in the foreign language according to a 6-point scale:

  • Lower beginner (A1)
  • Upper beginner (A2)
  • Lower intermediate (B1)
  • Upper intermediate (B2)
  • Lower advanced (C1)
  • Upper advanced (C2)
SELF-ASSESSMENT FOREIGN LANGUAGE (WRITING):

The participant self-assesses his/her writing level in the foreign language according to a 6-point scale:

  • Lower beginner (A1)
  • Upper beginner (A2)
  • Lower intermediate (B1)
  • Upper intermediate (B2)
  • Lower advanced (C1)
  • Upper advanced (C2)
ADDITIONAL FOREIGN LANGUAGE: Additional foreign language known by the participant, if any.
SELF-ASSESSMENT IN ADDITIONAL FOREIGN LANGUAGE (SPEAKING):

The participant self-assesses his/her speaking level in the additional foreign language according to a 6-point scale:

  • Lower beginner (A1)
  • Upper beginner (A2)
  • Lower intermediate (B1)
  • Upper intermediate (B2)
  • Lower advanced (C1)
  • Upper advanced (C2)
SELF-ASSESSMENT IN ADDITIONAL FOREIGN LANGUAGE (LISTENING):

The participant self-assesses his/her listening level in the additional foreign language according to a 6-point scale:

  • Lower beginner (A1)
  • Upper beginner (A2)
  • Lower intermediate (B1)
  • Upper intermediate (B2)
  • Lower advanced (C1)
  • Upper advanced (C2)
SELF-ASSESSMENT IN ADDITIONAL FOREIGN LANGUAGE (READING):

The participant self-assesses his/her reading level in the additional foreign language according to a 6-point scale:

  • Lower beginner (A1)
  • Upper beginner (A2)
  • Lower intermediate (B1)
  • Upper intermediate (B2)
  • Lower advanced (C1)
  • Upper advanced (C2)
SELF-ASSESSMENT IN ADDITIONAL FOREIGN LANGUAGE (WRITING):

The participant self-assesses his/her writing level in the additional foreign language according to a 6-point scale:

  • Lower beginner (A1)
  • Upper beginner (A2)
  • Lower intermediate (B1)
  • Upper intermediate (B2)
  • Lower advanced (C1)
  • Upper advanced (C2)
MEDIUM (WRITTEN/SPOKEN):

The medium in which the task was produced:

  • Written
  • Spoken
TASK NUMBER: This is the number of the task (1 to 14).
TASK TITLE:

This is the title of the task:

  1. Region where you live (Describe the region where you live)
  2. Famous person (Talk about a famous person)
  3. Film (Describe a film you have recently seen)
  4. Last year holidays (What did you do last year during your holidays?)
  5. Future plans (Which are your plans for the future?)
  6. Recent trip (Describe a trip you have recently made)
  7. Experience (Narrate an experience you have lived)
  8. Terrorism (Talk about the problem of terrorism in the world)
  9. Anti-smoking law (What do you think about the anti-tobacco law?)
  10. Gay couples (Do you think gay couples have the right to get married and adopt children?)
  11. Marijuana legalization (Do you think marihuana should be legalised?)
  12. Immigration (Analyse the main aspects of immigration)
  13. Frog (Describe the picture-based frog story)
  14. Chaplin (Describe the Charles Chaplin silent video clip)
TEXT: This is the text produced in the task (either the written text or the transcription if the task was in spoken format).
WRITING/AUDIO DETAILS:

These are additional details about where the task was collected:

  • written_online (a written task that was collected via online forms on the internet).
  • written_offline_classroom (a handwritten task that was collected on pen and paper in the classroom and later transcribed).
  • spoken_online (a spoken task that was self-recorded by the participant on his/her computer while at home).
  • spoken_offline_classroom (a spoken task that was recorded by an assistant in the classroom).
  • spoken_offline_lab (a spoken task that was recorded by an assistant in a quiet lab with specialised recording equipment: Audio Technica AT2020: Cardioid condenser microphone, 74 dB, 1 kHz at 1 Pa). These audio files are of the highest quality in the corpus and are ideal for phoneticians and phonologists.
MINUTES TAKEN TO COMPLETE THE TASK: The time taken to complete the task as self-reported by the participant (sometimes there is no self-reported information in this metadata).
RESOURCES USED:

The resources the participant used to complete the task, if any:

  • Monolingual dictionary (in the participant’s L1)
  • Spellchecker
  • Grammar book
  • Background readings about the task topic (newspapers, internet, TV, etc.)