CEDEL2: Corpus Escrito del Español L2 (version 2)

Metadata

When you download the corpus texts with their metadata, each downloaded file lists the metadata fields on the left and their values on the right.

The tables below present a full list of metadata with a brief explanatory description.

Table: learner’s metadata

Metadata Description
FILENAME:

Each file in the corpus has a unique code. For learners, the filename format is:

L1_medium_proficiencyscore(raw)_age_LoI_tasknumber_initials

  • L1: ES Spanish, EN English, GR Greek, PT Portuguese, IT Italian, DE German, NL Dutch, AR Arabic, JP Japanese, CN Chinese, RU Russian.
  • MEDIUM:
    • WR (written)
    • SP (spoken)
  • PROFICIENCY RAW SCORE: Score obtained in the placement test (0 minimum - 43 maximum)
  • AGE: in years.
  • LoI: Length of Instruction in Spanish (i.e., the number of years studying Spanish), e.g., 7 (seven years) or 8.5 (eight and a half years).
  • TASK NUMBER: number of the task (see the task numbers and titles below in this table; for additional task details, see the tab ‘User guide’ > ‘Corpus design’).
  • INITIALS: the participant’s initials (e.g., JFK).

For example, the file code EN_WR_37_21_7_14_MG represents an English native, who produced a written task, who obtained 37 points in the placement test (out of 43), who is 21 years old, who has been learning Spanish for 7 years, who did the Chaplin task (task # 14) and whose initials are MG.
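The filename format above can be split into its fields programmatically. The following is a minimal sketch, not part of the corpus tooling: the names LearnerFile and parse_learner_filename are hypothetical, and it assumes that the code carries no file extension and that every numeric field is actually filled in.

```python
from dataclasses import dataclass

# L1 and medium codes as listed in the filename description.
L1_CODES = {
    "ES": "Spanish", "EN": "English", "GR": "Greek", "PT": "Portuguese",
    "IT": "Italian", "DE": "German", "NL": "Dutch", "AR": "Arabic",
    "JP": "Japanese", "CN": "Chinese", "RU": "Russian",
}
MEDIA = {"WR": "written", "SP": "spoken"}


@dataclass(frozen=True)
class LearnerFile:  # hypothetical container, for illustration only
    l1: str
    medium: str
    raw_score: int      # placement test raw score (0-43)
    age: int            # in years
    loi_years: float    # Length of Instruction, e.g. 7 or 8.5
    task_number: int
    initials: str


def parse_learner_filename(code: str) -> LearnerFile:
    """Split L1_medium_score_age_LoI_tasknumber_initials into fields."""
    parts = code.split("_")
    if len(parts) != 7:
        raise ValueError(f"expected 7 fields, got {len(parts)}: {code!r}")
    l1, medium, score, age, loi, task, initials = parts
    if l1 not in L1_CODES:
        raise ValueError(f"unknown L1 code: {l1!r}")
    if medium not in MEDIA:
        raise ValueError(f"unknown medium code: {medium!r}")
    return LearnerFile(
        l1=L1_CODES[l1],
        medium=MEDIA[medium],
        raw_score=int(score),
        age=int(age),
        loi_years=float(loi),
        task_number=int(task),
        initials=initials,
    )


info = parse_learner_filename("EN_WR_37_21_7_14_MG")
print(info)
```

Files whose fields are missing or non-numeric would raise a ValueError in this sketch, so real data may need more lenient handling.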

YEAR DATA COLLECTION: The year (or range of years) when the data were collected. The format is either a single year (e.g., 2019 for data collected in 2019) or a year range (e.g., 2006-2016), which is used when the actual year of data collection is not available, as is the case in the first version of CEDEL2.
PLACEMENT TEST RAW SCORE: The score obtained in the Spanish placement test (0 minimum-43 maximum): University of Wisconsin (1998). The University of Wisconsin College-Level Placement Test: Spanish (Grammar) Form 96M. University of Wisconsin Press.
PLACEMENT TEST % SCORE: The placement test raw score transformed into percentage (0% minimum - 100% maximum).
PROFICIENCY LEVEL:

The classification of the placement test raw score into proficiency categories:

  • Lower beginner: 0-12 points (0%-28%)
  • Upper beginner: 13-20 points (30%-47%)
  • Lower intermediate: 21-28 points (49%-65%)
  • Upper intermediate: 29-35 points (67%-81%)
  • Lower advanced: 36-40 points (84%-93%)
  • Upper advanced: 41-43 points (95%-100%)
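The score bands above, together with the percentage conversion, can be expressed as a small lookup. This is an illustrative sketch only: the function names percent_score and proficiency_level are hypothetical, and the percentage is computed as a plain proportion of the 43-point maximum.

```python
MAX_SCORE = 43

# (lowest raw score, highest raw score, category), as listed above.
BANDS = [
    (0, 12, "Lower beginner"),
    (13, 20, "Upper beginner"),
    (21, 28, "Lower intermediate"),
    (29, 35, "Upper intermediate"),
    (36, 40, "Lower advanced"),
    (41, 43, "Upper advanced"),
]


def percent_score(raw: int) -> float:
    """Convert a raw placement score (0-43) into a percentage (0-100)."""
    if not 0 <= raw <= MAX_SCORE:
        raise ValueError(f"raw score out of range: {raw}")
    return 100 * raw / MAX_SCORE


def proficiency_level(raw: int) -> str:
    """Return the proficiency category for a raw placement score."""
    for low, high, label in BANDS:
        if low <= raw <= high:
            return label
    raise ValueError(f"raw score out of range: {raw}")


print(proficiency_level(37), round(percent_score(37)))
```

For the example file EN_WR_37_21_7_14_MG, a raw score of 37 falls in the 36-40 band.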
INITIALS: The participant’s initials (e.g., JFK).
SEX: The participant’s sex (Male, Female, Unknown).
AGE: The participant’s chronological age (in years).
SCHOOL/UNIVERSITY/INSTITUTION: The participant’s school or university name, if any.
MAJOR: The participant’s major subject at university, if any.
YEAR AT UNIVERSITY/SCHOOL: The participant’s year or course at school or university, if any.
NATIVE LANGUAGE: The participant’s L1 (native language).
FATHER'S NATIVE LANGUAGE: The native language of the participant’s father.
MOTHER'S NATIVE LANGUAGE: The native language of the participant’s mother.
LANGUAGE(S) SPOKEN AT HOME: The language(s) spoken at home.
AGE SPANISH WAS STARTED (IN YEARS): Age at which the participant started learning Spanish.
YEARS STUDYING SPANISH: Length of Instruction (LoI) in Spanish.
STAY IN SPANISH-SPEAKING COUNTRY (≥1 MONTH): Stay(s) in any Spanish-speaking country that have been longer than one month.
STAY WHERE: Spanish-speaking country of the stay.
STAY WHEN: Year(s) of the stay; or period(s) of the stay; or age of the participant when s/he did the stay.
STAY HOW LONG (MONTHS): Length of the stay (in months), e.g., 3.5 months, 24 months, or unknown (for cases where the stay is longer than one month but the participant did not specify the exact length in months).
LANGUAGE CERTIFICATES (TYPE AND LEVEL): Official language certificates held by the participant, if any.
SELF-ASSESSMENT IN SPANISH (SPEAKING):

The participant self-assesses his/her speaking level in Spanish according to a 6-point scale:

  • Lower beginner (A1)
  • Upper beginner (A2)
  • Lower intermediate (B1)
  • Upper intermediate (B2)
  • Lower advanced (C1)
  • Upper advanced (C2)
SELF-ASSESSMENT IN SPANISH (LISTENING):

The participant self-assesses his/her listening level in Spanish according to a 6-point scale:

  • Lower beginner (A1)
  • Upper beginner (A2)
  • Lower intermediate (B1)
  • Upper intermediate (B2)
  • Lower advanced (C1)
  • Upper advanced (C2)
SELF-ASSESSMENT IN SPANISH (READING):

The participant self-assesses his/her reading level in Spanish according to a 6-point scale:

  • Lower beginner (A1)
  • Upper beginner (A2)
  • Lower intermediate (B1)
  • Upper intermediate (B2)
  • Lower advanced (C1)
  • Upper advanced (C2)
SELF-ASSESSMENT IN SPANISH (WRITING):

The participant self-assesses his/her writing level in Spanish according to a 6-point scale:

  • Lower beginner (A1)
  • Upper beginner (A2)
  • Lower intermediate (B1)
  • Upper intermediate (B2)
  • Lower advanced (C1)
  • Upper advanced (C2)
PROFICIENCY (SELF-ASSESSMENT):

The participant’s average self-assessment score across the four skills (speaking, writing, listening, reading) in Spanish, according to a 6-point scale:

  • Lower beginner (A1)
  • Upper beginner (A2)
  • Lower intermediate (B1)
  • Upper intermediate (B2)
  • Lower advanced (C1)
  • Upper advanced (C2)

It is calculated as follows: Each skill is self-scored, as described above, and then an average is obtained from the four scores.
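The averaging step can be illustrated as follows. This is a sketch under stated assumptions, not the corpus team's procedure: it assumes the six scale points correspond to the CEFR levels A1 to C2 and are coded numerically from 1 to 6, and it returns the plain arithmetic mean without mapping it back to a category, since the rounding rule is not specified here. The function name average_self_assessment is hypothetical.

```python
from statistics import mean

# Assumed numeric coding of the 6-point scale (1 = A1 ... 6 = C2).
SCALE = {"A1": 1, "A2": 2, "B1": 3, "B2": 4, "C1": 5, "C2": 6}


def average_self_assessment(speaking: str, listening: str,
                            reading: str, writing: str) -> float:
    """Average the four self-assessed skill levels on the 1-6 coding."""
    levels = (speaking, listening, reading, writing)
    return mean(SCALE[level] for level in levels)


# A participant who rates speaking and listening B2, reading B1, writing C1.
print(average_self_assessment("B2", "B2", "B1", "C1"))
```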

ADDITIONAL FOREIGN LANGUAGE(S): Additional foreign languages (other than Spanish) known by the participant, if any.
SELF-ASSESSMENT IN MAIN ADDITIONAL LANGUAGE (SPEAKING):

The participant self-assesses his/her speaking level in the additional foreign language according to a 6-point scale:

  • Lower beginner (A1)
  • Upper beginner (A2)
  • Lower intermediate (B1)
  • Upper intermediate (B2)
  • Lower advanced (C1)
  • Upper advanced (C2)
SELF-ASSESSMENT IN MAIN ADDITIONAL LANGUAGE (LISTENING):

The participant self-assesses his/her listening level in the additional foreign language according to a 6-point scale:

  • Lower beginner (A1)
  • Upper beginner (A2)
  • Lower intermediate (B1)
  • Upper intermediate (B2)
  • Lower advanced (C1)
  • Upper advanced (C2)
SELF-ASSESSMENT IN MAIN ADDITIONAL LANGUAGE (READING):

The participant self-assesses his/her reading level in the additional foreign language according to a 6-point scale:

  • Lower beginner (A1)
  • Upper beginner (A2)
  • Lower intermediate (B1)
  • Upper intermediate (B2)
  • Lower advanced (C1)
  • Upper advanced (C2)
SELF-ASSESSMENT IN MAIN ADDITIONAL LANGUAGE (WRITING):

The participant self-assesses his/her writing level in the additional foreign language according to a 6-point scale:

  • Lower beginner (A1)
  • Upper beginner (A2)
  • Lower intermediate (B1)
  • Upper intermediate (B2)
  • Lower advanced (C1)
  • Upper advanced (C2)
MEDIUM (WRITTEN/SPOKEN):

The medium in which the task was produced:

  • Written
  • Spoken
TASK NUMBER: This is the number of the task (1 to 14).
TASK TITLE:

This is the title of the task:

  1. Region where you live (Describe the region where you live)
  2. Famous person (Talk about a famous person)
  3. Film (Describe a film you have recently seen)
  4. Recent trip (What did you do last year during your holidays?)
  5. Future plans (Which are your plans for the future?)
  6. Recent trip (Describe a trip you have recently made)
  7. Experience (Narrate an experience you have lived)
  8. Terrorism (Talk about the problem of terrorism in the world)
  9. Anti-smoking law (What do you think about the anti-tobacco law?)
  10. Gay couples (Do you think gay couples have the right to get married and adopt children?)
  11. Marijuana legalization (Do you think marihuana should be legalised?)
  12. Immigration (Analyse the main aspects of immigration)
  13. Frog (Describe the picture-based frog story)
  14. Chaplin (Describe the Charles Chaplin silent video clip)
TEXT: This is the text produced in the task (either the written text or the transcription if the task was in spoken format).
WRITING/AUDIO DETAILS:

These are additional details about where the task was collected:

  • written_online (a written task that was collected via online forms on the internet).
  • written_offline_classroom (a handwritten task that was collected on pen and paper in the classroom and later transcribed).
  • spoken_online (a spoken task that was self-recorded by the learner on his/her computer while at home).
  • spoken_offline_classroom (a spoken task that was recorded by an assistant in the classroom).
  • spoken_offline_lab (a spoken task that was recorded by an assistant in a quiet lab with specialised recording equipment: Audio Technica AT2020: Cardioid condenser microphone, 74 dB, 1 kHz at 1 Pa). These audio files are of the highest quality in the corpus and are ideal for phoneticians and phonologists.
MINUTES TAKEN TO COMPLETE THE TASK: The time taken to complete the task as self-reported by the participant (sometimes there is no self-reported information in this metadata).
WHERE WAS THE TASK DONE:

The location where the task was done:

  • Inside the classroom.
  • Outside the classroom.
  • Both inside and outside the classroom (i.e., the task was done in the classroom, for example, and then finished off at home).
RESOURCES USED:

The resources the participant used to complete the task, if any:

  • Help from a Spanish native speaker
  • Bilingual dictionary (Spanish/Learners’ L1)
  • Monolingual dictionary (Spanish)
  • Spellchecker
  • Grammar book
  • Background readings about the task topic (newspapers, internet, TV, etc.)

Table: Native’s metadata

Metadata Description
FILENAME:

Each file in the corpus has a unique code. For natives, the filename format is:

L1_medium_age_tasknumber_initials

  • L1: ES Spanish, EN English, GR Greek, PT Portuguese, AR Arabic, JP Japanese
  • MEDIUM:
    • WR (written)
    • SP (spoken)
  • AGE: in years.
  • TASK NUMBER: number of the task (see the task numbers and titles below in this table; for additional task details, see the tab ‘User guide’ > ‘Corpus design’).
  • INITIALS: the participant’s initials.

For example, the file code ES_WR_25_13_FBL represents a Spanish native who produced a written task, who is 25 years old, who did task number 13 (Frog) and whose initials are FBL.
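The native filename format has five fields, with no proficiency score and no length of instruction. A minimal sketch of how it might be split is shown below; the names NativeFile and parse_native_filename are hypothetical, and the sketch assumes the code has no file extension and that the age and task number are always numeric.

```python
from typing import NamedTuple

# L1 and medium codes as listed for native files.
NATIVE_L1_CODES = {
    "ES": "Spanish", "EN": "English", "GR": "Greek",
    "PT": "Portuguese", "AR": "Arabic", "JP": "Japanese",
}
MEDIA = {"WR": "written", "SP": "spoken"}


class NativeFile(NamedTuple):  # hypothetical container, for illustration
    l1: str
    medium: str
    age: int
    task_number: int
    initials: str


def parse_native_filename(code: str) -> NativeFile:
    """Split L1_medium_age_tasknumber_initials into fields."""
    parts = code.split("_")
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {code!r}")
    l1, medium, age, task, initials = parts
    if l1 not in NATIVE_L1_CODES:
        raise ValueError(f"unknown L1 code: {l1!r}")
    if medium not in MEDIA:
        raise ValueError(f"unknown medium code: {medium!r}")
    return NativeFile(NATIVE_L1_CODES[l1], MEDIA[medium],
                      int(age), int(task), initials)


print(parse_native_filename("ES_WR_25_13_FBL"))
```

Because learner codes have seven fields and native codes have five, the field count alone is enough to tell the two formats apart in this sketch.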

YEAR DATA COLLECTION: The year (or range of years) when the data were collected. The format is either a single year (e.g., 2019 for data collected in 2019) or a year range (e.g., 2006-2016), which is used when the actual year of data collection is not available, as is the case in the first version of CEDEL2.
INITIALS: The participant’s initials (e.g., JFK).
SEX: The participant’s sex (Male, Female, Unknown).
AGE: The participant’s chronological age (in years).
SCHOOL/UNIVERSITY/INSTITUTION: The participant’s school or university name, if any.
MAJOR: The participant’s major subject at university, if any.
YEAR AT UNIVERSITY/SCHOOL: The participant’s year or course at school or university, if any.
NATIVE LANGUAGE: The participant’s L1 (native language).
VARIETY OF NATIVE LANGUAGE (COUNTRY): The variety of the participant’s L1 (e.g., Peninsular Spanish, Mexican Spanish, etc).
FATHER'S NATIVE LANGUAGE: The native language of the participant’s father.
MOTHER'S NATIVE LANGUAGE: The native language of the participant’s mother.
LANGUAGE(S) SPOKEN AT HOME: The language(s) spoken at home.
ANY FOREIGN LANGUAGE?: Whether the participant knows a foreign language (yes, no).
FOREIGN LANGUAGE: Foreign language known by the participant, if any.
SELF-ASSESSMENT FOREIGN LANGUAGE (SPEAKING):

The participant self-assesses his/her speaking level in the foreign language according to a 6-point scale:

  • Lower beginner (A1)
  • Upper beginner (A2)
  • Lower intermediate (B1)
  • Upper intermediate (B2)
  • Lower advanced (C1)
  • Upper advanced (C2)
SELF-ASSESSMENT FOREIGN LANGUAGE (LISTENING):

The participant self-assesses his/her listening level in the foreign language according to a 6-point scale:

  • Lower beginner (A1)
  • Upper beginner (A2)
  • Lower intermediate (B1)
  • Upper intermediate (B2)
  • Lower advanced (C1)
  • Upper advanced (C2)
SELF-ASSESSMENT FOREIGN LANGUAGE (READING):

The participant self-assesses his/her reading level in the foreign language according to a 6-point scale:

  • Lower beginner (A1)
  • Upper beginner (A2)
  • Lower intermediate (B1)
  • Upper intermediate (B2)
  • Lower advanced (C1)
  • Upper advanced (C2)
SELF-ASSESSMENT FOREIGN LANGUAGE (WRITING):

The participant self-assesses his/her writing level in the foreign language according to a 6-point scale:

  • Lower beginner (A1)
  • Upper beginner (A2)
  • Lower intermediate (B1)
  • Upper intermediate (B2)
  • Lower advanced (C1)
  • Upper advanced (C2)
ADDITIONAL FOREIGN LANGUAGE: Additional foreign language known by the participant, if any.
SELF-ASSESSMENT IN ADDITIONAL FOREIGN LANGUAGE (SPEAKING):

The participant self-assesses his/her speaking level in the additional foreign language according to a 6-point scale:

  • Lower beginner (A1)
  • Upper beginner (A2)
  • Lower intermediate (B1)
  • Upper intermediate (B2)
  • Lower advanced (C1)
  • Upper advanced (C2)
SELF-ASSESSMENT IN ADDITIONAL FOREIGN LANGUAGE (LISTENING):

The participant self-assesses his/her listening level in the additional foreign language according to a 6-point scale:

  • Lower beginner (A1)
  • Upper beginner (A2)
  • Lower intermediate (B1)
  • Upper intermediate (B2)
  • Lower advanced (C1)
  • Upper advanced (C2)
SELF-ASSESSMENT IN ADDITIONAL FOREIGN LANGUAGE (READING):

The participant self-assesses his/her reading level in the additional foreign language according to a 6-point scale:

  • Lower beginner (A1)
  • Upper beginner (A2)
  • Lower intermediate (B1)
  • Upper intermediate (B2)
  • Lower advanced (C1)
  • Upper advanced (C2)
SELF-ASSESSMENT IN ADDITIONAL FOREIGN LANGUAGE (WRITING):

The participant self-assesses his/her writing level in the additional foreign language according to a 6-point scale:

  • Lower beginner (A1)
  • Upper beginner (A2)
  • Lower intermediate (B1)
  • Upper intermediate (B2)
  • Lower advanced (C1)
  • Upper advanced (C2)
MEDIUM (WRITTEN/SPOKEN):

The medium in which the task was produced:

  • Written
  • Spoken
TASK NUMBER: This is the number of the task (1 to 14).
TASK TITLE:

This is the title of the task:

  1. Region where you live (Describe the region where you live)
  2. Famous person (Talk about a famous person)
  3. Film (Describe a film you have recently seen)
  4. Recent trip (What did you do last year during your holidays?)
  5. Future plans (Which are your plans for the future?)
  6. Recent trip (Describe a trip you have recently made)
  7. Experience (Narrate an experience you have lived)
  8. Terrorism (Talk about the problem of terrorism in the world)
  9. Anti-smoking law (What do you think about the anti-tobacco law?)
  10. Gay couples (Do you think gay couples have the right to get married and adopt children?)
  11. Marijuana legalization (Do you think marihuana should be legalised?)
  12. Immigration (Analyse the main aspects of immigration)
  13. Frog (Describe the picture-based frog story)
  14. Chaplin (Describe the Charles Chaplin silent video clip)
TEXT: This is the text produced in the task (either the written text or the transcription if the task was in spoken format).
WRITING/AUDIO DETAILS:

These are additional details about where the task was collected:

  • written_online (a written task that was collected via online forms on the internet).
  • written_offline_classroom (a handwritten task that was collected on pen and paper in the classroom and later transcribed).
  • spoken_online (a spoken task that was self-recorded by the participant on his/her computer while at home).
  • spoken_offline_classroom (a spoken task that was recorded by an assistant in the classroom).
  • spoken_offline_lab (a spoken task that was recorded by an assistant in a quiet lab with specialised recording equipment: Audio Technica AT2020: Cardioid condenser microphone, 74 dB, 1 kHz at 1 Pa). These audio files are of the highest quality in the corpus and are ideal for phoneticians and phonologists.
MINUTES TAKEN TO COMPLETE THE TASK: The time taken to complete the task as self-reported by the participant (sometimes there is no self-reported information in this metadata).
RESOURCES USED:

The resources the participant used to complete the task, if any:

  • Help from a Spanish native speaker
  • Bilingual dictionary (Spanish/Learners’ L1)
  • Monolingual dictionary (Spanish)
  • Spellchecker
  • Grammar book
  • Background readings about the task topic (newspapers, internet, TV, etc.)