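2024 Summer National Conference of 한국코퍼스언어학회 (Korean Association for Corpus Linguistics):
Building and Using Corpora in the Age of Artificial Intelligence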

Effect of Speed and Breakdown Features
on Pronunciation Scoring:

Learning from an AI-Powered L2 English Speech Corpus


Department of English Language and Literature, Sungshin Women's University
윤태진 (Tae-Jin Yoon)

Part 1: Spoken corpora?

Linguistic Corpora


A collection of linguistic data, either written texts or a transcription of recorded speech, which can be used as a starting-point of linguistic description or as a means of verifying hypotheses about a language (corpus linguistics).

David Crystal. A Dictionary of Linguistics and Phonetics, 2003

Spoken Corpora


Corpora of spoken language contain transcriptions of spontaneous or planned speech, such as broadcast news or elicited narratives and dialogues. They are often aligned with the accompanying recordings. They are an invaluable resource for various kinds of linguistic research, such as phonology, conversational analysis, and dialectology. Such corpora are carefully sampled and rich in sociodemographic metadata.

CLARIN: The research infrastructure for language as social and cultural data

Corpus phonetics and phonology:

What I have been doing with corpora
alone or with collaborators...

Part 2: In the age of AI,
do we need corpora of Korean learners of English?

The benefits of speaking SLOW & FAST.

Replace your UMMs & AHHs with this...

Summary


  • 1. Higher script levels are associated with higher articulation scores, suggesting that more difficult scripts may elicit more careful and precise articulation from learners.
  • 2. Gender had a significant effect: male speakers received lower articulation scores on average.
  • 3. Articulation rate emerged as a clear indicator of proficiency, supporting the hypothesis that faster articulation correlates with higher language competence.
  • 4. The silence mean (mean pause duration) showed a more complex relationship with overall articulation performance; longer pauses may indicate more deliberate or carefully planned speech.
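The speed and breakdown features in the summary can be computed from a time-aligned transcript. The talk does not describe its exact extraction procedure, so what follows is only a minimal Python sketch under stated assumptions: the `Interval` class and `fluency_features` function are hypothetical, each speech stretch is assumed to come with a syllable count, intervals are assumed to be sorted and contiguous with leading and trailing silence already trimmed, and the 0.25 s minimum pause length is an assumed threshold rather than a value taken from this study.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    """Hypothetical aligned interval: a silent pause or a stretch of speech."""
    start: float        # seconds
    end: float          # seconds
    label: str          # "sil" marks silence; any other label is speech
    syllables: int = 0  # syllable count (assumed available for speech intervals)

    @property
    def duration(self) -> float:
        return self.end - self.start


def fluency_features(intervals, min_pause=0.25):
    """Compute speed and breakdown measures for one utterance.

    Assumes intervals are sorted, contiguous, and trimmed of leading and
    trailing silence. Silences shorter than min_pause (an assumed threshold)
    are treated as part of phonation time rather than as pauses.
    """
    total_time = intervals[-1].end - intervals[0].start
    syllables = sum(iv.syllables for iv in intervals)
    pauses = [iv.duration for iv in intervals
              if iv.label == "sil" and iv.duration >= min_pause]
    phonation_time = total_time - sum(pauses)
    if phonation_time <= 0:
        raise ValueError("utterance contains no speech")
    return {
        # speed: syllables per second of speaking time, pauses excluded
        "articulation_rate": syllables / phonation_time,
        # speed: syllables per second of total time, pauses included
        "speech_rate": syllables / total_time,
        # breakdown: mean duration of counted silent pauses (0.0 if none)
        "silence_mean": sum(pauses) / len(pauses) if pauses else 0.0,
        # breakdown: number of silent pauses at or above the threshold
        "pause_count": len(pauses),
    }
```

Separating articulation rate (pauses excluded) from speech rate (pauses included) keeps the speed measure distinct from the breakdown measure, so the two can enter a model as separate predictors.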

Concluding Remarks...


  • 1. I presented examples of phonetic and phonological studies using spoken corpora from 모두의 말뭉치 and AI-Hub.
  • 2. I examined the relationship between pronunciation scores and phonetic features such as articulation rate and silence mean, together with speaker gender and script complexity level.
  • 3. Fun is in Phonetics and Phonology.
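One way to examine the relationship between scores and these predictors is a regression of pronunciation score on the features. The talk does not state which model was fitted, so this is an illustrative sketch only. It assumes an ordinary least-squares model fitted with NumPy, gender coded as a 0/1 dummy, and script level treated as a numeric predictor. The function name `fit_score_model` and its arguments are hypothetical.

```python
import numpy as np


def fit_score_model(articulation_rate, silence_mean, is_male, script_level, scores):
    """Ordinary least-squares fit of pronunciation score on the predictors.

    All arguments are equal-length sequences, one value per recording.
    Returns a dict mapping each model term to its estimated coefficient.
    """
    X = np.column_stack([
        np.ones(len(scores)),  # intercept
        articulation_rate,     # speed feature
        silence_mean,          # breakdown feature
        is_male,               # dummy coding: 1 = male, 0 = female (assumption)
        script_level,          # treated as numeric here (assumption)
    ])
    y = np.asarray(scores, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    names = ["intercept", "articulation_rate", "silence_mean", "is_male", "script_level"]
    return dict(zip(names, coef))
```

Under this coding, the gender finding in the summary would correspond to a negative `is_male` coefficient. When each speaker contributes several recordings, a mixed-effects model with a per-speaker random effect would usually be more appropriate. Plain least squares is used here only to keep the sketch short.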

😎

Thank you!

https://tyoon.net