In 2021, I’m running one of the projects in the Undergraduate Scholars Programme of the School of Arts, Languages and Cultures (SALC) at the University of Manchester. The programme is designed to help curious and advanced undergraduate students conduct actual research and enhance their skills and employability. The project I have proposed is about creating and analysing spoken corpora. Here is the abstract:
HOW TO CONDUCT CORPUS-BASED RESEARCH ON SPOKEN LANGUAGE
Studying the language of written texts is relatively straightforward: collect lots of texts, look for the linguistic expressions you are interested in, and investigate your results. The study of spoken language, however, is a whole different animal. First, speech does not come neatly divided into objective categories; it must be transcribed. This is no trivial task: it requires a grounding in theory and considerable effort. Second, many aspects of language are exclusive to the spoken mode, such as certain errors or fillers (think of all the uhms and you knows). These must be treated carefully within an explicit framework. Third, producing speech is a complex, multi-faceted process shaped by factors such as attention to speech (you speak differently when you read aloud than when you talk to others), the other participants in the conversation, dialect, and much more. In this Undergraduate Scholars Programme project, students learn how to deal with all of these difficulties and how to analyse spoken language professionally and competently.
Specifically, students will become familiar with …
(1) … the relevance of theoretical transcription guidelines, involving issues such as time stamps, tokenization, disfluencies and spelling conventions,
(2) … the state-of-the-art “tier transcription system”, with separate tiers for speakers, extralinguistic noises, comments, etc., together with the associated software and text and audio formats,
(3) … ways to find speech samples from public sources,
(4) … the importance of properly documenting each speaker, including dialect, age, and social, individual and situational variables,
(5) … basic analysis techniques in the study of linguistic features that are predominantly found in speech (e.g., quotative ‘be like’, as in I was like, “Yeah, definitely.”; the ‘is is’ construction, as in But the reality is is nobody knows the answer; or emphatic ‘literally’, as in I literally couldn’t open my mouth).
Each student will contribute two transcriptions, one of 5 minutes and one of 25 minutes of speech. Together, these form the main outcome of the project: a professional, publicly accessible corpus of high-quality transcripts.
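To give a flavour of point (5), here is a minimal Python sketch of a first pass over a plain-text transcript. It is not the project's method. The filler list (FILLERS) and the regular expressions in PATTERNS are my own hypothetical choices; a real corpus would take both from its transcription guidelines. Every hit still needs checking by hand, because a pattern cannot tell quotative ‘be like’ from comparative ‘like’ (it is like a dog), or emphatic ‘literally’ from literal ‘literally’.

```python
import re
from collections import Counter

# Hypothetical filler inventory; a real corpus would define this in its
# transcription guidelines (and might handle "you know" separately).
FILLERS = {"uh", "uhm", "um", "er", "erm"}

# Rough search patterns for the features in point (5). They over- and
# under-match, so every hit needs to be checked by hand.
PATTERNS = {
    # a form of BE (full or contracted) directly followed by "like"
    "quotative be like": re.compile(
        r"\b(?:am|is|are|was|were|be|been|being)\s+like\b|'(?:m|s|re)\s+like\b"
    ),
    # two adjacent tokens of "is"
    "is is": re.compile(r"\bis\s+is\b"),
    # any "literally"; emphatic vs. literal use must be annotated manually
    "literally": re.compile(r"\bliterally\b"),
}


def tokenize(text):
    """Lower-case the text, normalise curly apostrophes and return word tokens."""
    text = text.lower().replace("\u2019", "'")
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text)


def analyse(transcript):
    """Return (filler counts, feature counts) for one plain-text transcript."""
    tokens = tokenize(transcript)
    fillers = Counter(t for t in tokens if t in FILLERS)
    # Remove fillers before searching so that "is uh is" is still found as "is is".
    cleaned = " ".join(t for t in tokens if t not in FILLERS)
    features = {name: len(p.findall(cleaned)) for name, p in PATTERNS.items()}
    return fillers, features


if __name__ == "__main__":
    sample = (
        "I was like, \u201cYeah, definitely.\u201d But the reality is is nobody "
        "knows, uhm, the answer. I literally couldn\u2019t open my mouth. "
        "She\u2019s like, uh, no way."
    )
    fillers, features = analyse(sample)
    print(fillers)   # Counter({'uhm': 1, 'uh': 1})
    print(features)  # {'quotative be like': 2, 'is is': 1, 'literally': 1}
```

On the sample, the sketch finds two quotative hits (was like, ’s like), one ‘is is’ and one ‘literally’. Stripping fillers before matching is a deliberate choice: it stops disfluencies from hiding a construction, but the counts then describe a cleaned version of the transcript. That is exactly the kind of decision the transcription guidelines in point (1) need to make explicit.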
Three students have been accepted for this project. I’m looking forward to meeting them and working on the corpus. The start date is 10 February 2021. More info to follow.