Due: Tuesday, October 2nd
Theme: Collect and examine spoken discourse
Procedure:
Collect a sample of naturally occurring spoken discourse. "Naturally occurring"
can be broadly construed to include radio or talk shows, children's play,
radio or TV news items, spontaneous or scripted storytelling, classroom
interactions, task-oriented conversations, classroom lectures, etc. By
"collect" we mean you should make an audio or video recording. You should
record a minimum of 10 minutes, and then transcribe at least 5 continuous
minutes (usually the middle of the discourse is the most natural). By "transcribe"
we mean you should make a record on paper of what you saw and heard, a record
good enough that when we read the transcript, we know what went on.
Please list and explain any special annotations you use to mark
speaker overlap, pausing, intonation, gesture, etc.
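As one illustration of what an annotation scheme might look like, here is a
minimal sketch in Python. It is not a required format. The Turn class, the
render function, the "[overlap]" marker, the parenthesized pause length, and
the 0.5-second pause threshold are all assumptions made up for this example;
use whatever conventions suit your sample, and explain them.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One speaker turn; times are seconds from the start of the recording."""
    speaker: str
    start: float
    end: float
    text: str


def render(turns, pause_threshold=0.5):
    """Return transcript lines, marking overlaps and pauses between turns.

    Hypothetical conventions: "[overlap]" means the turn began before the
    previous speech ended; "(1.2)" means a 1.2-second silence preceded it.
    """
    lines = []
    prev_end = None
    for turn in sorted(turns, key=lambda t: t.start):
        prefix = ""
        if prev_end is not None:
            gap = turn.start - prev_end
            if gap < 0:
                prefix = "[overlap] "
            elif gap >= pause_threshold:
                prefix = f"({gap:.1f}) "
        lines.append(f"{turn.speaker}: {prefix}{turn.text}")
        # Track the latest point at which anyone was still speaking.
        prev_end = turn.end if prev_end is None else max(prev_end, turn.end)
    return lines


sample = [
    Turn("A", 0.0, 2.0, "so where were you"),
    Turn("B", 1.5, 3.0, "at the store"),
    Turn("A", 4.2, 5.0, "oh okay"),
]
for line in render(sample):
    print(line)
```

Note what this sketch leaves out: intonation, gesture, and facial expression
have no place in it, which is itself a point worth raising in your discussion.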
Discussion:
The point is to push you to think about what discourse is and what
makes it hard to model discourse in a computational system. You may want
to have an interactive system in mind when you choose your sample.
Think about how a computer could replace a participant in the discourse.
Supposing that you had perfect word recognition, what are the most challenging
issues in processing the discourse? Are some of these challenges specific
to the sample domain you chose? Another point is to think about what makes
a sufficient record of discourse: how do you turn a speech event into an
on-paper transcript? What parameters need to be transcribed (the words,
the pronunciation of words, the intonation, the facial expression, the
gestures, fidgeting, pauses, etc.)? We ask you to turn in the printed transcription
and a discussion of the points listed above. At a minimum, discuss
what makes an adequate transcription and what challenges
a computer might face in interacting in the discourse that you have collected.