Week 2, Dialogue Systems

Content

• Great homework submission! Thanks!
• Simon’s solutions
• Vojta’s solutions
• Python and clean code style are common practice!
• Deadline Tue 7 AM - so I can review the solutions before next class
• Unit tests: why?
• Kaldi & Slurm experience
• Phonetic examples walk-through
• Submission formats: github.com or compressed folder
• Barge-in
• Examples
• How to detect it? Audio/word/semantic/pragmatic level?
• Voice Activity Detection (VAD) vs Wake words
• speaker diarization
• End-pointing and hesitations
• Meaning abstraction
• opinionated stance
• words, sentences
• speech acts: assertive, directive, commissive, expressive, declarative
• Actions
• Maxims - is it hard?
• M. of quantity – don’t give too little/too much information
• M. of quality – be truthful
• M. of relation – be relevant
• M. of manner – be clear
• Grounding and dialogue recovery
• Entropy
• Definition: $$H(\mbox{text}) = -\sum_{x \in \mbox{text}}{\frac{freq(x)}{len(\mbox{text})} \log_2\left(\frac{freq(x)}{len(\mbox{text})}\right)}$$
• Simplification - Find it!
• Cross-entropy and LM
• $H(p, q) = -\sum_{x}{p(x) \log_2(q(x))}$
• $H(\mbox{text}, LM) = -\sum_{x \in \mbox{text}}{\frac{1}{N} \log_2(LM(x))}$, where $N = len(\mbox{text})$
• n-gram language models – see the detailed description by Jurafsky & Martin
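
The entropy and cross-entropy formulas above can be sketched in a few lines of Python. This is a minimal illustration, not a reference solution: it assumes the text is already tokenized into a list, that the sum in the entropy definition runs over distinct tokens, and that the language model is a plain dictionary mapping tokens to probabilities. The helper names `entropy`, `cross_entropy` and `unigram_lm` are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def entropy(tokens):
    """Entropy in bits of the empirical token distribution.

    Assumption: the sum runs over distinct tokens x, each weighted by
    freq(x) / len(text).
    """
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def cross_entropy(tokens, lm):
    """Average negative log2 probability the model assigns to each token.

    `lm` is assumed to be a dict token -> probability. There is no
    smoothing here, so an unseen token raises KeyError.
    """
    return -sum(math.log2(lm[t]) for t in tokens) / len(tokens)


def unigram_lm(tokens):
    """Maximum-likelihood unigram model estimated from the tokens."""
    counts = Counter(tokens)
    total = len(tokens)
    return {t: c / total for t, c in counts.items()}


if __name__ == "__main__":
    text = "a b a b".split()
    print(entropy(text))
    print(cross_entropy(text, unigram_lm(text)))
```

With a maximum-likelihood unigram model estimated on the same text, the cross-entropy coincides with the entropy; a real n-gram model evaluated on held-out data generally will not behave this way.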

Homework

1. (1 point) Implement entropy calculations and compute the entropy for the following datasets:
• DSTC2 dataset
• All the news - use just the “Article Content”
• Use at most the first 10,000 utterances/sentences if the dataset is large.
• Describe in 5 sentences the properties of each dataset and explain how they relate to the computed entropy value.
2. (2 points) Train a language model and compute the cross-entropy on the Vystadial dataset
• Recommended toolkit - KenLM
• Read the README and train a model on the Vystadial training set: bin/lmplz -o 5 <text >text.arpa
• Compute the cross-entropy of the training set, the dev set, and the first sentence of the dev set. See the example usage
• Describe and explain the results in 5 to 10 sentences.
3. BONUS (3 points) Train a wake word model and evaluate it with your voice!
• Recommended model: Mycroft precise
• Write a short summary of what you did and which problems you faced.
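
For task 2, one practical detail is the unit conversion: KenLM reports sentence scores as log10 probabilities, while the cross-entropy formula from the lecture uses log2. The sketch below shows one way to convert summed log10 scores into bits per token. It is only an illustration under stated assumptions: the helper names `count_tokens` and `cross_entropy_bits` are hypothetical, tokens are split on whitespace, and one end-of-sentence token is counted per sentence. The commented usage assumes the `kenlm` Python module and a model file named `text.arpa`.

```python
import math


def count_tokens(sentences):
    """Whitespace tokens plus one end-of-sentence token per sentence.

    Assumption: the sentence scores include the </s> token.
    """
    return sum(len(s.split()) + 1 for s in sentences)


def cross_entropy_bits(log10_scores, n_tokens):
    """Convert summed log10 sentence scores to cross-entropy in bits/token."""
    total_log10 = sum(log10_scores)
    # log2(p) = log10(p) / log10(2)
    return -total_log10 / math.log10(2) / n_tokens


# Hypothetical usage, assuming the kenlm Python module is installed:
#   import kenlm
#   model = kenlm.Model("text.arpa")
#   scores = [model.score(s, bos=True, eos=True) for s in sentences]
#   print(cross_entropy_bits(scores, count_tokens(sentences)))
```

Whether to count the end-of-sentence token in the denominator is a design choice; state which convention you used when reporting results, since it changes the per-token value.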