Ondřej Plátek Blog
PhD candidate@UFAL, Prague. LLM & TTS evaluation. Engineer. Researcher. Speaker. Father.

Week 2, Dialogue Systems


  • Great homework submission! Thanks!
    • Simon’s solutions
    • Vojta’s solutions
    • Python and clean code style are common practice!
    • Deadline Tue 7 AM - so I can review the solutions before next class
    • Unit tests: why?
    • Kaldi & Slurm experience
    • Phonetic examples walk-through
    • Submission formats: github.com or compressed folder
  • Barge-in
    • Examples
    • How to detect it? Audio/word/semantic/pragmatic level?
    • Voice Activity Detection (VAD) vs Wake words
      • speaker diarization
      • adaptive echo cancellation
    • End-pointing and hesitations
  • Meaning abstraction
    • opinionated stance
    • words, sentences
    • speech acts: assertive, directive, commissive, expressive, declarative
  • Actions
  • Maxims - is it hard?
    • M. of quantity – don’t give too little/too much information
    • M. of quality – be truthful
    • M. of relation – be relevant
    • M. of manner – be clear
  • Grounding and dialogue recovery
  • Entropy
    • Definition \(H(\mathrm{text}) = - \sum_{x \in \mathrm{text}}{\frac{freq(x)}{len(\mathrm{text})} \log_2\left(\frac{freq(x)}{len(\mathrm{text})}\right)}\)
      • Simplification - Find it!
    • Cross-entropy and LM
      • \[H(p, q) = -\sum_{x}{p(x) \log_2(q(x))}\]
      • \[H(\mathrm{text}, LM) = -\sum_{x \in \mathrm{text}}{\frac{1}{N} \log_2(LM(x))}\] where \(N = len(\mathrm{text})\)
  • n-gram Language models – see a detailed description by Jurafsky & Martin here
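
The entropy and cross-entropy definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a reference solution: it assumes whitespace-tokenized input, reads the sum in the entropy definition as running over distinct tokens, and represents the model q as a plain dictionary of token probabilities. The function names are hypothetical.

```python
import math
from collections import Counter


def entropy(tokens):
    """Empirical entropy (bits per token) of the unigram distribution of `tokens`."""
    if not tokens:
        raise ValueError("entropy is undefined for an empty text")
    counts = Counter(tokens)
    total = len(tokens)
    # Sum over distinct tokens: p(x) = freq(x) / len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def cross_entropy(tokens, q):
    """Average negative log2-probability that the model `q` assigns to `tokens`.

    `q` is assumed to map every token to a non-zero probability;
    a real language model needs smoothing to guarantee that.
    """
    if not tokens:
        raise ValueError("cross-entropy is undefined for an empty text")
    return -sum(math.log2(q[t]) for t in tokens) / len(tokens)


text = "a a b c".split()
uniform = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}

print(entropy(text))                 # 1.5
print(cross_entropy(text, uniform))  # 2.0
```

The cross-entropy under the uniform model (2.0 bits) is higher than the entropy of the text itself (1.5 bits). Scoring the text with its own empirical distribution gives exactly the entropy, which is the lower bound for any model.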


  1. (1 point) Implement entropy calculations and compute the entropy for the following datasets:
    • DSTC2 dataset
    • Facebook babi tasks 1-6. See github for details.
    • All the news - use just the “Article Content”
    • Use at most the first 10,000 utterances/sentences if the dataset is large.
    • Describe in 5 sentences the properties of each dataset and explain how they relate to the computed entropy value.
  2. (2 points) Train a Language Model and compute cross-entropy on the Vystadial dataset
    • Recommended toolkit - KenLM
      • Read the README and train a model on the Vystadial training set with bin/lmplz -o 5 <text >text.arpa
      • Compute the cross-entropy of the training set, the dev set, and the first sentence of the dev set. See the example usage.
      • Describe and explain the results in 5 to 10 sentences.
  3. BONUS (3 points) Train a wake word model and evaluate it with your voice!
    • Recommended model: Mycroft precise
    • Write a short summary of what you did and which problems you faced.
    • Include your dataset with your source code.
    • Report the F1 score on the training, development, and test sets.
  4. BONUS (3 points) Write a conditional language model using RNN (Recurrent Neural Networks).
    • A conditional language model is a decoder RNN whose initial state is initialized with (i.e. conditioned on) additional information.
    • Run the conditional language model on user inputs from the DSTC2 dataset.
    • Use the previous dialogue state (or a part of it) as the initialization for your conditional language model.
    • Compare the perplexity of a vanilla RNN (zero-initialized) and your conditional implementation on the user inputs.
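
For tasks 2 and 4, language-model scores have to be turned into cross-entropy and perplexity. The sketch below shows one way to do that, assuming the toolkit reports one base-10 log-probability per sentence together with the number of scored tokens (to my knowledge KenLM reports log10 scores, but check its documentation, including whether the end-of-sentence token is counted). The function names and the input format are hypothetical, chosen only for this illustration.

```python
import math


def cross_entropy_bits(sentence_scores):
    """Cross-entropy in bits per token.

    `sentence_scores` is a list of (log10_prob, n_tokens) pairs, one per
    sentence. This input format is an assumption for the sketch, not a
    toolkit API.
    """
    total_log10 = 0.0
    total_tokens = 0
    for log10_prob, n_tokens in sentence_scores:
        total_log10 += log10_prob
        total_tokens += n_tokens
    if total_tokens == 0:
        raise ValueError("no tokens were scored")
    # Convert log10 to log2, then average over all tokens
    return -(total_log10 / math.log10(2)) / total_tokens


def perplexity(sentence_scores):
    """Perplexity is 2 raised to the cross-entropy measured in bits."""
    return 2.0 ** cross_entropy_bits(sentence_scores)


# Two made-up sentences: 3 tokens scored at log10 p = -3.0, 1 token at -1.0
scores = [(-3.0, 3), (-1.0, 1)]
print(cross_entropy_bits(scores))  # about 3.32 bits per token
print(perplexity(scores))          # about 10
```

Averaging over all tokens of the corpus, rather than averaging per-sentence values, keeps long and short sentences weighted by their length. The same conversion applies when comparing the zero-initialized RNN with the conditional one, as long as both are scored on the same tokens.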