Ondřej Plátek Blog
PhD candidate@UFAL, Prague. LLM & TTS evaluation. Engineer. Researcher. Speaker. Father.

Week 3, Dialogue Systems

  1. (2 points) Implement your own dialogue state tracker using any means:
    • Submit the result to your fork of the ds-dstc2 repository
    • Calculate F1 and accuracy
    • The dialogue state tracker is evaluated after each turn
    • Evaluate the slots provided for you: food, area, price_range
    • (? BONUS points) for exceptional models or efforts, e.g. RNN or CNN models in TensorFlow/PyTorch reaching about 60% accuracy
    • (1 BONUS point) compare scores on the top ASR hypothesis vs. the gold (manually transcribed) hypothesis vs. ASR hypotheses that take their ASR scores into account
  2. (1 point) Write down alternative responses from a chit-chat system and evaluate the performance of such a system.
    • E.g. write a unit test computing the BLEU score, using the implementation above
    • Elaborate on cases where your metric works and where it does not
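For task 1, here is a minimal sketch of a keyword-matching baseline tracker and a turn-level evaluation. It assumes a hypothetical data layout (one list of user utterances per dialogue, states as dicts over `food`, `area`, `price_range` with `None` for an unfilled slot, and an ontology mapping each slot to its possible values); the actual format in ds-dstc2 may differ. "Accuracy" here is joint accuracy (all three slots correct at a turn) and F1 is micro-averaged over slots; both definitions are assumptions of this sketch, not an official metric.

```python
from typing import Dict, List, Optional

# Slots evaluated in the assignment.
SLOTS = ("food", "area", "price_range")
State = Dict[str, Optional[str]]  # slot -> value, None if not yet filled


def track(utterances: List[str], ontology: Dict[str, List[str]]) -> List[State]:
    """Keyword-matching baseline: after each user turn, a slot is overwritten
    with any of its ontology values that appears as a substring of the utterance."""
    state: State = {slot: None for slot in SLOTS}
    states: List[State] = []
    for utterance in utterances:
        text = utterance.lower()
        for slot in SLOTS:
            for value in ontology.get(slot, []):
                if value in text:
                    state[slot] = value  # if several values match, the last one listed wins
        states.append(dict(state))  # snapshot, because the tracker is evaluated after each turn
    return states


def evaluate(pred: List[State], gold: List[State]) -> Dict[str, float]:
    """Joint turn accuracy and micro-averaged slot precision/recall/F1."""
    if len(pred) != len(gold):
        raise ValueError("pred and gold must have one state per turn")
    correct_turns = tp = fp = fn = 0
    for p, g in zip(pred, gold):
        if all(p.get(s) == g.get(s) for s in SLOTS):
            correct_turns += 1
        for s in SLOTS:
            pv, gv = p.get(s), g.get(s)
            if pv is not None and pv == gv:
                tp += 1
                continue
            if pv is not None:
                fp += 1  # predicted a value that is wrong
            if gv is not None:
                fn += 1  # gold value missed or predicted wrongly
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = correct_turns / len(gold) if gold else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy usage with made-up data.
ontology = {"food": ["italian", "chinese"], "area": ["north", "centre"],
            "price_range": ["cheap", "expensive"]}
utterances = ["i want cheap food", "italian in the north please"]
gold = [{"food": None, "area": None, "price_range": "cheap"},
        {"food": "italian", "area": "north", "price_range": "cheap"}]
print(evaluate(track(utterances, ontology), gold))
# {'accuracy': 1.0, 'precision': 1.0, 'recall': 1.0, 'f1': 1.0}
```

A wrong value counts both as a false positive and a false negative, so a tracker that guesses aggressively is penalised in precision, while one that leaves slots empty loses recall.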
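For the ASR bonus, one way to set up the comparison is to run the same tracker on three inputs: the gold transcription, the top ASR hypothesis alone, and the full n-best list with each hypothesis weighted by its score. This sketch assumes each turn is a dict with hypothetical keys `transcript` and `asr` (a list of `(text, score)` pairs whose scores are log-probabilities, normalised here with a softmax) and uses a hypothetical update rule: a slot takes the value whose summed hypothesis weight reaches `threshold`. The resulting states for each condition can then be scored with the same accuracy and F1 evaluation as the tracker.

```python
import math
from typing import Dict, List, Optional, Tuple

SLOTS = ("food", "area", "price_range")
State = Dict[str, Optional[str]]
Hyp = Tuple[str, float]  # (hypothesis text, log-probability score): assumed format


def weighted_update(state: State, hyps: List[Hyp], ontology: Dict[str, List[str]],
                    threshold: float = 0.5) -> State:
    """Update the state from an n-best list; a slot takes the value whose
    summed hypothesis weight reaches `threshold` (hypothetical rule)."""
    top = max(score for _, score in hyps)
    exps = [math.exp(score - top) for _, score in hyps]  # numerically stable softmax
    z = sum(exps)
    weights = [e / z for e in exps]
    new_state = dict(state)
    for slot in SLOTS:
        mass: Dict[str, float] = {}
        for (text, _), weight in zip(hyps, weights):
            for value in ontology.get(slot, []):
                if value in text.lower():
                    mass[value] = mass.get(value, 0.0) + weight
        if mass:
            value, value_mass = max(mass.items(), key=lambda kv: kv[1])
            if value_mass >= threshold:
                new_state[slot] = value
    return new_state


def run(turns: List[dict], ontology: Dict[str, List[str]], mode: str) -> List[State]:
    """Track a dialogue from the gold transcript, the ASR 1-best, or the scored n-best."""
    state: State = {slot: None for slot in SLOTS}
    states: List[State] = []
    for turn in turns:
        if mode == "gold":
            hyps = [(turn["transcript"], 0.0)]
        elif mode == "asr_1best":
            hyps = [max(turn["asr"], key=lambda h: h[1])]  # keep only the top-scoring hypothesis
        elif mode == "asr_weighted":
            hyps = turn["asr"]
        else:
            raise ValueError(f"unknown mode: {mode}")
        state = weighted_update(state, hyps, ontology)  # returns a fresh dict each turn
        states.append(state)
    return states


# Toy usage with a made-up n-best list.
ontology = {"food": ["italian", "indian"], "area": ["north"], "price_range": ["cheap"]}
turns = [{"transcript": "cheap italian food",
          "asr": [("cheap indian food", -0.9), ("cheap italian food", -1.0),
                  ("cheap italian foot", -1.2)]}]
for mode in ("gold", "asr_1best", "asr_weighted"):
    print(mode, run(turns, ontology, mode)[-1]["food"])
# gold italian
# asr_1best indian
# asr_weighted italian
```

In the toy turn the 1-best hypothesis is wrong, but two lower-ranked hypotheses agree on "italian" and together outweigh it, which is the kind of effect the score-aware condition is meant to expose.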
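For task 2, a sketch of such a unit test. The assignment refers to a BLEU implementation given earlier; to keep this snippet self-contained it includes a small stand-in sentence-level BLEU (clipped n-gram precision against several references, uniform weights up to 4-grams, add-one smoothing for n > 1, brevity penalty from the closest reference length), whose numbers may differ from that implementation. The reference responses are made-up examples. Run it with `python -m unittest` pointed at the file.

```python
import math
import unittest
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Multiset of the n-grams in `tokens`."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(references: List[List[str]], hypothesis: List[str], max_n: int = 4) -> float:
    """Stand-in sentence BLEU: clipped n-gram precision against all references,
    uniform weights, add-one smoothing for n > 1, closest-reference brevity penalty."""
    if not hypothesis:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hypothesis, n)
        max_ref_counts: Counter = Counter()
        for ref in references:
            max_ref_counts |= ngrams(ref, n)  # per-n-gram maximum over references
        clipped = sum(min(c, max_ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if n == 1:
            if clipped == 0:
                return 0.0  # no word overlap at all
            precision = clipped / total
        else:
            precision = (clipped + 1) / (total + 1)
        log_precision += math.log(precision) / max_n
    hyp_len = len(hypothesis)
    # Reference length closest to the hypothesis length (ties: the shorter one).
    ref_len = min((abs(len(r) - hyp_len), len(r)) for r in references)[1]
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


class ChitChatBleuTest(unittest.TestCase):
    # Made-up alternative responses to a prompt such as "how are you?".
    REFS = [
        "i am fine thank you".split(),
        "i am doing well thanks".split(),
        "pretty good how about you".split(),
    ]

    def test_matching_any_alternative_scores_one(self):
        for ref in self.REFS:
            self.assertAlmostEqual(sentence_bleu(self.REFS, ref), 1.0)

    def test_alternatives_reward_valid_variation(self):
        hyp = "i am doing fine thanks".split()
        self.assertGreater(sentence_bleu(self.REFS, hyp),
                           sentence_bleu(self.REFS[:1], hyp))

    def test_adequate_reply_without_overlap_scores_zero(self):
        # A perfectly reasonable reply that BLEU cannot see: the metric fails here.
        self.assertEqual(sentence_bleu(self.REFS, "not bad at all".split()), 0.0)
```

The tests double as notes for the last bullet: writing several alternative responses lets a valid variation score higher than against a single reference, yet an adequate reply with no word overlap still scores 0, so BLEU works best when the space of good responses is narrow and well covered by the references.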