# Week 1, Statistical Dialog Systems 2016

## Content

• ASR measures
• Word Error Rate (WER) $$wer(w_{hyp}, w) = \frac{S + 0.5\,D + 0.5\,I}{\mid w \mid}$$ where $$w$$ is the gold sequence of words and $$S, D, I$$ are the numbers of substitution, deletion and insertion operations used to transform the gold transcription $$w$$ into the hypothesis $$w_{hyp}$$ with minimum edit distance. (The 0.5 weights on deletions and insertions are this course's convention; the standard WER weights all three operations equally.)
• Minimum edit distance and the operations used are computed exactly using dynamic programming.
• Typically computed per utterance
• Sentence Error Rate (SER) $$ser = \frac{\mid \{t \in \{1, \dots, N\} : wer(gold_t, hyp_t) > 0\}\mid}{N}$$ i.e. the fraction of the $$N$$ utterances whose hypothesis contains at least one error (equivalently, $$hyp_t \neq gold_t$$)
• RTF (real-time factor) - the ratio of processing time to the duration of the processed audio
• latency - how long the user of an SDS has to wait before hearing the reply; a significant portion is the ASR latency before the ASR result is available
• problems
• lexicon size and out-of-vocabulary words (OOVs)
• domain dependence of the language model (LM)
• balancing the LM against the acoustic model (AM)
• keyword spotting does not require fully fluent recognition of whole sentences
• ready to use tools and services
• Kaldi toolkit: https://github.com/kaldi-asr/kaldi
• For custom domains http://cloudasr.com
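The WER and SER measures above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the 0.5 weights for deletions and insertions follow the formula in these notes, and all function names are my own.

```python
# Minimal sketch of the lecture's weighted WER and SER.
# The 0.5 deletion/insertion weights follow the formula in the notes;
# standard WER uses weight 1.0 for all three operations.

def edit_distance(gold, hyp, sub_w=1.0, del_w=0.5, ins_w=0.5):
    """Minimum weighted edit distance between token lists, via dynamic programming."""
    n, m = len(gold), len(hyp)
    # dist[i][j] = cheapest way to turn gold[:i] into hyp[:j]
    dist = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i * del_w              # delete every gold token
    for j in range(1, m + 1):
        dist[0][j] = j * ins_w              # insert every hyp token
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0.0 if gold[i - 1] == hyp[j - 1] else sub_w
            dist[i][j] = min(dist[i - 1][j - 1] + sub,  # match / substitute
                             dist[i - 1][j] + del_w,    # delete
                             dist[i][j - 1] + ins_w)    # insert
    return dist[n][m]

def wer(gold, hyp, **weights):
    gold, hyp = gold.split(), hyp.split()
    return edit_distance(gold, hyp, **weights) / len(gold) if gold else 0.0

def ser(gold_sents, hyp_sents):
    """Fraction of utterances with at least one error (i.e. wer > 0)."""
    wrong = sum(wer(g, h) > 0 for g, h in zip(gold_sents, hyp_sents))
    return wrong / len(gold_sents)

print(wer('a a a', 'a b'))                      # 1 sub + 1 del -> (1 + 0.5) / 3 = 0.5
print(ser(['a a a', 'b c'], ['a a a', 'b d']))  # 1 of 2 utterances wrong -> 0.5
```

Tracing back through the DP table also yields the best alignment, which is what the first homework task asks you to print.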

## Homework

Install TensorFlow v0.7.1 before the next class; we will use it instead of the previously announced scikit-learn. This week you have several options for what to submit as homework. Choose only one.

1. Code a simple edit distance utility
• It should print both the minimum edit distance and the best alignment
• alignment - the sequence of operations that transforms the gold sequence into the hypothesis sequence
• names of operations
• n - nothing/null/identity
• s - substitute
• d - delete
• i - insert
• Make the weights for the edit operations S, D, I optional parameters (see the WER definition above).
• Make the word separator an optional parameter, defaulting to a space
• Implement it yourself, do not copy it from web!
• Language of your choice, but make it run smoothly on Ubuntu 14.04 or OS X 10.10.3
• Include a wrapper script that demonstrates the utility on the following examples:
• hyp='', gold=''
• hyp='a a a', gold='a a a'
• hyp='a b', gold='a a a'
• hyp='a b c a', gold='a a a'
2. Use the CloudASR API (see the batch API docs at the bottom) and compare it to the Google Web Speech API, which can also be used from Python
• Create a recording yourself
• Decode the first 100 utterances from the test set of the Czech Vystadial dataset
• Use sclite for scoring; see the 3rd task for details
• Publish the data and the code on the web or to the Rotunda lab disc and share the paths with me and your colleagues via email.
3. Measure WER, SER and confusion pairs for transcribed and gold utterances
• Install sclite by downloading a Makefile and running `make sclite_compiled`
• Verify that the compilation succeeded by running `sctk/bin/sclite`, which should print its help to stderr.
• Run sclite on the real data `hyp_content.txt` and `gold_content.txt`
• It may pay off to first check that sclite works on dummy data
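For the dummy-data sanity check in task 3, a small Python helper can generate reference/hypothesis files in sclite's `trn` transcript format. The file names below are my own, and the sclite command in the trailing comment is an assumption; run `sctk/bin/sclite` without arguments for the authoritative option list.

```python
# Write tiny reference/hypothesis files in sclite's "trn" format:
# each line is the utterance text followed by its id in parentheses,
# and the id is what pairs a ref line with the matching hyp line.

ref_lines = ['a a a (spk1-utt1)', 'b c (spk1-utt2)']
hyp_lines = ['a b (spk1-utt1)', 'b c (spk1-utt2)']  # first utterance has errors

with open('gold_dummy.trn', 'w') as f:
    f.write('\n'.join(ref_lines) + '\n')
with open('hyp_dummy.trn', 'w') as f:
    f.write('\n'.join(hyp_lines) + '\n')

# Then score them with something like (flags are an assumption, check sclite's help):
#   sctk/bin/sclite -r gold_dummy.trn trn -h hyp_dummy.trn trn -i rm -o all
```

With only one of the two dummy utterances containing errors, the sclite summary should report a SER around 50%, which makes it easy to see at a glance that the pipeline works.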