Lab 2, SDS 2016
Content
- SLU motivation
- Logistic regression and bag of words
- Using a dev set to select the best model
- E.g. early stopping for iterative training with stochastic gradient descent (SGD)
- TensorFlow: the very basics
- MNIST logistic regression example
- Logistic regression and cross entropy (if time)
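The early-stopping idea from the agenda (monitor dev-set loss during SGD and keep the best model seen so far) can be sketched in plain Python. The toy data, function name, and hyperparameters below are illustrative, not part of the lab code:

```python
import math
import random

def sgd_early_stopping(train, dev, lr=0.5, max_epochs=200, patience=10):
    """Train a one-feature logistic regression with SGD; stop once the
    dev loss has not improved for `patience` consecutive epochs."""
    w, b = 0.0, 0.0
    best = (float("inf"), w, b)      # (dev loss, weight, bias)
    bad_epochs = 0
    rng = random.Random(0)

    def mean_loss(data, w, b):
        # Average binary cross entropy over (x, y) pairs.
        total = 0.0
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            total -= y * math.log(p) + (1 - y) * math.log(1 - p)
        return total / len(data)

    for _ in range(max_epochs):
        rng.shuffle(train)           # note: shuffles the list in place
        for x, y in train:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            g = p - y                # gradient of cross entropy w.r.t. logit
            w -= lr * g * x
            b -= lr * g
        dev_loss = mean_loss(dev, w, b)
        if dev_loss < best[0]:
            best, bad_epochs = (dev_loss, w, b), 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:   # early stopping triggered
                break
    return best[1], best[2]          # parameters of the best dev model

# Toy linearly separable data: label 1 iff x > 0.
train = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
dev = [(-1.5, 0), (1.5, 1)]
w, b = sgd_early_stopping(train, dev)
```

The same pattern carries over to the TensorFlow homework: evaluate the dev loss once per epoch and remember the parameters with the lowest value.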
Homework - Logistic regression
- Use TensorFlow v0.7.1. Preferably use Python 3.
- Explore the logistic regression example together with the corresponding MNIST data loader
- Explore train, dev, test data sets for Spoken Language Understanding
- understand what the input/output pairs are
- understand that you have two kinds of inputs: gold transcriptions and ASR hypotheses
- implement a data loader for bag-of-words/bigrams features
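A minimal sketch of such a loader, assuming utterances are already tokenized; the function names and the `keep_fraction` parameter (for the 50%/70%/100% vocabulary experiments below) are illustrative, not the course's actual interface:

```python
from collections import Counter

def extract_features(tokens, use_bigrams=False):
    """Bag-of-words (optionally plus bag-of-bigrams) counts for one utterance."""
    feats = Counter(tokens)
    if use_bigrams:
        feats.update(zip(tokens, tokens[1:]))  # bigrams as (w1, w2) tuples
    return feats

def build_vocab(utterances, use_bigrams=False, keep_fraction=1.0):
    """Map each kept feature to an index; keep only the `keep_fraction`
    most common features, as in the 50%/70%/100% experiments."""
    counts = Counter()
    for toks in utterances:
        counts.update(extract_features(toks, use_bigrams))
    ranked = counts.most_common()
    k = max(1, int(len(ranked) * keep_fraction))
    return {feat: i for i, (feat, _) in enumerate(ranked[:k])}

def vectorize(tokens, vocab, use_bigrams=False):
    """Fixed-size count vector usable as logistic-regression input."""
    vec = [0.0] * len(vocab)
    for feat, cnt in extract_features(tokens, use_bigrams).items():
        if feat in vocab:
            vec[vocab[feat]] = float(cnt)
    return vec
```

Features outside the truncated vocabulary are simply dropped, which is what makes the "most common words in the bag" comparison below possible.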
- Predict and evaluate
- Compulsory DAIs:
(goodbye, None, None), (thankyou, None, None), (inform, from_stop, ?)
- Choose two other Dialog Act Items (DAIs) for prediction and evaluation
- Describe in a few sentences why you chose these two
- First use the bag-of-words representation as features, then compare it with bag of bigrams
- Compare results of models using the 50%, 70%, and 100% most common words "in the bag"
- Update: for bigrams we suggest different splits, e.g. (30%, 40%, 100%)
- Use gold transcriptions as features
- Repeat the same experiment with ASR hypotheses as features
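Each tracked DAI gets its own binary classifier, so its target is simply whether the DAI occurs in the utterance's annotation. The matching rule below, where "?" in the tracked triple is assumed to act as a wildcard over values, is a guess about the data format, not the course's definition:

```python
def dai_matches(tracked, annotated):
    """True if an annotated (act, slot, value) triple realizes the tracked
    DAI; a "?" component in the tracked triple matches any value."""
    for want, got in zip(tracked, annotated):
        if want != "?" and want != got:
            return False
    return True

def binary_label(tracked_dai, utterance_dais):
    """Target for one binary classifier: 1 iff the utterance contains the DAI."""
    return int(any(dai_matches(tracked_dai, d) for d in utterance_dais))
```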
- Submit your code
- Include a wrapper command to train and evaluate your best models for the compulsory DAIs
- trained on the ASR-hypothesis train split and evaluated on the corresponding test split
- use the top 50% most frequent bigrams as features
- Submit a results table reporting accuracy for:
- prediction models for the three compulsory DAIs and the two DAIs of your choice
- datasets: train, dev, test
- features:
- input quality: ASR vs. gold transcriptions
- input form: words vs. bigrams
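The table can be filled by one small evaluation loop over all the cells above. The dictionary layouts for `models` and `datasets` and the `predict` callable are assumptions for illustration:

```python
def accuracy(predict, examples):
    """Fraction of (features, label) pairs the classifier gets right."""
    return sum(predict(x) == y for x, y in examples) / len(examples)

def results_rows(models, datasets):
    """Yield one (dai, quality, form, split, accuracy) row per table cell.
    Assumed layout: models[(dai, quality, form)] is a trained predictor,
    datasets[(quality, form, split)] is a list of (features, label) pairs."""
    for (dai, quality, form), predict in models.items():
        for split in ("train", "dev", "test"):
            acc = accuracy(predict, datasets[(quality, form, split)])
            yield dai, quality, form, split, acc
```

Printing these rows (or dumping them to CSV) gives exactly the train/dev/test x ASR/gold x words/bigrams grid requested above.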