Ondřej Plátek Blog
PhD candidate@UFAL, Prague. LLM & TTS evaluation. Engineer. Researcher. Speaker. Father.

Lab 2, SDS 2016


  • SLU motivation
  • Logistic regression and bag of words
  • Using dev set for selecting best model
    • E.g. early stopping for iterative training with stochastic gradient descent (SGD)
  • Tensorflow very basics
    • MNIST logistic regression example
import tensorflow as tf
x_var = tf.Variable(2.0, 'x')  # Variables with initialied values
y_var = tf.Variable(3.0, 'y')  # typically used for learnable parameteters of models. 
z_var = tf.mul(x_var, y_var)  # Variable representing result of the operation

a_var = tf.placeholder("float")  # Similar as variable but representing values feed as input.
b_var = tf.placeholder("float")  # See feed_dict below how the input values can be changed.
c_var = tf.add(a_var, b_var)  # Operation works on top of both Variables and Placeholders.
d_var = tf.add(x_var, a_var)

init = tf.initialize_all_variables()
with tf.Session() as sess:
    sess.run(init)  # Set the variables to its initial value
    print(sess.run(x_var), 'Default value x')
    print(sess.run(y_var), 'Default value y')
    print(sess.run(z_var), 'Computed value z')

    print(sess.run(c_var, feed_dict={a_var: -5.0, b_var: 2.0}), 'Computed value c1')
    print(sess.run(c_var, feed_dict={a_var: -3.0, b_var: 2.0}), 'Computed value c2')
    print(sess.run(x_var, feed_dict={a_var: -5.0}), 'Computed value x')
    # Writing variables to files instead of printing to stdout
    json.dump(sess.run(mylist_variable).tolist(), open('mylist.log', 'w'))
  • Logistic regression and cross entropy (if time)

Homework - Logistic regression

  • Use TensorFlow v0.7.1 Preferably use Python 3.
  • Explore Logistic regression example together with corresponding data loader for MNIST
  • Explore train, dev, test data sets for Spoken Language Understanding
    • understand what are the input, output pairs
    • understand that you have two kind of inputs - gold transcriptions and ASR hypothesis
    • implement data loader for bag of words/bigrams features.
  • Predict and evaluate
    • Compulsory DAI - (goodbye, None, None), (thankyou, None, None), (inform, from_stop, ?)
    • Choose two other Dialog Act Items (DAI) for prediction and evaluation
      • Describe in few sentences why you have chosen these two
  • First use bag of words representation as features, later compare it with bag of bigrams
    • Compare results of models using 50%, 70% and 100% most common words “in bag”
    • Update: For bigrams suggest another splits e.g. (30%, 40%, 100%)
    • Use gold transcriptions as features
  • Repeat the same experiment but with ASR transcriptions as features
  • Submit your code
    • Include a wrapper command to train and evaluate your best models for compulsory DAI.
      • trained on ASR hypothesis train split and evaluated on corresponding test split
      • Use top 50% most frequent bigrams for bag of words features
  • Submit a results table describing accuracy for:
    • prediction models for three compulsory DAI and two DAI according your taste
    • datasets - train, dev, test
    • features
      • input quality - asr, golden transcriptions
      • input form - words, bigrams