oplaTech Oplatek Teaching Archive
Oplatek's external memory

Machine Learning basics for Computational Linguistics

Machine learning general structure

Processing steps

  1. Objects/Instances

  2. Training data (features, selection of data)

  3. Machine learning algorithm (simple generalisation, decision trees, example-based, memory-based, support vector machines)

  4. Classifier / Model

  5. Test Data

  6. Evaluation, then go back and revise the features, the data or the algorithm
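
The processing steps above can be sketched in a few lines of Python. This is only a toy illustration: the instances, the feature name ends_with_ing and the helper functions are hypothetical, and "simple generalisation" is interpreted here as remembering the most frequent class for each feature value.

```python
from collections import Counter

# Steps 1-2: instances with features and class labels (hypothetical toy data).
train = [
    ({"ends_with_ing": True}, "VERB"),
    ({"ends_with_ing": True}, "VERB"),
    ({"ends_with_ing": True}, "NOUN"),
    ({"ends_with_ing": False}, "NOUN"),
    ({"ends_with_ing": False}, "NOUN"),
]
# Step 5: held-out test data.
test = [
    ({"ends_with_ing": True}, "VERB"),
    ({"ends_with_ing": False}, "NOUN"),
    ({"ends_with_ing": False}, "VERB"),
]


def train_model(data, feature):
    """Step 3: remember the most frequent class for each value of one feature."""
    counts = {}
    for features, label in data:
        counts.setdefault(features[feature], Counter())[label] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}


def classify(model, features, feature):
    """Step 4: the model is a lookup table from feature value to class."""
    return model.get(features[feature])


def accuracy(model, data, feature):
    """Step 6: fraction of test instances classified correctly."""
    correct = sum(classify(model, f, feature) == y for f, y in data)
    return correct / len(data)


model = train_model(train, "ends_with_ing")
score = accuracy(model, test, "ends_with_ing")
```

With this data the model gets two of the three test instances right. A low score is the signal to go back and change the features, the data or the algorithm.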

Decision trees

tree BuildDecTree(Training data T, Classes C)
  IF all examples in T belong to the same class Ci THEN
    create a leaf node for Ci and return it
  ELSE
    1.a Select an attribute F with values V1, ..., Vm
    1.b Create a new node for F
    1.c Divide T according to F into subsets T1, ..., Tm
    2   FOREACH non-empty Tn in T1, ..., Tm
          run BuildDecTree(Tn, C) and attach the result as a child of the node for F
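
A minimal Python sketch of the recursion above. The data layout (a list of (feature dict, label) pairs), the nested-dict tree and the names build_dec_tree and classify are assumptions made for illustration. The attribute selector is pluggable and by default simply takes the first remaining attribute. The majority-class fallback when no attributes are left is an extra stopping rule that is not in the pseudocode.

```python
from collections import Counter


def first_attribute(examples, attributes):
    """Placeholder selector (assumption): take the first remaining attribute."""
    return attributes[0]


def build_dec_tree(examples, attributes, choose=first_attribute):
    """examples: list of (feature_dict, label); attributes: list of names."""
    labels = [label for _, label in examples]
    # All examples belong to the same class: create a leaf for that class.
    if len(set(labels)) == 1:
        return labels[0]
    # Extra stopping rule (assumption): no attribute left, use the majority class.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    attr = choose(examples, attributes)              # 1.a select attribute F
    node = {"attribute": attr, "children": {}}       # 1.b create a node for F
    subsets = {}                                     # 1.c divide T by value of F
    for features, label in examples:
        subsets.setdefault(features[attr], []).append((features, label))
    remaining = [a for a in attributes if a != attr]
    for value, subset in subsets.items():            # 2 recurse (subsets are non-empty)
        node["children"][value] = build_dec_tree(subset, remaining, choose)
    return node


def classify(tree, features):
    """Walk from the root to a leaf; leaves are plain class labels."""
    while isinstance(tree, dict):
        tree = tree["children"][features[tree["attribute"]]]
    return tree


examples = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "stay"),
    ({"outlook": "rainy", "windy": "no"}, "stay"),
    ({"outlook": "rainy", "windy": "yes"}, "stay"),
]
tree = build_dec_tree(examples, ["outlook", "windy"])
```

Here the root tests outlook, the rainy branch is a leaf (stay), and the sunny branch needs one more test on windy.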


Entropy(T) = - SUM over all classes t of pt * log2(pt)

pt ... probability (relative frequency) of class t in T
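
As a small sketch, the entropy of a list of class labels can be computed like this. The function name entropy and the play/stay labels are illustrative assumptions, and pt is estimated as the relative frequency of class t.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total)
               for n in Counter(labels).values())


mixed = entropy(["play", "play", "stay", "stay"])  # 1.0: two equally likely classes
pure = entropy(["play", "play", "play", "play"])   # 0: only one class, no confusion
```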

How to select the attribute? Use information gain: the reduction in entropy obtained by splitting on the attribute (entropy measures confusion, i.e. how mixed the classes are).

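GAIN(T,A) = Entropy(T) - SUM over all values v of A of (|Tv| / |T|) * Entropy(Tv)

Tv ... the subset of T in which attribute A has value v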
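
A hedged sketch of attribute selection by information gain, using the standard definition (entropy of T minus the size-weighted entropy of the subsets produced by the split). The function names, the data layout and the toy examples are assumptions for illustration only.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total)
               for n in Counter(labels).values())


def gain(examples, attribute):
    """Entropy of all labels minus the weighted entropy after splitting."""
    labels = [label for _, label in examples]
    subsets = {}
    for features, label in examples:
        subsets.setdefault(features[attribute], []).append(label)
    remainder = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder


examples = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "play"),
    ({"outlook": "rainy", "windy": "no"}, "stay"),
    ({"outlook": "rainy", "windy": "yes"}, "stay"),
]
# Select the attribute with the highest gain.
best = max(["outlook", "windy"], key=lambda a: gain(examples, a))
```

In this toy data outlook separates the classes perfectly (gain 1.0) while windy tells us nothing (gain 0.0), so outlook is selected.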

The Word Sense Disambiguation Problem

One word can have several different meanings.
Example word: chair
Sentence examples:
I sit on my new chair.
The chair of the local newspaper earns an unknown amount of money.

  1. chair - a kind of furniture

  2. chair - a role in an institution
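
To make the task concrete, here is a toy sketch that picks a sense of chair by counting context words that overlap with a cue list for each sense. The cue words are hypothetical and hand-picked for the two example sentences. In a machine learning setting they would correspond to features learned from training data.

```python
# Hypothetical cue words for each sense (hand-picked, not learned).
SENSE_CUES = {
    "chair (furniture)": {"sit", "wooden", "table", "new"},
    "chair (role in institution)": {"newspaper", "committee", "earns", "elected"},
}


def disambiguate(sentence):
    """Return the sense whose cue words overlap most with the sentence."""
    words = {w.strip(".,").lower() for w in sentence.split()}
    scores = {sense: len(words & cues) for sense, cues in SENSE_CUES.items()}
    return max(scores, key=scores.get)


first = disambiguate("I sit on my new chair.")
second = disambiguate(
    "The chair of the local newspaper earns an unknown amount of money."
)
```

The first sentence is assigned the furniture sense and the second the institutional role. On a tie the sketch just returns the first sense listed, which a real classifier would have to handle better.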

Lexical Matrix (WordNet)

The rows of the lexical matrix are synsets.
Synset: a set of synonyms - example {home#1, habitation#1}
The senses of one word are in its column - example home = ("place to live", "cell on a chessboard")
This is a lexical relation.

Semantic relations are not stored in the lexical matrix!
Example - hypernyms are not stored in the lexical matrix.
Semantic relations hold between synsets.
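
A small sketch of the lexical matrix as a data structure, assuming rows are keyed by a short meaning description and word senses are written as word#n. The entries follow the home example from these notes and are not copied from WordNet. The HYPERNYMS table is a hypothetical entry that only shows that semantic relations are kept separately, between synsets.

```python
# Rows of the matrix: one synset (set of word senses) per meaning.
SYNSETS = {
    "place to live": {"home#1", "habitation#1"},
    "cell on a chessboard": {"home#2"},
}

# Semantic relations are stored outside the matrix, between synsets
# (hypothetical entry, for illustration only).
HYPERNYMS = {"place to live": "building"}


def column(word):
    """Senses of one word: every row that contains a sense of that word."""
    return sorted(meaning for meaning, senses in SYNSETS.items()
                  if any(s.split("#")[0] == word for s in senses))


def synonyms(word):
    """Other words sharing at least one row (synset) with the given word."""
    result = set()
    for senses in SYNSETS.values():
        words = {s.split("#")[0] for s in senses}
        if word in words:
            result |= words - {word}
    return sorted(result)


home_senses = column("home")      # both meanings of "home"
home_synonyms = synonyms("home")  # words in the same row as some sense of "home"
```

Reading a row gives synonymy, reading a column gives the different senses of one word. Hypernymy cannot be read from either and needs the separate table.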