T-61.5020 Statistical Natural Language Processing
Exercises 2 -- Entropy and perplexity
Version 1.0
W | (translation) | P(W) |
'kissa' | (cat) | |
'tuuli' | (wind) | |
'kiipeilijä' | (climber) | |
'naukaisi' | (meowed) | |
'tuivertaa' | (blows) | |
'katosi' | (disappeared) |
| 'naukaisi' | 'tuivertaa' | 'katosi' |
'kissa' | 0 | |||
'tuuli' | ||||
'kiipeilijä' | 0 | |||
Model 1 | Model 2 |
P(word='kissa')=0.1 | P(word=subject)=0.33 |
P(word='koira')=0.1 | P(word=verb)=0.33 |
P(word='valas')=0.1 | P(word=object)=0.33 |
P(word='kala')=0.1 | |
P(word='istui')=0.1 | |
P(word='menee')=0.1 | |
P(word='on')=0.1 | |
P(word='puuhun')=0.1 | |
P(word='kuuhun')=0.1 | |
P(word='suuhun')=0.1 |
Model 3
P(word='kissa' | word=first) = 0.25
P(word='koira' | word=first) = 0.25
P(word='valas' | word=first) = 0.25
P(word='kala' | word=first) = 0.25
P(word='istui' | previous_word ∈ {'kissa','koira','valas','kala'}) = 0.33
P(word='menee' | previous_word ∈ {'kissa','koira','valas','kala'}) = 0.33
P(word='on' | previous_word ∈ {'kissa','koira','valas','kala'}) = 0.33
P(word='puuhun' | previous_word ∈ {'istui','menee','on'}) = 0.33
P(word='kuuhun' | previous_word ∈ {'istui','menee','on'}) = 0.33
P(word='suuhun' | previous_word ∈ {'istui','menee','on'}) = 0.33
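As a rough sketch of how the uncertainty in these tables could be quantified, the snippet below computes the Shannon entropy of the individual distributions. The helper name `entropy_bits` is hypothetical, and the rounded table value 0.33 is assumed to stand for exactly 1/3.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits of a discrete distribution (hypothetical helper)."""
    # Terms with p == 0 contribute nothing, so they are skipped.
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Model 1: ten words, each with probability 0.1.
model1 = [0.1] * 10
# Model 2, and each verb/third-word step of Model 3: three outcomes.
# Assumption: the table value 0.33 is a rounded 1/3.
three_way = [1 / 3] * 3
# Model 3, first word: four nouns with probability 0.25 each.
first_word = [0.25] * 4

print(entropy_bits(model1))
print(entropy_bits(three_way))
print(entropy_bits(first_word))
```

Since every distribution here is uniform, each entropy equals log2 of the number of outcomes: about 3.32 bits for ten words, about 1.58 bits for three outcomes, and 2 bits for four.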
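Perplexity can be defined as the inverse of the geometric mean of the probabilities that the model assigns to the words w_1, ..., w_N of a test sequence:

$$\mathrm{Perp}(w_1, \dots, w_N) = \left( \prod_{i=1}^{N} P(w_i \mid w_1, \dots, w_{i-1}) \right)^{-1/N}$$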
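The inverse-geometric-mean definition can be sketched in code for the three models in the tables. This is a minimal illustration under stated assumptions: the test sentence 'kissa menee puuhun' is a hypothetical example and is not taken from the exercise, all function names are made up for this sketch, 0.33 is read as 1/3, and Model 2 is assumed to score every position by its category probability.

```python
import math

NOUNS = {"kissa", "koira", "valas", "kala"}
VERBS = {"istui", "menee", "on"}
TARGETS = {"puuhun", "kuuhun", "suuhun"}
VOCAB = NOUNS | VERBS | TARGETS

def perplexity(probs):
    """Inverse geometric mean of per-word probabilities (non-empty list)."""
    if any(p == 0 for p in probs):
        return math.inf  # a zero-probability word makes perplexity infinite
    mean_log = sum(math.log(p) for p in probs) / len(probs)
    return math.exp(-mean_log)

def model1_probs(words):
    # Model 1: each of the ten vocabulary words has probability 0.1.
    return [0.1 if w in VOCAB else 0.0 for w in words]

def model2_probs(words):
    # Model 2 (assumption): each position gets a category probability of 1/3.
    return [1 / 3 for _ in words]

def model3_probs(words):
    # Model 3: noun first, then verb after noun, then third word after verb.
    probs, prev = [], None
    for w in words:
        if prev is None and w in NOUNS:
            p = 0.25
        elif prev in NOUNS and w in VERBS:
            p = 1 / 3
        elif prev in VERBS and w in TARGETS:
            p = 1 / 3
        else:
            p = 0.0
        probs.append(p)
        prev = w
    return probs

sentence = ["kissa", "menee", "puuhun"]  # hypothetical test sentence
for name, fn in [("Model 1", model1_probs),
                 ("Model 2", model2_probs),
                 ("Model 3", model3_probs)]:
    print(name, round(perplexity(fn(sentence)), 2))
```

For this sentence the perplexities come out as 10 for Model 1, 3 for Model 2, and the cube root of 36 (about 3.30) for Model 3. Working in log space avoids underflow when the product runs over many words.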