T-61.5020 Statistical Natural Language Processing
Exercises 2 -- Entropy and perplexity
Version 1.0
W | | P(W) |
'kissa' | (cat) | [missing] |
'tuuli' | (wind) | [missing] |
'kiipeilijä' | (climber) | [missing] |
'naukaisi' | (meowed) | [missing] |
'tuivertaa' | (blows) | [missing] |
'katosi' | (disappeared) | [missing] |
 | 'naukaisi' | 'tuivertaa' | 'katosi' | |
'kissa' | [missing] | 0 | [missing] | [missing] |
'tuuli' | [missing] | [missing] | [missing] | [missing] |
'kiipeilijä' | [missing] | 0 | [missing] | [missing] |
 | [missing] | [missing] | [missing] | |
Model 1 | Model 2 |
P(word='kissa')=0.1 | P(word=subject)=0.33 |
P(word='koira')=0.1 | P(word=verb)=0.33 |
P(word='valas')=0.1 | P(word=object)=0.33 |
P(word='kala')=0.1 | |
P(word='istui')=0.1 | |
P(word='menee')=0.1 | |
P(word='on')=0.1 | |
P(word='puuhun')=0.1 | |
P(word='kuuhun')=0.1 | |
P(word='suuhun')=0.1 | |
Model 3 | |
P(word='kissa' | word=first) | =0.25 |
P(word='koira' | word=first) | =0.25 |
P(word='valas' | word=first) | =0.25 |
P(word='kala' | word=first) | =0.25 |
P(word='istui' | previous_word [missing]) | =0.33 |
P(word='menee' | previous_word [missing]) | =0.33 |
P(word='on' | previous_word [missing]) | =0.33 |
P(word='puuhun' | previous_word [missing]) | =0.33 |
P(word='kuuhun' | previous_word [missing]) | =0.33 |
P(word='suuhun' | previous_word [missing]) | =0.33 |
Perplexity can be defined as the inverse of the geometric mean of the probabilities:
$$\mathrm{Perp}(w_1, \dots, w_N) = \left( \prod_{i=1}^{N} P(w_i \mid w_1, \dots, w_{i-1}) \right)^{-1/N}$$
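
As a minimal sketch of this definition, the Python snippet below computes perplexity as the inverse geometric mean of a list of per-word probabilities, working in log space to avoid underflow on long sequences. The function name `perplexity` and the three-word test sentence are assumptions made for illustration. The probabilities are those of Model 1, which gives every one of its ten words probability 0.1.

```python
import math


def perplexity(probs):
    """Inverse of the geometric mean of the given per-word probabilities."""
    if not probs:
        raise ValueError("need at least one probability")
    if any(p < 0 for p in probs):
        raise ValueError("probabilities must be non-negative")
    if any(p == 0 for p in probs):
        # A single zero-probability word makes the geometric mean zero.
        return math.inf
    # Geometric mean = exp(mean of logs); perplexity is its inverse.
    mean_log = sum(math.log(p) for p in probs) / len(probs)
    return math.exp(-mean_log)


# Model 1 from the table: a uniform distribution over ten words.
model1 = {
    w: 0.1
    for w in ["kissa", "koira", "valas", "kala", "istui",
              "menee", "on", "puuhun", "kuuhun", "suuhun"]
}

# Hypothetical test sentence, chosen only for illustration.
sentence = ["kissa", "istui", "puuhun"]
print(round(perplexity([model1[w] for w in sentence]), 6))  # 10.0
```

For a uniform model such as Model 1 the result does not depend on the sentence: the perplexity equals the vocabulary size, here 10.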