We have a speech signal from which we have calculated the feature vectors. Table 1 gives the emission probabilities of the edges for each vector.
Assume that we have trained two different statistical word segmentations, A and B, from a training corpus. Using the same corpus, we have trained three language models of different sizes for the units of each segmentation. The size of a model is the number of n-grams it contains. From a separate 100,000-word evaluation corpus we have calculated token-wise cross-entropies for all of the models. The results are presented in Table 2.
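Because segmentations A and B split the same text into different numbers of tokens, token-wise cross-entropies are not directly comparable between them. One common remedy is to normalize by the number of words instead. The following is a minimal sketch of that conversion, assuming the cross-entropies are measured in bits and that the number of tokens each segmentation produces on the evaluation corpus is known. The function names and the numbers in the usage note are hypothetical and are not taken from Table 2.

```python
def word_level_cross_entropy(token_cross_entropy, n_tokens, n_words):
    """Convert a per-token cross-entropy (in bits) to a per-word one.

    Assumption: token_cross_entropy is the average number of bits per
    token, so multiplying by n_tokens gives the total bits for the corpus.
    """
    if n_tokens <= 0 or n_words <= 0:
        raise ValueError("n_tokens and n_words must be positive")
    total_bits = token_cross_entropy * n_tokens
    return total_bits / n_words


def perplexity(cross_entropy_bits):
    """Perplexity corresponding to a cross-entropy given in bits."""
    return 2.0 ** cross_entropy_bits
```

For example, with made-up values, `word_level_cross_entropy(4.0, 200000, 100000)` gives 8.0 bits per word. A segmentation with a lower token-wise cross-entropy can still have a higher word-wise cross-entropy if it produces more tokens per word.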
In addition, the models have been tested in a speech recognition system. The recognition results are evaluated with word error rate (WER), which is the number of word errors (substitutions, deletions, and insertions) as a percentage of the number of words in the reference transcript. The results are in Table 3.
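To make the WER measure concrete, here is a small sketch that computes it as the word-level edit distance between a reference and a hypothesis, divided by the reference length. It assumes whitespace tokenization and equal cost for substitutions, deletions, and insertions. The function name is illustrative and this is not the scoring tool used for Table 3.

```python
def word_error_rate(reference, hypothesis):
    """Return WER in percent using word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr

    return 100.0 * prev[-1] / len(ref)
```

Because insertions are counted as errors, the value can exceed 100 when the hypothesis is much longer than the reference.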
Find out which of the segmentations works better based on the cross-entropy and speech recognition results. How reliable are the conclusions that can be drawn from this data?