We introduce an expectation-maximization (EM) type algorithm for
maximum likelihood optimization of conditional densities. It is
applicable to hidden-variable models whose distributions are from
the exponential family. The algorithm can alternatively be viewed
as automatic step size selection for gradient ascent, where
computation is traded off against the guarantee that each step
increases the likelihood. This tradeoff makes the algorithm
computationally more feasible than the earlier conditional EM
algorithm. The method provides a theoretical basis for the extended
Baum-Welch algorithms used in discriminative hidden Markov models in
speech recognition, and compares favourably with the current best
method in our experiments.
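The gradient-ascent view can be illustrated with a minimal sketch. This is not the paper's algorithm itself, only the underlying idea: maximize a conditional log-likelihood (here, logistic regression, an exponential-family conditional model) by gradient ascent, shrinking the step until the likelihood is guaranteed to increase, so every accepted step is monotone. All function names and the synthetic data are illustrative assumptions.

\begin{verbatim}
import numpy as np

def conditional_log_likelihood(w, X, y):
    """Conditional log-likelihood log p(y | x; w) of logistic regression."""
    z = X @ w
    # y*z - log(1 + e^z), computed stably via logaddexp
    return np.sum(y * z - np.logaddexp(0.0, z))

def gradient(w, X, y):
    """Gradient of the conditional log-likelihood."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return X.T @ (y - p)

def monotone_gradient_ascent(X, y, step0=1.0, tol=1e-6, max_iter=1000):
    """Gradient ascent with backtracking step-size selection: the step
    is halved until the conditional log-likelihood increases, so each
    accepted update is guaranteed to improve the objective."""
    w = np.zeros(X.shape[1])
    ll = conditional_log_likelihood(w, X, y)
    for _ in range(max_iter):
        g = gradient(w, X, y)
        step = step0
        while True:
            w_new = w + step * g
            ll_new = conditional_log_likelihood(w_new, X, y)
            if ll_new > ll:       # accept: likelihood increased
                break
            step *= 0.5           # otherwise shrink the step
            if step < 1e-12:      # no improving step found: converged
                return w
        if ll_new - ll < tol:
            return w_new
        w, ll = w_new, ll_new
    return w

# Usage with synthetic data (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)
print(monotone_gradient_ascent(X, y))
\end{verbatim}

The backtracking loop is where computation is traded for the monotonicity guarantee: each halving costs one extra likelihood evaluation, but the accepted step can never decrease the objective.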
This work was supported by the Academy of Finland, decision
no. 202209, and by the IST Programme of the European
Community, under the PASCAL Network of Excellence, IST-2002-506778.