Theoretically the most satisfying contrast function in the multi-unit case is, in our view, mutual information.
Using the concept of differential entropy defined in Eq. (6), one defines the mutual information I between m (scalar) random variables y_i, i = 1, ..., m, as follows:

I(y_1, y_2, \ldots, y_m) = \sum_{i=1}^{m} H(y_i) - H(\mathbf{y}),   (16)

where \mathbf{y} denotes the random vector with components y_1, ..., y_m. Mutual information is a natural measure of the dependence between random variables: it is always nonnegative, and zero if and only if the variables are statistically independent.
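To make the definition concrete, the following is a minimal numerical sketch (not from the original text) that plugs simple histogram-based entropy estimates into Eq. (16); the bin count, the Laplacian sources and the mixing matrix A are arbitrary choices for illustration.

```python
import numpy as np

def entropy_hist(samples, bins=20):
    # Plug-in estimate of differential entropy H(y) = -\int f log f
    # from an equal-width histogram density estimate.
    samples = np.atleast_2d(samples.T).T          # shape (n, d)
    hist, edges = np.histogramdd(samples, bins=bins, density=True)
    bin_volume = np.prod([e[1] - e[0] for e in edges])
    p = hist[hist > 0]
    return -np.sum(p * np.log(p)) * bin_volume

def mutual_information(y, bins=20):
    # I(y_1, ..., y_m) = sum_i H(y_i) - H(y), cf. Eq. (16).
    marginals = sum(entropy_hist(y[:, i], bins) for i in range(y.shape[1]))
    return marginals - entropy_hist(y, bins)

rng = np.random.default_rng(0)
s = rng.laplace(size=(10_000, 2))                 # independent sources
A = np.array([[1.0, 0.5], [0.3, 1.0]])            # hypothetical mixing matrix
x = s @ A.T                                       # mixtures are dependent
print(mutual_information(s))                      # near zero (up to estimation bias)
print(mutual_information(x))                      # clearly positive
```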
The use of mutual information can also be motivated using the Kullback-Leibler divergence, defined for two probability densities f_1 and f_2 as

\delta(f_1, f_2) = \int f_1(\mathbf{y}) \log \frac{f_1(\mathbf{y})}{f_2(\mathbf{y})} \, d\mathbf{y}.   (17)

The Kullback-Leibler divergence is always nonnegative, and zero if and only if the two densities are equal, so it can be considered a kind of distance between probability densities (though it is not symmetric). In particular, the mutual information in (16) is the Kullback-Leibler divergence between the joint density of \mathbf{y} and the product of its marginal densities, and thus vanishes exactly when the y_i are independent.
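Spelling out this connection (writing f for the joint density of \mathbf{y} and f_i for its marginals, notation introduced here only for this derivation):

\delta\Big(f, \prod_i f_i\Big)
= \int f(\mathbf{y}) \log \frac{f(\mathbf{y})}{\prod_{i=1}^{m} f_i(y_i)} \, d\mathbf{y}
= -H(\mathbf{y}) - \sum_{i=1}^{m} \int f(\mathbf{y}) \log f_i(y_i) \, d\mathbf{y}
= \sum_{i=1}^{m} H(y_i) - H(\mathbf{y})
= I(y_1, \ldots, y_m),

since integrating f(\mathbf{y}) \log f_i(y_i) over the remaining components gives \int f_i(y_i) \log f_i(y_i) \, dy_i = -H(y_i).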
The Kullback-Leibler divergence also reveals a close connection between minimizing mutual information and maximizing likelihood. In fact, the likelihood can be represented as a Kullback-Leibler divergence between the observed density and the factorized density assumed in the model [26]. Both methods therefore minimize the Kullback-Leibler divergence between the observed density and a factorized density; indeed, the two factorized densities are asymptotically equivalent, provided the density is accurately estimated as part of the ML estimation procedure.
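As a brief sketch of why this holds (writing \mathbf{x}(t), t = 1, \ldots, T, for the observed sample with true density f, and q for the factorized model density; this notation is introduced here only for illustration): by the law of large numbers,

\frac{1}{T} \sum_{t=1}^{T} \log q(\mathbf{x}(t)) \;\to\; E\{\log q(\mathbf{x})\} = -H(\mathbf{x}) - \delta(f, q),

and since the entropy H(\mathbf{x}) does not depend on the model, maximizing the likelihood is asymptotically equivalent to minimizing the Kullback-Leibler divergence \delta(f, q) to the factorized density q.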
The problem with mutual information is that it is difficult to estimate. As was mentioned in Section 2, to use the definition of entropy, one needs an estimate of the density. This problem has severely restricted the use of mutual information in ICA estimation. Some authors have used approximations of mutual information based on polynomial density expansions [36,1], which lead to the use of higher-order cumulants (for the definition of cumulants, see Appendix A).
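Continuing the toy sketch above (the `mutual_information` helper and the mixture `x` are the hypothetical ones defined there), the dependence on the underlying density estimate is easy to see by varying the histogram resolution:

```python
# The plug-in estimate depends strongly on how the density is estimated:
for bins in (5, 20, 80):
    print(bins, mutual_information(x, bins=bins))
# With too few bins the dependence is smoothed away; with too many,
# the joint histogram becomes sparse and the estimate is badly biased.
```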
The polynomial density expansions are related to the Taylor expansion. They give an approximation of a probability density f(.) of a scalar random variable y using its higher-order cumulants. For example, the first terms of the Edgeworth expansion give, for a scalar random variable y of zero mean and unit variance [88]:

f(\xi) \approx \varphi(\xi)\left(1 + \kappa_3(y)\,\frac{h_3(\xi)}{3!} + \kappa_4(y)\,\frac{h_4(\xi)}{4!} + \cdots\right),   (18)

where \varphi(\xi) denotes the standardized Gaussian density, h_i(\xi) is the i-th Chebyshev-Hermite polynomial, and \kappa_i(y) is the i-th cumulant of y.
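The following is a minimal numerical sketch of the truncated expansion in Eq. (18), with the Hermite polynomials h_3, h_4 written out explicitly and sample cumulants plugged in; the gamma-distributed test data and the evaluation grid are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.gamma(shape=8.0, size=100_000)       # mildly non-Gaussian test data
y = (y - y.mean()) / y.std()                 # zero mean, unit variance

k3 = np.mean(y**3)                           # third cumulant (skewness)
k4 = np.mean(y**4) - 3.0                     # fourth cumulant (kurtosis)

def edgeworth_pdf(xi):
    # phi(xi) * (1 + k3*h3(xi)/3! + k4*h4(xi)/4!), cf. Eq. (18).
    phi = np.exp(-xi**2 / 2) / np.sqrt(2 * np.pi)
    h3 = xi**3 - 3 * xi                      # Chebyshev-Hermite polynomial h3
    h4 = xi**4 - 6 * xi**2 + 3               # Chebyshev-Hermite polynomial h4
    return phi * (1 + k3 * h3 / 6 + k4 * h4 / 24)

# Compare against a histogram of the data at a few points.
hist, edges = np.histogram(y, bins=60, range=(-3, 3), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
for x0 in np.linspace(-2.5, 2.5, 6):
    emp = hist[np.argmin(np.abs(centers - x0))]
    print(f"xi={x0:+.2f}  edgeworth={edgeworth_pdf(x0):.3f}  empirical={emp:.3f}")
```

Note that such truncated expansions can even take negative values in the tails, one symptom of the limited validity discussed next.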
Cumulant-based approximations such as the one in (19) simplify the use of mutual information considerably. The approximation is valid, however, only when f(.) is not far from the Gaussian density function, and may produce poor results when this is not the case. More sophisticated approximations of mutual information can be constructed by using the approximations of differential entropy that were introduced in [64], based on the maximum entropy principle. In these approximations, the cumulants are replaced by more general measures of nongaussianity; see Sections 4.4.3 and 4.4.1. Minimization of such an approximation of mutual information was introduced in [65].
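To illustrate the kind of nongaussianity measures meant here, the sketch below contrasts a cumulant-based negentropy approximation with a nonpolynomial one built on G(u) = log cosh u; the formulas follow commonly used forms, but the constants and normalizations are illustrative only and are not taken from [64].

```python
import numpy as np

rng = np.random.default_rng(2)

def standardize(y):
    return (y - y.mean()) / y.std()

def negentropy_cumulant(y):
    # Cumulant-based approximation: J(y) ~ E{y^3}^2 / 12 + kurt(y)^2 / 48.
    y = standardize(y)
    kurt = np.mean(y**4) - 3.0
    return np.mean(y**3)**2 / 12 + kurt**2 / 48

def negentropy_logcosh(y, n_ref=1_000_000):
    # Nonpolynomial measure: J(y) ~ (E{G(y)} - E{G(nu)})^2 with
    # G(u) = log cosh u and nu standard Gaussian (reference term
    # estimated here by Monte Carlo).
    y = standardize(y)
    g_gauss = np.mean(np.log(np.cosh(rng.standard_normal(n_ref))))
    return (np.mean(np.log(np.cosh(y))) - g_gauss) ** 2

for name, y in [("gaussian", rng.standard_normal(50_000)),
                ("laplace ", rng.laplace(size=50_000)),
                ("uniform ", rng.uniform(-1, 1, size=50_000))]:
    print(name, round(negentropy_cumulant(y), 4), round(negentropy_logcosh(y), 4))
# Both measures are close to zero for Gaussian data and positive otherwise;
# the log cosh version is less sensitive to outliers than the cumulants.
```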