Since mutual information is the natural information-theoretic
measure of the independence of random variables, we could use it
as the criterion for finding the ICA transform.
In this approach, which is an alternative to the model estimation
approach, we define the ICA
of a random vector
as an invertible transformation as in (6),
where the matrix
is determined so that the
mutual information of the transformed components s_i is minimized.
It is now
obvious from (29) that finding an invertible transformation
that
minimizes the mutual information is roughly equivalent to finding
directions in which the negentropy is maximized. More precisely, it
is roughly equivalent to finding 1-D subspaces such that the
projections in those subspaces have maximum negentropy.
Rigorously speaking, (29) shows that ICA estimation by
minimization of mutual information is equivalent to maximizing the sum
of nongaussianities of the estimates, when
the estimates are constrained to be uncorrelated.
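To make this equivalence concrete, one standard form of the relation being invoked here can be sketched as follows (a sketch from memory of the usual formulation; the exact constant depends on the definitions introduced with (29)). When the estimates y_i = w_i^T x are constrained to be uncorrelated and of unit variance,

\[
I(y_1, \ldots, y_n) = C - \sum_i J(y_i),
\]

where C is a constant that does not depend on the demixing matrix and J denotes negentropy. Minimizing the mutual information I on the left is therefore the same as maximizing the total negentropy, i.e. the sum of nongaussianities, on the right.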
The constraint of uncorrelatedness is in fact not necessary, but
simplifies the computations considerably, as one can then use the simpler
form in (29) instead of the more complicated form in (28).
Thus, we see that the formulation of ICA as minimization of mutual information gives another rigorous justification of our more heuristically introduced idea of finding maximally nongaussian directions.
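The equivalence can also be illustrated numerically. The following is a minimal sketch (the sources, the mixing matrix, and the kurtosis-based contrast are illustrative assumptions, not taken from the text): the data are whitened so that uncorrelatedness of the estimates reduces to orthogonality of the weight vectors, and a simple nongaussianity measure is then maximized over orthogonal projections.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Two independent nongaussian sources (illustrative choices): a uniform
# source (sub-gaussian) and a Laplacian source (super-gaussian), both
# scaled to unit variance.
s = np.vstack([
    rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), n),
    rng.laplace(0.0, 1.0 / np.sqrt(2.0), n),
])
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])          # hypothetical mixing matrix
x = A @ s                           # observed mixtures

# Whitening: afterwards the covariance of z is the identity, so
# uncorrelated unit-variance estimates correspond to orthonormal rows of W.
x = x - x.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(x))
z = (E @ np.diag(d ** -0.5) @ E.T) @ x

# Fixed-point iteration maximizing the kurtosis contrast (a crude proxy
# for negentropy); symmetric orthogonalization of W enforces the
# uncorrelatedness constraint at every step.
W = rng.standard_normal((2, 2))
for _ in range(100):
    W = ((W @ z) ** 3) @ z.T / n - 3.0 * W
    U, _, Vt = np.linalg.svd(W)     # W <- (W W^T)^(-1/2) W
    W = U @ Vt

y = W @ z                           # estimated independent components
```

As is standard in ICA, the estimates y recover the sources only up to permutation and sign, which is why a comparison against s should look at the absolute correlations between estimated and true components.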