It is possible to give a unifying view that encompasses most of the important contrast functions for ICA.

First of all, we saw above that the principles of mutual information and maximum likelihood are essentially equivalent [26]. Second, as already discussed above, the infomax principle is equivalent to maximum likelihood estimation [23,123]. The nonlinear PCA criteria can be shown to be equivalent to maximum likelihood estimation as well [86]. On the other hand, it was discussed above how some of the cumulant-based contrasts can be considered as approximations of mutual information. Thus it can be seen that most of the multi-unit contrast functions are, if not strictly equivalent, at least very closely related.

However, an important reservation is necessary here: for these
equivalences to be at all valid, the densities *f*_{i} used in the likelihood
must be sufficiently good approximations of the true densities of
the independent components. At the minimum, we need one bit of
information on each independent component: whether it is sub- or
super-Gaussian [28,26,73]. This information
must either be available a priori or be estimated from the data; see
[28,26,96,124]. The situation is quite different
for most contrast functions based on
cumulants, and for the general contrast functions in
Section 4.4.3, which directly estimate independent
components of almost any non-Gaussian distribution.
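As a concrete illustration, one common way to obtain this one bit of information is from the sign of the sample (excess) kurtosis. The sketch below is a minimal illustration, not any particular estimator from the references above; the function names are hypothetical:

```python
import numpy as np

def excess_kurtosis(y):
    """Sample excess kurtosis E{z^4} - 3 of the standardized signal z."""
    z = (y - y.mean()) / y.std()
    return np.mean(z ** 4) - 3.0

def gaussianity_class(y):
    """'super' for positive excess kurtosis (spiky, heavy-tailed),
    'sub' for negative (flat, e.g. uniform)."""
    return "super" if excess_kurtosis(y) > 0 else "sub"

rng = np.random.default_rng(0)
laplace = rng.laplace(size=100_000)          # super-Gaussian, excess kurtosis ~ +3
uniform = rng.uniform(-1, 1, size=100_000)   # sub-Gaussian, excess kurtosis ~ -1.2
```

For nearly Gaussian components the kurtosis estimate fluctuates around zero, which is exactly why this one bit of information must be reliable, whether from prior knowledge or from estimation.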

As for the one-unit contrast functions, the situation is very similar. Negentropy can be approximated by cumulants, or by the general contrast functions in Section 4.4.3, which shows that these contrast functions, too, are very closely related.
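To make this concrete, the sketch below computes a one-term approximation of the form J(y) ≈ (E{G(y)} − E{G(ν)})², with the common choice G(u) = log cosh u and ν a standard Gaussian variable. This is only one representative of the family of approximations discussed here, and the Gaussian reference expectation is estimated by sampling rather than taken from a table:

```python
import numpy as np

def negentropy_approx(y, n_ref=200_000, seed=0):
    """One-term negentropy approximation J(y) ~ (E[G(y)] - E[G(v)])^2,
    with G(u) = log cosh(u) and v standard Gaussian; y is standardized first."""
    z = (y - y.mean()) / y.std()
    ref = np.random.default_rng(seed).standard_normal(n_ref)
    G = lambda u: np.log(np.cosh(u))
    return (G(z).mean() - G(ref).mean()) ** 2

rng = np.random.default_rng(1)
laplace = rng.laplace(size=100_000)     # clearly non-Gaussian: J well above zero
gauss = rng.standard_normal(100_000)    # Gaussian: J close to zero
```

Being a squared difference from the Gaussian reference, the approximation is nonnegative and (up to sampling noise) vanishes exactly for Gaussian projections, as a negentropy-based contrast should.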

In fact, looking at the formulas for likelihood and mutual information in
(13) and (16), one sees that they can be considered as
sums of one-unit contrast functions plus a penalizing term that
prevents the vectors
from converging to the same directions.
This could be called a 'soft' form of decorrelation.
Thus we see that almost all the contrast functions could be described
by the single intuitive principle:
*Find the most non-Gaussian projections, and use some (soft)
decorrelation* to make sure that different independent components are found.

So, the choice of contrast function is essentially reduced to the
simple *choice between estimating all the independent components in
parallel, and estimating just a few of them (possibly one-by-one)*.
This corresponds approximately to choosing between symmetric and
hierarchical decorrelation, a choice familiar from PCA learning
[114,111].
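The two options can be sketched as follows (a minimal NumPy illustration; the function names are ours): symmetric decorrelation orthogonalizes all the estimated vectors at once, while the hierarchical (deflationary) scheme fixes them one by one, Gram-Schmidt style.

```python
import numpy as np

def symmetric_decorrelation(W):
    """Parallel scheme: W <- (W W^T)^(-1/2) W; all rows are treated equally."""
    d, E = np.linalg.eigh(W @ W.T)
    return E @ np.diag(d ** -0.5) @ E.T @ W

def deflationary_decorrelation(W):
    """One-by-one scheme: each row is decorrelated only against the
    rows already fixed (Gram-Schmidt), then normalized."""
    W = W.copy()
    for i in range(W.shape[0]):
        for j in range(i):
            W[i] -= (W[i] @ W[j]) * W[j]
        W[i] /= np.linalg.norm(W[i])
    return W

rng = np.random.default_rng(2)
W = rng.standard_normal((4, 4))
```

Both schemes produce orthonormal rows; the difference is that the deflationary scheme never perturbs the components estimated first, so estimation errors tend to accumulate in the later ones.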

One must also make the less important choice between cumulant-based contrast functions and robust ones (i.e., those based on nonquadratic functions as in (29)); it seems that the robust contrast functions are to be preferred in most applications.
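The reason is robustness against outliers: the fourth power in kurtosis lets a single bad sample dominate the estimate, whereas a slowly growing nonquadratic function such as log cosh is hardly affected. A minimal sketch (the statistic names are ours):

```python
import numpy as np

def kurt(y):
    """Sample excess kurtosis of the standardized signal."""
    z = (y - y.mean()) / y.std()
    return np.mean(z ** 4) - 3.0

def logcosh_stat(y):
    """Robust nonquadratic statistic E{log cosh(z)} of the standardized signal."""
    z = (y - y.mean()) / y.std()
    return np.mean(np.log(np.cosh(z)))

rng = np.random.default_rng(3)
y = rng.standard_normal(100_000)
y_out = np.append(y, 50.0)  # one gross outlier among 100,000 samples

# kurt(y) is near 0, while kurt(y_out) jumps by orders of magnitude;
# logcosh_stat changes only slightly.
```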