To avoid the problems encountered with the preceding objective functions, new one-unit contrast functions for ICA were developed in [60,64,65]. Such contrast functions try to combine the positive properties of the preceding contrast functions: they should have statistically appealing properties (in contrast to cumulants), require no prior knowledge of the densities of the independent components (in contrast to basic maximum likelihood estimation), allow a simple algorithmic implementation (in contrast to the maximum likelihood approach with simultaneous estimation of the densities), and be simple to analyze (in contrast to the non-linear cross-correlation and non-linear PCA approaches). The generalized contrast functions (introduced in [60]), which can be considered generalizations of kurtosis, seem to fulfill these requirements.
To begin with, note that one intuitive interpretation of contrast functions
is that they are measures of non-normality [36]. A family
of such measures of non-normality could be constructed using
practically any function G, and considering the difference of the
expectation of G for the actual data and the expectation of G for
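Gaussian data. In other words, we can define a contrast function J that measures the non-normality of a zero-mean random variable y using any even, non-quadratic, sufficiently smooth function G as
follows:

$$J_G(y) = \left| E_y\{G(y)\} - E_\nu\{G(\nu)\} \right|^p \qquad (28)$$

where ν is a standardized Gaussian random variable, y is assumed to be normalized to unit variance, and the exponent p is typically 1 or 2. The subscripts denote expectation with respect to y and ν.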
Clearly, J_G can be considered a generalization of (the modulus of) kurtosis. For G(y)=y^4 (with p=1), J_G becomes simply the modulus of kurtosis of y. Note that G must not be quadratic, because then J_G would be trivially zero for all distributions. Thus, it seems plausible that J_G in (28) could be a contrast function in the same way as kurtosis. The fact that J_G is indeed a contrast function in a suitable sense (locally) is shown in [61,73]. In fact, for p=2, J_G coincides with the approximation of negentropy given in (26).
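As a rough numerical illustration of the kurtosis special case, the following Python sketch estimates the generalized contrast from samples, taking the absolute difference between the sample mean of G for standardized data and the expectation of G for a standardized Gaussian variable, raised to the power p. The function name `contrast_JG` and its parameter `gauss_expectation` are hypothetical names chosen for this example, not part of any referenced implementation. For G(y)=y^4 the Gaussian expectation is 3, and since a unit-variance y has kurtosis E{y^4} - 3, the estimate should be close to the modulus of kurtosis.

```python
import numpy as np

def contrast_JG(y, G, gauss_expectation, p=1):
    """Sample estimate of |E{G(y)} - E{G(nu)}|^p (hypothetical helper).

    gauss_expectation is E{G(nu)} for a standardized Gaussian nu,
    supplied by the caller (an assumption of this sketch).
    """
    y = np.asarray(y, dtype=float)
    # The definition assumes zero mean and unit variance, so standardize.
    y = (y - y.mean()) / y.std()
    return abs(G(y).mean() - gauss_expectation) ** p

rng = np.random.default_rng(0)
n = 200_000
samples = {
    "gaussian": rng.standard_normal(n),   # theoretical kurtosis 0
    "laplace": rng.laplace(size=n),       # super-Gaussian, theoretical kurtosis 3
    "uniform": rng.uniform(-1.0, 1.0, n), # sub-Gaussian, theoretical kurtosis -1.2
}

# G(y) = y^4 with E{nu^4} = 3 for a standardized Gaussian variable.
results = {
    name: contrast_JG(y, lambda u: u ** 4, gauss_expectation=3.0)
    for name, y in samples.items()
}

for name, value in results.items():
    print(f"{name}: J_G estimate = {value:.3f}")
```

Because of the modulus, the sub-Gaussian uniform sample and the super-Gaussian Laplace sample both yield clearly positive values, while the Gaussian sample yields a value near zero up to sampling error.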
In [62], the finite-sample statistical properties of the
estimators based on optimizing such a general contrast function were
analyzed. It was found that for a suitable choice of G, the statistical
properties of the estimator (asymptotic variance and robustness) are
considerably better than the
properties of the cumulant-based estimators. The following
choices of G were proposed: