Mathematically the simplest one-unit contrast functions are provided
by higher-order cumulants like kurtosis (see Appendix A
for definition).
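Denote by $\mathbf{x}$ the observed data vector, assumed to follow the ICA data model (11). Now, let us search for a linear combination of the observations $x_i$, say $\mathbf{w}^T\mathbf{x}$, such that its kurtosis is maximized or minimized. Obviously, this optimization problem is meaningful only if $\mathbf{w}$ is somehow bounded; let us assume $E\{(\mathbf{w}^T\mathbf{x})^2\}=1$. Using the (unknown) mixing matrix $\mathbf{A}$, let us define $\mathbf{z}=\mathbf{A}^T\mathbf{w}$. Then, using the data model $\mathbf{x}=\mathbf{A}\mathbf{s}$, one obtains $E\{(\mathbf{w}^T\mathbf{x})^2\}=\mathbf{w}^T\mathbf{A}\mathbf{A}^T\mathbf{w}=\|\mathbf{z}\|^2=1$ (recall that $E\{\mathbf{s}\mathbf{s}^T\}=\mathbf{I}$), and the well-known properties of kurtosis (see Appendix A) give

$$\operatorname{kurt}(\mathbf{w}^T\mathbf{x}) = \operatorname{kurt}(\mathbf{w}^T\mathbf{A}\mathbf{s}) = \operatorname{kurt}(\mathbf{z}^T\mathbf{s}) = \sum_{i} z_i^4 \operatorname{kurt}(s_i).$$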
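The relation between the kurtosis of the projection $\mathbf{w}^T\mathbf{x}$ and the kurtoses of the independent components, $\operatorname{kurt}(\mathbf{w}^T\mathbf{x}) = \sum_i z_i^4 \operatorname{kurt}(s_i)$ with $\mathbf{z}=\mathbf{A}^T\mathbf{w}$, can be checked numerically. The following is a minimal sketch, not part of the original derivation. The two source distributions, the mixing matrix, the vector $\mathbf{z}$, and the function names `kurt` and `check_kurtosis_identity` are all illustrative assumptions, and kurtosis is taken here as $E\{y^4\}-3(E\{y^2\})^2$ for zero-mean $y$.

```python
import numpy as np


def kurt(y):
    """Sample kurtosis E{y^4} - 3 (E{y^2})^2 of the centered data."""
    y = y - y.mean()
    return np.mean(y**4) - 3.0 * np.mean(y**2) ** 2


def check_kurtosis_identity(n_samples=200_000, seed=0):
    """Compare kurt(w^T x) with sum_i z_i^4 kurt(s_i) on simulated data."""
    rng = np.random.default_rng(seed)

    # Assumed sources: independent, zero mean, unit variance.
    # Uniform on [-sqrt(3), sqrt(3)] is sub-Gaussian (negative kurtosis);
    # Laplace with scale 1/sqrt(2) is super-Gaussian (positive kurtosis).
    s = np.vstack([
        rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), n_samples),
        rng.laplace(0.0, 1.0 / np.sqrt(2.0), n_samples),
    ])

    # Assumed mixing matrix (any invertible matrix would do here).
    A = np.array([[2.0, 1.0],
                  [1.0, 3.0]])
    x = A @ s  # observed mixtures, x = A s

    # Pick z on the unit sphere and solve A^T w = z for w,
    # so that the constraint E{(w^T x)^2} = ||z||^2 = 1 holds.
    z = np.array([0.6, 0.8])
    w = np.linalg.solve(A.T, z)

    y = w @ x  # the projection w^T x, one value per sample
    lhs = kurt(y)
    rhs = np.sum(z**4 * np.array([kurt(s[0]), kurt(s[1])]))
    return lhs, rhs, np.mean(y**2)


if __name__ == "__main__":
    lhs, rhs, second_moment = check_kurtosis_identity()
    print("kurt(w^T x)            :", lhs)
    print("sum_i z_i^4 kurt(s_i)  :", rhs)
    print("E{(w^T x)^2}           :", second_moment)
```

The two kurtosis values agree only up to sampling error, because the empirical cross-cumulants of a finite sample are not exactly zero.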
Kurtosis has been widely used for one-unit ICA (see, for example, [40,103,71,72]), as well as for projection pursuit [78]. The mathematical simplicity of the cumulants, and especially the possibility of proving global convergence results, as in [40,71,72], has contributed largely to the popularity of cumulant-based (one-unit) contrast functions in ICA, projection pursuit and related fields.

However, it has been shown, for example in [62], that kurtosis often provides a rather poor objective function for the estimation of ICA if the statistical properties of the resulting estimators are considered. Note that although there is no noise in the ICA model (11), neither the independent components nor the mixing matrix can be computed exactly, because the independent components $s_i$ are random variables and, in practice, one only has a finite sample of $\mathbf{x}$. Therefore, the statistical properties of the estimators of $\mathbf{A}$ and of the realizations of $s_i$ can be analyzed just like the properties of any other estimator. Such an analysis was conducted in [62] (see also [28]), and the results show that in terms of robustness and asymptotic variance, the cumulant-based estimators tend to be far from optimal. Intuitively, there are two main reasons for this. Firstly, higher-order cumulants measure mainly the tails of a distribution, and are largely unaffected by structure in the middle of the distribution [46]. Secondly, estimators of higher-order cumulants are highly sensitive to outliers [57]: their value may depend on only a few observations in the tails of the distribution, which may be outliers.
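The sensitivity of kurtosis estimates to outliers can be illustrated with a small simulation. This is a hedged sketch, not the analysis of [62] or [57]: the Gaussian sample, the sample size, the outlier value, and the function names `sample_kurtosis` and `outlier_effect` are assumptions chosen for illustration. Kurtosis is again estimated as $E\{y^4\}-3(E\{y^2\})^2$ on centered data.

```python
import numpy as np


def sample_kurtosis(y):
    """Sample kurtosis E{y^4} - 3 (E{y^2})^2 of the centered data."""
    y = y - y.mean()
    return np.mean(y**4) - 3.0 * np.mean(y**2) ** 2


def outlier_effect(n_samples=10_000, outlier=10.0, seed=0):
    """Return the kurtosis estimate before and after corrupting one sample."""
    rng = np.random.default_rng(seed)

    # Gaussian data: the true kurtosis is zero.
    y = rng.standard_normal(n_samples)
    clean = sample_kurtosis(y)

    # Replace a single observation by a gross outlier (assumed value).
    y_corrupted = y.copy()
    y_corrupted[0] = outlier
    corrupted = sample_kurtosis(y_corrupted)
    return clean, corrupted


if __name__ == "__main__":
    clean, corrupted = outlier_effect()
    print("kurtosis estimate, clean sample    :", clean)
    print("kurtosis estimate, one outlier added:", corrupted)
```

With these assumed settings, the single corrupted observation contributes about $10^4/10^4 = 1$ to the fourth-moment average, so one sample out of 10,000 is enough to move the estimate far from the true value of zero.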