
Approximations of negentropy

The estimation of negentropy is difficult, as mentioned above, and therefore this contrast function remains mainly a theoretical one. In practice, some approximations have to be used. Here we introduce approximations that have very promising properties, and which will be used in the following to derive an efficient method for ICA.

The classical method of approximating negentropy is to use higher-order moments, for example as follows [27]:

\begin{displaymath}
J(y)\approx \frac{1}{12}E\{y^3\}^2 + \frac{1}{48}\,\mbox{kurt}(y)^2
\end{displaymath} (20)

The random variable y is assumed to be of zero mean and unit variance. However, the validity of such approximations may be rather limited. In particular, these approximations suffer from the nonrobustness encountered with kurtosis.
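
As a concrete illustration, the following minimal sketch evaluates (20) on a sample; the function name negentropy_moments and the use of NumPy are choices of this sketch, not part of the original text.

    import numpy as np

    def negentropy_moments(y):
        """Moment-based approximation (20) of negentropy.

        Assumes y is a 1-D sample already standardized to
        zero mean and unit variance.
        """
        skew_term = np.mean(y**3) ** 2 / 12.0
        kurt = np.mean(y**4) - 3.0  # excess kurtosis of a standardized variable
        return skew_term + kurt**2 / 48.0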

To avoid the problems encountered with the preceding approximations of negentropy, new approximations were developed in [18]. These approximations are based on the maximum-entropy principle. In general we obtain the following approximation:

 
\begin{displaymath}
J(y)\approx \sum_{i=1}^p k_i [E\{G_i(y)\}-E\{G_i(\nu)\}]^2,
\end{displaymath} (21)

where the $k_i$ are some positive constants, and $\nu$ is a Gaussian variable of zero mean and unit variance (i.e., standardized). The variable y is assumed to be of zero mean and unit variance, and the functions $G_i$ are some nonquadratic functions [18]. Note that even in cases where this approximation is not very accurate, (21) can be used to construct a measure of nongaussianity that is consistent in the sense that it is always non-negative, and equal to zero if y has a Gaussian distribution.
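
To make (21) concrete, here is a minimal sketch that estimates it by Monte Carlo; the name negentropy_maxent and the sampling-based estimate of $E\{G_i(\nu)\}$ are illustrative assumptions, not prescriptions from [18].

    import numpy as np

    rng = np.random.default_rng(0)
    nu = rng.standard_normal(1_000_000)  # samples standing in for the standardized Gaussian

    def negentropy_maxent(y, Gs, ks):
        """Approximation (21): sum_i k_i * [E{G_i(y)} - E{G_i(nu)}]^2.

        Gs holds the nonquadratic functions G_i, ks the matching
        positive constants k_i; E{G_i(nu)} is estimated from the
        Gaussian samples. y is assumed standardized.
        """
        return sum(k * (np.mean(G(y)) - np.mean(G(nu))) ** 2
                   for G, k in zip(Gs, ks))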

In the case where we use only one nonquadratic function G, the approximation becomes

 
\begin{displaymath}
J(y)\propto [E\{G(y)\}-E\{G(\nu)\}]^2
\end{displaymath} (22)

for practically any nonquadratic function G. This is clearly a generalization of the moment-based approximation in (20), if y is symmetric. Indeed, taking $G(y)=y^4$, one obtains $[E\{y^4\}-E\{\nu^4\}]^2=\mbox{kurt}(y)^2$ (since $E\{\nu^4\}=3$), i.e., exactly the kurtosis-based approximation in (20).

But the point here is that by choosing G wisely, one obtains approximations of negentropy that are much better than the one given by (20). In particular, choosing a G that does not grow too fast, one obtains more robust estimators. The following choices of G have proved very useful:

   
\begin{displaymath}
G_1(u)=\frac{1}{a_1}\log\cosh a_1 u, \qquad G_2(u)=-\exp(-u^2/2)
\end{displaymath} (23)

where $1\leq a_1 \leq 2$ is some suitable constant.
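
The sketch below implements $G_1$ and $G_2$ and the one-function measure (22), up to the proportionality constant. The closed form $E\{G_2(\nu)\}=-1/\sqrt{2}$ follows from the standard Gaussian integral; the Laplacian-versus-Gaussian comparison is an illustrative usage, not from the paper.

    import numpy as np

    def G1(u, a1=1.0):
        # G1(u) = (1/a1) * log cosh(a1*u), with 1 <= a1 <= 2
        return np.log(np.cosh(a1 * u)) / a1

    def G2(u):
        # G2(u) = -exp(-u^2 / 2)
        return -np.exp(-u**2 / 2.0)

    def nongaussianity(y, G, EG_nu):
        """One-function approximation (22), up to a proportionality constant."""
        return (np.mean(G(y)) - EG_nu) ** 2

    rng = np.random.default_rng(0)
    y_lap = rng.laplace(size=100_000) / np.sqrt(2.0)  # unit-variance Laplacian sample
    y_gau = rng.standard_normal(100_000)
    EG2_nu = -1.0 / np.sqrt(2.0)  # exact E{G2(nu)} for a standard Gaussian
    print(nongaussianity(y_lap, G2, EG2_nu))  # clearly positive (supergaussian)
    print(nongaussianity(y_gau, G2, EG2_nu))  # close to zero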

Thus we obtain approximations of negentropy that give a very good compromise between the properties of the two classical nongaussianity measures given by kurtosis and negentropy. They are conceptually simple, fast to compute, yet have appealing statistical properties, especially robustness. Therefore, we shall use these contrast functions in our ICA methods. Since kurtosis can be expressed in this same framework, it can still be used by our ICA methods. A practical algorithm based on these contrast functions will be presented in Section 6.

