The classical measure of nongaussianity is kurtosis or the
fourth-order cumulant.
The kurtosis of $y$ is classically defined by

$$\operatorname{kurt}(y) = E\{y^4\} - 3\,\left(E\{y^2\}\right)^2 \qquad (13)$$

Since $y$ is here assumed to have zero mean and unit variance, the right-hand side simplifies to $E\{y^4\} - 3$, so kurtosis is simply a normalized version of the fourth moment; for a gaussian $y$ the fourth moment equals $3(E\{y^2\})^2$, and the kurtosis is thus zero.
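As a concrete illustration, a minimal sketch of estimating (13) from a sample could look as follows (Python/NumPy; the function name kurt and the explicit centering are our own choices):

```python
import numpy as np

def kurt(y):
    """Sample version of Eq. (13): E{y^4} - 3 (E{y^2})^2, computed after centering y."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    return np.mean(y**4) - 3.0 * np.mean(y**2)**2
```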
Kurtosis can be either positive or negative.
Random variables that have a negative kurtosis are called subgaussian,
and those with positive kurtosis are called supergaussian. In the
statistical literature, the corresponding terms platykurtic and
leptokurtic are also used.
Supergaussian random variables typically have
a ``spiky'' pdf with heavy tails, i.e. the pdf is relatively large at
zero and at large values of the variable, while being small for
intermediate values. A typical example is the Laplace distribution,
whose pdf (normalized to unit variance) is given by
$$p(y) = \frac{1}{\sqrt{2}} \exp\left(-\sqrt{2}\,|y|\right) \qquad (14)$$
[Figure: the Laplace density (14), a typical supergaussian pdf.]
Typically nongaussianity is measured by the absolute value of kurtosis; the square of kurtosis can also be used. Both are zero for a gaussian variable and greater than zero for most nongaussian random variables. There are nongaussian random variables that have zero kurtosis, but they can be considered very rare.
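These typical signs are easy to verify on simulated data. The following sketch (our own illustration) reuses the sample kurtosis of Eq. (13); the unit-variance scalings of the Laplace and uniform distributions are standard facts, not taken from the text above:

```python
import numpy as np

kurt = lambda y: np.mean((y - y.mean())**4) - 3 * np.var(y)**2  # Eq. (13)

rng = np.random.default_rng(0)
n = 200_000
samples = {
    "laplace (supergaussian)": rng.laplace(scale=1/np.sqrt(2), size=n),  # unit variance
    "uniform (subgaussian)":   rng.uniform(-np.sqrt(3), np.sqrt(3), n),  # unit variance
    "gaussian":                rng.standard_normal(n),
}
for name, y in samples.items():
    print(f"{name:25s} kurtosis ~ {kurt(y):+.2f}")
# Roughly +3 for the Laplace, -1.2 for the uniform, and 0 for the gaussian sample.
```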
Kurtosis, or rather its absolute value, has been widely used as a
measure of nongaussianity in ICA and related fields.
The main reason is its simplicity, both computational and
theoretical. Computationally, kurtosis can be estimated simply by
using the fourth moment of the sample data.
Theoretical analysis is simplified because of the following linearity
property:
If $x_1$ and $x_2$ are two independent random variables, it holds that

$$\operatorname{kurt}(x_1 + x_2) = \operatorname{kurt}(x_1) + \operatorname{kurt}(x_2) \qquad (15)$$

and

$$\operatorname{kurt}(\alpha x_1) = \alpha^4 \operatorname{kurt}(x_1) \qquad (16)$$

where $\alpha$ is a scalar.
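Both properties follow directly from the definition (13) by using the independence of $x_1$ and $x_2$, and they are easy to check numerically. A small sketch (our own illustration, with arbitrary example distributions and an arbitrary scalar):

```python
import numpy as np

kurt = lambda y: np.mean((y - y.mean())**4) - 3 * np.var(y)**2  # Eq. (13)

rng = np.random.default_rng(1)
n = 1_000_000
x1 = rng.laplace(size=n)             # two independent nongaussian variables
x2 = rng.uniform(-1.0, 1.0, size=n)
alpha = 2.5

print(kurt(x1 + x2), kurt(x1) + kurt(x2))      # Eq. (15): approximately equal
print(kurt(alpha * x1), alpha**4 * kurt(x1))   # Eq. (16): approximately equal
```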
To illustrate in a simple example what the optimization landscape for
kurtosis looks like, and how independent components could be found by
kurtosis minimization or maximization, let us look at a 2-dimensional
model $\mathbf{x} = \mathbf{A}\mathbf{s}$.
Assume that the independent components $s_1, s_2$ have kurtosis values
$\operatorname{kurt}(s_1)$ and $\operatorname{kurt}(s_2)$, respectively, both
different from zero. Remember that we assumed that they have unit
variances. We seek one of the independent components as
$y = \mathbf{w}^T \mathbf{x}$.
Let us again make the transformation $\mathbf{z} = \mathbf{A}^T \mathbf{w}$.
Then we have
$y = \mathbf{w}^T \mathbf{x} = \mathbf{w}^T \mathbf{A} \mathbf{s} = \mathbf{z}^T \mathbf{s} = z_1 s_1 + z_2 s_2$.
Now, based on the additive
property of kurtosis, we have
$\operatorname{kurt}(y) = \operatorname{kurt}(z_1 s_1) + \operatorname{kurt}(z_2 s_2) = z_1^4 \operatorname{kurt}(s_1) + z_2^4 \operatorname{kurt}(s_2)$.
On the other hand, we made the
constraint that the variance of $y$ is equal to 1, based on the same
assumption concerning $s_1, s_2$. This implies a constraint on
$\mathbf{z}$:
$E\{y^2\} = z_1^2 + z_2^2 = 1$.
Geometrically, this means that the vector $\mathbf{z}$
is constrained to the unit circle on the 2-dimensional plane. The
optimization problem is now: what are the maxima of the function
$|\operatorname{kurt}(y)| = |z_1^4 \operatorname{kurt}(s_1) + z_2^4 \operatorname{kurt}(s_2)|$
on the unit circle?
For simplicity, you may assume that the kurtosis values are of the same
sign, in which case the absolute value operators can be omitted.
The graph of this function is the "optimization landscape" for the
problem.
It is not hard to show [9] that the maxima are at the
points where exactly
one of the elements of the vector $\mathbf{z}$
is zero and the other is nonzero;
because of the unit circle constraint, the nonzero element must be
equal to 1 or -1. But these points are exactly the ones where $y$ equals one of the independent components
$\pm s_i$,
and the problem
has been solved.
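This landscape is easy to visualize numerically. In the sketch below (our own construction), both sources are taken to be Laplace distributed, so that both kurtosis values are positive, and $\mathbf{z} = (\cos\theta, \sin\theta)$ parametrizes the unit circle:

```python
import numpy as np

kurt = lambda y: np.mean((y - y.mean())**4) - 3 * np.var(y)**2  # Eq. (13)

rng = np.random.default_rng(2)
n = 500_000
# Two independent unit-variance sources, both with kurtosis roughly +3
s1 = rng.laplace(scale=1/np.sqrt(2), size=n)
s2 = rng.laplace(scale=1/np.sqrt(2), size=n)

# |kurt(y)| for y = z1*s1 + z2*s2 with z = (cos t, sin t) on the unit circle
for deg in (0, 30, 45, 60, 90):
    t = np.radians(deg)
    y = np.cos(t) * s1 + np.sin(t) * s2
    print(f"theta = {deg:2d} deg   |kurt(y)| ~ {abs(kurt(y)):.2f}")
# The values peak at 0 and 90 degrees, i.e. when z lies on a coordinate axis and
# y equals (up to sign) one of the independent components, and dip at 45 degrees.
```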
In practice we would start from some weight vector $\mathbf{w}$,
compute the
direction in which the kurtosis of
$y = \mathbf{w}^T \mathbf{x}$
is growing most
strongly (if
kurtosis is positive) or decreasing most strongly (if kurtosis is
negative) based on the available sample
$\mathbf{x}(1), \ldots, \mathbf{x}(T)$ of the mixture vector
$\mathbf{x}$,
and use a gradient method or one of
its extensions for finding a new vector
$\mathbf{w}$.
The example can be
generalized to arbitrary dimensions, showing that kurtosis can
theoretically be used as an optimization criterion for the ICA
problem.
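A rough sketch of such a gradient procedure is given below. It is our own simplified illustration, assuming the mixture vectors have already been whitened (zero mean, unit covariance), so that keeping $\mathbf{w}$ on the unit sphere keeps the variance of $y$ equal to 1; it is not intended as a reference implementation:

```python
import numpy as np

kurt = lambda y: np.mean((y - y.mean())**4) - 3 * np.var(y)**2  # Eq. (13)

def kurtosis_gradient_unit(x_white, n_iter=200, step=0.1, seed=0):
    """Find one weight vector w by gradient search on |kurt(w^T x)|.

    x_white: (dim, T) array of whitened mixture vectors.
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(x_white.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        y = w @ x_white
        direction = np.mean(x_white * y**3, axis=1)   # ~ E{x (w^T x)^3}, gradient direction
        w = w + step * np.sign(kurt(y)) * direction   # climb if kurt > 0, descend if kurt < 0
        w /= np.linalg.norm(w)                        # project back onto the unit sphere
    return w
```

The sign of the kurtosis decides whether the update moves towards large positive or large negative kurtosis, so the same rule handles both supergaussian and subgaussian components.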
However, kurtosis also has some drawbacks in practice, when its value has to be estimated from a measured sample. The main problem is that kurtosis can be very sensitive to outliers [16]. Its value may depend on only a few observations in the tails of the distribution, which may be erroneous or irrelevant observations. In other words, kurtosis is not a robust measure of nongaussianity.
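This sensitivity is easy to reproduce: in the following sketch (our own illustration), replacing a handful of observations in a gaussian sample by outliers moves the estimated kurtosis far from its true value of zero:

```python
import numpy as np

kurt = lambda y: np.mean((y - y.mean())**4) - 3 * np.var(y)**2  # Eq. (13)

rng = np.random.default_rng(3)
y = rng.standard_normal(10_000)   # gaussian sample: true kurtosis is 0
print(kurt(y))                    # close to 0

y_bad = y.copy()
y_bad[:5] = 10.0                  # only 5 erroneous observations out of 10 000
print(kurt(y_bad))                # jumps several units above 0
```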
Thus, other measures of nongaussianity might be better than kurtosis in some situations. Below we shall consider negentropy, whose properties are rather opposite to those of kurtosis, and finally introduce approximations of negentropy that more or less combine the good properties of both measures.