To begin with, we shall show the one-unit version of FastICA. By a "unit" we refer to a computational unit, eventually an artificial neuron, having a weight vector $\mathbf{w}$ that the neuron is able to update by a learning rule. The FastICA learning rule finds a direction, i.e. a unit vector $\mathbf{w}$, such that the projection $\mathbf{w}^T \mathbf{x}$ maximizes nongaussianity. Nongaussianity is here measured by the approximation of negentropy $J(\mathbf{w}^T \mathbf{x})$ given in (25). Recall that the variance of $\mathbf{w}^T \mathbf{x}$ must here be constrained to unity; for whitened data this is equivalent to constraining the norm of $\mathbf{w}$ to be unity.
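To spell out this equivalence: whitened data is zero-mean with unit covariance, $E\{\mathbf{x}\mathbf{x}^T\} = \mathbf{I}$, so
\[
E\{(\mathbf{w}^T \mathbf{x})^2\} = \mathbf{w}^T E\{\mathbf{x}\mathbf{x}^T\}\,\mathbf{w} = \|\mathbf{w}\|^2 ,
\]
and constraining the variance of $\mathbf{w}^T \mathbf{x}$ to unity is the same as constraining $\|\mathbf{w}\| = 1$.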
FastICA is based on a fixed-point iteration scheme for finding a maximum of the nongaussianity of $\mathbf{w}^T \mathbf{x}$, as measured in (25); see [24,19]. It can also be derived as an approximative Newton iteration [19].
Denote by $g$ the derivative of the nonquadratic function $G$ used in (25); for example, the derivatives of the functions in (26) are
\[
g_1(u) = \tanh(a_1 u), \qquad g_2(u) = u \exp(-u^2/2),
\]
where $1 \le a_1 \le 2$ is some suitable constant, often taken as $a_1 = 1$.
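To make the fixed-point scheme concrete, the following is a minimal sketch of a one-unit iteration of this kind in NumPy. It assumes the data matrix `X` is already whitened and stored with one observation per column, and it uses $g_1(u) = \tanh(u)$ (i.e. $a_1 = 1$) as the nonlinearity; the update $\mathbf{w}^+ = E\{\mathbf{x}\, g(\mathbf{w}^T\mathbf{x})\} - E\{g'(\mathbf{w}^T\mathbf{x})\}\,\mathbf{w}$ followed by renormalization is the one-unit fixed-point step referred to above. The function name and its parameters are illustrative.

```python
import numpy as np

def fastica_one_unit(X, max_iter=100, tol=1e-6, rng=None):
    """One-unit FastICA fixed-point iteration (illustrative sketch).

    Assumes X is whitened data of shape (n_components, n_samples)
    and uses g(u) = tanh(u), g'(u) = 1 - tanh(u)^2.
    """
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    w = rng.standard_normal(n)
    w /= np.linalg.norm(w)              # constraint ||w|| = 1 (whitened data)
    for _ in range(max_iter):
        wx = w @ X                      # projections w^T x for all samples
        g, g_prime = np.tanh(wx), 1.0 - np.tanh(wx) ** 2
        # Fixed-point update: w+ = E{x g(w^T x)} - E{g'(w^T x)} w
        w_new = (X * g).mean(axis=1) - g_prime.mean() * w
        w_new /= np.linalg.norm(w_new)  # re-project onto the unit sphere
        # Converged when old and new w point in (almost) the same direction
        if abs(abs(w_new @ w) - 1.0) < tol:
            return w_new
        w = w_new
    return w
```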
The derivation of FastICA is as follows. First note that the maxima of the approximation of the negentropy of $\mathbf{w}^T \mathbf{x}$ are obtained at certain optima of $E\{G(\mathbf{w}^T \mathbf{x})\}$. According to the Kuhn-Tucker conditions [32], the optima of $E\{G(\mathbf{w}^T \mathbf{x})\}$ under the constraint $E\{(\mathbf{w}^T \mathbf{x})^2\} = \|\mathbf{w}\|^2 = 1$ are obtained at points where
\[
E\{\mathbf{x}\, g(\mathbf{w}^T \mathbf{x})\} - \beta \mathbf{w} = 0 .
\]
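A brief sketch of how an approximative Newton step on this condition yields the fixed-point update, following the Newton argument of [19]: denote the left-hand side above by $F(\mathbf{w})$, with Jacobian $JF(\mathbf{w}) = E\{\mathbf{x}\mathbf{x}^T g'(\mathbf{w}^T\mathbf{x})\} - \beta \mathbf{I}$. Since the data is sphered, the first term can reasonably be approximated by $E\{\mathbf{x}\mathbf{x}^T\}\, E\{g'(\mathbf{w}^T\mathbf{x})\} = E\{g'(\mathbf{w}^T\mathbf{x})\}\,\mathbf{I}$, which makes the Jacobian diagonal and trivially invertible, giving the approximative Newton iteration
\[
\mathbf{w}^+ = \mathbf{w} - \frac{E\{\mathbf{x}\, g(\mathbf{w}^T\mathbf{x})\} - \beta \mathbf{w}}{E\{g'(\mathbf{w}^T\mathbf{x})\} - \beta} .
\]
Multiplying both sides by $\beta - E\{g'(\mathbf{w}^T\mathbf{x})\}$ (a scalar that is irrelevant after normalization) and simplifying gives, up to scaling, $\mathbf{w}^+ = E\{\mathbf{x}\, g(\mathbf{w}^T\mathbf{x})\} - E\{g'(\mathbf{w}^T\mathbf{x})\}\,\mathbf{w}$, i.e. the fixed-point update sketched earlier.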
In practice, the expectations in FastICA must be replaced by their estimates. The natural estimates are of course the corresponding sample means. Ideally, all the data available should be used, but this is often not a good idea because the computations may become too demanding. Then the averages can be estimated using a smaller sample, whose size may have a considerable effect on the accuracy of the final estimates. The sample points should be chosen separately at every iteration. If the convergence is not satisfactory, one may then increase the sample size.
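As an illustration of this practice, the sketch below estimates the two expectations in the update by sample means over a random subsample drawn anew at each iteration; the helper name `estimate_update_terms` and the `batch_size` parameter are illustrative.

```python
import numpy as np

def estimate_update_terms(X, w, batch_size=1000, rng=None):
    """Estimate E{x g(w^T x)} and E{g'(w^T x)} by sample means over a
    random subsample of the whitened data (g = tanh). Illustrative sketch."""
    rng = np.random.default_rng(rng)
    n_samples = X.shape[1]
    # Draw a fresh subsample for this iteration; sample means stand in for E{.}
    idx = rng.choice(n_samples, size=min(batch_size, n_samples), replace=False)
    Xb = X[:, idx]
    wx = w @ Xb
    g, g_prime = np.tanh(wx), 1.0 - np.tanh(wx) ** 2
    return (Xb * g).mean(axis=1), g_prime.mean()
```

Within the iteration of the earlier sketch, these estimates would simply replace the full-data means; if convergence is unsatisfactory, the subsample size can be increased.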