FastICA for one unit

To begin with, we shall show the one-unit version of FastICA. By a "unit" we refer to a computational unit, for example an artificial neuron, having a weight vector ${\bf w}$ that the unit is able to update by a learning rule. The FastICA learning rule finds a direction, i.e. a unit vector ${\bf w}$, such that the projection ${\bf w}^T{\bf x}$ maximizes nongaussianity. Nongaussianity is here measured by the approximation of negentropy $J({\bf w}^T{\bf x})$ given in (25). Recall that the variance of ${\bf w}^T{\bf x}$ must here be constrained to unity; for whitened data this is equivalent to constraining the norm of ${\bf w}$ to be unity.
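Since the algorithm below assumes prewhitened (sphered) data, the following is a minimal Python sketch of such whitening by an eigenvalue decomposition of the covariance matrix; the function and variable names are chosen only for illustration.

\begin{verbatim}
import numpy as np

def whiten(X):
    """Center and whiten data X of shape (n_features, n_samples)
    so that E{x x^T} = I, as assumed by FastICA."""
    X = X - X.mean(axis=1, keepdims=True)       # remove the mean
    cov = np.cov(X)                             # sample covariance matrix
    d, E = np.linalg.eigh(cov)                  # cov = E diag(d) E^T
    V = E @ np.diag(1.0 / np.sqrt(d)) @ E.T     # whitening matrix V = E D^{-1/2} E^T
    return V @ X                                # whitened data: unit covariance
\end{verbatim}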

FastICA is based on a fixed-point iteration scheme for finding a maximum of the nongaussianity of ${\bf w}^T{\bf x}$, as measured in (25), see [24,19]. It can also be derived as an approximate Newton iteration [19]. Denote by g the derivative of the nonquadratic function G used in (25); for example, the derivatives of the functions in (26) are:

 
\begin{displaymath}
g_1(u)=\tanh(a_1 u), \qquad g_2(u)=u\exp(-u^2/2)
\end{displaymath} (36)

where $1\leq a_1 \leq 2$ is some suitable constant, often taken as $a_1=1$. The basic form of the FastICA algorithm is as follows:
 
1. Choose an initial (e.g. random) weight vector ${\bf w}$.
2. Let ${\bf w}^+=E\{{\bf x}g({\bf w}^T{\bf x})\}-E\{g'({\bf w}^T{\bf x})\}{\bf w}$.
3. Let ${\bf w}={\bf w}^+/\Vert{\bf w}^+\Vert$.
4. If not converged, go back to 2.
Note that convergence means that the old and new values of ${\bf w}$ point in the same direction, i.e. their dot product is (almost) equal to 1. It is not necessary that the vector converge to a single point, since ${\bf w}$ and $-{\bf w}$ define the same direction. This is again because the independent components can be defined only up to a multiplicative sign. Note also that it is assumed here that the data is prewhitened.
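To make the iteration concrete, the following is a minimal Python sketch of the one-unit algorithm on whitened data, using the nonlinearity $g_1$ with $a_1=1$, i.e. $g(u)=\tanh(u)$ and $g'(u)=1-\tanh^2(u)$; the function name, tolerance and iteration cap are illustrative choices only.

\begin{verbatim}
import numpy as np

def fastica_one_unit(X, tol=1e-6, max_iter=200):
    """One-unit FastICA on whitened data X of shape (n_features, n_samples).
    Uses g(u) = tanh(u) and g'(u) = 1 - tanh(u)**2 (g_1 with a_1 = 1)."""
    n, T = X.shape
    w = np.random.randn(n)              # step 1: random initial weight vector
    w /= np.linalg.norm(w)
    for _ in range(max_iter):
        u = w @ X                       # projections w^T x for all samples
        g, g_prime = np.tanh(u), 1.0 - np.tanh(u) ** 2
        # step 2: w+ = E{x g(w^T x)} - E{g'(w^T x)} w  (expectations as sample means)
        w_new = (X * g).mean(axis=1) - g_prime.mean() * w
        w_new /= np.linalg.norm(w_new)  # step 3: normalize to unit norm
        # step 4: converged when old and new w point in the same direction
        if abs(np.dot(w_new, w)) > 1 - tol:
            return w_new
        w = w_new
    return w
\end{verbatim}

The estimated independent component is then given by the projection ${\bf w}^T{\bf x}$.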

The derivation of FastICA is as follows. First note that the maxima of the approximation of the negentropy of ${\bf w}^T{\bf x}$ are obtained at certain optima of $E\{G({\bf w}^T{\bf x})\}$. According to the Kuhn-Tucker conditions [32], the optima of $E\{G({\bf w}^T{\bf x})\}$ under the constraint $E\{({\bf w}^T{\bf x})^2\}=\Vert{\bf w}\Vert^2=1$ are obtained at points where

 \begin{displaymath}
E\{{\bf x}g({\bf w}^T{\bf x})\}-\beta{\bf w}=0
\end{displaymath} (37)

Let us try to solve this equation by Newton's method. Denoting the function on the left-hand side of (37) by F, we obtain its Jacobian matrix $JF({\bf w})$ as

 \begin{displaymath}
JF({\bf w})=E\{{\bf x}{\bf x}^T g'({\bf w}^T{\bf x})\}-\beta{\bf I}
\end{displaymath} (38)

To simplify the inversion of this matrix, we decide to approximate the first term in (38). Since the data is sphered, a reasonable approximation seems to be $E\{{\bf x}{\bf x}^T g'({\bf w}^T{\bf x})\}\approx E\{{\bf x}{\bf x}^T\}E\{ g'({\bf w}^T{\bf x})\}= E\{g'({\bf w}^T{\bf x})\}{\bf I}$. Thus the Jacobian matrix becomes diagonal, and can easily be inverted. We thus obtain the following approximate Newton iteration:

 \begin{displaymath}
{\bf w}^+={\bf w}-[E\{{\bf x}g({\bf w}^T{\bf x})\}-\beta{\bf w}]/[E\{g'({\bf w}^T{\bf x})\}-\beta]
\end{displaymath} (39)

This algorithm can be further simplified by multiplying both sides of (39) by $\beta-E\{g'({\bf w}^T{\bf x})\}$. This gives, after algebraic simplification, the FastICA iteration.
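In more detail: multiplying both sides of (39) by $\beta-E\{g'({\bf w}^T{\bf x})\}$ and cancelling the terms in $\beta{\bf w}$ gives

\begin{displaymath}
[\beta-E\{g'({\bf w}^T{\bf x})\}]\,{\bf w}^+ = E\{{\bf x}g({\bf w}^T{\bf x})\}-E\{g'({\bf w}^T{\bf x})\}{\bf w}.
\end{displaymath}

Since ${\bf w}^+$ is normalized in step 3 of the algorithm, the scalar factor on the left-hand side only affects the norm (and possibly the sign) of ${\bf w}^+$ and can be dropped, which yields exactly the update rule of step 2.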

In practice, the expectations in FastICA must be replaced by their estimates. The natural estimates are of course the corresponding sample means. Ideally, all the available data should be used, but this is often not a good idea because the computations may become too demanding. In that case the averages can be estimated from a smaller sample, whose size may have a considerable effect on the accuracy of the final estimates. The sample points should be chosen separately at every iteration. If the convergence is not satisfactory, one may then increase the sample size.
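As an illustration, a single update (steps 2-3) estimated from a random subsample drawn anew at each call might be sketched as follows; the default subsample size of 1000 is an arbitrary illustrative choice.

\begin{verbatim}
import numpy as np

def fastica_update(X, w, sample_size=1000, rng=None):
    """One FastICA update (steps 2-3) on whitened data X of shape
    (n_features, n_samples), with the expectations replaced by sample
    means over a random subsample chosen anew at every call."""
    rng = np.random.default_rng() if rng is None else rng
    idx = rng.choice(X.shape[1], size=min(sample_size, X.shape[1]), replace=False)
    Xs = X[:, idx]                      # random subsample of the data
    u = w @ Xs
    g, g_prime = np.tanh(u), 1.0 - np.tanh(u) ** 2
    w_new = (Xs * g).mean(axis=1) - g_prime.mean() * w
    return w_new / np.linalg.norm(w_new)
\end{verbatim}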


Aapo Hyvarinen
2000-04-19