Fixed-point algorithm for several units

The one-unit algorithm of the preceding subsection can be used to construct a system of n neurons to estimate the whole ICA transformation using the multi-unit contrast function in (8). To prevent different neurons from converging to the same maxima we must decorrelate the outputs ${\bf w}_1^T{\bf x},...,{\bf w}_n^T{\bf x}$ after every iteration. We present here three methods for achieving this. These methods do not assume that the data is sphered. If it is, the covariance matrix ${\bf C}$ can simply be omitted in the following formulas.

A simple way of achieving decorrelation is a deflation scheme based on a Gram-Schmidt-like decorrelation: we estimate the independent components one by one. When we have estimated p independent components, i.e. the p vectors ${\bf w}_1,...,{\bf w}_p$, we run the one-unit fixed-point algorithm for ${\bf w}_{p+1}$, and after every iteration step subtract from ${\bf w}_{p+1}$ the 'projections' ${\bf w}_{p+1}^T{\bf C}{\bf w}_j {\bf w}_j,\ j=1,...,p$, on the previously estimated p vectors, and then renormalize ${\bf w}_{p+1}$:

 \begin{displaymath}
\begin{array}{l}
\mbox{1. Let } {\bf w}_{p+1}={\bf w}_{p+1}-\sum_{j=1}^{p}{\bf w}_{p+1}^T{\bf C}{\bf w}_j{\bf w}_j\\
\mbox{2. Let } {\bf w}_{p+1}={\bf w}_{p+1}/\sqrt{{\bf w}_{p+1}^T{\bf C}{\bf w}_{p+1}}
\end{array}\end{displaymath} (24)
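The two steps of (24) can be sketched in NumPy as follows; the function name is illustrative and not from the paper, and the rows of `W_prev` are assumed to be the previously estimated, already ${\bf C}$-normalized vectors:

```python
import numpy as np

def deflation_decorrelate(w_new, W_prev, C):
    """One deflation step: make w_new C-orthogonal to the rows of
    W_prev, then renormalize so that w_new^T C w_new = 1."""
    # Step 1: subtract the projections w_new^T C w_j w_j, j = 1..p
    for w_j in W_prev:
        w_new = w_new - (w_new @ C @ w_j) * w_j
    # Step 2: renormalize in the metric given by C
    return w_new / np.sqrt(w_new @ C @ w_new)
```

After this step, $w_{p+1}^T {\bf C} w_j = 0$ for all previously estimated vectors, because each $w_j$ satisfies $w_j^T {\bf C} w_j = 1$.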

In certain applications, however, it may be desired to use a symmetric decorrelation, in which no vectors are 'privileged' over others [28]. This can be accomplished, e.g., by the classical method involving matrix square roots,

 \begin{displaymath}
\mbox{Let }{\bf W}= ({\bf W}{\bf C}{\bf W}^T)^{-1/2} {\bf W}
\end{displaymath} (25)
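A minimal NumPy sketch of (25), assuming ${\bf W}{\bf C}{\bf W}^T$ is symmetric positive definite (true for full-rank ${\bf W}$ and a covariance matrix ${\bf C}$); the function name is illustrative:

```python
import numpy as np

def symmetric_decorrelate(W, C):
    """Symmetric decorrelation W <- (W C W^T)^{-1/2} W via the
    eigenvalue decomposition W C W^T = E D E^T."""
    d, E = np.linalg.eigh(W @ C @ W.T)           # symmetric, so eigh applies
    inv_sqrt = E @ np.diag(1.0 / np.sqrt(d)) @ E.T  # E D^{-1/2} E^T
    return inv_sqrt @ W
```

A single application yields ${\bf W}{\bf C}{\bf W}^T = {\bf I}$, treating all rows of ${\bf W}$ symmetrically.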

where ${\bf W}$ is the matrix $({\bf w}_1,...,{\bf w}_n)^T$ of the vectors, and the inverse square root $({\bf W}{\bf C}{\bf W}^T)^{-1/2}$ is obtained from the eigenvalue decomposition of ${\bf W}{\bf C}{\bf W}^T={\bf E}{\bf D}{\bf E}^T$ as $({\bf W}{\bf C}{\bf W}^T)^{-1/2}={\bf E}{\bf D}^{-1/2}{\bf E}^T$. A simpler alternative is the following iterative algorithm,

 \begin{displaymath}
\begin{array}{l}
\mbox{1. Let } {\bf W}={\bf W}/\sqrt{\Vert{\bf W}{\bf C}{\bf W}^T\Vert}\\
\mbox{2. Repeat } {\bf W}=\frac{3}{2}{\bf W}-\frac{1}{2}{\bf W}{\bf C}{\bf W}^T{\bf W} \mbox{ until convergence}
\end{array}\end{displaymath} (26)

The norm in step 1 can be almost any ordinary matrix norm, e.g., the 2-norm or the largest absolute row (or column) sum (but not the Frobenius norm). The convergence of the orthonormalization method in (26), which may be considered a variation of Potter's formula (see [5]), is proven in the Appendix.
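The iteration in (26) can be sketched as follows, using the spectral (2-)norm for the initial normalization; the function name and the stopping criterion (distance of ${\bf W}{\bf C}{\bf W}^T$ from the identity) are illustrative choices:

```python
import numpy as np

def iterative_decorrelate(W, C, tol=1e-12, max_iter=100):
    """Square-root-free symmetric orthonormalization, eq. (26)."""
    # Step 1: normalize by the 2-norm (the Frobenius norm would not do)
    W = W / np.sqrt(np.linalg.norm(W @ C @ W.T, 2))
    # Step 2: repeat until W C W^T is numerically the identity
    for _ in range(max_iter):
        W = 1.5 * W - 0.5 * W @ C @ W.T @ W
        if np.linalg.norm(W @ C @ W.T - np.eye(W.shape[0])) < tol:
            break
    return W
```

Step 1 scales the eigenvalues of ${\bf W}{\bf C}{\bf W}^T$ into $(0,1]$, where each eigenvalue $\lambda$ is mapped to $\lambda(3-\lambda)^2/4$ by step 2 and so converges to 1, without any matrix inversion or square root.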

Finally, let us note that explicit inversion of the matrix ${\bf C}$ in (22) or (23) can be avoided by using the identity ${\bf C}^{-1}={\bf W}^T{\bf W}$, which is valid for any decorrelating ${\bf W}$. This gives rise to a fixed-point algorithm in which neither sphering nor inversion of the covariance matrix is needed. In fact, such an algorithm can be considered a fixed-point algorithm for maximum likelihood estimation of the ICA data model, see [21].
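The identity ${\bf C}^{-1}={\bf W}^T{\bf W}$ can be checked numerically. Here one particular decorrelating matrix, the symmetric whitening matrix ${\bf C}^{-1/2}$, is used for illustration; any square ${\bf W}$ with ${\bf W}{\bf C}{\bf W}^T={\bf I}$ gives ${\bf C}={\bf W}^{-1}{\bf W}^{-T}$ and hence the same result:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(5, 5))
C = A @ A.T + np.eye(5)                    # illustrative covariance matrix
d, E = np.linalg.eigh(C)
W = E @ np.diag(1.0 / np.sqrt(d)) @ E.T    # C^{-1/2}: one decorrelating W
assert np.allclose(W @ C @ W.T, np.eye(5))      # W decorrelates
assert np.allclose(np.linalg.inv(C), W.T @ W)   # C^{-1} = W^T W
```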


Aapo Hyvarinen
1999-04-23