Fixed-point algorithm for several units

The one-unit algorithm of the preceding subsection can be used to construct a system of n neurons to estimate the whole ICA transformation using the multi-unit contrast function in (8). To prevent different neurons from converging to the same maxima we must decorrelate the outputs ${\bf w}_1^T{\bf x},...,{\bf w}_n^T{\bf x}$ after every iteration. We present here three methods for achieving this. These methods do not assume that the data is sphered. If it is, the covariance matrix ${\bf C}$ can simply be omitted in the following formulas.

A simple way of achieving decorrelation is a deflation scheme based on a Gram-Schmidt-like decorrelation: we estimate the independent components one by one. When we have estimated p independent components, i.e. the p vectors ${\bf w}_1,...,{\bf w}_p$, we run the one-unit fixed-point algorithm for ${\bf w}_{p+1}$, and after every iteration step subtract from ${\bf w}_{p+1}$ the 'projections' ${\bf w}_{p+1}^T{\bf C}{\bf w}_j {\bf w}_j,\ j=1,...,p$, on the previously estimated p vectors, and then renormalize ${\bf w}_{p+1}$:

 \begin{displaymath}
\begin{array}{l}
\mbox{1. Let } {\bf w}_{p+1}={\bf w}_{p+1}-\sum_{j=1}^{p}{\bf w}_{p+1}^T{\bf C}{\bf w}_j{\bf w}_j\\
\mbox{2. Let } {\bf w}_{p+1}={\bf w}_{p+1}/\sqrt{{\bf w}_{p+1}^T{\bf C}{\bf w}_{p+1}}
\end{array}\end{displaymath} (24)
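The two steps of (24) can be sketched in NumPy as follows; the function name is illustrative and not from the paper, and the rows of `W_prev` are assumed to be the previously estimated, already ${\bf C}$-normalized vectors:

```python
import numpy as np

def deflation_decorrelate(w_new, W_prev, C):
    """One deflation step: make w_new C-orthogonal to the rows of
    W_prev, then renormalize so that w_new^T C w_new = 1."""
    # Step 1: subtract the projections w_new^T C w_j w_j, j = 1..p
    for w_j in W_prev:
        w_new = w_new - (w_new @ C @ w_j) * w_j
    # Step 2: renormalize in the metric given by C
    return w_new / np.sqrt(w_new @ C @ w_new)
```

After this step, $w_{p+1}^T {\bf C} w_j = 0$ for all previously estimated vectors, because each $w_j$ satisfies $w_j^T {\bf C} w_j = 1$.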

In certain applications, however, it may be desired to use a symmetric decorrelation, in which no vectors are 'privileged' over others [28]. This can be accomplished, e.g., by the classical method involving matrix square roots,

 \begin{displaymath}
\mbox{Let }{\bf W}= ({\bf W}{\bf C}{\bf W}^T)^{-1/2} {\bf W}
\end{displaymath} (25)
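A minimal NumPy sketch of (25), assuming ${\bf W}{\bf C}{\bf W}^T$ is symmetric positive definite (true for full-rank ${\bf W}$ and a covariance matrix ${\bf C}$); the function name is illustrative:

```python
import numpy as np

def symmetric_decorrelate(W, C):
    """Symmetric decorrelation W <- (W C W^T)^{-1/2} W via the
    eigenvalue decomposition W C W^T = E D E^T."""
    d, E = np.linalg.eigh(W @ C @ W.T)           # symmetric, so eigh applies
    inv_sqrt = E @ np.diag(1.0 / np.sqrt(d)) @ E.T  # E D^{-1/2} E^T
    return inv_sqrt @ W
```

A single application yields ${\bf W}{\bf C}{\bf W}^T = {\bf I}$, treating all rows of ${\bf W}$ symmetrically.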

where ${\bf W}$ is the matrix $({\bf w}_1,...,{\bf w}_n)^T$ of the vectors, and the inverse square root $({\bf W}{\bf C}{\bf W}^T)^{-1/2}$ is obtained from the eigenvalue decomposition of ${\bf W}{\bf C}{\bf W}^T={\bf E}{\bf D}{\bf E}^T$ as $({\bf W}{\bf C}{\bf W}^T)^{-1/2}={\bf E}{\bf D}^{-1/2}{\bf E}^T$. A simpler alternative is the following iterative algorithm,

 \begin{displaymath}
\begin{array}{l}
\mbox{1. Let } {\bf W}={\bf W}/\sqrt{\Vert{\bf W}{\bf C}{\bf W}^T\Vert}\\
\mbox{2. Repeat } {\bf W}=\frac{3}{2}{\bf W}-\frac{1}{2}{\bf W}{\bf C}{\bf W}^T{\bf W} \mbox{ until convergence}
\end{array}\end{displaymath} (26)

The norm in step 1 can be almost any ordinary matrix norm, e.g., the 2-norm or the largest absolute row (or column) sum (but not the Frobenius norm). The convergence of the orthonormalization method in (26), which may be considered a variation of Potter's formula (see [5]), is proven in the Appendix.
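The iteration in (26) can be sketched as follows, using the spectral (2-)norm for the initial normalization; the function name and the stopping criterion (distance of ${\bf W}{\bf C}{\bf W}^T$ from the identity) are illustrative choices:

```python
import numpy as np

def iterative_decorrelate(W, C, tol=1e-12, max_iter=100):
    """Square-root-free symmetric orthonormalization, eq. (26)."""
    # Step 1: normalize by the 2-norm (the Frobenius norm would not do)
    W = W / np.sqrt(np.linalg.norm(W @ C @ W.T, 2))
    # Step 2: repeat until W C W^T is numerically the identity
    for _ in range(max_iter):
        W = 1.5 * W - 0.5 * W @ C @ W.T @ W
        if np.linalg.norm(W @ C @ W.T - np.eye(W.shape[0])) < tol:
            break
    return W
```

Step 1 scales the eigenvalues of ${\bf W}{\bf C}{\bf W}^T$ into $(0,1]$, where each eigenvalue $\lambda$ is mapped to $\lambda(3-\lambda)^2/4$ by step 2 and so converges to 1, without any matrix inversion or square root.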

Finally, let us note that explicit inversion of the matrix ${\bf C}$ in (22) or (23) can be avoided by using the identity ${\bf C}^{-1}={\bf W}^T{\bf W}$, which is valid for any decorrelating ${\bf W}$. This gives rise to a fixed-point algorithm in which neither sphering nor inversion of the covariance matrix is needed. In fact, such an algorithm can be considered a fixed-point algorithm for maximum likelihood estimation of the ICA data model, see [21].
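The identity ${\bf C}^{-1}={\bf W}^T{\bf W}$ can be checked numerically. Here one particular decorrelating matrix, the symmetric whitening matrix ${\bf C}^{-1/2}$, is used for illustration; any square ${\bf W}$ with ${\bf W}{\bf C}{\bf W}^T={\bf I}$ gives ${\bf C}={\bf W}^{-1}{\bf W}^{-T}$ and hence the same result:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(5, 5))
C = A @ A.T + np.eye(5)                    # illustrative covariance matrix
d, E = np.linalg.eigh(C)
W = E @ np.diag(1.0 / np.sqrt(d)) @ E.T    # C^{-1/2}: one decorrelating W
assert np.allclose(W @ C @ W.T, np.eye(5))      # W decorrelates
assert np.allclose(np.linalg.inv(C), W.T @ W)   # C^{-1} = W^T W
```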


Aapo Hyvarinen
1999-04-23