
Introduction

In the preceding sections, we introduced new contrast (or objective) functions for ICA based on minimization of mutual information (and projection pursuit), analyzed some of their properties, and gave guidelines for the practical choice of the function G used in the contrast functions. In practice, one also needs an algorithm for maximizing the contrast functions in (7) or (8).

A simple method to maximize the contrast function would be to use stochastic gradient ascent; the constraint could be taken into account by a bigradient feedback. This leads to neural (adaptive) algorithms that are closely related to those introduced in [24]. We show in Appendix B how to modify the algorithms in [24] to maximize the contrast functions used in this paper.
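As an illustration, a stochastic gradient rule of this type for a single projection vector ${\bf w}$ could be sketched as follows in Python/NumPy. This is not the exact algorithm of [24]: the explicit renormalization of ${\bf w}$ stands in for the bigradient-type constraint handling, and the nonlinearity g (the derivative of G), the learning rate mu, and the function name are illustrative assumptions.

import numpy as np

def stochastic_grad_unit(x_stream, n, g, mu=0.01, rng=None):
    # Illustrative on-line (adaptive) rule: one whitened observation x(t)
    # is used per update of the n-dimensional projection vector w,
    # maximizing E{G(w^T x)} under the unit-norm constraint on w.
    rng = np.random.default_rng() if rng is None else rng
    w = rng.standard_normal(n)
    w /= np.linalg.norm(w)
    for x in x_stream:
        w += mu * g(w @ x) * x      # stochastic gradient step on one sample
        w /= np.linalg.norm(w)      # simple renormalization enforcing ||w|| = 1
    return w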

The advantage of neural on-line learning rules is that the inputs ${\bf x}(t)$ can be used in the algorithm at once, thus enabling faster adaptation in a non-stationary environment. A resulting trade-off, however, is that the convergence is slow, and depends on a good choice of the learning rate sequence, i.e. the step size at each iteration. A bad choice of the learning rate can, in practice, destroy convergence. Therefore, it would be important in practice to make the learning faster and more reliable. This can be achieved by the fixed-point iteration algorithms that we introduce here. In the fixed-point algorithms, the computations are made in batch (or block) mode, i.e., a large number of data points are used in a single step of the algorithm. In other respects, however, the algorithms may be considered neural. In particular, they are parallel, distributed, computationally simple, and require little memory space. We will show below that the fixed-point algorithms have very appealing convergence properties, making them a very interesting alternative to adaptive learning rules in environments where fast real-time adaptation is not necessary.
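To make the contrast with the on-line rules concrete, the following schematic Python/NumPy sketch shows what a batch fixed-point update for a single unit looks like. The exact form of the update and its derivation are given in the following sections; the nonlinearity g, its derivative dg, and the stopping rule used here are illustrative assumptions, and whitened observations are assumed as the columns of X.

import numpy as np

def fixed_point_unit(X, g, dg, max_iter=100, tol=1e-6, rng=None):
    # Schematic batch fixed-point iteration for one weight vector w.
    # X is n x T with whitened observations x(t) as columns; all T samples
    # are used in every step (batch mode), in contrast to on-line rules.
    n, T = X.shape
    rng = np.random.default_rng() if rng is None else rng
    w = rng.standard_normal(n)
    w /= np.linalg.norm(w)
    for _ in range(max_iter):
        y = w @ X                                   # projections w^T x(t), t = 1..T
        w_new = X @ g(y) / T - dg(y).mean() * w     # fixed-point type update
        w_new /= np.linalg.norm(w_new)
        if abs(abs(w_new @ w) - 1.0) < tol:         # converged (up to sign)
            return w_new
        w = w_new
    return w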

Note that our basic ICA algorithms require a preliminary sphering or whitening of the data ${\bf x}$, though some versions for non-sphered data will also be given. Sphering means that the original observed variable, say ${\bf v}$, is linearly transformed to a variable ${\bf x}={\bf Q}{\bf v}$ such that the correlation matrix of ${\bf x}$ equals unity: $E\{{\bf x}{\bf x}^T\}={\bf I}$. This transformation is always possible; indeed, it can be accomplished by classical PCA. For details, see [7,12].
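For illustration, a minimal NumPy sketch of such a sphering step, using the eigendecomposition of the covariance matrix as in classical PCA, could look as follows; the function name is illustrative, and the whitening matrix ${\bf Q}={\bf C}^{-1/2}$ used here is only one of many valid choices of decorrelating transform.

import numpy as np

def sphere(V):
    # Whiten observations V (variables as rows, samples as columns):
    # returns X = Q V and the sphering matrix Q such that the
    # covariance of X is (approximately) the identity matrix.
    Vc = V - V.mean(axis=1, keepdims=True)      # remove the mean
    C = np.cov(Vc)                              # covariance matrix of v
    d, E = np.linalg.eigh(C)                    # C = E diag(d) E^T
    Q = E @ np.diag(1.0 / np.sqrt(d)) @ E.T     # Q = C^{-1/2}
    return Q @ Vc, Q

In practice, directions with very small eigenvalues may also be discarded before inverting, which reduces the dimension and improves the conditioning of the whitening step.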

