Definitions of linear independent component analysis

Now we shall define the problem of independent component analysis, or ICA. We shall only consider the linear case here, though non-linear forms of ICA also exist. In the literature, at least three different basic definitions for linear ICA can be found [36,80], though the differences between the definitions are usually not emphasized. This is probably because ICA is such a new research topic: most research has concentrated on the simplest of these definitions. In the definitions, the observed $m$-dimensional random vector is denoted by ${\bf x}=(x_1,...,x_m)^T$.

Definition 1 (General definition) ICA of the random vector ${\bf x}$ consists of finding a linear transform ${\bf s}={\bf W}{\bf x}$ so that the components $s_i$ are as independent as possible, in the sense of maximizing some function $F(s_1,...,s_m)$ that measures independence.

This definition is the most general in the sense that no assumptions on the data are made, which is in contrast to the definitions below. Of course, this definition is also quite vague, as one must still define a measure of independence for the $s_i$. One cannot use the definition of independence in Eq. (7), because it is not possible, in general, to find a linear transformation that gives strictly independent components. The problem of defining a measure of independence will be treated in the next section. A different approach is taken by the following more estimation-theoretically oriented definition:

Definition 2 (Noisy ICA model) ICA of a random vector ${\bf x}$ consists of estimating the following generative model for the data:

$$ {\bf x}={\bf A}{\bf s}+{\bf n} \qquad (10) $$

where the latent variables (components) $s_i$ in the vector ${\bf s}=(s_1,...,s_n)^T$ are assumed independent. The matrix ${\bf A}$ is a constant $m\times n$ 'mixing' matrix, and ${\bf n}$ is an $m$-dimensional random noise vector.
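To make the generative model in Eq. (10) concrete, the following is a minimal simulation sketch in Python with NumPy. The function name `sample_noisy_ica`, the Laplacian source distribution, the Gaussian noise, and the particular matrix ${\bf A}$ are all assumptions made for illustration; the definition itself only requires that the components $s_i$ be independent.

```python
import numpy as np

def sample_noisy_ica(A, num_samples, noise_std=0.1, seed=0):
    """Draw samples from x = A s + n (Eq. 10), one observation per column.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Assumption: independent Laplacian sources. The model only requires
    # that the components s_i be mutually independent.
    s = rng.laplace(size=(n, num_samples))
    # Assumption: zero-mean Gaussian noise, independent across the m channels.
    noise = noise_std * rng.standard_normal((m, num_samples))
    x = A @ s + noise
    return x, s, noise

# Illustrative mixing matrix: m = 3 observed variables, n = 2 components.
A = np.array([[1.0, 0.5],
              [0.3, 1.0],
              [0.7, -0.4]])
x, s, noise = sample_noisy_ica(A, num_samples=1000)
```

In an actual ICA problem only `x` would be observed; `A`, `s`, and `noise` are returned here merely so that the sampled quantities can be inspected.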

This definition reduces the ICA problem to ordinary estimation of a latent variable model. However, this estimation problem is not very simple, and therefore the great majority of ICA research has concentrated on the following simplified definition:

Definition 3 (Noise-free ICA model) ICA of a random vector ${\bf x}$ consists of estimating the following generative model for the data:

$$ {\bf x}={\bf A}{\bf s} \qquad (11) $$

where ${\bf A}$ and ${\bf s}$ are as in Definition 2.

In this paper, we shall concentrate on this noise-free ICA model definition. This choice can be partially justified by the fact that most of the research on ICA has also concentrated on this simple definition. Even the estimation of the noise-free model has proved to be a sufficiently difficult task. The noise-free model may thus be considered a tractable approximation of the more realistic noisy model. The justification for this approximation is that methods using the simpler model seem to work for certain kinds of real data, as will be seen below. The estimation of the noisy ICA model is treated in Section 6.

It can be shown [36] that if the data does follow the generative model in Eq. (11), Definitions 1 and 3 become asymptotically equivalent, provided that certain measures of independence are used in Definition 1 and the natural relation ${\bf W}={\bf A}^{-1}$ is used with $n=m$.
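The relation ${\bf W}={\bf A}^{-1}$ can be checked numerically in the noise-free case with $n=m$. The sketch below is illustrative only: the mixing matrix and the uniform source distribution are assumptions, and ${\bf A}$ is treated as known, whereas in a real ICA problem it must be estimated from ${\bf x}$ alone.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical invertible mixing matrix with n = m = 2.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# Assumption: independent sources drawn uniformly from [-1, 1].
s = rng.uniform(-1.0, 1.0, size=(2, 500))

# Noise-free model, Eq. (11): one observation per column.
x = A @ s

# With A known and square, the natural separating matrix is W = A^{-1}.
W = np.linalg.inv(A)
s_rec = W @ x

# The linear transform W x reproduces the sources up to floating-point error.
print(np.allclose(s_rec, s))
```

This only illustrates why ${\bf W}={\bf A}^{-1}$ is the natural link between the transform in Definition 1 and the model in Definition 3; it says nothing about how ${\bf W}$ is found when ${\bf A}$ is unknown.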