ICA data model, minimization of mutual information, and projection pursuit

One popular way of formulating the ICA problem is to consider the estimation of the following generative model for the data [1,3,5,6,23,24,27,28,31]:

\begin{equation}
{\bf x}={\bf A}{\bf s}
\tag{2}
\end{equation}

where ${\bf x}$ is an observed m-dimensional vector, ${\bf s}$ is an n-dimensional (latent) random vector whose components are assumed mutually independent, and ${\bf A}$ is a constant $m\times n$ matrix to be estimated. It is usually further assumed that the dimensions of ${\bf x}$ and ${\bf s}$ are equal, i.e., m=n; we make this assumption in the rest of the paper. A noise vector may also be present. The matrix ${\bf W}$ defining the transformation as in (1) is then obtained as the (pseudo)inverse of the estimate of the matrix ${\bf A}$. Nongaussianity of the independent components is necessary for the identifiability of the model (2), see [7].
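
As a minimal numerical sketch of the generative model (2), the following code assumes two illustrative non-Gaussian sources (Laplacian and uniform) and an arbitrary $2\times 2$ mixing matrix, mixes the sources, and recovers them by taking ${\bf W}$ in (1) as the (pseudo)inverse of the known ${\bf A}$; estimating ${\bf W}$ from the data alone is the actual ICA problem, and all distributions and numerical choices here are assumptions made only for illustration.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
T = 10000

# Two mutually independent, non-Gaussian sources (nongaussianity is
# required for identifiability of the model (2)).
s = np.vstack([
    rng.laplace(size=T),                          # super-Gaussian source
    rng.uniform(-np.sqrt(3), np.sqrt(3), T),      # sub-Gaussian source
])

# An arbitrary constant square (m = n = 2) mixing matrix A.
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])

# The observed data of model (2): x = A s.
x = A @ s

# If A were known, the matrix W of (1) would simply be its (pseudo)inverse;
# the point of ICA is to estimate it from x alone.
W = np.linalg.pinv(A)
print(np.allclose(W @ x, s))   # True: the sources are recovered exactly
\end{verbatim}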

Comon [7] showed how to obtain a more general formulation for ICA that does not need to assume an underlying data model. This definition is based on the concept of mutual information. First, we define the differential entropy H of a random vector ${\bf y}=(y_1,...,y_n)^T$ with density f(.) as follows [33]:

\begin{equation}
H({\bf y})=-\int f({\bf y})\log f({\bf y})\mbox{d}{\bf y}
\tag{3}
\end{equation}
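
As a rough numerical check of definition (3), one can estimate the differential entropy of a scalar sample with a simple histogram plug-in estimator and compare it with the closed-form value $\frac{1}{2}\log(2\pi e\sigma^2)$ for a Gaussian variable of variance $\sigma^2$. The estimator, its name entropy_hist, and the bin count are ad hoc illustrative choices, not anything prescribed by the paper.

\begin{verbatim}
import numpy as np

def entropy_hist(y, bins=100):
    # Crude plug-in estimate of H(y) for a 1-D sample: approximate the
    # density by a histogram and evaluate -sum p_k log(p_k / bin_width).
    counts, edges = np.histogram(y, bins=bins)
    widths = np.diff(edges)
    p = counts / counts.sum()
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz] / widths[nz]))

rng = np.random.default_rng(1)
sigma = 2.0
y = rng.normal(0.0, sigma, 100000)

print(entropy_hist(y))                            # histogram estimate
print(0.5 * np.log(2 * np.pi * np.e * sigma**2))  # exact Gaussian entropy
\end{verbatim}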

Differential entropy can be normalized to give rise to the definition of negentropy, which has the appealing property of being invariant under invertible linear transformations. The definition of negentropy J is given by

\begin{equation}
J({\bf y})=H({\bf y}_{gauss})-H({\bf y})
\tag{4}
\end{equation}

where ${\bf y}_{gauss}$ is a Gaussian random vector with the same covariance matrix as ${\bf y}$. Negentropy can also be interpreted as a measure of nongaussianity [7]. Using the concept of differential entropy, one can define the mutual information I between the n (scalar) random variables $y_i, i=1,...,n$ [8,7]. Mutual information is a natural measure of the dependence between random variables. It is particularly useful to express mutual information in terms of negentropy, constraining the variables to be uncorrelated. In this case, we have [7]

\begin{equation}
I(y_1,y_2,...,y_n)=J({\bf y})-\sum_i J(y_i).
\tag{5}
\end{equation}
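
Because J is invariant under invertible linear transformations, (5) implies that, for uncorrelated components, minimizing mutual information amounts to maximizing the sum of the marginal negentropies J(y_i). The sketch below reuses the ad hoc histogram entropy estimator and the illustrative two-source model from above to estimate marginal negentropies for the mixtures and for the original sources; the more nongaussian independent sources give the larger sum. The helper names and all numerical values are assumptions for illustration only.

\begin{verbatim}
import numpy as np

def entropy_hist(y, bins=100):
    # Same crude histogram estimate of differential entropy as above.
    counts, edges = np.histogram(y, bins=bins)
    widths = np.diff(edges)
    p = counts / counts.sum()
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz] / widths[nz]))

def negentropy(y):
    # J(y) = H(y_gauss) - H(y), with y_gauss Gaussian of the same variance.
    return 0.5 * np.log(2 * np.pi * np.e * np.var(y)) - entropy_hist(y)

rng = np.random.default_rng(2)
T = 100000
s = np.vstack([rng.laplace(size=T), rng.uniform(-1, 1, T)])
x = np.array([[1.0, 0.6], [0.4, 1.0]]) @ s

# Mixtures are closer to Gaussian than the sources, so their marginal
# negentropies are smaller; by (5), for uncorrelated components a larger
# sum of marginal negentropies means smaller mutual information.
print(sum(negentropy(row) for row in x))   # mixed components: smaller sum
print(sum(negentropy(row) for row in s))   # independent sources: larger sum
\end{verbatim}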

Since mutual information is the information-theoretic measure of the independence of random variables, it is natural to use it as the criterion for finding the ICA transform. Thus we define in this paper, following [7], the ICA of a random vector ${\bf x}$ as an invertible transformation ${\bf s}={\bf W}{\bf x}$ as in (1), where the matrix ${\bf W}$ is determined so that the mutual information of the transformed components $s_i$ is minimized. Note that mutual information (or the independence of the components) is not affected by multiplication of the components by scalar constants. Therefore, this definition determines the independent components only up to some multiplicative constants. Moreover, the constraint of uncorrelatedness of the $s_i$ is adopted in this paper. This constraint is not strictly necessary, but it simplifies the computations considerably.
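
In practice the uncorrelatedness constraint is commonly handled by first whitening the data, i.e., applying a linear transformation that makes the covariance matrix equal to the identity; any orthogonal ${\bf W}$ applied afterwards keeps the components uncorrelated. The sketch below whitens data from the same hypothetical two-source model via an eigendecomposition of the covariance matrix; it is a standard preprocessing step shown for illustration, not a procedure taken from this section, and the helper name whiten is an assumption.

\begin{verbatim}
import numpy as np

def whiten(x):
    # Linearly transform x (shape n x T) so that its components are
    # uncorrelated with unit variance: z = V x with V = E D^{-1/2} E^T,
    # where cov(x) = E D E^T is the eigendecomposition of the covariance.
    xc = x - x.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(xc))
    V = E @ np.diag(1.0 / np.sqrt(d)) @ E.T
    return V @ xc, V

rng = np.random.default_rng(3)
T = 50000
s = np.vstack([rng.laplace(size=T), rng.uniform(-1, 1, T)])
x = np.array([[1.0, 0.6], [0.4, 1.0]]) @ s

z, V = whiten(x)
print(np.round(np.cov(z), 3))   # approximately the 2 x 2 identity matrix
\end{verbatim}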

Because negentropy is invariant under invertible linear transformations [7], it is now obvious from (5) that finding an invertible transformation ${\bf W}$ that minimizes the mutual information is roughly equivalent to finding directions in which the negentropy is maximized. This formulation of ICA also shows explicitly the connection between ICA and projection pursuit [11,12,16,26]. In fact, finding a single direction that maximizes negentropy is a form of projection pursuit, and it can also be interpreted as the estimation of a single independent component [24].
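
To illustrate this projection pursuit view, the following brute-force sketch scans unit-norm directions ${\bf w}$ on whitened two-dimensional data and keeps the one whose projection has the largest estimated negentropy, again using the crude histogram estimator from above. It is only a stand-in for the contrast-function approximations discussed in the next section; the angular grid, the source distributions, and the mixing matrix are illustrative assumptions.

\begin{verbatim}
import numpy as np

def entropy_hist(y, bins=100):
    counts, edges = np.histogram(y, bins=bins)
    widths = np.diff(edges)
    p = counts / counts.sum()
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz] / widths[nz]))

def negentropy(y):
    return 0.5 * np.log(2 * np.pi * np.e * np.var(y)) - entropy_hist(y)

rng = np.random.default_rng(4)
T = 50000
s = np.vstack([rng.laplace(size=T), rng.uniform(-1, 1, T)])
x = np.array([[1.0, 0.6], [0.4, 1.0]]) @ s

# Whiten first, so every unit-norm direction gives a unit-variance projection.
xc = x - x.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(xc))
z = (E @ np.diag(1.0 / np.sqrt(d)) @ E.T) @ xc

# Brute-force projection pursuit: scan directions on the unit half-circle
# and keep the one maximizing the estimated negentropy of w^T z.
angles = np.linspace(0.0, np.pi, 180, endpoint=False)
best = max(angles,
           key=lambda a: negentropy(np.cos(a) * z[0] + np.sin(a) * z[1]))
w = np.array([np.cos(best), np.sin(best)])
print(w)   # w^T z approximates one independent component (up to sign/scale)
\end{verbatim}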

