A central problem in neural network research, as well as in statistics
and signal processing, is finding a suitable
representation or transformation of the data.
For computational and conceptual simplicity, the representation is
often sought as a linear transformation of the original data.
Let us denote by $\mathbf{x} = (x_1, x_2, \ldots, x_m)^T$ a zero-mean $m$-dimensional random variable that can be observed, and by $\mathbf{s} = (s_1, s_2, \ldots, s_n)^T$ its $n$-dimensional transform. Then the problem is to determine a constant (weight) matrix $\mathbf{W}$ so that the linear transformation of the observed variables

$$\mathbf{s} = \mathbf{W}\mathbf{x}$$

has some suitable properties.
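In code, this setup amounts to nothing more than a matrix-vector product; a minimal NumPy sketch (the dimensions and random data are chosen arbitrarily for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 3                       # observed dimension m, transform dimension n
x = rng.standard_normal(m)        # one realization of the zero-mean variable x
W = rng.standard_normal((n, m))   # constant (weight) matrix W
s = W @ x                         # linear transform s = W x
```

The whole difficulty of ICA lies not in computing this product, but in choosing $\mathbf{W}$ so that the components of $\mathbf{s}$ become statistically independent.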
In this paper we treat the problem of estimating the transformation given by (linear) independent component analysis (ICA) [7,27]. As the name implies, the basic goal in determining the transformation is to find a representation in which the transformed components $s_i$ are as statistically independent of each other as possible. Thus this method is a special case of redundancy reduction [2].
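Formally, statistical independence of the components means that their joint density factorizes into the product of the marginal densities:

```latex
p(s_1, s_2, \ldots, s_n) = \prod_{i=1}^{n} p_i(s_i)
```

This is a strictly stronger requirement than mere uncorrelatedness, which only constrains second-order statistics.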
Two promising applications of ICA are blind source separation and feature extraction. In blind source separation [27], the observed values of $\mathbf{x}$ correspond to a realization of an $m$-dimensional discrete-time signal $\mathbf{x}(t)$, $t = 1, 2, \ldots$. Then the components $s_i(t)$ are called source signals, which are usually original, uncorrupted signals or noise sources. Often such sources are statistically independent of each other, and thus the signals can be recovered from the linear mixtures $x_i$ by finding a transformation in which the transformed signals are as independent as possible, as in ICA. In feature extraction [4,25], $s_i$ is the coefficient of the $i$-th feature in the observed data vector $\mathbf{x}$. The use of ICA for feature extraction is motivated by results in neuroscience suggesting that the same principle of redundancy reduction [2,32] explains some aspects of the early processing of sensory data in the brain. ICA also has applications in exploratory data analysis, in the same way as the closely related method of projection pursuit [16,12].
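As an illustration (not taken from the paper), blind source separation can be sketched with scikit-learn's FastICA, an implementation of fixed-point ICA of the kind discussed later; the two source signals and the mixing matrix below are invented for the example:

```python
import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 8, 2000)
# two statistically independent source signals s_i(t)
s1 = np.sin(3 * t)                        # sinusoid
s2 = np.sign(np.sin(5 * t))               # square wave
S = np.c_[s1, s2]                         # shape (samples, sources)

A = np.array([[1.0, 0.5],                 # hypothetical mixing matrix,
              [0.7, 1.0]])                # unknown to the separation algorithm
X = S @ A.T                               # observed mixtures x(t) = A s(t)

ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)              # estimated sources, recovered only
                                          # up to permutation, sign, and scale
```

Each column of `S_hat` should be strongly correlated with one of the true sources; the ordering and signs are indeterminate, which is an inherent ambiguity of the ICA model.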
In this paper, new objective (contrast) functions and algorithms for ICA are introduced. Starting from an information-theoretic viewpoint, the ICA problem is formulated as the minimization of mutual information between the transformed variables $s_i$, and a new family of contrast functions for ICA is introduced (Section 2). These contrast functions can also be interpreted from the viewpoint of projection pursuit, and they enable the sequential (one-by-one) extraction of independent components. The behavior of the resulting estimators is then evaluated in the framework of the linear mixture model, yielding guidelines for choosing among the many contrast functions contained in the introduced family. The practical choice of the contrast function is discussed as well, based on these statistical criteria together with some numerical and pragmatic criteria (Section 3). For the practical maximization of the contrast functions, we introduce a novel family of fixed-point algorithms (Section 4). These algorithms are shown to have very appealing convergence properties. Simulations confirming the usefulness of the novel contrast functions and algorithms are reported in Section 5, together with references to real-life experiments using these methods. Some conclusions are drawn in Section 6.
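For reference, the mutual information that serves as the starting point of Section 2 can be written in terms of the differential entropies $H(\cdot)$ of the individual components and of the joint vector; it is nonnegative and vanishes exactly when the $s_i$ are independent:

```latex
I(s_1, s_2, \ldots, s_n) = \sum_{i=1}^{n} H(s_i) - H(\mathbf{s})
```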