Noisy ICA

Finally, in this Section, we shall treat the estimation of the noisy ICA model, as in Definition 2 in Section 3. Not many such methods exist. The estimation of the noiseless model seems to be a challenging task in itself, and thus the noise is usually neglected in order to obtain tractable and simple results. Moreover, it may be unrealistic in many cases to assume that the data could be divided into signals and noise in any meaningful way.

Practically all methods taking noise explicitly into account assume that the noise is Gaussian. Thus one might use only higher-order cumulants (for example, 4th and 6th-order cumulants), which are unaffected by Gaussian noise, and then use methods not unlike those presented above. This approach was taken in [93,150]. Note that the cumulant-based methods above used both second and fourth-order cumulants. Second-order cumulants are *not* immune to Gaussian noise. Most of the cumulant-based methods could still be modified to work in the noisy case. Their lack of robustness may be very problematic in a noisy environment, though.
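The difference between second-order and higher-order cumulants under Gaussian noise can be checked numerically. The following is a minimal sketch, not taken from [93,150]: it assumes a single Laplacian source and additive, independent Gaussian noise of unit variance, and the helper name `fourth_cumulant` is introduced here for illustration only. Since cumulants of independent variables add, and all cumulants of a Gaussian above order two vanish, the fourth-order cumulant of the noisy signal should stay close to that of the clean signal, whereas the variance grows by the noise variance.

```python
import numpy as np

def fourth_cumulant(y):
    """Sample fourth-order cumulant of a 1-D sample: E[y^4] - 3 (E[y^2])^2 after centering."""
    y = y - y.mean()
    return np.mean(y**4) - 3.0 * np.mean(y**2) ** 2

rng = np.random.default_rng(0)
n = 500_000

# Assumed non-Gaussian source: Laplacian with scale 1
# (theoretical variance 2, theoretical fourth-order cumulant 12).
s = rng.laplace(scale=1.0, size=n)

# Assumed independent Gaussian noise with unit variance.
noise = rng.normal(scale=1.0, size=n)
x = s + noise

# Second-order cumulant (variance): shifted by the noise variance.
var_clean, var_noisy = s.var(), x.var()

# Fourth-order cumulant: approximately unchanged, up to sampling error.
k4_clean, k4_noisy = fourth_cumulant(s), fourth_cumulant(x)

print("variance:        clean %.3f, noisy %.3f" % (var_clean, var_noisy))
print("fourth cumulant: clean %.3f, noisy %.3f" % (k4_clean, k4_noisy))
```

The sample fourth-order cumulant still fluctuates noticeably even with many samples, which is in line with the remark above on the lack of robustness of cumulant-based estimators.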

Maximum likelihood estimation of the noisy ICA model has also been treated. First, one could maximize the joint likelihood of the mixing matrix and the realizations of the independent components, as in [117,63,31]. A more principled method would be to maximize the (marginal) likelihood of the mixing matrix, and possibly of the noise covariance, which was done in [107]. This was based on the idea of approximating the densities of the independent components as Gaussian mixture densities; the application of the EM algorithm then becomes feasible. In [15], the simpler case of discrete-valued independent components was treated. A problem with the EM algorithm is, however, that the computational complexity grows exponentially with the dimension of the data.
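One way to see where such exponential growth can come from is to count the joint mixture states. This is a rough sketch under stated assumptions, not the algorithm of [107]: it assumes that each independent component is modelled by a mixture of the same number of Gaussians, that the number of components grows with the dimension of the data, and that the E-step has to consider every combination of one Gaussian per component. The function name `joint_mixture_states` is hypothetical.

```python
from itertools import product

def joint_mixture_states(n_sources, n_gaussians):
    """Enumerate every assignment of one mixture component to each source."""
    return list(product(range(n_gaussians), repeat=n_sources))

# Number of joint states for 1..6 sources with 3 Gaussians per source (assumed values).
counts = {n: len(joint_mixture_states(n, 3)) for n in range(1, 7)}

for n, c in counts.items():
    print(f"{n} sources -> {c} joint Gaussian components")
```

Under these assumptions the count is the number of Gaussians per component raised to the number of components, so each additional dimension multiplies the work per EM iteration.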

Perhaps the most promising approach to noisy ICA is given by bias removal techniques. This means that ordinary (noise-free) ICA methods are modified so that the bias due to noise is removed, or at least reduced. In [43], it was shown how to modify the natural gradient ascent for likelihood so as to reduce the bias. In [66], a new concept called Gaussian moments was introduced to derive one-unit contrast functions and to obtain a version of the FastICA algorithm that has no asymptotic bias, i.e. is consistent even in the presence of noise. These methods can be used even in large dimensions.
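The methods of [43] and [66] are not reproduced here. As a much simpler illustration of what "bias due to noise" means, the sketch below looks only at the covariance matrix. It assumes a noisy linear model of the form x = As + n with unit-variance independent components and Gaussian noise whose covariance `Sigma` is known; the matrices `A` and `Sigma` are arbitrary values chosen for the example. Under these assumptions the sample covariance of x estimates AAᵀ + Sigma rather than AAᵀ, and subtracting `Sigma` removes that bias.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 200_000

# Assumed mixing matrix and (known) noise covariance, chosen for illustration.
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])
Sigma = np.diag([0.4, 0.2])

# Unit-variance Laplacian sources: variance 2*b^2 = 1 gives b = 1/sqrt(2).
s = rng.laplace(scale=1.0 / np.sqrt(2.0), size=(2, n_samples))
noise = rng.multivariate_normal(np.zeros(2), Sigma, size=n_samples).T
x = A @ s + noise

target = A @ A.T              # covariance of the noise-free mixtures
C_noisy = np.cov(x)           # biased: estimates A A^T + Sigma
C_corrected = C_noisy - Sigma # bias removed using the known noise covariance

err_noisy = np.linalg.norm(C_noisy - target)
err_corrected = np.linalg.norm(C_corrected - target)

print("error without correction:", err_noisy)
print("error with correction:   ", err_corrected)
```

In practice the noise covariance would have to be known or estimated, which is an additional assumption of this sketch.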