Another application of ICA is feature extraction [13,14,58,74,81,116]. In this setting, the columns of the mixing matrix $\mathbf{A}$ represent features, and $s_i$ is the coefficient of the $i$-th feature in an observed data vector $\mathbf{x}$. The use of ICA for feature extraction is motivated by the theory of redundancy reduction (see Section 2.3.2).
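To make the feature interpretation concrete, the following minimal sketch writes the feature matrix as `A` and the coefficient vector as `s`, and builds a data vector as the sum of the columns of `A` weighted by the entries of `s`. The numerical values and the helper name `synthesize` are hypothetical and chosen only for illustration; in practice `A` would be estimated from data by an ICA algorithm.

```python
# Hypothetical 3-dimensional data described by two features.
# Each COLUMN of A is one feature (basis vector).
A = [
    [1.0, 0.0],
    [0.5, 1.0],
    [0.0, 2.0],
]

# s[i] is the coefficient of the i-th feature in the data vector.
s = [2.0, -1.0]


def synthesize(A, s):
    """Return x = A s, i.e. the columns of A weighted by s and summed."""
    return [sum(row[i] * s[i] for i in range(len(s))) for row in A]


x = synthesize(A, s)

# The same vector, written explicitly as a weighted sum of feature columns.
columns = [[row[i] for row in A] for i in range(len(s))]
x_from_columns = [
    sum(s[i] * columns[i][r] for i in range(len(s))) for r in range(len(A))
]
```

Both constructions give the same vector, which is the sense in which each coefficient measures how strongly the corresponding feature is present in the observation.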
In [116], an essentially equivalent method based on sparse coding was applied to the extraction of low-level features from natural image data. The results show that the extracted features correspond closely to those observed in the primary visual cortex [116,118]. These results seem to be very robust, and have since been replicated by several other authors using different methods [14,58,74,81,116]. A systematic comparison between the ICA features and the properties of simple cells in the macaque primary visual cortex was conducted in [138,139], where the authors found a good match for most of the parameters, especially when video sequences were used instead of still images. The obtained features are also closely connected to those offered by wavelet theory and Gabor analysis [38,102]. In fact, it was shown in [68,74] how to derive a completely adaptive version of wavelet shrinkage from estimation of the noisy ICA model. Applications of these features to data compression and pattern recognition are important research topics.
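The shrinkage idea mentioned above can be sketched as follows: transform the noisy observation into the feature domain, shrink each coefficient toward zero, and transform back. This is only an illustrative sketch under stated assumptions, not the estimator derived in [68,74]: the transform `W` is assumed to be orthogonal, the shrinkage nonlinearity is assumed to be plain soft thresholding with a hand-picked threshold `t`, and all names and numerical values are hypothetical. In an adaptive method, both the transform and the nonlinearity would be estimated from the data.

```python
import math


def soft_threshold(u, t):
    """Shrink u toward zero by t; values with |u| <= t become zero."""
    return math.copysign(max(abs(u) - t, 0.0), u)


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transpose(M):
    """Return the transpose of M as a list of rows."""
    return [list(col) for col in zip(*M)]


def shrink_denoise(W, x, t):
    """Denoise x by shrinking its coefficients in the basis given by W.

    Assumes W is orthogonal, so that its transpose is its inverse.
    """
    u = matvec(W, x)                              # noisy coefficients
    s_hat = [soft_threshold(ui, t) for ui in u]   # shrunken coefficients
    return matvec(transpose(W), s_hat)            # back to the data domain


# Hypothetical orthogonal 2x2 transform; its rows play the role of features.
c = 1.0 / math.sqrt(2.0)
W = [[c, c], [c, -c]]

x_noisy = [2.1, 1.9]
x_hat = shrink_denoise(W, x_noisy, t=0.5)
```

In this toy case the small coefficient along the second feature is set to zero, so the two entries of `x_hat` coincide, while the large coefficient along the first feature is only reduced by the threshold.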