Likelihood and network entropy

It is possible to formulate the likelihood of the noise-free ICA model (11), as was done in [124], and then estimate the model by the maximum likelihood method. Denoting by ${\bf W}=({\bf w}_1,...,{\bf w}_m)^T$ the matrix ${\bf A}^{-1}$, the log-likelihood takes the form [124]:

\begin{displaymath}
L=\sum_{t=1}^T \sum_{i=1}^m \log f_i({\bf w}_i^T {\bf x}(t))+T\log\vert\det {\bf W}\vert
\end{displaymath} (13)

where the $f_i$ are the density functions of the $s_i$ (here assumed to be known), and the ${\bf x}(t), t=1,...,T$ are the realizations of ${\bf x}$.
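As a concrete illustration, the log-likelihood (13) can be evaluated directly from data. The following is a minimal sketch in Python, assuming (purely for illustration) that all the $f_i$ are logistic densities; the function name and the density choice are hypothetical, not taken from the text:

import numpy as np

def log_likelihood(W, X):
    # Log-likelihood (13) of the noise-free ICA model.
    # W : (m, m) candidate separating matrix whose rows are the w_i
    # X : (m, T) data matrix whose columns are the realizations x(t)
    _, T = X.shape
    Y = W @ X  # y_i(t) = w_i^T x(t)
    # log f_i(u) for a logistic density f_i(u) = e^{-u}/(1+e^{-u})^2,
    # written in an overflow-safe form using the symmetry f_i(u) = f_i(-u)
    log_f = -np.abs(Y) - 2.0 * np.log1p(np.exp(-np.abs(Y)))
    return log_f.sum() + T * np.log(np.abs(np.linalg.det(W)))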

Another related contrast function was derived from a neural network viewpoint in [12,108]. This was based on maximizing the output entropy (or information flow) of a neural network with non-linear outputs. Assume that ${\bf x}$ is the input to the neural network whose outputs are of the form $g_i({\bf w}_i^T{\bf x})$, where the $g_i$ are some non-linear scalar functions, and the ${\bf w}_i$ are the weight vectors of the neurons. One then wants to maximize the entropy of the outputs:

\begin{displaymath}
L_2=H(g_1({\bf w}_1^T{\bf x}),...,g_m({\bf w}_m^T{\bf x})).
\end{displaymath} (14)

If the $g_i$ are well chosen, this framework also enables the estimation of the ICA model. Indeed, several authors, e.g., [23,123], proved the surprising result that the principle of network entropy maximization, or 'infomax', is equivalent to maximum likelihood estimation. This equivalence requires that the non-linearities $g_i$ used in the neural network are chosen as the cumulative distribution functions corresponding to the densities $f_i$, i.e., $g_i'(\cdot)=f_i(\cdot)$.
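To see why this equivalence holds, note that when ${\bf W}$ is invertible and the $g_i$ are strictly increasing, the entropy transformation formula gives

\begin{displaymath}
H(g_1({\bf w}_1^T{\bf x}),...,g_m({\bf w}_m^T{\bf x}))=H({\bf x})+\sum_{i=1}^m E\{\log g_i'({\bf w}_i^T{\bf x})\}+\log\vert\det {\bf W}\vert
\end{displaymath}

where the first term does not depend on ${\bf W}$. Choosing $g_i'=f_i$ makes the remaining terms equal to the expectation of the log-likelihood (13) divided by $T$, so maximizing the output entropy and maximizing the likelihood lead to the same estimator.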

The advantage of the maximum likelihood approach is that under some regularity conditions, it is asymptotically efficient; this is a well-known result in estimation theory [127]. However, there are also some drawbacks. First, this approach requires knowledge of the probability densities of the independent components. These could also be estimated [124,96], but this complicates the method considerably. A second drawback is that the maximum likelihood solution may be very sensitive to outliers if the pdfs of the independent components have certain shapes (see [62]), whereas robustness against outliers is an important property of any estimator [50,56].
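To illustrate how the likelihood (13) might be maximized in practice, the following sketch performs a relative (natural) gradient ascent in Python; the score function corresponds to the logistic densities assumed in the sketch above, and the function name, step size, and iteration count are illustrative choices only:

import numpy as np

def ml_ica(X, n_iter=500, mu=0.01):
    # Natural-gradient ascent on the log-likelihood (13).
    # X : (m, T) zero-mean data; returns an estimate of W = A^{-1}.
    m, T = X.shape
    W = np.eye(m)
    for _ in range(n_iter):
        Y = W @ X
        phi = -np.tanh(Y / 2.0)  # score f_i'(y)/f_i(y) of the logistic density
        # relative-gradient update: W <- W + mu * (I + E{phi(y) y^T}) W
        W += mu * (np.eye(m) + (phi @ Y.T) / T) @ W
    return W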

