
The Infomax Principle

Another related contrast function was derived from a neural network viewpoint in [3,35]. It is based on maximizing the output entropy (or information flow) of a neural network with non-linear outputs. Assume that ${\bf x}$ is the input to a neural network whose outputs are of the form $g_i({\bf w}_i^T{\bf x})$, where the $g_i$ are non-linear scalar functions and the ${\bf w}_i$ are the weight vectors of the neurons. One then maximizes the entropy of the outputs:

\begin{displaymath}L_2=H(g_1({\bf w}_1^T{\bf x}),\ldots,g_n({\bf w}_n^T{\bf x})).
\end{displaymath} (28)

If the $g_i$ are well chosen, this framework also enables estimation of the ICA model. Indeed, several authors, e.g., [4,37], proved the surprising result that the principle of network entropy maximization, or ``infomax'', is equivalent to maximum likelihood estimation. The equivalence requires that the non-linearities $g_i$ used in the neural network be chosen as the cumulative distribution functions corresponding to the densities $f_i$, i.e., $g_i'(\cdot)=f_i(\cdot)$.
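As an illustration of the principle, the following sketch runs the classical Bell-Sejnowski infomax rule in its natural-gradient form, $\Delta {\bf W} \propto ({\bf I} + (1-2{\bf y}){\bf u}^T){\bf W}$ with ${\bf u}={\bf W}{\bf x}$ and ${\bf y}$ the logistic sigmoid of ${\bf u}$. Taking $g_i$ as the logistic sigmoid corresponds, per the equivalence above, to assuming super-Gaussian (here Laplacian) source densities $f_i$. The mixing matrix, learning rate, and batch size are arbitrary choices for the demonstration, not part of the original derivation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 2, 20000
S = rng.laplace(size=(n, T))             # super-Gaussian independent sources
A = np.array([[1.0, 0.6],                # hypothetical mixing matrix
              [0.4, 1.0]])
X = A @ S                                # observed mixtures
X -= X.mean(axis=1, keepdims=True)

W = np.eye(n)                            # unmixing matrix estimate
lr, batch = 0.01, 100
for epoch in range(50):
    for start in range(0, T, batch):
        x = X[:, start:start + batch]
        u = W @ x
        y = 1.0 / (1.0 + np.exp(-u))     # logistic sigmoid = assumed cdf g_i
        # natural-gradient infomax update, averaged over the mini-batch
        dW = (np.eye(n) + (1 - 2 * y) @ u.T / x.shape[1]) @ W
        W += lr * dW

# W A should approach a scaled permutation matrix: each source is
# recovered up to order and scale, the usual ICA indeterminacies.
P = W @ A
```

Maximizing the output entropy here is exactly maximum likelihood under the logistic density assumption, which is why the update succeeds on super-Gaussian sources and would fail if the $g_i$ were badly mismatched to the true $f_i$.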

Aapo Hyvarinen