Another related contrast function was derived from a neural network
viewpoint in [3,35]. This was based on maximizing the
output entropy (or information flow) of a neural network with
non-linear outputs. Assume
that $\mathbf{x}$ is the input to the neural network, whose outputs are of the
form $y_i = g_i(\mathbf{w}_i^T \mathbf{x})$,
where the $g_i$ are some non-linear scalar functions, and the $\mathbf{w}_i$ are the weight vectors of the neurons.
One then wants to maximize the entropy of the outputs:

\[ H\big(g_1(\mathbf{w}_1^T \mathbf{x}), \ldots, g_n(\mathbf{w}_n^T \mathbf{x})\big) \tag{28} \]
If the $g_i$ are well chosen, this framework also enables the
estimation of the ICA model.
Indeed,
several authors, e.g., [4,37], proved the
surprising result that the principle of network entropy maximization,
or ``infomax'', is equivalent to
maximum likelihood estimation. This
equivalence requires that the non-linearities $g_i$ used in the
neural network are chosen as the cumulative distribution functions
corresponding to the densities $f_i$, i.e.,
$g_i'(\cdot) = f_i(\cdot)$. The equivalence can be seen from the change-of-variables
formula: the output entropy decomposes as
$H(\mathbf{y}) = H(\mathbf{x}) + E\{\sum_i \log g_i'(\mathbf{w}_i^T \mathbf{x})\} + \log|\det \mathbf{W}|$,
which, for $g_i' = f_i$, equals the expected log-likelihood of the ICA model
up to the constant $H(\mathbf{x})$.
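As a concrete illustration (not part of the original text), choosing the logistic sigmoid $g_i(u) = (1+e^{-u})^{-1}$, whose derivative is the logistic density, gives the classic Bell–Sejnowski natural-gradient update $\Delta \mathbf{W} \propto (\mathbf{I} + (1-2g(\mathbf{y}))\mathbf{y}^T)\mathbf{W}$, which ascends the infomax objective and, by the equivalence above, the ICA log-likelihood. A minimal sketch in Python/NumPy, assuming two unit-variance Laplacian (super-Gaussian) sources and a random invertible mixing matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 2, 5000
S = rng.laplace(size=(n, T)) / np.sqrt(2)   # unit-variance super-Gaussian sources
A = rng.normal(size=(n, n))                  # random mixing matrix (assumed invertible)
X = A @ S                                    # observed mixtures

W = np.eye(n)                                # unmixing estimate
lr = 0.02
for _ in range(2000):
    Y = W @ X                                 # current source estimates y = Wx
    g = 1.0 / (1.0 + np.exp(-Y))              # logistic non-linearity g_i
    # natural-gradient ascent on the infomax objective (equivalently,
    # the ICA log-likelihood with logistic densities f_i = g_i')
    W += lr * (np.eye(n) + (1.0 - 2.0 * g) @ Y.T / T) @ W

# after convergence, W @ A should be close to a scaled permutation matrix
```

Up to the unavoidable scaling and permutation indeterminacies of ICA, the learned $\mathbf{W}$ inverts the mixing, so $\mathbf{W}\mathbf{A}$ has one dominant entry per row.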
Aapo Hyvarinen
2000-04-19