Proof of convergence of algorithm (20)

Next: Proof of convergence of Up: Appendix: Proofs Previous: Appendix: Proofs

Proof of convergence of algorithm (20)

The convergence is proven under the assumptions that first, the data follows the ICA data model (2) and second, that the expectations are evaluated exactly. We must also make the following technical assumption:

$\begin{displaymath} E\{s_i g( s_i)-g'( s_i)\}\neq 0, \mbox{for any }i \end{displaymath}$

(27)

which can be considered a generalization of the condition, valid when we use kurtosis as contrast, that the kurtosis of the independent components must be non-zero. If (27) is true for a subset of independent components, we can estimate just those independent components.

To begin with, make the change of variable ${\bf z}={\bf A}^T{\bf w}$ , as above, and assume that ${\bf z}$ is in the neighbourhood of a solution (say, $z_1\approx 1$ as above). As shown in proof of Theorem 1 (see [24]), the change in z₁ is then of a lower order than the change in the other coordinates, due to the constraint $\Vert{\bf z}\Vert=1$ . Then we can expand the terms in (20) using a Taylor approximation for g and g', first obtaining

$\displaystyle g({\bf z}^T{\bf s})=g(z_1 s_1)+g'(z_1 s_1) {\bf z}_{-1}^T{\bf s}_{-1} +\frac{1}{2}g''(z_1 s_1) ({\bf z}_{-1}^T{\bf s}_{-1})^2$			(28)
$\displaystyle +\frac{1}{6}g'''(z_1 s_1) ({\bf z}_{-1}^T{\bf s}_{-1})^3 +O(\Vert{\bf z}_{-1}\Vert^4)$			(29)

and then

$\displaystyle g'({\bf z}^T{\bf s})=g'(z_1 s_1)+g''(z_1 s_1) {\bf z}_{-1}^T{\bf s}_{-1}$			(30)
$\displaystyle +\frac{1}{2}g'''(z_1 s_1) ({\bf z}_{-1}^T{\bf s}_{-1})^2 +O(\Vert{\bf z}_{-1}\Vert^3)$			(31)

where ${\bf z}_{-1}$ and ${\bf s}_{-1}$ are the vectors ${\bf z}$ and ${\bf s}$ without their first components. Thus we obtain, using the independence of the s_i, and doing some tedious but straight-forward algebraic manipulations,

$\begin{displaymath} z_1^+=E\{s_1 g(z_1 s_1)-g'(z_1 s_1)\}+O(\Vert z_{-1}\Vert^2) \end{displaymath}$

(32)

$\displaystyle z_i^+=\frac{1}{2}\:\mbox{skew}\:(s_i)E\{g''(s_1)\}z_i^2$
$\displaystyle +\frac{1}{6}\:\mbox{kurt}\:(s_i)E\{g'''(s_1)\} z_i^3+O(\Vert z_{-1}(\Vert^4) ,\mbox{for }i>1$			(33)

We obtain also

$\begin{displaymath}{\bf z}^*={\bf z}^+/\Vert{\bf z}^+\Vert \end{displaymath}$

(34)

This shows clearly that under the assumption (27), the algorithm converges (locally) to such a vector ${\bf z}$ that $z_1=\pm 1$ and z_i=0 for i>1. This means that ${\bf w}=({\bf A}^T)^{-1}{\bf z}$ converges, up to the sign, to one of the rows of the inverse of the mixing matrix ${\bf A}$ , which implies that ${\bf w}^T{\bf x}$ converges to one of the s_i. Moreover, if $E\{g''(s_1)\}=0$ , i.e. if the s_i has a symmetric distribution, as is usually the case, (33) shows that the convergence is cubic. In other cases, the convergence is quadratic. In addition, if G(u)=u⁴, the local approximations above are exact, and the convergence is global.

Next: Proof of convergence of Up: Appendix: Proofs Previous: Appendix: Proofs

Aapo Hyvarinen
1999-04-23