where the tanh function is applied separately on every component of the vector $y = Wx$, as above. The tanh function is used here because it is the derivative of the log-density of the 'logistic' distribution [12]. This function works for the estimation of most super-Gaussian (sparse) independent components; for sub-Gaussian independent components, other functions must be used, see e.g. [124,26,96]. The algorithm in (36) converges very slowly, however, as has been noted by several researchers. The convergence may be improved by whitening the data, and especially by using the natural gradient.
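As a concrete illustration, here is a minimal NumPy sketch of this kind of gradient ascent. It assumes (36) has the familiar maximum-likelihood form $\Delta W \propto (W^T)^{-1} - 2\tanh(Wx)x^T$ (with the expectation replaced by a sample average), which is an assumption about the exact form of the update; the data are whitened first, in line with the remark above that whitening improves convergence:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two sparse (super-Gaussian) sources, mixed linearly: x = A s.
n, T = 2, 5000
S = rng.laplace(size=(n, T))            # Laplacian sources are super-Gaussian
A = np.array([[2.0, 1.0],               # fixed, well-conditioned mixing matrix
              [1.0, 1.5]])
X = A @ S

# Whiten the data first; this speeds up the plain gradient considerably.
C = np.cov(X)
d, E = np.linalg.eigh(C)
V = E @ np.diag(d ** -0.5) @ E.T        # whitening matrix
Z = V @ X

# Plain gradient ascent of the log-likelihood (assumed form of (36)):
#   W <- W + mu * [ (W^T)^{-1} - 2 tanh(W z) z^T ],
# with tanh applied componentwise and the expectation replaced by
# a sample average over the whitened data Z.
W = np.eye(n)
mu = 0.1
for _ in range(3000):
    Y = W @ Z
    W = W + mu * (np.linalg.inv(W.T) - 2.0 * np.tanh(Y) @ Z.T / T)

# The total transform should approach a scaled permutation matrix,
# i.e. the sources are recovered up to order and scaling.
M = W @ V @ A
```

Even on this toy problem, the plain gradient needs thousands of full-batch iterations; this is the slow convergence referred to above.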

The natural (or relative) gradient method simplifies the gradient method considerably, and makes it better conditioned. The principle of the natural gradient [1,2] is based on the geometrical structure of the parameter space, and is related to the principle of the relative gradient [28] that uses the Lie group structure of the ICA problem. In the case of basic ICA, both of these principles amount to multiplying the right-hand side of (36) by $W^T W$. Thus we obtain

$$\Delta W \propto [I - 2\tanh(y)y^T]\,W$$

with $y = Wx$. After this modification, the algorithm does not need sphering. Interestingly, this algorithm is a special case of the non-linear decorrelation algorithm in (34), and is closely related to the algorithm in (35).
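A minimal NumPy sketch of this natural-gradient update, under the same assumptions as before about the exact form of the learning rule (batch averaging in place of the expectation, tanh applied componentwise). Note that, as stated above, no sphering of the data is performed:

```python
import numpy as np

rng = np.random.default_rng(1)

# Mixed super-Gaussian sources; the data are NOT whitened here.
n, T = 2, 5000
S = rng.laplace(size=(n, T))
A = np.array([[1.0, 0.6],               # fixed, well-conditioned mixing matrix
              [0.4, 1.0]])
X = A @ S

# Natural-gradient update:
#   W <- W + mu * [ I - 2 tanh(y) y^T ] W,   y = W x,
# i.e. the ordinary gradient of (36) multiplied from the right by W^T W.
# Note that no matrix inversion is needed, unlike in the plain gradient.
W = np.eye(n)
mu = 0.05
for _ in range(2000):
    Y = W @ X
    W = W + mu * (np.eye(n) - 2.0 * np.tanh(Y) @ Y.T / T) @ W

# W A should approach a scaled permutation matrix.
M = W @ A
```

The absence of the inverse $(W^T)^{-1}$ in the update, together with the better conditioning, is what makes this form preferable in practice.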

Finally, in [124], a Newton method for maximizing the likelihood was introduced. The Newton method converges in fewer iterations, but has the drawback that a matrix inversion (at least approximate) is needed in every iteration.