Adaptive algorithms
based on stochastic gradient descent may be problematic when used in
an environment where no adaptation is needed, which is the case in many
practical situations. Their convergence is
often slow and depends crucially on the choice of the learning rate sequence.
As a remedy for this problem, one can use batch (block) algorithms based on
fixed-point iteration [72,60,65].
In [72], a fixed-point algorithm, named FastICA,
was introduced using kurtosis, and in [60,65]
the FastICA algorithm
was generalized to arbitrary contrast functions. For sphered data, the
one-unit FastICA algorithm has the following form:
\[
\mathbf{w}(k) = E\{\mathbf{x}\, g(\mathbf{w}(k-1)^T \mathbf{x})\} - E\{g'(\mathbf{w}(k-1)^T \mathbf{x})\}\, \mathbf{w}(k-1),
\]
where $g$ is the nonlinearity derived from the contrast function, and the weight vector $\mathbf{w}$ is renormalized to unit norm after every iteration.
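As a rough illustration, the following minimal NumPy sketch implements this one-unit fixed-point iteration, assuming the data are already sphered (zero mean, identity covariance) and using the common choice $g(u)=\tanh(u)$; the function and parameter names (one_unit_fastica, max_iter, tol) are illustrative, not taken from the FastICA papers.

```python
import numpy as np

def one_unit_fastica(X, g=np.tanh, g_prime=lambda u: 1.0 - np.tanh(u) ** 2,
                     max_iter=200, tol=1e-6, seed=0):
    """One-unit FastICA fixed-point iteration for whitened data.

    X : array of shape (n_features, n_samples), assumed already sphered.
    g, g_prime : contrast nonlinearity and its derivative.
    Returns a unit-norm weight vector w estimating one independent component.
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(X.shape[0])
    w /= np.linalg.norm(w)

    for _ in range(max_iter):
        wx = w @ X                                    # projections w^T x for all samples
        # Fixed-point update: w <- E{x g(w^T x)} - E{g'(w^T x)} w
        w_new = (X * g(wx)).mean(axis=1) - g_prime(wx).mean() * w
        w_new /= np.linalg.norm(w_new)                # renormalize to unit norm
        # w and -w define the same direction, so compare |<w_new, w>| to 1
        if abs(abs(w_new @ w) - 1.0) < tol:
            w = w_new
            break
        w = w_new
    return w

# Hypothetical usage (the whitening step is not shown here):
# X_white = whiten(X_raw)
# w = one_unit_fastica(X_white)
```

The sample averages in the update are exactly the batch expectations mentioned below: the whole data set is used at each iteration instead of a single point.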
The FastICA algorithm is neural in that it is parallel and distributed, but it is not adaptive. Instead of using every data point immediately for learning, FastICA uses sample averages computed over larger samples of the data. The convergence speed of the fixed-point algorithms is clearly superior to that of the more neural algorithms. Speed-up factors in the range from 10 to 100 are usually observed [48].
An interesting point proven in [67] is that when FastICA is used with symmetric decorrelation, it is essentially equivalent to a Newton method for maximum likelihood estimation. This means that FastICA is a general algorithm that can be used to optimize both one-unit and multi-unit contrast functions.
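To make the symmetric decorrelation concrete, the sketch below orthogonalizes the whole weight matrix via $\mathbf{W} \leftarrow (\mathbf{W}\mathbf{W}^T)^{-1/2}\mathbf{W}$ after each parallel one-unit update; the helper names and the eigendecomposition-based inverse square root are illustrative assumptions, not code from [67].

```python
import numpy as np

def symmetric_decorrelation(W):
    """Symmetric decorrelation: W <- (W W^T)^{-1/2} W.

    W : array of shape (n_components, n_features); rows are weight vectors.
    Assumes W has full row rank, so W W^T is symmetric positive definite.
    """
    eigvals, eigvecs = np.linalg.eigh(W @ W.T)
    inv_sqrt = eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T
    return inv_sqrt @ W

def fastica_parallel(X, n_components, max_iter=200, tol=1e-6, seed=0):
    """Parallel (symmetric) FastICA on whitened data X of shape (n_features, n_samples)."""
    rng = np.random.default_rng(seed)
    W = symmetric_decorrelation(rng.standard_normal((n_components, X.shape[0])))
    for _ in range(max_iter):
        WX = W @ X                                      # all projections at once
        g, g_prime = np.tanh(WX), 1.0 - np.tanh(WX) ** 2
        # Row-wise fixed-point update, then symmetric decorrelation of all rows
        W_new = (g @ X.T) / X.shape[1] - g_prime.mean(axis=1)[:, None] * W
        W_new = symmetric_decorrelation(W_new)
        # Converged when every row direction is (up to sign) unchanged
        if np.max(np.abs(np.abs(np.einsum('ij,ij->i', W_new, W)) - 1.0)) < tol:
            W = W_new
            break
        W = W_new
    return W
```

In this symmetric variant no component is privileged over the others, in contrast to deflation schemes that estimate the components one by one.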