These web pages only give a general overview of Icasso. The detailed documentation of the functions is found in the MATLAB help texts, e.g.,
>> help icassoStruct
Icasso is based on running FastICA several times with randomized initial
conditions and/or bootstrapped data (resampling). Icasso pools all the estimates together and forms
clusters bottom-up among them. The basic idea is that a tight cluster
of estimates is considered to be a candidate for including a "good"
estimate. The centroid of such a cluster is considered a more reliable
estimate than any estimate from an arbitrary run. (Instead of an
average as a centroid, Icasso visualizes and returns a centrotype
from each cluster. This is the one of the original
estimates that is most similar to the other estimates in the same
cluster. You can compute the average by using Icasso functions.)
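The selection of the most similar estimate described above can be illustrated with a minimal Python sketch. It is not Icasso code: the function name `centrotype` and the toy similarity matrix are made up for illustration, and "most similar" is assumed to mean the largest sum of similarities to the other members of the cluster.

```python
def centrotype(similarity, members):
    """Return the index of the member most similar to the others in its cluster.

    similarity: square matrix (list of lists) of pairwise similarities.
    members: indices of the estimates that belong to one cluster.
    Assumption: "most similar" = largest sum of similarities to the other members.
    """
    return max(
        members,
        key=lambda i: sum(similarity[i][j] for j in members if j != i),
    )


# Toy similarity matrix for four estimates; estimate 3 belongs to another cluster.
S = [
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.95, 0.2],
    [0.8, 0.95, 1.0, 0.1],
    [0.1, 0.2, 0.1, 1.0],
]
print(centrotype(S, [0, 1, 2]))  # estimate 1 has the largest summed similarity
```

Unlike an average, the centrotype is always one of the estimates that was actually computed.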
The basic procedure
Icasso is a sequential procedure that is split into several phases
(functions). In general, Icasso consists of the following steps:
- Parameters for the estimation algorithm(s) are selected: e.g.,
for FastICA the estimation approach (symmetrical or deflatory),
contrast function, etc. The estimation is run N times using
the selected training parameters. Each time the data is bootstrapped
and/or the initial conditions of the estimation algorithm are changed.
- Mutual similarities between all the estimates are computed. As the
measure of similarity, we use the absolute value of the linear
correlation coefficient between the independent components. The
estimates are clustered according to their mutual
(dis)similarities. In principle, the clustering method can be freely
selected. We apply agglomerative clustering with the average-linkage criterion.
- The clustering is visualized as a dendrogram and a 2D plot. The
user investigates how dense the clusters are. The clustering of the
estimates is expected to yield information on the reliability
(robustness) of estimation. A compact cluster emerges when a similar
estimate repeatedly comes up despite the randomization.
- The user can retrieve the estimates belonging to certain
cluster(s) for further analysis and visualization.
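The similarity and clustering steps above can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the Icasso implementation: the function names and the toy signals are hypothetical, and the dissimilarity is assumed to be one minus the similarity.

```python
import math


def abs_correlation(x, y):
    """Absolute value of the linear correlation between two equal-length signals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return abs(cov / (sx * sy))


def average_linkage(dissimilarity, n_clusters):
    """Merge the two closest clusters until n_clusters remain (average linkage)."""
    clusters = [[i] for i in range(len(dissimilarity))]

    def mean_distance(a, b):
        return sum(dissimilarity[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest average dissimilarity.
        p, q = min(
            ((p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters))),
            key=lambda pq: mean_distance(clusters[pq[0]], clusters[pq[1]]),
        )
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return [sorted(c) for c in clusters]


# Four toy "estimates": 0 and 1 differ mainly in sign, 2 and 3 are noisy copies.
estimates = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [-1.0, -2.1, -2.9, -4.0, -5.2],
    [1.0, -1.0, 1.0, -1.0, 1.0],
    [0.9, -1.1, 1.2, -0.8, 1.0],
]
similarity = [[abs_correlation(a, b) for b in estimates] for a in estimates]
# Assumption of this sketch: dissimilarity = 1 - similarity.
dissimilarity = [[1.0 - s for s in row] for row in similarity]
clusters = average_linkage(dissimilarity, 2)
print(clusters)
```

Taking the absolute value makes the measure insensitive to the sign of an estimate, which is why the sign-flipped estimates 0 and 1 end up in the same cluster.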
Read illustrative examples on using Icasso in the publications
Firstly, you have to select the parameters for FastICA. In particular,
- the (reduced) data dimension (d), which may be less than the
original input data dimension (PCA dimension reduction is often
applied in FastICA), and
- the number of ICA estimates (m) extracted on each run
are of interest here.
For Icasso you also have to select
- the resampling mode,
- the number of resampling cycles (N), and
- the number of estimate-clusters (L).
As the resampling mode, you can use
- both a different random initial condition and resampling of the data (by bootstrapping) in each resampling cycle,
- a different random initial condition for FastICA in each resampling cycle while keeping the training data set fixed, or
- a fixed initial condition in each cycle while bootstrapping the data every time.
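The three alternatives above differ only in what is randomized in each cycle. The following Python sketch illustrates this. The mode labels, the function name `resampling_plan`, and the use of a seed to stand for an initial condition are choices made for this illustration, not Icasso's interface.

```python
import random


def resampling_plan(mode, n_samples, n_cycles, seed=0):
    """Yield (sample_indices, init_seed) for each resampling cycle.

    mode: "both", "randinit" or "bootstrap" (labels chosen for this sketch).
    init_seed stands for the initial condition handed to the estimation algorithm.
    """
    if mode not in ("both", "randinit", "bootstrap"):
        raise ValueError("unknown mode: " + repr(mode))
    rng = random.Random(seed)
    fixed_indices = list(range(n_samples))
    fixed_init_seed = rng.randrange(2**31)
    for _ in range(n_cycles):
        if mode in ("both", "bootstrap"):
            # Bootstrap: draw n_samples indices with replacement.
            indices = [rng.randrange(n_samples) for _ in range(n_samples)]
        else:
            indices = fixed_indices
        if mode in ("both", "randinit"):
            init_seed = rng.randrange(2**31)
        else:
            init_seed = fixed_init_seed
        yield indices, init_seed


for indices, init_seed in resampling_plan("both", n_samples=6, n_cycles=2):
    print(indices, init_seed)
```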
Number of resampling cycles (N)
Basically, the more cycles the better. However, Icasso currently uses
hierarchical clustering, which causes a computational
bottleneck. Icasso can currently handle a moderate total number of
estimates, say, 1000-2000, and consequently a moderate
number of resampling cycles (N). For example, if you
extract 15 independent components in each resampling cycle, 50
resamplings might be appropriate (15 × 50 = 750 estimates in total).
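The bottleneck comes from the fact that clustering works on all pairs of estimates. A small Python sketch of the arithmetic, with hypothetical helper names:

```python
def total_estimates(n_cycles, n_components):
    """Total number of pooled estimates: one set of components per cycle."""
    return n_cycles * n_components


def n_pairwise_similarities(n_estimates):
    """Number of distinct estimate pairs whose similarity must be computed."""
    return n_estimates * (n_estimates - 1) // 2


def max_cycles(estimate_budget, n_components):
    """Largest number of cycles that keeps the total within a given budget."""
    return estimate_budget // n_components


total = total_estimates(50, 15)
pairs = n_pairwise_similarities(total)
print(total, pairs)          # 750 estimates, 280875 pairs
print(max_cycles(2000, 15))  # 133 cycles for a budget of 2000 estimates
```

The number of pairs grows quadratically with the total number of estimates, so doubling the number of cycles roughly quadruples the size of the similarity matrix.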
Number of ICA estimates (estimate-clusters) (L)
Often, ICA is performed so that the number of components is the
same as the input data dimension (possibly after PCA dimension
reduction), i.e., m=d. If you use this setting, it means that you try to find
as many estimates as there are data dimensions, together with the quality index
and centroid estimate for each of them. The default in Icasso is
to set the number of estimate-clusters L=d.
In FastICA, you can extract fewer independent components than there are
data dimensions (m < d). In Icasso, you
can also freely select the number of estimate-clusters. For example,
you can run FastICA in the deflatory mode and extract, e.g., only one
component at each run but extract several "robust" estimates
after Icasso. You can also group the estimates into a larger or smaller
number of estimate-clusters. Interpreting the results is up to you.
Sources, demixing matrix (W), and mixing matrix (A)
FastICA estimates the demixing matrix (W). In the Icasso
procedure this is done several times, and the estimates are
clustered. Icasso returns a centroid (centrotype) estimate
from each estimate-cluster. This should represent a more
reliable estimate than any single estimate from one run of
FastICA. You can also return all estimates in a cluster by
using appropriate Icasso functions.
However, the results that Icasso gives usually do not
form a strictly orthogonal basis in the whitened data space,
since they are directly the natural centroids (centrotypes) of the
estimate-clusters. You have to orthogonalize the result in an
appropriate manner if necessary.
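Which orthogonalization is appropriate depends on your application. As one possible option only, the following Python sketch orthonormalizes the rows of a small matrix with the Gram-Schmidt procedure. The function name and the example matrix are hypothetical, and this is not a method prescribed by Icasso.

```python
import math


def gram_schmidt_rows(W):
    """Orthonormalize the rows of W in order (modified Gram-Schmidt).

    Earlier rows are changed least, so the row order matters.
    Assumes the rows are linearly independent.
    """
    basis = []
    for row in W:
        v = list(row)
        for b in basis:
            # Remove the component of v that lies along an earlier basis vector.
            proj = sum(x * y for x, y in zip(v, b))
            v = [vi - proj * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        basis.append([x / norm for x in v])
    return basis


# Two nearly, but not exactly, orthogonal rows (toy values).
W = [[1.0, 0.1], [0.2, 1.0]]
Q = gram_schmidt_rows(W)
print(Q)
```

Because Gram-Schmidt favors the first rows, you may want to order the estimates, for example by their reliability, before applying it. A symmetric orthogonalization treats all rows equally but needs a matrix square root.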
The mixing matrix A is the pseudoinverse of W, and the
sources are obtained by computing S=WX from the
original data that is stored in the Icasso data structure.
Estimate stability index (Iq)
Icasso returns a stability (quality) index (Iq) for each
estimate-cluster. This gives a rank for the corresponding ICA
estimate. In the ideal case of m
independent components, the estimates are concentrated in m compact
and close-to-orthogonal clusters. In this case the index of every
estimate-cluster is (very close to) one. The value drops when the
clusters grow wider and mix up.
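The behavior described above can be reproduced with a small Python sketch. It assumes that the index is the average similarity inside a cluster minus the average similarity between that cluster and all other estimates. This page does not state the formula, so treat the definition, the function name, and the toy matrices as assumptions.

```python
def stability_index(similarity, clusters, k):
    """Average intra-cluster similarity minus average extra-cluster similarity.

    similarity: square matrix of pairwise similarities in [0, 1].
    clusters: list of index lists; k selects the cluster to score.
    """
    inside = clusters[k]
    outside = [j for m, c in enumerate(clusters) if m != k for j in c]
    intra = sum(similarity[i][j] for i in inside for j in inside) / len(inside) ** 2
    if not outside:
        return intra
    extra = sum(similarity[i][j] for i in inside for j in outside) / (
        len(inside) * len(outside)
    )
    return intra - extra


clusters = [[0, 1], [2, 3]]

# Ideal case: identical estimates inside a cluster, orthogonal between clusters.
ideal = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
# Wider, partly mixed clusters.
mixed = [
    [1.0, 0.7, 0.4, 0.3],
    [0.7, 1.0, 0.5, 0.4],
    [0.4, 0.5, 1.0, 0.6],
    [0.3, 0.4, 0.6, 1.0],
]
print(stability_index(ideal, clusters, 0))
print(stability_index(mixed, clusters, 0))
```

In the ideal case the value is one. In the mixed case it is clearly lower, because the similarities inside the cluster drop and those to other clusters rise.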
The R-index should be consulted only in exploratory work (if you wish to
explore different clustering solutions). The R-index is a heuristic
Davies-Bouldin type relative measure for a "natural" number of
clusters. Local minima of this index are "good" solutions in terms of having
mutually isolated "natural" clusters.
Like any relative clustering
validity index, the R-index is heuristic and should be used only as a
guideline. If the structure of the estimate space is complex,
this index is dubious.
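To show what a Davies-Bouldin type measure looks like, here is a Python sketch that averages, over all clusters, the ratio of the within-cluster dissimilarity to the dissimilarity to the nearest other cluster. It is a generic illustration with hypothetical names and toy data, and it is not necessarily the exact formula implemented in Icasso.

```python
def db_type_index(dissimilarity, clusters):
    """Mean ratio of within-cluster scatter to separation from the nearest cluster.

    Lower values indicate compact, mutually isolated clusters.
    Requires at least two clusters and non-zero between-cluster dissimilarities.
    """

    def mean_d(a, b):
        return sum(dissimilarity[i][j] for i in a for j in b) / (len(a) * len(b))

    ratios = []
    for k, c in enumerate(clusters):
        within = mean_d(c, c)
        nearest = min(mean_d(c, o) for m, o in enumerate(clusters) if m != k)
        ratios.append(within / nearest)
    return sum(ratios) / len(ratios)


# Toy dissimilarities: estimates {0, 1} and {2, 3} form two natural groups.
D = [
    [0.0, 0.1, 0.9, 0.9],
    [0.1, 0.0, 0.9, 0.9],
    [0.9, 0.9, 0.0, 0.1],
    [0.9, 0.9, 0.1, 0.0],
]
natural = db_type_index(D, [[0, 1], [2, 3]])
unnatural = db_type_index(D, [[0, 2], [1, 3]])
print(natural, unnatural)
```

The natural partition gets the lower value. When you compare solutions with different numbers of clusters, look for local minima of such an index.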
Implementing the procedure using Icasso functions
The first step is to compute randomized ICA
estimates N times from data X using the function
icassoEst. The output of this function (we will use the variable
sR) is called the Icasso result data structure. It
logs all the methods and parameters used in the process, and the
results from the Icasso procedure. You can extract information from
this data structure either directly or by using functions
The batch of Icasso functions that perform similarity computation, clustering
and the 2D projection are collected in
Finally, you can explore the clustering and get the results by using
icassoShow. You can examine the relationships between
estimates and clusters in detail.
Functions whose names start with the string icasso are main
functions: they use the Icasso result structure as input and/or output.
- FastICA parameters and resampling
This is a subfunction automatically called by
However, its help text describes in detail the Icasso data
structure with reference to the Icasso process. This might be of interest to you if you are
going to extract information directly from the data structure.
- Performs clustering and projections for visualization
- Visualizes and returns results
- Returns results
the basic procedure from resampling to visualization in one batch.
More functions and details of the Icasso process
Page maintained by email@example.com,
last updated Thursday, 17-Mar-2005 15:17:37 EET