Merja Oja, Göran Sperber, Jonas Blomberg, and Samuel Kaski.
Grouping and Visualizing Human Endogenous Retroviruses by
Bootstrapping Median Self-organizing Maps. In Proceedings of IEEE
Symposium on Computational Intelligence in Bioinformatics and
Computational Biology. San Diego, California, USA, 7-8 October,
pages 95-101. 2004.
About eight percent of the human genome consists of human endogenous
retrovirus (HERV) sequences, the remnants of ancient retroviral
infections. The HERVs are mutated and defective, but they may still
give rise to transcripts or affect the expression of human genes.
The HERVs stem from several kinds of retroviruses, and the present-day
function of a HERV sequence may reflect its origin. Hence, classifying
the diverse HERV sequences is a natural starting point for
investigating the effects of HERVs in humans. The current HERV
taxonomy is incomplete: some sequences cannot be assigned to any
class, and the classification of others is ambiguous.
A Median Self-Organizing Map (Median SOM), a SOM for data given as
pairwise distances between samples, can be used to group all the HERVs
found in the human genome. It visualizes the collection of 3661 HERV
sequences found by the RetroTector system on a two-dimensional display
that represents similarity relationships between individual sequences,
as well as cluster structures and similarities between clusters.
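
To illustrate how a SOM can be trained from pairwise distances alone,
the following is a minimal sketch of a batch median SOM, not the
authors' implementation. It assumes a precomputed symmetric distance
matrix; the function name train_median_som, the Gaussian neighborhood,
the grid size, and the one-dimensional toy data are all hypothetical
choices made for this example. Each map unit's prototype is
constrained to be one of the data items, chosen to minimize the
neighborhood-weighted sum of distances to the items mapped nearby.

```python
import numpy as np


def train_median_som(D, grid_shape=(2, 2), n_iter=10, sigma=1.0, seed=0):
    """Batch-train a median SOM on a symmetric pairwise distance matrix D.

    Returns (prototypes, bmu): prototypes[u] is the index of the data item
    used as the prototype of map unit u, and bmu[i] is the unit item i maps to.
    All parameter names and defaults are illustrative assumptions.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    rows, cols = grid_shape
    n_units = rows * cols
    if n < n_units:
        raise ValueError("need at least as many items as map units")

    # Gaussian neighborhood between map units, from squared grid distance.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    grid_d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2)
    H = np.exp(-grid_d2 / (2.0 * sigma ** 2))

    # Start from distinct, randomly chosen data items as prototypes.
    rng = np.random.default_rng(seed)
    prototypes = rng.choice(n, size=n_units, replace=False)

    for _ in range(n_iter):
        # Best-matching unit of each item: the unit with the nearest prototype.
        bmu = D[:, prototypes].argmin(axis=1)
        # weights[u, j]: neighborhood strength between unit u and item j's unit.
        weights = H[:, bmu]
        # cost[u, i]: weighted sum of distances from candidate item i to all items.
        cost = weights @ D.T
        # New prototype of each unit: the item with the smallest weighted cost.
        prototypes = cost.argmin(axis=1)

    bmu = D[:, prototypes].argmin(axis=1)
    return prototypes, bmu


# Toy input (an assumption): two well-separated groups on a line.
x = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 10.0, 10.1, 10.2, 10.3, 10.4])
D = np.abs(x[:, None] - x[None, :])
prototypes, bmu = train_median_som(D, grid_shape=(1, 2), sigma=0.5)
print("prototype items:", prototypes)
print("unit of each item:", bmu)
```

On this toy input the two groups end up on different map units. For
sequence data, D would instead hold whatever pairwise sequence
distance is available; the sketch needs no vector representation of
the samples.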
The SOM, like any dimensionality reduction method, necessarily makes
compromises when representing the data. In this work we extend the
visualizations with bootstrap-based estimates of which parts of the
visualization are reliable and which are not, and we use the SOM to
find potentially new HERV groups.
Merja Oja <merja.oja'at'hut.fi>
Last modified: Wed Mar 9 08:29:24 EET 2005