Merja Oja, Göran Sperber, Jonas Blomberg, and Samuel Kaski. Grouping and Visualizing Human Endogenous Retroviruses by Bootstrapping Median Self-organizing Maps. In Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, San Diego, California, USA, 7-8 October 2004, pages 95-101.

About eight percent of the human genome consists of human endogenous retrovirus (HERV) sequences. HERVs are remnants of ancient retroviral infections. They are mutated and defective, but they may still give rise to transcripts or affect the expression of human genes.

The HERVs stem from several kinds of retroviruses, and any present-day function of a HERV sequence may reflect its origin. Classifying the diverse HERV sequences is therefore a natural starting point for investigating the effects of HERVs in humans. The current HERV taxonomy is, however, incomplete: some sequences cannot be assigned to any class, and the classification of others is ambiguous.

The Median Self-Organizing Map (SOM), a variant of the SOM for data given as pairwise distances between samples, can be used to group all the HERVs found in the human genome. It visualizes the collection of 3661 HERV sequences found by the RetroTector system on a two-dimensional display that shows similarity relationships between individual sequences, as well as cluster structures and similarities between clusters. Like any dimensionality reduction method, the SOM necessarily makes compromises when representing the data. In this work we extend the visualizations with bootstrap-based estimates of which parts of the visualization are reliable and which are not, and we use the SOM to find potentially new HERV groups.
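The abstract does not spell out the algorithm, so the following is only a minimal Python sketch under stated assumptions. It implements one common batch formulation of the Median SOM, in which each map unit's prototype is restricted to be one of the data samples, chosen as the neighborhood-weighted generalized median of the samples mapped nearby. It then adds one simple bootstrap scheme that estimates how often two items land on the same or adjacent map units. The function names, the rectangular 3 x 3 grid, the Gaussian neighborhood, the shrinking schedule, and the adjacency criterion are all hypothetical choices for illustration; the paper's actual training schedule and reliability measure may differ.

```python
import math
import random


def grid_coords(rows, cols):
    """(row, col) coordinates of each unit on a rows x cols rectangular map (assumed layout)."""
    return [(r, c) for r in range(rows) for c in range(cols)]


def sq_dist(a, b):
    """Squared Euclidean distance between two grid coordinates."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def best_matching_units(D, items, protos):
    """Map each item index to the unit whose prototype sample is closest in D."""
    return {i: min(range(len(protos)), key=lambda u: D[i][protos[u]]) for i in items}


def train_median_som(D, samples, rows=3, cols=3, n_iter=10, sigma0=2.0, seed=0):
    """Batch Median SOM sketch: every prototype is the index of a data sample.

    D is a full symmetric distance matrix (list of lists) over all items.
    `samples` lists the training indices; repeats (e.g. from a bootstrap
    resample) count with their multiplicity in the median computation.
    """
    rng = random.Random(seed)
    coords = grid_coords(rows, cols)
    candidates = sorted(set(samples))
    protos = [rng.choice(candidates) for _ in coords]
    for t in range(n_iter):
        # Hypothetical schedule: Gaussian width shrinks linearly, floored at 0.5.
        sigma = max(0.5, sigma0 * (1 - t / max(1, n_iter - 1)))
        bmu = best_matching_units(D, candidates, protos)
        for u in range(len(coords)):
            # Neighborhood weight of each training sample relative to unit u.
            weights = [math.exp(-sq_dist(coords[u], coords[bmu[i]]) / (2 * sigma ** 2))
                       for i in samples]
            # Generalized median: the candidate minimizing the weighted distance sum.
            protos[u] = min(candidates,
                            key=lambda j: sum(w * D[i][j] for w, i in zip(weights, samples)))
    return coords, protos


def bootstrap_neighbor_reliability(D, n_boot=20, rows=3, cols=3, seed=0):
    """Fraction of bootstrap maps on which each pair of items lands on the same
    unit or on one of its 8 surrounding units (an assumed reliability measure)."""
    rng = random.Random(seed)
    n = len(D)
    counts = [[0] * n for _ in range(n)]
    for _ in range(n_boot):
        boot = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        coords, protos = train_median_som(D, boot, rows, cols, seed=rng.randrange(10 ** 9))
        # Map every item, including out-of-bag ones, onto the trained map.
        bmu = best_matching_units(D, range(n), protos)
        for i in range(n):
            for j in range(n):
                (ri, ci), (rj, cj) = coords[bmu[i]], coords[bmu[j]]
                if max(abs(ri - rj), abs(ci - cj)) <= 1:
                    counts[i][j] += 1
    return [[c / n_boot for c in row] for row in counts]


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated 1-D clusters.
    points = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
    D = [[abs(a - b) for b in points] for a in points]
    for row in bootstrap_neighbor_reliability(D, n_boot=10):
        print(" ".join(f"{r:.2f}" for r in row))
```

Restricting prototypes to actual samples is what lets the method work from a distance matrix alone, since no averaging of sequences is required. In this sketch, pairs with reliability values near 1 are placed close together on almost every resampled map, while values near 0.5 would flag neighborhood relations that depend on the particular sample.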

Merja Oja <merja.oja'at'hut.fi>
Last modified: Wed Mar 9 08:29:24 EET 2005