Human endogenous retroviruses (HERVs) stem from several kinds of retroviruses. Any function that HERV sequences retain today may reflect their origin. Hence, classifying the diverse HERV sequences is a natural starting point when investigating the effects of HERVs in humans. The current HERV taxonomy is incomplete: some sequences cannot be assigned to any class, and the classification of others is ambiguous.
A Median Self-Organizing Map (SOM), a SOM variant that operates on pairwise distances between samples, can be used to group all the HERVs found in the human genome. The Median SOM visualizes the HERV collection on a two-dimensional display. The visualization represents the similarity relationships between individual sequences, as well as the cluster structures and the similarities between clusters.
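As a rough illustration of how a SOM can be trained from pairwise distances alone, the following Python sketch implements a batch median-SOM update: the prototype of each map unit is restricted to be one of the samples, chosen as the generalized median that minimizes the neighborhood-weighted sum of distances. The function name `median_som`, its parameters, the Gaussian neighborhood, and the shrinking schedule are assumptions made for illustration, not details taken from this work.

```python
import numpy as np


def median_som(D, rows, cols, n_iter=20, sigma_start=None, sigma_end=0.5, seed=0):
    """Batch median SOM on an (N, N) pairwise distance matrix D (illustrative sketch).

    Returns (protos, bmu, grid):
      protos[u] : index of the sample acting as prototype of map unit u
      bmu[i]    : index of the best-matching unit of sample i
      grid[u]   : (row, col) display coordinates of unit u
    """
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    n_units = rows * cols

    # Display coordinates of the units and squared grid distances between them.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    grid_d2 = ((grid[:, None, :] - grid[None, :, :]) ** 2).sum(axis=2)

    if sigma_start is None:
        sigma_start = max(rows, cols) / 2.0

    # Initialize prototypes with distinct random samples (requires N >= n_units).
    protos = rng.choice(n, size=n_units, replace=False)

    for t in range(n_iter):
        # Neighborhood width shrinks geometrically from sigma_start to sigma_end.
        frac = t / max(n_iter - 1, 1)
        sigma = sigma_start * (sigma_end / sigma_start) ** frac

        # Assignment step: each sample goes to the unit with the nearest prototype.
        bmu = np.argmin(D[:, protos], axis=1)

        # h[u, i]: Gaussian neighborhood weight between unit u and the unit of sample i.
        h = np.exp(-grid_d2[:, bmu] / (2.0 * sigma ** 2))

        # Update step: cost[u, j] = sum_i h[u, i] * D[i, j]; the new prototype of
        # unit u is the sample j with the smallest cost (generalized median).
        cost = h @ D
        protos = np.argmin(cost, axis=1)

    bmu = np.argmin(D[:, protos], axis=1)
    return protos, bmu, grid
```

Given a precomputed distance matrix `D`, a call such as `median_som(D, rows=10, cols=10)` returns the prototype sample of each unit and the unit assigned to each sample; `grid[bmu]` then gives the display position of every sample. Because prototypes are always actual samples, no vector representation of the data is needed, only the distances.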
In this work, the Median SOM is used to cluster and visualize a collection of 3661 HERV sequences picked from the human genome by the RetroTector system. The trustworthiness of the visualization in representing the similarity relationships between the HERV sequences is evaluated, and the confidence in the groupings found is estimated with resampling techniques.
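To make the notion of trustworthiness concrete, the sketch below computes one common rank-based definition (due to Venna and Kaski): a display is penalized whenever a sample's k nearest neighbors on the display include samples that are not among its k nearest neighbors in the original distances. Whether this is exactly the measure evaluated here is an assumption; the function name `trustworthiness` and its arguments are likewise illustrative. Ties, which are frequent when several samples share a map unit, are simply broken by index order in this sketch.

```python
import numpy as np


def trustworthiness(D_input, D_display, k):
    """Rank-based trustworthiness of a display (illustrative sketch, assumes k < N / 2).

    D_input   : (N, N) distances between samples in the original data
    D_display : (N, N) distances between the same samples on the display
    """
    n = D_input.shape[0]
    rows = np.arange(n)[:, None]

    # Input-space ranks: ranks_in[i, j] is the position of j in the neighbor
    # ordering of i (self gets rank 0, the nearest other sample rank 1).
    d_in = D_input.astype(float).copy()
    np.fill_diagonal(d_in, -np.inf)
    order_in = np.argsort(d_in, axis=1, kind="stable")
    ranks_in = np.empty((n, n), dtype=int)
    ranks_in[rows, order_in] = np.arange(n)[None, :]

    # The k nearest neighbors of each sample on the display (self excluded).
    d_out = D_display.astype(float).copy()
    np.fill_diagonal(d_out, -np.inf)
    knn_out = np.argsort(d_out, axis=1, kind="stable")[:, 1:k + 1]

    # Display neighbors whose input rank exceeds k are "false" neighbors;
    # each contributes (rank - k) to the penalty, true neighbors contribute 0.
    r = ranks_in[rows, knn_out]
    penalty = np.clip(r - k, 0, None).sum()

    # Normalization that maps the worst case to 0 and a perfect display to 1.
    return 1.0 - 2.0 / (n * k * (2 * n - 3 * k - 1)) * penalty
```

For a SOM display, `D_display` could be taken as the grid distances between the map units assigned to each pair of samples, and the score can be inspected over a range of `k` values.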