These pages are no longer updated. Please see the web pages of the Department of Information and Computer Science (ICS): http://ics.tkk.fi/en/studies/ (in Finnish: http://ics.tkk.fi/fi/studies/).

Nonlinear Dimensionality Reduction (6 cr ECTS)

Lecturers | Amaury Lendasse, Francesco Corona |
---|---|
Assistant | Kristian Nybo |
Credits (ECTS) | 6 |
Semester | Autumn 2007 (during periods I and II) |
Seminar sessions | Tuesdays at 2 PM in room T4, starting 11.9 |
Language | English |
Web | http://www.cis.hut.fi/Opinnot/T-61.6050/ |
Registration | |

christian name dot surname at tkk dot fi |

Dimensionality reduction methods are innovative and important tools in data analysis, data mining and machine learning. They provide a way to understand and visualize the structure of complex data sets. Traditional methods such as principal component analysis and classical metric multidimensional scaling are limited by their reliance on linear models. Until recently, very few methods could reduce the dimensionality of data in a nonlinear way. Since the late 1990s, however, many new methods have been developed, and nonlinear dimensionality reduction, also called manifold learning, has become a hot topic. Advances behind this rapid growth include the use of graphs to represent the manifold topology and the use of new metrics such as the geodesic distance. In addition, new optimization schemes based on kernel techniques and spectral decomposition have led to spectral embedding, which encompasses many of the recently developed methods.
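To make the idea of a graph-based geodesic distance concrete, here is a minimal sketch in plain Python. It is not taken from the course material or from any of the toolboxes mentioned on this page. The function names `knn_graph` and `graph_distances`, the half-circle data set and the choice of k = 2 neighbours are all assumptions made for illustration. The sketch connects each point to its nearest neighbours and approximates the geodesic distance by the shortest path through that graph.

```python
import heapq
import math


def knn_graph(points, k):
    """Connect each point to its k nearest neighbours (hypothetical helper).

    Edges are undirected and weighted by Euclidean distance.
    """
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in dists[:k]:
            graph[i][j] = d
            graph[j][i] = d
    return graph


def graph_distances(graph, source):
    """Shortest-path lengths from `source` to every node (Dijkstra)."""
    dist = {node: math.inf for node in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Illustrative data: 50 points on a half circle of radius 1,
# i.e. a one-dimensional manifold embedded in the plane.
n = 50
points = [
    (math.cos(math.pi * i / (n - 1)), math.sin(math.pi * i / (n - 1)))
    for i in range(n)
]

graph = knn_graph(points, k=2)
euclidean = math.dist(points[0], points[-1])   # straight line between the ends
geodesic = graph_distances(graph, 0)[n - 1]    # path that follows the arc

print(f"Euclidean distance between the end points: {euclidean:.3f}")
print(f"Graph (approximate geodesic) distance:     {geodesic:.3f}")
```

The straight-line distance between the two ends of the arc is the diameter of the circle, whereas the graph distance follows the arc and approaches its length (π for a unit half circle) as the sampling gets denser. This is the property that graph-based methods exploit when they replace Euclidean distances with geodesic ones.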

This course describes existing and advanced methods to reduce the dimensionality of numerical databases. For each method, the description starts from intuitive ideas, develops the necessary mathematical details and ends by outlining the algorithmic implementation. Methods are compared with each other with the help of different illustrative examples.

The purpose of the course is to summarize clear facts and ideas about well-known methods as well as recent developments in the topic of nonlinear dimensionality reduction. With this goal in mind, methods are all described from a unifying point of view, in order to highlight their respective strengths and shortcomings.

Nonlinear Dimensionality Reduction

Series: Information Science and Statistics

Lee, John A.; Verleysen, Michel

2007, approx. 300 p., hardcover

ISBN: 978-0-387-39350-6

Time | Lecturer | Topic | Slides |
---|---|---|---|
11.9. | Amaury Lendasse | High-Dimensional Data | |
18.9. | Elia Liitiäinen | Characteristics of an Analysis Method | |
25.9. | Kristian Nybo | Estimation of the Intrinsic Dimension | |
2.10. | Markus Ojala | Distance Preservation I | |
9.10. | Niko Vuokko | Distance Preservation II | |
16.10. | Amaury Lendasse | PROJECT | ppt, pdf |
23.10. | Laszlo Kozma and Dusan Sovilj | Topology Preservation I | Laszlo, Dusan |
6.11. | Antti Sorjamaa and Yoan Miche | Topology Preservation II | Antti, Yoan |
13.11. | Andrey Ermolov | Method Comparisons | |
20.11. | Emil Eirola and Lauri Oksanen | Conclusions | Emil, Lauri |

The reports should be submitted by email to both lecturers and the assistant no later than 3:45 PM on 21 December. They must be written in LaTeX using the ESTSP template (tex file, cls file). The maximum length is 12 pages. For further information on the project, see Amaury Lendasse's slides above.

You can download the latest version of John Lee's toolbox here. Get the LSSVM toolbox here. Emil Eirola has also provided a simple example of using the LSSVM toolbox.

Markus Ojala reported a bug and a possible workaround:

*There is a bug in John Lee's NLDR software "NLPm". One cannot load a
validation set with the command "iv" (interpolate validation set). It tries
to load only a binary file, and we did not find any way to save the
files in the appropriate binary format. This is needed at least for the
Chemometrics and Time Series datasets.*

*However, there is a workaround: Perform the mapping with the learning
set and save the mapping with the command "sm". Modify your test data to
have the same size as the learning data by copying, for example, the
last point the correct number of times. After that you can load the test
data and the previous mapping file, and effectively cheat the software.
The interpolation of the test set is then done with the command "il".
Unless the size of the test data is modified, the software does not allow
it to be loaded. After the projection, just remove the copies of the last point.*
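The padding step of this workaround can be sketched in plain Python as follows. This is only an illustration of the data manipulation. It does not read or write the NLPm file format, which is not described here, and the function names `pad_to_size` and `strip_padding` are hypothetical. It also assumes that the test set is no larger than the learning set, a case the report above does not cover.

```python
def pad_to_size(test_rows, target_size):
    """Pad the test set by repeating its last point (hypothetical helper).

    Returns the padded list and the number of copies that were appended,
    so that the copies can be removed again after the projection.
    """
    if not test_rows:
        raise ValueError("the test set is empty")
    if len(test_rows) > target_size:
        # Assumption: the workaround only applies when the test set is
        # not larger than the learning set.
        raise ValueError("the test set is larger than the learning set")
    n_copies = target_size - len(test_rows)
    padded = list(test_rows) + [list(test_rows[-1]) for _ in range(n_copies)]
    return padded, n_copies


def strip_padding(projected_rows, n_copies):
    """Drop the rows that correspond to the appended copies."""
    return projected_rows[: len(projected_rows) - n_copies]


# Illustrative data: a learning set with 5 points and a test set with 3.
learning_set = [[0.0, 0.1], [0.2, 0.3], [0.4, 0.5], [0.6, 0.7], [0.8, 0.9]]
test_set = [[1.0, 1.1], [1.2, 1.3], [1.4, 1.5]]

padded_test, n_copies = pad_to_size(test_set, len(learning_set))

# The padded test set would be saved and projected with the software here.
# For this sketch, the padded data stands in for the projected output.
projected = padded_test
result = strip_padding(projected, n_copies)
```

Returning the number of appended copies from `pad_to_size` avoids recomputing it later and makes the removal step independent of the learning set.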

Laszlo Kozma pointed out that the Windows version of the toolbox, available on John Lee's website, does not have this bug.

Page maintained by t616050@cis.hut.fi, last updated Tuesday, 19-Aug-2008 10:51:04 EEST