Laboratory of Computer and Information Science / Neural Networks Research Centre CIS Lab Helsinki University of Technology

Courses in previous years: [ 2006 ][ 2007 ]

These pages are no longer updated. Please see the web pages of the Department of Information and Computer Science (ICS).

T-61.6070 Special course in bioinformatics I:
Modeling of proteomics data V P, (5-7 cr)

Lecturers: Prof. Sami Kaski, Department of Information and Computer Science, Helsinki University of Technology
  Assist. Prof. Sophia Kossida, Academy of Athens, Biomedical Research Foundation, Bioinformatics & Medical Informatics Team
Assistant: M.Sc. Ilkka Huopaniemi
Credits (ECTS): 5 or 7
Semester: Spring 2008 (period IV)
Sessions: Two intensive days, 13-14.3., in room A328. Thereafter on Wednesdays at 9-11 and Thursdays at 12-14 in room A328 (in the Computer Science and Engineering building).
Registration: TKK students: WebTopi; others: send mail to t616070 at
E-mail: t616070 at

Proteomics is the large-scale study of proteins, particularly their structures and functions. Proteins are vital parts of living organisms, as they are the main components of the physiological metabolic pathways of cells. The term "proteomics" was coined by analogy with genomics, the study of genes. The proteome of an organism is the set of proteins produced by it during its life. Proteomics is often considered the next major step in the study of biological systems after genomics. It is much more complicated than genomics, mostly because an organism's genome is rather constant, whereas a proteome differs from cell to cell and changes constantly through its biochemical interactions with the genome and the environment.

Proteomics research is growing rapidly. Nearly every major biotech and pharmaceutical firm has implemented a proteomics program. Functional proteomics, the study of protein function and the identification of protein interactions, plays a major role in drug discovery, biomarkers, molecular diagnostics, and antibody therapies. The number of human genes is currently estimated at just 20,000 to 25,000. These genes give rise to around 100,000 protein transcripts, and posttranslational modifications raise the number of distinct proteins to about 1,000,000. Proteomics research therefore needs a wide range of software tools to manage the resulting data: to interpret experiments, resolve protein structures, study protein-protein interactions, and finally place each protein in its functionally correct position in the rapidly expanding proteome network atlas.

This course is designed to introduce computational and statistical concepts and tools necessary to analyze proteomics data, mainly mass-spectrometry-based measurement data. The skills learned will also be applicable to other problems involving large data sets, such as gene expression data, metabolomics, and more generally in statistical data mining.

The course will be most useful for graduate-level (after bachelor) or doctoral students of bioinformatics or related fields. Mathematically oriented biology and medical students are very welcome as well. The modeling methodologies are very general and are also useful for other students of computer science, mathematics and physics.

Course format

The course consists of two parts: two intensive days (13-14.3.) and a seminar course.

In the seminar part, every participant gives one lecture/presentation. Passing the course with 7 credit points requires performing the following tasks:

  1. Attend actively
  2. Give a presentation of an article
  3. Devise a small exercise task and a model solution for it
  4. Solve the exercises given by the other participants
  5. Do a small project work on the topic of the presentation

Instructions for the individual tasks are given here, exercises here, and examples of the exercises of the 2007 course in old exercise problems. Leaving out the project work but passing the first four requirements results in 5 credit points. The course will be graded so that 60% of the grade is based on the presentation (including the exercise task) and 40% on the project work. Solving almost all (90%) of the exercise problems gives them a weight of 10% towards the best grade, and solving at least half of them is required for passing.
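The credit and grading rules above can be sketched as a small calculation. This is only an illustrative reading of the rules, not an official grading tool: the function names and the numeric grade scale are hypothetical, and it assumes that task 4 counts as completed once at least half of the exercises are solved. The 10% exercise weight is left out, because the text does not say how it is combined with the other two components.

```python
def course_credits(attended, presented, devised_exercise,
                   solved_fraction, project_done):
    """Return the ECTS credits (0, 5 or 7) earned under the stated rules."""
    # Assumption: task 4 is fulfilled when at least half of the
    # exercises given by the other participants have been solved.
    core_done = bool(attended and presented and devised_exercise
                     and solved_fraction >= 0.5)
    if not core_done:
        return 0
    # Tasks 1-4 give 5 credits; adding the project work gives 7.
    return 7 if project_done else 5


def base_grade(presentation, project):
    """Weighted grade: 60% presentation (incl. exercise task), 40% project."""
    return 0.6 * presentation + 0.4 * project


if __name__ == "__main__":
    print(course_credits(True, True, True, 0.6, project_done=False))
    print(course_credits(True, True, True, 0.95, project_done=True))
    print(base_grade(presentation=5, project=4))
```

For example, a participant who completes tasks 1-4 but skips the project work ends up in the 5-credit branch, while one who solves fewer than half of the exercises does not pass under this reading.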


A basic course on machine learning helps significantly in understanding the models, but sufficient knowledge of mathematics (probability, statistics, linear algebra, etc.) should also be enough. Basic knowledge of bioinformatics or computational biology is strongly advisable.

Course material

The topics and material for the presentations will be tailored for each participant at the beginning of the course.


Below is a preliminary schedule for the course. The topics and the material of the remaining presentations will be added when they have been fixed.

Time Lecturer Subject and material
13-14.3. Sophia Kossida, Sami Kaski
  • Introduction to Proteomics
  • Administrative Issues
Thu 27.3. Paula Quantitative proteomics
  • de Groot et al.: Quantitative proteomics and transcriptomics of anaerobic and aerobic yeast cultures reveals post-transcriptional regulation of key cellular processes (PDF)
  • slides (PDF)
Wed 2.4. Laszlo Phylogenetic trees
  • School of Computer Science, Tel-Aviv University, notes on phylogenetics (html)
  • Kim, Warnow, Tutorial on Phylogenetic Tree Estimation (PDF)
  • slides (PDF) (PPT2)
Thu 3.4. Tomi Protein-protein interactions
  • Shoemaker, Panchenko, Deciphering Protein-Protein Interactions. Part II. Computational Methods to Predict Protein and Domain Interaction Partners (PDF)
  • Burger, van Nimwegen: Accurate prediction of protein-protein interactions from sequence alignments using a Bayesian method (PDF)
  • Slides (PDF)
Wed 9.4. Jose Integration of mRNA expression data with proteomics
  • Gunsalus et al.: Predictive models of molecular machines involved in Caenorhabditis elegans early embryogenesis (PDF)
  • slides (PDF)
Thu 10.4. Jaakko, Gopal Coronary heart disease related proteomics (Jaakko)
  • Muredach et al.: HDL proteomics: pot of gold or Pandora's box? (PDF)
  • Eng et al.: An Approach to Correlate Tandem Mass Spectral Data of Peptides with Amino Acid Sequences in a Protein Database (PDF)
  • Link et al.: Direct analysis of protein complexes using mass spectrometry (PDF)
  • Vaisar et al.: Shotgun proteomics implicates protease inhibition and complement activation in the antiinflammatory properties of HDL (PDF)
  • slides (PDF)
Peak picking and preprocessing of raw protein spectra (Gopal)
  • Mantini et al.: Independent component analysis for the extraction of reliable protein signal profiles from MALDI-TOF mass spectra (PDF)
  • slides (PDF)
16-17.4. No lectures
Wed 23.4. Abhishek, Hitomi
  • Fischer et al.: Time-Series Alignment by Non-negative Multiple Generalized Canonical Correlation Analysis (PDF) (Abhishek)
  • Fischer: Time-series alignment by non-negative multiple generalized canonical correlation analysis (html) (Abhishek)
  • slides (PDF)
Kernel-based methods for identifying protein-protein interactions (Hitomi)
  • Ben-Hur and Noble: "Kernel methods for predicting protein-protein interactions" (PDF)
  • slides (PDF)
  • Additional reading
  • A. Ben-Hur and D. Brutlag "Remote homology detection: a motif based approach" (PDF)
  • S. Gomez, W. Noble and A. Rzhetsky "Learning to predict protein-protein interactions from protein sequences" (PDF)
  • C. Leslie, E. Eskin and W. Noble "The spectrum kernel: a string kernel for SVM protein classification" (PDF)
  • W. Noble "Support vector machine applications in computational biology" (PDF)
Thu 24.4. Taru Integrating proteomics and metabonomics data
  • Rantalainen et al.: Statistically Integrated Metabonomic-Proteomic Studies on a Human Prostate Cancer Xenograft Model in Mice (PDF)
  • slides (PDF)
Wed 30.4. Lauri, Laxman Gene ontology (Lauri)
  • Popescu et al.: Fuzzy Measures on the Gene Ontology for Gene Product Similarity (PDF)
  • slides (pdf)
Identification of proteins (Laxman)
  • McHugh, Arthur: Computational Methods for Protein Identification from Mass Spectrometry Data (PDF)
  • Wan: PepHMM: A Hidden Markov Model Based Scoring Function for Mass Spectrometry Database Search (PDF)
  • slides (ppt)

Page maintained by t616070 (at), last updated Tuesday, 19-Aug-2008 10:51:04 EEST