Task
Apply PCA, CCA, kernel CCA (kCCA) and/or generalized CCA (gCCA) to mine genes that behave similarly across different stress treatments in yeast. The underlying motivation is the assumption that yeast has a set of general stress genes, the "environmental stress response" (ESR) genes, which are always affected under stress. The task is to discover this set of genes by computationally fusing several gene expression data sets, each measured under some kind of stress. Compare how well the methods discover ESR genes, and discuss both the theoretical differences between the methods and their differences in practice. Preprocessed data are available for download as a gzipped archive, but you can also get the original data and the papers from the authors' web pages. Links to the two papers providing the expression data:
Implementation
At least R/Bioconductor and Matlab provide ready-made functions for these computations. Interpreting the results successfully, however, requires understanding the methods, how they relate to each other, and what they output. You can use the ready-made functions or write your own implementation.
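If you write your own implementation, the following is a minimal sketch of linear CCA in Python/NumPy (an assumption: the page itself only mentions R/Bioconductor and Matlab). It treats genes as samples and two stress data sets as the two views, with rows aligned by gene. Each view is whitened, and the SVD of the whitened cross-covariance gives the canonical directions; its singular values are the canonical correlations. The function names and the small ridge term `reg` are illustrative choices, not course-provided code.

```python
import numpy as np


def _inv_sqrt(C):
    # Inverse matrix square root of a symmetric positive-definite matrix
    w, V = np.linalg.eigh(C)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T


def cca(X, Y, reg=1e-3, n_components=1):
    """Linear CCA with genes as samples (hypothetical helper).

    X: (n_genes, p) and Y: (n_genes, q), rows aligned by gene.
    reg: small ridge term added to the within-view covariances; this is an
    assumption for numerical stability, not part of textbook CCA.
    Returns canonical correlations, projection vectors and gene scores.
    """
    n = X.shape[0]
    # Centre each condition (column) in both views
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    # Whiten both views; the SVD of the whitened cross-covariance yields
    # the canonical directions, and its singular values the correlations
    Wx, Wy = _inv_sqrt(Cxx), _inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(Wx @ Cxy @ Wy)
    a = Wx @ U[:, :n_components]
    b = Wy @ Vt.T[:, :n_components]
    # Gene scores in each view; genes scoring high in both are candidates
    return s[:n_components], a, b, Xc @ a, Yc @ b
```

Genes with large scores on the leading canonical components in both views are candidates for shared, ESR-like behaviour. Kernel CCA replaces the linear projections with kernel expansions and needs regularization to avoid trivially perfect correlations.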
Alternative data set
If you like, you can use a leukemia data set instead of the yeast stress case. The preprocessed data are available for download. There are five leukemia subclasses, and the biological replicates of each are indicated by the column names. The row names give the GeneID symbol of each gene expression profile across the conditions. The raw gene expression data were normalized with RMA, and the values used here are the logarithmic differences between the leukemia samples and measurements from normal patients. The task is then to apply PCA/CCA/kCCA/gCCA between the leukemia subtypes, using data from all or some of the five subtypes. You can try to validate your findings by comparing them to a leukemia gene list provided by the bioinformatics.org community. Before doing so, convert the gene symbols on that web page to the GeneID symbols used in the ALL data. Alternatively, you can look up the GeneID identifiers of the genes that stand out most in your analysis on the NCBI web server. Please ask if you encounter any problems.
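With five subtypes there are more than two views, which is where gCCA applies. Below is a hedged sketch of one common gCCA variant, the MAXVAR formulation, in Python/NumPy: it finds an orthonormal gene representation G that maximizes the sum of the projections of G onto each view's column space. Each view would be the replicate columns of one subtype, with rows aligned by gene. The function name and the ridge term `reg` are hypothetical, and other gCCA formulations exist.

```python
import numpy as np


def gcca_maxvar(views, reg=1e-3, n_components=1):
    """Generalized CCA, MAXVAR formulation (hypothetical helper).

    views: list of arrays (n_genes, p_i), rows aligned by gene.
    Returns the shared gene representation G (n_genes x n_components,
    orthonormal columns) and the corresponding eigenvalues of sum_i P_i,
    where P_i is the (ridge-regularized) projection onto view i.
    Each eigenvalue lies between 0 and the number of views.
    """
    whitened = []
    for X in views:
        Xc = X - X.mean(axis=0)
        C = Xc.T @ Xc + reg * np.eye(X.shape[1])
        w, V = np.linalg.eigh(C)
        # Xc @ C^{-1/2}: its Gram matrix is the (regularized) projection P_i
        whitened.append(Xc @ V @ np.diag(1.0 / np.sqrt(w)) @ V.T)
    Z = np.hstack(whitened)
    # Left singular vectors of Z are eigenvectors of sum_i P_i = Z Z^T,
    # so the n_genes x n_genes matrix never has to be formed explicitly
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :n_components], s[:n_components] ** 2
```

An eigenvalue close to the number of views indicates a direction shared by all subtypes. In the unregularized two-view case, the MAXVAR eigenvalues equal 1 plus the canonical correlations, which links this formulation back to ordinary CCA.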
You can get the original data through the original article by the authors. Link to the paper providing the expression data:
Course: CIS, T-61.6080 Special course in bioinformatics II: Data integration and fusion in bioinformatics
Page maintained by t616080@cis.hut.fi, last updated Tuesday, 24-Oct-2006 11:39:09 EEST