Authors: Chang-Jiun Wu and Simon Kasif
Computational Genomics Laboratory
Bioinformatics Graduate Program, Boston University
Keywords: biclustering, two-way clustering, microarray, gene expression, data mining, Gibbs sampling, module
Update
GEMS version 1.5 is now available. The new version improves the masking function and can generate two vector files, one for the selection of samples and one for the selection of genes. A web-based server for GEMS is available at <http://genomics10.bu.edu/terrence/gems/>
The NAR paper describing the GEMS server is available HERE
If you use GEMS in your research, please acknowledge the following paper:
This project was presented at the Fourth Annual International Workshop on Bioinformatics and Systems Biology in Kyoto, Japan, on June 3, 2004.
Details of the algorithm can be found in Genome Informatics 15(1): 239-248, 2004.
Recent advances in high-throughput profiling of gene expression have catalyzed an explosive growth in functional genomics aimed at identifying genes that are differentially expressed in different tissues or cell types across a range of experimental conditions. Traditional clustering methods such as hierarchical clustering or principal component analysis are difficult to deploy effectively for several of these tasks, since genes rarely exhibit similar expression patterns across a wide range of conditions. Biclustering (also referred to as co-clustering, two-way clustering, projective clustering, or block clustering) of gene expression data is a promising methodology for identifying gene groups that show a coherent expression profile across a subset of conditions. Although biclustering was introduced in statistics in 1974, few robust and efficient solutions exist. Here we propose a simple but promising new approach for biclustering based on a Gibbs sampling paradigm. Our algorithm is implemented in the program GEMS (Gene Expression Module Sampler). GEMS has been tested on published leukemia data sets, as well as on synthetic data generated to evaluate the effect of noise on the performance of the algorithm. In our preliminary studies, GEMS proved to be a reliable, flexible, and computationally efficient approach for biclustering gene expression data. The genes in the resulting biclusters are candidates for being functionally related or co-regulated by common transcription factors. The samples selected in a bicluster can potentially suggest sub-classes of diseases and can serve as a diagnostic tool.
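To make the idea of "a coherent expression profile across a subset of conditions" concrete, here is a minimal C++ sketch. It is an illustration only and is not taken from gems15.cpp: the names `Matrix`, `rangeOverSamples`, and `isCoherentBicluster`, and the range-based coherence criterion (`maxRange`), are assumptions chosen for the example, not the scoring used by GEMS.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Expression matrix: one row per gene, one column per sample.
using Matrix = std::vector<std::vector<double>>;

// Range (max - min) of one gene's expression over the selected samples.
double rangeOverSamples(const std::vector<double>& geneRow,
                        const std::vector<std::size_t>& samples) {
    assert(!samples.empty());
    double lo = geneRow[samples.front()];
    double hi = lo;
    for (std::size_t s : samples) {
        lo = std::min(lo, geneRow[s]);
        hi = std::max(hi, geneRow[s]);
    }
    return hi - lo;
}

// Hypothetical coherence test (an assumption, not the GEMS criterion):
// every selected gene must vary by at most maxRange over the selected samples.
bool isCoherentBicluster(const Matrix& data,
                         const std::vector<std::size_t>& genes,
                         const std::vector<std::size_t>& samples,
                         double maxRange) {
    if (genes.empty() || samples.empty()) return false;
    for (std::size_t g : genes) {
        if (rangeOverSamples(data[g], samples) > maxRange) return false;
    }
    return true;
}
```

Under this toy criterion, a group of genes can pass the test on a subset of samples and fail it on the full sample set, which is the situation in which clustering over all conditions tends to miss the group.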
Download and install
Click the link to download the source code: gems15.cpp
After downloading, run the following commands:
1. Linux/Unix system: GNU GCC compiler, version 3 or above preferred.
$ g++ gems15.cpp -o gems15
2. Windows system: Borland C Compiler 5.5 preferred.
> bcc32 gems15.cpp -o gems15
Now the GEMS program should be ready to run.
Three parameters are required: name of file containing the expression data, size constraint alpha, and width constraint w.
A working report is written to STDOUT. For every bicluster extracted, users can choose to generate three files.
Command Line Options
GEMS accepts the following command-line options:
$ gems15 arrayfile.name -a=??? -w=??? [optional parameters]
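For readers who wrap GEMS in their own tooling, the following C++ sketch shows one way to split arguments of the form shown above into the file name, the `-a=` value, and the `-w=` value. It is a hypothetical helper, not code from gems15.cpp: the names `ParsedArgs` and `parseArgs` are invented here, and the values are kept as text because this page does not state their numeric types or valid ranges.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container for the three required parameters.
struct ParsedArgs {
    std::string arrayFile;            // first argument that is not an option
    std::string alpha;                // text following "-a="
    std::string width;                // text following "-w="
    std::vector<std::string> others;  // optional parameters, left uninterpreted
};

// Splits arguments such as {"arrayfile.name", "-a=...", "-w=..."}.
// Throws std::invalid_argument if any required parameter is missing.
ParsedArgs parseArgs(const std::vector<std::string>& args) {
    ParsedArgs out;
    for (const std::string& arg : args) {
        if (arg.rfind("-a=", 0) == 0) {          // starts with "-a="
            out.alpha = arg.substr(3);
        } else if (arg.rfind("-w=", 0) == 0) {   // starts with "-w="
            out.width = arg.substr(3);
        } else if (out.arrayFile.empty() && !arg.empty() && arg[0] != '-') {
            out.arrayFile = arg;
        } else {
            out.others.push_back(arg);
        }
    }
    if (out.arrayFile.empty() || out.alpha.empty() || out.width.empty()) {
        throw std::invalid_argument("expected: arrayfile -a=<alpha> -w=<w>");
    }
    return out;
}
```

In a wrapper's `main`, the helper could be called as `parseArgs(std::vector<std::string>(argv + 1, argv + argc))`.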
License
GEMS is open source software; you can redistribute it and/or modify it under the terms of
the GNU General Public License as published by the Free Software Foundation;
either version 2 of the License, or any later version.
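This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.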
Acknowledgements
This work is supported in part by NSF grants DBI-0239435 and ITR-048715 and NHGRI grant #1R33HG002850-01A1.
Contact
If you have any comments or questions, please contact Chang-Jiun Wu.
Last modified: Sun Feb 13 EST 2005