Bayes Networks and Graphical Models in Computational Molecular Biology and Bioinformatics
Survey of Recent Research
DNA Analysis, Genetics, Systems Biology, Gene Expression, Functional Annotation, Protein-Protein Interaction, Haplotype Inference, Pedigree Analysis, Data Integration
******************************** WARNING!!! ********************************
THE OBJECTS IN THIS "MIRROR" APPEAR CLOSER THAN THEY ACTUALLY ARE!
******************************** WARNING!!! ********************************
DISCLAIMER: There is no attempt to be complete, and we do not provide a strong endorsement of any of the publications below (including our own work). It is up to the readers to form their own opinions and assess the strengths and limitations of the work described in these papers. It is typical that in early publications on any given topic the authors are somewhat optimistic and perhaps even naive. If you want to add your paper to this resource, please send email to kasif at bu . edu with the bibliographical information.
In 1992 Prof. Simon Kasif and two former students, Arthur Delcher and William (Bill) Hsu, at Johns Hopkins University described one of the earliest applications of Bayes networks (graphical models) to modern problems in bioinformatics. The application of graphical models to genetics has some earlier history: in a completely different setting, S. Wright described the so-called path diagrams (an ancestor of Causal Markov Networks) in a particular application to genetics in 1934 or even earlier. Our initial 1992 AAAI submission described the application of Bayes networks to modeling protein secondary structure. Our follow-up paper in ISMB 1993 describes the use of Bayes networks for perturbational analysis, to simulate in-silico mutagenesis, and for HMM modeling of secondary structure. Bayesian networks generalize HMMs (in fact, the relationship of our simple model to an HMM was first suggested to us by Bill's uncle Kai-Fu Lee, one of the world experts on speech recognition). The number of applications of Bayes nets in computational biology is growing. These include integration of diverse biological databases, functional genomics, microarray analysis, comparative genomics, genetics, linkage analysis, systems biology and other key computational problems in molecular biology.
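To make the remark that Bayesian networks generalize HMMs concrete, here is a minimal sketch. It writes a toy HMM over secondary-structure states as a chain-shaped Bayes net, then checks that the standard forward recursion gives the same sequence likelihood as summing the Bayes-net joint over all hidden state paths. The three states, the two-letter residue alphabet (H for hydrophobic, P for polar) and all probabilities are hypothetical values chosen for illustration. They are not taken from any of the papers listed on this page.

```python
from itertools import product

# Hypothetical toy parameters, for illustration only.
STATES = ("helix", "sheet", "coil")
PRIOR = {"helix": 0.3, "sheet": 0.2, "coil": 0.5}
TRANS = {  # P(next state | current state)
    "helix": {"helix": 0.80, "sheet": 0.05, "coil": 0.15},
    "sheet": {"helix": 0.05, "sheet": 0.75, "coil": 0.20},
    "coil": {"helix": 0.20, "sheet": 0.20, "coil": 0.60},
}
EMIT = {  # P(residue class | state); H = hydrophobic, P = polar
    "helix": {"H": 0.6, "P": 0.4},
    "sheet": {"H": 0.7, "P": 0.3},
    "coil": {"H": 0.3, "P": 0.7},
}


def joint(states, obs):
    """Bayes-net factorization of the chain:
    P(s_1) P(o_1|s_1) * prod_t P(s_t|s_{t-1}) P(o_t|s_t)."""
    p = PRIOR[states[0]] * EMIT[states[0]][obs[0]]
    for t in range(1, len(obs)):
        p *= TRANS[states[t - 1]][states[t]] * EMIT[states[t]][obs[t]]
    return p


def likelihood_brute_force(obs):
    """Sum the joint over every hidden path (exponential in len(obs))."""
    return sum(joint(path, obs) for path in product(STATES, repeat=len(obs)))


def likelihood_forward(obs):
    """HMM forward recursion: the same sum, computed in linear time."""
    alpha = {s: PRIOR[s] * EMIT[s][obs[0]] for s in STATES}
    for o in obs[1:]:
        alpha = {
            s: EMIT[s][o] * sum(alpha[r] * TRANS[r][s] for r in STATES)
            for s in STATES
        }
    return sum(alpha.values())


if __name__ == "__main__":
    obs = "HHPPH"
    print(likelihood_forward(obs), likelihood_brute_force(obs))
```

The two numbers printed agree up to floating-point rounding. The forward recursion is just variable elimination along the chain, which is why general Bayes-net inference reduces to the usual HMM algorithms on this special graph shape.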
This page provides a set of links to a few key papers. We make no claim about the relationship of our early work to these more recent applications of Bayes networks in bioinformatics. The methods are different, the models are different and the biology is different. There is also no attempt to provide a complete and comprehensive survey, just a few links that can provide a possible starting point. As mentioned above, HMMs are a special case of Dynamic Bayes Nets. The literature on applications of HMMs in computational biology is extensive, and the book by Durbin et al. is an excellent starting point. Our initial work is often not recognized as part of the HMM literature because many people are not aware of the connection between HMMs and Bayes networks. This connection is documented in a number of recent books and papers on Bayes nets, e.g.:
Probabilistic Independence Networks for Hidden Markov Probability Models (1996)
An intro and many refs to graphical models can be found at Kevin Murphy's site at MIT
Delcher, A., S. Kasif, H. Goldberg and W. Hsu, "Protein Secondary-Structure Modeling with Probabilistic Networks", International Conference on Intelligent Systems and Molecular Biology, pp. 109--117, 1993. One of the first applications of hidden Markov models and the first application of probabilistic networks (Bayes nets) to modeling proteins.
Salzberg, S., D. Searls and S. Kasif, "Computational Methods in Molecular Biology", Elsevier Publ., 1998.
England: Ghahramani's Group, "Protein Secondary Prediction with Segmental Models"
Singapore, "Protein Structure and Fold Prediction with Tree Augmented Bayesian classifier"
Israel, Yanover, Weiss, "Predicting Protein-Peptide Binding Affinity by Learning Peptide-Peptide Distance Functions"
Israel, Yanover, Weiss, "Approximate Inference and Protein Folding"
UC Irvine, Baldi Lab, "Secondary Structure prediction"
Israel, Nir Friedman's Group, "Using Bayesian Networks to Analyze Expression Data"
Nir Friedman's Science Survey, "Inferring Cellular Networks Using Probabilistic Graphical Models"
Nir Friedman's Group, "Class Discovery in Gene Expression Data"
Nir Friedman's Group, "Tissue Classification with Gene Expression Profiles"
Other publications from Nir Friedman's Group
MIT: Gifford / Jaakkola, "Pathway modeling"
Stanford: Eran Segal, Daphne Koller et al., "Expression Analysis, Module Discovery, Pathway analysis"
Harvard: Zak Kohane's Group, "Relevance Networks and Bayesian Gene Expression Analysis"
From CMU, "Constructing Bayesian Network Models of Gene Expression Networks from Microarray Data" (2000)
From Berkeley, Kevin Murphy and S. Mian, "Modeling gene expression data using dynamic Bayesian networks" (1999)
Duke: Alex Hartemink, "Pathways and Bayes nets"
More from Duke: Dobra and West, "Graphical model-based gene clustering and metagene expression analysis"
More from Duke: Dobra and West, "Sparse graphical models for exploring gene expression data"
Japan: Miyano's Group, "Combining Bayes Network and Regression"
Japan: S. Miyano's Group, "Combining microarrays and biological knowledge for estimating gene networks via Bayesian networks"
Japan: Miyano's Group, "Estimating gene networks from gene expression data by combining Bayesian network model with promoter element detection"
From U. Penn, "Estimating genomic coexpression networks using first-order conditional independence"
From France, "Gene networks inference using dynamic Bayesian networks"
From Norway, "MGraph: graphical models for microarray data analysis"
From Japan, "Cluster Inference Methods and Graphical Models Evaluated on NCI60 Microarray Gene Expression Data"
Dirk Husmeier, "Dynamic Bayes Networks and Microarray Analysis"
England, "Two-Stage Bayesian Networks for Metabolic Network Prediction"
From Switzerland, "Sparse graphical Gaussian modeling of the isoprenoid gene network in Arabidopsis thaliana"
Boston U: Pavlovic, V., A. Garg and S. Kasif, "A Bayesian Framework for Combining Gene Predictions", Computational Genomics, Nov. 2000.
MIT: Jaakkola's Lab, "Physical network models and multi-source data integration"
MIT: Jaakkola, Gifford, Hartemink, "Combining Location and Expression Data for Principled Discovery of Genetic Regulatory Networks"
Stanford/Princeton: Russ Altman, Olga Troyanskaya, et al., "Data Integration using Bayes Networks" -- Local Integration
Yale: Gerstein's Lab et al., "Integration of genomic datasets to predict protein complexes in yeast"
Gerstein's Lab et al., "Bayes Networks Used to Integrate Data" -- for protein-protein interaction
USC: Chen's Lab et al., "Markov Random Field Approaches for Integration" -- MRFs are the undirected counterpart of Bayes networks
Boston University, "Whole-genome annotation by using evidence integration in functional-linkage networks" -- Hopfield networks and Boltzmann machines have natural probabilistic analogs
DIMACS WORKSHOP ON INTEGRATION, "Various Integration approaches including Bayes Networks"
Boston University: S. Letovsky, S. Kasif, "A Probabilistic Approach to Gene Function Assignment and Propagation in Protein Interaction Networks", Bioinformatics 2003.
USC: Chen's Lab et al., "Markov Random Field Approaches for Integration" -- Undirected Bayes networks
MIT Broad, Tarjei Mikkelsen et al., "Improving genome annotations using phylogenetic profile anomaly detection"
Cai, D., B. Kao, S. Kasif and A. Delcher, "Modeling Splice Sites Using Bayes Networks", Bioinformatics, 2000.
J. Zhang, V. Pavlovic, C. Cantor, S. Kasif, "Cross Species Gene Identification in Human and Mouse Sequences using Evidence Integration Frameworks", Genome Research, 2003.
Tom Kepler and Friends in Genome Biology, "Identification and utilization of arbitrary correlations in models of recombination signal sequences"
Nir Friedman's Group, "Modeling Dependencies in Protein-DNA Binding Sites"
Canada, B. Frey, M. Quaid, Hughes et al., "GenRate: A generative model that finds and scores genes by jointly accounting for the expression and genomic arrangement of putative exons"
Denmark: Graphical Models for Genetic Analyses
Becker, Dan Geiger & Alejandro Schaffer, "Automatic Selection of Loop Breakers for Genetic Linkage Analysis" (2000)
Israel: Dan Geiger's Lab at the Technion, "Superlink Program"
Oxford: Recombination Analysis Using Directed Graphical Models
Oxford: Phylogenetic evidence for recombination in dengue virus
Oxford: PAL Software
Max Planck: Likelihood Analysis of Phylogenetic Networks Using Directed Graphical Models
Berkeley: Multiple-sequence functional annotation and the generalized hidden Markov phylogeny
Berkeley: Bayesian Haplotype Inference via the Dirichlet Process
Utah: Graphical Modeling of the Joint Distribution of Alleles at Associated Loci