Bayes Networks and Graphical Models in Computational Molecular Biology and Bioinformatics

  Survey of Recent Research

  DNA Analysis, Genetics, Systems Biology, Gene Expression, Functional Annotation, Protein-Protein Interaction, Haplotype Inference, Pedigree Analysis, Data Integration


DISCLAIMER: There is no attempt to be complete, and we do not strongly endorse any of the publications below (including our own work). It is up to readers to form their own opinions and assess the strengths and limitations of the work described in these papers. It is typical that in early publications on any given topic the authors are somewhat optimistic and perhaps even naive. If you want to add your paper to this resource, please send email to kasif at bu . edu with the bibliographic information.
Additionally, one should not confuse graphical models with Bayesian statistics. While the two areas are related and one can do Bayesian statistics with graphical models, the relationship is not strict: one can do Bayesian inference with other models, and one can use "non-Bayesian" maximum likelihood approaches for learning or inference in graphical models. We make no attempt to include a full set of references on Bayesian statistics, but see BUGS as a starting point.
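To make the distinction concrete, here is a minimal sketch using invented toy data. It takes a two-node graphical model R -> T (a hypothetical regulator R and target T, both binary) and learns its conditional probability table in two ways: by maximum likelihood (plain counting, no prior), and as a Bayesian posterior mean under an assumed Beta(1, 1) prior. The graph is identical in both cases; only the estimation principle changes. The variable names, data, and prior are illustrative assumptions.

```python
from collections import Counter

# Hypothetical toy data: (regulator_on, target_on) observations for a two-node
# Bayes net R -> T. These values are invented purely for illustration.
data = [(1, 1), (1, 1), (1, 0), (0, 0), (0, 0), (0, 1), (1, 1), (0, 0)]

def estimate_cpt(samples, pseudocount=0.0):
    """Estimate P(T=1 | R=r) for r in {0, 1}.

    pseudocount=0.0 gives the maximum-likelihood estimate (pure counting);
    pseudocount>0 gives the posterior mean under a symmetric
    Beta(pseudocount, pseudocount) prior, i.e. a Bayesian estimate.
    """
    counts = Counter(samples)
    cpt = {}
    for r in (0, 1):
        n_on = counts[(r, 1)]
        n_total = counts[(r, 0)] + counts[(r, 1)]
        cpt[r] = (n_on + pseudocount) / (n_total + 2 * pseudocount)
    return cpt

print(estimate_cpt(data))                   # {0: 0.25, 1: 0.75}  (maximum likelihood)
print(estimate_cpt(data, pseudocount=1.0))  # roughly {0: 0.333, 1: 0.667}  (Bayesian)
```

With more data the two estimates converge; with few observations the prior pulls the Bayesian estimate toward 0.5.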

In 1992 Prof. Simon Kasif and two former students, Arthur Delcher and William (Bill) Xsu, at Johns Hopkins University described one of the earliest applications of Bayes networks (graphical models) to modern problems in bioinformatics. The application of graphical models to genetics has a longer history: in a completely different setting, S. Wright described so-called path diagrams (an ancestor of causal Markov networks) in a particular application to genetics in 1934 or even earlier. Our initial 1992 AAAI submission described the application of Bayes networks to modeling protein secondary structure. Our follow-up paper in ISMB 1993 describes the use of Bayes networks for perturbational analysis to simulate in-silico mutagenesis, and HMM modeling of secondary structure. Bayesian networks generalize HMMs (in fact, the relationship of our simple model to an HMM was first suggested to us by Bill's uncle Kai-Fu Lee, one of the world's experts on speech recognition). The number of applications of Bayes nets in computational biology is growing. These include integration of diverse biological databases, functional genomics, microarray analysis, comparative genomics, genetics, linkage analysis, systems biology, and other key computational problems in molecular biology.

This page provides a set of links to a few key papers. We make no claim about the relationship of our early work to these more recent applications of Bayes networks in bioinformatics: the methods are different, the models are different, and the biology is different. There is also no attempt to provide a complete and comprehensive survey, just a few links that can provide a possible starting point. As mentioned above, HMMs are a special case of dynamic Bayes nets. The number of applications of HMMs in computational biology is rather extensive, and the book by Durbin et al. is an excellent starting point. Our initial work is often not recognized as part of the HMM literature because many people are not aware of the connection between HMMs and Bayes networks. This connection is documented in a number of recent books and papers on Bayes nets, e.g.:
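The HMM/Bayes-network connection can be seen in a few lines of code. The sketch below uses a toy two-state model with made-up parameters (the state names "helix"/"coil" and symbols "hydrophobic"/"polar" are hypothetical). It writes the joint probability of a hidden path and an observed sequence as a product of the local conditional tables of the unrolled dynamic Bayes net, then checks that summing that joint over all hidden paths gives the same likelihood as the standard HMM forward algorithm. Under this view the forward recursion is simply inference on a chain-structured network, eliminating one hidden variable per time slice.

```python
import itertools

# Hypothetical 2-state toy HMM: hidden states are secondary-structure labels,
# observations are coarse residue classes. All numbers are invented for illustration.
states = ("helix", "coil")
init = {"helix": 0.5, "coil": 0.5}
trans = {"helix": {"helix": 0.9, "coil": 0.1},
         "coil": {"helix": 0.2, "coil": 0.8}}
emit = {"helix": {"hydrophobic": 0.6, "polar": 0.4},
        "coil": {"hydrophobic": 0.3, "polar": 0.7}}

def joint(path, obs):
    """Joint probability of one hidden path and the observations, written as the
    product of the local conditional tables of the unrolled dynamic Bayes net:
    P(h1) P(o1|h1) * prod_t P(h_t|h_{t-1}) P(o_t|h_t)."""
    p = init[path[0]] * emit[path[0]][obs[0]]
    for t in range(1, len(obs)):
        p *= trans[path[t - 1]][path[t]] * emit[path[t]][obs[t]]
    return p

def brute_force_likelihood(obs):
    """P(obs) by summing the DBN joint over every hidden path (exponential cost)."""
    return sum(joint(path, obs) for path in itertools.product(states, repeat=len(obs)))

def forward_likelihood(obs):
    """P(obs) by the HMM forward algorithm, i.e. eliminating hidden variables
    one time slice at a time (cost linear in sequence length)."""
    alpha = {s: init[s] * emit[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {s: sum(alpha[r] * trans[r][s] for r in states) * emit[s][o]
                 for s in states}
    return sum(alpha.values())

obs = ["hydrophobic", "hydrophobic", "polar", "polar", "hydrophobic"]
print(brute_force_likelihood(obs), forward_likelihood(obs))  # equal up to rounding
```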

Probabilistic Independence Networks for Hidden Markov Probability Models (1996)

An introduction and many references on graphical models can be found at Kevin Murphy's site at MIT.

Protein Modeling

Delcher, A., S. Kasif, H. Goldberg and W. Xsu, "Protein Secondary-Structure Modeling with Probabilistic Networks", International Conference on Intelligent Systems and Molecular Biology, pp. 109-117, 1993. One of the first applications of hidden Markov models, and the first application of probabilistic networks (Bayes nets), to modeling proteins.

Salzberg, S., D. Searls and S. Kasif, "Computational Methods in Molecular Biology", Elsevier Publ., 1998.

England: Ghahramani's Group, "Protein Secondary Structure Prediction with Segmental Models"

Singapore: "Protein Structure and Fold Prediction with Tree-Augmented Bayesian Classifier"

Israel: Yanover, Weiss, "Predicting Protein-Peptide Binding Affinity by Learning Peptide-Peptide Distance Functions"

Israel: Yanover, Weiss, "Approximate Inference and Protein Folding"

UC Irvine: Baldi Lab, "Secondary Structure Prediction"

Systems Biology, Functional Genomics, Gene Expression Analysis, Protein-Protein Interaction

Nir Friedman (Hebrew University), Tommi Jaakkola and David Gifford (MIT) wrote several of the key papers using Bayes networks for gene expression analysis and pathway modeling. While initially it appeared to many that this approach was only a minor generalization of Boolean networks for pathway modeling (a traditional approach used by chemical engineers to abstract metabolic and biochemical networks), over time a number of advantages of Bayes networks have emerged, including convenient modeling of uncertainty, hidden variables, automated learning, inference of regulatory modules, and others. Many people conflate probabilistic modeling with an explicit assumption that the biological system being analyzed is stochastic. Many biological systems are indeed noisy, so this may be a reasonable assumption. However, even if a system is completely deterministic, the observed data may be best explained by a probabilistic model, because a number of hidden variables affecting its behavior are unknown.
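A toy example may help with the last point. In the sketch below (the gene names, the AND rule, and the sampling schedule are all hypothetical), a target gene follows a completely deterministic rule, T = R AND H, where H is an unmeasured cofactor. An observer who records only R and T nevertheless sees T switch on in only about a third of the samples in which R is on, so a conditional probability P(T | R) is the natural summary of what can actually be observed.

```python
from collections import Counter

# Hypothetical deterministic "circuit": target T is on exactly when regulator R
# is on AND an unmeasured cofactor H is present. Nothing here is random.
def target(r, h):
    return int(r and h)

# Deterministic sample schedule (illustrative assumption): R is on in every
# second sample, H is present in every third sample.
samples = [(int(i % 2 == 0), int(i % 3 == 0)) for i in range(600)]

# An experimenter who can only measure R and T sees (r, t) pairs; H is hidden.
observed = Counter((r, target(r, h)) for r, h in samples)

def p_target_on_given(r):
    """Empirical P(T=1 | R=r) computed from the observable (r, t) pairs only."""
    on = observed[(r, 1)]
    return on / (on + observed[(r, 0)])

print(p_target_on_given(1))  # about 1/3: T looks "noisy" given R alone
print(p_target_on_given(0))  # 0.0: T is never on without R
```

Adding H to the model as a hidden variable would recover the deterministic rule; leaving it out forces a probabilistic description.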

Gene Expression (Microarray) Analysis, Networks, Pathways

Israel: Nir Friedman's Group, "Using Bayesian Networks to Analyze Expression Data"

Nir Friedman's Science survey, "Inferring Cellular Networks Using Probabilistic Graphical Models"

Nir Friedman's Group, "Class Discovery in Gene Expression Data"

Nir Friedman's Group, "Tissue Classification with Gene Expression Profiles"

Other publications from Nir Friedman's Group

MIT: Gifford / Jaakkola, "Pathway Modeling"

Stanford: Eran Segal, Daphne Koller et al., "Expression Analysis, Module Discovery, Pathway Analysis"

Harvard: Zak Kohane's Group, "Relevance Networks and Bayesian Gene Expression Analysis"

From CMU, "Constructing Bayesian Network Models of Gene Expression Networks from Microarray Data" (2000)

From Berkeley: Kevin Murphy and S. Mian, "Modeling gene expression data using dynamic Bayesian networks" (1999)

Duke: Alex Hartemink, "Pathways and Bayes Nets"

More from Duke: Dobra and West, "Graphical model-based gene clustering and metagene expression analysis"

More from Duke: Dobra and West, "Sparse graphical models for exploring gene expression data"

Japan: Miyano's Group, "Combining Bayes Network and Regression"

Japan: S. Miyano's Group, "Combining microarrays and biological knowledge for estimating gene networks via Bayesian networks"

Japan: Miyano's Group, "Estimating gene networks from gene expression data by combining Bayesian network model with promoter element detection"

From U. Penn, "Estimating genomic coexpression networks using first-order conditional independence"

From France, "Gene networks inference using dynamic Bayesian networks"

From Norway, "MGraph: graphical models for microarray data analysis"

From Japan, "Cluster Inference Methods and Graphical Models Evaluated on NCI60 Microarray Gene Expression Data"

Dirk Husmeier, "Dynamic Bayes Networks and Microarray Analysis"

England, "Two-Stage Bayesian Networks for Metabolic Network Prediction"

From Switzerland, "Sparse graphical Gaussian modeling of the isoprenoid gene network in Arabidopsis thaliana"

Biological Data Integration

Boston U: Pavlovic, V., A. Garg and S. Kasif, "A Bayesian Framework for Combining Gene Predictions", Computational Genomics, Nov. 2000.

MIT: Jaakkola's Lab, "Physical network models and multi-source data integration"

MIT: Jaakkola, Gifford, Hartemink, "Combining Location and Expression Data for Principled Discovery of Genetic Regulatory Networks"

Stanford/Princeton: Russ Altman, Olga Troyanskaya, et al., "Data Integration Using Bayes Networks" (local integration)

Yale: Gerstein's Lab et al., "Integration of genomic datasets to predict protein complexes in yeast"

Gerstein's Lab et al., "Bayes Networks Used to Integrate Data" (for protein-protein interaction)

USC: Chen's Lab et al., "Markov Random Field Approaches for Integration" (MRFs are undirected graphical models, the undirected counterpart of Bayes networks)

Boston University, "Whole-genome annotation by using evidence integration in functional-linkage networks" (Hopfield networks and Boltzmann machines have natural probabilistic analogs)

DIMACS Workshop on Integration, "Various integration approaches, including Bayes Networks"

Protein-Protein Interaction and Functional Annotation

Boston University: S. Letovsky, S. Kasif, "A Probabilistic Approach to Gene Function Assignment and Propagation in Protein Interaction Networks", Bioinformatics 2003.

USC: Chen's Lab et al., "Markov Random Field Approaches for Integration" (undirected graphical models)

MIT/Broad: Tarjei Mikkelsen et al., "Improving genome annotations using phylogenetic profile anomaly detection"
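As a rough illustration of how an undirected model can propagate function labels through an interaction network, the sketch below defines a tiny pairwise Markov random field over a hypothetical four-protein network. The joint distribution is a normalized product of edge potentials that reward interacting proteins for sharing a label, and the probability that an unannotated protein has the function is obtained by brute-force enumeration. The network, the coupling strength, and the potential form are assumptions chosen for illustration; this is not the specific model used by any of the papers listed here, and realistic networks need approximate inference rather than enumeration.

```python
import itertools
import math

# Hypothetical toy interaction network (protein names invented for illustration).
# Each protein gets a binary label: 1 = "has function F", 0 = "does not".
nodes = ["X", "A", "B", "C"]
edges = [("X", "A"), ("X", "B"), ("X", "C"), ("A", "B")]
COUPLING = 1.0  # assumed strength of "interacting proteins tend to share function"

def unnormalized(assign):
    """Product of pairwise potentials: exp(COUPLING) for each edge whose endpoints
    agree, 1.0 otherwise. The partition function Z is never needed below because
    it cancels in the conditional probability ratio."""
    return math.prod(math.exp(COUPLING) if assign[u] == assign[v] else 1.0
                     for u, v in edges)

def conditional(query, evidence):
    """P(query = 1 | evidence) by brute-force enumeration of the unobserved labels."""
    hidden = [n for n in nodes if n not in evidence]
    num = den = 0.0
    for values in itertools.product((0, 1), repeat=len(hidden)):
        assign = {**evidence, **dict(zip(hidden, values))}
        w = unnormalized(assign)
        den += w
        if assign[query] == 1:
            num += w
    return num / den

# X is unannotated; two of its three interaction partners are known to have function F.
print(conditional("X", {"A": 1, "B": 1, "C": 0}))  # about 0.73
```

Here the answer is e / (1 + e): X gains one "agreeing" edge by taking label 1 rather than 0, so its odds of having the function are exp(COUPLING).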

DNA Sequence Analysis

Cai, D., B. Kao, S. Kasif and A. Delcher, "Modeling Splice Sites Using Bayes Networks", Bioinformatics, 2000.

J. Zhang, V. Pavlovic, C. Cantor, S. Kasif, "Cross Species Gene Identification in Human and Mouse Sequences using Evidence Integration Frameworks", Genome Research, 2003.

Tom Kepler and friends in Genome Biology, "Identification and utilization of arbitrary correlations in models of recombination signal sequences"

Nir Friedman's Group, "Modeling Dependencies in Protein-DNA Binding Sites"

Canada: B. Frey, Q. Morris, Hughes et al., "GenRate: A generative model that finds and scores genes by jointly accounting for the expression and genomic arrangement of putative exons"

Genetics, Phylogeny, Linkage Analysis

Denmark: Graphical Models for Genetic Analyses

Becker, Dan Geiger & Alejandro Schaffer, "Automatic Selection of Loop Breakers for Genetic Linkage Analysis" (2000)

Israel: Dan Geiger's Lab at the Technion, "Superlink Program"

Oxford: Recombination Analysis Using Directed Graphical Models

Oxford: Phylogenetic evidence for recombination in dengue virus

Oxford: PAL Software

Max Planck: Likelihood Analysis of Phylogenetic Networks Using Directed Graphical Models

Berkeley: Multiple-sequence functional annotation and the generalized hidden Markov phylogeny

Berkeley: Bayesian Haplotype Inference via the Dirichlet Process

Utah: Graphical Modeling of the Joint Distribution of Alleles at Associated Loci

More soon.

