To date, modeling the behavior of cellular networks under laboratory conditions has received more attention than modeling how ecological factors affect diversity in natural environments. As we move toward the ultimate goal of integrating laboratory studies of model organisms with field data, a key challenge will be identifying the geochemical and ecological factors that underlie community diversity, as well as the phylogenetic boundaries of natural ecological populations. Thus, computational frameworks that automatically learn models of sequence evolution in the context of metadata (e.g., site geochemistry or ecology) will need to be developed.
We have developed one such framework: AdaptML, a maximum-likelihood-based tool for studying both the sequence evolution and the ecological history of a set of gene sequences. To perform the latter task, AdaptML employs a hidden-Markov-model-like strategy that assigns gene sequences to unseen states we term "habitats." These habitats are inferred automatically and are designed to recapitulate the sequence partitioning observed in the wild. AdaptML was initially developed and tested in collaboration with Martin Polz's lab, using hsp60 gene sequences from 1027 marine Vibrio strains collected off the coast of Maine. We showed that AdaptML can be used to analyze this dataset and to help build models of Vibrio resource partitioning [1].
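To make the idea of hidden "habitat" states concrete, the following is a minimal sketch and not AdaptML's actual code. It assumes a toy tree whose leaves carry an observed sampling category, two hypothetical habitats that each emit those categories with fixed probabilities, and a single assumed per-branch probability of switching habitat. Every name and number in it (EMISSIONS, SWITCH, the category labels) is illustrative only. It uses the standard pruning recursion to compute the likelihood of the leaf observations and the most likely habitat at the root.

```python
# Hypothetical habitats: each is a distribution over observed sampling categories.
# These values are made up for illustration; they are not from the Vibrio dataset.
EMISSIONS = {
    "habitat_A": {"particle": 0.9, "free": 0.1},
    "habitat_B": {"particle": 0.2, "free": 0.8},
}
SWITCH = 0.1  # assumed probability of changing habitat along one branch


def transition(src, dst):
    """Probability of moving from habitat src to habitat dst along one branch."""
    if src == dst:
        return 1.0 - SWITCH
    return SWITCH / (len(EMISSIONS) - 1)


def partial_likelihoods(node):
    """Return {habitat: P(observations below node | node is in that habitat)}."""
    if isinstance(node, str):  # leaf: an observed sampling category
        return {h: EMISSIONS[h][node] for h in EMISSIONS}
    result = {h: 1.0 for h in EMISSIONS}
    for child in node:  # internal node: a tuple of child subtrees
        child_like = partial_likelihoods(child)
        for h in EMISSIONS:
            # Sum over the child's possible habitats, weighted by the transition.
            result[h] *= sum(transition(h, k) * child_like[k] for k in EMISSIONS)
    return result


def tree_likelihood(tree):
    """Likelihood of all leaf observations under a uniform prior at the root."""
    root = partial_likelihoods(tree)
    return sum(root.values()) / len(root)


def most_likely_root_habitat(tree):
    """Habitat that maximizes the partial likelihood at the root."""
    root = partial_likelihoods(tree)
    return max(root, key=root.get)


# Toy tree: two sister clades, written as nested tuples of leaf observations.
tree = (("particle", "particle"), ("free", "particle"))
print(most_likely_root_habitat(tree), tree_likelihood(tree))
```

In a full maximum-likelihood treatment, the emission probabilities, the switching rate, and the number of habitats would be estimated from the data rather than fixed in advance as they are here.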
References
1. Hunt, D. E.*, David, L. A.*, et al. Resource Partitioning and Sympatric Differentiation Among Closely Related Bacterioplankton. Science (New York, NY), in press (2008).
