Durbin Group

Evolutionary and Computational Genomics

Keywords

Genome sequences, evolution, cichlids, ancient DNA, computational genetics, bioinformatics, population genetics

Research interests

Our group works on genetic and evolutionary analysis of whole genome sequence data sets, using computational and mathematical approaches. Much of our research in the last five years has involved the discovery and analysis of human genetic variation using large data sets (1000 Genomes Project, UK10K, HGDP etc.), but more recently we have also begun working on the evolutionary genomics of other species, in particular cichlid fishes.

Human population history using modern and ancient genome sequences

We are all interested in where we come from. The patterns of genetic variation within and between individuals reflect population history over the last million years. Using the wealth of new data available, we combine population genetic theory and whole genome sequences to make inferences about historic population expansions and bottlenecks, separations and mergers, and other events that have led to the diversity in the world today. One of the most exciting developments of the last decade has been the ability to obtain ancient DNA sequences from bones that are thousands or tens of thousands of years old, connecting genetic ancestry to specific times and places. We have been involved both in the development of methods and in the analysis of important data sets in collaboration with others, and we are continuing to pursue studies in both these areas.

Adaptive speciation in Malawi cichlid fishes

In the last few years the group has started applying whole genome sequence-based approaches to the evolution of cichlid fish in Lake Malawi, where over 500 species have radiated within the last million years. Although morphologically, ecologically and behaviourally diverse, these species are genetically very close, with divergence of only a fraction of a percent, so that they share polymorphism and can be crossed experimentally. So far, with collaborators, we have collected around 2500 specimens, now deposited in the University Zoology museum. Initial analysis of genome sequences of 73 species of Malawi cichlids suggests serial radiation from a generalist ancestor, with subsequent adaptive introgression between groups, in particular for deep water vision and oxygen transport traits (https://www.biorxiv.org/content/early/2017/12/04/143859). More recently we have sequenced over 1100 further samples from over 200 species to support a range of studies concerning speciation, hybridisation and selection, with the potential to connect molecular function to organismal differentiation. There are opportunities in the group for both short term and long term projects in this remarkable and fascinating system.

Using long read sequencing to obtain reference de novo genome sequences across vertebrates

While the first wave of genome sequencing in the 1990s and 2000s gave us reference genomes for humans and the standard model organisms, and the second wave in the 2010s using short reads enabled sequence-based genetics in these systems, new single molecule long reads from Pacific Biosciences and Oxford Nanopore machines are now opening up our ability to sequence the rest of life. As part of the Vertebrate Genomes Project we are generating high-quality, highly contiguous reference genome sequences for representatives of vertebrate orders, with our group currently focusing mainly on fish, amphibians and some rodents. In parallel we are collaborating with others at the University and elsewhere to provide new high-quality reference genomes for “non-model” research species and systems, with particular focus within the group on notothenioid and anabantoid fish. There are opportunities in this area both on the technical aspects of genome assembly, in particular of highly heterozygous genomes, and in comparative evolutionary analysis. Large scale diversity sequencing will also open up opportunities in modern and ancient environmental DNA studies to help connect ecological and evolutionary genetics.

Efficient software and data structures for large scale genomic data

The amount of DNA sequence data has increased exponentially over more than 20 years, consistently doubling in under a year and outpacing Moore’s law for improvements in computing power. This has necessitated the repeated development of new, more efficient software and data structures, and our group has been at the forefront of this. We have worked on a variety of approaches that use the Burrows-Wheeler Transform and suffix array methods for compression, indexing and search (e.g. the bwa read mapping software), and recently we have been developing with collaborators the vg package [see ref 1 below] to allow reads to be mapped to “variation graphs”, which represent genetic variation as alternative paths through a network of sequences. There are important connections between the representation of variation and the genome assembly problem that is central to the previous topic. We continue to be interested in innovative technical ideas in this area, with opportunities for students, postdocs, visitors and collaborations.
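
As a rough illustration of the Burrows-Wheeler Transform and suffix array ideas mentioned above, the following minimal Python sketch builds a BWT from a naively sorted suffix array and then finds exact matches by backward search. It is a teaching example only and is not how bwa or vg is implemented: the function names (build_bwt, build_fm_index, find), the "$" sentinel and the toy sequence are all assumptions made for illustration, and real tools use far more compact, sampled data structures.

```python
def build_bwt(text):
    """Return (bwt, suffix_array) for text, using a naive suffix sort.

    A "$" sentinel is appended; it sorts before A, C, G and T.
    """
    s = text + "$"
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    # The BWT is the character preceding each sorted suffix
    # (s[-1] is the sentinel, which precedes the suffix starting at 0).
    bwt = "".join(s[i - 1] for i in sa)
    return bwt, sa


def build_fm_index(bwt):
    """Return (C, occ) tables used by backward search.

    C[c]      = number of characters in the text strictly smaller than c
    occ[c][i] = number of occurrences of c in bwt[:i]
    """
    alphabet = sorted(set(bwt))
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += bwt.count(c)
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return C, occ


def find(pattern, sa, C, occ):
    """Return sorted start positions of exact matches of pattern."""
    lo, hi = 0, len(sa)  # half-open interval of suffix array rows
    for c in reversed(pattern):  # extend the match one character leftwards
        if c not in C:
            return []
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))


# Toy usage on a made-up sequence (not real data).
bwt, sa = build_bwt("GATTACATTAG")
C, occ = build_fm_index(bwt)
hits = find("TTA", sa, C, occ)
```

In this sketch each pattern character narrows an interval of suffix array rows, so the search cost depends on the pattern length rather than on the length of the indexed sequence, which is one reason this family of indexes suits read mapping against large genomes.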
