
Durbin Group

Evolutionary and Computational Genomics


Genome sequences, evolution, cichlids, ancient DNA, computational genetics, bioinformatics, population genetics

Research interests

Our group works on genetic and evolutionary analysis of whole genome sequence data sets, using computational and mathematical approaches. Much of our research in the last five years has involved the discovery and analysis of human genetic variation using large data sets (1000 Genomes Project, UK10K, HGDP etc.), but more recently we have also begun working on the evolutionary genomics of other species, in particular cichlid fishes.

Human population history using modern and ancient genome sequences

We are all interested in where we come from. The patterns of genetic variation within and between individuals reflect population history over the last million years. Using the wealth of new data available, we combine population genetic theory and whole genome sequences to make inferences about historic population expansions and bottlenecks, separations and mergers, and other events that have led to the diversity in the world today. One of the most exciting developments of the last decade has been the ability to obtain ancient DNA sequences from bones that are thousands or tens of thousands of years old, connecting genetic ancestry to specific times and places. We have been involved both in developing methods and, in collaboration with others, in analysing important data sets, and we are continuing to pursue studies in both areas.

Adaptive speciation in Malawi cichlid fishes

In the last few years the group has started applying whole genome sequence-based approaches to the evolution of cichlid fish in Lake Malawi, where over 500 species have radiated within the last million years. Although morphologically, ecologically and behaviourally diverse, these species are genetically very close, with divergence of only a fraction of a percent, so that they share polymorphism and can be crossed experimentally. So far, with collaborators, we have collected around 2500 specimens, now deposited in the University Zoology museum. Initial analysis of genome sequences of 73 species of Malawi cichlids suggests serial radiation from a generalist ancestor, with subsequent adaptive introgression between groups, in particular for deep water vision and oxygen transport traits. More recently we have sequenced over 1100 further samples from over 200 species to support a range of studies concerning speciation, hybridisation and selection, with the potential to connect molecular function to organismal differentiation. There are opportunities in the group for both short-term and long-term projects in this remarkable and fascinating system.

Using long read sequencing to obtain reference de novo genome sequences across vertebrates

While the first wave of genome sequencing in the 1990s and 2000s gave us reference genomes for humans and the standard model organisms, and the second wave in the 2010s using short reads enabled sequence-based genetics in these systems, new single-molecule long reads from Pacific Biosciences and Oxford Nanopore machines are now opening up our ability to sequence the rest of life. As part of the Vertebrate Genomes Project we are generating high-quality, highly contiguous reference genome sequences for representatives of vertebrate orders, with our group currently focusing mainly on fish, amphibians and some rodents. In parallel we are collaborating with others at the University and elsewhere to provide new high-quality reference genomes for “non-model” research species and systems, with particular focus within the group on notothenioid and anabantoid fish. There are opportunities in this area both on the technical aspects of genome assembly, in particular of highly heterozygous genomes, and in comparative evolutionary analysis. Large-scale diversity sequencing will also open up opportunities in modern and ancient environmental DNA studies to help connect ecological and evolutionary genetics.

Efficient software and data structures for large scale genomic data

The amount of DNA sequence data has increased exponentially over more than 20 years, consistently doubling in under a year and outpacing Moore’s law for improvements in computing power. This has necessitated the repeated development of new, more efficient software and data structures, and our group has been at the forefront of this. We have worked on a variety of approaches that use the Burrows-Wheeler transform and suffix array methods for compression, indexing and search (e.g. the bwa read mapping software), and recently we have been developing, with collaborators, the vg package, which allows reads to be mapped to “variation graphs” that represent genetic variation as alternative paths through a network of sequences. There are important connections between the representation of variation and the genome assembly problem that is critical to the previous topic. We continue to be interested in innovative technical ideas in this area, with opportunities for students, postdocs, visitors and collaborations.
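
As a rough illustration of how a Burrows-Wheeler transform built from a suffix array supports search, the sketch below indexes a short toy sequence and finds exact matches by backward search. It is a minimal teaching example, not the implementation used in bwa: the function names (build_index, find) are hypothetical, the suffix array is built by naive sorting, and the occurrence table is stored uncompressed, so it only suits small inputs.

```python
def build_index(text):
    """Build a toy BWT index (suffix array, C table, occurrence table)."""
    text += "$"  # sentinel; assumed absent from the input, sorts before A, C, G, T
    # Naive suffix array: sort suffix start positions lexicographically.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps around for suffix 0).
    bwt = "".join(text[i - 1] for i in sa)

    # c_table[ch] = number of characters in text strictly smaller than ch.
    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]

    # occ[ch][i] = number of occurrences of ch in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, b in enumerate(bwt):
        for ch in occ:
            occ[ch][i + 1] = occ[ch][i] + (1 if b == ch else 0)
    return sa, c_table, occ


def find(pattern, index):
    """Return sorted start positions of exact matches, via backward search."""
    sa, c_table, occ = index
    lo, hi = 0, len(sa)  # current suffix array interval [lo, hi)
    for ch in reversed(pattern):
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


index = build_index("GATTACAGATTACA")
print(find("ATTA", index))  # [1, 8]
```

Each pattern character narrows the suffix array interval with two table lookups, so the search time depends on the pattern length rather than on the size of the indexed sequence. Production tools additionally sample and compress these tables to keep memory use low.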

Selected publications

  1. Narasimhan VM, Rahbari R, Scally A, Wuster A, Mason D, Xue Y, Wright J, Trembath RC, Maher ER, van Heel DA, Auton A, Hurles ME, Tyler-Smith C, Durbin R. (2017) Estimating the human mutation rate from autozygous segments reveals population differences in human mutational processes. Nat Commun. 8:303
  2. McCarthy S, et al.; Haplotype Reference Consortium. (2016) A reference panel of 64,976 haplotypes for genotype imputation. Nat Genet. 48:1279-83
  3. Narasimhan VM, Hunt KA, Mason D, Baker CL, Karczewski KJ, Barnes MR, Barnett AH, Bates C, Bellary S, Bockett NA, Giorda K, Griffiths CJ, Hemingway H, Jia Z, Kelly MA, Khawaja HA, Lek M, McCarthy S, McEachan R, O'Donnell-Luria A, Paigen K, Parisinos CA, Sheridan E, Southgate L, Tee L, Thomas M, Xue Y, Schnall-Levin M, Petkov PM, Tyler-Smith C, Maher ER, Trembath RC, MacArthur DG, Wright J, Durbin R#, van Heel DA# (2016) Health and population effects of rare gene knockouts in adult humans with related parents. Science 352:474-7 #Co-corresponding authors.
  4. Schiffels S, Haak W, Paajanen P, Llamas B, Popescu E, Loe L, Clarke R, Lyons A, Mortimer R, Sayer D, Tyler-Smith C, Cooper A, Durbin R (2016) Iron Age and Anglo-Saxon genomes from East England reveal British migration history. Nat Commun. 7:10408
  5. Malinsky M, Challis RJ, Tyers AM, Schiffels S, Terai Y, Ngatunga BP, Miska EA, Durbin R, Genner MJ, Turner GF. (2015) Genomic islands of speciation separate cichlid ecomorphs in an East African crater lake. Science 350:1493-8
  6. 1000 Genomes Project Consortium: Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, Marchini JL, McCarthy S, McVean GA, Abecasis GR. (2015) A global reference for human genetic variation. Nature 526:68-74
  7. Simpson JT, Durbin R (2012) Efficient de novo assembly of large genomes using compressed data structures. Genome Res. 22:549-556
  8. Li H, Durbin R (2011) Inference of human population history from individual whole-genome sequences. Nature 475:493-6

>> For full lists of publications, see PubMed and/or Google Scholar.


Updated 5 March 2018






Contact details

Group leader: Professor Richard Durbin

Department of Genetics,
University of Cambridge,
Downing Street,
Cambridge CB2 3EH,
United Kingdom


Tel.: +44 (0)1223 760252

