Vervet Genome Sequencing Project (NIH)
The Vervet Genome Sequencing Project has been assigned by the NHGRI to The Genome Institute, Washington University School of Medicine. The project is being led by Dr Wes Warren and Dr George Weinstock.
The VGSP will:
- generate a high-quality genome sequence for a reference animal from the VRC
- identify genome-wide SNPs for vervet subspecies
- detect genome-wide rearrangements
- initiate sequence-based transcriptome analysis resources.
The VGSP plan has been developed in coordination with the NCRR integrated vervet genomics application.
Progress on the vervet reference genome assembly
The vervet reference genome assembly is derived from an adult male monkey of the VRC pedigreed colony, now at Wake Forest. Genome sequencing has used a variety of technologies (Roche/454, Illumina, conventional Sanger sequencing) and strategies (paired ends of different insert sizes), and the sequencing experiments have benefited from improved protocols and longer read lengths as these were introduced over the course of the project. DNA from the same animal has been cloned as a commercially available BAC library (CH252).
In April 2012 we generated a Newbler assembly based primarily on Roche/454 long read, Roche/454 8kb paired end, and BAC end sequence data sets. As of October 2012 we have also generated an ALLPATHS assembly (5.0) based on Illumina paired end data sets and the BAC end sequences. In each case we iteratively re-aligned the BAC ends to identify additional scaffold joins and so extend the long-range contiguity of our genomic sequence scaffolds.
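The idea of using BAC end re-alignments to find scaffold joins can be illustrated with a minimal Python sketch. This is not the pipeline used for the vervet assemblies. It assumes each BAC clone has already been reduced to the pair of scaffolds its two end sequences align to, and it ignores end orientation and insert size, which a real join step would also check. The function name `propose_joins`, the `min_clones` threshold, and the clone and scaffold names are all hypothetical.

```python
from collections import Counter


def propose_joins(bac_end_hits, min_clones=2):
    """Suggest scaffold pairs that BAC end pairs link together.

    bac_end_hits maps a clone name to a tuple (scaffold_of_end_1,
    scaffold_of_end_2); an end that did not align is given as None.
    A pair of scaffolds is proposed as a join when at least
    min_clones clones have one end in each scaffold.
    """
    support = Counter()
    for scaf_a, scaf_b in bac_end_hits.values():
        if scaf_a is None or scaf_b is None:
            continue  # one end unplaced: no linking evidence
        if scaf_a == scaf_b:
            continue  # both ends in one scaffold: nothing new to join
        # Sort the pair so (s1, s2) and (s2, s1) count as the same link.
        support[tuple(sorted((scaf_a, scaf_b)))] += 1

    candidates = [pair for pair, n in support.items() if n >= min_clones]
    # Best-supported joins first, ties broken by scaffold name.
    return sorted(candidates, key=lambda pair: (-support[pair], pair))


if __name__ == "__main__":
    # Toy input with hypothetical clone and scaffold names.
    hits = {
        "clone_1": ("scaffold_1", "scaffold_2"),
        "clone_2": ("scaffold_2", "scaffold_1"),
        "clone_3": ("scaffold_1", "scaffold_1"),
        "clone_4": ("scaffold_3", "scaffold_4"),
    }
    print(propose_joins(hits))
```

Repeating this step after each round of joins, with the BAC ends re-aligned to the updated scaffolds, corresponds to the iterations described above.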
We are currently displaying Newbler 184.108.40.206 and ALLPATHS 5.0.1 on the genome browser.
Presently we are merging Newbler 220.127.116.11 and ALLPATHS 5.0.2 and applying additional computational gap-closing protocols to develop a high-quality assembly that will be submitted to NCBI for public access and to Ensembl for annotation. Our objective is to have the assembly finalized and submitted before February 28, 2013, with the reference assembly becoming fully accessible by June 30, 2013.
Characteristics of the two assemblies include:
| |Newbler 18.104.22.168 (April 2012)|ALLPATHS 5.0.2 (December 2012)|
|>1Mb scaffolds|380 (2.74Gb, 94.8% of assembly)|150 (2.67Gb, 98.5% of assembly)|
|BAC end concordance|91.8% (of 164,869 clones)|98.4% (of 161,660 clones)|
Figure 1. Graphical representation of ALLPATHS (red) and Newbler (blue) assemblies. Scaffolds > 1Mb are arranged by length with cumulative length plotted.
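The cumulative-length curve in Figure 1 and the ">1Mb scaffolds" row of the table can be derived from a plain list of scaffold lengths. The following Python sketch shows one way to do so. It assumes scaffolds are ordered from longest to shortest (the caption says only that they are arranged by length), and the function names and toy lengths are hypothetical, not taken from the project's own tooling.

```python
from itertools import accumulate


def cumulative_curve(scaffold_lengths, min_length=1_000_000):
    """Return (lengths, running totals) for scaffolds longer than min_length.

    Scaffolds are sorted from longest to shortest (an assumption), so
    the running totals give the y-values of a cumulative-length plot.
    """
    large = sorted((n for n in scaffold_lengths if n > min_length), reverse=True)
    return large, list(accumulate(large))


def large_scaffold_summary(scaffold_lengths, min_length=1_000_000):
    """Return (count, total span, fraction of assembly) for large scaffolds."""
    large, cumulative = cumulative_curve(scaffold_lengths, min_length)
    total = sum(scaffold_lengths)
    span = cumulative[-1] if cumulative else 0
    fraction = span / total if total else 0.0
    return len(large), span, fraction


if __name__ == "__main__":
    # Toy scaffold lengths in bases, not real assembly data.
    lengths = [5_000_000, 500_000, 3_000_000, 1_500_000]
    count, span, fraction = large_scaffold_summary(lengths)
    print(f"{count} scaffolds >1Mb, {span} bases, {fraction:.1%} of assembly")
```

Plotting the running totals against scaffold rank for each assembly would give curves of the kind the figure describes.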
MHC regional assembly
Separately from whole genome assemblies, we have sequenced and assembled a BAC path spanning the MHC region.