Although sequencing technology and price performance per base pair sequenced continue to advance at an impressive rate, Finished whole genome sequencing projects are still costly and lengthy endeavors. Next Gen sequencing technology (and even next-next gen technology) isn’t addressing some of the common issues faced when creating a “Finished” quality genome, namely Contig Placement, Gap Closure and Validation. Addressing these issues takes several months and a substantial share of the budget in a sequencing project.
Consider the current workflow for generating a Finished whole genome in the figure below.
As you can see, generating the initial sequence data is no longer the bottleneck. Small genomes can be sequenced using shotgun methods in a couple of days. After the initial assembly the hard part starts: closing gaps between your contigs, navigating regions with a high number of repeats, resequencing for validation, etc. These tasks can represent over 50% of the length of a sequencing project and over 50% of the cost!
I wanted to see if other researchers, especially those in resource-constrained labs, had found novel and/or more cost-effective ways of dealing with these challenges. I came across an interesting paper titled “Finishing genomes with limited resources: lessons from an ensemble of microbial genomes” that was published last year in BMC Genomics [1]. It discusses how using Whole Genome Mapping technology, also called Optical Mapping, can significantly reduce the length of sequencing projects. Before we get into what the paper presents, let’s learn more about Whole Genome (Optical) Mapping.
Whole Genome (Optical) Mapping is a de novo process that generates whole-genome, ordered restriction maps with no requirement for previous sequence information, and it provides a comprehensive view of genomic architecture. An Optical Map or Whole Genome Map (WGM) is displayed in the unique MapCode™ pattern below, where the vertical lines indicate the locations of restriction sites and the distance between the lines represents the fragment size.
The WGM acts as a scaffold for your sequencing project. How? The contigs generated from your sequence assembly are converted to Optical Maps in silico and then are aligned and assembled to the de novo WGM. The WGM acts as an independent validation tool for contig placement and length of repeat regions while also helping to easily identify gaps in your assembly. By taking unordered sequence contigs and aligning them to an ordered WGM you quickly orient the contigs. When aligned, you can then identify any possible misassemblies that may have occurred in the initial assembly portion of your project.
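To make the in silico conversion step a little more concrete, here is a minimal Python sketch of turning a sequence contig into an ordered list of restriction fragment sizes. The function name in_silico_digest, the toy sequence, and the choice of BamHI (recognition site GGATCC, cutting after the first G) are illustrative assumptions on my part, not the method used by any particular mapping software. The sketch also assumes a palindromic recognition site, so only one strand needs to be searched.

```python
def in_silico_digest(sequence, site, cut_offset=0):
    """Return ordered fragment lengths (in bp) from cutting `sequence`
    at every occurrence of the recognition `site`.

    `cut_offset` is the position of the cut within the site
    (hypothetical parameter; 1 for an enzyme that cuts G^GATCC).
    """
    sequence = sequence.upper()
    site = site.upper()

    # Collect every cut position along the contig.
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)

    # Fragment sizes are the distances between consecutive cut positions,
    # including the two contig ends; zero-length fragments are dropped.
    boundaries = [0] + cuts + [len(sequence)]
    return [b - a for a, b in zip(boundaries, boundaries[1:]) if b > a]


# Toy contig with two BamHI sites (assumed example data).
toy_contig = "AAGGATCCTTTTGGATCCA"
print(in_silico_digest(toy_contig, "GGATCC", cut_offset=1))  # [3, 10, 6]
```

Real optical maps work with fragments measured in kilobases and carry some sizing error, so an actual comparison against the WGM has to tolerate small differences rather than expect exact matches.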
You might be wondering how the scaffold concept as it applies to Whole Genome Mapping differs from scaffolds obtained with mate-pairs. To quote Nagarajan et al. in the paper referenced above: “It should be noted that unlike scaffolds obtained with mate-pairs, the scaffolds here are genome-wide and one per genome and therefore well suited for finishing efforts.” (p3) Additionally: “While paired-end reads can be invaluable to scaffold contigs, they provide local order information [only] and using them to recreate a genome wide ordering of contigs is computationally challenging.” (p7) Finally: “In addition, for time-critical applications in a biodefense or clinical setting, the time to construct paired-end libraries can be a limiting factor. In such settings, Optical Restriction Mapping, a form of ordered restriction maps (see Figure 5), can be a promising alternative as it can quickly provide genome wide restriction site information that can be used to order and orient contigs.” (p7)
We are starting to get a picture of how a single WGM can save time and reduce the need for computationally intensive bioinformatics steps, thereby saving money. Let’s look in more detail at how these time savings are gained.
Contig Placement and Validation
With shotgun sequencing, genomic rearrangements such as inversions can be missed due to incorrect reconstruction of repeats. A WGM can help you validate your whole genome and identify any possible inversions, insertions, translocations and deletions that sequencing may not have identified. In the example below, the contiguous map in the middle was generated de novo using Whole Genome (Optical) Mapping technology. The contig maps were generated in silico from the sequence contigs. Notice the misassemblies, for example in Contig980. You can see an example of an inversion in Contig1253. You can also see examples of insertions, deletions and run-of-the-mill gaps that will have to be spanned in resequencing efforts.
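As a rough illustration of how a contig's in silico map can be placed and oriented against the WGM, here is a deliberately simplified Python sketch. It slides the contig's fragment pattern along the WGM in both orientations and keeps the placement with the smallest average relative size difference. The function names, the scoring rule and the fragment sizes are all assumptions made for illustration. Real map aligners are more sophisticated; for example, they have to cope with missing or extra cut sites and with partial fragments at contig ends.

```python
def placement_score(map_window, contig_fragments):
    """Mean relative size difference between paired fragments (0 = identical)."""
    diffs = [abs(m - c) / max(m, c) for m, c in zip(map_window, contig_fragments)]
    return sum(diffs) / len(diffs)


def best_placement(wgm_fragments, contig_fragments):
    """Try every offset in both orientations.

    Returns (score, offset, orientation) for the best-scoring placement.
    Assumes positive fragment sizes and that only the contig's interior
    (complete) fragments are passed in.
    """
    k = len(contig_fragments)
    if k == 0 or k > len(wgm_fragments):
        raise ValueError("contig map must be non-empty and no longer than the WGM")

    candidates = []
    for orientation, frags in (("forward", contig_fragments),
                               ("reverse", contig_fragments[::-1])):
        for offset in range(len(wgm_fragments) - k + 1):
            window = wgm_fragments[offset:offset + k]
            candidates.append((placement_score(window, frags), offset, orientation))
    return min(candidates)


# Made-up fragment sizes in kb; the contig matches WGM fragments 2..4 reversed.
wgm = [12.0, 30.5, 8.2, 44.1, 19.7, 25.3, 9.9]
contig = [19.5, 44.0, 8.0]
score, offset, orientation = best_placement(wgm, contig)
print(offset, orientation)  # 2 reverse
```

A contig that only fits in the reverse orientation simply needs to be flipped. A contig whose left part fits forward and whose right part fits only in reverse is the kind of pattern that hints at an inversion or a misassembly worth a closer look.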
Another example Nagarajan et al. describe is using WGMs to reduce the number of PCR experiments needed. “Working with the original assembly (59 large contigs) could have necessitated on the order of 59² ≈ 3000 PCR experiments.” (p4) That’s a lot of PCR kits and a lot of time. Using WGMs as scaffolds, they were able to finish the genome using only 43 PCR experiments and 26 sequencing reactions to close 33 of the gaps. “From a finishing perspective, these (Optical Mapping) scaffolds are particularly useful, as for a set of n contigs, they help reduce the number of PCR experiments needed from roughly n² to n.” (p7)
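The jump from roughly n² to n is easy to see with a quick count. The Python sketch below is my own back-of-the-envelope illustration, not the paper's calculation: it counts contig pairs that might flank a gap when the contig order is unknown, versus only neighboring pairs once a scaffold fixes the order. It ignores contig orientation and assumes a single circular chromosome, both of which are simplifying assumptions.

```python
from itertools import combinations


def candidate_pairs_unordered(contigs):
    """Every pair of contigs could be neighbors when the order is unknown."""
    return list(combinations(contigs, 2))


def candidate_pairs_ordered(contigs, circular=True):
    """With a scaffold, only adjacent contigs need a gap-spanning PCR."""
    pairs = list(zip(contigs, contigs[1:]))
    if circular and len(contigs) > 1:
        pairs.append((contigs[-1], contigs[0]))  # close the circle
    return pairs


# 59 hypothetical contig names, matching the contig count quoted above.
contigs = [f"contig{i}" for i in range(1, 60)]
print(len(candidate_pairs_unordered(contigs)))  # 1711
print(len(candidate_pairs_ordered(contigs)))    # 59
```

The unordered count grows with the square of the number of contigs (n(n-1)/2 pairs here, and more once both ends of each contig are considered), which is in line with the order-of-magnitude figure quoted from the paper, while the ordered count grows only linearly.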
Let’s go back to our original figure describing the steps and average time to complete a sequencing project, this time comparing current methods to a workflow that uses a WGM.
As you can see, using a WGM as a scaffold reduces the time significantly by eliminating or greatly reducing the dependence on, and cost of, generating paired-end libraries, not to mention the bioinformatics muscle that approach requires. Plus, having an accurate understanding of the gaps that need to be spanned in resequencing efforts reduces the number of PCR reactions, thereby reducing the time and cost of gap closure. Finally, having one whole, ordered, contiguous scaffold makes validation inherently easier.
Currently there are many limitations when doing whole-genome sequencing projects. These issues include, but are not limited to, fragmented output of genomes, misassemblies of repeat regions, and limited resources to run these experiments. I’m confident that sequencing technology will someday advance to address these issues. In the meantime, Whole Genome (Optical) Mapping acts as a complementary technology that can significantly reduce the time and cost associated with the issues discussed in this article.
1. Finishing genomes with limited resources: lessons from an ensemble of microbial genomes. Nagarajan et al. BMC Genomics 2010, 11:242.