Jasmijn Baaijens, PhD student in CWI’s Life Sciences and Health group, has developed a new computational tool that can reconstruct mutated versions of a virus. Viruses such as HIV, Zika and Ebola change their genomes (the complete set of genetic material, including all genes) very quickly during an infection. As a result, an organism infected by a virus hosts multiple different mutated versions of that virus (a so-called ‘viral quasispecies’). Because the virus keeps adapting to its environment, the infection is hard to cure. Determining which strains are present in an infection is the first step in choosing a therapy protocol.
‘De novo’
In her thesis, Baaijens presents several approaches for haplotype reconstruction that operate in a "de novo" fashion. This means that the newly developed methods do not require any prior information on the genome content. What makes the approach especially innovative is that it does not need a representative (reference) genome of the virus. Because viruses mutate quickly and are genetically highly diverse, high-quality reference genomes are often not available at the time of a new disease outbreak, and the lack of a suitable reference genome is a major hindrance for many viral quasispecies assembly approaches. The new tools form the first de novo approach to full-length viral quasispecies reconstruction and achieve an accuracy beyond that of any existing method. Accurate reconstruction of each of the individual viral haplotypes causing the infection could lead to improved treatment plans and the development of new medicines.
Next-generation sequencing and overlap graphs
Baaijens and her colleagues were able to revive so-called overlap-graph-based techniques, which had been deemed impossible in modern, “next-generation sequencing” based settings because of the huge amounts of data involved in the analysis. By following the overlap graph paradigm, they developed a method for assembling polyploid genomes. The idea to use overlap graphs was crucial, because only this approach makes it possible to distinguish technical errors from strain-specific sequence mutations. The method outperforms all relevant state-of-the-art approaches, often quite drastically, with respect to the quality of the reconstructed strains. Strains reconstructed by the new computational tool contain significantly fewer errors.
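To give a feel for what an overlap graph is, here is a minimal Python sketch. It is not the method from the thesis. It is a toy illustration under simplifying assumptions: reads are short strings, overlaps must match exactly, and every pair of reads is compared, which would not scale to real sequencing data. The function names, the `min_overlap` parameter and the example reads are all made up for this illustration.

```python
from itertools import permutations


def longest_overlap(a: str, b: str, min_overlap: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when no overlap of at least `min_overlap` characters exists.
    Assumption: exact matching only; sequencing errors are ignored here.
    """
    max_len = min(len(a), len(b))
    for length in range(max_len, min_overlap - 1, -1):
        if a[-length:] == b[:length]:
            return length
    return 0


def build_overlap_graph(reads, min_overlap=3):
    """Build a directed graph: an edge a -> b means a's end overlaps b's start.

    The graph is a dict mapping each read to {neighbour: overlap length}.
    Assumption: reads are distinct strings (duplicates would collapse).
    """
    graph = {read: {} for read in reads}
    for a, b in permutations(reads, 2):
        length = longest_overlap(a, b, min_overlap)
        if length:
            graph[a][b] = length
    return graph


# Hypothetical reads: the last two share the start "GTAC" but then differ,
# as reads from two different strains might.
reads = ["ACGTAC", "GTACGA", "GTACTT"]
graph = build_overlap_graph(reads)

for read, neighbours in graph.items():
    print(read, "->", neighbours)
```

In this toy example the first read overlaps both of the others, so the graph branches at that point. A branch like this is the kind of structure in which a difference between reads becomes visible; deciding whether it reflects a technical error or a real strain-specific mutation needs additional evidence that this sketch does not model.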
Jasmijn Baaijens and Alexander Schönhuth explain their research in this video (from 0:38).