Developing novel bioinformatics tools and pipelines for working with reference genomes and large sets of resequenced genomes.
Abstract
Both reference genomes assembled for individual species and large, publicly maintained sets of resequenced genomes are of immense value to researchers. The former represent important milestones for research involving the species of interest and serve as ostensibly static points of reference for other data, while the latter serve as catalogues of genetic variation, enabling researchers to place their own data in a wider context. However, maintaining sets of resequenced genomes and ensuring their integrity as they are updated to match new releases of their reference genome pose computational challenges, as does manipulating and comparing such large sets of genomes in general. This work reports on the detection and correction of significant errors which were introduced into resequenced tomato data in the course of updating them to a new version. It also introduces Tersect, a low-level utility optimized for manipulating and comparing large sets of resequenced genomic data, and Tersect Browser, a Web application which combines the high performance of Tersect with a higher-level indexing and precomputation scheme to allow interactive comparison of large sets of resequenced genomes. Together, these give biologists a tool capable of generating visualisations of genetic distance and phylogenetic relationships based on whole-genome sequence data from hundreds of genomes in seconds rather than hours.