Abstract (eng)
The advent of high-throughput sequencing has enabled us to study genomic variation at an unprecedented scale, providing insight into how genomes evolve, how genetic changes influence phenotypes, and the mechanisms behind countless diseases. Large-scale projects such as the 1000 Genomes Project, and similar projects for model organisms, have sequenced thousands of genomes and cataloged the genetic variation they found. Most of these projects use a reference-based analysis approach in which short, high-quality sequencing reads are aligned to a high-quality reference genome, such as the human reference genome. Differences between the sequenced genome and the reference genome (mostly single-nucleotide changes or other small variants) are then detected using specialised tools. Many analysis tools have been developed and optimised to efficiently analyse the immense amounts of data produced by these projects. However, these tools are often not applicable to experimental setups where no high-quality reference genome exists, where other, less accurate sequencing technologies are used, where more complex genetic variation is studied, or where other sources of noise cause higher mismatch rates between the reads and the reference. In this thesis, we address this issue by introducing short- and long-read mapping tools that handle the higher numbers of differences caused by sequencing error, evolutionary distance, or custom experimental designs, while offering the same ease of use and short runtimes as more specialised tools. Furthermore, we show how our analysis tools enable researchers to study a wide range of genetic variation in both model and non-model organisms.