Abstract (eng)
Pairwise alignment of protein sequences is a central task in bioinformatics that helps to identify biological relationships between proteins. Alignments can be computed with fast heuristics or with the exact Smith-Waterman algorithm, which is guaranteed to find the optimal local alignment between two sequences. Because the Smith-Waterman algorithm has quadratic time complexity, fast implementations are needed for large-scale protein similarity searches. Various parallelization approaches and adaptations for hardware accelerators (GPU, FPGA, Xeon Phi) have therefore been developed. In this work, the performance of some of the fastest CPU and GPU Smith-Waterman implementations is compared under various parameter settings. The main result is that the fastest CPU method, SWIPE, outperforms the fastest GPU method, CUDASW++ 3, on a full compute node with 24 CPU cores and 6 GPUs. In the SIMAP (Similarity Matrix of Proteins) project, similarities between the proteins of complete genomes are computed using the Smith-Waterman algorithm with compositional score adjustment. Because the SIMAP workflow has a high runtime, this work also explores ways of reducing it. Runtime profiling of the workflow identified an inefficient compilation of a script; optimizing the compilation process reduced the runtime to about a third of its original value.
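
For readers unfamiliar with the algorithm, the quadratic cost of Smith-Waterman can be illustrated with a minimal Python sketch. It is not the implementation used by SWIPE, CUDASW++ or SIMAP: it assumes a simple match/mismatch score and a linear gap penalty (hypothetical parameters `match`, `mismatch` and `gap`), whereas protein searches typically use a substitution matrix and affine gap penalties. It returns only the optimal local alignment score, not the alignment itself.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the optimal local alignment score of sequences a and b.

    Illustrative assumptions: simple match/mismatch scoring and a
    linear gap penalty. Every cell of the len(a) x len(b) dynamic
    programming matrix is visited once, hence the quadratic runtime.
    """
    prev = [0] * (len(b) + 1)  # previous row of the score matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                  # start a new local alignment
                prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                prev[j] + gap,      # gap in b
                curr[j - 1] + gap,  # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best
```

Only two rows of the matrix are kept, so memory grows linearly with the length of `b` while the runtime stays proportional to the product of the two sequence lengths. The fast methods compared in this work parallelize this same recurrence with SIMD instructions or GPU threads.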