Mahmud Sami Aydin: Efficient and Accurate Sequence Analysis in Genomics and Transcriptomics Using Seeding Methods
Half-term seminar
Time: Wed 2026-05-13, 13:30 - 14:30
Location: Albano, Cramer Room
Speaker: Mahmud Sami Aydin
Abstract: The exponential growth of genomic and transcriptomic sequence data has made traditional comparison algorithms computationally expensive. To overcome this bottleneck, bioinformatics relies on sequence sampling and subsampling methods known as "seeds". This seminar presents two novel computational tools that use seeding strategies to scale sequence analysis while preserving accuracy. The first, StrobeClust, is a reference-free isoform prediction method for long-read RNA sequencing (lrRNA-Seq). Using gap-tolerant "strobemer" seeds, StrobeClust takes reads that are already clustered at the gene level and clusters them at the isoform level in reasonable time, while maintaining a highly favorable balance of precision and recall compared to state-of-the-art tools. The second, SyncmerDigest, is a memory-efficient algorithm for the lossy compression of genomes. Using syncmers, it deterministically reduces entire sequences to much shorter digest sequences. This reduction preserves maximal unique matches and relative distances better than other reduction methods, enabling large-scale genomic comparisons with a fraction of the traditional memory footprint.
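
For readers unfamiliar with syncmers, the following is a minimal sketch of open-syncmer selection. It is not the SyncmerDigest algorithm, and it makes several assumptions for illustration: the function name `open_syncmers`, the parameters `k`, `s`, and `t`, and the use of plain lexicographic order to rank s-mers (practical tools typically rank by a hash value instead). A k-mer is selected when its smallest s-mer starts at offset `t`, so the decision depends only on the k-mer itself and not on its neighbors.

```python
def open_syncmers(seq, k, s, t=0):
    """Return (position, k-mer) pairs for every k-mer in seq whose
    smallest s-mer starts at offset t inside the k-mer.

    Illustrative assumption: s-mers are ranked lexicographically.
    """
    if not 0 < s <= k:
        raise ValueError("require 0 < s <= k")
    if not 0 <= t <= k - s:
        raise ValueError("require 0 <= t <= k - s")

    selected = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # All s-mers contained in this k-mer, in left-to-right order.
        smers = [kmer[j:j + s] for j in range(k - s + 1)]
        # list.index returns the leftmost occurrence of the minimum.
        if smers.index(min(smers)) == t:
            selected.append((i, kmer))
    return selected


if __name__ == "__main__":
    for pos, kmer in open_syncmers("ACGTACGT", k=4, s=2):
        print(pos, kmer)
```

Because selection is decided per k-mer, a k-mer that is chosen in one sequence is chosen wherever else it occurs, which is one reason syncmer-based reductions can be deterministic and comparable across genomes.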
