Mahmud Sami Aydin: Efficient and Accurate Sequence Analysis in Genomics and Transcriptomics Using Seeding Methods

Half-term seminar

Time: Wed 2026-05-13 13.30 - 14.30

Location: Albano, Cramer Room

Participating: Mahmud Sami Aydin

Abstract: The exponential growth of genomic and transcriptomic sequence data has made traditional comparison algorithms computationally expensive. To overcome these bottlenecks, bioinformatics relies on sequence sampling and subsampling methods known as "seeds". This seminar presents two novel computational tools that leverage seeding strategies to scale sequence analysis while preserving accuracy. First, StrobeClust, a reference-free isoform prediction method for long-read RNA sequencing (lrRNA-Seq), will be introduced. Using gap-tolerant "strobemer" seeds, StrobeClust takes reads that have already been clustered at the gene level and clusters them at the isoform level in reasonable time, while maintaining a highly favorable balance of precision and recall compared to state-of-the-art tools. Second, SyncmerDigest, a memory-efficient algorithm for the lossy compression of genomes, will be presented. Using syncmers, this method deterministically reduces entire sequences to tiny digest sequences. The reduction preserves maximal unique matches and relative distances better than other reduction methods, enabling large-scale genomic comparisons with a fraction of the traditional memory footprint.