Fengzhu Sun: New development in alignment-free genome and metagenome comparison

Time: Fri 2018-09-21 14.00 - 14.45

Location: Seminar Hall Kuskvillan, Institut Mittag-Leffler

Participating: Fengzhu Sun, University of Southern California

Next-generation sequencing (NGS) technologies have generated enormous amounts of shotgun read data, and assembling the reads is challenging, especially for metagenomes and for organisms without reference sequences. We develop novel alignment-free and assembly-free statistics for genome and metagenome comparison. The key idea is to subtract the background word counts, that is, the counts expected under a background sequence model, from the observed word counts when comparing genomes and metagenomes. Markov chains (MCs) are commonly used to model background molecular sequences, and we develop a new statistical method to estimate the order of an MC from short read data. The alignment-free sequence comparison statistics are used to study the relationships among species, to assign viruses to their hosts, to classify metagenomes and metatranscriptomes, and to find the source of white oak trees. In all applications, our novel methods yield results that are consistent with biological knowledge. Thus, our statistics provide powerful alternative approaches for genome and metagenome comparison based on NGS short reads.
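
To make the key idea concrete, here is a minimal Python sketch of background-centered word counts. It is an illustration under stated assumptions, not the statistics presented in the talk: the background is taken to be an order-0 Markov chain (independent letters with frequencies estimated from the sequence itself), the input is a single sequence rather than a set of short reads, and the comparison score is a plain cosine between centered count vectors. The function names `kmer_counts`, `centered_counts`, and `centered_cosine` are hypothetical.

```python
from collections import Counter
from itertools import product
from math import sqrt

ALPHABET = "ACGT"


def kmer_counts(seq, k):
    """Count overlapping words of length k, skipping words with non-ACGT letters."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if all(c in ALPHABET for c in word):
            counts[word] += 1
    return counts


def centered_counts(seq, k):
    """Observed k-mer counts minus their expectation under the background.

    Assumption: the background is an order-0 Markov chain, so the
    probability of a word is the product of its letter frequencies.
    """
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    letters = Counter(c for c in seq if c in ALPHABET)
    n = sum(letters.values())
    if n == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    freq = {c: letters[c] / n for c in ALPHABET}

    centered = {}
    for word in map("".join, product(ALPHABET, repeat=k)):
        p = 1.0
        for c in word:
            p *= freq[c]
        # Expected count = number of counted words * background probability.
        centered[word] = counts[word] - total * p
    return centered


def centered_cosine(seq_a, seq_b, k):
    """Cosine similarity between the centered count vectors of two sequences."""
    a = centered_counts(seq_a, k)
    b = centered_counts(seq_b, k)
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    print(centered_cosine("ACGTACGGTCA", "TTTTGGGGCCAA", 2))
```

A higher-order background would replace the product of letter frequencies with transition probabilities estimated from the data, which is where the choice of Markov chain order mentioned in the abstract comes in.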