Nicolas Buchin: Improve the resizing of the Backpack Quotient Filter
Time: Wed 2025-05-28 13.00 - 14.00
Location: Room Cramer
Participating: Nicolas Buchin
Abstract
The Backpack Quotient Filter (BQF) [LAG+24] is a data structure designed for indexing k-mers, which are subsequences of length k from any genomic se- quence. Derived from the Counting Quotient Filter [PBJP17], this structure offers several key advantages: 1. Stores a proxy of the abundance of k-mers; 2. Limits the memory usage; 3. Presents high performance in terms of insertion and query speed. Compared, for instance, to Bloom Filters [Blo70], this struc- ture additionally offers the possibility to iterate over indexed k-mers and has negligible false positive calls. The BQF is dynamic in nature, allowing for the addition of new items at any time. Upon reaching saturation of the structure, its capacity can be doubled. However, in the initial version of the BQF, this operation was not optimized, offering opportunities for improvements. In this work, we will present a novel algorithm that optimizes the resizing operation of the BQF data structure. We capitalize on precomputed metadata. This makes it possible to efficiently determine the location of previously inserted values in the new resized structure. This drastically limits the number of steps required during this resizing operation, thus limiting the computation time, and also limiting the memory impact. The results show that this new approach enables one to gain a factor 4× to 8× in term of computation time. Notably, theses tests have shown that the proposed enhancements enable to gain a factor 10 in term of memory usage. It should be noted that this algorithm is not limited to the BQF, but can also be adapted to the Rank and Select Quotient Filter [PBJP17] and its variations, including the Counting Quotient Filter.
References
[Blo70] Burton H Bloom. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7):422–426, 1970.
[LAG+24] Victor Levallois, Francesco Andreace, Bertrand Le Gal, Yoann Dufresne, and Pierre Peterlongo. The backpack quotient filter: a dynamic and space-efficient data structure for querying k-mers with abundance. bioRxiv, 2024.
[PBJP17] Prashant Pandey, Michael A Bender, Rob Johnson, and Rob Patro. A general-purpose counting filter: Making every bit count. In Proceedings of the 2017 ACM international conference on Management of Data, pages 775–787, 2017.