Date & Time:
November 26, 2018 3:00 pm – 4:30 pm
11/26/2018 03:00 PM 11/26/2018 04:30 PM America/Chicago Niv Dayan: Scaling Write-Intensive Key-Value Stores

In recent years, the log-structured merge-tree (LSM-tree) has become the mainstream core data structure used by key-value stores to ingest and persist data quickly. LSM-tree enables fast writes by buffering incoming data in memory and flushing it as independent sorted batches to storage whenever the buffer is full. To enable fast reads, LSM-tree sort-merges batches in storage to restrict the number that reads have to search, and it also uses in-memory Bloom filters to enable point reads to probabilistically skip accessing batches that do not contain a target entry. In this talk, we show that such LSM-tree based designs do not scale well: as the data size increases, both reads and writes take increasingly long to execute. We pinpoint the problem to suboptimal core design: the Bloom filters have been attached to LSM-tree as an afterthought and are therefore not optimized to minimize the overall probability of access to storage. Point reads are therefore unnecessarily expensive. To compensate, more merging than necessary has to take place thereby making writes unnecessarily expensive. As a part of the CrimsonDB project at the Harvard DasLab, we developed two insights to address this problem. Firstly, we show that the optimal approach for allocating Bloom filters given any amount of available memory resources is to assign significantly lower false positive rates to smaller data batches. This shaves a logarithmic factor from point read cost thereby allowing key-value stores to scale better in terms of reads. Secondly, having lower false positive rates for smaller batches allows to merge newer data more lazily without compromising point read cost. This allows eliminating most of the merge overheads of LSM-tree thereby improving the scalability of writes. We close by describing a higher-level lessons from our work: while data structure design up until today has focused on the cost balance between reads and writes, the inclusion of memory utilization as a direct additional optimization objective opens up new avenues for asymptotic improvements, which studying reads and writes in isolation could not have revealed.

Niv Dayan

Niv Dayan is a postdoc at the Data Systems Lab at Harvard since September 2015. Before that he was a PhD student at the IT University of Copenhagen. Niv works at the intersection of systems and theory for designing efficient data storage. His current work is towards identifying and mapping the fundamentally best scalability trade-offs that are possible to achieve for key-value stores. His past work includes data structure design for internal metadata management in SSDs. holds a Visiting Scientist position at the University of Tennessee Knoxville since 2011.

Related News & Events

UChicago CS News

UChicago Partners On New National Science Foundation Large-Scale Research Infrastructure For Education

Dec 10, 2024
UChicago CS News

Saturdays with CSIL — How Undergraduates are Transforming CS Education for Local High School Students

Dec 05, 2024
UChicago CS News

UChicago Researchers Receive Google Privacy Faculty Award for Research on AI Privacy Risks

Nov 22, 2024
UChicago CS News

The Climate App Designed to Tackle Chatham’s Flooding Crisis

Nov 21, 2024
In the News

Globus Receives Multiple Honors in 2024 HPCwire Readers’ and Editors’ Choice Awards

Nov 20, 2024
In the News

Argonne Team Breaks New Ground in AI-Driven Protein Design

Nov 15, 2024
UChicago CS News

DOE Awards Fred Chong and his National Research Team $7.5M to Develop a SMART Software Stack to Control Quantum Computer Noise

Nov 12, 2024
UChicago CS News

CS/LSSG Showcases Sustainability Research and Education

Nov 11, 2024
UChicago CS News

Ph.D. Student Jibang Wu Receives the Stigler Center Ph.D. Dissertation Award for His Work Modeling the Incentive Structures of Reward and Recommendation–Based Systems

Oct 24, 2024
UChicago CS News

Rebecca Willett Receives the SIAM Activity Group on Data Science Career Prize

Oct 23, 2024
UChicago CS News

UChicago CS Researchers Shine at UIST 2024 with Papers, Posters, Workshops and Demonstrations

Oct 10, 2024
UChicago CS News

UChicago Scientists Receive Grant to Expand Global Data Management Platform, Globus

Oct 03, 2024
arrow-down-largearrow-left-largearrow-right-large-greyarrow-right-large-yellowarrow-right-largearrow-right-smallbutton-arrowclosedocumentfacebookfacet-arrow-down-whitefacet-arrow-downPage 1CheckedCheckedicon-apple-t5backgroundLayer 1icon-google-t5icon-office365-t5icon-outlook-t5backgroundLayer 1icon-outlookcom-t5backgroundLayer 1icon-yahoo-t5backgroundLayer 1internal-yellowinternalintranetlinkedinlinkoutpauseplaypresentationsearch-bluesearchshareslider-arrow-nextslider-arrow-prevtwittervideoyoutube