Our research uses machine learning techniques to improve the performance of flash-based storage systems. The improvement is in two major directions: the reliability and the response time of flash-based storage devices. Flash drives are widely used as persistent storage in mobile devices, laptops, and cloud servers, so improving storage performance affects the overall computing system and the experience of millions of end users.
Reducing Workload Interference in Flash
Achieving high performance in virtualized data centers requires both deploying high-throughput storage clusters, i.e. clusters based on Solid State Disks (SSDs), and optimally consolidating the workloads across storage nodes. The workload scheduling mechanisms currently used in production lack the intelligence to optimally allocate block storage volumes based on the performance of SSDs. In this work, we develop an autonomous performance modeling and load balancing system designed for SSD-based cloud storage. Our system takes into account the characteristics of the SSD storage units and constructs hardware-dependent workload consolidation models. It can therefore predict the latency caused by workload interference and the average latency of concurrent workloads. Furthermore, our system leverages an I/O load balancing algorithm to dynamically balance the volumes across the cluster.
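As a rough illustration of latency-aware volume placement, the sketch below assigns each volume to the node with the lowest predicted latency. It is not the system's actual algorithm: the linear interference model, the `Node` class, its fields, and `place_volumes` are all hypothetical stand-ins for the learned, hardware-dependent models described above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    base_latency_us: float   # hypothetical latency of the idle SSD node
    interference_us: float   # hypothetical added latency per unit of load
    volumes: list = field(default_factory=list)

    def load(self) -> float:
        # Total load (e.g. IOPS) of the volumes already placed on this node.
        return sum(iops for _, iops in self.volumes)

    def predicted_latency(self, extra_iops: float = 0.0) -> float:
        # Toy linear model (an assumption); a real model would be learned
        # per hardware type from measured workload interference.
        return self.base_latency_us + self.interference_us * (self.load() + extra_iops)


def place_volumes(nodes, volumes):
    """Greedily place each (volume_id, iops) on the node with the lowest
    predicted latency after adding that volume, largest volumes first."""
    for vol_id, iops in sorted(volumes, key=lambda v: v[1], reverse=True):
        best = min(nodes, key=lambda n: n.predicted_latency(iops))
        best.volumes.append((vol_id, iops))
    return {n.name: [v for v, _ in n.volumes] for n in nodes}


nodes = [Node("a", 100.0, 1.0), Node("b", 100.0, 2.0)]
placement = place_volumes(nodes, [("v1", 50), ("v2", 30), ("v3", 10)])
print(placement)
```

In this toy setup node "b" suffers more from interference, so it ends up with fewer volumes than node "a".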
Neural Network-based Prefetching
The other project we are working on, titled “Smart Prefetching using Neural networks”, improves response time through prefetching driven by deep learning techniques. Prefetching is a technique for speeding up fetch operations by beginning a fetch whose result is expected to be needed soon. We use deep neural networks to learn the spatial I/O access patterns in flash devices and train them to predict future accesses. Our novel approach to prefetching is a significant improvement over traditional algorithm-based prefetchers and helps reduce the response time of the devices. The paper discussing our findings was accepted at the “European Conference on Machine Learning” (August, 2020).
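To make the idea of learning spatial access patterns concrete, here is a minimal sketch that predicts the next address from the history of address deltas. It deliberately swaps the neural network for a simple frequency table, so it only illustrates the prediction interface, not the project's model; `DeltaPrefetcher` and its methods are hypothetical names.

```python
from collections import Counter, defaultdict


class DeltaPrefetcher:
    """Predicts the next LBA from the most frequent successor of the last delta.

    A frequency table stands in for the learned model (an assumption).
    """

    def __init__(self):
        self.successors = defaultdict(Counter)  # delta -> counts of the next delta
        self.last_lba = None
        self.last_delta = None

    def access(self, lba):
        """Record an access and return the LBA to prefetch, or None."""
        if self.last_lba is not None:
            delta = lba - self.last_lba
            if self.last_delta is not None:
                # Learn which delta tends to follow the previous delta.
                self.successors[self.last_delta][delta] += 1
            self.last_delta = delta
        self.last_lba = lba
        if self.last_delta is None or not self.successors[self.last_delta]:
            return None  # not enough history to predict yet
        next_delta, _ = self.successors[self.last_delta].most_common(1)[0]
        return lba + next_delta


prefetcher = DeltaPrefetcher()
for lba in [0, 8, 16, 24]:
    print(lba, "->", prefetcher.access(lba))
```

On a strided trace like this one, the predictor starts issuing prefetches once it has seen the stride repeat.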
Surprise in Games
In this project, we propose the VCL (Violation of Expectation, Caught Off Guard, and Learning) model for surprise in games and study how people perceive and react to surprising events in a video game. We present a framework for automatically generating levels for 2D platform games with embedded surprising content. We developed a tile-based parametric level generator for a popular 2D platform game (Super Mario Bros) and generated customized levels based on different metrics affecting level geometry (linearity, the density of enemies, the density of game elements, pattern variation, and gap frequency) and physics (gravity and player speed) in order to study their correlation with surprise. The paper discussing our findings was accepted at the “Foundations of Digital Games” (August, 2017).
Github Link: https://github.com/lucasnfe/science-mario
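As a loose illustration of what a tile-based parametric generator can look like, the sketch below builds a single row of ground tiles from two of the geometry parameters mentioned above (gap frequency and enemy density). It is not taken from the linked repository: the function name, the tile symbols, and the rule that gaps are never adjacent are all assumptions made for the example.

```python
import random


def generate_level(width, gap_frequency, enemy_density, seed=0):
    """Return one row of tiles: 'G' ground, ' ' gap, 'E' enemy standing on ground.

    gap_frequency and enemy_density are per-column probabilities (hypothetical
    parameterization). Gaps are one tile wide and never on the level edges.
    """
    rng = random.Random(seed)  # seeded so that a level can be regenerated
    columns = []
    for x in range(width):
        interior = 0 < x < width - 1
        if interior and columns[-1] != " " and rng.random() < gap_frequency:
            columns.append(" ")
        elif rng.random() < enemy_density:
            columns.append("E")
        else:
            columns.append("G")
    return "".join(columns)


print(generate_level(40, gap_frequency=0.2, enemy_density=0.1, seed=1))
```

Varying one parameter while holding the seed fixed gives a simple way to produce level variants for comparison.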
Predicting SSD failures
To improve reliability, we propose an approach for automatically predicting future drive failures at Google’s data centers. We collected telemetry data from over 40,000 drives over a period of 16 weeks and used it to build models that detect anomalous drive behavior and predict failures. This work was done in collaboration with Samsung Semiconductor Inc., which has been associated with the UCSC lab for the last two years. The paper detailing our work was accepted at the “Symposium on Cloud Computing” (October, 2020).
Reducing Write Amplification by Death Time prediction of LBAs
Our latest work aims to reduce write amplification in flash devices by using deep learning techniques to predict the death time of logical block addresses (LBAs). The death time prediction for each write I/O is sent to the Flash Translation Layer (FTL), which optimizes the physical page placement. The FTL maps LBAs that are going to be overwritten at around the same time into the same block, reducing the internal need for garbage collection (GC). This results in fewer internal writes due to GC, which improves response time and extends device lifetime. The paper discussing our findings was accepted at the “14th ACM International Systems and Storage Conference” (June, 2021).
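The placement idea can be sketched as follows, assuming the death time predictions are already available: writes are bucketed by predicted death time so that each flash block holds LBAs expected to become invalid together. The function name, the fixed time window, and the block size are hypothetical choices for illustration, not the FTL's actual policy.

```python
from collections import defaultdict


def group_by_death_time(writes, window, pages_per_block):
    """Group LBAs into blocks by predicted death time.

    writes: iterable of (lba, predicted_death_time) pairs.
    window: width of a death time bucket (hypothetical tuning knob).
    Returns a list of blocks, each a list of at most pages_per_block LBAs.
    """
    buckets = defaultdict(list)
    for lba, death_time in writes:
        # LBAs dying within the same window share a bucket.
        buckets[death_time // window].append(lba)

    blocks = []
    for bucket in sorted(buckets):
        lbas = buckets[bucket]
        # Split each bucket into block-sized chunks.
        for i in range(0, len(lbas), pages_per_block):
            blocks.append(lbas[i:i + pages_per_block])
    return blocks


writes = [(1, 5), (2, 120), (3, 8), (4, 130), (5, 9)]
print(group_by_death_time(writes, window=100, pages_per_block=2))
```

When all pages in a block die at about the same time, the block can be erased without first copying live pages elsewhere, which is where the reduction in GC writes comes from.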