Convertible Codes: Enabling efficient conversion of coded data in distributed storage

Authors

Francisco

Maturana

Carnegie Mellon University

Rashmi

Vinayak

Carnegie Mellon University

Abstract

In large-scale data storage systems, erasure codes are employed to store data in a redundant fashion for reliability. In this setting, a set of k data blocks to be stored is encoded using an [n, k] code to generate n blocks that are then stored on distinct storage devices. In our recent work, we showed that the failure rate of storage devices vary significantly over time, and that dynamically tuning the parameters n and k of the code provides considerable reduction in storage cost. However, traditional codes suffer from prohibitively high resource overheads in changing the code parameters on already encoded data. Motivated by this application, we present a new theoretical framework formalizing "code conversion"-the process of converting data encoded with an [n, k] code into data encoded using a code with different parameters [n', k'] while maintaining desired decodability properties, such as the maximum-distance-separability (MDS) property. We introduce "convertible codes", a new class of codes that enable resource efficient conversions. We then derive lower bounds on the access and bandwidth costs of conversion for a wide range of parameter regimes and present explicit optimal constructions matching these lower bounds.