Fully Persistent Arrays

Feb 15, 2021

Persistent Data Structures preserve previous versions of themselves allowing us to revisit and audit any historical version. While we already did an exhaustive introduction in the previous essay, in this essay, we take a detailed look into how we can implement the Fully Persistent variant of the most common data structure - Arrays.

The essay is loosely based on the paper titled Fully Persistent Arrays for Efficient Incremental Updates and Voluminous Reads by Tyng-Ruey Chuang and we implement it using Backer's trick elaborated in the paper A Persistent Union-Find Data Structure.

Fully Persistent Arrays

An array is an abstract data structure consisting of n slots to hold a maximum of n elements such that each element is identified by at least one index. A typical array allows the following functions - create(n), update(index, value) and get(index).

The simplest form of an array, that we all are familiar with, is the Linear Array that is designed to hold elements in consecutive memory locations, leveraging spatial locality for faster and more efficient retrievals and scans. Before we jump into the implementation details of Fully Persistent Arrays, let's reiterate what exactly are Fully Persistent Data Structures.

Persistent Data Structures preserve previous versions of themselves allowing us to revisit and audit any historical version. Fully Persistent Data Structures allows access and modification to all the historical versions as well. It does not restrict any modifications whatsoever. This means we can typically revisit any historical version of the data structure, modify it like we are forking out a new branch.

https://user-images.githubusercontent.com/4745789/107117958-e112dd80-68a3-11eb-971b-58034e693f44.png

Fully Persistent Arrays are arrays that support Full Persistence which means it supports usual array operations while also allowing us to go back in time and make updates to any of the previous versions. We define the following operations on Fully Persistent Arrays -

create(n) - returns an array of size n having all the slots uninitialized
update(array, index, value) - returns a new array identical to array except for the element at the position index. The parent array array remains unaffected and is still accessible.
get(array, index) - returns the element present at the index index in array array

Implementing Fully Persistent Array

A naive way of implementing these arrays is to do a Copy-on-Write and keep track of historical versions. This approach very inefficient as it requires m times the memory required to hold n elements, where m is the total number of versions of the array.

Arpit’s Newsletter

Discussion about this post