Conserve next format update

These don't all necessarily have to land in one format change, but to avoid having too many formats in circulation at once, bundling them might be best.

All are of course subject to testing and measurement.

  • Move block and index compression to zstd.

  • Use fewer blockdir subdirectories: perhaps a prefix of 2 hex chars, say. On modern filesystems we could perhaps have no prefix subdirectories at all. Prefix subdirectories do have one small advantage, though: they let us measure progress while listing the blocks, since after reading one subdirectory we've seen, on average, 1/256th of the blocks.

  • Store indexes in protobufs using Prost. JSON is pretty verbose, even if compression makes up for much of that. In particular, hashes could then be stored as raw bytes rather than hex strings (see the sketch after this list).

  • Be explicit in the index about which hash algorithm produced each hash. This would be cheap in protobufs, and would allow a gradual transition to BLAKE3, which would be faster.

  • Store small files (less than, say, 256 B or 1 KiB) inline in the index, again as bytes? This avoids some IO round trips to read and write small files.
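
As a rough illustration of the last three points, here is a minimal Prost sketch of what an index entry might look like. The field names, tags, and types are hypothetical, not a committed schema:

```rust
use prost::{Enumeration, Message};

/// Hypothetical: names which algorithm produced each stored hash,
/// allowing a gradual migration from BLAKE2b to BLAKE3.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Enumeration)]
#[repr(i32)]
pub enum HashAlgorithm {
    Blake2b = 0,
    Blake3 = 1,
}

/// Hypothetical index entry; all fields are illustrative.
#[derive(Clone, PartialEq, Message)]
pub struct IndexEntry {
    /// Path of the entry within the archive.
    #[prost(string, tag = "1")]
    pub apath: String,
    /// Which algorithm produced the hashes below.
    #[prost(enumeration = "HashAlgorithm", tag = "2")]
    pub hash_algorithm: i32,
    /// Block hashes as raw bytes: half the size of hex-in-JSON,
    /// before compression even starts.
    #[prost(bytes = "vec", repeated, tag = "3")]
    pub block_hashes: Vec<Vec<u8>>,
    /// For files under the inline threshold, the content itself,
    /// so no separate block needs to be read or written.
    #[prost(bytes = "vec", optional, tag = "4")]
    pub inline_content: Option<Vec<u8>>,
}
```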

Reuse index blocks across bands

In periodic backups of large directories with relatively few changes, it can easily happen that most of the new space used is for new index blocks. It's typically not very much space compared to the total tree size, but it would still be nice to reduce.

In earlier design drafts I had thought to have multiple layers of backups, where the index for each could be a delta against an earlier index. This was never implemented, and it does have some drawbacks: it makes deletion more complex, since the earlier indexes can't be released while later deltas depend on them, and it makes retrieval more complex, since multiple indexes must be read in parallel.

Another approach is possible. The backup code already walks the basis index in parallel with the source tree, so that the hashes of unchanged files can be reused. (The basis index may actually be stitched together from multiple indexes if the most recent backup was incomplete.) We could take this one step further and notice when nothing has changed across a whole hunk of the basis index: all the files are the same and there are no deletions. (Currently deletions are implicitly skipped, but not specifically noticed, during backup.)

Then, the new backup's index could simply reference that whole hunk of the basis index.

How do we represent that reference? One option is a specific index entry that means "unchanged from backup b, hunk h" (a minimal sketch follows). Another approach would be to store the index hunks as blobs, too.
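
A minimal sketch of the first option, with hypothetical names (and reusing the `IndexEntry` sketch above):

```rust
/// Hypothetical: identifies one band (one backup version).
pub struct BandId(pub u32);

/// One hunk of a band's index, under the first representation.
pub enum IndexHunk {
    /// Entries written fresh during this backup.
    Entries(Vec<IndexEntry>),
    /// This span of the tree is unchanged: reuse hunk `hunk`
    /// of the index of the earlier backup `band`.
    UnchangedFrom { band: BandId, hunk: u32 },
}
```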

The overall index file would then be an ordered list of hash-addressed index hunks. It could also record the first apath in each hunk, making it a kind of single-level B-tree, and giving faster random access to any subdirectory!
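
That second option might look something like this hypothetical sketch; note that real apath ordering is not plain string comparison, which this simplification glosses over:

```rust
/// Hypothetical: a reference to one content-addressed index hunk.
pub struct HunkRef {
    /// Hash addressing the stored (compressed) hunk of entries.
    pub hash: Vec<u8>,
    /// First apath in the hunk, enabling binary search by path.
    pub first_apath: String,
}

/// The per-band index: an ordered list of hunk references,
/// effectively a single-level B-tree over the whole tree.
pub struct BandIndex {
    pub hunks: Vec<HunkRef>,
}

impl BandIndex {
    /// The hunk that could contain `apath`: the last hunk whose
    /// `first_apath` is not greater than the path we're seeking.
    pub fn hunk_for(&self, apath: &str) -> Option<&HunkRef> {
        match self
            .hunks
            .binary_search_by(|h| h.first_apath.as_str().cmp(apath))
        {
            Ok(i) => Some(&self.hunks[i]),
            Err(0) => None, // sorts before every hunk: not present
            Err(i) => Some(&self.hunks[i - 1]),
        }
    }
}
```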

One important goal for Conserve is that interrupted backups are still readable: up to the point where they stopped, and falling back to the previous backup (if there is one) beyond that. (This is a reason why the index cannot be a simple Merkle tree: the top node wouldn't be written until the whole backup was complete.) But this could be seen as a kind of "Merkle forest": the backup metadata (indexed by backup number) points to a list of hunks (by hash), each of which points to some file content (by hash).

After we finish an index hunk we need to write out the index file that refers to it. On a local disk we can potentially append to the file. On something like S3 we might need to rewrite the whole thing, but maybe that's not so bad. If there are very many hunks, we could break the list of hunks into multiple files to keep this quadratic rewriting under control (see the sketch below). If each hunk contains 1000 entries, a list of up to 1000 hunks in the band directory would only tick over after a million files, and perhaps both numbers could be larger.
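
To put rough numbers on the quadratic concern (an assumed cost model, not a measurement): rewriting the entire single-file list after each of n hunks copies 1 + 2 + ... + n = n(n+1)/2 hunk references in total, while splitting the list into files of at most k references caps each rewrite at k:

```rust
/// Total references copied if the whole single-file list is
/// rewritten once per hunk: 1 + 2 + ... + n = n(n + 1) / 2.
fn rewrites_single_file(n: u64) -> u64 {
    n * (n + 1) / 2
}

/// Upper bound when the list is split into files holding at most
/// `k` references: each append rewrites only the last, partial file.
fn rewrites_split(n: u64, k: u64) -> u64 {
    n * k
}

fn main() {
    // With 1000-entry hunks, a million files means n = 1000 hunks.
    let n = 1000;
    assert_eq!(rewrites_single_file(n), 500_500);
    assert_eq!(rewrites_split(n, 100), 100_000);
}
```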

Getting there probably requires some initial cleanup of the index walking-and-stitching code.
