This repository has been archived by the owner on Sep 24, 2024. It is now read-only.

Comparing hashes of data to prevent saving the same data (1): hashing data file #7

Open
mjia8 opened this issue Jun 19, 2023 · 0 comments

mjia8 commented Jun 19, 2023

Team members: Jayesh, Kajoyrie, Jason
Sprint 4: 6/19-6/26

Overall Goal:
When we download data within the archiver, we key and hash the data, then check the registry to see whether the new hash differs from the hash of the last archived version of that data. If the hashes match, the data has not changed and does not need to be saved again.

What does success look like?

  • We first want a function in archiver.js that takes a data file and a registry id, applies an MD5 hash to the data file, looks up the registry entry for that registry id, reads its stored data hash (which we will add to the registry in a later step), and compares the two hashes, returning true if they are the same.
  • If the data hash or registry id does not exist in the registry, return false.
  • We will then want to store the data hash in the registry as well whenever we archive data.
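
A minimal sketch of the comparison described above, for discussion only. The registry is assumed here to be a plain object keyed by registry id with a `dataHash` field; the real registry format and the names `md5OfFile`, `isSameAsLastVersion`, and `dataHash` are hypothetical and not taken from archiver.js. It reads the whole file at once for brevity.

```javascript
const crypto = require("crypto");
const fs = require("fs");

// Assumed (hypothetical) registry shape: { [registryId]: { dataHash: "<md5 hex>" } }.

// Returns the MD5 hex digest of the file's raw bytes.
function md5OfFile(filePath) {
  return crypto.createHash("md5").update(fs.readFileSync(filePath)).digest("hex");
}

// Returns true only when the registry id exists, the entry has a stored
// data hash, and that hash equals the hash of the data file.
// Returns false when the registry id or the data hash is missing.
function isSameAsLastVersion(registry, registryId, filePath) {
  const entry = Object.prototype.hasOwnProperty.call(registry, registryId)
    ? registry[registryId]
    : undefined;
  if (!entry || !entry.dataHash) return false;
  return entry.dataHash === md5OfFile(filePath);
}
```

With this shape, the archiving step would skip the save when `isSameAsLastVersion(...)` returns true and otherwise write the new hash into the entry's `dataHash` field.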

Comments:

  • Ideally, we will compute the hash when we download the data initially so that we do not have to read the data twice.
  • Hashing: we can use MD5 and compute the hash incrementally while reading the file's contents, instead of loading the whole file into memory.
    • Good resource to look at when starting: archiving the file name in Ethan's archive demo
    • It will be interesting to see whether the hash is the same when we read the file as bytes vs. as text.
  • We only want to do this for data files, because the about info files change very frequently and hashing them would bring little benefit.
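
One possible way to hash during the download and to explore the bytes vs. text question, sketched with Node's built-in `crypto` and `stream` modules. The helper names `saveAndHash` and `md5Hex` are hypothetical, and the sketch assumes the download is available as a readable stream, which this issue does not confirm.

```javascript
const crypto = require("crypto");
const fs = require("fs");
const { Transform } = require("stream");
const { pipeline } = require("stream/promises");

// Hypothetical helper: writes a readable stream (e.g. a download) to destPath
// while feeding each chunk to an MD5 hash, so the data is read only once and
// is never held in memory as a whole. Resolves with the hex digest.
async function saveAndHash(readable, destPath) {
  const hash = crypto.createHash("md5");
  const hashingPassThrough = new Transform({
    transform(chunk, _encoding, callback) {
      hash.update(chunk);    // incremental update, one chunk at a time
      callback(null, chunk); // forward the chunk unchanged to the file
    },
  });
  await pipeline(readable, hashingPassThrough, fs.createWriteStream(destPath));
  return hash.digest("hex");
}

// Bytes vs. text: hash.update(string) encodes the string as UTF-8 by default.
function md5Hex(data) {
  return crypto.createHash("md5").update(data).digest("hex");
}

// Valid UTF-8 survives a decode/encode round trip, so both hashes match.
const validUtf8 = Buffer.from("héllo", "utf8");
console.log(md5Hex(validUtf8) === md5Hex(validUtf8.toString("utf8"))); // true

// Bytes that are not valid UTF-8 are replaced with U+FFFD when decoded,
// so hashing the decoded text gives a different digest.
const notUtf8 = Buffer.from([0xff, 0xfe, 0x00]);
console.log(md5Hex(notUtf8) === md5Hex(notUtf8.toString("utf8"))); // false
```

Since text decoding can silently change the bytes being hashed, hashing the raw bytes looks like the safer default, though that is a suggestion rather than a decision recorded in this issue.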

mjia8 changed the title from "Comparing hashes of data to prevent saving the same data" to "Comparing hashes of data to prevent saving the same data (1): hashing data file" on Jun 20, 2023