This repository has been archived by the owner on Jul 4, 2018. It is now read-only.

python ingest, registration and sync tool #1

Open
jasoncoposky opened this issue Feb 7, 2017 · 0 comments

jasoncoposky commented Feb 7, 2017

We have a requirement to ingest and/or register a considerable amount of data at rest in large, sometimes parallel, file systems. The design should accommodate both a new iRODS user with hundreds of millions of files and an existing user who wishes to periodically sync a large volume with their existing iRODS catalog.

Given a target local directory, an initial list of features would include:

  • operates in parallel for maximum throughput - recursively descends the file system and pushes fully qualified paths into a worker queue for ingest threads
  • option to wait N seconds to ensure file is at rest - landing zone style behavior
  • option to ingest files or just register in-place
  • option to checksum
  • option to provide regular expressions for paths to skip
  • externalized metadata extraction using a DSL, inherited interface, or other mechanism for defining the rules to generate or extract metadata from the at-rest data
  • option to set iRODS ACLs after data is ingested
  • option for target collection, or collections given a mapping function
  • idempotent - ability to skip unchanged, properly ingested data and metadata
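As a rough illustration of the first few items above (parallel descent into a worker queue, the at-rest wait, and skip expressions), here is a minimal sketch using only the Python standard library. The names `scan`, `is_at_rest`, `should_skip` and the `handler` callback are hypothetical, and `handler` stands in for whatever ingest or register call the tool would eventually make against iRODS; none of this is taken from an existing implementation.

```python
import os
import queue
import re
import threading
import time


def is_at_rest(path, wait_seconds, now=None):
    """Return True if the file was last modified at least wait_seconds ago."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(path) >= wait_seconds


def should_skip(path, skip_patterns):
    """Return True if any compiled regular expression matches the path."""
    return any(p.search(path) for p in skip_patterns)


def scan(root, handler, skip_patterns=(), wait_seconds=0, num_workers=4):
    """Walk root and pass each eligible absolute path to handler in worker threads.

    handler is a placeholder (assumption) for the real ingest/register step.
    """
    paths = queue.Queue()

    def worker():
        while True:
            path = paths.get()
            try:
                if path is None:  # sentinel: no more work for this thread
                    return
                handler(path)
            finally:
                paths.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()

    # Producer: recursive descent, pushing fully qualified paths into the queue.
    for dirpath, _dirnames, filenames in os.walk(os.path.abspath(root)):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if should_skip(path, skip_patterns):
                continue
            if not is_at_rest(path, wait_seconds):
                continue  # still landing; a later run can pick it up
            paths.put(path)

    # One sentinel per worker, then wait for all of them to finish.
    for _ in threads:
        paths.put(None)
    for t in threads:
        t.join()
```

A call such as `scan("/data/landing", print, skip_patterns=[re.compile(r"\.tmp$")], wait_seconds=30)` would print every file that has been untouched for 30 seconds and does not end in `.tmp` (the directory name here is only an example). Threads suit this sketch because the work is I/O bound; a single producer may itself become the bottleneck on very large file systems, which is a design question the sketch does not address.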

Other possible features:

  • proxy as other iRODS users for ingest
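
The checksum option and the idempotency requirement from the initial feature list could interact roughly as sketched below. This is only an illustration under stated assumptions: the `catalog` dictionary is a hypothetical stand-in for whatever the tool would look up in the iRODS catalog, the record layout (`size`, `mtime`, `checksum`) is invented for the example, and SHA-256 is an arbitrary choice of hash.

```python
import hashlib
import os


def file_checksum(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_for(path):
    """Build the (hypothetical) catalog record for a file as it is now."""
    stat = os.stat(path)
    return {
        "size": stat.st_size,
        "mtime": int(stat.st_mtime),
        "checksum": file_checksum(path),
    }


def needs_sync(path, catalog, use_checksum=False):
    """Return True if path is new or differs from its catalog record.

    catalog is a plain dict here (assumption); a real tool would query iRODS.
    """
    record = catalog.get(path)
    if record is None:
        return True  # never ingested
    stat = os.stat(path)
    if stat.st_size != record["size"] or int(stat.st_mtime) != record["mtime"]:
        return True  # cheap check: size or modification time changed
    if use_checksum:
        # Expensive check, only when the checksum option is enabled.
        return file_checksum(path) != record["checksum"]
    return False
```

Comparing size and modification time first keeps repeated sync runs cheap on unchanged data; the checksum comparison catches content changes that preserve both, at the cost of reading every file.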
adetorcy added a commit that referenced this issue Mar 8, 2017
adetorcy added a commit that referenced this issue Apr 27, 2017
adetorcy added a commit that referenced this issue May 16, 2017
adetorcy added a commit that referenced this issue May 17, 2017
adetorcy added a commit that referenced this issue Sep 22, 2017
adetorcy added a commit that referenced this issue Sep 22, 2017
adetorcy added a commit that referenced this issue Oct 27, 2017
adetorcy added a commit that referenced this issue Oct 27, 2017
adetorcy added a commit that referenced this issue Oct 27, 2017
adetorcy added a commit that referenced this issue Dec 4, 2017
adetorcy added a commit that referenced this issue Dec 4, 2017