Set up a quarantine EC2/S3 instance for user data uploads #797

Open
3 tasks
k1o0 opened this issue May 7, 2024 · 3 comments
@k1o0
Contributor

k1o0 commented May 7, 2024

Users need to register manually curated spike data. The registration could be done in ibllib by registering to an S3 bucket mounted to an EC2 instance running Globus. Alyx could then handle the transfers; however, something needs to ensure that no Flatiron data are overwritten by these S3 data.

This issue is simply for determining the best course of action:

  • Confirm that S3 can allow CREATE without DELETE
  • Determine whether it is possible to create a Globus ID from an S3 bucket (may need to ask SDSC)
  • Determine whether to use the Alyx Globus management command and/or to run an S3 sync from SDSC
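One way to explore the first task is a bucket policy that allows uploads but explicitly denies deletions. The sketch below only builds the policy document; the bucket name, prefix and principal ARN are hypothetical placeholders. Note that denying `s3:DeleteObject` does not stop `s3:PutObject` from overwriting an existing key; enabling bucket versioning would be needed to keep previous versions.

```python
import json


def create_only_policy(bucket: str, prefix: str, principal_arn: str) -> dict:
    """Build an S3 bucket policy letting `principal_arn` upload objects under
    `prefix` while explicitly denying their deletion (illustrative sketch)."""
    resource = f"arn:aws:s3:::{bucket}/{prefix}*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCreate",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": ["s3:PutObject"],
                "Resource": resource,
            },
            {
                # An explicit Deny overrides any Allow granted elsewhere
                "Sid": "DenyDelete",
                "Effect": "Deny",
                "Principal": {"AWS": principal_arn},
                "Action": ["s3:DeleteObject", "s3:DeleteObjectVersion"],
                "Resource": resource,
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical values, for illustration only
    policy = create_only_policy(
        "s3-patcher", "patcher/", "arn:aws:iam::123456789012:user/uploader"
    )
    print(json.dumps(policy, indent=2))
```

The resulting document could be applied with boto3's `s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))`.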
@mayofaulkner mayofaulkner transferred this issue from int-brain-lab/iblenv Jul 1, 2024
@mayofaulkner
Contributor

Step 1: Run from a local server or AWS instance
Run a task using the S3 patcher

  1. Check out the s3_patcher branch of ibllib
  2. Run a task with location='EC2', e.g.:
```python
from ibllib.pipes.ephys_tasks import SpikeSorting

# session_path, one (a ONE instance) and any task-specific kwargs must be defined beforehand
task = SpikeSorting(session_path, one=one, location='EC2', **kwargs)
task.run()
# This will upload the files via the S3 patcher to the s3 patcher bucket:
# https://us-east-1.console.aws.amazon.com/s3/buckets/s3-patcher?prefix=patcher/&region=us-east-1&bucketType=general
task.register_datasets()
```

Step 2: Run from SDSC
Launch the sync script to transfer data from the patcher to Flatiron

  1. Check out the s3_patcher branch of iblalyx on SDSC
  2. Run the following:
```shell
~/Documents/PYTHON/alyx/alyx/alyxvenv/bin/python ~/Documents/PYTHON/alyx/alyx/manage.py sync_patcher > /home/datauser/ibl_logs/sync_patcher.log 2>&1
```

To force an overwrite of a dataset:
```shell
~/Documents/PYTHON/alyx/alyx/alyxvenv/bin/python ~/Documents/PYTHON/alyx/alyx/manage.py sync_patcher --force True > /home/datauser/ibl_logs/sync_patcher.log 2>&1
```

@mayofaulkner
Contributor

mayofaulkner commented Oct 21, 2024

Items still to do:

  • Add a dev group to better control who can patch datasets using the sync script
  • Look into boto3 authentication via Alyx
  • Implement deletion of datasets from the S3 patcher once they have been transferred to Flatiron
  • Make the sync script a cron job that runs daily
  • Add AWS CloudWatch to monitor the size of the S3 patcher bucket
  • Change the file name pattern of spike sorting datasets to allow a namespace prefix
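For the deletion item, a minimal sketch follows, assuming the sync step can report which patcher keys have been confirmed on Flatiron (the `transferred` set and the key names are hypothetical inputs). It selects only confirmed keys and splits them into batches of at most 1,000, the per-request limit of the S3 `DeleteObjects` API.

```python
from __future__ import annotations

from typing import Iterable, Iterator


def keys_to_delete(patcher_keys: Iterable[str], transferred: set[str]) -> list[str]:
    """Return only the patcher keys whose transfer to Flatiron is confirmed."""
    return sorted(k for k in patcher_keys if k in transferred)


def batches(keys: list[str], size: int = 1000) -> Iterator[list[str]]:
    """Yield successive chunks of at most `size` keys (DeleteObjects accepts up to 1000)."""
    for i in range(0, len(keys), size):
        yield keys[i:i + size]


if __name__ == "__main__":
    # Hypothetical keys, for illustration only
    keys = ["patcher/a.npy", "patcher/b.npy", "patcher/c.npy"]
    done = {"patcher/a.npy", "patcher/c.npy"}
    for chunk in batches(keys_to_delete(keys, done)):
        print(chunk)
```

Each batch could then be passed to boto3 as `s3.delete_objects(Bucket=bucket, Delete={"Objects": [{"Key": k} for k in chunk]})`. Deleting only confirmed keys means an interrupted transfer leaves its files in the patcher bucket for the next run.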

@oliche oliche self-assigned this Oct 24, 2024
@oliche oliche closed this as completed Jan 6, 2025
@oliche oliche reopened this Jan 7, 2025
@oliche
Member

oliche commented Jan 7, 2025

The patcher needs to set any other file records of the patched dataset to exists=False.

If this is not done properly, there is a risk that the daily transfer overwrites the Flatiron dataset, and the S3 patcher transfer will never complete.
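A minimal sketch of the rule above, using a hypothetical in-memory `FileRecord` rather than the actual Alyx models: when a dataset is patched, the record in the patcher repository is kept as existing and every other record of that dataset is flagged `exists=False`. Repository names in the example are placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FileRecord:
    """Hypothetical stand-in for an Alyx file record."""
    data_repository: str
    relative_path: str
    exists: bool = True


def mark_patched(records: list[FileRecord], patcher_repo: str) -> list[FileRecord]:
    """Flag only the patcher record as existing; return the records whose flag changed."""
    changed = []
    for rec in records:
        should_exist = rec.data_repository == patcher_repo
        if rec.exists != should_exist:
            rec.exists = should_exist
            changed.append(rec)
    return changed


if __name__ == "__main__":
    # Hypothetical repositories, for illustration only
    recs = [
        FileRecord("flatiron_lab", "alf/probe00/spikes.times.npy"),
        FileRecord("lab_server", "alf/probe00/spikes.times.npy"),
        FileRecord("s3_patcher", "alf/probe00/spikes.times.npy", exists=False),
    ]
    for rec in mark_patched(recs, "s3_patcher"):
        print(rec)
```

Returning the changed records makes it straightforward to log exactly which flags were flipped, and calling the function again is a no-op.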
