How to specify remote storage for target genomes in samplesheet? #290
-
The samplesheet setup page states:
But if the target genomes are at a remote path (e.g., gs://bucket/path/target.pgen, gs://bucket/path/target.pvar, gs://bucket/path/target.psam), how should this advice be put into practice? When I simply use "gs://bucket/path/target" as the prefix in the samplesheet, it seems to be treated as a subfolder of the working directory. For example, since I am running from

In contrast, pointing to a score file in a gs:// bucket works just fine, so I don't think this is a fundamental limitation of Nextflow. (I considered mounting the bucket with gcsfuse to make these look like local files, but that is prohibited by restricted permissions. And the pipeline is not being executed locally, so localizing these enormous genotype files to the launch machine is not sensible anyway.)
Replies: 2 comments
-
Hi @carbocation, we have an open issue (#288) to document how to use the pipeline with remote paths like that. It basically involves supplying a samplesheet in JSON format. @nebfield can explain more, because he's the expert on getting it running in the cloud!
-
You're right, Nextflow natively supports object storage and cloud execution 🚀

In pgsc_calc, CSV samplesheets are incompatible with cloud storage, but you can use JSON samplesheets instead.
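A minimal Python sketch of building such a JSON samplesheet for the remote fileset in the question follows. The key names (`sampleset`, `chrom`, `format`, `geno`, `pheno`, `variants`) are an assumption about the pgsc_calc JSON schema and are not confirmed in this thread, so check the pipeline's samplesheet documentation before relying on them. The point it illustrates is that each plink2 file gets its own full `gs://` URI instead of a shared path prefix. The helper names `pfile_entry` and `samplesheet_json` are hypothetical.

```python
import json

# ASSUMPTION: the pgsc_calc JSON samplesheet takes one explicit path per file,
# with "geno" -> .pgen, "variants" -> .pvar and "pheno" -> .psam.
# These key names are illustrative; verify them against the pipeline docs.
PFILE_EXTENSIONS = {"geno": ".pgen", "variants": ".pvar", "pheno": ".psam"}


def pfile_entry(sampleset, prefix, chrom=None):
    """Return one (hypothetical) samplesheet record for a plink2 fileset.

    `prefix` may be a remote URI such as gs://bucket/path/target. Each file
    is written out as its own full path rather than as a shared prefix.
    `chrom=None` is serialised as JSON null (i.e. all chromosomes in one file).
    """
    entry = {"sampleset": sampleset, "chrom": chrom, "format": "pfile"}
    for key, ext in PFILE_EXTENSIONS.items():
        entry[key] = prefix + ext
    return entry


def samplesheet_json(entries):
    """Serialise a list of records as a JSON array (the samplesheet file body)."""
    return json.dumps(entries, indent=2)
```

For the paths in the question, `samplesheet_json([pfile_entry("target", "gs://bucket/path/target")])` yields a one-element array whose `geno`, `variants` and `pheno` fields point at `target.pgen`, `target.pvar` and `target.psam` in the bucket (the sampleset name `"target"` is just a placeholder). Write that string to a `.json` file and supply it in place of the CSV samplesheet.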