How to specify remote storage for target genomes in samplesheet? #290

carbocation · 2024-05-01T21:08:32Z


The samplesheet documentation says that path_prefix "should be set to the path of the target genomes excluding all file extensions", and to "always use absolute paths that begin with /".

But if the target genomes are at a remote path (e.g., gs://bucket/path/target.pgen, gs://bucket/path/target.pvar, gs://bucket/path/target.psam), how should this advice be operationalized?

When I simply use "gs://bucket/path/target" as the prefix in the samplesheet, it seems to be treated as a path relative to the working directory. For example, because I am running from /tmp/nextflow, the error message shows it as /tmp/nextflow/gs:/.

In contrast, pointing to a score file in a gs:// bucket works just fine, so I don't think this is a fundamental limitation of Nextflow.
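My guess at where /tmp/nextflow/gs:/ comes from: if the prefix is handled as an ordinary filesystem path, "gs://bucket/path/target" does not start with /, so it counts as relative, gets joined onto the launch directory, and path normalisation collapses the // after gs:. Below is a hypothetical Python illustration of that behaviour (both helper names are made up for this example); it is not pgsc_calc's actual code. It also shows that a URI parser does recognise gs:// as a scheme, which is the distinction a cloud-aware code path relies on.

```python
import posixpath
from urllib.parse import urlparse


def resolve_as_local_path(prefix: str, launch_dir: str) -> str:
    """Hypothetical helper: resolve `prefix` as a plain filesystem path."""
    if posixpath.isabs(prefix):
        return posixpath.normpath(prefix)
    # "gs://..." does not start with "/", so it is treated as relative and
    # joined onto the launch directory; normpath then collapses "//" to "/".
    return posixpath.normpath(posixpath.join(launch_dir, prefix))


def has_uri_scheme(path: str) -> bool:
    """Hypothetical helper: True if the path carries a URI scheme such as gs://."""
    return urlparse(path).scheme != ""


if __name__ == "__main__":
    prefix = "gs://bucket/path/target"
    print(resolve_as_local_path(prefix, "/tmp/nextflow"))  # /tmp/nextflow/gs:/bucket/path/target
    print(has_uri_scheme(prefix), has_uri_scheme("/data/target"))  # True False
```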

(I considered using gcsfuse to mount the bucket so these look like local files, but restricted permissions prohibit that. Also, the pipeline is not being executed locally, so localizing these enormous genotype files to the launch machine is not sensible anyway.)

Answered by nebfield


smlmbrt (Maintainer) · 2024-05-02T08:49:20Z

Hi @carbocation, we have an open issue (#288) to document how to use the pipeline with remote paths like that. It basically involves supplying a samplesheet in JSON format. @nebfield can explain more because he's the expert on getting it running in the cloud!


nebfield (Maintainer) · 2024-05-02T11:21:12Z

You're right, Nextflow natively supports object storage and cloud execution 🚀

In pgsc_calc, CSV samplesheets are incompatible with cloud storage, but you can use JSON samplesheets instead.
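For the remote PLINK 2 fileset in the question, a JSON samplesheet could be generated along these lines. This is a minimal Python sketch, not an official pgsc_calc tool: the field names (sampleset, chrom, format, geno, variants, pheno, vcf_import_dosage) and the use of null for chrom are assumptions, so check them against the JSON samplesheet schema shipped with your pgsc_calc version. The helper applies the documented path_prefix rule: the prefix excludes the file extensions, and each extension is appended to get the three remote files.

```python
import json
from typing import Optional

# Assumed field names for one pgsc_calc JSON samplesheet entry; verify them
# against the samplesheet schema of the pgsc_calc version you run.
PFILE_FIELDS = {"geno": ".pgen", "variants": ".pvar", "pheno": ".psam"}


def pfile_entry(sampleset: str, path_prefix: str, chrom: Optional[str] = None) -> dict:
    """Build one samplesheet entry for a PLINK 2 fileset at `path_prefix`.

    `path_prefix` excludes the file extensions, e.g. gs://bucket/path/target.
    """
    if path_prefix.endswith(tuple(PFILE_FIELDS.values())):
        raise ValueError("path_prefix must exclude the .pgen/.pvar/.psam extension")
    entry = {
        "sampleset": sampleset,
        # Assumption: None (JSON null) means one fileset holds all chromosomes.
        "chrom": chrom,
        "format": "pfile",
        "vcf_import_dosage": False,
    }
    # Append each extension to the prefix to get the three remote file paths.
    for field, extension in PFILE_FIELDS.items():
        entry[field] = path_prefix + extension
    return entry


if __name__ == "__main__":
    samplesheet = [pfile_entry("target", "gs://bucket/path/target")]
    print(json.dumps(samplesheet, indent=2))
```

Save the printed JSON to a file (e.g. samplesheet.json) and pass it with --input, as you would a CSV samplesheet. Because the gs:// paths are plain strings in the JSON rather than prefixes that get resolved locally, they can stay remote.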

