
setup_genome_sequences fails if rerunning from the "cat" step #54

Open
ning-y opened this issue Feb 28, 2024 · 1 comment

@ning-y
Contributor

ning-y commented Feb 28, 2024

This is a low-priority issue, because the use case is rare and the workaround is easy.

If make_lastz_chains is called with `--continue_from_step cat` after the "cat" step has already been run, it fails with, for example:

```
### Trying to continue from step: cat
Making chains for /data/wanglf/home/e0175719/runs/toga-pipeline-1.2/inter/200-genomes/hg38.2bit and /data/wanglf/home/e0175719/runs/toga-pipeline-1.2/inter/210-repeatmasked/GCA_004027375.1_MacSob_v1_BIUU_genomic.2bit files, saving results to /data/wanglf/home/e0175719/runs/toga-pipeline-1.2/inter/300-chain/hg38/GCA_004027375.1_MacSob_v1_BIUU_genomic/out
Pipeline started at 2024-02-28 12:56:25.148054
 * Setting up genome sequences for target
genomeID: hg38
input sequence file: /data/wanglf/home/e0175719/runs/toga-pipeline-1.2/inter/200-genomes/hg38.2bit
is 2bit: True
planned genome dir location: /data/wanglf/home/e0175719/runs/toga-pipeline-1.2/inter/300-chain/hg38/GCA_004027375.1_MacSob_v1_BIUU_genomic/out/target.2bit
Traceback (most recent call last):
  File "/data/wanglf/home/e0175719/runs/toga-pipeline-1.2/inter/100-install/make_lastz_chains/./make_chains.py", line 261, in <module>
    main()
  File "/data/wanglf/home/e0175719/runs/toga-pipeline-1.2/inter/100-install/make_lastz_chains/./make_chains.py", line 257, in main
    run_pipeline(args)
  File "/data/wanglf/home/e0175719/runs/toga-pipeline-1.2/inter/100-install/make_lastz_chains/./make_chains.py", line 233, in run_pipeline
    setup_genome_sequences(args.target_genome,
  File "/data/wanglf/home/e0175719/runs/toga-pipeline-1.2/inter/100-install/make_lastz_chains/modules/project_setup_procedures.py", line 172, in setup_genome_sequences
    os.symlink(arg_input_2bit, seq_dir)
FileExistsError: [Errno 17] File exists: '/data/wanglf/home/e0175719/runs/toga-pipeline-1.2/inter/200-genomes/hg38.2bit' -> '/data/wanglf/home/e0175719/runs/toga-pipeline-1.2/inter/300-chain/hg38/GCA_004027375.1_MacSob_v1_BIUU_genomic/out/target.2bit'
```

This happens because "target.2bit" was already symlinked during the earlier run, and `os.symlink` raises `FileExistsError` if the destination already exists.
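A minimal reproduction of the failure, assuming a POSIX system where unprivileged symlink creation is allowed. The file names are placeholders standing in for the input 2bit and `out/target.2bit`, and `reproduce_rerun_failure` is a hypothetical helper, not part of make_lastz_chains:

```python
import os
import tempfile


def reproduce_rerun_failure() -> bool:
    """Return True if a second os.symlink to the same destination raises FileExistsError.

    Hypothetical helper: the paths only mimic the layout in the log above.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "hg38.2bit")    # stands in for the input 2bit file
        dst = os.path.join(tmp, "target.2bit")  # stands in for out/target.2bit
        open(src, "wb").close()

        os.symlink(src, dst)  # first run: the link is created
        try:
            os.symlink(src, dst)  # rerun: destination already exists
        except FileExistsError:
            return True
        return False


print(reproduce_rerun_failure())
```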

The workaround for users is easy: delete the existing "target.2bit" before rerunning.

A developer-side fix would be to check for the destination and remove it if it exists, or to use a symlinking function that tolerates an existing destination. From a brief check, `os.symlink` has no such option.
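A sketch of the first suggested fix, written as a hypothetical `force_symlink` helper (it is not an existing function in make_lastz_chains or in the standard library). It uses `os.path.lexists` rather than `os.path.exists`, because the latter returns False for a dangling symlink left by an earlier run, in which case `os.symlink` would still fail:

```python
import os
import tempfile


def force_symlink(src: str, dst: str) -> None:
    """Create the symlink dst -> src, replacing any file or link already at dst.

    Hypothetical helper sketching the fix proposed above.
    """
    if os.path.lexists(dst):  # True for files, live links and dangling links
        if os.path.isdir(dst) and not os.path.islink(dst):
            # Refuse to delete a real directory; only files/links are replaced.
            raise IsADirectoryError(f"refusing to replace directory: {dst}")
        os.remove(dst)
    os.symlink(src, dst)


# Usage: calling it twice no longer raises, unlike a bare os.symlink.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "hg38.2bit")    # placeholder input 2bit
    dst = os.path.join(tmp, "target.2bit")  # placeholder out/target.2bit
    open(src, "wb").close()
    force_symlink(src, dst)
    force_symlink(src, dst)  # rerun: the existing link is replaced
    link_target = os.readlink(dst)
```

Note that this silently replaces whatever is at the destination, so it trades protection of earlier results for convenience.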

@MichaelHiller
Collaborator

Thanks for reporting this.
The correct behavior is NOT to overwrite results that already exist (to protect against someone accidentally rerunning the pipeline).
If the cat step has already been run (successfully or not), the user should clean up its output and then continue from cat.

However, I agree that the script should print a proper warning explaining what the user needs to do.
Also, symlinking the 2bit files is probably not a stable solution.
Bogdan, could you please have a look at some point?
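One way the requested behavior could look, sketched with a hypothetical `link_target_genome` function and `ExistingOutputError` exception (neither exists in make_lastz_chains, and the message wording is illustrative). Existing output is never overwritten, and the user gets an actionable message instead of a bare `FileExistsError` traceback:

```python
import os
import tempfile


class ExistingOutputError(RuntimeError):
    """Hypothetical: raised when a step would overwrite results from an earlier run."""


def link_target_genome(src: str, dst: str, step: str = "cat") -> None:
    """Symlink src to dst, but never overwrite an existing dst (hypothetical sketch)."""
    if os.path.lexists(dst):
        raise ExistingOutputError(
            f"{dst} already exists from a previous run. "
            f"Remove it before rerunning with --continue_from_step {step}."
        )
    os.symlink(src, dst)


# Usage: the rerun stops with instructions instead of a traceback.
message = ""
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "hg38.2bit")    # placeholder input 2bit
    dst = os.path.join(tmp, "target.2bit")  # placeholder out/target.2bit
    open(src, "wb").close()
    link_target_genome(src, dst)  # first run: creates the link
    try:
        link_target_genome(src, dst)  # rerun: refuses to overwrite
    except ExistingOutputError as err:
        message = str(err)

print(message)
```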

Thx a lot
