Enabling a --temp-directory parameter #438

Open
pasviber opened this issue Aug 6, 2024 · 1 comment

pasviber commented Aug 6, 2024

Hi,

I am setting up a Nextflow pipeline (v24.04.3) that, among many other programs, runs HISAT2 (v2.2.1) from a Singularity container (v3.11.1).

While running on a High Performance Computing cluster, the pipeline stopped in a HISAT2 alignment process with the following error:

(ERR): mkfifo(/tmp/44.inpipe1) failed.
Exiting now ...

I have checked the free space and the permissions of the /tmp folder, and both are fine.

However, it seems that when a total of 50 hisat2 tasks run at the same time on the cluster, tasks that land on the same node can generate the same /tmp/$$.inpipe1 name. That is, if one task has already created /tmp/44.inpipe1 and /tmp/44.inpipe2, another task that tries to create /tmp/44.inpipe1 fails with the error above.

This happens in the case I show below:

JobID           JobName        QOS    Planned               Start                 End      User    Elapsed  ReqCPUS  AllocCPUS     ReqMem     MaxRSS   TotalCPU      State ExitCode        NodeList 
------------ ---------- ---------- ---------- ------------------- ------------------- --------- ---------- -------- ---------- ---------- ---------- ---------- ---------- -------- --------------- 
604005       nf-ALIGNM+      short   00:00:00 2024-08-05T16:36:08 2024-08-05T16:57:32  pasviber   00:21:24        6          6         6G              02:01:32  COMPLETED      0:0            cn02 
604005.batch      batch                       2024-08-05T16:36:08 2024-08-05T16:57:32             00:21:24        6          6              2072172K   02:01:32  COMPLETED      0:0            cn02 
606200       nf-ALIGNM+      short   00:00:01 2024-08-05T17:11:19 2024-08-05T17:41:11  pasviber   00:29:52        6          6         6G              02:48:42  COMPLETED      0:0            cn02 
606200.batch      batch                       2024-08-05T17:11:19 2024-08-05T17:41:11             00:29:52        6          6              2158316K   02:48:42  COMPLETED      0:0            cn02 
606206       nf-ALIGNM+      short   00:00:00 2024-08-05T17:20:58 2024-08-05T17:20:59  pasviber   00:00:01        6          6         6G             00:00.726     FAILED     17:0            cn02 
606206.batch      batch                       2024-08-05T17:20:58 2024-08-05T17:20:59             00:00:01        6          6                     0  00:00.726     FAILED     17:0            cn02 

These three jobs are HISAT2 alignment processes executed on node cn02. The first job (604005) does not overlap in time with either of the other two. However, while job 606200 is running, having created /tmp/44.inpipe1 and /tmp/44.inpipe2 in the /tmp folder of node cn02, job 606206 is launched on the same node and tries to create /tmp/44.inpipe1 in the same folder. Because /tmp/44.inpipe1 already exists from job 606200, job 606206 fails with the error I mentioned at the beginning.
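The failure mode described above can be reproduced in isolation. The following is a minimal Python sketch, not HISAT2's own code: it only shows that creating a named pipe at a path that already exists fails. The helper `make_fifo` and the scratch directory standing in for the node's /tmp are assumptions made for illustration.

```python
import os
import tempfile


def make_fifo(path):
    """Try to create a named pipe at `path` (hypothetical helper).

    Returns True on success and False if the name is already taken.
    """
    try:
        os.mkfifo(path)
        return True
    except FileExistsError:
        return False


# A scratch directory stands in for the shared /tmp of node cn02.
scratch = tempfile.mkdtemp()
fifo = os.path.join(scratch, "44.inpipe1")

first = make_fifo(fifo)   # plays the role of job 606200
second = make_fifo(fifo)  # plays the role of job 606206, same name
print(first, second)

# Clean up the pipe and the scratch directory.
os.remove(fifo)
os.rmdir(scratch)
```

The second call fails for the same reason the second job does: the name is derived from a value that is not unique across the jobs sharing the folder.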

This problem was also seen with centrifuge (DaehwanKimLab/centrifuge#268) and was solved there by making the directory for temporary files configurable through a --temp-directory parameter. With this parameter, one could create a folder named after the sample being aligned and store the inpipe1 and inpipe2 of that process inside it, so that they cannot collide with those of another process.

@imzhangyun, is it possible to add the --temp-directory parameter?

Another possibility would be for the hisat2 code to create a temporary folder with a unique name in /tmp and delete it at the end of the execution. That would isolate the inpipes of each process and avoid the problem of repeated inpipe names. This would also solve the problem.
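As a rough illustration of this second idea, here is a minimal Python sketch of per-process pipes inside a uniquely named temporary folder that is removed afterwards. It is not how HISAT2 is implemented; the function `private_fifos`, the `hisat2_` prefix and the `base_dir` argument are hypothetical names chosen for the example.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def private_fifos(names=("inpipe1", "inpipe2"), base_dir=None):
    """Create named pipes in a fresh, uniquely named directory (hypothetical helper).

    `base_dir` is the parent folder; None means the system default temp folder.
    The pipes and the directory are removed when the block exits.
    """
    # mkdtemp picks a name that does not exist yet, so two processes
    # on the same node never share a directory.
    tmp_dir = tempfile.mkdtemp(prefix="hisat2_", dir=base_dir)
    paths = [os.path.join(tmp_dir, name) for name in names]
    try:
        for path in paths:
            os.mkfifo(path)
        yield paths
    finally:
        for path in paths:
            if os.path.exists(path):
                os.remove(path)
        os.rmdir(tmp_dir)


# Two "tasks" running at the same time get pipes in different folders.
with private_fifos() as task_a, private_fifos() as task_b:
    print(os.path.dirname(task_a[0]) != os.path.dirname(task_b[0]))  # prints True
```

Passing a sample-specific folder as `base_dir` would correspond to the --temp-directory behaviour requested above.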

Thank you in advance :)

Pascual

@RaqManzano

Hi, I am having the same issue. It would be great to try the proposed solution; it looks simple enough, and this issue really slows things down at larger scale. Thanks @pasviber for doing the detective work.
