
Recursion limit reached #31

Open
Simarpreet-Kaur-Bhurji opened this issue Jul 25, 2024 · 18 comments

Comments

@Simarpreet-Kaur-Bhurji

Hello,
While running FastOMA on 2200 species, I encountered another mafft segmentation fault. When I resumed Nextflow, it no longer complained about the segmentation fault, but I got a "recursion limit reached" error instead. Please find the log file of the run attached. Do you know what's going on?

recurrsion_depth_err.log

@alpae
Member

alpae commented Jul 26, 2024

Hi @Simarpreet-Kaur-Bhurji,

looks like this happened during the communication between different threads; not sure what exactly goes on there. Could you share with us the whole work folder of that failing step (/hps/nobackup/flicek/ensembl/compara/sbhurji/Development/fastoma_run/work/18/6d8a8694445b6226830f618af3bf2f), including the data for the roothog D0138574? Something like this should work: cd /hps/nobackup/flicek/ensembl/compara/sbhurji/Development/fastoma_run/work/18/6d8a8694445b6226830f618af3bf2f; tar -cvzhf dump.tgz .

@Simarpreet-Kaur-Bhurji
Author

Hi Adrian, thank you for getting in touch. I kept the log message, but unfortunately I deleted the work directory in anticipation of the rerun. In the meantime I will rerun it and let you know if I hit this issue again.

@sinamajidian
Collaborator

No worries.
For the future, it would also be helpful to know whether the task ran out of memory. I have seen a case where a segmentation fault happened due to a lack of memory. By default, FastOMA retries the Slurm job three times, increasing the allocated memory each time.

To check the Slurm job, you can go to the relevant work folder and find its job name and job ID (e.g. with sacct), which can be used with seff to see whether it ran out of memory (please see the end of this wiki for an example).

$ head -n2 .command.run 
#!/bin/bash
#SBATCH -J nf-hog_rest_(25)
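To automate that lookup across many work folders, here is a small sketch that pulls the `#SBATCH -J` job name out of a Nextflow `.command.run` script so it can be passed to `sacct --name=…` (the helper name is hypothetical; the script text is just the example above):

```python
import re

def slurm_job_name(command_run_text):
    """Return the #SBATCH -J job name from a Nextflow .command.run script, or None."""
    for line in command_run_text.splitlines():
        m = re.match(r"#SBATCH\s+-J\s+(\S+)", line)
        if m:
            return m.group(1)
    return None

script = "#!/bin/bash\n#SBATCH -J nf-hog_rest_(25)\n"
print(slurm_job_name(script))  # nf-hog_rest_(25)
```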

@Simarpreet-Kaur-Bhurji
Author

Thank you, that is helpful. I will check it for this run.

@Simarpreet-Kaur-Bhurji
Author

Hi Sina,
I have run the pipeline again, hit the mafft segmentation fault, and bumped the memory based on your previous suggestion. After that it seemed to run for two days, and now it has failed again with the recursion-limit-reached error. Please find the work folder attached. Let me know if you need any other details. Thank you.
dump.tgz

@alpae alpae closed this as completed in 3f451d1 Aug 8, 2024
@sinamajidian sinamajidian reopened this Aug 8, 2024
@sinamajidian
Collaborator

sinamajidian commented Aug 8, 2024

Hi Simarpreet
The fix was on another branch, and I think you ran the same code. Adrian has just updated the main branch, so the latest code shouldn't hit the recursion limit. To save time/computation, you can run only this rootHOG (using the .command.sh) to see whether the problem is solved.

@Simarpreet-Kaur-Bhurji
Author

Hi Sina,
I pulled the latest changes and reran, but I still got the recursion-limit-reached error. Do you think it has to do with the data, given that Triticum aestivum is usually troublesome because of its size? Please find the work dir attached.
wheat_roothog_dir.tgz

@alpae
Member

alpae commented Aug 21, 2024

Hi @Simarpreet-Kaur-Bhurji ,

I've uploaded a fix for this issue (hopefully this time for real). You could try it by updating the repo to the dev branch and submitting the .command.run from the failing work directory. If you use containers, you should ensure that the dessimozlab/fastoma:sha-1aa97b8 image is used, e.g. docker pull dessimozlab/fastoma:sha-1aa97b8. Please let us know if this fixes your issue.

@Simarpreet-Kaur-Bhurji
Author

Hey Adrian, thank you for looking into this. At the moment our servers are under scheduled maintenance; I will let you know if this fixes it. Thank you. I would request that this issue be kept open until then.

@sinamajidian
Collaborator

Btw, if you share the fasta file of the rootHOG (inside the folder fastoma_run/work/30/45bab08427770d06e1b9e5f1f5d282/rhogs_big/58) with us, I can run it and make sure the issue is resolved.

@Simarpreet-Kaur-Bhurji
Author

Hi Sina, sure thing, and thank you for helping with this. Please find the fasta file attached. Just to let you know, I have also rerun the pipeline at my end, but it will be a while until it reaches that step, so it would be great if you could check whether the issue is resolved.
HOG_D0138574.fa.gz

@sinamajidian
Collaborator

Thanks. Yes, it finished successfully on our cluster. Hope it will be smooth on your side.

2024-08-29 04:25:47 DEBUG    Inferring subHOGs for batch of 1 rootHOGs started.
2024-08-29 04:25:48 INFO     number of proteins in the rHOG is 20269.
2024-08-29 04:25:48 INFO     Number of unique species in rHOG D0138574 is 18.
...
2024-08-29 04:41:26 INFO     All subHOGs for the rootHOG D0138574 as OrthoXML format is written in pickle_hogs/file_D0138574.pickle

@Simarpreet-Kaur-Bhurji
Author

Thank you so much for testing this on your side; I will let you know how the run goes for us, fingers crossed.

@Simarpreet-Kaur-Bhurji
Author

Hi Sina and Adrian, sorry it has taken a while for me to get back to you. As it stands, the run was still not complete on my end. When I got the segmentation fault, I tried to increase the memory by updating the FastOMA.nf file as follows:

memory { mem_cat(getMaxFileSize(rhogsbig), nr_species as int) * task.attempt * 3 }
time { time_cat(getMaxFileSize(rhogsbig), nr_species as int) * task.attempt }

After that I again got the segmentation fault error, but with maxwm <- 0.0. The error and log files are attached herewith.

command.log.txt
command.err.txt

The fasta file:
HOG_D0138736.fa.txt

The zipped folder is larger than the size GitHub allows, so I can send it via email.
I also tried sacct, but I no longer have the job ID of the affected job, since it has been a while since I last ran it. Please let me know what you think is going on here.
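The memory override above follows the usual Nextflow pattern of a size-based base category scaled by the retry attempt and an extra safety factor. A rough Python sketch of that logic, purely for illustration (the function names, thresholds, and units here are hypothetical, not FastOMA's actual values):

```python
def mem_cat_gb(max_file_size_bytes, nr_species):
    """Hypothetical base memory category in GB, from input size and tree size."""
    base = 4 if max_file_size_bytes < 1_000_000 else 16  # small vs. large rootHOG file
    return base + nr_species // 100                      # more species, more headroom

def task_memory_gb(max_file_size_bytes, nr_species, attempt, factor=3):
    """Scale the base category by the retry attempt and an extra factor,
    mirroring `mem_cat(...) * task.attempt * 3` in the config above."""
    return mem_cat_gb(max_file_size_bytes, nr_species) * attempt * factor
```

On a retry (attempt 2), the allocation doubles relative to attempt 1, which is why failed tasks often succeed on resubmission.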

@srobb1

srobb1 commented Sep 23, 2024

Hello. I am having a RecursionError: maximum recursion depth exceeded error as well.

I am running 0.3.4. I am only running with 15 species.

I am pasting the last good line and the first error lines from my .nextflow.log. I am also attaching a screenshot of my summary report. I am lost as to what I should do next to troubleshoot this issue.

Thank you,
Sofia

from .nextflow.log:
Sep-19 20:50:47.487 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[jobId: 2104348; id: 20; name: infer_roothogs (1); status: COMPLETED; exit: 1; error: -; workDir: /n/sci/SCI-004219-SBCHAMELEO/Chamaeleo_calyptratus/genomes/CCA3-haplotypes/analysis/gene_gain_loss/fastoma/work/85/169b353adc16f9830a97bcb887204c started: 1726787282133; exited: 2024-09-20T01:49:51.671447Z; ]
Sep-19 20:50:47.487 [Task monitor] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for
task: name=infer_roothogs (1); work-dir=/n/sci/SCI-004219-SBCHAMELEO/Chamaeleo_calyptratus/genomes/CCA3-haplotypes/analysis/gene_gain_loss/fastoma/work/85/169b353adc16f9830a97bcb887204c
error [nextflow.exception.ProcessFailedException]: Process infer_roothogs (1) terminated with an error exit status (1)
Sep-19 20:50:47.518 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'infer_roothogs (1)'

Caused by:
Process infer_roothogs (1) terminated with an error exit status (1)

Command executed:

fastoma-infer-roothogs --proteomes proteome --hogmap hogmaps --splice splice --out-rhog-folder "omamer_rhogs" -vv

Command exit status:
1

Command output:
291057
83867
There are 83867 candidate pairs of rhogs for merging.

There are 4776 clusters.

Command error:
^^^^^^^^^^^^^^^^^^
File "/app/lib/python3.11/site-packages/FastOMA/_utils_roothog.py", line 1205, in HCS
H = HCS(sub_graphs[0])
^^^^^^^^^^^^^^^^^^
[Previous line repeated 4 more times]
File "/app/lib/python3.11/site-packages/FastOMA/_utils_roothog.py", line 1198, in HCS
E = nx.algorithms.connectivity.cuts.minimum_edge_cut(G)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<class 'networkx.utils.decorators.argmap'> compilation 4", line 3, in argmap_minimum_edge_cut_1
File "/app/lib/python3.11/site-packages/networkx/utils/backends.py", line 633, in __call__
return self.orig_func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/lib/python3.11/site-packages/networkx/algorithms/connectivity/cuts.py", line 607, in minimum_edge_cut
this_cut = minimum_st_edge_cut(H, v, w, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<class 'networkx.utils.decorators.argmap'> compilation 30", line 3, in argmap_minimum_st_edge_cut_27
File "/app/lib/python3.11/site-packages/networkx/utils/backends.py", line 633, in __call__
return self.orig_func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/lib/python3.11/site-packages/networkx/algorithms/connectivity/cuts.py", line 150, in minimum_st_edge_cut
cut_value, partition = nx.minimum_cut(H, s, t, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<class 'networkx.utils.decorators.argmap'> compilation 34", line 3, in argmap_minimum_cut_31
File "/app/lib/python3.11/site-packages/networkx/utils/backends.py", line 633, in __call__
return self.orig_func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/lib/python3.11/site-packages/networkx/algorithms/flow/maxflow.py", line 457, in minimum_cut
non_reachable = set(dict(nx.shortest_path_length(R, target=_t)))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<class 'networkx.utils.decorators.argmap'> compilation 42", line 3, in argmap_shortest_path_length_39
File "/app/lib/python3.11/site-packages/networkx/utils/backends.py", line 633, in __call__
return self.orig_func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/lib/python3.11/site-packages/networkx/algorithms/shortest_paths/generic.py", line 301, in shortest_path_length
G = G.reverse(copy=False)
^^^^^^^^^^^^^^^^^^^^^
File "/app/lib/python3.11/site-packages/networkx/classes/digraph.py", line 1334, in reverse
return nx.reverse_view(self)
^^^^^^^^^^^^^^^^^^^^^
File "<class 'networkx.utils.decorators.argmap'> compilation 46", line 4, in argmap_reverse_view_43
File "/app/lib/python3.11/site-packages/networkx/classes/graphviews.py", line 266, in reverse_view
newG = generic_graph_view(G)
^^^^^^^^^^^^^^^^^^^^^
File "/app/lib/python3.11/site-packages/networkx/classes/graphviews.py", line 104, in generic_graph_view
newG = G.__class__()
^^^^^^^^^^^^^
File "/app/lib/python3.11/site-packages/networkx/classes/digraph.py", line 350, in __init__
self._node = self.node_dict_factory() # dictionary for node attr
^^^^^^^^^^
RecursionError: maximum recursion depth exceeded

report_2024-09-19_09-22-09.html
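The traceback above shows the recursive HCS call blowing past CPython's default recursion limit (roughly 1000 frames). A minimal, self-contained illustration of that failure mode and the usual stop-gap, sys.setrecursionlimit — note this is only a sketch of the mechanism, not FastOMA's actual fix (which restructured the code on the dev branch):

```python
import sys

def depth(n):
    """Recurse n times; fails past the interpreter's recursion limit."""
    if n == 0:
        return 0
    return 1 + depth(n - 1)

hit_limit = False
try:
    depth(5000)                # exceeds CPython's default limit (~1000 frames)
except RecursionError:
    hit_limit = True

sys.setrecursionlimit(20_000)  # stop-gap: raise the limit...
result = depth(5000)           # ...now the same call completes
```

Raising the limit only trades a clean RecursionError for a possible hard crash if the C stack overflows, which is why rewriting deep recursion iteratively (as done in the fix) is the durable solution.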

@sinamajidian
Collaborator

Hi @srobb1
Thanks for reaching out. We believe we fixed this issue with the update provided in the dev branch (discussed above on this page). Please let us know if it helps in your case as well. Feel free to open a new GitHub issue if the problem continues, and please provide us with more info about the system you are using and the tree.
Best,
Sina

@sinamajidian
Collaborator

Hi @Simarpreet-Kaur-Bhurji
It looks like this is a different rootHOG. Could you run the command in the .command.sh (available inside the work folder) for this rootHOG and see how much memory it needs? (It would be best to copy the needed files into a new folder and run with Slurm to get the full log.) Btw, which MAFFT version are you using, and how did you install it?
Yes, please send me the rootHOG; I could try it out too. We would love to arrange our next meeting, probably in mid-October.

Best,
Sina

@Simarpreet-Kaur-Bhurji
Author

Hi Sina, sure thing, we will get in touch via email to schedule our next meeting. We can look into the above issues then.
