Failure running my ML workflows #1115
@anuprulez I just ran my third workflow (CNN workflow) on galaxy.eu and it failed. Could you please check the log to see what error message we get? Thanks. |
I only see "Failed to communicate with remote job server." |
@kxk302 in the first and third histories, I don't have permission to see those datasets. Can you unlock those? |
Update: I re-ran the third history after the initial failure and it completed successfully. @anuprulez how do I unlock the datasets? I don't see an option when trying to share history. If you want we can use Gitter to resolve this. Thx |
I see that some changes were made very recently to https://github.com/goeckslab/Galaxy-ML/tree/master/galaxy_ml |
Yes, there was a bug fix in Galaxy-ML that was pushed recently. |
Here are the links to all workflows and datasets for histories: First history: https://training.galaxyproject.org/training-material/topics/statistics/tutorials/FNN/workflows/ |
You need to rename the uploaded files and change their type to tabular before running the workflows. Thx. |
@anuprulez did you downgrade the tool versions in the RNN workflow? |
No, I just ran it
|
If you downgrade the tool versions as I documented, it will work. |
I guess the question is why it stopped working with the new versions of those tools. |
Try checking the package versions in the conda environment, and the Python version as well (make sure it is Python 3.6). The conda environment includes a lot of packages and is prone to errors when a newer package version gets pulled in. |
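The version check suggested above could be sketched roughly as follows. This is only an illustration, not something taken from the thread: the package names in the list are assumptions about what the tool's conda environment contains, and `report_versions` is a hypothetical helper. On Python 3.6 the standard-library `importlib.metadata` module is not available, so the sketch falls back to the `importlib_metadata` backport, which would need to be installed separately.

```python
import sys

try:
    # Standard library on Python 3.8+.
    from importlib import metadata
except ImportError:
    # Backport package for older interpreters such as Python 3.6 (assumed to be installed).
    import importlib_metadata as metadata


def report_versions(packages):
    """Return {distribution name: installed version, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Hypothetical list of packages worth comparing between a working and a failing environment.
    candidates = ["keras", "tensorflow", "scikit-learn", "numpy", "h5py"]
    print("python", sys.version.split()[0])
    for name, version in report_versions(candidates).items():
        print(name, version or "not installed")
```

Running this inside both the old and the new tool environment and comparing the two listings would show which package versions changed between them.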
Thanks @qiagu, Could you please provide more info on how to do that? |
Sorry, I was just describing a general debugging process, not anything specific to the issues mentioned in this thread. From the stderr report @anuprulez provided, I think the errors could be cleared by re-cleaning the input TSVs. |
Try to ensure the classification targets are integers, not floats. |
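One way to check the targets before uploading a TSV is sketched below. It rests on assumptions not stated in the thread: that the target is in the last column, that the file has a header row, and that a value such as `1.0` should be flagged. `non_integer_targets` is a hypothetical helper, not part of Galaxy or Galaxy-ML.

```python
import csv
import io


def non_integer_targets(tsv_text, target_column=-1, has_header=True):
    """Return (line_number, value) pairs whose target value is not an integer literal."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    bad = []
    for line_number, row in enumerate(reader, start=1):
        # Skip blank lines and, if present, the header row.
        if not row or (has_header and line_number == 1):
            continue
        value = row[target_column].strip()
        try:
            int(value)  # "1" parses; "1.0" or "abc" raises ValueError
        except ValueError:
            bad.append((line_number, value))
    return bad


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = "f1\tf2\ttarget\n0.5\t1.2\t1\n0.3\t0.8\t0.0\n"
    for line_number, value in non_integer_targets(sample):
        print("line", line_number, "has a non-integer target:", value)
```

Any rows it reports could then be cleaned (for example, by rewriting `0.0` as `0`) before the file is uploaded and its type set to tabular.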
I do not see the errors that Anup sees. I guess the first step would be to get these workflows working with older versions of the tools. Then we can use the new version to reproduce the problem. @anuprulez not sure what your internet connectivity is like, but we could possibly have a Zoom meeting to discuss tomorrow (Friday). I'm free from 8:00 am to 10:00 am EST. |
That's a job-running error, not a tool error; you'll want to check this with Nate. |
This was run on EU. I vaguely remember Bjorn saying that some jobs are configured to run on GPU, that this error would show up then, and that it would go away when the job was run on CPU. Am I right @bgruening? |
I have 3 workflows that use Galaxy's ML tools (namely Keras for neural networks). They all worked fine last time I ran them (maybe a month ago?).
These 3 workflows are used in 3 neural network tutorials that I am presenting at GCC 2021. I decided to re-run them to make sure all is good. All 3 workflows fail now. Here is the error message for the first 2 workflows:
Here are the histories:
Per @anuprulez's suggestion, I downgraded the tool versions and the first and second workflows work now. Below is the downgrade:
The third workflow still fails. BTW, it requires the most recent version of the third tool.
I started writing unit tests in galaxytools (https://github.com/kxk302/galaxytools/tree/nn_tests), so these workflows are run as part of the unit test. They would serve as regression tests and would guarantee future changes would not break old code. However, I ran into another issue: models saved to file cannot be loaded and error out. Not sure if this is related to the workflow error above. Here is the error message: