CUDA + multiprocessing issue #404

hugokitano · 2022-03-08T19:41:29Z

Describe the bug
The exception "cudaErrorInitializationError: initialization error" is raised inside the multiprocessing pool when using GPU/CUDA on two or more files. It happens in the feature_finding step, but it could potentially affect any point in the workflow where CuPy is used.

To Reproduce
Environment: nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85

Using cupy-cuda115==10.2.0.

Script: following the convention described by test_gpu_.py,

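# Imports assumed from the AlphaPept package layout.
import importlib
import alphapept.performance
import alphapept.feature_finding
import alphapept.interface
from alphapept.settings import load_settings

def main():
    global alphapept
    # Switch to CUDA and reload so feature_finding picks up the new mode.
    alphapept.performance.set_compilation_mode('cuda')
    alphapept.performance.set_worker_count(30)
    importlib.reload(alphapept.feature_finding)

    settings = load_settings('/home/ubuntu/apps/alphapept/test_settings.yaml')
    r = alphapept.interface.import_raw_data(settings)
    r = alphapept.interface.feature_finding(settings)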

where test_settings.yaml contains all the default settings, with two or more files listed under experiment/file_paths.

Error
For three separate files

2022-03-08 18:57:55> No *.hdf file with features found for /mnt/EXP21155/EXP21155_2021ms0603X7_A.ms_data.hdf. Adding to feature finding list.
2022-03-08 18:57:55> Feature finding on /mnt/EXP21155/EXP21155_2021ms0603X7_A.raw
2022-03-08 18:57:55> Hill extraction with centroid_tol 8 and max_gap 2
2022-03-08 18:57:55> Feature finding of file /mnt/EXP21155/EXP21155_2021ms0603X7_A.raw failed. Exception cudaErrorInitializationError: initialization error
2022-03-08 18:57:55> Processing of /mnt/EXP21155/EXP21155_2021ms0603X7_A.raw for step find_features failed. Exception cudaErrorInitializationError: initialization error
2022-03-08 18:57:55> No *.hdf file with features found for /mnt/EXP21155/EXP21155_2021ms0609X26_A.ms_data.hdf. Adding to feature finding list.
2022-03-08 18:57:56> Feature finding on /mnt/EXP21155/EXP21155_2021ms0609X26_A.raw
2022-03-08 18:57:56> Hill extraction with centroid_tol 8 and max_gap 2
2022-03-08 18:57:56> Feature finding of file /mnt/EXP21155/EXP21155_2021ms0609X26_A.raw failed. Exception cudaErrorInitializationError: initialization error
2022-03-08 18:57:56> Processing of /mnt/EXP21155/EXP21155_2021ms0609X26_A.raw for step find_features failed. Exception cudaErrorInitializationError: initialization error

A Solution?
After some research, I believe I found the source of the problem. Combining multiprocessing pools with CUDA is tricky: CUDA must not be initialized (for example, through a CuPy call) in the parent process before the worker processes are forked. I'm not sure exactly where this happens in the code, but I suspect it is somewhere in the settings management. The solution I found was to call multiprocessing.set_start_method('spawn') ('forkserver' also works).
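As a rough illustration of the workaround, here is a minimal sketch using only the standard library. The names process_file and run_with_start_method are hypothetical and not part of AlphaPept; the worker is a placeholder where the CuPy import and GPU calls would go. The idea is that with 'spawn' each worker starts from a fresh interpreter, so it does not inherit any CUDA state from the parent.

```python
import multiprocessing as mp


def process_file(path):
    # Hypothetical placeholder for the per-file GPU work. In a real worker,
    # the CuPy import and every CUDA call would happen here, inside the
    # child process, never in the parent before the pool is created.
    return f"processed {path}"


def run_with_start_method(paths, method="spawn", workers=2):
    # get_context() returns a context bound to one start method without
    # changing the global default for the rest of the program.
    ctx = mp.get_context(method)
    with ctx.Pool(processes=workers) as pool:
        return pool.map(process_file, paths)


if __name__ == "__main__":
    # The guard matters with 'spawn': each child re-imports this module,
    # and the pool must not be created again during that import.
    print(run_with_start_method(["a.raw", "b.raw"]))
```

Using get_context() instead of set_start_method() keeps the change local to one pool, which may be easier to fit into an existing code base, at the cost of a slower worker start-up than 'fork'.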

The speed and stability of the three options (fork, spawn, forkserver) are up for debate, and I'm not sure whether we can still gain a performance advantage from the GPU if we cannot fork processes. I'm not an expert on multiprocessing, though.

I would like to know whether you can reproduce this problem and suggest a fix. Thank you.


Jude-Zheng · 2022-04-19T09:06:35Z

Hi hugokitano!
My system is Ubuntu 20.04. I have the same problem. Have you solved it?

straussmaximilian · 2022-04-26T08:26:41Z

Hi,
I have never tested analyzing multiple files on the GPU, so this could indeed be an issue, and it may not work out of the box. Historically, the GPU support started as a way to improve performance on a single file. One workaround could be to launch multiple Docker instances, each processing a single file, and then combine the results later in another instance.

However, if anyone has good ideas to get the multiprocessing to work or wants to tackle this, I am all ears.
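For the one-instance-per-file idea, a possible first step is to derive one settings dictionary per input file, so each instance only sees a single entry in experiment/file_paths. This is only a sketch: split_settings_per_file is a hypothetical helper, and it assumes the settings load as a nested dict with a list under settings["experiment"]["file_paths"]. How the per-file results are combined afterwards is not covered here.

```python
import copy


def split_settings_per_file(settings):
    """Return one independent settings dict per entry in file_paths."""
    per_file = []
    for path in settings["experiment"]["file_paths"]:
        # Deep copy so the per-file dicts do not share nested state
        # with each other or with the original settings.
        single = copy.deepcopy(settings)
        single["experiment"]["file_paths"] = [path]
        per_file.append(single)
    return per_file
```

Each resulting dict could then be written to its own YAML file and handed to a separate process or container.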

straussmaximilian added the "help wanted" label Apr 26, 2022
