Can you pls identify what in nekRS triggers this?
… On 25 Feb 2021, at 16:07, zjin-lcf ***@***.***> wrote:
Has anyone encountered the following error when running the example built with the OCCA HIP version? Thanks.
nrsmpi ethier 1
...
key: SCALAR00 INITIAL GUESS DEFAULT, value: EXTRAPOLATION
key: SCALAR00 PRECONDITIONER, value: JACOBI
key: SCALAR00 SOLVER TOLERANCE, value: 1.000000e-12
key: SCALAR00 DIFFUSIVITY, value: 1.000000e-02
key: SCALAR00 DENSITY, value: 1.000000e+00
key: SCALAR01 INITIAL GUESS DEFAULT, value: EXTRAPOLATION
key: SCALAR01 PRECONDITIONER, value: JACOBI
key: SCALAR01 SOLVER TOLERANCE, value: 1.000000e-12
key: SCALAR01 DIFFUSIVITY, value: 1.000000e-02
key: SCALAR01 DENSITY, value: 1.000000e+00
key: SCALAR SOLVER, value: PCG
key: SCALAR BASIS, value: NODAL
key: SCALAR DISCRETIZATION, value: CONTINUOUS
key: BUILD ONLY, value: FALSE
key: DATA FILE, value: /path/to/nekRS-HIP/examples/ethier/.cache/udf/udf.okl
key: CI-MODE, value: 0
device memory usage: 0.0635219 GB
initialization took 289.073 s
timestepping for 100 steps ...
**:0:rocdevice.cpp :2303: 747494304564 us: Device::callbackQueue aborting with status: 0x100f**
[92464] *** Process received signal ***
[node:92464] Signal: Aborted (6)
[node:92464] Signal code: (-6)
[node:92464] [ 0] /lib64/libpthread.so.0(+0xf630)[0x2b5b86a4c630]
[node:92464] [ 1] /lib64/libc.so.6(gsignal+0x37)[0x2b5b86c8f3d7]
[node:92464] [ 2] /lib64/libc.so.6(abort+0x148)[0x2b5b86c90ac8]
[node:92464] [ 3] /opt/rocm-4.0.0/hip/lib/libamdhip64.so.4(+0x19e82b)[0x2b5b8807582b]
[node:92464] [ 4] /opt/rocm-4.0.0/lib/libhsa-runtime64.so.1(+0x306bf)[0x2b5b89a396bf]
[node:92464] [ 5] /opt/rocm-4.0.0/lib/libhsa-runtime64.so.1(+0x6569b)[0x2b5b89a6e69b]
[node:92464] [ 6] /opt/rocm-4.0.0/lib/libhsa-runtime64.so.1(+0x166c7)[0x2b5b89a1f6c7]
[node:92464] [ 7] /lib64/libpthread.so.0(+0x7ea5)[0x2b5b86a44ea5]
[node:92464] [ 8] /lib64/libc.so.6(clone+0x6d)[0x2b5b86d579fd]
[node:92464] *** End of error message ***
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node node exited on signal 6 (Aborted).
--------------------------------------------------------------------------