Add an experimental docker image #171

Open
wants to merge 3 commits into
base: master

kazuki0824
Member

@kazuki0824 kazuki0824 commented Mar 1, 2022

Run CPU-only tasks on any host (e.g. builds, tests)

DOCKER_BUILDKIT=1 docker build . -t perception_build_test -f Dockerfile.perceptioncamtest
docker run --rm -it --platform linux/arm64/v8  --network host -e DISPLAY=$DISPLAY -v /tmp/.X11-unix/:/tmp/.X11-unix perception_build_test

and then run this in the container.

# colcon build --continue-on-error

Run with GPU (still in progress; only works on a real Jetson host)

docker build . -t perception_build_test -f Dockerfile.perceptioncamtest
docker run --rm -it --gpus all --platform linux/arm64/v8  --network host -e DISPLAY=$DISPLAY -v /tmp/.X11-unix/:/tmp/.X11-unix perception_build_test

and then, for example, run this in the container.

cp -r /usr/local/cuda/samples /tmp
cd /tmp/samples/1_Utilities/deviceQuery
make -j`nproc`

/tmp/samples/bin/aarch64/linux/release/deviceQuery 

If you use a GUI, run xhost +local:root as a non-root user to let the container reach the X server, and xhost -local:root to revoke that access afterwards.
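The two docker run invocations above differ only in the --gpus all flag. As a minimal sketch, a small shell function can print the right command for each mode. The function name perception_run_cmd and its cpu/gpu argument are hypothetical and not part of this PR; the flags and image name are copied from the commands above.

```shell
#!/bin/sh
# Hypothetical helper (not part of this PR): print the docker run command
# for the perception_build_test image in CPU-only or GPU mode.
perception_run_cmd() {
  mode="$1"      # "gpu" adds --gpus all; anything else is treated as CPU-only
  gpu_flag=""
  if [ "$mode" = "gpu" ]; then
    gpu_flag="--gpus all "
  fi
  # Flags are taken verbatim from the commands in this PR description.
  printf 'docker run --rm -it %s--platform linux/arm64/v8 --network host -e DISPLAY=%s -v /tmp/.X11-unix/:/tmp/.X11-unix perception_build_test\n' \
    "$gpu_flag" "${DISPLAY:-}"
}
```

Calling perception_run_cmd cpu or perception_run_cmd gpu only prints the command, so it can be reviewed before it is run by hand.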

@kazuki0824 kazuki0824 force-pushed the feature/perception-img-docker branch from 8bd4dca to 632ca64 on March 1, 2022 17:48
@kazuki0824
Member Author

kazuki0824 commented Mar 1, 2022

Known issues when using with x86 + GPU

GPU code run in the container on an x86 host does not work. I suspect the CUDA build shipped with L4T is compatible ONLY with Tegra chips.

To reproduce, run:

cp -r /usr/local/cuda/samples /tmp
cd /tmp/samples/1_Utilities/deviceQuery
make -j`nproc`

/tmp/samples/bin/aarch64/linux/release/deviceQuery 

It then emits the error below:

root@devenv:/tmp/samples/1_Utilities/deviceQuery# /tmp/samples/bin/aarch64/linux/release/deviceQuery 
/tmp/samples/bin/aarch64/linux/release/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 35
-> CUDA driver version is insufficient for CUDA runtime version
Result = FAIL


Next, run apt install nvidia-container-csv* -y and then run /tmp/samples/bin/aarch64/linux/release/deviceQuery again. The output becomes:

root@devenv:/tmp/samples/1_Utilities/deviceQuery# /tmp/samples/bin/aarch64/linux/release/deviceQuery 
/tmp/samples/bin/aarch64/linux/release/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

NvRmPrivGetChipIdLimited: Could not read Tegra chip id/rev 
Expected on kernels without fuse support, using Tegra K1
NvRmPrivGetChipPlatform: Could not read platform information 
Expected on kernels without fuse support, using silicon
libnvrm_gpu.so: NvRmGpuLibOpen failed
cudaGetDeviceCount returned 999
-> unknown error
Result = FAIL

However, if you uncomment

#RUN apt install nvidia-container -y
#RUN apt install nvidia-container-csv-* -y

in the Dockerfile, rebuild the image, and then run /tmp/samples/bin/aarch64/linux/release/deviceQuery in the container, the error code is 35, not 999.
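To make it easier to compare the two failure modes across rebuilds, here is a minimal sketch that extracts the code from the deviceQuery output and maps it to the CUDA runtime enum name. The function names device_count_code and explain_cuda_error are hypothetical. The mapping covers only the two codes seen in this thread: 35 is cudaErrorInsufficientDriver and 999 is cudaErrorUnknown, which matches the messages printed in the logs above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of this PR).

# Read deviceQuery output on stdin and print the numeric code from the
# "cudaGetDeviceCount returned N" line, or nothing if the line is absent.
device_count_code() {
  sed -n 's/^cudaGetDeviceCount returned \([0-9][0-9]*\).*$/\1/p'
}

# Map the two codes observed in this thread to their CUDA runtime enum names.
explain_cuda_error() {
  case "$1" in
    35)  echo "cudaErrorInsufficientDriver" ;;
    999) echo "cudaErrorUnknown" ;;
    *)   echo "unmapped" ;;
  esac
}
```

Inside the container this could be used as explain_cuda_error "$(/tmp/samples/bin/aarch64/linux/release/deviceQuery | device_count_code)". An empty or unlisted code is reported as unmapped rather than guessed.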

@kazuki0824
Member Author

@hakuturu583 please check that "colcon build" and "colcon test" run correctly on x86. The expected result is that tensorrt_yolox fails with compilation errors while all other packages build and test properly.

@hakuturu583
Member

Can you add documentation here?
https://github.com/OUXT-Polaris/ouxt_automation/tree/master/docs

Member

@hakuturu583 hakuturu583 left a comment

Please add documentation.
