[Documentation] CudaContext::AllocDeferredCpuMem #23485

Open
axbycc-mark opened this issue Jan 24, 2025 · 0 comments
Labels
documentation: improvements or additions to documentation; typically submitted using template

axbycc-mark commented Jan 24, 2025

Describe the documentation issue

The official documentation on implementing custom ops links to the repo code below.

void KernelOne(const Ort::Custom::CudaContext& cuda_ctx,
               const Ort::Custom::Tensor<float>& X,
               const Ort::Custom::Tensor<float>& Y,
               Ort::Custom::Tensor<float>& Z) {
  CUSTOM_ENFORCE(cuda_ctx.cuda_stream, "failed to fetch cuda stream");
  CUSTOM_ENFORCE(cuda_ctx.cudnn_handle, "failed to fetch cudnn handle");
  CUSTOM_ENFORCE(cuda_ctx.cublas_handle, "failed to fetch cublas handle");
  CUSTOM_ENFORCE(cuda_ctx.arena_extend_strategy == 0, "arena_extend_strategy mismatch");
  void* deferred_cpu_mem = cuda_ctx.AllocDeferredCpuMem(sizeof(int32_t));
  CUSTOM_ENFORCE(deferred_cpu_mem, "failed to allocate deferred cpu allocator");
  cuda_ctx.FreeDeferredCpuMem(deferred_cpu_mem);
  auto z_raw = Z.Allocate(X.Shape());
  cuda_add(Z.NumberOfElement(), z_raw, X.Data(), Y.Data(), cuda_ctx.cuda_stream);
}

Line 35 of the linked file calls cuda_ctx.AllocDeferredCpuMem. The returned memory is checked for null and then immediately freed without ever being read or written. This raises some questions.

  • Is line 35 just dead code?
  • Why would we use the CudaContext::deferred_cpu_allocator over the standard allocator (malloc, free)?
  • What is the meaning of deferred? Do we have to wait for some condition before the memory becomes usable?
  • Are there any cases within a custom op kernel where the deferred_cpu_allocator will be empty?
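
To make the third question concrete, here is one possible reading of "deferred", written as a toy sketch. It is an assumption on my part, not something the documentation confirms: allocation is immediate, but Free only queues the buffer, and the actual release happens once the stream's queued work has finished. Every name below (FakeStream, ToyDeferredAllocator, RunDeferredFreeDemo) is hypothetical and none of this calls ONNX Runtime or CUDA.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical stand-in for a GPU stream. It only tracks host buffers whose
// release has been postponed until the stream's queued work is complete.
class FakeStream {
 public:
  ~FakeStream() { Synchronize(); }

  // Queue a buffer for release instead of freeing it right away.
  void EnqueueDeferredFree(void* p) { pending_.push_back(p); }

  // Stand-in for a synchronization point: queued work is assumed finished
  // here, so the postponed buffers can finally be released.
  void Synchronize() {
    for (void* p : pending_) std::free(p);
    pending_.clear();
  }

  std::size_t PendingFrees() const { return pending_.size(); }

 private:
  std::vector<void*> pending_;
};

// Hypothetical allocator (an assumption, not the ONNX Runtime implementation):
// Alloc returns usable memory immediately, Free is deferred to the stream.
class ToyDeferredAllocator {
 public:
  explicit ToyDeferredAllocator(FakeStream& stream) : stream_(stream) {}

  void* Alloc(std::size_t size) {
    return size == 0 ? nullptr : std::malloc(size);
  }

  void Free(void* p) {
    if (p != nullptr) stream_.EnqueueDeferredFree(p);
  }

 private:
  FakeStream& stream_;
};

struct DemoResult {
  std::size_t pending_after_free;
  std::size_t pending_after_sync;
};

// Mirrors the alloc-then-free pattern from the kernel above and reports how
// many buffers are still waiting for release at each point.
DemoResult RunDeferredFreeDemo() {
  FakeStream stream;
  ToyDeferredAllocator alloc(stream);

  void* mem = alloc.Alloc(sizeof(int));
  assert(mem != nullptr);  // memory is usable right after Alloc
  alloc.Free(mem);         // not released yet, only queued

  DemoResult result;
  result.pending_after_free = stream.PendingFrees();
  stream.Synchronize();    // the queued buffer is released here
  result.pending_after_sync = stream.PendingFrees();
  return result;
}
```

If this reading is right, the benefit over malloc and free would be that a host buffer handed to an asynchronous copy stays valid until the stream is done with it. Whether the real allocator behaves this way is exactly what I would like the documentation to state.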

Page / URL

https://github.com/microsoft/onnxruntime/blob/rel-1.17.0/onnxruntime/test/testdata/custom_op_library/cuda/cuda_ops.cc#L35

axbycc-mark added the documentation label on Jan 24, 2025