[mlir-tensorrt] Add support for non-DPS calling convention #258
base: main
mlir-tensorrt/compiler/lib/Dialect/TensorRTRuntime/IR/TensorRTRuntime.cpp
if (result.getType().isa<TensorType>() != allTensors) {
  return emitOpError("all results must be of the same type (all tensors "
                     "or all memrefs)");
}
We should also be verifying the layout (strides + offset) of memref results. For example,
`trtrt.alloc_enqueue .... -> memref<?x?x?xf32>`
implies the identity layout (canonical row-major strides), whereas
`trtrt.alloc_enqueue .... -> memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>`
indicates that the strides are unknown.
Knowing nothing about the strides is the worst case, since it disables many possible optimizations.
We need this information from TensorRT: which result layouts are possible or allowed? Can we guarantee, using the output allocator, that TensorRT always returns canonical strides? If so, the verifier should enforce canonical layouts.
Currently, for `trtrt.enqueue`, we effectively enforce canonical strides for input and output buffers.
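To pin down what "canonical strides" means for a concrete shape (for example, as a runtime sanity check on a buffer whose shape TensorRT reports), here is a minimal C++ sketch. The names `canonicalStrides` and `isCanonicalRowMajor` are hypothetical and are not part of MLIR or TensorRT; an MLIR verifier would instead inspect the memref's layout attribute (e.g. whether it is the identity layout). Note that a rank-3 memref carries exactly three strides.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Computes canonical row-major strides for a concrete shape: the innermost
// stride is 1 and each outer stride is the product of all inner dim sizes.
std::vector<int64_t> canonicalStrides(const std::vector<int64_t> &shape) {
  std::vector<int64_t> strides(shape.size(), 1);
  int64_t running = 1;
  for (size_t i = shape.size(); i-- > 0;) {
    assert(shape[i] >= 0 && "dimension sizes must be concrete and non-negative");
    strides[i] = running;
    running *= shape[i];
  }
  return strides;
}

// Returns true if (strides, offset) describe the identity layout for `shape`.
// A rank-N buffer must have exactly N strides and a zero offset.
bool isCanonicalRowMajor(const std::vector<int64_t> &shape,
                         const std::vector<int64_t> &strides, int64_t offset) {
  if (strides.size() != shape.size() || offset != 0)
    return false;
  return strides == canonicalStrides(shape);
}
```

For a `2x3x4` buffer the canonical strides are `[12, 4, 1]`; any other stride vector, a nonzero offset, or a stride count that does not match the rank fails the check.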
As I mentioned in the other thread, MLIR-TRT does not use the `nvinfer1::ITensor::setAllowedFormats` API, which is what permits formats other than `nvinfer1::TensorFormat::kLINEAR`. So, for all practical purposes, the strides here are canonical (`kLINEAR` is the row-major linear format).
I will add a verifier check for canonical strides. Let me know if that is what you meant; I may have misunderstood.
Also, how can I generate this assembly format: `memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>`?
This commit introduces support for a non-Destination-Passing Style (non-DPS) calling convention in mlir-tensorrt, while maintaining the existing DPS-style interface.
The changes allow users to compile and execute an mlir-tensorrt executable without allocating output memrefs in advance. Removing this restriction relieves users of computing output shapes and allocating output memrefs ahead of time, which matters because that work sits inside the performance-critical execution loop.
Deferred output allocation has an additional advantage: we no longer need to allocate output buffers sized to the shape upper bound and then copy the exact-size results from TensorRT into the mlir-tensorrt output buffers.
This approach also allows us to support data-dependent shapes, since outputs no longer need to be allocated before execution.
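To make the contrast concrete, here is a minimal host-only C++ sketch of the two conventions for an op whose output size is data dependent (it keeps only the positive input elements). The function names are hypothetical and the code does not use the mlir-tensorrt runtime API; it only shows who allocates the output and why a DPS caller must over-allocate to the upper bound and then copy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// DPS style: the caller passes a preallocated output buffer. The true output
// size is unknown before execution, so the caller sizes the buffer to the
// upper bound (input.size()) and the callee returns the valid element count.
size_t filterPositiveDPS(const std::vector<float> &input,
                         std::vector<float> &out) {
  assert(out.size() >= input.size() && "output must be sized to the upper bound");
  size_t count = 0;
  for (float v : input)
    if (v > 0.0f)
      out[count++] = v;
  return count;
}

// Non-DPS style: the callee allocates and returns an output of exact size.
std::vector<float> filterPositiveNonDPS(const std::vector<float> &input) {
  std::vector<float> out;
  for (float v : input)
    if (v > 0.0f)
      out.push_back(v);
  return out;
}
```

With the DPS variant, a caller that needs an exact-size result still has to copy `buf[0..n)` into a fresh buffer; the non-DPS variant avoids both the upper-bound allocation and that copy.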
The non-DPS calling convention implementation uses the `nvinfer1::IOutputAllocator` interface for deferred output allocation. The implementations of interface functions such as `reallocateOutputAsync` and `notifyShape` record the allocated output buffer address and its shape.
User call sites with the existing DPS-style calling convention:
Now, with the non-DPS calling convention, we can do the following:
Key changes include:
- `Plan` dialect to implement the non-DPS calling convention:
  - `PlanAllocTensorsPass` and `CreateClosedRegionsPass` to handle the non-DPS convention
  - `CallAllocOp` and `CallOp`, respectively.
- `ConvertTensorRTToTensorRTRuntime` to support both calling conventions
- `EnqueueAllocOp` for non-DPS style execution
- `EnqueueAllocOp` to executor `CallOp`.
- `OutputAllocator` and `CustomTensorRTOuputAllocator` classes with proper lifetime management using `OutputAllocatorTracker`.
- `executeFunctionWithLuaBackend` to support both DPS and non-DPS styles
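As a rough illustration of the deferred-allocation pattern behind the allocator and tracker classes listed above (not the PR's actual `OutputAllocator`, `CustomTensorRTOuputAllocator`, or `OutputAllocatorTracker`), here is a host-only C++ sketch. `HostOutputAllocator` and `AllocatorTracker` are hypothetical names, `Dims` is a simplified stand-in for `nvinfer1::Dims`, and the callbacks only mirror the names of `IOutputAllocator`'s methods; TensorRT's `reallocateOutputAsync` additionally takes a CUDA stream and would typically allocate device memory.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Simplified stand-in for nvinfer1::Dims (assumption, not the TensorRT type).
struct Dims {
  std::vector<int64_t> d;
};

// Hypothetical allocator whose callbacks mirror nvinfer1::IOutputAllocator.
// It does not derive from the TensorRT interface, omits the CUDA stream
// argument, and uses host memory so it runs without TensorRT or CUDA.
class HostOutputAllocator {
public:
  HostOutputAllocator() = default;
  HostOutputAllocator(const HostOutputAllocator &) = delete;
  HostOutputAllocator &operator=(const HostOutputAllocator &) = delete;
  ~HostOutputAllocator() { std::free(buffer); }

  // Invoked when the engine needs `size` bytes for an output. Grows the buffer
  // if needed and records its address. Alignment is ignored here (malloc's
  // default alignment is assumed to be enough for this sketch).
  void *reallocateOutput(const char *tensorName, void * /*currentMemory*/,
                         uint64_t size, uint64_t /*alignment*/) {
    assert(tensorName != nullptr && "engine passes the output tensor name");
    if (size > capacity) {
      void *grown = std::realloc(buffer, size);
      if (grown == nullptr)
        return nullptr; // allocation failed; the old buffer stays valid
      buffer = grown;
      capacity = size;
    }
    return buffer;
  }

  // Invoked once the output's final shape is known; records it.
  void notifyShape(const char * /*tensorName*/, const Dims &dims) {
    shape = dims;
  }

  void *data() const { return buffer; }
  const Dims &getShape() const { return shape; }

private:
  void *buffer = nullptr;
  uint64_t capacity = 0;
  Dims shape;
};

// Hypothetical tracker that owns allocators so an output buffer outlives the
// enqueue that produced it, until the caller explicitly releases it.
class AllocatorTracker {
public:
  HostOutputAllocator *create(const std::string &key) {
    auto &slot = allocators[key];
    slot = std::make_unique<HostOutputAllocator>();
    return slot.get();
  }
  HostOutputAllocator *get(const std::string &key) const {
    auto it = allocators.find(key);
    return it == allocators.end() ? nullptr : it->second.get();
  }
  void release(const std::string &key) { allocators.erase(key); }

private:
  std::map<std::string, std::unique_ptr<HostOutputAllocator>> allocators;
};
```

In a real run, TensorRT invokes these callbacks during `enqueueV3` and does not own the allocator, so something like the tracker has to keep each allocator (and its buffer) alive at least until the results have been consumed.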