
❓ [Question] How to decide if an Op should support dynamic shape or not #3224

Open
sean-xiang-applovin opened this issue Oct 9, 2024 · 6 comments
Labels
question Further information is requested

Comments

@sean-xiang-applovin

❓ Question

Since only some of the ops support dynamic shapes and others do not, what are the criteria for deciding whether an op supports dynamic shapes?

For existing ops that are not marked with supports_dynamic_shapes=True, can I write a converter that wraps the existing converter and mark my own converter as high priority? Is this the recommended way?

Or should I just turn on assume_dynamic_shape_support, which seems to be a global flag for all converters?

What you have already tried

Environment

Build information about Torch-TensorRT can be found by turning on debug messages

  • PyTorch Version (e.g., 1.0): 2.4.1
  • CPU Architecture: x86_64
  • OS (e.g., Linux): linux
  • How you installed PyTorch (conda, pip, libtorch, source): pip
  • Build command you used (if compiling from source):
  • Are you using local sources or building from archives:
  • Python version: 3.11.9
  • CUDA version: 12.1
  • GPU models and configuration: Nvidia L4
  • Any other relevant information:

Additional context

@sean-xiang-applovin added the question label Oct 9, 2024
@narendasan
Collaborator

Since only some of the ops support dynamic shapes and others do not, what are the criteria for deciding whether an op supports dynamic shapes?

  • supports_dynamic_shapes specifies whether a converter's implementation handles dynamic shapes properly.

For existing ops that are not marked with supports_dynamic_shapes=True, can I write a converter that wraps the existing converter and mark my own converter as high priority? Is this the recommended way?

  • You can do this, but if you are just wrapping our converter and flipping supports_dynamic_shapes, then either it is not going to work (because we know the converter doesn't support dynamic shapes) or the supports_dynamic_shapes flag for that converter is stale and a PR flipping it would be welcome. You can absolutely write your own converter that supports dynamic shapes, mark it as high priority, and Torch-TensorRT will use it; a sketch follows below.
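
For reference, a minimal sketch of the wrapping approach, assuming the registry utilities exposed in torch_tensorrt.dynamo.conversion._ConverterRegistry; the wrapped converter (aten_ops_tanh) and target op are placeholders for whichever op you actually need:

import torch
from torch_tensorrt.dynamo.conversion._ConverterRegistry import (
    ConverterPriority,
    dynamo_tensorrt_converter,
)

# Placeholder: the stock converter implementation you want to wrap
from torch_tensorrt.dynamo.conversion.aten_ops_converters import aten_ops_tanh


@dynamo_tensorrt_converter(
    torch.ops.aten.tanh.default,
    priority=ConverterPriority.HIGH,   # picked before the standard-priority registration
    supports_dynamic_shapes=True,      # assert that this implementation handles dynamic dims
)
def my_tanh_converter(ctx, target, args, kwargs, name):
    # Delegate to the stock implementation
    return aten_ops_tanh(ctx, target, args, kwargs, name)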

Or should I just turn on assume_dynamic_shape_support, which seems to be a global flag for all converters?

  • You can do this. The setting existed mostly for the period when we had not yet verified that all converters support dynamic shapes but were fairly sure most did. At this point we expect it to be nearly equivalent to having it off (see the sketch below).
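
For context, a sketch of turning that flag on at compile time; the model and shapes are placeholders, and this assumes the flag is forwarded through torch_tensorrt.compile with ir="dynamo":

import torch
import torch_tensorrt

trt_model = torch_tensorrt.compile(
    model,  # placeholder: your eval-mode CUDA module
    ir="dynamo",
    inputs=[
        torch_tensorrt.Input(
            min_shape=(1, 16),
            opt_shape=(8, 16),
            max_shape=(32, 16),
            dtype=torch.float32,
        )
    ],
    # Treat converters as dynamic-shape capable even if not marked so:
    assume_dynamic_shape_support=True,
)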

If you are finding Core ATen ops that we convert but that don't support dynamic shapes, please file issues; my impression is that we should cover nearly all of them at this point. cc @apbose @chohk88

@sean-xiang-applovin
Author

@narendasan thank you for the explanation; your suggestion makes sense to me.

By the way, I originally asked this question because I am seeing torch.ops.aten._embedding_bag_forward_only.default in my graph after decomposition, and I see we support torch.ops.aten.embedding_bag.default only for static shapes.

I haven't tried it yet, but I plan to convert torch.ops.aten._embedding_bag_forward_only.default with the existing converter; that's why I am asking this question.

Is torch.ops.aten._embedding_bag_forward_only.default an op that is simply missing coverage, or is it intentionally not covered?

@narendasan
Collaborator

Seems like embedding bag forward only is a new op in Core ATen.
@zewenli98, any thoughts about supporting embedding bag forward only?

@zewenli98
Collaborator

To my knowledge, torch.ops.aten._embedding_bag_forward_only.default is the same as torch.ops.aten._embedding_bag.default. @sean-xiang-applovin, I think the most convenient way for you to try it out is to add the decorator below to the existing converter:

# Add this decorator on top of the existing stack in
# torch_tensorrt/dynamo/conversion/aten_ops_converters.py:
@dynamo_tensorrt_converter(
    torch.ops.aten._embedding_bag_forward_only.default,
    capability_validator=embedding_bag_validator,
    supports_dynamic_shapes=True,
)
# Existing decorators and converter:
@dynamo_tensorrt_converter(
    torch.ops.aten.embedding_bag.default,
    capability_validator=embedding_bag_validator,
    supports_dynamic_shapes=True,
)
@dynamo_tensorrt_converter(
    torch.ops.aten._embedding_bag.default,
    capability_validator=embedding_bag_validator,
    supports_dynamic_shapes=True,
)
@enforce_tensor_types(
    {
        0: (TRTTensor,),
        1: (TRTTensor,),
    }
)
def aten_ops_embedding_bag(
    ctx: ConversionContext,
    target: Target,
    args: Tuple[Argument, ...],
    kwargs: Dict[str, Argument],
    name: str,
) -> Union[TRTTensor, Sequence[TRTTensor]]:
    return impl.embedding.embedding_bag(
        ctx,
        target,
        SourceIR.ATEN,
        name,
        weight=args[0],
        indices=args[1],
        offsets=args[2],
        mode=args_bounds_check(args, 4, 0),
        per_sample_weights=args_bounds_check(args, 6, None),
        include_last_offset=args_bounds_check(args, 7, False),
    )
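
If it helps, here is a sketch of exercising the converter with a dynamic number of indices; the module, sizes, and dtypes are made up for illustration:

import torch
import torch_tensorrt

class Bag(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.bag = torch.nn.EmbeddingBag(100, 16, mode="sum")

    def forward(self, indices: torch.Tensor, offsets: torch.Tensor) -> torch.Tensor:
        return self.bag(indices, offsets)

model = Bag().eval().cuda()
indices = torch.randint(0, 100, (32,), dtype=torch.int32, device="cuda")
offsets = torch.tensor([0, 8, 16, 24], dtype=torch.int32, device="cuda")

trt_model = torch_tensorrt.compile(
    model,
    ir="dynamo",
    inputs=[
        # Number of indices varies between 8 and 128 at runtime
        torch_tensorrt.Input(min_shape=(8,), opt_shape=(32,), max_shape=(128,), dtype=torch.int32),
        torch_tensorrt.Input(min_shape=(2,), opt_shape=(4,), max_shape=(16,), dtype=torch.int32),
    ],
)
print(trt_model(indices, offsets))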

@sean-xiang-applovin
Author

Thanks for your suggestion @zewenli98. I remember I tried this, and the code failed on a strange shape assertion in impl.embedding.embedding_bag. Sorry, I can't paste the error here since it was some days ago, but I remember checking in the debugger: the two shapes were the same but seemed to be of different types, so the assertion failed.

I had to comment out the assertion to let it compile. However, compilation then failed for other reasons, so I am switching to the traditional ONNX path for now.

@zewenli98
Collaborator

@sean-xiang-applovin Thanks for letting us know. It looks like the assertion you pointed out only checks their shapes, not types.

If you have runnable code at hand, could you try passing None for per_sample_weights? That would bypass the if branch and show whether it is the root cause.

Besides, I'm wondering whether you passed in 1-dimensional indices. If so, can you provide more details about the other compilation failures? I'm happy to debug it for you if you can extract a minimal model (see the sketch below).
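
In case it is useful for extracting that minimal model, a sketch that checks which embedding_bag variant the decomposition produces, with per_sample_weights left at its default of None and 1-dimensional indices:

import torch

class Bag(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.bag = torch.nn.EmbeddingBag(100, 16, mode="sum")

    def forward(self, indices: torch.Tensor, offsets: torch.Tensor) -> torch.Tensor:
        # per_sample_weights stays None, which should skip the branch in question
        return self.bag(indices, offsets)

ep = torch.export.export(
    Bag(),
    (torch.randint(0, 100, (32,)), torch.tensor([0, 8, 16, 24])),
).run_decompositions()
# Look for aten._embedding_bag_forward_only vs aten._embedding_bag here:
print(ep.graph)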
