Clarify image tiling requirements for cl_khr_external_memory #861

Closed
nikhiljnv opened this issue Nov 22, 2022 · 41 comments

Comments

@nikhiljnv
Contributor

The cl_khr_external_memory spec currently doesn't specify whether tiled images can be imported from APIs like Vulkan and,
more specifically, what minimal tiling modes need to be supported.
This came up during review of KhronosGroup/OpenCL-CTS#1559
This issue tracks required changes to the spec.

@bcalidas

bcalidas commented Jan 9, 2023

We have done some thinking about how the tiling considerations in external memory and image from buffer (#710) intersect with ext_image_tiling_control and existing spec behavior. We worked through some principles by which these considerations could be reconciled, which would let us move forward with the external sharing extension.

The key goals of this exercise were:

  1. To reason about how the current spec implicitly defines image tiling behavior, and to clarify the meaning of different tiling modes without requiring any changes to existing implementations or applications.
  2. To provide a reliable mechanism for creating linear and optimally tiled images from external memory without introducing a dependence on the ext_image_tiling_control extension, so that the external memory spec can move forward independently.
  3. To propose the key features of ext_image_tiling_control that would help bring clarity and flexibility to images created using different approaches.

There's a lot of text in our next comment, but hopefully this summary will help with reviewing our proposal.

Thanks,
Balaji Calidas,
Qualcomm

@bcalidas

bcalidas commented Jan 9, 2023

  1. The OpenCL spec (3.0 and earlier) implicitly sets the tiling mode of directly created images to optimal. Optimal in this context means that
    the implementation is free to pick whichever tiling mode it wants including linear tiling. The application may make no inference
    about the layout of image data in memory. We were able to establish that on other APIs including Vulkan, optimal tiling mode does not exclude a linear layout.

When clEnqueueMapImage is called, the spec requires the implementation to present a linear view of the image data. In some cases this may be accomplished through a detiling operation.

When an image is created from a buffer and the image row pitch is not zero, the tiling mode is implicitly linear, since an application-specified image row pitch would make no sense otherwise. When an image is created from a buffer and the image row pitch is zero, the image tiling is implicitly optimal unless an additional external constraint forces the layout to be linear (for example, an external API may have forced a linear layout).
When present, ext_image_tiling_control can make this behavior explicit (use the optimal tiling flag with a zero row pitch and the linear tiling flag with a non-zero row pitch).
In the absence of ext_image_tiling_control, the implicit behavior prevails.
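To make the implicit rule above concrete, here is a minimal C sketch. The `implicit_tiling` type and the function are hypothetical illustrations, not part of any OpenCL header or extension; they only encode the row-pitch reasoning described in this comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tiling labels used only for this sketch. */
typedef enum { TILING_LINEAR, TILING_OPTIMAL } implicit_tiling;

/* Models the implicit rule for an image created from a buffer:
 * - a non-zero row pitch only makes sense for a linear layout;
 * - a zero row pitch leaves the layout to the implementation (optimal),
 *   unless an external constraint (e.g. the exporting API) forces linear. */
static implicit_tiling implicit_tiling_for_image_from_buffer(
    size_t image_row_pitch, bool external_linear_constraint)
{
    if (image_row_pitch != 0 || external_linear_constraint)
        return TILING_LINEAR;
    return TILING_OPTIMAL;
}
```

Note that `TILING_OPTIMAL` here only says the layout is opaque to the application; as discussed above, an implementation may still choose a linear layout behind the scenes.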

  2. For images created from external memory, we can now add some additional clarifications independent of the ext_image_tiling_control behavior.
    If the image row pitch is non-zero, the application assumes responsibility for passing in a meaningful non-zero image row pitch. (The implicit assertion
    is that a meaningful row_pitch is only possible for linear images, but we need to confirm that vendors are comfortable with this assertion.)
    When a linearly tiled Vulkan image is being imported into OpenCL, the expected application behavior is that the row pitch will be queried from
    Vulkan and passed to OpenCL when the image is created.

We recommend adding a new param_name, CL_DEVICE_INFER_EXTERNAL_IMAGE_INFO_WITH_UUID, for clGetDeviceInfo as part of the external memory extension.
When this param_name is used in a clGetDeviceInfo query, the application will receive an array of uints corresponding to the device and driver UUIDs for which automatic
image layout inference is supported. If NULL is returned, the implementation does not support automatic inference of external images.

We also need to add a new property for importing the external image into OpenCL. If the implementation supports automatic inference of external images
and CL_EXTERNAL_IMAGE_UUID_KHR is passed as a property (along with the device and external driver UUID) to clCreateImageWithProperties,
the implementation knows the origin of the external image and can activate a vendor-specific mechanism to infer the layout.
This would be a reliable way to import optimally tiled Vulkan images into OpenCL.
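A minimal sketch of how an application might consume the proposed query. It assumes the query yields a list of 16-byte UUIDs (the size used by cl_khr_device_uuid and Vulkan) and, for simplicity, matches only the exporting driver's UUID rather than device/driver pairs. Since CL_DEVICE_INFER_EXTERNAL_IMAGE_INFO_WITH_UUID is only a proposal, the list is passed in directly instead of being queried; `layout_inference_supported` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed UUID size: 16 bytes, as in cl_khr_device_uuid and Vulkan. */
#define UUID_BYTES 16

/* Returns true if the exporting driver's UUID appears in the list that the
 * proposed (hypothetical) query would return. An empty list means the
 * implementation cannot infer the layout of external images. */
static bool layout_inference_supported(const uint8_t (*supported)[UUID_BYTES],
                                       size_t count,
                                       const uint8_t exporter[UUID_BYTES])
{
    for (size_t i = 0; i < count; ++i) {
        if (memcmp(supported[i], exporter, UUID_BYTES) == 0)
            return true;
    }
    return false;
}
```

If the check fails, the application would need to fall back to an explicit mechanism, such as a linear import with an application-supplied row pitch.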

  3. Regarding the ext_image_tiling_control extension, this can be thought of as a way to bring clarity to behavior that was previously implicit. We need two tiling
    modes: linear and optimal. When used to create an image directly, the implementation will apply these modes to the layout. Note that it is valid for an
    implementation to continue to use a linear layout when the application has specified optimal tiling. When creating an image from a buffer or external memory, the use
    of linear vs. optimal tiling flags can bring clarity to usage. When CL_IMAGE_TILING_LINEAR_EXT is used, the image data present in the buffer or external image is assumed to be linear and the application-specified row_pitch
    is applied. When CL_IMAGE_TILING_OPTIMAL_EXT is used, the image data layout is considered opaque to the application and a zero row pitch should be specified.
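The pairing of tiling flags and row pitch proposed above could be validated roughly as follows. This is a hypothetical sketch: `requested_tiling` stands in for the proposed linear and optimal tiling values (no real enum values are reproduced), and requiring the linear row pitch to cover at least one row of pixels is an added assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the proposed linear/optimal tiling values. */
typedef enum { REQ_TILING_LINEAR, REQ_TILING_OPTIMAL } requested_tiling;

/* Optimal tiling is opaque to the application, so the row pitch must be 0.
 * Linear tiling applies the application's row pitch, which must be non-zero
 * and (assumption of this sketch) at least one full row of pixels. */
static bool tiling_and_pitch_consistent(requested_tiling tiling,
                                        size_t row_pitch,
                                        size_t width,
                                        size_t bytes_per_pixel)
{
    if (tiling == REQ_TILING_OPTIMAL)
        return row_pitch == 0;
    return row_pitch != 0 && row_pitch >= width * bytes_per_pixel;
}
```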

By default, clEnqueueMapImage presents a linear view of image data to the host CPU regardless of the actual image data layout in device memory.
We suggest that the ext_image_tiling_control extension add a new flag for clEnqueueMapImage, CL_MAP_IMAGE_NO_DETILE, which would force the implementation to present a
raw view of the image data. This mechanism is useful when loading or storing tiled image data and is already leveraged by many vendor extensions.
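To illustrate what detiling means for a default map, the sketch below converts a hypothetical tiled layout (4x4-pixel tiles, 32 bits per pixel, with tiles and the pixels inside each tile stored row by row) into a linear view with a given row pitch. Real vendor tiling formats differ and are usually more complex; the proposed CL_MAP_IMAGE_NO_DETILE flag would skip this conversion and expose the tiled bytes as-is.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TILE 4 /* hypothetical 4x4-pixel tile size */

/* Offset of pixel (x, y) in the hypothetical tiled layout.
 * Assumes width is a multiple of TILE. */
static size_t tiled_offset(size_t x, size_t y, size_t width)
{
    size_t tiles_per_row = width / TILE;
    size_t tile_index = (y / TILE) * tiles_per_row + (x / TILE);
    return tile_index * TILE * TILE + (y % TILE) * TILE + (x % TILE);
}

/* Copies a tiled 32-bit-per-pixel image into a linear buffer whose row
 * pitch is given in pixels: the view a default (detiling) map exposes. */
static void detile(const uint32_t *tiled, uint32_t *linear,
                   size_t width, size_t height, size_t row_pitch_px)
{
    for (size_t y = 0; y < height; ++y)
        for (size_t x = 0; x < width; ++x)
            linear[y * row_pitch_px + x] = tiled[tiled_offset(x, y, width)];
}
```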

  4. DRM format modifiers can be a reliable mechanism for importing tiled images and can be seen as an alternative to specifying the UUID when
    calling clCreateImageWithProperties. Currently, DRM format modifiers appear to be defined on Linux only, so additional work may be
    needed to make them cross-platform. We feel that this work can be deferred and that the external memory extension can be finalized without a dependence
    on DRM format modifiers.

Thanks,
Balaji Calidas
Qualcomm

@bcalidas

Here are the broad takeaways from the most recent discussion.

  1. There is general agreement that optimal tiling does not exclude a linear layout.
  2. When images are created from external memory, it is important to be explicit about the tiling of the image. This may mean that external memory is now dependent on image_tiling_control. It is not clear if a robust mechanism for the import of tiled images can be defined based on the current core and extension specs.
  3. There were concerns about the idea of a uuid based image layout inference mechanism. Instead it was proposed that a drm format modifier type mechanism be used to transfer the image layout information. It was proposed that the drm format modifier extension could be made cross platform or an alternate mechanism could be easily designed.

Thanks,
Balaji Calidas
Qualcomm

@bcalidas

An additional suggestion for implementing robust import of optimally tiled images from an API like Vulkan to OpenCL is to define a vendor extension, for example vendorXYZ_cl_vk_mem_import. The presence of the extension lets the application know that there is a robust mechanism in place for transferring the image tiling information even in the absence of DRM format modifiers.

@nikhiljnv
Contributor Author

Just capturing some of the key things we are looking for -

  1. No change to the default settings for regular images that may break backward compatibility for existing use cases/implementations when adopting image tiling or any other extension.
  2. Not enforcing a linear layout just to meet common base functionality at the cost of the performance path (the optimal layout). Having it optional or as an explicit API mechanism is preferable.
  3. Avoid vendor extension path for external memory and semaphores.

Allowing a linear layout as one possible optimal layout sounds reasonable to me.
In the case of external image sharing, if we prefer to be explicit anyway, one possible way to avoid having a default layout preference would be for clCreateImageWithProperties to return an error if layout information/preference is not passed and the image is being created from an external memory handle. This would deviate from regular images, where the layout may be chosen implicitly.

I have been looking at DRM format modifiers. It seems like this path could work for us, so we are open to exploring it.
Should we consider DRM format modifiers as part of the image tiling extension, or should this be a separate extension?
Also, we will probably want a way to use this path for external images only.

In terms of unblocking the external memory and semaphore extensions in the short term, if we cannot get image layout exchange sorted out in time, one way to move forward is to continue the external memory extensions with buffer-only support and add image support later. For implementations that do not support images, image requirements are already waived for these extensions. We may just need to add a separate query to advertise image support for external memory sharing, separate from default image support.

@bashbaug
Contributor

From a portability point-of-view I don't think the default and minimum baseline tiling requirement for external memory import can be anything other than linear tiling.

I think we should do the following:

  1. Clarify in the cl_khr_external_memory spec that implementations must consume linearly tiled images by default. Update the CTS tests to test linearly tiled images (e.g. by using VK_IMAGE_TILING_LINEAR rather than VK_IMAGE_TILING_OPTIMAL images).
  2. Support optimally tiled images as an opt-in extension. This could take the form of the proposed cl_ext_image_tiling_control extension or it could take some other form. Note that this usage of "optimally tiled" images could still be a little risky because both the exporting and importing APIs need to agree what "optimally tiled" means (testing may be challenging!), but it will allow applications to use the faster optimally tiled images if they know it is safe to do so or are OK with the risks.
  3. Add an extension to support DRM format modifiers (or something like it) to explicitly describe the tiling format. This will allow applications to get the same performance as (2) without the risks.

Does this sound reasonable?

@bcalidas

@bashbaug - thanks for your feedback. While separating out optimally tiled images into a different extension and requiring linear tiling would simplify things for some vendors, it might not be ideal for other vendors that really depend on optimal tiling for good performance.

If we are able to get the ext_image_tiling_control and DRM format modifiers extensions done, then we can define clean mechanisms for the import of both linear and optimally tiled images within cl_khr_external_memory. In that situation, we may not need to define a default tiling mode. It comes down to a question of timing in reconciling all of the specs.

@nikhiljnv
Contributor Author

I thought we were converging on the default being optimal, which can be linear as well as non-linear tiling. Was this the case only for regular images? Is there a reason to have a linear layout as the default for external images? We will need to add a memory export mechanism at some point in the future, where we would also need to query exportable handles from regular images created within OpenCL, and having different default settings between the two may hurt in some way.

In any case, if the goal is to be explicit about image tiling/layout, then I agree that having a default is not necessary, and we can make it mandatory for applications to pass layout information (including linear, optimal, and DRM format exchange) to ensure the behavior is deterministic. We can possibly error out in the absence of tiling information when importing external images.

I tend to agree with Ben's point (2) that optimal as defined by Vulkan is problematic, and we will mostly need to leave it implementation-defined in the spec. If we want to avoid issues with this path, maybe we should just focus on having DRM format modifiers or a similar mechanism for exchanging layout. Linear and any other optimal layouts that vendors want to support can be expressed via this mechanism. Maybe we don't even need to define an ambiguous, implementation-defined optimal layout as such.

In addition, if we consider a case where the same memory is bound to different layouts (buffer as well as image, linear-layout image as well as optimal-layout image) and is imported by OpenCL, OpenCL's view can only be limited to the layout information passed by the application or assumed by the implementation. In such cases, we may still be unable to provide a single coherent view of the data across APIs even after having an explicit mechanism in place. (PS: Some scenarios here may be hypothetical and may not happen in practice, but are still possible.)

@nikhiljnv
Contributor Author

Some questions for the discussion today -

  1. Do we need to define a default layout for images at all?
  2. If we need a default layout, what should it be for different types of images (regular images with no row-pitch info, regular images with row pitch specified, images created from a buffer, external images created via memory imported from other APIs, images/memory exported from OpenCL to other APIs)? And should the default be consistent across these, or should it differ to cater to the needs of each category?
  3. What are the expectations of cl_khr_external_memory with respect to non-dedicated Vulkan memory imported by OpenCL as an image? The same Vulkan memory can be bound to different layouts (buffer, linear image, optimally tiled image). Who owns the responsibility to match the layout across APIs? If it is the application's responsibility, then having a default setting to enforce a particular layout doesn't fit well. The application must pass the desired layout, or else we should error out.
  4. What are the expectations of cl_khr_external_memory in terms of mismatching layouts between the source API exporting image memory and the destination (e.g. an optimally tiled image in Vulkan imported as a linear layout in OpenCL, or vice versa)? Again, if it is the application's responsibility to match the two views, should the import succeed with the requested layout, potentially leading to inconsistent views (which can be expected), should the import operation error out to avoid mismatching accesses, or should the behavior be implementation-defined?
  5. AFAIR, there was a brief discussion on some potential use cases to infer image format/layout info from the external memory handle, with applications not required to pass some image-specific information. This is exactly the opposite of the thought process here, where we want to be more explicit in APIs. We should take this into account before making any decisions here.

@nikhiljnv
Contributor Author

nikhiljnv commented Feb 7, 2023

cl_khr_external_memory currently waives the need for image import on implementations that do not support images, which already lowers the minimum bar for the functionality to importing buffers only.
Mandating support for a particular layout for external images in cl_khr_external_memory (which is somewhat different from the discussion on the default layout in the absence of layout information) would mean that implementations of cl_khr_external_memory with no image support would be conformant, while implementations that support images and image import via external memory, but not the specific layout mandated by cl_khr_external_memory, would be left non-conformant despite having more functionality. This seems a bit odd.

Also, there are other mechanisms to extend support for linear images: importing the external memory as a buffer using cl_khr_external_memory and then using cl_khr_image_from_buffer. So I don't think we need to mandate the linear layout as the default in cl_khr_external_memory to get the same functionality supported across vendors/implementations.
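As a rough sketch of the buffer-then-image path, an application could check that the row pitch reported by the exporter for a linear image (for example, the `rowPitch` that Vulkan's `vkGetImageSubresourceLayout` returns for a `VK_IMAGE_TILING_LINEAR` image) is usable when wrapping the imported buffer as an image. The rule encoded here (a row pitch covering one row and a multiple of CL_DEVICE_IMAGE_PITCH_ALIGNMENT pixels times the element size, with an alignment of 0 treated as "not supported") is an assumption of this sketch rather than a restatement of the spec; `row_pitch_compatible` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* row_pitch:          bytes per row reported by the exporting API
 * width:              image width in pixels
 * element_size:       bytes per pixel
 * pitch_alignment_px: CL_DEVICE_IMAGE_PITCH_ALIGNMENT (in pixels) */
static bool row_pitch_compatible(size_t row_pitch, size_t width,
                                 size_t element_size, size_t pitch_alignment_px)
{
    /* Assumption: an alignment of 0 means images from buffers are unsupported. */
    if (element_size == 0 || pitch_alignment_px == 0)
        return false;
    size_t align_bytes = pitch_alignment_px * element_size;
    return row_pitch >= width * element_size && row_pitch % align_bytes == 0;
}
```

If the exporter's pitch does not satisfy the check, the application would need a different path, such as a copy into a suitably aligned buffer.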

@nikhiljnv
Contributor Author

In terms of wider interoperability, which is probably the biggest motivating factor for having a robust mechanism for exchanging image information, I can see the following categories of interop -

  1. Same API, same vendor, same device type --> Having additional image-related information may be helpful, but its absence shouldn't prevent successful sharing. At present, since we do not have a mechanism for memory export in cl_khr_external_memory, this seems out of scope.
  2. Different API/device types, same vendor --> Absence of image layout information or an implementation-defined layout may not be robust, but may still work.
  3. Same/different API, different vendors --> Needs a robust mechanism. Is image layout information exchange, in addition to image_format and image_desc, sufficient? Or do we need more information on the properties of the underlying memory? Also, I am not sure we can directly access device memory via import/export across implementations/devices without going through host memory.

While we should pursue 3 to achieve more generic memory sharing, solving image information exchange may be only a small part of it, and it may require more than just this. Maybe we should focus on 1 and 2 for now and then continue to explore 3.

@nikhiljnv
Contributor Author

Lastly, we need some clarity on dependencies across cl_khr_external_memory, cl_khr_image_from_buffer, cl_ext/khr_image_tiling and DRM format modifier or equivalent extensions -

  1. Will cl_khr_external_memory depend on/require cl_ext/khr_image_tiling, or will these be separate extensions, with the latter modifying the tiling support of the former?
  2. Will DRM modifier or equivalent extension require cl_ext/khr_image_tiling or will it be an independent extension?

@bcalidas

Proposing these ideas as a thought experiment to work out whether there is a way to robustly support import of optimally tiled images without implying that linear tiling is the default.

If we make the external memory extension depend on both image tiling control, then we can require that the image tiling mode be explicitly specified when importing images.

Implementations that support image tiling control would be required to support both linear and optimal layouts. However, an optimally tiled image layout could be implemented using linear tiling. Since the optimal layout is opaque to the application, this would work. This means that the spec is implementable even on platforms that only have linear tiling.

DRM format modifiers are currently available on Linux only. Until they are universally available, we could add a device info query (TILED_IMAGE_SOURCE) which lists the driver UUIDs for which image tiling information can be inferred. For an implementation that supports the import of optimally tiled Vulkan images, this would return the Vulkan driver UUID. In this case, the Vulkan and OpenCL implementations must be from the same vendor.

Long term, the robust mechanism for transferring tiling information is DRM format modifiers. However, we would need to get clarity on which operating systems are supported, which APIs would support them, and how to add new modifiers. Note that it should be possible for an application to support only the DRM format modifiers corresponding to linear layouts.
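For reference, a DRM format modifier is a 64-bit value whose top 8 bits identify the vendor and whose low 56 bits are vendor-defined. The sketch below mirrors two constants from Linux's `drm_fourcc.h` (`DRM_FORMAT_MOD_LINEAR` and `DRM_FORMAT_MOD_INVALID`) so it stays self-contained, and shows how code that supports only linear layouts might filter modifiers. The helper names are illustrative, not from any existing API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values mirrored from Linux drm_fourcc.h. */
#define MOD_LINEAR  0ULL                  /* DRM_FORMAT_MOD_LINEAR  */
#define MOD_INVALID 0x00ffffffffffffffULL /* DRM_FORMAT_MOD_INVALID */

typedef enum { MOD_KIND_LINEAR, MOD_KIND_INVALID, MOD_KIND_VENDOR } mod_kind;

/* The top 8 bits of a modifier encode the vendor. */
static unsigned modifier_vendor(uint64_t modifier)
{
    return (unsigned)(modifier >> 56);
}

static mod_kind classify_modifier(uint64_t modifier)
{
    if (modifier == MOD_LINEAR)
        return MOD_KIND_LINEAR;
    if (modifier == MOD_INVALID)
        return MOD_KIND_INVALID;
    return MOD_KIND_VENDOR; /* vendor-specific (e.g. tiled) layout */
}

/* A linear-only consumer accepts just the linear modifier. */
static bool linear_only_accepts(uint64_t modifier)
{
    return classify_modifier(modifier) == MOD_KIND_LINEAR;
}
```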

For OpenCL images created with commands such as clCreateImage, the tiling mode would be implicitly optimal. For specific cases, such as creating an image from a buffer, the tiling mode would be linear if not explicitly specified. These cases would be documented in the image tiling control spec. Overall, the spec would not have any notion of which tiling mode is default. Implementations would support both and for cases where the tiling mode was implicit, the spec would document which mode applied.

@nikhiljnv
Contributor Author

Thanks Balaji. This is a longer route to take, but it may be a bit better than implicitly assuming some layout to counter the lack of layout information.
Some comments -

If we make the external memory extension depend on both image tiling control, then we can require that the image tiling mode be explicitly specified when importing images.

By both do you mean both linear and optimal layouts or both image tiling control and DRM modifier extensions?
Also, I assume the import would fail in the absence of tiling / layout flags. Is that correct?

DRM format modifiers are currently available on Linux only. Until they are universally available, we could add a device info query ( TILED_IMAGE_SOURCE ) which lists the driver uuids for which image tiling information can be inferred.

What kind of inference are we looking for? Can you help me understand the purpose of this query? Also, UUIDs should be API-agnostic and truly be a property of the device. I am not sure what we mean by Vulkan driver UUID.

@nikhiljnv
Contributor Author

nikhiljnv commented Feb 21, 2023

I think all of us are in agreement that DRM format modifiers or equivalent mechanism is the way to go for robust image information exchange.
The only items we are debating are -

  • What should the behavior be when no layout information is passed?
  1. Should we have a default setting? I guess the only purpose a default would serve here is to define the behavior in the absence of tiling/layout information and to provide backwards compatibility with the core image support of the OpenCL spec. If we do need a default, what should it be? For regular images, we were tending towards optimal, which can also be linear. For imported external images, if we choose linear as the default to provide better portability across implementations and APIs, it may create inconsistencies with the default choice for regular images, where the default is optimal. This may be especially problematic when we start exporting memory from OpenCL (OpenCL exporting and OpenCL importing, with no tiling/layout flags used while creating the original image and/or while importing it). If we don't want to define a default, we will need to make cl_khr_external_memory depend on cl_khr_image_tiling_control, as Balaji mentioned, and possibly error out while importing memory if no layout was specified. (This is again inconsistent with regular images, where we would probably need a default to keep backwards compatibility in the absence of tiling/layout info.)

  2. What should the bare minimum set of supported layouts be? IMO, this is separate from what the default should be. If we need to enforce linear layout as a requirement, we can do so even with explicit tiling control, by saying that implementations must return a valid linear imported image if the linear layout flag was passed during import (which is different from implicitly assuming a linear layout in the absence of layout info).

@bashbaug
Contributor

One of my takeaways from yesterday's teleconference is that the tiling requirements may be different depending on the external memory type. IIRC "Android Hardware Buffers" carry tiling information with them, hence for this type of external memory both linear tiling and optimal tiling could (should? must?) be supported, and explicitly passing any tiling information would be redundant.

Assuming this is the case can we define the tiling behavior for each of the external memory types we support?

  • For cl_khr_external_memory_dma_buf:
    • CL_EXTERNAL_MEMORY_HANDLE_DMA_BUF_KHR = ?
  • For cl_khr_external_memory_dx:
    • CL_EXTERNAL_MEMORY_HANDLE_D3D11_TEXTURE_KHR = ?
    • CL_EXTERNAL_MEMORY_HANDLE_D3D11_TEXTURE_KMT_KHR = ?
    • CL_EXTERNAL_MEMORY_HANDLE_D3D12_HEAP_KHR = ?
    • CL_EXTERNAL_MEMORY_HANDLE_D3D12_RESOURCE_KHR = ?
  • For cl_khr_external_memory_opaque_fd:
    • CL_EXTERNAL_MEMORY_HANDLE_OPAQUE_FD_KHR = ?
  • For cl_khr_external_memory_win32:
    • CL_EXTERNAL_MEMORY_HANDLE_OPAQUE_WIN32_KHR = ?
    • CL_EXTERNAL_MEMORY_HANDLE_OPAQUE_WIN32_KMT_KHR = ?

@nikhiljnv
Contributor Author

Thanks Ben. Just jotting down a few things for further discussion -
I guess we have two dimensions:

  1. Whether layout (or any other relevant) information is explicitly passed from the app to the implementation --> This may depend on the nature of the APIs exporting and importing memory. If both APIs require it, nothing else should be needed: implementations should honor the information passed by the app during export/import operations. If one of them is implicit, we will either need to define what the default layout means (which should be consistent across importable and exportable image memory), or implementations will need a mechanism to infer it to make things work.
  2. Whether implementations can infer layout (or any other relevant) information somehow --> If they can, we can make it the implementation's responsibility to pursue a matching layout in the absence of information at the API level. If the application passes extra information, we can decide whether we want the implementation to honor the application's request or to decide based on its inference capability.

For inference capability, is it solely a property of handle type or a property of platform + device + handle_type?

@bashbaug
Contributor

Brief summary of offline discussion: we're trending toward defining tiling requirements and defaults for the external memory handle types we support. We think we know what to do for OPAQUE_WIN32 handle types and Android hardware buffers. We need to decide how to handle DMA_BUF and OPAQUE_FD handle types.

Details:

We tried to answer the following questions for each external memory handle type:

  1. Is importing with explicit tiling information valid?
  2. Is importing with explicit tiling information required?
  3. If importing with explicit tiling information is not required, what is the default behavior?
    • Can the tiling layout be inferred?
    • Is a specific tiling layout assumed?

For DMA_BUF and OPAQUE_FD handles we cannot infer the tiling layout. We discussed two possible options: one which does not have an (initial) dependency on an image_tiling_controls extension, and one which does:

Behavior with Option 1: DMA_BUF and OPAQUE_FD with no initial dependency on image_tiling_controls:

  1. Not right now, would become valid when image_tiling_controls is supported.
  2. Not required as we have no initial mechanism to provide explicit tiling information.
  3. Cannot infer, can only assume a specific layout (LINEAR or TILED or queryable or ???).

What would a query look like if we added one for (3)?

  • One possibility: TRUE/FALSE: Does a device assume LINEAR tiling for import?
    • Could write a CTS test if the query returned TRUE (LINEAR).
    • Would be tough to write a CTS test if the query returns FALSE.
    • Might be easier to write a CTS test if the query returns FALSE if we also had image export?

Possible behavior with Option 2: DMA_BUF and OPAQUE_FD with an initial dependency on image_tiling_controls:

  1. Yes, image_tiling_controls provides the mechanism to provide explicit tiling information.
  2. Yes, requiring explicit tiling information avoids the need for a default.
  3. N/A, use what is passed explicitly.

We need to decide whether to pursue one of these two options or a different option.

@bcalidas

After further internal discussion, we'd like to propose the following -

  1. For external_memory, unless the handle type explicitly supports automatic inference of layout (AHB, OPAQUE_WIN32), the application must specify the tiling mode when calling clCreateImageWithProperties. At a minimum, DMA_BUF and OPAQUE_FD handle types would need such treatment.
  2. This means that external_memory extension now depends on tiling_control extension.
  3. There is no default tiling mode. Applications must specify one of LINEAR or OPTIMAL tiling when creating an image with a handle type like opaque_fd. Implementations can always use LINEAR tiling behind the scenes even when OPTIMAL tiling is requested.
  4. When an application creates an image with OPTIMAL tiling (by importing a handle type like opaque_fd), the DRM format modifier must also be specified.
  5. This means that the external_memory extension depends on both the tiling_control and drm format modifier extensions.
  6. We could simplify extension dependencies by requiring them by handle type. For example, external_memory could be implemented independently for OPAQUE_WIN32 only.
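The rules in this proposal can be summarized as a validation function. This is a hypothetical sketch: the handle and tiling enums are stand-ins rather than real OpenCL values, and it treats AHB and OPAQUE_WIN32 as always able to infer the layout, which the proposal assumes but which, as raised later in the thread, may not hold for every AHB case.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; not real cl_external_memory_handle_type_khr values. */
typedef enum { HANDLE_OPAQUE_FD, HANDLE_DMA_BUF, HANDLE_OPAQUE_WIN32, HANDLE_AHB } handle_type;
typedef enum { TILING_UNSPECIFIED, TILING_LINEAR, TILING_OPTIMAL } tiling_req;

/* Per the proposal, these handle types carry or imply their layout. */
static bool handle_infers_layout(handle_type h)
{
    return h == HANDLE_AHB || h == HANDLE_OPAQUE_WIN32;
}

/* Returns true if an image import request would be valid under the proposal. */
static bool import_request_valid(handle_type h, tiling_req tiling,
                                 bool has_drm_modifier)
{
    if (handle_infers_layout(h))
        return true;             /* layout is inferred from the handle */
    if (tiling == TILING_UNSPECIFIED)
        return false;            /* no default tiling mode */
    if (tiling == TILING_OPTIMAL)
        return has_drm_modifier; /* optimal requires a DRM format modifier */
    return true;                 /* explicit linear tiling */
}
```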

Thanks,
Balaji Calidas,
Qualcomm

@nikhiljnv
Contributor Author

@bcalidas
A few weeks back Kevin mentioned that automatic inference may not be feasible in all cases for Android hardware buffers, and that inference capability may be a function of more than just the handle type.

Also, with item 4, do we need tiling_control extension to depend on DRM format modifier or will this be a restriction only for externally imported images?

@bcalidas

@nikhiljnv - I was not aware of cases with AHB that did not support automatic inference. This can be discussed as part of the AHB handle type extension.

Regarding the tiling_control extension, it will not depend on DRM format modifier but external memory will depend on both tiling and drm format modifier.

@bcalidas

We did some more thinking on item 2 from the earlier post.

"This means that external_memory extension now depends on tiling_control extension"

It may be possible to remove the dependence of external memory on tiling_control by stating that images imported without an explicitly specified tiling property are linear. This would not necessarily imply that linear tiling in general is the default for OpenCL, only that linear would apply to this specific case of external memory usage. If the image had been created in Vulkan with optimal tiling, the results would be undefined.

The related questions that come up are -
a) Are there vendors who would want to support optimal tiling only?
b) Is there a concern that, by not requiring optimal tiling upfront, applications may end up selecting linear tiling for portability, and therefore miss out on performance?

@nikhiljnv
Contributor Author

Some questions -

  1. Is the assumption of linear tiling in the absence of tiling information limited to handle types where inference is not possible, or is the change being proposed for all cases?
  2. What if the image is created in OpenCL with no tiling flags, exported by OpenCL and imported by OpenCL without any explicit tiling information? If by default OpenCL treats images as optimally tiled, importing these images as linear by default in the absence of tiling information is likely to break OpenCL export - OpenCL import scenarios.

@bashbaug
Contributor

bashbaug commented May 1, 2023

One idea to potentially "un-stick" this issue: We essentially have implementation-defined behavior right now, where some implementations assume that external memory handles for images are linearly tiled and other implementations assume optimal tiling. Since it appears unlikely we will be able to unify this behavior, perhaps we should consider a query so applications will be able to determine what the behavior is and react accordingly.

This could be as simple as a single per-device query that returns whether the device assumes linear tiling (true) or some other tiling (false). The application would query the device it wants to interop with and then it would create images with the assumed tiling or go down a fallback path if the assumed tiling is unacceptable.

This query would only describe the default tiling behavior when no explicit tiling information is provided. So, if an implementation supports something like the image tiling controls extension or a DRM format modifier extension, any explicit tiling information would override the default and allow other types of tiling.
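A minimal sketch of the decision described above, written in plain C so it compiles without OpenCL headers. The `tiling_t` enum, `effective_import_tiling()`, and `default_import_matches()` are hypothetical illustration names, not OpenCL API; `device_assumes_linear` stands in for the result of the proposed per-device query.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tiling kinds for illustration; these are not OpenCL enums. */
typedef enum {
    TILING_UNSPECIFIED,  /* the application passed no tiling information */
    TILING_LINEAR,
    TILING_OTHER,        /* some implementation-defined, non-linear tiling */
    TILING_DRM_MODIFIER  /* e.g. supplied via a DRM format modifier extension */
} tiling_t;

/* Effective tiling of an imported image. `device_assumes_linear` stands in
   for the result of the proposed per-device query. */
static tiling_t effective_import_tiling(tiling_t explicit_tiling,
                                        bool device_assumes_linear)
{
    assert(explicit_tiling <= TILING_DRM_MODIFIER);
    /* Explicit tiling information always overrides the device default. */
    if (explicit_tiling != TILING_UNSPECIFIED)
        return explicit_tiling;
    return device_assumes_linear ? TILING_LINEAR : TILING_OTHER;
}

/* Application side: true if importing without tiling information matches how
   the exporter laid out the image; false means take a fallback path.
   This sketch treats all non-linear tilings as a single kind. */
static bool default_import_matches(tiling_t exported_tiling,
                                   bool device_assumes_linear)
{
    return effective_import_tiling(TILING_UNSPECIFIED, device_assumes_linear)
           == exported_tiling;
}
```

For example, an application holding a linearly tiled Vulkan image would import it directly when `default_import_matches(TILING_LINEAR, assumes_linear)` is true, and otherwise go down its fallback path.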

If this sounds interesting, things we will need to decide are: Can we have just one query for the device covering all handle types or do we need handle-specific queries? Does the query make sense for all handle types or just a few handle types?

@bcalidas

bcalidas commented May 3, 2023

If we added a query for default tiling behavior, it would need to be per device and per handle. However, it still leaves unresolved the question of how an implementation can guarantee correct transfer of tiling information if the external image is not linearly tiled.

@bashbaug
Contributor

bashbaug commented May 8, 2023

it would need to be per device and per handle

Makes sense.

We currently have a device-specific query to determine which handle types are supported for importing:

https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#_modifications_to_existing_apis_added_by_this_spec_3

| Device Info | Return Type | Description |
| --- | --- | --- |
| CL_DEVICE_EXTERNAL_MEMORY_IMPORT_HANDLE_TYPES_KHR | cl_external_memory_handle_type_khr[] | Returns the list of importable external memory handle types supported by device. |

We could add a similar query to determine which handle types are assumed linear? Don't take the enum name too seriously - I know it's really long right now:

| Device Info | Return Type | Description |
| --- | --- | --- |
| CL_DEVICE_EXTERNAL_MEMORY_IMPORT_ASSUME_LINEAR_HANDLE_TYPES_KHR (yes this is too long) | cl_external_memory_handle_type_khr[] | Returns the list of importable external memory handle types supported by device that are assumed to have linear tiling when no other tiling information is provided. |

Would this work?

Note: we have a similar platform query for the handle types that are supported for importing. We could add a platform query for the "assumed linear tiling" query also, if desired. I'm not sure how useful this is, though...

it still leaves unresolved the question of how an implementation can guarantee correct transfer of tiling information if the external image is not linearly tiled.

Yes - the "these handles assume linear tiling" query would only be a solution for some use-cases, but it's something we could do easily in the short-term. A longer-term solution would use something like the image tiling extension or a TBD DRM format modifiers extension to transfer the tiling information explicitly.

@nikhiljnv
Contributor Author

nikhiljnv commented May 9, 2023

I assume CL_DEVICE_EXTERNAL_MEMORY_IMPORT_ASSUME_LINEAR_HANDLE_TYPES_KHR can return a subset of the handle types returned by CL_DEVICE_EXTERNAL_MEMORY_IMPORT_HANDLE_TYPES_KHR.
A couple of questions:

  • Can the above query return an empty list?
  • Would the above query be required to return certain handle types (e.g. OPAQUE_FD) if they are supported by the platform/device?

I assume the difference between the lists returned by CL_DEVICE_EXTERNAL_MEMORY_IMPORT_HANDLE_TYPES_KHR and CL_DEVICE_EXTERNAL_MEMORY_IMPORT_ASSUME_LINEAR_HANDLE_TYPES_KHR would identify handle types with non-linear tiling.

Do we also need a similar query for reporting handle types for which inference is supported, e.g. CL_DEVICE_EXTERNAL_MEMORY_IMPORT_CAN_INFER_HANDLE_TYPES_KHR?
If such a query is required for inference, should the handle types returned by CL_DEVICE_EXTERNAL_MEMORY_IMPORT_CAN_INFER_HANDLE_TYPES_KHR and those returned by CL_DEVICE_EXTERNAL_MEMORY_IMPORT_ASSUME_LINEAR_HANDLE_TYPES_KHR be mutually exclusive? Or would inference simply override the assume-linear behavior for handle types common to both?
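The "difference between the lists" above can be sketched as a plain set difference over the two query results. This is illustrative C without OpenCL headers: `handle_type_t`, the `HANDLE_*` values, and `handles_not_assumed_linear()` are hypothetical stand-ins, not header definitions, and the input arrays represent what the two clGetDeviceInfo() array queries would return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for cl_external_memory_handle_type_khr. */
typedef unsigned int handle_type_t;

/* Hypothetical values for illustration; not the numeric values in the OpenCL headers. */
enum { HANDLE_OPAQUE_FD = 1, HANDLE_OPAQUE_WIN32 = 2, HANDLE_DMA_BUF = 3 };

static bool contains(const handle_type_t *list, size_t count, handle_type_t type)
{
    for (size_t i = 0; i < count; ++i)
        if (list[i] == type)
            return true;
    return false;
}

/* Writes every importable handle type that is NOT in the assume-linear list
   into `out` (which must hold at least n_import entries) and returns the count. */
static size_t handles_not_assumed_linear(const handle_type_t *import, size_t n_import,
                                         const handle_type_t *linear, size_t n_linear,
                                         handle_type_t *out)
{
    assert(out != NULL || n_import == 0);
    size_t n_out = 0;
    for (size_t i = 0; i < n_import; ++i)
        if (!contains(linear, n_linear, import[i]))
            out[n_out++] = import[i];
    return n_out;
}
```

With an import list of {OPAQUE_FD, DMA_BUF} and an assume-linear list of {DMA_BUF}, only OPAQUE_FD ends up in the output. Whether membership in this difference means "non-linear" or merely "not guaranteed linear" is exactly the question the thread goes on to settle.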

@nikhiljnv
Contributor Author

Also, I think the assume-linear query would be useful more generally: it would let implementations report their default assumption for any image created without tiling information, not just for external images of particular handle types.

@bashbaug
Contributor

Thanks for the feedback and questions!

I assume CL_DEVICE_EXTERNAL_MEMORY_IMPORT_ASSUME_LINEAR_HANDLE_TYPES_KHR can return a subset of handle types returned by CL_DEVICE_EXTERNAL_MEMORY_IMPORT_HANDLE_TYPES_KHR

Yes, definitely. It wouldn't include any new handle types that are not supported for importing (we should have a test for this), but it could certainly be a subset.

  • Yes, the query could return an empty set if no handle types assume linear tiling (or, in other words, if all handle types assume non-linear tiling).
  • I don't think any of the handle types currently require linear (or non-linear) tiling, but if we find this is the case we can document this requirement and test that the queries behave as required.
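The subset rule mentioned above ("we should have a test for this") could be checked along these lines. This is a hypothetical sketch in plain C, not actual CTS code; `handle_type_t` and `assume_linear_list_is_valid()` are illustration names, and the arrays stand in for the two query results.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for cl_external_memory_handle_type_khr. */
typedef unsigned int handle_type_t;

/* Every handle type in the assume-linear list must also be importable.
   An empty assume-linear list (all handle types non-linear) is valid. */
static bool assume_linear_list_is_valid(const handle_type_t *import, size_t n_import,
                                        const handle_type_t *linear, size_t n_linear)
{
    assert(import != NULL || n_import == 0);
    assert(linear != NULL || n_linear == 0);
    for (size_t i = 0; i < n_linear; ++i) {
        bool found = false;
        for (size_t j = 0; j < n_import && !found; ++j)
            found = (linear[i] == import[j]);
        if (!found)
            return false;  /* reports a handle type that cannot be imported */
    }
    return true;
}
```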

Do we also need a similar query for [...]

I think we could use a similar mechanism to query other properties if needed, but I think to "un-stick" this issue we would only need the "assume linear" query.

@bcalidas

We are OK with this proposal. Do we have agreement that the CL_DEVICE_EXTERNAL_MEMORY_IMPORT_ASSUME_LINEAR_HANDLE_TYPES_KHR query is sufficient to "un-stick" the issue?

@bcalidas

Hi,

Quick update from Qualcomm.
We are in the process of pushing a PR to the cl_khr_external_memory spec which adds the CL_DEVICE_EXTERNAL_MEMORY_IMPORT_ASSUME_LINEAR_HANDLE_TYPES_KHR device query (we couldn't find a shorter name that worked).

We further reasoned that if an implementation reports a handle type in the CL_DEVICE_EXTERNAL_MEMORY_IMPORT_HANDLE_TYPES_KHR query but not in the CL_DEVICE_EXTERNAL_MEMORY_IMPORT_ASSUME_LINEAR_HANDLE_TYPES_KHR query, an application could reasonably expect that, however the image was created in the external API (and with whatever layout) for that handle type, the implementation would infer the layout and things would just work.

This allows us to move forward with the base cl_khr_external_memory spec.

We can follow up by finishing ext_image_tiling_control, which introduces the concept of linear and optimal tiling. Finally, we can add an extension that layers on both image tiling control and external_memory and provides explicit control over the tiling of imported images. With this extension we can also add a per-handle-type query about which formats support automatic layout inference. This would be needed for handle types like Android Hardware Buffer.

Hopefully this plan can work and get us moving forward in stages. Feedback is appreciated.

Thanks,
Balaji

lakshmih pushed a commit to lakshmih/OpenCL-Docs that referenced this issue Jun 13, 2023
Add query to clGetDeviceInfo for external memory handle types
that are assumed to be linear.
@nikhiljnv
Contributor Author

@bcalidas
Thanks for the comment and the proposal. While I agree with the overall CL_DEVICE_EXTERNAL_MEMORY_IMPORT_ASSUME_LINEAR_HANDLE_TYPES_KHR query, assuming that a handle type that is not assumed linear by default has inference capabilities is problematic, especially since we already agree that inference capabilities depend on more than just the handle type.

If you already have a draft spec, would it be possible to push a PR sometime this week or early next week? Maybe we can continue to discuss any issues as part of the spec PR review.

@bcalidas

bcalidas commented Jul 15, 2023

@nikhiljnv
Could you take a look at lakshmih@048dc1a

This PR adds the spec update for assume_linear_handle_types. There is no expectation that the layout of handle types not in the assume-linear list will be inferred. The spec only states that, when no other information is provided, the layout for these handle types is not guaranteed to be linear.

Thanks,
Balaji

@Kerilk
Contributor

Kerilk commented Jul 18, 2023

To try to summarize my position on the call of July 18, which stems from @bcalidas's suggestions of adding hints to help infer tiling:

  • To import an image there should be a list of properties, and among those properties two mutually exclusive options:
    • EXPLICIT_TILING: this flag indicates the user knows the tiling and will specify it in the following properties. For now we only support TILING_LINEAR. We should be able to extend this to DRM modifiers in the future.
    • INFERRED_TILING: this flag indicates the user will rely on the implementation to accept the image and infer the tiling used. As @bcalidas suggested, this can be followed by one or more hints that allow the implementation to disambiguate. FROM_VULKAN or FROM_OPENGL (I am terrible at naming things, so please don't judge these too harshly) were suggested, but others could be devised (and, as was suggested, they could be vendor specific). This would still most probably only work when exchanging images between implementations from the same vendor, but we could express this in the spec, or create a hint to express this as well.
  • Optimal tiling is subjective and dependent on the platform/implementation, so it would most probably only work when exchanging images between implementations from the same vendor. This means optimal tiling is covered by INFERRED_TILING, as the implementation has no reason to use anything other than optimal in the general case.
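The rules in this proposal (two mutually exclusive options, explicit tiling limited to linear for now, inferred tiling followed by one or more hints) can be sketched as a validator over a zero-terminated key/value property list, in the style of the lists passed to clCreateImageWithProperties(). Everything here is hypothetical: the `PROP_*`, `TILING_LINEAR`, and `HINT_*` tokens and their values are invented for illustration, and repeating the INFERRED_TILING key to carry several hints is one possible encoding, not something the thread settled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for cl_mem_properties (a 64-bit integer type in the OpenCL headers). */
typedef uint64_t mem_prop_t;

/* Hypothetical tokens for the proposal; none of these exist in the OpenCL headers. */
enum {
    PROP_EXPLICIT_TILING = 0x1001, /* value: the exact tiling */
    PROP_INFERRED_TILING = 0x1002, /* value: an inference hint; may repeat */
    TILING_LINEAR        = 0x2001,
    HINT_FROM_VULKAN     = 0x3001,
    HINT_FROM_OPENGL     = 0x3002
};

/* Validates the tiling-related entries of a zero-terminated key/value list.
   Other keys (for example the external memory handle) are skipped.
   A list with neither option is accepted and falls back to the default. */
static bool tiling_props_valid(const mem_prop_t *props)
{
    assert(props != NULL);
    bool explicit_seen = false, inferred_seen = false;
    for (size_t i = 0; props[i] != 0; i += 2) {
        mem_prop_t key = props[i], value = props[i + 1];
        if (key == PROP_EXPLICIT_TILING) {
            /* Only one explicit tiling, and only linear is supported for now. */
            if (explicit_seen || value != TILING_LINEAR)
                return false;
            explicit_seen = true;
        } else if (key == PROP_INFERRED_TILING) {
            /* Unknown hint; vendor-specific hints could be accepted here. */
            if (value != HINT_FROM_VULKAN && value != HINT_FROM_OPENGL)
                return false;
            inferred_seen = true;
        }
    }
    /* EXPLICIT_TILING and INFERRED_TILING are mutually exclusive. */
    return !(explicit_seen && inferred_seen);
}
```

Extending the explicit branch to DRM modifiers later would only widen the accepted values, which keeps older property lists valid, in line with the backward-compatibility objective.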

The objective here is to get an API that we can evolve while keeping it backward compatible.

@bcalidas

To expand further on Brice's comments, here's how this could be done.

  1. We add a clGetDeviceInfo() query with the CL_DEVICE_EXTERNAL_MEMORY_IMPORT_ASSUME_LINEAR_HANDLE_TYPES_KHR token. This query will return a list of handle types for which the tiling is assumed to be linear in the absence of other tiling information.

  2. We add a clGetDeviceInfo() query with the CL_DEVICE_EXTERNAL_MEMORY_TILING_INFERENCE_HINTS_REQUIRED_HANDLE_TYPES_KHR token. This query will return a list of handle types for which linear is not the default and an inference hint is required. A handle type that appears in the list returned by the CL_DEVICE_EXTERNAL_MEMORY_IMPORT_ASSUME_LINEAR_HANDLE_TYPES_KHR query should not appear in the list returned by this query.

  3. If a handle type is supported by the implementation but does not appear in either of the lists (from 1 and 2), then it is implied that tiling inference will work without the need for any hints. On such an implementation, for this handle type, an application could import an image with any kind of tiling without having to pass any hints in the properties list of clCreateImageWithProperties().

  4. For handle types where a hint is required, an application may format the properties list of clCreateImageWithProperties() as follows:
    properties = {CL_EXTERNAL_MEMORY_HANDLE_OPAQUE_FD_KHR, fd, CL_EXTERNAL_MEMORY_INFER_TILING, CL_EXTERNAL_MEMORY_HINT_SOURCE_VK_KHR, 0};
    CL_EXTERNAL_MEMORY_INFER_TILING is part of a pair; the second token in the pair specifies which type of inference hint to apply.
    We could agree on some hints that are broadly cross-vendor and part of the external_memory spec. Vendors could add hints that are specific to their implementations (through vendor extensions).

*For some implementations these hints are necessary since a given handle type could have come from one of many different sources. The inference mechanism could be different for each source.

  5. When the tiling_control and DRM format modifier extensions are worked out, we would layer in the ability to specify tiling explicitly. For example, the properties passed to clCreateImageWithProperties() could look something like:
    properties = {CL_EXTERNAL_MEMORY_HANDLE_OPAQUE_FD_KHR, fd, CL_EXTERNAL_MEMORY_TILING_EXPLICIT, CL_TILING_OPTIMAL, CL_DRM_FORMAT_MODIFIER, drm_fmt_modifier_value, 0};

  6. We looked at the intersection of these suggestions with proposed new extensions such as YUV images. We believe that these are compatible.

The overall takeaway is that in many cases tiling is already handled implicitly through inferencing. We can use the properties list in clCreateImageWithProperties() to bring clarity to the inferencing mechanisms.
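The first three numbered items above partition the importable handle types into three categories. A sketch of that classification in plain C, assuming the arrays hold the results of the import query, the assume-linear query, and the proposed hints-required query (the last was only a proposal in this comment). `handle_type_t`, `import_class_t`, and `classify_handle()` are hypothetical illustration names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for cl_external_memory_handle_type_khr. */
typedef unsigned int handle_type_t;

typedef enum {
    IMPORT_UNSUPPORTED,    /* not in the import list at all */
    IMPORT_ASSUMED_LINEAR, /* item 1: linear unless told otherwise */
    IMPORT_NEEDS_HINT,     /* item 2: an inference hint must be passed */
    IMPORT_INFERRED        /* item 3: any tiling works without hints */
} import_class_t;

static bool in_list(const handle_type_t *list, size_t count, handle_type_t type)
{
    for (size_t i = 0; i < count; ++i)
        if (list[i] == type)
            return true;
    return false;
}

static import_class_t classify_handle(handle_type_t type,
                                      const handle_type_t *import, size_t n_import,
                                      const handle_type_t *linear, size_t n_linear,
                                      const handle_type_t *hints, size_t n_hints)
{
    if (!in_list(import, n_import, type))
        return IMPORT_UNSUPPORTED;
    bool is_linear = in_list(linear, n_linear, type);
    bool needs_hint = in_list(hints, n_hints, type);
    /* Item 2 says the two lists must not overlap. */
    assert(!(is_linear && needs_hint));
    if (is_linear)
        return IMPORT_ASSUMED_LINEAR;
    if (needs_hint)
        return IMPORT_NEEDS_HINT;
    return IMPORT_INFERRED;
}
```

An application would only need to add an inference-hint pair to its properties list when `classify_handle()` returns `IMPORT_NEEDS_HINT`.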

@bashbaug
Contributor

bashbaug commented Jul 24, 2023

Hmm, I had slightly different takeaways from our discussion last week. My key observation was that there really are only two "classes" (or "types") of tiling, which are mutually exclusive (you only have one or the other, never both):

The first class is "explicit tiling layouts", where an application specifies (prescribes) a specific tiling layout. If an application specifies this specific tiling layout incorrectly then the program is incorrect and will very likely result in an error or in undefined behavior (e.g. a corrupted image). The most obvious explicit tiling layout is "linear tiling", but there could be other explicit tiling layouts as well, such as DRM format modifiers.

The second class is "implicit tiling layouts", where no specific tiling layout is specified. With this class of tiling an implementation must infer or otherwise determine what the tiling layout is for a handle. There are several mechanisms that could be used to do this: for example, an implementation may be able to query the tiling layout (e.g. through the OS?) for some handle types, or an implementation may know by policy what the tiling layout is for certain types of handles plus image properties (e.g. image type, format, dimensions, ...).

A few follow-on observations:

  1. What we have been calling "optimal tiling" is in the class of "implicit tiling layouts" since it is not explicit. Knowing that an image is "optimally tiled" may help an implementation to determine what the tiling layout is, especially if an implementation assumes it is "optimally tiled like Vulkan", but it still requires some amount of inference or policy to determine exactly what the tiling layout is for an "optimally tiled" handle.
  2. At some cost of complexity we could add additional variants of "optimal tiling" to give an implementation more information, such as "optimally tiled like OpenGL", or "optimally tiled like Vulkan but with policy foo", but these are still going to fall into the "implicit tiling layouts" class.
  3. Any form of "implicit tiling layouts" by policy is going to be brittle, especially across vendors, since different vendors will almost certainly have different policies.
  4. Forms of "implicit tiling layouts" where an implementation can query the tiling layout for a given handle could be robust, even across vendors, though it's not clear (to me, at least) which handle types support this functionality - maybe Android HW buffer?
  5. Forms of "explicit tiling layouts" are the most robust by definition, though they do require both the exporting implementation and the importing implementation to support the specific tiling layout.
  6. An extension like cl_ext_image_tiling_controls might be more useful than I originally thought because it lets an application choose between an "explicit tiling layout" (in the form of "linear tiling") for broad portability and an "implicit tiling layout" (in the form of "optimal tiling"). We should do our best to get this done ASAP (and, probably, the DRM format modifiers extension) to give us some additional tools in our tool box.

(I think this is all compatible with @Kerilk's observations above, maybe just said slightly differently and with a few more words.)

edit: spelling

@bcalidas

bcalidas commented Jul 29, 2023

The conclusions from the most recent discussion are:

  1. To retain the clGetDeviceInfo() query with the CL_DEVICE_EXTERNAL_MEMORY_IMPORT_ASSUME_LINEAR_HANDLE_TYPES_KHR token. With this query, the external_memory spec can be finalized.

  2. To work on a follow-on spec which adds inferencing to external memory import. It was agreed that many cases of external memory import would rely on inferred layout, format and size for images and buffers.

  3. To follow up with completed image tiling control and DRM format modifier specs, which would layer on top of the external_memory spec and add support for explicit tiling control of imported images.

@bcalidas

#940 has been updated per review comments.

@bcalidas

bcalidas commented Aug 9, 2023

This is the pull request for the header file.
KhronosGroup/OpenCL-Headers#234

lakshmih pushed a commit to lakshmih/OpenCL-Docs that referenced this issue Aug 29, 2023
Add query to clGetDeviceInfo for external memory handle types
that are assumed to be linear.
bashbaug pushed a commit that referenced this issue Sep 5, 2023
* External memory import: Add assume linear query (#861)

Add query to clGetDeviceInfo for external memory handle types
that are assumed to be linear.

* Assign macro numeric value and improve formatting

* Address review comments

* Fixed token name to what was agreed on in the review

* Updated minor version

---------

Co-authored-by: Joshua Kelly <[email protected]>
@nikhiljnv
Contributor Author

We need to check CTS coverage and decide whether we need a CTS issue to track further changes.

@nikhiljnv
Contributor Author

Closing as discussed in memory subgroup call on Sept 19, 2023.
