Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP cl_ext_image_tiling_control #710

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Conversation

kpet
Copy link
Contributor

@kpet kpet commented Nov 10, 2021

Open topics:

Change-Id: I6c391b5ecaa203f7db566db68a7e3d124d6038a2
Signed-off-by: Kevin Petit [email protected]

@kpet kpet force-pushed the cl-ext-image-tiling-control branch from cc5e8ec to 69ce1e4 Compare December 2, 2021 18:08
@kpet kpet force-pushed the cl-ext-image-tiling-control branch 2 times, most recently from fdabb60 to 938e085 Compare February 24, 2022 10:17
@bashbaug
Copy link
Contributor

bashbaug commented Oct 8, 2022

Discussed in the October 4th teleconference:

  • Do we need to define and document any interactions with YUV extensions (WIP cl_ext_yuv_images #722)?
  • Are there any interactions to consider when creating an image from a buffer, or from external memory?
  • It would be helpful to clearly describe the parts of this extension that provide additional functional guarantees beyond "performance hints".

@bcalidas
Copy link

bcalidas commented Oct 8, 2022

Some initial thoughts

  1. It should be possible for (non-linear) tiled images to be created with CL_MEM_USE_HOST_PTR. I believe the spec already allows for this implicitly.
  2. It would be helpful to allow tiled images to be created with a flag that permits host access but does not detile the image for host access. Therefore EnqueueMapImage would return a host pointer to the tiled image data. This could be useful in cases where the CPU has to transfer image data between the GPU and another IP block or the GPU and storage.
  3. As mentioned earlier we should clarify the interaction of this extension (specifically tiled images) with ext_image_from_buffer and external_memory_import with attention given to the role of row_pitch and slice_pitch.

Thanks,
Balaji Calidas
Qualcomm

Copy link
Contributor

@bashbaug bashbaug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One high-level thought: Especially because this extension leaves "optimal tiling" implementation-defined (this is a good thing), should the controls be a bit more descriptive and a bit less prescriptive?

Said another way, I think some implementations could benefit from this extension even if they don't support "tiling", by assuming that a "linear" image should be optimized for host access and an "optimal" image should be optimized for device access.

One change we could consider is to switch from CL_IMAGE_TILING_{LINEAR|OPTIMAL} to CL_IMAGE_LAYOUT_{LINEAR|OPTIMAL} (so tiling -> layout) to be a little more general.

Or, if we wanted to go a bit further, we could switch to something like CL_IMAGE_LAYOUT_{DYNAMIC|STATIC}, or even CL_IMAGE_LAYOUT_{HOST|DEVICE}_OPTIMAL.

What do you think?

Comment on lines 136 to 141
If {CL_MEM_IMAGE_TILING_EXT} is passed and set to a value different from
{CL_IMAGE_TILING_LINEAR_EXT}, {CL_MEM_USE_HOST_PTR} must not be set in _mem_flags_.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: It might be better to prohibit the CL_MEM_USE_HOST_PTR case specifically for CL_IMAGE_LAYOUT_OPTIMAL vs. anything that is NOT CL_IMAGE_TILING_LINEAR. It would certainly require documenting the tile format, but conceivably some specific tiling formats could work with CL_MEM_USE_HOST_PTR if the host knows how to interpret and de-tile the memory.

Then again, couldn't any tiled memory be de-tiled when it is mapped into the provided host pointer, then re-tiled when it is unmapped?

I'm not sure if I have a strong use-case that would require changing this behavior but it would be helpful to understand where it is coming from.

Comment on lines 184 to 189
| Return the tiling passed using the {CL_MEM_IMAGE_TILING_EXT} property or
{CL_IMAGE_TILING_LINEAR_EXT} if none was passed at image creation time.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have an obvious solution to this problem, but I think it will be misleading if this query returns CL_IMAGE_TILING_LINEAR if no property were specified at image creation time. If we implemented this extension then some of our devices would almost certainly behave differently if an application passed LINEAR explicitly vs. no property.

I suppose one solution to consider is to define a third DEFAULT layout type in this extension that is the equivalent of "no property specified". Some implementations might treat this the same as LINEAR or OPTIMAL, but some other implementations may not.

Comment on lines 201 to 206
*RESOLVED*: An application wishing to get visibility of tiled data can do so by
creating images from a buffer and mapping the buffer directly.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered defining a map flag to indicate "leave the memory as-is" vs. the current "convert to a linear layout"? I admittely haven't fully thought this through or even if this would be useful, but the disadvantage of the "create an image from a buffer" method is that it won't work after-the-fact if an image is already created, whereas a map flag might still work.

@bcalidas
Copy link

I wanted to leave some high-level thoughts about how this extension could work.

The major impact of linear vs optimal tiling modes would be when the image is created from a buffer or from external memory. In these cases, if the image has optimal tiling mode, then the application should be required to set row_pitch and/or slice_pitch to zero.

Applications should be able to choose whether an optimally tiled image should present a linear view when mapped for host CPU access. Currently, implementations are implicitly required to present a linear view of image data when mapped for host access. This extension could possibly give implementations the ability to opt out of providing that linear view. How the image data is finally presented when mapped for host access would then depend on both implementation capabilities and application selection.

Ben has suggested that applications make the linear view vs. raw view selection as part of the EnqueueMap command. This could also be specified when creating the image. We'll need to discussion these options further.

Thanks,
Balaji

@bashbaug
Copy link
Contributor

I wanted to leave some high-level thoughts about how this extension could work....

I think the comment above describes one way this extension could work, but not necessarily the only way.

IMHO the key value this extension provides is a hint (or assertion?) to the driver that the image should (or must?) be stored internally in a specific format - either "linear" (well-specified) or "tiled" (implementation-defined, in this extension, at least). This will primarily change the performance characteristics of an image, especially when it is mapped, though likely also when it is accessed within a kernel.

Once we have the ability to control or otherwise influence the internal layout it gives us the ability to add additional functionality, either in this extension or in other layered extensions. For example, we could:

  1. Influence or mandate an internal layout when creating an image from a buffer or from an external memory handle.
  2. Map the image but provide direct access to image data in the internal layout vs. converting to a linear layout.

One of the key questions I think we'll need to answer is whether this extension is purely a performance hint or whether it is something stronger. For the base-level functionality I think we could go either way, but for the additional functionality (external memory and direct access mapping) I think we'd want stronger guarantees (at least in some cases).

to *clCreateImageWithProperties* to control the tiling used for the image being
created. The following values are accepted:

* {CL_IMAGE_TILING_LINEAR_EXT}, which is the default value used if
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For implementations that don't/won't support cl_ext_image_tiling_control, the property CL_MEM_IMAGE_TILING_EXT will be missing and the default implementation choice of clCreateImageWithProperties may not necessarily be linear tiling.
For such implementations, the requirement of CL_IMAGE_TILING_LINEAR_EXT being the default will mean that if they decide to support this extension, clCreateImageWithProperties behavior will need to break backwards compatibility and all existing implementations will need to explicitly pass CL_IMAGE_TILING_OPTIMAL_EXT or some other flag to get the same layout as before. Otherwise, they will end up forcing linear layout and are likely to incur performance penalty. This would be painful and extremely undesirable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree, and I had a similar comment above.

One possible way to solve this problem is to have three "tiling" states:

  1. An explicitly linear image.
  2. An explicitly tiled image.
  3. A "default" or "unspecified" layout where the implementation can choose whether the image is linear or tiled or something else.

(1) and (2) are already defined by this extension. (3) would need to be added. If we added (3) and made it the default value I think it would retain backwards compatibility and avoid any performance penalties?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed in the January 3rd teleconference. Another possibility is to only have two tiling states, but to give freedom to implementations to use linear tiling even for "tiling optimal" images if it chooses to do so. If we did this then the default tiling property could be "optimal", which would also avoid the possible performance penalty.

This isn't a perfect solution. A case where this is insufficient is an external memory case, where the exporting API (e.g. Vulkan) uses a different policy for "tiling optimal" images than the importing API (e.g. OpenCL).

A more robust solution is to specify the explicit type of tiling, akin to VK_EXT_image_drm_format_modifier.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think both having a separate default or optimal covering non-linear as well as linear tiling modes should work.
Default may be unnecessary if optimal allows both.
In case of default or optimal tiling mode, if implementation is free to choose optimal or linear mode, what should clGetImageInfo CL_IMAGE_TILING_EXT query return? Should it return default/optimal tiling/layout or should it return optimal / linear as chosen by implementation?

@bcalidas
Copy link

bcalidas commented Jan 9, 2023

I posted some comments on #861 which include updates to cl_ext_image_tiling_control. These comments also discuss how cl_ext_image_tiling_control could bring clarity to external memory and image_from_buffer usage.

Thanks,
Balaji Calidas
Qualcomm

@bcalidas
Copy link

Following up on the discussion around #861 from Jan 10, 2023 We'd like to propose the following outline for his extension.

First we recommend that this extension be made khr since it is foundational to other extensions including external memory.
Image_tiling_control defines 2 new properties for image tiling.

CL_IMAGE_TILING_LINEAR_KHR and CL_IMAGE_TILING_OPTIMAL_KHR

These properties can be used when calling clCreateImageWithProperties and bring clarity to the image layout when used.

Optimal tiling can be thought of as a superset of linear tiling. What it does it to let implementations decide how the image should be tiled. An implementation can continue to use linear tiling even if the application selects CL_IMAGE_TILING_OPTIMAL_KHR With that in mind, we believe all implementations should be able to support both linear and optimal tiling.

We also recommend that all implementations be required to support device access to optimally tiled and linear images. It seems that device access is a minimum for the image to be useful. Regarding host access, we propose that the default behavior be to present a linear view of the image data when clEnqueueMapBuffer is called. This means that implementations will need to detile optimally tiled images ( if needed ) when clEnqueueMapBuffer is called. If this is problematic for some vendors, we can add a cl_device_info param that lets the application know if the implementation can detile optimally tiled images when clEnqueueMapBuffer is called.

For many use cases having direct access to tiled image data is useful. For this reason, we are proposing a new flag CL_MAP_IMAGE_NO_DETILE_KHR. When used in clEnqueueMapImage, the implementation will not detile the data for host access.

Thanks,
Balaji Calidas
Qualcomm

Change-Id: I6c391b5ecaa203f7db566db68a7e3d124d6038a2
Signed-off-by: Kevin Petit <[email protected]>
@kpet kpet force-pushed the cl-ext-image-tiling-control branch from 938e085 to e781646 Compare January 17, 2023 15:54
@kpet
Copy link
Contributor Author

kpet commented Jan 17, 2023

I've rebased the specification draft and captured what I believe to be an exhaustive list of all the open topics in the PR description. We seem to be mostly aligned on where this extension should go.

@bcalidas
Copy link

bcalidas commented Apr 1, 2023

We uploaded an update draft that has the following major modifications.

  1. Implementations supporting this extension will support both linear and optimal image layouts. Optimal layouts can be mapped to linear by an implementation.
  2. Allows for per image query of possible tiling modes, actual tiling mode and behavior of EnqueueMapImage.
  3. By default EnqueueMapImage will present a linear view of the image data. However, for some images ( likely extension images as opposed to core image formats ) a linear view will not be guaranteed when mapping the image. Instead the raw image data will be presented.
  4. Applying this extension to images that are created without explicitly specifying the tiling -
    a) Most images will be considered to have optimal tiling. This is the default. When these images are mapped, a linear view of the image data will be presented.
    b) When an image is created from a buffer, the image is considered to have linear tiling if a row pitch and/or slice pitch are specified.

Thanks,
Balaji

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants