-
Notifications
You must be signed in to change notification settings - Fork 547
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP acceleration structure API proposal #3544
base: master
Are you sure you want to change the base?
Conversation
src/hal/src/device.rs
Outdated
// - write_acceleration_structures_properties | ||
|
||
// TODO for checking if a serialized blob is valid with the current driver version. DX12 docs say this is for PIX tooling and building a as from scratch is "likely to be faster than loading one from disk" (https://docs.microsoft.com/en-us/windows/win32/api/d3d12/ne-d3d12-d3d12_raytracing_acceleration_structure_copy_mode). The Vulkan spec/literature doesn't mention perf implications of serialization. | ||
// - get_device_acceleration_structure_compatibility |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function operates basically the same in Vulkan (VkGetDeviceAccelerationStructureCompatibility) and DX (CheckDriverMatchingIdentifier). Both take a pointer to the serialized data and read two GUIDs (32 bytes) to see if it matches the current driver (DX can signal slightly more info than VK).
On the comment in AccelerationStructureCopyMode
, I wonder if serializing an acceleration structure is something we need to expose as a part of the API. I can't find any literature on why one would want to do this in Vulkan, but as said mentioned link, DX mainly uses serialization for PIX and not to save on build time. If this is also true for Vulkan (like RenderDoc wants it or something), we may be able to leave this off the API and handle the debug tooling case as a backend implementation detail (i.e. I don't think people are writing debug tooling targetting the gfx api yet)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't you want a ray-traced scene to load faster if the data like this is loaded from disk?
I'd be totally fine just postponing any serialization stuff if you don't consider it useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know of a vulkan resource where we can ask about the perf implications of serialization (or just general "best practices")? The DX docs (for D3D12_RAYTRACING_ACCELERATION_STRUCTURE_COPY_MODE_DESERIALIZE
) seem to imply this isn't a perf win:
This mode is only intended for tools such as PIX, though nothing stops any app from using it, but this mode is only permitted when developer mode is enabled in the OS. This copy operation is not intended to be used for caching acceleration structures, because running a full acceleration structure build is likely to be faster than loading one from disk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Khronos Slack would be the best place to ask
} | ||
} | ||
|
||
// TODO AFAIK rust doesn't have custom sized fields, so we'll need some binary writer wrapper to actually support this at an API level. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe packed_struct
or Pre-RFC: Generic integers (uint and int) could be applicable here? Unsure about the likelihood adding a library dependency or using an unimplemented language feature proposal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gfx-hal is a low-level API. It's fine to cut some corners and just document what bits are expected within u32
.
Extra dependencies are discouraged
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My concern here is that we probably want this struct to be bit-for-bit compatible with VkAccelerationStructureInstanceKHR
and D3D12_RAYTRACING_INSTANCE_DESC
so users can create an array of acceleration_structure::Instance
and upload it directly to a GPU buffer. Otherwise, we need to settle for describing this carefully and letting users BYO packed struct impl... thoughts?
The Metal backend will probably have to do something special to repack into its own special format, however
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did mean preserving the bit-by-bit if this needs to go into the driver. Just instead of doing bitfields, we could have raw unsigned integers with documented layouts - that's what I meant by cutting the corners.
|
||
#[derive(Debug)] | ||
pub enum AccelerationStructureCopyMode { | ||
Clone, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bikeshed: "clone" is the vk terminology, but I think in Rust it implies a deep copy, which I don't believe is what the acceleration structure copy command does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, let's rename. Shallow
? ShallowClone
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any strong opinions against just Copy
? I do think simpler is probably better, but I don't think even Clone
would confuse many people familiar with gfx/vulkan...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No strong opinion, Copy
is fine :)
src/hal/src/command/mod.rs
Outdated
@@ -604,6 +604,46 @@ pub trait CommandBuffer<B: Backend>: fmt::Debug + Any + Send + Sync { | |||
/// Requests a timestamp to be written. | |||
unsafe fn write_timestamp(&mut self, stage: pso::PipelineStage, query: query::Query<B>); | |||
|
|||
/// TODO docs | |||
/// `build_range_infos` must be the same length as `infos` and each element must have a length equal to the parallel `info` entry's `geometries.len()`. there's probably a way to make this safer without mangling the api shape too much... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These (and build_acceleration_structures_indirect) are some spicy requirements, but the API is unsafe, so I guess it could be argued to be reasonable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If they need to be the same length, and we aren't passing the pointers to these arrays directly in any of the backends anyway, why not make this one array? I.e.
infos: &[(acceleration_structure::AccelerationStructureGeometryDesc<B>, &[acceleration_structure::AccelerationStructureBuildRangeDesc])],
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, we should probably make this an iterator, like in other commands (see wait_events
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any suggestions on the extra requirement (also for the indirect variant) of:
for (info, build_range_infos) in infos.iter().zip(build_range_infos.iter()) {
assert_eq!(info.geometries.len(), build_range_infos.len());
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
an assert_eq
sounds ok, for now at least.
/// A description of the geometry data needed to populate an acceleration structure. | ||
/// | ||
/// TODO: there's something here that smells w/ what fields are needed to get the required build size vs what fields are needed to actually build. Also, the top/bottom levels having different requirements on which fields are valid. | ||
// TODO: I don't like that this is reused for the "get build sizes" and "do actual build" with complex rules on what fields are ignored when. Perhaps we could use DX12's model for `D3D12_BUILD_RAYTRACING_ACCELERATION_STRUCTURE_DESC` and `D3D12_BUILD_RAYTRACING_ACCELERATION_STRUCTURE_INPUTS` and move at least the src/dst fields onto the actual "build acceleration structure" command.? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the struct I'm most interested in feedback for. Mentioned in the many TODO comments on and around it, this is a kind of central type which most examples I've seen just keep around a mutable version of this and update fields as needed by methods.
I think we can move the src/dst/scratch values onto build_acceleration_structures*
or have another type that has src/dst/scratch and a AccelerationStructureGeometryDesc
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds appropriate, let's try that!
I assume the Vulkan backend would be able to build this stuff internally with no issues if we separate it and follow DX12 style here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That should be correct.
Tangentially related: I'd wager we won't be able to reuse vulkan or dx structs across calls that effectively (i.e. the user will have a gfx *Desc, rather than a vulkan or dx *Desc struct), so I assume that we're okay with recreating these descriptor structs as needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's correct, yes. We'll be recreating them all the time.
TopLevel, | ||
/// | ||
BottomLevel, | ||
// TODO vulkan supports "generic level" (where the concrete build type is specified), but discourages its use for applications "written directly for Vulkan" since it "could affect capabilities or performance in the future" (https://www.khronos.org/blog/vulkan-ray-tracing-final-specification-release). Perhaps this is to better support `vkd3d-proton`, but we probably don't want it exposed in gfx? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically, if there is a possibility of running vkd3d over gfx-portability, then we should try to make it possible.
For example, people could use it to run windows games on macOS.
In this case, we can make a call to not support other levels at least for now, just leaving a comment about this omission.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add this back. I wonder if there's a nicer way to handle Generic
, as it's allowed in the "get sizes" and "create" functions, but not during the "build" commands.
src/hal/src/device.rs
Outdated
// - write_acceleration_structures_properties | ||
|
||
// TODO for checking if a serialized blob is valid with the current driver version. DX12 docs say this is for PIX tooling and building a as from scratch is "likely to be faster than loading one from disk" (https://docs.microsoft.com/en-us/windows/win32/api/d3d12/ne-d3d12-d3d12_raytracing_acceleration_structure_copy_mode). The Vulkan spec/literature doesn't mention perf implications of serialization. | ||
// - get_device_acceleration_structure_compatibility |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't you want a ray-traced scene to load faster if the data like this is loaded from disk?
I'd be totally fine just postponing any serialization stuff if you don't consider it useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like github is eating the comments :(
src/backend/dx12/src/command.rs
Outdated
@@ -2829,6 +2829,75 @@ impl com::CommandBuffer<Backend> for CommandBuffer { | |||
); | |||
} | |||
|
|||
unsafe fn build_acceleration_structures<'a, I>(&self, _descs: I) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are you sure you don't want to just make this a default implementation in the trait, to avoid all the boilerplate in backends that don't support it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've undone the backend stubs for now and will make a default "unimplemented" implementation in gfx-hal
.
720e91a
to
8cdcb17
Compare
Please try to rebase instead of merging, to keep history linear |
I'll do this when I clean up this draft, please call me out if I forget then! |
@kvark the changes in Some things on my mind:
|
Thank you for moving this forward! First of all, we've been trying to wrap up hal-0.7 release for the last month. I'd like to do this before merging the changes here. Also note how a lot of type signatures are simplified now:
We are totally cool with depending on github revisions from master. We only switch to releases when necessary, i.e. during a release. So this is not a blocker to merging the PR. |
@tangmi what's the status of this work? |
The current state is that I have an API shape I'm pretty happy with and am in the middle of fixing some issues in the Vulkan impl (I think the last thing I was working on was overthinking mapping lists-of-lists of structs such that all the lifetimes agree). Unfortunately, life and work has taken me away from this for the last couple months. Also, I (and everyone else) have been having a real hard time getting a hold of a GPU that can test ray query--although if ray tracing pipelines work, it's very likely that ray query will also work. I'm definitely still eager to get back into gfx-rs. On the plus side, I believe |
@tangmi it would be good if gfx-rs organization created a "hardware fund" for cases like this... |
@kvark I'd happily contribute to a fund like that--sadly, I just can't find any ray tracing cards for sale at all this year! (I know there's scalpers around but I feel icky supporting that!) |
@tangmi now with Vulkan's RT support, aren't there any AMD cards on the market? Sorry, I'm not generally shopping for these. |
It would be great to chat on #gfx:matrix.org (or a different platform, if you have a preference?) and see if we can unblock your progress. |
16c5212
to
1f622c9
Compare
Initial API proposal for just the acceleration structure part of adding ray tracing support. Once the API design is set, we can implement (and smoke test) and then move on to adding ray tracing pipelines and ray query feature detection.
There's a lot of open questions I have in the comments and I intend for this draft to be a place to track discussion. @kvark: I intend to drive discussion on individual pieces at
#gfx:matrix.org
, referring to this PR as a reference, so don't worry about reviewing all at once (I'll make sure we get through all of it)!The main theme of nearly all my questions is about how far can we deviate from
VK_KHR_acceleration_structure
to better fitgfx
(and potentially gain some API safety). Additionally, my intention is to make an API that is implemented trivially in Vulkan and pretty easily in DX12. Metal is an afterthought (although it is being thought about!), but I don't expect to make any significant concessions in the API to accommodate MPS.Here's some notes on implementation differences I expect between DX12 and Vulkan, which may be interesting to discuss how we'll solve:
B::AccelerationStructure
vk::AccelerationStructureKHR
ID3D12Buffer
, since there's no handle associated with an acceleration structureash::extensions::khr::AccelerationStructure::write_acceleration_structures_properties
withash::vk::QueryType::ACCELERATION_STRUCTURE_COMPACTED_SIZE_KHR
ID3D12GraphicsCommandList4::EmitRaytracingAccelerationStructurePostbuildInfo
(orBuildRaytracingAccelerationStructure
) and does not use a QueryPool. We can likely modify dx12's B::QueryPool to include a special case for the acceleration structure queries.vk::PhysicalDeviceAccelerationStructureFeaturesKHR::acceleration_structure_host_commands
+ methods onDevice
mint::RowMatrix3x4
. Metal uses a 4x4 matrix (matrix_float4x4
), which I believe maps tomint::RowMatrix4
, given this "Working with Matrices" doc.float3
min, then max to specify the AABB. Metal puts a bounding box on every acceleration structure and doesn't have an AABB-specific variant.VkAccelerationStructureInstanceKHR
D3D12_RAYTRACING_INSTANCE_DESC
VkAccelerationStructureGeometryKHR
andVkAccelerationStructureBuildRangeInfoKHR
at descriptor and build time, respectively. ThevkGetAccelerationStructureBuildSizesKHR
takes a primitive count array that we can use for the DX caseD3D12_RAYTRACING_GEOMETRY_TRIANGLES_DESC
orD3D12_RAYTRACING_GEOMETRY_AABBS_DESC
, and doesn't require any additional data at build time. We will just add the primitive count array values to the geometry info structs as we build them.Related: #2418