Refactor the metal backend to always reuse command encoders/buffers unless a shared memory access is requested #2037

Draft
wants to merge 44 commits into
base: main
44 commits
b2ead34
oh my god it works
tomsanbear Apr 10, 2024
adaf302
fix other packages
tomsanbear Apr 10, 2024
bb594bc
refactor metal backend to split storage and device files for readability
tomsanbear Apr 10, 2024
73088cc
Merge branch 'main' of github.com:huggingface/candle into CommandEnco…
tomsanbear Apr 10, 2024
ba0aaa9
Revert "Merge branch 'main' of github.com:huggingface/candle into Com…
tomsanbear Apr 10, 2024
826b16d
Revert "refactor metal backend to split storage and device files for …
tomsanbear Apr 10, 2024
b1e09e6
undo setting change
tomsanbear Apr 10, 2024
90e6909
undo messed up revert
tomsanbear Apr 10, 2024
591e94c
undo messed up revert
tomsanbear Apr 10, 2024
0879f67
enable metal tracing option
tomsanbear Apr 10, 2024
789510e
fix kernel tests
tomsanbear Apr 10, 2024
93abe5c
rearrange the spot where we call wait until completed
tomsanbear Apr 10, 2024
843326c
remove unused import
tomsanbear Apr 10, 2024
1a5792a
reintroduce drop unused buffers
tomsanbear Apr 10, 2024
cf5e2fc
simplify even further by removing creation of buffers during copy ope…
tomsanbear Apr 10, 2024
c51c71e
fix synchronization method
tomsanbear Apr 10, 2024
4067bdd
cleanup memory barriers to only depend on inputs
tomsanbear Apr 10, 2024
41bb216
refactor command encoder and buffer to be lazily initialized
tomsanbear Apr 11, 2024
0a1d47e
add synchronize back in
tomsanbear Apr 11, 2024
ef69643
clean up documentation
tomsanbear Apr 11, 2024
79647c8
simplify the to_cpu op
tomsanbear Apr 11, 2024
912755d
Merge branch 'main' of https://github.com/huggingface/candle into Com…
tomsanbear Apr 11, 2024
3577e24
Merge branch 'main' of https://github.com/huggingface/candle into Com…
tomsanbear Apr 12, 2024
f57a24f
cleanup constructor for the metal device
tomsanbear Apr 12, 2024
071199c
Merge branch 'main' of https://github.com/huggingface/candle into Com…
tomsanbear Apr 13, 2024
5ba51c1
update docs explaining usage of the end_compute_encoding function
tomsanbear Apr 13, 2024
74a265f
remove unnecessary memory barrier
tomsanbear Apr 13, 2024
c026881
review feedback changes on command encoder/buffer internals
tomsanbear Apr 14, 2024
7d28704
prevent oom errors
tomsanbear Apr 14, 2024
4089486
actually increment
tomsanbear Apr 14, 2024
6fcfdde
allocate data directly
tomsanbear Apr 14, 2024
675ef87
Merge branch 'main' of https://github.com/huggingface/candle into Com…
tomsanbear Apr 14, 2024
f988a52
change to independent command buffers for blit encoders
tomsanbear Apr 14, 2024
572e269
update ones impl to use direct buffer allocation
tomsanbear Apr 14, 2024
566bcb2
Merge branch 'main' of https://github.com/huggingface/candle into Com…
tomsanbear Apr 28, 2024
4f73ca1
Merge branch 'main' of https://github.com/huggingface/candle into Com…
tomsanbear May 4, 2024
b8b52ce
update impl to use command encoder
tomsanbear May 4, 2024
0d8f4f2
use sat sub
tomsanbear May 4, 2024
3add705
fix zero tests
tomsanbear May 4, 2024
b161320
fix
tomsanbear May 4, 2024
042e001
adjust buffer behaviour:
tomsanbear May 4, 2024
54c7e9b
Merge branch 'main' of https://github.com/huggingface/candle into Com…
tomsanbear May 6, 2024
53ac546
clippy fix
tomsanbear May 6, 2024
c4e2434
format
tomsanbear May 6, 2024
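
The PR title and commit 41bb216 ("refactor command encoder and buffer to be lazily initialized") describe a policy: keep reusing one command encoder until a shared memory access forces pending work to be flushed. The rendered diff for the Metal backend is not available on this page, so the following is only a minimal sketch of that policy using mock types. `MockDevice`, `MockEncoder`, `dispatch`, `flush`, and `read_to_cpu` are hypothetical names invented for illustration; they are not candle's or the `metal` crate's API.

```rust
// Hypothetical stand-ins: none of these types come from candle or the `metal` crate.
#[derive(Debug, Default)]
struct MockEncoder {
    // Names of the kernels recorded into this encoder.
    dispatched: Vec<String>,
}

#[derive(Debug, Default)]
struct MockDevice {
    // Created lazily on the first dispatch and reused afterwards.
    encoder: Option<MockEncoder>,
    // How many encoders have been created so far.
    encoders_created: usize,
    // How many batches of work have been committed so far.
    commits: usize,
}

impl MockDevice {
    /// Returns the current encoder, creating one only if none is active.
    fn encoder(&mut self) -> &mut MockEncoder {
        if self.encoder.is_none() {
            self.encoders_created += 1;
        }
        self.encoder.get_or_insert_with(MockEncoder::default)
    }

    /// Records a kernel into the shared encoder without committing anything.
    fn dispatch(&mut self, kernel: &str) {
        self.encoder().dispatched.push(kernel.to_string());
    }

    /// Ends the active encoder, if any, and returns how many kernels it held.
    fn flush(&mut self) -> usize {
        match self.encoder.take() {
            Some(enc) => {
                self.commits += 1;
                enc.dispatched.len()
            }
            None => 0,
        }
    }

    /// Models a CPU read of shared memory: pending work must be flushed first.
    fn read_to_cpu(&mut self) -> usize {
        self.flush()
    }
}
```

In this sketch, three consecutive `dispatch` calls create a single encoder and commit nothing; only `read_to_cpu` ends the encoder and commits the batch, after which the next `dispatch` lazily creates a fresh one.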
2 changes: 1 addition & 1 deletion candle-core/benches/benchmarks/mod.rs
@@ -25,7 +25,7 @@ impl BenchDevice for Device {
             }
             Device::Metal(device) => {
                 #[cfg(feature = "metal")]
-                return Ok(device.wait_until_completed()?);
+                return Ok(device.synchronize()?);
                 #[cfg(not(feature = "metal"))]
                 panic!("Metal device without metal feature enabled: {:?}", device)
             }
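
The hunk above switches the benchmark's sync hook from `wait_until_completed()` to `synchronize()`. A benchmark needs some such blocking call because submitted GPU work runs asynchronously: without it, the timer would stop before the work has finished. The sketch below illustrates that idea with a plain worker thread standing in for the GPU queue. `MockQueue`, `Msg`, `submit`, and `completed` are hypothetical names, and this is not how candle implements `synchronize`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::{channel, Sender};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

// Hypothetical message type for the mock queue.
enum Msg {
    // A unit of work that takes the given time to run.
    Work(Duration),
    // A fence: the worker replies once everything queued before it has run.
    Fence(Sender<()>),
}

// Hypothetical stand-in for an asynchronous device queue.
struct MockQueue {
    tx: Sender<Msg>,
    completed: Arc<AtomicUsize>,
}

impl MockQueue {
    fn new() -> Self {
        let (tx, rx) = channel::<Msg>();
        let completed = Arc::new(AtomicUsize::new(0));
        let done = Arc::clone(&completed);
        // The worker processes messages in order until every sender is dropped.
        thread::spawn(move || {
            for msg in rx {
                match msg {
                    Msg::Work(cost) => {
                        thread::sleep(cost);
                        done.fetch_add(1, Ordering::SeqCst);
                    }
                    Msg::Fence(reply) => {
                        let _ = reply.send(());
                    }
                }
            }
        });
        Self { tx, completed }
    }

    /// Enqueues work and returns immediately, before the work has run.
    fn submit(&self, cost: Duration) {
        self.tx.send(Msg::Work(cost)).expect("worker thread is alive");
    }

    /// Blocks until all previously submitted work has finished.
    fn synchronize(&self) {
        let (reply_tx, reply_rx) = channel();
        self.tx
            .send(Msg::Fence(reply_tx))
            .expect("worker thread is alive");
        reply_rx.recv().expect("worker thread is alive");
    }

    /// Number of work items that have finished so far.
    fn completed(&self) -> usize {
        self.completed.load(Ordering::SeqCst)
    }
}
```

Because the channel preserves order, the fence reply can only arrive after every earlier `Work` message has been processed, so reading `completed()` right after `synchronize()` reflects all submitted items. Reading it right after `submit` alone gives no such guarantee.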
1 change: 0 additions & 1 deletion candle-core/src/convert.rs
@@ -1,7 +1,6 @@
 //! Implement conversion traits for tensors
 use crate::{DType, Device, Error, Tensor, WithDType};
 use half::{bf16, f16, slice::HalfFloatSliceExt};
-use std::convert::TryFrom;
 
 impl<T: WithDType> TryFrom<&Tensor> for Vec<T> {
     type Error = Error;
261 changes: 188 additions & 73 deletions candle-core/src/metal_backend/device.rs

Large diffs are not rendered by default.
