
Commit

test
remove other

add max test

fix bug

add no prompt cache

add time cost test

add autotune

add all
sufubao committed Sep 25, 2024
1 parent 377a882 commit 3854d32
Showing 3 changed files with 634 additions and 171 deletions.
@@ -143,6 +143,7 @@ def _context_attention_kernel(

    o_tensor = self.alloc_tensor(q.shape, q.dtype) if out is None else out
    if infer_state.use_dynamic_prompt_cache:
        kv = infer_state.mem_manager.kv_buffer[self.layer_num_]

    context_attention_fwd(
        q.view(-1, self.tp_q_head_num_, self.head_dim_),
        kv[:, 0 : self.tp_k_head_num_, :],
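For context on the slice in the hunk above: `kv[:, 0 : self.tp_k_head_num_, :]` suggests a fused per-layer KV buffer in which the key heads and value heads sit side by side along the head axis, so K and V are taken as zero-copy slices before the attention kernel is called. A minimal NumPy sketch of that layout (all names, shapes, and the fused-buffer assumption are hypothetical, not taken from the repository; the real code operates on torch tensors):

```python
import numpy as np

# Hypothetical sizes; in the real code these come from the model config
# (e.g. tp_k_head_num_, head_dim_).
num_tokens, k_heads, v_heads, head_dim = 8, 2, 2, 16

# Assumed fused per-layer KV buffer: K heads and V heads stacked along
# the head axis, shape [num_tokens, k_heads + v_heads, head_dim].
kv = np.random.rand(num_tokens, k_heads + v_heads, head_dim).astype(np.float32)

# Mirrors the diff's slicing: the first k_heads entries are keys,
# the remaining entries are values. Both are views, so no data is copied.
k = kv[:, 0:k_heads, :]
v = kv[:, k_heads : k_heads + v_heads, :]

assert k.shape == (num_tokens, k_heads, head_dim)
assert v.shape == (num_tokens, v_heads, head_dim)
assert np.shares_memory(k, kv) and np.shares_memory(v, kv)
```

Slicing views rather than copying matters here because the KV buffer for a long prompt can be large, and the attention kernel only needs read access to the key and value regions.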
