

Does it support batch inference? #16

Open
SkyAndCloud opened this issue May 23, 2024 · 1 comment

Comments

@SkyAndCloud

Hi guys, thank you for this excellent work!
It seems that this code does not take the attention_mask into account during ChunkLlama inference. Does this code support batch inference?

@ChenxinAn-fdu
Contributor

Yes! We use flash_attn_func for better efficiency and simplicity. Changing to flash_attn_varlen_func should not be difficult. If you encounter any difficulties, please feel free to leave a comment.
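A minimal sketch of what that swap could look like, assuming queries/keys/values in the usual `(batch, seqlen, nheads, headdim)` layout and a boolean padding mask of shape `(batch, seqlen)`. This is not the repository's actual patch; the wrapper name `batched_flash_attention` is illustrative, and the `unpad_input`/`pad_input` helpers come from `flash_attn.bert_padding` (their return arity differs slightly across flash-attn versions, hence the `[:4]` slice):

```python
import torch
from flash_attn import flash_attn_varlen_func
from flash_attn.bert_padding import unpad_input, pad_input

def batched_flash_attention(q, k, v, attention_mask, causal=True):
    """Hypothetical wrapper: run varlen flash attention on a right-padded batch.

    q, k, v: (batch, seqlen, nheads, headdim)
    attention_mask: (batch, seqlen) bool, True for real tokens.
    """
    batch, seqlen = q.shape[:2]

    # Drop padding tokens and pack all sequences into one flat tensor.
    # cu_seqlens marks where each sequence starts/ends in the packed layout.
    q_unpad, indices, cu_seqlens, max_seqlen = unpad_input(q, attention_mask)[:4]
    k_unpad = unpad_input(k, attention_mask)[0]
    v_unpad = unpad_input(v, attention_mask)[0]

    out_unpad = flash_attn_varlen_func(
        q_unpad, k_unpad, v_unpad,
        cu_seqlens_q=cu_seqlens, cu_seqlens_k=cu_seqlens,
        max_seqlen_q=max_seqlen, max_seqlen_k=max_seqlen,
        causal=causal,
    )

    # Scatter the packed output back to the padded (batch, seqlen, ...) layout.
    return pad_input(out_unpad, indices, batch, seqlen)
```

Because `flash_attn_varlen_func` operates on packed sequences delimited by `cu_seqlens`, padding tokens never enter the attention computation, which is what makes batches of unequal lengths work.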
