The reason is that when computing the attention, some elements of the Q·K multiplication become extremely large (~3e5) at line 319 in class SparseCausalAttention. But when I switched to Stable Diffusion v1.4, the issue no longer appears.
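For context, magnitudes around 3e5 exceed what fp16 can represent (its maximum finite value is about 65504), so the scores overflow to inf before the softmax. A minimal check of this, with a hypothetical tensor standing in for the attention scores:

```python
import torch

# fp16 cannot represent values near the ~3e5 magnitudes reported above.
print(torch.finfo(torch.float16).max)   # 65504.0

scores = torch.tensor([3e5], dtype=torch.float32)
print(scores.to(torch.float16))         # tensor([inf], dtype=torch.float16)
```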
```python
# attention, what we cannot get enough of
if self._use_memory_efficient_attention_xformers:
    hidden_states = self._memory_efficient_attention_xformers(query, key, value, attention_mask)
    # Some versions of xformers return output in fp32, cast it back to the dtype of the input
    hidden_states = hidden_states.to(query.dtype)
else:
    if self._slice_size is None or query.shape[0] // self._slice_size == 1:
        hidden_states = self._attention(query, key, value, attention_mask)
    else:
        hidden_states = self._sliced_attention(query, key, value, sequence_length, dim, attention_mask)

# linear proj
hidden_states = self.to_out[0](hidden_states)
# dropout
hidden_states = self.to_out[1](hidden_states)
return hidden_states
```
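One common workaround when the model weights themselves are fine is to upcast the score computation to fp32 and only cast back after the softmax. A minimal sketch of that idea, assuming `query` and `key` are the per-head tensors and `scale` is the usual 1/sqrt(head_dim) factor (function name and signature are illustrative, not the repository's actual API):

```python
import torch

def attention_probs_fp32(query: torch.Tensor, key: torch.Tensor, scale: float) -> torch.Tensor:
    # Hypothetical sketch: compute Q·K^T and the softmax in float32 so large score
    # magnitudes do not overflow fp16, then cast the probabilities back to the
    # original dtype for the rest of the forward pass.
    scores = torch.matmul(query.float(), key.float().transpose(-1, -2)) * scale
    probs = scores.softmax(dim=-1)
    return probs.to(query.dtype)
```

This does not change the attention math, only the precision in which the intermediate scores are held, which is why switching to a checkpoint with smaller activations (e.g. Stable Diffusion v1.4) can also make the symptom disappear.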