accuracy and performance of bfloat16 with bitblas linear #161
Comments
Hi @AbedKhateeb2, a bfloat16-related test can be found at https://github.com/microsoft/BitBLAS/blob/main/testing/python/operators/test_general_matmul_bf16.py. Would you mind providing a simple unit test to reproduce the issue? I cannot access the layer you mentioned in your problem.
Thank you @LeiWang1999 for your response 😃

```python
import os
import time

import torch
import torch.nn as nn
import torchvision.models as models
from bitblas import Linear as BitBLASLinear

os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"

# Load a pre-trained VGG-16 model and take one of its linear layers
vgg16 = models.vgg16(pretrained=True)
linear_layer = vgg16.classifier[3].to(torch.bfloat16).cuda()
linear_layer.bias = None
print(f"quantizing {linear_layer}")

in_features = linear_layer.in_features
out_features = linear_layer.out_features
opt_M = 1
input_tensor = torch.rand(opt_M, in_features).to(torch.bfloat16).cuda()

# Warm up the torch linear layer for ~1 second
st = time.time()
while time.time() - st < 1.0:
    linear_layer(input_tensor)

# Benchmark torch.nn.Linear
times = 1000
with torch.no_grad():
    start_time = time.time()
    for _ in range(times):
        output_torch = linear_layer(input_tensor).to(torch.bfloat16)
    end_time = time.time()
print(f"torch linear took by avg {(end_time - start_time) / times}")

# Build the BitBLAS linear layer and copy the weights over
start_time = time.time()
bitblas_linear = BitBLASLinear(
    linear_layer.in_features,
    linear_layer.out_features,
    bias=linear_layer.bias is not None,
    opt_M=opt_M,
    accum_dtype='float32',
    A_dtype='bfloat16',
    W_dtype='bfloat16',
)
bitblas_linear.load_and_transform_weight(linear_layer.weight.clone())
if linear_layer.bias is not None:
    bitblas_linear.bias.data = linear_layer.bias.data.clone()

# Warm up the bitblas linear layer for ~1 second (included in the init time)
st = time.time()
while time.time() - st < 1.0:
    bitblas_linear(input_tensor)
end_time = time.time()
print(f"bitblas linear took to init : {(end_time - start_time)} sec")

bitblas_linear.cuda()

# Benchmark the BitBLAS linear layer
with torch.no_grad():
    start_time = time.time()
    for _ in range(times):
        output_bitblas = bitblas_linear(input_tensor)
    end_time = time.time()
print(f"bitblas linear took by avg {(end_time - start_time) / times}")

# Compare the two outputs
print("torch compare : ", torch.mean(torch.abs(output_torch.to(torch.bfloat16) - output_bitblas.to(torch.bfloat16))))
```

The result:
BitBLAS/bitblas/module/__init__.py, lines 267 to 270 at commit 872d6d7:

There does exist a bug in BitBLASLinear that causes input of any datatype to be cast to float16.
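For intuition only, and not the actual BitBLAS code path: if a compiled kernel is configured for bfloat16 but receives data that was silently cast to float16, the raw 16-bit words get interpreted with the wrong exponent/mantissa layout. A hypothetical sketch of that failure mode:

```python
import torch

x = torch.randn(4, dtype=torch.bfloat16)
# Cast to float16, then reinterpret the raw 16-bit words as bfloat16,
# mimicking a kernel that expects bfloat16 but is handed float16 data.
wrong = x.to(torch.float16).view(torch.bfloat16)
print(x)      # original bfloat16 values
print(wrong)  # same bits, wrong format
```

A mismatch of this kind could explain a large discrepancy like the one reported in this issue.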
Hi @AbedKhateeb2, take a look at PR #164. You can check out this fix by installing the upstream bitblas.
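A quick way to sanity-check the fix, sketched under the assumption that `output_torch` and `output_bitblas` from the repro script above are available:

```python
import torch

# Loose tolerances reflect bfloat16's roughly 3 decimal digits of precision;
# with the dtype bug present, this assertion fails by a wide margin.
torch.testing.assert_close(
    output_bitblas.to(torch.float32),
    output_torch.to(torch.float32),
    rtol=1e-2,
    atol=1e-2,
)
```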
I tried to run the bfloat16 linear of BitBLAS, but I got a result that differs from the PyTorch reference.

Output:

```
qunatizing /decoder/block/0/layer/0/SelfAttention/k
torch linear took by avg 7.802581787109375e-05
BitBLAS Operator found in global_operator_cache.
bitblas linear took to init : 1.157283067703247 sec
bitblas linear took by avg 7.474946975708008e-05
torch compare : tensor(2.2344, device='cuda:0', dtype=torch.bfloat16)
```

The linear layer comes from a pretrained model; the model was trained with bf16.

CUDA version: 12.1
GPU: A10G
OS: Ubuntu
BitBLAS version: bitblas==0.0.1.dev15