
Question: how do I run multi-GPU inference for chatglm2-6b with the PaddleNLP pretrained weights, and is multi-GPU inference supported on Kunlun R200? #9371

Open
xc005 opened this issue Nov 5, 2024 · 0 comments
xc005 commented Nov 5, 2024

Question

When running multi-GPU inference for chatglm2-6b with the PaddleNLP pretrained weights, are there any specific requirements on the GPUs or the environment? I tried it on 4090 GPUs and it failed. Could you help check what is causing the error, e.g. the environment or the multi-GPU inference code? Thanks.

Environment: 4090 GPUs + CUDA 11.8 + paddlepaddle-gpu==3.0.0b1 + PaddleNLP 3.0.0b2.post20241105 (branch: dev_20240926_update_chatglmv2)

Command: python -u -m paddle.distributed.launch --gpus "0,1" glm2_infer.py
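
As a quick sanity check before launching (an illustrative snippet, not part of the original report), the following can confirm that Paddle sees both 4090s and was built against the expected CUDA version; paddle.utils.run_check() prints its own install/communication diagnostics:

'''
import paddle

# Illustrative environment check before attempting multi-GPU inference.
print("visible GPUs:", paddle.device.cuda.device_count())  # expect 2
print("CUDA build:", paddle.version.cuda())                # expect 11.8
paddle.utils.run_check()  # basic install / multi-GPU communication check
'''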

Contents of glm2_infer.py:
'''
import paddle
from paddlenlp.transformers import AutoModelForCausalLM, AutoTokenizer
from paddle.distributed import fleet
import time

tokenizer = AutoTokenizer.from_pretrained('/code/pretrain/pd/THUDM/chatglm2-6b')

# One launcher process per GPU; use the world size as the tensor parallel degree.
tensor_parallel_degree = paddle.distributed.get_world_size()
print(f"***tensor_parallel_degree={tensor_parallel_degree}")

tensor_parallel_rank = 0
if tensor_parallel_degree > 1:
    # Pure model (tensor) parallelism: no data, pipeline, or sharding parallelism.
    strategy = fleet.DistributedStrategy()
    strategy.hybrid_configs = {
        "dp_degree": 1,
        "mp_degree": tensor_parallel_degree,
        "pp_degree": 1,
        "sharding_degree": 1,
    }
    fleet.init(is_collective=True, strategy=strategy)
    hcg = fleet.get_hybrid_communicate_group()
    tensor_parallel_rank = hcg.get_model_parallel_rank()

# Each rank loads its own shard of the checkpoint.
model = AutoModelForCausalLM.from_pretrained(
    "/code/pretrain/pd/THUDM/chatglm2-6b",
    tensor_parallel_degree=tensor_parallel_degree,
    tensor_parallel_rank=tensor_parallel_rank,
    dtype='bfloat16',
)

question = '世界上第二高的山峰是哪座?'
query = f"[Round 0]\n\n问:{question}\n\n答:"
input_ids = tokenizer(query, return_tensors='pd')
# print(model(**input_ids))

b_time = time.time()
output = model.generate(**input_ids, decode_strategy='greedy_search', max_new_tokens=150)
out = tokenizer.decode(output[0][0])  # output is (ids, scores); decode the first generated sequence
print("*" * 45)
print(f"HUMAN:{question}\nAI:{out}\ncost_time:{time.time()-b_time}")

# Second run to measure latency without the first-call warm-up overhead.
b_time = time.time()
output = model.generate(**input_ids, decode_strategy='greedy_search', max_new_tokens=150)
out = tokenizer.decode(output[0][0])
print("*" * 45)
print(f"HUMAN:{question}\nAI:{out}\ncost_time:{time.time()-b_time}")
'''

Run log:
'''
[2024-11-05 06:10:16,275] [ INFO] topology.py:370 - Total 2 sharding comm group(s) create successfully!
I1105 06:10:16.275830 31043 process_group_nccl.cc:150] ProcessGroupNCCL pg_timeout_ 1800000
I1105 06:10:16.275833 31043 process_group_nccl.cc:151] ProcessGroupNCCL nccl_comm_init_option_ 0
I1105 06:10:16.275861 31043 process_group_nccl.cc:150] ProcessGroupNCCL pg_timeout_ 1800000
I1105 06:10:16.275863 31043 process_group_nccl.cc:151] ProcessGroupNCCL nccl_comm_init_option_ 0
[2024-11-05 06:10:16,275] [ INFO] topology.py:290 - HybridParallelInfo: rank_id: 0, mp_degree: 2, sharding_degree: 1, pp_degree: 1, dp_degree: 1, sep_degree: 1, mp_group: [0, 1], sharding_group: [0], pp_group: [0], dp_group: [0], sep:group: None, check/clip group: [0, 1]
[2024-11-05 06:10:16,276] [ INFO] - We are using <class 'paddlenlp.transformers.chatglm_v2.modeling.ChatGLMv2ForCausalLM'> to load '/code/pretrain/pd/THUDM/chatglm2-6b'.
[2024-11-05 06:10:16,277] [ INFO] - Loading configuration file /code/pretrain/pd/THUDM/chatglm2-6b/config.json
[2024-11-05 06:10:16,278] [ INFO] - Loading weights file /code/pretrain/pd/THUDM/chatglm2-6b/model_state.pdparams
[2024-11-05 06:11:34,779] [ INFO] - Starting to convert orignal state_dict to tensor parallel state_dict.
[2024-11-05 06:11:54,819] [ INFO] - Loaded weights file from disk, setting weights to model.
[2024-11-05 06:12:21,429] [ INFO] - All model checkpoint weights were used when initializing ChatGLMv2ForCausalLM.

[2024-11-05 06:12:21,429] [ INFO] - All the weights of ChatGLMv2ForCausalLM were initialized from the model checkpoint at /code/pretrain/pd/THUDM/chatglm2-6b.
If your task is similar to the task the model of the checkpoint was trained on, you can already use ChatGLMv2ForCausalLM for predictions without further training.
[2024-11-05 06:12:21,494] [ INFO] - Generation config file not found, using a generation config created from the model config.
LAUNCH INFO 2024-11-05 06:12:23,289 Pod failed
LAUNCH ERROR 2024-11-05 06:12:23,289 Container failed !!!
Container rank 0 status failed cmd ['/usr/bin/python', '-u', 'glm2_infer.py'] code -7 log log/workerlog.0
LAUNCH INFO 2024-11-05 06:12:23,289 ------------------------- ERROR LOG DETAIL -------------------------
:151] ProcessGroupNCCL nccl_comm_init_option_ 0
[2024-11-05 06:10:14,520] [ INFO] topology.py:370 - Total 2 pipe comm group(s) create successfully!
W1105 06:10:14.534988 31043 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 8.9, Driver API Version: 12.0, Runtime API Version: 11.8
W1105 06:10:14.536003 31043 gpu_resources.cc:164] device: 0, cuDNN Version: 8.9.
/usr/local/lib/python3.10/dist-packages/paddle/distributed/communication/group.py:128: UserWarning: Current global rank 0 is not in group default_pg10
warnings.warn(
[2024-11-05 06:10:16,275] [ INFO] topology.py:370 - Total 2 data comm group(s) create successfully!
I1105 06:10:16.275671 31043 process_group_nccl.cc:150] ProcessGroupNCCL pg_timeout
1800000
I1105 06:10:16.275683 31043 process_group_nccl.cc:151] ProcessGroupNCCL nccl_comm_init_option_ 0
[2024-11-05 06:10:16,275] [ INFO] topology.py:370 - Total 1 model comm group(s) create successfully!
[2024-11-05 06:10:16,275] [ INFO] topology.py:370 - Total 2 sharding comm group(s) create successfully!
I1105 06:10:16.275830 31043 process_group_nccl.cc:150] ProcessGroupNCCL pg_timeout_ 1800000
I1105 06:10:16.275833 31043 process_group_nccl.cc:151] ProcessGroupNCCL nccl_comm_init_option_ 0
I1105 06:10:16.275861 31043 process_group_nccl.cc:150] ProcessGroupNCCL pg_timeout_ 1800000
I1105 06:10:16.275863 31043 process_group_nccl.cc:151] ProcessGroupNCCL nccl_comm_init_option_ 0
[2024-11-05 06:10:16,275] [ INFO] topology.py:290 - HybridParallelInfo: rank_id: 0, mp_degree: 2, sharding_degree: 1, pp_degree: 1, dp_degree: 1, sep_degree: 1, mp_group: [0, 1], sharding_group: [0], pp_group: [0], dp_group: [0], sep:group: None, check/clip group: [0, 1]
[2024-11-05 06:10:16,276] [ INFO] - We are using <class 'paddlenlp.transformers.chatglm_v2.modeling.ChatGLMv2ForCausalLM'> to load '/code/pretrain/pd/THUDM/chatglm2-6b'.
[2024-11-05 06:10:16,277] [ INFO] - Loading configuration file /code/pretrain/pd/THUDM/chatglm2-6b/config.json
[2024-11-05 06:10:16,278] [ INFO] - Loading weights file /code/pretrain/pd/THUDM/chatglm2-6b/model_state.pdparams
[2024-11-05 06:11:34,779] [ INFO] - Starting to convert orignal state_dict to tensor parallel state_dict.
[2024-11-05 06:11:54,819] [ INFO] - Loaded weights file from disk, setting weights to model.
[2024-11-05 06:12:21,429] [ INFO] - All model checkpoint weights were used when initializing ChatGLMv2ForCausalLM.

[2024-11-05 06:12:21,429] [ INFO] - All the weights of ChatGLMv2ForCausalLM were initialized from the model checkpoint at /code/pretrain/pd/THUDM/chatglm2-6b.
If your task is similar to the task the model of the checkpoint was trained on, you can already use ChatGLMv2ForCausalLM for predictions without further training.
[2024-11-05 06:12:21,494] [ INFO] - Generation config file not found, using a generation config created from the model config.
LAUNCH INFO 2024-11-05 06:12:23,290 Exit code -7
'''
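
For reference (an observation added here, not confirmed by the original report): paddle.distributed.launch surfaces the worker's return code, and a negative value such as -7 typically means the process was killed by that signal, and signal 7 is SIGBUS on Linux. One common trigger for SIGBUS in NCCL-based multi-GPU runs inside containers is an undersized /dev/shm. A minimal, illustrative check:

'''
import shutil

# Illustrative check: NCCL's shared-memory transport can fail with SIGBUS
# when /dev/shm is too small (e.g. Docker's default of 64 MB).
total, used, free = shutil.disk_usage("/dev/shm")
print(f"/dev/shm total={total / 2**30:.2f} GiB, free={free / 2**30:.2f} GiB")
'''

If shared memory does turn out to be small, enlarging it when starting the container (e.g. --shm-size) or setting NCCL_SHM_DISABLE=1 are the usual workarounds; whether that is the cause here would still need to be confirmed against log/workerlog.0.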
