
Question: how do I run multi-GPU inference for chatglm2-6b with the PaddleNLP pretrained weights, and is multi-GPU inference supported on Kunlun R200? #9371

Open
xc005 opened this issue Nov 5, 2024 · 0 comments
xc005 commented Nov 5, 2024

Question

When running multi-GPU inference for chatglm2-6b with the PaddleNLP pretrained weights, are there any specific requirements on the GPUs or the environment? I tried it on 4090 GPUs and it failed. Could you help check what is causing the error, e.g. the environment or the multi-GPU inference code? Thanks.

Environment: 4090 GPUs + CUDA 11.8 + paddlepaddle-gpu==3.0.0b1 + PaddleNLP 3.0.0b2.post20241105 (branch: dev_20240926_update_chatglmv2)

Command: python -u -m paddle.distributed.launch --gpus "0,1" glm2_infer.py
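
As a quick sanity check before launching (an illustrative snippet, not part of the original report), the following can confirm that Paddle sees both 4090s and was built against the expected CUDA version; paddle.utils.run_check() prints its own install/communication diagnostics:

'''
import paddle

# Illustrative environment check before attempting multi-GPU inference.
print("visible GPUs:", paddle.device.cuda.device_count())  # expect 2
print("CUDA build:", paddle.version.cuda())                # expect 11.8
paddle.utils.run_check()  # basic install / multi-GPU communication check
'''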

Contents of glm2_infer.py:
'''
import paddle
from paddlenlp.transformers import AutoModelForCausalLM, AutoTokenizer
from paddle.distributed import fleet
import time

tokenizer = AutoTokenizer.from_pretrained('/code/pretrain/pd/THUDM/chatglm2-6b')

# One launcher process per GPU; use the world size as the tensor parallel degree.
tensor_parallel_degree = paddle.distributed.get_world_size()
print(f"***tensor_parallel_degree={tensor_parallel_degree}")

tensor_parallel_rank = 0
if tensor_parallel_degree > 1:
    # Pure model (tensor) parallelism: no data, pipeline, or sharding parallelism.
    strategy = fleet.DistributedStrategy()
    strategy.hybrid_configs = {
        "dp_degree": 1,
        "mp_degree": tensor_parallel_degree,
        "pp_degree": 1,
        "sharding_degree": 1,
    }
    fleet.init(is_collective=True, strategy=strategy)
    hcg = fleet.get_hybrid_communicate_group()
    tensor_parallel_rank = hcg.get_model_parallel_rank()

# Each rank loads its own shard of the checkpoint.
model = AutoModelForCausalLM.from_pretrained(
    "/code/pretrain/pd/THUDM/chatglm2-6b",
    tensor_parallel_degree=tensor_parallel_degree,
    tensor_parallel_rank=tensor_parallel_rank,
    dtype='bfloat16',
)

question = '世界上第二高的山峰是哪座?'
query = f"[Round 0]\n\n问:{question}\n\n答:"
input_ids = tokenizer(query, return_tensors='pd')
# print(model(**input_ids))

b_time = time.time()
output = model.generate(**input_ids, decode_strategy='greedy_search', max_new_tokens=150)
out = tokenizer.decode(output[0][0])  # output is (ids, scores); decode the first generated sequence
print("*" * 45)
print(f"HUMAN:{question}\nAI:{out}\ncost_time:{time.time()-b_time}")

# Second run to measure latency without the first-call warm-up overhead.
b_time = time.time()
output = model.generate(**input_ids, decode_strategy='greedy_search', max_new_tokens=150)
out = tokenizer.decode(output[0][0])
print("*" * 45)
print(f"HUMAN:{question}\nAI:{out}\ncost_time:{time.time()-b_time}")
'''

Run log:
'''
[2024-11-05 06:10:16,275] [ INFO] topology.py:370 - Total 2 sharding comm group(s) create successfully!
I1105 06:10:16.275830 31043 process_group_nccl.cc:150] ProcessGroupNCCL pg_timeout_ 1800000
I1105 06:10:16.275833 31043 process_group_nccl.cc:151] ProcessGroupNCCL nccl_comm_init_option_ 0
I1105 06:10:16.275861 31043 process_group_nccl.cc:150] ProcessGroupNCCL pg_timeout_ 1800000
I1105 06:10:16.275863 31043 process_group_nccl.cc:151] ProcessGroupNCCL nccl_comm_init_option_ 0
[2024-11-05 06:10:16,275] [ INFO] topology.py:290 - HybridParallelInfo: rank_id: 0, mp_degree: 2, sharding_degree: 1, pp_degree: 1, dp_degree: 1, sep_degree: 1, mp_group: [0, 1], sharding_group: [0], pp_group: [0], dp_group: [0], sep:group: None, check/clip group: [0, 1]
[2024-11-05 06:10:16,276] [ INFO] - We are using <class 'paddlenlp.transformers.chatglm_v2.modeling.ChatGLMv2ForCausalLM'> to load '/code/pretrain/pd/THUDM/chatglm2-6b'.
[2024-11-05 06:10:16,277] [ INFO] - Loading configuration file /code/pretrain/pd/THUDM/chatglm2-6b/config.json
[2024-11-05 06:10:16,278] [ INFO] - Loading weights file /code/pretrain/pd/THUDM/chatglm2-6b/model_state.pdparams
[2024-11-05 06:11:34,779] [ INFO] - Starting to convert orignal state_dict to tensor parallel state_dict.
[2024-11-05 06:11:54,819] [ INFO] - Loaded weights file from disk, setting weights to model.
[2024-11-05 06:12:21,429] [ INFO] - All model checkpoint weights were used when initializing ChatGLMv2ForCausalLM.

[2024-11-05 06:12:21,429] [ INFO] - All the weights of ChatGLMv2ForCausalLM were initialized from the model checkpoint at /code/pretrain/pd/THUDM/chatglm2-6b.
If your task is similar to the task the model of the checkpoint was trained on, you can already use ChatGLMv2ForCausalLM for predictions without further training.
[2024-11-05 06:12:21,494] [ INFO] - Generation config file not found, using a generation config created from the model config.
LAUNCH INFO 2024-11-05 06:12:23,289 Pod failed
LAUNCH ERROR 2024-11-05 06:12:23,289 Container failed !!!
Container rank 0 status failed cmd ['/usr/bin/python', '-u', 'glm2_infer.py'] code -7 log log/workerlog.0
LAUNCH INFO 2024-11-05 06:12:23,289 ------------------------- ERROR LOG DETAIL -------------------------
:151] ProcessGroupNCCL nccl_comm_init_option_ 0
[2024-11-05 06:10:14,520] [ INFO] topology.py:370 - Total 2 pipe comm group(s) create successfully!
W1105 06:10:14.534988 31043 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 8.9, Driver API Version: 12.0, Runtime API Version: 11.8
W1105 06:10:14.536003 31043 gpu_resources.cc:164] device: 0, cuDNN Version: 8.9.
/usr/local/lib/python3.10/dist-packages/paddle/distributed/communication/group.py:128: UserWarning: Current global rank 0 is not in group default_pg10
warnings.warn(
[2024-11-05 06:10:16,275] [ INFO] topology.py:370 - Total 2 data comm group(s) create successfully!
I1105 06:10:16.275671 31043 process_group_nccl.cc:150] ProcessGroupNCCL pg_timeout
1800000
I1105 06:10:16.275683 31043 process_group_nccl.cc:151] ProcessGroupNCCL nccl_comm_init_option_ 0
[2024-11-05 06:10:16,275] [ INFO] topology.py:370 - Total 1 model comm group(s) create successfully!
[2024-11-05 06:10:16,275] [ INFO] topology.py:370 - Total 2 sharding comm group(s) create successfully!
I1105 06:10:16.275830 31043 process_group_nccl.cc:150] ProcessGroupNCCL pg_timeout_ 1800000
I1105 06:10:16.275833 31043 process_group_nccl.cc:151] ProcessGroupNCCL nccl_comm_init_option_ 0
I1105 06:10:16.275861 31043 process_group_nccl.cc:150] ProcessGroupNCCL pg_timeout_ 1800000
I1105 06:10:16.275863 31043 process_group_nccl.cc:151] ProcessGroupNCCL nccl_comm_init_option_ 0
[2024-11-05 06:10:16,275] [ INFO] topology.py:290 - HybridParallelInfo: rank_id: 0, mp_degree: 2, sharding_degree: 1, pp_degree: 1, dp_degree: 1, sep_degree: 1, mp_group: [0, 1], sharding_group: [0], pp_group: [0], dp_group: [0], sep:group: None, check/clip group: [0, 1]
[2024-11-05 06:10:16,276] [ INFO] - We are using <class 'paddlenlp.transformers.chatglm_v2.modeling.ChatGLMv2ForCausalLM'> to load '/code/pretrain/pd/THUDM/chatglm2-6b'.
[2024-11-05 06:10:16,277] [ INFO] - Loading configuration file /code/pretrain/pd/THUDM/chatglm2-6b/config.json
[2024-11-05 06:10:16,278] [ INFO] - Loading weights file /code/pretrain/pd/THUDM/chatglm2-6b/model_state.pdparams
[2024-11-05 06:11:34,779] [ INFO] - Starting to convert orignal state_dict to tensor parallel state_dict.
[2024-11-05 06:11:54,819] [ INFO] - Loaded weights file from disk, setting weights to model.
[2024-11-05 06:12:21,429] [ INFO] - All model checkpoint weights were used when initializing ChatGLMv2ForCausalLM.

[2024-11-05 06:12:21,429] [ INFO] - All the weights of ChatGLMv2ForCausalLM were initialized from the model checkpoint at /code/pretrain/pd/THUDM/chatglm2-6b.
If your task is similar to the task the model of the checkpoint was trained on, you can already use ChatGLMv2ForCausalLM for predictions without further training.
[2024-11-05 06:12:21,494] [ INFO] - Generation config file not found, using a generation config created from the model config.
LAUNCH INFO 2024-11-05 06:12:23,290 Exit code -7
'''
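
For reference (an observation added here, not confirmed by the original report): paddle.distributed.launch surfaces the worker's return code, and a negative value such as -7 typically means the process was killed by that signal, and signal 7 is SIGBUS on Linux. One common trigger for SIGBUS in NCCL-based multi-GPU runs inside containers is an undersized /dev/shm. A minimal, illustrative check:

'''
import shutil

# Illustrative check: NCCL's shared-memory transport can fail with SIGBUS
# when /dev/shm is too small (e.g. Docker's default of 64 MB).
total, used, free = shutil.disk_usage("/dev/shm")
print(f"/dev/shm total={total / 2**30:.2f} GiB, free={free / 2**30:.2f} GiB")
'''

If shared memory does turn out to be small, enlarging it when starting the container (e.g. --shm-size) or setting NCCL_SHM_DISABLE=1 are the usual workarounds; whether that is the cause here would still need to be confirmed against log/workerlog.0.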
