
[Usage] Dataloader in train code may be wrong. #21

Open
Luoyang144 opened this issue Dec 25, 2023 · 5 comments

Comments

@Luoyang144

Describe the issue

Issue: Dataloader in train code may be wrong.

Command:

deepspeed train.py \
    --deepspeed scripts/zero2.json \
    --model_name_or_path "LLaVA-VL/vicuna-7b-v1.3" \
    --pretrain_mm_mlp_adapter "LLaVA-VL/liuhaotian/llava-v1.5-mlp2x-336px-pretrain-vicuna-7b-v1.5/mm_projector.bin"  \
    --version v1 \
    --data_path data/toy/aug_toy.json,data/toy/merge_toy.json \
    --image_folder data/toy/image \
    --vision_tower openai/clip-vit-large-patch14 \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --bf16 False \
    --output_dir $out_dir \
    --num_train_epochs 3 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 2 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 1000 \
    --save_total_limit 8 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 False \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --dataloader_num_workers 4 \
    --lazy_preprocess True \
    --mm_projector_type mlp2x_gelu

Log:

    trainer = LLaVATrainer(model=model,
TypeError: llava.train.llava_trainer.LLaVATrainer() argument after ** must be a mapping, not NoneType

After debugging the code, I found that make_supervised_data_module may be unfinished, as it doesn't return anything.

def make_supervised_data_module(tokenizer: transformers.PreTrainedTokenizer,
                                data_args) -> Dict:
    """Make dataset and collator for supervised fine-tuning."""
    dataset_cls = LazySupervisedDataset


    #  concat data files
    data_path = data_args.data_path
    data_path_list = [i.strip() for i in data_path.split(',')]
    data_path_list = [x for x in data_path_list if x != ""]

    data_set_list = []
    for data_name in data_path_list:
        assert os.path.exists(data_name), f"{data_name} does not exist"
        new_data_args = copy.deepcopy(data_args)
        new_data_args.data_path = data_name
        train_dataset_i = build_dataset(new_data_args, tokenizer, dataset_cls)
        data_set_list.append(train_dataset_i)
    train_dataset = ConcatDataset(data_set_list)
    print(f"train_dataset size: {len(train_dataset)}")

Any idea?

@kaijieJiao


So, do you have any solution? I met the same error.

@kaijieJiao


I worked around it with the following steps: add a return statement, and replace **data_module with train_dataset=data_module.

But now I run into an out-of-memory (OOM) error. If someone can solve it, please reply to me.
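
In code, that workaround amounts to roughly the following (a sketch based on the description above, reusing the helper names from the snippet in the issue; it may not match the screenshots exactly, and it does not add the data collator that the fix further down does):

def make_supervised_data_module(tokenizer, data_args):
    dataset_cls = LazySupervisedDataset
    data_path_list = [p.strip() for p in data_args.data_path.split(',') if p.strip()]
    data_set_list = []
    for data_name in data_path_list:
        assert os.path.exists(data_name), f"{data_name} does not exist"
        new_data_args = copy.deepcopy(data_args)
        new_data_args.data_path = data_name
        data_set_list.append(build_dataset(new_data_args, tokenizer, dataset_cls))
    return ConcatDataset(data_set_list)  # the missing return

data_module = make_supervised_data_module(tokenizer=tokenizer, data_args=data_args)
trainer = LLaVATrainer(model=model,
                       tokenizer=tokenizer,
                       args=training_args,
                       train_dataset=data_module)  # instead of **data_module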

@Luoyang144
Author

Luoyang144 commented Jan 26, 2024

@kaijieJiao

def make_supervised_data_module(tokenizer: transformers.PreTrainedTokenizer,
                                data_args) -> Dict:
    """Make dataset and collator for supervised fine-tuning."""
    dataset_cls = LazySupervisedDataset

    # concat data files
    data_path = data_args.data_path
    data_path_list = [i.strip() for i in data_path.split(',')]
    data_path_list = [x for x in data_path_list if x != ""]

    data_set_list = []
    for data_name in data_path_list:
        assert os.path.exists(data_name), f"{data_name} does not exist"
        new_data_args = copy.deepcopy(data_args)
        new_data_args.data_path = data_name
        train_dataset_i = build_dataset(new_data_args, tokenizer, dataset_cls)
        data_set_list.append(train_dataset_i)
    train_dataset = ConcatDataset(data_set_list)
    data_collator = DataCollatorForSupervisedDataset(tokenizer=tokenizer)
    print(f"train_dataset size: {len(train_dataset)}")
    return dict(train_dataset=train_dataset,
                eval_dataset=None,
                data_collator=data_collator)


# in train(), build the data module and unpack it into the trainer
data_module = make_supervised_data_module(tokenizer=tokenizer,
                                          data_args=data_args)
trainer = LLaVATrainer(model=model,
                       tokenizer=tokenizer,
                       args=training_args,
                       **data_module)

This follows other code in LLaVA; no guarantee that there will be no issues.

@kaijieJiao


Did you manage to train it successfully?

@pedramaghazadeh

pedramaghazadeh commented Mar 14, 2024

Neither of the solutions above has worked for me. In both cases I faced the same error:

  File "/workspace/tools/LaVA-Plus/./train_mem.py", line 13, in <module>
    train()
  File "/workspace/tools/LLaVA-Plus/llava/train/train.py", line 987, in train
    trainer.train()
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1539, in train
    return inner_training_loop(
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1553, in _inner_training_loop
    train_dataloader = self.get_train_dataloader()
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 850, in get_train_dataloader
    dataloader_params["sampler"] = self._get_train_sampler()
  File "/workspace/tools/LLaVA-Plus/llava/train/llava_trainer.py", line 140, in _get_train_sampler
    lengths = self.train_dataset.modality_lengths
AttributeError: 'ConcatDataset' object has no attribute 'modality_lengths'
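
That AttributeError comes from the length-grouped sampler: llava_trainer.py reads train_dataset.modality_lengths, which torch.utils.data.ConcatDataset does not provide. An untested sketch of one possible workaround, assuming each LazySupervisedDataset exposes lengths and modality_lengths as in upstream LLaVA, is a small ConcatDataset subclass:

from torch.utils.data import ConcatDataset

class ConcatDatasetWithLengths(ConcatDataset):
    """ConcatDataset that also concatenates per-dataset length metadata."""

    @property
    def lengths(self):
        return [l for d in self.datasets for l in d.lengths]

    @property
    def modality_lengths(self):
        return [l for d in self.datasets for l in d.modality_lengths]

Swapping ConcatDatasetWithLengths in for ConcatDataset inside make_supervised_data_module should let the sampler see the combined lengths; alternatively, if the training arguments expose group_by_modality_length as in upstream LLaVA, setting it to False avoids that sampler path entirely.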
