
Perplexity validation on PG19 error and Passkey Retrieval error #19

Open · khfs opened this issue Jul 1, 2024 · 8 comments

@khfs commented Jul 1, 2024

I followed the environment setup in the README exactly. When performing perplexity validation on PG19, the only differences from the original code are that I loaded the model from a local path and set the device to 'cpu' to see the exact error messages. My command line was:

```
python test_ppl.py --seq_len 16384 --scale 7b --data_path pg19_llama2.validation.bin
```

The terminal output was:

```
The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00, 1.88it/s]
Test PPL on seq length 16384
  0%|          | 0/9446 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "test_ppl.py", line 102, in <module>
    evaluate_ppl_all(seq_length=args.seq_len, sliding_window=256, args=args, model=model, data=data)
  File "test_ppl.py", line 58, in evaluate_ppl_all
    outputs = model(
  File "/data4/xylu/miniconda3/envs/chunkllama/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data4/xylu/miniconda3/envs/chunkllama/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data4/xylu/miniconda3/envs/chunkllama/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 1183, in forward
    outputs = self.model(
  File "/data4/xylu/miniconda3/envs/chunkllama/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data4/xylu/miniconda3/envs/chunkllama/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data4/xylu/miniconda3/envs/chunkllama/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 1027, in forward
    inputs_embeds = self.embed_tokens(input_ids)
  File "/data4/xylu/miniconda3/envs/chunkllama/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data4/xylu/miniconda3/envs/chunkllama/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data4/xylu/miniconda3/envs/chunkllama/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 163, in forward
    return F.embedding(
  File "/data4/xylu/miniconda3/envs/chunkllama/lib/python3.8/site-packages/torch/nn/functional.py", line 2264, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self
```

When performing Passkey Retrieval, the only difference from the original code is that I loaded the model from a local path. My command line was:

```
python test_passkey.py --seq_len 16384 --scale 7b
```

The terminal output was:

```
Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00, 11.13it/s]
Traceback (most recent call last):
  File "test_passkey.py", line 123, in <module>
    main(args)
  File "test_passkey.py", line 77, in main
    model = load_checkpoint_and_dispatch(model, checkpoint=model_path,
  File "/data4/xylu/miniconda3/envs/chunkllama/lib/python3.8/site-packages/accelerate/big_modeling.py", line 607, in load_checkpoint_and_dispatch
    load_checkpoint_in_model(
  File "/data4/xylu/miniconda3/envs/chunkllama/lib/python3.8/site-packages/accelerate/utils/modeling.py", line 1705, in load_checkpoint_in_model
    raise ValueError(
ValueError: /data3/xylu/checkpoints/NousResearch/Llama-2-7b-hf containing more than one .index.json file, delete the irrelevant ones.
```
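
A likely cause, assuming the local folder mirrors the NousResearch/Llama-2-7b-hf repo, is that it contains both a pytorch_model.bin.index.json and a model.safetensors.index.json, so accelerate cannot decide which sharded checkpoint to load. A minimal workaround sketch, not confirmed by the maintainers: load_checkpoint_and_dispatch also accepts the path of one specific .index.json file, which removes the ambiguity without deleting anything (the index filename below is an assumption about the local layout).

```python
# Hedged workaround sketch: point accelerate at one specific index file
# instead of the checkpoint folder. `model` is the model instance built
# earlier in test_passkey.py; the index filename is an assumption about
# which weight format you want to load.
import os
from accelerate import load_checkpoint_and_dispatch

model_path = "/data3/xylu/checkpoints/NousResearch/Llama-2-7b-hf"
index_file = os.path.join(model_path, "model.safetensors.index.json")
model = load_checkpoint_and_dispatch(
    model,
    checkpoint=index_file,  # a .index.json path is accepted by accelerate
    device_map="auto",
    no_split_module_classes=["LlamaDecoderLayer"],
)
```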

@ChenxinAn-fdu (Contributor) commented

Hi! Have you verified that your code runs without the ChunkLlama monkey patch?
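
For context, the "monkey patch" is ChunkLlama's attention replacement, applied before the model is loaded. A minimal sketch of the toggle being asked about, based on the call quoted later in this thread (the import path is an assumption about the repo layout):

```python
# Apply ChunkLlama's chunked-attention patch BEFORE instantiating the model.
# Commenting this call out falls back to vanilla Llama attention, which is
# useful for isolating data/loading bugs, but long-context PPL will then be
# meaningless because the model only has its 4096-token pretraining window.
from chunkllama_attn_replace import replace_with_chunkllama  # import path assumed

pretraining_length = 4096  # Llama-2's pretraining context window
replace_with_chunkllama(pretraining_length, pretraining_length // 4)
```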

@khfs (Author) commented Jul 2, 2024

Yes, I commented out line 85 in test_ppl.py and got the same error message. Later, after comparing this code with LongLoRA's eval.py, I found that changing np.uint32 to np.uint16 on line 98 of test_ppl.py allowed the code to run. However, the result for ChunkLlama2 7B was {"seq_len": 16384, "gpu": "1", "data_path": "pg19_llama2.validation.bin", "scale": "7b", "pretraining_length": 4096, "ppl": 1803.4413082318101}. That PPL is obviously not correct, and I don't know what other issues exist in the code.

@ChenxinAn-fdu (Contributor) commented Jul 3, 2024

Thank you for letting me know! I think this issue was caused by mistakenly uploading files tokenized with the Llama 3 tokenizer. I will check it right now.

@ChenxinAn-fdu (Contributor) commented

Hi! Changing

`data = {'val': np.memmap(data_path, dtype=np.uint32, mode='r')}`

to

`data = {'val': np.memmap(data_path, dtype=np.uint16, mode='r')}`

works for me. Remember not to comment out this line: `replace_with_chunkllama(args.pretraining_length, args.pretraining_length//4)`
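
For anyone hitting the same IndexError, a minimal sketch of why the dtype matters, assuming pg19_llama2.validation.bin stores 2-byte Llama-2 token ids (vocab size 32000 < 2^16):

```python
# Minimal sketch: reading 2-byte token ids as uint32 fuses adjacent token
# pairs into values far beyond the vocab size, which is exactly what raised
# "IndexError: index out of range in self" inside F.embedding.
import numpy as np

data = {"val": np.memmap("pg19_llama2.validation.bin", dtype=np.uint16, mode="r")}

# Sanity check before evaluating: every token id must index a valid row of
# the embedding table.
assert int(data["val"].max()) < 32000, "token ids exceed the Llama-2 vocab size"
```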

@khfs (Author) commented Jul 3, 2024

Although this allows the code to run, the result I obtained is {"seq_len": 16384, "gpu": "1", "data_path": "pg19_llama2.validation.bin", "scale": "7b", "pretraining_length": 4096, "ppl": 1803.4413082318101}, where the PPL is far too high. I therefore believe there is still an issue with the code. I am curious what results you get.

@ChenxinAn-fdu (Contributor) commented

I have updated the code. Please try the newest version 🤣.

@khfs (Author) commented Jul 3, 2024

Thank you for your response regarding the PG19 validation; I am currently testing the latest version of the code. How can I resolve the passkey retrieval error?

@ChenxinAn-fdu (Contributor) commented

test_passkey.py has also been updated!
