I want to use the roberta_zh tokenizer for a Chinese NER task, using the official Hugging Face Transformers run_ner.py script as a template to run a local Chinese model on local data. After the local dataset is read in via datasets.load_dataset(), the following error is raised:
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/opt/conda/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/app/ner_longformer/run_ner.py", line 600, in
main()
File "/home/app/ner_longformer/run_ner.py", line 427, in main
desc="Running tokenizer on train dataset",
File "/opt/conda/lib/python3.7/site-packages/datasets/arrow_dataset.py", line 1673, in map
desc=desc,
File "/opt/conda/lib/python3.7/site-packages/datasets/arrow_dataset.py", line 185, in wrapper
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/datasets/fingerprint.py", line 397, in wrapper
out = func(self, *args, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/datasets/arrow_dataset.py", line 2010, in _map_single
offset=offset,
File "/opt/conda/lib/python3.7/site-packages/datasets/arrow_dataset.py", line 1896, in apply_function_on_filtered_inputs
function(*fn_args, effective_indices, **fn_kwargs) if with_indices else function(*fn_args, **fn_kwargs)
File "/home/app/ner_longformer/run_ner.py", line 394, in tokenize_and_align_labels
word_ids = tokenized_inputs.word_ids(batch_index=i)
File "/opt/conda/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 353, in word_ids
raise ValueError("word_ids() is not available when using Python-based tokenizers")
ValueError: word_ids() is not available when using Python-based tokenizers
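For context: tokenize_and_align_labels() in run_ner.py relies on word_ids(), which is only implemented by fast (Rust-backed) tokenizers, i.e. subclasses of PreTrainedTokenizerFast; a slow, Python-based tokenizer raises exactly this ValueError. A minimal sketch of the distinction, where "./roberta_zh" is a placeholder for the local checkpoint directory:

```python
from transformers import AutoTokenizer

# "./roberta_zh" is a placeholder for the local checkpoint path.
# use_fast=True requests a Rust-backed (PreTrainedTokenizerFast) tokenizer,
# which is the only kind that implements word_ids(); if only a slow
# tokenizer is available for the checkpoint, word_ids() will raise.
tokenizer = AutoTokenizer.from_pretrained("./roberta_zh", use_fast=True)

# is_split_into_words=True tells the tokenizer the input is pre-tokenized,
# which is what run_ner.py does with the "tokens" column.
encoded = tokenizer(["此", "处", "为", "文", "本"], is_split_into_words=True)
print(encoded.word_ids())  # e.g. [None, 0, 1, 2, 3, 4, None] with a fast tokenizer
```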
load_dataset() reads the .json dataset through a loading script; the records produced by _generate_examples() have the following format:
{'id': '5', 'tokens': '此处为文本内容【2352JF987】夹杂一些编号信息。', 'ner_tags': ['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-lo', 'I-lo', 'I-lo', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-con', 'I-con', 'I-con', 'I-con', 'I-con', 'I-con', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']}
That is:
{
"id": str(guid),
"tokens": tokens,
"ner_tags": ner_tags,
}
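Note that run_ner.py expects "tokens" to be a list of strings aligned one-to-one with "ner_tags", while in the sample record above "tokens" is a single string. A hypothetical sketch of a _generate_examples() that yields records in the expected shape, assuming a JSON-lines file whose "text" and "labels" field names are placeholders rather than taken from the original post:

```python
import json

def _generate_examples(filepath):
    # Hypothetical sketch of the dataset script's generator, assuming each
    # line of the file is a JSON object with "text" (a string) and "labels"
    # (one tag per character); both field names are assumptions.
    with open(filepath, encoding="utf-8") as f:
        for guid, line in enumerate(f):
            record = json.loads(line)
            yield guid, {
                "id": str(guid),
                # list(...) splits the sentence into single characters so
                # that "tokens" lines up one-to-one with "ner_tags".
                "tokens": list(record["text"]),
                "ner_tags": record["labels"],
            }
```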