Hello, why is it that when I run pretraining on my own data, the tfrecords never get generated?
Hello, have you solved this problem? When I run pretraining, the number of instances I get is far smaller than the number of articles.
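@ruleGreen Hello, I found a problem on my side. Around line 330 of create_pretrianing_data.py, the list that gets passed in sometimes turns out to be two-dimensional, which makes the later reads come back empty. I replaced the single line `document = all_documents[document_index]` with the two lines below. Judging by the amount of data and the sample instances, the output now looks much more normal: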
document = all_documents[document_index]
document = np.squeeze(document).tolist()
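To illustrate what the two lines above do, here is a minimal sketch on toy data. The nested list is an assumption made up for illustration (the thread does not show the actual contents of `all_documents`); it only mimics a document whose sentence list is wrapped in one extra outer list.

```python
import numpy as np

# Hypothetical toy document: two tokenized sentences wrapped in an
# unwanted extra outer list, i.e. shape (1, 2, 2) instead of (2, 2).
document = [[["hello", "world"], ["good", "night"]]]

# np.squeeze drops every axis of length 1; tolist() converts the
# resulting array back to plain nested Python lists.
fixed = np.squeeze(document).tolist()
print(fixed)

# Caveat: a document holding a single sentence also has a length-1 axis,
# so squeeze flattens it into a bare token list.
single = np.squeeze([["only", "sentence"]]).tolist()
print(single)
```

Note that `np.squeeze` first converts its input to an array, so this approach presumes the nested lists are rectangular. Sentences of different lengths form a ragged structure, which recent NumPy versions refuse to convert, so the fix may not carry over to every dataset.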
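Hello, I'd like to ask whether the dataset needs to be prepared with each article as its own txt file, with one sentence per line inside each file. I'm not sure whether the training task of predicting whether a sentence is the next sentence requires the data to be built this way.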