Pretraining data problem #73

ruleGreen · 2020-07-28T11:17:36Z

Hello, I'd like to ask why pretraining on my own data keeps failing to produce any tfrecords.

zhang-yunke · 2020-08-26T06:26:59Z

Hello, have you solved this problem? When I pretrain, the number of instances I get is far smaller than the number of articles.

zhang-yunke · 2020-09-01T09:12:35Z

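@ruleGreen Hello, I found a problem on my side: around line 330 of create_pretraining_data.py, the list that is passed in sometimes turns out to be two-dimensional, so the later steps cannot read it. I made the following change:
document = all_documents[document_index]
changed to
document = all_documents[document_index]
document = np.squeeze(document).tolist()
After this change, both the data volume and the data samples look much more normal.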
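
The change above relies on np.squeeze, which removes every length-1 axis. As a rough illustration of the same idea without NumPy, here is a minimal sketch. It assumes a document is meant to be a list of sentences, each of which is a list of token strings; that layout and the helper name `unwrap_document` are assumptions for illustration, not taken from the repository.

```python
def unwrap_document(document):
    """Strip redundant single-element outer lists from a document.

    Assumed (hypothetical) target layout: a list of sentences,
    where each sentence is a list of token strings.
    """
    # Keep unwrapping while the document is a one-element list whose only
    # element is itself a non-empty list of lists, i.e. nested one level too deep.
    while (
        len(document) == 1
        and isinstance(document[0], list)
        and document[0]
        and isinstance(document[0][0], list)
    ):
        document = document[0]
    return document


# A well-formed document: two sentences of different lengths.
expected = [["hello", "world"], ["second", "sentence", "here"]]
# The same document with one extra level of nesting.
wrapped = [expected]

fixed = unwrap_document(wrapped)
unchanged = unwrap_document(expected)
```

Here `fixed` and `unchanged` both equal `expected`. A document that consists of a single sentence is left alone, because its only element is a list of strings rather than a list of lists.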

Rxma1805 · 2020-11-30T07:01:33Z

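Hello, I'd like to ask: does the dataset need to be prepared with each article as its own txt file, with one sentence per line inside each file? I'm not sure whether the next-sentence prediction training task requires the data to be structured this way.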
