BERT
References: BERT (Bidirectional Encoder Representations from Transformers), HUGGING FACE, 月来客栈
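- Data preprocessing
  - Text formatting

        import random

        from tqdm import tqdm


        def read_wiki2(filepath=None):
            with open(filepath, 'r', encoding='utf-8') as f:
                lines = f.readlines()  # read all lines at once; each line is one paragraph
            paragraphs = []
            for line in tqdm(lines, ncols=80, desc=" ## Reading raw data"):
                # keep only lines that split into at least two pieces on ' . '
                if len(line.split(' . ')) >= 2:
                    paragraphs.append(line.strip().lower().split(' . '))
            random.shuffle(paragraphs)  # shuffle all paragraphs
            return paragraphs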
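
To make the per-line rule in `read_wiki2` concrete, here is a minimal sketch that applies the same check and split to a few in-memory lines instead of a file. The helper name `split_paragraph` and the sample lines are hypothetical, invented for illustration; only the `' . '` split, the lowercasing, and the "at least two pieces" check come from the function above.

```python
def split_paragraph(line):
    """Hypothetical helper mirroring the per-line logic of read_wiki2.

    Returns the list of sentences for one raw line, or None when the
    raw line splits into fewer than two pieces on ' . '.
    """
    if len(line.split(' . ')) < 2:
        return None
    return line.strip().lower().split(' . ')


# Made-up sample lines in the tokenized style the function expects:
# a heading, a two-sentence paragraph, and an empty line.
sample_lines = [
    " = Some Heading = \n",
    " The game began development in 2010 . It was released in January 2011 . \n",
    "\n",
]

paragraphs = [p for p in map(split_paragraph, sample_lines) if p is not None]
print(paragraphs)
# [['the game began development in 2010', 'it was released in january 2011 .']]
```

Two details are worth noticing in this sketch. The last sentence keeps its trailing ` .`, because after `strip()` the final period is no longer followed by a space. The length check also runs on the raw line rather than the stripped one, so a one-sentence line that ends in ` . ` before the newline passes the check and yields a single-element list.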
 
