Paper | Venue |
---|---|
SpanBERT: Improving Pre-training by Representing and Predicting Spans | TACL20 |
XLNet: Generalized Autoregressive Pretraining for Language Understanding | NeurIPS19 |
Pre-Training with Whole Word Masking for Chinese BERT | |
Unified Language Model Pre-training for Natural Language Understanding and Generation | NeurIPS19 |
ERNIE: Enhanced Representation through Knowledge Integration | |
ERNIE: Enhanced Language Representation with Informative Entities | ACL19 |
MASS: Masked Sequence to Sequence Pre-training for Language Generation | ICML19 |
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | NAACL19 |
Linguistic Knowledge and Transferability of Contextual Representations | NAACL19 |
Improving Language Understanding by Generative Pre-Training | |
Deep contextualized word representations | NAACL18 |