Paper | Conference |
---|---|
Zero-Shot Cross-Lingual Abstractive Sentence Summarization through Teaching Generation and Attention | ACL19 |
Distilling Task-Specific Knowledge from BERT into Simple Neural Networks | |
Multilingual Neural Machine Translation with Knowledge Distillation | ICLR19 |
BAM! Born-Again Multi-Task Networks for Natural Language Understanding | ACL19 |
Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding | |
Exploiting the Ground-Truth: An Adversarial Imitation Based Knowledge Distillation Approach for Event Detection | AAAI19 |
Distilling Knowledge for Search-based Structured Prediction | ACL18 |
On-Device Neural Language Model based Word Prediction | COLING18 |
Zero-Shot Cross-Lingual Neural Headline Generation | IEEE/ACM TASLP18 |
Cross-lingual Distillation for Text Classification | ACL17 |
Domain Adaptation of DNN Acoustic Models Using Knowledge Distillation | ICASSP17 |
Sequence-Level Knowledge Distillation | EMNLP16 |
Distilling Word Embeddings: An Encoding Approach | CIKM16 |
Distilling the Knowledge in a Neural Network | NIPS14 Deep Learning Workshop |
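The papers above build on the soft-target objective from "Distilling the Knowledge in a Neural Network" (Hinton et al., NIPS14 Deep Learning Workshop): the student is trained to match the teacher's temperature-softened output distribution. A minimal dependency-free sketch of that loss (function and variable names here are illustrative, not from any of the listed papers):

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between the softened teacher and student distributions,
    scaled by T^2 (as in Hinton et al.) so gradient magnitudes stay comparable
    across temperatures."""
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student's softened predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

# A student that reproduces the teacher's logits exactly incurs zero loss.
print(distillation_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0]))  # → 0.0
```

In practice this term is usually mixed with the ordinary cross-entropy on gold labels, and sequence-level variants (e.g. Kim & Rush, EMNLP16) replace the per-token distribution match with training on the teacher's decoded outputs.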