XLNet

XLNet: Generalized Autoregressive Pretraining for Language Understanding. With the capability of modeling bidirectional contexts, denoising-autoencoding-based pretraining such as BERT achieves better performance than pretraining approaches based on autoregressive language modeling. However, because it relies on corrupting the input with masks, BERT neglects dependencies between the masked positions and suffers from a pretrain-finetune discrepancy. In light of these pros and cons, we propose XLNet, a generalized autoregressive pretraining method that (1) enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and (2) overcomes the limitations of BERT thanks to its autoregressive formulation. Furthermore, XLNet integrates ideas from Transformer-XL, the state-of-the-art autoregressive model, into pretraining. Empirically, XLNet outperforms BERT on 20 tasks, often by a large margin, and achieves state-of-the-art results on 18 tasks, including question answering, natural language inference, sentiment analysis, and document ranking.
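
For readers unfamiliar with the permutation language modeling idea mentioned in the abstract, the objective can be written compactly, following the notation of the XLNet paper: the model maximizes the expected log-likelihood of a sequence under all possible permutations of the factorization order, so each position learns to condition on context from both sides while keeping an autoregressive factorization.

\max_\theta \; \mathbb{E}_{\mathbf{z} \sim \mathcal{Z}_T} \left[ \sum_{t=1}^{T} \log p_\theta\!\left( x_{z_t} \mid \mathbf{x}_{\mathbf{z}_{<t}} \right) \right]

Here \mathbf{x} = (x_1, \dots, x_T) is a text sequence, \mathcal{Z}_T is the set of all permutations of the index sequence [1, \dots, T], and z_t and \mathbf{z}_{<t} denote the t-th element and the first t-1 elements of a permutation \mathbf{z} \in \mathcal{Z}_T. Note that only the factorization order is permuted, not the actual token order in the input.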


References in zbMATH (referenced in 20 articles, 1 standard article)

Sorted by year (citations)

  1. Aghahadi, Zeinab; Talebpour, Alireza: Avicenna: a challenge dataset for natural language generation toward commonsense syllogistic reasoning (2022)
  2. Feiliang Ren, Yongkang Liu, Bochao Li, Shilei Liu, Bingchao Wang, Jiaqi Wang, Chunchao Liu, Qi Ma: An Understanding-Oriented Robust Machine Reading Comprehension Model (2022) arXiv
  3. Koto, Fajri; Baldwin, Timothy; Lau, Jey Han: FFCI: a framework for interpretable automatic evaluation of summarization (2022)
  4. Liu, Ruibo; Jia, Chenyan; Wei, Jason; Xu, Guangxuan; Vosoughi, Soroush: Quantifying and alleviating political bias in language models (2022)
  5. Li, Zuchao; Zhou, Junru; Zhao, Hai; Zhang, Zhisong; Li, Haonan; Ju, Yuqi: Neural character-level syntactic parsing for Chinese (2022)
  6. Loureiro, Daniel; Mário Jorge, Alípio; Camacho-Collados, Jose: LMMS reloaded: transformer-based sense embeddings for disambiguation and beyond (2022)
  7. Quamar, Abdul; Efthymiou, Vasilis; Lei, Chuan; Özcan, Fatma: Natural language interfaces to data (2022)
  8. Moreo, Alejandro; Esuli, Andrea; Sebastiani, Fabrizio: Word-class embeddings for multiclass text classification (2021)
  9. Škrlj, Blaž; Martinc, Matej; Lavrač, Nada; Pollak, Senja: autoBOT: evolving neuro-symbolic representations for explainable low resource text classification (2021)
  10. Stevenson, Matthew; Mues, Christophe; Bravo, Cristián: The value of text for small business default prediction: a deep learning approach (2021)
  11. Tao Gui, Xiao Wang, Qi Zhang, Qin Liu, Yicheng Zou, Xin Zhou, Rui Zheng, Chong Zhang, Qinzhuo Wu, Jiacheng Ye, Zexiong Pang, Yongxin Zhang, Zhengyan Li, Ruotian Ma, Zichu Fei, Ruijian Cai, Jun Zhao, Xinwu Hu, Zhiheng Yan, Yiding Tan, Yuan Hu, Qiyuan Bian, Zhihua Liu, Bolin Zhu, Shan Qin, Xiaoyu Xing, Jinlan Fu, Yue Zhang, Minlong Peng, Xiaoqing Zheng, Yaqian Zhou, Zhongyu Wei, Xipeng Qiu, Xuanjing Huang: TextFlint: Unified Multilingual Robustness Evaluation Toolkit for Natural Language Processing (2021) arXiv
  12. Tripathy, Jatin Karthik; Sethuraman, Sibi Chakkaravarthy; Cruz, Meenalosini Vimal; Namburu, Anupama; P., Mangalraj; R., Nandha Kumar; S., Sudhakar Ilango; Vijayakumar, Vaidehi: Comprehensive analysis of embeddings and pre-training in NLP (2021)
  13. Volpi, Riccardo; Malagò, Luigi: Natural alpha embeddings (2021)
  14. Raeid Saqur, Ameet Deshpande: CLEVR Parser: A Graph Parser Library for Geometric Learning on Language Grounded Image Scenes (2020) arXiv
  15. Tom B Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al.: Language Models are Few-Shot Learners (2020) arXiv
  16. Yada Pruksachatkun, Phil Yeres, Haokun Liu, Jason Phang, Phu Mon Htut, Alex Wang, Ian Tenney, Samuel R. Bowman: jiant: A Software Toolkit for Research on General-Purpose Text Understanding Models (2020) arXiv
  17. Jibril Frej, Didier Schwab, Jean-Pierre Chevallet: WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset (2019) arXiv
  18. Nils Reimers, Iryna Gurevych: Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks (2019) arXiv
  19. Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu: TinyBERT: Distilling BERT for Natural Language Understanding (2019) arXiv
  20. Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le: XLNet: Generalized Autoregressive Pretraining for Language Understanding (2019) arXiv