Xipeng Qiu

Professor, School of Computer Science, Fudan University

 

Contact

Building 2X, No. 2005 Songhu Road, Shanghai, China

A more comprehensive publication list: Google Scholar

Survey/Overview of NLP

  1. Pre-trained Models for Natural Language Processing: A Survey, SCIENCE CHINA Technological Sciences (SCTS), Vol. 63(10), pp. 1872-1897, Science China Press, 2020. [BibTeX][DOI][PDF] Recipient of the 2021 High-Impact Paper Award of SCIENCE CHINA Technological Sciences (《中国科学:技术科学》).
    Xipeng Qiu, TianXiang Sun, Yige Xu, Yunfan Shao, Ning Dai, Xuanjing Huang.
  2. BibTeX:
    @article{qiu2020:scts-ptms,
      author = {Xipeng Qiu and TianXiang Sun and Yige Xu and Yunfan Shao and Ning Dai and Xuanjing Huang},
      title = {Pre-trained Models for Natural Language Processing: A Survey},
      journal = {SCIENCE CHINA Technological Sciences},
      publisher = {Science China Press},
      year = {2020},
      volume = {63},
      number = {10},
      pages = {1872--1897},
      doi = {https://doi.org/10.1007/s11431-020-1647-3}
    }
    
  3. Paradigm Shift in Natural Language Processing, Machine Intelligence Research, Vol. 19(3), pp. 169-183, 2022. [BibTeX][DOI][Abstract]
    Tian-Xiang Sun, Xiang-Yang Liu, Xi-Peng Qiu, Xuan-Jing Huang.
    Abstract: In the era of deep learning, modeling for most natural language processing (NLP) tasks has converged into several mainstream paradigms. For example, we usually adopt the sequence labeling paradigm to solve a bundle of tasks such as POS-tagging, named entity recognition (NER), and chunking, and adopt the classification paradigm to solve tasks like sentiment analysis. With the rapid progress of pre-trained language models, recent years have witnessed a rising trend of paradigm shift, which is solving one NLP task in a new paradigm by reformulating the task. The paradigm shift has achieved great success on many tasks and is becoming a promising way to improve model performance. Moreover, some of these paradigms have shown great potential to unify a large number of NLP tasks, making it possible to build a single model to handle diverse tasks. In this paper, we review such phenomenon of paradigm shifts in recent years, highlighting several paradigms that have the potential to solve different NLP tasks.
  4. BibTeX:
    @article{Sun2022,
      author = {Sun, Tian-Xiang and Liu, Xiang-Yang and Qiu, Xi-Peng and Huang, Xuan-Jing},
      title = {Paradigm Shift in Natural Language Processing},
      journal = {Machine Intelligence Research},
      year = {2022},
      volume = {19},
      number = {3},
      pages = {169--183},
      url = {https://doi.org/10.1007/s11633-022-1331-6},
      doi = {https://doi.org/10.1007/s11633-022-1331-6}
    }
    
  5. A survey of transformers, AI Open, Elsevier, 2022. [BibTeX][DOI][PDF]
    Tianyang Lin, Yuxin Wang, Xiangyang Liu, Xipeng Qiu.
  6. BibTeX:
    @article{lin2022survey,
      author = {Lin, Tianyang and Wang, Yuxin and Liu, Xiangyang and Qiu, Xipeng},
      title = {A survey of transformers},
      journal = {AI Open},
      publisher = {Elsevier},
      year = {2022},
      url = {https://arxiv.org/abs/2106.04554},
      doi = {https://doi.org/10.1016/j.aiopen.2022.10.001}
    }
    

Foundation Models / Language-Model-as-a-Service (LMaaS)

  1. CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Generation, SCIENCE CHINA Information Sciences (SCIS), 2022. [BibTeX][DOI][PDF]
    Yunfan Shao, Zhichao Geng, Yitao Liu, Junqi Dai, Hang Yan, Fei Yang, Li Zhe, Hujun Bao, Xipeng Qiu.
  2. BibTeX:
    @article{shao2022cpt,
      author = {Shao, Yunfan and Geng, Zhichao and Liu, Yitao and Dai, Junqi and Yan, Hang and Yang, Fei and Zhe, Li and Bao, Hujun and Qiu, Xipeng},
      title = {CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Generation},
      journal = {SCIENCE CHINA Information Sciences},
      year = {2022},
      url = {https://arxiv.org/abs/2109.05729},
      doi = {https://doi.org/10.1007/s11432-021-3536-5}
    }
    
  3. Black-Box Tuning for Language-Model-as-a-Service, ICML, 2022. [BibTeX] (see the sketch at the end of this section)
    Tianxiang Sun, Yunfan Shao, Hong Qian, Xuanjing Huang, Xipeng Qiu.
  4. BibTeX:
    @inproceedings{sun2022black,
      author = {Sun, Tianxiang and Shao, Yunfan and Qian, Hong and Huang, Xuanjing and Qiu, Xipeng},
      title = {Black-Box Tuning for Language-Model-as-a-Service},
      booktitle = {International Conference on Machine Learning},
      year = {2022},
      volume = {162},
      pages = {20841--20855}, 
      url = {https://proceedings.mlr.press/v162/sun22e.html}
    }
    
  5. BBTv2: Towards a Gradient-Free Future with Large Language Models, EMNLP, 2022. [BibTeX]
    Tianxiang Sun, Zhengfu He, Hong Qian, Yunhua Zhou, Xuanjing Huang, Xipeng Qiu.
  6. BibTeX:
    @inproceedings{sun2022bbtv2,
      author = {Tianxiang Sun and Zhengfu He and Hong Qian and Yunhua Zhou and Xuanjing Huang and Xipeng Qiu},
      title = {BBTv2: Towards a Gradient-Free Future with Large Language Models},
      booktitle = {Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing},
      year = {2022}, 
      url = {https://doi.org/10.48550/arXiv.2205.11200}
    }
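
A minimal Python sketch of the black-box tuning idea in item 3 above (extended by BBTv2 in item 5): a low-dimensional vector z is projected through a fixed random matrix into a soft prompt, and only scores returned by the service are used to update z, so no gradients from the served model are needed. The papers use CMA-ES; plain random search is used here to keep the sketch dependency-light, and query_lmaas_score is a hypothetical stand-in for a real Language-Model-as-a-Service call.

    import numpy as np

    rng = np.random.default_rng(0)
    D_PROMPT = 10 * 768   # flattened soft-prompt size (e.g., 10 prompt tokens x 768 dims)
    D_LOW = 300           # low intrinsic dimensionality searched by the optimizer

    A = rng.normal(0.0, 1.0 / np.sqrt(D_LOW), size=(D_PROMPT, D_LOW))  # fixed random projection

    def query_lmaas_score(prompt_vector):
        """Hypothetical black-box call: send the soft prompt, receive a dev-set score."""
        target = np.full(D_PROMPT, 0.01)                  # toy objective for demonstration only
        return -float(np.mean((prompt_vector - target) ** 2))

    z = np.zeros(D_LOW)
    best_score = query_lmaas_score(A @ z)
    for step in range(200):                               # derivative-free random search
        candidate = z + rng.normal(0.0, 0.05, size=D_LOW)
        score = query_lmaas_score(A @ candidate)
        if score > best_score:                            # keep the candidate only if it improves
            z, best_score = candidate, score

    print(f"best (toy) score after 200 queries: {best_score:.6f}")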
    

Information Extraction

  1. TENER: adapting transformer encoder for named entity recognition, arXiv preprint arXiv:1911.04474, 2019. [BibTeX]
    Hang Yan, Bocao Deng, Xiaonan Li, Xipeng Qiu.
  2. BibTeX:
    @article{yan2019tener,
      author = {Yan, Hang and Deng, Bocao and Li, Xiaonan and Qiu, Xipeng},
      title = {TENER: adapting transformer encoder for named entity recognition},
      journal = {arXiv preprint arXiv:1911.04474},
      year = {2019}
    }
    
  3. FLAT: Chinese NER Using Flat-Lattice Transformer, ACL, 2020. [BibTeX][PDF][Code][Abstract] (see the sketch at the end of this section)
    Xiaonan Li, Hang Yan, Xipeng Qiu, Xuanjing Huang.
  4. BibTeX:
    @inproceedings{li-etal-2020-flat,
      author = {Li, Xiaonan and Yan, Hang and Qiu, Xipeng and Huang, Xuanjing},
      title = {FLAT: Chinese NER Using Flat-Lattice Transformer},
      booktitle = {Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
      year = {2020},
      pages = {6836--6842}, 
      url = {https://www.aclweb.org/anthology/2020.acl-main.611}
    }
    
    Abstract: Recently, the character-word lattice structure has been proved to be effective for Chinese named entity recognition (NER) by incorporating the word information. However, since the lattice structure is complex and dynamic, the lattice-based models are hard to fully utilize the parallel computation of GPUs and usually have a low inference speed. In this paper, we propose FLAT: Flat-LAttice Transformer for Chinese NER, which converts the lattice structure into a flat structure consisting of spans. Each span corresponds to a character or latent word and its position in the original lattice. With the power of Transformer and well-designed position encoding, FLAT can fully leverage the lattice information and has an excellent parallel ability. Experiments on four datasets show FLAT outperforms other lexicon-based models in performance and efficiency.
  5. Accelerating BERT Inference for Sequence Labeling via Early-Exit, ACL, 2021. [BibTeX][PDF][Abstract]
    Xiaonan Li, Yunfan Shao, Tianxiang Sun, Hang Yan, Xipeng Qiu, Xuanjing Huang.
  6. BibTeX:
    @inproceedings{li-etal-2021-accelerating,
      author = {Li, Xiaonan and Shao, Yunfan and Sun, Tianxiang and Yan, Hang and Qiu, Xipeng and Huang, Xuanjing},
      title = {Accelerating BERT Inference for Sequence Labeling via Early-Exit},
      booktitle = {Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)},
      year = {2021},
      pages = {189--199}, 
      url = {https://aclanthology.org/2021.acl-long.16}
    }
    
    Abstract: Both performance and efficiency are crucial factors for sequence labeling tasks in many real-world scenarios. Although the pre-trained models (PTMs) have significantly improved the performance of various sequence labeling tasks, their computational cost is expensive. To alleviate this problem, we extend the recent successful early-exit mechanism to accelerate the inference of PTMs for sequence labeling tasks. However, existing early-exit mechanisms are specifically designed for sequence-level tasks, rather than sequence labeling. In this paper, we first propose a simple extension of sentence-level early-exit for sequence labeling tasks. To further reduce the computational cost, we also propose a token-level early-exit mechanism that allows partial tokens to exit early at different layers. Considering the local dependency inherent in sequence labeling, we employed a window-based criterion to decide for a token whether or not to exit. The token-level early-exit brings the gap between training and inference, so we introduce an extra self-sampling fine-tuning stage to alleviate it. The extensive experiments on three popular sequence labeling tasks show that our approach can save up to 66%∼75% inference cost with minimal performance degradation. Compared with competitive compressed models such as DistilBERT, our approach can achieve better performance under the same speed-up ratios of 2×, 3×, and 4×.
  7. A Unified Generative Framework for Various NER Subtasks, ACL, 2021. [BibTeX][PDF][Abstract]
    Hang Yan, Tao Gui, Junqi Dai, Qipeng Guo, Zheng Zhang, Xipeng Qiu.
  8. BibTeX:
    @inproceedings{yan-etal-2021-unified-generative,
      author = {Yan, Hang and Gui, Tao and Dai, Junqi and Guo, Qipeng and Zhang, Zheng and Qiu, Xipeng},
      title = {A Unified Generative Framework for Various NER Subtasks},
      booktitle = {Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)},
      year = {2021},
      pages = {5808--5822}, 
      url = {https://aclanthology.org/2021.acl-long.451}
    }
    
    Abstract: Named Entity Recognition (NER) is the task of identifying spans that represent entities in sentences. Whether the entity spans are nested or discontinuous, the NER task can be categorized into the flat NER, nested NER, and discontinuous NER subtasks. These subtasks have been mainly solved by the token-level sequence labelling or span-level classification. However, these solutions can hardly tackle the three kinds of NER subtasks concurrently. To that end, we propose to formulate the NER subtasks as an entity span sequence generation task, which can be solved by a unified sequence-to-sequence (Seq2Seq) framework. Based on our unified framework, we can leverage the pre-trained Seq2Seq model to solve all three kinds of NER subtasks without the special design of the tagging schema or ways to enumerate spans. We exploit three types of entity representations to linearize entities into a sequence. Our proposed framework is easy-to-implement and achieves state-of-the-art (SoTA) or near SoTA performance on eight English NER datasets, including two flat NER datasets, three nested NER datasets, and three discontinuous NER datasets.
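
A small Python sketch of the flat-lattice representation behind FLAT (item 3 above): characters and lexicon-matched words are placed in one flat sequence, and each unit keeps the head and tail character positions it occupied in the lattice. The toy lexicon is made up for illustration; the Transformer encoder and relative-position attention of the actual model are not reproduced.

    def flatten_lattice(chars, lexicon):
        spans = [(ch, i, i) for i, ch in enumerate(chars)]          # every character is a span
        text = "".join(chars)
        for i in range(len(chars)):
            for j in range(i + 1, len(chars) + 1):
                word = text[i:j]
                if len(word) > 1 and word in lexicon:               # latent word spans from the lexicon
                    spans.append((word, i, j - 1))
        return spans

    chars = list("重庆人和药店")                                      # toy sentence
    lexicon = {"重庆", "重庆人", "人和药店", "药店"}                    # hypothetical word list
    for token, head, tail in flatten_lattice(chars, lexicon):
        print(f"{token}\thead={head}\ttail={tail}")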

Efficient NLP

  1. Star-Transformer, NAACL, 2019. [BibTeX][PDF][Abstract]
    Qipeng Guo, Xipeng Qiu, Pengfei Liu, Yunfan Shao, Xiangyang Xue, Zheng Zhang.
  2. BibTeX:
    @inproceedings{guo2019star,
      author = {Guo, Qipeng and Qiu, Xipeng and Liu, Pengfei and Shao, Yunfan and Xue, Xiangyang and Zhang, Zheng},
      title = {Star-Transformer},
      booktitle = {Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
      year = {2019},
      pages = {1315--1325}, 
      url = {https://www.aclweb.org/anthology/N19-1133}
    }
    
    Abstract: Although Transformer has achieved great successes on many NLP tasks, its heavy structure with fully-connected attention connections leads to dependencies on large training data. In this paper, we present Star-Transformer, a lightweight alternative by careful sparsification. To reduce model complexity, we replace the fully-connected structure with a star-shaped topology, in which every two non-adjacent nodes are connected through a shared relay node. Thus, complexity is reduced from quadratic to linear, while preserving the capacity to capture both local composition and long-range dependency. The experiments on four tasks (22 datasets) show that Star-Transformer achieved significant improvements against the standard Transformer for the modestly sized datasets.
  3. Low-rank and Locality Constrained Self-Attention for Sequence Modeling, IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), Vol. 27(12), pp. 2213-2222, December 2019. [BibTeX][DOI]
    Qipeng Guo, Xipeng Qiu, Xiangyang Xue, Zheng Zhang.
  4. BibTeX:
    @article{guo2019low,
      author = {Guo, Qipeng and Qiu, Xipeng and Xue, Xiangyang and Zhang, Zheng},
      title = {Low-rank and Locality Constrained Self-Attention for Sequence Modeling},
      journal = {IEEE/ACM Transactions on Audio, Speech, and Language Processing},
      year = {2019},
      volume = {27},
      number = {12},
      pages = {2213--2222},
      doi = {https://doi.org/10.1109/TASLP.2019.2944078}
    }
    
  5. Accelerating BERT Inference for Sequence Labeling via Early-Exit, ACL, 2021. [BibTeX][PDF][Abstract] (see the sketch at the end of this section)
    Xiaonan Li, Yunfan Shao, Tianxiang Sun, Hang Yan, Xipeng Qiu, Xuanjing Huang.
  6. BibTeX:
    @inproceedings{li-etal-2021-accelerating,
      author = {Li, Xiaonan and Shao, Yunfan and Sun, Tianxiang and Yan, Hang and Qiu, Xipeng and Huang, Xuanjing},
      title = {Accelerating BERT Inference for Sequence Labeling via Early-Exit},
      booktitle = {Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)},
      year = {2021},
      pages = {189--199}, 
      url = {https://aclanthology.org/2021.acl-long.16}
    }
    
    Abstract: Both performance and efficiency are crucial factors for sequence labeling tasks in many real-world scenarios. Although the pre-trained models (PTMs) have significantly improved the performance of various sequence labeling tasks, their computational cost is expensive. To alleviate this problem, we extend the recent successful early-exit mechanism to accelerate the inference of PTMs for sequence labeling tasks. However, existing early-exit mechanisms are specifically designed for sequence-level tasks, rather than sequence labeling. In this paper, we first propose a simple extension of sentence-level early-exit for sequence labeling tasks. To further reduce the computational cost, we also propose a token-level early-exit mechanism that allows partial tokens to exit early at different layers. Considering the local dependency inherent in sequence labeling, we employed a window-based criterion to decide for a token whether or not to exit. The token-level early-exit brings the gap between training and inference, so we introduce an extra self-sampling fine-tuning stage to alleviate it. The extensive experiments on three popular sequence labeling tasks show that our approach can save up to 66%∼75% inference cost with minimal performance degradation. Compared with competitive compressed models such as DistilBERT, our approach can achieve better performance under the same speed-up ratios of 2×, 3×, and 4×.
  7. Towards Efficient NLP: A Standard Evaluation and A Strong Baseline, arXiv preprint arXiv:2110.07038, 2021. [BibTeX]
    Xiangyang Liu, Tianxiang Sun, Junliang He, Lingling Wu, Xinyu Zhang, Hao Jiang, Zhao Cao, Xuanjing Huang, Xipeng Qiu.
  8. BibTeX:
    @article{liu2021towards,
      author = {Liu, Xiangyang and Sun, Tianxiang and He, Junliang and Wu, Lingling and Zhang, Xinyu and Jiang, Hao and Cao, Zhao and Huang, Xuanjing and Qiu, Xipeng},
      title = {Towards Efficient NLP: A Standard Evaluation and A Strong Baseline},
      journal = {arXiv preprint arXiv:2110.07038},
      year = {2021}
    }
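
A minimal Python sketch of the window-based token-level early-exit criterion described in item 5 above: a token may exit at the current layer only if it and its neighbours within a window are all predicted with high confidence. The per-layer probabilities below are random placeholders standing in for the intermediate classifiers attached to a real pre-trained model; the threshold and window size are illustrative.

    import numpy as np

    def tokens_ready_to_exit(label_probs, window=1, threshold=0.9):
        """label_probs: [seq_len, num_labels] softmax outputs at the current layer."""
        confidence = label_probs.max(axis=-1)                         # per-token max probability
        ready = []
        for i in range(len(confidence)):
            lo, hi = max(0, i - window), min(len(confidence), i + window + 1)
            ready.append(bool(confidence[lo:hi].min() >= threshold))  # whole window must be confident
        return ready

    rng = np.random.default_rng(0)
    probs = rng.dirichlet(alpha=np.ones(5) * 0.3, size=8)             # fake 8-token, 5-label output
    print(tokens_ready_to_exit(probs, window=1, threshold=0.9))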
    

Adapting PTMs to Downstream NLP Tasks

  1. Does syntax matter? A strong baseline for Aspect-based Sentiment Analysis with RoBERTa, NAACL, 2021. [BibTeX][PDF][Abstract]
    Junqi Dai, Hang Yan, Tianxiang Sun, Pengfei Liu, Xipeng Qiu.
  2. BibTeX:
    @inproceedings{dai-etal-2021-syntax,
      author = {Dai, Junqi and Yan, Hang and Sun, Tianxiang and Liu, Pengfei and Qiu, Xipeng},
      title = {Does syntax matter? A strong baseline for Aspect-based Sentiment Analysis with RoBERTa},
      booktitle = {Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
      year = {2021},
      pages = {1816--1829}, 
      url = {https://www.aclweb.org/anthology/2021.naacl-main.146}
    }
    
    Abstract: Aspect-based Sentiment Analysis (ABSA), aiming at predicting the polarities for aspects, is a fine-grained task in the field of sentiment analysis. Previous work showed syntactic information, e.g. dependency trees, can effectively improve the ABSA performance. Recently, pre-trained models (PTMs) also have shown their effectiveness on ABSA. Therefore, the question naturally arises whether PTMs contain sufficient syntactic information for ABSA so that we can obtain a good ABSA model only based on PTMs. In this paper, we firstly compare the induced trees from PTMs and the dependency parsing trees on several popular models for the ABSA task, showing that the induced tree from fine-tuned RoBERTa (FT-RoBERTa) outperforms the parser-provided tree. The further analysis experiments reveal that the FT-RoBERTa Induced Tree is more sentiment-word-oriented and could benefit the ABSA task. The experiments also show that the pure RoBERTa-based model can outperform or approximate to the previous SOTA performances on six datasets across four languages since it implicitly incorporates the task-oriented syntactic information.
  3. Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence, NAACL, 2019. [BibTeX][Code][Abstract] (see the sketch at the end of this section)
    Chi Sun, Luyao Huang, Xipeng Qiu.
  4. BibTeX:
    @inproceedings{sun2019utilizing,
      author = {Sun, Chi and Huang, Luyao and Qiu, Xipeng},
      title = {Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence},
      booktitle = {Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)},
      year = {2019},
      pages = {380--385}, 
      url = {https://arxiv.org/pdf/1903.09588.pdf}
    }
    
    Abstract: Aspect-based sentiment analysis (ABSA), which aims to identify fine-grained opinion polarity towards a specific aspect, is a challenging subtask of sentiment analysis (SA). In this paper, we construct an auxiliary sentence from the aspect and convert ABSA to a sentence-pair classification task, such as question answering (QA) and natural language inference (NLI). We fine-tune the pre-trained model from BERT and achieve new state-of-the-art results on SentiHood and SemEval-2014 Task 4 datasets. The source codes are available at https://github.com/HSLCY/ABSA-BERT-pair.
  5. How to Fine-Tune BERT for Text Classification?, CCL (Best Paper Award), 2019. [BibTeX][PDF]
    Chi Sun, Xipeng Qiu, Yige Xu, Xuanjing Huang.
  6. BibTeX:
    @inproceedings{sun2019finetune,
      author = {Chi Sun and Xipeng Qiu and Yige Xu and Xuanjing Huang},
      title = {How to Fine-Tune BERT for Text Classification?},
      booktitle = {Proceedings of China National Conference on Computational Linguistics},
      year = {2019},
      pages = {194--206}, 
      url = {https://arxiv.org/abs/1905.05583}
    }
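
A short Python sketch of the auxiliary-sentence construction in item 3 above: each (sentence, target, aspect) combination is turned into a sentence pair so that ABSA can be handled as BERT-style sentence-pair classification. The template follows the QA-M style question from the paper; the review text and aspect list are toy examples.

    def build_sentence_pairs(sentence, target, aspects):
        pairs = []
        for aspect in aspects:
            auxiliary = f"what do you think of the {aspect} of {target} ?"
            pairs.append((sentence, auxiliary))        # fed to BERT as [CLS] A [SEP] B [SEP]
        return pairs

    review = "LOC1 is central London so extremely expensive."
    for text_a, text_b in build_sentence_pairs(review, "LOC1", ["price", "safety", "transit-location"]):
        print(f"A: {text_a}\nB: {text_b}\n")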
    

Chinese NLP

  1. Gated Recursive Neural Network For Chinese Word Segmentation, ACL, 2015. [BibTeX]
    Xinchi Chen, Xipeng Qiu, Chenxi Zhu, Xuanjing Huang.
  2. BibTeX:
    @inproceedings{chen2015gated,
      author = {Xinchi Chen and Xipeng Qiu and Chenxi Zhu and Xuanjing Huang},
      title = {Gated Recursive Neural Network For Chinese Word Segmentation},
      booktitle = {Proceedings of Annual Meeting of the Association for Computational Linguistics},
      year = {2015},
      pages = {1744--1753}, 
      url = {http://www.aclweb.org/anthology/P/P15/P15-1168.pdf}
    }
    
  3. Long Short-Term Memory Neural Networks for Chinese Word Segmentation, EMNLP, 2015. [BibTeX]
    Xinchi Chen, Xipeng Qiu, Chenxi Zhu, Pengfei Liu, Xuanjing Huang.
  4. BibTeX:
    @inproceedings{chen2015long,
      author = {Xinchi Chen and Xipeng Qiu and Chenxi Zhu and Pengfei Liu and Xuanjing Huang},
      title = {Long Short-Term Memory Neural Networks for Chinese Word Segmentation},
      booktitle = {Proceedings of the Conference on Empirical Methods in Natural Language Processing},
      year = {2015},
      pages = {1197--1206}, 
      url = {http://www.aclweb.org/anthology/D/D15/D15-1141.pdf}
    }
    
  5. A Graph-based Model for Joint Chinese Word Segmentation and Dependency Parsing, Transactions of the Association for Computational Linguistics (TACL), Vol. 8, pp. 78-92, 2020. [BibTeX][DOI][PDF]
    Hang Yan, Xipeng Qiu, Xuanjing Huang.
  6. BibTeX:
    @article{yan2020graph,
      author = {Yan, Hang and Qiu, Xipeng and Huang, Xuanjing},
      title = {A Graph-based Model for Joint Chinese Word Segmentation and Dependency Parsing},
      journal = {Transactions of the Association for Computational Linguistics},
      year = {2020},
      volume = {8},
      pages = {78--92},
      doi = {https://doi.org/10.1162/tacl_a_00301}
    }
    
  7. A New Psychometric-inspired Evaluation Metric for Chinese Word Segmentation, ACL, 2016. [BibTeX]
    Peng Qian, Xipeng Qiu, Xuanjing Huang.
  8. BibTeX:
    @inproceedings{qian2016new,
      author = {Peng Qian and Xipeng Qiu and Xuanjing Huang},
      title = {A New Psychometric-inspired Evaluation Metric for Chinese Word Segmentation},
      booktitle = {Proceedings of Annual Meeting of the Association for Computational Linguistics},
      year = {2016},
      pages = {2185--2194}, 
      url = {http://aclweb.org/anthology/P/P16/P16-1206.pdf}
    }
    
  9. Adversarial Multi-Criteria Learning for Chinese Word Segmentation, ACL (Outstanding Paper Award), 2017. [BibTeX]
    Xinchi Chen, Zhan Shi, Xipeng Qiu, Xuanjing Huang.
  10. BibTeX:
    @inproceedings{chen2017adversarial,
      author = {Xinchi Chen and Zhan Shi and Xipeng Qiu and Xuanjing Huang},
      title = {Adversarial Multi-Criteria Learning for Chinese Word Segmentation},
      booktitle = {Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics},
      year = {2017},
      pages = {1193--1203}, 
      url = {http://aclweb.org/anthology/P/P17/P17-1110.pdf}
    }
    
  11. VCWE: Visual Character-Enhanced Word Embeddings, NAACL, 2019. [BibTeX][PDF][Abstract]
    Chi Sun, Xipeng Qiu, Xuanjing Huang.
  12. BibTeX:
    @inproceedings{sun2019vcwe,
      author = {Sun, Chi and Qiu, Xipeng and Huang, Xuanjing},
      title = {VCWE: Visual Character-Enhanced Word Embeddings},
      booktitle = {Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)},
      year = {2019},
      pages = {2710--2719}, 
      url = {https://www.aclweb.org/anthology/N19-1277}
    }
    
    Abstract: Chinese is a logographic writing system, and the shape of Chinese characters contain rich syntactic and semantic information. In this paper, we propose a model to learn Chinese word embeddings via three-level composition: (1) a convolutional neural network to extract the intra-character compositionality from the visual shape of a character; (2) a recurrent neural network with self-attention to compose character representation into word embeddings; (3) the Skip-Gram framework to capture non-compositionality directly from the contextual information. Evaluations demonstrate the superior performance of our model on four tasks: word similarity, sentiment analysis, named entity recognition and part-of-speech tagging.
  13. FLAT: Chinese NER Using Flat-Lattice Transformer, ACL, 2020. [BibTeX][PDF][Code][Abstract]
    Xiaonan Li, Hang Yan, Xipeng Qiu, Xuanjing Huang.
  14. BibTeX:
    @inproceedings{li-etal-2020-flat,
      author = {Li, Xiaonan and Yan, Hang and Qiu, Xipeng and Huang, Xuanjing},
      title = {FLAT: Chinese NER Using Flat-Lattice Transformer},
      booktitle = {Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
      year = {2020},
      pages = {6836--6842}, 
      url = {https://www.aclweb.org/anthology/2020.acl-main.611}
    }
    
    Abstract: Recently, the character-word lattice structure has been proved to be effective for Chinese named entity recognition (NER) by incorporating the word information. However, since the lattice structure is complex and dynamic, the lattice-based models are hard to fully utilize the parallel computation of GPUs and usually have a low inference speed. In this paper, we propose FLAT: Flat-LAttice Transformer for Chinese NER, which converts the lattice structure into a flat structure consisting of spans. Each span corresponds to a character or latent word and its position in the original lattice. With the power of Transformer and well-designed position encoding, FLAT can fully leverage the lattice information and has an excellent parallel ability. Experiments on four datasets show FLAT outperforms other lexicon-based models in performance and efficiency.
  15. A Concise Model for Multi-Criteria Chinese Word Segmentation with Transformer Encoder, EMNLP Findings, 2020. [BibTeX][PDF][Abstract] (see the sketch at the end of this section)
    Xipeng Qiu, Hengzhi Pei, Hang Yan, Xuanjing Huang.
  16. BibTeX:
    @inproceedings{qiu-etal-2020-concise,
      author = {Qiu, Xipeng and Pei, Hengzhi and Yan, Hang and Huang, Xuanjing},
      title = {A Concise Model for Multi-Criteria Chinese Word Segmentation with Transformer Encoder},
      booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2020},
      year = {2020},
      pages = {2887--2897}, 
      url = {https://www.aclweb.org/anthology/2020.findings-emnlp.260}
    }
    
    Abstract: Multi-criteria Chinese word segmentation (MCCWS) aims to exploit the relations among the multiple heterogeneous segmentation criteria and further improve the performance of each single criterion. Previous work usually regards MCCWS as different tasks, which are learned together under the multi-task learning framework. In this paper, we propose a concise but effective unified model for MCCWS, which is fully-shared for all the criteria. By leveraging the powerful ability of the Transformer encoder, the proposed unified model can segment Chinese text according to a unique criterion-token indicating the output criterion. Besides, the proposed unified model can segment both simplified and traditional Chinese and has an excellent transfer capability. Experiments on eight datasets with different criteria show that our model outperforms our single-criterion baseline model and other multi-criteria models. Source codes of this paper are available on Github.
  17. fastHan: A BERT-based Multi-Task Toolkit for Chinese NLP, ACL, 2021. [BibTeX][PDF][Abstract]
    Zhichao Geng, Hang Yan, Xipeng Qiu, Xuanjing Huang.
  18. BibTeX:
    @inproceedings{geng-etal-2021-fasthan,
      author = {Geng, Zhichao and Yan, Hang and Qiu, Xipeng and Huang, Xuanjing},
      title = {fastHan: A BERT-based Multi-Task Toolkit for Chinese NLP},
      booktitle = {Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations},
      year = {2021},
      pages = {99--106}, 
      url = {https://aclanthology.org/2021.acl-demo.12}
    }
    
    Abstract: We present fastHan, an open-source toolkit for four basic tasks in Chinese natural language processing: Chinese word segmentation (CWS), Part-of-Speech (POS) tagging, named entity recognition (NER), and dependency parsing. The backbone of fastHan is a multi-task model based on a pruned BERT, which uses the first 8 layers in BERT. We also provide a 4-layer base model compressed from the 8-layer model. The joint-model is trained and evaluated on 13 corpora of four tasks, yielding near state-of-the-art (SOTA) performance in dependency parsing and NER, achieving SOTA performance in CWS and POS. Besides, fastHan's transferability is also strong, performing much better than popular segmentation tools on a non-training corpus. To better meet the need of practical application, we allow users to use their own labeled data to further fine-tune fastHan. In addition to its small size and excellent performance, fastHan is user-friendly. Implemented as a python package, fastHan isolates users from the internal technical details and is convenient to use. The project is released on Github.
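
A minimal Python sketch of the criterion-token trick used by the unified multi-criteria segmentation model in item 15 above: a single shared encoder is told which segmentation criterion to follow by a special token prepended to the character sequence. The criterion names and example sentence are illustrative; the Transformer encoder itself is out of scope here.

    def build_mccws_input(sentence, criterion):
        criterion_token = f"[{criterion.upper()}]"         # e.g. "[PKU]" or "[MSR]"
        return [criterion_token] + list(sentence)          # one input unit per Chinese character

    print(build_mccws_input("复旦大学计算机学院", "pku"))
    print(build_mccws_input("复旦大学计算机学院", "msr"))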

Reliable NLP

  1. BERT-ATTACK: Adversarial Attack Against BERT Using BERT, EMNLP, 2020. [BibTeX][PDF][Abstract] (see the sketch at the end of this section)
    Linyang Li, Ruotian Ma, Qipeng Guo, Xiangyang Xue, Xipeng Qiu.
  2. BibTeX:
    @inproceedings{li-etal-2020-bert-attack,
      author = {Li, Linyang and Ma, Ruotian and Guo, Qipeng and Xue, Xiangyang and Qiu, Xipeng},
      title = {BERT-ATTACK: Adversarial Attack Against BERT Using BERT},
      booktitle = {Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing},
      year = {2020},
      pages = {6193--6202}, 
      url = {https://www.aclweb.org/anthology/2020.emnlp-main.500}
    }
    
    Abstract: Adversarial attacks for discrete data (such as texts) have been proved significantly more challenging than continuous data (such as images) since it is difficult to generate adversarial samples with gradient-based methods. Current successful attack methods for texts usually adopt heuristic replacement strategies on the character or word level, which remains challenging to find the optimal solution in the massive space of possible combinations of replacements while preserving semantic consistency and language fluency. In this paper, we propose BERT-Attack, a high-quality and effective method to generate adversarial samples using pre-trained masked language models exemplified by BERT. We turn BERT against its fine-tuned models and other deep neural models in downstream tasks so that we can successfully mislead the target models to predict incorrectly. Our method outperforms state-of-the-art attack strategies in both success rate and perturb percentage, while the generated adversarial samples are fluent and semantically preserved. Also, the cost of calculation is low, thus possible for large-scale generations. The code is available at https://github.com/LinyangLee/BERT-Attack.
  3. Token-Aware Virtual Adversarial Training in Natural Language Understanding, AAAI, 2021. [BibTeX][PDF]
    Linyang Li, Xipeng Qiu.
  4. BibTeX:
    @inproceedings{Li_Qiu_2021,
      author = {Li, Linyang and Qiu, Xipeng},
      title = {Token-Aware Virtual Adversarial Training in Natural Language Understanding},
      booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
      year = {2021},
      volume = {35},
      number = {9},
      pages = {8410-8418}, 
      url = {https://ojs.aaai.org/index.php/AAAI/article/view/17022}
    }
    
  5. Backdoor Attacks on Pre-trained Models by Layerwise Weight Poisoning, EMNLP, 2021. [BibTeX][PDF][Abstract]
    Linyang Li, Demin Song, Xiaonan Li, Jiehang Zeng, Ruotian Ma, Xipeng Qiu.
  6. BibTeX:
    @inproceedings{li-etal-2021-backdoor,
      author = {Li, Linyang and Song, Demin and Li, Xiaonan and Zeng, Jiehang and Ma, Ruotian and Qiu, Xipeng},
      title = {Backdoor Attacks on Pre-trained Models by Layerwise Weight Poisoning},
      booktitle = {Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing},
      year = {2021},
      pages = {3023--3032}, 
      url = {https://aclanthology.org/2021.emnlp-main.241}
    }
    
    Abstract: Pre-Trained Models have been widely applied and recently proved vulnerable under backdoor attacks: the released pre-trained weights can be maliciously poisoned with certain triggers. When the triggers are activated, even the fine-tuned model will predict pre-defined labels, causing a security threat. These backdoors generated by the poisoning methods can be erased by changing hyper-parameters during fine-tuning or detected by finding the triggers. In this paper, we propose a stronger weight-poisoning attack method that introduces a layerwise weight poisoning strategy to plant deeper backdoors; we also introduce a combinatorial trigger that cannot be easily detected. The experiments on text classification tasks show that previous defense methods cannot resist our weight-poisoning method, which indicates that our method can be widely applied and may provide hints for future model robustness studies.
  7. TextFlint: Unified Multilingual Robustness Evaluation Toolkit for Natural Language Processing, ACL, 2021. [BibTeX][Abstract]
    Xiao Wang, Qin Liu, Tao Gui, Qi Zhang, Yicheng Zou, Xin Zhou, Jiacheng Ye, Yongxin Zhang, Rui Zheng, Zexiong Pang, Qinzhuo Wu, Zhengyan Li, Chong Zhang, Ruotian Ma, Zichu Fei, Ruijian Cai, Jun Zhao, Xingwu Hu, Zhiheng Yan, Yiding Tan, Yuan Hu, Qiyuan Bian, Zhihua Liu, Shan Qin, Bolin Zhu, Xiaoyu Xing, Jinlan Fu, Yue Zhang, Minlong Peng, Xiaoqing Zheng, Yaqian Zhou, Zhongyu Wei, Xipeng Qiu, Xuanjing Huang.
  8. BibTeX:
    @inproceedings{wang-etal-2021-textflint,
      author = {Wang, Xiao and Liu, Qin and Gui, Tao and Zhang, Qi and Zou, Yicheng and Zhou, Xin and Ye, Jiacheng and Zhang, Yongxin and Zheng, Rui and Pang, Zexiong and Wu, Qinzhuo and Li, Zhengyan and Zhang, Chong and Ma, Ruotian and Fei, Zichu and Cai, Ruijian and Zhao, Jun and Hu, Xingwu and Yan, Zhiheng and Tan, Yiding and Hu, Yuan and Bian, Qiyuan and Liu, Zhihua and Qin, Shan and Zhu, Bolin and Xing, Xiaoyu and Fu, Jinlan and Zhang, Yue and Peng, Minlong and Zheng, Xiaoqing and Zhou, Yaqian and Wei, Zhongyu and Qiu, Xipeng and Huang, Xuanjing},
      title = {TextFlint: Unified Multilingual Robustness Evaluation Toolkit for Natural Language Processing},
      booktitle = {Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations},
      year = {2021},
      pages = {347--355}, 
      url = {https://aclanthology.org/2021.acl-demo.41}
    }
    
    Abstract: TextFlint is a multilingual robustness evaluation toolkit for NLP tasks that incorporates universal text transformation, task-specific transformation, adversarial attack, subpopulation, and their combinations to provide comprehensive robustness analyses. This enables practitioners to automatically evaluate their models from various aspects or to customize their evaluations as desired with just a few lines of code. TextFlint also generates complete analytical reports as well as targeted augmented data to address the shortcomings of the model in terms of its robustness. To guarantee acceptability, all the text transformations are linguistically based and all the transformed data selected (up to 100,000 texts) scored highly under human evaluation. To validate the utility, we performed large-scale empirical evaluations (over 67,000) on state-of-the-art deep learning models, classic supervised methods, and real-world systems. The toolkit is already available at https://github.com/textflint with all the evaluation results demonstrated at textflint.io.
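
A compressed Python sketch of the two-step recipe in item 1 above (BERT-ATTACK): rank words by how much masking them hurts the victim model, then try masked-LM substitutes for the most vulnerable words until the prediction flips. Both victim_gold_prob and mlm_candidates are hypothetical stand-ins; a real attack would query the fine-tuned victim model and a pre-trained masked language model.

    def rank_vulnerable_words(words, gold_label, victim_gold_prob):
        base = victim_gold_prob(words, gold_label)
        scores = []
        for i in range(len(words)):
            masked = words[:i] + ["[MASK]"] + words[i + 1:]
            scores.append((base - victim_gold_prob(masked, gold_label), i))   # importance = score drop
        return [i for _, i in sorted(scores, reverse=True)]

    def bert_attack(words, gold_label, victim_gold_prob, mlm_candidates, k=8):
        for i in rank_vulnerable_words(words, gold_label, victim_gold_prob):
            for substitute in mlm_candidates(words, i, top_k=k):              # fluent replacements
                perturbed = words[:i] + [substitute] + words[i + 1:]
                if victim_gold_prob(perturbed, gold_label) < 0.5:             # prediction flipped
                    return perturbed
        return None                                                           # attack failed

    # toy stand-ins so the sketch runs end-to-end (not a real classifier or masked LM)
    def victim_gold_prob(words, gold_label):
        return 0.9 if "great" in words else 0.3        # pretends "great" drives the positive label

    def mlm_candidates(words, position, top_k=8):
        return ["fine", "ok", "mediocre"][:top_k]      # pretends these come from a masked LM

    print(bert_attack("the movie was great".split(), "positive", victim_gold_prob, mlm_candidates))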

Text Matching

  1. Modelling Interaction of Sentence Pair with Coupled-LSTMs, EMNLP, 2016. [BibTeX]
    Pengfei Liu, Xipeng Qiu, Yaqian Zhou, Jifan Chen, Xuanjing Huang.
  2. BibTeX:
    @inproceedings{liu2016modelling,
      author = {Liu, Pengfei and Qiu, Xipeng and Zhou, Yaqian and Chen, Jifan and Huang, Xuanjing},
      title = {Modelling Interaction of Sentence Pair with Coupled-LSTMs},
      booktitle = {Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing},
      year = {2016},
      pages = {1703--1712}, 
      url = {https://aclweb.org/anthology/D16-1176}
    }
    
  3. Convolutional Neural Tensor Network Architecture for Community-based Question Answering, IJCAI, 2015. [BibTeX]
    Xipeng Qiu, Xuanjing Huang.
  4. BibTeX:
    @inproceedings{qiu2015convolutional,
      author = {Xipeng Qiu and Xuanjing Huang},
      title = {Convolutional Neural Tensor Network Architecture for Community-based Question Answering},
      booktitle = {Proceedings of International Joint Conference on Artificial Intelligence},
      year = {2015}, 
      url = {http://ijcai.org/papers15/Papers/IJCAI15-188.pdf}
    }
    
  5. Convolutional Interaction Network for Natural Language Inference, EMNLP, 2018. [BibTeX]
    Jingjing Gong, Xipeng Qiu, Xinchi Chen, Dong Liang, Xuanjing Huang.
  6. BibTeX:
    @inproceedings{gong2018convolutional,
      author = {Gong, Jingjing and Qiu, Xipeng and Chen, Xinchi and Liang, Dong and Huang, Xuanjing},
      title = {Convolutional Interaction Network for Natural Language Inference},
      booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
      year = {2018},
      pages = {1576--1585}, 
      url = {https://www.aclweb.org/anthology/D18-1186}
    }
    
  7. Deep Fusion LSTMs for Text Semantic Matching, ACL, 2016. [BibTeX][PDF]
    Pengfei Liu, Xipeng Qiu, Jifan Chen, Xuanjing Huang.
  8. BibTeX:
    @inproceedings{liu2016deep,
      author = {Pengfei Liu and Xipeng Qiu and Jifan Chen and Xuanjing Huang},
      title = {Deep Fusion LSTMs for Text Semantic Matching},
      booktitle = {Proceedings of Annual Meeting of the Association for Computational Linguistics},
      year = {2016},
      pages = {1034--1043}
    }
    
  9. Extractive Summarization as Text Matching, ACL, 2020. [BibTeX][PDF][Code][Abstract] (see the sketch at the end of this section)
    Ming Zhong, Pengfei Liu, Yiran Chen, Danqing Wang, Xipeng Qiu, Xuanjing Huang.
  10. BibTeX:
    @inproceedings{zhong-etal-2020-extractive,
      author = {Zhong, Ming and Liu, Pengfei and Chen, Yiran and Wang, Danqing and Qiu, Xipeng and Huang, Xuanjing},
      title = {Extractive Summarization as Text Matching},
      booktitle = {Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
      year = {2020},
      pages = {6197--6208}, 
      url = {https://www.aclweb.org/anthology/2020.acl-main.552}
    }
    
    Abstract: This paper creates a paradigm shift with regard to the way we build neural extractive summarization systems. Instead of following the commonly used framework of extracting sentences individually and modeling the relationship between sentences, we formulate the extractive summarization task as a semantic text matching problem, in which a source document and candidate summaries will be (extracted from the original text) matched in a semantic space. Notably, this paradigm shift to semantic matching framework is well-grounded in our comprehensive analysis of the inherent gap between sentence-level and summary-level extractors based on the property of the dataset. Besides, even instantiating the framework with a simple form of a matching model, we have driven the state-of-the-art extractive result on CNN/DailyMail to a new level (44.41 in ROUGE-1). Experiments on the other five datasets also show the effectiveness of the matching framework. We believe the power of this matching-based summarization framework has not been fully exploited. To encourage more instantiations in the future, we have released our codes, processed dataset, as well as generated summaries in url.
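
A toy Python sketch of the matching view of extractive summarization in item 9 above: the document and every candidate summary are embedded in the same space, and the candidate closest to the document is selected. A bag-of-words vector stands in for the Siamese-BERT encoder of the actual model, and the three-sentence document is made up.

    from collections import Counter
    from itertools import combinations
    import math

    def embed(text):
        return Counter(text.lower().split())                    # crude stand-in for a sentence encoder

    def cosine(a, b):
        dot = sum(a[w] * b[w] for w in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def best_candidate_summary(sentences, document, num_sent=2):
        doc_vec = embed(document)
        candidates = [" ".join(c) for c in combinations(sentences, num_sent)]  # candidate summaries
        return max(candidates, key=lambda c: cosine(embed(c), doc_vec))        # summary-level matching

    doc_sentences = [
        "The model matches whole summaries against the document.",
        "Sentence-level extractors score sentences one by one.",
        "Football scores were also reported yesterday.",
    ]
    print(best_candidate_summary(doc_sentences, " ".join(doc_sentences)))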

Multi-Task Learning for NLP

  1. Deep Multi-Task Learning with Shared Memory, EMNLP, 2016. [BibTeX]
    Pengfei Liu, Xipeng Qiu, Xuanjing Huang.
  2. BibTeX:
    @inproceedings{liu2016deep-multitask,
      author = {Liu, Pengfei and Qiu, Xipeng and Huang, Xuanjing},
      title = {Deep Multi-Task Learning with Shared Memory},
      booktitle = {Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing},
      year = {2016},
      pages = {118--127}, 
      url = {https://aclweb.org/anthology/D16-1012}
    }
    
  3. Recurrent Neural Network for Text Classification with Multi-Task Learning, IJCAI, 2016. [BibTeX]
    Pengfei Liu, Xipeng Qiu, Xuanjing Huang.
  4. BibTeX:
    @inproceedings{liu2016recurrent,
      author = {Pengfei Liu and Xipeng Qiu and Xuanjing Huang},
      title = {Recurrent Neural Network for Text Classification with Multi-Task Learning},
      booktitle = {Proceedings of International Joint Conference on Artificial Intelligence},
      year = {2016},
      pages = {2873--2879}, 
      url = {https://arxiv.org/abs/1605.05101}
    }
    
  5. Meta Multi-Task Learning for Sequence Modeling, AAAI, 2018. [BibTeX][PDF]
    Junkun Chen, Xipeng Qiu, Pengfei Liu, Xuanjing Huang.
  6. BibTeX:
    @inproceedings{chen2018meta,
      author = {Chen, Junkun and Qiu, Xipeng and Liu, Pengfei and Huang, Xuanjing},
      title = {Meta Multi-Task Learning for Sequence Modeling},
      booktitle = {Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence},
      year = {2018},
      pages = {5070--5077}, 
      url = {https://arxiv.org/abs/1802.08969}
    }
    
  7. Same Representation, Different Attentions: Shareable Sentence Representation Learning from Multiple Tasks, IJCAI, 2018. [BibTeX][PDF]
    Renjie Zheng, Junkun Chen, Xipeng Qiu.
  8. BibTeX:
    @inproceedings{zheng2018same,
      author = {Zheng, Renjie and Chen, Junkun and Qiu, Xipeng},
      title = {Same Representation, Different Attentions: Shareable Sentence Representation Learning from Multiple Tasks},
      booktitle = {Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence},
      year = {2018},
      pages = {4616--4622}, 
      url = {https://arxiv.org/abs/1804.08139}
    }
    
  9. Adversarial Multi-task Learning for Text Classification, ACL, 2017. [BibTeX][PDF]
    Pengfei Liu, Xipeng Qiu, Xuanjing Huang.
  10. BibTeX:
    @inproceedings{liu2017adversarial,
      author = {Pengfei Liu and Xipeng Qiu and Xuanjing Huang},
      title = {Adversarial Multi-task Learning for Text Classification},
      booktitle = {Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics},
      year = {2017},
      pages = {1--10}
    }
    

Neural Architecture for NLP

  1. Multi-Timescale Long Short-Term Memory Neural Network for Modelling Sentences and Documents, EMNLP, 2015. [BibTeX]
    PengFei Liu, Xipeng Qiu, Xinchi Chen, Shiyu Wu, Xuanjing Huang.
  2. BibTeX:
    @inproceedings{liu2015multitimescale,
      author = {PengFei Liu and Xipeng Qiu and Xinchi Chen and Shiyu Wu and Xuanjing Huang},
      title = {Multi-Timescale Long Short-Term Memory Neural Network for Modelling Sentences and Documents},
      booktitle = {Proceedings of the Conference on Empirical Methods in Natural Language Processing},
      year = {2015},
      pages = {2326--2335}, 
      url = {http://www.aclweb.org/anthology/D/D15/D15-1280.pdf}
    }
    
  3. Cached Long Short-Term Memory Neural Networks for Document-Level Sentiment Classification, EMNLP, 2016. [BibTeX]
    Jiacheng Xu, Danlu Chen, Xipeng Qiu, Xuanjing Huang.
  4. BibTeX:
    @inproceedings{xu2016cached,
      author = {Jiacheng Xu and Danlu Chen and Xipeng Qiu and Xuanjing Huang},
      title = {Cached Long Short-Term Memory Neural Networks for Document-Level Sentiment Classification},
      booktitle = {Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing},
      year = {2016},
      pages = {1660--1669}, 
      url = {https://aclweb.org/anthology/D16-1172}
    }
    
  5. Convolutional Interaction Network for Natural Language Inference, EMNLP, 2018. [BibTeX]
    Jingjing Gong, Xipeng Qiu, Xinchi Chen, Dong Liang, Xuanjing Huang.
  6. BibTeX:
    @inproceedings{gong2018convolutional,
      author = {Gong, Jingjing and Qiu, Xipeng and Chen, Xinchi and Liang, Dong and Huang, Xuanjing},
      title = {Convolutional Interaction Network for Natural Language Inference},
      booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
      year = {2018},
      pages = {1576--1585}, 
      url = {https://www.aclweb.org/anthology/D18-1186}
    }
    
  7. Star-Transformer, NAACL, 2019. [BibTeX][PDF][Abstract] (see the sketch at the end of this section)
    Qipeng Guo, Xipeng Qiu, Pengfei Liu, Yunfan Shao, Xiangyang Xue, Zheng Zhang.
  8. BibTeX:
    @inproceedings{guo2019star,
      author = {Guo, Qipeng and Qiu, Xipeng and Liu, Pengfei and Shao, Yunfan and Xue, Xiangyang and Zhang, Zheng},
      title = {Star-Transformer},
      booktitle = {Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
      year = {2019},
      pages = {1315--1325}, 
      url = {https://www.aclweb.org/anthology/N19-1133}
    }
    
    Abstract: Although Transformer has achieved great successes on many NLP tasks, its heavy structure with fully-connected attention connections leads to dependencies on large training data. In this paper, we present Star-Transformer, a lightweight alternative by careful sparsification. To reduce model complexity, we replace the fully-connected structure with a star-shaped topology, in which every two non-adjacent nodes are connected through a shared relay node. Thus, complexity is reduced from quadratic to linear, while preserving the capacity to capture both local composition and long-range dependency. The experiments on four tasks (22 datasets) show that Star-Transformer achieved significant improvements against the standard Transformer for the modestly sized datasets.
  9. Low-rank and Locality Constrained Self-Attention for Sequence Modeling, IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), Vol. 27(12), pp. 2213-2222, December 2019. [BibTeX][DOI]
    Qipeng Guo, Xipeng Qiu, Xiangyang Xue, Zheng Zhang.
  10. BibTeX:
    @article{guo2019low,
      author = {Guo, Qipeng and Qiu, Xipeng and Xue, Xiangyang and Zhang, Zheng},
      title = {Low-rank and Locality Constrained Self-Attention for Sequence Modeling},
      journal = {IEEE/ACM Transactions on Audio, Speech, and Language Processing},
      year = {2019},
      volume = {27},
      number = {12},
      pages = {2213--2222},
      doi = {https://doi.org/10.1109/TASLP.2019.2944078}
    }
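
A minimal Python sketch of the star-shaped sparsification described in item 7 above: the n token (satellite) nodes attend only to their ring neighbours and to one shared relay node, while the relay attends to everything, so attention cost grows linearly in n rather than quadratically. Only the boolean connectivity mask is built here; the model's actual update rules are not reproduced.

    import numpy as np

    def star_transformer_mask(n_tokens, radius=1):
        n = n_tokens + 1                                   # last index is the relay node
        mask = np.zeros((n, n), dtype=bool)
        relay = n_tokens
        for i in range(n_tokens):
            for d in range(-radius, radius + 1):           # local ring connections
                j = i + d
                if 0 <= j < n_tokens:
                    mask[i, j] = True
            mask[i, relay] = mask[relay, i] = True         # every token is linked to the relay
        mask[relay, relay] = True
        return mask

    print(star_transformer_mask(6, radius=1).astype(int))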
    

Knowledge-enhanced NLP

  1. Knowledge Graph Representation with Jointly Structural and Textual Encoding, IJCAI, 2017. [BibTeX][PDF]
    Jiacheng Xu, Xipeng Qiu, Kan Chen, Xuanjing Huang.
  2. BibTeX:
    @inproceedings{xu2017knowledge,
      author = {Jiacheng Xu and Xipeng Qiu and Kan Chen and Xuanjing Huang},
      title = {Knowledge Graph Representation with Jointly Structural and Textual Encoding},
      booktitle = {Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence},
      year = {2017},
      pages = {1318--1324}
    }
    
  3. CoLAKE: Contextualized Language and Knowledge Embedding, COLING, 2020. [BibTeX][PDF][Abstract] (see the sketch at the end of this section)
    Tianxiang Sun, Yunfan Shao, Xipeng Qiu, Qipeng Guo, Yaru Hu, Xuanjing Huang, Zheng Zhang.
  4. BibTeX:
    @inproceedings{sun-etal-2020-colake,
      author = {Sun, Tianxiang and Shao, Yunfan and Qiu, Xipeng and Guo, Qipeng and Hu, Yaru and Huang, Xuanjing and Zhang, Zheng},
      title = {CoLAKE: Contextualized Language and Knowledge Embedding},
      booktitle = {Proceedings of the 28th International Conference on Computational Linguistics},
      year = {2020},
      pages = {3660--3670}, 
      url = {https://www.aclweb.org/anthology/2020.coling-main.327}
    }
    
    Abstract: With the emerging branch of incorporating factual knowledge into pre-trained language models such as BERT, most existing models consider shallow, static, and separately pre-trained entity embeddings, which limits the performance gains of these models. Few works explore the potential of deep contextualized knowledge representation when injecting knowledge. In this paper, we propose the Contextualized Language and Knowledge Embedding (CoLAKE), which jointly learns contextualized representation for both language and knowledge with the extended MLM objective. Instead of injecting only entity embeddings, CoLAKE extracts the knowledge context of an entity from large-scale knowledge bases. To handle the heterogeneity of knowledge context and language context, we integrate them in a unified data structure, word-knowledge graph (WK graph). CoLAKE is pre-trained on large-scale WK graphs with the modified Transformer encoder. We conduct experiments on knowledge-driven tasks, knowledge probing tasks, and language understanding tasks. Experimental results show that CoLAKE outperforms previous counterparts on most of the tasks. Besides, CoLAKE achieves surprisingly high performance on our synthetic task called word-knowledge graph completion, which shows the superiority of simultaneously contextualizing language and knowledge representation.
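
A small Python sketch, under loose assumptions, of the word-knowledge (WK) graph construction mentioned in item 3 above: the sentence forms a word chain, and each mentioned entity is expanded with a few (relation, object) neighbours from a knowledge graph, giving one graph over word, entity, and relation nodes. The mention alignment and the tiny triple store are made up for illustration and simplify the paper's actual construction.

    def build_wk_graph(words, mention_to_entity, kg_triples, max_neighbors=2):
        edges = [(words[i], words[i + 1]) for i in range(len(words) - 1)]     # word chain
        for mention, entity in mention_to_entity.items():
            edges.append((mention, entity))                                   # anchor the mention
            neighbors = [(r, o) for s, r, o in kg_triples if s == entity][:max_neighbors]
            for relation, obj in neighbors:                                   # knowledge context
                edges.extend([(entity, relation), (relation, obj)])
        return edges

    words = ["Harry", "Potter", "is", "a", "series", "of", "fantasy", "novels"]
    mention_to_entity = {"Harry": "Harry_Potter"}
    kg_triples = [
        ("Harry_Potter", "author", "J._K._Rowling"),
        ("Harry_Potter", "genre", "Fantasy"),
        ("Harry_Potter", "country", "United_Kingdom"),
    ]
    for edge in build_wk_graph(words, mention_to_entity, kg_triples):
        print(edge)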

Sentiment Analysis

  1. Cached Long Short-Term Memory Neural Networks for Document-Level Sentiment Classification, EMNLP, 2016. [BibTeX]
    Jiacheng Xu, Danlu Chen, Xipeng Qiu, Xuanjing Huang.
  2. BibTeX:
    @inproceedings{xu2016cached,
      author = {Jiacheng Xu and Danlu Chen and Xipeng Qiu and Xuanjing Huang},
      title = {Cached Long Short-Term Memory Neural Networks for Document-Level Sentiment Classification},
      booktitle = {Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing},
      year = {2016},
      pages = {1660--1669}, 
      url = {https://aclweb.org/anthology/D16-1172}
    }
    
  3. Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence, NAACL, 2019. [BibTeX][Code][Abstract]
    Chi Sun, Luyao Huang, Xipeng Qiu.
  4. BibTeX:
    @inproceedings{sun2019utilizing,
      author = {Sun, Chi and Huang, Luyao and Qiu, Xipeng},
      title = {Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence},
      booktitle = {Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)},
      year = {2019},
      pages = {380--385}, 
      url = {https://arxiv.org/pdf/1903.09588.pdf}
    }
    
    Abstract: Aspect-based sentiment analysis (ABSA), which aims to identify fine-grained opinion polarity towards a specific aspect, is a challenging subtask of sentiment analysis (SA). In this paper, we construct an auxiliary sentence from the aspect and convert ABSA to a sentence-pair classification task, such as question answering (QA) and natural language inference (NLI). We fine-tune the pre-trained model from BERT and achieve new state-of-the-art results on SentiHood and SemEval-2014 Task 4 datasets. The source codes are available at https://github.com/HSLCY/ABSA-BERT-pair.
  5. Does syntax matter? A strong baseline for Aspect-based Sentiment Analysis with RoBERTa, NAACL, 2021. [BibTeX][PDF][Abstract]
    Junqi Dai, Hang Yan, Tianxiang Sun, Pengfei Liu, Xipeng Qiu.
  6. BibTeX:
    @inproceedings{dai-etal-2021-syntax,
      author = {Dai, Junqi and Yan, Hang and Sun, Tianxiang and Liu, Pengfei and Qiu, Xipeng},
      title = {Does syntax matter? A strong baseline for Aspect-based Sentiment Analysis with RoBERTa},
      booktitle = {Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
      year = {2021},
      pages = {1816--1829}, 
      url = {https://www.aclweb.org/anthology/2021.naacl-main.146}
    }
    
    Abstract: Aspect-based Sentiment Analysis (ABSA), aiming at predicting the polarities for aspects, is a fine-grained task in the field of sentiment analysis. Previous work showed syntactic information, e.g. dependency trees, can effectively improve the ABSA performance. Recently, pre-trained models (PTMs) also have shown their effectiveness on ABSA. Therefore, the question naturally arises whether PTMs contain sufficient syntactic information for ABSA so that we can obtain a good ABSA model only based on PTMs. In this paper, we firstly compare the induced trees from PTMs and the dependency parsing trees on several popular models for the ABSA task, showing that the induced tree from fine-tuned RoBERTa (FT-RoBERTa) outperforms the parser-provided tree. The further analysis experiments reveal that the FT-RoBERTa Induced Tree is more sentiment-word-oriented and could benefit the ABSA task. The experiments also show that the pure RoBERTa-based model can outperform or approximate to the previous SOTA performances on six datasets across four languages since it implicitly incorporates the task-oriented syntactic information.
  7. A Unified Generative Framework for Aspect-based Sentiment Analysis, ACL, 2021. [BibTeX][PDF][Abstract] (see the sketch at the end of this section)
    Hang Yan, Junqi Dai, Tuo Ji, Xipeng Qiu, Zheng Zhang.
  8. BibTeX:
    @inproceedings{yan-etal-2021-unified,
      author = {Yan, Hang and Dai, Junqi and Ji, Tuo and Qiu, Xipeng and Zhang, Zheng},
      title = {A Unified Generative Framework for Aspect-based Sentiment Analysis},
      booktitle = {Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)},
      year = {2021},
      pages = {2416--2429}, 
      url = {https://aclanthology.org/2021.acl-long.188}
    }
    
    Abstract: Aspect-based Sentiment Analysis (ABSA) aims to identify the aspect terms, their corresponding sentiment polarities, and the opinion terms. There exist seven subtasks in ABSA. Most studies only focus on the subsets of these subtasks, which leads to various complicated ABSA models while hard to solve these subtasks in a unified framework. In this paper, we redefine every subtask target as a sequence mixed by pointer indexes and sentiment class indexes, which converts all ABSA subtasks into a unified generative formulation. Based on the unified formulation, we exploit the pre-training sequence-to-sequence model BART to solve all ABSA subtasks in an end-to-end framework. Extensive experiments on four ABSA datasets for seven subtasks demonstrate that our framework achieves substantial performance gain and provides a real unified end-to-end solution for the whole ABSA subtasks, which could benefit multiple tasks.
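
A rough Python sketch of the target linearisation used by the unified generative framework in item 7 above: each ABSA target is written as one index sequence that mixes pointers into the sentence (span start/end positions) with sentiment class indexes, so a single sequence-to-sequence model can generate it. The offset convention and example triplets are illustrative and omit the special tokens handled by the actual BART-based model.

    POLARITIES = ["positive", "negative", "neutral"]

    def linearize_triplets(tokens, triplets):
        """triplets: list of (aspect_start, aspect_end, opinion_start, opinion_end, polarity)."""
        class_offset = len(tokens)                         # indexes >= len(tokens) denote classes
        target = []
        for a_s, a_e, o_s, o_e, polarity in triplets:
            target += [a_s, a_e, o_s, o_e, class_offset + POLARITIES.index(polarity)]
        return target

    tokens = "the battery life is great but the screen is dim".split()
    triplets = [(1, 2, 4, 4, "positive"), (7, 7, 9, 9, "negative")]
    print(linearize_triplets(tokens, triplets))            # -> [1, 2, 4, 4, 10, 7, 7, 9, 9, 11]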

Text Generation

  1. Toward Diverse Text Generation with Inverse Reinforcement Learning, IJCAI, 2018. [BibTeX][PDF][Code]
    Zhan Shi, Xinchi Chen, Xipeng Qiu, Xuanjing Huang.
  2. BibTeX:
    @inproceedings{shi2018towards,
      author = {Shi, Zhan and Chen, Xinchi and Qiu, Xipeng and Huang, Xuanjing},
      title = {Toward Diverse Text Generation with Inverse Reinforcement Learning},
      booktitle = {Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence},
      year = {2018},
      pages = {4361--4367}, 
      url = {https://arxiv.org/abs/1804.11258}
    }
    
  3. Style Transformer: Unpaired Text Style Transfer without Disentangled Latent Representation, ACL, 2019. [BibTeX][PDF][Code]
    Ning Dai, Jianze Liang, Xipeng Qiu, Xuanjing Huang.
  4. BibTeX:
    @inproceedings{dai2019style,
      author = {Ning Dai and Jianze Liang and Xipeng Qiu and Xuanjing Huang},
      title = {Style Transformer: Unpaired Text Style Transfer without Disentangled Latent Representation},
      booktitle = {Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics},
      year = {2019},
      pages = {5997--6007}, 
      url = {https://www.aclweb.org/anthology/P19-1601/}
    }
    
  5. Extractive Summarization as Text Matching, ACL, 2020. [BibTeX][PDF][Code][Abstract]
    Ming Zhong, Pengfei Liu, Yiran Chen, Danqing Wang, Xipeng Qiu, Xuanjing Huang.
  6. BibTeX:
    @inproceedings{zhong-etal-2020-extractive,
      author = {Zhong, Ming and Liu, Pengfei and Chen, Yiran and Wang, Danqing and Qiu, Xipeng and Huang, Xuanjing},
      title = {Extractive Summarization as Text Matching},
      booktitle = {Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
      year = {2020},
      pages = {6197--6208}, 
      url = {https://www.aclweb.org/anthology/2020.acl-main.552}
    }
    
    Abstract: This paper creates a paradigm shift with regard to the way we build neural extractive summarization systems. Instead of following the commonly used framework of extracting sentences individually and modeling the relationship between sentences, we formulate the extractive summarization task as a semantic text matching problem, in which a source document and candidate summaries will be (extracted from the original text) matched in a semantic space. Notably, this paradigm shift to semantic matching framework is well-grounded in our comprehensive analysis of the inherent gap between sentence-level and summary-level extractors based on the property of the dataset. Besides, even instantiating the framework with a simple form of a matching model, we have driven the state-of-the-art extractive result on CNN/DailyMail to a new level (44.41 in ROUGE-1). Experiments on the other five datasets also show the effectiveness of the matching framework. We believe the power of this matching-based summarization framework has not been fully exploited. To encourage more instantiations in the future, we have released our codes, processed dataset, as well as generated summaries in url.
  7. Syntax-guided text generation via graph neural network, Science China Information Sciences, Vol. 64(5), pp. 152102, 2021. [BibTeX][DOI][Abstract]
    Qipeng Guo, Xipeng Qiu, Xiangyang Xue, Zheng Zhang.
    Abstract: Text generation is a fundamental and important task in natural language processing. Most of the existing models generate text in a sequential manner and have difficulty modeling complex dependency structures. In this paper, we treat the text generation task as a graph generation problem exploiting both syntactic and word-ordering relationships. Leveraging the framework of the graph neural network, we propose the word graph model. During the process, the model builds a sentence incrementally and maintains syntactic integrity via a syntax-driven, top-down, breadth-first generation process. Experimental results on both synthetic and real text generation tasks show the efficacy of our approach.
  8. BibTeX:
    @article{guo2021syntax-gnn,
      author = {Guo, Qipeng and Qiu, Xipeng and Xue, Xiangyang and Zhang, Zheng},
      title = {Syntax-guided text generation via graph neural network},
      journal = {Science China Information Sciences},
      year = {2021},
      volume = {64},
      number = {5},
      pages = {152102},
      url = {https://doi.org/10.1007/s11432-019-2740-1},
      doi = {https://doi.org/10.1007/s11432-019-2740-1}
    }