Xipeng Qiu

Professor, School of Computer Science, Fudan University

 

Contact

Building 2X, No. 2005 Songhu Road, Shanghai, China

A more comprehensive publication list is available on Google Scholar.

Selected Papers

  1. MOSS: An Open Conversational Large Language Model, Machine Intelligence Research, 2024. [BibTeX][DOI][PDF]
    Abstract: Conversational large language models (LLMs) such as ChatGPT and GPT-4 have recently exhibited remarkable capabilities across various domains, capturing widespread attention from the public. To facilitate this line of research, in this paper, we report the development of MOSS, an open-sourced conversational LLM that contains 16B parameters and can perform a variety of instructions in multi-turn interactions with humans. The base model of MOSS is pre-trained on large-scale unlabeled English, Chinese, and code data. To optimize the model for dialogue, we generate 1.1M synthetic conversations based on user prompts collected through our earlier versions of the model API. We then perform preference-aware training on preference data annotated from AI feedback. Evaluation results on real-world use cases and academic benchmarks demonstrate the effectiveness of the proposed approaches. In addition, we present an effective practice to augment MOSS with several external tools. Through the development of MOSS, we have established a complete technical roadmap for large language models from pre-training, supervised fine-tuning to alignment, verifying the feasibility of ChatGPT under resource-limited conditions and providing a reference for both the academic and industrial communities. Model weights and code are publicly available at https://github.com/OpenMOSS/MOSS.
    Tianxiang Sun, Xiaotian Zhang, Zhengfu He, Peng Li, Qinyuan Cheng, Xiangyang Liu, Hang Yan, Yunfan Shao, Qiong Tang, Shiduo Zhang, Xingjian Zhao, Ke Chen, Yining Zheng, Zhejian Zhou, Ruixiao Li, Jun Zhan, Yunhua Zhou, Linyang Li, Xiaogui Yang, Lingling Wu, Zhangyue Yin, Xuanjing Huang, Yu-Gang Jiang, Xipeng Qiu.
    BibTeX:
    @article{Sun2024MOSS,
      author = {Sun, Tianxiang and Zhang, Xiaotian and He, Zhengfu and Li, Peng and Cheng, Qinyuan and Liu, Xiangyang and Yan, Hang and Shao, Yunfan and Tang, Qiong and Zhang, Shiduo and Zhao, Xingjian and Chen, Ke and Zheng, Yining and Zhou, Zhejian and Li, Ruixiao and Zhan, Jun and Zhou, Yunhua and Li, Linyang and Yang, Xiaogui and Wu, Lingling and Yin, Zhangyue and Huang, Xuanjing and Jiang, Yu-Gang and Qiu, Xipeng},
      title = {MOSS: An Open Conversational Large Language Model},
      journal = {Machine Intelligence Research},
      year = {2024},
      doi = {10.1007/s11633-024-1502-8}
    }
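
    A quick-start sketch (not from the paper): the weights released at https://github.com/OpenMOSS/MOSS can, as far as I know, be loaded through Hugging Face transformers. The checkpoint name, prompt format, and generation settings below are assumptions taken from that repository rather than from the paper, and a GPU with substantial memory is required.
    # Hedged usage sketch for the released MOSS weights via transformers.
    # The model id "fnlp/moss-moon-003-sft" and the <|Human|>/<|MOSS|> prompt
    # format are assumptions; consult https://github.com/OpenMOSS/MOSS for the
    # authoritative instructions.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "fnlp/moss-moon-003-sft"   # assumed checkpoint name
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, trust_remote_code=True, device_map="auto"   # needs accelerate
    ).eval()

    # Single-turn query; multi-turn dialogue would append earlier turns to the prompt.
    query = "<|Human|>: What can you do?<eoh>\n<|MOSS|>:"
    inputs = tokenizer(query, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
    print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))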
    
  2. Paradigm Shift in Natural Language Processing, Machine Intelligence Research, Vol. 19(3), pp. 169-183, 2022. [BibTeX][DOI]
    Abstract: In the era of deep learning, modeling for most natural language processing (NLP) tasks has converged into several mainstream paradigms. For example, we usually adopt the sequence labeling paradigm to solve a bundle of tasks such as POS-tagging, named entity recognition (NER), and chunking, and adopt the classification paradigm to solve tasks like sentiment analysis. With the rapid progress of pre-trained language models, recent years have witnessed a rising trend of paradigm shift, which is solving one NLP task in a new paradigm by reformulating the task. The paradigm shift has achieved great success on many tasks and is becoming a promising way to improve model performance. Moreover, some of these paradigms have shown great potential to unify a large number of NLP tasks, making it possible to build a single model to handle diverse tasks. In this paper, we review such phenomenon of paradigm shifts in recent years, highlighting several paradigms that have the potential to solve different NLP tasks.
    Tian-Xiang Sun, Xiang-Yang Liu, Xi-Peng Qiu, Xuan-Jing Huang.
    BibTeX:
    @article{Sun2022,
      author = {Sun, Tian-Xiang and Liu, Xiang-Yang and Qiu, Xi-Peng and Huang, Xuan-Jing},
      title = {Paradigm Shift in Natural Language Processing},
      journal = {Machine Intelligence Research},
      year = {2022},
      volume = {19},
      number = {3},
      pages = {169--183},
      url = {https://doi.org/10.1007/s11633-022-1331-6},
      doi = {10.1007/s11633-022-1331-6}
    }
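
    To make the shift concrete, here is a hedged toy illustration (not from the paper): the same sentiment example handled in the classification paradigm with a task-specific head, and reformulated into the masked-LM paradigm with a cloze template. The pipeline defaults, model choice, template, and label words are illustrative assumptions.
    # Two paradigms for one task, sketched with Hugging Face pipelines.
    from transformers import pipeline

    review = "The movie was full of surprises and the acting was superb."

    # (Class) paradigm: encoder plus a task-specific classification head.
    classifier = pipeline("sentiment-analysis")        # library default checkpoint
    print(classifier(review))

    # (LM) paradigm: recast the task as filling a [MASK] in a template and
    # comparing the scores assigned to label words.
    fill = pipeline("fill-mask", model="bert-base-uncased")
    template = review + " Overall, the movie was [MASK]."
    print(fill(template, targets=["good", "bad"]))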
    
  3. CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Generation, SCIENCE CHINA Information Sciences (SCIS), 2022. [BibTeX][DOI][PDF]
    Yunfan Shao, Zhichao Geng, Yitao Liu, Junqi Dai, Hang Yan, Fei Yang, Li Zhe, Hujun Bao, Xipeng Qiu.
    BibTeX:
    @article{shao2022cpt,
      author = {Shao, Yunfan and Geng, Zhichao and Liu, Yitao and Dai, Junqi and Yan, Hang and Yang, Fei and Zhe, Li and Bao, Hujun and Qiu, Xipeng},
      title = {CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Generation},
      journal = {SCIENCE CHINA Information Sciences},
      year = {2022},
      url = {https://arxiv.org/abs/2109.05729},
      doi = {10.1007/s11432-021-3536-5}
    }
    
  4. Black-Box Tuning for Language-Model-as-a-Service, ICML, 2022. [BibTeX]
    Tianxiang Sun, Yunfan Shao, Hong Qian, Xuanjing Huang, Xipeng Qiu.
    BibTeX:
    @inproceedings{sun2022black,
      author = {Sun, Tianxiang and Shao, Yunfan and Qian, Hong and Huang, Xuanjing and Qiu, Xipeng},
      title = {Black-Box Tuning for Language-Model-as-a-Service},
      booktitle = {International Conference on Machine Learning},
      year = {2022},
      volume = {162},
      pages = {20841--20855}, 
      url = {https://proceedings.mlr.press/v162/sun22e.html}
    }
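
    A hedged sketch of the general idea behind black-box tuning, as I understand it from the paper: a low-dimensional vector is optimized with a derivative-free method and mapped through a fixed random projection into the frozen model's continuous prompt space, so only forward calls to the inference service are needed. The synthetic loss below stands in for that service, the simple (1+1) evolution strategy stands in for the CMA-ES optimizer used in the paper, and all sizes are toy values.
    import numpy as np

    rng = np.random.default_rng(0)
    d = 50                               # low-dimensional search space
    prompt_len, emb_dim = 8, 64
    D = prompt_len * emb_dim             # frozen model's prompt-embedding space
    A = rng.normal(0.0, 1.0 / d, size=(D, d))   # fixed random projection

    def api_loss(prompt_embedding):
        """Stand-in for the black-box service: in practice this would send the
        projected prompt to the inference API and return the task loss."""
        target = np.full(D, 0.01)        # hypothetical "good" prompt
        return float(np.mean((prompt_embedding - target) ** 2))

    z, sigma = np.zeros(d), 0.1
    best = api_loss(A @ z)
    for _ in range(300):
        candidate = z + sigma * rng.normal(size=d)   # mutate in the small space
        loss = api_loss(A @ candidate)               # forward queries only
        if loss < best:                              # greedy (1+1)-ES acceptance
            z, best = candidate, loss
    print(f"final loss: {best:.6f}")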
    
  5. A Unified Generative Framework for Various NER Subtasks, ACL, 2021. [BibTeX][PDF][Abstract]
    Hang Yan, Tao Gui, Junqi Dai, Qipeng Guo, Zheng Zhang, Xipeng Qiu.
    BibTeX:
    @inproceedings{yan-etal-2021-unified-generative,
      author = {Yan, Hang and Gui, Tao and Dai, Junqi and Guo, Qipeng and Zhang, Zheng and Qiu, Xipeng},
      title = {A Unified Generative Framework for Various NER Subtasks},
      booktitle = {Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)},
      year = {2021},
      pages = {5808--5822}, 
      url = {https://aclanthology.org/2021.acl-long.451}
    }
    
    Abstract: Named Entity Recognition (NER) is the task of identifying spans that represent entities in sentences. Whether the entity spans are nested or discontinuous, the NER task can be categorized into the flat NER, nested NER, and discontinuous NER subtasks. These subtasks have been mainly solved by the token-level sequence labelling or span-level classification. However, these solutions can hardly tackle the three kinds of NER subtasks concurrently. To that end, we propose to formulate the NER subtasks as an entity span sequence generation task, which can be solved by a unified sequence-to-sequence (Seq2Seq) framework. Based on our unified framework, we can leverage the pre-trained Seq2Seq model to solve all three kinds of NER subtasks without the special design of the tagging schema or ways to enumerate spans. We exploit three types of entity representations to linearize entities into a sequence. Our proposed framework is easy-to-implement and achieves state-of-the-art (SoTA) or near SoTA performance on eight English NER datasets, including two flat NER datasets, three nested NER datasets, and three discontinuous NER datasets.
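    A hedged sketch of the linearization this framework relies on: every entity is written as the positions of its tokens followed by a type token, so flat, nested, and discontinuous entities all fit one target format for the Seq2Seq decoder. The offset convention and the example are illustrative, not the paper's exact BPE-level scheme.
    tokens = ["have", "muscle", "pain", "and", "fatigue"]   # positions index into this
    entity_types = ["disorder"]                 # tag vocabulary
    entities = [
        ([1, 2], "disorder"),                   # "muscle pain"
        ([1, 4], "disorder"),                   # "muscle ... fatigue" (discontinuous)
    ]

    def linearize(entities, entity_types, offset):
        """Turn (token positions, type) pairs into one flat pointer sequence;
        positions are shifted past the reserved type-token ids."""
        target = []
        for positions, etype in entities:
            target.extend(p + offset for p in positions)
            target.append(entity_types.index(etype))
        return target

    # Ids in [0, len(entity_types)) are reserved for type tokens.
    print(linearize(entities, entity_types, offset=len(entity_types)))
    # -> [2, 3, 0, 2, 5, 0]
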
  6. FLAT: Chinese NER Using Flat-Lattice Transformer, ACL, 2020. [BibTeX][PDF][Code][Abstract]
    Xiaonan Li, Hang Yan, Xipeng Qiu, Xuanjing Huang.
    BibTeX:
    @inproceedings{li-etal-2020-flat,
      author = {Li, Xiaonan and Yan, Hang and Qiu, Xipeng and Huang, Xuanjing},
      title = {FLAT: Chinese NER Using Flat-Lattice Transformer},
      booktitle = {Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
      year = {2020},
      pages = {6836--6842}, 
      url = {https://www.aclweb.org/anthology/2020.acl-main.611}
    }
    
    Abstract: Recently, the character-word lattice structure has been proved to be effective for Chinese named entity recognition (NER) by incorporating the word information. However, since the lattice structure is complex and dynamic, the lattice-based models are hard to fully utilize the parallel computation of GPUs and usually have a low inference speed. In this paper, we propose FLAT: Flat-LAttice Transformer for Chinese NER, which converts the lattice structure into a flat structure consisting of spans. Each span corresponds to a character or latent word and its position in the original lattice. With the power of Transformer and well-designed position encoding, FLAT can fully leverage the lattice information and has an excellent parallel ability. Experiments on four datasets show FLAT outperforms other lexicon-based models in performance and efficiency.
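    A hedged sketch of the flat-lattice construction described above: each character and each lexicon-matched word becomes one span with a head and a tail index, so the dynamic lattice can be fed to a Transformer as a flat sequence. The toy lexicon and sentence are illustrative; the relative-position encoding over (head, tail) pairs is only noted in a comment.
    sentence = "重庆人和药店"                    # input characters
    lexicon = {"重庆", "重庆人", "人和药店", "药店"}

    def build_flat_lattice(chars, lexicon, max_word_len=4):
        spans = [(ch, i, i) for i, ch in enumerate(chars)]   # character spans
        for i in range(len(chars)):                          # matched word spans
            for j in range(i + 2, min(i + max_word_len, len(chars)) + 1):
                if chars[i:j] in lexicon:
                    spans.append((chars[i:j], i, j - 1))
        return spans

    for token, head, tail in build_flat_lattice(sentence, lexicon):
        print(f"{token}\thead={head}\ttail={tail}")
    # FLAT then lets self-attention use the relative positions of the
    # (head, tail) combinations between any two spans.
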
  7. Star-Transformer, NAACL, 2019. [BibTeX][PDF][Abstract]
    Qipeng Guo, Xipeng Qiu, Pengfei Liu, Yunfan Shao, Xiangyang Xue, Zheng Zhang.
    BibTeX:
    @inproceedings{guo2019star,
      author = {Guo, Qipeng and Qiu, Xipeng and Liu, Pengfei and Shao, Yunfan and Xue, Xiangyang and Zhang, Zheng},
      title = {Star-Transformer},
      booktitle = {Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
      year = {2019},
      pages = {1315--1325}, 
      url = {https://www.aclweb.org/anthology/N19-1133}
    }
    
    Abstract: Although Transformer has achieved great successes on many NLP tasks, its heavy structure with fully-connected attention connections leads to dependencies on large training data. In this paper, we present Star-Transformer, a lightweight alternative by careful sparsification. To reduce model complexity, we replace the fully-connected structure with a star-shaped topology, in which every two non-adjacent nodes are connected through a shared relay node. Thus, complexity is reduced from quadratic to linear, while preserving the capacity to capture both local composition and long-range dependency. The experiments on four tasks (22 datasets) show that Star-Transformer achieved significant improvements against the standard Transformer for the modestly sized datasets.
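    A hedged NumPy sketch of one update cycle of the star topology described above: each satellite attends only over its neighbours, itself, its original embedding, and the shared relay; the relay then attends over all satellites. Single head, no projections or layer norm, toy sizes, cyclic boundary handling for brevity, and details beyond the abstract (such as re-injecting the token embedding) follow my reading of the paper.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 6, 8                        # sequence length, hidden size
    E = rng.normal(size=(n, d))        # token embeddings
    H = E.copy()                       # satellite states
    s = E.mean(axis=0)                 # relay state, initialised to the average

    def attend(query, keys):
        """Single-query scaled dot-product attention (values tied to keys)."""
        scores = keys @ query / np.sqrt(len(query))
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        return weights @ keys

    H_new = np.empty_like(H)
    for i in range(n):
        context = np.stack([
            H[(i - 1) % n], H[i], H[(i + 1) % n],   # local ring context
            E[i],                                   # original embedding
            s,                                      # shared relay node
        ])
        H_new[i] = attend(H[i], context)
    H = H_new
    s = attend(s, np.vstack([s[None, :], H]))       # relay reads all satellites
    print(H.shape, s.shape)                         # (6, 8) (8,)
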
  8. Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence, NAACL, 2019. [BibTeX][Code][Abstract]
    Chi Sun, Luyao Huang, Xipeng Qiu.
    BibTeX:
    @inproceedings{sun2019utilizing,
      author = {Sun, Chi and Huang, Luyao and Qiu, Xipeng},
      title = {Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence},
      booktitle = {Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)},
      year = {2019},
      pages = {380--385}, 
      url = {https://arxiv.org/pdf/1903.09588.pdf}
    }
    
    Abstract: Aspect-based sentiment analysis (ABSA), which aims to identify fine-grained opinion polarity towards a specific aspect, is a challenging subtask of sentiment analysis (SA). In this paper, we construct an auxiliary sentence from the aspect and convert ABSA to a sentence-pair classification task, such as question answering (QA) and natural language inference (NLI). We fine-tune the pre-trained model from BERT and achieve new state-of-the-art results on SentiHood and SemEval-2014 Task 4 datasets. The source codes are available at https://github.com/HSLCY/ABSA-BERT-pair.
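    A hedged sketch of the auxiliary-sentence construction: the (target, aspect) pair is rewritten as a question and paired with the review, turning ABSA into BERT sentence-pair classification. The template, label count, and sentence ordering are illustrative, and the generic bert-base-uncased checkpoint below is untrained for this task, so its outputs are meaningless until fine-tuned on SentiHood or SemEval-2014 Task 4.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    review = "LOCATION1 is in central London, so it is extremely expensive."
    target, aspect = "LOCATION1", "price"
    auxiliary = f"what do you think of the {aspect} of {target} ?"   # QA-style question

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=3          # e.g. positive / negative / none
    )

    # Sentence-pair input: [CLS] review [SEP] auxiliary sentence [SEP]
    inputs = tokenizer(review, auxiliary, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    print(logits.shape)                            # torch.Size([1, 3])
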
  9. Recurrent Neural Network for Text Classification with Multi-Task Learning, IJCAI, 2016. [BibTeX]
    Pengfei Liu, Xipeng Qiu, Xuanjing Huang.
    BibTeX:
    @inproceedings{liu2016recurrent,
      author = {Pengfei Liu and Xipeng Qiu and Xuanjing Huang},
      title = {Recurrent Neural Network for Text Classification with Multi-Task Learning},
      booktitle = {Proceedings of International Joint Conference on Artificial Intelligence},
      year = {2016},
      pages = {2873--2879}, 
      url = {https://arxiv.org/abs/1605.05101}
    }
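
    A hedged sketch of a shared-encoder flavour of this multi-task setup: one LSTM encodes text for every task while each task keeps its own classifier head. This is a simplified variant, not necessarily one of the paper's exact architectures, and all dimensions and the two-task configuration are toy values.
    import torch
    import torch.nn as nn

    class SharedLSTMMultiTask(nn.Module):
        def __init__(self, vocab_size=10000, emb_dim=100, hidden=128,
                     task_num_classes=(2, 5)):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)
            self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)   # shared encoder
            self.heads = nn.ModuleList(                              # per-task heads
                [nn.Linear(hidden, c) for c in task_num_classes]
            )

        def forward(self, token_ids, task_id):
            emb = self.embed(token_ids)
            _, (h_n, _) = self.lstm(emb)          # final hidden state as text code
            return self.heads[task_id](h_n[-1])   # task-specific logits

    model = SharedLSTMMultiTask()
    batch = torch.randint(0, 10000, (4, 20))      # 4 sentences, 20 tokens each
    print(model(batch, task_id=0).shape)          # torch.Size([4, 2])
    print(model(batch, task_id=1).shape)          # torch.Size([4, 5])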