About Me

Hi! I am a Ph.D. student at the Gaoling School of Artificial Intelligence, Renmin University of China, where I am advised by Prof. Yankai Lin. Before that, I received my Master’s degree from the Center for Data Science at Peking University, where I was a member of the LANCO group and was advised by Prof. Xu Sun. I am interested in Machine Learning (ML) and Natural Language Processing (NLP). Specifically, I work on the alignment and security problems of Large Language Models.

Education

Internship

  • Research Intern at WeChat AI, Tencent Inc., Jan. 2021 - Present.

Preprints

(# denotes Equal Contribution)

  • Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization
    Wenkai Yang, Shiqi Shen, Guangyao Shen, Zhi Gong, Yankai Lin
    [arxiv, code]

  • Exploring Backdoor Vulnerabilities of Chat Models
    Yunzhuo Hao#, Wenkai Yang#, Yankai Lin
    [arxiv, code]

  • Enabling Large Language Models to Learn from Rules
    Wenkai Yang, Yankai Lin, Jie Zhou, Jirong Wen
    [arxiv]

Selected Publications

Full List

(# denotes Equal Contribution)

  • Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents
    Wenkai Yang#, Xiaohan Bi#, Yankai Lin, Sishuo Chen, Jie Zhou, Xu Sun
    NeurIPS 2024 [arxiv, code]

  • Decentralized Decoupled Training for Federated Long-Tailed Learning
    Wenkai Yang, Deli Chen, Hao Zhou, Fandong Meng, Jie Zhou, Xu Sun
    Transactions on Machine Learning Research [url, arxiv, code]

  • Towards Codable Text Watermarking for Large Language Models
    Lean Wang#, Wenkai Yang#, Deli Chen#, Hao Zhou, Yankai Lin, Fandong Meng, Jie Zhou, Xu Sun
    ICLR 2024 [url, arxiv, code]

  • When to Trust Aggregated Gradients: Addressing Negative Client Sampling in Federated Learning
    Wenkai Yang, Yankai Lin, Guangxiang Zhao, Peng Li, Jie Zhou, Xu Sun
    Transactions on Machine Learning Research [url, arxiv, code]

  • RAP: Robustness-Aware Perturbations for Defending against Backdoor Attacks on NLP Models
    Wenkai Yang, Yankai Lin, Peng Li, Jie Zhou, Xu Sun
    EMNLP 2021 [url, arxiv, code]

  • Rethinking Stealthiness of Backdoor Attack against NLP Models
    Wenkai Yang, Yankai Lin, Peng Li, Jie Zhou, Xu Sun
    ACL 2021 [url, code]

  • Be Careful about Poisoned Word Embeddings: Exploring the Vulnerability of the Embedding Layers in NLP Models
    Wenkai Yang, Lei Li, Zhiyuan Zhang, Xuancheng Ren, Xu Sun, Bin He
    NAACL-HLT 2021 [url, arxiv, code]

Awards

  • Excellent Graduate of Beijing Ordinary Colleges and Universities, 2022-2023
  • Excellent Graduate of Peking University, 2022-2023
  • National Scholarship of China (the highest scholarship for graduate students), 2020-2021
  • Pacemaker to Merit Student (the highest honor for graduate students), 2020-2021
  • Xingye Bank Scholarship, 2021-2022
  • Merit Student of PKU, 2021-2022

Contact

Email: kevenyang98 (at) gmail (dot) com