I'm a third-year Ph.D. candidate in Computer Science at Yale University, where I am very fortunate to be advised by Prof. Mark Gerstein. I work closely with Prof. Arman Cohan as my “advisor of record”. Previously, I earned my master's degree from Yale CS as well, advised by Prof. Dragomir Radev. I have been a graduate affiliate at Grace Hopper College since 2021.

My research lies at the intersection of large language models and applications in bioinformatics. I aim to build AI agents that automate biomedical research, e.g.,

  • Reasoning and Coding: AI scientists capable of verifiable reasoning can autonomously design, plan, and perform experiments by code execution [ICLR 25, ACL 24 Findings, Bioinfo. 24, NAACL 24, ACL 23, EMNLP 23, SEKE 19].
  • LLM Agents and Tool Use: AI scientists could integrate AI models and specialized tools with experimental platforms [ICLR 24, ICLR 24 LLM Agents WS].
  • Drug Design: AI scientists can impact areas ranging from molecule modeling, protein folding, and virtual cell simulation to developing new therapies [Nat. Biotech. 24, Brief. in Bioinfo. 24, Bioinfo. 24].
I am looking for graduate and undergraduate students to collaborate with, and I actively engage in mentorship. Feel free to email me if you are starting out in the field, have questions about PhD admissions, etc. I especially encourage students from underrepresented groups to reach out.

My research is supported by Schmidt Futures.


Google Scholar | Semantic Scholar

Recent Preprints

  • [4] MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning
    Xiangru Tang, Daniel Shao, Jiwoong Sohn, Jiapeng Chen, Jiayi Zhang, Jinyu Xiang, Fang Wu, Yilun Zhao, Chenglin Wu, Wenqi Shi, Arman Cohan, Mark Gerstein.
    arXiv preprint arXiv:2503.07459
    "Thinking models (DeepSeek R1 and OpenAI o3) show exceptional performance on medical QA tasks."
    [PDF] [Abstract] [Bib]
    MedagentsBench
  • [3] BC-Design: A Biochemistry-Aware Framework for High-Precision Inverse Protein Folding
    Xiangru Tang*, Xinwu Ye*, Fang Wu*, Yanjun Shao, Yin Fang, Siming Chen, Dong Xu, Mark Gerstein.
    bioRxiv 2024
    "A quantum leap in inverse protein folding from 67% to 88%!"
    [PDF] [Abstract] [Bib]
    BC-Design
  • [2] LocAgent: Graph-Guided LLM Agents for Code Localization
    Zhaoling Chen*, Xiangru Tang*, Gangda Deng*, Fang Wu, Jialong Wu, Zhiwei Jiang, Viktor Prasanna, Arman Cohan, Xingyao Wang.
    arXiv preprint arXiv:2503.09089
    "No need to embed the entire repo, agent + graph-based indexing is all you need!"
    [PDF] [Abstract] [Bib]
    LocAgent
  • [1] D-Flow: Multi-modality Flow Matching for D-peptide Design
    Fang Wu, Tinson Xu, Shuting Jin, Xiangru Tang, Zerui Xu, James Zou, Brian Hie.
    bioRxiv 2024
    [PDF] [Abstract] [Bib]
    PeptideDesign

Selected Publications

  • [16] ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code
    Xiangru Tang*, Yuliang Liu*, Zefan Cai*, Junjie Lu, Yichi Zhang, Yanjun Shao, Zexuan Deng, Helan Hu, Kaikai An, Ruijun Huang, Shuzheng Si, Sheng Chen, Haozhe Zhao, Liang Chen, Yan Wang, Tianyu Liu, Zhiwei Jiang, Baobao Chang, Yujia Qin, Wangchunshu Zhou, Yilun Zhao, Arman Cohan, Mark Gerstein.
    ICLR 2025 Deep Learning for Code
    ICLR 2025 Agentic AI for Scientific Discovery
    "Can LLMs do machine learning tasks?"
    [PDF] [Abstract] [Bib]
    ML-Bench
  • [15] Risks of AI Scientists: Prioritizing Safeguarding Over Autonomy
    Xiangru Tang, Qiao Jin, Kunlun Zhu, Tongxin Yuan, Yichi Zhang, Wangchunshu Zhou, Meng Qu, Yilun Zhao, Jian Tang, Zhuosheng Zhang, Arman Cohan, Zhiyong Lu, Mark Gerstein.
    Nature Communications, 2025 (IF 14.7)
    ICLR 2024 Workshop on LLM Agents
    [PDF] [Abstract] [Bib]
  • [14] ChemAgent: Self-updating Memories in Large Language Models Improves Chemical Reasoning
    Xiangru Tang*, Tianyu Hu*, Muyang Ye*, Yanjun Shao*, Xunjian Yin, Siru Ouyang, Wangchunshu Zhou, Pan Lu, Zhuosheng Zhang, Yilun Zhao, Arman Cohan, Mark Gerstein.
    ICLR 2025
    "Enable LLMs to continuously improve through experience."
    [PDF] [Abstract] [Bib]
    ChemAgent
  • [13] Fast, Sensitive Detection of Protein Homologs Using Deep Dense Retrieval
    Liang Hong*, Zhihang Hu*, Siqi Sun*, Xiangru Tang*, Jiuming Wang, Qingxiong Tan, Liangzhen Zheng, Sheng Wang, Sheng Xu, Irwin King, Mark Gerstein, Yu Li.
    Nature Biotechnology, 2024 (IF 33.1)
    "Up to 28,700 times faster than HMMER!"
    [PDF] [Abstract] [Bib]
    DPR
  • [12] MIMIR: A Customizable Agent Tuning Platform for Enhanced Scientific Applications
    Xiangru Tang*, Chunyuan Deng*, Hanmin Wang*, Haoran Wang*, Yilun Zhao, Wenqi Shi, Yi Fung, Wangchunshu Zhou, Jiannan Cao, Heng Ji, Arman Cohan, Mark Gerstein.
    EMNLP 2024
    [PDF] [Abstract] [Bib]
    MIMIR
  • [11] Step-Back Profiling: Distilling User History for Personalized Scientific Writing
    Xiangru Tang, Xingyao Zhang, Yanjun Shao, Jie Wu, Yilun Zhao, Arman Cohan, Ming Gong, Dongmei Zhang, Mark Gerstein.
    IJCAI 2024 Workshop on AI4Research (Best Paper Award)
    [PDF] [Abstract] [Bib]
    Step-Back Profiling
  • [10] A Survey of Generative AI for De Novo Drug Design: New Frontiers in Molecule and Protein Generation
    Xiangru Tang*, Howard Dai*, Elizabeth Knight*, Fang Wu, Yunyang Li, Tianxiao Li, Mark Gerstein.
    Briefings in Bioinformatics, 2024 (IF 13.99, JCR "Q1")
    "An introductory overview with a clear breakdown of datasets, benchmarks, & models."
    [PDF] [Abstract] [Bib]
    GenAI4Drug
  • [9] MolLM: A Unified Language Model for Integrating Biomedical Text with 2D and 3D Molecular Representations
    Xiangru Tang, Andrew Tran, Jeffrey Tan, Mark Gerstein.
    ISMB 2024, Proceedings in Bioinformatics (IF 6.93, JCR "Q1")
    [PDF] [Abstract] [Bib]
    MolLM
  • [8] BioCoder: A Benchmark for Bioinformatics Code Generation with Large Language Models
    Xiangru Tang, Bill Qian, Rick Gao, Jiakang Chen, Xinyun Chen, Mark Gerstein.
    ISMB 2024, Proceedings in Bioinformatics (IF 6.93, JCR "Q1")
    "BioCoder input covers repository-level potential package dependencies, class declarations, & global variables."
    [PDF] [Abstract] [Bib]
    BioCoder
  • [7] MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning
    Xiangru Tang*, Anni Zou*, Zhuosheng Zhang, Yilun Zhao, Xingyao Zhang, Arman Cohan, Mark Gerstein.
    ACL 2024 Findings
    "The first multi-agent framework within the medical context!"
    [PDF] [Abstract] [Bib]
    MedAgents
  • [6] Struc-Bench: Are Large Language Models Good at Generating Complex Structured Tabular Data?
    Xiangru Tang, Yiming Zong, Jason Phang, Yilun Zhao, Wangchunshu Zhou, Arman Cohan, Mark Gerstein.
    NAACL 2024 (Oral)
    [PDF] [Abstract] [Bib]
    Struc-Bench
  • [5] Meta-CoT: Generalizable Chain-of-Thought Prompting in Mixed-task Scenarios with Large Language Models
    Anni Zou, Zhuosheng Zhang, Hai Zhao, Xiangru Tang.
    IEEE Transactions on Audio, Speech and Language Processing (In Review)
    "Bridge the gap between performance and generalization when using CoT prompting!"
    [PDF] [Abstract] [Bib]
    Meta-CoT
  • [4] Aligning Factual Consistency for Clinical Studies Summarization through Reinforcement Learning
    Xiangru Tang, Arman Cohan, Mark Gerstein.
    ACL 2023 Clinical Natural Language Processing
    [PDF] [Abstract] [Bib]
  • [3] GersteinLab at MEDIQA-Chat 2023: Clinical Note Summarization from Doctor-Patient Conversations through Fine-tuning and In-context Learning
    Xiangru Tang, Andrew Tran, Jeffrey Tan, Mark Gerstein.
    ACL 2023 Clinical Natural Language Processing
    [PDF] [Abstract] [Bib]
    MEDIQA
  • [2] CONFIT: Toward Faithful Dialogue Summarization with Linguistically-Informed Contrastive Fine-tuning
    Xiangru Tang, Arjun Nair, Borui Wang, Bingyao Wang, Jai Desai, Aaron Wade, Haoran Li, Asli Celikyilmaz, Yashar Mehdad, Dragomir Radev.
    NAACL 2022 (Oral)
    [PDF] [Abstract] [Bib]
  • [1] Investigating Crowdsourcing Protocols for Evaluating the Factual Consistency of Summaries
    Xiangru Tang, Alexander Fabbri, Haoran Li, Ziming Mao, Griffin Adams, Borui Wang, Asli Celikyilmaz, Yashar Mehdad, Dragomir Radev.
    NAACL 2022
    [PDF] [Abstract] [Bib]

Other Publications

  • [17] OpenHands: An Open Platform for AI Software Developers as Generalist Agents
    Xingyao Wang, Boxuan Li, Yufan Song, Frank F. Xu, Xiangru Tang, Mingchen Zhuge, Jiayi Pan, Yueqi Song, Bowen Li, Jaskirat Singh, Hoang H. Tran, Fuqiang Li, Ren Ma, Mingzhang Zheng, Bill Qian, Yanjun Shao, Niklas Muennighoff, Yizhe Zhang, Binyuan Hui, Junyang Lin, Robert Brennan, Hao Peng, Heng Ji, Graham Neubig.
    ICLR 2025
    "AI agents function as software developers, capable of command execution, web browsing & API interaction."
    [PDF] [Abstract] [Bib]
    OpenHands
  • [16] MMVU: Measuring Expert-Level Multi-Discipline Video Understanding
    Yilun Zhao, Lujing Xie, Haowei Zhang, Guo Gan, Yitao Long, Zhiyuan Hu, Tongyan Hu, Weiyuan Chen, Chuhan Li, Junyang Song, Zhijian Xu, Chengye Wang, Weifeng Pan, Ziyao Shangguan, Xiangru Tang, Zhenwen Liang, Yixin Liu, Chen Zhao, Arman Cohan.
    CVPR 2025
    [PDF] [Abstract] [Bib]
    MMVU
  • [15] Igniting Language Intelligence: The Hitchhiker's Guide From Chain-of-Thought Reasoning to Language Agents
    Zhuosheng Zhang, Yao Yao, Aston Zhang, Xiangru Tang, Xinbei Ma, Zhiwei He, Yiming Wang, Mark Gerstein, Rui Wang, Gongshen Liu, Hai Zhao.
    ACM Computing Surveys, 2024 (IF 23.8)
    "Generalization, efficiency, customization, scaling, and safety related to CoT and agents."
    [PDF] [Abstract] [Bib]
    CoT-Igniting-Agent
  • [14] Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-Specificity
    Cunxiang Wang*, Xiaoze Liu*, Yuanhao Yue*, Xiangru Tang, Tianhang Zhang, Cheng Jiayang, Yunzhi Yao, Wenyang Gao, Xuming Hu, Zehan Qi, Yidong Wang, Linyi Yang, Jindong Wang, Xing Xie, Zheng Zhang, Yue Zhang.
    ACM Computing Surveys, 2024 (IF 23.8)
    [PDF] [Abstract] [Bib]
    LLM-Factuality-Survey
  • [13] OctoPack: Instruction Tuning Code Large Language Models
    Niklas Muennighoff, Qian Liu, Armel Randy Zebaze, Qinkai Zheng, Binyuan Hui, Terry Yue Zhuo, Swayam Singh, Xiangru Tang, Leandro Von Werra, Shayne Longpre.
    ICLR 2024
    [PDF] [Abstract] [Bib]
    octopack
  • [12] ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs
    Yujia Qin, Shihao Liang, Yining Ye, Kunlun Zhu, Lan Yan, Yaxi Lu, Yankai Lin, Xin Cong, Xiangru Tang, Bill Qian, Sihan Zhao, Lauren Hong, Runchu Tian, Ruobing Xie, Jie Zhou, Mark Gerstein, Dahai Li, Zhiyuan Liu, Maosong Sun.
    ICLR 2024
    [PDF] [Abstract] [Bib]
    ToolBench
  • [11] DocMath-Eval: Evaluating Math Reasoning Capabilities of LLMs in Understanding Long and Specialized Documents
    Yilun Zhao, Yitao Long, Hongjun Liu, Ryo Kamoi, Linyong Nan, Lyuhao Chen, Yixin Liu, Xiangru Tang, Rui Zhang, Arman Cohan.
    ACL 2024
    [PDF] [Abstract] [Bib]
    DocMath-Eval
  • [10] Unveiling the Spectrum of Data Contamination in Language Model: A Survey from Detection to Remediation
    Chunyuan Deng, Yilun Zhao, Yuzhao Heng, Yitong Li, Jiannan Cao, Xiangru Tang, Arman Cohan.
    ACL 2024 Findings
    [PDF] [Abstract] [Bib]
    Contamination-Survey
  • [9] FinDVer: Explainable Claim Verification over Long and Hybrid-content Financial Documents
    Yilun Zhao, Yitao Long, Tintin Jiang, Chengye Wang, Weiyuan Chen, Hongjun Liu, Xiangru Tang, Yiming Zhang, Chen Zhao, Arman Cohan.
    EMNLP 2024
    [PDF] [Abstract] [Bib]
  • [8] PRESTO: Progressive Pretraining Enhances Synthetic Chemistry Outcomes
    He Cao, Yanjun Shao, Zhiyuan Liu, Zijing Liu, Xiangru Tang, Yuan Yao, Yu Li.
    EMNLP 2024 Findings
    [PDF] [Abstract] [Bib]
    PRESTO
  • [7] Investigating Data Contamination in Modern Benchmarks for Large Language Models
    Chunyuan Deng, Yilun Zhao, Xiangru Tang, Mark Gerstein, Arman Cohan.
    NAACL 2024
    [PDF] [Abstract] [Bib]
  • [6] Data preparation for Deep Learning based Code Smell Detection: A systematic literature review
    Fengji Zhang, Zexian Zhang, Jacky Wai Keung, Xiangru Tang, Zhen Yang, Xiao Yu, Wenhua Hu.
    Journal of Systems and Software, 2024 (IF 3.7, JCR "Q1")
    [PDF] [Abstract] [Bib]
  • [5] FAVOR-GPT: a generative natural language interface to whole genome variant functional annotations
    Thomas Cheng Li, Hufeng Zhou, Vineet Verma, Xiangru Tang, Yanjun Shao, Eric Van Buren, Zhiping Weng, Mark Gerstein, Benjamin Neale, Shamil R Sunyaev, Xihong Lin.
    Bioinformatics Advances, 2024 (IF 2.32, JCR "Q2")
    [PDF] [Abstract] [Bib]
  • [4] RobuT: A Systematic Study of Table QA Robustness Against Human-Annotated Adversarial Perturbations
    Yilun Zhao, Chen Zhao, Linyong Nan, Zhenting Qi, Wenlin Zhang, Xiangru Tang, Boyu Mi, Dragomir Radev.
    ACL 2023
    [PDF] [Abstract] [Bib]
  • [3] QTSumm: Query-Focused Summarization over Tabular Data
    Yilun Zhao, Zhenting Qi, Linyong Nan, Boyu Mi, Yixin Liu, Weijin Zou, Simeng Han, Ruizhe Chen, Xiangru Tang, Yumo Xu, Dragomir Radev, Arman Cohan.
    EMNLP 2023
    [PDF] [Abstract] [Bib]
  • [2] Investigating Table-to-Text Generation Capabilities of Large Language Models in Real-World Information Seeking Scenarios
    Yilun Zhao, Haowei Zhang, Shengyun Si, Linyong Nan, Xiangru Tang, Arman Cohan.
    EMNLP 2023
    [PDF] [Abstract] [Bib]
  • [1] RWKV: Reinventing RNNs for the Transformer Era
    Bo Peng, Eric Alcaide, Quentin Anthony, Alon Albalak, Samuel Arcadinho, Stella Biderman, Huanqi Cao, Xin Cheng, Michael Chung, Leon Derczynski, Xingjian Du, Matteo Grella, Kranthi Gv, Xuzheng He, Haowen Hou, Przemyslaw Kazienko, Jan Kocon, Jiaming Kong, Bartłomiej Koptyra, Hayden Lau, Jiaju Lin, Krishna Sri Ipsit Mantri, Ferdinand Mom, Atsushi Saito, Guangyu Song, Xiangru Tang, Johan Wind, Stanisław Woźniak, Zhenyuan Zhang, Qinghua Zhou, Jian Zhu, Rui-Jie Zhu.
    EMNLP 2023
    [PDF] [Abstract] [Bib]

Services

Area Chair: ACL ARR (ACL, EMNLP, NAACL, etc.).
Conference Program Committee / Reviewer: NeurIPS, ICML, ACL, EMNLP, CIKM, NAACL, INLG, IEEE BigData, COLM.
Journal Reviewer: npj Digital Medicine, TPAMI, Neurocomputing, Briefings in Bioinformatics, PLOS Computational Biology, BMC Bioinformatics, PLOS ONE, Health Data Science.
Workshop Reviewer: KDD 2023 Workshop on Data Mining in Bioinformatics, ACL 2023 Workshop on Building Educational Apps, ACL 2023 Workshop on Clinical NLP, ICML 2023 Workshop on Neural Conv AI, ICML 2023 Workshop on Interpretable ML in Healthcare, NAACL-HLT 2021 Workshop on Language and Vision Research.

Teaching

Teaching Fellow - CPSC 452/CPSC 552/AMTH 552/CB&B 663 Deep Learning Theory and Applications, Yale University, 2023 Spring.
Teaching Fellow - CPSC 437/CPSC 537 Introduction to Database Systems, Yale University, 2023 Fall.
Teaching Fellow - CPSC 452/CPSC 552/AMTH 552/CB&B 663 Deep Learning Theory and Applications, Yale University, 2024 Spring.
Teaching Fellow - CPSC 437/CPSC 537 Database Systems, Yale University, 2024 Fall.

Misc.

I took 12 courses (and 3 additional project credits) at Yale: CPSC 523 Principles of Operating Systems, 537 Intro to Database, 539 Software Engineering, 552 Deep Learning Theory, 553 Unsupervised Learning, 569 Randomized Algorithms, 577 NLP, 583 Deep Learning on Graph, 668 Blockchain Research, 677 Adv NLP, 680 Trustworthy Deep Learning, 752 Biomedical Data Sci.
Interestingly, this course load meets the entire requirement for a Yale undergraduate B.S. degree in Computer Science (which requires 11 courses + 1 project credit) and exceeds what's needed for a B.A. (which requires only 9 courses + 1 project credit).