About Me

I’m a CS graduate student at Yale University, advised by Prof. Dragomir Radev. I’m a member of the LILY (Yale NLP) lab. I also work in Prof. Mark Gerstein’s lab.

I work in the area of machine learning and natural language processing, and I’m also interested in bioinformatics. My research focuses on computational sequence modeling, spanning natural language, programming code, and DNA sequences. Recent research projects include (1) faithful text generation; (2) long dialogue understanding and summarization; (3) deciphering DNA with pre-trained language models.

[Google Scholar]

Publications and Manuscripts

Multi-modal Self-supervised Pre-training for Regulatory Genome Across Cell Types
Shentong Mo, Xi Fu, Chenyang Hong, Yizhen Chen, Yuxuan Zheng, Xiangru Tang, Zhiqiang Shen, Eric P Xing, Yanyan Lan
NeurIPS 2021 Workshop AI4Science [paper]

Investigating Crowdsourcing Protocols for Evaluating the Factual Consistency of Summaries
Xiangru Tang, Alexander R. Fabbri, Ziming Mao, Griffin Adams, Borui Wang, Haoran Li, Yashar Mehdad, Dragomir Radev
arXiv: 2109.09195 preprint, 2021 [paper]

FeTaQA: Free-form Table Question Answering
Linyong Nan, Chiachun Hsieh, Ziming Mao, Xi Victoria Lin, Neha Verma, Rui Zhang, Wojciech Kryściński, Nick Schoelkopf, Riley Kong, Xiangru Tang, Murori Mutuma, Ben Rosand, Isabel Trindade, Renusree Bandaru, Jacob Cunningham, Caiming Xiong, Dragomir Radev
Transactions of the Association for Computational Linguistics (TACL) 2021 [paper] [code] [data]

DART: Open-Domain Structured Data Record to Text Generation
Linyong Nan, Dragomir Radev, Rui Zhang, Amrit Rau, Abhinand Sivaprasad, Chiachun Hsieh, Xiangru Tang, Aadit Vyas, Neha Verma, Pranav Krishna, Yangxiaokang Liu, Nadia Irwanto, Jessica Pan, Faiaz Rahman, Ahmad Zaidi, Mutethia Mutuma, Yasin Tarabar, Ankit Gupta, Tao Yu, Yi Chern Tan, Xi Victoria Lin, Caiming Xiong, Richard Socher, Nazneen Fatema Rajani
NAACL 2021 [paper] [code] [data]

FILM: A Fast, Interpretable, and Low-rank Metric Learning Approach for Sentence Matching
Xiangru Tang, Alan Aw
arXiv: 2010.05523 preprint, 2020 [paper]

CUHK at SemEval-2020 Task 4: Commonsense Explanation, Reasoning and Prediction with Multi-task Learning
Hongru Wang, Xiangru Tang, Sunny Lai, Kwong Sak Leung, Jia Zhu, Gabriel Pui Cheong Fung, Kam-Fai Wong
SemEval 2020 [paper]

Improving Code Generation From Descriptive Text By Combining Deep Learning and Syntax Rules
Xiangru Tang, Zhihao Wang, Jiyang Qi, Zengyang Li
International Conference on Software Engineering and Knowledge Engineering 2019 [paper]

Work Experience

I was a research intern at Langboat Tech, working on pre-trained models for natural language generation in business scenarios.

I worked at Dr. Kai-Fu Lee’s Sinovation Ventures, where we teamed up 11 award-winning authors with my model to write science fiction, support creativity, and empower humans. The project was reported by People’s Daily, Yahoo News, and others.

Previously, I did research on efficient pretraining with the Machine Learning group at Microsoft Research Asia, advised by Guolin Ke, and on retrieval-augmented generation at Tencent AI Lab, advised by Yan Wang.

I was a master’s student at the Chinese Academy of Sciences, advised by Yanyan Lan and Jiafeng Guo. My project on detecting COVID-19-related fake news on social media received an award from our government and was deployed in practice to help curb the pandemic. My team won first place among 1,070 competitors, and our contribution was reported by news.cn, Sina, Tencent, and others.

I used to be involved with the hackathon community. I was a member of Unique-Studio, where we founded Hackday, the earliest and largest collegiate hackathon in China, reported by Saikr, Sohu, and others. I was also a mentor at the world’s largest collegiate hackathon, CalHacks at UC Berkeley.

I actively encourage international students to get involved in entrepreneurship and, as its president, founded the Entrepreneurs Association of Yale International Students. We hosted the Technology Innovation Summit and invited more than 200 founders and investors; the event was reported by Phoenix New Media, NetEase News, Sohu, and Tencent. We launched the first peer-focused start-up community of Yale entrepreneurs connecting undergrads, grads, and alumni, and increased active membership by 1200%. I believe innovation is the key to entrepreneurship, while leadership affects the effectiveness and accuracy of innovation.

I was an undergraduate student at UC Berkeley (Berkeley International Study Program), where I worked with several PhD students.

If you’re interested in my research, please feel free to contact me: xiangru.tang@yale.edu.


  • I got into the Tsai CITY Launch Pad with my startup, PreMind Health. The Tsai Center for Innovative Thinking at Yale (Tsai CITY), funded by a donation from Joseph Chongxin Tsai, aims to cultivate innovators, leaders, pioneers, creators, and entrepreneurs in all fields and for all sectors of society.

  • I won first place in the final roadshow of the 6th National Youth AI Innovation and Entrepreneurship Conference, organized by the Chinese Association for Artificial Intelligence. The project provides AI functionality as a service in enterprise contexts.

  • Our team won first place at DeeCamp 2020, where we attended lectures by 12 world-renowned AI experts.

  • I won first place in the 2020 Artificial Intelligence Application Innovation Competition (HUAWEI CLOUD). The project is about AI-based enterprise office services.

  • National Scholarship, Ministry of Education.

  • MOE-Google Collaboration Fellowship, Ministry of Education and Google LLC.

Research Interests

My research focuses on developing machine learning methods in the healthcare domain. I have been working on text generation, where a model generates realistic-looking text (e.g. article writing, data-to-text, summarization, and chatbots). In my work, with robust and efficient pre-trained language models, generated texts can be controlled to match user-defined styles, task-specific behavior, and other attributes.

Language generation models can also be applied to synthetic protein design, where they can generate sequences under fine-grained control, guided by metrics based on primary sequence similarity, secondary structure accuracy, and conformational energy.

I would be very happy to contribute to this area and to help find answers to the following questions:

  • How can AI bring a high level of precision to the complicated and time-consuming discovery phase of drug development, leading to faster development timelines and a lower risk of failure down the road?

  • AI also gives researchers the power to analyze disparate datasets. For example, it can combine vast libraries of chemical compounds, biomedical data from the literature, and patient health data into knowledge graphs. Such a data model creates new connections and insights across previously unrelated information.

  • How can we analyze disparate datasets from the literature, patient health data, and knowledge graphs? And how can we use them to make predictions, model novel pathways and disease states, and test findings?

Professional Service

Program committee or reviewer: ICLR 2022, EMNLP 2021, ICML 2021, ACL 2021, AAAI 2020, Challenge-HML@ACL 2020, ALVR 2021


I am also an amateur Go player and accordionist, and I started learning both when I was 5. I am actively looking for potential collaborations, whether hobby-wise or research-wise. Feel free to contact me about potential projects or just for fun!

“There is no passion to be found playing small - in settling for a life that is less than the one you are capable of living.” – Nelson Mandela.