I am a graduate student in the Computer Science Department at Yale University, advised by Prof. Mark Gerstein and Prof. Dragomir Radev. I am affiliated with the Gerstein Lab (Yale Computational Biology and Bioinformatics) and the LILY Lab (Yale NLP).
I work in the field of Artificial Intelligence, with applications in natural language processing, healthcare, and biomedical research. I am passionate about building novel models that extract new knowledge from scientific data, which is generated at extremely high throughput and remains inaccessible to existing methods.
Recent research projects include (1) faithful text generation; (2) dialogue understanding and summarization; (3) deciphering genomics/epigenomics/transcriptomics with pre-trained language models.
Publications and Manuscripts
EHRKit: A Python Natural Language Processing Toolkit for Electronic Health Record Texts
Irene Li, Keen You, Xiangru Tang, Yujie Qiao, Lucas Huang, Chia-Chun Hsieh, Benjamin Rosand, Dragomir Radev
PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts
ACL 2022 demo track [paper]
CONFIT: Toward Faithful Dialogue Summarization with Linguistically-Informed Contrastive Fine-tuning
Xiangru Tang, Arjun Nair, Borui Wang, Bingyao Wang, Jai Desai, Aaron Wade, Haoran Li, Asli Celikyilmaz, Yashar Mehdad, Dragomir Radev
NAACL 2022 [paper]
Surfer100: Generating Surveys From Web Resources, Wikipedia-style
Irene Li, Alexander Fabbri, Rina Kawamura, Yixin Liu, Xiangru Tang, Jaesung Tae, Chang Shen, Sally Ma, Tomoe Mizutani, Dragomir Radev
LREC 2022 [paper]
CLICKER: A Computational LInguistics Classification Scheme for Educational Resources
Swapnil Hingmire, Irene Li, Rena Kawamura, Benjamin Chen, Alexander Fabbri, Xiangru Tang, Yixin Liu, Thomas George, Tammy Liao, Wai Pan Wong, Vanessa Yan, Richard Zhou, Girish K. Palshikar, Dragomir Radev
arXiv preprint [paper]
Multi-modal Self-supervised Pre-training for Regulatory Genome Across Cell Types
Shentong Mo, Xi Fu, Chenyang Hong, Yizhen Chen, Yuxuan Zheng, Xiangru Tang, Zhiqiang Shen, Eric P Xing, Yanyan Lan
NeurIPS 2021 Workshop AI4Science [paper]
Investigating Crowdsourcing Protocols for Evaluating the Factual Consistency of Summaries
Xiangru Tang, Alexander R. Fabbri, Ziming Mao, Griffin Adams, Borui Wang, Haoran Li, Yashar Mehdad, Dragomir Radev
NAACL 2022 [paper]
Improving RNA Secondary Structure Design using Deep Reinforcement Learning
Alexander Whatley, Zhekun Luo, Xiangru Tang
arXiv: 2111.04504 preprint, 2021 [paper]
FeTaQA: Free-form Table Question Answering
Linyong Nan, Chiachun Hsieh, Ziming Mao, Xi Victoria Lin, Neha Verma, Rui Zhang, Wojciech Kryściński, Nick Schoelkopf, Riley Kong, Xiangru Tang, Murori Mutuma, Ben Rosand, Isabel Trindade, Renusree Bandaru, Jacob Cunningham, Caiming Xiong, Dragomir Radev
TACL 2021 [paper] [code] [data]
DART: Open-Domain Structured Data Record to Text Generation
Linyong Nan, Dragomir Radev, Rui Zhang, Amrit Rau, Abhinand Sivaprasad, Chiachun Hsieh, Xiangru Tang, Aadit Vyas, Neha Verma, Pranav Krishna, Yangxiaokang Liu, Nadia Irwanto, Jessica Pan, Faiaz Rahman, Ahmad Zaidi, Mutethia Mutuma, Yasin Tarabar, Ankit Gupta, Tao Yu, Yi Chern Tan, Xi Victoria Lin, Caiming Xiong, Richard Socher, Nazneen Fatema Rajani
NAACL 2021 [paper] [code] [data]
FILM: A Fast, Interpretable, and Low-rank Metric Learning Approach for Sentence Matching
Xiangru Tang, Alan Aw
arXiv: 2010.05523 preprint, 2020 [paper]
CUHK at SemEval-2020 Task 4: Commonsense Explanation, Reasoning and Prediction with Multi-task Learning
Hongru Wang, Xiangru Tang, Sunny Lai, Kwong Sak Leung, Jia Zhu, Gabriel Pui Cheong Fung, Kam-Fai Wong
SemEval 2020 [paper]
I am developing novel models for dialogue summarization as part of a collaboration between Facebook (Meta) AI & Yale.
I was the first research intern at Langboat Tech, working on pre-trained models for natural language generation in business scenarios.
I worked at Dr. Kai-Fu Lee’s Sinovation Ventures, where we paired 11 award-winning authors with my model to write science fiction, with the aim of supporting creativity and empowering humans. The project was reported by People’s Daily, Yahoo News, and others.
Previously, I did research on efficient pretraining with the Machine Learning group at Microsoft Research Asia, advised by Guolin Ke; and retrieval-augmented generation at Tencent AI Lab, advised by Yan Wang.
I was a master’s student at the Chinese Academy of Sciences, advised by Yanyan Lan and Jiafeng Guo. My project on detecting COVID-19-related fake news on social media received a government award and was deployed in practice to help curb the pandemic. My team won first place among 1,070 competitors, and our contribution was reported by news.cn, Sina, Tencent, and others.
I used to be involved with the hackathon community. I was a member of Unique-Studio, where we founded Hackday, the earliest and largest collegiate hackathon in China, reported by Saikr, Sohu, and others. I was also a mentor at CalHacks at UC Berkeley, the world’s largest collegiate hackathon. I actively encourage students from nontraditional backgrounds and historically underrepresented communities to get involved in research.
At Yale, I am the Committee Chair of the Graduate Student Assembly and founded the Entrepreneurs Association of Yale International Students. We launched the first peer-focused start-up community of Yale entrepreneurs, connecting undergrads, grads, and alumni, and increased active membership by 1200%.
I have been honored on the AACYF Top 30 Under 30 list.
My startup was accepted into Tsai CITY again. We are creating a Web 3.0 healthcare ecosystem powered by blockchain and our unique cryptocurrency, the $ATLRU token.
I won the Yale Annual Healthcare Hackathon.
I got into the Tsai CITY LaunchPad with my startup, PreMind Health. We provide streamlined access to multilingual, culturally sensitive mental health support with NLP.
I won first place in the final roadshow of the 6th National Youth AI Innovation and Entrepreneurship Conference, organized by the Chinese Association for Artificial Intelligence.
Our team won first place at DeeCamp 2020, where we attended lectures by 12 world-renowned AI experts.
I won first place in the 2020 Artificial Intelligence Application Innovation Competition, HUAWEI CLOUD.
National Scholarship, Ministry of Education.
MOE & Google Fellowship, Chinese Ministry of Education and Google China LLC.
Nov. 2021, Talk at Ivey Business School
Sep. 2020, Talk at Shenzhen Artificial Intelligence Association
It is my great happiness to contribute to AI for drug discovery and to help answer the following questions:
How can AI offer a high level of precision to the complicated and time-consuming discovery phase in drug development, which leads to faster development timelines and a lower failure risk down the road?
AI also gives researchers the power to analyze disparate datasets. For example, it can combine vast libraries of chemical compounds, biomedical data from the literature, and patient health data into knowledge graphs. This data model creates new connections and insights into previously unrelated information.
How can we analyze disparate datasets drawn from the literature, patient health data, and knowledge graphs? And how can we use them to make predictions, model novel pathways and disease states, and test findings?
I am also an amateur Go player and accordionist, and I started learning both when I was 5. I am actively looking for potential collaborations, whether hobby-wise or research-wise. Feel free to contact me about potential projects or just for some fun!
“There is no passion to be found playing small - in settling for a life that is less than the one you are capable of living.” – Nelson Mandela.