Wenyan Li

Biography

Hi! I am Wenyan. My research focuses on building and interpreting multimodal and NLP models. I am currently a Senior Researcher at Alipes Capital, where I develop automated pipelines for financial document understanding that facilitate trading decisions.

I completed my PhD with a focus on multimodal learning at the CoAStaL NLP Group, University of Copenhagen, where I was supervised by Anders Søgaard.

I was also a Senior NLP Researcher at SenseTime and Comcast AI Research Lab. Before that, I spent a wonderful time at the University of Maryland, College Park for my MS, where I worked with Prof. Jordan Boyd-Graber on Natural Language Processing.

Feel free to reach out for collaboration on related projects. MSc or PhD thesis supervision is also possible (for fintech topics, check our company collaboration roles).

In my free time, I enjoy painting, cooking, yoga, and table tennis :)

NEWS:

  • 10/2025 – I have successfully defended my PhD!
  • 08/2025 – Two papers (one main and one findings) were accepted to EMNLP 2025!
  • 07/2025 – CultureCLIP is accepted to COLM 2025!
  • I’m currently on a research visit at RycoLab in Zürich until the end of June 2025.
  • 11/2024 – Invited talk and short visit at MIT.
  • 11/2024 – Our W1KP paper won the Outstanding Paper Award at EMNLP 2024!
  • 11/2024 – I will present FoodieQA at EMNLP 2024, see you in Miami!
  • 09/2024 – FoodieQA and W1KP are accepted to EMNLP 2024 main conference!
  • 05/2024 – One paper accepted to ACL 2024 main conference!
Interests
  • Multimodal Learning
  • Natural Language Processing
  • Information Retrieval
  • Speech Technology
Education
  • Ph.D. in Computer Science, 2022-2025

    University of Copenhagen, Denmark

  • M.S. in EECS (Thesis Track), 2016-2018

    University of Maryland, College Park

Experience

Senior NLP Researcher (Artificial General Intelligence team)
Sep 2021 – Jun 2022 Shanghai, China
  • Knowledge-enhanced QA and dialogue systems
  • Multimodal and prompt learning
 
Senior Machine Learning Research Engineer (NLP)
Jan 2019 – Jul 2021 Washington D.C.
  • Designed an unsupervised auto-annotation system for voice queries with user behavioral modeling to automatically identify errors in speech recognition and NLP systems and suggest corrections
  • Built an active learning pipeline with auto-labeled user transcriptions to improve the ASR system for Comcast X1, increasing recognition accuracy by 9% (summarized the work in a conference paper as first author and filed a patent as the main inventor)
  • Developed a context-based approach that discovered misclassified user queries in question answering systems by performing semantic search with Sentence-BERT
  • Leveraged subword-level query representation and adversarial training in a customer care dialogue system to handle misspelled user queries, which improved classification accuracy by 18% and increased user experience stability
  • Mentored interns and new hires on projects relevant to multi-task learning and query representation

Certificates

Coursera
Applied Text Mining in Python
See certificate
edX
Using Python for Research
See certificate
Coursera
Object Oriented Programming in Java
See certificate

Contact