Yunfei Xie | 谢云飞

I am a fourth-year undergraduate student in the School of Artificial Intelligence and Automation at Huazhong University of Science & Technology.

I am honored to work as a research intern with Prof. Yuyin Zhou and Prof. Cihang Xie in VLAA at UC Santa Cruz, and with Dr. Jieru Mei in CCVL at Johns Hopkins University.

My research interests mainly include multimodal language models and computer vision applications such as segmentation.

Additionally, I am actively seeking Ph.D. or RA positions starting in Fall 2025.

Email  /  Google Scholar  /  CV  /  Github  /  Twitter

profile photo

Publications

[Multimodal], [Segmentation], [Generation]

[Segmentation]

From Pixels to Objects: A Hierarchical Approach for Part and Object Segmentation Using Local and Global Aggregation
Yunfei Xie, Cihang Xie, Alan Yuille, Jieru Mei
ECCV, 2024
paper
TL;DR: We developed a hierarchical superpixel-based model that simultaneously addresses two conflicting needs in segmentation: local detail for parts and global context for objects. By employing local aggregation for superpixels in part segmentation and global aggregation for group tokens in object segmentation, we achieved state-of-the-art performance, surpassing previous methods by 2.8% in part segmentation and 2.0% in object segmentation on common datasets.

[Multimodal]

A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor?
Yunfei Xie, Juncheng Wu, Haoqin Tu, Siwei Yang, Bingchen Zhao, Yongshuo Zong, Qiao Jin, Cihang Xie, Yuyin Zhou
arXiv, 2024
paper / code / data
TL;DR: We benchmarked OpenAI's o1-preview model on a comprehensive medical dataset covering 37 diverse tasks, including understanding, reasoning, multilingual capabilities, and agent interaction. Our exploration revealed that o1-preview outperforms GPT-4 in overall accuracy by approximately 6.4%, marking a significant step toward the development of an AI doctor. Additionally, we presented preliminary findings on enhancing o1's clinical reasoning and discussed the model's limitations and directions for future improvement.

[Multimodal]

MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine
Yunfei Xie, Ce Zhou, Lang Gao, Juncheng Wu, Xianhang Li, Hong-Yu Zhou, Liu Sheng, Lei Xing, James Zou, Cihang Xie, Yuyin Zhou
arXiv, 2024
paper / website
TL;DR: To scale up multimodal medical datasets, we developed an automated pipeline to generate multigranular visual and textual annotations for any given medical image. Based on this pipeline, we introduced MedTrinity-25M, a comprehensive, large-scale dataset containing over 25 million images across 10 modalities, annotated with multigranular labels for more than 65 diseases, including detailed ROIs. Our VLM achieved SOTA results in three medical VQA tasks, exceeding previous benchmarks by 8.8%, 3.7%, and 18.3%.

[Generation]

Story-Adapter: A Training-free Iterative Framework For Long Story Visualization
Jiawei Mao*, Xiaoke Huang*, Yunfei Xie, Yuanqi Chang, Mude Hui, Bingjie Xu, Yuyin Zhou
arXiv, 2024
paper / code
TL;DR: We proposed a training-free and efficient framework for generating long stories with up to 100 frames. To optimize image generation, we designed an iterative paradigm that progressively refines the process by integrating text and global constraints, achieving more precise interactions and improved semantic consistency throughout the story.

[Segmentation]

Few-shot Medical Image Segmentation via Supervoxel Transformer
Yunfei Xie, Alan Yuille, Cihang Xie, Yuyin Zhou, Jieru Mei
arXiv, 2024
paper
TL;DR: To address the complexity of 3D medical volume representations, we proposed supervoxels, which are more flexible, semantically meaningful, and well suited to 3D organ structures. Building on this representation, we introduced SVFormer, the first 3D Transformer-based few-shot framework for medical segmentation, which uses supervoxels to reduce redundancy while preserving 3D detail. We validated SVFormer on three public datasets, consistently outperforming state-of-the-art few-shot segmentation methods by 5.7%, 1.2%, and 1.3%.

[Segmentation]

Brain Tumor Segmentation Through SuperVoxel Transformer
Yunfei Xie, Ce Zhou, Jieru Mei, Xianhang Li, Cihang Xie, Yuyin Zhou
ISBI, 2024
paper
TL;DR: We developed two CNN-Transformer hybrid models as part of the BraTS-ISBI 2024 challenge to create brain tumor segmentation models with broad applicability. To enhance interpretability, we introduced a supervoxel Transformer that clusters similar voxels and uses a cross-attention mechanism to iteratively refine voxel assignments and features.

Education

Undergraduate student in Artificial Intelligence, Huazhong University of Science & Technology
2021.09 - Present, Wuhan, China

Experience

Dec. 2023 - Present: Research Intern, VLAA, University of California, Santa Cruz
Supervisors: Prof. Yuyin Zhou, Prof. Cihang Xie, and Dr. Jieru Mei
Focus: Multimodal Language Models for Understanding, Reasoning and Captioning

Jul. 2023 - Dec. 2023: Research Intern, CCVL, Johns Hopkins University
Supervisors: Prof. Cihang Xie and Dr. Jieru Mei
Focus: Multi-Level Segmentation, Few-Shot Segmentation

Reviewer

ICLR '25, ICML '24, CVPR '24, IEEE ISBI '24


This website template was borrowed from Jon Barron.
Last updated on July 29th, 2024.