I am Yimu Wang, a final-year Ph.D. student at the University of Waterloo.
I obtained my master’s degree under the supervision of Prof. Lijun Zhang in the LAMDA Group, led by Prof. Zhi-Hua Zhou, at Nanjing University.
I was honored to spend a wonderful time as a research assistant at Tsinghua University with Prof. Jingjing Liu and Prof. Yang Liu, and to have had amazing experiences at Amazon, Sony AI, Borealis AI, Tencent Lightspeed & Quantum Studios, Alibaba, NetEase Games, and Megvii.
My main research interests are multi-modal learning and 3D understanding.
Yimu Wang
PhD Student
University of Waterloo CS
News
[2025/09] One paper was accepted to JMLR 2025.
[2025/09] One paper was accepted to NeurIPS 2025.
[2025/08] One survey paper was accepted to TMLR 2025.
[2025/08] One paper was accepted to EMNLP 2025.
[2025/06] One paper was accepted to ICCV 2025.
[2025/05] One paper was accepted to ACL 2025.
[2025/01] Two papers were accepted to NAACL 2025.
[2024/10] One paper was accepted to WACV 2025!
[2024/09] One paper was accepted to a NeurIPS 2024 workshop!
[2023/12] Two papers were accepted to AAAI 2024!
[2023/10] Three papers were accepted to EMNLP 2023 (one main-conference paper and two findings)!
[2023/09] One paper was accepted to NeurIPS 2023!
[2023/04] I received a CVPR DEI travel award to attend the conference in Vancouver!
Lexicographic Lipschitz Bandits: New Algorithms and a Lower Bound
Bo Xue, Ji Cheng, Fei Liu, Yimu Wang, Lijun Zhang, and Qingfu Zhang Journal of Machine Learning Research (JMLR), 2025.
HAWAII: Hierarchical Visual Knowledge Transfer for Efficient Vision-Language Models
Yimu Wang, Mozhgan Nasr Azadani, Sean Sedwards, Krzysztof Czarnecki Annual Conference on Neural Information Processing Systems (NeurIPS), 2025.
@misc{wang2025hawaiihierarchicalvisualknowledge,
title={HAWAII: Hierarchical Visual Knowledge Transfer for Efficient Vision-Language Models},
author={Yimu Wang and Mozhgan Nasr Azadani and Sean Sedwards and Krzysztof Czarnecki},
year={2025},
eprint={2506.19072},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2506.19072},
}
Survey of Video Diffusion Models: Foundations, Implementations, and Applications
Yimu Wang, Xuye Liu, Wei Pang, Li Ma, Shuai Yuan, Paul Debevec, Ning Yu Transactions on Machine Learning Research (TMLR), 2025.
@article{
wang2025survey,
title={Survey of Video Diffusion Models: Foundations, Implementations, and Applications},
author={Yimu Wang and Xuye Liu and Wei Pang and Li Ma and Shuai Yuan and Paul Debevec and Ning Yu},
journal={Transactions on Machine Learning Research},
issn={2835-8856},
year={2025},
url={https://openreview.net/forum?id=2ODDBObKjH},
note={Survey Certification}
}
LEO-MINI: An Efficient Multimodal Large Language Model using Conditional Token Reduction and Mixture of Multi-Modal Experts
Yimu Wang, Mozhgan Nasr Azadani, Sean Sedwards, Krzysztof Czarnecki Empirical Methods in Natural Language Processing (EMNLP), 2025.
@misc{wang2025leominiefficientmultimodallarge,
title={LEO-MINI: An Efficient Multimodal Large Language Model using Conditional Token Reduction and Mixture of Multi-Modal Experts},
author={Yimu Wang and Mozhgan Nasr Azadani and Sean Sedwards and Krzysztof Czarnecki},
year={2025},
eprint={2504.04653},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2504.04653},
}
OV-SCAN: Semantically Consistent Alignment for Novel Object Discovery in Open-Vocabulary 3D Object Detection
Adrian Chow, Evelien Riddell, Yimu Wang, Sean Sedwards, Krzysztof Czarnecki International Conference on Computer Vision (ICCV), 2025.
@misc{chow2025ovscansemanticallyconsistentalignment,
title={OV-SCAN: Semantically Consistent Alignment for Novel Object Discovery in Open-Vocabulary 3D Object Detection},
author={Adrian Chow and Evelien Riddell and Yimu Wang and Sean Sedwards and Krzysztof Czarnecki},
year={2025},
eprint={2503.06435},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2503.06435},
}
NBDESCRIB: A Dataset for Text Description Generation from Tables and Code in Jupyter Notebooks with Guidelines
Xuye Liu, Tengfei Ma, Yimu Wang, Fengjie Wang, Jian Zhao Annual Meeting of the Association for Computational Linguistics (Findings of ACL), 2025.
ELIOT: Zero-Shot Video-Text Retrieval through Relevance-Boosted Captioning and Structural Information Extraction
Xuye Liu, Yimu Wang, Jian Zhao NAACL Student Research Workshop (SRW of NAACL), 2025.
@inproceedings{liu-etal-2025-eliot,
title = "{ELIOT}: Zero-Shot Video-Text Retrieval through Relevance-Boosted Captioning and Structural Information Extraction",
author = "Liu, Xuye and Wang, Yimu and Zhao, Jian",
booktitle = "Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop)",
year = "2025",
pages = "381--391",
}
DREAM: Improving Video-Text Retrieval Through Relevance-Based Augmentation Using Large Foundation Models
Yimu Wang, Shuai Yuan, Bo Xue, Xiangru Jian, Wei Pang, Mushi Wang, Ning Yu Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL), 2025.
@inproceedings{wang-etal-2025-dream,
title = "{DREAM}: Improving Video-Text Retrieval Through Relevance-Based Augmentation Using Large Foundation Models",
author = "Wang, Yimu and Yuan, Shuai and Xue, Bo and Jian, Xiangru and Pang, Wei and Wang, Mushi and Yu, Ning",
booktitle = "Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)",
year = "2025",
pages = "3037--3056",
}
AIDE: Improving 3D Open-Vocabulary Semantic Segmentation by Aligned Vision-Language Learning
Yimu Wang, Krzysztof Czarnecki IEEE Winter Conference on Applications of Computer Vision (WACV), 2025.
@InProceedings{Wang_2025_WACV,
author = {Wang, Yimu and Czarnecki, Krzysztof},
title = {AIDE: Improving 3D Open-Vocabulary Semantic Segmentation by Aligned Vision-Language Learning},
booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
month = {February},
year = {2025},
pages = {2674--2685}
}
Pretext Training Algorithms for Event Sequence Data
Yimu Wang, He Zhao, Ruizhi Deng, Frederick Tung, Greg Mori Conference on Neural Information Processing Systems Workshop (NeurIPS workshop), 2024.
@misc{wang2024pretexttrainingalgorithmsevent,
title={Pretext Training Algorithms for Event Sequence Data},
author={Yimu Wang and He Zhao and Ruizhi Deng and Frederick Tung and Greg Mori},
year={2024},
eprint={2402.10392},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2402.10392},
}
Lost Domain Generalization Is a Natural Consequence of Lack of Training Domains
Yimu Wang, Yihan Wu, Hongyang Zhang Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2024.
@article{Wang_Wu_Zhang_2024,
title={Lost Domain Generalization Is a Natural Consequence of Lack of Training Domains},
volume={38},
url={https://ojs.aaai.org/index.php/AAAI/article/view/29497},
DOI={10.1609/aaai.v38i14.29497},
abstractNote={We show a hardness result for the number of training domains required to achieve a small population error in the test domain. Although many domain generalization algorithms have been developed under various domain-invariance assumptions, there is significant evidence to indicate that out-of-distribution (o.o.d.) test accuracy of state-of-the-art o.o.d. algorithms is on par with empirical risk minimization and random guess on the domain generalization benchmarks such as DomainBed. In this work, we analyze its cause and attribute the lost domain generalization to the lack of training domains. We show that, in a minimax lower bound fashion, any learning algorithm that outputs a classifier with an ε excess error to the Bayes optimal classifier requires at least poly(1/ε) number of training domains, even though the number of training data sampled from each training domain is large. Experiments on the DomainBed benchmark demonstrate that o.o.d. test accuracy is monotonically increasing as the number of training domains increases. Our result sheds light on the intrinsic hardness of domain generalization and suggests benchmarking o.o.d. algorithms by the datasets with a sufficient number of training domains.},
number={14},
journal={Proceedings of the AAAI Conference on Artificial Intelligence},
author={Wang, Yimu and Wu, Yihan and Zhang, Hongyang},
year={2024},
month={Mar.},
pages={15689--15697}
}
Multiobjective Lipschitz Bandits under Lexicographic Ordering
Bo Xue, Ji Cheng, Fei Liu, Yimu Wang, Qingfu Zhang Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2024.
@article{Xue_Cheng_Liu_Wang_Zhang_2024,
title={Multiobjective Lipschitz Bandits under Lexicographic Ordering},
volume={38},
url={https://ojs.aaai.org/index.php/AAAI/article/view/29558},
DOI={10.1609/aaai.v38i15.29558},
abstractNote={This paper studies the multiobjective bandit problem under lexicographic ordering, wherein the learner aims to simultaneously maximize m objectives hierarchically. The only existing algorithm for this problem considers the multi-armed bandit model, and its regret bound is O((KT)^(2/3)) under a metric called priority-based regret. However, this bound is suboptimal, as the lower bound for single objective multi-armed bandits is Omega(KlogT). Moreover, this bound becomes vacuous when the arm number K is infinite. To address these limitations, we investigate the multiobjective Lipschitz bandit model, which allows for an infinite arm set. Utilizing a newly designed multi-stage decision-making strategy, we develop an improved algorithm that achieves a general regret bound of O(T^((d_z^i+1)/(d_z^i+2))) for the i-th objective, where d_z^i is the zooming dimension for the i-th objective, with i in {1,2,...,m}. This bound matches the lower bound of the single objective Lipschitz bandit problem in terms of T, indicating that our algorithm is almost optimal. Numerical experiments confirm the effectiveness of our algorithm.},
number={15},
journal={Proceedings of the AAAI Conference on Artificial Intelligence},
author={Xue, Bo and Cheng, Ji and Liu, Fei and Wang, Yimu and Zhang, Qingfu},
year={2024},
month={Mar.},
pages={16238--16246}
}
Efficient Algorithms for Generalized Linear Bandits with Heavy-tailed Rewards
Bo Xue, Yimu Wang, Yuanyu Wan, Jinfeng Yi, and Lijun Zhang Conference on Neural Information Processing Systems (NeurIPS), 2023.
@inproceedings{
xue2023efficient,
title={Efficient Algorithms for Generalized Linear Bandits with Heavy-tailed Rewards},
author={Bo Xue and Yimu Wang and Yuanyu Wan and Jinfeng Yi and Lijun Zhang},
booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
year={2023},
url={https://openreview.net/forum?id=Vbm5UCaYeh}
}
Balance Act: Mitigating Hubness in Cross-Modal Retrieval with Query and Gallery Banks
Yimu Wang, Xiangru Jian, Bo Xue Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP Oral), 2023.
@inproceedings{wang-etal-2023-balance,
title = "Balance Act: Mitigating Hubness in Cross-Modal Retrieval with Query and Gallery Banks",
author = "Wang, Yimu and
Jian, Xiangru and
Xue, Bo",
editor = "Bouamor, Houda and
Pino, Juan and
Bali, Kalika",
booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",
month = dec,
year = "2023",
address = "Singapore",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.emnlp-main.652/",
doi = "10.18653/v1/2023.emnlp-main.652",
pages = "10542--10567"
}
Video-Text Retrieval by Supervised Sparse Multi-Grained Learning
Yimu Wang, Peng Shi Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (Findings of EMNLP), 2023.
@inproceedings{wang-shi-2023-video,
title = "Video-Text Retrieval by Supervised Sparse Multi-Grained Learning",
author = "Wang, Yimu and
Shi, Peng",
editor = "Bouamor, Houda and
Pino, Juan and
Bali, Kalika",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2023",
month = dec,
year = "2023",
address = "Singapore",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.findings-emnlp.46/",
doi = "10.18653/v1/2023.findings-emnlp.46",
pages = "633--649"
}
InvGC: Robust Cross-Modal Retrieval by Inverse Graph Convolution
Xiangru Jian, Yimu Wang Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (Findings of EMNLP), 2023.
@inproceedings{jian-wang-2023-invgc,
title = "{I}nv{GC}: Robust Cross-Modal Retrieval by Inverse Graph Convolution",
author = "Jian, Xiangru and
Wang, Yimu",
editor = "Bouamor, Houda and
Pino, Juan and
Bali, Kalika",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2023",
month = dec,
year = "2023",
address = "Singapore",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.findings-emnlp.60/",
doi = "10.18653/v1/2023.findings-emnlp.60",
pages = "836--865"
}
Cooperation or Competition: Avoiding Player Domination for Multi-Target Robustness via Adaptive Budgets
Yimu Wang, Dinghuai Zhang, Yihan Wu, Heng Huang, Hongyang Zhang IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
@INPROCEEDINGS{10203542,
author={Wang, Yimu and Zhang, Dinghuai and Wu, Yihan and Huang, Heng and Zhang, Hongyang},
booktitle={2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
title={Cooperation or Competition: Avoiding Player Domination for Multi-Target Robustness via Adaptive Budgets},
year={2023},
volume={},
number={},
pages={20564-20574},
keywords={Training;Deep learning;Adaptation models;Computer vision;Games;Benchmark testing;Robustness;Adversarial attack and defense},
doi={10.1109/CVPR52729.2023.01970}}
Multimodal Federated Learning via Contrastive Representation Ensemble
Qiying Yu, Yang Liu, Yimu Wang, Ke Xu, Jingjing Liu International Conference on Learning Representations (ICLR), 2023.
@inproceedings{
yu2023multimodal,
title={Multimodal Federated Learning via Contrastive Representation Ensemble},
author={Qiying Yu and Yang Liu and Yimu Wang and Ke Xu and Jingjing Liu},
booktitle={The Eleventh International Conference on Learning Representations },
year={2023},
url={https://openreview.net/forum?id=Hnk1WRMAYqg}
}
Deep Unified Cross-Modality Hashing by Pairwise Data Alignment
Yimu Wang, Bo Xue, Quan Cheng, Yuhui Chen, and Lijun Zhang International Joint Conference on Artificial Intelligence (IJCAI), 2021.
@inproceedings{ijcai2021p156,
title = {Deep Unified Cross-Modality Hashing by Pairwise Data Alignment},
author = {Wang, Yimu and Xue, Bo and Cheng, Quan and Chen, Yuhui and Zhang, Lijun},
booktitle = {Proceedings of the Thirtieth International Joint Conference on
Artificial Intelligence, {IJCAI-21}},
publisher = {International Joint Conferences on Artificial Intelligence Organization},
editor = {Zhi-Hua Zhou},
pages = {1129--1135},
year = {2021},
month = {8},
note = {Main Track},
doi = {10.24963/ijcai.2021/156},
url = {https://doi.org/10.24963/ijcai.2021/156},
}
Searching Privately by Imperceptible Lying: A Novel Private Hashing Method with Differential Privacy
Yimu Wang, Shiyin Lu, and Lijun Zhang ACM International Conference on Multimedia (ACM MM), 2020.
@inproceedings{10.1145/3394171.3413882,
author = {Wang, Yimu and Lu, Shiyin and Zhang, Lijun},
title = {Searching Privately by Imperceptible Lying: A Novel Private Hashing Method with Differential Privacy},
year = {2020},
isbn = {9781450379885},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3394171.3413882},
doi = {10.1145/3394171.3413882},
abstract = {In the big data era, with the increasing amount of multi-media data, approximate nearest neighbor~(ANN) search has been an important but challenging problem. As a widely applied large-scale ANN search method, hashing has made great progress, and achieved sub-linear search time with low memory space. However, the advances in hashing are based on the availability of large and representative datasets, which often contain sensitive information. Typically, the privacy of this individually sensitive information is compromised. In this paper, we tackle this valuable yet challenging problem and formulate a task termed as private hashing, which takes into account both searching performance and privacy protection. Specifically, we propose a novel noise mechanism, i.e., Random Flipping, and two private hashing algorithms, i.e., PHashing and PITQ, with the refined analysis within the framework of differential privacy, since differential privacy is a well-established technique to measure the privacy leakage of an algorithm. Random Flipping targets binary scenarios and leverages the "Imperceptible Lying" idea to guarantee ε-differential privacy by flipping each datum of the binary matrix (noise addition). To preserve ε-differential privacy, PHashing perturbs and adds noise to the hash codes learned by non-private hashing algorithms using Random Flipping. However, the noise addition for privacy in PHashing will cause severe performance drops. To alleviate this problem, PITQ leverages the power of alternative learning to distribute the noise generated by Random Flipping into each iteration while preserving ε-differential privacy. Furthermore, to empirically evaluate our algorithms, we conduct comprehensive experiments on the image search task and demonstrate that proposed algorithms achieve equal performance compared with non-private hashing methods.},
booktitle = {Proceedings of the 28th ACM International Conference on Multimedia},
pages = {2700--2709},
numpages = {10},
keywords = {large-scale multimedia retrieval, hashing, differential privacy},
location = {Seattle, WA, USA},
series = {MM '20}
}
Nearly Optimal Regret for Stochastic Linear Bandits with Heavy-Tailed Payoffs
Bo Xue, Guanghui Wang, Yimu Wang, Lijun Zhang International Joint Conference on Artificial Intelligence (IJCAI), 2020.
@inproceedings{ijcai2020p406,
title = {Nearly Optimal Regret for Stochastic Linear Bandits with Heavy-Tailed Payoffs},
author = {Xue, Bo and Wang, Guanghui and Wang, Yimu and Zhang, Lijun},
booktitle = {Proceedings of the Twenty-Ninth International Joint Conference on
Artificial Intelligence, {IJCAI-20}},
publisher = {International Joint Conferences on Artificial Intelligence Organization},
editor = {Christian Bessiere},
pages = {2936--2942},
year = {2020},
month = {7},
note = {Main track},
doi = {10.24963/ijcai.2020/406},
url = {https://doi.org/10.24963/ijcai.2020/406},
}
An Adversarial Domain Adaptation Network for Cross-Domain Fine-Grained Recognition
Yimu Wang, Ren-Jie Song, Xiu-Shen Wei, and Lijun Zhang IEEE Winter Conference on Applications of Computer Vision (WACV), 2020.