I am a Ph.D. student in the Institute for Artificial Intelligence at Peking University (PKU), advised by Prof. Yitao Liang. Before joining PKU, I received my MSc and BA degrees in Control Science and Technology from Beijing Institute of Technology. I work on building open-ended embodied agents with multi-task skills, including visual localization, task planning, and decision-making. In particular, I am interested in building and leveraging large pre-trained Foundation Models to improve the generalization of agent capabilities.
Recently, we have developed a series of open-world multi-task agents, including OmniJARVIS (pretrained end-to-end Vision-Language-Action models with a self-supervised quantized tokenizer), JARVIS-1 (self-improving with multimodal memory), DEPS (interactive long-horizon planning agent), RAT (tool-use agent with retrieval-augmented thought), GROOT (self-supervised vision-based multitask policy), and ProAgent (collaborating agents).
🔥 News
- Jun 2024: 🎉🎉 Rectified Scaling Law is accepted by ICML 2024.
- Jan 2024: 🎉🎉 GROOT is accepted by ICLR 2024 for spotlight presentation (top 5%).
- Jan 2024: 🎉🎉 ProAgent is accepted by AAAI 2024 for oral presentation.
- Sep 2023: 🎉🎉 DEPS is accepted by NeurIPS 2023.
- Jul 2023: 🎉🎉 DEPS received the Best Paper Award at the ICML 2023 TEACH Workshop!
- Feb 2023: 🎉🎉 Two papers are accepted by CVPR 2023.
📝 Publications
![sym](images/papers/2407.00114.png)
OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents
An end-to-end open-ended agent, built on Vision-Language-Action (VLA) models with a self-supervised behavior tokenizer, that can answer questions and follow instructions in open-world Minecraft.
Zihao Wang, Shaofei Cai, Zhancun Mu, Haowei Lin, Ceyao Zhang, Xuejie Liu, Qing Li, Anji Liu, Xiaojian Ma, Yitao Liang
![sym](images/papers/2403.05313.png)
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation
An agent with retrieval-augmented thought that can conduct code generation, math reasoning, embodied planning and open-ended question answering.
Zihao Wang, Anji Liu, Haowei Lin, Jiaqi Li, Xiaojian Ma, Yitao Liang
![sym](images/papers/2311.05997.png)
JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models
A multi-task agent that can self-improve in open-ended Minecraft and accomplish over 200 tasks.
Zihao Wang, Shaofei Cai, Anji Liu, Yonggang Jin, Jinbing Hou, Bowei Zhang, Haowei Lin, Zhaofeng He, Zilong Zheng, Yaodong Yang, Xiaojian Ma, Yitao Liang
![sym](images/papers/2302.01560.png)
Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents
Best Paper Award, ICML 2023 TEACH Workshop
Zihao Wang, Shaofei Cai, Anji Liu, Xiaojian Ma, Yitao Liang
![sym](images/papers/2310.08235.png)
GROOT: Learning to Follow Instructions by Watching Gameplay Videos
Shaofei Cai, Bowei Zhang, Zihao Wang, Xiaojian Ma, Anji Liu, Yitao Liang
ICLR 2024 (Spotlight)
![sym](images/papers/2402.02314.png)
Selecting Large Language Model to Fine-tune via Rectified Scaling Law
Haowei Lin, Baizhou Huang, Haotian Ye, Qinyu Chen, Zihao Wang, Sujian Li, Jianzhu Ma, Xiaojun Wan, James Zou, Yitao Liang
![sym](images/papers/2310.08367.png)
MCU: A Task-centric Framework for Open-ended Agent Evaluation in Minecraft
Haowei Lin, Zihao Wang, Jianzhu Ma, Yitao Liang
![sym](images/papers/2308.11339.png)
ProAgent: Building Proactive Cooperative AI with Large Language Models
Ceyao Zhang, Kaijie Yang, Siyi Hu, Zihao Wang, Guanghe Li, Yihang Sun, Cheng Zhang, Zhaowei Zhang, Anji Liu, Song-Chun Zhu, Xiaojun Chang, Junge Zhang, Feng Yin, Yitao Liang, Yaodong Yang
![sym](images/papers/2301.10034.png)
Open-World Multi-Task Control Through Goal-Aware Representation Learning and Adaptive Horizon Prediction
Shaofei Cai, Zihao Wang, Xiaojian Ma, Anji Liu, Yitao Liang
![sym](images/papers/CVPR23.png)
Learning Transformation-Predictive Representations for Detection and Description of Local Features
Zihao Wang, Chunxu Wu, Yifei Yang, Zhen Li
CVPR 2023
![sym](images/papers/TNNLS22.png)
Graph-Based Contrastive Learning for Description and Detection of Local Features
Zihao Wang, Zhen Li, Xueyi Li, Wenjie Chen, Xiangdong Liu
IEEE Trans. Neural Netw. Learn. Syst. (TNNLS 2022)
![sym](images/papers/IJCAI21.png)
Zihao Wang, Xueyi Li, Zhen Li
IJCAI 2021
💬 Talks
- [Dec 2023] Invited Talks at BAAI and Peking University on “Building Autonomous Agents in Open World”.
- [Jul 2023] Invited Talk at NVIDIA on “Towards Multi-task Agents in Open World”.
- [Mar 2023] Invited Talk at City University of Hong Kong and The Hong Kong Polytechnic University on “Open-Ended Embodied Agents with Multi-Task Skills”.
- [Aug 2022*] Invited Talk at Beijing Institute of General Artificial Intelligence (BIGAI) on “Learning Detection and Description of Local Features”.
🔭 Experience
- Reviewer for ICML, NeurIPS, ICLR, ECCV.
- Intern at Alibaba Inc., Beijing, 2021.05 - 2021.08.
- Teaching Assistant for “Introduction to AI” Fall 2023, Peking University.
🎖 Honors and Awards
- [Jul 2023] Best Paper Award, ICML 2023 TEACH Workshop
- [Oct 2021] Chinese National Scholarship
- [Jun 2019] Outstanding Graduate of Beijing
- [Nov 2018] Autonomy Prize in the Indoor Event at the 10th International Micro Air Vehicle Competition and Conference, Melbourne.
- [Apr 2018] Meritorious Winner in the American Mathematical Contest in Modeling (MCM) 2018.