I am a Ph.D. student in the Institute for Artificial Intelligence at Peking University (PKU), advised by Prof. Yitao Liang. Before joining PKU, I received my MSc and BA degrees in Control Science and Technology from Beijing Institute of Technology. I work on building open-ended embodied agents with multi-task skills, including visual localization, task planning, and decision-making. In particular, I am interested in building and leveraging large pre-trained Foundation Models to improve the generalization of agent capabilities.
Recently, we have developed a series of open-world multi-task agents, including OmniJARVIS (a pretrained end-to-end Vision-Language-Action model with a self-supervised quantized tokenizer), JARVIS-1 (self-improving with multimodal memory), DEPS (interactive long-horizon planning agent), RAT (tool-use agent with retrieval-augmented thought), GROOT (self-supervised vision-based multitask policy), and ProAgent (collaborating agents).
🔥 News
- Sep 2024: 🎉🎉 Our latest Vision-Language-Action model, OmniJARVIS, is accepted by NeurIPS 2024.
- Aug 2024: 📢📢 We will organize the 1st Open-world Agent Workshop at NeurIPS 2024 (Vancouver, BC, Canada). Calling for papers NOW! 🔥🔥
- Jun 2024: 🎉🎉 Rectified Scaling Law is accepted by ICML 2024.
- Jan 2024: 🎉🎉 GROOT is accepted by ICLR 2024 for spotlight presentation (top 5%).
- Jan 2024: 🎉🎉 ProAgent is accepted by AAAI 2024 for oral presentation.
- Sep 2023: 🎉🎉 DEPS is accepted by NeurIPS 2023.
- Jul 2023: 🎉🎉 DEPS received the Best Paper Award at the ICML 2023 TEACH Workshop!
- Feb 2023: 🎉🎉 Two papers are accepted by CVPR 2023.
📝 Publications
ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting
Shaofei Cai, Zihao Wang, Kewei Lian, Zhancun Mu, Xiaojian Ma, Anji Liu, Yitao Liang
OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents
An end-to-end open-ended agent based on Vision-Language-Action (VLA) models with a self-supervised behavior tokenizer that can answer questions and follow instructions in open-world Minecraft.
Zihao Wang, Shaofei Cai, Zhancun Mu, Haowei Lin, Ceyao Zhang, Xuejie Liu, Qing Li, Anji Liu, Xiaojian Ma, Yitao Liang
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation
An agent with retrieval-augmented thought that can conduct code generation, math reasoning, embodied planning and open-ended question answering.
Zihao Wang, Anji Liu, Haowei Lin, Jiaqi Li, Xiaojian Ma, Yitao Liang
JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models
A multi-task agent that can self-improve in open-ended Minecraft and accomplish more than 200 tasks.
Zihao Wang, Shaofei Cai, Anji Liu, Yonggang Jin, Jinbing Hou, Bowei Zhang, Haowei Lin, Zhaofeng He, Zilong Zheng, Yaodong Yang, Xiaojian Ma, Yitao Liang
Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents
Best Paper Award, ICML 2023 TEACH Workshop
Zihao Wang, Shaofei Cai, Anji Liu, Xiaojian Ma, Yitao Liang
GROOT: Learning to Follow Instructions by Watching Gameplay Videos
Shaofei Cai, Bowei Zhang, Zihao Wang, Xiaojian Ma, Anji Liu, Yitao Liang
ICLR 2024 (Spotlight)
Selecting Large Language Model to Fine-tune via Rectified Scaling Law
Haowei Lin, Baizhou Huang, Haotian Ye, Qinyu Chen, Zihao Wang, Sujian Li, Jianzhu Ma, Xiaojun Wan, James Zou, Yitao Liang
MCU: A Task-centric Framework for Open-ended Agent Evaluation in Minecraft
Haowei Lin, Zihao Wang, Jianzhu Ma, Yitao Liang
ProAgent: Building Proactive Cooperative AI with Large Language Models
Ceyao Zhang, Kaijie Yang, Siyi Hu, Zihao Wang, Guanghe Li, Yihang Sun, Cheng Zhang, Zhaowei Zhang, Anji Liu, Song-Chun Zhu, Xiaojun Chang, Junge Zhang, Feng Yin, Yitao Liang, Yaodong Yang
Open-World Multi-Task Control Through Goal-Aware Representation Learning and Adaptive Horizon Prediction
Shaofei Cai, Zihao Wang, Xiaojian Ma, Anji Liu, Yitao Liang
Learning Transformation-Predictive Representations for Detection and Description of Local Features
Zihao Wang, Chunxu Wu, Yifei Yang, Zhen Li
CVPR 2023
Graph-Based Contrastive Learning for Description and Detection of Local Features
Zihao Wang, Zhen Li, Xueyi Li, Wenjie Chen, Xiangdong Liu
IEEE Trans. Neural Netw. Learn. Syst. (TNNLS 2022)
Zihao Wang, Xueyi Li, Zhen Li
IJCAI 2021
💬 Talks
- [Dec 2023] Invited Talks at BAAI, Peking University on “Building Autonomous Agents in Open World”.
- [Jul 2023] Invited Talk at NVIDIA on “Towards Multi-task Agents in Open World”.
- [Mar 2023] Invited Talk at City University of Hong Kong and The Hong Kong Polytechnic University on “Open-Ended Embodied Agents with Multi-Task Skills”.
- [Aug 2022] Invited Talk at Beijing Institute of General Artificial Intelligence (BIGAI) on “Learning Detection and Description of Local Features”.
🔭 Experience
- Organizer of the 1st Open-world Agent Workshop at NeurIPS 2024.
- Reviewer for ICML, NeurIPS, ICLR, ECCV, AAAI.
- Intern at Alibaba Inc., Beijing, 2021.05 - 2021.08.
- Teaching Assistant for “Introduction to AI” Fall 2023, Peking University.
🎖 Honors and Awards
- [Jul 2023] Best Paper Award, ICML 2023 TEACH Workshop
- [Oct 2021] Chinese National Scholarship
- [Jun 2019] Outstanding Graduate of Beijing
- [Nov 2018] Autonomy Prize in the Indoor Event at the 10th International Micro Air Vehicle Competition and Conference, Melbourne.
- [Apr 2018] Meritorious Winner in the American Mathematical Contest in Modeling (MCM) 2018.