I am a Ph.D. student in the Institute for Artificial Intelligence at Peking University (PKU), advised by Prof. Yitao Liang. Before joining PKU, I received my MSc and BA degrees in Control Science and Technology from Beijing Institute of Technology. I work on building open-ended embodied agents with multi-task skills, including visual localization, task planning, and decision-making. In particular, I am interested in building and leveraging large pre-trained Foundation Models to improve the generalization of agent capabilities.

Recently, we have developed a series of open-world multi-task agents, including OmniJARVIS (pretrained end-to-end Vision-Language-Action models with a self-supervised quantized tokenizer), JARVIS-1 (self-improving with multimodal memory), DEPS (interactive long-horizon planning agent), RAT (tool-use agent with retrieval-augmented thought), GROOT (self-supervised vision-based multitask policy), and ProAgent (collaborating agents).

🔥 News

  • Jun 2024:  🎉🎉 Rectified Scaling Law is accepted by ICML 2024.
  • Jan 2024:  🎉🎉 GROOT is accepted by ICLR 2024 for spotlight presentation (top 5%).
  • Jan 2024:  🎉🎉 ProAgent is accepted by AAAI 2024 for oral presentation.
  • Sep 2023:  🎉🎉 DEPS is accepted by NeurIPS 2023.
  • Jul 2023:  🎉🎉 DEPS received Best Paper Award at ICML 2023 TEACH Workshop!
  • Feb 2023:  🎉🎉 Two papers are accepted by CVPR 2023.

📝 Publications

arXiv

OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents

An end-to-end open-ended agent based on Vision-Language-Action (VLA) models with a self-supervised behavior tokenizer that can answer questions and follow instructions in open-world Minecraft.

Zihao Wang, Shaofei Cai, Zhancun Mu, Haowei Lin, Ceyao Zhang, Xuejie Liu, Qing Li, Anji Liu, Xiaojian Ma, Yitao Liang

arXiv | Project | Paper | Twitter

arXiv

RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation

An agent with retrieval-augmented thought that can conduct code generation, math reasoning, embodied planning and open-ended question answering.

Zihao Wang, Anji Liu, Haowei Lin, Jiaqi Li, Xiaojian Ma, Yitao Liang

arXiv | Project | Demo | Paper | Code | Twitter

arXiv

JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models

A multi-task agent that can self-improve in open-ended Minecraft and accomplish over 200 tasks.

Zihao Wang, Shaofei Cai, Anji Liu, Yonggang Jin, Jinbing Hou, Bowei Zhang, Haowei Lin, Zhaofeng He, Zilong Zheng, Yaodong Yang, Xiaojian Ma, Yitao Liang

arXiv | Project | Paper | Code | Twitter | Media

NeurIPS 2023

Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents

Best Paper Award, ICML 2023 TEACH Workshop

Zihao Wang, Shaofei Cai, Anji Liu, Xiaojian Ma, Yitao Liang

NeurIPS 2023 | Paper | Code | Twitter

ICLR 2024

GROOT: Learning to Follow Instructions by Watching Gameplay Videos

Shaofei Cai, Bowei Zhang, Zihao Wang, Xiaojian Ma, Anji Liu, Yitao Liang

ICLR 2024 (Spotlight) | Project | Paper | Code | Twitter | Media

ICML 2024

Selecting Large Language Model to Fine-tune via Rectified Scaling Law

Haowei Lin, Baizhou Huang, Haotian Ye, Qinyu Chen, Zihao Wang, Sujian Li, Jianzhu Ma, Xiaojun Wan, James Zou, Yitao Liang

ICML 2024 | Project | Paper | Code

arXiv

MCU: A Task-centric Framework for Open-ended Agent Evaluation in Minecraft

Haowei Lin, Zihao Wang, Jianzhu Ma, Yitao Liang

NeurIPSW 2024 | Paper | Code

AAAI 2024

ProAgent: Building Proactive Cooperative AI with Large Language Models

Ceyao Zhang, Kaijie Yang, Siyi Hu, Zihao Wang, Guanghe Li, Yihang Sun, Cheng Zhang, Zhaowei Zhang, Anji Liu, Song-Chun Zhu, Xiaojun Chang, Junge Zhang, Feng Yin, Yitao Liang, Yaodong Yang

AAAI 2024 (Oral) | Project | Paper | Code

CVPR 2023

Open-World Multi-Task Control Through Goal-Aware Representation Learning and Adaptive Horizon Prediction

Shaofei Cai, Zihao Wang, Xiaojian Ma, Anji Liu, Yitao Liang

CVPR 2023 | Paper | Code

TNNLS

Graph-Based Contrastive Learning for Description and Detection of Local Features

Zihao Wang, Zhen Li, Xueyi Li, Wenjie Chen, Xiangdong Liu

IEEE Trans. Neural Netw. Learn. Syst. (TNNLS 2022)

💬 Talks

  • [Dec 2023] Invited Talks at BAAI, Peking University on “Building Autonomous Agents in Open World”.
  • [Jul 2023] Invited Talk at NVIDIA on “Towards Multi-task Agents in Open World”.
  • [Mar 2023] Invited Talk at City University of Hong Kong and The Hong Kong Polytechnic University on “Open-Ended Embodied Agents with Multi-Task Skills”.
  • [Aug 2022] Invited Talk at Beijing Institute of General Artificial Intelligence (BIGAI) on “Learning Detection and Description of Local Features”.

🔭 Experience

  • Reviewer for ICML, NeurIPS, ICLR, ECCV.
  • Intern at Alibaba Inc., Beijing, 2021.05 - 2021.08.
  • Teaching Assistant for “Introduction to AI”, Fall 2023, Peking University.

🎖 Honors and Awards

  • [Jul 2023] Best Paper Award, ICML 2023 TEACH Workshop
  • [Oct 2021] Chinese National Scholarship
  • [Jun 2019] Outstanding Graduate of Beijing
  • [Nov 2018] Autonomy Prize in the Indoor Event at the 10th International Micro Air Vehicle Competition and Conference, Melbourne.
  • [Apr 2018] Meritorious Winner in the American Mathematical Contest in Modeling (MCM) 2018.