OpenMOSS / Embodied-Planner-R1

Embodied-Planner-R1: Unleashing Embodied Task Planning Ability in LLMs via Reinforcement Learning

25 1 Language: Python License: Apache-2.0 Updated: 1mo ago

📚Large Language Models 🔍SEO content 📝Writing

README

# Embodied-Planner-R1

🌠Unleashing Embodied Task Planning Ability in LLMs via Reinforcement Learning 🚀

[**🤗Model (AlfWorld)**](https://huggingface.co/fnlp/Embodied_Planner-R1-Alfworld) | [**🤗Model (ScienceWorld)**](https://huggingface.co/fnlp/Embodied_R1-ScienceWorld) | [**📖Paper**](https://arxiv.org/abs/2506.23127v1) |

We introduce Embodied Planner-R1, a novel outcome-driven reinforcement learning framework that enables LLMs to develop interactive capabilities through autonomous exploration.

Embodied Planner-R1 enables LLM agents to learn causal relationships between actions and environmental feedback through multi-turn interactions, allowing them to update their policies based on an outcome reward.

🔥Releases

[2025/07/01]

🌌 Full training code and scripts are available.
🤗 We open-source our model weights in Embodied_R1-ScienceWorld & Embodied_Planner-R1-Alfworld.

🚀 Installation

We separate the VERL training framework from the environment and wrap the environment into a server for interaction.

Embodied-Planner-R1 is based on verl with vLLM>=0.8


# Create the conda environment
conda create -n Embodied-Planner-R1 python=3.10
conda activate Embodied-Planner-R1

cd Embodied-Planner-R1
pip3 install -e .

Install the latest stable version of vLLM

pip3 install vllm==0.8.3

Install flash-attn

pip3 install flash-attn --no-build-isolation
pip3 install tensorboard


2. Prepare the environment for ALFWorld

conda create -n alfworld python=3.9
conda activate alfworld

download task for training

pip install alfworld
pip install fastapi
pip install uvicorn
alfworld-download --data-dir ./get_data/alfworld


3. Prepare the environment for ScienceWorld

conda create --name scienceworld python=3.8
conda activate scienceworld

pip install scienceworld
conda install -y -c conda-forge openjdk=11
pip install fastapi
pip install uvicorn


## 🛠️ Data preparation
We need to prepare tasks for reinforcement learning.

get task data for rl training

cd get_data
bash get_data_for_training.sh


## 🕹️ Quick Start
In our experimental setup, we used a 1×8 A100 (80GB) for training, with detailed training parameters provided in [examples/grpo_trainer/alf.sh](examples/grpo_trainer/alf.sh).

Remember to replace the path in the shell script with your local path

bash cmd/alf.sh

bash cmd/sci_easy.sh


## 🎮 Evaluation

We follow the framework of MINT to evaluate models.

cd verl/eval_agent
conda create -n eval_agent python=3.10
conda activate eval_agent
bash setup.sh

conda create -n vllm python=3.10
conda activate vllm
pip install vllm

deploy the model

python -m vllm.entrypoints.openai.api_server --served-model-name embodied_r1_alfworld --model /path/to/model --port 8000 --disable-frontend-multiprocessing --gpu-memory-utilization 0.99 --disable-frontend-multiprocessing --max-model-len 4096 --enforce-eager

start evaluation

conda activate eval_agent

python -m eval_agent.main --agent_config er1_alfworld --exp_config alfworld_v2 --split dev --verbose # you can find more examples in eval.sh



## Acknowledgements
The training codebase is primarily based on [Verl](https://github.com/volcengine/verl), while the evaluation framework is adapted from [MINT](https://github.com/xingyaoww/mint-bench). Our model builds upon the foundation of [`Qwen2.5-7B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct). We deeply appreciate their excellent contributions.


## Citation

@article{fei2025unleashing,
title={Unleashing Embodied Task Planning Ability in LLMs via Reinforcement Learning},
author={Fei, Zhaoye and Ji, Li and Wang, Siyin and Shi, Junhao and Gong, Jingjing and Qiu, Xipeng},
journal={arXiv preprint arXiv:2506.23127},
year={2025}
}

Search

OpenMOSS / Embodied-Planner-R1

README

🔥Releases

🚀 Installation

Install the latest stable version of vLLM

Install flash-attn

download task for training

get task data for rl training

Remember to replace the path in the shell script with your local path

We follow the framework of MINT to evaluate models.

deploy the model

start evaluation

Help

People also viewed

Feedback and Incident Report

Create AI Tools

Mini Tool

Vibe code an AI Tool

Choose listing type: