langfengQ · dependabot · Aug 13, 2025 · Sep 8, 2025
@@ -62,6 +62,7 @@ Unlike prior approaches that simply concatenate full interaction histories, `ver
     - [3. Sokoban](#3-sokoban)  
     - [4. Gym Cards](#4-gym-cards)  
     - [5. AppWorld (Experimental)](#5-appworld-experimental)  
+    - [6. Search](#6-search)
 - [Run Examples](#run-examples)  
   - [RL Training](#rl-training)  
     - [1. GiGPO](#1-gigpo)  
@@ -139,7 +140,7 @@ We have released our models on [HuggingFace](https://huggingface.co/collections/
 # Installation
 ## Install veRL
 ```bash
-conda create -n verl-agent python==3.12 -y
+conda create -n verl-agent python==3.10 -y
 conda activate verl-agent
 
 pip3 install torch==2.6.0 --index-url https://download.pytorch.org/whl/cu124
@@ -177,12 +178,6 @@ alfworld-play-tw
 ---
 
 ### 2. WebShop
-WebShop requires Python <=3.10, so begin by creating a new `verl-agent-webshop` environment
-```bash
-conda create -n verl-agent-webshop python==3.10 -y
-conda activate verl-agent-webshop
-```
-
 Install WebShop
 ```bash
 cd ./agent_system/environments/env_package/webshop/webshop
@@ -241,9 +236,59 @@ appworld install
 appworld download data
 ```
 
+### 6. Search
+```bash
+conda activate verl-agent
+cd ./agent_system/environments/env_package/search/third_party
+pip install -e .
+pip install gym==0.26.2
+```
+
+Prepare dataset (data will be saved at `~/data/searchR1_processed_direct`):
+```bash
+cd repo_root/
+python examples/data_preprocess/preprocess_search_r1_dataset.py
+```
+
+
+Since faiss-gpu is not available via pip, we setup a separate conda environment for the local retrieval server. Running this server will use around 6GB of GPU memory per GPU, so make sure to account for this in your training run configuration. Build Retriever environments:
+```bash
+# Create and activate the retriever environment with Python 3.10
+conda create -n retriever python=3.10 -y
+conda activate retriever
+
+# Install PyTorch (with GPU support) and related libraries
+conda install numpy==1.26.4 # needed to stop incompatible version of numpy from being installed via pip
+pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
 
-<!-- > ⚠️ **Important:**  
-To run an agent in any of these environments, you must first install and configure the corresponding environment. Please refer to the [Environment Setup Guide](agent_system/environments/README.md) for step-by-step installation instructions. -->
+# Install other Python packages
+pip install transformers datasets pyserini huggingface_hub
+
+# Install the GPU version of faiss
+conda install faiss-gpu==1.8.0 -c pytorch -c nvidia -y
+
+# Install the API service framework
+pip install uvicorn fastapi
+```
+
+Download the index:
+```bash
+conda activate retriever
+
+local_dir=~/data/searchR1
+python examples/search/searchr1_download.py --local_dir $local_dir
+cat $local_dir/part_* > $local_dir/e5_Flat.index
+gzip -d $local_dir/wiki-18.jsonl.gz
+```
+
+Start the local flat e5 retrieval server: 
+```bash
+conda activate retriever
+
+# redirect the output to a file to avoid cluttering the terminal
+# we have observed outputting to the terminal causing spikes in server response times
+bash examples/search/retriever/retrieval_launch.sh > retrieval_server.log 
+```
 
 # Run Examples
 ## RL Training
@@ -266,6 +311,9 @@ bash examples/gigpo_trainer/run_webshop.sh # WebShop
 ```bash
 bash examples/gigpo_trainer/run_sokoban.sh # Sokoban
 ```
+```bash
+bash examples/gigpo_trainer/run_search.sh # Search
+```
 ### 2. GRPO
 GRPO is a critic-free algorithm that estimates relative advantages based on a group of full episode trajectories.
 ```bash
@@ -274,6 +322,9 @@ bash examples/grpo_trainer/run_alfworld.sh # ALFWorld
 ```bash
 bash examples/grpo_trainer/run_webshop.sh # WebShop
 ```
+```bash
+bash examples/grpo_trainer/run_search.sh # Search
+```
 ### 3. PPO
 PPO is a classic actor-critic algorithm that updates the policy using a clipped objective to ensure stable learning. It requires a separate value network (critic) to estimate state values.
 ```bash

@@ -0,0 +1,14 @@
+# agent_system/agent/agents/__init__.py
+from .action_agent import ActionAgent
+from .memory_agent import MemoryAgent
+from .planning_agent import PlanningAgent
+from .reflexion_agent import ReflexionAgent
+from .search_agent import SearchAgent
+
+__all__ = [
+    "ActionAgent",
+    "MemoryAgent",
+    "PlanningAgent",
+    "ReflexionAgent",
+    "SearchAgent",
+]