AgentVLN is an efficient embodied agent framework for long-horizon vision-and-language navigation (VLN) in unseen environments. It formulates VLN as a POSMDP and follows a VLM-as-Brain paradigm that decouples high-level semantic reasoning from low-level perception and planning through a plug-and-play skill library.
Real-world experiments show that AgentVLN can execute instruction-following navigation in both indoor and outdoor scenes while maintaining robust planning and efficient deployment. We will release real-world video demos soon.
Please see our project page for the HD demos.
- VLM-as-Brain Navigation: AgentVLN decomposes long-horizon navigation into high-level reasoning and modular skill execution under a unified agentic framework.
- Cross-space Representation Mapping: 3D topological waypoints are projected into the image plane as pixel-aligned visual prompts, bridging the gap between 3D planning and 2D VLM perception.
- Context-aware Self-correction: Fine-grained active exploration helps the agent recover from occlusions, blind spots, and trajectory drift during long-horizon navigation.
- QD-PCoT for Spatial Ambiguity: The Query-Driven Perceptual Chain-of-Thought mechanism enables the agent to actively query missing geometric cues for more precise target grounding.
- Lightweight Edge Deployment: AgentVLN achieves a strong accuracy-efficiency trade-off and supports real-time local inference on embedded edge platforms.
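The cross-space representation mapping above amounts to projecting 3D waypoints into the 2D image so the VLM can reason over pixel-aligned prompts. The following is a minimal sketch of that projection using a standard pinhole camera model; the function name, the intrinsics/extrinsics values, and the visibility filtering are illustrative assumptions, not AgentVLN's actual API.

```python
# Hedged sketch: project 3D world-frame waypoints into the image plane as
# pixel coordinates, so they can be overlaid as visual prompts for a VLM.
# Assumes a pinhole camera model; all names and values are illustrative.
import numpy as np

def project_waypoints(waypoints_world, K, T_world_to_cam, img_size):
    """Project (N, 3) world-frame waypoints to (N, 2) pixel coordinates.

    Returns the pixel coords and a boolean mask marking waypoints that are
    in front of the camera and inside the image bounds.
    """
    pts = np.asarray(waypoints_world, dtype=float)            # (N, 3)
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])          # homogeneous (N, 4)
    cam = (T_world_to_cam @ pts_h.T).T[:, :3]                 # camera frame (N, 3)
    in_front = cam[:, 2] > 1e-6                               # positive depth only
    uv_h = (K @ cam.T).T                                      # apply intrinsics (N, 3)
    uv = uv_h[:, :2] / np.clip(uv_h[:, 2:3], 1e-6, None)      # perspective divide
    w, h = img_size
    in_bounds = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    return uv, in_front & in_bounds

# Example: identity extrinsics, simple intrinsics, one waypoint 2 m ahead
# and one behind the camera (which is filtered out by the mask).
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
T = np.eye(4)
uv, visible = project_waypoints([[0.0, 0.0, 2.0], [0.0, 0.0, -1.0]], K, T, (640, 480))
# The on-axis waypoint lands at the principal point (320, 240).
```

In a system like this, the surviving pixel coordinates would typically be drawn as numbered markers on the RGB frame before it is passed to the VLM, turning 3D planning candidates into 2D choices the model can directly reference.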
Compared with prior VLN systems that rely on larger models or remote cloud execution, AgentVLN is designed for efficient local deployment. The framework delivers a better accuracy-efficiency balance on long-horizon VLN benchmarks while remaining lightweight enough for real-time on-device inference.
AgentVLN consistently outperforms prior state-of-the-art methods on the Val-Unseen splits of R2R-CE and RxR-CE, demonstrating strong generalization in complex unseen environments.
- Release the project page and paper PDF
- Release AgentVLN-Instruct
- Open-source training and inference code
- Release pretrained model checkpoints
- Add installation and environment setup instructions
@misc{xin2026agentvln,
title={AgentVLN: Towards Agentic Vision-and-Language Navigation},
author={Zihao Xin and Wentong Li and Yixuan Jiang and Ziyuan Huang and Bin Wang and Piji Li and Jianke Zhu and Jie Qin and Sheng-Jun Huang},
year={2026},
eprint={2603.17670},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2603.17670},
}