Could you please provide some guidance on how to perform RL training on VLMs? For example, how to train on WebShop datasets?