Add Learning Notes for Auto_Program RL Implementation (main text) by hhh2210 · Pull Request #39 · lsdefine/simple_GRPO

hhh2210 · 2025-04-17T14:13:14Z

Thank you so much for all the great work you've recently done on the GRPO implementation, especially on the Auto_Program section!

I've carefully studied the related code implementation, especially the part concerning the use of GRPO for reinforcement learning to optimize the model's generation and execution of Python code (a form of tool use). I found it very insightful.
To better understand and document this process, I've put together some learning notes (primarily in Learn/Train/simple_GRPO/Auto_Program/my_progress.md). These notes analyze how hjy_grpo_program.py, ref_server.py, config.py, and the related prompt files work together.
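
For readers new to GRPO, the group-relative advantage at its core can be sketched roughly as below. This is only an illustration under my own assumptions: the function name `group_relative_advantages`, the `eps` value, the use of the population standard deviation, and the reward values are all hypothetical and are not taken from hjy_grpo_program.py.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one group of completions for the same prompt.

    Each completion's advantage is its reward minus the group mean,
    divided by the group standard deviation (plus eps to avoid dividing
    by zero when every completion gets the same reward).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled completions of one prompt,
# e.g. 1.0 when the generated Python code ran and produced the right answer.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Completions that score above the group average get a positive advantage and are reinforced, while those below it get a negative one, so no separate value model is needed.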

Since the implementation details of the RL and tool-use aspects don't seem to be fully covered in the project's README or other documentation yet, I thought these notes might be helpful for others trying to understand this part of the code. Perhaps they could also serve as a starting point for future documentation updates.

Thanks again for your excellent work!
I hope these notes can be of some value to the community.

hhh2210 added 2 commits April 17, 2025 21:52

Add readme

bc0185b

Update auto_program notes

f8d61fb

hhh2210 wants to merge 2 commits into lsdefine:main from hhh2210:main
