
Add Learning Notes for Auto_Program RL Implementation #39

Open
hhh2210 wants to merge 2 commits into lsdefine:main from hhh2210:main


hhh2210 commented Apr 17, 2025

Thank you so much for all the great work you've recently done on the GRPO implementation, especially the Auto_Program section!

I've carefully studied the related code, especially the part that uses GRPO-based reinforcement learning to improve how the model generates and executes Python code (a form of tool use), and I found it very insightful.
To better understand and document this process, I've put together some learning notes (primarily in Learn/Train/simple_GRPO/Auto_Program/my_progress.md). These notes analyze how hjy_grpo_program.py, ref_server.py, config.py, and the related prompt files work together.
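For readers new to the idea, here is a minimal sketch of the core GRPO step applied to code execution. It is not taken from hjy_grpo_program.py; the function names `execution_reward` and `group_relative_advantages` and the exact-stdout-match reward are hypothetical assumptions for illustration. Each sampled completion in a group is run as Python, scored, and its reward is normalized against the group's mean and standard deviation to get a relative advantage.

```python
import statistics
import subprocess
import sys


def execution_reward(code: str, expected_stdout: str, timeout: float = 5.0) -> float:
    """Hypothetical reward: 1.0 if running `code` prints `expected_stdout`, else 0.0."""
    try:
        # Run the completion in a separate Python process with a time limit.
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return 0.0
    if result.returncode != 0:  # the code raised an error
        return 0.0
    return 1.0 if result.stdout.strip() == expected_stdout.strip() else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """GRPO-style advantage: (reward - group mean) / (group std + eps).

    Uses the population standard deviation; eps avoids division by zero
    when every completion in the group gets the same reward.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Usage: one group of sampled completions for a prompt whose answer is 4.
completions = ["print(2 + 2)", "print(5)", "raise ValueError()", "print(4)"]
rewards = [execution_reward(c, "4") for c in completions]
advantages = group_relative_advantages(rewards)
print(rewards)
print(advantages)
```

Completions that run and print the right answer get a positive advantage and the rest get a negative one, so the policy update pushes probability toward the working programs without needing a separate value model.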

Since the details of this RL and tool-use implementation don't yet seem to be fully covered in the project's README or other documentation, I thought these notes might help others trying to understand this part of the code. They could also serve as a starting point for future documentation updates.

Thanks again for your excellent work!
I hope these notes can be of some value to the community.
