Hi! I’ve been following LightThinker and making some modifications. I have two questions:
(i) I noticed that you seem to have implemented rolling_rope, but the default is "false". May I ask whether the "true" path is fully implemented (and whether it applies only to inference, or to both training and inference)? If it is implemented, did your results show that "false" performed better than "true"? If it isn't implemented yet, I plan to implement it myself and see.
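To make sure we're talking about the same thing, here is a minimal sketch of what I understand "rolling_rope" to mean: after the cache is compressed, the RoPE position ids for new tokens continue from the compressed cache length instead of the original absolute position. The function name and signature below are mine, not from the repo:

```python
def next_position_ids(kept_cache_len, orig_seen_len, num_new, rolling):
    """Position ids for `num_new` incoming tokens after a compression step.

    kept_cache_len: KV entries remaining after compression
    orig_seen_len:  tokens processed so far (pre-compression count)
    rolling=True  -> positions restart right after the compressed cache
    rolling=False -> positions keep the original absolute numbering
    """
    start = kept_cache_len if rolling else orig_seen_len
    return list(range(start, start + num_new))

# e.g. 100 tokens seen, compressed down to a 10-entry cache, 3 new tokens:
print(next_position_ids(10, 100, 3, rolling=True))   # [10, 11, 12]
print(next_position_ids(10, 100, 3, rolling=False))  # [100, 101, 102]
```

Is this roughly the intended semantics of the flag?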
(ii) For bs17k, more than half of the data exceeds a length of 4096. The code truncates such samples directly, and it seems the [EOS] token gets truncated away as well. Could this bias how vanilla and LightThinker learn to close out an answer after solving the question? I'm also considering other suitable training datasets.
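For reference, one workaround I'm considering is truncating while always keeping the final [EOS], so the model still sees how answers terminate. This is a hypothetical helper of my own, not code from the repo:

```python
def truncate_keep_eos(ids, max_len, eos_id):
    """Truncate a token-id sequence to max_len, but reserve the last
    slot for eos_id so the closing token is never dropped.
    (Hypothetical sketch; assumes `ids` ends with eos_id when complete.)
    """
    if len(ids) <= max_len:
        return ids
    return ids[: max_len - 1] + [eos_id]

# e.g. a 6-token sample truncated to 4, with eos_id = 2:
print(truncate_keep_eos([5, 9, 7, 3, 8, 2], 4, 2))  # [5, 9, 7, 2]
```

Would this change anything in your experiments, or did you find plain truncation to be harmless in practice?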
BTW, I wanted to share that in LightThinker (token), using step6_comp2, I found through my experiments that step6 can actually be much larger, and the results can even improve: e.g., in my training and testing, step32_comp2 outperformed step6_comp2.