Fix combine=True masking only last assistant turn in multi-turn finetuning #803
Open
Mr-Neutr0n wants to merge 1 commit into zai-org:main from
Conversation
…onversations

When combine=True in process_batch(), the label masking logic previously found only the last occurrence of the assistant token (151337) and unmasked the tokens after it. In multi-turn conversations, all earlier assistant responses were therefore masked out with -100 and did not contribute to the training loss: the model was effectively learning only from the final assistant reply, wasting all intermediate assistant turns.

This fix iterates through the full token sequence and unmasks every assistant response segment (from each 151337 marker through its corresponding 151336 end token), matching the behavior of the non-combine branch, which correctly trains on all assistant turns.
Summary

When `combine=True` in `process_batch()`, the current label masking logic finds only the last occurrence of the assistant role token (151337) and unmasks the tokens after it. In multi-turn conversations this means all earlier assistant responses are masked with `-100` and never contribute to the training loss; the model effectively learns only from the final assistant reply, which wastes all intermediate assistant turns.

This is inconsistent with the non-combine branch, which correctly sets `loss_mask_val = True` for every assistant message.

Fix
Instead of searching backward for the last `151337`, the fix iterates forward through the full token sequence and unmasks every assistant response segment, from each assistant role token (151337) through its corresponding end-of-turn token (151336). This matches the per-message masking behavior of the non-combine path.

Impact
Anyone using `combine: true` in their finetuning config with multi-turn conversation data was silently losing training signal from all assistant turns except the last one. This fix ensures all assistant responses contribute to the loss as intended.
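The actual diff is not shown in this page, but the before/after masking behavior described above can be sketched roughly as follows. The token IDs (151337, 151336) and the `-100` ignore index come from the PR description; the function names and the toy sequence are illustrative, not the real `process_batch()` code:

```python
ASSISTANT_TOKEN = 151337  # assistant role token (per the PR description)
EOT_TOKEN = 151336        # end-of-turn token (per the PR description)
IGNORE_INDEX = -100       # label value ignored by the cross-entropy loss

def mask_last_turn_only(input_ids):
    """Buggy behavior: unmask only the tokens after the LAST assistant marker."""
    labels = [IGNORE_INDEX] * len(input_ids)
    last = max(i for i, t in enumerate(input_ids) if t == ASSISTANT_TOKEN)
    for i in range(last + 1, len(input_ids)):
        labels[i] = input_ids[i]
    return labels

def mask_all_turns(input_ids):
    """Fixed behavior: unmask every assistant segment, from each 151337
    marker through its corresponding 151336 end token (inclusive)."""
    labels = [IGNORE_INDEX] * len(input_ids)
    in_response = False
    for i, tok in enumerate(input_ids):
        if tok == ASSISTANT_TOKEN:
            in_response = True        # segment starts; the marker itself stays masked
        elif in_response:
            labels[i] = tok           # train on this assistant token
            if tok == EOT_TOKEN:
                in_response = False   # segment (including its end token) is done
    return labels

# Toy two-turn sequence; IDs other than the two markers are arbitrary.
demo = [1, ASSISTANT_TOKEN, 11, 12, EOT_TOKEN, 2, ASSISTANT_TOKEN, 21, EOT_TOKEN]
```

On `demo`, the buggy version leaves the first assistant turn (tokens 11 and 12) at `-100`, while the fixed version unmasks both assistant segments, matching the per-message masking of the non-combine path.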