Bound transcript memory during checkpointing#28
Closed
rasmusfaber wants to merge 1 commit into
Closed
Conversation
Author
|
Closing to reopen against upstream origin. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR contains:
What is the current behavior? (You can also link to an open issue here)
Long-running samples keep the full transcript resident in Python memory, and checkpointing historically depended on that resident transcript when writing host context. That blocks bounded-memory execution: evicted events can be lost unless another durable event history is available, and retry/display/compatibility callers still need a way to read full history when necessary.
What is the new behavior?
Bounded transcript mode keeps only a resident tail in memory while using the buffer database as the provider for full-history compatibility. Checkpointing now maintains its own incremental event store for host snapshots, including pooled model inputs/calls and attachments, so checkpoint fires do not need to re-walk an ever-growing resident transcript.
Does this PR introduce a breaking change? (What changes might users need to make in their application due to this PR?)
No. Bounded transcript mode remains opt-in for now, and existing transcript access patterns continue to work through provider-backed compatibility paths when buffer history is available.
Other information:
This is PR 3 of 3 in the event-store transcript split and is stacked on the buffer-history PR. It contains the bounded transcript integration, buffer-backed transcript history provider, retry-history behavior, and checkpoint event-store rewrite that need to land together to preserve bounded-memory checkpointing.