
refactor(agent): improve notebook prompts and tool response clarity#351

Open
moyiliyi wants to merge 4 commits into plmbr:main from moyiliyi:exploration_prompts


@moyiliyi
Contributor

@moyiliyi moyiliyi commented May 28, 2026

Summary

This PR combines three related improvements to notebook-agent behavior and tool response clarity.

  1. Refines the prompt instructions for notebook editing and execution workflows.
  2. Changes add-code-cell and add-markdown-cell to return the inserted cellIndex when available, improving traceability in iterative notebook editing.
  3. Adds UTF-8-safe output truncation to read_file so large file reads stay within an output budget, with regression tests covering truncation, multibyte content, and non-text files.

Motivation

  1. In a data analysis and modeling task, the agent followed the prompt by first stating a high-level goal and then proceeding to generate the entire notebook implementation in one pass. For exploratory notebook work, this is not a reasonable default behavior. The notebook instructions should instead encourage incremental analysis, intermediate validation, and task-adaptive planning rather than upfront full-pipeline code generation.

  2. The add-cell tool actions previously returned a generic success response, which made it harder to reference a newly inserted cell in subsequent steps. Returning the inserted cellIndex improves traceability for iterative notebook workflows.

  3. read_file could emit arbitrarily large outputs, which hurts tool usability and can flood the model's context window.

What changed

  • Updated NOTEBOOK_EDIT_INSTRUCTIONS to clarify when to prefer exploratory workflows vs. construction workflows.
  • Updated NOTEBOOK_EXECUTE_INSTRUCTIONS to encourage smaller, iterative execution cycles for exploratory/scientific tasks.
  • Updated notebook add_code_cell and add_markdown_cell responses to return { cellIndex } when the UI command provides it.
  • Updated Python tool wrappers and Claude tool handlers to surface the inserted cell index in tool responses.
  • Added max_output_tokens to read_file, with UTF-8-safe truncation and an [output truncated] marker.
  • Added regression tests for:
    • normal small-file reads
    • oversized output truncation
    • multibyte UTF-8 content
    • non-UTF-8 files
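A minimal Python sketch of the UTF-8-safe truncation described above. It assumes the budget is counted in UTF-8 bytes (the parameter is named max_output_tokens, and its exact unit is not spelled out here), and the names truncate_utf8 and TRUNCATION_MARKER are hypothetical rather than the PR's own code. The bytes are cut at the budget and the decoder drops any incomplete trailing sequence, so no broken character reaches the output.

```python
# Hypothetical marker text; the PR appends an "[output truncated]" marker.
TRUNCATION_MARKER = "\n[output truncated]"


def truncate_utf8(text: str, max_bytes: int, marker: str = TRUNCATION_MARKER) -> str:
    """Return text unchanged if its UTF-8 encoding fits in max_bytes.

    Otherwise return the longest whole-character prefix that, together with
    the marker, fits in max_bytes (assuming the marker alone fits).
    """
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # Reserve room for the marker before cutting the content.
    budget = max(max_bytes - len(marker.encode("utf-8")), 0)
    # The input is valid UTF-8, so the only undecodable bytes can be a
    # multibyte character split at the cut; errors="ignore" drops them.
    prefix = encoded[:budget].decode("utf-8", errors="ignore")
    return prefix + marker
```

For example, truncate_utf8("é" * 100, 30) keeps five whole "é" characters (10 bytes) plus the 19-byte marker instead of splitting the sixth character.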

@moyiliyi moyiliyi changed the title Refine notebook editing and execution prompts refactor(prompts): Refine notebook editing and execution prompts May 28, 2026
@moyiliyi moyiliyi changed the title refactor(prompts): Refine notebook editing and execution prompts refactor(agent): improve notebook prompts and add-cell response feedback May 28, 2026
@moyiliyi moyiliyi changed the title refactor(agent): improve notebook prompts and add-cell response feedback refactor(agent): improve notebook prompts and tool feedback May 30, 2026
@moyiliyi moyiliyi changed the title refactor(agent): improve notebook prompts and tool feedback refactor(agent): improve notebook prompts and tool response clarity May 30, 2026
@pjdoland pjdoland added the enhancement New feature or request label Jun 2, 2026
@pjdoland
Collaborator

pjdoland commented Jun 2, 2026

Really nice work on this, and thank you for taking the time to include tests. It's a pleasure to review.

A few notes from reading through it:

Returning the inserted cell index is a great idea, and the code looks spot on. I followed it all the way through: newCellIndex is set correctly in both spots in src/index.ts, it matches where the cell actually lands, it comes back to Python as an int so the isinstance(cell_index, int) checks pass, and those checks also fall back gracefully to the old message if the response isn't what's expected. I especially like that the index is immediately useful, since the other notebook tools (run_cell, get_cell_output, get_cell_type_and_source, set_cell_type_and_source, delete_cell, insert_cell) all take a cell index, so the agent can act on the new cell right away instead of having to ask for the cell count first. Small change, real quality-of-life win.
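The Python side of that fallback might look roughly like the sketch below. Only the {cellIndex} payload and the isinstance(cell_index, int) check come from this thread; the helper name, the message wording, and the extra bool guard are assumptions for illustration.

```python
from typing import Any


def format_add_cell_response(kind: str, response: Any) -> str:
    """Turn an add-cell UI response into a tool message (hypothetical helper).

    Expects {"cellIndex": <int>} and falls back to the old generic message
    when the payload is missing or malformed.
    """
    cell_index = response.get("cellIndex") if isinstance(response, dict) else None
    # bool is a subclass of int in Python, so rule it out explicitly.
    if isinstance(cell_index, int) and not isinstance(cell_index, bool):
        return f"Added {kind} cell at index {cell_index}."
    return f"Added {kind} cell."
```

Because the index arrives as an int in the message, the agent can pass it straight to run_cell or get_cell_output without first asking for the cell count.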

The read_file size limit is done with real care. The way it trims on UTF-8 boundaries avoids producing broken characters, and the result stays within the byte budget. That multibyte test checking every visible character is still whole is a lovely touch.

Two things on read_file I'd gently float for your consideration:

  1. After it trims, the header still claims the full file. By the time the output is cut down, end_line has already been set to the last line of the file, so the header still says (lines X-Y) for the whole range even though only part of it is actually there, and [output truncated] doesn't say where it stopped. Since read_file normally lets you read a specific line range, the agent doesn't have an easy way to pick up where it left off. It might help to show the range that was really included (say, the last line that made it in) so the model can do a follow-up read for the rest.

  2. max_output_tokens ends up being something the model can set. Because it's an argument on the tool, it shows up in what the model sees, so the model can change it on any call. A large value quietly cancels out the limit you're enforcing, and a very small value hits the path that trims the header itself (and for the tiniest values returns just a piece of the [output truncated] text with no file content). If the limit is mostly there to keep file reads from flooding the context, it might be simpler to keep it as a fixed default on the server side rather than something the model passes in, or at least set a minimum so the header and marker always survive. And if you'd rather keep it adjustable, a quick test for the very-small case would cover that branch nicely, since the current tests only exercise the normal trimming path.
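Both suggestions can be sketched together in Python. This assumes the file is already split into lines that keep their trailing newlines, and that the budget applies to the body only, not the header. The constants, effective_budget, keep_whole_lines, format_read, and the resume hint are all hypothetical, not code or values from the PR.

```python
from typing import List, Optional, Tuple

# Hypothetical values: a fixed server-side cap, plus a floor so the header
# and truncation hint can never be squeezed out.
DEFAULT_MAX_OUTPUT_BYTES = 32_000
MIN_OUTPUT_BYTES = 256


def effective_budget(override: Optional[int] = None) -> int:
    """Return the byte budget for read_file output.

    The model never supplies this value; override exists only for
    server-side callers such as tests, and it is clamped to
    [MIN_OUTPUT_BYTES, DEFAULT_MAX_OUTPUT_BYTES].
    """
    if override is None:
        return DEFAULT_MAX_OUTPUT_BYTES
    return max(MIN_OUTPUT_BYTES, min(override, DEFAULT_MAX_OUTPUT_BYTES))


def keep_whole_lines(lines: List[str], start_line: int, max_bytes: int) -> Tuple[str, int]:
    """Keep whole lines while their total UTF-8 size fits in max_bytes.

    Returns the kept text and the number of the last line actually
    included (start_line - 1 if none fit).
    """
    kept: List[str] = []
    used = 0
    last_line = start_line - 1
    for offset, line in enumerate(lines):
        size = len(line.encode("utf-8"))
        if used + size > max_bytes:
            break
        kept.append(line)
        used += size
        last_line = start_line + offset
    return "".join(kept), last_line


def format_read(path: str, lines: List[str], start_line: int,
                override: Optional[int] = None) -> str:
    """Render a read whose header reports only the lines really shown."""
    body, last_line = keep_whole_lines(lines, start_line, effective_budget(override))
    header = f"{path} (lines {start_line}-{last_line})"
    end_line = start_line + len(lines) - 1
    if last_line < end_line:
        # Tell the model exactly where a follow-up read should resume.
        body += f"[output truncated; continue from line {last_line + 1}]\n"
    return header + "\n" + body
```

Stopping at whole lines keeps the header accurate and the resume point unambiguous, but a single line longer than the entire budget would still need character-level truncation within that line.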

The notebook instruction updates toward smaller, step-by-step exploration read really well too, and splitting the guidance into exploratory versus more defined tasks is a thoughtful distinction.

Overall this is clean, careful, well-tested work. The cell-index part looks ready to go as is, and the two read_file notes above are just the things I'd look at before merging. Thanks again for the contribution.

@moyiliyi
Contributor Author

moyiliyi commented Jun 2, 2026

Hi @pjdoland,

I’ve just pushed a follow-up fix based on your feedback.

The truncation marker now shows where the output stopped: [output truncated within line {line} column {column}]. In addition, max_output_tokens is no longer exposed in the public tool schema.
I also updated the tests accordingly.
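For illustration, the line and column in that marker can be derived from the kept prefix alone. This sketch assumes 1-based numbering, counts columns in Unicode code points, and reports the position of the first character that was cut; whether the PR reports that position or the last character kept is not stated in this thread. The function name is hypothetical.

```python
def truncation_marker(prefix: str, start_line: int) -> str:
    """Build a marker pointing at the first character omitted after prefix.

    start_line is the 1-based file line on which prefix begins, and prefix
    is assumed to start at column 1 of that line.
    """
    # Each newline kept in the prefix advances the line number by one.
    line = start_line + prefix.count("\n")
    # Characters kept on the final (partial) line; rfind returns -1 when the
    # prefix has no newline, so this also covers the single-line case.
    kept_on_line = len(prefix) - (prefix.rfind("\n") + 1)
    column = kept_on_line + 1
    return f"[output truncated within line {line} column {column}]"
```

With a kept prefix of "abc\nde" that began on line 5, the marker reads [output truncated within line 6 column 3], so a follow-up read can start at line 6.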

Let me know if this looks better now.

Collaborator

@pjdoland pjdoland left a comment


LGTM.
