refactor(agent): improve notebook prompts and tool response clarity by moyiliyi · Pull Request #351 · plmbr/notebook-intelligence

moyiliyi · 2026-05-28T15:31:20Z

Summary

This PR combines three related improvements to notebook-agent behavior and tool response clarity.

- Refines the prompt instructions for notebook editing and execution workflows.
- Changes add-code-cell and add-markdown-cell to return the inserted cellIndex when available, improving traceability in iterative notebook editing.
- Adds UTF-8-safe output truncation to read_file so large file reads stay within an output budget, with regression tests covering truncation, multibyte content, and non-text files.

Motivation

- In a data analysis and modeling task, the agent followed the prompt by stating a high-level goal and then generating the entire notebook implementation in one pass. For exploratory notebook work, this is not a reasonable default. The notebook instructions should instead encourage incremental analysis, intermediate validation, and task-adaptive planning rather than upfront full-pipeline code generation.
- The add-cell tool actions previously returned generic success responses, which made it harder to reference newly inserted cells in subsequent steps. Returning the inserted cellIndex improves traceability for iterative notebook workflows.
- read_file could emit very large outputs, which hurts tool usability and makes the model's context harder to manage.

What changed

- Updated NOTEBOOK_EDIT_INSTRUCTIONS to clarify when to prefer exploratory workflows vs. construction workflows.
- Updated NOTEBOOK_EXECUTE_INSTRUCTIONS to encourage smaller, iterative execution cycles for exploratory/scientific tasks.
- Updated notebook add_code_cell and add_markdown_cell responses to return { cellIndex } when the UI command provides it.
- Updated Python tool wrappers and Claude tool handlers to surface the inserted cell index in tool responses.
- Added max_output_tokens to read_file, with UTF-8-safe truncation and an [output truncated] marker.
- Added regression tests for:
  - normal small-file reads
  - oversized output truncation
  - multibyte UTF-8 content
  - non-UTF-8 files
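
For readers unfamiliar with the technique, UTF-8-safe truncation can be sketched roughly as below. This is a minimal illustration, not the code in this PR: the function name `truncate_utf8` and the byte-based `max_bytes` parameter are assumptions (the PR itself names the argument max_output_tokens), and only the `[output truncated]` marker text comes from the description above.

```python
# Marker text from the PR description; the leading newline is an assumption.
TRUNCATION_MARKER = "\n[output truncated]"


def truncate_utf8(text: str, max_bytes: int, marker: str = TRUNCATION_MARKER) -> str:
    """Cut text to roughly max_bytes of UTF-8 without splitting a character.

    Hypothetical helper for illustration only.
    """
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text  # small outputs pass through unchanged

    # Reserve room for the marker. For budgets smaller than the marker the
    # result is the marker alone, which then exceeds max_bytes.
    budget = max(max_bytes - len(marker.encode("utf-8")), 0)

    # Slicing bytes may cut a multibyte character in half; errors="ignore"
    # drops that trailing partial sequence instead of raising.
    kept = encoded[:budget].decode("utf-8", errors="ignore")
    return kept + marker
```

Decoding with `errors="ignore"` is one simple way to land on a character boundary; walking back over continuation bytes by hand would work equally well.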

pjdoland · 2026-06-02T01:06:53Z

Really nice work on this, and thank you for taking the time to include tests. It's a pleasure to review.

A few notes from reading through it:

Returning the inserted cell index is a great idea, and the code looks spot on. I followed it all the way through: newCellIndex is set correctly in both spots in src/index.ts, it matches where the cell actually lands, it comes back to Python as an int so the isinstance(cell_index, int) checks pass, and those checks also fall back gracefully to the old message if the response isn't what's expected. I especially like that the index is immediately useful, since the other notebook tools (run_cell, get_cell_output, get_cell_type_and_source, set_cell_type_and_source, delete_cell, insert_cell) all take a cell index, so the agent can act on the new cell right away instead of having to ask for the cell count first. Small change, real quality-of-life win.
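
To make the fallback behavior described above concrete, here is a rough sketch of how a wrapper might surface the index. The function name `format_add_cell_response` and the message strings are hypothetical; only the `cellIndex` key and the `isinstance(cell_index, int)` check come from this PR.

```python
from typing import Any


def format_add_cell_response(response: Any, cell_kind: str = "code") -> str:
    """Build a tool message for an add-cell call (illustrative only)."""
    # The UI command may return {"cellIndex": n}, or something else entirely.
    cell_index = response.get("cellIndex") if isinstance(response, dict) else None

    # bool is a subclass of int in Python; excluding it is an extra guard
    # added for this sketch, not something the PR is known to do.
    if isinstance(cell_index, int) and not isinstance(cell_index, bool):
        return f"Added {cell_kind} cell at index {cell_index}"

    # Fall back to the generic message when no usable index came back.
    return f"Added {cell_kind} cell"
```

With this shape, `format_add_cell_response({"cellIndex": 3})` yields a message carrying the index, while an empty or malformed response degrades to the old generic wording.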

The read_file size limit is done with real care. The way it trims on UTF-8 boundaries avoids producing broken characters, and the result stays within the byte budget. That multibyte test checking every visible character is still whole is a lovely touch.

Two things on read_file I'd gently float for your consideration:

After it trims, the header still claims the full file. By the time the output is cut down, end_line has already been set to the last line of the file, so the header still says (lines X-Y) for the whole range even though only part of it is actually there, and [output truncated] doesn't say where it stopped. Since read_file normally lets you read a specific line range, the agent doesn't have an easy way to pick up where it left off. It might help to show the range that was really included (say, the last line that made it in) so the model can do a follow-up read for the rest.
max_output_tokens ends up being something the model can set. Because it's an argument on the tool, it shows up in what the model sees, so the model can change it on any call. A large value quietly cancels out the limit you're enforcing, and a very small value hits the path that trims the header itself (and for the tiniest values returns just a piece of the [output truncated] text with no file content). If the limit is mostly there to keep file reads from flooding the context, it might be simpler to keep it as a fixed default on the server side rather than something the model passes in, or at least set a minimum so the header and marker always survive. And if you'd rather keep it adjustable, a quick test for the very-small case would cover that branch nicely, since the current tests only exercise the normal trimming path.
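
One possible shape for both suggestions, sketched under assumptions: the budget is clamped to a server-side floor, and the header reports the range that actually made it into the output. The names `read_lines_with_budget` and `MIN_BUDGET_BYTES`, the header format, and the marker wording are all hypothetical and not taken from this PR.

```python
MIN_BUDGET_BYTES = 256  # hypothetical floor so tiny budgets cannot starve the output


def read_lines_with_budget(lines: list[str], start_line: int, max_bytes: int) -> str:
    """Return whole lines from start_line (1-based) whose body fits the budget.

    The budget covers the body only, not the header or marker. A single line
    longer than the budget yields no content; cutting within a line is out of
    scope for this sketch.
    """
    budget = max(max_bytes, MIN_BUDGET_BYTES)
    kept: list[str] = []
    used = 0
    for line in lines[start_line - 1:]:
        size = len(line.encode("utf-8")) + 1  # +1 for the newline
        if used + size > budget:
            break
        kept.append(line)
        used += size

    last_line = start_line + len(kept) - 1
    total = len(lines)
    # The header reflects what was really included, not the requested range.
    header = f"(lines {start_line}-{last_line} of {total})"
    body = "\n".join(kept)
    if last_line < total:
        return f"{header}\n{body}\n[output truncated; continue from line {last_line + 1}]"
    return f"{header}\n{body}"
```

Stopping on whole lines keeps the follow-up read simple, since the model can request the next range starting at the line named in the marker.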

The notebook instruction updates toward smaller, step-by-step exploration read really well too, and splitting the guidance into exploratory versus more defined tasks is a thoughtful distinction.

Overall this is clean, careful, well-tested work. The cell-index part looks ready to go as is, and the two read_file notes above are just the things I'd look at before merging. Thanks again for the contribution.

moyiliyi · 2026-06-02T11:55:22Z

Hi @pjdoland,

I’ve just pushed a follow-up fix based on your feedback.

The truncation marker now shows where the output stopped, in the form [output truncated within line {line} column {column}], and max_output_tokens is no longer exposed in the public tool schema.
I also updated the tests accordingly.
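
As an illustration of how such a marker could be derived, the sketch below computes a 1-based line and column from the text that was kept. The function name `truncation_marker` is hypothetical, and treating the column as the position of the first omitted character is an assumption; the comment above does not say which convention the PR uses.

```python
def truncation_marker(full_text: str, kept_chars: int) -> str:
    """Describe where output stopped as a 1-based line and column.

    Illustrative only: the column here points at the first omitted character.
    """
    kept = full_text[:kept_chars]
    # Each newline in the kept text advances the line number by one.
    line = kept.count("\n") + 1
    # Characters kept on the current line, plus one for the next position.
    # rfind returns -1 when there is no newline, so the offset becomes 0.
    column = len(kept) - (kept.rfind("\n") + 1) + 1
    return f"[output truncated within line {line} column {column}]"
```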

Let me know if this looks better now.

pjdoland

LGTM.

Refine notebook prompt workflows

2c6157b

moyiliyi changed the title ~~Refine notebook editing and execution prompts~~ refactor(prompts): Refine notebook editing and execution prompts May 28, 2026

Return inserted cell index from add-cell commands

f3e4312

moyiliyi changed the title ~~refactor(prompts): Refine notebook editing and execution prompts~~ refactor(agent): improve notebook prompts and add-cell response feedback May 28, 2026

feat(read-file): cap output size and add truncation tests

fcafb38

moyiliyi changed the title ~~refactor(agent): improve notebook prompts and add-cell response feedback~~ refactor(agent): improve notebook prompts and tool feedback May 30, 2026

moyiliyi changed the title ~~refactor(agent): improve notebook prompts and tool feedback~~ refactor(agent): improve notebook prompts and tool response clarity May 30, 2026

pjdoland added the enhancement (New feature or request) label Jun 2, 2026

Refine read_file truncation semantics and strengthen regression test coverage

ef83061

pjdoland approved these changes Jun 3, 2026

moyiliyi wants to merge 4 commits into plmbr:main from moyiliyi:exploration_prompts
