
Conversation

@datvo06 (Contributor) commented Feb 6, 2026

Attempting to close #433. The current implementation captures the template docstring at __apply__ time, attaches it to the synthesized function's docstring, and runs the doctests it contains.
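
Roughly, it looks like the following sketch; the names (`run_template_doctests`, the `template`/`synthesized` objects, and how they are wired together) are illustrative placeholders rather than the actual API in this PR:

```python
import doctest


def run_template_doctests(template, synthesized):
    """Sketch: copy the template's docstring onto the synthesized function
    and run any doctest examples it contains against that function."""
    synthesized.__doc__ = template.__doc__  # carry the doctests over

    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    # The examples in the template docstring refer to the template by name;
    # bind that name to the synthesized implementation instead.
    globs = {getattr(template, "__name__", "template"): synthesized}
    for test in finder.find(synthesized, name="synthesized", globs=globs):
        runner.run(test)

    failed, attempted = runner.summarize(verbose=False)
    if failed:
        raise AssertionError(f"{failed} of {attempted} doctest example(s) failed")
    return synthesized
```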

@datvo06 changed the title from "[Draft PR] Integrating Doctest" to "Integrating Doctest" Feb 9, 2026
@eb8680 (Contributor) left a comment

This is neat, but I'm not sure it makes sense to use doctests in the way that's proposed in this PR (or even my original issue #433), because it departs from the semantics they have for ordinary Python functions.

Instead, there are two cases where it would be more natural to incorporate doctests as semantic constraints on Templates. Both may only be useful if the doctests are stripped from the prompt during the constraint-satisfaction step, so that the model cannot simply memorize them (a sketch of that stripping step follows the list):

  1. When implementing Templates using ordinary tool-calling, the doctests should induce a prefix to the conversation in which the LLM learns to satisfy the semantic constraints in the doctests. This only needs to happen once per Template definition, rather than once per call, and should be thought of as augmenting the system prompt associated with that Template.
  2. When generating and executing code to implement a Template (as in #526), the generated code should be required to pass the doctests to be valid. This needs to happen once per code generation step, which could either happen once per Template definition or once per Template call.
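
For concreteness, here is a minimal sketch of the stripping step using the standard library's doctest parser (the function name is illustrative, not an existing API):

```python
import doctest


def split_doctests(docstring):
    """Sketch: split a docstring into (prose, examples). The prose goes into
    the prompt, while the doctest examples are held back as semantic
    constraints so the model cannot simply memorize the expected outputs."""
    parser = doctest.DocTestParser()
    pieces = parser.parse(docstring or "")
    prose = "".join(piece for piece in pieces if isinstance(piece, str))
    examples = [piece for piece in pieces if isinstance(piece, doctest.Example)]
    return prose, examples
```

Each `doctest.Example` carries the input as `.source` and the expected output as `.want`, which is what cases 1 and 2 above would check against.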

@datvo06 (Contributor, Author) commented Feb 9, 2026

Thanks, that seems more comprehensive and better suited to our intended use cases! I'm implementing the following:

  1. Strip the doctest examples out of the docstring, so the LLM is prevented from memorizing the expected outputs instead of reasoning about them.
  2. In the ordinary template call with tools: extract the input/output pairs, feed each input to the LLM, check its output against the expected one, and put these exchanges in a conversation prefix that precedes the actual call, like a mini ReAct agent (see the sketch after this list).
  3. In the synthesis case, cache the doctests once per template and run them on each synthesis call.
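
A rough sketch of step 2, with `query_llm` and the message format standing in for whatever the template/tool-calling machinery actually provides:

```python
def doctest_warmup_prefix(examples, query_llm):
    """Sketch: for each held-back doctest example, ask the model to evaluate
    the input, compare its answer against the expected output, and record the
    exchange as a conversation prefix for the actual template call."""
    prefix = []
    for example in examples:  # doctest.Example objects with .source / .want
        prompt = f"Evaluate the following and reply with the result only:\n{example.source}"
        answer = query_llm(prompt)
        if answer.strip() == example.want.strip():
            feedback = "Correct."
        else:
            feedback = f"Incorrect. The expected output is:\n{example.want}"
        prefix += [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": answer},
            {"role": "user", "content": feedback},
        ]
    return prefix
```

For step 3, the parsed examples could be cached on the Template at definition time and re-run on every synthesis call, along the lines of the sketch in the PR description.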

@eb8680 (Contributor) commented Feb 9, 2026

These ideas make sense at a high level, but I think this feature could use some additional design discussion sometime this week before we invest a lot more time in the implementation. We probably also want to resolve the other open issues #497 #513 #541 #548 #549 first in order for this functionality to be useful in practice.

@datvo06 (Contributor, Author) commented Feb 9, 2026

Got it. I have drafted some of the implementation, but I think it will work better once we have more tools for message manipulation. I'll check whether I can take on any of the issues above.

@datvo06 marked this pull request as draft February 9, 2026 23:13

Development

Successfully merging this pull request may close these issues.

Support and exploit doctests in higher-order Template specifications
