
Conversation

@datvo06 (Contributor) commented Feb 6, 2026

Attempting to close #433. The current implementation captures the template docstring at __apply__ time, attaches it to the synthesized function's docstring, and runs the doctests it contains.
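
Roughly, it looks like the following sketch; the names (`run_template_doctests`, the `template`/`synthesized` objects, and how they are wired together) are illustrative placeholders rather than the actual API in this PR:

```python
import doctest


def run_template_doctests(template, synthesized):
    """Sketch: copy the template's docstring onto the synthesized function
    and run any doctest examples it contains against that function."""
    synthesized.__doc__ = template.__doc__  # carry the doctests over

    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    # The examples in the template docstring refer to the template by name;
    # bind that name to the synthesized implementation instead.
    globs = {getattr(template, "__name__", "template"): synthesized}
    for test in finder.find(synthesized, name="synthesized", globs=globs):
        runner.run(test)

    failed, attempted = runner.summarize(verbose=False)
    if failed:
        raise AssertionError(f"{failed} of {attempted} doctest example(s) failed")
    return synthesized
```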

@datvo06 changed the title from "[Draft PR] Integrating Doctest" to "Integrating Doctest" Feb 9, 2026
@eb8680 (Contributor) left a comment

This is neat, but I'm not sure it makes sense to use doctests in the way that's proposed in this PR (or even my original issue #433), because it departs from the semantics they have for ordinary Python functions.

Instead, there are two cases where it would be more natural to incorporate doctests as semantic constraints on Templates. Both may only be useful if the doctests are stripped from the prompt during the constraint-satisfaction step, so that the model cannot simply memorize them (a sketch of that stripping step follows the list):

  1. When implementing Templates using ordinary tool-calling, the doctests should induce a prefix to the conversation in which the LLM learns to satisfy the semantic constraints in the doctests. This only needs to happen once per Template definition, rather than once per call, and should be thought of as augmenting the system prompt associated with that Template.
  2. When generating and executing code to implement a Template (as in #526), the generated code should be required to pass the doctests to be valid. This needs to happen once per code generation step, which could either happen once per Template definition or once per Template call.
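
For concreteness, here is a minimal sketch of the stripping step using the standard library's doctest parser (the function name is illustrative, not an existing API):

```python
import doctest


def split_doctests(docstring):
    """Sketch: split a docstring into (prose, examples). The prose goes into
    the prompt, while the doctest examples are held back as semantic
    constraints so the model cannot simply memorize the expected outputs."""
    parser = doctest.DocTestParser()
    pieces = parser.parse(docstring or "")
    prose = "".join(piece for piece in pieces if isinstance(piece, str))
    examples = [piece for piece in pieces if isinstance(piece, doctest.Example)]
    return prose, examples
```

Each `doctest.Example` carries the input as `.source` and the expected output as `.want`, which is what cases 1 and 2 above would check against.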

@datvo06 (Contributor, Author) commented Feb 9, 2026

Thanks, that seems more comprehensive and better suited to our intended use cases! I'm implementing the following:

  1. Strip the doctest examples out of the docstring, so the LLM is prevented from memorizing the expected outputs instead of reasoning about them.
  2. In the ordinary template call with tools: extract the input/output pairs, feed each input to the LLM, check its output against the expected one, and put these exchanges in a conversation prefix that precedes the actual call, like a mini ReAct agent (see the sketch after this list).
  3. In the synthesis case, cache the doctests once per template and run them on each synthesis call.
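
A rough sketch of step 2, with `query_llm` and the message format standing in for whatever the template/tool-calling machinery actually provides:

```python
def doctest_warmup_prefix(examples, query_llm):
    """Sketch: for each held-back doctest example, ask the model to evaluate
    the input, compare its answer against the expected output, and record the
    exchange as a conversation prefix for the actual template call."""
    prefix = []
    for example in examples:  # doctest.Example objects with .source / .want
        prompt = f"Evaluate the following and reply with the result only:\n{example.source}"
        answer = query_llm(prompt)
        if answer.strip() == example.want.strip():
            feedback = "Correct."
        else:
            feedback = f"Incorrect. The expected output is:\n{example.want}"
        prefix += [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": answer},
            {"role": "user", "content": feedback},
        ]
    return prefix
```

For step 3, the parsed examples could be cached on the Template at definition time and re-run on every synthesis call, along the lines of the sketch in the PR description.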

@eb8680 (Contributor) commented Feb 9, 2026

These ideas make sense at a high level, but I think this feature could use some additional design discussion sometime this week before we invest a lot more time in the implementation. We probably also want to resolve the other open issues #497 #513 #541 #548 #549 first in order for this functionality to be useful in practice.

@datvo06 (Contributor, Author) commented Feb 9, 2026

Got it. I have drafted some of the implementation, but I think it will work better once we have more tools for message manipulation. I'll check whether I can take on any of the issues above.

@datvo06 marked this pull request as draft February 9, 2026 23:13

Development

Successfully merging this pull request may close these issues.

Support and exploit doctests in higher-order Template specifications
