Skip to content

Jk/optimizations#5

Open
DhirenMhatre wants to merge 3 commits into
masterfrom
jk/optimizations
Open

Jk/optimizations#5
DhirenMhatre wants to merge 3 commits into
masterfrom
jk/optimizations

Conversation

@DhirenMhatre
Copy link
Copy Markdown

Fixes #


Read the full contributing guidelines: https://docs.langchain.com/oss/python/contributing/overview

All contributions must be in English. See the language policy.

If you paste a large clearly AI generated description here your PR may be IGNORED or CLOSED!

Thank you for contributing to LangChain! Follow these steps to have your pull request considered as ready for review.

  1. PR title: Should follow the format: TYPE(SCOPE): DESCRIPTION
  1. PR description:
  • Write 1-2 sentences summarizing the change.
  • The Fixes #xx line at the top is required for external contributions — update the issue number and keep the keyword. This links your PR to the approved issue and auto-closes it on merge.
  • If there are any breaking changes, please clearly describe them.
  • If this PR depends on another PR being merged first, please include "Depends on #PR_NUMBER" in the description.
  1. Run make format, make lint and make test from the root of the package(s) you've modified.
  • We will not consider a PR unless these three are passing in CI.
  1. How did you verify your code works?

Additional guidelines:

  • All external PRs must link to an issue or discussion where a solution has been approved by a maintainer, and you must be assigned to that issue. PRs without prior approval will be closed.
  • PRs should not touch more than one package unless absolutely necessary.
  • Do not update the uv.lock files or add dependencies to pyproject.toml files (even optional ones) unless you have explicit permission to do so by a maintainer.

Social handles (optional)

Twitter: @
LinkedIn: https://linkedin.com/in/

jkennedyvz and others added 3 commits February 27, 2026 21:08
Replace substring check `"azure.com" in url` with `urlparse`-based
hostname validation to prevent bypass via crafted URLs (CWE-20).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- get_child(): pass state directly to constructor instead of calling
  set_handlers/add_tags/add_metadata sequentially (-22%)
- add_tags(): use set-based dedup instead of O(n) list scans per tag,
  early return when tags list is empty (-70% on duplicate tags)
- handle_event/ahandle_event: early return when handlers list is empty
- _configure(): skip throwaway CallbackManager construction on the
  common path where inheritable_callbacks is provided

These are the top bottlenecks identified by profiling langgraph's
react_agent benchmark, where langchain_core callbacks account for ~35%
of runtime.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…et config keys

- Add copy-on-write (COW) to BaseCallbackManager.copy() — defer
  shallow copies of handlers/tags/metadata until first mutation
- Cache inspect.signature() results for BaseChatModel._generate and
  _agenerate to avoid repeated introspection per invoke
- Cache signature(self._run) and _get_runnable_config_param in
  ChildTool.__init__ to avoid per-invocation introspection
- Convert CONFIG_KEYS and COPIABLE_KEYS from lists to frozensets
  for O(1) membership checks
- Fast path in _format_for_tracing when no messages have list content

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@codity-dm
Copy link
Copy Markdown

codity-dm Bot commented May 19, 2026

Policy Check Failed

✗ 3/3 policy checks failed:

• Need 2 more approval(s) (0/2) — comment LGTM or approve via review
• Missing ticket reference (expected: JIRA-, ENG-, #*)
• 7 code file(s) changed but no test files added


To merge this PR:

  1. Address the failed checks listed above
  2. Ensure branch protection requires the codity/policy-check status

Configure policies in your dashboard

@codity-dm
Copy link
Copy Markdown

codity-dm Bot commented May 19, 2026

PR Summary

What Changed

  • Added copy-on-write semantics to CallbackManager to defer handler/metadata copying until mutation occurs.
  • Cached introspection results for chat model and tool method signatures to avoid repeated inspect.signature() calls.
  • Fixed Azure endpoint detection in DeepSeek integration to use proper URL parsing instead of substring matching.

Key Changes by Area

Callbacks: Replaced eager copying with lazy copy-on-write in CallbackManager and added fast-path returns for empty handler lists.

Chat Models: Added module-level cache for run manager parameter checks and early return in message formatting when no list content exists.

Tools: Added per-instance caching for _run signature introspection to avoid repeated reflection.

Security: Fixed _is_azure_endpoint in DeepSeek chat models to prevent hostname/subdomain bypass attacks via proper URL parsing.

Files Changed

File Changes Summary
libs/core/langchain_core/callbacks/base.py Added _cow_copy() method and copy-on-write logic for handlers/tags/metadata
libs/core/langchain_core/callbacks/manager.py Added empty handler guards in handle_event and ahandle_event
libs/core/langchain_core/language_models/chat_models.py Added cached introspection and fast-path message formatting
libs/core/langchain_core/runnables/config.py Minor changes supporting callback configuration
libs/core/langchain_core/tools/base.py Added per-instance signature caching for tool _run methods
libs/partners/deepseek/langchain_deepseek/chat_models.py Fixed _is_azure_endpoint to use proper URL parsing
libs/partners/deepseek/tests/unit_tests/test_chat_models.py Added tests for Azure endpoint detection fix

Review Focus Areas

  • Correctness of copy-on-write mutation detection in CallbackManager._cow_copy().
  • Cache invalidation logic for tool signature caching on instance reuse.
  • Security coverage of URL parsing edge cases in Azure endpoint detection.

Architecture

Design Decisions: Copy-on-write trades memory efficiency for slight code complexity. Caching introspection results accepts memory overhead for reduced CPU usage in hot paths. The Azure fix prioritizes security over backward compatibility with malformed URLs.

Risks: Copy-on-write relies on correct mutation detection. Missing a mutation path would cause shared state bugs. Cache keys for introspection must remain stable across Python versions.

Merge Status

MERGEABLE — PR Score 65/100, above threshold (50). All gates passed.

Comment on lines +992 to +1000
clone = self.__class__.__new__(self.__class__)
clone.handlers = self.handlers
clone.inheritable_handlers = self.inheritable_handlers
clone.parent_run_id = self.parent_run_id
clone.tags = self.tags
clone.inheritable_tags = self.inheritable_tags
clone.metadata = self.metadata
clone.inheritable_metadata = self.inheritable_metadata
clone._cow = True # noqa: SLF001
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functional High

The copy-on-write clone bypasses __init__, so subclass-specific state is silently dropped; use the normal constructor or a subclass hook to preserve required attributes.

Suggested fix
        return self.__class__(
            handlers=self.handlers.copy(),
            inheritable_handlers=self.inheritable_handlers.copy(),
            parent_run_id=self.parent_run_id,
            tags=self.tags.copy(),
            inheritable_tags=self.inheritable_tags.copy(),
            metadata=self.metadata.copy(),
            inheritable_metadata=self.inheritable_metadata.copy(),
        )
Prompt for AI assistance

Copy the prompt below and paste it into ChatGPT, Claude, or any LLM:

You are an expert python developer with deep knowledge of security, performance, and best practices.

### Context

File: libs/core/langchain_core/callbacks/base.py
Lines: 992-1000
Issue Type: functional-high
Severity: high

Issue Description:
The copy-on-write clone bypasses `__init__`, so subclass-specific state is silently dropped; use the normal constructor or a subclass hook to preserve required attributes.

Current Code:
        clone = self.__class__.__new__(self.__class__)
        clone.handlers = self.handlers
        clone.inheritable_handlers = self.inheritable_handlers
        clone.parent_run_id = self.parent_run_id
        clone.tags = self.tags
        clone.inheritable_tags = self.inheritable_tags
        clone.metadata = self.metadata
        clone.inheritable_metadata = self.inheritable_metadata
        clone._cow = True  # noqa: SLF001

---

### Instructions

1. Fix the issue described above
2. Maintain the exact indentation and code style from the original
3. Follow python best practices and language-specific idioms
4. Ensure the fix addresses the root cause, not just the symptoms
5. Add brief inline comments explaining the fix if needed

### Constraints

- Do not change functionality beyond fixing the identified issue
- Preserve existing variable names and function signatures unless they are part of the problem
- Ensure the fix is production-ready

---


Like Dislike Create Issue Jira

@codity-dm
Copy link
Copy Markdown

codity-dm Bot commented May 19, 2026

Security Scan Summary

Metric Value
Vulnerabilities Critical: 0
Overall Risk Clean
Files Scanned 7

No critical security issues detected

Scan completed in 24.7s

Security scan powered by Codity.ai

@codity-dm
Copy link
Copy Markdown

codity-dm Bot commented May 19, 2026

License Compliance Scan

Metric Value
Packages Scanned 0
High Risk (Strong Copyleft) 0
Medium Risk (Weak Copyleft) 0
Low Risk (Permissive) 0
Unknown License 0

All licenses are low-risk and compliant

Powered by Codity.ai · Docs

@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented May 19, 2026

Greptile Summary

This PR applies a set of micro-optimizations across langchain-core and the DeepSeek partner package: copy-on-write (COW) semantics for BaseCallbackManager.copy(), per-class caching of inspect.signature results for run-manager detection, early-exit guards in event-dispatch hot paths, frozenset for O(1) key lookups, and a fast path in _format_for_tracing. It also fixes an Azure-endpoint detection bypass in ChatDeepSeek by switching from substring matching to proper hostname comparison.

  • COW callback manager (base.py): copy() now shares lists/dicts between original and clone and only materializes copies on first mutation via _cow_copy(). A bug exists in set_handlers, which clears _cow without first COW-copying tags/metadata, leaving shared mutable state exposed to unguarded mutations.
  • _is_azure_endpoint fix (chat_models.py): Uses urlparse().hostname to match only the actual hostname, closing bypass vectors via crafted paths or subdomains; three new unit tests cover the bypass attempts.
  • Remaining optimizations: Caching inspect.signature per class in chat_models.py and tools/base.py, converting CONFIG_KEYS/COPIABLE_KEYS to frozenset, and refactoring get_child/aget_child to construct managers directly rather than through a sequence of mutating calls.

Confidence Score: 3/5

The COW implementation in base.py has a real defect that can silently corrupt a cloned callback manager's tags and metadata when set_handlers is called on the original after copying; the remaining changes are straightforward and correct.

The set_handlers method clears the COW flag without first materializing copies of the shared tags, metadata, inheritable_tags, and inheritable_metadata. Any code that copies a manager, calls set_handlers on the original, and then calls add_tags or add_metadata will unknowingly mutate state that the clone still holds a reference to. This is a subtle bug that won't surface in unit tests that never exercise this exact ordering, but could produce hard-to-diagnose tracing corruption in real workloads.

libs/core/langchain_core/callbacks/base.py — the set_handlers method and the new-based copy() implementation deserve a close second look before merging.

Important Files Changed

Filename Overview
libs/core/langchain_core/callbacks/base.py Adds a copy-on-write optimization to BaseCallbackManager.copy(); contains a bug where set_handlers clears the COW flag without first COW-copying tags/metadata, leaving shared mutable state exposed to direct mutation.
libs/core/langchain_core/callbacks/manager.py Refactors get_child/aget_child to avoid set_handlers+add_tags chaining, adds early-exit guard for empty handler lists in handle_event/ahandle_event, and defers callback_manager construction in _configure to avoid wasted allocation; logic appears correct.
libs/core/langchain_core/language_models/chat_models.py Caches per-class inspect.signature results for _generate/_agenerate run_manager detection; adds a fast path in _format_for_tracing to skip list traversal when no messages have list content. Both changes are safe and correct.
libs/core/langchain_core/runnables/config.py Converts CONFIG_KEYS and COPIABLE_KEYS from lists to frozensets; improves O(n) membership checks to O(1). Safe change with no behavioral impact.
libs/core/langchain_core/tools/base.py Caches _has_run_manager_param and _runnable_config_param at construction time to avoid repeated inspect.signature calls per invocation. Correct for the common case; would be stale if _run is monkey-patched after construction (unlikely in practice).
libs/partners/deepseek/langchain_deepseek/chat_models.py Fixes _is_azure_endpoint to use hostname-based matching instead of substring search, closing bypass via strings like "evil-azure.com" or path components containing "azure.com".
libs/partners/deepseek/tests/unit_tests/test_chat_models.py Adds three targeted test cases for the _is_azure_endpoint bypass scenarios (hostname spoofing, subdomain squatting, path injection).

Sequence Diagram

sequenceDiagram
    participant Caller
    participant Original as BaseCallbackManager (original)
    participant Clone as BaseCallbackManager (clone)

    Caller->>Original: copy()
    Note over Original: _cow = True
    Original-->>Clone: "new clone (_cow = True, shared lists)"

    alt Mutation on Clone
        Caller->>Clone: add_tags(["x"])
        Clone->>Clone: _cow_copy() → copies all shared lists
        Note over Clone: _cow = False, now owns its own lists
        Clone->>Clone: tags.extend(["x"])  ✅ safe
    else set_handlers then add_tags on Original
        Caller->>Original: set_handlers([h])
        Note over Original: _cow = False ← BUG: skips _cow_copy()
        Note over Original,Clone: tags/metadata still SHARED
        Caller->>Original: add_tags(["y"])
        Original->>Original: _cow_copy() → no-op (_cow is False)
        Original->>Original: tags.extend(["y"]) ← mutates SHARED list
        Note over Clone: clone.tags now unexpectedly contains "y" 🐛
    end
Loading

Comments Outside Diff (1)

  1. libs/core/langchain_core/callbacks/base.py, line 1103-1107 (link)

    P1 COW flag cleared without copying shared tags/metadata

    set_handlers sets self._cow = False before replacing handlers and inheritable_handlers, but it leaves tags, metadata, inheritable_tags, and inheritable_metadata still pointing to the shared objects established by the preceding copy() call. With _cow now False, any later call to add_tags or add_metadata will invoke _cow_copy() as a no-op and then mutate the shared list/dict in-place — silently corrupting the sibling clone's state as well.

    Concrete failure: mgr.copy()mgr.set_handlers([h])mgr.add_tags(["x"]) will unexpectedly add "x" to the clone's tags too.

    The fix is to call self._cow_copy() before clearing the flag and replacing the handler lists.

Reviews (1): Last reviewed commit: "perf(core): copy-on-write callbacks, cac..." | Re-trigger Greptile

Comment on lines 985 to +1001
def copy(self) -> Self:
"""Return a copy of the callback manager."""
return self.__class__(
handlers=self.handlers.copy(),
inheritable_handlers=self.inheritable_handlers.copy(),
parent_run_id=self.parent_run_id,
tags=self.tags.copy(),
inheritable_tags=self.inheritable_tags.copy(),
metadata=self.metadata.copy(),
inheritable_metadata=self.inheritable_metadata.copy(),
)
"""Return a copy of the callback manager.

Uses copy-on-write: the copy shares underlying lists/dicts until
either the original or the copy is mutated.
"""
self._cow = True
clone = self.__class__.__new__(self.__class__)
clone.handlers = self.handlers
clone.inheritable_handlers = self.inheritable_handlers
clone.parent_run_id = self.parent_run_id
clone.tags = self.tags
clone.inheritable_tags = self.inheritable_tags
clone.metadata = self.metadata
clone.inheritable_metadata = self.inheritable_metadata
clone._cow = True # noqa: SLF001
return clone
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 __new__-based clone skips subclass __init__

copy() uses self.__class__.__new__(self.__class__) and manually copies only the seven attributes declared in BaseCallbackManager.__init__. Any subclass that extends __init__ with additional instance attributes (e.g., self.ended, self.parent_run_manager in CallbackManagerForChainGroup) will produce a clone that raises AttributeError on first access.

The two current subclasses that add extra attributes (CallbackManagerForChainGroup, AsyncCallbackManagerForChainGroup) both override copy() and are safe today, but this is an invisible footgun for any future subclass that forgets to do the same. A docstring note or a __init_subclass__ guard would prevent silent breakage.

@codity-dm
Copy link
Copy Markdown

codity-dm Bot commented May 19, 2026

Code Quality Report — test-org-codity/langchain · PR #5

Scanned: 2026-05-19 19:31 UTC | Score: 51/100 | Provider: github

Executive Summary

Severity Count
Critical 0
High 2
Medium 4
Low 2
Top Findings

[CQ-LLM-001] libs/core/langchain_core/callbacks/base.py:970 (Complexity · HIGH)

Issue: The _cow_copy method has deep nesting due to multiple if statements and copy operations.
Suggestion: Consider refactoring the _cow_copy method to reduce nesting and improve readability.

if self._cow:
    self.handlers = self.handlers.copy()
    self.inheritable_handlers = self.inheritable_handlers.copy()
    self.tags = self.tags.copy()
    self.inheritable_tags = self.inheritable_tags.copy()
    self.metadata = self.metadata.copy()
    self.inheritable_metadata = self.inheritable_metadata.copy()
    self._cow = False

[CQ-LLM-006] libs/core/langchain_core/callbacks/base.py:1071 (Testability · HIGH)

Issue: The _cow_copy method relies on internal state, making it difficult to test in isolation.
Suggestion: Consider refactoring to allow dependency injection or to pass state explicitly for easier testing.

self._cow_copy()

[CQ-LLM-002] libs/core/langchain_core/callbacks/base.py:1000 (Duplication · MEDIUM)

Issue: The copy method contains duplicated logic for copying attributes.
Suggestion: Extract the attribute copying logic into a separate method to adhere to DRY principles.

self.handlers = self.handlers.copy()
self.inheritable_handlers = self.inheritable_handlers.copy()
tags = self.tags.copy()
inheritable_tags = self.inheritable_tags.copy()
metadata = self.metadata.copy()
inheritable_metadata = self.inheritable_metadata.copy()

[CQ-LLM-003] libs/core/langchain_core/callbacks/base.py:1071 (Error_Handling · MEDIUM)

Issue: The _cow_copy method does not handle potential exceptions when copying attributes.
Suggestion: Add error handling to manage exceptions that may arise during the copy operations.

self.handlers = self.handlers.copy()

[CQ-LLM-005] libs/core/langchain_core/callbacks/base.py:1100 (Maintainability · MEDIUM)

Issue: The use of magic numbers (e.g., 0) in the code can lead to confusion.
Suggestion: Replace magic numbers with named constants to improve code clarity.

self._cow = False

[CQ-LLM-007] libs/core/langchain_core/callbacks/base.py:1100 (Performance · MEDIUM)

Issue: The method add_tags has potential performance issues due to multiple list operations.
Suggestion: Optimize the add_tags method to minimize list operations and improve performance.

self.tags.extend(tags)

[CQ-LLM-004] libs/core/langchain_core/callbacks/base.py:970 (Documentation · LOW)

Issue: The _cow_copy method lacks a docstring explaining its purpose and behavior.
Suggestion: Add a docstring to the _cow_copy method to improve documentation.

def _cow_copy(self) -> None:

[CQ-002] libs/core/langchain_core/tools/base.py:977 (Complexity · LOW)

Issue: Deep nesting detected (depth ~5)
Suggestion: Extract nested blocks into helper functions

tool_kwargs |= {self._runnable_config_param: config}

Per-File Breakdown

File Critical High Medium Low Total
libs/core/langchain_core/callbacks/base.py 0 2 4 1 7
libs/core/langchain_core/tools/base.py 0 0 0 1 1

Recommendations

  1. Resolve High severity issues, especially error handling gaps and performance bottlenecks.
  • Run automated tests after applying fixes to verify no regressions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants