⚡️ Speed up method Tool.from_string by 159% #170

Open
codeflash-ai[bot] wants to merge 1 commit into branch-3.9 from codeflash/optimize-Tool.from_string-mhx0t8k6

codeflash-ai Bot commented Nov 13, 2025

📄 159% speedup (about 2.6x faster) for Tool.from_string in src/bokeh/models/tools.py

⏱️ Runtime : 7.82 milliseconds → 3.01 milliseconds (best of 44 runs)
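
For reference, the headline percentage can be recomputed from the two runtimes above. This is a minimal sketch of the arithmetic, assuming the percentage is defined as (original / optimized - 1) × 100; the helper name `speedup_percent` is hypothetical and not part of Codeflash.

```python
def speedup_percent(original: float, optimized: float) -> float:
    """Return how much faster the optimized run is, in percent.

    Both runtimes must use the same unit. A result of 100.0 means the
    optimized code takes half the time of the original.
    """
    if original <= 0 or optimized <= 0:
        raise ValueError("runtimes must be positive")
    return (original / optimized - 1.0) * 100.0


# Headline runtimes from this report, in milliseconds.
headline = speedup_percent(7.82, 3.01)
ratio = 7.82 / 3.01
print(f"{headline:.1f}% faster, original/optimized ratio {ratio:.2f}")
```

Under this definition, a speedup of N% corresponds to an original/optimized runtime ratio of 1 + N/100.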

📝 Explanation and details

The optimization achieves a 159% speedup by eliminating redundant computations in the error-handling path of Tool.from_string(), which is frequently called when processing tool configurations in Bokeh plots.

Key Optimizations:

  1. Class-level caching of tool names: The original code repeatedly called cls._known_aliases.keys() and computed .lower() for each key on every error. The optimized version caches both the original tool names tuple (_known_names_tuple) and their lowercased variants (_known_names_lower) as class attributes, computed only once per class.

  2. Efficient case-insensitive matching: Instead of passing known_names (which are mixed case) to difflib.get_close_matches() with name.lower(), the optimization passes the pre-computed known_names_lower list, eliminating redundant string lowering operations during fuzzy matching.

  3. Import reorganization: Moved imports to standard locations for better performance.
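
The caching pattern described in points 1 and 2 can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: `Registry` and `register_alias` are hypothetical stand-ins, the error message is simplified, and only the attribute names `_known_aliases`, `_known_names_tuple`, and `_known_names_lower` come from the description above.

```python
from __future__ import annotations

import difflib
from typing import Callable, ClassVar


class Registry:
    """Hypothetical stand-in for a class that owns a `_known_aliases` mapping."""

    _known_aliases: ClassVar[dict[str, Callable[[], object]]] = {
        "pan": object,
        "wheel_zoom": object,
        "save": object,
    }
    # Lazily built caches; None means "not built yet".
    _known_names_tuple: ClassVar[tuple[str, ...] | None] = None
    _known_names_lower: ClassVar[list[str] | None] = None

    @classmethod
    def _names(cls) -> tuple[tuple[str, ...], list[str]]:
        # Build both caches on first use, then reuse them on later errors.
        if cls._known_names_tuple is None or cls._known_names_lower is None:
            names = tuple(cls._known_aliases)
            cls._known_names_tuple = names
            cls._known_names_lower = [n.lower() for n in names]
        return cls._known_names_tuple, cls._known_names_lower

    @classmethod
    def register_alias(cls, name: str, constructor: Callable[[], object]) -> None:
        # Assumption: every registration goes through this hook, so the
        # caches can be dropped here and rebuilt on the next error.
        cls._known_aliases[name] = constructor
        cls._known_names_tuple = None
        cls._known_names_lower = None

    @classmethod
    def from_string(cls, name: str) -> object:
        constructor = cls._known_aliases.get(name)
        if constructor is not None:
            return constructor()  # success path never touches the caches
        names, names_lower = cls._names()
        matches = difflib.get_close_matches(name.lower(), names_lower)
        text = "similar"
        if not matches:
            matches, text = list(names), "possible"
        raise ValueError(
            f"unexpected tool name '{name}', {text} tools are {', '.join(matches)}"
        )
```

Two design points are worth noting. Suggestions are drawn from the lowercased list, so a registry with mixed-case aliases would need to map matches back to their original spelling, for example by index into `_known_names_tuple`. A cache like this also has to be invalidated whenever `_known_aliases` changes; code that reassigns or mutates the mapping directly would bypass the hypothetical `register_alias` hook and see stale names.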

Performance Impact by Test Case:

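  • Large-scale error cases show the largest gains: error-path lookups against roughly 900-1,000 registered tools run about 2,900-6,000% faster (for example, 1.52ms -> 25.0μs), and those with 100-500 extra tools about 1,000-1,900% faster, because the caching eliminates O(n) string operations on every error
  • Basic error cases: no consistent change; individual ValueError timings range from about 7% slower to about 5% faster at the scale of tens of microseconds, and the two TypeError cases (list and dict arguments) are about 13% slower, a difference of roughly 0.2μs
  • Success cases: minimal impact (within about ±5%) since caching only helps error paths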
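
To check figures like these locally, a micro-benchmark along the following lines can be used. It is a sketch only: the 1,000 names mirror the generated large-scale tests, the two `lookup_*` helpers are hypothetical, and absolute timings depend on the machine, so it does not reproduce the measurements in this report.

```python
import difflib
import timeit

# Hypothetical registry of 1000 names, mirroring the generated large-scale tests.
names = [f"tool_{i}" for i in range(1000)]
names_lower_cached = [n.lower() for n in names]


def lookup_uncached(query: str) -> list[str]:
    # Rebuilds the lowercased candidate list on every call.
    return difflib.get_close_matches(query.lower(), [n.lower() for n in names])


def lookup_cached(query: str) -> list[str]:
    # Reuses the list that was lowercased once, up front.
    return difflib.get_close_matches(query.lower(), names_lower_cached)


if __name__ == "__main__":
    runs = 20
    for fn in (lookup_uncached, lookup_cached):
        seconds = timeit.timeit(lambda: fn("not_a_tool"), number=runs)
        print(f"{fn.__name__}: {seconds / runs * 1e6:.1f} us per call")
```

Note that `difflib.get_close_matches` still scans every candidate in both variants, so this sketch isolates only the cost of rebuilding the lowercased list.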

Real-world Impact:
Based on the function references, Tool.from_string() is called from add_tools() in plot creation and _resolve_tools() during tool resolution. When users provide invalid tool names (common during development/configuration), this optimization prevents performance degradation that scales with the number of registered tools. The caching is particularly valuable in applications with many custom tools or when processing tool lists programmatically.
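
As an illustration of that error path from a caller's side, the sketch below resolves a comma-separated tool string and collects every rejected name. `resolve_tool_names` and `demo_lookup` are hypothetical stand-ins (in Bokeh, the lookup role would be played by `Tool.from_string`); this is not Bokeh's `_resolve_tools` implementation.

```python
from __future__ import annotations

from typing import Callable


def resolve_tool_names(
    spec: str, lookup: Callable[[str], object]
) -> tuple[list[object], dict[str, str]]:
    """Split a comma-separated spec and look up each name.

    Returns the resolved objects plus a mapping of rejected name -> error text.
    Each rejected name goes through the lookup's error path once.
    """
    resolved: list[object] = []
    errors: dict[str, str] = {}
    for raw in spec.split(","):
        name = raw.strip()
        if not name:
            continue  # ignore empty entries such as a trailing comma
        try:
            resolved.append(lookup(name))
        except ValueError as exc:
            errors[name] = str(exc)
    return resolved, errors


def demo_lookup(name: str) -> object:
    # Stand-in for Tool.from_string: accepts two names, rejects the rest.
    known = {"pan": "PanTool", "save": "SaveTool"}
    if name not in known:
        raise ValueError(f"unexpected tool name '{name}'")
    return known[name]


tools, problems = resolve_tool_names("pan, sav, save, zom", demo_lookup)
```

In a loop like this, the cost of the error path is paid once per invalid name, which is where a per-error cost that grows with the number of registered tools would add up.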

Correctness verification report:

| Test | Status |
|------|--------|
| ⚙️ Existing Unit Tests | 9 Passed |
| 🌀 Generated Regression Tests | 67 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
⚙️ Existing Unit Tests and Runtime
🌀 Generated Regression Tests and Runtime
from typing import Callable, ClassVar, Dict

# imports
import pytest
from bokeh.models.tools import Tool


# Dummy tool subclasses for registration
class PanTool(Tool): pass
class ZoomTool(Tool): pass
class SaveTool(Tool): pass

# Register known aliases for testing
Tool._known_aliases = {
    "pan": PanTool,
    "zoom": ZoomTool,
    "save": SaveTool,
    # Add more as needed
}

# --- Unit Tests ---

# --- Basic Test Cases ---

def test_basic_known_name_pan():
    """Test that a known tool name returns the correct Tool instance."""
    codeflash_output = Tool.from_string("pan"); tool = codeflash_output # 154μs -> 153μs (0.275% faster)
    # Ensure a new instance is created each time
    codeflash_output = Tool.from_string("pan"); tool2 = codeflash_output # 92.3μs -> 90.7μs (1.78% faster)

def test_basic_known_name_zoom():
    """Test that another known tool name returns the correct Tool instance."""
    codeflash_output = Tool.from_string("zoom"); tool = codeflash_output # 119μs -> 119μs (0.086% slower)

def test_basic_known_name_save():
    """Test that a third known tool name returns the correct Tool instance."""
    codeflash_output = Tool.from_string("save"); tool = codeflash_output # 118μs -> 118μs (0.235% slower)

def test_basic_case_sensitive():
    """Test that case sensitivity is respected (should fail for wrong case)."""
    with pytest.raises(ValueError) as excinfo:
        Tool.from_string("Pan") # 38.4μs -> 39.9μs (3.79% slower)

def test_basic_unknown_name():
    """Test that an unknown tool name raises a ValueError with possible tools listed."""
    with pytest.raises(ValueError) as excinfo:
        Tool.from_string("foo") # 22.8μs -> 22.1μs (3.46% faster)
    msg = str(excinfo.value)
    # Should list all known tools
    for name in Tool._known_aliases.keys():
        pass

# --- Edge Test Cases ---

def test_edge_empty_string():
    """Test that an empty string raises a ValueError and lists all possible tools."""
    with pytest.raises(ValueError) as excinfo:
        Tool.from_string("") # 14.7μs -> 14.8μs (0.325% slower)
    msg = str(excinfo.value)
    for name in Tool._known_aliases.keys():
        pass

def test_edge_whitespace_string():
    """Test that a whitespace string raises a ValueError and lists all possible tools."""
    with pytest.raises(ValueError) as excinfo:
        Tool.from_string("   ") # 21.6μs -> 20.7μs (4.50% faster)
    msg = str(excinfo.value)
    for name in Tool._known_aliases.keys():
        pass

def test_edge_similar_name():
    """Test that a name similar to a known tool gives a ValueError with 'similar' suggestion."""
    with pytest.raises(ValueError) as excinfo:
        Tool.from_string("pann") # 33.4μs -> 36.0μs (7.29% slower)
    msg = str(excinfo.value)

def test_edge_similar_name_case_insensitive():
    """Test that a lowercased similar name triggers a suggestion."""
    with pytest.raises(ValueError) as excinfo:
        Tool.from_string("ZOOMM") # 33.4μs -> 34.8μs (3.77% slower)
    msg = str(excinfo.value)

def test_edge_numeric_name():
    """Test that a numeric string raises a ValueError and lists all possible tools."""
    with pytest.raises(ValueError) as excinfo:
        Tool.from_string("123") # 22.1μs -> 21.3μs (3.36% faster)
    msg = str(excinfo.value)

def test_edge_special_characters():
    """Test that a string with special characters raises a ValueError."""
    with pytest.raises(ValueError) as excinfo:
        Tool.from_string("@pan!") # 32.3μs -> 33.3μs (3.04% slower)
    msg = str(excinfo.value)



def test_edge_tool_name_with_spaces():
    """Test that a known tool name with extra spaces is not accepted."""
    with pytest.raises(ValueError) as excinfo:
        Tool.from_string(" pan ") # 39.9μs -> 41.1μs (2.95% slower)
    msg = str(excinfo.value)

# --- Large Scale Test Cases ---

def test_large_scale_many_tools():
    """Test from_string behavior when many tools are registered."""
    # Save original aliases
    orig_aliases = Tool._known_aliases.copy()
    # Register 1000 dummy tools
    class DummyTool(Tool): pass
    Tool._known_aliases = {f"tool_{i}": (lambda i=i: DummyTool()) for i in range(1000)}
    # Test that all can be instantiated
    for i in range(0, 1000, 100):  # Sample every 100th tool for speed
        codeflash_output = Tool.from_string(f"tool_{i}"); tool = codeflash_output # 845μs -> 847μs (0.248% slower)
    # Test an unknown name with many tools
    with pytest.raises(ValueError) as excinfo:
        Tool.from_string("not_a_tool") # 1.52ms -> 25.0μs (5998% faster)
    msg = str(excinfo.value)
    # Should list some known tool names
    for name in ["tool_0", "tool_999"]:
        pass
    # Restore original aliases
    Tool._known_aliases = orig_aliases

def test_large_scale_similar_name_in_large_set():
    """Test 'similar' suggestion works with large tool sets."""
    orig_aliases = Tool._known_aliases.copy()
    class DummyTool(Tool): pass
    Tool._known_aliases = {f"tool_{i}": (lambda i=i: DummyTool()) for i in range(1000)}
    # Add a specific tool for similarity testing
    Tool._known_aliases["pan"] = PanTool
    # Misspelled "pan"
    with pytest.raises(ValueError) as excinfo:
        Tool.from_string("pann") # 1.27ms -> 42.2μs (2917% faster)
    msg = str(excinfo.value)
    Tool._known_aliases = orig_aliases

def test_large_scale_performance():
    """Test that from_string does not take excessive time with large sets."""
    import time
    orig_aliases = Tool._known_aliases.copy()
    class DummyTool(Tool): pass
    Tool._known_aliases = {f"tool_{i}": (lambda i=i: DummyTool()) for i in range(1000)}
    try:
        # perf_counter is monotonic and better suited to short intervals than time.time
        start = time.perf_counter()
        codeflash_output = Tool.from_string("tool_500"); tool = codeflash_output # 137μs -> 135μs (1.80% faster)
        elapsed = time.perf_counter() - start  # kept for inspection; no threshold is asserted
    finally:
        # Restore original aliases even if the lookup raises
        Tool._known_aliases = orig_aliases

# --- Additional Edge Cases ---

def test_edge_tool_name_is_empty_string():
    """Test that empty string is not accepted as a valid tool name."""
    with pytest.raises(ValueError) as excinfo:
        Tool.from_string("") # 18.9μs -> 19.9μs (4.97% slower)

def test_edge_tool_name_is_only_spaces():
    """Test that a string of only spaces is not accepted."""
    with pytest.raises(ValueError) as excinfo:
        Tool.from_string("   ") # 23.7μs -> 23.0μs (2.72% faster)


def test_edge_tool_name_is_numeric():
    """Test that a numeric string is not accepted."""
    with pytest.raises(ValueError) as excinfo:
        Tool.from_string("123") # 27.3μs -> 27.5μs (0.774% slower)

def test_edge_tool_name_is_special_characters():
    """Test that a string with special characters is not accepted."""
    with pytest.raises(ValueError) as excinfo:
        Tool.from_string("@pan!") # 35.3μs -> 36.2μs (2.37% slower)

def test_edge_tool_name_is_list():
    """Test that a list is not accepted."""
    with pytest.raises(TypeError):
        Tool.from_string(["pan"]) # 1.20μs -> 1.39μs (13.4% slower)

def test_edge_tool_name_is_dict():
    """Test that a dict is not accepted."""
    with pytest.raises(TypeError):
        Tool.from_string({"pan": 1}) # 1.18μs -> 1.36μs (13.4% slower)

import difflib
from typing import Callable, ClassVar

# imports
import pytest
from bokeh.models.tools import Tool


# For testing, we need some dummy tool subclasses and to populate _known_aliases
class PanTool(Tool):
    def __init__(self):
        super().__init__(name="pan")

class ZoomTool(Tool):
    def __init__(self):
        super().__init__(name="zoom")

class SaveTool(Tool):
    def __init__(self):
        super().__init__(name="save")

class CustomTool(Tool):
    def __init__(self):
        super().__init__(name="custom")

# Populate known aliases for testing
Tool._known_aliases = {
    "pan": PanTool,
    "zoom": ZoomTool,
    "save": SaveTool,
    "custom": CustomTool,
}

# ------------------- Unit Tests -------------------

# 1. Basic Test Cases

def test_basic_known_tool_pan():
    """Test that a known tool 'pan' returns a PanTool instance."""
    codeflash_output = Tool.from_string("pan"); tool = codeflash_output # 159μs -> 152μs (4.33% faster)

def test_basic_known_tool_zoom():
    """Test that a known tool 'zoom' returns a ZoomTool instance."""
    codeflash_output = Tool.from_string("zoom"); tool = codeflash_output # 126μs -> 125μs (0.788% faster)

def test_basic_known_tool_save():
    """Test that a known tool 'save' returns a SaveTool instance."""
    codeflash_output = Tool.from_string("save"); tool = codeflash_output # 120μs -> 121μs (0.428% slower)


def test_unknown_tool_raises_value_error():
    """Test that an unknown tool name raises ValueError with correct message."""
    with pytest.raises(ValueError) as excinfo:
        Tool.from_string("unknown") # 28.2μs -> 27.7μs (1.71% faster)
    msg = str(excinfo.value)
    # Should mention possible tools
    for alias in Tool._known_aliases.keys():
        pass

def test_case_insensitive_match_suggests_similar():
    """Test that a case-insensitive close match is suggested."""
    with pytest.raises(ValueError) as excinfo:
        Tool.from_string("Pan") # 34.0μs -> 35.9μs (5.05% slower)
    msg = str(excinfo.value)

def test_typo_suggests_similar():
    """Test that a typo in tool name suggests similar names."""
    with pytest.raises(ValueError) as excinfo:
        Tool.from_string("pna") # 36.2μs -> 36.7μs (1.37% slower)
    msg = str(excinfo.value)

def test_empty_string_suggests_all():
    """Test that an empty string suggests all possible tools."""
    with pytest.raises(ValueError) as excinfo:
        Tool.from_string("") # 15.1μs -> 14.7μs (2.48% faster)
    msg = str(excinfo.value)
    # Should mention all possible tools
    for alias in Tool._known_aliases.keys():
        pass

def test_whitespace_string_suggests_all():
    """Test that a whitespace string suggests all possible tools."""
    with pytest.raises(ValueError) as excinfo:
        Tool.from_string(" ") # 15.8μs -> 15.6μs (0.947% faster)
    msg = str(excinfo.value)
    for alias in Tool._known_aliases.keys():
        pass


def test_numeric_string_not_found():
    """Test that a numeric string not matching any tool raises ValueError."""
    with pytest.raises(ValueError) as excinfo:
        Tool.from_string("123") # 27.5μs -> 27.2μs (0.976% faster)
    msg = str(excinfo.value)

def test_special_characters_not_found():
    """Test that a string with special characters not matching any tool raises ValueError."""
    with pytest.raises(ValueError) as excinfo:
        Tool.from_string("@zoom!") # 36.1μs -> 37.8μs (4.72% slower)
    msg = str(excinfo.value)


def test_large_number_of_known_aliases():
    """Test performance and correctness with a large number of aliases."""
    # Add 900 dummy tools
    for i in range(100, 1000):
        Tool._known_aliases[f"tool{i}"] = lambda i=i: Tool(name=f"tool{i}")
    # Test that a known tool in the large set returns correctly
    codeflash_output = Tool.from_string("tool999"); tool = codeflash_output # 155μs -> 159μs (2.64% slower)
    # Test that an unknown tool suggests all possible tools (should not error on size)
    with pytest.raises(ValueError) as excinfo:
        Tool.from_string("not_a_tool") # 1.11ms -> 24.3μs (4472% faster)
    msg = str(excinfo.value)
    # Clean up
    for i in range(100, 1000):
        Tool._known_aliases.pop(f"tool{i}")

def test_large_scale_typo_suggestion():
    """Test that a typo in a large set still suggests the closest match."""
    # Add many similar tools
    for i in range(100, 200):
        Tool._known_aliases[f"zoom{i}"] = lambda i=i: Tool(name=f"zoom{i}")
    # Typo for 'zoom199'
    with pytest.raises(ValueError) as excinfo:
        Tool.from_string("zooom199") # 870μs -> 42.5μs (1947% faster)
    msg = str(excinfo.value)
    # Clean up
    for i in range(100, 200):
        Tool._known_aliases.pop(f"zoom{i}")

def test_large_scale_all_suggestions():
    """Test that an empty string in a large alias set suggests all possible tools."""
    # Add many tools
    for i in range(500):
        Tool._known_aliases[f"custom{i}"] = lambda i=i: Tool(name=f"custom{i}")
    with pytest.raises(ValueError) as excinfo:
        Tool.from_string("") # 181μs -> 15.7μs (1058% faster)
    msg = str(excinfo.value)
    # Clean up
    for i in range(500):
        Tool._known_aliases.pop(f"custom{i}")

def test_large_scale_known_tool():
    """Test that a known tool in a large set is returned correctly."""
    for i in range(500):
        Tool._known_aliases[f"save{i}"] = lambda i=i: Tool(name=f"save{i}")
    codeflash_output = Tool.from_string("save123"); tool = codeflash_output # 149μs -> 150μs (0.667% slower)
    # Clean up
    for i in range(500):
        Tool._known_aliases.pop(f"save{i}")
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
from bokeh.models.tools import Tool
import pytest

def test_Tool_from_string():
    with pytest.raises(ValueError, match="unexpected\\ tool\\ name\\ '\x00\\-ě',\\ possible\\ tools\\ are\\ pan,\\ xpan,\\ ypan,\\ pan_left,\\ pan_right,\\ pan_up,\\ pan_down,\\ pan_west,\\ pan_east,\\ pan_north,\\ pan_south,\\ xwheel_pan,\\ ywheel_pan,\\ wheel_zoom,\\ xwheel_zoom,\\ ywheel_zoom,\\ zoom_in,\\ xzoom_in,\\ yzoom_in,\\ zoom_out,\\ xzoom_out,\\ yzoom_out,\\ click,\\ tap,\\ doubletap,\\ crosshair,\\ xcrosshair,\\ ycrosshair,\\ box_select,\\ xbox_select,\\ ybox_select,\\ poly_select,\\ lasso_select,\\ box_zoom,\\ xbox_zoom,\\ ybox_zoom,\\ auto_box_zoom,\\ save,\\ copy,\\ undo,\\ redo,\\ reset,\\ help,\\ examine,\\ fullscreen,\\ box_edit,\\ line_edit,\\ point_draw,\\ poly_draw,\\ poly_edit,\\ freehand_draw\\ or\\ hover"):
        Tool.from_string('\x00-ě')  # from_string is a classmethod, so cls is not passed explicitly

To edit these changes, run git checkout codeflash/optimize-Tool.from_string-mhx0t8k6 and push.

codeflash-ai Bot requested a review from mashraf-222 on November 13, 2025 06:01
codeflash-ai Bot added the labels "⚡️ codeflash" (Optimization PR opened by Codeflash AI) and "🎯 Quality: High" (Optimization Quality according to Codeflash) on Nov 13, 2025