boxed · nicklafleur · Feb 28, 2026 · Mar 20, 2026 · Mar 20, 2026
diff --git a/.gitignore b/.gitignore
@@ -10,6 +10,8 @@ venv
 table.css.map
 .idea
 .vscode
+.claude
+.cursor
 .cache
 .DS_Store
 .pytest_cache

diff --git a/ARCHITECTURE.rst b/ARCHITECTURE.rst
@@ -19,7 +19,7 @@ The mutated files contains the original code and the mutants. With the ``MUTANT_
 Collecting tests and stats
 ^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-We collect a list of all tests and execute them. In this test run, we track which tests would execute which mutants, and how long they take. We use both stats for performance optimizations later on. The results are stored in ``mutants/mutmut-stats.json`` and global variables.
+We collect a list of all tests and execute them. In this test run, we track which tests would execute which mutants, and how long they take. We also track function call dependencies (which functions call which other functions) for cascading invalidation when code changes. We use these stats for performance optimizations later on. The results are stored in ``mutants/mutmut-stats.json`` and global variables.
 
 
 Collecting mutation results
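
Note on the cascading invalidation described in this hunk: the following is a minimal sketch, not mutmut's actual implementation. It assumes the recorded dependencies are available as a caller-to-callees mapping; the function name `functions_to_retest` and the shape of `calls` are hypothetical. It inverts the graph and runs a breadth-first search from the changed functions to find every transitive caller.

```python
from collections import defaultdict, deque


def functions_to_retest(changed, calls):
    """Return `changed` plus every function that transitively calls one of them.

    `calls` maps caller -> set of callees. This shape is an assumption made
    for illustration; the stored dependency data may be laid out differently.
    """
    # Invert the graph so we can walk from a callee to its callers.
    callers = defaultdict(set)
    for caller, callees in calls.items():
        for callee in callees:
            callers[callee].add(caller)

    invalid = set(changed)
    queue = deque(changed)
    while queue:
        fn = queue.popleft()
        for caller in callers[fn]:
            if caller not in invalid:
                invalid.add(caller)
                queue.append(caller)
    return invalid


calls = {"A": {"B"}, "B": {"C"}, "D": {"E"}}
print(sorted(functions_to_retest({"C"}, calls)))  # ['A', 'B', 'C']
```

With A calling B and B calling C, a change to C invalidates all three, while the unrelated D and E keep their previous results.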

diff --git a/README.rst b/README.rst
@@ -64,6 +64,35 @@ source code control and committed before you apply a mutant!
 
 
 If during the installation you get an error for the `libcst` dependency mentioning the lack of a rust compiler on your system, it is because your architecture does not have a prebuilt binary for `libcst` and it requires both `rustc` and `cargo` from the [rust toolchain](https://www.rust-lang.org/tools/install) to be built. This is known for at least the `x86_64-darwin` architecture.
+left off.
+
+
+Incremental Testing
+~~~~~~~~~~~~~~~~~~~
+
+Mutmut is designed for incremental workflows. It remembers which mutants have
+been tested and their results, so subsequent runs skip already-tested mutants.
+
+**Function-level change detection:** Mutmut computes a hash of each function's
+source code. When you modify a function, mutmut detects the change and
+automatically re-tests all mutants in that function. Unchanged functions keep
+their previous results.
+
+**Dependency tracking:** Mutmut tracks which functions call which other functions
+during stats collection. When a function changes, mutmut automatically invalidates
+and re-tests mutants in all functions that depend on it (transitively). For example,
+if function A calls B which calls C, and you modify C, mutants in A, B, and C are
+all re-tested.
+
+This means you can:
+
+- Run ``mutmut run``, stop partway through, and continue later
+- Modify your source code and re-run; only changed functions and their dependents are re-tested
+- Update shared utilities and have dependent functions automatically re-tested
+- Update your tests and use ``mutmut browse`` to selectively re-test mutants
+
+The mutation data is stored in the ``mutants/`` directory. Delete this
+directory to start completely fresh.
 
 
 Wildcards for testing mutants
@@ -140,6 +169,59 @@ but will also lead to more surviving mutants that would otherwise have been
 caught.
 
 
+Dependency tracking
+~~~~~~~~~~~~~~~~~~~
+
+Mutmut automatically tracks function call dependencies during stats collection.
+When a function's code changes, all functions that depend on it (transitively)
+are also invalidated and re-tested. This is enabled by default.
+
+To disable dependency tracking:
+
+.. code-block:: toml
+
+    [tool.mutmut]
+    track_dependencies = false
+
+You can also limit the depth of dependency tracking (defaults to ``max_stack_depth``):
+
+.. code-block:: toml
+
+    [tool.mutmut]
+    dependency_tracking_depth = 5
+
+The dependency graph is stored in ``mutants/mutmut-stats.json`` under the
+``function_dependencies`` key.
+
+**Config change detection:**
+
+Mutmut automatically detects when dependency tracking configuration changes
+between runs. If you enable/disable tracking or change the depth, mutmut will
+re-collect stats to ensure the dependency graph matches your current settings.
+This avoids both missed invalidations (too few edges) and unnecessary test runs
+(too many edges).
+
+**Performance considerations:**
+
+For large codebases, be aware of the overhead at each phase:
+
+- **Mutant generation:** The BFS expansion runs once per ``mutmut run`` when
+  changes are detected. Complexity is O(changed + edges), typically milliseconds
+  even for graphs with 10,000+ functions.
+
+- **Stats collection:** Adds ~1-5% overhead. Each function call records a single
+  edge (caller → callee) via a ContextVar lookup and set insertion—both O(1).
+  The depth check is a simple integer comparison.
+
+- **Storage:** The dependency graph adds to ``mutmut-stats.json``. A codebase
+  with 10,000 functions and 50,000 call edges adds roughly 1-2 MB.
+
+- **Memory:** The in-memory graph uses ~100 bytes per edge. 50,000 edges ≈ 5 MB.
+
+If you experience performance issues in very large monorepos, you can limit tracking depth
+with ``dependency_tracking_depth`` or disable it entirely with ``track_dependencies = false``.
+
+
 Exclude files from mutation
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
 

diff --git a/e2e_projects/benchmark_1k/README.md b/e2e_projects/benchmark_1k/README.md
@@ -0,0 +1,3 @@
+# Benchmark 1K
+
+A synthetic benchmark project with 1000 mutants for validating mutmut's function hashing and incremental mutation testing features.
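
Note on the function hashing this benchmark is meant to validate: a minimal sketch of function-level change detection follows. It uses the standard-library `ast` module and SHA-256 as stand-ins; the parser, hash function and helper name `function_hashes` are assumptions made for illustration, not what mutmut necessarily uses.

```python
import ast
import hashlib


def function_hashes(source: str) -> dict[str, str]:
    """Map each top-level function name to a SHA-256 hash of its source text."""
    hashes = {}
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Exact source text of the function, from `def` to its last line.
            segment = ast.get_source_segment(source, node)
            hashes[node.name] = hashlib.sha256(segment.encode()).hexdigest()
    return hashes


old = "def f(x):\n    return x + 1\n\ndef g(x):\n    return x * 2\n"
new = "def f(x):\n    return x + 2\n\ndef g(x):\n    return x * 2\n"

before, after = function_hashes(old), function_hashes(new)
changed = {name for name in after if before.get(name) != after[name]}
print(changed)  # {'f'}
```

Only `f` is reported as changed, so under this scheme the mutants of `g` would keep their earlier results.
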
diff --git a/e2e_projects/benchmark_1k/pyproject.toml b/e2e_projects/benchmark_1k/pyproject.toml
@@ -0,0 +1,15 @@
+[project]
+name = "benchmark-1k"
+version = "0.1.0"
+description = "Benchmark project for mutmut warmup strategy comparison (~1000 mutants)"
+requires-python = ">=3.10"
+
+[build-system]
+requires = ["hatchling"]
+build-backend = "hatchling.build"
+
+[tool.hatch.build.targets.wheel]
+packages = ["src/benchmark"]
+
+[tool.mutmut]
+debug = false
diff --git a/e2e_projects/benchmark_1k/src/benchmark/__init__.py b/e2e_projects/benchmark_1k/src/benchmark/__init__.py
@@ -0,0 +1,34 @@
+"""Benchmark package for mutmut warmup strategy testing.
+
+Simulates a real application that imports heavy libraries on startup.
+Set BENCHMARK_IMPORT_DELAY environment variable to control the delay.
+"""
+
+import os
+import time
+
+from benchmark import arguments
+from benchmark import booleans
+from benchmark import comparisons
+from benchmark import complex
+from benchmark import numbers
+from benchmark import operators
+from benchmark import returns
+from benchmark import strings
+
+__all__ = [
+    "numbers",
+    "strings",
+    "booleans",
+    "operators",
+    "comparisons",
+    "arguments",
+    "returns",
+    "complex",
+]
+
+
+# Simulate the startup cost of importing heavy libraries
+import_delay = float(os.environ.get("BENCHMARK_IMPORT_DELAY", "0.05"))
+if import_delay > 0:
+    time.sleep(import_delay)
diff --git a/e2e_projects/benchmark_1k/src/benchmark/arguments.py b/e2e_projects/benchmark_1k/src/benchmark/arguments.py
@@ -0,0 +1,71 @@
+"""Benchmark functions with various argument patterns."""
+
+
+# === Helper functions ===
+
+
+def helper_2(a, b):
+    """Helper with 2 args."""
+    return (a, b)
+
+
+def helper_3(a, b, c):
+    """Helper with 3 args."""
+    return (a, b, c)
+
+
+def combiner(first, second):
+    """Combine 2 values."""
+    if first is None or second is None:
+        return None
+    return f"{first}-{second}"
+
+
+# === 2-arg calls ===
+
+
+def call_2args_batch_1():
+    """2-arg calls."""
+    r1 = helper_2(1, 2)
+    r2 = helper_2(3, 4)
+    return r1, r2
+
+
+# === 3-arg calls ===
+
+
+def call_3args_batch_1():
+    """3-arg calls."""
+    r1 = helper_3(1, 2, 3)
+    return (r1,)
+
+
+# === Dict literals ===
+
+
+def dict_2keys_batch_1():
+    """dict with 2 keys."""
+    d1 = {"a": 1, "b": 2}
+    return (d1,)
+
+
+def dict_3keys_batch_1():
+    """dict with 3 keys."""
+    d1 = {"x": 1, "y": 2, "z": 3}
+    return (d1,)
+
+
+# === String method calls ===
+
+
+def string_method_calls():
+    """String method calls with multiple args."""
+    text = "a-b-c-d-e"
+    r1 = text.split("-", 2)
+    return (r1,)
+
+
+def format_calls():
+    """String format calls."""
+    r1 = "{} {}".format("hello", "world")
+    return (r1,)
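
Note on `combiner` above: this excerpt does not include the benchmark's tests, so the following is only a sketch of the kind of test that would distinguish the original from likely mutants. It assumes the mutation operators include swapping `or` for `and`; `combiner` is copied from the diff so the block runs on its own.

```python
def combiner(first, second):
    """Copied from benchmark/arguments.py in this diff."""
    if first is None or second is None:
        return None
    return f"{first}-{second}"


def test_combiner_joins_values():
    assert combiner("a", "b") == "a-b"


def test_combiner_returns_none_if_either_is_missing():
    # Checking each side separately is what tells `or` apart from `and`:
    # with `and`, combiner(None, "b") would return "None-b" instead of None.
    assert combiner(None, "b") is None
    assert combiner("a", None) is None


# Run directly so the sketch works without a test runner.
test_combiner_joins_values()
test_combiner_returns_none_if_either_is_missing()
```

A test that only passed `None` for both arguments would let an `and` mutant survive, which is why each side is checked on its own.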