Skip to content

Tiny Fix: lcmspy crash#1384

Open
jeff-hykin wants to merge 1 commit intodevfrom
jeff/fix/lcmspy
Open

Tiny Fix: lcmspy crash#1384
jeff-hykin wants to merge 1 commit intodevfrom
jeff/fix/lcmspy

Conversation

@jeff-hykin
Copy link
Member

Problem

lcmspy will eventually crash with an error about mutating a value while iterating

Solution

Copy the list before iterating

Breaking Changes

None

How to Test

lcmspy (run for a long time while a bunch of topics publish)

Contributor License Agreement

  • I have read and approved the CLA.

@jeff-hykin jeff-hykin changed the base branch from main to dev February 28, 2026 21:39
@greptile-apps
Copy link
Contributor

greptile-apps bot commented Feb 28, 2026

Greptile Summary

This PR fixes a crash in lcmspy caused by a RuntimeError: deque mutated during iteration — wrapping self.message_history in list() before the list comprehension in _get_messages_in_window creates an atomic snapshot of the deque, preventing mutation by a concurrent thread (the LCM callback) during iteration.

  • The fix is minimal, correct, and directly targets the reported crash.
  • It mirrors the existing pattern already used in graph_log at line 172, where list(self.topic.values()) is used for the same reason.
  • No new logic is introduced; behaviour is unchanged in the single-threaded case.

Confidence Score: 5/5

  • This PR is safe to merge — it is a one-line, targeted fix with no risk of regressions.
  • The change is a single-line, well-understood thread-safety fix. It is consistent with an identical pattern already present in the same file (list(self.topic.values()) on line 172). The list() call on a CPython deque is an atomic snapshot due to the GIL, making this a correct and complete fix for the described crash.
  • No files require special attention.

Important Files Changed

Filename Overview
dimos/utils/cli/lcmspy/lcmspy.py Adds list() snapshot around self.message_history deque iteration in _get_messages_in_window to prevent RuntimeError when the deque is mutated concurrently by another thread.

Sequence Diagram

sequenceDiagram
    participant LCMThread as LCM Callback Thread
    participant GraphThread as Graph Log Thread
    participant Deque as message_history (deque)

    LCMThread->>Deque: append((timestamp, size))
    LCMThread->>Deque: popleft() [_cleanup_old_messages]

    GraphThread->>Deque: list(deque) [snapshot — NEW FIX]
    Note over GraphThread,Deque: Safe snapshot prevents<br/>RuntimeError during iteration

    GraphThread->>GraphThread: iterate snapshot [(ts, size) for ...]
    GraphThread->>GraphThread: update_graphs()
Loading

Last reviewed commit: 67d827f

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 file reviewed, no comments

Edit Code Review Agent Settings | Greptile

current_time = time.time()
cutoff_time = current_time - time_window
return [(ts, size) for ts, size in self.message_history if ts >= cutoff_time]
return [(ts, size) for ts, size in list(self.message_history) if ts >= cutoff_time]
Copy link
Contributor

@paul-nechifor paul-nechifor Feb 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works as a quick fix, but I don't know if it make sense to duplicate the list every time you iterate over it. At a glance, I can't tell if this has a limit, and just copies more memory as you increase the size.

An alternative solution would be to add with._lock when accessing/mutating.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think this is the ideal fix. AFAIK _get_messages_in_window wants a copy because its trying to take a snapshot at a particular time. I believe list(deque) is atomic, so it grabs the snapshot as intended.

Blocking all modifications to the deque (self.message_history) until the list comprehension is done I think would be slower and more complex than grabbing a copy and then iterating over the copy.

I can change it to a tuple instead of a list (I need to run CI again to get past the flakey tests anyways)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

list(deque) is atomic because it's implemented in C and holds the GIL. It's an implementation detail. Python is already moving towards removing the GIL. When that's mainstream, you'll have Python code adding a value and C code iterating over it to create the list.

I don't think it's an ideal fix. It only prevents mutations by implication, because the code for list() is holding the global interpreter lock.

If you use with lock, you're both using an explicit lock and using less memory.


But I don't think with lock is the ideal solution either. Taking another look at the code, I think the ideal solution would be to use two circular buffers (one for size, one for number of messages), each with 60 entries (a bucket for each second), and when a message comes, add it's size to one, and increment its corresponding timestamp bucket. That would use a fixed memory size.

The code only uses a maximum of 60 seconds and only counts the number of messages and the sum of sizes.

Copy link
Member Author

@jeff-hykin jeff-hykin Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I'll come back to this after the rosnav stuff. I just PR'd thinking a 30sec fix would be helpful

If we change the approach away from copies, I'm guessing we'd probably also want to redo these existing lines in GraphLCMSpy.

        while not self.graph_log_stop_event.is_set():
            self.update_graphs(self.config.graph_log_window)  # type: ignore[attr-defined]
            # Copy to list to avoid RuntimeError: dictionary changed size during iteration
            for topic in list(self.topic.values()):
                topic.update_graphs(self.config.graph_log_window)  # type: ignore[attr-defined]
            time.sleep(self.config.graph_log_window)  # type: ignore[attr-defined]

mypy in that file could also use help

jeff/fix/lcmspy2 has changes for a proper fix (except for GraphLCMSpy), but I don't want to get too invested (that branch is untested)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants