Conversation
Greptile Summary

This PR fixes a crash in lcmspy where the graph log thread iterates over `message_history` while the LCM callback thread mutates it, raising a RuntimeError; the fix takes a `list()` snapshot of the deque before iterating.
Confidence Score: 5/5
Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant LCMThread as LCM Callback Thread
    participant GraphThread as Graph Log Thread
    participant Deque as message_history (deque)
    LCMThread->>Deque: append((timestamp, size))
    LCMThread->>Deque: popleft() [_cleanup_old_messages]
    GraphThread->>Deque: list(deque) [snapshot — NEW FIX]
    Note over GraphThread,Deque: Safe snapshot prevents<br/>RuntimeError during iteration
    GraphThread->>GraphThread: iterate snapshot [(ts, size) for ...]
    GraphThread->>GraphThread: update_graphs()
```
Last reviewed commit: 67d827f
```diff
         current_time = time.time()
         cutoff_time = current_time - time_window
-        return [(ts, size) for ts, size in self.message_history if ts >= cutoff_time]
+        return [(ts, size) for ts, size in list(self.message_history) if ts >= cutoff_time]
```
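The failure the diff guards against can be reproduced without threads, because `collections.deque` detects mutation while an iterator over it is still live. A minimal sketch (standalone, not code from the repo):

```python
from collections import deque

# A deque raises RuntimeError if it is mutated while an iterator over it
# is still live; this is how the cross-thread crash in lcmspy shows up.
history = deque([(0.0, 10), (1.0, 20)])

it = iter(history)
history.append((2.0, 30))  # simulate the LCM thread appending mid-iteration
try:
    next(it)
    raised = False
except RuntimeError:
    raised = True

# Snapshotting first decouples iteration from later mutation.
snapshot = list(history)
history.append((3.0, 40))  # mutation after the snapshot is harmless
sizes = [size for _, size in snapshot]
```

Here `raised` ends up `True`, while iterating the snapshot is unaffected by the later append.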
Works as a quick fix, but I don't know if it makes sense to duplicate the list every time you iterate over it. At a glance, I can't tell whether the deque has a size limit; if it doesn't, the copy just uses more memory as the deque grows.

An alternative solution would be to wrap the accesses and mutations in `with self._lock`.
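That lock-based alternative might look roughly like this (a hypothetical sketch; the class and method names are illustrative, not taken from the repo):

```python
import threading
import time
from collections import deque


class MessageStats:
    """Hypothetical sketch: guard every access to message_history
    with one explicit lock instead of copying the deque."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.message_history: deque = deque()

    def record(self, timestamp: float, size: int) -> None:
        # Writer side (LCM callback thread).
        with self._lock:
            self.message_history.append((timestamp, size))

    def messages_in_window(self, time_window: float) -> list:
        # Reader side (graph log thread): iterate under the lock,
        # so no copy of the deque is ever made.
        cutoff = time.time() - time_window
        with self._lock:
            return [(ts, size) for ts, size in self.message_history if ts >= cutoff]
```

The trade-off is exactly the one debated below: no per-iteration copy, but the writer blocks while the reader holds the lock.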
I actually think this is the ideal fix. AFAIK `_get_messages_in_window` wants a copy because it's trying to take a snapshot at a particular time. I believe `list(deque)` is atomic, so it grabs the snapshot as intended.

Blocking all modifications to the deque (`self.message_history`) until the list comprehension finishes would, I think, be slower and more complex than grabbing a copy and then iterating over the copy.
I can change it to a tuple instead of a list (I need to run CI again to get past the flakey tests anyways)
`list(deque)` is atomic only because it's implemented in C and holds the GIL; that's an implementation detail. Python is already moving toward removing the GIL. Once that's mainstream, you could have Python code adding a value while the C code for `list()` is iterating over it.

So I don't think it's an ideal fix. It only prevents mutations by implication, because the code for `list()` happens to hold the global interpreter lock.

If you use `with lock`, you get both an explicit lock and lower memory use.

But I don't think `with lock` is the ideal solution either. Taking another look at the code, I think the ideal solution would be two circular buffers (one for sizes, one for message counts), each with 60 entries (a bucket for each second). When a message arrives, add its size to one buffer and increment the corresponding timestamp bucket in the other. That would use a fixed amount of memory.
The code only uses a maximum of 60 seconds and only counts the number of messages and the sum of sizes.
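For concreteness, the fixed-memory idea could look something like this (a hypothetical sketch under the 60-second assumption above; the class and names are illustrative, not from the repo, and it ignores locking for clarity):

```python
WINDOW_SECONDS = 60  # one bucket per second: fixed memory regardless of traffic


class SecondBuckets:
    """Two circular buffers: per-second message counts and byte totals.
    A bucket is reset when its slot is reused for a newer second, so
    stale data from a previous pass through the ring never leaks in."""

    def __init__(self) -> None:
        self.counts = [0] * WINDOW_SECONDS
        self.sizes = [0] * WINDOW_SECONDS
        self.stamps = [0] * WINDOW_SECONDS  # absolute second each bucket holds

    def add(self, timestamp: float, size: int) -> None:
        second = int(timestamp)
        idx = second % WINDOW_SECONDS
        if self.stamps[idx] > second:
            return  # message older than the ring: its bucket was already reused
        if self.stamps[idx] < second:  # slot holds an older second: reset it
            self.stamps[idx] = second
            self.counts[idx] = 0
            self.sizes[idx] = 0
        self.counts[idx] += 1
        self.sizes[idx] += size

    def totals(self, now: float, window: int = WINDOW_SECONDS) -> tuple:
        """Message count and byte total over the last `window` seconds."""
        cutoff = int(now) - window
        count = size = 0
        for idx in range(WINDOW_SECONDS):
            if self.stamps[idx] > cutoff:
                count += self.counts[idx]
                size += self.sizes[idx]
        return count, size
```

Since each `add` touches a single bucket, memory stays constant and the reader only ever scans 60 slots, no matter how fast messages arrive.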
I guess I'll come back to this after the rosnav stuff. I just PR'd thinking a 30sec fix would be helpful
If we change the approach away from copies, I'm guessing we'd probably also want to redo these existing lines in GraphLCMSpy.
```python
while not self.graph_log_stop_event.is_set():
    self.update_graphs(self.config.graph_log_window)  # type: ignore[attr-defined]
    # Copy to list to avoid RuntimeError: dictionary changed size during iteration
    for topic in list(self.topic.values()):
        topic.update_graphs(self.config.graph_log_window)  # type: ignore[attr-defined]
    time.sleep(self.config.graph_log_window)  # type: ignore[attr-defined]
```

mypy in that file could also use help
jeff/fix/lcmspy2 has changes for a proper fix (except for GraphLCMSpy), but I don't want to get too invested (that branch is untested)
Problem
lcmspy will eventually crash with a RuntimeError about a value being mutated while it is being iterated over
Solution
Copy the deque into a list before iterating over it
Breaking Changes
None
How to Test
Run lcmspy for a long time while a bunch of topics publish