perf: Logical replication perf experiments #667
bdewilde wants to merge 7 commits into MeltanoLabs:main from
Hi @bdewilde, thanks for taking the time to post this!
I'm not sure what this refers to. Is there a specific location in the code where you see this happening?
I'm reading through the performance docs and I don't see anything obvious we're missing other than using […]. The serialization itself with msgspec should be considerably faster even without using Structs, so I think we're doing something wrong elsewhere. I know try/except blocks inside loops can be expensive, and […]
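One rough way to check whether JSON encoding on its own accounts for much of the runtime is to time it in isolation, away from the rest of the tap. The sketch below is only an illustration under stated assumptions: the sample records and the `best_encode_time` helper are hypothetical and not taken from the tap or the SDK, and msgspec is only timed if it happens to be installed.

```python
import json
import timeit


def best_encode_time(encode, records, repeat=3):
    """Best-of-`repeat` wall-clock seconds to encode every record once."""
    return min(
        timeit.repeat(lambda: [encode(r) for r in records], number=1, repeat=repeat)
    )


# Hypothetical sample data, loosely shaped like Singer RECORD messages.
records = [
    {"type": "RECORD", "stream": "public-users", "record": {"id": i, "name": f"user-{i}"}}
    for i in range(10_000)
]

# The stdlib encoder is always available and serves as the baseline.
encoders = {"json.dumps": lambda r: json.dumps(r).encode("utf-8")}

try:
    import msgspec  # optional dependency; skipped when not installed

    # Reusing one Encoder instance avoids per-call setup cost.
    encoders["msgspec"] = msgspec.json.Encoder().encode
except ImportError:
    pass

timings = {name: best_encode_time(encode, records) for name, encode in encoders.items()}
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f}s for {len(records)} records")
```

If the gap between the two encoders is small relative to the total pipeline runtime, that would be consistent with the bottleneck sitting elsewhere in the call stack rather than in serialization.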
Sorry about the confusion! This was in reference to an earlier comment in the PR about apparent perf bottlenecks:
I'll try to take a peek at the SDK's usage of msgspec, thanks for the link to perf tips 👍
changes
adds msgspec as package dependency and sets MsgSpecWriter as tap's message writer class
context
Diving deeper into the code wrt issue #587
These changes may or may not be of interest for merging; they're mostly just a demonstration of what I found.
details
All together, these changes reduced runtimes for my example E+L pipeline from ~112s to ~107s (roughly 4%). Not very impressive, I know. Here's the py-spy flame graph based on this repo's main branch:
And here's the (very similar) equivalent based on the latest version of my branch:
My changes only touch a small part of the tap's call stack:
Unfortunately, I wasn't able to do anything about the slowest parts of the tap:
For what it's worth, here is the corresponding functionality in the pipelinewise variant's code. There's a lot of additional logic there, but I had a hard time understanding the key differences with this variant.
questions
msgspec?