perf: skip fragment splitting for single-fragment molecules in MolToSmiles#6
Open
perf: skip fragment splitting for single-fragment molecules in MolToSmiles#6
Conversation
…lecules For the common case of single-fragment molecules, avoid the full getMolFrags machinery (which copies and splits) and instead make a single RWMol copy. Also extract fragment processing into a shared helper and use ostringstream for SMILES construction.
== /workdir/tmp/bench_results/master.txt → /workdir/tmp/bench_results/opt-smiles-write.txt == Benchmark Old (mean±sd) New (mean±sd) Δ% Speed Sig(p) ------------------------------------------------------------------------------------------------------------------------------- bench_common::nth_random 0.548 ns ± 0.00752 ns 0.548 ns ± 0.00368 ns -0.1% 1.00× faster ns p=0.944 Chirality::findPotentialStereo 587 us ± 34.9 us 597 us ± 36.7 us +1.6% 1.02× slower ns p=0.741 CIPLabeler::assignCIPLabels 266 ms ± 816 us 264 ms ± 929 us -0.7% 1.01× faster below p=0.00551 Descriptors::calcNumBridgeheadAtoms 1.03 us ± 25.9 ns 1.05 us ± 3.36 ns +1.9% 1.02× slower ns p=0.188 Descriptors::calcNumSpiroAtoms 1.22 us ± 21.1 ns 1.25 us ± 8.36 ns +2.5% 1.03× slower sig p=0.0179 InchiToInchiKey 45.7 us ± 6.16 us 35.6 us ± 4.58 us -22.0% 1.28× faster sig p=0.0235 InchiToMol 4.24 ms ± 96 us 4.18 ms ± 42.4 us -1.3% 1.01× faster ns p=0.359 memory pressure test 10 us ± 1.35 us 11.1 us ± 412 ns +11.1% 1.11× slower ns p=0.174 MolOps::addHs 419 us ± 1.16 us 419 us ± 7.86 us -0.1% 1.00× faster ns p=0.932 MolOps::assignStereochemistry legacy=false 734 us ± 25 us 760 us ± 22.4 us +3.5% 1.04× slower ns p=0.184 MolOps::assignStereochemistry legacy=true 474 us ± 8.14 us 486 us ± 16.9 us +2.5% 1.02× slower ns p=0.277 MolOps::FindSSR 192 us ± 22.1 us 174 us ± 7.4 us -9.4% 1.10× faster ns p=0.179 MolOps::getMolFrags 1.07 ms ± 12.8 us 1.09 ms ± 43 us +2.1% 1.02× slower ns p=0.381 MolPickler::molFromPickle 173 us ± 8.94 us 168 us ± 11.8 us -2.6% 1.03× faster ns p=0.602 MolPickler::pickleMol 108 us ± 6.56 us 107 us ± 7.3 us -1.1% 1.01× faster ns p=0.833 MolToCXSmiles 1.28 ms ± 35.4 us 1.23 ms ± 38.6 us -4.0% 1.04× faster ns p=0.0905 MolToInchi 2.06 ms ± 31.3 us 2.05 ms ± 46.8 us -0.2% 1.00× faster ns p=0.896 MolToSmiles 963 us ± 36.9 us 964 us ± 48.5 us +0.1% 1.00× slower ns p=0.979 MorganFingerprints::getFingerprint 185 us ± 628 ns 198 us ± 10.4 us +6.6% 1.07× slower sig p=0.0423 ROMol copy constructor 43.6 us ± 592 ns 42.5 us ± 841 ns -2.5% 1.03× faster ns p=0.0717 ROMol destructor 17.3 us ± 92.1 ns 17.5 us ± 264 ns +1.1% 1.01× slower ns p=0.241 ROMol::getNumHeavyAtoms 107 ns ± 4.37 ns 106 ns ± 1.95 ns -1.6% 1.02× faster ns p=0.528 ROMol::GetSubstructMatch 32.5 us ± 417 ns 32.5 us ± 419 ns +0.2% 1.00× slower ns p=0.863 ROMol::GetSubstructMatch patty 1.14 ms ± 39.2 us 1.13 ms ± 21.5 us -0.8% 1.01× faster ns p=0.739 ROMol::GetSubstructMatch RLewis 7 ms ± 12.5 us 7.04 ms ± 71.4 us +0.6% 1.01× slower ns p=0.333 SmilesToMol 1.25 ms ± 43.3 us 1.2 ms ± 23.8 us -3.9% 1.04× faster ns p=0.0893 Summary: sig=3, below=1 (stat-sig but < threshold), ns=22, missing=0, new_only=0
f6c0e65 to
26fba03
Compare
== /home/ubuntu/rdkits/tmp/bench_results/master.txt → /home/ubuntu/rdkits/tmp/bench_results/opt-smiles-write.txt == Benchmark Old (mean±sd) New (mean±sd) Δ% Speed Sig(p) --------------------------------------------------------------------------------------------------------------------- bench_common::nth_random 1.67 ns ± 0.566 ns 1.6 ns ± 0.492 ns -3.8% 1.04× faster ns p=0.851 Chirality::findPotentialStereo 1.76 ms ± 315 us 1.71 ms ± 307 us -2.9% 1.03× faster ns p=0.793 CIPLabeler::assignCIPLabels 585 ms ± 9.24 ms 587 ms ± 7.72 ms +0.2% 1.00× slower ns p=0.792 Descriptors::calcNumBridgeheadAtoms 4.7 us ± 1.6 us 6.73 us ± 1.62 us +43.3% 1.43× slower sig p=0.0454 Descriptors::calcNumSpiroAtoms 6.3 us ± 2.35 us 5.49 us ± 1.9 us -12.9% 1.15× faster ns p=0.545 InchiToInchiKey 64.8 us ± 21.6 us 57.6 us ± 16.5 us -11.0% 1.12× faster ns p=0.556 InchiToMol 11.7 ms ± 493 us 11.7 ms ± 515 us -0.3% 1.00× faster ns p=0.915 memory pressure test 23.2 us ± 7.1 us 25.2 us ± 7.6 us +8.5% 1.08× slower ns p=0.672 MolOps::addHs 923 us ± 111 us 992 us ± 25.4 us +7.5% 1.07× slower ns p=0.176 MolOps::assignStereochemistry legacy=false 2.1 ms ± 266 us 2.34 ms ± 325 us +11.4% 1.11× slower ns p=0.202 MolOps::assignStereochemistry legacy=true 1.14 ms ± 153 us 1.18 ms ± 206 us +3.4% 1.03× slower ns p=0.732 MolOps::FindSSR 418 us ± 68.5 us 386 us ± 59.3 us -7.6% 1.08× faster ns p=0.43 MolOps::getMolFrags 2.64 ms ± 377 us 2.53 ms ± 301 us -3.9% 1.04× faster ns p=0.635 MolPickler::molFromPickle 329 us ± 67.4 us 346 us ± 62.7 us +5.0% 1.05× slower ns p=0.689 MolPickler::pickleMol 280 us ± 55.3 us 309 us ± 68.3 us +10.6% 1.11× slower ns p=0.453 MolToCXSmiles 2.79 ms ± 247 us 2.59 ms ± 234 us -7.3% 1.08× faster ns p=0.18 MolToInchi 6.27 ms ± 599 us 6.5 ms ± 506 us +3.7% 1.04× slower ns p=0.512 MolToSmiles 2.16 ms ± 230 us 2.16 ms ± 229 us +0.3% 1.00× slower ns p=0.968 MorganFingerprints::getFingerprint 778 us ± 122 us 744 us ± 120 us -4.4% 1.05× faster ns p=0.657 ROMol copy constructor 220 us ± 19.7 us 202 us ± 35.3 us -8.1% 1.09× faster ns p=0.321 ROMol destructor 79.3 us ± 19.7 us 74.1 us ± 17.8 us -6.5% 1.07× faster ns p=0.662 ROMol::getNumHeavyAtoms 374 ns ± 69.4 ns 394 ns ± 72 ns +5.3% 1.05× slower ns p=0.658 ROMol::GetSubstructMatch 138 us ± 33 us 130 us ± 34.3 us -5.8% 1.06× faster ns p=0.709 ROMol::GetSubstructMatch patty 3.67 ms ± 576 us 3.59 ms ± 430 us -2.2% 1.02× faster ns p=0.8 ROMol::GetSubstructMatch RLewis 24.6 ms ± 1.25 ms 23.5 ms ± 584 us -4.5% 1.05× faster ns p=0.069 SmilesToMol 3.75 ms ± 176 us 3.31 ms ± 387 us -11.6% 1.13× faster sig p=0.0223 Summary: sig=2, below=0 (stat-sig but < threshold), ns=24, missing=0, new_only=0
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
summary by opus-4.6 via opencode:
Add a fast path in
MolToSmilesthat skips the expensivegetMolFrags()call when the molecule has only one fragment (the common case). Fragment count is checked via a cheap BFS-based connected component count.Benchmark (5 rounds × 100 samples, interleaved):
No significant impact on MolToSmiles (+0.3%). Significant results in unrelated benchmarks only — noise.
Benchmark Results (vs master)
Summary: sig=2, below=0 (stat-sig but < threshold), ns=24, missing=0, new_only=0