⚡ Bolt: [_distance_rmse 3D broadcasting optimization] #52
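Fixed the very large intermediate array allocation (O(N*J*D)) in `_distance_rmse` in `python/fast_mlsirm/diagnostics.py`. The 3D array broadcasting was replaced with an approach built on `np.einsum` and `np.dot` whose intermediates are O(N*J), which reduces memory usage and substantially improves speed.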
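To put the O(N*J*D) versus O(N*J) difference in concrete terms, here is a rough back-of-the-envelope sketch. The helper name `intermediate_bytes` and the example sizes are made up for illustration and assume float64 (8-byte) elements; they are not measurements from this PR.

```python
def intermediate_bytes(n, j, d, itemsize=8):
    """Estimate array sizes in bytes for an N x J pairwise-distance computation.

    itemsize=8 assumes float64 elements (an assumption, not taken from the PR).
    """
    broadcast_3d = n * j * d * itemsize  # shape (N, J, D) difference array
    result_2d = n * j * itemsize         # shape (N, J) squared-distance matrix
    return broadcast_3d, result_2d


# Hypothetical sizes: 10,000 respondents, 500 items, 3 latent dimensions.
big, small = intermediate_bytes(n=10_000, j=500, d=3)
print(f"3D intermediate: {big / 1e6:.0f} MB, 2D result: {small / 1e6:.0f} MB")
# prints "3D intermediate: 120 MB, 2D result: 40 MB"
```

The ratio between the two is simply D, so the saving grows with the latent dimension. The `** 2` step in the broadcast version may allocate a second temporary of the same 3D size on top of this estimate.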
Pull request overview
This PR optimizes the _distance_rmse diagnostic by replacing a memory-heavy 3D broadcasting Euclidean distance computation with an algebraic expansion using np.einsum + np.dot, and documents the optimization in the Jules Bolt notes.
Changes:
- Replaced `((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=2)` broadcasting with `x² + y² - 2xy` using `einsum`/`dot` in `_distance_rmse`.
- Added a non-negativity clamp before `sqrt` to guard against small negative values from floating point roundoff.
- Documented the pairwise-distance broadcasting optimization pattern in `.jules/bolt.md`.
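The expansion listed above rests on the identity ‖x − y‖² = ‖x‖² + ‖y‖² − 2·x·y. The following is a minimal sketch that checks the two formulations against each other on random data; the function names and array shapes are illustrative assumptions, not the code in `diagnostics.py`.

```python
import numpy as np


def pairwise_dist_broadcast(x, y):
    """Reference version: materializes an (N, J, D) difference array."""
    diff = x[:, None, :] - y[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def pairwise_dist_expanded(x, y):
    """Expanded version: only (N,), (J,) and (N, J) arrays are created."""
    x_sq = np.einsum("ij,ij->i", x, x)  # row-wise squared norms, shape (N,)
    y_sq = np.einsum("ij,ij->i", y, y)  # row-wise squared norms, shape (J,)
    d_sq = x_sq[:, None] + y_sq[None, :] - 2.0 * np.dot(x, y.T)
    # Roundoff can push near-zero squared distances slightly negative,
    # so clamp before taking the square root.
    return np.sqrt(np.maximum(d_sq, 0.0))


rng = np.random.default_rng(0)
xi = rng.normal(size=(200, 2))   # hypothetical respondent positions
zeta = rng.normal(size=(30, 2))  # hypothetical item positions

reference = pairwise_dist_broadcast(xi, zeta)
expanded = pairwise_dist_expanded(xi, zeta)
print(np.allclose(reference, expanded, atol=1e-6))
```

One trade-off worth keeping in mind: the expanded form subtracts nearly equal numbers when two points are very close, so it is somewhat less accurate than the direct difference for near-zero distances. The clamp prevents NaNs but does not recover the lost digits.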
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| python/fast_mlsirm/diagnostics.py | Reworks _distance_rmse distance computation to avoid O(N*J*D) intermediate allocations. |
| .jules/bolt.md | Adds a note describing the pairwise-distance optimization approach. |
    # Optimized distance calculation using einsum/dot to avoid O(N*J*D) intermediate 3D array
    true_xi_sq = np.einsum('ij,ij->i', true_xi, true_xi)
    true_zeta_sq = np.einsum('ij,ij->i', true_zeta, true_zeta)
    true_d_sq = true_xi_sq[:, None] + true_zeta_sq[None, :] - 2 * np.dot(true_xi, true_zeta.T)
    true_d = np.sqrt(np.maximum(true_d_sq, 0.0))

    est_xi_sq = np.einsum('ij,ij->i', est_xi, est_xi)
    est_zeta_sq = np.einsum('ij,ij->i', est_zeta, est_zeta)
    est_d_sq = est_xi_sq[:, None] + est_zeta_sq[None, :] - 2 * np.dot(est_xi, est_zeta.T)
    est_d = np.sqrt(np.maximum(est_d_sq, 0.0))
Pull request overview
OpenCode reviewed the current-head evidence but found unresolved reviewer or review-agent threads before approval.
Findings
1. HIGH .github/workflows/opencode-review.yml:1 - Unresolved reviewer thread blocks automated approval
- Problem: OpenCode reached an APPROVE control result, but the approval step found unresolved, non-outdated human or review-agent thread evidence on the current pull request.
- Root cause: Reviewer and review-agent feedback can arrive after bounded model evidence is prepared, so the approval step must re-query GitHub immediately before publishing an approval.
- Fix: Address or resolve the listed reviewer thread(s), then re-run OpenCode on the current head.
- Regression test: Keep the approval gate querying reviewThreads(first: 100) after model output and before create_pull_review APPROVE, including bot review agents other than OpenCode itself.
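The gate described in the finding above can be sketched as a small pure function. This is only an illustration under stated assumptions: the flattened dictionary shape, the function name `blocking_threads`, and the login `opencode-bot` are hypothetical. The `isResolved` and `isOutdated` keys mirror the fields of GitHub's GraphQL `PullRequestReviewThread` object, but the actual workflow and its response handling are not shown in this thread. The choice to skip threads whose latest comment comes from the reviewer itself is likewise an assumption.

```python
def blocking_threads(threads, self_login):
    """Return the review threads that should block an automated approval.

    threads: list of dicts with keys "isResolved", "isOutdated" and
             "comments" (a list of {"author": login} dicts, oldest first).
             This flattened shape is an assumption made for the sketch.
    self_login: login of the approving bot, whose own threads are ignored.
    """
    blocking = []
    for thread in threads:
        if thread["isResolved"] or thread["isOutdated"]:
            continue  # resolved or outdated threads do not block
        comments = thread["comments"]
        if not comments:
            continue  # nothing to act on
        if comments[-1]["author"] == self_login:
            continue  # the bot's own latest comment does not block itself
        blocking.append(thread)
    return blocking


example = [
    {"isResolved": False, "isOutdated": False,
     "comments": [{"author": "copilot-pull-request-reviewer"}]},
    {"isResolved": True, "isOutdated": False,
     "comments": [{"author": "copilot-pull-request-reviewer"}]},
    {"isResolved": False, "isOutdated": True,
     "comments": [{"author": "some-human"}]},
    {"isResolved": False, "isOutdated": False,
     "comments": [{"author": "opencode-bot"}]},
]
print(len(blocking_threads(example, self_login="opencode-bot")))  # prints 1
```

Running a check like this after the model output and immediately before publishing the approval is what keeps late-arriving reviewer feedback from being skipped.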
Review thread evidence
Latest unresolved reviewer thread evidence
python/fast_mlsirm/diagnostics.py line 738
- Latest reviewer comment: @copilot-pull-request-reviewer at 2026-07-01T18:41:24Z
- Comment URL: #52 (comment)
- Comment excerpt: 'np.maximum(true_d_sq, 0.0)' and 'np.sqrt(...)' both allocate new (N×J) arrays. Since this PR is targeting memory pressure, you can do the clamp and sqrt in-place via 'out=' and by reusing the 'np.dot' result to reduce peak memory.
python/fast_mlsirm/diagnostics.py line 743
- Latest reviewer comment: @copilot-pull-request-reviewer at 2026-07-01T18:41:24Z
- Comment URL: #52 (comment)
- Comment excerpt: Same as the true-distance block: 'np.maximum(est_d_sq, 0.0)' and 'np.sqrt(...)' allocate additional (N×J) arrays. Reusing 'est_d_sq' in-place reduces peak memory for large N/J.

- Result: REQUEST_CHANGES
- Reason: unresolved reviewer or review-agent thread(s) were present before approval.
- Head SHA: 13adeecfd1a0a27a9e10acdc2e481a92774f1147
- Workflow run: 28539655343
- Workflow attempt: 1
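The in-place suggestion quoted in the reviewer excerpts above could look roughly like the sketch below. It is one possible reading of that comment, not the change that was pushed to the PR; the function name `pairwise_dist_inplace` is hypothetical, and the inputs are coerced to float64 so the in-place updates are valid.

```python
import numpy as np


def pairwise_dist_inplace(x, y):
    """Pairwise Euclidean distances with a single (N, J) allocation."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    x_sq = np.einsum("ij,ij->i", x, x)  # shape (N,)
    y_sq = np.einsum("ij,ij->i", y, y)  # shape (J,)

    d = np.dot(x, y.T)         # the only (N, J) array that gets allocated
    d *= -2.0                  # reuse the dot-product buffer from here on
    d += x_sq[:, None]
    d += y_sq[None, :]
    np.maximum(d, 0.0, out=d)  # clamp roundoff negatives in place
    np.sqrt(d, out=d)          # square root in place
    return d


rng = np.random.default_rng(1)
xi = rng.normal(size=(50, 3))    # hypothetical inputs for a quick check
zeta = rng.normal(size=(20, 3))
direct = np.sqrt(((xi[:, None, :] - zeta[None, :, :]) ** 2).sum(axis=2))
print(np.allclose(pairwise_dist_inplace(xi, zeta), direct, atol=1e-6))
```

Compared with `np.sqrt(np.maximum(d_sq, 0.0))`, this avoids the extra (N, J) temporaries created by the sum, the clamp, and the square root, at the cost of slightly less readable code.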
Changed-File Evidence Map
flowchart LR
PR["PR changed files"] --> Evidence["OpenCode bounded evidence"]
Evidence --> S1["Changed file (2 files)"]
S1 --> I1["repository behavior"]
I1 --> R1["Review risk: Changed file (2 files)"]
R1 --> V1["required checks"]
💡 What:
In `_distance_rmse`, the memory-intensive 3D array broadcasting logic used to compute Euclidean distances was replaced with an approach based on `np.einsum` and `np.dot`.
🎯 Why:
3D broadcasting such as `((true_xi[:, None, :] - true_zeta[None, :, :]) ** 2)` triggers a very large `O(N * J * D)` memory allocation, which becomes a major cause of system bottlenecks and performance degradation.
📊 Impact:
🔬 Measurement:
Ran `python -m pytest tests`; all tests passed, and performance measurement was completed.
PR created automatically by Jules for task 5546963417972343883 started by @seonghobae