Batch ContentSummaryLog queries in _consolidate_courses_data#14444
Batch ContentSummaryLog queries in _consolidate_courses_data#14444devswithme wants to merge 1 commit into
Conversation
|
Hi @devswithme - could you open an issue for this so we can discuss further before jumping to a solution? I agree with the assessment that there is some potential for optimization here, but I'd rather plan it out a bit more carefully. |
|
Yes can, Thank you @rtibbles. |
|
📢✨ Before we assign a reviewer, we'll turn on |
rtibblesbot
left a comment
There was a problem hiding this comment.
Clean optimization with correct semantics. CI passing. No UI files changed.
- suggestion: The cross-course batching has no multi-course progress test — see inline.
- praise: Empty-set guard and deduplication — see inline.
@rtibblesbot's comments are generated by an LLM, and should be evaluated accordingly
How was this generated?
Reviewed the pull request diff checking for:
- Correctness: bugs, edge cases, undocumented behavior, resource leaks, hardcoded values
- Design: unnecessary complexity, naming, readability, comment accuracy, redundant state
- Architecture: duplicated concerns, minimal interfaces, composition over inheritance
- Testing: behavior-based assertions, mocks only at hard boundaries, accurate coverage
- Completeness: missing dependencies, unupdated usages, i18n, accessibility, security
- Principles: DRY (same reason to change), SRP, Rule of Three (no premature abstraction)
- Checked CI status and linked issue acceptance criteria
- For UI changes: inspected screenshots for layout, visual completeness, and consistency
| ) | ||
|
|
||
| total_content = content_qs.count() | ||
| all_content_ids = {cid for ids in course_content_map.values() for cid in ids} |
There was a problem hiding this comment.
suggestion: all_content_ids merges content across all N courses, but the test suite only has a single-course progress test (test_learner_course_progress_calculated_correctly). The multi-course batching path — where two courses share the same completed_ids set and each gets its own per-course intersection — has no direct test. Consider a test with 2 courses having different completion states to confirm the batched lookup produces the same per-course progress values as the old per-course queries did.
| total_content = content_qs.count() | ||
| all_content_ids = {cid for ids in course_content_map.values() for cid in ids} | ||
| completed_ids = set() | ||
| if all_content_ids: |
There was a problem hiding this comment.
praise: Good guard — skips the ContentSummaryLog query entirely when no courses have any content. Also worth noting: using set() on the query result means the new code correctly handles the edge case where ContentSummaryLog has multiple rows for the same (user, content_id) pair (the model has no unique constraint). The old .count() approach would have overcounted in that scenario; the new approach counts each logical completion exactly once.
Summary
Reduces database queries in the learn plugin at
_consolidate_courses_datafrom 2N to N+1, where N is the number of courses a learner is enrolled in. Previously, for each course node the function issued two separate queries: oneCOUNTfor non-topic descendants and oneCOUNTonContentSummaryLogfor completed content. On a learner's home page or course list with 10 enrolled courses, this produced 20 queries.References
N/A — standalone performance improvement, no related issue.
Reviewer guidance
_consolidate_courses_datainkolibri/plugins/learn/viewsets.py.pytest kolibri/plugins/learn/test/test_learner_course.pyandpytest kolibri/plugins/learn/test/test_learner_classroom.pytoconfirm all existing tests pass.
AI usage
I used Claude to identify the N+1 query pattern and propose the batching approach and reviewed the generated code for correctness specifically verifying UUID type consistency between
ContentNodeandContentSummaryLog, the double-iteration of thecourse_nodesquery set, and the empty-set guard and confirmed the change produces identical output across all existing test cases before accepting it.