Batch ContentSummaryLog queries in _consolidate_courses_data by devswithme · Pull Request #14444 · learningequality/kolibri

devswithme · 2026-03-25T08:59:23Z

Summary

Reduces database queries in the learn plugin at _consolidate_courses_data from 2N to N+1, where N is the number of courses a learner is enrolled in. Previously, for each course node the function issued two separate queries: one COUNT for non-topic descendants and one COUNT on ContentSummaryLog for completed content. On a learner's home page or course list with 10 enrolled courses, this produced 20 queries.

References

N/A — standalone performance improvement, no related issue.

Reviewer guidance

The only changed function is _consolidate_courses_data in
kolibri/plugins/learn/viewsets.py.
Run pytest kolibri/plugins/learn/test/test_learner_course.py and
pytest kolibri/plugins/learn/test/test_learner_classroom.py to
confirm all existing tests pass.
No risky areas: logic is equivalent, only query count changes.

AI usage

I used Claude to identify the N+1 query pattern and propose the batching approach and reviewed the generated code for correctness specifically verifying UUID type consistency between ContentNode and ContentSummaryLog, the double-iteration of the course_nodes query set, and the empty-set guard and confirmed the change produces identical output across all existing test cases before accepting it.

rtibbles · 2026-03-25T15:56:10Z

Hi @devswithme - could you open an issue for this so we can discuss further before jumping to a solution? I agree with the assessment that there is some potential for optimization here, but I'd rather plan it out a bit more carefully.

devswithme · 2026-03-26T01:24:40Z

Yes can, Thank you @rtibbles.

learning-equality-bot · 2026-04-22T14:29:42Z

📢✨ Before we assign a reviewer, we'll turn on @rtibblesbot to pre-review. Its comments are generated by an LLM, and should be evaluated accordingly.

rtibblesbot

Clean optimization with correct semantics. CI passing. No UI files changed.

suggestion: The cross-course batching has no multi-course progress test — see inline.
praise: Empty-set guard and deduplication — see inline.

@rtibblesbot's comments are generated by an LLM, and should be evaluated accordingly

How was this generated?

Reviewed the pull request diff checking for:

Correctness: bugs, edge cases, undocumented behavior, resource leaks, hardcoded values
Design: unnecessary complexity, naming, readability, comment accuracy, redundant state
Architecture: duplicated concerns, minimal interfaces, composition over inheritance
Testing: behavior-based assertions, mocks only at hard boundaries, accurate coverage
Completeness: missing dependencies, unupdated usages, i18n, accessibility, security
Principles: DRY (same reason to change), SRP, Rule of Three (no premature abstraction)
Checked CI status and linked issue acceptance criteria
For UI changes: inspected screenshots for layout, visual completeness, and consistency

rtibblesbot · 2026-04-22T14:35:21Z

        )

-        total_content = content_qs.count()
+    all_content_ids = {cid for ids in course_content_map.values() for cid in ids}


suggestion: all_content_ids merges content across all N courses, but the test suite only has a single-course progress test (test_learner_course_progress_calculated_correctly). The multi-course batching path — where two courses share the same completed_ids set and each gets its own per-course intersection — has no direct test. Consider a test with 2 courses having different completion states to confirm the batched lookup produces the same per-course progress values as the old per-course queries did.

rtibblesbot · 2026-04-22T14:35:21Z

-        total_content = content_qs.count()
+    all_content_ids = {cid for ids in course_content_map.values() for cid in ids}
+    completed_ids = set()
+    if all_content_ids:


praise: Good guard — skips the ContentSummaryLog query entirely when no courses have any content. Also worth noting: using set() on the query result means the new code correctly handles the edge case where ContentSummaryLog has multiple rows for the same (user, content_id) pair (the model has no unique constraint). The old .count() approach would have overcounted in that scenario; the new approach counts each logical completion exactly once.

Batch ContentSummaryLog queries in _consolidate_courses_data

2afebc7

devswithme mentioned this pull request Mar 26, 2026

[Enhancement]: Batch ContentSummaryLog queries in _consolidate_courses_data #14453

Open

rtibbles self-assigned this Apr 21, 2026

rtibbles requested a review from rtibblesbot April 22, 2026 14:29

rtibblesbot reviewed Apr 22, 2026

View reviewed changes

akolson requested a review from rtibbles May 4, 2026 19:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Batch ContentSummaryLog queries in _consolidate_courses_data#14444

Batch ContentSummaryLog queries in _consolidate_courses_data#14444
devswithme wants to merge 1 commit into
learningequality:developfrom
devswithme:optimize/batch-course-progress-summary-queries

devswithme commented Mar 25, 2026

Uh oh!

rtibbles commented Mar 25, 2026

Uh oh!

devswithme commented Mar 26, 2026

Uh oh!

learning-equality-bot Bot commented Apr 22, 2026

Uh oh!

rtibblesbot left a comment

Uh oh!

rtibblesbot Apr 22, 2026

Uh oh!

rtibblesbot Apr 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

devswithme commented Mar 25, 2026

Summary

References

Reviewer guidance

AI usage

Uh oh!

rtibbles commented Mar 25, 2026

Uh oh!

devswithme commented Mar 26, 2026

Uh oh!

learning-equality-bot Bot commented Apr 22, 2026

Uh oh!

rtibblesbot left a comment

Choose a reason for hiding this comment

Uh oh!

rtibblesbot Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

rtibblesbot Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants