Skip to content

Batch ContentSummaryLog queries in _consolidate_courses_data#14444

Open
devswithme wants to merge 1 commit into
learningequality:developfrom
devswithme:optimize/batch-course-progress-summary-queries
Open

Batch ContentSummaryLog queries in _consolidate_courses_data#14444
devswithme wants to merge 1 commit into
learningequality:developfrom
devswithme:optimize/batch-course-progress-summary-queries

Conversation

@devswithme
Copy link
Copy Markdown

Summary

Reduces database queries in the learn plugin at _consolidate_courses_data from 2N to N+1, where N is the number of courses a learner is enrolled in. Previously, for each course node the function issued two separate queries: one COUNT for non-topic descendants and one COUNT on ContentSummaryLog for completed content. On a learner's home page or course list with 10 enrolled courses, this produced 20 queries.

References

N/A — standalone performance improvement, no related issue.

Reviewer guidance

  • The only changed function is _consolidate_courses_data in
    kolibri/plugins/learn/viewsets.py.
  • Run pytest kolibri/plugins/learn/test/test_learner_course.py and
    pytest kolibri/plugins/learn/test/test_learner_classroom.py to
    confirm all existing tests pass.
  • No risky areas: logic is equivalent, only query count changes.

AI usage

I used Claude to identify the N+1 query pattern and propose the batching approach and reviewed the generated code for correctness specifically verifying UUID type consistency between ContentNode and ContentSummaryLog, the double-iteration of the course_nodes query set, and the empty-set guard and confirmed the change produces identical output across all existing test cases before accepting it.

@rtibbles
Copy link
Copy Markdown
Member

Hi @devswithme - could you open an issue for this so we can discuss further before jumping to a solution? I agree with the assessment that there is some potential for optimization here, but I'd rather plan it out a bit more carefully.

@devswithme
Copy link
Copy Markdown
Author

Yes can, Thank you @rtibbles.

@learning-equality-bot
Copy link
Copy Markdown

📢✨ Before we assign a reviewer, we'll turn on @rtibblesbot to pre-review. Its comments are generated by an LLM, and should be evaluated accordingly.

Copy link
Copy Markdown
Contributor

@rtibblesbot rtibblesbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean optimization with correct semantics. CI passing. No UI files changed.

  • suggestion: The cross-course batching has no multi-course progress test — see inline.
  • praise: Empty-set guard and deduplication — see inline.

@rtibblesbot's comments are generated by an LLM, and should be evaluated accordingly

How was this generated?

Reviewed the pull request diff checking for:

  • Correctness: bugs, edge cases, undocumented behavior, resource leaks, hardcoded values
  • Design: unnecessary complexity, naming, readability, comment accuracy, redundant state
  • Architecture: duplicated concerns, minimal interfaces, composition over inheritance
  • Testing: behavior-based assertions, mocks only at hard boundaries, accurate coverage
  • Completeness: missing dependencies, unupdated usages, i18n, accessibility, security
  • Principles: DRY (same reason to change), SRP, Rule of Three (no premature abstraction)
  • Checked CI status and linked issue acceptance criteria
  • For UI changes: inspected screenshots for layout, visual completeness, and consistency

)

total_content = content_qs.count()
all_content_ids = {cid for ids in course_content_map.values() for cid in ids}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: all_content_ids merges content across all N courses, but the test suite only has a single-course progress test (test_learner_course_progress_calculated_correctly). The multi-course batching path — where two courses share the same completed_ids set and each gets its own per-course intersection — has no direct test. Consider a test with 2 courses having different completion states to confirm the batched lookup produces the same per-course progress values as the old per-course queries did.

total_content = content_qs.count()
all_content_ids = {cid for ids in course_content_map.values() for cid in ids}
completed_ids = set()
if all_content_ids:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

praise: Good guard — skips the ContentSummaryLog query entirely when no courses have any content. Also worth noting: using set() on the query result means the new code correctly handles the edge case where ContentSummaryLog has multiple rows for the same (user, content_id) pair (the model has no unique constraint). The old .count() approach would have overcounted in that scenario; the new approach counts each logical completion exactly once.

@akolson akolson requested a review from rtibbles May 4, 2026 19:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants