[feat]: implement custom hybrid search#109
Walkthrough
A new hybrid search capability for contributors was implemented in the Weaviate database operations. This includes an asynchronous method in the […]
Changes
Sequence Diagram(s)
sequenceDiagram
participant Caller
participant WeaviateUserOperations
participant WeaviateDB
Caller->>WeaviateUserOperations: hybrid_search_contributors(query_embedding, keywords, ...)
WeaviateUserOperations->>WeaviateDB: vector_search(query_embedding, limit)
WeaviateDB-->>WeaviateUserOperations: vector_results
WeaviateUserOperations->>WeaviateDB: bm25_search(keywords, limit)
WeaviateDB-->>WeaviateUserOperations: bm25_results
WeaviateUserOperations->>WeaviateUserOperations: merge & score results
WeaviateUserOperations-->>Caller: top contributors list
Caller->>search_contributors: (calls hybrid_search_contributors internally)
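The "merge & score results" step in the diagram is not shown in this review, so the following is only a minimal sketch of one way it could work. It assumes each result is a dict with an `id` and a `score` already scaled to a comparable range; the function name `merge_hybrid_results` and the `hybrid_score` key are hypothetical and do not come from the PR.

```python
from typing import Any, Dict, List


def merge_hybrid_results(
    vector_results: List[Dict[str, Any]],
    bm25_results: List[Dict[str, Any]],
    limit: int = 10,
    vector_weight: float = 0.7,
    bm25_weight: float = 0.3,
) -> List[Dict[str, Any]]:
    """Combine two result lists into one ranking by weighted score sum.

    Assumption: every item has an "id" and a "score" on a comparable scale.
    """
    merged: Dict[str, Dict[str, Any]] = {}

    for weight, results in (
        (vector_weight, vector_results),
        (bm25_weight, bm25_results),
    ):
        for item in results:
            key = item["id"]
            if key not in merged:
                # Copy the item's fields, replacing the per-source score
                # with a running hybrid score.
                entry = {k: v for k, v in item.items() if k != "score"}
                entry["hybrid_score"] = 0.0
                merged[key] = entry
            merged[key]["hybrid_score"] += weight * item["score"]

    # Highest combined score first, truncated to the requested limit.
    ranked = sorted(
        merged.values(), key=lambda r: r["hybrid_score"], reverse=True
    )
    return ranked[:limit]
```

A contributor found by both searches accumulates both weighted scores, while one found by only a single search keeps just that portion, so the default 0.7/0.3 split favours vector matches.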
Estimated code review effort: 2 (~15 minutes)
Actionable comments posted: 1
🧹 Nitpick comments (1)
backend/app/database/weaviate/operations.py (1)
216-223: Consider adding weight validation for hybrid scoring.
The hybrid search implementation looks good, but consider validating that the weights sum to 1.0 for proper score normalization.
     async def hybrid_search_contributors(
         self,
         query_embedding: List[float],
         keywords: List[str],
         limit: int = 10,
         vector_weight: float = 0.7,
         bm25_weight: float = 0.3
     ) -> List[Dict[str, Any]]:
         """
         Hybrid search combining vector similarity and BM25 keyword search.
         """
+        if abs(vector_weight + bm25_weight - 1.0) > 0.001:
+            logger.warning(f"Hybrid search weights don't sum to 1.0: {vector_weight + bm25_weight}")
+
         try:
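The suggestion above only logs a warning. As a possible extension, not part of the PR or the review, the check could live in a small helper that also rescales the weights so later scoring stays normalized. The helper name `normalize_weights` and the rescaling behaviour are assumptions made for illustration.

```python
import logging
import math
from typing import Tuple

logger = logging.getLogger(__name__)


def normalize_weights(
    vector_weight: float, bm25_weight: float
) -> Tuple[float, float]:
    """Return the two weights rescaled so that they sum to 1.0.

    Hypothetical helper: warns when the inputs do not already sum to 1.0
    and rejects negative or all-zero weights.
    """
    if vector_weight < 0 or bm25_weight < 0:
        raise ValueError("Hybrid search weights must be non-negative")

    total = vector_weight + bm25_weight
    if total <= 0:
        raise ValueError("Hybrid search weights must sum to a positive value")

    # Same 0.001 tolerance as the check suggested in the review.
    if not math.isclose(total, 1.0, abs_tol=1e-3):
        logger.warning(
            "Hybrid search weights sum to %s, not 1.0; rescaling", total
        )

    return vector_weight / total, bm25_weight / total
```

Whether to rescale silently, warn, or raise is a design choice; warning and continuing matches the reviewer's non-blocking tone, while raising would surface misconfiguration earlier.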
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
backend/app/database/weaviate/__init__.py (2 hunks)
backend/app/database/weaviate/operations.py (2 hunks)
🧠 Learnings (1)
📓 Common learnings
Learnt from: smokeyScraper
PR: AOSSIE-Org/Devr.AI#87
File: tests/test_supabase.py:1-3
Timestamp: 2025-06-28T23:15:13.374Z
Learning: In the Devr.AI project, smokeyScraper prefers to defer test updates and fixes (like missing imports after module reorganization) to separate PRs rather than expanding the scope of module update/chore PRs to include comprehensive test refactoring.
Learnt from: smokeyScraper
PR: AOSSIE-Org/Devr.AI#90
File: backend/app/agents/devrel/nodes/react_supervisor.py:97-101
Timestamp: 2025-07-05T04:33:39.840Z
Learning: In the Devr.AI project, smokeyScraper prefers to defer code deduplication refactoring (like extracting duplicate functions to shared utilities) until there are more common functionalities present among tools/workflow. With only two files using the same function, they consider it not a problem currently and prefer to "align later in a more better way" once more patterns emerge.
Learnt from: smokeyScraper
PR: AOSSIE-Org/Devr.AI#85
File: tests/test_supabase.py:1-3
Timestamp: 2025-06-28T14:45:55.244Z
Learning: In the Devr.AI project, smokeyScraper prefers to defer comprehensive test refactoring to separate PRs/efforts when doing major backend restructuring, rather than expanding the scope of the current refactoring PR to include test updates.
Learnt from: smokeyScraper
PR: AOSSIE-Org/Devr.AI#85
File: backend/app/services/auth/management.py:32-33
Timestamp: 2025-06-28T14:44:36.819Z
Learning: In the Devr.AI project, smokeyScraper prefers using machine timezone (IST) for datetime operations during development and testing for easier debugging, with plans to switch to UTC for deployment later.
🔇 Additional comments (2)
backend/app/database/weaviate/__init__.py (1)
6-6: LGTM! Clean module export addition.
The new search_contributors function is properly imported and exported, making the hybrid search functionality accessible to external modules.
Also applies to: 17-17
backend/app/database/weaviate/operations.py (1)
360-373: LGTM! Clean convenience function implementation.
The function follows the established pattern of other convenience functions in the file and properly delegates to the class method.
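The delegation pattern praised here can be sketched roughly as follows. The body of `search_contributors` is not shown in this review, so its parameters are assumed to mirror `hybrid_search_contributors`, and the class below is a stand-in that returns canned data instead of querying Weaviate.

```python
import asyncio
from typing import Any, Dict, List


class WeaviateUserOperations:
    """Stand-in for the real class; it does not talk to Weaviate."""

    async def hybrid_search_contributors(
        self,
        query_embedding: List[float],
        keywords: List[str],
        limit: int = 10,
        vector_weight: float = 0.7,
        bm25_weight: float = 0.3,
    ) -> List[Dict[str, Any]]:
        # Canned response so the sketch runs without a database.
        return [{"keywords": keywords, "limit": limit}]


async def search_contributors(
    query_embedding: List[float],
    keywords: List[str],
    limit: int = 10,
    vector_weight: float = 0.7,
    bm25_weight: float = 0.3,
) -> List[Dict[str, Any]]:
    """Module-level convenience wrapper (assumed signature)."""
    operations = WeaviateUserOperations()
    return await operations.hybrid_search_contributors(
        query_embedding=query_embedding,
        keywords=keywords,
        limit=limit,
        vector_weight=vector_weight,
        bm25_weight=bm25_weight,
    )


if __name__ == "__main__":
    print(asyncio.run(search_contributors([0.1, 0.2], ["python"], limit=5)))
```

Keeping the wrapper this thin means callers can import a single function while all search logic stays in the class method.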
887679a to
43d479c
Summary by CodeRabbit