Skip to content

Conversation

@phernandez
Copy link
Member

Summary

  • map tag:<value> query prefix to tags metadata filter so tag searches work again
  • split multiple tags by commas/whitespace and drop text search when tag filter is applied
  • add positive and negative tests covering tag shorthand and multi-tag behavior across backends

Testing

  • uv run pytest tests/services/test_search_service.py -k "tag_prefix" -q

Signed-off-by: phernandez <paul@basicmachines.co>
@github-actions
Copy link
Contributor

github-actions bot commented Feb 1, 2026

Claude finished @phernandez's task in 3m 37s —— View job


Code Review: Support tag: query shorthand in search

I've completed a thorough review of this PR against the Basic Memory team checklist.


Summary

This PR implements tag:<value> query shorthand for tag filtering in search. The implementation is functionally sound and well-tested for the happy path. However, it needs improvements to meet the project's literate programming standards and documentation requirements.

Files changed:

  • search_service.py (+11 lines) - Core implementation
  • test_search_service.py (+75/-9 lines) - Test coverage

Team Checklist Results

✅ Code Quality & Standards

  • ✅ Python 3.12+ type annotations maintained
  • ✅ 100-character line length limit
  • ✅ Follows async patterns
  • ✅ Proper naming conventions
  • ✅ No commented-out code

❌ Missing: Literate programming comments

The new code at search_service.py:84-91 lacks decision point comments required by CLAUDE.md. Per the guidelines, conditionals that materially change behavior need:

  • Trigger: what condition causes this branch
  • Why: the rationale
  • Outcome: what changes downstream

Add comments here →

⚠️ Testing & Documentation

  • ✅ Unit tests added (3 tests covering positive, negative, multi-tag cases)
  • ✅ Test coverage appears maintained
  • Missing: User-facing documentation - The tag: syntax isn't documented anywhere
  • ⚠️ Missing edge case tests:
    • Empty tag value: tag:
    • Uppercase prefix: TAG:tier1
    • Mixed delimiters: tag:tier1, alpha beta
    • Special characters in tags: tag:my-tag_v2.1

Add edge case tests →

✅ Basic Memory Architecture

  • ✅ Uses existing metadata filter infrastructure
  • ✅ Preserves local-first architecture
  • ✅ Knowledge graph consistency maintained
  • ✅ No database schema changes needed

✅ Security & Performance

  • ✅ Input validation via safe regex
  • ✅ No hardcoded secrets
  • ✅ Error handling inherited from existing search
  • ⚠️ Minor regex overhead on every text query (likely negligible)

✅ Compatibility

  • ✅ Windows path compatibility not applicable
  • ✅ No emoji/unicode in console output

Detailed Issues & Questions

1. Input Mutation (search_service.py:90-91)

The code mutates the input query object:

query.tags = tags
query.text = None

Question: Is this mutation intentional? If the caller reuses the SearchQuery object, they'll get unexpected behavior on subsequent calls.

2. Text Search Dropped (search_service.py:91)

Setting query.text = None completely removes text search when tag filter is applied.

Questions:

  • Why is text search dropped? The PR description states this happens but doesn't explain the rationale
  • Should combined queries work? e.g., tag:tier1 important to find tagged entities also containing "important"
  • This should be documented in code comments explaining the design decision

3. Delimiter Ambiguity (search_service.py:87)

The regex r"[,\s]+" treats both commas AND whitespace as tag separators:

  • tag:tier1,alpha["tier1", "alpha"]
  • tag:tier1 alpha["tier1", "alpha"]
  • tag:tier1, alpha["tier1", "alpha"]

Question: Is whitespace as a separator intentional or just a side effect? Should be documented.

4. Case Sensitivity (search_service.py:86)

The prefix check is case-insensitive (text.lower().startswith("tag:")), but:

Question: Are tag values case-sensitive? Does tag:Tier1 match an entity with tier1? Tests don't verify this.

5. Test Assertions Could Be Stronger

Test at line 534 verifies results but doesn't verify the query transformation:

results = await search_service.search(SearchQuery(text="tag:tier1"))
assert any(r.permalink == tagged_entity.permalink for r in results)

Suggestion: Also assert that query.tags == ["tier1"] and query.text is None to verify the transformation actually occurred.


Recommendations

🔴 High Priority (Should address before merge)

  1. Add decision point comments per CLAUDE.md literate programming style - explains trigger, why, outcome
  2. Document the feature - add to search service docstring, MCP tool docs, or user guide
  3. Explain text search behavior - add comment explaining why query.text = None

🟡 Medium Priority (Recommended)

  1. Add edge case tests - empty values, uppercase, special chars, mixed delimiters
  2. Clarify case sensitivity - document or test tag value case handling
  3. Verify test coverage - run just coverage to confirm 100% coverage maintained

🟢 Low Priority (Nice to have)

  1. Document delimiter behavior - clarify that both , and whitespace work as separators
  2. Consider combined queries - decide if tag:tier1 searchterm should work
  3. Test assertion improvements - verify query transformation, not just results

Final Assessment

Functionality: ✅ Works as described
Code Quality: ⚠️ Missing literate programming comments (CLAUDE.md requirement)
Testing: ⚠️ Happy path covered, edge cases missing
Documentation: ❌ Feature not documented for users
Architecture: ✅ Follows project patterns

Recommendation: The core implementation is solid and functional. To meet Basic Memory's standards, address the high-priority items (literate programming comments and documentation) before merge. The edge case tests and other medium-priority items would strengthen the implementation but aren't blocking.


Signed-off-by: phernandez <paul@basicmachines.co>
@github-actions
Copy link
Contributor

github-actions bot commented Feb 1, 2026

Claude finished @phernandez's task in 2m 18s —— View job


Code Review: Support tag: query shorthand in search

I've completed a thorough review of this PR against the Basic Memory team checklist.


Summary

This PR implements tag:<value> query shorthand for tag filtering in search. The implementation is functionally sound with good test coverage for the happy path. However, it needs improvements to meet the project's literate programming standards and has a few design questions that should be addressed.

Files changed:

  • search_service.py:84-91 (+11 lines) - Core tag: prefix parsing
  • postgres_search_repository.py:335 (+1/-1 line) - Postgres CAST syntax fix
  • test_search_service.py (+75/-9 lines) - Test coverage

Team Checklist Results

✅ Code Quality & Standards

  • ✅ Python 3.12+ type annotations maintained
  • ✅ 100-character line length limit
  • ✅ Follows async patterns
  • ✅ Proper naming conventions
  • ✅ No commented-out code
  • ✅ Import order correct (added re module)

❌ Missing: Literate programming comments

The new code at search_service.py:84-91 lacks decision point comments required by CLAUDE.md. Per the guidelines, conditionals that materially change behavior need:

  • Trigger: what condition causes this branch
  • Why: the rationale
  • Outcome: what changes downstream

Example of what's needed:

# Support tag:<tag> shorthand by mapping to tags filter
if query.text:
    text = query.text.strip()
    # Trigger: query starts with "tag:" prefix (case-insensitive)
    # Why: provide convenient shorthand for tag filtering without verbose metadata syntax
    # Outcome: transforms to tags filter, drops text search to avoid empty results
    if text.lower().startswith("tag:"):
        # Split on both commas and whitespace to support: tag:tier1,alpha or tag:tier1 alpha
        tag_values = re.split(r"[,\s]+", text[4:].strip())
        tags = [t for t in tag_values if t]
        if tags:
            query.tags = tags
            # Outcome: text search is removed since tag prefix consumed entire query
            query.text = None

Add literate programming comments →


⚠️ Testing & Documentation

Testing:

  • ✅ Unit tests added (3 tests covering positive, negative, multi-tag cases)
  • ✅ Tests use descriptive names and docstrings
  • ✅ Test coverage appears maintained at 100%
  • ⚠️ Missing edge case tests:
    • Empty tag value: tag: (what happens?)
    • Uppercase prefix: TAG:tier1 (works due to .lower() but not verified)
    • Mixed delimiters: tag:tier1, alpha beta (regex handles but not tested)
    • Special characters in tags: tag:my-tag_v2.1 (common in real use)
    • Whitespace variations: tag: tier1, tag:tier1 (strip handles but not tested)

Documentation:

  • Missing: User-facing documentation - The tag: syntax isn't documented anywhere users would find it:
    • Not in MCP tool docstrings (where search is exposed to LLMs)
    • Not in CLAUDE.md MCP capabilities section
    • Not in service-level docstring for SearchService.search()

Suggestions:

  1. Add to SearchService.search() docstring:

    """Search across all indexed content.
    
    Supports three modes:
    1. Exact permalink: finds direct matches for a specific path
    2. Pattern match: handles * wildcards in paths
    3. Text search: full-text search across title/content
       - Supports tag: shorthand (e.g., "tag:tier1" or "tag:tier1,alpha")
  2. Update CLAUDE.md's "Search & Discovery" section to mention tag: syntax

Add edge case tests →


✅ Basic Memory Architecture

  • ✅ Uses existing metadata filter infrastructure (no schema changes)
  • ✅ Preserves local-first architecture
  • ✅ Knowledge graph consistency maintained
  • ✅ Follows service layer pattern
  • ✅ Postgres fix (CAST syntax) maintains cross-DB compatibility

✅ Security & Performance

  • ✅ Input validation via safe regex (no eval/exec)
  • ✅ No hardcoded secrets
  • ✅ Error handling inherited from existing search
  • ✅ No SQL injection risk (uses existing parameterized queries)
  • ⚠️ Minor regex overhead on every text query (likely negligible but runs on every search)

✅ Compatibility

  • ✅ Windows path compatibility not applicable
  • ✅ No emoji/unicode in console output
  • ✅ Cross-database support maintained (works on both SQLite and Postgres)

Detailed Issues & Questions

1. Input Mutation (search_service.py:90-91)

The code mutates the input query object:

query.tags = tags
query.text = None

Issue: If the caller reuses the SearchQuery object, they'll get unexpected behavior on subsequent calls.

Question: Is this mutation intentional? SearchQuery is a Pydantic schema - mutating input objects can lead to surprising behavior.

Recommendation: Either:

  • Document this mutation behavior clearly in comments/docstring
  • OR create a new query object (safer): query = query.model_copy(update={'tags': tags, 'text': None})

2. Text Search Dropped (search_service.py:91)

Setting query.text = None completely removes text search when tag filter is applied.

Questions:

  • Why is text search dropped? The PR description states this happens but doesn't explain the rationale
  • Should combined queries work? e.g., tag:tier1 important to find tagged entities also containing "important"
  • This design decision should be documented in code comments

Current behavior:

  • tag:tier1 important → searches for tags ["tier1", "important"] (treats "important" as a tag, not search text)

Possible alternative behavior:

  • tag:tier1 important → searches for tags=["tier1"] AND text="important"

Recommendation: Add a comment explaining why text is dropped and whether this is the desired long-term behavior.


3. Delimiter Ambiguity (search_service.py:87)

The regex r"[,\s]+" treats both commas AND whitespace as tag separators:

  • tag:tier1,alpha["tier1", "alpha"]
  • tag:tier1 alpha["tier1", "alpha"]
  • tag:tier1, alpha["tier1", "alpha"]

Questions:

  • Is whitespace as a separator intentional or just a side effect?
  • Should tag:tier1 alpha be one tag with space ("tier1 alpha") or two tags?
  • If intentional, should be documented

Recommendation: Add comment documenting delimiter behavior:

# Split on both commas and whitespace: "tag:tier1,alpha" or "tag:tier1 alpha" both work
tag_values = re.split(r"[,\s]+", text[4:].strip())

4. Case Sensitivity (search_service.py:86)

The prefix check is case-insensitive (text.lower().startswith("tag:")):

  • tag:tier1
  • TAG:tier1
  • Tag:tier1

Questions:

  • Are tag values case-sensitive?
  • Does tag:Tier1 match an entity with tier1 in frontmatter?
  • Should tag values be normalized to lowercase?

Recommendation: Add test verifying case handling for both prefix and tag values.


5. Test Assertions Could Be Stronger

Test at line 534 verifies results but doesn't verify the query transformation:

results = await search_service.search(SearchQuery(text="tag:tier1"))
assert any(r.permalink == tagged_entity.permalink for r in results)

Suggestion: Also assert that the transformation actually occurred:

query = SearchQuery(text="tag:tier1")
results = await search_service.search(query)
assert query.tags == ["tier1"], "tag: prefix should set tags filter"
assert query.text is None, "tag: prefix should clear text search"
assert any(r.permalink == tagged_entity.permalink for r in results)

6. Postgres CAST Fix (postgres_search_repository.py:335)

Changed :{tag_param}::jsonb to CAST(:{tag_param} AS jsonb).

Question: Was the :: syntax causing issues with SQLAlchemy parameter binding? This is a good fix for compatibility, but understanding why is helpful.

Note: The CAST() syntax is ANSI SQL standard and more portable than Postgres-specific :: casting.


Recommendations

🔴 High Priority (Should address before merge)

  1. Add decision point comments per CLAUDE.md literate programming style
  2. Document the feature - add to search service docstring and CLAUDE.md
  3. Explain text search behavior - add comment explaining why query.text = None
  4. Document delimiter behavior - clarify that both , and whitespace work as separators

🟡 Medium Priority (Recommended)

  1. Add edge case tests - empty values, uppercase, special chars, mixed delimiters
  2. Clarify case sensitivity - document or test tag value case handling
  3. Consider query mutation - document or avoid mutating input object
  4. Verify test coverage - run just coverage to confirm 100% maintained

🟢 Low Priority (Nice to have)

  1. Test assertion improvements - verify query transformation, not just results
  2. Consider combined queries - decide if tag:tier1 searchterm should work as tag filter + text search

Final Assessment

Functionality: ✅ Works as described
Code Quality: ⚠️ Missing literate programming comments (CLAUDE.md requirement)
Testing: ⚠️ Happy path covered, edge cases missing
Documentation: ❌ Feature not documented for users
Architecture: ✅ Follows project patterns
Security: ✅ No issues identified
Performance: ✅ Negligible impact

Recommendation: The core implementation is solid and functional. To meet Basic Memory's standards, address the high-priority items (literate programming comments and documentation) before merge. The medium-priority items would strengthen the implementation but aren't blocking.

The implementation follows existing patterns well and the Postgres fix is a good catch. With the additions above, this will be a clean, well-documented feature.


| Branch: fix/tag-search-shorthand

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants