Query Best Practices

Optimize your queries for speed, accuracy, and cost-effectiveness.

Quick Wins

1. Choose the Right Mode

Don't always use mix mode:

// ❌ BAD - Slow and expensive for simple queries
await query({ query: 'Alice', mode: 'mix' });

Do match mode to query type:

// ✅ GOOD - Fast for entity lookups
await query({ query: 'Alice', mode: 'local' });

// ✅ GOOD - Fast for keywords
await query({ query: 'API refactor', mode: 'naive' });

// ✅ GOOD - Use mix for complex questions
await query({
  query: 'What did Alice say about the API refactor last week?',
  mode: 'mix',
});

2. Adjust top_k Based on Needs

Don't always use the maximum:

// ❌ BAD - Unnecessarily slow
await query({ query: 'test', top_k: 200 });

Do start small and increase if needed:

// ✅ GOOD - Fast for simple queries
await query({ query: 'test', top_k: 30 });

// ✅ GOOD - More thorough for complex queries
await query({
  query: 'complex question needing context',
  top_k: 100,
});

3. Cache Frequently Asked Questions

const cache = new Map();
const TTL = 5 * 60 * 1000; // 5 minutes

const cachedQuery = async (question: string) => {
  const cached = cache.get(question);
  if (cached && Date.now() - cached.timestamp < TTL) {
    return cached.results;
  }

  const results = await query(question);
  cache.set(question, { results, timestamp: Date.now() });
  return results;
};

4. Use Progressive Enhancement

Start fast, upgrade if needed:

const smartQuery = async (question: string) => {
  // Try fast mode first
  let results = await query({ query: question, mode: 'naive', top_k: 30 });

  // Check if results are good enough (guard against an empty result set)
  const avgScore = results.length
    ? results.reduce((s, r) => s + r.score, 0) / results.length
    : 0;

  // Upgrade to more accurate mode if needed
  if (avgScore < 0.7 || results.length < 5) {
    results = await query({ query: question, mode: 'mix', top_k: 60 });
  }

  return results;
};

Query Mode Selection

Decision Tree

Is it a simple keyword search?
└─ YES → Use "naive" mode (fastest)

Is it asking about a specific person/entity?
└─ YES → Use "local" mode (entity-focused)

Is it asking about relationships between things?
└─ YES → Use "global" mode (relationship-focused)

Is it a moderately complex question?
└─ YES → Use "hybrid" mode (balanced)

Is accuracy critical and cost/latency acceptable?
└─ YES → Use "mix" mode (most accurate)
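The decision tree above can be sketched as a small heuristic router. This is an illustrative sketch, not part of the API: the regexes, keyword lists, and word-count thresholds are assumptions to tune against your own query traffic.

```typescript
type Mode = 'naive' | 'local' | 'global' | 'hybrid' | 'mix';

// Rough mapping of the decision tree to code; tune the patterns for your data
const pickMode = (q: string): Mode => {
  const words = q.trim().split(/\s+/);
  if (words.length <= 3 && !q.includes('?')) return 'naive'; // keyword search
  if (/^(who|what) is\b/i.test(q)) return 'local'; // entity lookup
  if (/\b(relate|connect|depend|interact)/i.test(q)) return 'global'; // relationships
  if (words.length > 12) return 'mix'; // long, complex question
  return 'hybrid'; // balanced default
};
```

Usage: `await query({ query: q, mode: pickMode(q) });`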

Mode Comparison Table

| Mode   | Speed | Accuracy | Cost   | Best For                             |
| ------ | ----- | -------- | ------ | ------------------------------------ |
| naive  | ⚡⚡⚡ | ⭐       | 💰     | Keywords, simple searches            |
| local  | ⚡⚡   | ⭐⭐     | 💰     | "Who/what is X?", entity lookup      |
| global | ⚡⚡   | ⭐⭐     | 💰     | "How does X relate to Y?"            |
| hybrid | ⚡⚡   | ⭐⭐⭐   | 💰💰   | General queries, balanced needs      |
| mix    | ⚡     | ⭐⭐⭐⭐ | 💰💰💰 | Complex questions, accuracy critical |

Real-World Examples

✅ Good Mode Choices

// Entity lookup → local
query({ query: 'Alice', mode: 'local' });
query({ query: 'What is Alice working on?', mode: 'local' });

// Relationship → global
query({ query: 'How does auth connect to billing?', mode: 'global' });
query({ query: 'What depends on the API?', mode: 'global' });

// Keywords → naive
query({ query: 'API documentation', mode: 'naive' });
query({ query: 'bug fix', mode: 'naive' });

// Complex questions → mix
query({
  query: 'What did Alice say about the API refactor last month?',
  mode: 'mix',
});

❌ Poor Mode Choices

// ❌ Overkill - simple keyword doesn't need graph
query({ query: 'bug', mode: 'mix' }); // Use naive instead

// ❌ Underpowered - complex question needs more
query({
  query: 'What were the main concerns raised about the API refactor?',
  mode: 'naive', // Use mix instead
});

// ❌ Wrong focus - asking about relationships but using entity mode
query({
  query: 'How do these components interact?',
  mode: 'local', // Use global or hybrid instead
});

Parameter Tuning

top_k (Candidates Retrieved)

Purpose: How many candidates to retrieve before reranking

Guidelines:

// Quick answer, chatbot response
top_k: 20–40

// Standard queries (recommended)
top_k: 60–80

// Research, comprehensive answers
top_k: 100–200

Example:

// User asking quick question in chatbot
await query({
  query: "What's our return policy?",
  mode: 'naive',
  top_k: 30, // Fast, focused
});

// User doing research
await query({
  query: 'Analyze all discussions about API security',
  mode: 'hybrid',
  top_k: 150, // Comprehensive
});

chunk_top_k (Final Results)

Purpose: How many results to return to user

Guidelines:

// Chatbot, concise answer
chunk_top_k: 5–10

// Standard display (recommended)
chunk_top_k: 20

// Research, comprehensive view
chunk_top_k: 50–100

Example:

// Display in UI with limited space
await query({
  query: 'recent updates',
  chunk_top_k: 10,
});

// Export for analysis
await query({
  query: 'all API discussions',
  chunk_top_k: 100,
});

score_threshold (Quality Filter)

Purpose: Minimum relevance score (0.0-1.0)

Guidelines:

// High recall (more results, some may be less relevant)
score_threshold: 0.3–0.4

// Balanced (recommended)
score_threshold: 0.5

// High precision (fewer but more relevant results)
score_threshold: 0.7–0.8

Example:

// Chatbot - want high-quality answers only
await query({
  query: 'how to reset password',
  score_threshold: 0.7, // Only confident answers
});

// Research - want to see everything
await query({
  query: 'mentions of security',
  score_threshold: 0.3, // Cast wide net
});

Performance Optimization

1. Parallel Queries

When querying multiple sources:

// ❌ BAD - Sequential (slow)
const slack = await query({ query: q, source: 'slack' });
const github = await query({ query: q, source: 'github' });
const notion = await query({ query: q, source: 'notion' });

// ✅ GOOD - Parallel (3x faster)
const [slack, github, notion] = await Promise.all([
  query({ query: q, source: 'slack' }),
  query({ query: q, source: 'github' }),
  query({ query: q, source: 'notion' }),
]);

2. Request Batching

For multiple queries:

// ❌ BAD - Many small requests
for (const q of questions) {
  await query(q); // Network overhead per request
}

// ✅ GOOD - Fire in parallel (use a bulk endpoint if your API offers one)
const results = await Promise.all(questions.map((q) => query(q)));
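Note that Promise.all fires every request at once, which can trip API rate limits for large batches. A minimal sketch that caps concurrency by running queries in fixed-size chunks (the default chunk size of 5 is an assumption; match it to your limits):

```typescript
// Run async calls in fixed-size chunks to cap concurrent requests.
// `fn` stands in for the query() client call used throughout this guide.
const queryInChunks = async <T>(
  items: string[],
  fn: (q: string) => Promise<T>,
  chunkSize = 5,
): Promise<T[]> => {
  const results: T[] = [];
  for (let i = 0; i < items.length; i += chunkSize) {
    const chunk = items.slice(i, i + chunkSize);
    results.push(...(await Promise.all(chunk.map(fn)))); // parallel within a chunk
  }
  return results;
};
```

Usage: `const results = await queryInChunks(questions, (q) => query(q));`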

3. Debounce User Input

For search-as-you-type:

import { debounce } from 'lodash';

const debouncedQuery = debounce(async (searchTerm: string) => {
  const results = await query(searchTerm);
  updateUI(results);
}, 300); // Wait 300ms after user stops typing

4. Prefetch Common Queries

// On app load, prefetch frequently asked questions
const commonQueries = [
  "What's our return policy?",
  'How do I contact support?',
  'Where is my order?',
];

// Warm up the cache (assumes a caching layer such as the cachedQuery helper
// above; calling query() directly would only warm any server-side cache)
await Promise.all(commonQueries.map((q) => cachedQuery(q)));

5. Use Streaming for Long Results

// For real-time display
const streamResults = async (question: string) => {
  const response = await fetch('/api/query/stream', {
    method: 'POST',
    body: JSON.stringify({ query: question }),
  });

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // Network chunks can split a JSON object mid-stream; assume the server
    // sends newline-delimited JSON and buffer partial lines until complete
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop() ?? ''; // keep the trailing partial line
    for (const line of lines) {
      if (line.trim()) displayResult(JSON.parse(line));
    }
  }
};

Cost Optimization

1. Minimize Reranking

// ❌ Expensive - LLM reranking on every query
await query({ query: question, mode: 'mix', disable_rerank: false }); // Uses LLM reranker

// ✅ Cheaper - rerank only when needed
const mode = isComplexQuery ? 'mix' : 'hybrid';
await query({ mode, disable_rerank: !isComplexQuery });

2. Use Appropriate Embedding Models

# Most accurate but expensive
LLM_EMBEDDING_MODEL=text-embedding-3-large  # $0.13/1M tokens

# Balanced (recommended)
LLM_EMBEDDING_MODEL=text-embedding-3-small  # $0.02/1M tokens

# Local and free
LLM_EMBEDDING_MODEL=nomic-embed-text  # Ollama, no cost

3. Batch Indexing

// ❌ Expensive - index one at a time
for (const doc of documents) {
  await indexDocument(doc); // Separate API calls
}

// ✅ Cheaper - batch indexing
await indexDocuments(documents); // Single API call

Error Handling

1. Graceful Degradation

const robustQuery = async (question: string) => {
  try {
    // Try best mode first
    return await query({ query: question, mode: 'mix' });
  } catch (error) {
    console.warn('Mix mode failed, falling back to hybrid');
    try {
      return await query({ query: question, mode: 'hybrid' });
    } catch (error) {
      console.warn('Hybrid failed, falling back to naive');
      return await query({ query: question, mode: 'naive' });
    }
  }
};

2. Timeout Handling

const queryWithTimeout = async (question: string, timeout = 30000) => {
  const controller = new AbortController();
  const timeoutId = setTimeout(() => controller.abort(), timeout);

  try {
    const response = await fetch('/api/query', {
      method: 'POST',
      signal: controller.signal,
      body: JSON.stringify({ query: question }),
    });
    return await response.json();
  } catch (error) {
    if (error.name === 'AbortError') {
      throw new Error('Query timeout - try a simpler query');
    }
    throw error;
  } finally {
    clearTimeout(timeoutId);
  }
};

3. Retry Logic

const queryWithRetry = async (question: string, maxRetries = 3) => {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await query(question);
    } catch (error) {
      if (attempt === maxRetries) throw error;

      const delay = Math.pow(2, attempt) * 1000; // Exponential backoff
      console.log(`Retry ${attempt}/${maxRetries} after ${delay}ms`);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
};

Query Optimization Patterns

Pattern 1: Query Expansion

Expand vague queries for better results:

const expandQuery = (query: string): string => {
  const expansions = {
    api: 'API OR REST API OR GraphQL API',
    bug: 'bug OR issue OR error OR problem',
    docs: 'documentation OR docs OR guide OR tutorial',
  };

  // Match whole words only, so e.g. "rapid" is not rewritten by the "api" rule
  return Object.entries(expansions).reduce(
    (q, [key, expansion]) => q.replace(new RegExp(`\\b${key}\\b`, 'gi'), expansion),
    query,
  );
};

// Usage
const results = await query({
  query: expandQuery('api bug'),
  // "API OR REST API OR GraphQL API bug OR issue OR error OR problem"
  mode: 'hybrid',
});

Pattern 2: Query Rewriting

Rewrite natural language to search-friendly format:

const rewriteQuery = (query: string): string => {
  // "What did Alice say about X?" → "Alice X"
  return query
    .replace(/what did (.*?) say about/i, '$1')
    .replace(/how does (.*?) work/i, '$1')
    .replace(/when was (.*?) created/i, '$1')
    .replace(/\?+$/, '') // drop the trailing question mark
    .trim();
};

Pattern 3: Multi-Step Queries

Break complex queries into steps:

const complexQuery = async (question: string) => {
  // Step 1: Find relevant entities
  const entities = await query({
    query: question,
    mode: 'local',
    chunk_top_k: 5,
  });

  // Step 2: Get relationships for those entities
  const entityNames = entities.map((r) => r.entities).flat();
  const relationships = await query({
    query: entityNames.join(' '),
    mode: 'global',
    chunk_top_k: 10,
  });

  // Step 3: Combine and rerank
  const combined = [...entities, ...relationships];
  return combined.sort((a, b) => b.score - a.score).slice(0, 20);
};

Common Pitfalls

❌ Don't: Always Use Maximum Settings

// Slow and expensive
await query({
  query: 'test',
  mode: 'mix',
  top_k: 200,
  chunk_top_k: 100,
  disable_rerank: false, // LLM reranking enabled
});

✅ Do: Match Settings to Needs

// Fast and appropriate
await query({
  query: 'test',
  mode: 'naive',
  top_k: 30,
  chunk_top_k: 10,
});

❌ Don't: Ignore Error Responses

// No error handling
const results = await query(userInput);
displayResults(results.results); // Might crash

✅ Do: Handle Errors Gracefully

try {
  const results = await query(userInput);
  if (results.results.length === 0) {
    showMessage('No results found');
  } else {
    displayResults(results.results);
  }
} catch (error) {
  showError('Search failed. Please try again.');
}

❌ Don't: Query Without Validation

// Dangerous
await query({ query: userInput });

✅ Do: Validate and Sanitize

const safeQuery = (userInput: string) => {
  // Validate
  if (!userInput || userInput.trim().length < 2) {
    throw new Error('Query too short');
  }

  if (userInput.length > 500) {
    throw new Error('Query too long');
  }

  // Sanitize
  const sanitized = userInput.trim().substring(0, 500);

  return query({ query: sanitized });
};

Monitoring & Analytics

Track Query Performance

const monitoredQuery = async (question: string, mode: string) => {
  const start = Date.now();

  try {
    const results = await query({ query: question, mode });
    const duration = Date.now() - start;

    // Log metrics
    analytics.track('query', {
      duration,
      mode,
      resultsCount: results.results.length,
      avgScore: results.results.reduce((s, r) => s + r.score, 0) / results.results.length,
    });

    return results;
  } catch (error) {
    analytics.track('query_error', {
      duration: Date.now() - start,
      mode,
      error: error.message,
    });
    throw error;
  }
};

A/B Testing Query Modes

const abTestQuery = async (question: string) => {
  const mode = Math.random() < 0.5 ? 'hybrid' : 'mix';

  const results = await query({ query: question, mode });

  // Track which mode performed better
  analytics.track('query_ab_test', {
    mode,
    resultsCount: results.results.length,
    avgScore: results.results.reduce((s, r) => s + r.score, 0) / results.results.length,
  });

  return results;
};

Summary: Quick Reference

Speed Priority

{
  mode: "naive",
  top_k: 30,
  chunk_top_k: 10,
  disable_rerank: true
}

Accuracy Priority

{
  mode: "mix",
  top_k: 100,
  chunk_top_k: 20,
  disable_rerank: false
}

Balanced (Recommended)

{
  mode: "hybrid",
  top_k: 60,
  chunk_top_k: 20,
  disable_rerank: true
}

Cost Priority

{
  mode: "naive",  // No LLM usage
  top_k: 40,
  chunk_top_k: 10,
  disable_rerank: true  // No LLM reranking
}
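The four presets above can be collected into a typed lookup so call sites stay terse. A sketch; the preset names here are this guide's labels, not API values:

```typescript
type Preset = 'speed' | 'accuracy' | 'balanced' | 'cost';

// The Quick Reference presets as data; pass the result straight to query()
const QUERY_PRESETS = {
  speed:    { mode: 'naive',  top_k: 30,  chunk_top_k: 10, disable_rerank: true },
  accuracy: { mode: 'mix',    top_k: 100, chunk_top_k: 20, disable_rerank: false },
  balanced: { mode: 'hybrid', top_k: 60,  chunk_top_k: 20, disable_rerank: true },
  cost:     { mode: 'naive',  top_k: 40,  chunk_top_k: 10, disable_rerank: true },
} as const;

const withPreset = (q: string, preset: Preset) => ({
  query: q,
  ...QUERY_PRESETS[preset],
});
```

Usage: `await query(withPreset('recent updates', 'balanced'));`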

Next Steps