Fix(llm): retry transient failures and make concurrency configurable #29

Open
jhamze7 wants to merge 4 commits into NVIDIA:main from jhamze7:fix-llm-retry-backoff

@jhamze7 commented Jun 12, 2026

LLM calls currently fail immediately on a transient error because there is no retry logic. This pull request adds retries with backoff and makes concurrency configurable.

Changes:

llm_utils.py:

Added helper functions for the sync and async batch runners that retry failed LLM calls with exponential backoff, making up to 4 attempts by default (configurable via the max_attempts argument). The helpers decide whether an error is retryable by string-matching the error message, which keeps the implementation provider-agnostic.
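
For readers who want the shape of this without opening the diff, here is a minimal sketch of a sync retry helper with exponential backoff and string-matched error classification. It is not the code in this PR: the names `retry_sync_sketch`, `is_retryable`, `RETRYABLE_MARKERS`, the `base_delay` and `sleep` parameters, and the marker strings are all assumptions made for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Assumed substrings that mark an error as transient; the PR's list may differ.
RETRYABLE_MARKERS = ("429", "rate limit", "timeout", "timed out")


def is_retryable(exc: Exception) -> bool:
    """Classify an error by string-matching its message (provider-agnostic)."""
    message = str(exc).lower()
    return any(marker in message for marker in RETRYABLE_MARKERS)


def retry_sync_sketch(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying transient failures with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception as exc:
            # Re-raise non-retryable errors, and the last error once
            # all attempts have been used.
            if not is_retryable(exc) or attempt == max_attempts:
                raise
            # Delay doubles per attempt: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable: the loop always returns or raises")
```

Injecting `sleep` is a design choice of this sketch so that tests can record the delays instead of waiting for them.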

llm_analyzer_base.py:

Wrapped the LLM calls in the helper functions from llm_utils.py. max_concurrency is now read from the SKILLSPECTOR_MAX_CONCURRENCY environment variable and defaults to 5 if the variable is empty.
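
As a rough illustration of the concurrency setting, the sketch below reads the variable and uses it to bound concurrent async calls with a semaphore. Only the variable name and the default of 5 come from this PR. The function names `read_max_concurrency` and `run_bounded`, the treatment of an unset variable as empty, and the rejection of values below 1 are assumptions.

```python
import asyncio
import os
from typing import Awaitable, Callable, Mapping, Sequence

DEFAULT_MAX_CONCURRENCY = 5


def read_max_concurrency(env: Mapping[str, str] = os.environ) -> int:
    """Return SKILLSPECTOR_MAX_CONCURRENCY as an int, or 5 if unset or empty."""
    raw = env.get("SKILLSPECTOR_MAX_CONCURRENCY", "").strip()
    if not raw:
        return DEFAULT_MAX_CONCURRENCY
    value = int(raw)  # raises ValueError on non-numeric input (assumed behavior)
    if value < 1:
        raise ValueError("SKILLSPECTOR_MAX_CONCURRENCY must be at least 1")
    return value


async def run_bounded(
    factories: Sequence[Callable[[], Awaitable]],
    max_concurrency: int,
) -> list:
    """Await every factory's coroutine, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(factory):
        # Each task waits here until one of the slots is free.
        async with semaphore:
            return await factory()

    # gather preserves input order in its results.
    return await asyncio.gather(*(guarded(f) for f in factories))
```

A caller could then write `asyncio.run(run_bounded(jobs, read_max_concurrency()))`, where `jobs` is a list of zero-argument callables that each return a coroutine.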

.env.example:

Added the SKILLSPECTOR_MAX_CONCURRENCY environment variable. It is also documented in docs/DEVELOPMENT.md.

test_llm_analyzer_base.py:

Added sync and async tests verifying that LLM calls are retried when a simulated 429 error occurs. The tests also verify that processing succeeds after a transient failure and raises an exception once the retry limit is exhausted.
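
The two behaviors these tests cover can be sketched as follows. This is not the test file from the PR: `retry_async_stand_in` is a hypothetical stand-in for the async helper, included only so the example runs on its own, and `FlakyLLM` and the test names are likewise assumptions.

```python
import asyncio


async def retry_async_stand_in(call, max_attempts=4, base_delay=0.0):
    """Hypothetical stand-in for the async helper; retries only on '429'."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await call()
        except Exception as exc:
            if "429" not in str(exc) or attempt == max_attempts:
                raise
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))


class FlakyLLM:
    """Fake LLM call that raises a simulated 429 for the first `failures` calls."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    async def __call__(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise RuntimeError("Error code: 429 - rate limit exceeded")
        return "analysis result"


def test_succeeds_after_transient_429():
    llm = FlakyLLM(failures=2)
    result = asyncio.run(retry_async_stand_in(llm))
    assert result == "analysis result"
    assert llm.calls == 3  # two failures, then one success


def test_raises_once_attempts_exhausted():
    llm = FlakyLLM(failures=10)
    try:
        asyncio.run(retry_async_stand_in(llm, max_attempts=4))
    except RuntimeError as exc:
        assert "429" in str(exc)
    else:
        raise AssertionError("expected the 429 error to propagate")
    assert llm.calls == 4  # no calls beyond the attempt limit
```

Counting calls on the fake, instead of only checking the return value, is what confirms that the retry limit is respected.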


Closes #10

@CharmingGroot

Verified on a self-hosted OpenAI-compatible backend (vLLM serving Qwen/Qwen3.6-35B-A3B-FP8).

  • Configurable concurrency works: scanning the repo's tests/fixtures/malicious_skill/ fixture with LLM analysis completes the full pass at SKILLSPECTOR_MAX_CONCURRENCY=5 (default) and at 3. Being able to tune concurrency down is what a single-instance backend needs.
  • Retry classifier behaves as written: when retry_llm_call_sync is exercised directly, 429 and timeout errors are retried (4 attempts with backoff) and non-retryable errors raise immediately. This matches the PR's stated scope.

LGTM for the #10 retry/backoff + configurable concurrency goal.

Development

Successfully merging this pull request may close these issues.

Stage 2: LLM calls have no retry/backoff on 429 or timeout; default concurrency 10 is unconfigurable

2 participants