feat(guardrails): add Hlido agent trust guardrail#30237
feat(guardrails): add Hlido agent trust guardrail#30237ankitkapur1992-hlido wants to merge 1 commit into
Conversation
|
Ankit Kapur seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it. |
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
Greptile SummaryAdds a new
Confidence Score: 3/5The guardrail's primary trust-enforcement logic can be bypassed through path traversal in caller-supplied slug values before the fix lands. The core enforcement mechanism reads slugs directly from request metadata and embeds them unencoded into the outbound URL path. A caller who knows they would be blocked can craft a slug containing litellm/proxy/guardrails/guardrail_hooks/hlido/hlido.py — specifically the URL construction in
|
| Filename | Overview |
|---|---|
| litellm/proxy/guardrails/guardrail_hooks/hlido/hlido.py | Core guardrail implementation; user-controlled slugs from request metadata are interpolated into the URL path without encoding (path injection), the during_call mode omits the applied-guardrails header update, and the in-memory cache has no eviction. |
| litellm/proxy/guardrails/guardrail_hooks/hlido/init.py | Guardrail initializer and registry; follows the vigil_guard pattern and correctly wires config values through to HlidoGuardrail. |
| litellm/types/proxy/guardrails/guardrail_hooks/hlido.py | Pydantic config models for the Hlido guardrail; clean, consistent with other config models in the codebase. |
| litellm/types/guardrails.py | Adds HlidoGuardrailConfigModel import, HLIDO enum entry, and mixin to LitellmParams; changes are minimal and follow existing patterns. |
| tests/test_litellm/proxy/guardrails/guardrail_hooks/test_hlido.py | 12 mock-only unit tests covering allow/block/unverified/error/cache/header scenarios; no real network calls, consistent with repo test policy. |
Reviews (1): Last reviewed commit: "feat(guardrails): add Hlido agent trust ..." | Re-trigger Greptile
| response = await self.async_handler.get( | ||
| url=f"{self.api_base}/v1/agents/{slug}", | ||
| headers=headers, | ||
| timeout=_REQUEST_TIMEOUT, | ||
| ) |
There was a problem hiding this comment.
Unencoded user-controlled slug in URL path — the
slug value from metadata.hlido_slugs flows directly into the URL path without URL-encoding or character validation. A caller can submit a slug like trusted-agent/../../../other-path or trusted-agent?score=100 to probe arbitrary paths on the Hlido API or inject query parameters. At minimum the slug should be percent-encoded with urllib.parse.quote; ideally it should also be validated against an allowlist pattern (e.g., only alphanumerics and hyphens) before the lookup so a path-traversal payload never reaches the wire.
| response = await self.async_handler.get( | |
| url=f"{self.api_base}/v1/agents/{slug}", | |
| headers=headers, | |
| timeout=_REQUEST_TIMEOUT, | |
| ) | |
| from urllib.parse import quote | |
| response = await self.async_handler.get( | |
| url=f"{self.api_base}/v1/agents/{quote(slug, safe='')}", | |
| headers=headers, | |
| timeout=_REQUEST_TIMEOUT, | |
| ) |
| await self._check_request(data) | ||
| return data | ||
|
|
||
| async def _check_request(self, data: dict) -> None: |
There was a problem hiding this comment.
async_moderation_hook (used when mode: during_call) never calls add_guardrail_to_applied_guardrails_header, so the guardrail silently disappears from response headers in that mode. async_pre_call_hook does make the call, so the inconsistency is probably unintentional.
| await self._check_request(data) | |
| return data | |
| async def _check_request(self, data: dict) -> None: | |
| await self._check_request(data) | |
| add_guardrail_to_applied_guardrails_header( | |
| request_data=data, guardrail_name=self.guardrail_name | |
| ) | |
| return data | |
| async def _check_request(self, data: dict) -> None: |
| self.cache_ttl = ( | ||
| cache_ttl if cache_ttl is not None else DEFAULT_CACHE_TTL_SECONDS | ||
| ) | ||
| self._cache: Dict[str, Tuple[float, Optional[HlidoAgentRecord]]] = {} |
There was a problem hiding this comment.
Unbounded cache growth —
self._cache is a plain dict with no eviction. Stale entries are only replaced when the same slug is looked up again; entries for slugs that never recur accumulate indefinitely. In deployments where many unique slug values arrive via metadata.hlido_slugs (e.g., one slug per end-user session), this is a slow memory leak. Consider capping the dict size or using litellm's existing DualCache (which already manages TTL eviction) instead of a bespoke per-instance dict.
Greptile SummaryThis PR adds a new
Confidence Score: 2/5Not safe to merge without addressing the cache architecture and the undocumented data-flow to a service operated by the PR contributor. The guardrail's instance-level dict cache has no concurrency guard, grows unbounded, and is not shared across workers — under any multi-process deployment slugs are re-fetched from the external API far more than the 300 s TTL intends. More critically, every agent slug identifier is sent to https://hlido.eu, which is operated by the PR author; proxy operators currently have no in-product signal that enabling this guardrail routes their agent-vendor metadata to that commercial service on every cache miss. litellm/proxy/guardrails/guardrail_hooks/hlido/hlido.py — cache implementation and outbound HTTP call both need attention before merge.
|
| Filename | Overview |
|---|---|
| litellm/proxy/guardrails/guardrail_hooks/hlido/hlido.py | Core guardrail implementation — contains an instance-level dict cache that causes concurrent stampede, unbounded growth, and cross-worker inconsistency; all slug names are sent to an external commercial service operated by the PR author |
| litellm/proxy/guardrails/guardrail_hooks/hlido/init.py | Initializer and registry wiring — follows vigil_guard pattern correctly; passes litellm's async httpx client to the guardrail instance |
| litellm/types/proxy/guardrails/guardrail_hooks/hlido.py | Config model — correctly extends GuardrailConfigModel, but on_unverified and on_error accept any string and silently fall back to 'allow' on typos instead of using Literal types |
| litellm/types/guardrails.py | Adds HLIDO enum entry and mixes HlidoGuardrailConfigModel into LitellmParams — minimal, correct changes consistent with other guardrail integrations |
| tests/test_litellm/proxy/guardrails/guardrail_hooks/test_hlido.py | 12 mocked unit tests covering allow/block paths, caching, metadata slug merging, and auth headers — all use FakeHandler with no real network calls, consistent with repo test policy |
Reviews (2): Last reviewed commit: "feat(guardrails): add Hlido agent trust ..." | Re-trigger Greptile
| if min_score is None and allowed_tiers is None: | ||
| min_score = DEFAULT_MIN_SCORE |
There was a problem hiding this comment.
Instance-level dict cache causes stampede and unbounded growth
self._cache is a plain Python dict on the guardrail instance. Under high concurrency, multiple requests for the same uncached slug (e.g., at startup or after TTL expiry) will all miss the cache simultaneously and fan out to the Hlido API — there is no in-flight deduplication or asyncio lock. Additionally, entries are never evicted: the dict grows monotonically for every unique slug seen over the lifetime of the worker. In a multi-process deployment (Gunicorn/uvicorn workers), each worker maintains its own independent cache, multiplying API calls accordingly. The DualCache instance passed in to async_pre_call_hook is the litellm-idiomatic shared cache that can be backed by Redis; consider using it here rather than a bare dict.
| if self.on_unverified == "allow": | ||
| verbose_proxy_logger.warning( | ||
| "Hlido guardrail: agent '%s' has no Hlido review; allowing " | ||
| "per on_unverified=allow", | ||
| slug, | ||
| ) | ||
| return | ||
| raise GuardrailRaisedException( | ||
| guardrail_name=self.guardrail_name, | ||
| message=( | ||
| f"Hlido trust check failed for agent '{slug}': no " | ||
| "independent review exists and on_unverified is 'block'" | ||
| ), | ||
| ) | ||
| case TrustLookupFailed(slug=slug, error=error): | ||
| if self.on_error == "allow": | ||
| verbose_proxy_logger.warning( | ||
| "Hlido guardrail: lookup failed for '%s' (%s); allowing " | ||
| "per on_error=allow", |
There was a problem hiding this comment.
Every slug name is sent to an external service operated by the PR author
The PR description explicitly discloses that the author is the founder of Hlido. Every unique agent slug that passes through this guardrail (whether from static config or per-request hlido_slugs metadata) is transmitted to https://hlido.eu — a commercial service operated by the PR contributor. Proxy operators who enable this guardrail will leak their downstream agent vendor identifiers to that external service on every cache miss. Users opting into the guardrail may not be aware of this data flow, and there is no redirection mechanism, anonymisation, or self-hosting option described. At a minimum, this data flow should be clearly documented in the proxy config docs and in the config model's api_base description so operators can evaluate the privacy implication before enabling the guardrail.
| on_unverified: Optional[str] = Field( | ||
| default=None, | ||
| description=( | ||
| "Action when a slug has no Hlido review: 'allow' (default) or 'block'." | ||
| ), | ||
| ) | ||
| on_error: Optional[str] = Field( | ||
| default=None, | ||
| description=( | ||
| "Action when the Hlido API is unreachable: 'allow' (default) or 'block'." | ||
| ), | ||
| ) |
There was a problem hiding this comment.
on_unverified and on_error are typed as Optional[str], so any non-"block" string (e.g. a typo like "bloc") silently defaults to "allow" without any validation error. Using Literal["allow", "block"] propagates misconfiguration to the operator immediately.
| on_unverified: Optional[str] = Field( | |
| default=None, | |
| description=( | |
| "Action when a slug has no Hlido review: 'allow' (default) or 'block'." | |
| ), | |
| ) | |
| on_error: Optional[str] = Field( | |
| default=None, | |
| description=( | |
| "Action when the Hlido API is unreachable: 'allow' (default) or 'block'." | |
| ), | |
| ) | |
| on_unverified: Optional[Literal["allow", "block"]] = Field( | |
| default=None, | |
| description=( | |
| "Action when a slug has no Hlido review: 'allow' (default) or 'block'." | |
| ), | |
| ) | |
| on_error: Optional[Literal["allow", "block"]] = Field( | |
| default=None, | |
| description=( | |
| "Action when the Hlido API is unreachable: 'allow' (default) or 'block'." | |
| ), | |
| ) |
| request_slugs: Tuple[str, ...] = () | ||
| metadata = data.get("metadata") or data.get("litellm_metadata") | ||
| if isinstance(metadata, dict): | ||
| raw = metadata.get("hlido_slugs") |
There was a problem hiding this comment.
Medium: Client-controlled trust subject
metadata is supplied by the API caller, so a caller can omit hlido_slugs or provide a reviewed slug for a different agent and still get the request through when no static slugs are configured. The slug being verified should come from trusted server-side config or the proxy's agent registry, and requests without a trusted slug should block rather than silently returning.
| slugs = self._collect_slugs(data) | ||
| if not slugs: | ||
| return | ||
| for slug in slugs: |
There was a problem hiding this comment.
Medium: Unbounded slug lookups
A caller can submit a large list of unique metadata.hlido_slugs; the guardrail performs one outbound Hlido request per slug and then stores each unique slug in the in-memory cache. Add a small maximum slug count, validate slug length/format before lookup, and use a bounded cache or evict expired entries.
PR overviewThis PR adds a Hlido agent trust guardrail under the LiteLLM proxy guardrail hooks, using Hlido slug checks to decide whether an agent/request should be allowed. The touched code integrates outbound Hlido lookups and caching around those trust checks. There are still two open security concerns in the new guardrail path. The main issue is that the trust subject can be supplied by the API caller via metadata, allowing requests to influence which slug is checked or avoid a check when no trusted server-side slug is configured. The implementation also allows unbounded caller-driven slug lookups and cache growth, creating a potential resource-exhaustion path; no issues have been addressed yet. Open issues (2)
Fixed/addressed: 0 · PR risk: 6/10 |
|
Thanks for contributing this guardrail, @ankitkapur1992-hlido! Before this can merge, Greptile (scored 2/5) flagged a few things that need attention:
Once those are addressed, this will be in great shape! |
Relevant issues
None; new guardrail integration
Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
make test-unit@greptileaiand received a Confidence Score of at least 4/5 before requesting a maintainer reviewScreenshots / Proof of Fix
Disclosure: I am the founder of Hlido, the service this guardrail calls. The data surface it reads is public and free (no key required for the default tier)
The guardrail reads the live public endpoint. Real output as of today:
Runbook to see it end to end on a live proxy (klariqo scores 58, below the default minimum of 60, so the first call is blocked before any provider spend; try-sanebox scores 90 so the second call passes through to the provider):
Expected: the klariqo request returns the guardrail block error naming the slug, the score 58, and the minimum 60 with an evidence URL; the try-sanebox request reaches the model normally
Type
🆕 New Feature
Changes
Adds an
hlidoguardrail that gates requests on independent trust scores for third party AI agents, fetched from the public Hlido API (https://hlido.eu). Teams declare which downstream agent vendors a route or request uses (staticslugsin the guardrail config, or per-requestmetadata.hlido_slugs) and the guardrail blocks the request pre call when a vendor's independently tested score is belowmin_score(default 60) or its tier is outsideallowed_tiersExisting guardrails in the catalog validate content (toxicity, PII, prompt injection); this one validates the counterparty agent itself, which is useful for orgs that route LLM traffic on behalf of agent workflows and need a procurement style trust gate at the gateway
Implementation follows the vigil_guard layout: self registering hook directory, config model in
litellm/types/proxy/guardrails/guardrail_hooks/hlido.py, enum entry, and a mocked test suite (12 tests) with a dependency injected HTTP handler. Outbound HTTP goes through litellm's own async httpx client; zero new dependencies. Lookups are cached per slug (default 300s). Unknown slugs and API failures default to allow (configurable to block). No API key is required; an optional bearer key raises rate limitsRan
black,ruff check, andmypyon the new files;pytest tests/test_litellm/proxy/guardrails/guardrail_hooks/test_hlido.pypasses 12/12Companion docs PR: BerriAI/litellm-docs#336