Skip to content

feat(pl-drivers): cache presigned download URLs#1701

Merged
vadimpiven merged 9 commits into
mainfrom
feat/cache-download-urls
Jun 18, 2026
Merged

feat(pl-drivers): cache presigned download URLs#1701
vadimpiven merged 9 commits into
mainfrom
feat/cache-download-urls

Conversation

@vadimpiven

@vadimpiven vadimpiven commented Jun 16, 2026

Copy link
Copy Markdown
Member

What

Caches GetDownloadURL responses in ClientDownload so repeated reads of the same blob don't each pay a backend round-trip.

  • New DownloadUrlCache (LRU, keyed by SignedResourceId) holding the full {downloadUrl, headers} response.
  • grpcGetDownloadUrl now checks the cache before the gRPC/REST call and populates it on success.
  • Each entry's TTL is derived from the expiry encoded in the presigned URL, not a guessed constant:
    • X-Amz-Date + X-Amz-Expires — S3 and the FS remote driver (AWS SigV4).
    • X-Goog-Date + X-Goog-Expires — GCS (V4 signed URLs).
    • The timestamp is UTC (Z/Zulu), verified against pl util/storage/v4sign/presigner.go. The compact YYYYMMDDTHHMMSSZ form is expanded to the extended ISO form before parsing (V8 cannot parse the compact form).
    • A 30s safety margin is subtracted to cover clock skew between this host and pl-core plus in-flight time.
    • Local storage:// URLs carry no encoded expiry and get a bounded 1h default TTL.

Why

GetDownloadURL is called on every blob read; the returned URL is valid for its full TTL (1h for FS-remote/GCS by default), so re-requesting it per read is wasted work.

Notes

  • The TTL clock is captured atomically inside DownloadUrlCache.set() (not before the await), so the cache entry never outlives expiry − margin.
  • The cache key omits isInternalUse because the only caller always passes false.
  • No request coalescing: two concurrent misses for the same id will both call the backend (unchanged from before).

Tests

download_url_cache.test.ts — 10 unit tests covering S3/FS, GCS, the UTC-independence guarantee, malformed/non-URL input, the 30s margin, and the no-expiry default.

Greptile Summary

This PR adds a DownloadUrlCache (LRU, keyed by SignedResourceId) to ClientDownload so that each GetDownloadURL backend call is only made once per presigned URL's lifetime instead of once per blob read. Cache entry TTLs are computed from the expiry embedded in the presigned URL (X-Amz-Date/X-Amz-Expires for S3 and the FS remote driver, X-Goog-Date/X-Goog-Expires for GCS), with a 30 s safety margin; storage:// URLs fall back to a 1 h default.

  • download_url_cache.ts — New module with extractUrlExpiryMs, downloadUrlCacheTtlMs, and DownloadUrlCache; the injected logger is currently unused, and a malformed presign date (however unlikely) would cause a fall-through to the 1 h default TTL instead of bypassing the cache.
  • download.tsgrpcGetDownloadUrl now checks the cache before the gRPC/REST call and populates it on success; diff is minimal and correct.
  • download_url_cache.test.ts — Tests cover extractUrlExpiryMs and downloadUrlCacheTtlMs thoroughly, but the DownloadUrlCache class itself (round-trip get/set, TTL enforcement, the ttl ≤ 0 skip path) has no direct test coverage.

Confidence Score: 4/5

Safe to merge; the caching logic is correct for the happy path and the changes are well-scoped to a single client class.

The core TTL derivation and LRU integration are sound. The three findings are all quality/observability concerns: an unused logger, a misleading comment, and a theoretical over-caching scenario on malformed presign dates that the backend never produces in practice. Nothing in the changed path can cause data loss or incorrect auth.

download_url_cache.ts deserves a second look for the unused logger and the malformed-date fallback path; the test file is missing coverage of the DownloadUrlCache class itself.

Important Files Changed

Filename Overview
lib/node/pl-drivers/src/clients/download_url_cache.ts New LRU cache for presigned download URLs with TTL derived from AWS/GCS query parameters; logger is accepted but unused, a misleading comment exists, and a malformed presign date falls back to the 1h default TTL instead of bypassing the cache.
lib/node/pl-drivers/src/clients/download.ts Integrates DownloadUrlCache into grpcGetDownloadUrl; cache check and populate added correctly with minimal diff.
lib/node/pl-drivers/src/clients/download_url_cache.test.ts 10 unit tests covering URL expiry extraction and TTL calculation; no tests for the DownloadUrlCache class itself (get/set/TTL enforcement), leaving that integration path untested.
.changeset/cache-download-urls.md Changeset entry describing the patch; accurate and well-written.
lib/node/pl-drivers/package.json Adds lru-cache as a direct dependency via the workspace catalog; no issues.
Prompt To Fix All With AI
Fix the following 3 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 3
lib/node/pl-drivers/src/clients/download_url_cache.ts:108-111
**Misleading "Cache miss" comment**

`// Cache miss.` is the wrong term here — a cache miss means the key was not found. This path means the URL has already (nearly) expired and should not be stored. Consider `// URL already expired or within safety margin; skip caching.`

### Issue 2 of 3
lib/node/pl-drivers/src/clients/download_url_cache.ts:93-101
**`logger` is injected but never used**

`logger` is stored as `private readonly logger` but is not referenced anywhere in `get()` or `set()`. If it was intended for cache-hit/miss diagnostics it should be wired up; if not needed it can be dropped to avoid misleading future readers.

### Issue 3 of 3
lib/node/pl-drivers/src/clients/download_url_cache.ts:42-54
**Early `return undefined` on malformed expiry causes over-long default TTL**

When a URL has recognizable signing params (`X-Amz-Date` is present) but the date is malformed, line 49 returns `undefined` rather than `continue`-ing. `downloadUrlCacheTtlMs` then cannot distinguish "no expiry encoded" (`storage://`) from "expiry encoded but unreadable", and applies the full 1 h default TTL. If a near-expiring URL ever has a corrupt date field, the stale URL would be served for up to ~1 h, producing 403s from S3/GCS on every blob read until the entry ages out.

Consider returning a distinct sentinel (e.g. `null`) or throwing, so the caller knows a signing scheme was detected but the TTL cannot be trusted.

Reviews (1): Last reviewed commit: "feat(pl-drivers): cache presigned downlo..." | Re-trigger Greptile

Greptile also left 3 inline comments on this PR.

Cache GetDownloadURL responses by signed resource id to avoid a backend
round-trip on every blob read. Each entry's TTL is derived from the
expiry encoded in the presigned URL (X-Amz-Date/Expires for S3 and the
FS remote driver, X-Goog-Date/Expires for GCS), minus a 30s safety
margin; the timestamp is UTC. URLs with no encoded expiry (local
storage://) get a bounded default TTL.
@changeset-bot

changeset-bot Bot commented Jun 16, 2026

Copy link
Copy Markdown

🦋 Changeset detected

Latest commit: 4df307e

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 17 packages
Name Type
@milaboratories/pl-drivers Minor
@milaboratories/pl-middle-layer Patch
@milaboratories/pf-spec-driver Patch
@milaboratories/pf-driver Patch
@platforma-sdk/workflow-tengo Patch
@platforma-sdk/pl-cli Patch
@platforma-sdk/test Patch
@platforma-sdk/ui-vue Patch
@milaboratories/milaboratories.monetization-test.workflow Patch
@milaboratories/milaboratories.ui-examples.workflow Patch
@milaboratories/milaboratories.pool-explorer.workflow Patch
@milaboratories/milaboratories.monetization-test.ui Patch
@milaboratories/milaboratories.ui-examples.ui Patch
@milaboratories/milaboratories.pool-explorer.ui Patch
@milaboratories/milaboratories.monetization-test Patch
@milaboratories/milaboratories.ui-examples Patch
@milaboratories/milaboratories.pool-explorer Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a caching mechanism for presigned download URLs in @milaboratories/pl-drivers using lru-cache to avoid redundant GetDownloadURL round-trips. The cache derives TTLs from AWS SigV4 or GCS V4 URL expiries with a safety margin, falling back to a default TTL for local storage URLs. Feedback suggests throwing immediately on aborted signals even on cache hits, removing an unused logger parameter from the DownloadUrlCache constructor, and defensively verifying that parsed URL expiry values are non-negative.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread lib/node/pl-drivers/src/clients/download.ts Outdated
Comment thread lib/node/pl-drivers/src/clients/download.ts
Comment thread lib/node/pl-drivers/src/clients/download_url_cache.ts
Comment thread lib/node/pl-drivers/src/clients/download_url_cache.ts Outdated
Comment thread lib/node/pl-drivers/src/clients/download_url_cache.ts
Comment thread lib/node/pl-drivers/src/clients/download_url_cache.ts
Comment thread lib/node/pl-drivers/src/clients/download_url_cache.ts
@codecov

codecov Bot commented Jun 16, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 83.82353% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 55.06%. Comparing base (5a63400) to head (4df307e).
⚠️ Report is 15 commits behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
lib/node/pl-drivers/src/clients/download.ts 78.94% 3 Missing and 1 partial ⚠️
.../node/pl-drivers/src/clients/download_url_cache.ts 87.50% 1 Missing and 3 partials ⚠️
lib/node/pl-middle-layer/src/pool/driver.ts 66.66% 3 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1701       +/-   ##
===========================================
+ Coverage   44.56%   55.06%   +10.49%     
===========================================
  Files          43      312      +269     
  Lines        2540    17548    +15008     
  Branches      663     3830     +3167     
===========================================
+ Hits         1132     9662     +8530     
- Misses       1225     6647     +5422     
- Partials      183     1239     +1056     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

The URL cache derives its TTL from the expiry encoded in the presigned
URL minus a 30s margin, but clock skew between this host and the storage
backend can still leave the cache serving a URL the storage rejects. With
no recovery path, every read of that blob failed until the entry's TTL
lapsed.

Move all cache interaction into withBlobContent (its sole consumer) and,
on a 4xx from a cached URL, evict the entry and retry once with a freshly
fetched URL. Expired URLs surface as 4xx across all backends: S3 403, the
pl FS remote driver 403, GCS 400 ExpiredToken - so the retry keys off the
whole 4xx range, not 403 alone. grpcGetDownloadUrl is now a pure fetcher.
Bump @milaboratories/pframes-rs-* to 1.1.48 (ships the caching runtime) and
wire its CachingObjectStore into the PFrames remote blob provider. Downloaded
parquet byte ranges are now served from a local cache that survives restarts,
backed by a new parquetCachePath (default <workDir>/parquetCache, not emptied
on startup). Cache tuning is exposed via parquetCacheOps (8 GiB budget, 0.2
single-file admission fraction, 50k tracked files); getCacheMetrics() and
resetCache() now report real data instead of null / no-op.
…ypass)

getContentDirect mirrors getContent but never reads from or writes to the
ranges cache - a fresh read straight from storage that does not populate the
cache. Shared via getContentImpl with an internal bypassRangesCache flag that
skips the cache get and the tee/set in withContent; for local handles it is
identical to getContent. Kept off the block-facing BlobDriver interface.

The PFrames data-source blob reader (pl-middle-layer pool driver) now uses
getContentDirect, since PFrames manages its own caching.
@vadimpiven vadimpiven force-pushed the feat/cache-download-urls branch from 8335198 to ae79571 Compare June 17, 2026 10:02
extractUrlExpiryMs and downloadUrlCacheTtlMs were unexported, breaking
download_url_cache.test.ts. Re-export them and align the malformed-params
test with the current contract (present-but-invalid presign params return
null -> not cached, distinct from undefined = no expiry encoded).

Also: log parsed cache metrics via JSON.stringify, and log cached
download URL acceptance.
Remove the per-cache-hit "cached download URL ... accepted" log and the
"Parquet cache metrics" log (with its now-unused getCacheMetrics call).
Comment thread lib/node/pl-drivers/src/clients/download_url_cache.ts Outdated
downloadUrlCacheTtlMs now reads Date.now() internally instead of taking
nowMs, so the tests pin the system clock with fake timers instead of
passing a timestamp. Also fix the stale nowMs reference in its doc comment.
@vadimpiven vadimpiven enabled auto-merge June 17, 2026 14:48
@vadimpiven vadimpiven disabled auto-merge June 18, 2026 07:24
The parquet caching object store (newly wired in this branch) adds latency to
every parquet read, which the model-test exercises via MiddleLayer.init. The
test was under-provisioned versus its siblings (table-test/filter-column-test):
'with args' used a 10s timeout and 'block-level services' relied on the 5s
awaitBlockDone default, so on slow CI the block did not reach Done in time
('Test timed out in 10000ms' / 'Aborted while awaiting block done').

Align with table-test: 60s test timeouts and an explicit 50s awaitBlockDone.
@vadimpiven vadimpiven enabled auto-merge June 18, 2026 09:49
@vadimpiven vadimpiven added this pull request to the merge queue Jun 18, 2026
Merged via the queue into main with commit 3f8423a Jun 18, 2026
14 checks passed
@vadimpiven vadimpiven deleted the feat/cache-download-urls branch June 18, 2026 10:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants