feat: universal proxy (new approach) #837
Open: cjroth wants to merge 46 commits into main from cjroth/universal-proxy
Commits (46):
5bc10f2 (cjroth) refactor: universal proxy
45deabb (cjroth) fix: multiple dev environment fixes / improvements
d0c6a62 (cjroth) fix: dev environment config
ff32a8e (cjroth) fix: vite require-corp blocking images + confusing 401 status from proxy
7593271 (cjroth) fix: noisy errors
ad8f4ca (cjroth) fix: use request body / post for url preview endpoint to prevent url …
6592b6b (cjroth) fix: e2e tests
6c495a7 (cjroth) fix: lint
e7701e7 (cjroth) feat: e2e test
ac967fb (cjroth) refactor: replace mock.module with dependency injection in tests
650e0bc (cjroth) refactor: share PGlite across rerun-each to avoid RSS leak
b382a37 (cjroth) ci: retrigger workflows
bda1429 (cjroth) test: verify session row exists before returning e2e bearer
c088671 (cjroth) refactor: close PGlite via afterAll instead of process exit handler
7b8dee3 (cjroth) ci: retrigger workflows for intermittent flake
d2b7b87 (cjroth) chore: cleanup
371b46e (cjroth) chore: remove unused dirname import
ea9d0fb (cjroth) chore: address lint findings across backend and proxy-fetch
d7f6b77 (cjroth) chore: cleanup
a019564 (cjroth) refactor: share proxy protocol constants and tidy route signatures
837250e (cjroth) Merge remote-tracking branch 'origin/main' into cjroth/universal-proxy
685de89 (cjroth) chore: relax COEP to credentialless for broader resource compatibility
839af8a (cjroth) docs: explain COEP credentialless choice in security headers
c8265d2 (cjroth) fix: echo CORS request headers to support universal proxy
3c3f78b (darkbanjo) Merge branch 'main' into cjroth/universal-proxy
59f359e (ital0) bucket-B1: ws.ts style cleanup
07fedd0 (ital0) bucket-D: api/preview cache + dedupes
0666c5e (ital0) bucket-D: GLM review fixes
89f0d42 (ital0) bucket-B2: useFetch context refactor
14dcb51 (ital0) test(bucket-B2): cover getOrCreateProxyFetch memoization
e7def9a (ital0) bucket-B2: GLM review fixes
f84893a (ital0) bucket-C: backend config cleanup
ebf3fb0 (ital0) bucket-C: restore CORS regressions (PostHog allowHeaders + protocol e…
4acfb60 (ital0) bucket-C: refresh CORS docs/comments to match allowedHeaders:true on …
662c52c (ital0) bucket-A: proxy core refactor (observability + content-encoding passt…
4945f69 (ital0) fix(bucket-A): defer bytes_in read until response stream completes
6aea254 (ital0) bucket-A: GLM review fixes (DoS guard + bytesIn regression test)
8013d2d (ital0) fix(bucket-B2): rename test helper to satisfy naming-convention lint
4584966 (ital0) bucket-E: proxy_enabled toggle
3507897 (ital0) style: reformat proxy files to match prettier line-width
7c862d2 (ital0) refactor: rename SCREAMING_SNAKE proxy-protocol constants to camelCase
4c6aeb6 (ital0) refactor(ws): add error_type to WS observability paths
62b9922 (ital0) chore: extract dev-env items into separate stacked PR
94342ac (ital0) Merge remote-tracking branch 'origin/main' into cjroth/universal-proxy
e46a256 (ital0) fix(proxy): drain response bodies in tests to prevent timer leak
a9c2d2b (ital0) fix(e2e): bypass backend dev.sh (moved to stacked PR)
New file (+129 lines): e2e tests for the preview endpoint.

```typescript
/* This Source Code Form is subject to the terms of the Mozilla Public
 * License, v. 2.0. If a copy of the MPL was not distributed with this
 * file, You can obtain one at http://mozilla.org/MPL/2.0/. */

import { afterEach, describe, expect, it } from 'bun:test'

import {
  authHeaders,
  createTestApp,
  createTestUpstream,
  createUpstreamRouter,
  type TestAppHandle,
} from '@/test-utils/e2e'

const buildHtml = (body: string) => `<!doctype html><html><head>${body}</head><body></body></html>`

describe('GET /v1/preview — e2e', () => {
  let handle: TestAppHandle

  afterEach(async () => {
    if (handle) {
      await handle.cleanup()
    }
  })

  it('returns OG metadata with HTTPS-upgraded image, title, summary, siteName', async () => {
    const upstream = createTestUpstream(
      'preview.test',
      () =>
        new Response(
          buildHtml(`
            <meta property="og:title" content="Hello &amp; world" />
            <meta property="og:description" content="A &quot;short&quot; summary" />
            <meta property="og:image" content="http://preview.test/cover.png" />
            <meta property="og:site_name" content="Preview Test" />
          `),
          { status: 200, headers: { 'content-type': 'text/html; charset=utf-8' } },
        ),
    )
    handle = await createTestApp({ fetchFn: createUpstreamRouter({ 'preview.test': upstream }) })

    const res = await handle.app.handle(
      new Request(`http://localhost/v1/preview`, {
        method: 'POST',
        headers: { ...authHeaders(handle.bearerToken), 'Content-Type': 'application/json' },
        body: JSON.stringify({ url: 'https://preview.test/article' }),
      }),
    )
    expect(res.status).toBe(200)
    // Italo's review: per-user 10 min cache; no shared/CDN cache (`private`).
    expect(res.headers.get('cache-control')).toBe('private, max-age=600')
    const data = (await res.json()) as Record<string, string | null>
    expect(data.title).toBe('Hello & world')
    expect(data.summary).toBe('A "short" summary')
    expect(data.siteName).toBe('Preview Test')
    // http:// in og:image is auto-upgraded.
    expect(data.previewImageUrl).toBe('https://preview.test/cover.png')
  })

  it('returns all-null when the page has no OG tags', async () => {
    const upstream = createTestUpstream(
      'preview.test',
      () =>
        new Response(buildHtml('<title>plain</title>'), {
          status: 200,
          headers: { 'content-type': 'text/html' },
        }),
    )
    handle = await createTestApp({ fetchFn: createUpstreamRouter({ 'preview.test': upstream }) })

    const res = await handle.app.handle(
      new Request(`http://localhost/v1/preview`, {
        method: 'POST',
        headers: { ...authHeaders(handle.bearerToken), 'Content-Type': 'application/json' },
        body: JSON.stringify({ url: 'https://preview.test/empty' }),
      }),
    )
    expect(res.status).toBe(200)
    // Successful extraction with no OG tags is a legitimate result — cache it.
    expect(res.headers.get('cache-control')).toBe('private, max-age=600')
    const data = (await res.json()) as Record<string, string | null>
    expect(data.title).toBeNull()
    expect(data.summary).toBeNull()
    expect(data.previewImageUrl).toBeNull()
    expect(data.siteName).toBeNull()
  })

  it('does not cache the empty-fallback when upstream returns a non-OK status', async () => {
    const upstream = createTestUpstream('preview.test', () => new Response('bad gateway', { status: 502 }))
    handle = await createTestApp({ fetchFn: createUpstreamRouter({ 'preview.test': upstream }) })

    const res = await handle.app.handle(
      new Request(`http://localhost/v1/preview`, {
        method: 'POST',
        headers: { ...authHeaders(handle.bearerToken), 'Content-Type': 'application/json' },
        body: JSON.stringify({ url: 'https://preview.test/down' }),
      }),
    )
    expect(res.status).toBe(200)
    // Transient upstream failures must not stick in the per-user cache for 10 minutes.
    expect(res.headers.get('cache-control')).not.toBe('private, max-age=600')
    const data = (await res.json()) as Record<string, string | null>
    expect(data.title).toBeNull()
  })

  it('rejects targets that resolve to a private address with 400', async () => {
    handle = await createTestApp({})
    const res = await handle.app.handle(
      new Request(`http://localhost/v1/preview`, {
        method: 'POST',
        headers: { ...authHeaders(handle.bearerToken), 'Content-Type': 'application/json' },
        body: JSON.stringify({ url: 'https://127.0.0.1/secret' }),
      }),
    )
    expect(res.status).toBe(400)
  })

  it('returns 401 for unauthenticated requests', async () => {
    handle = await createTestApp({})
    const res = await handle.app.handle(
      new Request(`http://localhost/v1/preview`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ url: 'https://preview.test/x' }),
      }),
    )
    expect(res.status).toBe(401)
  })
})
```
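These tests inject a fake `fetch` instead of touching the network, per the "replace mock.module with dependency injection" commit. The project's `createUpstreamRouter` is not shown in this diff, so as a rough, hypothetical sketch of the pattern (names and signature are assumptions, not the project's actual API), a hostname-routing fetch stub might look like:

```typescript
// Hypothetical sketch of a hostname-routing fetch stub for tests.
// Requests are dispatched to in-memory handlers by hostname; anything
// unrouted gets a 502 so a mistyped host fails loudly instead of
// escaping to the real network.
type UpstreamHandler = (req: Request) => Response | Promise<Response>

const routeByHostname =
  (routes: Record<string, UpstreamHandler>): typeof fetch =>
  async (input, init) => {
    const req = new Request(input, init)
    const handler = routes[new URL(req.url).hostname]
    if (!handler) {
      return new Response('unrouted host', { status: 502 })
    }
    return handler(req)
  }

// Usage: hand the stub to the app under test as its `fetchFn`.
const fetchFn = routeByHostname({
  'preview.test': () =>
    new Response('<title>stub</title>', { headers: { 'content-type': 'text/html' } }),
})
```

Because the stub satisfies `typeof fetch`, the route code under test needs no test-specific branches; it simply calls whatever `fetchFn` it was constructed with.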
New file (+191 lines): the preview route itself.

```typescript
/* This Source Code Form is subject to the terms of the Mozilla Public
 * License, v. 2.0. If a copy of the MPL was not distributed with this
 * file, You can obtain one at http://mozilla.org/MPL/2.0/. */

import type { Auth } from '@/auth/elysia-plugin'
import { createAuthMacro } from '@/auth/elysia-plugin'
import { safeErrorHandler } from '@/middleware/error-handling'
import { createSafeFetch, ensureHttps, validateSafeUrl, type DnsLookup } from '@/utils/url-validation'
import { Elysia, t, type AnyElysia } from 'elysia'

export type PreviewDto = {
  previewImageUrl: string | null
  summary: string | null
  title: string | null
  siteName: string | null
}

const maxHtmlBytes = 2 * 1024 * 1024
const fetchTimeoutMs = 10_000
const userAgent =
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36'

const emptyPreview: PreviewDto = { previewImageUrl: null, summary: null, title: null, siteName: null }

/** Read up to `maxBytes` from a body stream, returning null if the cap is exceeded.
 * Avoids buffering an entire response when Content-Length is missing or lying. */
const readCappedBody = async (body: ReadableStream<Uint8Array>, maxBytes: number): Promise<Uint8Array | null> => {
  const reader = body.getReader()
  const chunks: Uint8Array[] = []
  let total = 0
  try {
    while (true) {
      const { done, value } = await reader.read()
      if (done) {
        break
      }
      total += value.byteLength
      if (total > maxBytes) {
        await reader.cancel().catch(() => {})
        return null
      }
      chunks.push(value)
    }
  } finally {
    reader.releaseLock()
  }
  const out = new Uint8Array(total)
  let offset = 0
  for (const chunk of chunks) {
    out.set(chunk, offset)
    offset += chunk.byteLength
  }
  return out
}

const decodeHtmlEntities = (text: string): string =>
  text
    .replace(/&#x([0-9A-Fa-f]+);/g, (_, hex) => String.fromCharCode(parseInt(hex, 16)))
    .replace(/&#(\d+);/g, (_, dec) => String.fromCharCode(parseInt(dec, 10)))
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&amp;/g, '&')

const resolveUrl = (baseUrl: string, relativeUrl: string): string => {
  try {
    return new URL(relativeUrl, baseUrl).href
  } catch {
    return relativeUrl
  }
}

const metaRegexCache = new Map<string, [RegExp, RegExp]>()
const getMetaRegex = (attr: 'property' | 'name', value: string): [RegExp, RegExp] => {
  const key = `${attr}:${value}`
  const cached = metaRegexCache.get(key)
  if (cached) {
    return cached
  }
  const pair: [RegExp, RegExp] = [
    new RegExp(`<meta[^>]*${attr}=["']${value}["'][^>]*content=["']([^"']+)["'][^>]*>`, 'i'),
    new RegExp(`<meta[^>]*content=["']([^"']+)["'][^>]*${attr}=["']${value}["'][^>]*>`, 'i'),
  ]
  metaRegexCache.set(key, pair)
  return pair
}

/** Match a meta tag in either property-first or content-first form. */
const matchMeta = (html: string, attr: 'property' | 'name', value: string): string | null => {
  const [propertyFirst, contentFirst] = getMetaRegex(attr, value)
  return html.match(propertyFirst)?.[1] ?? html.match(contentFirst)?.[1] ?? null
}

const extractMetadata = (html: string, baseUrl: string): PreviewDto => {
  const ogTitle = matchMeta(html, 'property', 'og:title')
  const ogDesc = matchMeta(html, 'property', 'og:description')
  const ogImage = matchMeta(html, 'property', 'og:image')
  const ogSite = matchMeta(html, 'property', 'og:site_name')
  const hasSocial = ogTitle || ogDesc || ogImage || ogSite

  const fallbackTitle = hasSocial ? (html.match(/<title[^>]*>([^<]+)<\/title>/i)?.[1] ?? null) : null
  const metaDesc = hasSocial ? matchMeta(html, 'name', 'description') : null

  const decode = (s: string | null) => (s?.trim() ? decodeHtmlEntities(s.trim()) : null)
  const previewImageUrl = ogImage ? ensureHttps(resolveUrl(baseUrl, ogImage)) : null
  return {
    previewImageUrl,
    summary: decode(ogDesc) ?? decode(metaDesc),
    title: decode(ogTitle) ?? decode(fallbackTitle),
    siteName: decode(ogSite),
  }
}

export type CreatePreviewRoutesOptions = {
  auth: Auth
  fetchFn?: typeof fetch
  rateLimit?: AnyElysia
  dnsLookup?: DnsLookup
}

export const createPreviewRoutes = (options: CreatePreviewRoutesOptions) => {
  const { auth, rateLimit, dnsLookup } = options
  const fetchFn = options.fetchFn ?? globalThis.fetch
  const safeFetch = createSafeFetch(fetchFn, dnsLookup)

  return new Elysia({ name: 'preview-routes' })
    .onError(safeErrorHandler)
    .use(createAuthMacro(auth))
    .guard({ auth: true }, (g) => {
      if (rateLimit) {
        g.use(rateLimit)
      }
      // POST so target URLs do not appear in access logs.
      return g.post(
        '/preview',
        async ({ body, set }): Promise<PreviewDto | { error: string }> => {
          const targetUrl = body.url
          const validation = validateSafeUrl(targetUrl)
          if (!validation.valid) {
            set.status = 400
            return { error: validation.error ?? 'Invalid URL' }
          }

          const controller = new AbortController()
          const timeoutId = setTimeout(() => controller.abort(), fetchTimeoutMs)
          try {
            const response = await safeFetch(targetUrl, {
              method: 'GET',
              headers: {
                'User-Agent': userAgent,
                Accept: 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
                'Accept-Language': 'en-US,en;q=0.9',
              },
              signal: controller.signal,
            })

            if (!response.ok) {
              return emptyPreview
            }
            const contentLength = response.headers.get('content-length')
            const parsed = contentLength ? parseInt(contentLength, 10) : null
            if (parsed !== null && Number.isFinite(parsed) && parsed > maxHtmlBytes) {
              return emptyPreview
            }
            if (!response.body) {
              return emptyPreview
            }
            const buffer = await readCappedBody(response.body, maxHtmlBytes)
            if (!buffer) {
              return emptyPreview
            }
            const html = new TextDecoder().decode(buffer)
            // Cache successful OG metadata per-user for 10 minutes. Safe here (unlike
            // /v1/proxy) because the response is a small, derived JSON DTO — not the
            // raw upstream body — and the request body is the only cache key (no
            // `?token=` style explosion). `private` keeps shared/CDN caches out.
            // Only set on the success path so transient upstream failures (empty
            // fallback) aren't sticky for 10 minutes.
            set.headers['Cache-Control'] = 'private, max-age=600'
            return extractMetadata(html, targetUrl)
          } catch {
            return emptyPreview
          } finally {
            clearTimeout(timeoutId)
          }
        },
        { body: t.Object({ url: t.String() }) },
      )
    })
}
```
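One subtlety in `decodeHtmlEntities` is replacement order: `&amp;` is decoded last, so a double-escaped sequence such as `&amp;lt;` comes out as the literal text `&lt;` rather than being decoded twice into `<`. A trimmed standalone sketch (same ordering idea, abridged from the route above):

```typescript
// Standalone copy of the entity-decoding order used by the preview route.
// Decoding &amp; last keeps double-escaped input from collapsing twice.
const decodeHtmlEntities = (text: string): string =>
  text
    .replace(/&#x([0-9A-Fa-f]+);/g, (_, hex: string) => String.fromCharCode(parseInt(hex, 16)))
    .replace(/&#(\d+);/g, (_, dec: string) => String.fromCharCode(parseInt(dec, 10)))
    .replace(/&quot;/g, '"')
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&amp;/g, '&')

decodeHtmlEntities('Hello &amp; world')  // 'Hello & world'
decodeHtmlEntities('&amp;lt;b&amp;gt;')  // '&lt;b&gt;' — decoded one level, not two
decodeHtmlEntities('&#x27;hi&#x27;')     // "'hi'"
```

If `&amp;` were decoded first, `&amp;lt;` would become `&lt;` and then the later `&lt;` pass would turn it into a bare `<`, corrupting text that intentionally displayed markup.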